Spaces:
Runtime error
Runtime error
File size: 10,878 Bytes
aaef24a 9583f97 aaef24a 9583f97 aaef24a 806c8fe aaef24a 4644cb5 aaef24a | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 | # /// script
# requires-python = ">=3.13"
# dependencies = [
# "marimo",
# "marimo-learn>=0.12.0",
# "polars==1.24.0",
# "sqlalchemy",
# ]
# ///
import marimo
__generated_with = "0.20.4"
app = marimo.App(width="medium")
with app.setup:
import marimo as mo
import marimo_learn as mol
from marimo_learn import MultipleChoiceWidget, OrderingWidget
import sqlalchemy
db_path = mol.localize_file(
"penguins.db",
"https://raw.githubusercontent.com/marimo-team/learn/main/sql/public/penguins.db"
)
DATABASE_URL = f"sqlite:///{db_path}"
engine = sqlalchemy.create_engine(DATABASE_URL)
@app.cell(hide_code=True)
def _():
mo.md(r"""
# Basic Selection
This tutorial shows how to select values from a single table in a database using SQL. We have already made a connection between this notebook and our `penguins.db` database—we'll show you how to do that later—so let's have a look at the data in the `penguins` table.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
select * from penguins;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
Almost every **query** in SQL starts with the word `select`. The value immediately after it tells the database manager what we want to see. In this case, we use the shorthand `*` to mean "all the columns". We then say `from penguins` to tell the database manager which table we want to get the data from. The semi-colon at the end marks the end of the query.
Note that the database manager doesn't format the output nicely, draw the little distribution histograms above columns, or give us the page-forward/page-backward controls: all the credit for that belongs to the Marimo notebook.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Choosing Columns
We don't have to select all of the columns every time we get data from a table. If we only want specific columns, we give their names instead of using `*` to mean "all". As the output below shows, the columns are displayed in the order in which we gave their names.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
select sex, island, species from penguins;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Try editing the SQL in the query cell to change the column order, or to get the `bill_length_mm` column.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Upper and Lower Case
We can write the query above in any mixture of upper and lower case and get the same result.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
SELECT Sex, island, SPECIES frOM pEnGuInS;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
Please don't do this: it makes your queries very hard to read. It *is* common to use upper case for keywords like `SELECT` and `FROM`, and lower case for column names like `penguins` and `island`; whatever you choose, the most important thing is to be consistent.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Sorting
When we look at a spreadsheet or a printed table, the rows are in a particular order. A database manager, on the other hand, might rearrange rows for the sake of efficiency as data is added or deleted, which means the rows displayed by `select` can be in whatever order it wants. If we want a particular order, we can add `order by` and the names of one or more columns to our query.
Note that we have split the query below across several lines to make it easier to read. Just as SQL doesn't care about upper and lower case, it doesn't care about line breaks. As our queries become larger and more complicated, formatting them like this will make them a lot easier to understand.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
select island, species, sex
from penguins
order by island, species;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
If you page through the output from the query above, you'll see that our penguins have been ordered by island: Biscoe before Dream, and Dream before Torgersen. Within each of those groups, the penguins are sub-ordered by species (Adelie, Chinstrap, and then Gentoo). The penguins aren't ordered by sex, but they could be: as with island and species, the sorting goes from left to right.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Try rearranging the order of columns in the `select` while leaving the order in `order by` alone and vice versa. Notice that you don't have to sort in the order in which the columns are displayed (but you usually should to make the output easier to understand).
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> What do you think will happen if you select `island` and `species` but `order by sex`? How can you tell if your prediction is correct?
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Limiting Output
The `penguins` table has 344 rows. If we only want to see the first five, we can add a `limit` clause to our query, which specifies the maximum number of rows we want.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
select * from penguins limit 5;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
What if we want the next five? Or the five after that? To get those, we can add an offset, which is the number of rows to skip before selecting as many rows as we've asked for.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
select * from penguins
limit 5 offset 5;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
Selecting one chunk of data after another is called **paging**. Applications frequently do this in order to save memory and bandwidth: people can't look at 100,000 rows at once, so there's usually no point grabbing that many in one gulp.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Add a cell below to get rows 12 through 17 from the `penguins` table. Think carefully about what the `offset` and `limit` need to be to get precisely these rows.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Try changing the query above to be `offset 5 limit 5`. Do you understand the result?
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> 1. What happens if you specify a limit that is greater than the number of rows in the table?
> 1. What happens if you specify an offset that is greater than the number of rows in the table?
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Suppose your program is paging through a table while another application is adding and deleting rows. What would you want to happen? What do you think will happen?
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Removing Duplicates
Suppose we want to find out which kinds of penguins were seen on which islands. We could scroll through the data, taking note of each unique (species, island) pair we see, but SQL will do this for us if we add the `distinct` keyword to our query.
Note that the query below includes a comment explaining what it does. While comments in Python start with `#`, comments in SQL start with `--` and run to the end of the line.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
-- Show unique (species, island) pairs.
select distinct species, island
from penguins;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Modify the query above to show (island, species) instead of (species, island), and to sort by island name and then by species name.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Doing Calculations
The `penguins` table records the penguins' masses in grams (at least, that's what we think the `_g` suffix on the column name means). If we want the mass in kilograms, we can divide the given values by 1000.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
select species, sex, body_mass_g, body_mass_g / 1000
from penguins
limit 10;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
The query above shows both the mass in grams and the mass in kilograms so that we can check the latter against the former. However, the name that the database manager automatically gives the calculated column isn't particular readable. Let's use `as` to fix that.
""")
return
@app.cell
def _():
_df = mo.sql(
f"""
select species, sex, body_mass_g, body_mass_g / 1000 as mass_kg
from penguins
limit 10;
""",
engine=engine
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Can you use `as` to select a column from the table but display it with a different name? Should you?
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
> Write a query to calculate the ratio of bill length and bill height for every penguin. Call the calculated column `bill_ratio`.
""")
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Check Understanding

""")
return
@app.cell(hide_code=True)
def _():
_widget = mo.ui.anywidget(
OrderingWidget(
question="Arrange these SQL clauses in the order they must appear in a query.",
items=["SELECT", "FROM", "ORDER BY", "LIMIT"],
)
)
_widget
return
@app.cell(hide_code=True)
def _():
_widget = mo.ui.anywidget(
MultipleChoiceWidget(
question="What does `SELECT *` mean in a SQL query?",
options=[
"Select only the first row of the table",
"Select all columns from the table",
"Select all rows but only the first column",
"Count the total number of rows",
],
correct_answer=1,
explanation="* is shorthand for 'all columns'. SELECT * retrieves every column; the number of rows returned depends on whether you add WHERE, LIMIT, or other clauses.",
)
)
_widget
return
if __name__ == "__main__":
app.run()
|