File size: 10,878 Bytes
aaef24a
 
 
 
9583f97
aaef24a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9583f97
 
 
 
aaef24a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
806c8fe
aaef24a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4644cb5
aaef24a
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "marimo",
#     "marimo-learn>=0.12.0",
#     "polars==1.24.0",
#     "sqlalchemy",
# ]
# ///

import marimo

__generated_with = "0.20.4"
app = marimo.App(width="medium")

with app.setup:
    import marimo as mo
    import marimo_learn as mol
    from marimo_learn import MultipleChoiceWidget, OrderingWidget
    import sqlalchemy

    db_path = mol.localize_file(
        "penguins.db",
        "https://raw.githubusercontent.com/marimo-team/learn/main/sql/public/penguins.db"
    )
    DATABASE_URL = f"sqlite:///{db_path}"
    engine = sqlalchemy.create_engine(DATABASE_URL)


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    # Basic Selection

    This tutorial shows how to select values from a single table in a database using SQL. We have already made a connection between this notebook and our `penguins.db` database—we'll show you how to do that later—so let's have a look at the data in the `penguins` table.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        select * from penguins;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    Almost every **query** in SQL starts with the word `select`. The value immediately after it tells the database manager what we want to see. In this case, we use the shorthand `*` to mean "all the columns". We then say `from penguins` to tell the database manager which table we want to get the data from. The semi-colon at the end marks the end of the query.

    Note that the database manager doesn't format the output nicely, draw the little distribution histograms above columns, or give us the page-forward/page-backward controls: all the credit for that belongs to the Marimo notebook.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    ## Choosing Columns

    We don't have to select all of the columns every time we get data from a table. If we only want specific columns, we give their names instead of using `*` to mean "all". As the output below shows, the columns are displayed in the order in which we gave their names.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        select sex, island, species from penguins;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Try editing the SQL in the query cell to change the column order, or to get the `bill_length_mm` column.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    ## Upper and Lower Case

    We can write the query above in any mixture of upper and lower case and get the same result.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        SELECT Sex, island, SPECIES frOM pEnGuInS;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    Please don't do this: it makes your queries very hard to read. It *is* common to use upper case for keywords like `SELECT` and `FROM`, and lower case for column names like `penguins` and `island`; whatever you choose, the most important thing is to be consistent.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    ## Sorting

    When we look at a spreadsheet or a printed table, the rows are in a particular order. A database manager, on the other hand, might rearrange rows for the sake of efficiency as data is added or deleted, which means the rows displayed by `select` can be in whatever order it wants. If we want a particular order, we can add `order by` and the names of one or more columns to our query.

    Note that we have split the query below across several lines to make it easier to read. Just as SQL doesn't care about upper and lower case, it doesn't care about line breaks. As our queries become larger and more complicated, formatting them like this will make them a lot easier to understand.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        select island, species, sex
        from penguins
        order by island, species;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    If you page through the output from the query above, you'll see that our penguins have been ordered by island: Biscoe before Dream, and Dream before Torgersen. Within each of those groups, the penguins are sub-ordered by species (Adelie, Chinstrap, and then Gentoo). The penguins aren't ordered by sex, but they could be: as with island and species, the sorting goes from left to right.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Try rearranging the order of columns in the `select` while leaving the order in `order by` alone and vice versa. Notice that you don't have to sort in the order in which the columns are displayed (but you usually should to make the output easier to understand).
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > What do you think will happen if you select `island` and `species` but `order by sex`? How can you tell if your prediction is correct?
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    ## Limiting Output

    The `penguins` table has 344 rows. If we only want to see the first five, we can add a `limit` clause to our query, which specifies the maximum number of rows we want.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        select * from penguins limit 5;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    What if we want the next five? Or the five after that? To get those, we can add an offset, which is the number of rows to skip before selecting as many rows as we've asked for.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        select * from penguins
        limit 5 offset 5;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    Selecting one chunk of data after another is called **paging**. Applications frequently do this in order to save memory and bandwidth: people can't look at 100,000 rows at once, so there's usually no point grabbing that many in one gulp.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Add a cell below to get rows 12 through 17 from the `penguins` table. Think carefully about what the `offset` and `limit` need to be to get precisely these rows.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Try changing the query above to be `offset 5 limit 5`. Do you understand the result?
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > 1. What happens if you specify a limit that is greater than the number of rows in the table?
    > 1. What happens if you specify an offset that is greater than the number of rows in the table?
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Suppose your program is paging through a table while another application is adding and deleting rows. What would you want to happen? What do you think will happen?
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    ## Removing Duplicates

    Suppose we want to find out which kinds of penguins were seen on which islands. We could scroll through the data, taking note of each unique (species, island) pair we see, but SQL will do this for us if we add the `distinct` keyword to our query.

    Note that the query below includes a comment explaining what it does. While comments in Python start with `#`, comments in SQL start with `--` and run to the end of the line.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        -- Show unique (species, island) pairs.
        select distinct species, island
        from penguins;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Modify the query above to show (island, species) instead of (species, island), and to sort by island name and then by species name.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    ## Doing Calculations

    The `penguins` table records the penguins' masses in grams (at least, that's what we think the `_g` suffix on the column name means). If we want the mass in kilograms, we can divide the given values by 1000.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        select species, sex, body_mass_g, body_mass_g / 1000
        from penguins
        limit 10;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    The query above shows both the mass in grams and the mass in kilograms so that we can check the latter against the former. However, the name that the database manager automatically gives the calculated column isn't particular readable. Let's use `as` to fix that.
    """)
    return


@app.cell
def _():
    _df = mo.sql(
        f"""
        select species, sex, body_mass_g, body_mass_g / 1000 as mass_kg
        from penguins
        limit 10;
        """,
        engine=engine
    )
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Can you use `as` to select a column from the table but display it with a different name? Should you?
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    > Write a query to calculate the ratio of bill length and bill height for every penguin. Call the calculated column `bill_ratio`.
    """)
    return


@app.cell(hide_code=True)
def _():
    mo.md(r"""
    ## Check Understanding

    ![concept map](https://raw.githubusercontent.com/marimo-team/learn/main/sql/public/01_concepts.svg)
    """)
    return


@app.cell(hide_code=True)
def _():
    _widget = mo.ui.anywidget(
        OrderingWidget(
            question="Arrange these SQL clauses in the order they must appear in a query.",
            items=["SELECT", "FROM", "ORDER BY", "LIMIT"],
        )
    )
    _widget
    return


@app.cell(hide_code=True)
def _():
    _widget = mo.ui.anywidget(
        MultipleChoiceWidget(
            question="What does `SELECT *` mean in a SQL query?",
            options=[
                "Select only the first row of the table",
                "Select all columns from the table",
                "Select all rows but only the first column",
                "Count the total number of rows",
            ],
            correct_answer=1,
            explanation="* is shorthand for 'all columns'. SELECT * retrieves every column; the number of rows returned depends on whether you add WHERE, LIMIT, or other clauses.",
        )
    )
    _widget
    return


if __name__ == "__main__":
    app.run()