feat: overhaul for relaunch
- Create `requirements.txt` with pinned dependencies for building the site.
- Update .gitignore.
- Enhance Makefile.
- Rename existing lessons to `lesson/dd_name.py` (two-digit prefix, lower case).
  - Notebooks with other names are not included in the index page.
- Add SQL tutorial.
- Add scripts to regenerate SQLite databases.
- Add queueing theory tutorial.
- Rename `scripts` directory to `bin`.
- Replace old `build.py` with:
  - `bin/extract.py`: gets metadata from `*/index.md` lesson pages.
  - `bin/build.py`: builds root home page and lesson home pages.
  - `bin/check_empty_cells.py`: looks for empty cells in notebooks (enhanced).
  - `bin/check_missing_titles.py`: looks for notebooks without an H1 title.
  - `bin/check_notebook_packages.py`: checks consistency of package versions within a lesson.
- Add `make check_packages NOTEBOOKS="*/??_*.py"` to check package consistency within a lesson.
  - If `NOTEBOOKS` is not specified, all notebooks are checked.
- Add `make check_exec NOTEBOOKS="*/??_*.py"` to check notebook execution.
  - If `NOTEBOOKS` is not specified, all notebooks are executed (slow).
- Fix missing package imports in notebook headers.
- Pin package versions in notebook headers.
- Make content of lesson home pages uniform.
- Update GitHub workflows to launch commands from Makefile.
  - Requires using `uv` in workflows.
- Extract and modify CSS.
- Put SVG icons in includable files in `templates/icons/*.svg`.
- Make titles of notebooks more uniform.
- Build `pages/*.md` using `templates/page.html`.
- Add link checker.
  - Requires the local server to be running, and takes 10 minutes or more to execute.
- Fix multiple bugs in individual lessons.
  - Most introduced by package version pinning.
  - See notes below for outstanding issues.
Note: build [`marimo_learn`](https://github.com/gvwilson/marimo_learn) package
with utilities to localize SQLite database files.
Add `disabled=True` to prevent execution of deliberately buggy cells in script mode (?).
The code at lines 497–499 calls `lz.sink_csv(..., lazy=True)`. The
`lazy=True` argument was added so that the call returns a lazy sink that
can be passed to `pl.collect_all()` for parallel execution, rather than
writing the file immediately. However, in polars 1.24.0 the `lazy`
parameter was removed from `sink_csv()` (and likely from `sink_parquet()`
and `sink_ndjson()` too), and the API for collecting multiple sinks in
parallel has changed.
These notebooks use the `hf://` protocol to stream a parquet file
directly from Hugging Face:
```
URL = f"hf://datasets/{repo_id}@{branch}/{file_path}"
```
Polars is URL-encoding the slash in the repo name when it calls the HF
API, which then rejects it as an invalid repo name. The fix is to
download the file and store it locally, or make it available in some
other location.
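One way to sketch the download-and-store workaround with only the standard library; the repo coordinates and cache directory below are placeholders, and the returned path can be handed to `pl.read_parquet()`:

```python
import pathlib
import urllib.request


def hf_resolve_url(repo_id: str, branch: str, file_path: str) -> str:
    # Plain-HTTPS equivalent of hf://datasets/{repo_id}@{branch}/{file_path};
    # building the URL ourselves sidesteps the encoding of the repo-name slash.
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{branch}/{file_path}"


def fetch_once(repo_id: str, branch: str, file_path: str,
               cache_dir: str = "data") -> pathlib.Path:
    # Download on first use, then reuse the local copy.
    local = pathlib.Path(cache_dir) / pathlib.Path(file_path).name
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(hf_resolve_url(repo_id, branch, file_path), local)
    return local  # pass this to pl.read_parquet()
```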
Kagglehub requires Kaggle API credentials, which are not available in the
browser. Either remove the data-loading step or substitute a bundled
sample dataset.
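A sketch of the second option, assuming the lesson ships a small sample file; `owner/dataset-name` and the sample path are placeholders:

```python
import pathlib

SAMPLE = pathlib.Path("data/sample.csv")  # hypothetical bundled sample


def dataset_path() -> pathlib.Path:
    try:
        # Works on a desktop with Kaggle credentials configured.
        import kagglehub
        return pathlib.Path(kagglehub.dataset_download("owner/dataset-name"))
    except Exception:
        # In the browser (no credentials, or kagglehub missing), fall back
        # to the bundled sample so the rest of the notebook still runs.
        return SAMPLE
```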
Replace numba with a pure-Python alternative for the WASM version, or
gate the numba cells with a WASM check and change prose accordingly:
```
import sys

if "pyodide" not in sys.modules:
    import numba
```
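If gating alone is not enough (later cells still call the decorated functions), a no-op fallback decorator keeps the same call sites working; this sketch assumes the notebook only uses `@njit`:

```python
import sys

try:
    if "pyodide" in sys.modules:
        raise ImportError("numba is not available under WASM")
    from numba import njit
except ImportError:
    def njit(fn=None, **kwargs):
        # No-op stand-in: run the plain-Python function, just slower.
        if fn is None:
            return lambda f: f
        return fn


@njit
def sum_below(n):
    total = 0
    for i in range(n):
        total += i
    return total


print(sum_below(10))
```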
- Add Altair notebooks from https://uwdata.github.io/visualization-curriculum/
- Add formative assessment widgets
- Update Little's Law notebook
- .github/workflows/check-empty-cells.yml +4 -1
- .github/workflows/deploy.yml +1 -1
- .gitignore +9 -0
- .typos.toml +3 -0
- Makefile +105 -13
- _server/README.md +0 -5
- _server/main.py +3 -3
- altair/01_introduction.py +671 -0
- altair/02_marks_encoding.py +1126 -0
- altair/03_data_transformation.py +641 -0
- altair/04_scales_axes_legends.py +840 -0
- altair/05_view_composition.py +818 -0
- altair/06_interaction.py +671 -0
- altair/07_cartographic.py +898 -0
- altair/08_debugging.py +370 -0
- altair/altair_introduction.py.lock +0 -0
- altair/index.md +14 -0
- assets/styles.css +51 -0
- bin/build.py +93 -0
- {scripts → bin}/check_empty_cells.py +2 -3
- bin/check_missing_titles.py +21 -0
- bin/check_notebook_packages.py +110 -0
- bin/create_sql_lab.sql +22 -0
- bin/create_sql_penguins.py +50 -0
- bin/create_sql_survey.py +175 -0
- bin/extract.py +47 -0
- {scripts → bin}/preview.py +1 -2
- bin/run_notebooks.sh +11 -0
- bin/utils.py +14 -0
- daft/README.md +0 -31
- daft/_index.md +13 -0
- data/penguins.csv +345 -0
- duckdb/01_getting_started.py +8 -11
- duckdb/{008_loading_parquet.py → 08_loading_parquet.py} +3 -3
- duckdb/{009_loading_json.py → 09_loading_json.py} +3 -3
- duckdb/{011_working_with_apache_arrow.py → 11_working_with_apache_arrow.py} +21 -28
- duckdb/DuckDB_Loading_CSVs.py +3 -4
- duckdb/README.md +0 -37
- duckdb/index.md +16 -0
- {functional_programming → functional}/05_functors.py +1 -1
- {functional_programming → functional}/06_applicatives.py +1 -1
- functional/_index.md +25 -0
- functional_programming/CHANGELOG.md +0 -129
- functional_programming/README.md +0 -77
- optimization/01_least_squares.py +3 -3
- optimization/02_linear_program.py +5 -5
- optimization/03_minimum_fuel_optimal_control.py +8 -4
- optimization/04_quadratic_program.py +5 -5
- optimization/05_portfolio_optimization.py +9 -9
- optimization/06_convex_optimization.py +3 -3
**.github/workflows/check-empty-cells.yml**

```diff
@@ -17,6 +17,9 @@ jobs:
       - name: 🔄 Checkout code
         uses: actions/checkout@v4
 
+      - name: 🚀 Install uv
+        uses: astral-sh/setup-uv@v4
+
       - name: 🐍 Set up Python
         uses: actions/setup-python@v5
         with:
@@ -24,7 +27,7 @@ jobs:
 
       - name: 🔍 Check for empty cells
         run: |
-
+          make check_empty
 
       - name: 📊 Report results
         if: failure()
```
**.github/workflows/deploy.yml**

```diff
@@ -32,7 +32,7 @@ jobs:
 
       - name: 🛠️ Export notebooks
         run: |
-
+          make build
 
       - name: 📤 Upload artifact
         uses: actions/upload-pages-artifact@v3
```
**.gitignore**

```diff
@@ -175,3 +175,12 @@ __marimo__
 
 # Generated site content
 _site/
+
+# Editors
+*~
+
+# Temporary build files
+tmp/
+example.db
+example.db.wal
+log_data_filtered*.*
```
**.typos.toml**

```diff
@@ -15,7 +15,10 @@ extend-ignore-re = [
 
 # Words to explicitly accept
 [default.extend-words]
+bimap = "bimap"
 pn = "pn"
+setp = "setp"
+Plas = "Plas"
 
 # You can also exclude specific files or directories if needed
 # [files]
```
**Makefile** (the old 24-line file grows to 116 lines; new contents):

```makefile
ROOT := .
SITE := _site
TMP := ./tmp
LESSON_DATA := ${TMP}/lessons.json
TEMPLATES := $(wildcard templates/*.html)

NOTEBOOK_INDEX := $(wildcard */index.md)
NOTEBOOK_DIR := $(patsubst %/index.md,%,${NOTEBOOK_INDEX})
NOTEBOOK_SRC := $(foreach dir,$(NOTEBOOK_DIR),$(wildcard $(dir)/??_*.py))
NOTEBOOK_OUT := $(patsubst %.py,${SITE}/%.html,$(NOTEBOOK_SRC))

DATABASES := \
	sql/public/lab.db \
	sql/public/penguins.db \
	sql/public/survey.db

MARIMO := uv run marimo
PYTHON := uv run python

# Default target
all: commands

## commands : show all commands
commands:
	@grep -h -E '^##' ${MAKEFILE_LIST} | sed -e 's/## //g' | column -t -s ':'

## install: install required packages
install:
	uv pip install -r requirements.txt

## check: run all simple checks
check:
	-@make check_empty
	-@make check_titles
	-@make check_typos
	-@make check_packages

## check_exec: run notebooks to check for runtime errors
check_exec:
	@if [ -z "$(NOTEBOOKS)" ]; then \
		bash bin/run_notebooks.sh $(NOTEBOOK_SRC); \
	else \
		bash bin/run_notebooks.sh $(NOTEBOOKS); \
	fi

## build: build website
build: ${LESSON_DATA} ${NOTEBOOK_OUT} ${TEMPLATES}
	${PYTHON} bin/build.py --root ${ROOT} --output ${SITE} --data ${LESSON_DATA}

## links: check links locally (while 'make serve')
links:
	linkchecker -F text http://localhost:8000

## serve: run local web server without rebuilding
serve:
	${PYTHON} -m http.server --directory ${SITE}

## databases: rebuild datasets for SQL lessons
databases: ${DATABASES}

## ---: ---

## clean: clean up stray files
clean:
	@find . -name '*~' -exec rm {} +
	@find . -name '.DS_Store' -exec rm {} +
	@rm -rf ${TMP}
	@rm -f log_data_filtered*.*

## check_empty: check for empty cells
check_empty:
	@${PYTHON} bin/check_empty_cells.py

## check_titles: check for missing titles in notebooks
check_titles:
	@${PYTHON} bin/check_missing_titles.py

## check_packages: check for inconsistent package versions across notebooks
check_packages:
	@if [ -z "$(NOTEBOOKS)" ]; then \
		${PYTHON} bin/check_notebook_packages.py $(NOTEBOOK_SRC); \
	else \
		${PYTHON} bin/check_notebook_packages.py $(NOTEBOOKS); \
	fi

## check_typos: check for typos
check_typos:
	@typos ${TEMPLATES} ${NOTEBOOK_INDEX} ${NOTEBOOK_SRC}

## extract: extract lesson data
extract: ${LESSON_DATA}

#
# subsidiary targets
#

tmp/lessons.json: $(NOTEBOOK_INDEX)
	${PYTHON} bin/extract.py --root ${ROOT} --data ${LESSON_DATA}

${SITE}/%.html: %.py
	${MARIMO} export html-wasm --force --mode edit $< -o $@ --sandbox

sql/public/lab.db: bin/create_sql_lab.sql
	@rm -f $@
	@mkdir -p sql/public
	sqlite3 $@ < $<

sql/public/penguins.db: bin/create_sql_penguins.py data/penguins.csv
	@rm -f $@
	@mkdir -p sql/public
	${PYTHON} $< data/penguins.csv $@

sql/public/survey.db: bin/create_sql_survey.py
	@rm -f $@
	@mkdir -p sql/public
	${PYTHON} $< $@ 192837
```
**_server/README.md**

```diff
@@ -1,8 +1,3 @@
----
-title: Readme
-marimo-version: 0.18.4
----
-
 # marimo learn server
 
 This folder contains server code for hosting marimo apps.
```
**_server/main.py**

```diff
@@ -6,14 +6,14 @@
 #     "starlette",
 #     "python-dotenv",
 #     "pydantic",
-#     "duckdb==1.
+#     "duckdb==1.4.4",
-#     "altair==
+#     "altair==6.0.0",
 #     "beautifulsoup4==4.13.3",
 #     "httpx==0.28.1",
 #     "marimo",
 #     "nest-asyncio==1.6.0",
 #     "numba==0.61.0",
-#     "numpy==2.
+#     "numpy==2.4.3",
 #     "polars==1.24.0",
 # ]
 # ///
```
@@ -0,0 +1,671 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# /// script
|
| 2 |
+
# requires-python = ">=3.11"
|
| 3 |
+
# dependencies = [
|
| 4 |
+
# "altair==6.0.0",
|
| 5 |
+
# "marimo",
|
| 6 |
+
# "pandas==3.0.1",
|
| 7 |
+
# "vega_datasets==0.9.0",
|
| 8 |
+
# ]
|
| 9 |
+
# ///
|
| 10 |
+
|
| 11 |
+
import marimo
|
| 12 |
+
|
| 13 |
+
__generated_with = "0.20.4"
|
| 14 |
+
app = marimo.App()
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
@app.cell
|
| 18 |
+
def _():
|
| 19 |
+
import marimo as mo
|
| 20 |
+
|
| 21 |
+
return (mo,)
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
@app.cell(hide_code=True)
|
| 25 |
+
def _(mo):
|
| 26 |
+
mo.md(r"""
|
| 27 |
+
# Introduction to Altair
|
| 28 |
+
|
| 29 |
+
[Altair](https://altair-viz.github.io/) is a declarative statistical visualization library for Python. Altair offers a powerful and concise visualization grammar for quickly building a wide range of statistical graphics.
|
| 30 |
+
|
| 31 |
+
By *declarative*, we mean that you can provide a high-level specification of *what* you want the visualization to include, in terms of *data*, *graphical marks*, and *encoding channels*, rather than having to specify *how* to implement the visualization in terms of for-loops, low-level drawing commands, *etc*. The key idea is that you declare links between data fields and visual encoding channels, such as the x-axis, y-axis, color, *etc*. The rest of the plot details are handled automatically. Building on this declarative plotting idea, a surprising range of simple to sophisticated visualizations can be created using a concise grammar.
|
| 32 |
+
|
| 33 |
+
Altair is based on [Vega-Lite](https://vega.github.io/vega-lite/), a high-level grammar of interactive graphics. Altair provides a friendly Python [API (Application Programming Interface)](https://en.wikipedia.org/wiki/Application_programming_interface) that generates Vega-Lite specifications in [JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) format. Environments such as Jupyter Notebooks, JupyterLab, and Colab can then take this specification and render it directly in the web browser. To learn more about the motivation and basic concepts behind Altair and Vega-Lite, watch the [Vega-Lite presentation video from OpenVisConf 2017](https://www.youtube.com/watch?v=9uaHRWj04D4).
|
| 34 |
+
|
| 35 |
+
This notebook will guide you through the basic process of creating visualizations in Altair. First, you will need to make sure you have the Altair package and its dependencies installed (for more, see the [Altair installation documentation](https://altair-viz.github.io/getting_started/installation.html)), or you are using a notebook environment that includes the dependencies pre-installed.
|
| 36 |
+
|
| 37 |
+
_This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
|
| 38 |
+
""")
|
| 39 |
+
return
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
@app.cell(hide_code=True)
|
| 43 |
+
def _(mo):
|
| 44 |
+
mo.md(r"""
|
| 45 |
+
## Imports
|
| 46 |
+
|
| 47 |
+
To start, we must import the necessary libraries: Pandas for data frames and Altair for visualization.
|
| 48 |
+
""")
|
| 49 |
+
return
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
@app.cell
|
| 53 |
+
def _():
|
| 54 |
+
import pandas as pd
|
| 55 |
+
import altair as alt
|
| 56 |
+
|
| 57 |
+
return alt, pd
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
@app.cell(hide_code=True)
|
| 61 |
+
def _(mo):
|
| 62 |
+
mo.md(r"""
|
| 63 |
+
## Renderers
|
| 64 |
+
|
| 65 |
+
Depending on your environment, you may need to specify a [renderer](https://altair-viz.github.io/user_guide/display_frontends.html) for Altair. If you are using __JupyterLab__, __Jupyter Notebook__, or __Google Colab__ with a live Internet connection you should not need to do anything. Otherwise, please read the documentation for [Displaying Altair Charts](https://altair-viz.github.io/user_guide/display_frontends.html).
|
| 66 |
+
""")
|
| 67 |
+
return
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
@app.cell(hide_code=True)
|
| 71 |
+
def _(mo):
|
| 72 |
+
mo.md(r"""
|
| 73 |
+
## Data
|
| 74 |
+
|
| 75 |
+
Data in Altair is built around the Pandas data frame, which consists of a set of named data *columns*. We will also regularly refer to data columns as data *fields*.
|
| 76 |
+
|
| 77 |
+
When using Altair, datasets are commonly provided as data frames. Alternatively, Altair can also accept a URL to load a network-accessible dataset. As we will see, the named columns of the data frame are an essential piece of plotting with Altair.
|
| 78 |
+
|
| 79 |
+
We will often use datasets from the [vega-datasets](https://github.com/vega/vega-datasets) repository. Some of these datasets are directly available as Pandas data frames:
|
| 80 |
+
""")
|
| 81 |
+
return
|
| 82 |
+
|
| 83 |
+
|
| 84 |
+
@app.cell
|
| 85 |
+
def _():
|
| 86 |
+
from vega_datasets import data # import vega_datasets
|
| 87 |
+
cars = data.cars() # load cars data as a Pandas data frame
|
| 88 |
+
cars.head() # display the first five rows
|
| 89 |
+
return cars, data
|
| 90 |
+
|
| 91 |
+
|
| 92 |
+
@app.cell(hide_code=True)
|
| 93 |
+
def _(mo):
|
| 94 |
+
mo.md(r"""
|
| 95 |
+
Datasets in the vega-datasets collection can also be accessed via URLs:
|
| 96 |
+
""")
|
| 97 |
+
return
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
@app.cell
|
| 101 |
+
def _(data):
|
| 102 |
+
data.cars.url
|
| 103 |
+
return
|
| 104 |
+
|
| 105 |
+
|
| 106 |
+
@app.cell(hide_code=True)
|
| 107 |
+
def _(mo):
|
| 108 |
+
mo.md(r"""
|
| 109 |
+
Dataset URLs can be passed directly to Altair (for supported formats like JSON and [CSV](https://en.wikipedia.org/wiki/Comma-separated_values)), or loaded into a Pandas data frame like so:
|
| 110 |
+
""")
|
| 111 |
+
return
|
| 112 |
+
|
| 113 |
+
|
| 114 |
+
@app.cell
|
| 115 |
+
def _(data, pd):
|
| 116 |
+
pd.read_json(data.cars.url).head() # load JSON data into a data frame
|
| 117 |
+
return
|
| 118 |
+
|
| 119 |
+
|
| 120 |
+
@app.cell(hide_code=True)
|
| 121 |
+
def _(mo):
|
| 122 |
+
mo.md(r"""
|
| 123 |
+
For more information about data frames - and some useful transformations to prepare Pandas data frames for plotting with Altair! - see the [Specifying Data with Altair documentation](https://altair-viz.github.io/user_guide/data.html).
|
| 124 |
+
""")
|
| 125 |
+
return
|
| 126 |
+
|
| 127 |
+
|
| 128 |
+
@app.cell(hide_code=True)
|
| 129 |
+
def _(mo):
|
| 130 |
+
mo.md(r"""
|
| 131 |
+
### Weather Data
|
| 132 |
+
|
| 133 |
+
Statistical visualization in Altair begins with ["tidy"](http://vita.had.co.nz/papers/tidy-data.html) data frames. Here, we'll start by creating a simple data frame (`df`) containing the average precipitation (`precip`) for a given `city` and `month` :
|
| 134 |
+
""")
|
| 135 |
+
return
|
| 136 |
+
|
| 137 |
+
|
| 138 |
+
@app.cell
|
| 139 |
+
def _(pd):
|
| 140 |
+
df = pd.DataFrame({
|
| 141 |
+
'city': ['Seattle', 'Seattle', 'Seattle', 'New York', 'New York', 'New York', 'Chicago', 'Chicago', 'Chicago'],
|
| 142 |
+
'month': ['Apr', 'Aug', 'Dec', 'Apr', 'Aug', 'Dec', 'Apr', 'Aug', 'Dec'],
|
| 143 |
+
'precip': [2.68, 0.87, 5.31, 3.94, 4.13, 3.58, 3.62, 3.98, 2.56]
|
| 144 |
+
})
|
| 145 |
+
|
| 146 |
+
df
|
| 147 |
+
return (df,)
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
@app.cell(hide_code=True)
|
| 151 |
+
def _(mo):
|
| 152 |
+
mo.md(r"""
|
| 153 |
+
## The Chart Object
|
| 154 |
+
|
| 155 |
+
The fundamental object in Altair is the `Chart`, which takes a data frame as a single argument:
|
| 156 |
+
""")
|
| 157 |
+
return
|
| 158 |
+
|
| 159 |
+
|
| 160 |
+
@app.cell
|
| 161 |
+
def _(alt, df):
|
| 162 |
+
_chart = alt.Chart(df)
|
| 163 |
+
return
|
| 164 |
+
|
| 165 |
+
|
| 166 |
+
@app.cell(hide_code=True)
|
| 167 |
+
def _(mo):
|
| 168 |
+
mo.md(r"""
|
| 169 |
+
So far, we have defined the `Chart` object and passed it the simple data frame we generated above. We have not yet told the chart to *do* anything with the data.
|
| 170 |
+
""")
|
| 171 |
+
return
|
| 172 |
+
|
| 173 |
+
|
| 174 |
+
@app.cell(hide_code=True)
|
| 175 |
+
def _(mo):
|
| 176 |
+
mo.md(r"""
|
| 177 |
+
## Marks and Encodings
|
| 178 |
+
|
| 179 |
+
With a chart object in hand, we can now specify how we would like the data to be visualized. We first indicate what kind of graphical *mark* (geometric shape) we want to use to represent the data. We can set the `mark` attribute of the chart object using the the `Chart.mark_*` methods.
|
| 180 |
+
|
| 181 |
+
For example, we can show the data as a point using `Chart.mark_point()`:
|
| 182 |
+
""")
|
| 183 |
+
return
|
| 184 |
+
|
| 185 |
+
|
| 186 |
+
@app.cell
|
| 187 |
+
def _(alt, df):
|
| 188 |
+
alt.Chart(df).mark_point()
|
| 189 |
+
return
|
| 190 |
+
|
| 191 |
+
|
| 192 |
+
@app.cell(hide_code=True)
|
| 193 |
+
def _(mo):
|
| 194 |
+
mo.md(r"""
|
| 195 |
+
Here the rendering consists of one point per row in the dataset, all plotted on top of each other, since we have not yet specified positions for these points.
|
| 196 |
+
|
| 197 |
+
To visually separate the points, we can map various *encoding channels*, or *channels* for short, to fields in the dataset. For example, we could *encode* the field `city` of the data using the `y` channel, which represents the y-axis position of the points. To specify this, use the `encode` method:
|
| 198 |
+
""")
|
| 199 |
+
return
|
| 200 |
+
|
| 201 |
+
|
| 202 |
+
@app.cell
|
| 203 |
+
def _(alt, df):
|
| 204 |
+
alt.Chart(df).mark_point().encode(
|
| 205 |
+
y='city',
|
| 206 |
+
)
|
| 207 |
+
return
|
| 208 |
+
|
| 209 |
+
|
| 210 |
+
@app.cell(hide_code=True)
|
| 211 |
+
def _(mo):
|
| 212 |
+
mo.md(r"""
|
| 213 |
+
The `encode()` method builds a key-value mapping between encoding channels (such as `x`, `y`, `color`, `shape`, `size`, *etc.*) to fields in the dataset, accessed by field name. For Pandas data frames, Altair automatically determines an appropriate data type for the mapped column, which in this case is the *nominal* type, indicating unordered, categorical values.
|
| 214 |
+
|
| 215 |
+
Though we've now separated the data by one attribute, we still have multiple points overlapping within each category. Let's further separate these by adding an `x` encoding channel, mapped to the `'precip'` field:
|
| 216 |
+
""")
|
| 217 |
+
return
|
| 218 |
+
|
| 219 |
+
|
| 220 |
+
@app.cell
|
| 221 |
+
def _(alt, df):
|
| 222 |
+
alt.Chart(df).mark_point().encode(
|
| 223 |
+
x='precip',
|
| 224 |
+
y='city'
|
| 225 |
+
)
|
| 226 |
+
return
|
| 227 |
+
|
| 228 |
+
|
| 229 |
+
@app.cell(hide_code=True)
|
| 230 |
+
def _(mo):
|
| 231 |
+
mo.md(r"""
|
| 232 |
+
_Seattle exhibits both the least-rainiest and most-rainiest months!_
|
| 233 |
+
|
| 234 |
+
The data type of the `'precip'` field is again automatically inferred by Altair, and this time is treated as a *quantitative* type (that is, a real-valued number). We see that grid lines and appropriate axis titles are automatically added as well.
|
| 235 |
+
|
| 236 |
+
Above we have specified key-value pairs using keyword arguments (`x='precip'`). In addition, Altair provides construction methods for encoding definitions, using the syntax `alt.X('precip')`. This alternative is useful for providing more parameters to an encoding, as we will see later in this notebook.
|
| 237 |
+
""")
|
| 238 |
+
return
|
| 239 |
+
|
| 240 |
+
|
| 241 |
+
@app.cell
|
| 242 |
+
def _(alt, df):
|
| 243 |
+
alt.Chart(df).mark_point().encode(
|
| 244 |
+
alt.X('precip'),
|
| 245 |
+
alt.Y('city')
|
| 246 |
+
)
|
| 247 |
+
return
|
| 248 |
+
|
| 249 |
+
|
| 250 |
+
@app.cell(hide_code=True)
|
| 251 |
+
def _(mo):
|
| 252 |
+
mo.md(r"""
|
| 253 |
+
The two styles of specifying encodings can be interleaved: `x='precip', alt.Y('city')` is also a valid input to the `encode` function.
|
| 254 |
+
|
| 255 |
+
In the examples above, the data type for each field is inferred automatically based on its type within the Pandas data frame. We can also explicitly indicate the data type to Altair by annotating the field name:
|
| 256 |
+
|
| 257 |
+
- `'b:N'` indicates a *nominal* type (unordered, categorical data),
|
| 258 |
+
- `'b:O'` indicates an *ordinal* type (rank-ordered data),
|
| 259 |
+
- `'b:Q'` indicates a *quantitative* type (numerical data with meaningful magnitudes), and
|
| 260 |
+
- `'b:T'` indicates a *temporal* type (date/time data)
|
| 261 |
+
|
| 262 |
+
For example, `alt.X('precip:N')`.
|
| 263 |
+
|
| 264 |
+
Explicit annotation of data types is necessary when data is loaded from an external URL directly by Vega-Lite (skipping Pandas entirely), or when we wish to use a type that differs from the type that was automatically inferred.
|
| 265 |
+
|
| 266 |
+
What do you think will happen to our chart above if we treat `precip` as a nominal or ordinal variable, rather than a quantitative variable? _Modify the code above and find out!_
|
| 267 |
+
|
| 268 |
+
We will take a closer look at data types and encoding channels in the next notebook of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum#data-visualization-curriculum).
|
| 269 |
+
""")
|
| 270 |
+
return
|
| 271 |
+
|
| 272 |
+
|
| 273 |
+
@app.cell(hide_code=True)
|
| 274 |
+
def _(mo):
|
| 275 |
+
mo.md(r"""
|
| 276 |
+
## Data Transformation: Aggregation
|
| 277 |
+
|
| 278 |
+
To allow for more flexibility in how data are visualized, Altair has a built-in syntax for *aggregation* of data. For example, we can compute the average of all values by specifying an aggregation function along with the field name:
|
| 279 |
+
""")
|
| 280 |
+
return
|
| 281 |
+
|
| 282 |
+
|
| 283 |
+
@app.cell
|
| 284 |
+
def _(alt, df):
|
| 285 |
+
alt.Chart(df).mark_point().encode(
|
| 286 |
+
x='average(precip)',
|
| 287 |
+
y='city'
|
| 288 |
+
)
|
| 289 |
+
return
|
| 290 |
+
|
| 291 |
+
|
| 292 |
+
@app.cell(hide_code=True)
|
| 293 |
+
def _(mo):
|
| 294 |
+
mo.md(r"""
|
| 295 |
+
Now within each x-axis category, we see a single point reflecting the *average* of the values within that category.
|
| 296 |
+
|
| 297 |
+
_Does Seattle really have the lowest average precipitation of these cities? (It does!) Still, how might this plot mislead? Which months are included? What counts as precipitation?_
|
| 298 |
+
|
| 299 |
+
Altair supports a variety of aggregation functions, including `count`, `min` (minimum), `max` (maximum), `average`, `median`, and `stdev` (standard deviation). In a later notebook, we will take a tour of data transformations, including aggregation, sorting, filtering, and creation of new derived fields using calculation formulas.
|
| 300 |
+
""")
|
| 301 |
+
return
|
| 302 |
+
|
| 303 |
+
|
| 304 |
+
@app.cell(hide_code=True)
|
| 305 |
+
def _(mo):
|
| 306 |
+
mo.md(r"""
|
| 307 |
+
## Changing the Mark Type
|
| 308 |
+
|
| 309 |
+
Let's say we want to represent our aggregated values using rectangular bars rather than circular points. We can do this by replacing `Chart.mark_point` with `Chart.mark_bar`:
|
| 310 |
+
""")
|
| 311 |
+
return
|
| 312 |
+
|
| 313 |
+
|
| 314 |
+
@app.cell
|
| 315 |
+
def _(alt, df):
|
| 316 |
+
alt.Chart(df).mark_bar().encode(
|
| 317 |
+
x='average(precip)',
|
| 318 |
+
y='city'
|
| 319 |
+
)
|
| 320 |
+
return
|
| 321 |
+
|
| 322 |
+
|
| 323 |
+
@app.cell(hide_code=True)
|
| 324 |
+
def _(mo):
|
| 325 |
+
mo.md(r"""
|
| 326 |
+
    Because the nominal field `city` is mapped to the `y`-axis, the result is a horizontal bar chart. To get a vertical bar chart, we can simply swap the `x` and `y` keywords:
|
| 327 |
+
""")
|
| 328 |
+
return
|
| 329 |
+
|
| 330 |
+
|
| 331 |
+
@app.cell
|
| 332 |
+
def _(alt, df):
|
| 333 |
+
alt.Chart(df).mark_bar().encode(
|
| 334 |
+
x='city',
|
| 335 |
+
y='average(precip)'
|
| 336 |
+
)
|
| 337 |
+
return
|
| 338 |
+
|
| 339 |
+
|
| 340 |
+
@app.cell(hide_code=True)
|
| 341 |
+
def _(mo):
|
| 342 |
+
mo.md(r"""
|
| 343 |
+
## Customizing a Visualization
|
| 344 |
+
|
| 345 |
+
    By default, Altair and Vega-Lite make some choices about the properties of the visualization, but these can be changed using methods to customize its look. For example, we can specify axis titles using the `axis` attribute of channel classes, modify scale properties using the `scale` attribute, and specify the color of the mark by setting the `color` keyword of the `Chart.mark_*` methods to any valid [CSS color string](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value):
|
| 346 |
+
""")
|
| 347 |
+
return
|
| 348 |
+
|
| 349 |
+
|
| 350 |
+
@app.cell
|
| 351 |
+
def _(alt, df):
|
| 352 |
+
alt.Chart(df).mark_point(color='firebrick').encode(
|
| 353 |
+
alt.X('precip', scale=alt.Scale(type='log'), axis=alt.Axis(title='Log-Scaled Values')),
|
| 354 |
+
alt.Y('city', axis=alt.Axis(title='Category')),
|
| 355 |
+
)
|
| 356 |
+
return
|
| 357 |
+
|
| 358 |
+
|
| 359 |
+
@app.cell(hide_code=True)
|
| 360 |
+
def _(mo):
|
| 361 |
+
mo.md(r"""
|
| 362 |
+
A subsequent module will explore the various options available for scales, axes, and legends to create customized charts.
|
| 363 |
+
""")
|
| 364 |
+
return
|
| 365 |
+
|
| 366 |
+
|
| 367 |
+
@app.cell(hide_code=True)
|
| 368 |
+
def _(mo):
|
| 369 |
+
mo.md(r"""
|
| 370 |
+
## Multiple Views
|
| 371 |
+
|
| 372 |
+
As we've seen above, the Altair `Chart` object represents a plot with a single mark type. What about more complicated diagrams, involving multiple charts or layers? Using a set of *view composition* operators, Altair can take multiple chart definitions and combine them to create more complex views.
|
| 373 |
+
|
| 374 |
+
As a starting point, let's plot the cars dataset in a line chart showing the average mileage by the year of manufacture:
|
| 375 |
+
""")
|
| 376 |
+
return
|
| 377 |
+
|
| 378 |
+
|
| 379 |
+
@app.cell
|
| 380 |
+
def _(alt, cars):
|
| 381 |
+
alt.Chart(cars).mark_line().encode(
|
| 382 |
+
alt.X('Year'),
|
| 383 |
+
alt.Y('average(Miles_per_Gallon)')
|
| 384 |
+
)
|
| 385 |
+
return
|
| 386 |
+
|
| 387 |
+
|
| 388 |
+
@app.cell(hide_code=True)
|
| 389 |
+
def _(mo):
|
| 390 |
+
mo.md(r"""
|
| 391 |
+
    To augment this plot, we might like to add `circle` marks for each averaged data point. (The `circle` mark is just a convenient shorthand for `point` marks that use filled circles.)
|
| 392 |
+
|
| 393 |
+
We can start by defining each chart separately: first a line plot, then a scatter plot. We can then use the `layer` operator to combine the two into a layered chart. Here we use the shorthand `+` (plus) operator to invoke layering:
|
| 394 |
+
""")
|
| 395 |
+
return
|
| 396 |
+
|
| 397 |
+
|
| 398 |
+
@app.cell
|
| 399 |
+
def _(alt, cars):
|
| 400 |
+
line = alt.Chart(cars).mark_line().encode(
|
| 401 |
+
alt.X('Year'),
|
| 402 |
+
alt.Y('average(Miles_per_Gallon)')
|
| 403 |
+
)
|
| 404 |
+
|
| 405 |
+
point = alt.Chart(cars).mark_circle().encode(
|
| 406 |
+
alt.X('Year'),
|
| 407 |
+
alt.Y('average(Miles_per_Gallon)')
|
| 408 |
+
)
|
| 409 |
+
|
| 410 |
+
line + point
|
| 411 |
+
return
|
| 412 |
+
|
| 413 |
+
|
| 414 |
+
@app.cell(hide_code=True)
|
| 415 |
+
def _(mo):
|
| 416 |
+
mo.md(r"""
|
| 417 |
+
We can also create this chart by *reusing* and *modifying* a previous chart definition! Rather than completely re-write a chart, we can start with the line chart, then invoke the `mark_point` method to generate a new chart definition with a different mark type:
|
| 418 |
+
""")
|
| 419 |
+
return
|
| 420 |
+
|
| 421 |
+
|
| 422 |
+
@app.cell
|
| 423 |
+
def _(alt, cars):
|
| 424 |
+
mpg = alt.Chart(cars).mark_line().encode(
|
| 425 |
+
alt.X('Year'),
|
| 426 |
+
alt.Y('average(Miles_per_Gallon)')
|
| 427 |
+
)
|
| 428 |
+
|
| 429 |
+
mpg + mpg.mark_circle()
|
| 430 |
+
return (mpg,)
|
| 431 |
+
|
| 432 |
+
|
| 433 |
+
@app.cell(hide_code=True)
|
| 434 |
+
def _(mo):
|
| 435 |
+
mo.md(r"""
|
| 436 |
+
    <em>(The need to place points on lines is so common that the `line` mark also includes a shorthand to generate a new layer for you. Try adding the argument `point=True` to the `mark_line` method!)</em>
|
| 437 |
+
|
| 438 |
+
Now, what if we'd like to see this chart alongside other plots, such as the average horsepower over time?
|
| 439 |
+
|
| 440 |
+
We can use *concatenation* operators to place multiple charts side-by-side, either vertically or horizontally. Here, we'll use the `|` (pipe) operator to perform horizontal concatenation of two charts:
|
| 441 |
+
""")
|
| 442 |
+
return
|
| 443 |
+
|
| 444 |
+
|
| 445 |
+
@app.cell
|
| 446 |
+
def _(alt, cars, mpg):
|
| 447 |
+
hp = alt.Chart(cars).mark_line().encode(
|
| 448 |
+
alt.X('Year'),
|
| 449 |
+
alt.Y('average(Horsepower)')
|
| 450 |
+
)
|
| 451 |
+
|
| 452 |
+
(mpg + mpg.mark_circle()) | (hp + hp.mark_circle())
|
| 453 |
+
return
|
| 454 |
+
|
| 455 |
+
|
| 456 |
+
@app.cell(hide_code=True)
|
| 457 |
+
def _(mo):
|
| 458 |
+
mo.md(r"""
|
| 459 |
+
_We can see that, in this dataset, over the 1970s and early '80s the average fuel efficiency improved while the average horsepower decreased._
|
| 460 |
+
|
| 461 |
+
A later notebook will focus on *view composition*, including not only layering and concatenation, but also the `facet` operator for splitting data into sub-plots and the `repeat` operator to concisely generate concatenated charts from a template.
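    As a small preview of faceting — a sketch using a toy data frame rather than the cars dataset — the `column` encoding channel splits a chart into side-by-side sub-plots, one per distinct field value:

    ```python
    import pandas as pd
    import altair as alt

    toy = pd.DataFrame({
        'x': [1, 2, 3, 1, 2, 3],
        'y': [4, 5, 6, 3, 2, 1],
        'group': ['a', 'a', 'a', 'b', 'b', 'b'],
    })

    # one sub-plot per distinct value of 'group'
    chart = alt.Chart(toy).mark_line().encode(
        x='x:Q',
        y='y:Q',
        column='group:N',
    )
    spec = chart.to_dict()
    ```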
|
| 462 |
+
""")
|
| 463 |
+
return
|
| 464 |
+
|
| 465 |
+
|
| 466 |
+
@app.cell(hide_code=True)
|
| 467 |
+
def _(mo):
|
| 468 |
+
mo.md(r"""
|
| 469 |
+
## Interactivity
|
| 470 |
+
|
| 471 |
+
In addition to basic plotting and view composition, one of Altair and Vega-Lite's most exciting features is its support for interaction.
|
| 472 |
+
|
| 473 |
+
To create a simple interactive plot that supports panning and zooming, we can invoke the `interactive()` method of the `Chart` object. In the chart below, click and drag to *pan* or use the scroll wheel to *zoom*:
|
| 474 |
+
""")
|
| 475 |
+
return
|
| 476 |
+
|
| 477 |
+
|
| 478 |
+
@app.cell
|
| 479 |
+
def _(alt, cars):
|
| 480 |
+
alt.Chart(cars).mark_point().encode(
|
| 481 |
+
x='Horsepower',
|
| 482 |
+
y='Miles_per_Gallon',
|
| 483 |
+
color='Origin',
|
| 484 |
+
).interactive()
|
| 485 |
+
return
|
| 486 |
+
|
| 487 |
+
|
| 488 |
+
@app.cell(hide_code=True)
|
| 489 |
+
def _(mo):
|
| 490 |
+
mo.md(r"""
|
| 491 |
+
To provide more details upon mouse hover, we can use the `tooltip` encoding channel:
|
| 492 |
+
""")
|
| 493 |
+
return
|
| 494 |
+
|
| 495 |
+
|
| 496 |
+
@app.cell
|
| 497 |
+
def _(alt, cars):
|
| 498 |
+
alt.Chart(cars).mark_point().encode(
|
| 499 |
+
x='Horsepower',
|
| 500 |
+
y='Miles_per_Gallon',
|
| 501 |
+
color='Origin',
|
| 502 |
+
tooltip=['Name', 'Origin'] # show Name and Origin in a tooltip
|
| 503 |
+
).interactive()
|
| 504 |
+
return
|
| 505 |
+
|
| 506 |
+
|
| 507 |
+
@app.cell(hide_code=True)
|
| 508 |
+
def _(mo):
|
| 509 |
+
mo.md(r"""
|
| 510 |
+
    For more complex interactions, such as linked charts and cross-filtering, Altair provides a *selection* abstraction for defining interactive selections and then binding them to components of a chart. We will cover this in detail in a later notebook.
|
| 511 |
+
|
| 512 |
+
Below is a more complex example. The upper histogram shows the count of cars per year and uses an interactive selection to modify the opacity of points in the lower scatter plot, which shows horsepower versus mileage.
|
| 513 |
+
|
| 514 |
+
_Drag out an interval in the upper chart and see how it affects the points in the lower chart. As you examine the code, **don't worry if parts don't make sense yet!** This is an aspirational example, and we will fill in all the needed details over the course of the different notebooks._
|
| 515 |
+
""")
|
| 516 |
+
return
|
| 517 |
+
|
| 518 |
+
|
| 519 |
+
@app.cell
|
| 520 |
+
def _(alt, cars):
|
| 521 |
+
# create an interval selection over an x-axis encoding
|
| 522 |
+
brush = alt.selection_interval(encodings=['x'])
|
| 523 |
+
|
| 524 |
+
# determine opacity based on brush
|
| 525 |
+
opacity = alt.condition(brush, alt.value(0.9), alt.value(0.1))
|
| 526 |
+
|
| 527 |
+
# an overview histogram of cars per year
|
| 528 |
+
# add the interval brush to select cars over time
|
| 529 |
+
overview = alt.Chart(cars).mark_bar().encode(
|
| 530 |
+
alt.X('Year:O', timeUnit='year', # extract year unit, treat as ordinal
|
| 531 |
+
              axis=alt.Axis(title=None, labelAngle=0) # no title, horizontal labels
|
| 532 |
+
),
|
| 533 |
+
alt.Y('count()', title=None), # counts, no axis title
|
| 534 |
+
opacity=opacity
|
| 535 |
+
).add_params(
|
| 536 |
+
brush # add interval brush selection to the chart
|
| 537 |
+
).properties(
|
| 538 |
+
width=400, # set the chart width to 400 pixels
|
| 539 |
+
height=50 # set the chart height to 50 pixels
|
| 540 |
+
)
|
| 541 |
+
|
| 542 |
+
# a detail scatterplot of horsepower vs. mileage
|
| 543 |
+
# modulate point opacity based on the brush selection
|
| 544 |
+
detail = alt.Chart(cars).mark_point().encode(
|
| 545 |
+
alt.X('Horsepower'),
|
| 546 |
+
alt.Y('Miles_per_Gallon'),
|
| 547 |
+
# set opacity based on brush selection
|
| 548 |
+
opacity=opacity
|
| 549 |
+
).properties(width=400) # set chart width to match the first chart
|
| 550 |
+
|
| 551 |
+
# vertically concatenate (vconcat) charts using the '&' operator
|
| 552 |
+
overview & detail
|
| 553 |
+
return
|
| 554 |
+
|
| 555 |
+
|
| 556 |
+
@app.cell(hide_code=True)
|
| 557 |
+
def _(mo):
|
| 558 |
+
mo.md(r"""
|
| 559 |
+
## Aside: Examining the JSON Output
|
| 560 |
+
|
| 561 |
+
As a Python API to Vega-Lite, Altair's main purpose is to convert plot specifications to a JSON string that conforms to the Vega-Lite schema. Using the `Chart.to_json` method, we can inspect the JSON specification that Altair is exporting and sending to Vega-Lite:
|
| 562 |
+
""")
|
| 563 |
+
return
|
| 564 |
+
|
| 565 |
+
|
| 566 |
+
@app.cell
|
| 567 |
+
def _(alt, df):
|
| 568 |
+
_chart = alt.Chart(df).mark_bar().encode(x='average(precip)', y='city')
|
| 569 |
+
print(_chart.to_json())
|
| 570 |
+
return
|
| 571 |
+
|
| 572 |
+
|
| 573 |
+
@app.cell(hide_code=True)
|
| 574 |
+
def _(mo):
|
| 575 |
+
mo.md(r"""
|
| 576 |
+
Notice here that `encode(x='average(precip)')` has been expanded to a JSON structure with a `field` name, a `type` for the data, and includes an `aggregate` field. The `encode(y='city')` statement has been expanded similarly.
|
| 577 |
+
|
| 578 |
+
As we saw earlier, Altair's shorthand syntax includes a way to specify the type of the field as well:
|
| 579 |
+
""")
|
| 580 |
+
return
|
| 581 |
+
|
| 582 |
+
|
| 583 |
+
@app.cell
|
| 584 |
+
def _(alt):
|
| 585 |
+
_x = alt.X('average(precip):Q')
|
| 586 |
+
print(_x.to_json())
|
| 587 |
+
return
|
| 588 |
+
|
| 589 |
+
|
| 590 |
+
@app.cell(hide_code=True)
|
| 591 |
+
def _(mo):
|
| 592 |
+
mo.md(r"""
|
| 593 |
+
    This shorthand is equivalent to spelling out the attributes by name:
|
| 594 |
+
""")
|
| 595 |
+
return
|
| 596 |
+
|
| 597 |
+
|
| 598 |
+
@app.cell
|
| 599 |
+
def _(alt):
|
| 600 |
+
_x = alt.X(aggregate='average', field='precip', type='quantitative')
|
| 601 |
+
print(_x.to_json())
|
| 602 |
+
return
|
| 603 |
+
|
| 604 |
+
|
| 605 |
+
@app.cell(hide_code=True)
|
| 606 |
+
def _(mo):
|
| 607 |
+
mo.md(r"""
|
| 608 |
+
## Publishing a Visualization
|
| 609 |
+
|
| 610 |
+
Once you have visualized your data, perhaps you would like to publish it somewhere on the web. This can be done straightforwardly using the [vega-embed JavaScript package](https://github.com/vega/vega-embed). A simple example of a stand-alone HTML document can be generated for any chart using the `Chart.save` method:
|
| 611 |
+
|
| 612 |
+
```python
|
| 613 |
+
chart = alt.Chart(df).mark_bar().encode(
|
| 614 |
+
x='average(precip)',
|
| 615 |
+
y='city',
|
| 616 |
+
)
|
| 617 |
+
chart.save('chart.html')
|
| 618 |
+
```
|
| 619 |
+
|
| 620 |
+
|
| 621 |
+
The basic HTML template produces output that looks like this, where the JSON specification for your plot produced by `Chart.to_json` should be stored in the `spec` JavaScript variable:
|
| 622 |
+
|
| 623 |
+
```html
|
| 624 |
+
<!DOCTYPE html>
|
| 625 |
+
<html>
|
| 626 |
+
<head>
|
| 627 |
+
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
|
| 628 |
+
<script src="https://cdn.jsdelivr.net/npm/vega-lite@4"></script>
|
| 629 |
+
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
|
| 630 |
+
</head>
|
| 631 |
+
<body>
|
| 632 |
+
<div id="vis"></div>
|
| 633 |
+
<script>
|
| 634 |
+
(function(vegaEmbed) {
|
| 635 |
+
var spec = {}; /* JSON output for your chart's specification */
|
| 636 |
+
var embedOpt = {"mode": "vega-lite"}; /* Options for the embedding */
|
| 637 |
+
|
| 638 |
+
function showError(el, error){
|
| 639 |
+
el.innerHTML = ('<div style="color:red;">'
|
| 640 |
+
+ '<p>JavaScript Error: ' + error.message + '</p>'
|
| 641 |
+
+ "<p>This usually means there's a typo in your chart specification. "
|
| 642 |
+
+ "See the javascript console for the full traceback.</p>"
|
| 643 |
+
+ '</div>');
|
| 644 |
+
throw error;
|
| 645 |
+
}
|
| 646 |
+
const el = document.getElementById('vis');
|
| 647 |
+
vegaEmbed("#vis", spec, embedOpt)
|
| 648 |
+
.catch(error => showError(el, error));
|
| 649 |
+
})(vegaEmbed);
|
| 650 |
+
</script>
|
| 651 |
+
</body>
|
| 652 |
+
</html>
|
| 653 |
+
```
|
| 654 |
+
|
| 655 |
+
The `Chart.save` method provides a convenient way to save such HTML output to file. For more information on embedding Altair/Vega-Lite, see the [documentation of the vega-embed project](https://github.com/vega/vega-embed).
|
| 656 |
+
""")
|
| 657 |
+
return
|
| 658 |
+
|
| 659 |
+
|
| 660 |
+
@app.cell(hide_code=True)
|
| 661 |
+
def _(mo):
|
| 662 |
+
mo.md(r"""
|
| 663 |
+
## Next Steps
|
| 664 |
+
|
| 665 |
+
🎉 Hooray, you've completed the introduction to Altair! In the next notebook, we will dive deeper into creating visualizations using Altair's model of data types, graphical marks, and visual encoding channels.
|
| 666 |
+
""")
|
| 667 |
+
return
|
| 668 |
+
|
| 669 |
+
|
| 670 |
+
if __name__ == "__main__":
|
| 671 |
+
app.run()
|
@@ -0,0 +1,1126 @@
| 1 |
+
# /// script
|
| 2 |
+
# requires-python = ">=3.11"
|
| 3 |
+
# dependencies = [
|
| 4 |
+
# "altair==6.0.0",
|
| 5 |
+
# "marimo",
|
| 6 |
+
# "pandas==3.0.1",
|
| 7 |
+
# "vega_datasets==0.9.0",
|
| 8 |
+
# ]
|
| 9 |
+
# ///
|
| 10 |
+
|
| 11 |
+
import marimo
|
| 12 |
+
|
| 13 |
+
__generated_with = "0.20.4"
|
| 14 |
+
app = marimo.App()
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
@app.cell
|
| 18 |
+
def _():
|
| 19 |
+
import marimo as mo
|
| 20 |
+
|
| 21 |
+
return (mo,)
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
@app.cell(hide_code=True)
|
| 25 |
+
def _(mo):
|
| 26 |
+
mo.md(r"""
|
| 27 |
+
# Data Types, Graphical Marks, and Visual Encoding Channels
|
| 28 |
+
|
| 29 |
+
A visualization represents data using a collection of _graphical marks_ (bars, lines, points, etc.). The attributes of a mark — such as its position, shape, size, or color — serve as _channels_ through which we can encode underlying data values.
|
| 30 |
+
""")
|
| 31 |
+
return
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
@app.cell(hide_code=True)
|
| 35 |
+
def _(mo):
|
| 36 |
+
mo.md(r"""
|
| 37 |
+
With a basic framework of _data types_, _marks_, and _encoding channels_, we can concisely create a wide variety of visualizations. In this notebook, we explore each of these elements and show how to use them to create custom statistical graphics.
|
| 38 |
+
|
| 39 |
+
_This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
|
| 40 |
+
""")
|
| 41 |
+
return
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
@app.cell
|
| 45 |
+
def _():
|
| 46 |
+
import pandas as pd
|
| 47 |
+
import altair as alt
|
| 48 |
+
|
| 49 |
+
return (alt,)
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
@app.cell(hide_code=True)
|
| 53 |
+
def _(mo):
|
| 54 |
+
mo.md(r"""
|
| 55 |
+
## Global Development Data
|
| 56 |
+
""")
|
| 57 |
+
return
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
@app.cell(hide_code=True)
|
| 61 |
+
def _(mo):
|
| 62 |
+
mo.md(r"""
|
| 63 |
+
We will be visualizing global health and population data for a number of countries, over the time period of 1955 to 2005. The data was collected by the [Gapminder Foundation](https://www.gapminder.org/) and shared in [Hans Rosling's popular TED talk](https://www.youtube.com/watch?v=hVimVzgtD6w). If you haven't seen the talk, we encourage you to watch it first!
|
| 64 |
+
|
| 65 |
+
Let's first load the dataset from the [vega-datasets](https://github.com/vega/vega-datasets) collection into a Pandas data frame.
|
| 66 |
+
""")
|
| 67 |
+
return
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
@app.cell
|
| 71 |
+
def _():
|
| 72 |
+
from vega_datasets import data as vega_data
|
| 73 |
+
data = vega_data.gapminder()
|
| 74 |
+
return (data,)
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
@app.cell(hide_code=True)
|
| 78 |
+
def _(mo):
|
| 79 |
+
mo.md(r"""
|
| 80 |
+
How big is the data?
|
| 81 |
+
""")
|
| 82 |
+
return
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
@app.cell
|
| 86 |
+
def _(data):
|
| 87 |
+
data.shape
|
| 88 |
+
return
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
@app.cell(hide_code=True)
|
| 92 |
+
def _(mo):
|
| 93 |
+
mo.md(r"""
|
| 94 |
+
693 rows and 6 columns! Let's take a peek at the data content:
|
| 95 |
+
""")
|
| 96 |
+
return
|
| 97 |
+
|
| 98 |
+
|
| 99 |
+
@app.cell
|
| 100 |
+
def _(data):
|
| 101 |
+
data.head(5)
|
| 102 |
+
return
|
| 103 |
+
|
| 104 |
+
|
| 105 |
+
@app.cell(hide_code=True)
|
| 106 |
+
def _(mo):
|
| 107 |
+
mo.md(r"""
|
| 108 |
+
For each `country` and `year` (in 5-year intervals), we have measures of fertility in terms of the number of children per woman (`fertility`), life expectancy in years (`life_expect`), and total population (`pop`).
|
| 109 |
+
|
| 110 |
+
We also see a `cluster` field with an integer code. What might this represent? We'll try and solve this mystery as we visualize the data!
|
| 111 |
+
""")
|
| 112 |
+
return
|
| 113 |
+
|
| 114 |
+
|
| 115 |
+
@app.cell(hide_code=True)
|
| 116 |
+
def _(mo):
|
| 117 |
+
mo.md(r"""
|
| 118 |
+
Let's also create a smaller data frame, filtered down to values for the year 2000 only:
|
| 119 |
+
""")
|
| 120 |
+
return
|
| 121 |
+
|
| 122 |
+
|
| 123 |
+
@app.cell
|
| 124 |
+
def _(data):
|
| 125 |
+
data2000 = data.loc[data['year'] == 2000]
|
| 126 |
+
return (data2000,)
|
| 127 |
+
|
| 128 |
+
|
| 129 |
+
@app.cell
|
| 130 |
+
def _(data2000):
|
| 131 |
+
data2000.head(5)
|
| 132 |
+
return
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
@app.cell(hide_code=True)
|
| 136 |
+
def _(mo):
|
| 137 |
+
mo.md(r"""
|
| 138 |
+
## Data Types
|
| 139 |
+
""")
|
| 140 |
+
return
|
| 141 |
+
|
| 142 |
+
|
| 143 |
+
@app.cell(hide_code=True)
|
| 144 |
+
def _(mo):
|
| 145 |
+
mo.md(r"""
|
| 146 |
+
The first ingredient in effective visualization is the input data. Data values can represent different forms of measurement. What kinds of comparisons do those measurements support? And what kinds of visual encodings then support those comparisons?
|
| 147 |
+
|
| 148 |
+
We will start by looking at the basic data types that Altair uses to inform visual encoding choices. These data types determine the kinds of comparisons we can make, and thereby guide our visualization design decisions.
|
| 149 |
+
|
| 150 |
+
### Nominal (N)
|
| 151 |
+
|
| 152 |
+
*Nominal* data (also called *categorical* data) consist of category names.
|
| 153 |
+
|
| 154 |
+
With nominal data we can compare the equality of values: *is value A the same or different than value B? (A = B)*, supporting statements like “A is equal to B” or “A is not equal to B”.
|
| 155 |
+
In the dataset above, the `country` field is nominal.
|
| 156 |
+
|
| 157 |
+
When visualizing nominal data we should readily be able to see if values are the same or different: position, color hue (blue, red, green, *etc.*), and shape can help. However, using a size channel to encode nominal data might mislead us, suggesting rank-order or magnitude differences among values that do not exist!
|
| 158 |
+
|
| 159 |
+
### Ordinal (O)
|
| 160 |
+
|
| 161 |
+
*Ordinal* data consist of values that have a specific ordering.
|
| 162 |
+
|
| 163 |
+
With ordinal data we can compare the rank-ordering of values: *does value A come before or after value B? (A < B)*, supporting statements like “A is less than B” or “A is greater than B”.
|
| 164 |
+
In the dataset above, we can treat the `year` field as ordinal.
|
| 165 |
+
|
| 166 |
+
    When visualizing ordinal data, we should perceive a sense of rank-order. Position, size, or color value (brightness) might be appropriate, whereas color hue (which is not perceptually ordered) would be less appropriate.
|
| 167 |
+
|
| 168 |
+
### Quantitative (Q)
|
| 169 |
+
|
| 170 |
+
With *quantitative* data we can measure numerical differences among values. There are multiple sub-types of quantitative data:
|
| 171 |
+
|
| 172 |
+
For *interval* data we can measure the distance (interval) between points: *what is the distance to value A from value B? (A - B)*, supporting statements such as “A is 12 units away from B”.
|
| 173 |
+
|
| 174 |
+
For *ratio* data the zero-point is meaningful and so we can also measure proportions or scale factors: *value A is what proportion of value B? (A / B)*, supporting statements such as “A is 10% of B” or “B is 7 times larger than A”.
|
| 175 |
+
|
| 176 |
+
In the dataset above, `year` is a quantitative interval field (the value of year "zero" is subjective), whereas `fertility` and `life_expect` are quantitative ratio fields (zero is meaningful for calculating proportions).
|
| 177 |
+
Vega-Lite represents quantitative data, but does not make a distinction between interval and ratio types.
|
| 178 |
+
|
| 179 |
+
Quantitative values can be visualized using position, size, or color value, among other channels. An axis with a zero baseline is essential for proportional comparisons of ratio values, but can be safely omitted for interval comparisons.
|
| 180 |
+
|
| 181 |
+
### Temporal (T)
|
| 182 |
+
|
| 183 |
+
*Temporal* values measure time points or intervals. This type is a special case of quantitative values (timestamps) with rich semantics and conventions (i.e., the [Gregorian calendar](https://en.wikipedia.org/wiki/Gregorian_calendar)). The temporal type in Vega-Lite supports reasoning about time units (year, month, day, hour, etc.), and provides methods for requesting specific time intervals.
|
| 184 |
+
|
| 185 |
+
Example temporal values include date strings such as `“2019-01-04”` and `“Jan 04 2019”`, as well as standardized date-times such as the [ISO date-time format](https://en.wikipedia.org/wiki/ISO_8601): `“2019-01-04T17:50:35.643Z”`.
|
| 186 |
+
|
| 187 |
+
There are no temporal values in our global development dataset above, as the `year` field is simply encoded as an integer. For more details about using temporal data in Altair, see the [Times and Dates documentation](https://altair-viz.github.io/user_guide/times_and_dates.html).
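    As a quick aside (using pandas rather than Altair), the example strings above parse directly into timestamp objects with the expected calendar semantics:

    ```python
    import pandas as pd

    # both example formats above resolve to the same calendar date
    t1 = pd.Timestamp('2019-01-04')
    t2 = pd.Timestamp('2019-01-04T17:50:35.643Z')  # timezone-aware (UTC)

    date_parts = (t1.year, t1.month, t1.day)
    time_parts = (t2.hour, t2.minute)
    ```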
|
| 188 |
+
|
| 189 |
+
### Summary
|
| 190 |
+
|
| 191 |
+
These data types are not mutually exclusive, but rather form a hierarchy: ordinal data support nominal (equality) comparisons, while quantitative data support ordinal (rank-order) comparisons.
|
| 192 |
+
|
| 193 |
+
    Moreover, these data types do _not_ provide a fixed categorization. Just because a data field is represented using a number doesn't mean we have to treat it as a quantitative type! For example, we might interpret a set of ages (10 years old, 20 years old, etc.) as nominal (underage or of age), ordinal (grouped by year), or quantitative (calculate average age).
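    The three interpretations of the ages example can be sketched in plain pandas — each derived column supports a different kind of comparison:

    ```python
    import pandas as pd

    ages = pd.Series([10, 20, 15, 30, 25])

    minor = ages < 18             # nominal: two categories, minor vs. not
    decade = (ages // 10) * 10    # ordinal: ordered age bands
    mean_age = ages.mean()        # quantitative: a numeric average
    ```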
|
| 194 |
+
|
| 195 |
+
Now let's examine how to visually encode these data types!
|
| 196 |
+
""")
|
| 197 |
+
return
|
| 198 |
+
|
| 199 |
+
|
| 200 |
+
@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
## Encoding Channels

At the heart of Altair is the use of *encodings* that bind data fields (with a given data type) to available encoding *channels* of a chosen *mark* type. In this notebook we'll examine the following encoding channels:

- `x`: Horizontal (x-axis) position of the mark.
- `y`: Vertical (y-axis) position of the mark.
- `size`: Size of the mark. May correspond to area or length, depending on the mark type.
- `color`: Mark color, specified as a [legal CSS color](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value).
- `opacity`: Mark opacity, ranging from 0 (fully transparent) to 1 (fully opaque).
- `shape`: Plotting symbol shape for `point` marks.
- `tooltip`: Tooltip text to display upon mouse hover over the mark.
- `order`: Mark ordering, determines line/area point order and drawing order.
- `column`: Facet the data into horizontally-aligned subplots.
- `row`: Facet the data into vertically-aligned subplots.

For a complete list of available channels, see the [Altair encoding documentation](https://altair-viz.github.io/user_guide/encodings/index.html).
""")
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### X

The `x` encoding channel sets a mark's horizontal position (x-coordinate). In addition, default choices of axis and title are made automatically. In the chart below, the choice of a quantitative data type results in a continuous linear axis scale:
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Y

The `y` encoding channel sets a mark's vertical position (y-coordinate). Here we've added the `cluster` field using an ordinal (`O`) data type. The result is a discrete axis that includes a sized band, with a default step size, for each unique value:
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q'),
        alt.Y('cluster:O')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
_What happens to the chart above if you swap the `O` and `Q` field types?_

If we instead add the `life_expect` field as a quantitative (`Q`) variable, the result is a scatter plot with linear scales for both axes:
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
By default, axes for linear quantitative scales include zero to ensure a proper baseline for comparing ratio-valued data. In some cases, however, a zero baseline may be meaningless or you may want to focus on interval comparisons. To disable automatic inclusion of zero, configure the scale mapping using the encoding `scale` attribute:
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q', scale=alt.Scale(zero=False)),
        alt.Y('life_expect:Q', scale=alt.Scale(zero=False))
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
Now the axis scales no longer include zero by default. Some padding still remains, as the axis domain end points are automatically snapped to _nice_ numbers like multiples of 5 or 10.

_What happens if you also add `nice=False` to the scale attribute above?_
""")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Size
""")
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The `size` encoding channel sets a mark's size or extent. The meaning of the channel can vary based on the mark type. For `point` marks, the `size` channel maps to the pixel area of the plotting symbol, such that the diameter of the point matches the square root of the size value.

Let's augment our scatter plot by encoding population (`pop`) on the `size` channel. As a result, the chart now also includes a legend for interpreting the size values.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q')
    )
    return

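The area-to-diameter relationship described above can be sanity-checked with a few lines of plain Python. This is a stdlib sketch of the mapping only; Vega-Lite performs the actual conversion internally when it renders the symbol.

```python
import math

def diameter(size: float) -> float:
    """Approximate rendered extent of a point mark whose `size`
    (pixel area) is the given value: diameter ~ sqrt(size)."""
    return math.sqrt(size)

# Quadrupling the area only doubles the diameter, which is why
# size legends should not be read linearly.
print(diameter(100), diameter(400))  # 10.0 20.0
```

This square-root compression is one reason `size` is considered a less accurate channel than position for quantitative comparisons.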
@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
In some cases we might be unsatisfied with the default size range. To provide a customized span of sizes, set the `range` parameter of the `scale` attribute to an array indicating the smallest and largest sizes. Here we update the size encoding to range from 0 pixels (for zero values) to 1,000 pixels (for the maximum value in the scale domain):
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000]))
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Color and Opacity
""")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The `color` encoding channel sets a mark's color. The style of color encoding is highly dependent on the data type: nominal data will default to a multi-hued qualitative color scheme, whereas ordinal and quantitative data will use perceptually ordered color gradients.

Here, we encode the `cluster` field using the `color` channel and a nominal (`N`) data type, resulting in a distinct hue for each cluster value. Can you start to guess what the `cluster` field might indicate?
""")
    return

@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
If we prefer filled shapes, we can pass a `filled=True` parameter to the `mark_point` method:
""")
    return

@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
By default, Altair uses a bit of transparency to help combat over-plotting. We are free to further adjust the opacity, either by passing a default value to the `mark_*` method, or by using a dedicated encoding channel.

Here we demonstrate how to provide a constant value to an encoding channel instead of binding a data field:
""")
    return

@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N'),
        alt.OpacityValue(0.5)
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Shape
""")
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The `shape` encoding channel sets the geometric shape used by `point` marks. Unlike the other channels we have seen so far, the `shape` channel cannot be used by other mark types. The shape encoding channel should only be used with nominal data, as perceptual rank-order and magnitude comparisons are not supported.

Let's encode the `cluster` field using `shape` as well as `color`. Using multiple channels for the same underlying data field is known as a *redundant encoding*. The resulting chart combines both color and shape information into a single symbol legend:
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N'),
        alt.OpacityValue(0.5),
        alt.Shape('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Tooltips & Ordering
""")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
By this point, you might feel a bit frustrated: we've built up a chart, but we still don't know which countries the visualized points correspond to! Let's add interactive tooltips to enable exploration.

The `tooltip` encoding channel determines tooltip text to show when a user moves the mouse cursor over a mark. Let's add a tooltip encoding for the `country` field, then investigate which countries are being represented.
""")
    return

@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N'),
        alt.OpacityValue(0.5),
        alt.Tooltip('country:N')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
As you mouse around you may notice that you cannot select some of the points. For example, the largest dark blue circle corresponds to India, which is drawn on top of a country with a smaller population, preventing the mouse from hovering over that country. To fix this problem, we can use the `order` encoding channel.

The `order` encoding channel determines the order of data points, affecting both the order in which they are drawn and, for `line` and `area` marks, the order in which they are connected to one another.

Let's order the values in descending rank order by population (`pop`), ensuring that smaller circles are drawn later than larger circles:
""")
    return

@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N'),
        alt.OpacityValue(0.5),
        alt.Tooltip('country:N'),
        alt.Order('pop:Q', sort='descending')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
Now we can identify the smaller country being obscured by India: it's Bangladesh!

We can also now figure out what the `cluster` field represents. Mouse over the various colored points to formulate your own explanation.
""")
    return

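The painter's-order logic that the `order` channel relies on can be sketched with plain Python sorting. The population values below are hypothetical stand-ins; Vega-Lite applies this ordering internally when it draws the marks.

```python
# Hypothetical (country, population) pairs overlapping at one position.
points = [("India", 1_000_000_000), ("Bangladesh", 130_000_000)]

# Descending sort: larger values come first, so they are painted first
# and smaller marks land on top, where the mouse can reach them.
draw_order = sorted(points, key=lambda p: p[1], reverse=True)

print([name for name, _ in draw_order])  # ['India', 'Bangladesh']
```

The same idea generalizes to any quantitative field passed to `alt.Order(..., sort='descending')`.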
@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
At this point we've added tooltips that show only a single property of the underlying data record. To show multiple values, we can provide the `tooltip` channel an array of encodings, one for each field we want to include:
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N'),
        alt.OpacityValue(0.5),
        alt.Order('pop:Q', sort='descending'),
        tooltip=[
            alt.Tooltip('country:N'),
            alt.Tooltip('fertility:Q'),
            alt.Tooltip('life_expect:Q')
        ]
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
Now we can see multiple data fields upon mouse over!
""")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Column and Row Facets
""")
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
Spatial position is one of the most powerful and flexible channels for visual encoding, but what can we do if we have already assigned fields to the `x` and `y` channels? One valuable technique is to create a *trellis plot*, consisting of sub-plots that show a subset of the data. A trellis plot is one example of the more general technique of presenting data using [small multiples](https://en.wikipedia.org/wiki/Small_multiple) of views.

The `column` and `row` encoding channels generate either a horizontal (columns) or vertical (rows) set of sub-plots, in which the data is partitioned according to the provided data field.

Here is a trellis plot that divides the data into one column per `cluster` value:
""")
    return

@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000])),
        alt.Color('cluster:N'),
        alt.OpacityValue(0.5),
        alt.Tooltip('country:N'),
        alt.Order('pop:Q', sort='descending'),
        alt.Column('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The plot above does not fit on screen, making it difficult to compare all the sub-plots to each other! We can set the default `width` and `height` properties to create a smaller set of multiples. Also, as the column headers already label the `cluster` values, let's remove our `color` legend by setting it to `None`. To make better use of space we can also orient our `size` legend to the `'bottom'` of the chart.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True).encode(
        alt.X('fertility:Q'),
        alt.Y('life_expect:Q'),
        alt.Size('pop:Q', scale=alt.Scale(range=[0, 1000]),
                 legend=alt.Legend(orient='bottom', titleOrient='left')),
        alt.Color('cluster:N', legend=None),
        alt.OpacityValue(0.5),
        alt.Tooltip('country:N'),
        alt.Order('pop:Q', sort='descending'),
        alt.Column('cluster:N')
    ).properties(width=135, height=135)
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
Under the hood, the `column` and `row` encodings are translated into a new specification that uses the `facet` view composition operator. We will revisit faceting in greater depth later on!

In the meantime, _can you rewrite the chart above to facet into rows instead of columns?_
""")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### A Peek Ahead: Interactive Filtering

In later modules, we'll dive into interaction techniques for data exploration. Here is a sneak peek: binding a range slider to the `year` field to enable interactive scrubbing through each year of data. Don't worry if the code below is a bit confusing at this point, as we will cover interaction in detail later.

_Drag the slider back and forth to see how the data values change over time!_
""")
    return

@app.cell
def _(alt, data):
    select_year = alt.selection_point(
        name='select', fields=['year'], value=[{'year': 1955}],
        bind=alt.binding_range(min=1955, max=2005, step=5)
    )

    alt.Chart(data).mark_point(filled=True).encode(
        alt.X('fertility:Q', scale=alt.Scale(domain=[0, 9])),
        alt.Y('life_expect:Q', scale=alt.Scale(domain=[0, 90])),
        alt.Size('pop:Q', scale=alt.Scale(domain=[0, 1200000000], range=[0, 1000])),
        alt.Color('cluster:N', legend=None),
        alt.OpacityValue(0.5),
        alt.Tooltip('country:N'),
        alt.Order('pop:Q', sort='descending')
    ).add_params(select_year).transform_filter(select_year)
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
## Graphical Marks

Our exploration of encoding channels above exclusively uses `point` marks to visualize the data. However, the `point` mark type is only one of the many geometric shapes that can be used to visually represent data. Altair includes a number of built-in mark types, including:

- `mark_area()` - Filled areas defined by a top-line and a baseline.
- `mark_bar()` - Rectangular bars.
- `mark_circle()` - Scatter plot points as filled circles.
- `mark_line()` - Connected line segments.
- `mark_point()` - Scatter plot points with configurable shapes.
- `mark_rect()` - Filled rectangles, useful for heatmaps.
- `mark_rule()` - Vertical or horizontal lines spanning the axis.
- `mark_square()` - Scatter plot points as filled squares.
- `mark_text()` - Scatter plot points represented by text.
- `mark_tick()` - Vertical or horizontal tick marks.

For a complete list, and links to examples, see the [Altair marks documentation](https://altair-viz.github.io/user_guide/marks/index.html). Next, we will step through a number of the most commonly used mark types for statistical graphics.
""")
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Point Marks

The `point` mark type conveys specific points, as in *scatter plots* and *dot plots*. In addition to `x` and `y` encoding channels (to specify 2D point positions), point marks can use `color`, `size`, and `shape` encodings to convey additional data fields.

Below is a dot plot of `fertility`, with the `cluster` field redundantly encoded using both the `y` and `shape` channels.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point().encode(
        alt.X('fertility:Q'),
        alt.Y('cluster:N'),
        alt.Shape('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
In addition to encoding channels, marks can be stylized by providing values to the `mark_*()` methods.

For example: point marks are drawn with stroked outlines by default, but can be specified to use `filled` shapes instead. Similarly, you can set a default `size` to set the total pixel area of the point mark.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_point(filled=True, size=100).encode(
        alt.X('fertility:Q'),
        alt.Y('cluster:N'),
        alt.Shape('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Circle Marks

The `circle` mark type is a convenient shorthand for `point` marks drawn as filled circles.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_circle(size=100).encode(
        alt.X('fertility:Q'),
        alt.Y('cluster:N'),
        alt.Shape('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Square Marks

The `square` mark type is a convenient shorthand for `point` marks drawn as filled squares.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_square(size=100).encode(
        alt.X('fertility:Q'),
        alt.Y('cluster:N'),
        alt.Shape('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Tick Marks

The `tick` mark type conveys a data point using a short line segment or "tick". These are particularly useful for comparing values along a single dimension with minimal overlap. A *dot plot* drawn with tick marks is sometimes referred to as a *strip plot*.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_tick().encode(
        alt.X('fertility:Q'),
        alt.Y('cluster:N'),
        alt.Shape('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Bar Marks

The `bar` mark type draws a rectangle with a position, width, and height.

The plot below is a simple bar chart of the population (`pop`) of each country.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_bar().encode(
        alt.X('country:N'),
        alt.Y('pop:Q')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The bar width is set to a default size. We will discuss how to adjust the bar width later in this notebook. (A subsequent notebook will take a closer look at configuring axes, scales, and legends.)

Bars can also be stacked. Let's change the `x` encoding to use the `cluster` field, and encode `country` using the `color` channel. We'll also disable the legend (which would be very long with colors for all countries!) and use tooltips for the country name.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_bar().encode(
        alt.X('cluster:N'),
        alt.Y('pop:Q'),
        alt.Color('country:N', legend=None),
        alt.Tooltip('country:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
In the chart above, the use of the `color` encoding channel causes Altair / Vega-Lite to automatically stack the bar marks. Otherwise, bars would be drawn on top of each other! Try adding the parameter `stack=None` to the `y` encoding channel to see what happens if we don't apply stacking...
""")
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The examples above create bar charts from a zero-baseline, and the `y` channel only encodes the non-zero value (or height) of the bar. However, the bar mark also allows you to specify starting and ending points to convey ranges.

The chart below uses the `x` (starting point) and `x2` (ending point) channels to show the range of life expectancies within each regional cluster. Below we use the `min` and `max` aggregation functions to determine the end points of the range; we will discuss aggregation in greater detail in the next notebook!

Alternatively, you can use `x` and `width` to provide a starting point plus offset, such that `x2 = x + width`.
""")
    return


@app.cell
def _(alt, data2000):
    alt.Chart(data2000).mark_bar().encode(
        alt.X('min(life_expect):Q'),
        alt.X2('max(life_expect):Q'),
        alt.Y('cluster:N')
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Line Marks

The `line` mark type connects plotted points with line segments, for example so that a line's slope conveys information about the rate of change.

Let's plot a line chart of fertility per country over the years, using the full, unfiltered global development data frame. We'll again hide the legend and use tooltips instead.
""")
    return


@app.cell
def _(alt, data):
    alt.Chart(data).mark_line().encode(
        alt.X('year:O'),
        alt.Y('fertility:Q'),
        alt.Color('country:N', legend=None),
        alt.Tooltip('country:N')
    ).properties(
        width=400
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
We can see interesting variations per country, but an overall trend toward fewer children per family over time. Also note that we set a custom width of 400 pixels. _Try changing (or removing) the width and see what happens!_

Let's change some of the default mark parameters to customize the plot. We can set the `strokeWidth` to determine the thickness of the lines and the `opacity` to add some transparency.

By default, the `line` mark uses straight line segments to connect data points. In some cases we might want to smooth the lines. We can adjust the interpolation used to connect data points by setting the `interpolate` mark parameter. Let's use `'monotone'` interpolation to provide smooth lines that are also guaranteed not to inadvertently generate "false" minimum or maximum values as a result of the interpolation.
""")
    return


@app.cell
def _(alt, data):
    alt.Chart(data).mark_line(
        strokeWidth=3,
        opacity=0.5,
        interpolate='monotone'
    ).encode(
        alt.X('year:O'),
        alt.Y('fertility:Q'),
        alt.Color('country:N', legend=None),
        alt.Tooltip('country:N')
    ).properties(
        width=400
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The `line` mark can also be used to create *slope graphs*, charts that highlight the change in value between two comparison points using line slopes.

Below let's create a slope graph comparing the populations of each country at the minimum and maximum years in our full dataset: 1955 and 2005. We first create a new Pandas data frame filtered to those years, then use Altair to create the slope graph.

By default, Altair places the years close together. To better space out the years along the x-axis, we can indicate the size (in pixels) of discrete steps along the width of our chart, as indicated by the comment below. Try adjusting the width `step` value below and see how the chart changes in response.
""")
    return


@app.cell
def _(alt, data):
    dataTime = data.loc[(data['year'] == 1955) | (data['year'] == 2005)]

    alt.Chart(dataTime).mark_line(opacity=0.5).encode(
        alt.X('year:O'),
        alt.Y('pop:Q'),
        alt.Color('country:N', legend=None),
        alt.Tooltip('country:N')
    ).properties(
        width={"step": 50}  # adjust the step parameter
    )
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
### Area Marks

The `area` mark type combines aspects of `line` and `bar` marks: it visualizes connections (slopes) among data points, but also shows a filled region, with one edge defaulting to a zero-valued baseline.
""")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
The chart below is an area chart of fertility over time for just the United States:
""")
    return

@app.cell
def _(alt, data):
    dataUS = data.loc[data['country'] == 'United States']

    alt.Chart(dataUS).mark_area().encode(
        alt.X('year:O'),
        alt.Y('fertility:Q')
    )
    return (dataUS,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
Similar to `line` marks, `area` marks support an `interpolate` parameter.
""")
    return

@app.cell
def _(alt, dataUS):
    alt.Chart(dataUS).mark_area(interpolate='monotone').encode(
        alt.X('year:O'),
        alt.Y('fertility:Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
Similar to `bar` marks, `area` marks also support stacking. Here we create a new data frame with data for the three North American countries, then plot them using an `area` mark and a `color` encoding channel to stack by country.
""")
    return

@app.cell
|
| 985 |
+
def _(alt, data):
|
| 986 |
+
dataNA = data.loc[
|
| 987 |
+
(data['country'] == 'United States') |
|
| 988 |
+
(data['country'] == 'Canada') |
|
| 989 |
+
(data['country'] == 'Mexico')
|
| 990 |
+
]
|
| 991 |
+
|
| 992 |
+
alt.Chart(dataNA).mark_area().encode(
|
| 993 |
+
alt.X('year:O'),
|
| 994 |
+
alt.Y('pop:Q'),
|
| 995 |
+
alt.Color('country:N')
|
| 996 |
+
)
|
| 997 |
+
return (dataNA,)
|
| 998 |
+
|
| 999 |
+
|
| 1000 |
+
@app.cell(hide_code=True)
|
| 1001 |
+
def _(mo):
|
| 1002 |
+
mo.md(r"""
|
| 1003 |
+
By default, stacking is performed relative to a zero baseline. However, other `stack` options are available:
|
| 1004 |
+
|
| 1005 |
+
* `center` - to stack relative to a baseline in the center of the chart, creating a *streamgraph* visualization, and
|
| 1006 |
+
* `normalize` - to normalize the summed data at each stacking point to 100%, enabling percentage comparisons.
|
| 1007 |
+
|
| 1008 |
+
Below we adapt the chart by setting the `y` encoding `stack` attribute to `center`. What happens if you instead set it `normalize`?
|
| 1009 |
+
""")
|
| 1010 |
+
return
|
| 1011 |
+
|
| 1012 |
+
|
| 1013 |
+
@app.cell
|
| 1014 |
+
def _(alt, dataNA):
|
| 1015 |
+
alt.Chart(dataNA).mark_area().encode(
|
| 1016 |
+
alt.X('year:O'),
|
| 1017 |
+
alt.Y('pop:Q', stack='center'),
|
| 1018 |
+
alt.Color('country:N')
|
| 1019 |
+
)
|
| 1020 |
+
return
|
| 1021 |
+
|
| 1022 |
+
|
| 1023 |
+
@app.cell(hide_code=True)
|
| 1024 |
+
def _(mo):
|
| 1025 |
+
mo.md(r"""
|
| 1026 |
+
To disable stacking altogether, set the `stack` attribute to `None`. We can also add `opacity` as a default mark parameter to ensure we see the overlapping areas!
|
| 1027 |
+
""")
|
| 1028 |
+
return
|
| 1029 |
+
|
| 1030 |
+
|
| 1031 |
+
@app.cell
|
| 1032 |
+
def _(alt, dataNA):
|
| 1033 |
+
alt.Chart(dataNA).mark_area(opacity=0.5).encode(
|
| 1034 |
+
alt.X('year:O'),
|
| 1035 |
+
alt.Y('pop:Q', stack=None),
|
| 1036 |
+
alt.Color('country:N')
|
| 1037 |
+
)
|
| 1038 |
+
return
|
| 1039 |
+
|
| 1040 |
+
|
| 1041 |
+
@app.cell(hide_code=True)
|
| 1042 |
+
def _(mo):
|
| 1043 |
+
mo.md(r"""
|
| 1044 |
+
The `area` mark type also supports data-driven baselines, with both the upper and lower series determined by data fields. As with `bar` marks, we can use the `x` and `x2` (or `y` and `y2`) channels to provide end points for the area mark.
|
| 1045 |
+
|
| 1046 |
+
The chart below visualizes the range of minimum and maximum fertility, per year, for North American countries:
|
| 1047 |
+
""")
|
| 1048 |
+
return
|
| 1049 |
+
|
| 1050 |
+
|
| 1051 |
+
@app.cell
|
| 1052 |
+
def _(alt, dataNA):
|
| 1053 |
+
alt.Chart(dataNA).mark_area().encode(
|
| 1054 |
+
alt.X('year:O'),
|
| 1055 |
+
alt.Y('min(fertility):Q'),
|
| 1056 |
+
alt.Y2('max(fertility):Q')
|
| 1057 |
+
).properties(
|
| 1058 |
+
width={"step": 40}
|
| 1059 |
+
)
|
| 1060 |
+
return
|
| 1061 |
+
|
| 1062 |
+
|
| 1063 |
+
@app.cell(hide_code=True)
|
| 1064 |
+
def _(mo):
|
| 1065 |
+
mo.md(r"""
|
| 1066 |
+
We can see a larger range of values in 1995, from just under 4 to just under 7. By 2005, both the overall fertility values and the variability have declined, centered around 2 children per familty.
|
| 1067 |
+
""")
|
| 1068 |
+
return
|
| 1069 |
+
|
| 1070 |
+
|
| 1071 |
+
@app.cell(hide_code=True)
|
| 1072 |
+
def _(mo):
|
| 1073 |
+
mo.md(r"""
|
| 1074 |
+
All the `area` mark examples above use a vertically oriented area. However, Altair and Vega-Lite support horizontal areas as well. Let's transpose the chart above, simply by swapping the `x` and `y` channels.
|
| 1075 |
+
""")
|
| 1076 |
+
return
|
| 1077 |
+
|
| 1078 |
+
|
| 1079 |
+
@app.cell
|
| 1080 |
+
def _(alt, dataNA):
|
| 1081 |
+
alt.Chart(dataNA).mark_area().encode(
|
| 1082 |
+
alt.Y('year:O'),
|
| 1083 |
+
alt.X('min(fertility):Q'),
|
| 1084 |
+
alt.X2('max(fertility):Q')
|
| 1085 |
+
).properties(
|
| 1086 |
+
width={"step": 40}
|
| 1087 |
+
)
|
| 1088 |
+
return
|
| 1089 |
+
|
| 1090 |
+
|
| 1091 |
+
@app.cell(hide_code=True)
|
| 1092 |
+
def _(mo):
|
| 1093 |
+
mo.md(r"""
|
| 1094 |
+
## Summary
|
| 1095 |
+
|
| 1096 |
+
We've completed our tour of data types, encoding channels, and graphical marks! You should now be well-equipped to further explore the space of encodings, mark types, and mark parameters. For a comprehensive reference – including features we've skipped over here! – see the Altair [marks](https://altair-viz.github.io/user_guide/marks/index.html) and [encoding](https://altair-viz.github.io/user_guide/encodings/index.html) documentation.
|
| 1097 |
+
|
| 1098 |
+
In the next module, we will look at the use of data transformations to create charts that summarize data or visualize new derived fields. In a later module, we'll examine how to further customize your charts by modifying scales, axes, and legends.
|
| 1099 |
+
|
| 1100 |
+
Interested in learning more about visual encoding?
|
| 1101 |
+
""")
|
| 1102 |
+
return
|
| 1103 |
+
|
| 1104 |
+
|
| 1105 |
+
@app.cell(hide_code=True)
|
| 1106 |
+
def _(mo):
|
| 1107 |
+
mo.md(r"""
|
| 1108 |
+
<img title="Bertin's Taxonomy of Visual Encoding Channels" src="https://cdn-images-1.medium.com/max/2000/1*jsb78Rr2cDy6zrE7j2IKig.png" style="max-width: 650px;"><br/>
|
| 1109 |
+
|
| 1110 |
+
<small>Bertin's taxonomy of visual encodings from <a href="https://books.google.com/books/about/Semiology_of_Graphics.html?id=X5caQwAACAAJ"><em>Sémiologie Graphique</em></a>, as adapted by <a href="https://bost.ocks.org/mike/">Mike Bostock</a>.</small>
|
| 1111 |
+
""")
|
| 1112 |
+
return
|
| 1113 |
+
|
| 1114 |
+
|
| 1115 |
+
@app.cell(hide_code=True)
|
| 1116 |
+
def _(mo):
|
| 1117 |
+
mo.md(r"""
|
| 1118 |
+
- The systematic study of marks, visual encodings, and backing data types was initiated by [Jacques Bertin](https://en.wikipedia.org/wiki/Jacques_Bertin) in his pioneering 1967 work [_Sémiologie Graphique (The Semiology of Graphics)_](https://books.google.com/books/about/Semiology_of_Graphics.html?id=X5caQwAACAAJ). The image above illustrates position, size, value (brightness), texture, color (hue), orientation, and shape channels, alongside Bertin's recommendations for the data types they support.
|
| 1119 |
+
- The framework of data types, marks, and channels also guides _automated_ visualization design tools, starting with [Mackinlay's APT (A Presentation Tool)](https://scholar.google.com/scholar?cluster=10191273548472217907) in 1986 and continuing in more recent systems such as [Voyager](http://idl.cs.washington.edu/papers/voyager/) and [Draco](http://idl.cs.washington.edu/papers/draco/).
|
| 1120 |
+
- The identification of nominal, ordinal, interval, and ratio types dates at least as far back as S. S. Steven's 1947 article [_On the theory of scales of measurement_](https://scholar.google.com/scholar?cluster=14356809180080326415).
|
| 1121 |
+
""")
|
| 1122 |
+
return
|
| 1123 |
+
|
| 1124 |
+
|
| 1125 |
+
if __name__ == "__main__":
|
| 1126 |
+
app.run()
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "altair==6.0.0",
#     "marimo",
#     "pandas==3.0.1",
# ]
# ///

import marimo

__generated_with = "0.20.4"
app = marimo.App()


@app.cell
def _():
    import marimo as mo

    return (mo,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    # Data Transformation

    In previous notebooks we learned how to use marks and visual encodings to represent individual data records. Here we will explore methods for *transforming* data, including the use of aggregates to summarize multiple records. Data transformation is an integral part of visualization: choosing the variables to show and their level of detail is just as important as choosing appropriate visual encodings. After all, it doesn't matter how well chosen your visual encodings are if you are showing the wrong information!

    As you work through this module, we recommend that you open the [Altair data transformations documentation](https://altair-viz.github.io/user_guide/transform/) in another tab. It will be a useful resource if at any point you'd like more details or want to see what other transformations are available.

    _This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
    """)
    return


@app.cell
def _():
    import pandas as pd
    import altair as alt

    return alt, pd


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## The Movies Dataset
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    We will be working with a table of data about motion pictures, taken from the [vega-datasets](https://vega.github.io/vega-datasets/) collection. The data includes variables such as the film name, director, genre, release date, ratings, and gross revenues. However, _be careful when working with this data_: the films are from unevenly sampled years, using data combined from multiple sources. If you dig in you will find issues with missing values and even some subtle errors! Nevertheless, the data should prove interesting to explore...

    Let's take the URL for the JSON data file hosted on the vega-datasets CDN, and then read the data into a Pandas data frame so that we can inspect its contents.
    """)
    return


@app.cell
def _(pd):
    movies_url = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/movies.json'
    movies = pd.read_json(movies_url)
    return movies, movies_url


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    How many rows (records) and columns (fields) are in the movies dataset?
    """)
    return


@app.cell
def _(movies):
    movies.shape
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Now let's peek at the first 5 rows of the table to get a sense of the fields and data types...
    """)
    return


@app.cell
def _(movies):
    movies.head(5)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Histograms

    We'll start our transformation tour by _binning_ data into discrete groups and _counting_ records to summarize those groups. The resulting plots are known as [_histograms_](https://en.wikipedia.org/wiki/Histogram).

    Let's first look at unaggregated data: a scatter plot showing movie ratings from Rotten Tomatoes versus ratings from IMDB users. We'll provide data to Altair by passing the movies data URL to the `Chart` method. (We could also pass the Pandas data frame directly to get the same result.) We can then encode the Rotten Tomatoes and IMDB ratings fields using the `x` and `y` channels:
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_circle().encode(
        alt.X('Rotten_Tomatoes_Rating:Q'),
        alt.Y('IMDB_Rating:Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To summarize this data, we can *bin* a data field to group numeric values into discrete groups. Here we bin along the x-axis by adding `bin=True` to the `x` encoding channel. The result is a set of ten bins of equal step size, each corresponding to a span of ten ratings points.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_circle().encode(
        alt.X('Rotten_Tomatoes_Rating:Q', bin=True),
        alt.Y('IMDB_Rating:Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Setting `bin=True` uses default binning settings, but we can exercise more control if desired. Let's instead set the maximum bin count (`maxbins`) to 20, which has the effect of doubling the number of bins. Now each bin corresponds to a span of five ratings points.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_circle().encode(
        alt.X('Rotten_Tomatoes_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Y('IMDB_Rating:Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    With the data binned, let's now summarize the distribution of Rotten Tomatoes ratings. We will drop the IMDB ratings for now and instead use the `y` encoding channel to show an aggregate `count` of records, so that the vertical position of each point indicates the number of movies per Rotten Tomatoes rating bin.

    As the `count` aggregate counts the total number of records in each bin regardless of the field values, we do not need to include a field name in the `y` encoding.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_circle().encode(
        alt.X('Rotten_Tomatoes_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Y('count()')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To arrive at a standard histogram, let's change the mark type from `circle` to `bar`:
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().encode(
        alt.X('Rotten_Tomatoes_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Y('count()')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _We can now examine the distribution of ratings more clearly: there are fewer movies at the low end and a few more at the high end, but a generally uniform distribution overall. Rotten Tomatoes ratings are determined by taking "thumbs up" and "thumbs down" judgments from film critics and calculating the percentage of positive reviews. It appears this approach does a good job of using the full range of rating values._

    Similarly, we can create a histogram for IMDB ratings by changing the field in the `x` encoding channel:
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().encode(
        alt.X('IMDB_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Y('count()')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _In contrast to the more uniform distribution we saw before, IMDB ratings exhibit a bell-shaped (though [negatively skewed](https://en.wikipedia.org/wiki/Skewness)) distribution. IMDB ratings are formed by averaging scores (ranging from 1 to 10) provided by the site's users. We can see that this form of measurement leads to a different shape than the Rotten Tomatoes ratings. We can also see that the mode of the distribution is between 6.5 and 7: people generally enjoy watching movies, potentially explaining the positive bias!_

    Now let's turn back to our scatter plot of Rotten Tomatoes and IMDB ratings. Here's what happens if we bin *both* axes of our original plot.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_circle().encode(
        alt.X('Rotten_Tomatoes_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Y('IMDB_Rating:Q', bin=alt.BinParams(maxbins=20)),
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Detail is lost due to *overplotting*, with many points drawn directly on top of each other.

    To form a two-dimensional histogram we can add a `count` aggregate as before. As both the `x` and `y` encoding channels are already taken, we must use a different encoding channel to convey the counts. Here is the result of encoding counts as circular area, by adding a `size` encoding channel.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_circle().encode(
        alt.X('Rotten_Tomatoes_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Y('IMDB_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Size('count()')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Alternatively, we can encode counts using the `color` channel and change the mark type to `bar`. The result is a two-dimensional histogram in the form of a [*heatmap*](https://en.wikipedia.org/wiki/Heat_map).
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().encode(
        alt.X('Rotten_Tomatoes_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Y('IMDB_Rating:Q', bin=alt.BinParams(maxbins=20)),
        alt.Color('count()')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Compare the *size* and *color*-based 2D histograms above. Which encoding do you think should be preferred? Why? In which plot can you more precisely compare the magnitude of individual values? In which plot can you more accurately see the overall density of ratings?
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Aggregation

    Counts are just one type of aggregate. We might also calculate summaries using measures such as the `average`, `median`, `min`, or `max`. The Altair documentation includes the [full set of available aggregation functions](https://altair-viz.github.io/user_guide/transform/aggregate.html#user-guide-aggregate-transform).

    Let's look at some examples!
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Averages and Sorting

    _Do different genres of films receive consistently different ratings from critics?_ As a first step towards answering this question, we might examine the [*average* (a.k.a. the *arithmetic mean*)](https://en.wikipedia.org/wiki/Arithmetic_mean) rating for each genre of movie.

    Let's visualize genre along the `y` axis and plot `average` Rotten Tomatoes ratings along the `x` axis.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().encode(
        alt.X('average(Rotten_Tomatoes_Rating):Q'),
        alt.Y('Major_Genre:N')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _There does appear to be some interesting variation, but looking at the data as an alphabetical list is not very helpful for ranking critical reactions to the genres._

    For a tidier picture, let's sort the genres in descending order of average rating. To do so, we will add a `sort` parameter to the `y` encoding channel, stating that we wish to sort by the *average* (`op`, the aggregate operation) Rotten Tomatoes rating (the `field`) in descending `order`.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().encode(
        alt.X('average(Rotten_Tomatoes_Rating):Q'),
        alt.Y('Major_Genre:N', sort=alt.EncodingSortField(
            op='average', field='Rotten_Tomatoes_Rating', order='descending')
        )
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _The sorted plot suggests that critics think highly of documentaries, musicals, westerns, and dramas, but look down upon romantic comedies and horror films... and who doesn't love `null` movies!?_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Medians and the Inter-Quartile Range

    While averages are a common way to summarize data, they can sometimes mislead. For example, very large or very small values ([*outliers*](https://en.wikipedia.org/wiki/Outlier)) might skew the average. To be safe, we can compare the genres according to the [*median*](https://en.wikipedia.org/wiki/Median) ratings as well.

    The median is a point that splits the data evenly, such that half of the values are less than the median and the other half are greater. The median is less sensitive to outliers and so is referred to as a [*robust* statistic](https://en.wikipedia.org/wiki/Robust_statistics). For example, arbitrarily increasing the largest rating value will not cause the median to change.

    Let's update our plot to use a `median` aggregate and sort by those values:
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().encode(
        alt.X('median(Rotten_Tomatoes_Rating):Q'),
        alt.Y('Major_Genre:N', sort=alt.EncodingSortField(
            op='median', field='Rotten_Tomatoes_Rating', order='descending')
        )
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _We can see that some of the genres with similar averages have swapped places (films of unknown genre, or `null`, are now rated highest!), but the overall groups have stayed stable. Horror films continue to get little love from professional film critics._

    It's a good idea to stay skeptical when viewing aggregate statistics. So far we've only looked at *point estimates*. We have not examined how ratings vary within a genre.

    Let's visualize the variation among the ratings to add some nuance to our rankings. Here we will encode the [*inter-quartile range* (IQR)](https://en.wikipedia.org/wiki/Interquartile_range) for each genre. The IQR is the range in which the middle half of data values reside. A [*quartile*](https://en.wikipedia.org/wiki/Quartile) contains 25% of the data values. The inter-quartile range consists of the two middle quartiles, and so contains the middle 50%.

    To visualize ranges, we can use the `x` and `x2` encoding channels to indicate the starting and ending points. We use the aggregate functions `q1` (the lower quartile boundary) and `q3` (the upper quartile boundary) to provide the inter-quartile range. (In case you are wondering, *q2* would be the median.)
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().encode(
        alt.X('q1(Rotten_Tomatoes_Rating):Q'),
        alt.X2('q3(Rotten_Tomatoes_Rating):Q'),
        alt.Y('Major_Genre:N', sort=alt.EncodingSortField(
            op='median', field='Rotten_Tomatoes_Rating', order='descending')
        )
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Time Units

    _Now let's ask a completely different question: do box office returns vary by season?_

    To get an initial answer, let's plot the median U.S. gross revenue by month.

    To make this chart, we use the `timeUnit` transform to map release dates to the `month` of the year. The result is similar to binning, but uses meaningful time intervals. Other valid time units include `year`, `quarter`, `date` (numeric day in month), `day` (day of the week), and `hours`, as well as compound units such as `yearmonth` or `hoursminutes`. See the Altair documentation for a [complete list of time units](https://altair-viz.github.io/user_guide/transform/timeunit.html#user-guide-timeunit-transform).
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_area().encode(
        alt.X('month(Release_Date):T'),
        alt.Y('median(US_Gross):Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Looking at the resulting plot, median movie sales in the U.S. appear to spike around the summer blockbuster season and the end-of-year holiday period. Of course, people around the world (not just the U.S.) go out to the movies. Does a similar pattern arise for worldwide gross revenue?_
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_area().encode(
        alt.X('month(Release_Date):T'),
        alt.Y('median(Worldwide_Gross):Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Yes!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Advanced Data Transformation

    The examples above all use transformations (*bin*, *timeUnit*, *aggregate*, *sort*) that are defined relative to an encoding channel. However, at times you may want to apply a chain of multiple transformations prior to visualization, or use transformations that don't integrate into encoding definitions. For such cases, Altair and Vega-Lite support data transformations defined separately from encodings. These transformations are applied to the data *before* any encodings are considered.

    We *could* also perform transformations using Pandas directly, and then visualize the result. However, using the built-in transforms allows our visualizations to be published more easily in other contexts; for example, exporting the Vega-Lite JSON to use in a stand-alone web interface. Let's look at the built-in transforms supported by Altair, such as `calculate`, `filter`, `aggregate`, and `window`.
    """)
    return
|
| 444 |
+
|
| 445 |
+
|
| 446 |
+
@app.cell(hide_code=True)
|
| 447 |
+
def _(mo):
|
| 448 |
+
mo.md(r"""
|
| 449 |
+
### Calculate
|
| 450 |
+
|
| 451 |
+
_Think back to our comparison of U.S. gross and worldwide gross. Doesn't worldwide revenue include the U.S.? (Indeed it does.) How might we get a better sense of trends outside the U.S.?_
|
| 452 |
+
|
| 453 |
+
With the `calculate` transform we can derive new fields. Here we want to subtract U.S. gross from worldwide gross. The `calculate` transform takes a [Vega expression string](https://vega.github.io/vega/docs/expressions/) to define a formula over a single record. Vega expressions use JavaScript syntax. The `datum.` prefix accesses a field value on the input record.
|
| 454 |
+
""")
|
| 455 |
+
return
|
| 456 |
+
|
| 457 |
+
|
| 458 |
+
@app.cell
|
| 459 |
+
def _(alt, movies):
|
| 460 |
+
alt.Chart(movies).mark_area().transform_calculate(
|
| 461 |
+
NonUS_Gross='datum.Worldwide_Gross - datum.US_Gross'
|
| 462 |
+
).encode(
|
| 463 |
+
alt.X('month(Release_Date):T'),
|
| 464 |
+
alt.Y('median(NonUS_Gross):Q')
|
| 465 |
+
)
|
| 466 |
+
return
|
| 467 |
+
|
| 468 |
+
|
| 469 |
+
@app.cell(hide_code=True)
|
| 470 |
+
def _(mo):
|
| 471 |
+
mo.md(r"""
|
| 472 |
+
_We can see that seasonal trends hold outside the U.S., but with a more pronounced decline in the non-peak months._
|
| 473 |
+
""")
|
| 474 |
+
return
|
| 475 |
+
|
| 476 |
+
|
| 477 |
+
@app.cell(hide_code=True)
|
| 478 |
+
def _(mo):
|
| 479 |
+
mo.md(r"""
|
| 480 |
+
### Filter
|
| 481 |
+
|
| 482 |
+
The *filter* transform creates a new table with a subset of the original data, removing rows that fail to meet a provided [*predicate*](https://en.wikipedia.org/wiki/Predicate_%28mathematical_logic%29) test. Similar to the *calculate* transform, filter predicates are expressed using the [Vega expression language](https://vega.github.io/vega/docs/expressions/).
|
| 483 |
+
|
| 484 |
+
Below we add a filter to limit our initial scatter plot of IMDB vs. Rotten Tomatoes ratings to only films in the major genre of "Romantic Comedy".
|
| 485 |
+
""")
|
| 486 |
+
return
|
| 487 |
+
|
| 488 |
+
|
| 489 |
+
@app.cell
|
| 490 |
+
def _(alt, movies_url):
|
| 491 |
+
alt.Chart(movies_url).mark_circle().encode(
|
| 492 |
+
alt.X('Rotten_Tomatoes_Rating:Q'),
|
| 493 |
+
alt.Y('IMDB_Rating:Q')
|
| 494 |
+
).transform_filter('datum.Major_Genre == "Romantic Comedy"')
|
| 495 |
+
return
|
| 496 |
+
|
| 497 |
+
|
| 498 |
+
@app.cell(hide_code=True)
|
| 499 |
+
def _(mo):
|
| 500 |
+
mo.md(r"""
|
| 501 |
+
_How does the plot change if we filter to view other genres? Edit the filter expression to find out._
|
| 502 |
+
|
| 503 |
+
Now let's filter to look at films released before 1970.
|
| 504 |
+
""")
|
| 505 |
+
return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_circle().encode(
        alt.X('Rotten_Tomatoes_Rating:Q'),
        alt.Y('IMDB_Rating:Q')
    ).transform_filter('year(datum.Release_Date) < 1970')
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _They seem to score unusually high! Are older films simply better, or is there a [selection bias](https://en.wikipedia.org/wiki/Selection_bias) towards more highly-rated older films in this dataset?_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Aggregate

    We have already seen `aggregate` transforms such as `count` and `average` in the context of encoding channels. We can also specify aggregates separately, as a pre-processing step for other transforms (as in the `window` transform examples below). The output of an `aggregate` transform is a new data table with records that contain both the `groupby` fields and the computed `aggregate` measures.

    Let's recreate our plot of average ratings by genre, but this time using a separate `aggregate` transform. The output table from the aggregate transform contains 13 rows, one for each genre.

    To order the `y` axis, we must include an aggregate operation in our sorting instructions. Here we use the `max` operator, which works fine because there is only one output record per genre. We could similarly use the `min` operator and end up with the same plot.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().transform_aggregate(
        groupby=['Major_Genre'],
        Average_Rating='average(Rotten_Tomatoes_Rating)'
    ).encode(
        alt.X('Average_Rating:Q'),
        alt.Y('Major_Genre:N', sort=alt.EncodingSortField(
            op='max', field='Average_Rating', order='descending'
        ))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Window

    The `window` transform performs calculations over sorted groups of data records. Window transforms are quite powerful, supporting tasks such as ranking, lead/lag analysis, cumulative totals, and running sums or averages. Values calculated by a `window` transform are written back to the input data table as new fields. Window operations include the aggregate operations we've seen earlier, as well as specialized operations such as `rank`, `row_number`, `lead`, and `lag`. The Vega-Lite documentation lists [all valid window operations](https://vega.github.io/vega-lite/docs/window.html#ops).

    One use case for a `window` transform is to calculate top-k lists. Let's plot the top 20 directors in terms of total worldwide gross.

    We first use a `filter` transform to remove records for which we don't know the director. Otherwise, the director `null` would dominate the list! We then apply an `aggregate` to sum up the worldwide gross for all films, grouped by director. At this point we could plot a sorted bar chart, but we'd end up with hundreds and hundreds of directors. How can we limit the display to the top 20?

    The `window` transform allows us to determine the top directors by calculating their rank order. Within our `window` transform definition we can `sort` by gross and use the `rank` operation to calculate rank scores according to that sort order. We can then add a subsequent `filter` transform to limit the data to only records with a rank value less than or equal to 20.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_bar().transform_filter(
        'datum.Director != null'
    ).transform_aggregate(
        Gross='sum(Worldwide_Gross)',
        groupby=['Director']
    ).transform_window(
        Rank='rank()',
        sort=[alt.SortField('Gross', order='descending')]
    ).transform_filter(
        'datum.Rank <= 20'
    ).encode(
        alt.X('Gross:Q'),
        alt.Y('Director:N', sort=alt.EncodingSortField(
            op='max', field='Gross', order='descending'
        ))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _We can see that Steven Spielberg has been quite successful in his career! However, showing sums might favor directors who have had longer careers, and so have made more movies and thus more money. What happens if we change the choice of aggregate operation? Who is the most successful director in terms of `average` or `median` gross per film? Modify the aggregate transform above!_

    Earlier in this notebook we looked at histograms, which approximate the [*probability density function*](https://en.wikipedia.org/wiki/Probability_density_function) of a set of values. A complementary approach is to look at the [*cumulative distribution*](https://en.wikipedia.org/wiki/Cumulative_distribution_function). For example, think of a histogram in which each bin includes not only its own count but also the counts from all previous bins: the result is a _running total_, with the last bin containing the total number of records. A cumulative chart directly shows us, for a given reference value, how many data values are less than or equal to that reference.

    As a concrete example, let's look at the cumulative distribution of films by running time (in minutes). Only a subset of records actually include running time information, so we first `filter` down to the subset of films for which we have running times. Next, we apply an `aggregate` to count the number of films per duration (implicitly using "bins" of 1 minute each). We then use a `window` transform to compute a running total of counts across bins, sorted by increasing running time.
    """)
    return


@app.cell
def _(alt, movies_url):
    alt.Chart(movies_url).mark_line(interpolate='step-before').transform_filter(
        'datum.Running_Time_min != null'
    ).transform_aggregate(
        groupby=['Running_Time_min'],
        Count='count()'
    ).transform_window(
        Cumulative_Sum='sum(Count)',
        sort=[alt.SortField('Running_Time_min', order='ascending')]
    ).encode(
        alt.X('Running_Time_min:Q', axis=alt.Axis(title='Duration (min)')),
        alt.Y('Cumulative_Sum:Q', axis=alt.Axis(title='Cumulative Count of Films'))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Examining the cumulative distribution of film lengths, we can see that films under 110 minutes make up about half of all the films for which we have running times. There is a steady accumulation of films between 90 minutes and 2 hours, after which the distribution begins to taper off. Though rare, the dataset does contain multiple films more than 3 hours long!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Summary

    We've only scratched the surface of what data transformations can do! For more details, including all the available transformations and their parameters, see the [Altair data transformation documentation](https://altair-viz.github.io/user_guide/transform/index.html).

    Sometimes you will need to perform significant data transformation to prepare your data _prior_ to using visualization tools. To engage in [_data wrangling_](https://en.wikipedia.org/wiki/Data_wrangling) right here in Python, you can use the [Pandas library](https://pandas.pydata.org/).
    """)
    return


if __name__ == "__main__":
    app.run()
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "altair==6.0.0",
#     "marimo",
#     "pandas==3.0.1",
# ]
# ///

import marimo

__generated_with = "0.20.4"
app = marimo.App()


@app.cell
def _():
    import marimo as mo

    return (mo,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    # Scales, Axes, and Legends

    Visual encoding – mapping data to visual variables such as position, size, shape, or color – is the beating heart of data visualization. The workhorse that actually performs this mapping is the *scale*: a function that takes a data value as input (the scale *domain*) and returns a visual value, such as a pixel position or RGB color, as output (the scale *range*). Of course, a visualization is useless if no one can figure out what it conveys! In addition to graphical marks, a chart needs reference elements, or *guides*, that allow readers to decode the graphic. Guides such as *axes* (which visualize scales with spatial ranges) and *legends* (which visualize scales with color, size, or shape ranges) are the unsung heroes of effective data visualization!

    In this notebook, we will explore the options Altair provides to support customized designs of scale mappings, axes, and legends, using a running example about the effectiveness of antibiotic drugs.

    _This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
    """)
    return


@app.cell
def _():
    import pandas as pd
    import altair as alt

    return alt, pd


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Antibiotics Data
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    After World War II, antibiotics were considered "wonder drugs", as they were an easy remedy for what had been intractable ailments. To learn which drug worked most effectively for which bacterial infection, the performance of the three most popular antibiotics against 16 bacteria was recorded.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    We will be using an antibiotics dataset from the [vega-datasets collection](https://github.com/vega/vega-datasets). In the examples below, we will pass the URL directly to Altair:
    """)
    return


@app.cell
def _():
    antibiotics = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/burtin.json'
    return (antibiotics,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    We can first load the data with Pandas to view the dataset in its entirety and get acquainted with the available fields:
    """)
    return


@app.cell
def _(antibiotics, pd):
    pd.read_json(antibiotics)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The numeric values in the table indicate the [minimum inhibitory concentration (MIC)](https://en.wikipedia.org/wiki/Minimum_inhibitory_concentration), a measure of the effectiveness of the antibiotic, which represents the concentration of antibiotic (in micrograms per milliliter) required to prevent growth in vitro. The reaction of the bacteria to a procedure called [Gram staining](https://en.wikipedia.org/wiki/Gram_stain) is described by the nominal field `Gram_Staining`. Bacteria that turn dark blue or violet are Gram-positive. Otherwise, they are Gram-negative.

    As we examine different visualizations of this dataset, ask yourself: What might we learn about the relative effectiveness of the antibiotics? What might we learn about the bacterial species based on their antibiotic response?
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Configuring Scales and Axes
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Plotting Antibiotic Resistance: Adjusting the Scale Type

    Let's start by looking at a simple dot plot of the MIC for Neomycin.
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _We can see that the MIC values span orders of magnitude: most points cluster on the left, with a few large outliers to the right._

    By default Altair uses a `linear` mapping between the domain values (MIC) and the range values (pixels). To get a better overview of the data, we can apply a different scale transformation.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To change the scale type, we'll set the `scale` attribute, using `alt.Scale` with its `type` parameter.

    Here's the result of using a square root (`sqrt`) scale type. Distances in the pixel range now correspond to the square root of distances in the data domain.
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              scale=alt.Scale(type='sqrt'))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _The points on the left are now better differentiated, but we still see some heavy skew._

    Let's try using a [logarithmic scale](https://en.wikipedia.org/wiki/Logarithmic_scale) (`log`) instead:
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              scale=alt.Scale(type='log'))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Now the data is much more evenly distributed and we can see the very large differences in concentrations required for different bacteria._

    In a standard linear scale, a visual (pixel) distance of 10 units might correspond to an *addition* of 10 units in the data domain. A logarithmic transform maps between multiplication and addition, such that `log(u) + log(v) = log(u*v)`. As a result, in a logarithmic scale, a visual distance of 10 units instead corresponds to *multiplication* by 10 units in the data domain, assuming a base 10 logarithm. The `log` scale above defaults to the logarithm base 10, but we can adjust this by providing a `base` parameter to the scale.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Styling an Axis

    Lower dosages indicate higher effectiveness. However, some people may expect values that are "better" to be "up and to the right" within a chart. If we want to cater to this convention, we can reverse the axis to encode "effectiveness" as a reversed MIC scale.

    To do this, we can set the encoding `sort` property to `'descending'`:
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log'))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Unfortunately the axis is starting to get a bit confusing: we're plotting data on a logarithmic scale, in the reverse direction, and without a clear indication of what our units are!_

    Let's add a more informative axis title: we'll use the `title` property of the encoding to provide the desired title text:
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log'),
              title='Neomycin MIC (μg/ml, reverse log scale)')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Much better!

    By default, Altair places the x-axis along the bottom of the chart. To change this default, we can add an `axis` attribute with `orient='top'`:
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log'),
              axis=alt.Axis(orient='top'),
              title='Neomycin MIC (μg/ml, reverse log scale)')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Similarly, the y-axis defaults to a `'left'` orientation, but can be set to `'right'`.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Comparing Antibiotics: Adjusting Grid Lines, Tick Counts, and Sizing

    _How does neomycin compare to other antibiotics, such as streptomycin and penicillin?_

    To start answering this question, we can create scatter plots, adding a y-axis encoding for another antibiotic that mirrors the design of our x-axis for neomycin.
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log'),
              title='Neomycin MIC (μg/ml, reverse log scale)'),
        alt.Y('Streptomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log'),
              title='Streptomycin MIC (μg/ml, reverse log scale)')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _We can see that neomycin and streptomycin appear highly correlated, as the bacterial strains respond similarly to both antibiotics._

    Let's move on and compare neomycin with penicillin:
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log'),
              title='Neomycin MIC (μg/ml, reverse log scale)'),
        alt.Y('Penicillin:Q',
              sort='descending',
              scale=alt.Scale(type='log'),
              title='Penicillin MIC (μg/ml, reverse log scale)')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Now we see a more differentiated response: some bacteria respond well to neomycin but not penicillin, and vice versa!_

    While this plot is useful, we can make it better. The x and y axes use the same units, but have different extents (the chart width is larger than the height) and different domains (0.001 to 100 for the x-axis, and 0.001 to 1,000 for the y-axis).

    Let's equalize the axes: we can add explicit `width` and `height` settings for the chart, and specify matching domains using the scale `domain` property.
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log', domain=[0.001, 1000]),
              title='Neomycin MIC (μg/ml, reverse log scale)'),
        alt.Y('Penicillin:Q',
              sort='descending',
              scale=alt.Scale(type='log', domain=[0.001, 1000]),
              title='Penicillin MIC (μg/ml, reverse log scale)')
    ).properties(width=250, height=250)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _The resulting plot is more balanced, and less prone to subtle misinterpretations!_

    However, the grid lines are now rather dense. If we want to remove grid lines altogether, we can add `grid=False` to the `axis` attribute. But what if we instead want to reduce the number of tick marks, for example only including grid lines for each order of magnitude?

    To change the number of ticks, we can specify a target `tickCount` property for an `Axis` object. The `tickCount` is treated as a *suggestion* to Altair, to be considered alongside other aspects such as using nice, human-friendly intervals. We may not get *exactly* the number of tick marks we request, but we should get something close.
    """)
    return


@app.cell
def _(alt, antibiotics):
    alt.Chart(antibiotics).mark_circle().encode(
        alt.X('Neomycin:Q',
              sort='descending',
              scale=alt.Scale(type='log', domain=[0.001, 1000]),
              axis=alt.Axis(tickCount=5),
              title='Neomycin MIC (μg/ml, reverse log scale)'),
        alt.Y('Penicillin:Q',
              sort='descending',
              scale=alt.Scale(type='log', domain=[0.001, 1000]),
              axis=alt.Axis(tickCount=5),
              title='Penicillin MIC (μg/ml, reverse log scale)')
    ).properties(width=250, height=250)
    return
|
| 362 |
+
|
| 363 |
+
|
| 364 |
+
@app.cell(hide_code=True)
|
| 365 |
+
def _(mo):
|
| 366 |
+
mo.md(r"""
|
| 367 |
+
By setting the `tickCount` to 5, we have the desired effect.
|
| 368 |
+
|
| 369 |
+
Our scatter plot points feel a bit small. Let's change the default size by setting the `size` property of the circle mark. This size value is the *area* of the mark in pixels.
|
| 370 |
+
""")
|
| 371 |
+
return
|
| 372 |
+
|
| 373 |
+
|
| 374 |
+
@app.cell
|
| 375 |
+
def _(alt, antibiotics):
|
| 376 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 377 |
+
alt.X('Neomycin:Q',
|
| 378 |
+
sort='descending',
|
| 379 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 380 |
+
axis=alt.Axis(tickCount=5),
|
| 381 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 382 |
+
alt.Y('Penicillin:Q',
|
| 383 |
+
sort='descending',
|
| 384 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 385 |
+
axis=alt.Axis(tickCount=5),
|
| 386 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 387 |
+
).properties(width=250, height=250)
|
| 388 |
+
return
|
| 389 |
+
|
| 390 |
+
|
| 391 |
+
@app.cell(hide_code=True)
|
| 392 |
+
def _(mo):
|
| 393 |
+
mo.md(r"""
|
| 394 |
+
Here we've set the circle mark area to 80 pixels. _Further adjust the value as you see fit!_
|
| 395 |
+
""")
|
| 396 |
+
return
|
| 397 |
+
|
| 398 |
+
|
| 399 |
+
@app.cell(hide_code=True)
|
| 400 |
+
def _(mo):
|
| 401 |
+
mo.md(r"""
|
| 402 |
+
## Configuring Color Legends
|
| 403 |
+
""")
|
| 404 |
+
return
|
| 405 |
+
|
| 406 |
+
|
| 407 |
+
@app.cell(hide_code=True)
|
| 408 |
+
def _(mo):
|
| 409 |
+
mo.md(r"""
|
| 410 |
+
### Color by Gram Staining
|
| 411 |
+
|
| 412 |
+
_Above we saw that neomycin is more effective for some bacteria, while penicillin is more effective for others. But how can we tell which antibiotic to use if we don't know the specific species of bacteria? Gram staining serves as a diagnostic for discriminating classes of bacteria!_
|
| 413 |
+
|
| 414 |
+
Let's encode `Gram_Staining` on the `color` channel as a nominal data type:
|
| 415 |
+
""")
|
| 416 |
+
return
|
| 417 |
+
|
| 418 |
+
|
| 419 |
+
@app.cell
|
| 420 |
+
def _(alt, antibiotics):
|
| 421 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 422 |
+
alt.X('Neomycin:Q',
|
| 423 |
+
sort='descending',
|
| 424 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 425 |
+
axis=alt.Axis(tickCount=5),
|
| 426 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 427 |
+
alt.Y('Penicillin:Q',
|
| 428 |
+
sort='descending',
|
| 429 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 430 |
+
axis=alt.Axis(tickCount=5),
|
| 431 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 432 |
+
alt.Color('Gram_Staining:N')
|
| 433 |
+
).properties(width=250, height=250)
|
| 434 |
+
return
|
| 435 |
+
|
| 436 |
+
|
| 437 |
+
@app.cell(hide_code=True)
|
| 438 |
+
def _(mo):
|
| 439 |
+
mo.md(r"""
|
| 440 |
+
_We can see that Gram-positive bacteria seem most susceptible to penicillin, whereas neomycin is more effective for Gram-negative bacteria!_
|
| 441 |
+
|
| 442 |
+
The color scheme above was automatically chosen to provide perceptually-distinguishable colors for nominal (equal or not equal) comparisons. However, we might wish to customize the colors used. In this case, Gram staining results in [distinctive physical colorings: pink for Gram-negative, purple for Gram-positive](https://en.wikipedia.org/wiki/Gram_stain#/media/File:Gram_stain_01.jpg).
|
| 443 |
+
|
| 444 |
+
Let's use those colors by specifying an explicit scale mapping from the data `domain` to the color `range`:
|
| 445 |
+
""")
|
| 446 |
+
return
|
| 447 |
+
|
| 448 |
+
|
| 449 |
+
@app.cell
|
| 450 |
+
def _(alt, antibiotics):
|
| 451 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 452 |
+
alt.X('Neomycin:Q',
|
| 453 |
+
sort='descending',
|
| 454 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 455 |
+
axis=alt.Axis(tickCount=5),
|
| 456 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 457 |
+
alt.Y('Penicillin:Q',
|
| 458 |
+
sort='descending',
|
| 459 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 460 |
+
axis=alt.Axis(tickCount=5),
|
| 461 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 462 |
+
alt.Color('Gram_Staining:N',
|
| 463 |
+
scale=alt.Scale(domain=['negative', 'positive'], range=['hotpink', 'purple'])
|
| 464 |
+
)
|
| 465 |
+
).properties(width=250, height=250)
|
| 466 |
+
return
|
| 467 |
+
|
| 468 |
+
|
| 469 |
+
@app.cell(hide_code=True)
|
| 470 |
+
def _(mo):
|
| 471 |
+
mo.md(r"""
|
| 472 |
+
By default legends are placed on the right side of the chart. Similar to axes, we can change the legend orientation using the `orient` parameter:
|
| 473 |
+
""")
|
| 474 |
+
return
|
| 475 |
+
|
| 476 |
+
|
| 477 |
+
@app.cell
|
| 478 |
+
def _(alt, antibiotics):
|
| 479 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 480 |
+
alt.X('Neomycin:Q',
|
| 481 |
+
sort='descending',
|
| 482 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 483 |
+
axis=alt.Axis(tickCount=5),
|
| 484 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 485 |
+
alt.Y('Penicillin:Q',
|
| 486 |
+
sort='descending',
|
| 487 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 488 |
+
axis=alt.Axis(tickCount=5),
|
| 489 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 490 |
+
alt.Color('Gram_Staining:N',
|
| 491 |
+
scale=alt.Scale(domain=['negative', 'positive'], range=['hotpink', 'purple']),
|
| 492 |
+
legend=alt.Legend(orient='left')
|
| 493 |
+
)
|
| 494 |
+
).properties(width=250, height=250)
|
| 495 |
+
return
|
| 496 |
+
|
| 497 |
+
|
| 498 |
+
@app.cell(hide_code=True)
|
| 499 |
+
def _(mo):
|
| 500 |
+
mo.md(r"""
|
| 501 |
+
We can also remove a legend entirely by specifying `legend=None`:
|
| 502 |
+
""")
|
| 503 |
+
return
|
| 504 |
+
|
| 505 |
+
|
| 506 |
+
@app.cell
|
| 507 |
+
def _(alt, antibiotics):
|
| 508 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 509 |
+
alt.X('Neomycin:Q',
|
| 510 |
+
sort='descending',
|
| 511 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 512 |
+
axis=alt.Axis(tickCount=5),
|
| 513 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 514 |
+
alt.Y('Penicillin:Q',
|
| 515 |
+
sort='descending',
|
| 516 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 517 |
+
axis=alt.Axis(tickCount=5),
|
| 518 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 519 |
+
alt.Color('Gram_Staining:N',
|
| 520 |
+
scale=alt.Scale(domain=['negative', 'positive'], range=['hotpink', 'purple']),
|
| 521 |
+
legend=None
|
| 522 |
+
)
|
| 523 |
+
).properties(width=250, height=250)
|
| 524 |
+
return
|
| 525 |
+
|
| 526 |
+
|
| 527 |
+
@app.cell(hide_code=True)
|
| 528 |
+
def _(mo):
|
| 529 |
+
mo.md(r"""
|
| 530 |
+
### Color by Species
|
| 531 |
+
|
| 532 |
+
_So far we've considered the effectiveness of antibiotics. Let's turn around and ask a different question: what might antibiotic response teach us about the different species of bacteria?_
|
| 533 |
+
|
| 534 |
+
To start, let's encode `Bacteria` (a nominal data field) using the `color` channel:
|
| 535 |
+
""")
|
| 536 |
+
return
|
| 537 |
+
|
| 538 |
+
|
| 539 |
+
@app.cell
|
| 540 |
+
def _(alt, antibiotics):
|
| 541 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 542 |
+
alt.X('Neomycin:Q',
|
| 543 |
+
sort='descending',
|
| 544 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 545 |
+
axis=alt.Axis(tickCount=5),
|
| 546 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 547 |
+
alt.Y('Penicillin:Q',
|
| 548 |
+
sort='descending',
|
| 549 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 550 |
+
axis=alt.Axis(tickCount=5),
|
| 551 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 552 |
+
alt.Color('Bacteria:N')
|
| 553 |
+
).properties(width=250, height=250)
|
| 554 |
+
return
|
| 555 |
+
|
| 556 |
+
|
| 557 |
+
@app.cell(hide_code=True)
|
| 558 |
+
def _(mo):
|
| 559 |
+
mo.md(r"""
|
| 560 |
+
_The result is a bit of a mess!_ There are enough unique bacteria that Altair starts repeating colors from its default 10-color palette for nominal values.
|
| 561 |
+
|
| 562 |
+
To use custom colors, we can update the color encoding `scale` property. One option is to provide explicit scale `domain` and `range` values to indicate the precise color mappings per value, as we did above for Gram staining. Another option is to use an alternative color scheme. Altair includes a variety of built-in color schemes. For a complete list, see the [Vega color scheme documentation](https://vega.github.io/vega/docs/schemes/#reference).
|
| 563 |
+
|
| 564 |
+
Let's try switching to a built-in 20-color scheme, `tableau20`, and set that using the scale `scheme` property.
|
| 565 |
+
""")
|
| 566 |
+
return
|
| 567 |
+
|
| 568 |
+
|
| 569 |
+
@app.cell
|
| 570 |
+
def _(alt, antibiotics):
|
| 571 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 572 |
+
alt.X('Neomycin:Q',
|
| 573 |
+
sort='descending',
|
| 574 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 575 |
+
axis=alt.Axis(tickCount=5),
|
| 576 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 577 |
+
alt.Y('Penicillin:Q',
|
| 578 |
+
sort='descending',
|
| 579 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 580 |
+
axis=alt.Axis(tickCount=5),
|
| 581 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 582 |
+
alt.Color('Bacteria:N',
|
| 583 |
+
scale=alt.Scale(scheme='tableau20'))
|
| 584 |
+
).properties(width=250, height=250)
|
| 585 |
+
return
|
| 586 |
+
|
| 587 |
+
|
| 588 |
+
@app.cell(hide_code=True)
|
| 589 |
+
def _(mo):
|
| 590 |
+
mo.md(r"""
|
| 591 |
+
_We now have a unique color for each bacteria, but the chart is still a mess. Among other issues, the encoding takes no account of bacteria that belong to the same genus. In the chart above, the two different Salmonella strains have very different hues (teal and pink), despite being biological cousins._
|
| 592 |
+
|
| 593 |
+
To try a different scheme, we can also change the data type from nominal to ordinal. The default ordinal scheme uses blue shades, ramping from light to dark:
|
| 594 |
+
""")
|
| 595 |
+
return
|
| 596 |
+
|
| 597 |
+
|
| 598 |
+
@app.cell
|
| 599 |
+
def _(alt, antibiotics):
|
| 600 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 601 |
+
alt.X('Neomycin:Q',
|
| 602 |
+
sort='descending',
|
| 603 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 604 |
+
axis=alt.Axis(tickCount=5),
|
| 605 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 606 |
+
alt.Y('Penicillin:Q',
|
| 607 |
+
sort='descending',
|
| 608 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 609 |
+
axis=alt.Axis(tickCount=5),
|
| 610 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 611 |
+
alt.Color('Bacteria:O')
|
| 612 |
+
).properties(width=250, height=250)
|
| 613 |
+
return
|
| 614 |
+
|
| 615 |
+
|
| 616 |
+
@app.cell(hide_code=True)
|
| 617 |
+
def _(mo):
|
| 618 |
+
mo.md(r"""
|
| 619 |
+
_Some of those blue shades may be hard to distinguish._
|
| 620 |
+
|
| 621 |
+
For more differentiated colors, we can experiment with alternatives to the default `blues` color scheme. The `viridis` scheme ramps through both hue and luminance:
|
| 622 |
+
""")
|
| 623 |
+
return
|
| 624 |
+
|
| 625 |
+
|
| 626 |
+
@app.cell
|
| 627 |
+
def _(alt, antibiotics):
|
| 628 |
+
alt.Chart(antibiotics).mark_circle(size=80).encode(
|
| 629 |
+
alt.X('Neomycin:Q',
|
| 630 |
+
sort='descending',
|
| 631 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 632 |
+
axis=alt.Axis(tickCount=5),
|
| 633 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 634 |
+
alt.Y('Penicillin:Q',
|
| 635 |
+
sort='descending',
|
| 636 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 637 |
+
axis=alt.Axis(tickCount=5),
|
| 638 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 639 |
+
alt.Color('Bacteria:O',
|
| 640 |
+
scale=alt.Scale(scheme='viridis'))
|
| 641 |
+
).properties(width=250, height=250)
|
| 642 |
+
return
|
| 643 |
+
|
| 644 |
+
|
| 645 |
+
@app.cell(hide_code=True)
|
| 646 |
+
def _(mo):
|
| 647 |
+
mo.md(r"""
|
| 648 |
+
_Bacteria from the same genus now have more similar colors than before, but the chart still remains confusing. There are many colors, they are hard to look up in the legend accurately, and two bacteria may have similar colors but different genus._
|
| 649 |
+
""")
|
| 650 |
+
return
|
| 651 |
+
|
| 652 |
+
|
| 653 |
+
@app.cell(hide_code=True)
|
| 654 |
+
def _(mo):
|
| 655 |
+
mo.md(r"""
|
| 656 |
+
### Color by Genus
|
| 657 |
+
|
| 658 |
+
Let's try to color by genus instead of bacteria. To do so, we will add a `calculate` transform that splits up the bacteria name on space characters and takes the first word in the resulting array. We can then encode the resulting `Genus` field using the `tableau20` color scheme.
|
| 659 |
+
|
| 660 |
+
(Note that the antibiotics dataset includes a pre-calculated `Genus` field, but we will ignore it here in order to further explore Altair's data transformations.)
|
| 661 |
+
""")
|
| 662 |
+
return
|
| 663 |
+
|
| 664 |
+
|
| 665 |
+
@app.cell
|
| 666 |
+
def _(alt, antibiotics):
|
| 667 |
+
alt.Chart(antibiotics).mark_circle(size=80).transform_calculate(
|
| 668 |
+
Genus='split(datum.Bacteria, " ")[0]'
|
| 669 |
+
).encode(
|
| 670 |
+
alt.X('Neomycin:Q',
|
| 671 |
+
sort='descending',
|
| 672 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 673 |
+
axis=alt.Axis(tickCount=5),
|
| 674 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 675 |
+
alt.Y('Penicillin:Q',
|
| 676 |
+
sort='descending',
|
| 677 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 678 |
+
axis=alt.Axis(tickCount=5),
|
| 679 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 680 |
+
alt.Color('Genus:N',
|
| 681 |
+
scale=alt.Scale(scheme='tableau20'))
|
| 682 |
+
).properties(width=250, height=250)
|
| 683 |
+
return
|
| 684 |
+
|
| 685 |
+
|
| 686 |
+
@app.cell(hide_code=True)
|
| 687 |
+
def _(mo):
|
| 688 |
+
mo.md(r"""
|
| 689 |
+
_Hmm... While the data are better segregated by genus, this cacapohony of colors doesn't seem particularly useful._
|
| 690 |
+
|
| 691 |
+
_If we look at some of the previous charts carefully, we can see that only a handful of bacteria have a genus shared with another bacteria: Salmonella, Staphylococcus, and Streptococcus. To focus our comparison, we might add colors only for these repeated genus values._
|
| 692 |
+
|
| 693 |
+
Let's add another `calculate` transform that takes a genus name, keeps it if it is one of the repeated values, and otherwise uses the string `"Other"`.
|
| 694 |
+
|
| 695 |
+
In addition, we can add custom color encodings using explicit `domain` and `range` arrays for the color encoding `scale`.
|
| 696 |
+
""")
|
| 697 |
+
return
|
| 698 |
+
|
| 699 |
+
|
| 700 |
+
@app.cell
|
| 701 |
+
def _(alt, antibiotics):
|
| 702 |
+
alt.Chart(antibiotics).mark_circle(size=80).transform_calculate(
|
| 703 |
+
Split='split(datum.Bacteria, " ")[0]'
|
| 704 |
+
).transform_calculate(
|
| 705 |
+
Genus='indexof(["Salmonella", "Staphylococcus", "Streptococcus"], datum.Split) >= 0 ? datum.Split : "Other"'
|
| 706 |
+
).encode(
|
| 707 |
+
alt.X('Neomycin:Q',
|
| 708 |
+
sort='descending',
|
| 709 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 710 |
+
axis=alt.Axis(tickCount=5),
|
| 711 |
+
title='Neomycin MIC (μg/ml, reverse log scale)'),
|
| 712 |
+
alt.Y('Penicillin:Q',
|
| 713 |
+
sort='descending',
|
| 714 |
+
scale=alt.Scale(type='log', domain=[0.001, 1000]),
|
| 715 |
+
axis=alt.Axis(tickCount=5),
|
| 716 |
+
title='Penicillin MIC (μg/ml, reverse log scale)'),
|
| 717 |
+
alt.Color('Genus:N',
|
| 718 |
+
scale=alt.Scale(
|
| 719 |
+
domain=['Salmonella', 'Staphylococcus', 'Streptococcus', 'Other'],
|
| 720 |
+
range=['rgb(76,120,168)', 'rgb(84,162,75)', 'rgb(228,87,86)', 'rgb(121,112,110)']
|
| 721 |
+
))
|
| 722 |
+
).properties(width=250, height=250)
|
| 723 |
+
return
|
| 724 |
+
|
| 725 |
+
|
| 726 |
+
@app.cell(hide_code=True)
|
| 727 |
+
def _(mo):
|
| 728 |
+
mo.md(r"""
|
| 729 |
+
_We now have a much more revealing plot, made possible by customizations to the axes and legend. Take a moment to examine the plot above. Notice any surprising groupings?_
|
| 730 |
+
|
| 731 |
+
_The upper-left region has a cluster of red Streptococcus bacteria, but with a grey Other bacteria alongside them. Meanwhile, towards the middle-right we see another red Streptococcus placed far away from its "cousins". Might we expect bacteria from the same genus (and thus presumably more genetically similar) to be grouped closer together?_
|
| 732 |
+
|
| 733 |
+
As it so happens, the underlying dataset actually contains errors. The dataset reflects the species designations used in the early 1950s. However, the scientific consensus has since been overturned. That gray point in the upper-left? It's now considered a Streptococcus! That red point towards the middle-right? It's no longer considered a Streptococcus!
|
| 734 |
+
|
| 735 |
+
Of course, on its own, this dataset doesn't fully justify these reclassifications. Nevertheless, the data contain valuable biological clues that went overlooked for decades! Visualization, when used by an appropriately skilled and inquisitive viewer, can be a powerful tool for discovery.
|
| 736 |
+
|
| 737 |
+
This example also reinforces an important lesson: **_always be skeptical of your data!_**
|
| 738 |
+
""")
|
| 739 |
+
return
|
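An editorial aside: the pair of `calculate` transforms in the cell above is ordinary string logic, and can be sketched in plain Python. The helper name below is hypothetical (it is not part of the notebook), but the bucketing rule mirrors the two Vega expressions:

```python
# Plain-Python sketch of the two Vega `calculate` transforms above:
# take the first word of the species name, then collapse any genus that
# is not repeated in the dataset into an "Other" category.
REPEATED_GENERA = ["Salmonella", "Staphylococcus", "Streptococcus"]

def genus_bucket(bacteria: str) -> str:
    genus = bacteria.split(" ")[0]  # split(datum.Bacteria, " ")[0]
    # indexof([...], datum.Split) >= 0 ? datum.Split : "Other"
    return genus if genus in REPEATED_GENERA else "Other"

print(genus_bucket("Streptococcus viridans"))  # Streptococcus
print(genus_bucket("Escherichia coli"))        # Other
```

Doing the bucketing in the chart specification (rather than pre-processing the data frame) keeps the chart self-describing, which is the design choice the notebook makes here.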
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    ### Color by Antibiotic Response
+
+    We might also use the `color` channel to encode quantitative values. Keep in mind, though, that color is typically less effective for conveying quantities than position or size encodings!
+
+    Here is a basic heatmap of penicillin MIC values for each bacterium. We'll use a `rect` mark and sort the bacteria by descending MIC values (from most to least resistant):
+    """)
+    return
+
+
+@app.cell
+def _(alt, antibiotics):
+    alt.Chart(antibiotics).mark_rect().encode(
+        alt.Y('Bacteria:N',
+            sort=alt.EncodingSortField(field='Penicillin', op='max', order='descending')
+        ),
+        alt.Color('Penicillin:Q')
+    )
+    return
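As a sketch of what `alt.EncodingSortField(field='Penicillin', op='max', order='descending')` asks the chart to compute — one aggregate per category, then categories ordered by that aggregate — here is the equivalent logic in plain Python over a hypothetical toy subset (the values below are made up, not the real antibiotics data):

```python
# Aggregate-then-sort, mirroring EncodingSortField(op='max', order='descending').
records = [  # hypothetical toy records, not the real MIC values
    {"Bacteria": "Escherichia coli", "Penicillin": 100.0},
    {"Bacteria": "Staphylococcus aureus", "Penicillin": 0.03},
    {"Bacteria": "Salmonella typhosa", "Penicillin": 1.0},
]

# One aggregate (max) per category...
max_mic = {}
for r in records:
    name = r["Bacteria"]
    max_mic[name] = max(max_mic.get(name, float("-inf")), r["Penicillin"])

# ...then categories ordered from most to least resistant.
order = sorted(max_mic, key=max_mic.get, reverse=True)
print(order)  # ['Escherichia coli', 'Salmonella typhosa', 'Staphylococcus aureus']
```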
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    We can further improve this chart by combining features we've seen thus far: a log-transformed scale, a change of axis orientation, a custom color scheme (`plasma`), tick count adjustment, and custom title text. We'll also exercise configuration options to adjust the axis title placement and legend title alignment.
+    """)
+    return
+
+
+@app.cell
+def _(alt, antibiotics):
+    alt.Chart(antibiotics).mark_rect().encode(
+        alt.Y('Bacteria:N',
+            sort=alt.EncodingSortField(field='Penicillin', op='max', order='descending'),
+            axis=alt.Axis(
+                orient='right',     # orient axis on right side of chart
+                titleX=7,           # set x-position to 7 pixels right of chart
+                titleY=-2,          # set y-position to 2 pixels above chart
+                titleAlign='left',  # use left-aligned text
+                titleAngle=0        # undo default title rotation
+            )
+        ),
+        alt.Color('Penicillin:Q',
+            scale=alt.Scale(type='log', scheme='plasma', nice=True),
+            legend=alt.Legend(titleOrient='right', tickCount=5),
+            title='Penicillin MIC (μg/ml)'
+        )
+    )
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    Alternatively, we can remove the axis title altogether, and use the top-level `title` property to add a title for the entire chart:
+    """)
+    return
+
+
+@app.cell
+def _(alt, antibiotics):
+    alt.Chart(antibiotics, title='Penicillin Resistance of Bacterial Strains').mark_rect().encode(
+        alt.Y('Bacteria:N',
+            sort=alt.EncodingSortField(field='Penicillin', op='max', order='descending'),
+            axis=alt.Axis(orient='right', title=None)
+        ),
+        alt.Color('Penicillin:Q',
+            scale=alt.Scale(type='log', scheme='plasma', nice=True),
+            legend=alt.Legend(titleOrient='right', tickCount=5),
+            title='Penicillin MIC (μg/ml)'
+        )
+    ).configure_title(
+        anchor='start',  # anchor and left-align title
+        offset=5         # set title offset from chart
+    )
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    ## Summary
+
+    Integrating what we've learned across the notebooks so far about encodings, data transforms, and customization, you should now be prepared to make a wide variety of statistical graphics. You can now put Altair to everyday use for exploring and communicating data!
+
+    Interested in learning more about this topic?
+
+    - Start with the [Altair Customizing Visualizations documentation](https://altair-viz.github.io/user_guide/customization.html).
+    - For a complementary discussion of scale mappings, see ["Introducing d3-scale"](https://medium.com/@mbostock/introducing-d3-scale-61980c51545f).
+    - For a more in-depth exploration of all the ways axes and legends can be styled by the underlying Vega library (which powers Altair and Vega-Lite), see ["A Guide to Guides: Axes & Legends in Vega"](https://beta.observablehq.com/@jheer/a-guide-to-guides-axes-legends-in-vega).
+    - For a fascinating history of the antibiotics dataset, see [Wainer & Lysen's "That's Funny..."](https://www.americanscientist.org/article/thats-funny) in _American Scientist_.
+    """)
+    return
+
+
+if __name__ == "__main__":
+    app.run()
@@ -0,0 +1,818 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "altair==6.0.0",
+#     "marimo",
+#     "pandas==3.0.1",
+# ]
+# ///
+
+import marimo
+
+__generated_with = "0.20.4"
+app = marimo.App()
+
+
+@app.cell
+def _():
+    import marimo as mo
+
+    return (mo,)
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    # Multi-View Composition
+
+    When visualizing a number of different data fields, we might be tempted to use as many visual encoding channels as we can: `x`, `y`, `color`, `size`, `shape`, and so on. However, as the number of encoding channels increases, a chart can rapidly become cluttered and difficult to read. An alternative to "over-loading" a single chart is to instead _compose multiple charts_ in a way that facilitates rapid comparisons.
+
+    In this notebook, we will examine a variety of operations for _multi-view composition_:
+
+    - _layer_: place compatible charts directly on top of each other,
+    - _facet_: partition data into multiple charts, organized in rows or columns,
+    - _concatenate_: position arbitrary charts within a shared layout, and
+    - _repeat_: take a base chart specification and apply it to multiple data fields.
+
+    We'll then look at how these operations form a _view composition algebra_, in which the operations can be combined to build a variety of complex multi-view displays.
+
+    _This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
+    """)
+    return
+
+
+@app.cell
+def _():
+    import pandas as pd
+    import altair as alt
+
+    return alt, pd
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    ## Weather Data
+
+    We will be visualizing weather statistics for the U.S. cities of Seattle and New York. Let's load the dataset and peek at the first and last 10 rows:
+    """)
+    return
+
+
+@app.cell
+def _():
+    weather = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/weather.csv'
+    return (weather,)
+
+
+@app.cell
+def _(pd, weather):
+    df = pd.read_csv(weather)
+    df.head(10)
+    return (df,)
+
+
+@app.cell
+def _(df):
+    df.tail(10)
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    We will create multi-view displays to examine weather within and across the cities.
+    """)
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    ## Layer
+    """)
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    One of the most common ways of combining multiple charts is to *layer* marks on top of each other. If the underlying scale domains are compatible, we can merge them to form _shared axes_. If either of the `x` or `y` encodings is not compatible, we might instead create a _dual-axis chart_, which overlays marks using separate scales and axes.
+    """)
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    ### Shared Axes
+    """)
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    Let's start by plotting the minimum and maximum average temperatures per month:
+    """)
+    return
+
+
+@app.cell
+def _(alt, weather):
+    alt.Chart(weather).mark_area().encode(
+        alt.X('month(date):T'),
+        alt.Y('average(temp_max):Q'),
+        alt.Y2('average(temp_min):Q')
+    )
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    _The plot shows us temperature ranges for each month over the entirety of our data. However, this is pretty misleading, as it aggregates the measurements for both Seattle and New York!_
+
+    Let's subdivide the data by location using a color encoding, while also adjusting the mark opacity to accommodate overlapping areas:
+    """)
+    return
+
+
+@app.cell
+def _(alt, weather):
+    alt.Chart(weather).mark_area(opacity=0.3).encode(
+        alt.X('month(date):T'),
+        alt.Y('average(temp_max):Q'),
+        alt.Y2('average(temp_min):Q'),
+        alt.Color('location:N')
+    )
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    _We can see that Seattle is more temperate: warmer in the winter, and cooler in the summer._
+
+    In this case we've created a layered chart without any special features by simply subdividing the area marks by color. While the chart above shows us the temperature ranges, we might also want to emphasize the middle of the range.
+
+    Let's create a line chart showing the average temperature midpoint. We'll use a `calculate` transform to compute the midpoints between the minimum and maximum daily temperatures:
+    """)
+    return
+
+
+@app.cell
+def _(alt, weather):
+    alt.Chart(weather).mark_line().transform_calculate(
+        temp_mid='(+datum.temp_min + +datum.temp_max) / 2'
+    ).encode(
+        alt.X('month(date):T'),
+        alt.Y('average(temp_mid):Q'),
+        alt.Color('location:N')
+    )
+    return
| 174 |
+
|
| 175 |
+
|
| 176 |
+
@app.cell(hide_code=True)
|
| 177 |
+
def _(mo):
|
| 178 |
+
mo.md(r"""
|
| 179 |
+
_Aside_: note the use of `+datum.temp_min` within the calculate transform. As we are loading the data directly from a CSV file without any special parsing instructions, the temperature values may be internally represented as string values. Adding the `+` in front of the value forces it to be treated as a number.
|
| 180 |
+
|
| 181 |
+
We'd now like to combine these charts by layering the midpoint lines over the range areas. Using the syntax `chart1 + chart2`, we can specify that we want a new layered chart in which `chart1` is the first layer and `chart2` is a second layer drawn on top:
|
| 182 |
+
""")
|
| 183 |
+
return
|
| 184 |
+
|
| 185 |
+
|
| 186 |
+
@app.cell
def _(alt, weather):
    tempMinMax = alt.Chart(weather).mark_area(opacity=0.3).encode(
        alt.X('month(date):T'),
        alt.Y('average(temp_max):Q'),
        alt.Y2('average(temp_min):Q'),
        alt.Color('location:N')
    )

    tempMid = alt.Chart(weather).mark_line().transform_calculate(
        temp_mid='(+datum.temp_min + +datum.temp_max) / 2'
    ).encode(
        alt.X('month(date):T'),
        alt.Y('average(temp_mid):Q'),
        alt.Color('location:N')
    )

    tempMinMax + tempMid
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Now we have a multi-layer plot! However, the y-axis title (though informative) has become a bit long and unruly..._

    Let's customize our axes to clean up the plot. If we set a custom axis title within one of the layers, it will automatically be used as a shared axis title for all the layers:
    """)
    return


@app.cell
def _(alt, weather):
    tempMinMax_1 = alt.Chart(weather).mark_area(opacity=0.3).encode(
        alt.X('month(date):T', title=None, axis=alt.Axis(format='%b')),
        alt.Y('average(temp_max):Q', title='Avg. Temperature °C'),
        alt.Y2('average(temp_min):Q'),
        alt.Color('location:N')
    )

    tempMid_1 = alt.Chart(weather).mark_line().transform_calculate(
        temp_mid='(+datum.temp_min + +datum.temp_max) / 2'
    ).encode(
        alt.X('month(date):T'),
        alt.Y('average(temp_mid):Q'),
        alt.Color('location:N')
    )

    tempMinMax_1 + tempMid_1
    return tempMid_1, tempMinMax_1


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _What happens if both layers have custom axis titles? Modify the code above to find out..._

    Above we used the `+` operator, a convenient shorthand for Altair's `layer` method. We can generate an identical layered chart using the `layer` method directly:
    """)
    return


@app.cell
def _(alt, tempMid_1, tempMinMax_1):
    alt.layer(tempMinMax_1, tempMid_1)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Note that the order of inputs to a layer matters, as subsequent layers will be drawn on top of earlier layers. _Try swapping the order of the charts in the cells above. What happens? (Hint: look closely at the color of the `line` marks.)_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Dual-Axis Charts
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Seattle has a reputation as a rainy city. Is that deserved?_

    Let's look at precipitation alongside temperature to learn more. First let's create a base plot that shows average monthly precipitation in Seattle:
    """)
    return


@app.cell
def _(alt, weather):
    alt.Chart(weather).transform_filter(
        'datum.location == "Seattle"'
    ).mark_line(
        interpolate='monotone',
        stroke='grey'
    ).encode(
        alt.X('month(date):T', title=None),
        alt.Y('average(precipitation):Q', title='Precipitation')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To facilitate comparison with the temperature data, let's create a new layered chart. Here's what happens if we try to layer the charts as we did earlier:
    """)
    return


@app.cell
def _(alt, weather):
    tempMinMax_2 = alt.Chart(weather).transform_filter(
        'datum.location == "Seattle"'
    ).mark_area(opacity=0.3).encode(
        alt.X('month(date):T', title=None, axis=alt.Axis(format='%b')),
        alt.Y('average(temp_max):Q', title='Avg. Temperature °C'),
        alt.Y2('average(temp_min):Q')
    )
    _precip = alt.Chart(weather).transform_filter(
        'datum.location == "Seattle"'
    ).mark_line(interpolate='monotone', stroke='grey').encode(
        alt.X('month(date):T'),
        alt.Y('average(precipitation):Q', title='Precipitation')
    )
    alt.layer(tempMinMax_2, _precip)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _The precipitation values use a much smaller range of the y-axis than the temperatures!_

    By default, layered charts use a *shared domain*: the values for the x-axis or y-axis are combined across all the layers to determine a shared extent. This default behavior assumes that the layered values have the same units. However, this doesn't hold up for this example, as we are combining temperature values (degrees Celsius) with precipitation values (inches)!

    If we want to use different y-axis scales, we need to specify how we want Altair to *resolve* the data across layers. In this case, we want to resolve the y-axis `scale` domains to be `independent` rather than use a `shared` domain. The `Chart` object produced by a layer operator includes a `resolve_scale` method with which we can specify the desired resolution:
    """)
    return


@app.cell
def _(alt, weather):
    tempMinMax_3 = alt.Chart(weather).transform_filter(
        'datum.location == "Seattle"'
    ).mark_area(opacity=0.3).encode(
        alt.X('month(date):T', title=None, axis=alt.Axis(format='%b')),
        alt.Y('average(temp_max):Q', title='Avg. Temperature °C'),
        alt.Y2('average(temp_min):Q')
    )
    _precip = alt.Chart(weather).transform_filter(
        'datum.location == "Seattle"'
    ).mark_line(interpolate='monotone', stroke='grey').encode(
        alt.X('month(date):T'),
        alt.Y('average(precipitation):Q', title='Precipitation')
    )
    alt.layer(tempMinMax_3, _precip).resolve_scale(y='independent')
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _We can now see that autumn is the rainiest season in Seattle (peaking in November), complemented by dry summers._

    You may have noticed some redundancy in our plot specifications above: both use the same dataset and the same filter to look at Seattle only. If you want, you can streamline the code a bit by providing the data and filter transform to the top-level layered chart. The individual layers will then inherit the data if they don't have their own data definitions:
    """)
    return


@app.cell
def _(alt, weather):
    tempMinMax_4 = alt.Chart().mark_area(opacity=0.3).encode(
        alt.X('month(date):T', title=None, axis=alt.Axis(format='%b')),
        alt.Y('average(temp_max):Q', title='Avg. Temperature °C'),
        alt.Y2('average(temp_min):Q')
    )
    _precip = alt.Chart().mark_line(interpolate='monotone', stroke='grey').encode(
        alt.X('month(date):T'),
        alt.Y('average(precipitation):Q', title='Precipitation')
    )
    alt.layer(tempMinMax_4, _precip, data=weather).transform_filter(
        'datum.location == "Seattle"'
    ).resolve_scale(y='independent')
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    While dual-axis charts can be useful, _they are often prone to misinterpretation_, as the different units and axis scales may be incommensurate. Where feasible, you might consider transformations that map different data fields to shared units, for example showing [quantiles](https://en.wikipedia.org/wiki/Quantile) or relative percentage change.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Facet
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    *Faceting* involves subdividing a dataset into groups and creating a separate plot for each group. In earlier notebooks, we learned how to create faceted charts using the `row` and `column` encoding channels. We'll first review those channels and then show how they are instances of the more general `facet` operator.

    Let's start with a basic histogram of maximum temperature values in Seattle:
    """)
    return


@app.cell
def _(alt, weather):
    alt.Chart(weather).mark_bar().transform_filter(
        'datum.location == "Seattle"'
    ).encode(
        alt.X('temp_max:Q', bin=True, title='Temperature (°C)'),
        alt.Y('count():Q')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _How does this temperature profile change based on the weather of a given day – that is, whether there was drizzle, fog, rain, snow, or sun?_

    Let's use the `column` encoding channel to facet the data by weather type. We can also use `color` as a redundant encoding, using a customized color range:
    """)
    return


@app.cell
def _(alt, weather):
    _colors = alt.Scale(
        domain=['drizzle', 'fog', 'rain', 'snow', 'sun'],
        range=['#aec7e8', '#c7c7c7', '#1f77b4', '#9467bd', '#e7ba52']
    )
    alt.Chart(weather).mark_bar().transform_filter(
        'datum.location == "Seattle"'
    ).encode(
        alt.X('temp_max:Q', bin=True, title='Temperature (°C)'),
        alt.Y('count():Q'),
        alt.Color('weather:N', scale=_colors),
        alt.Column('weather:N')
    ).properties(width=150, height=150)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Unsurprisingly, those rare snow days center on the coldest temperatures, followed by rainy and foggy days. Sunny days are warmer and, despite Seattle stereotypes, are the most plentiful. Though as any Seattleite can tell you, the drizzle occasionally comes, no matter the temperature!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    In addition to `row` and `column` encoding channels *within* a chart definition, we can take a basic chart definition and apply faceting using an explicit `facet` operator.

    Let's recreate the chart above, but this time using `facet`. We start with the same basic histogram definition, but remove the data source, filter transform, and column channel. We can then invoke the `facet` method, passing in the data and specifying that we should facet into columns according to the `weather` field. The `facet` method accepts both `row` and `column` arguments. The two can be used together to create a 2D grid of faceted plots.

    Finally we include our filter transform, applying it to the top-level faceted chart. While we could apply the filter transform to the histogram definition as before, that is slightly less efficient. Rather than filter out "New York" values within each facet cell, applying the filter to the faceted chart lets Vega-Lite know that we can filter out those values up front, prior to the facet subdivision.
    """)
    return


@app.cell
def _(alt, weather):
    _colors = alt.Scale(
        domain=['drizzle', 'fog', 'rain', 'snow', 'sun'],
        range=['#aec7e8', '#c7c7c7', '#1f77b4', '#9467bd', '#e7ba52']
    )
    alt.Chart().mark_bar().encode(
        alt.X('temp_max:Q', bin=True, title='Temperature (°C)'),
        alt.Y('count():Q'),
        alt.Color('weather:N', scale=_colors)
    ).properties(
        width=150,
        height=150
    ).facet(
        data=weather,
        column='weather:N'
    ).transform_filter(
        'datum.location == "Seattle"'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Given all the extra code above, why would we want to use an explicit `facet` operator? For basic charts, we should certainly use the `column` or `row` encoding channels if we can. However, using the `facet` operator explicitly is useful if we want to facet composed views, such as layered charts.

    Let's revisit our layered temperature plots from earlier. Instead of plotting data for New York and Seattle in the same plot, let's break them up into separate facets. The individual chart definitions are nearly the same as before: one area chart and one line chart. The only difference is that this time we won't pass the data directly to the chart constructors; we'll wait and pass it to the facet operator later. We can layer the charts much as before, then invoke `facet` on the layered chart object, passing in the data and specifying `column` facets based on the `location` field:
    """)
    return


@app.cell
def _(alt, weather):
    tempMinMax_5 = alt.Chart().mark_area(opacity=0.3).encode(
        alt.X('month(date):T', title=None, axis=alt.Axis(format='%b')),
        alt.Y('average(temp_max):Q', title='Avg. Temperature (°C)'),
        alt.Y2('average(temp_min):Q'),
        alt.Color('location:N')
    )
    tempMid_2 = alt.Chart().mark_line().transform_calculate(
        temp_mid='(+datum.temp_min + +datum.temp_max) / 2'
    ).encode(
        alt.X('month(date):T'),
        alt.Y('average(temp_mid):Q'),
        alt.Color('location:N')
    )
    alt.layer(tempMinMax_5, tempMid_2).facet(data=weather, column='location:N')
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The faceted charts we have seen so far use the same axis scale domains across the facet cells. This default of using *shared* scales and axes aids accurate comparison of values. However, in some cases you may wish to scale each chart independently, for example if the range of values in the cells differs significantly.

    Similar to layered charts, faceted charts also support _resolving_ to independent scales or axes across plots. Let's see what happens if we call the `resolve_axis` method to request `independent` y-axes:
    """)
    return


@app.cell
def _(alt, weather):
    tempMinMax_6 = alt.Chart().mark_area(opacity=0.3).encode(
        alt.X('month(date):T', title=None, axis=alt.Axis(format='%b')),
        alt.Y('average(temp_max):Q', title='Avg. Temperature (°C)'),
        alt.Y2('average(temp_min):Q'),
        alt.Color('location:N')
    )
    tempMid_3 = alt.Chart().mark_line().transform_calculate(
        temp_mid='(+datum.temp_min + +datum.temp_max) / 2'
    ).encode(
        alt.X('month(date):T'),
        alt.Y('average(temp_mid):Q'),
        alt.Color('location:N')
    )
    alt.layer(tempMinMax_6, tempMid_3).facet(
        data=weather, column='location:N'
    ).resolve_axis(y='independent')
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _The chart above looks largely unchanged, but the plot for Seattle now includes its own axis._

    What if we instead call `resolve_scale` to resolve the underlying scale domains?
    """)
    return


@app.cell
def _(alt, weather):
    tempMinMax_7 = alt.Chart().mark_area(opacity=0.3).encode(
        alt.X('month(date):T', title=None, axis=alt.Axis(format='%b')),
        alt.Y('average(temp_max):Q', title='Avg. Temperature (°C)'),
        alt.Y2('average(temp_min):Q'),
        alt.Color('location:N')
    )
    tempMid_4 = alt.Chart().mark_line().transform_calculate(
        temp_mid='(+datum.temp_min + +datum.temp_max) / 2'
    ).encode(
        alt.X('month(date):T'),
        alt.Y('average(temp_mid):Q'),
        alt.Color('location:N')
    )
    alt.layer(tempMinMax_7, tempMid_4).facet(
        data=weather, column='location:N'
    ).resolve_scale(y='independent')
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Now we see facet cells with different axis scale domains. In this case, using independent scales seems like a bad idea! The domains aren't very different, and one might be fooled into thinking that New York and Seattle have similar maximum summer temperatures._

    To borrow a cliché: just because you *can* do something, doesn't mean you *should*...
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Concatenate
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Faceting creates [small multiple](https://en.wikipedia.org/wiki/Small_multiple) plots that show separate subdivisions of the data. However, we might wish to create a multi-view display with different views of the *same* dataset (not subsets) or views involving *different* datasets.

    Altair provides *concatenation* operators to combine arbitrary charts into a composed chart. The `hconcat` operator (shorthand `|`) performs horizontal concatenation, while the `vconcat` operator (shorthand `&`) performs vertical concatenation.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Let's start with a basic line chart showing the average maximum temperature per month for both New York and Seattle, much like we've seen before:
    """)
    return


@app.cell
def _(alt, weather):
    alt.Chart(weather).mark_line().encode(
        alt.X('month(date):T', title=None),
        alt.Y('average(temp_max):Q'),
        color='location:N'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _What if we want to compare not just temperature over time, but also precipitation and wind levels?_

    Let's create a concatenated chart consisting of three plots. We'll start by defining a "base" chart definition that contains all the aspects that should be shared by our three plots. We can then modify this base chart to create customized variants, with different y-axis encodings for the `temp_max`, `precipitation`, and `wind` fields. We can then concatenate them using the pipe (`|`) shorthand operator:
    """)
    return


@app.cell
def _(alt, weather):
    base = alt.Chart(weather).mark_line().encode(
        alt.X('month(date):T', title=None),
        color='location:N'
    ).properties(width=240, height=180)
    temp = base.encode(alt.Y('average(temp_max):Q'))
    _precip = base.encode(alt.Y('average(precipitation):Q'))
    wind = base.encode(alt.Y('average(wind):Q'))
    temp | _precip | wind
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Alternatively, we could use the more explicit `alt.hconcat()` method in lieu of the pipe `|` operator. _Try rewriting the code above to use `hconcat` instead._

    Vertical concatenation works similarly to horizontal concatenation. _Using the `&` operator (or `alt.vconcat` method), modify the code to use a vertical ordering instead of a horizontal ordering._

    Finally, note that horizontal and vertical concatenation can be combined. _What happens if you write something like `(temp | precip) & wind`?_

    _Aside_: Note the importance of those parentheses... what happens if you remove them? Keep in mind that these overloaded operators are still subject to [Python's operator precedence rules](https://docs.python.org/3/reference/expressions.html#operator-precedence), and so vertical concatenation with `&` will take precedence over horizontal concatenation with `|`!

    As we will revisit later, concatenation operators let you combine any and all charts into a multi-view dashboard!
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Repeat
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The concatenation operators above are quite general, allowing arbitrary charts to be composed. Nevertheless, the example above was still a bit verbose: we have three very similar charts, yet have to define them separately and then concatenate them.

    For cases where only one or two variables are changing, the `repeat` operator provides a convenient shortcut for creating multiple charts. Given a *template* specification with some free variables, the repeat operator will then create a chart for each specified assignment to those variables.

    Let's recreate our concatenation example above using the `repeat` operator. The only aspect that changes across charts is the choice of data field for the `y` encoding channel. To create a template specification, we can use the *repeater variable* `alt.repeat('column')` as our y-axis field. This code simply states that we want to use the variable assigned to the `column` repeater, which organizes repeated charts in a horizontal direction. (As the repeater provides the field name only, we have to specify the field data type separately as `type='quantitative'`.)

    We then invoke the `repeat` method, passing in data field names for each column:
    """)
    return


@app.cell
def _(alt, weather):
    alt.Chart(weather).mark_line().encode(
        alt.X('month(date):T', title=None),
        alt.Y(alt.repeat('column'), aggregate='average', type='quantitative'),
        color='location:N'
    ).properties(
        width=240,
        height=180
    ).repeat(
        column=['temp_max', 'precipitation', 'wind']
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Repetition is supported for both columns and rows. _What happens if you modify the code above to use `row` instead of `column`?_

    We can also use `row` and `column` repetition together! One common visualization for exploratory data analysis is the [scatter plot matrix (or SPLOM)](https://en.wikipedia.org/wiki/Scatter_plot#Scatterplot_matrices). Given a collection of variables to inspect, a SPLOM provides a grid of all pairwise plots of those variables, allowing us to assess potential associations.

    Let's use the `repeat` operator to create a SPLOM for the `temp_max`, `precipitation`, and `wind` fields. We first create our template specification, with repeater variables for both the x- and y-axis data fields. We then invoke `repeat`, passing in arrays of field names to use for both `row` and `column`. Altair will then generate the [cross product (or, Cartesian product)](https://en.wikipedia.org/wiki/Cartesian_product) to create the full space of repeated charts:
    """)
    return


@app.cell
def _(alt, weather):
    alt.Chart().mark_point(filled=True, size=15, opacity=0.5).encode(
        alt.X(alt.repeat('column'), type='quantitative'),
        alt.Y(alt.repeat('row'), type='quantitative')
    ).properties(
        width=150,
        height=150
    ).repeat(
        data=weather,
        row=['temp_max', 'precipitation', 'wind'],
        column=['wind', 'precipitation', 'temp_max']
    ).transform_filter(
        'datum.location == "Seattle"'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Looking at these plots, there does not appear to be a strong association between precipitation and wind, though we do see that extreme wind and precipitation events occur in similar temperature ranges (~5-15° C). However, this observation is not particularly surprising: if we revisit our histogram at the beginning of the facet section, we can plainly see that the days with maximum temperatures in the range of 5-15° C are the most commonly occurring._

    *Modify the code above to get a better understanding of chart repetition. Try adding another variable (`temp_min`) to the SPLOM. What happens if you rearrange the order of the field names in either the `row` or `column` parameters for the `repeat` operator?*

    _Finally, to really appreciate what the `repeat` operator provides, take a moment to imagine how you might recreate the SPLOM above using only `hconcat` and `vconcat`!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## A View Composition Algebra
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Together, the composition operators `layer`, `facet`, `concat`, and `repeat` form a *view composition algebra*: the various operators can be combined to construct a variety of multi-view visualizations.

    As an example, let's start with two basic charts: a histogram and a simple line (a single `rule` mark) showing a global average.
    """)
    return


@app.cell
def _(alt, weather):
    basic1 = alt.Chart(weather).transform_filter(
        'datum.location == "Seattle"'
    ).mark_bar().encode(
        alt.X('month(date):O'),
        alt.Y('average(temp_max):Q')
    )

    basic2 = alt.Chart(weather).transform_filter(
        'datum.location == "Seattle"'
    ).mark_rule(stroke='firebrick').encode(
        alt.Y('average(temp_max):Q')
    )

    basic1 | basic2
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    We can then combine the two charts using a `layer` operator, and then `repeat` that layered chart to show histograms with overlaid averages for multiple fields:
    """)
    return


@app.cell
def _(alt, weather):
    alt.layer(
        alt.Chart().mark_bar().encode(
            alt.X('month(date):O', title='Month'),
            alt.Y(alt.repeat('column'), aggregate='average', type='quantitative')
        ),
        alt.Chart().mark_rule(stroke='firebrick').encode(
            alt.Y(alt.repeat('column'), aggregate='average', type='quantitative')
        )
    ).properties(
        width=200,
        height=150
    ).repeat(
        data=weather,
        column=['temp_max', 'precipitation', 'wind']
    ).transform_filter(
        'datum.location == "Seattle"'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Focusing only on the multi-view composition operators, the model for the visualization above is:

    ```
    repeat(column=[...])
    |- layer
       |- basic1
       |- basic2
    ```

    Now let's explore how we can apply *all* the operators within a final [dashboard](https://en.wikipedia.org/wiki/Dashboard_%28business%29) that provides an overview of Seattle weather. We'll combine the SPLOM and faceted histogram displays from earlier sections with the repeated histograms above:
    """)
    return


@app.cell
def _(alt, weather):
    splom = alt.Chart().mark_point(filled=True, size=15, opacity=0.5).encode(
        alt.X(alt.repeat('column'), type='quantitative'),
        alt.Y(alt.repeat('row'), type='quantitative')
    ).properties(
        width=125,
        height=125
    ).repeat(
        row=['temp_max', 'precipitation', 'wind'],
        column=['wind', 'precipitation', 'temp_max']
    )

    dateHist = alt.layer(
        alt.Chart().mark_bar().encode(
            alt.X('month(date):O', title='Month'),
            alt.Y(alt.repeat('row'), aggregate='average', type='quantitative')
        ),
        alt.Chart().mark_rule(stroke='firebrick').encode(
            alt.Y(alt.repeat('row'), aggregate='average', type='quantitative')
        )
    ).properties(
        width=175,
        height=125
    ).repeat(
        row=['temp_max', 'precipitation', 'wind']
    )

    tempHist = alt.Chart(weather).mark_bar().encode(
        alt.X('temp_max:Q', bin=True, title='Temperature (°C)'),
        alt.Y('count():Q'),
        alt.Color('weather:N', scale=alt.Scale(
            domain=['drizzle', 'fog', 'rain', 'snow', 'sun'],
            range=['#aec7e8', '#c7c7c7', '#1f77b4', '#9467bd', '#e7ba52']
        ))
    ).properties(
        width=115,
        height=100
    ).facet(
        column='weather:N'
    )

    alt.vconcat(
        alt.hconcat(splom, dateHist),
        tempHist,
        data=weather,
        title='Seattle Weather Dashboard'
    ).transform_filter(
        'datum.location == "Seattle"'
    ).resolve_legend(
        color='independent'
    ).configure_axis(
        labelAngle=0
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The full composition model for this dashboard is:

    ```
    vconcat
    |- hconcat
    |  |- repeat(row=[...], column=[...])
    |  |  |- splom base chart
    |  |- repeat(row=[...])
    |     |- layer
    |        |- dateHist base chart 1
    |        |- dateHist base chart 2
    |- facet(column='weather')
       |- tempHist base chart
    ```

    _Phew!_ The dashboard also includes a few customizations to improve the layout:

    - We adjust chart `width` and `height` properties to assist alignment and ensure the full visualization fits on the screen.
    - We add `resolve_legend(color='independent')` to ensure the color legend is associated directly with the colored histograms by temperature. Otherwise, the legend will resolve to the dashboard as a whole.
|
| 794 |
+
- We use `configure_axis(labelAngle=0)` to ensure that no axis labels are rotated. This helps to ensure proper alignment among the scatter plots in the SPLOM and the histograms by month on the right.
|
| 795 |
+
|
| 796 |
+
_Try removing or modifying any of these adjustments and see how the dashboard layout responds!_
|
| 797 |
+
|
| 798 |
+
This dashboard can be reused to show data for other locations or from other datasets. _Update the dashboard to show weather patterns for New York instead of Seattle._
|
| 799 |
+
""")
|
| 800 |
+
return
|
| 801 |
+
|
| 802 |
+
|
| 803 |
+
@app.cell(hide_code=True)
|
| 804 |
+
def _(mo):
|
| 805 |
+
mo.md(r"""
|
| 806 |
+
## Summary
|
| 807 |
+
|
| 808 |
+
For more details on multi-view composition, including control over sub-plot spacing and header labels, see the [Altair Compound Charts documentation](https://altair-viz.github.io/user_guide/compound_charts.html).
|
| 809 |
+
|
| 810 |
+
Now that we've seen how to compose multiple views, we're ready to put them into action. In addition to statically presenting data, multiple views can enable interactive multi-dimensional exploration. For example, using _linked selections_ we can highlight points in one view to see corresponding values highlight in other views.
|
| 811 |
+
|
| 812 |
+
In the next notebook, we'll examine how to author *interactive selections* for both individual plots and multi-view compositions.
|
| 813 |
+
""")
|
| 814 |
+
return
|
| 815 |
+
|
| 816 |
+
|
| 817 |
+
if __name__ == "__main__":
|
| 818 |
+
app.run()
|
|
@@ -0,0 +1,671 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "altair==6.0.0",
#     "marimo",
#     "pandas==3.0.1",
# ]
# ///

import marimo

__generated_with = "0.20.4"
app = marimo.App()


@app.cell
def _():
    import marimo as mo

    return (mo,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    # Interaction

    _“A graphic is not ‘drawn’ once and for all; it is ‘constructed’ and reconstructed until it reveals all the relationships constituted by the interplay of the data. The best graphic operations are those carried out by the decision-maker themself.”_ — [Jacques Bertin](https://books.google.com/books?id=csqX_xnm4tcC)

    Visualization provides a powerful means of making sense of data. A single image, however, typically provides answers to, at best, a handful of questions. Through _interaction_ we can transform static images into tools for exploration: highlighting points of interest, zooming in to reveal finer-grained patterns, and linking across multiple views to reason about multi-dimensional relationships.

    At the core of interaction is the notion of a _selection_: a means of indicating to the computer which elements or regions we are interested in. For example, we might hover the mouse over a point, click multiple marks, or draw a bounding box around a region to highlight subsets of the data for further scrutiny.

    Alongside visual encodings and data transformations, Altair provides a _selection_ abstraction for authoring interactions. These selections encompass three aspects:

    1. Input event handling to select points or regions of interest, such as mouse hover, click, drag, scroll, and touch events.
    2. Generalizing from the input to form a selection rule (or [_predicate_](https://en.wikipedia.org/wiki/Predicate_%28mathematical_logic%29)) that determines whether or not a given data record lies within the selection.
    3. Using the selection predicate to dynamically configure a visualization by driving _conditional encodings_, _filter transforms_, or _scale domains_.

    This notebook introduces interactive selections and explores how to use them to author a variety of interaction techniques, such as dynamic queries, panning & zooming, details-on-demand, and brushing & linking.

    _This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
    """)
    return


@app.cell
def _():
    import pandas as pd
    import altair as alt

    return alt, pd


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Datasets
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    We will visualize a variety of datasets from the [vega-datasets](https://github.com/vega/vega-datasets) collection:

    - A dataset of `cars` from the 1970s and early 1980s,
    - A dataset of `movies`, previously used in the [Data Transformation](https://github.com/uwdata/visualization-curriculum/blob/master/altair_data_transformation.ipynb) notebook,
    - A dataset containing ten years of [S&P 500](https://en.wikipedia.org/wiki/S%26P_500_Index) (`sp500`) stock prices,
    - A dataset of technology company `stocks`, and
    - A dataset of `flights`, including departure time, distance, and arrival delay.
    """)
    return


@app.cell
def _():
    cars = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/cars.json'
    movies = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/movies.json'
    sp500 = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/sp500.csv'
    stocks = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/stocks.csv'
    flights = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/flights-5k.json'
    return cars, flights, movies, sp500, stocks


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Introducing Selections
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Let's start with a basic selection: simply clicking a point to highlight it. Using the `cars` dataset, we'll start with a scatter plot of horsepower versus miles per gallon, with a color encoding for the number of cylinders in the car engine.

    In addition, we'll create a selection instance by calling `alt.selection_point(toggle=False)`, indicating we want a selection defined over a _single value_. By default, the selection uses a mouse click to determine the selected value. To register a selection with a chart, we must add it using the `.add_params()` method.

    Once our selection has been defined, we can use it as a parameter for _conditional encodings_, which apply a different encoding depending on whether a data record lies in or out of the selection. For example, consider the following code:

    ~~~ python
    color=alt.condition(selection, 'Cylinders:O', alt.value('grey'))
    ~~~

    This encoding definition states that data points contained within the `selection` should be colored according to the `Cylinders` field, while non-selected data points should use a default `grey`. An empty selection includes _all_ data points, and so initially all points will be colored.

    _Try clicking different points in the chart below. What happens? (Click the background to clear the selection state and return to an "empty" selection.)_
    """)
    return


@app.cell
def _(alt, cars):
    _selection = alt.selection_point(toggle=False)
    alt.Chart(cars).mark_circle().add_params(_selection).encode(
        x='Horsepower:Q',
        y='Miles_per_Gallon:Q',
        color=alt.condition(_selection, 'Cylinders:O', alt.value('grey')),
        opacity=alt.condition(_selection, alt.value(0.8), alt.value(0.1))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Of course, highlighting individual data points one-at-a-time is not particularly exciting! As we'll see, however, single-value selections provide a useful building block for more powerful interactions. Moreover, they are just one of the selection behaviors provided by Altair:

    - `selection_point(toggle=False)` - select a single discrete value, by default on click events.
    - `selection_point()` - select multiple discrete values; the first value is selected on mouse click and additional values are toggled using shift-click.
    - `selection_interval()` - select a continuous range of values, initiated by mouse drag.

    Let's compare each of these selection types side-by-side. To keep our code tidy we'll first define a function (`plot`) that generates a scatter plot specification just like the one above. We can pass a selection to the `plot` function to have it applied to the chart:
    """)
    return


@app.cell
def _(alt, cars):
    def plot(selection):
        return alt.Chart(cars).mark_circle().add_params(selection).encode(
            x='Horsepower:Q',
            y='Miles_per_Gallon:Q',
            color=alt.condition(selection, 'Cylinders:O', alt.value('grey')),
            opacity=alt.condition(selection, alt.value(0.8), alt.value(0.1))
        ).properties(width=240, height=180)

    return (plot,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Let's use our `plot` function to create three chart variants, one per selection type.

    The first (`single`) chart replicates our earlier example. The second (`multi`) chart supports shift-click interactions to toggle inclusion of multiple points within the selection. The third (`interval`) chart generates a selection region (or _brush_) upon mouse drag. Once created, you can drag the brush around to select different points, or scroll when the cursor is inside the brush to scale (zoom) the brush size.

    _Try interacting with each of the charts below!_
    """)
    return


@app.cell
def _(alt, plot):
    alt.hconcat(
        plot(alt.selection_point(toggle=False)).properties(title='Single (Click)'),
        plot(alt.selection_point()).properties(title='Multi (Shift-Click)'),
        plot(alt.selection_interval()).properties(title='Interval (Drag)')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The examples above use default interactions (click, shift-click, drag) for each selection type. We can further customize the interactions by providing input event specifications using [Vega event selector syntax](https://vega.github.io/vega/docs/event-streams/). For example, we can modify our `single` and `multi` charts to trigger upon `mouseover` events instead of `click` events.

    _Hold down the shift key in the second chart to "paint" with data!_
    """)
    return


@app.cell
def _(alt, plot):
    alt.hconcat(
        plot(alt.selection_point(toggle=False, on='mouseover')).properties(title='Single (Mouseover)'),
        plot(alt.selection_point(on='mouseover')).properties(title='Multi (Shift-Mouseover)')
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Now that we've covered the basics of Altair selections, let's take a tour through the various interaction techniques they enable!
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Dynamic Queries
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Dynamic queries_ enable rapid, reversible exploration of data to isolate patterns of interest. As defined by [Ahlberg, Williamson, & Shneiderman](https://www.cs.umd.edu/~ben/papers/Ahlberg1992Dynamic.pdf), a dynamic query:

    - represents a query graphically,
    - provides visible limits on the query range,
    - provides a graphical representation of the data and query result,
    - gives immediate feedback of the result after every query adjustment,
    - and allows novice users to begin working with little training.

    A common approach is to manipulate query parameters using standard user interface widgets such as sliders, radio buttons, and drop-down menus. To generate dynamic query widgets, we can apply a selection's `bind` operation to one or more data fields we wish to query.

    Let's build an interactive scatter plot that uses a dynamic query to filter the display. Given a scatter plot of movie ratings (from Rotten Tomatoes and IMDB), we can add a selection over the `Major_Genre` field to enable interactive filtering by film genre.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To start, let's extract the unique (non-null) genres from the `movies` data:
    """)
    return


@app.cell
def _(movies, pd):
    df = pd.read_json(movies)                # load movies data
    genres = df['Major_Genre'].unique()      # get unique field values
    genres = list(filter(pd.notna, genres))  # filter out None/NaN values
    genres.sort()                            # sort alphabetically
    return (genres,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    For later use, let's also define a list of unique `MPAA_Rating` values:
    """)
    return


@app.cell
def _():
    mpaa = ['G', 'PG', 'PG-13', 'R', 'NC-17', 'Not Rated']
    return (mpaa,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Now let's create a single-value selection bound to a drop-down menu.

    *Use the dynamic query menu below to explore the data. How do ratings vary by genre? How would you revise the code to filter `MPAA_Rating` (G, PG, PG-13, etc.) instead of `Major_Genre`?*
    """)
    return


@app.cell
def _(alt, genres, movies):
    selectGenre = alt.selection_point(
        toggle=False,
        name='Select',                           # name the selection 'Select'
        fields=['Major_Genre'],                  # limit selection to the Major_Genre field
        value=[{'Major_Genre': genres[0]}],      # use first genre entry as initial value
        bind=alt.binding_select(options=genres)  # bind to a menu of unique genre values
    )

    alt.Chart(movies).mark_circle().add_params(
        selectGenre
    ).encode(
        x='Rotten_Tomatoes_Rating:Q',
        y='IMDB_Rating:Q',
        tooltip='Title:N',
        opacity=alt.condition(selectGenre, alt.value(0.75), alt.value(0.05))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Our construction above leverages multiple aspects of selections:

    - We give the selection a name (`'Select'`). This name is not required, but allows us to influence the label text of the generated dynamic query menu. (_What happens if you remove the name? Try it!_)
    - We constrain the selection to a specific data field (`Major_Genre`). Earlier when we used a single-value selection, the selection mapped to individual data points. By limiting the selection to a specific field, we can select _all_ data points whose `Major_Genre` field value matches the single selected value.
    - We initialize the selection to a starting value using `value=...`.
    - We `bind` the selection to an interface widget, in this case a drop-down menu via `binding_select`.
    - As before, we then use a conditional encoding to control the opacity channel.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Binding Selections to Multiple Inputs

    One selection instance can be bound to _multiple_ dynamic query widgets. Let's modify the example above to provide filters for _both_ `Major_Genre` and `MPAA_Rating`, using radio buttons instead of a menu. Our single-value selection is now defined over a single _pair_ of genre and MPAA rating values.

    _Look for surprising conjunctions of genre and rating. Are there any G or PG-rated horror films?_
    """)
    return


@app.cell
def _(alt, genres, movies, mpaa):
    # single-value selection over [Major_Genre, MPAA_Rating] pairs
    # use specific hard-wired values as the initial selected values
    _selection = alt.selection_point(
        toggle=False,
        name='Select',
        fields=['Major_Genre', 'MPAA_Rating'],
        value=[{'Major_Genre': 'Drama', 'MPAA_Rating': 'R'}],
        bind={
            'Major_Genre': alt.binding_select(options=genres),
            'MPAA_Rating': alt.binding_radio(options=mpaa)
        }
    )

    # scatter plot, modify opacity based on selection
    alt.Chart(movies).mark_circle().add_params(_selection).encode(
        x='Rotten_Tomatoes_Rating:Q',
        y='IMDB_Rating:Q',
        tooltip='Title:N',
        opacity=alt.condition(_selection, alt.value(0.75), alt.value(0.05))
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Fun facts: The PG-13 rating didn't exist when the movies [Jaws](https://www.imdb.com/title/tt0073195/) and [Jaws 2](https://www.imdb.com/title/tt0077766/) were released. The first film to receive a PG-13 rating was 1984's [Red Dawn](https://www.imdb.com/title/tt0087985/)._
    """)
    return


@app.cell(hide_code=True)
|
| 326 |
+
def _(mo):
|
| 327 |
+
mo.md(r"""
|
| 328 |
+
### Using Visualizations as Dynamic Queries
|
| 329 |
+
|
| 330 |
+
Though standard interface widgets show the _possible_ query parameter values, they do not visualize the _distribution_ of those values. We might also wish to use richer interactions, such as multi-value or interval selections, rather than input widgets that select only a single value at a time.
|
| 331 |
+
|
| 332 |
+
To address these issues, we can author additional charts to both visualize data and support dynamic queries. Let's add a histogram of the count of films per year and use an interval selection to dynamically highlight films over selected time periods.
|
| 333 |
+
|
| 334 |
+
*Interact with the year histogram to explore films from different time periods. Do you seen any evidence of [sampling bias](https://en.wikipedia.org/wiki/Sampling_bias) across the years? (How do year and critics' ratings relate?)*
|
| 335 |
+
|
| 336 |
+
_The years range from 1930 to 2040! Are future films in pre-production, or are there "off-by-one century" errors? Also, depending on which time zone you're in, you may see a small bump in either 1969 or 1970. Why might that be? (See the end of the notebook for an explanation!)_
|
| 337 |
+
""")
|
| 338 |
+
return
|
| 339 |
+
|
| 340 |
+
|
| 341 |
+
@app.cell
|
| 342 |
+
def _(alt, movies):
|
| 343 |
+
_brush = alt.selection_interval(encodings=['x'])
|
| 344 |
+
years = alt.Chart(movies).mark_bar().add_params(_brush).encode(alt.X('year(Release_Date):T', title='Films by Release Year'), alt.Y('count():Q', title=None)).properties(width=650, height=50) # limit selection to x-axis (year) values
|
| 345 |
+
ratings = alt.Chart(movies).mark_circle().encode(x='Rotten_Tomatoes_Rating:Q', y='IMDB_Rating:Q', tooltip='Title:N', opacity=alt.condition(_brush, alt.value(0.75), alt.value(0.05))).properties(width=650, height=400)
|
| 346 |
+
# dynamic query histogram
|
| 347 |
+
# scatter plot, modify opacity based on selection
|
| 348 |
+
alt.vconcat(years, ratings).properties(spacing=5)
|
| 349 |
+
return
|
| 350 |
+
|
| 351 |
+
|
| 352 |
+
@app.cell(hide_code=True)
|
| 353 |
+
def _(mo):
|
| 354 |
+
mo.md(r"""
|
| 355 |
+
The example above provides dynamic queries using a _linked selection_ between charts:
|
| 356 |
+
|
| 357 |
+
- We create an `interval` selection (`brush`), and set `encodings=['x']` to limit the selection to the x-axis only, resulting in a one-dimensional selection interval.
|
| 358 |
+
- We register `brush` with our histogram of films per year via `.add_params(brush)`.
|
| 359 |
+
- We use `brush` in a conditional encoding to adjust the scatter plot `opacity`.
|
| 360 |
+
|
| 361 |
+
This interaction technique of selecting elements in one chart and seeing linked highlights in one or more other charts is known as [_brushing & linking_](https://en.wikipedia.org/wiki/Brushing_and_linking).
|
| 362 |
+
""")
|
| 363 |
+
return
|
| 364 |
+
|
| 365 |
+
|
| 366 |
+
@app.cell(hide_code=True)
|
| 367 |
+
def _(mo):
|
| 368 |
+
mo.md(r"""
|
| 369 |
+
## Panning & Zooming
|
| 370 |
+
""")
|
| 371 |
+
return
|
| 372 |
+
|
| 373 |
+
|
| 374 |
+
@app.cell(hide_code=True)
|
| 375 |
+
def _(mo):
|
| 376 |
+
mo.md(r"""
|
| 377 |
+
The movie rating scatter plot is a bit cluttered in places, making it hard to examine points in denser regions. Using the interaction techniques of _panning_ and _zooming_, we can inspect dense regions more closely.
|
| 378 |
+
|
| 379 |
+
Let's start by thinking about how we might express panning and zooming using Altair selections. What defines the "viewport" of a chart? _Axis scale domains!_
|
| 380 |
+
|
| 381 |
+
We can change the scale domains to modify the visualized range of data values. To do so interactively, we can bind an `interval` selection to scale domains with the code `bind='scales'`. The result is that instead of an interval brush that we can drag and zoom, we instead can drag and zoom the entire plotting area!
|
| 382 |
+
|
| 383 |
+
_In the chart below, click and drag to pan (translate) the view, or scroll to zoom (scale) the view. What can you discover about the precision of the provided rating values?_
|
| 384 |
+
""")
|
| 385 |
+
return
|
| 386 |
+
|
| 387 |
+
|
| 388 |
+
@app.cell
|
| 389 |
+
def _(alt, movies):
|
| 390 |
+
alt.Chart(movies).mark_circle().add_params(
|
| 391 |
+
alt.selection_interval(bind='scales')
|
| 392 |
+
).encode(
|
| 393 |
+
x='Rotten_Tomatoes_Rating:Q',
|
| 394 |
+
y=alt.Y('IMDB_Rating:Q', axis=alt.Axis(minExtent=30)), # use min extent to stabilize axis title placement
|
| 395 |
+
tooltip=['Title:N', 'Release_Date:N', 'IMDB_Rating:Q', 'Rotten_Tomatoes_Rating:Q']
|
| 396 |
+
).properties(
|
| 397 |
+
width=600,
|
| 398 |
+
height=400
|
| 399 |
+
)
|
| 400 |
+
return
|
| 401 |
+
|
| 402 |
+
|
| 403 |
+
@app.cell(hide_code=True)
|
| 404 |
+
def _(mo):
|
| 405 |
+
mo.md(r"""
|
| 406 |
+
    _Zooming in, we can see that the rating values have limited precision! The Rotten Tomatoes ratings are integers, while the IMDB ratings are truncated to tenths. As a result, there is overplotting even when we zoom, with multiple movies sharing the same rating values._

    Reading the code above, you may notice `alt.Axis(minExtent=30)` in the `y` encoding channel. The `minExtent` parameter ensures a minimum amount of space is reserved for axis ticks and labels. Why do this? When we pan and zoom, the axis labels may change and cause the axis title position to shift. By setting a minimum extent we can reduce distracting movements in the plot. _Try changing the `minExtent` value, for example setting it to zero, and then zoom out to see what happens when longer axis labels enter the view._

    Altair also includes a shorthand for adding panning and zooming to a plot. Instead of directly creating a selection, you can call `.interactive()` to have Altair automatically generate an interval selection bound to the chart's scales:
    """)
    return


@app.cell
def _(alt, movies):
    alt.Chart(movies).mark_circle().encode(
        x='Rotten_Tomatoes_Rating:Q',
        y=alt.Y('IMDB_Rating:Q', axis=alt.Axis(minExtent=30)),  # use min extent to stabilize axis title placement
        tooltip=['Title:N', 'Release_Date:N', 'IMDB_Rating:Q', 'Rotten_Tomatoes_Rating:Q']
    ).properties(
        width=600,
        height=400
    ).interactive()
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    By default, scale bindings for selections include both the `x` and `y` encoding channels. What if we want to limit panning and zooming along a single dimension? We can invoke `encodings=['x']` to constrain the selection to the `x` channel only:
    """)
    return


@app.cell
def _(alt, movies):
    alt.Chart(movies).mark_circle().add_params(
        alt.selection_interval(bind='scales', encodings=['x'])
    ).encode(
        x='Rotten_Tomatoes_Rating:Q',
        y=alt.Y('IMDB_Rating:Q', axis=alt.Axis(minExtent=30)),  # use min extent to stabilize axis title placement
        tooltip=['Title:N', 'Release_Date:N', 'IMDB_Rating:Q', 'Rotten_Tomatoes_Rating:Q']
    ).properties(
        width=600,
        height=400
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _When zooming along a single axis only, the shape of the visualized data can change, potentially affecting our perception of relationships in the data. [Choosing an appropriate aspect ratio](http://vis.stanford.edu/papers/arclength-banking) is an important visualization design concern!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Navigation: Overview + Detail
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    When panning and zooming, we directly adjust the "viewport" of a chart. The related navigation strategy of _overview + detail_ instead uses an overview display to show _all_ of the data, while supporting selections that pan and zoom a separate focus display.

    Below we have two area charts showing a decade of price fluctuations for the S&P 500 stock index. Initially both charts show the same data range. _Click and drag in the bottom overview chart to update the focus display and examine specific time spans._
    """)
    return


@app.cell
def _(alt, sp500):
    _brush = alt.selection_interval(encodings=['x'])

    _base = alt.Chart().mark_area().encode(
        alt.X('date:T', title=None),
        alt.Y('price:Q')
    ).properties(width=700)

    alt.vconcat(
        _base.encode(alt.X('date:T', title=None, scale=alt.Scale(domain=_brush))),  # focus display
        _base.add_params(_brush).properties(height=60),  # overview display with brush
        data=sp500
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Unlike our earlier panning & zooming case, here we don't want to bind a selection directly to the scales of a single interactive chart. Instead, we want to bind the selection to a scale domain in _another_ chart. To do so, we update the `x` encoding channel for our focus chart, setting the scale `domain` property to reference our `brush` selection. If no interval is defined (the selection is empty), Altair ignores the brush and uses the underlying data to determine the domain. When a brush interval is created, Altair instead uses that as the scale `domain` for the focus chart.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Details on Demand
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Once we spot points of interest within a visualization, we often want to know more about them. _Details-on-demand_ refers to interactively querying for more information about selected values. _Tooltips_ are one useful means of providing details on demand. However, tooltips typically only show information for one data point at a time. How might we show more?

    The movie ratings scatterplot includes a number of potentially interesting outliers where the Rotten Tomatoes and IMDB ratings disagree. Let's create a plot that allows us to interactively select points and show their labels. To trigger the filter query on either the hover or click interaction, we will use the [Altair composition operator](https://altair-viz.github.io/user_guide/interactions.html#composing-multiple-selections) `|` ("or").

    _Mouse over points in the scatter plot below to see a highlight and title label. Shift-click points to make annotations persistent and view multiple labels at once. Which movies are loved by Rotten Tomatoes critics, but not the general audience on IMDB (or vice versa)? See if you can find possible errors, where two different movies with the same name were accidentally combined!_
    """)
    return


@app.cell
def _(alt, movies):
    hover = alt.selection_point(toggle=False, on='mouseover', nearest=True, empty=False)
    click = alt.selection_point(empty=False)

    plot_1 = alt.Chart().mark_circle().encode(
        x='Rotten_Tomatoes_Rating:Q',
        y='IMDB_Rating:Q'
    )

    # annotation layers show data for points in either the hover or click selection
    _base = plot_1.transform_filter(hover | click)

    alt.layer(
        plot_1.add_params(hover).add_params(click),
        _base.mark_point(size=100, stroke='firebrick', strokeWidth=1),  # circular annotation
        _base.mark_text(dx=4, dy=-8, align='right', stroke='white', strokeWidth=2).encode(text='Title:N'),  # legible text background
        _base.mark_text(dx=4, dy=-8, align='right').encode(text='Title:N'),  # title label
        data=movies
    ).properties(width=600, height=450)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The example above adds three new layers to the scatter plot: a circular annotation, white text to provide a legible background, and black text showing a film title. In addition, this example uses two selections in tandem:

    1. A point selection (`hover`) that includes `nearest=True` to automatically select the nearest data point as the mouse moves.
    2. A second point selection (`click`) to create persistent selections via shift-click.

    Both selections set `empty=False` to indicate that no points should be included when a selection is empty. These selections are then combined into a single filter predicate — the logical _or_ of `hover` and `click` — to include points that reside in _either_ selection. We use this predicate to filter the new layers to show annotations and labels for selected points only.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Using selections and layers, we can realize a number of different designs for details on demand! For example, here is a log-scaled time series of technology stock prices, annotated with a guideline and labels for the date nearest the mouse cursor:
    """)
    return


@app.cell
def _(alt, stocks):
    # select a point for which to provide details-on-demand
    label = alt.selection_point(
        toggle=False,
        encodings=['x'],  # limit selection to x-axis value
        on='mouseover',   # select on mouseover events
        nearest=True,     # select data point nearest the cursor
        empty=False       # empty selection includes no data points
    )

    # define our base line chart of stock prices
    _base = alt.Chart().mark_line().encode(
        alt.X('date:T'),
        alt.Y('price:Q', scale=alt.Scale(type='log')),
        alt.Color('symbol:N')
    )

    alt.layer(
        _base,  # base line chart
        # add a rule mark to serve as a guide line
        alt.Chart().mark_rule(color='#aaa').encode(x='date:T').transform_filter(label),
        # add circle marks for selected time points, hide unselected points
        _base.mark_circle().encode(
            opacity=alt.condition(label, alt.value(1), alt.value(0))
        ).add_params(label),
        # add white stroked text to provide a legible background for labels
        _base.mark_text(align='left', dx=5, dy=-5, stroke='white', strokeWidth=2).encode(text='price:Q').transform_filter(label),
        # add text labels for stock prices
        _base.mark_text(align='left', dx=5, dy=-5).encode(text='price:Q').transform_filter(label),
        data=stocks
    ).properties(width=700, height=400)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Putting into action what we've learned so far: can you modify the movie scatter plot above (the one with the dynamic query over years) to include a `rule` mark that shows the average IMDB (or Rotten Tomatoes) rating for the data contained within the year `interval` selection?_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Brushing & Linking, Revisited
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Earlier in this notebook we saw an example of _brushing & linking_: using a dynamic query histogram to highlight points in a movie rating scatter plot. Here, we'll visit some additional examples involving linked selections.

    Returning to the `cars` dataset, we can use the `repeat` operator to build a [scatter plot matrix (SPLOM)](https://en.wikipedia.org/wiki/Scatter_plot#Scatterplot_matrices) that shows associations between mileage, acceleration, and horsepower. We can define an `interval` selection and include it _within_ our repeated scatter plot specification to enable linked selections among all the plots.

    _Click and drag in any of the plots below to perform brushing & linking!_
    """)
    return


@app.cell
def _(alt, cars):
    # resolve all selections to a single global instance
    _brush = alt.selection_interval(resolve='global')

    alt.Chart(cars).mark_circle().add_params(_brush).encode(
        alt.X(alt.repeat('column'), type='quantitative'),
        alt.Y(alt.repeat('row'), type='quantitative'),
        color=alt.condition(_brush, 'Cylinders:O', alt.value('grey')),
        opacity=alt.condition(_brush, alt.value(0.8), alt.value(0.1))
    ).properties(
        width=140,
        height=140
    ).repeat(
        column=['Acceleration', 'Horsepower', 'Miles_per_Gallon'],
        row=['Miles_per_Gallon', 'Horsepower', 'Acceleration']
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Note above the use of `resolve='global'` on the `interval` selection. The default setting of `'global'` indicates that across all plots only one brush can be active at a time. However, in some cases we might want to define brushes in multiple plots and combine the results. If we use `resolve='union'`, the selection will be the _union_ of all brushes: if a point resides within any brush it will be selected. Alternatively, if we use `resolve='intersect'`, the selection will consist of the _intersection_ of all brushes: only points that reside within all brushes will be selected.

    _Try setting the `resolve` parameter to `'union'` and `'intersect'` and see how it changes the resulting selection logic._
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Cross-Filtering

    The brushing & linking examples we've looked at all use conditional encodings, for example to change opacity values in response to a selection. Another option is to use a selection defined in one view to _filter_ the content of another view.

    Let's build a collection of histograms for the `flights` dataset: arrival `delay` (how early or late a flight arrives, in minutes), `distance` flown (in miles), and `time` of departure (hour of the day). We'll use the `repeat` operator to create the histograms, and add an `interval` selection for the `x` axis with brushes resolved via intersection.

    In particular, each histogram will consist of two layers: a gray background layer and a blue foreground layer, with the foreground layer filtered by our intersection of brush selections. The result is a _cross-filtering_ interaction across the three charts!

    _Drag out brush intervals in the charts below. As you select flights with longer or shorter arrival delays, how do the distance and time distributions respond?_
    """)
    return


@app.cell
def _(alt, flights):
    _brush = alt.selection_interval(encodings=['x'], resolve='intersect')

    hist = alt.Chart().mark_bar().encode(
        alt.X(alt.repeat('row'),
              type='quantitative',
              bin=alt.Bin(maxbins=100, minstep=1),  # up to 100 bins
              axis=alt.Axis(format='d', titleAnchor='start')),  # integer format, left-aligned title
        alt.Y('count():Q', title=None)  # no y-axis title
    )

    alt.layer(
        hist.add_params(_brush).encode(color=alt.value('lightgrey')),  # gray background layer
        hist.transform_filter(_brush)  # filtered foreground layer
    ).properties(
        width=900,
        height=100
    ).repeat(
        row=['delay', 'distance', 'time'],
        data=flights
    ).transform_calculate(
        delay='datum.delay < 180 ? datum.delay : 180',  # clamp delays > 3 hours
        time='hours(datum.date) + minutes(datum.date) / 60'  # fractional hours
    ).configure_view(
        stroke='transparent'  # no outline
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _By cross-filtering you can observe that delayed flights are more likely to depart at later hours. This phenomenon is familiar to frequent fliers: a delay can propagate through the day, affecting subsequent travel by that plane. For the best odds of an on-time arrival, book an early flight!_

    The combination of multiple views and interactive selections can enable valuable forms of multi-dimensional reasoning, turning even basic histograms into powerful input devices for asking questions of a dataset!
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Summary
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    For more information about the supported interaction options in Altair, please consult the [Altair interactive selection documentation](https://altair-viz.github.io/user_guide/interactions.html). For details about customizing event handlers, for example to compose multiple interaction techniques or support touch-based input on mobile devices, see the [Vega-Lite selection documentation](https://vega.github.io/vega-lite/docs/selection.html).

    Interested in learning more?

    - The _selection_ abstraction was introduced in the paper [Vega-Lite: A Grammar of Interactive Graphics](http://idl.cs.washington.edu/papers/vega-lite/), by Satyanarayan, Moritz, Wongsuphasawat, & Heer.
    - The PRIM-9 system (for projection, rotation, isolation, and masking in up to 9 dimensions) is one of the earliest interactive visualization tools, built in the early 1970s by Fisherkeller, Tukey, & Friedman. [A retro demo video survives!](https://www.youtube.com/watch?v=B7XoW2qiFUA)
    - The concept of brushing & linking was crystallized by Becker, Cleveland, & Wilks in their 1987 article [Dynamic Graphics for Data Analysis](https://scholar.google.com/scholar?cluster=14817303117298653693).
    - For a comprehensive summary of interaction techniques for visualization, see [Interactive Dynamics for Visual Analysis](https://queue.acm.org/detail.cfm?id=2146416) by Heer & Shneiderman.
    - Finally, for a treatise on what makes interaction effective, read the classic [Direct Manipulation Interfaces](https://scholar.google.com/scholar?cluster=15702972136892195211) paper by Hutchins, Hollan, & Norman.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    #### Appendix: On The Representation of Time

    Earlier we observed a small bump in the number of movies in either 1969 or 1970. Where does that bump come from? And why 1969 _or_ 1970? The answer stems from a combination of missing data and how your computer represents time.

    Internally, dates and times are represented relative to the [UNIX epoch](https://en.wikipedia.org/wiki/Unix_time), in which time "zero" corresponds to the stroke of midnight on January 1, 1970 in [UTC time](https://en.wikipedia.org/wiki/Coordinated_Universal_Time), which runs along the [prime meridian](https://en.wikipedia.org/wiki/Prime_meridian). It turns out there are a few movies with missing (`null`) release dates. Those `null` values get interpreted as time `0`, and thus map to January 1, 1970 in UTC time. If you live in the Americas – and thus in "earlier" time zones – this precise point in time corresponds to an earlier hour on December 31, 1969 in your local time zone. On the other hand, if you live near or east of the prime meridian, the date in your local time zone will be January 1, 1970.

    The takeaway? Always be skeptical of your data, and be mindful that how data is represented (whether as date times, or floating point numbers, or latitudes and longitudes, _etc._) can sometimes lead to artifacts that impact analysis!
    """)
    return


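The epoch behavior described above can be verified directly (a quick sketch using pandas; the time zone choice is an arbitrary example):

```python
import pandas as pd

# numeric zero interpreted as a timestamp maps to the UNIX epoch in UTC
epoch = pd.to_datetime(0, unit='s', utc=True)
print(epoch)  # 1970-01-01 00:00:00+00:00

# viewed from a US time zone, the same instant falls on December 31, 1969
local = epoch.tz_convert('America/Los_Angeles')
print(local.date())  # 1969-12-31
```

The same instant in time thus renders as two different calendar dates depending on the viewer's time zone, which is exactly why the bump appears in "either" year.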
if __name__ == "__main__":
    app.run()

@@ -0,0 +1,898 @@

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "altair==6.0.0",
#     "marimo",
#     "pandas==3.0.1",
#     "vega_datasets==0.9.0",
# ]
# ///

import marimo

__generated_with = "0.20.4"
app = marimo.App()


@app.cell
def _():
    import marimo as mo

    return (mo,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    # Cartographic Visualization

    _“The making of maps is one of humanity's longest established intellectual endeavors and also one of its most complex, with scientific theory, graphical representation, geographical facts, and practical considerations blended together in an unending variety of ways.”_ — [H. J. Steward](https://books.google.com/books?id=cVy1Ms43fFYC)

    Cartography – the study and practice of map-making – has a rich history spanning centuries of discovery and design. Cartographic visualization leverages mapping techniques to convey data containing spatial information, such as locations, routes, or trajectories on the surface of the Earth.

    <div style="float: right; margin-left: 1em; margin-top: 1em;"><img width="300px" src="https://gist.githubusercontent.com/jheer/c90d582ef5322582cf4960ec7689f6f6/raw/8dc92382a837ccc34c076f4ce7dd864e7893324a/latlon.png" /></div>

    Approximating the Earth as a sphere, we can denote positions using a spherical coordinate system of _latitude_ (angle in degrees north or south of the _equator_) and _longitude_ (angle in degrees specifying east-west position). In this system, a _parallel_ is a circle of constant latitude and a _meridian_ is a circle of constant longitude. The [_prime meridian_](https://en.wikipedia.org/wiki/Prime_meridian) lies at 0° longitude and by convention is defined to pass through the Royal Observatory in Greenwich, England.

    To "flatten" a three-dimensional sphere on to a two-dimensional plane, we must apply a [projection](https://en.wikipedia.org/wiki/Map_projection) that maps (`longitude`, `latitude`) pairs to (`x`, `y`) coordinates. Similar to [scales](https://github.com/uwdata/visualization-curriculum/blob/master/altair_scales_axes_legends.ipynb), projections map from a data domain (spatial position) to a visual range (pixel position). However, the scale mappings we've seen thus far accept a one-dimensional domain, whereas map projections are inherently two-dimensional.

    In this notebook, we will introduce the basics of creating maps and visualizing spatial data with Altair, including:

    - Data formats for representing geographic features,
    - Geo-visualization techniques such as point, symbol, and choropleth maps, and
    - A review of common cartographic projections.

    _This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
    """)
    return


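To make the domain-to-range mapping concrete, here is a minimal sketch (plain Python, not the Altair API) of the simplest such projection, the equirectangular projection, which maps longitude and latitude linearly to pixel coordinates:

```python
def equirectangular(lon, lat, width=360, height=180):
    """Map (longitude, latitude) in degrees to (x, y) pixel coordinates.

    Longitude maps linearly to x; latitude maps linearly to y,
    flipped so that north is up.
    """
    x = (lon + 180) / 360 * width
    y = (90 - lat) / 180 * height
    return x, y

# the intersection of the equator and prime meridian lands at the center
print(equirectangular(0, 0))  # (180.0, 90.0)
```

Real cartographic projections are nonlinear trade-offs between preserving area, angle, and distance, but they all share this two-dimensional (longitude, latitude) → (x, y) signature.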
@app.cell
def _():
    import pandas as pd
    import altair as alt
    from vega_datasets import data

    return alt, data


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Geographic Data: GeoJSON and TopoJSON
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Up to this point, we have worked with JSON and CSV formatted datasets that correspond to data tables made up of rows (records) and columns (fields). In order to represent geographic regions (countries, states, _etc._) and trajectories (flight paths, subway lines, _etc._), we need to expand our repertoire with additional formats designed to support rich geometries.

    [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) models geographic features within a specialized JSON format. A GeoJSON `feature` can include geometric data – such as `longitude`, `latitude` coordinates that make up a country boundary – as well as additional data attributes.

    Here is a GeoJSON `feature` object for the boundary of the U.S. state of Colorado:
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ~~~ json
    {
      "type": "Feature",
      "id": 8,
      "properties": {"name": "Colorado"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[-106.32056285448942,40.998675790862656],[-106.19134826714341,40.99813863734313],[-105.27607827344248,40.99813863734313],[-104.9422739227986,40.99813863734313],[-104.05212898774828,41.00136155846029],[-103.57475287338661,41.00189871197981],[-103.38093099236758,41.00189871197981],[-102.65589358559272,41.00189871197981],[-102.62000064466328,41.00189871197981],[-102.052892177978,41.00189871197981],[-102.052892177978,40.74889940428302],[-102.052892177978,40.69733266640851],[-102.052892177978,40.44003613055551],[-102.052892177978,40.3492571857556],[-102.052892177978,40.00333031918079],[-102.04930288388505,39.57414465707943],[-102.04930288388505,39.56823596836465],[-102.0457135897921,39.1331416175485],[-102.0457135897921,39.0466599009048],[-102.0457135897921,38.69751011321283],[-102.0457135897921,38.61478847120581],[-102.0457135897921,38.268861604631],[-102.0457135897921,38.262415762396685],[-102.04212429569915,37.738153927339205],[-102.04212429569915,37.64415206142214],[-102.04212429569915,37.38900413964724],[-102.04212429569915,36.99365914927603],[-103.00046581851544,37.00010499151034],[-103.08660887674611,37.00010499151034],[-104.00905745863294,36.99580776335414],[-105.15404227428235,36.995270609834606],[-105.2222388620483,36.995270609834606],[-105.7175614468747,36.99580776335414],[-106.00829426840322,36.995270609834606],[-106.47490250048605,36.99365914927603],[-107.4224761410235,37.00010499151034],[-107.48349414060355,37.00010499151034],[-108.38081766383978,36.99903068447129],[-109.04483707103458,36.99903068447129],[-109.04483707103458,37.484617466122884],[-109.04124777694163,37.88049961001363],[-109.04124777694163,38.15283644441336],[-109.05919424740635,38.49983761802722],[-109.05201565922046,39.36680339854235],[-109.05201565922046,39.49786885730673],[-109.05201565922046,39.66062637372313],[-109.05201565922046,40.22248895514744],[-109.05201565922046,40.653823231326896],[-109.05201565922046,41.000287251421234],[-107.91779872584989,41.00189871197981],[-107.3183866123281,41.00297301901887],[-106.85895696843116,41.00189871197981],[-106.32056285448942,40.998675790862656]]
        ]
      }
    }
    ~~~
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The `feature` includes a `properties` object, which can include any number of data fields, plus a `geometry` object, which in this case contains a single polygon that consists of `[longitude, latitude]` coordinates for the state boundary. The coordinates continue off to the right for a while should you care to scroll...

    To learn more about the nitty-gritty details of GeoJSON, see the [official GeoJSON specification](http://geojson.org/) or read [Tom MacWright's helpful primer](https://macwright.org/2015/03/23/geojson-second-bite).
    """)
    return
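
Since GeoJSON is ordinary JSON, its structure can be explored with Python's standard library alone. A small sketch (the coordinates below are a simplified, hypothetical rectangle, not the actual Colorado boundary):

```python
import json

# a simplified GeoJSON feature; the ring is a hypothetical rectangle
feature_json = '''{
  "type": "Feature",
  "properties": {"name": "Colorado"},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[-109.05, 41.0], [-102.05, 41.0],
                     [-102.05, 37.0], [-109.05, 37.0], [-109.05, 41.0]]]
  }
}'''

feature = json.loads(feature_json)
print(feature['properties']['name'])  # Colorado
print(feature['geometry']['type'])    # Polygon

# the first (outer) ring closes on itself: first point == last point
ring = feature['geometry']['coordinates'][0]
print(ring[0] == ring[-1])  # True
```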


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    One drawback of GeoJSON as a storage format is that it can be redundant, resulting in larger file sizes. Consider: Colorado shares boundaries with six other states (seven if you include the corner touching Arizona). Instead of using separate, overlapping coordinate lists for each of those states, a more compact approach is to encode shared borders only once, representing the _topology_ of geographic regions. Fortunately, this is precisely what the [TopoJSON](https://github.com/topojson/topojson/blob/master/README.md) format does!
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Let's load a TopoJSON file of world countries (at 110 meter resolution):
    """)
    return


@app.cell
def _(data):
    world = data.world_110m.url
    world
    return (world,)


@app.cell
def _(data):
    world_topo = data.world_110m()
    return (world_topo,)


@app.cell
def _(world_topo):
    world_topo.keys()
    return


@app.cell
def _(world_topo):
    world_topo['type']
    return


@app.cell
def _(world_topo):
    world_topo['objects'].keys()
    return


@app.cell(hide_code=True)
|
| 157 |
+
def _(mo):
|
| 158 |
+
mo.md(r"""
|
| 159 |
+
_Inspect the `world_topo` TopoJSON dictionary object above to see its contents._
|
| 160 |
+
|
| 161 |
+
In the data above, the `objects` property indicates the named elements we can extract from the data: geometries for all `countries`, or a single polygon representing all `land` on Earth. Either of these can be unpacked to GeoJSON data we can then visualize.
|
| 162 |
+
|
| 163 |
+
As TopoJSON is a specialized format, we need to instruct Altair to parse the TopoJSON format, indicating which named faeture object we wish to extract from the topology. The following code indicates that we want to extract GeoJSON features from the `world` dataset for the `countries` object:
|
| 164 |
+
|
| 165 |
+
~~~ js
|
| 166 |
+
alt.topo_feature(world, 'countries')
|
| 167 |
+
~~~
|
| 168 |
+
|
| 169 |
+
This `alt.topo_feature` method call expands to the following Vega-Lite JSON:
|
| 170 |
+
|
| 171 |
+
~~~ json
|
| 172 |
+
{
|
| 173 |
+
"values": world,
|
| 174 |
+
"format": {"type": "topojson", "feature": "countries"}
|
| 175 |
+
}
|
| 176 |
+
~~~
|
| 177 |
+
|
| 178 |
+
Now that we can load geographic data, we're ready to start making maps!
|
| 179 |
+
""")
|
| 180 |
+
return
|
| 181 |
+
|
| 182 |
+
|
| 183 |
+
@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Geoshape Marks
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To visualize geographic data, Altair provides the `geoshape` mark type. To create a basic map, we can create a `geoshape` mark and pass it our TopoJSON data, which is then unpacked into GeoJSON features, one for each country of the world:
    """)
    return


@app.cell
def _(alt, world):
    alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape()
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    In the example above, Altair applies a default blue color and uses a default map projection (`mercator`). We can customize the colors and boundary stroke widths using standard mark properties. Using the `project` method we can also add our own map projection:
    """)
    return


@app.cell
def _(alt, world):
    alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
        fill='#2a1d0c', stroke='#706545', strokeWidth=0.5
    ).project(
        type='mercator'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    By default Altair automatically adjusts the projection so that all the data fits within the width and height of the chart. We can also specify projection parameters, such as `scale` (zoom level) and `translate` (panning), to customize the projection settings. Here we adjust the `scale` and `translate` parameters to focus on Europe:
    """)
    return


@app.cell
def _(alt, world):
    alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
        fill='#2a1d0c', stroke='#706545', strokeWidth=0.5
    ).project(
        type='mercator', scale=400, translate=[100, 550]
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Note how the 110m resolution of the data becomes apparent at this scale. To see more detailed coastlines and boundaries, we need an input file with more fine-grained geometries. Adjust the `scale` and `translate` parameters to focus the map on other regions!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    So far our map shows countries only. Using the `layer` operator, we can combine multiple map elements. Altair includes _data generators_ we can use to create data for additional map layers:

    - The sphere generator (`{'sphere': True}`) provides a GeoJSON representation of the full sphere of the Earth. We can create an additional `geoshape` mark that fills in the shape of the Earth as a background layer.
    - The graticule generator (`{'graticule': ...}`) creates a GeoJSON feature representing a _graticule_: a grid formed by lines of latitude and longitude. The default graticule has meridians and parallels every 10° between ±80° latitude. For the polar regions, there are meridians every 90°. These settings can be customized using the `stepMinor` and `stepMajor` properties.

    Let's layer sphere, graticule, and country marks into a reusable map specification:
    """)
    return


@app.cell
def _(alt, world):
    map = alt.layer(
        # use the sphere of the Earth as the base layer
        alt.Chart({'sphere': True}).mark_geoshape(
            fill='#e6f3ff'
        ),
        # add a graticule for geographic reference lines
        alt.Chart({'graticule': True}).mark_geoshape(
            stroke='#ffffff', strokeWidth=1
        ),
        # and then the countries of the world
        alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
            fill='#2a1d0c', stroke='#706545', strokeWidth=0.5
        )
    ).properties(
        width=600,
        height=400
    )
    return (map,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    We can extend the map with a desired projection and draw the result. Here we apply a [Natural Earth projection](https://en.wikipedia.org/wiki/Natural_Earth_projection). The _sphere_ layer provides the light blue background; the _graticule_ layer provides the white geographic reference lines.
    """)
    return


@app.cell
def _(map):
    map.project(
        type='naturalEarth1', scale=110, translate=[300, 200]
    ).configure_view(stroke=None)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Point Maps

    In addition to the _geometric_ data provided by GeoJSON or TopoJSON files, many tabular datasets include geographic information in the form of fields for `longitude` and `latitude` coordinates, or references to geographic regions such as country names, state names, postal codes, _etc._, which can be mapped to coordinates using a [geocoding service](https://en.wikipedia.org/wiki/Geocoding). In some cases, location data is rich enough that we can see meaningful patterns by projecting the data points alone — no base map required!

    Let's look at a dataset of 5-digit zip codes in the United States, including `longitude`, `latitude` coordinates for each post office in addition to a `zip_code` field.
    """)
    return


@app.cell
def _(data):
    zipcodes = data.zipcodes.url
    zipcodes
    return (zipcodes,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    We can visualize each post office location using a small (1-pixel) `square` mark. However, to set the positions we do _not_ use `x` and `y` channels. _Why is that?_

    While cartographic projections map (`longitude`, `latitude`) coordinates to (`x`, `y`) coordinates, they can do so in arbitrary ways. There is no guarantee, for example, that `longitude` → `x` and `latitude` → `y`! Instead, Altair includes special `longitude` and `latitude` encoding channels to handle geographic coordinates. These channels indicate which data fields should be mapped to `longitude` and `latitude` coordinates; the chart's projection then maps those coordinates to (`x`, `y`) positions.
    """)
    return


@app.cell
def _(alt, zipcodes):
    alt.Chart(zipcodes).mark_square(
        size=1, opacity=1
    ).encode(
        longitude='longitude:Q',  # apply the field named 'longitude' to the longitude channel
        latitude='latitude:Q'     # apply the field named 'latitude' to the latitude channel
    ).project(
        type='albersUsa'
    ).properties(
        width=900,
        height=500
    ).configure_view(
        stroke=None
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Plotting zip codes only, we can see the outline of the United States and discern meaningful patterns in the density of post offices, without a base map or additional reference elements!_

    We use the `albersUsa` projection, which takes some liberties with the actual geometry of the Earth, with scaled versions of Alaska and Hawaii in the bottom-left corner. As we did not specify projection `scale` or `translate` parameters, Altair sets them automatically to fit the visualized data.

    We can now go on to ask more questions of our dataset. For example, is there any rhyme or reason to the allocation of zip codes? To assess this question we can add a color encoding based on the first digit of the zip code. We first add a `calculate` transform to extract the first digit, and encode the result using the color channel:
    """)
    return


@app.cell
def _(alt, zipcodes):
    alt.Chart(zipcodes).transform_calculate(
        digit='datum.zip_code[0]'
    ).mark_square(
        size=2, opacity=1
    ).encode(
        longitude='longitude:Q',
        latitude='latitude:Q',
        color='digit:N'
    ).project(
        type='albersUsa'
    ).properties(
        width=900,
        height=500
    ).configure_view(
        stroke=None
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _To zoom in on a specific digit, add a filter transform to limit the data shown! Try adding an [interactive selection](https://github.com/uwdata/visualization-curriculum/blob/master/altair_interaction.ipynb) to filter to a single digit and dynamically update the map. And be sure to use strings (`'1'`) instead of numbers (`1`) when filtering digit values!_

    (This example is inspired by Ben Fry's classic [zipdecode](https://benfry.com/zipdecode/) visualization!)

    We might further wonder what the _sequence_ of zip codes might indicate. One way to explore this question is to connect each consecutive zip code using a `line` mark, as done in Robert Kosara's [ZipScribble](https://eagereyes.org/zipscribble-maps/united-states) visualization:
    """)
    return


@app.cell
def _(alt, zipcodes):
    alt.Chart(zipcodes).transform_filter(
        '-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
    ).transform_calculate(
        digit='datum.zip_code[0]'
    ).mark_line(
        strokeWidth=0.5
    ).encode(
        longitude='longitude:Q',
        latitude='latitude:Q',
        color='digit:N',
        order='zip_code:O'
    ).project(
        type='albersUsa'
    ).properties(
        width=900,
        height=500
    ).configure_view(
        stroke=None
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _We can now see how zip codes further cluster into smaller areas, indicating a hierarchical allocation of codes by location, but with some notable variability within local clusters._

    If you were paying careful attention to our earlier maps, you may have noticed that there are zip codes being plotted in the upper-left corner! These correspond to locations such as Puerto Rico or American Samoa, which contain U.S. zip codes but are mapped to `null` coordinates (`0`, `0`) by the `albersUsa` projection. In addition, Alaska and Hawaii can complicate our view of the connecting line segments. In response, the code above includes an additional filter that removes points outside our chosen `longitude` and `latitude` spans.

    _Remove the filter above to see what happens!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Symbol Maps
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Now let's combine a base map and plotted data as separate layers. We'll examine the U.S. commercial flight network, considering both airports and flight routes. To do so, we'll need three datasets.

    For our base map, we'll use a TopoJSON file for the United States at 1:10 million scale, containing features for `states` or `counties`:
    """)
    return


@app.cell
def _(data):
    usa = data.us_10m.url
    usa
    return (usa,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    For the airports, we will use a dataset with fields for the `longitude` and `latitude` coordinates of each airport as well as the `iata` airport code — for example, `'SEA'` for [Seattle-Tacoma International Airport](https://en.wikipedia.org/wiki/Seattle%E2%80%93Tacoma_International_Airport).
    """)
    return


@app.cell
def _(data):
    airports = data.airports.url
    airports
    return (airports,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Finally, we will use a dataset of flight routes, which contains `origin` and `destination` fields with the IATA codes for the corresponding airports:
    """)
    return


@app.cell
def _(data):
    flights = data.flights_airport.url
    flights
    return (flights,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Let's start by creating a base map using the `albersUsa` projection, and add a layer that plots `circle` marks for each airport:
    """)
    return


@app.cell
def _(airports, alt, usa):
    alt.layer(
        alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
            fill='#ddd', stroke='#fff', strokeWidth=1
        ),
        alt.Chart(airports).mark_circle(size=9).encode(
            latitude='latitude:Q',
            longitude='longitude:Q',
            tooltip='iata:N'
        )
    ).project(
        type='albersUsa'
    ).properties(
        width=900,
        height=500
    ).configure_view(
        stroke=None
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _That's a lot of airports! Obviously, not all of them are major hubs._

    Similar to our zip codes dataset, our airport data includes points that lie outside the continental United States. So we again see points in the upper-left corner. We might want to filter these points, but to do so we first need to know more about them.

    _Update the map projection above to `albers` – side-stepping the idiosyncratic behavior of `albersUsa` – so that the actual locations of these additional points are revealed!_

    Now, instead of showing all airports in an undifferentiated fashion, let's identify major hubs by considering the total number of routes that originate at each airport. We'll use the `flights` dataset as our primary data source: it contains a list of flight routes that we can aggregate to count the number of routes for each `origin` airport.

    However, the `flights` dataset does not include the _locations_ of the airports! To augment the `flights` data with locations, we need a new data transformation: `lookup`. The `lookup` transform takes a field value in a primary dataset and uses it as a _key_ to look up related information in another table. In this case, we want to match the `origin` airport code in our `flights` dataset against the `iata` field of the `airports` dataset, then extract the corresponding `latitude` and `longitude` fields.
    """)
    return


@app.cell
def _(airports, alt, flights, usa):
    alt.layer(
        alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
            fill='#ddd', stroke='#fff', strokeWidth=1
        ),
        alt.Chart(flights).mark_circle().transform_aggregate(
            groupby=['origin'],
            routes='count()'
        ).transform_lookup(
            lookup='origin',
            from_=alt.LookupData(data=airports, key='iata',
                                 fields=['state', 'latitude', 'longitude'])
        ).transform_filter(
            'datum.state !== "PR" && datum.state !== "VI"'
        ).encode(
            latitude='latitude:Q',
            longitude='longitude:Q',
            tooltip=['origin:N', 'routes:Q'],
            size=alt.Size('routes:Q', scale=alt.Scale(range=[0, 1000]), legend=None),
            order=alt.Order('routes:Q', sort='descending')
        )
    ).project(
        type='albersUsa'
    ).properties(
        width=900,
        height=500
    ).configure_view(
        stroke=None
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Which U.S. airports have the highest number of outgoing routes?_

    Now that we can see the airports, we may wish to interact with them to better understand the structure of the air traffic network. We can add a `rule` mark layer to represent paths from `origin` airports to `destination` airports, which requires two `lookup` transforms to retrieve coordinates for each end point. In addition, we can use a point selection to filter these routes, such that only the routes originating at the currently selected airport are shown.

    _Starting from the static map above, can you build an interactive version? Feel free to skip the code below to engage with the interactive map first, and think through how you might build it on your own!_
    """)
    return


@app.cell
def _(airports, alt, flights, usa):
    # interactive selection for origin airport
    # select nearest airport to mouse cursor
    origin = alt.selection_point(
        on='mouseover', nearest=True,
        fields=['origin'], empty='none'
    )

    # shared data reference for lookup transforms
    foreign = alt.LookupData(data=airports, key='iata',
                             fields=['latitude', 'longitude'])

    alt.layer(
        # base map of the United States
        alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
            fill='#ddd', stroke='#fff', strokeWidth=1
        ),
        # route lines from selected origin airport to destination airports
        alt.Chart(flights).mark_rule(
            color='#000', opacity=0.35
        ).transform_filter(
            origin  # filter to selected origin only
        ).transform_lookup(
            lookup='origin', from_=foreign  # origin lat/lon
        ).transform_lookup(
            lookup='destination', from_=foreign, as_=['lat2', 'lon2']  # dest lat/lon
        ).encode(
            latitude='latitude:Q',
            longitude='longitude:Q',
            latitude2='lat2',
            longitude2='lon2',
        ),
        # size airports by number of outgoing routes
        # 1. aggregate flights-airport data set
        # 2. lookup location data from airports data set
        # 3. remove Puerto Rico (PR) and Virgin Islands (VI)
        alt.Chart(flights).mark_circle().transform_aggregate(
            groupby=['origin'],
            routes='count()'
        ).transform_lookup(
            lookup='origin',
            from_=alt.LookupData(data=airports, key='iata',
                                 fields=['state', 'latitude', 'longitude'])
        ).transform_filter(
            'datum.state !== "PR" && datum.state !== "VI"'
        ).add_params(
            origin
        ).encode(
            latitude='latitude:Q',
            longitude='longitude:Q',
            tooltip=['origin:N', 'routes:Q'],
            size=alt.Size('routes:Q', scale=alt.Scale(range=[0, 1000]), legend=None),
            order=alt.Order('routes:Q', sort='descending')  # place smaller circles on top
        )
    ).project(
        type='albersUsa'
    ).properties(
        width=900,
        height=500
    ).configure_view(
        stroke=None
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Mouseover the map to probe the flight network!_
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Choropleth Maps
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    A [choropleth map](https://en.wikipedia.org/wiki/Choropleth_map) uses shaded or textured regions to visualize data values. Sized symbol maps are often more accurate to read, as people tend to be better at estimating proportional differences between the area of circles than between color shades. Nevertheless, choropleth maps are popular in practice and particularly useful when too many symbols become perceptually overwhelming.

    For example, while the United States only has 50 states, it has thousands of counties within those states. Let's build a choropleth map of the unemployment rate per county, back in the recession year of 2008. In some cases, input GeoJSON or TopoJSON files might include statistical data that we can directly visualize. In this case, however, we have two files: our TopoJSON file that includes county boundary features (`usa`), and a separate text file that contains unemployment statistics:
    """)
    return


@app.cell
def _(data):
    unemp = data.unemployment.url
    unemp
    return (unemp,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To integrate our data sources, we will again need to use the `lookup` transform, augmenting our TopoJSON-based `geoshape` data with unemployment rates. We can then create a map that includes a `color` encoding for the looked-up `rate` field.
    """)
    return


@app.cell
def _(alt, unemp, usa):
    alt.Chart(alt.topo_feature(usa, 'counties')).mark_geoshape(
        stroke='#aaa', strokeWidth=0.25
    ).transform_lookup(
        lookup='id', from_=alt.LookupData(data=unemp, key='id', fields=['rate'])
    ).encode(
        alt.Color('rate:Q',
                  scale=alt.Scale(domain=[0, 0.3], clamp=True),
                  legend=alt.Legend(format='%')),
        alt.Tooltip('rate:Q', format='.0%')
    ).project(
        type='albersUsa'
    ).properties(
        width=900,
        height=500
    ).configure_view(
        stroke=None
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    *Examine the unemployment rates by county. Higher values in Michigan may relate to the automotive industry. Counties in the [Great Plains](https://en.wikipedia.org/wiki/Great_Plains) and Mountain states exhibit both low **and** high rates. Is this variation meaningful, or is it possibly an [artifact of lower sample sizes](https://medium.com/@uwdata/surprise-maps-showing-the-unexpected-e92b67398865)? To explore further, try changing the upper scale domain (e.g., to `0.2`) to adjust the color mapping.*

    A central concern for choropleth maps is the choice of colors. Above, we use Altair's default `'yellowgreenblue'` scheme for heatmaps. Below we compare other schemes, including a _single-hue sequential_ scheme (`tealblues`) that varies in luminance only, a _multi-hue sequential_ scheme (`viridis`) that ramps in both luminance and hue, and a _diverging_ scheme (`blueorange`) that uses a white mid-point:
    """)
    return


@app.cell
def _(alt, unemp, usa):
    # utility function to generate a map specification for a provided color scheme
    def map_(scheme):
        return alt.Chart().mark_geoshape().project(type='albersUsa').encode(
            alt.Color('rate:Q', scale=alt.Scale(scheme=scheme), legend=None)
        ).properties(width=305, height=200)

    alt.hconcat(
        map_('tealblues'), map_('viridis'), map_('blueorange'),
        data=alt.topo_feature(usa, 'counties')
    ).transform_lookup(
        lookup='id', from_=alt.LookupData(data=unemp, key='id', fields=['rate'])
    ).configure_view(
        stroke=None
    ).resolve_scale(
        color='independent'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Which color schemes do you find to be more effective? Why might that be? Modify the maps above to use other available schemes, as described in the [Vega Color Schemes documentation](https://vega.github.io/vega/docs/schemes/)._
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Cartographic Projections
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Now that we have some experience creating maps, let's take a closer look at cartographic projections. As explained by [Wikipedia](https://en.wikipedia.org/wiki/Map_projection),

    > _All map projections necessarily distort the surface in some fashion. Depending on the purpose of the map, some distortions are acceptable and others are not; therefore, different map projections exist in order to preserve some properties of the sphere-like body at the expense of other properties._

    Some of the properties we might wish to consider include:

    - _Area_: Does the projection distort region sizes?
    - _Bearing_: Does a straight line correspond to a constant direction of travel?
    - _Distance_: Do lines of equal length correspond to equal distances on the globe?
    - _Shape_: Does the projection preserve spatial relations (angles) between points?

    Selecting an appropriate projection thus depends on the use case for the map. For example, if we are assessing land use and the extent of land matters, we might choose an area-preserving projection. If we want to visualize shockwaves emanating from an earthquake, we might focus the map on the quake's epicenter and preserve distances outward from that point. Or, if we wish to aid navigation, the preservation of bearing and shape may be more important.

    We can also characterize projections in terms of the _projection surface_. Cylindrical projections, for example, project surface points of the sphere onto a surrounding cylinder; the "unrolled" cylinder then provides our map. As we further describe below, we might alternatively project onto the surface of a cone (conic projections) or directly onto a flat plane (azimuthal projections).

    *Let's first build up our intuition by interacting with a variety of projections! **[Open the online Vega-Lite Cartographic Projections notebook](https://observablehq.com/@vega/vega-lite-cartographic-projections).** Use the controls on that page to select a projection and explore projection parameters, such as the `scale` (zooming) and x/y translation (panning). The rotation ([yaw, pitch, roll](https://en.wikipedia.org/wiki/Aircraft_principal_axes)) controls determine the orientation of the globe relative to the surface being projected upon.*
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### A Tour of Specific Projection Types
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    [**Cylindrical projections**](https://en.wikipedia.org/wiki/Map_projection#Cylindrical) map the sphere onto a surrounding cylinder, then unroll the cylinder. If the major axis of the cylinder is oriented north-south, meridians are mapped to straight lines. [Pseudo-cylindrical](https://en.wikipedia.org/wiki/Map_projection#Pseudocylindrical) projections represent a central meridian as a straight line, with other meridians "bending" away from the center.
    """)
    return


@app.cell
def _(alt, map):
    _minimap = map.properties(width=225, height=225)
    alt.hconcat(
        _minimap.project(type='equirectangular').properties(title='equirectangular'),
        _minimap.project(type='mercator').properties(title='mercator'),
        _minimap.project(type='transverseMercator').properties(title='transverseMercator'),
        _minimap.project(type='naturalEarth1').properties(title='naturalEarth1')
    ).properties(spacing=10).configure_view(stroke=None)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    - [Equirectangular](https://en.wikipedia.org/wiki/Equirectangular_projection) (`equirectangular`): Scale `lat`, `lon` coordinate values directly.
    - [Mercator](https://en.wikipedia.org/wiki/Mercator_projection) (`mercator`): Project onto a cylinder, using `lon` directly, but subjecting `lat` to a non-linear transformation. Straight lines preserve constant compass bearings ([rhumb lines](https://en.wikipedia.org/wiki/Rhumb_line)), making this projection well-suited to navigation. However, areas in the far north or south can be greatly distorted.
    - [Transverse Mercator](https://en.wikipedia.org/wiki/Transverse_Mercator_projection) (`transverseMercator`): A Mercator projection with the bounding cylinder rotated to a transverse axis. Whereas the standard Mercator projection has highest accuracy along the equator, the Transverse Mercator projection is most accurate along the central meridian.
    - [Natural Earth](https://en.wikipedia.org/wiki/Natural_Earth_projection) (`naturalEarth1`): A pseudo-cylindrical projection designed for showing the whole Earth in one view.

    <br/><br/>
    """)
    return
|
| 805 |
+
|
| 806 |
+
|
| 807 |
+
@app.cell(hide_code=True)
|
| 808 |
+
def _(mo):
|
| 809 |
+
mo.md(r"""
|
| 810 |
+
[**Conic projections**](https://en.wikipedia.org/wiki/Map_projection#Conic) map the sphere onto a cone, and then unroll the cone on to the plane. Conic projections are configured by two _standard parallels_, which determine where the cone intersects the globe.
|
| 811 |
+
""")
|
| 812 |
+
return
|
| 813 |
+
|
| 814 |
+
|
| 815 |
+
@app.cell
|
| 816 |
+
def _(alt, map):
|
| 817 |
+
_minimap = map.properties(width=180, height=130)
|
| 818 |
+
alt.hconcat(_minimap.project(type='conicEqualArea').properties(title='conicEqualArea'), _minimap.project(type='conicEquidistant').properties(title='conicEquidistant'), _minimap.project(type='conicConformal', scale=35, translate=[90, 65]).properties(title='conicConformal'), _minimap.project(type='albers').properties(title='albers'), _minimap.project(type='albersUsa').properties(title='albersUsa')).properties(spacing=10).configure_view(stroke=None)
|
| 819 |
+
return
|
| 820 |
+
|
| 821 |
+
|
| 822 |
+
@app.cell(hide_code=True)
|
| 823 |
+
def _(mo):
|
| 824 |
+
mo.md(r"""
|
| 825 |
+
- [Conic Equal Area](https://en.wikipedia.org/wiki/Albers_projection) (`conicEqualArea`): Area-preserving conic projection. Shape and distance are not preserved, but roughly accurate within standard parallels.
|
| 826 |
+
- [Conic Equidistant](https://en.wikipedia.org/wiki/Equidistant_conic_projection) (`conicEquidistant`): Conic projection that preserves distance along the meridians and standard parallels.
|
| 827 |
+
- [Conic Conformal](https://en.wikipedia.org/wiki/Lambert_conformal_conic_projection) (`conicConformal`): Conic projection that preserves shape (local angles), but not area or distance.
|
| 828 |
+
- [Albers](https://en.wikipedia.org/wiki/Albers_projection) (`albers`): A variant of the conic equal area projection with standard parallels optimized for creating maps of the United States.
|
| 829 |
+
- [Albers USA](https://en.wikipedia.org/wiki/Albers_projection) (`albersUsa`): A hybrid projection for the 50 states of the United States of America. This projection stitches together three Albers projections with different parameters for the continental U.S., Alaska, and Hawaii.
|
| 830 |
+
<br/><br/>
|
| 831 |
+
""")
|
| 832 |
+
return
|
| 833 |
+
|
| 834 |
+
|
| 835 |
+
@app.cell(hide_code=True)
|
| 836 |
+
def _(mo):
|
| 837 |
+
mo.md(r"""
|
| 838 |
+
[**Azimuthal projections**](https://en.wikipedia.org/wiki/Map_projection#Azimuthal_%28projections_onto_a_plane%29) map the sphere directly onto a plane.
|
| 839 |
+
""")
|
| 840 |
+
return
|
| 841 |
+
|
| 842 |
+
|
| 843 |
+
@app.cell
|
| 844 |
+
def _(alt, map):
|
| 845 |
+
_minimap = map.properties(width=180, height=180)
|
| 846 |
+
alt.hconcat(_minimap.project(type='azimuthalEqualArea').properties(title='azimuthalEqualArea'), _minimap.project(type='azimuthalEquidistant').properties(title='azimuthalEquidistant'), _minimap.project(type='orthographic').properties(title='orthographic'), _minimap.project(type='stereographic').properties(title='stereographic'), _minimap.project(type='gnomonic').properties(title='gnomonic')).properties(spacing=10).configure_view(stroke=None)
|
| 847 |
+
return
|
| 848 |
+
|
| 849 |
+
|
| 850 |
+
@app.cell(hide_code=True)
|
| 851 |
+
def _(mo):
|
| 852 |
+
mo.md(r"""
|
| 853 |
+
- [Azimuthal Equal Area](https://en.wikipedia.org/wiki/Lambert_azimuthal_equal-area_projection) (`azimuthalEqualArea`): Accurately projects area in all parts of the globe, but does not preserve shape (local angles).
|
| 854 |
+
- [Azimuthal Equidistant](https://en.wikipedia.org/wiki/Azimuthal_equidistant_projection) (`azimuthalEquidistant`): Preserves proportional distance from the projection center to all other points on the globe.
|
| 855 |
+
- [Orthographic](https://en.wikipedia.org/wiki/Orthographic_projection_in_cartography) (`orthographic`): Projects a visible hemisphere onto a distant plane. Approximately matches a view of the Earth from outer space.
|
| 856 |
+
- [Stereographic](https://en.wikipedia.org/wiki/Stereographic_projection) (`stereographic`): Preserves shape, but not area or distance.
|
| 857 |
+
- [Gnomonic](https://en.wikipedia.org/wiki/Gnomonic_projection) (`gnomonic`): Projects the surface of the sphere directly onto a tangent plane. [Great circles](https://en.wikipedia.org/wiki/Great_circle) around the Earth are projected to straight lines, showing the shortest path between points.
|
| 858 |
+
<br/><br/>
|
| 859 |
+
""")
|
| 860 |
+
return
|

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Coda: Wrangling Geographic Data
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    The examples above all draw from the vega-datasets collection, including geometric (TopoJSON) and tabular (airports, unemployment rates) data. A common challenge in getting started with geographic visualization is collecting the necessary data for your task. Many data providers exist, including services such as the [United States Geological Survey](https://www.usgs.gov/products/data/all-data) and the [U.S. Census Bureau](https://www.census.gov/data/datasets.html).

    In many cases you may have existing data with a geographic component, but require additional measures or geometry. To help you get started, here is one workflow:

    1. Visit [Natural Earth Data](http://www.naturalearthdata.com/downloads/) and browse to select data for regions and resolutions of interest. Download the corresponding zip file(s).
    2. Go to [MapShaper](https://mapshaper.org/) and drop your downloaded zip file onto the page. Revise the data as desired, and then "Export" generated TopoJSON or GeoJSON files.
    3. Load the exported data from MapShaper for use with Altair!

    Of course, many other tools – both open-source and proprietary – exist for working with geographic data. For more about geo-data wrangling and map creation, see Mike Bostock's tutorial series on [Command-Line Cartography](https://medium.com/@mbostock/command-line-cartography-part-1-897aa8f8ca2c).
    """)
    return

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Summary

    At this point, we've only dipped our toes into the waters of map-making. _(You didn't expect a single notebook to impart centuries of learning, did you?)_ For example, we left untouched topics such as [_cartograms_](https://en.wikipedia.org/wiki/Cartogram) and conveying [_topography_](https://en.wikipedia.org/wiki/Topography) — as in Imhof's illuminating book [_Cartographic Relief Presentation_](https://books.google.com/books?id=cVy1Ms43fFYC). Nevertheless, you should now be well-equipped to create a rich array of geo-visualizations. For more, MacEachren's book [_How Maps Work: Representation, Visualization, and Design_](https://books.google.com/books?id=xhAvN3B0CkUC) provides a valuable overview of map-making from the perspective of data visualization.
    """)
    return


if __name__ == "__main__":
    app.run()
@@ -0,0 +1,370 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "altair==6.0.0",
#     "marimo",
#     "pandas==3.0.1",
#     "vega_datasets==0.9.0",
# ]
# ///

import marimo

__generated_with = "0.20.4"
app = marimo.App()


@app.cell
def _():
    import marimo as mo

    return (mo,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    # Altair Debugging Guide

    In this notebook we show you common debugging techniques that you can use if you run into issues with Altair.

    You can jump to the following sections:

    * [Installation and Setup](#Installation) when Altair is not installed correctly
    * [Display Issues](#Display-Troubleshooting) when you don't see a chart
    * [Invalid Specifications](#Invalid-Specifications) when you get an error
    * [Properties are Being Ignored](#Properties-are-Being-Ignored) when you don't see any errors or warnings
    * [Asking for Help](#Asking-for-Help) when you get stuck
    * [Reporting Issues](#Reporting-Issues) when you find a bug

    In addition to this notebook, you might find the [Frequently Asked Questions](https://altair-viz.github.io/user_guide/faq.html) and [Display Troubleshooting](https://altair-viz.github.io/user_guide/troubleshooting.html) guides helpful.

    _This notebook is part of the [data visualization curriculum](https://github.com/uwdata/visualization-curriculum)._
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Installation
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    These instructions follow [the Altair documentation](https://altair-viz.github.io/getting_started/installation.html) but focus on some specifics for this series of notebooks.

    In every notebook, we will import the [Altair](https://github.com/altair-viz/altair) and [Vega Datasets](https://github.com/altair-viz/vega_datasets) packages. If you are running this notebook on [Colab](https://colab.research.google.com), Altair and Vega Datasets should be preinstalled and ready to go. The notebooks in this series are designed for Colab but also work in Jupyter Lab and the Jupyter Notebook; the Jupyter Notebook requires a bit more setup ([described below](#Special-Setup-for-the-Jupyter-Notebook)) and additional packages must be installed.

    If you are running in Jupyter Lab or the Jupyter Notebook, install the necessary packages by running the following command in your terminal.

    ```bash
    pip install altair vega_datasets
    ```

    Or if you use [Conda](https://conda.io):

    ```bash
    conda install -c conda-forge altair vega_datasets
    ```

    You can run command line commands from a code cell by prefixing them with `!`. For example, to install Altair and Vega Datasets with [Pip](https://pip.pypa.io/), you can run the following cell.
    """)
    return


@app.cell
def _():
    # packages added via marimo's package management: altair vega_datasets
    # !pip install altair vega_datasets
    return


@app.cell
def _():
    import altair as alt
    from vega_datasets import data

    return alt, data


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Make Sure You are Using the Latest Version of Altair
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    If you are running into issues with Altair, first make sure that you are running the latest version. To check the version of Altair that you have installed, run the cell below.
    """)
    return


@app.cell
def _(alt):
    alt.__version__
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    To check what the latest version of Altair is, go to [this page](https://pypi.org/project/altair/) or run the cell below (requires Python 3).
    """)
    return


@app.cell
def _():
    import urllib.request, json

    with urllib.request.urlopen("https://pypi.org/pypi/altair/json") as url:
        print(json.loads(url.read().decode())['info']['version'])
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    If you are not running the latest version, you can update it with `pip`. You can update Altair and Vega Datasets by running this command in your terminal.

    ```
    pip install -U altair vega_datasets
    ```
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Try Making a Chart
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Now you can create an Altair chart.
    """)
    return


@app.cell
def _(alt, data):
    cars = data.cars()

    alt.Chart(cars).mark_point().encode(
        x='Horsepower',
        y='Displacement',
        color='Origin'
    )
    return (cars,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Special Setup for the Jupyter Notebook
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    If you are running in Jupyter Lab, Jupyter Notebook, or Colab (and have a working Internet connection) you should be seeing a chart. If you are running in another environment (or offline), you will need to tell Altair to use a different renderer.

    To activate a different renderer in a notebook cell:

    ```python
    # to run in nteract, VSCode, or offline in JupyterLab
    alt.renderers.enable('mimebundle')
    ```

    To run offline in the Jupyter Notebook you must install an additional dependency, the `vega` package. Run this command in your terminal:

    ```bash
    pip install vega
    ```

    Then activate the notebook renderer:

    ```python
    # to run offline in Jupyter Notebook
    alt.renderers.enable('notebook')
    ```

    These instructions follow [the instructions on the Altair website](https://altair-viz.github.io/getting_started/installation.html#installation-notebook).
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Display Troubleshooting

    If you are having issues with seeing a chart, make sure your setup is correct by following the [debugging instructions above](#Installation). If you are still having issues, follow the [instructions about debugging display issues in the Altair documentation](https://iliatimofeev.github.io/altair-viz.github.io/user_guide/troubleshooting.html).
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Non-Existent Fields

    A common error is [accidentally using a field that does not exist](https://iliatimofeev.github.io/altair-viz.github.io/user_guide/troubleshooting.html#plot-displays-but-the-content-is-empty).
    """)
    return


@app.cell
def _(alt):
    import pandas as pd

    df = pd.DataFrame({'x': [1, 2, 3],
                       'y': [3, 1, 4]})

    alt.Chart(df).mark_point().encode(
        x='x:Q',
        y='y:Q',
        color='color:Q'  # <-- this field does not exist in the data!
    )
    return (df,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Check the spelling of your fields and print the data source to confirm that the data and fields exist. For instance, here you see that `color` is not a valid field.
    """)
    return


@app.cell
def _(df):
    df.head()
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Invalid Specifications

    Another common issue is creating an invalid specification and getting an error.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ### Invalid Properties

    Altair might show a `SchemaValidationError` or `ValueError`. Read the error message carefully; usually it will tell you what is going wrong.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    For example, if you forget the mark type, you will see this `SchemaValidationError`.
    """)
    return


@app.cell
def _(alt, cars):
    alt.Chart(cars).encode(
        y='Horsepower'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    Or if you use a non-existent channel, you get a `TypeError`.
    """)
    return


@app.cell
def _(alt, cars):
    try:
        alt.Chart(cars).mark_point().encode(
            z='Horsepower'
        )
    except TypeError as e:
        print(f"TypeError: {e}")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Properties are Being Ignored

    Altair might ignore a property that you specified. In the chart below, we are using a `text` channel, which is only compatible with `mark_text`. You do not see an error or a warning about this in the notebook. However, the underlying Vega-Lite library will show a warning in the browser console. Press <kbd>Alt</kbd>+<kbd>Cmd</kbd>+<kbd>I</kbd> on Mac or <kbd>Alt</kbd>+<kbd>Ctrl</kbd>+<kbd>I</kbd> on Windows and Linux to open the developer tools and click on the `Console` tab. When you run the example in the cell below, you will see the following warning.

    ```
    WARN text dropped as it is incompatible with "bar".
    ```
    """)
    return


@app.cell
def _(alt, cars):
    alt.Chart(cars).mark_bar().encode(
        y='mean(Horsepower)',
        text='mean(Acceleration)'
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    If you find yourself debugging issues related to Vega-Lite, you can open the chart in the [Vega Editor](https://vega.github.io/editor/), either by clicking on the "Open in Vega Editor" link at the bottom of the chart or via the action menu (click to open) at the top right of a chart. The Vega Editor provides additional debugging, but you will be writing Vega-Lite JSON instead of Altair in Python.

    **Note**: The Vega Editor may be using a newer version of Vega-Lite, so the behavior may vary.
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Asking for Help

    If you find a problem with Altair and get stuck, you can ask a question on Stack Overflow. Ask your question with the `altair` and `vega-lite` tags. You can find a list of previously asked questions [here](https://stackoverflow.com/questions/tagged/altair).
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Reporting Issues

    If you find a problem with Altair and believe it is a bug, please [create an issue in the Altair GitHub repo](https://github.com/altair-viz/altair/issues/new) with a description of your problem. If you believe the issue is related to the underlying Vega-Lite library, please [create an issue in the Vega-Lite GitHub repo](https://github.com/vega/vega-lite/issues/new).
    """)
    return


if __name__ == "__main__":
    app.run()
The diff for this file is too large to render; see the raw diff.

@@ -0,0 +1,14 @@
---
title: Learn Altair
description: >
  Learn the basics of Altair, a high-performance visualization library,
  using lessons developed at the University of Washington.
---

## Acknowledgments

These notebooks were created by Jeffrey Heer, Dominik Moritz, Jake VanderPlas, and Brock Craft
as part of the [Visualization Curriculum](https://uwdata.github.io/visualization-curriculum/intro.html)
at the University of Washington.
Our thanks to the authors for making their work available under an open license:
if we all share a little, we all get a lot.
@@ -0,0 +1,51 @@
:root {
    --primary-green: #10B981;
    --dark-green: #047857;
    --light-green: #D1FAE5;
}
.bg-primary { background-color: var(--primary-green); }
.text-primary { color: var(--primary-green); }
.border-primary { border-color: var(--primary-green); }
.bg-light { background-color: var(--light-green); }
.hover-grow { transition: transform 0.2s ease; }
.hover-grow:hover { transform: scale(1.02); }
.card-shadow { box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05), 0 1px 3px rgba(0, 0, 0, 0.1); }

/* Prose styles for markdown-generated content */
.prose h1 { font-size: 1.875rem; font-weight: 700; color: #1f2937; margin: 1.5rem 0 0.75rem; }
.prose h2 { font-size: 1.5rem; font-weight: 700; color: #1f2937; margin: 1.5rem 0 0.75rem; }
.prose h3 { font-size: 1.25rem; font-weight: 600; color: #1f2937; margin: 1.25rem 0 0.5rem; }
.prose h4 { font-size: 1.125rem; font-weight: 600; color: #1f2937; margin: 1rem 0 0.5rem; }
.prose p { color: #4b5563; margin-bottom: 1rem; line-height: 1.75; }
.prose ul { list-style-type: disc; padding-left: 1.25rem; margin-bottom: 1rem; color: #4b5563; }
.prose ol { list-style-type: decimal; padding-left: 1.25rem; margin-bottom: 1rem; color: #4b5563; }
.prose li { margin-bottom: 0.25rem; line-height: 1.75; }
.prose a { color: var(--primary-green); }
.prose a:hover { color: var(--dark-green); }
.prose strong { font-weight: 600; }
.prose code { font-family: ui-monospace, monospace; font-size: 0.875em;
              background-color: #f3f4f6; padding: 0.1em 0.3em; border-radius: 0.25rem; }
.prose pre { background-color: #f3f4f6; color: #1f2937; padding: 1rem;
             border-radius: 0.5rem; overflow-x: auto; margin-bottom: 1rem; }
.prose pre code { background: none; padding: 0; font-size: 0.875rem; color: inherit; }

/* Component classes */
.logo-container { background-color: var(--light-green); padding: 0.25rem; border-radius: 0.5rem; }
.card-accent { height: 0.5rem; background-color: var(--primary-green); }
.feature-card { background-color: #ffffff; padding: 1.5rem; border-radius: 0.5rem;
                box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05), 0 1px 3px rgba(0, 0, 0, 0.1); }
.content-card { background-color: #ffffff; border: 1px solid #e5e7eb; border-radius: 0.5rem;
                overflow: hidden; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05), 0 1px 3px rgba(0, 0, 0, 0.1); }
.icon-container { width: 3rem; height: 3rem; background-color: var(--light-green);
                  border-radius: 9999px; display: flex; align-items: center;
                  justify-content: center; margin-bottom: 1rem; }

.link-primary { color: var(--primary-green); }
.link-primary:hover { color: var(--dark-green); }

.btn-primary { background-color: var(--primary-green); color: #ffffff; font-weight: 500;
               border-radius: 0.375rem; transition: background-color 300ms ease-in-out; }
.btn-primary:hover { background-color: var(--dark-green); }

.footer-link { color: #d1d5db; transition: color 300ms ease-in-out; }
.footer-link:hover { color: #ffffff; }
@@ -0,0 +1,93 @@
#!/usr/bin/env python
"""Generate a static site from Jinja2 templates and lesson data."""

import argparse
import datetime
import json
import re
import shutil
from pathlib import Path

import frontmatter
import markdown as md
from jinja2 import Environment, FileSystemLoader

from utils import get_notebook_title


def transform_lessons(data: dict, root: Path) -> dict:
    """Transform raw lesson data into template-ready form."""
    for course_id, course in data.items():
        desc = course.get("description", "").strip()
        course["description_html"] = f"<p>{desc}</p>" if desc else ""
        course["notebooks"] = [
            {
                "title": get_notebook_title(root / course_id / nb)
                or re.sub(r"^\d+_", "", nb.replace(".py", "")).replace("_", " ").title(),
                "html_path": f"{course_id}/{nb.replace('.py', '.html')}",
                "local_html_path": nb.replace(".py", ".html"),
            }
            for nb in course.get("notebooks", [])
        ]
        index_md = root / course_id / "index.md"
        post = frontmatter.load(index_md)
        course["body_html"] = md.markdown(post.content, extensions=["fenced_code", "tables"])
    return data


def render(template, path, **kwargs):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(template.render(**kwargs))


def main():
    parser = argparse.ArgumentParser(description="Generate static site from lesson data")
    parser.add_argument("--root", required=True, help="Project root directory")
    parser.add_argument("--output", required=True, help="Output directory")
    parser.add_argument("--data", required=True, help="Path to lessons JSON file")
    args = parser.parse_args()

    root = Path(args.root)
    output = Path(args.output)
    output.mkdir(parents=True, exist_ok=True)

    lessons = transform_lessons(json.loads(Path(args.data).read_text()), root)
    env = Environment(loader=FileSystemLoader(root / "templates"))
    current_year = datetime.date.today().year

    render(
        env.get_template("index.html"),
        output / "index.html",
        courses=lessons,
        current_year=current_year,
        root_path="",
    )

    assets_src = root / "assets"
    if assets_src.exists():
        shutil.copytree(assets_src, output / "assets", dirs_exist_ok=True)

    for course_id, lesson in lessons.items():
        render(
            env.get_template("lesson.html"),
            output / course_id / "index.html",
            lesson=lesson,
            current_year=current_year,
            root_path="../",
        )

    page_template = env.get_template("page.html")
    for page_src in sorted((root / "pages").glob("*.md")):
        post = frontmatter.load(page_src)
        render(
            page_template,
            output / page_src.stem / "index.html",
            title=post.get("title", page_src.stem),
            body_html=md.markdown(post.content, extensions=["fenced_code", "tables"]),
            current_year=current_year,
            root_path="../",
        )


if __name__ == "__main__":
    main()
```diff
@@ -1,4 +1,4 @@
-#!/usr/bin/env
+#!/usr/bin/env python
 """
 Script to detect empty cells in marimo notebooks.
 
@@ -15,7 +15,6 @@ This script will:
 """
 
 import os
-import re
 import sys
 from pathlib import Path
 from typing import List, Tuple
@@ -136,4 +135,4 @@ def main():
 
 
 if __name__ == "__main__":
-    main()
+    main()
```
@@ -0,0 +1,21 @@
```python
#!/usr/bin/env python
"""Report marimo notebooks that are missing an H1 title."""

import sys
from pathlib import Path

from utils import get_notebook_title


def main():
    root = Path(__file__).parent.parent
    notebooks = sorted(root.glob("*/[0-9]*.py"))
    missing = [nb for nb in notebooks if get_notebook_title(nb) is None]
    if missing:
        for nb in missing:
            print(nb.relative_to(root))
        sys.exit(1)


if __name__ == "__main__":
    main()
```
@@ -0,0 +1,110 @@
```python
#!/usr/bin/env python
"""Check that marimo notebooks in the same lesson directory agree on package versions.

It is acceptable for different notebooks in a directory to specify different packages,
but if two or more notebooks specify the same package, their version constraints must
be identical.
"""

import argparse
import re
import sys
from collections import defaultdict
from pathlib import Path


# Regex to extract the inline script metadata block (PEP 723)
SCRIPT_BLOCK_RE = re.compile(r"^# /// script\s*\n((?:#[^\n]*\n)*?)# ///", re.MULTILINE)
DEPENDENCY_LINE_RE = re.compile(r'^#\s+"([^"]+)",?\s*$')


def parse_script_header(text: str) -> list[str]:
    """Return the list of dependency strings from a PEP 723 script header, or []."""
    match = SCRIPT_BLOCK_RE.search(text)
    if not match:
        return []
    block = match.group(1)
    deps: list[str] = []
    in_deps = False
    for raw_line in block.splitlines():
        line = raw_line.lstrip("#").strip()
        if line.startswith("dependencies"):
            in_deps = True
            continue
        if in_deps:
            if line.startswith("]"):
                break
            # strip surrounding quotes and comma: e.g. ' "polars==1.0",' -> 'polars==1.0'
            stripped = line.strip().strip('"\'').rstrip(",").strip('"\'')
            if stripped:
                deps.append(stripped)
    return deps


def package_name(dep: str) -> str:
    """Extract the bare package name from a PEP 508 dependency string.

    Examples:
        "polars==1.22.0" -> "polars"
        "pandas>=2.0,<3" -> "pandas"
        "marimo" -> "marimo"
    """
    return re.split(r"[><=!;\s\[]", dep, maxsplit=1)[0].lower()


def check_directory(lesson_dir: Path, only: set[str]) -> list[str]:
    """Return a list of error messages for version inconsistencies among *only* in lesson_dir."""
    # Map package name -> {version_spec: [notebook_path, ...]}
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))

    for nb in sorted(lesson_dir.glob("*.py")):
        if nb.name not in only:
            continue
        try:
            text = nb.read_text(encoding="utf-8")
        except IOError:
            continue
        if "marimo.App" not in text:
            continue
        for dep in parse_script_header(text):
            name = package_name(dep)
            seen[name][dep].append(nb.name)

    errors: list[str] = []
    for name, specs in sorted(seen.items()):
        if len(specs) > 1:
            errors.append(f"  Package '{name}' has conflicting specifications:")
            for spec, files in sorted(specs.items()):
                errors.append(f"    {spec!r} in: {', '.join(files)}")
    return errors


def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("notebooks", nargs="+", metavar="NOTEBOOK",
                        help="notebook files to check (grouped by directory)")
    args = parser.parse_args()

    dir_filter: dict[Path, set[str]] = defaultdict(set)
    for nb_path in (Path(p) for p in args.notebooks):
        dir_filter[nb_path.parent].add(nb_path.name)

    total_errors = 0
    for lesson_dir, only in sorted(dir_filter.items()):
        errors = check_directory(lesson_dir, only=only)
        if errors:
            print(f"\n{lesson_dir}/")
            for msg in errors:
                print(msg)
            total_errors += len(errors)

    if total_errors:
        print(f"\nFound package version inconsistencies in {total_errors} package(s).")
        sys.exit(1)
    else:
        print("All package version specifications are consistent.")
        sys.exit(0)


if __name__ == "__main__":
    main()
```
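For illustration, this is what a PEP 723 script header looks like in a notebook and what the parsing logic above extracts from it. The sketch below is a standalone copy of that logic, and `HEADER` is a made-up example rather than a notebook from the repository:

```python
import re

# Same regex as in the checker above.
SCRIPT_BLOCK_RE = re.compile(r"^# /// script\s*\n((?:#[^\n]*\n)*?)# ///", re.MULTILINE)

HEADER = '''# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "polars==1.22.0",
#     "marimo",
# ]
# ///
import marimo
'''


def parse_deps(text: str) -> list[str]:
    """Standalone copy of parse_script_header, for illustration only."""
    match = SCRIPT_BLOCK_RE.search(text)
    if not match:
        return []
    deps, in_deps = [], False
    for raw in match.group(1).splitlines():
        line = raw.lstrip("#").strip()
        if line.startswith("dependencies"):
            in_deps = True
            continue
        if in_deps:
            if line.startswith("]"):
                break
            stripped = line.strip().strip('"\'').rstrip(",").strip('"\'')
            if stripped:
                deps.append(stripped)
    return deps


print(parse_deps(HEADER))  # ['polars==1.22.0', 'marimo']
```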
@@ -0,0 +1,22 @@
```sql
create table job (
    name text not null,
    credits real not null
);

create table work (
    person text not null,
    job text not null
);

insert into job values
('calibrate', 1.5),
('clean', 0.5);

insert into work values
('Amal', 'calibrate'),
('Amal', 'clean'),
('Amal', 'complain'),
('Gita', 'clean'),
('Gita', 'clean'),
('Gita', 'complain'),
('Madhi', 'complain');
```
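As a quick sanity check of the schema above, this sketch loads the same tables into an in-memory SQLite database and runs an illustrative aggregation (the query is an example, not part of the commit). Note that 'complain' matches no row in `job`, so an inner join drops it:

```python
import sqlite3

# Same schema and rows as the SQL file above, inlined for a self-contained run.
SQL = """
create table job (name text not null, credits real not null);
create table work (person text not null, job text not null);
insert into job values ('calibrate', 1.5), ('clean', 0.5);
insert into work values
  ('Amal', 'calibrate'), ('Amal', 'clean'), ('Amal', 'complain'),
  ('Gita', 'clean'), ('Gita', 'clean'), ('Gita', 'complain'),
  ('Madhi', 'complain');
"""

con = sqlite3.connect(":memory:")
con.executescript(SQL)

# Total credits per person; 'complain' has no matching job row, so Madhi
# (who only complains) does not appear in the result at all.
rows = con.execute(
    "select person, sum(credits) from work join job on work.job = job.name "
    "group by person order by person"
).fetchall()
print(rows)  # [('Amal', 2.0), ('Gita', 1.0)]
```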
@@ -0,0 +1,50 @@
```python
#!/usr/bin/env python

import csv
import sqlite3
import sys


SCHEMA = """
CREATE TABLE penguins (
    species text,
    island text,
    bill_length_mm real,
    bill_depth_mm real,
    flipper_length_mm real,
    body_mass_g real,
    sex text
);
"""

def main():
    infile = sys.argv[1]
    outfile = sys.argv[2]

    con = sqlite3.connect(outfile)
    con.execute(SCHEMA)

    with open(infile, newline="") as f:
        reader = csv.DictReader(f)
        rows = [
            (
                row["species"],
                row["island"],
                float(row["bill_length_mm"]) if row["bill_length_mm"] else None,
                float(row["bill_depth_mm"]) if row["bill_depth_mm"] else None,
                float(row["flipper_length_mm"]) if row["flipper_length_mm"] else None,
                float(row["body_mass_g"]) if row["body_mass_g"] else None,
                row["sex"] if row["sex"] else None,
            )
            for row in reader
        ]

    con.executemany(
        "INSERT INTO penguins VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    con.commit()
    con.close()


if __name__ == "__main__":
    main()
```
@@ -0,0 +1,175 @@
```python
#!/usr/bin/env python

import datetime
import faker
import itertools
import random
import sqlite3
import sys


LOCALE = "es"

NUM_PERSONS = 6

DATE_START = datetime.date(2025, 9, 1)
DATE_END = datetime.date(2025, 12, 31)
DATE_DURATION = 7

NUM_MACHINES = 5

CREATE_PERSONS = """\
create table person(
    person_id text not null primary key,
    personal text not null,
    family text not null,
    supervisor_id text,
    foreign key(supervisor_id) references person(person_id)
);
"""
INSERT_PERSONS = """\
insert into person values (:person_id, :personal, :family, :supervisor_id);
"""

CREATE_SURVEYS = """\
create table survey(
    survey_id text not null primary key,
    person_id text not null,
    start_date text,
    end_date text,
    foreign key(person_id) references person(person_id)
);
"""
INSERT_SURVEYS = """\
insert into survey values(:survey_id, :person_id, :start, :end);
"""

CREATE_MACHINES = """\
create table machine(
    machine_id text not null primary key,
    machine_type text not null
);
"""
INSERT_MACHINES = """\
insert into machine values(:machine_id, :machine_type);
"""

CREATE_RATINGS = """\
create table rating(
    person_id text not null,
    machine_id text not null,
    level integer,
    foreign key(person_id) references person(person_id),
    foreign key(machine_id) references machine(machine_id)
);
"""
INSERT_RATINGS = """\
insert into rating values(:person_id, :machine_id, :level);
"""

def main():
    db_name = sys.argv[1]
    seed = int(sys.argv[2])
    random.seed(seed)

    persons_counter = itertools.count()
    next(persons_counter)
    persons = gen_persons(NUM_PERSONS, persons_counter)

    supers = gen_persons(int(NUM_PERSONS / 2), persons_counter)
    for p in persons:
        p["supervisor_id"] = random.choice(supers)["person_id"]
    if len(supers) > 1:
        supers[0]["supervisor_id"] = supers[-1]["person_id"]

    surveys = gen_surveys(persons + supers[0:int(len(supers)/2)])
    surveys[int(len(surveys)/2)]["start"] = None

    cnx = sqlite3.connect(db_name)
    cur = cnx.cursor()

    everyone = persons + supers
    random.shuffle(everyone)
    cur.execute(CREATE_PERSONS)
    cur.executemany(INSERT_PERSONS, everyone)

    cur.execute(CREATE_SURVEYS)
    cur.executemany(INSERT_SURVEYS, surveys)

    machines = gen_machines()
    cur.execute(CREATE_MACHINES)
    cur.executemany(INSERT_MACHINES, machines)

    ratings = gen_ratings(everyone, machines)
    cur.execute(CREATE_RATINGS)
    cur.executemany(INSERT_RATINGS, ratings)

    cnx.commit()
    cnx.close()


def gen_machines():
    adjectives = "hydraulic rotary modular industrial automated".split()
    nouns = "press conveyor generator actuator compressor".split()
    machines = set()
    while len(machines) < NUM_MACHINES:
        candidate = f"{random.choice(adjectives)} {random.choice(nouns)}"
        if candidate not in machines:
            machines.add(candidate)
    counter = itertools.count()
    next(counter)
    return [
        {"machine_id": f"M{next(counter):04d}", "machine_type": m}
        for m in machines
    ]


def gen_persons(num, counter):
    fake = faker.Faker(LOCALE)
    fake.seed_instance(random.randint(0, 1_000_000))
    return [
        {
            "person_id": f"P{next(counter):03d}",
            "personal": fake.first_name(),
            "family": fake.last_name(),
            "supervisor_id": None,
        }
        for _ in range(num)
    ]


def gen_ratings(persons, machines):
    temp = {}
    while len(temp) < int(len(persons) * len(machines) / 4):
        p = random.choice(persons)["person_id"]
        m = random.choice(machines)["machine_id"]
        if (p, m) in temp:
            continue
        temp[(p, m)] = random.choice([None, 1, 2, 3])
    return [
        {"person_id": p, "machine_id": m, "level": v}
        for ((p, m), v) in temp.items()
    ]

def gen_surveys(persons):
    surveys = []
    counter = itertools.count()
    next(counter)
    for person in persons:
        person_id = person["person_id"]
        start = DATE_START
        while start <= DATE_END:
            survey_id = f"S{next(counter):04d}"
            end = start + datetime.timedelta(days=random.randint(1, DATE_DURATION))
            surveys.append({
                "survey_id": survey_id,
                "person_id": person_id,
                "start": start.isoformat(),
                "end": end.isoformat() if end <= DATE_END else None
            })
            start = end + datetime.timedelta(days=random.randint(1, DATE_DURATION))
    return surveys


if __name__ == "__main__":
    main()
```
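The generator above calls `next()` on each fresh `itertools.count` once before using it, so that identifiers start at 1 rather than 0. A minimal sketch of that idiom:

```python
import itertools

# Discard the first value (0) so the first ID is P001, not P000,
# mirroring the persons_counter idiom in the generator above.
counter = itertools.count()
next(counter)
ids = [f"P{next(counter):03d}" for _ in range(3)]
print(ids)  # ['P001', 'P002', 'P003']
```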
@@ -0,0 +1,47 @@
```python
#!/usr/bin/env python
"""Extract lesson metadata and notebook lists into a JSON file."""

import argparse
import json
import re
from pathlib import Path

import frontmatter


NOTEBOOK_PATTERN = re.compile(r"^\d{2}_.*\.py$")


def extract_lessons(root: Path) -> dict:
    lessons = {}
    for index_file in sorted(root.glob("*/index.md")):
        lesson_dir = index_file.parent
        post = frontmatter.load(index_file)
        notebooks = sorted(
            p.name
            for p in lesson_dir.glob("*.py")
            if NOTEBOOK_PATTERN.match(p.name)
        )
        lessons[lesson_dir.name] = {
            **post.metadata,
            "notebooks": notebooks,
        }
    return lessons


def main():
    parser = argparse.ArgumentParser(description="Extract lesson metadata to JSON")
    parser.add_argument("--root", required=True, help="Project root directory")
    parser.add_argument("--data", required=True, help="Output JSON file")
    args = parser.parse_args()

    root = Path(args.root)
    data = Path(args.data)
    data.parent.mkdir(parents=True, exist_ok=True)

    lessons = extract_lessons(root)
    data.write_text(json.dumps(lessons, indent=2))


if __name__ == "__main__":
    main()
```
```diff
@@ -1,10 +1,9 @@
-#!/usr/bin/env
+#!/usr/bin/env python
 
 import os
 import subprocess
 import argparse
 import webbrowser
-import time
 import sys
 from pathlib import Path
 
```
@@ -0,0 +1,11 @@
```bash
#!/usr/bin/env bash
# Run each notebook from its own directory; print output only on failure.
# Arguments and paths are quoted so filenames with spaces survive.
for nb in "$@"
do
    cd "$(dirname "$nb")"
    if ! output=$(uv run "$(basename "$nb")" 2>&1); then
        echo "=== $nb ==="
        echo "$output"
        echo
    fi
    cd "$OLDPWD"
done
```
@@ -0,0 +1,14 @@
```python
"""Utility functions for working with marimo notebooks."""

import re
from pathlib import Path


def get_notebook_title(path: Path) -> str | None:
    """Return the first H1 Markdown heading in a marimo notebook, or None."""
    text = path.read_text(encoding="utf-8")
    for match in re.finditer(r'mo\.md\(r?f?"""(.*?)"""', text, re.DOTALL):
        for line in match.group(1).splitlines():
            if line.strip().startswith("# "):
                return line.strip()[2:].strip()
    return None
```
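A standalone sketch of the H1-extraction regex above, run against a made-up marimo cell (`SAMPLE` is illustrative, not a notebook from the repository):

```python
import re

# A fabricated marimo cell body; the regex only matches when mo.md( and the
# (optionally r/f-prefixed) triple quote are adjacent, as marimo writes them.
SAMPLE = '''
@app.cell
def _(mo):
    mo.md(r"""
    # Introduction to SQL
    Some body text.
    """)
'''


def first_h1(text):
    """Standalone copy of the title-extraction logic, for illustration only."""
    for match in re.finditer(r'mo\.md\(r?f?"""(.*?)"""', text, re.DOTALL):
        for line in match.group(1).splitlines():
            if line.strip().startswith("# "):
                return line.strip()[2:].strip()
    return None


print(first_h1(SAMPLE))  # Introduction to SQL
```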
````diff
@@ -1,31 +0,0 @@
----
-title: Readme
-marimo-version: 0.18.4
----
-
-# Learn Daft
-
-_🚧 This collection is a work in progress. Please help us add notebooks!_
-
-This collection of marimo notebooks is designed to teach you the basics of
-Daft, a distributed dataframe engine that unifies data engineering, analytics & ML/AI workflows.
-
-**Help us build this course! ⚒️**
-
-We're seeking contributors to help us build these notebooks. Every contributor
-will be acknowledged as an author in this README and in their contributed
-notebooks. Head over to the [tracking
-issue](https://github.com/marimo-team/learn/issues/43) to sign up for a planned
-notebook or propose your own.
-
-**Running notebooks.** To run a notebook locally, use
-
-```bash
-uvx marimo edit <file_url>
-```
-
-You can also open notebooks in our online playground by appending marimo.app/ to a notebook's URL.
-
-**Thanks to all our notebook authors!**
-
-* [Péter Gyarmati](https://github.com/peter-gy)
````
@@ -0,0 +1,13 @@
```markdown
---
title: Learn Daft
description: >
  These notebooks introduce Daft, a distributed dataframe engine
  that unifies data engineering, analysis, and ML/AI workflows.
tracking: 43
---

## Contributors

Thanks to our notebook authors:

* [Péter Gyarmati](https://github.com/peter-gy)
```
@@ -0,0 +1,345 @@
```csv
species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,39.5,17.4,186,3800,FEMALE
Adelie,Torgersen,40.3,18,195,3250,FEMALE
Adelie,Torgersen,,,,,
Adelie,Torgersen,36.7,19.3,193,3450,FEMALE
Adelie,Torgersen,39.3,20.6,190,3650,MALE
Adelie,Torgersen,38.9,17.8,181,3625,FEMALE
Adelie,Torgersen,39.2,19.6,195,4675,MALE
Adelie,Torgersen,34.1,18.1,193,3475,
Adelie,Torgersen,42,20.2,190,4250,
Adelie,Torgersen,37.8,17.1,186,3300,
Adelie,Torgersen,37.8,17.3,180,3700,
Adelie,Torgersen,41.1,17.6,182,3200,FEMALE
Adelie,Torgersen,38.6,21.2,191,3800,MALE
Adelie,Torgersen,34.6,21.1,198,4400,MALE
Adelie,Torgersen,36.6,17.8,185,3700,FEMALE
Adelie,Torgersen,38.7,19,195,3450,FEMALE
Adelie,Torgersen,42.5,20.7,197,4500,MALE
Adelie,Torgersen,34.4,18.4,184,3325,FEMALE
Adelie,Torgersen,46,21.5,194,4200,MALE
Adelie,Biscoe,37.8,18.3,174,3400,FEMALE
Adelie,Biscoe,37.7,18.7,180,3600,MALE
Adelie,Biscoe,35.9,19.2,189,3800,FEMALE
Adelie,Biscoe,38.2,18.1,185,3950,MALE
Adelie,Biscoe,38.8,17.2,180,3800,MALE
Adelie,Biscoe,35.3,18.9,187,3800,FEMALE
Adelie,Biscoe,40.6,18.6,183,3550,MALE
Adelie,Biscoe,40.5,17.9,187,3200,FEMALE
Adelie,Biscoe,37.9,18.6,172,3150,FEMALE
Adelie,Biscoe,40.5,18.9,180,3950,MALE
Adelie,Dream,39.5,16.7,178,3250,FEMALE
Adelie,Dream,37.2,18.1,178,3900,MALE
Adelie,Dream,39.5,17.8,188,3300,FEMALE
Adelie,Dream,40.9,18.9,184,3900,MALE
Adelie,Dream,36.4,17,195,3325,FEMALE
Adelie,Dream,39.2,21.1,196,4150,MALE
Adelie,Dream,38.8,20,190,3950,MALE
Adelie,Dream,42.2,18.5,180,3550,FEMALE
Adelie,Dream,37.6,19.3,181,3300,FEMALE
Adelie,Dream,39.8,19.1,184,4650,MALE
Adelie,Dream,36.5,18,182,3150,FEMALE
Adelie,Dream,40.8,18.4,195,3900,MALE
Adelie,Dream,36,18.5,186,3100,FEMALE
Adelie,Dream,44.1,19.7,196,4400,MALE
Adelie,Dream,37,16.9,185,3000,FEMALE
Adelie,Dream,39.6,18.8,190,4600,MALE
Adelie,Dream,41.1,19,182,3425,MALE
Adelie,Dream,37.5,18.9,179,2975,
Adelie,Dream,36,17.9,190,3450,FEMALE
Adelie,Dream,42.3,21.2,191,4150,MALE
Adelie,Biscoe,39.6,17.7,186,3500,FEMALE
Adelie,Biscoe,40.1,18.9,188,4300,MALE
Adelie,Biscoe,35,17.9,190,3450,FEMALE
Adelie,Biscoe,42,19.5,200,4050,MALE
Adelie,Biscoe,34.5,18.1,187,2900,FEMALE
Adelie,Biscoe,41.4,18.6,191,3700,MALE
Adelie,Biscoe,39,17.5,186,3550,FEMALE
Adelie,Biscoe,40.6,18.8,193,3800,MALE
Adelie,Biscoe,36.5,16.6,181,2850,FEMALE
Adelie,Biscoe,37.6,19.1,194,3750,MALE
Adelie,Biscoe,35.7,16.9,185,3150,FEMALE
Adelie,Biscoe,41.3,21.1,195,4400,MALE
Adelie,Biscoe,37.6,17,185,3600,FEMALE
Adelie,Biscoe,41.1,18.2,192,4050,MALE
Adelie,Biscoe,36.4,17.1,184,2850,FEMALE
Adelie,Biscoe,41.6,18,192,3950,MALE
Adelie,Biscoe,35.5,16.2,195,3350,FEMALE
Adelie,Biscoe,41.1,19.1,188,4100,MALE
Adelie,Torgersen,35.9,16.6,190,3050,FEMALE
Adelie,Torgersen,41.8,19.4,198,4450,MALE
Adelie,Torgersen,33.5,19,190,3600,FEMALE
Adelie,Torgersen,39.7,18.4,190,3900,MALE
Adelie,Torgersen,39.6,17.2,196,3550,FEMALE
Adelie,Torgersen,45.8,18.9,197,4150,MALE
Adelie,Torgersen,35.5,17.5,190,3700,FEMALE
Adelie,Torgersen,42.8,18.5,195,4250,MALE
Adelie,Torgersen,40.9,16.8,191,3700,FEMALE
Adelie,Torgersen,37.2,19.4,184,3900,MALE
Adelie,Torgersen,36.2,16.1,187,3550,FEMALE
Adelie,Torgersen,42.1,19.1,195,4000,MALE
Adelie,Torgersen,34.6,17.2,189,3200,FEMALE
Adelie,Torgersen,42.9,17.6,196,4700,MALE
Adelie,Torgersen,36.7,18.8,187,3800,FEMALE
Adelie,Torgersen,35.1,19.4,193,4200,MALE
Adelie,Dream,37.3,17.8,191,3350,FEMALE
Adelie,Dream,41.3,20.3,194,3550,MALE
Adelie,Dream,36.3,19.5,190,3800,MALE
Adelie,Dream,36.9,18.6,189,3500,FEMALE
Adelie,Dream,38.3,19.2,189,3950,MALE
Adelie,Dream,38.9,18.8,190,3600,FEMALE
Adelie,Dream,35.7,18,202,3550,FEMALE
Adelie,Dream,41.1,18.1,205,4300,MALE
Adelie,Dream,34,17.1,185,3400,FEMALE
Adelie,Dream,39.6,18.1,186,4450,MALE
Adelie,Dream,36.2,17.3,187,3300,FEMALE
Adelie,Dream,40.8,18.9,208,4300,MALE
Adelie,Dream,38.1,18.6,190,3700,FEMALE
Adelie,Dream,40.3,18.5,196,4350,MALE
Adelie,Dream,33.1,16.1,178,2900,FEMALE
Adelie,Dream,43.2,18.5,192,4100,MALE
Adelie,Biscoe,35,17.9,192,3725,FEMALE
Adelie,Biscoe,41,20,203,4725,MALE
Adelie,Biscoe,37.7,16,183,3075,FEMALE
Adelie,Biscoe,37.8,20,190,4250,MALE
Adelie,Biscoe,37.9,18.6,193,2925,FEMALE
Adelie,Biscoe,39.7,18.9,184,3550,MALE
Adelie,Biscoe,38.6,17.2,199,3750,FEMALE
Adelie,Biscoe,38.2,20,190,3900,MALE
Adelie,Biscoe,38.1,17,181,3175,FEMALE
Adelie,Biscoe,43.2,19,197,4775,MALE
Adelie,Biscoe,38.1,16.5,198,3825,FEMALE
Adelie,Biscoe,45.6,20.3,191,4600,MALE
Adelie,Biscoe,39.7,17.7,193,3200,FEMALE
Adelie,Biscoe,42.2,19.5,197,4275,MALE
Adelie,Biscoe,39.6,20.7,191,3900,FEMALE
Adelie,Biscoe,42.7,18.3,196,4075,MALE
Adelie,Torgersen,38.6,17,188,2900,FEMALE
Adelie,Torgersen,37.3,20.5,199,3775,MALE
Adelie,Torgersen,35.7,17,189,3350,FEMALE
Adelie,Torgersen,41.1,18.6,189,3325,MALE
Adelie,Torgersen,36.2,17.2,187,3150,FEMALE
Adelie,Torgersen,37.7,19.8,198,3500,MALE
Adelie,Torgersen,40.2,17,176,3450,FEMALE
Adelie,Torgersen,41.4,18.5,202,3875,MALE
Adelie,Torgersen,35.2,15.9,186,3050,FEMALE
Adelie,Torgersen,40.6,19,199,4000,MALE
Adelie,Torgersen,38.8,17.6,191,3275,FEMALE
Adelie,Torgersen,41.5,18.3,195,4300,MALE
Adelie,Torgersen,39,17.1,191,3050,FEMALE
Adelie,Torgersen,44.1,18,210,4000,MALE
Adelie,Torgersen,38.5,17.9,190,3325,FEMALE
Adelie,Torgersen,43.1,19.2,197,3500,MALE
Adelie,Dream,36.8,18.5,193,3500,FEMALE
Adelie,Dream,37.5,18.5,199,4475,MALE
Adelie,Dream,38.1,17.6,187,3425,FEMALE
Adelie,Dream,41.1,17.5,190,3900,MALE
Adelie,Dream,35.6,17.5,191,3175,FEMALE
Adelie,Dream,40.2,20.1,200,3975,MALE
Adelie,Dream,37,16.5,185,3400,FEMALE
Adelie,Dream,39.7,17.9,193,4250,MALE
Adelie,Dream,40.2,17.1,193,3400,FEMALE
Adelie,Dream,40.6,17.2,187,3475,MALE
Adelie,Dream,32.1,15.5,188,3050,FEMALE
Adelie,Dream,40.7,17,190,3725,MALE
Adelie,Dream,37.3,16.8,192,3000,FEMALE
Adelie,Dream,39,18.7,185,3650,MALE
Adelie,Dream,39.2,18.6,190,4250,MALE
Adelie,Dream,36.6,18.4,184,3475,FEMALE
Adelie,Dream,36,17.8,195,3450,FEMALE
Adelie,Dream,37.8,18.1,193,3750,MALE
Adelie,Dream,36,17.1,187,3700,FEMALE
Adelie,Dream,41.5,18.5,201,4000,MALE
Chinstrap,Dream,46.5,17.9,192,3500,FEMALE
Chinstrap,Dream,50,19.5,196,3900,MALE
Chinstrap,Dream,51.3,19.2,193,3650,MALE
Chinstrap,Dream,45.4,18.7,188,3525,FEMALE
Chinstrap,Dream,52.7,19.8,197,3725,MALE
Chinstrap,Dream,45.2,17.8,198,3950,FEMALE
Chinstrap,Dream,46.1,18.2,178,3250,FEMALE
Chinstrap,Dream,51.3,18.2,197,3750,MALE
Chinstrap,Dream,46,18.9,195,4150,FEMALE
Chinstrap,Dream,51.3,19.9,198,3700,MALE
Chinstrap,Dream,46.6,17.8,193,3800,FEMALE
Chinstrap,Dream,51.7,20.3,194,3775,MALE
Chinstrap,Dream,47,17.3,185,3700,FEMALE
Chinstrap,Dream,52,18.1,201,4050,MALE
Chinstrap,Dream,45.9,17.1,190,3575,FEMALE
Chinstrap,Dream,50.5,19.6,201,4050,MALE
Chinstrap,Dream,50.3,20,197,3300,MALE
Chinstrap,Dream,58,17.8,181,3700,FEMALE
Chinstrap,Dream,46.4,18.6,190,3450,FEMALE
Chinstrap,Dream,49.2,18.2,195,4400,MALE
Chinstrap,Dream,42.4,17.3,181,3600,FEMALE
Chinstrap,Dream,48.5,17.5,191,3400,MALE
Chinstrap,Dream,43.2,16.6,187,2900,FEMALE
Chinstrap,Dream,50.6,19.4,193,3800,MALE
```
| 178 |
+
Chinstrap,Dream,46.7,17.9,195,3300,FEMALE
|
| 179 |
+
Chinstrap,Dream,52,19,197,4150,MALE
|
| 180 |
+
Chinstrap,Dream,50.5,18.4,200,3400,FEMALE
|
| 181 |
+
Chinstrap,Dream,49.5,19,200,3800,MALE
|
| 182 |
+
Chinstrap,Dream,46.4,17.8,191,3700,FEMALE
|
| 183 |
+
Chinstrap,Dream,52.8,20,205,4550,MALE
|
| 184 |
+
Chinstrap,Dream,40.9,16.6,187,3200,FEMALE
|
| 185 |
+
Chinstrap,Dream,54.2,20.8,201,4300,MALE
|
| 186 |
+
Chinstrap,Dream,42.5,16.7,187,3350,FEMALE
|
| 187 |
+
Chinstrap,Dream,51,18.8,203,4100,MALE
|
| 188 |
+
Chinstrap,Dream,49.7,18.6,195,3600,MALE
|
| 189 |
+
Chinstrap,Dream,47.5,16.8,199,3900,FEMALE
|
| 190 |
+
Chinstrap,Dream,47.6,18.3,195,3850,FEMALE
|
| 191 |
+
Chinstrap,Dream,52,20.7,210,4800,MALE
|
| 192 |
+
Chinstrap,Dream,46.9,16.6,192,2700,FEMALE
|
| 193 |
+
Chinstrap,Dream,53.5,19.9,205,4500,MALE
|
| 194 |
+
Chinstrap,Dream,49,19.5,210,3950,MALE
|
| 195 |
+
Chinstrap,Dream,46.2,17.5,187,3650,FEMALE
|
| 196 |
+
Chinstrap,Dream,50.9,19.1,196,3550,MALE
|
| 197 |
+
Chinstrap,Dream,45.5,17,196,3500,FEMALE
|
| 198 |
+
Chinstrap,Dream,50.9,17.9,196,3675,FEMALE
|
| 199 |
+
Chinstrap,Dream,50.8,18.5,201,4450,MALE
|
| 200 |
+
Chinstrap,Dream,50.1,17.9,190,3400,FEMALE
|
| 201 |
+
Chinstrap,Dream,49,19.6,212,4300,MALE
|
| 202 |
+
Chinstrap,Dream,51.5,18.7,187,3250,MALE
|
| 203 |
+
Chinstrap,Dream,49.8,17.3,198,3675,FEMALE
|
| 204 |
+
Chinstrap,Dream,48.1,16.4,199,3325,FEMALE
|
| 205 |
+
Chinstrap,Dream,51.4,19,201,3950,MALE
|
| 206 |
+
Chinstrap,Dream,45.7,17.3,193,3600,FEMALE
|
| 207 |
+
Chinstrap,Dream,50.7,19.7,203,4050,MALE
|
| 208 |
+
Chinstrap,Dream,42.5,17.3,187,3350,FEMALE
|
| 209 |
+
Chinstrap,Dream,52.2,18.8,197,3450,MALE
|
| 210 |
+
Chinstrap,Dream,45.2,16.6,191,3250,FEMALE
|
| 211 |
+
Chinstrap,Dream,49.3,19.9,203,4050,MALE
|
| 212 |
+
Chinstrap,Dream,50.2,18.8,202,3800,MALE
|
| 213 |
+
Chinstrap,Dream,45.6,19.4,194,3525,FEMALE
|
| 214 |
+
Chinstrap,Dream,51.9,19.5,206,3950,MALE
|
| 215 |
+
Chinstrap,Dream,46.8,16.5,189,3650,FEMALE
|
| 216 |
+
Chinstrap,Dream,45.7,17,195,3650,FEMALE
|
| 217 |
+
Chinstrap,Dream,55.8,19.8,207,4000,MALE
|
| 218 |
+
Chinstrap,Dream,43.5,18.1,202,3400,FEMALE
|
| 219 |
+
Chinstrap,Dream,49.6,18.2,193,3775,MALE
|
| 220 |
+
Chinstrap,Dream,50.8,19,210,4100,MALE
|
| 221 |
+
Chinstrap,Dream,50.2,18.7,198,3775,FEMALE
|
| 222 |
+
Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE
|
| 223 |
+
Gentoo,Biscoe,50,16.3,230,5700,MALE
|
| 224 |
+
Gentoo,Biscoe,48.7,14.1,210,4450,FEMALE
|
| 225 |
+
Gentoo,Biscoe,50,15.2,218,5700,MALE
|
| 226 |
+
Gentoo,Biscoe,47.6,14.5,215,5400,MALE
|
| 227 |
+
Gentoo,Biscoe,46.5,13.5,210,4550,FEMALE
|
| 228 |
+
Gentoo,Biscoe,45.4,14.6,211,4800,FEMALE
|
| 229 |
+
Gentoo,Biscoe,46.7,15.3,219,5200,MALE
|
| 230 |
+
Gentoo,Biscoe,43.3,13.4,209,4400,FEMALE
|
| 231 |
+
Gentoo,Biscoe,46.8,15.4,215,5150,MALE
|
| 232 |
+
Gentoo,Biscoe,40.9,13.7,214,4650,FEMALE
|
| 233 |
+
Gentoo,Biscoe,49,16.1,216,5550,MALE
|
| 234 |
+
Gentoo,Biscoe,45.5,13.7,214,4650,FEMALE
|
| 235 |
+
Gentoo,Biscoe,48.4,14.6,213,5850,MALE
|
| 236 |
+
Gentoo,Biscoe,45.8,14.6,210,4200,FEMALE
|
| 237 |
+
Gentoo,Biscoe,49.3,15.7,217,5850,MALE
|
| 238 |
+
Gentoo,Biscoe,42,13.5,210,4150,FEMALE
|
| 239 |
+
Gentoo,Biscoe,49.2,15.2,221,6300,MALE
|
| 240 |
+
Gentoo,Biscoe,46.2,14.5,209,4800,FEMALE
|
| 241 |
+
Gentoo,Biscoe,48.7,15.1,222,5350,MALE
|
| 242 |
+
Gentoo,Biscoe,50.2,14.3,218,5700,MALE
|
| 243 |
+
Gentoo,Biscoe,45.1,14.5,215,5000,FEMALE
|
| 244 |
+
Gentoo,Biscoe,46.5,14.5,213,4400,FEMALE
|
| 245 |
+
Gentoo,Biscoe,46.3,15.8,215,5050,MALE
|
| 246 |
+
Gentoo,Biscoe,42.9,13.1,215,5000,FEMALE
|
| 247 |
+
Gentoo,Biscoe,46.1,15.1,215,5100,MALE
|
| 248 |
+
Gentoo,Biscoe,44.5,14.3,216,4100,
|
| 249 |
+
Gentoo,Biscoe,47.8,15,215,5650,MALE
|
| 250 |
+
Gentoo,Biscoe,48.2,14.3,210,4600,FEMALE
|
| 251 |
+
Gentoo,Biscoe,50,15.3,220,5550,MALE
|
| 252 |
+
Gentoo,Biscoe,47.3,15.3,222,5250,MALE
|
| 253 |
+
Gentoo,Biscoe,42.8,14.2,209,4700,FEMALE
|
| 254 |
+
Gentoo,Biscoe,45.1,14.5,207,5050,FEMALE
|
| 255 |
+
Gentoo,Biscoe,59.6,17,230,6050,MALE
|
| 256 |
+
Gentoo,Biscoe,49.1,14.8,220,5150,FEMALE
|
| 257 |
+
Gentoo,Biscoe,48.4,16.3,220,5400,MALE
|
| 258 |
+
Gentoo,Biscoe,42.6,13.7,213,4950,FEMALE
|
| 259 |
+
Gentoo,Biscoe,44.4,17.3,219,5250,MALE
|
| 260 |
+
Gentoo,Biscoe,44,13.6,208,4350,FEMALE
|
| 261 |
+
Gentoo,Biscoe,48.7,15.7,208,5350,MALE
|
| 262 |
+
Gentoo,Biscoe,42.7,13.7,208,3950,FEMALE
|
| 263 |
+
Gentoo,Biscoe,49.6,16,225,5700,MALE
|
| 264 |
+
Gentoo,Biscoe,45.3,13.7,210,4300,FEMALE
|
| 265 |
+
Gentoo,Biscoe,49.6,15,216,4750,MALE
|
| 266 |
+
Gentoo,Biscoe,50.5,15.9,222,5550,MALE
|
| 267 |
+
Gentoo,Biscoe,43.6,13.9,217,4900,FEMALE
|
| 268 |
+
Gentoo,Biscoe,45.5,13.9,210,4200,FEMALE
|
| 269 |
+
Gentoo,Biscoe,50.5,15.9,225,5400,MALE
|
| 270 |
+
Gentoo,Biscoe,44.9,13.3,213,5100,FEMALE
|
| 271 |
+
Gentoo,Biscoe,45.2,15.8,215,5300,MALE
|
| 272 |
+
Gentoo,Biscoe,46.6,14.2,210,4850,FEMALE
|
| 273 |
+
Gentoo,Biscoe,48.5,14.1,220,5300,MALE
|
| 274 |
+
Gentoo,Biscoe,45.1,14.4,210,4400,FEMALE
|
| 275 |
+
Gentoo,Biscoe,50.1,15,225,5000,MALE
|
| 276 |
+
Gentoo,Biscoe,46.5,14.4,217,4900,FEMALE
|
| 277 |
+
Gentoo,Biscoe,45,15.4,220,5050,MALE
|
| 278 |
+
Gentoo,Biscoe,43.8,13.9,208,4300,FEMALE
|
| 279 |
+
Gentoo,Biscoe,45.5,15,220,5000,MALE
|
| 280 |
+
Gentoo,Biscoe,43.2,14.5,208,4450,FEMALE
|
| 281 |
+
Gentoo,Biscoe,50.4,15.3,224,5550,MALE
|
| 282 |
+
Gentoo,Biscoe,45.3,13.8,208,4200,FEMALE
|
| 283 |
+
Gentoo,Biscoe,46.2,14.9,221,5300,MALE
|
| 284 |
+
Gentoo,Biscoe,45.7,13.9,214,4400,FEMALE
|
| 285 |
+
Gentoo,Biscoe,54.3,15.7,231,5650,MALE
|
| 286 |
+
Gentoo,Biscoe,45.8,14.2,219,4700,FEMALE
|
| 287 |
+
Gentoo,Biscoe,49.8,16.8,230,5700,MALE
|
| 288 |
+
Gentoo,Biscoe,46.2,14.4,214,4650,
|
| 289 |
+
Gentoo,Biscoe,49.5,16.2,229,5800,MALE
|
| 290 |
+
Gentoo,Biscoe,43.5,14.2,220,4700,FEMALE
|
| 291 |
+
Gentoo,Biscoe,50.7,15,223,5550,MALE
|
| 292 |
+
Gentoo,Biscoe,47.7,15,216,4750,FEMALE
|
| 293 |
+
Gentoo,Biscoe,46.4,15.6,221,5000,MALE
|
| 294 |
+
Gentoo,Biscoe,48.2,15.6,221,5100,MALE
|
| 295 |
+
Gentoo,Biscoe,46.5,14.8,217,5200,FEMALE
|
| 296 |
+
Gentoo,Biscoe,46.4,15,216,4700,FEMALE
|
| 297 |
+
Gentoo,Biscoe,48.6,16,230,5800,MALE
|
| 298 |
+
Gentoo,Biscoe,47.5,14.2,209,4600,FEMALE
|
| 299 |
+
Gentoo,Biscoe,51.1,16.3,220,6000,MALE
|
| 300 |
+
Gentoo,Biscoe,45.2,13.8,215,4750,FEMALE
|
| 301 |
+
Gentoo,Biscoe,45.2,16.4,223,5950,MALE
|
| 302 |
+
Gentoo,Biscoe,49.1,14.5,212,4625,FEMALE
|
| 303 |
+
Gentoo,Biscoe,52.5,15.6,221,5450,MALE
|
| 304 |
+
Gentoo,Biscoe,47.4,14.6,212,4725,FEMALE
|
| 305 |
+
Gentoo,Biscoe,50,15.9,224,5350,MALE
|
| 306 |
+
Gentoo,Biscoe,44.9,13.8,212,4750,FEMALE
|
| 307 |
+
Gentoo,Biscoe,50.8,17.3,228,5600,MALE
|
| 308 |
+
Gentoo,Biscoe,43.4,14.4,218,4600,FEMALE
|
| 309 |
+
Gentoo,Biscoe,51.3,14.2,218,5300,MALE
|
| 310 |
+
Gentoo,Biscoe,47.5,14,212,4875,FEMALE
|
| 311 |
+
Gentoo,Biscoe,52.1,17,230,5550,MALE
|
| 312 |
+
Gentoo,Biscoe,47.5,15,218,4950,FEMALE
|
| 313 |
+
Gentoo,Biscoe,52.2,17.1,228,5400,MALE
|
| 314 |
+
Gentoo,Biscoe,45.5,14.5,212,4750,FEMALE
|
| 315 |
+
Gentoo,Biscoe,49.5,16.1,224,5650,MALE
|
| 316 |
+
Gentoo,Biscoe,44.5,14.7,214,4850,FEMALE
|
| 317 |
+
Gentoo,Biscoe,50.8,15.7,226,5200,MALE
|
| 318 |
+
Gentoo,Biscoe,49.4,15.8,216,4925,MALE
|
| 319 |
+
Gentoo,Biscoe,46.9,14.6,222,4875,FEMALE
|
| 320 |
+
Gentoo,Biscoe,48.4,14.4,203,4625,FEMALE
|
| 321 |
+
Gentoo,Biscoe,51.1,16.5,225,5250,MALE
|
| 322 |
+
Gentoo,Biscoe,48.5,15,219,4850,FEMALE
|
| 323 |
+
Gentoo,Biscoe,55.9,17,228,5600,MALE
|
| 324 |
+
Gentoo,Biscoe,47.2,15.5,215,4975,FEMALE
|
| 325 |
+
Gentoo,Biscoe,49.1,15,228,5500,MALE
|
| 326 |
+
Gentoo,Biscoe,47.3,13.8,216,4725,
|
| 327 |
+
Gentoo,Biscoe,46.8,16.1,215,5500,MALE
|
| 328 |
+
Gentoo,Biscoe,41.7,14.7,210,4700,FEMALE
|
| 329 |
+
Gentoo,Biscoe,53.4,15.8,219,5500,MALE
|
| 330 |
+
Gentoo,Biscoe,43.3,14,208,4575,FEMALE
|
| 331 |
+
Gentoo,Biscoe,48.1,15.1,209,5500,MALE
|
| 332 |
+
Gentoo,Biscoe,50.5,15.2,216,5000,FEMALE
|
| 333 |
+
Gentoo,Biscoe,49.8,15.9,229,5950,MALE
|
| 334 |
+
Gentoo,Biscoe,43.5,15.2,213,4650,FEMALE
|
| 335 |
+
Gentoo,Biscoe,51.5,16.3,230,5500,MALE
|
| 336 |
+
Gentoo,Biscoe,46.2,14.1,217,4375,FEMALE
|
| 337 |
+
Gentoo,Biscoe,55.1,16,230,5850,MALE
|
| 338 |
+
Gentoo,Biscoe,44.5,15.7,217,4875,
|
| 339 |
+
Gentoo,Biscoe,48.8,16.2,222,6000,MALE
|
| 340 |
+
Gentoo,Biscoe,47.2,13.7,214,4925,FEMALE
|
| 341 |
+
Gentoo,Biscoe,,,,,
|
| 342 |
+
Gentoo,Biscoe,46.8,14.3,215,4850,FEMALE
|
| 343 |
+
Gentoo,Biscoe,50.4,15.7,222,5750,MALE
|
| 344 |
+
Gentoo,Biscoe,45.2,14.8,212,5200,FEMALE
|
| 345 |
+
Gentoo,Biscoe,49.9,16.1,213,5400,MALE
|
|
```diff
@@ -2,14 +2,13 @@
 # requires-python = ">=3.11"
 # dependencies = [
 #     "marimo",
-#     "duckdb==1.
-#     "
-#     "
-#     "
-#     "
-#     "sqlglot==
-#     "
-#     "statsmodels==0.14.4",
+#     "duckdb==1.4.4",
+#     "numpy==2.4.3",
+#     "pandas==2.3.2",
+#     "plotly[express]==6.3.0",
+#     "polars[pyarrow]==1.24.0",
+#     "sqlglot==27.0.0",
+#     "statsmodels==0.14.5",
 # ]
 # ///
 
@@ -32,9 +31,7 @@ def _(mo):
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(rf"""
-    #
-
-    ## What is DuckDB?
+    # What is DuckDB?
 
     [DuckDB](https://duckdb.org/) is a _high-performance_, in-process, embeddable SQL OLAP (Online Analytical Processing) Database Management System (DBMS) designed for simplicity and speed. It's essentially a fully-featured database that runs directly within your application's process, without needing a separate server. This makes it excellent for complex analytical workloads, offering a robust SQL interface and efficient processing – perfect for learning about databases and data analysis concepts. It's a great alternative to heavier database systems like PostgreSQL or MySQL when you don't need a full-blown server.
 
```
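As a side note on the "in-process, no server" point in the paragraph above: the same embedded-database idea can be sketched with Python's standard-library `sqlite3` (a row-oriented embedded engine, unlike the column-oriented DuckDB; the table and column names below are illustrative only, not part of this commit):

```python
import sqlite3

# An embedded database runs inside this Python process: connecting
# starts no server, just an in-memory engine we can query immediately.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE penguins (species TEXT, body_mass INTEGER)")
con.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("Adelie", 2900), ("Gentoo", 5700)],
)
heaviest = con.execute(
    "SELECT species FROM penguins ORDER BY body_mass DESC LIMIT 1"
).fetchone()[0]
print(heaviest)  # Gentoo
```

DuckDB offers the same no-server workflow (`duckdb.connect(":memory:")`), but with a columnar engine tuned for analytical queries.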
```diff
@@ -2,9 +2,9 @@
 # requires-python = ">=3.10"
 # dependencies = [
 #     "marimo",
-#     "duckdb==1.
-#     "pyarrow==
-#     "plotly
+#     "duckdb==1.4.4",
+#     "polars[pyarrow]==1.24.0",
+#     "plotly[express]==6.3.0",
 #     "sqlglot==27.0.0",
 # ]
 # ///
```
```diff
@@ -2,9 +2,9 @@
 # requires-python = ">=3.11"
 # dependencies = [
 #     "marimo",
-#     "duckdb==1.
-#     "
-#     "
+#     "duckdb==1.4.4",
+#     "polars[pyarrow]==1.24.0",
+#     "sqlglot==27.0.0",
 # ]
 # ///
 
```
```diff
@@ -2,13 +2,11 @@
 # requires-python = ">=3.11"
 # dependencies = [
 #     "marimo",
-#     "
-#     "
-#     "
-#     "
+#     "altair==6.0.0",
+#     "duckdb==1.4.4",
+#     "pandas==2.3.2",
+#     "polars[pyarrow]==1.24.0",
 #     "sqlglot==27.0.0",
-#     "psutil==7.0.0",
-#     "altair",
 # ]
 # ///
 
@@ -534,15 +532,8 @@ def _(mo):
 
 
 @app.cell
-def _(
-    import
-    import pyarrow.compute as pc  # Add this import
-
-    # Get current process
-    process = psutil.Process(os.getpid())
-
-    # Measure memory before operations
-    memory_before = process.memory_info().rss / 1024 / 1024  # MB
+def _(mo, polars_data, time):
+    import pyarrow.compute as pc
 
     # Perform multiple Arrow-based operations (zero-copy)
     latest_start_time = time.time()
@@ -550,11 +541,9 @@ def _(polars_data, psutil, time):
     # These operations use Arrow's zero-copy capabilities
     arrow_table = polars_data.to_arrow()
     arrow_sliced = arrow_table.slice(0, 100000)
-    # Use PyArrow compute functions for filtering
    arrow_filtered = arrow_table.filter(pc.greater(arrow_table['value'], 500000))
 
     arrow_ops_time = time.time() - latest_start_time
-    memory_after_arrow = process.memory_info().rss / 1024 / 1024  # MB
 
     # Compare with traditional copy-based operations
     latest_start_time = time.time()
@@ -565,16 +554,21 @@ def _(polars_data, psutil, time):
     pandas_filtered = pandas_copy[pandas_copy['value'] > 500000].copy()
 
     copy_ops_time = time.time() - latest_start_time
-    memory_after_copy = process.memory_info().rss / 1024 / 1024  # MB
 
-
-
-
-
-
-
-
-
+    mo.vstack([
+        mo.md(f"""
+        **Time comparison:**
+
+        | Method | Time (s) |
+        |--------|----------|
+        | Arrow operations | {arrow_ops_time:.3f} |
+        | Copy operations | {copy_ops_time:.3f} |
+        | Speedup | {copy_ops_time/arrow_ops_time:.1f}x |
+
+        > **Note:** Memory usage statistics are not available in this environment.
+        > Arrow's zero-copy design typically uses 20–40% less memory than Pandas copies.
+        """),
+    ])
     return
 
 
@@ -608,8 +602,7 @@ def _():
     import pandas as pd
     import duckdb
     import sqlglot
-
-    return duckdb, mo, pa, pd, pl, psutil
+    return duckdb, mo, pa, pd, pl
 
 
 if __name__ == "__main__":
```
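The zero-copy-versus-copy contrast that the timed cell above measures can be illustrated without PyArrow, using Python's built-in `memoryview`. This is an analogy for the idea, not the notebook's actual mechanism:

```python
# A large buffer; slicing a memoryview shares the underlying bytes
# (zero-copy), while bytes(...) materializes a new, independent copy.
buf = bytearray(range(256)) * 1000   # ~256 KB of data
view = memoryview(buf)[:100]         # no data copied; references buf
copy = bytes(buf[:100])              # allocates and copies 100 bytes

assert view.obj is buf               # the view still points at the original buffer
assert bytes(view) == copy           # same logical contents either way
```

Arrow's `slice` behaves like the `memoryview` here: it returns a window onto the same columnar buffers, which is why the Arrow path avoids both the time and the memory cost of the Pandas `.copy()` path.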
```diff
@@ -2,12 +2,11 @@
 # requires-python = ">=3.10"
 # dependencies = [
 #     "marimo",
-#     "
+#     "duckdb==1.4.4",
 #     "plotly==6.0.1",
-#     "duckdb==1.3.2",
-#     "sqlglot==26.11.1",
-#     "pyarrow==19.0.1",
 #     "polars==1.27.1",
+#     "pyarrow==19.0.1",
+#     "sqlglot==27.0.0",
 # ]
 # ///
 
```
@@ -1,37 +0,0 @@
|
|
| 1 |
-
---
|
| 2 |
-
title: Readme
|
| 3 |
-
marimo-version: 0.18.4
|
| 4 |
-
---
|
| 5 |
-
|
| 6 |
-
# Learn DuckDB
|
| 7 |
-
|
| 8 |
-
_🚧 This collection is a work in progress. Please help us add notebooks!_
|
| 9 |
-
|
| 10 |
-
This collection of marimo notebooks is designed to teach you the basics of
|
| 11 |
-
DuckDB, a fast in-memory OLAP engine that can interoperate with Dataframes.
|
| 12 |
-
These notebooks also show how marimo gives DuckDB superpowers.
|
| 13 |
-
|
| 14 |
-
**Help us build this course! ⚒️**
|
| 15 |
-
|
| 16 |
-
We're seeking contributors to help us build these notebooks. Every contributor
|
| 17 |
-
will be acknowledged as an author in this README and in their contributed
|
| 18 |
-
notebooks. Head over to the [tracking
|
| 19 |
-
issue](https://github.com/marimo-team/learn/issues/48) to sign up for a planned
|
| 20 |
-
notebook or propose your own.
|
| 21 |
-
|
| 22 |
-
**Running notebooks.** To run a notebook locally, use
|
| 23 |
-
|
| 24 |
-
```bash
|
| 25 |
-
uvx marimo edit <file_url>
|
| 26 |
-
```
|
| 27 |
-
|
| 28 |
-
You can also open notebooks in our online playground by appending marimo.app/ to a notebook's URL.
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
**Authors.**
|
| 32 |
-
|
| 33 |
-
Thanks to all our notebook authors!
|
| 34 |
-
|
| 35 |
-
* [Mustjaab](https://github.com/Mustjaab)
|
| 36 |
-
* [julius383](https://github.com/julius383)
|
| 37 |
-
* [thliang01](https://github.com/thliang01)
|
```diff
@@ -0,0 +1,16 @@
+---
+title: Learn DuckDB
+description: >
+  These notebooks teach you the basics of DuckDB,
+  a fast in-memory database engine that can interoperate
+  with dataframes, and show how marimo gives DuckDB superpowers.
+tracking: 48
+---
+
+## Contributors
+
+Thanks to our notebook authors:
+
+* [Mustjaab](https://github.com/Mustjaab)
+* [julius383](https://github.com/julius383)
+* [thliang01](https://github.com/thliang01)
```
```diff
@@ -875,7 +875,7 @@ def _(mo):
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md("""
+    mo.md(r"""
     ## Functor laws, again
 
     Once again there are a few axioms that functors have to obey.
```
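The `mo.md("""` to `mo.md(r"""` change above is not cosmetic: markdown cells often contain LaTeX, and only raw strings keep its backslashes intact. A quick sketch of the difference:

```python
# In a plain string, backslash sequences are interpreted as escapes;
# in a raw string they stay literal, which keeps LaTeX like \theta intact.
plain = "\theta"   # "\t" becomes a TAB character, followed by "heta"
raw = r"\theta"    # backslash and "t" are preserved as written

assert plain[0] == "\t" and len(plain) == 5
assert raw == "\\theta" and len(raw) == 6
```

With a plain string, `mo.md("$\theta$")` would hand the markdown renderer a tab character instead of the intended `\theta`.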
```diff
@@ -14,7 +14,7 @@ app = marimo.App(app_title="Applicative programming with effects")
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""
-    # Applicative
+    # Applicative Programming with Effects
 
     `Applicative Functor` encapsulates certain sorts of *effectful* computations in a functionally pure way, and encourages an *applicative* programming style.
 
```
```diff
@@ -0,0 +1,25 @@
+---
+title: Learn Functional Programming
+description: >
+  These notebooks introduce powerful ideas from functional programming
+  in Python, taking inspiration from Haskell and category theory.
+tracking: 51
+---
+
+Using only Python's standard library, these lessons construct
+functional programming concepts from first principles.
+Topics include:
+
+- Currying and higher-order functions
+- Functors, Applicatives, and Monads
+- Category theory fundamentals
+
+## Contributors
+
+Thanks to our notebook authors:
+
+- métaboulie
+
+and reviewers:
+
+- [Srihari Thyagarajan](https://github.com/Haleshot)
```
@@ -1,129 +0,0 @@
|
|
| 1 |
-
---
|
| 2 |
-
title: Changelog
|
| 3 |
-
marimo-version: 0.18.4
|
| 4 |
-
---
|
| 5 |
-
|
| 6 |
-
# Changelog of the functional-programming course
|
| 7 |
-
|
| 8 |
-
## 2025-04-16
|
| 9 |
-
|
| 10 |
-
**applicatives.py**
|
| 11 |
-
|
| 12 |
-
- replace `return NotImplementedError` with `raise NotImplementedError`
|
| 13 |
-
|
| 14 |
-
- add `Either` applicative
|
| 15 |
-
- Add `Alternative`
|
| 16 |
-
|
| 17 |
-
## 2025-04-11
|
| 18 |
-
|
| 19 |
-
**functors.py**
|
| 20 |
-
|
| 21 |
-
- add `Bifunctor` section
|
| 22 |
-
|
| 23 |
-
- replace `return NotImplementedError` with `raise NotImplementedError`
|
| 24 |
-
|
| 25 |
-
## 2025-04-08
|
| 26 |
-
|
| 27 |
-
**functors.py**
|
| 28 |
-
|
| 29 |
-
- restructure the notebook
|
| 30 |
-
- replace `f` in the function signatures with `g` to indicate regular functions and
|
| 31 |
-
distinguish from functors
|
| 32 |
-
- move `Maybe` funtor to section `More Functor instances`
|
| 33 |
-
|
| 34 |
-
- add `Either` functor
|
| 35 |
-
|
| 36 |
-
- add `unzip` utility function for functors
|
| 37 |
-
|
| 38 |
-
## 2025-04-07
|
| 39 |
-
|
| 40 |
-
**applicatives.py**
|
| 41 |
-
|
| 42 |
-
- the `apply` method of `Maybe` _Applicative_ should return `None` when `fg` or `fa` is
|
| 43 |
-
`None`
|
| 44 |
-
|
| 45 |
-
- add `sequenceL` as a classmethod for `Applicative` and add examples for `Wrapper`,
|
| 46 |
-
`Maybe`, `List`
|
| 47 |
-
- add description for utility functions of `Applicative`
|
| 48 |
-
|
| 49 |
-
- refine the implementation of `IO` _Applicative_
|
| 50 |
-
- reimplement `get_chars` with `IO.sequenceL`
|
| 51 |
-
|
| 52 |
-
- add an example to show that `ListMonoidal` is equivalent to `List` _Applicative_
|
| 53 |
-
|
| 54 |
-
## 2025-04-06
|
| 55 |
-
|
| 56 |
-
**applicatives.py**
|
| 57 |
-
|
| 58 |
-
- remove `sequenceL` from `Applicative` because it should be a classmethod but can't be
|
| 59 |
-
generically implemented
|
| 60 |
-
|
| 61 |
-
## 2025-04-02
|
| 62 |
-
|
| 63 |
-
**functors.py**
|
| 64 |
-
|
| 65 |
-
- Migrate to `python3.13`
|
| 66 |
-
|
| 67 |
-
- Replace all occurrences of
|
| 68 |
-
|
| 69 |
-
```python
|
| 70 |
-
class Functor(Generic[A])
|
| 71 |
-
```
|
| 72 |
-
|
| 73 |
-
with
|
| 74 |
-
|
| 75 |
-
```python
|
| 76 |
-
class Functor[A]
|
| 77 |
-
```
|
| 78 |
-
|
| 79 |
-
for conciseness
|
| 80 |
-
|
| 81 |
-
- Use `fa` in function signatures instead of `a` when `fa` is a _Functor_
|
| 82 |
-
|
| 83 |
-
**applicatives.py**
|
| 84 |
-
|
| 85 |
-
- `0.1.0` version of notebook `06_applicatives.py`
|
| 86 |
-
|
| 87 |
-
## 2025-03-16
|
| 88 |
-
|
| 89 |
-
**functors.py**
|
| 90 |
-
|
| 91 |
-
- Use uppercased letters for `Generic` types, e.g. `A = TypeVar("A")`
|
| 92 |
-
- Refactor the `Functor` class, changing `fmap` and utility methods to `classmethod`
|
| 93 |
-
|
| 94 |
-
For example:
|
| 95 |
-
|
| 96 |
-
```python
|
| 97 |
-
@dataclass
|
| 98 |
-
class Wrapper(Functor, Generic[A]):
|
| 99 |
-
value: A
|
| 100 |
-
|
| 101 |
-
@classmethod
|
| 102 |
-
def fmap(cls, f: Callable[[A], B], a: "Wrapper[A]") -> "Wrapper[B]":
|
| 103 |
-
return Wrapper(f(a.value))
|
| 104 |
-
|
| 105 |
-
>>> Wrapper.fmap(lambda x: x + 1, wrapper)
|
| 106 |
-
Wrapper(value=2)
|
| 107 |
-
```
|
| 108 |
-
|
| 109 |
-
- Move the `check_functor_law` method from `Functor` class to a standard function
|
| 110 |
-
|
| 111 |
-
- Rename `ListWrapper` to `List` for simplicity
|
| 112 |
-
- Remove the `Just` class
|
| 113 |
-
|
| 114 |
-
- Rewrite proofs
|
| 115 |
-
|
| 116 |
-
## 2025-03-13
|
| 117 |
-
|
| 118 |
-
**functors.py**
|
| 119 |
-
|
| 120 |
-
- `0.1.0` version of notebook `05_functors`
|
| 121 |
-
|
| 122 |
-
Thank [Akshay](https://github.com/akshayka) and [Haleshot](https://github.com/Haleshot)
|
| 123 |
-
for reviewing
|
| 124 |
-
|
| 125 |
-
## 2025-03-11
|
| 126 |
-
|
| 127 |
-
**functors.py**
|
| 128 |
-
|
| 129 |
-
- Demo version of notebook `05_functors.py`
|
@@ -1,77 +0,0 @@
|
|
| 1 |
-
---
|
| 2 |
-
title: Readme
|
| 3 |
-
marimo-version: 0.18.4
|
| 4 |
-
---
|
| 5 |
-
|
| 6 |
-
# Learn Functional Programming
|
| 7 |
-
|
| 8 |
-
_🚧 This collection is a [work in progress](https://github.com/marimo-team/learn/issues/51)._
|
| 9 |
-
|
| 10 |
-
This series of marimo notebooks introduces the powerful paradigm of functional
|
| 11 |
-
programming through Python. Taking inspiration from Haskell and Category
|
| 12 |
-
Theory, we'll build a strong foundation in FP concepts that can transform how
|
| 13 |
-
you approach software development.
|
| 14 |
-
|
| 15 |
-
## What You'll Learn
|
| 16 |
-
|
| 17 |
-
**Using only Python's standard library**, we'll construct functional
|
| 18 |
-
programming concepts from first principles.
|
| 19 |
-
|
| 20 |
-
Topics include:
|
| 21 |
-
|
| 22 |
-
+ Currying and higher-order functions
|
| 23 |
-
+ Functors, Applicatives, and Monads
|
| 24 |
-
+ Category theory fundamentals
|
| 25 |
-
|
| 26 |
-
## Running Notebooks
|
| 27 |
-
|
| 28 |
-
### Locally
|
| 29 |
-
|
| 30 |
-
To run a notebook locally, use
|
| 31 |
-
|
| 32 |
-
```bash
|
| 33 |
-
uvx marimo edit <URL>
|
| 34 |
-
```
|
| 35 |
-
|
| 36 |
-
For example, run the `Functor` tutorial with
|
| 37 |
-
|
| 38 |
-
```bash
|
| 39 |
-
uvx marimo edit https://github.com/marimo-team/learn/blob/main/functional_programming/05_functors.py
|
| 40 |
-
```
|
| 41 |
-
|
| 42 |
-
### On Our Online Playground
|
| 43 |
-
|
| 44 |
-
You can also open notebooks in our online playground by appending `marimo.app/` to a notebook's URL like:
|
| 45 |
-
|
| 46 |
-
https://marimo.app/https://github.com/marimo-team/learn/blob/main/functional_programming/05_functors.py
|
| 47 |
-
|
| 48 |
-
### On Our Landing Page
|
| 49 |
-
|
| 50 |
-
Open the notebooks in our landing page page [here](https://marimo-team.github.io/learn/functional_programming/05_functors.html)
|
| 51 |
-
|
| 52 |
-
## Collaboration
|
| 53 |
-
|
| 54 |
-
If you're interested in collaborating or have questions, please reach out to me
|
| 55 |
-
on Discord (@eugene.hs).
|
| 56 |
-
|
| 57 |
-
## Description of notebooks
|
| 58 |
-
|
| 59 |
-
Check [here](https://github.com/marimo-team/learn/issues/51) for current series
|
| 60 |
-
structure.
|
| 61 |
-
|
| 62 |
-
| Notebook | Title | Key Concepts | Prerequisites |
|
| 63 |
-
|----------|-------|--------------|---------------|
|
| 64 |
-
| [05. Functors](https://github.com/marimo-team/learn/blob/main/functional_programming/05_functors.py) | Category Theory and Functors | Category Theory, Functor, fmap, Bifunctor | Basic Python, Functions |
|
| 65 |
-
| [06. Applicatives](https://github.com/marimo-team/learn/blob/main/functional_programming/06_applicatives.py) | Applicative programming with effects | Applicative Functor, pure, apply, Effectful programming, Alternative | Functors |
|
| 66 |
-
|
| 67 |
-
**Authors.**
|
| 68 |
-
|
| 69 |
-
Thanks to all our notebook authors!
|
| 70 |
-
|
| 71 |
-
- [métaboulie](https://github.com/metaboulie)
|
| 72 |
-
|
| 73 |
-
**Reviewers.**
|
| 74 |
-
|
| 75 |
-
Thanks to all our notebook reviews!
|
| 76 |
-
|
| 77 |
-
- [Haleshot](https://github.com/Haleshot)
|
```diff
@@ -1,9 +1,9 @@
 # /// script
 # requires-python = ">=3.11"
 # dependencies = [
-#     "cvxpy
+#     "cvxpy-base",
 #     "marimo",
-#     "numpy==2.
+#     "numpy==2.4.3",
 # ]
 # ///
 
@@ -22,7 +22,7 @@ def _():
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""
-    # Least
+    # Least Squares
 
     In a least-squares problem, we have measurements $A \in \mathcal{R}^{m \times
     n}$ (i.e., $m$ rows and $n$ columns) and $b \in \mathcal{R}^m$. We seek a vector
```
```diff
@@ -1,11 +1,11 @@
 # /// script
 # requires-python = ">=3.13"
 # dependencies = [
-#     "cvxpy
+#     "cvxpy-base",
 #     "marimo",
-#     "matplotlib==3.10.
-#     "numpy==2.
-#     "wigglystuff==0.
+#     "matplotlib==3.10.8",
+#     "numpy==2.4.3",
+#     "wigglystuff==0.2.37",
 # ]
 # ///
@@ -24,7 +24,7 @@ def _():
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""
-    # Linear
+    # Linear Program
 
     A linear program is an optimization problem with a linear objective and affine
     inequality constraints. A common standard form is the following:
```
```diff
@@ -1,7 +1,11 @@
 # /// script
 # requires-python = ">=3.13"
 # dependencies = [
+#     "cvxpy-base",
 #     "marimo",
+#     "matplotlib==3.10.8",
+#     "numpy==2.4.3",
+#     "wigglystuff==0.2.37",
 # ]
 # ///
 import marimo
@@ -19,7 +23,7 @@ def _():
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""
-    # Minimal
+    # Minimal Fuel Optimal Control
 
     This notebook includes an application of linear programming to controlling a
     physical system, adapted from [Convex
@@ -128,14 +132,14 @@ def _():
 
 
 @app.cell
-def _(A, T, b, cp, mo, n, x0, xdes):
+def _(A, T, b, cp, mo, n, np, x0, xdes):
     X, u = cp.Variable(shape=(n, T + 1)), cp.Variable(shape=(1, T))
 
     objective = cp.sum(cp.maximum(cp.abs(u), 2 * cp.abs(u) - 1))
     constraints = [
         X[:, 1:] == A @ X[:, :-1] + b @ u,
-        X[:, 0] == x0,
-        X[:, -1] == xdes,
+        X[:, 0] == np.array(x0).flatten(),
+        X[:, -1] == np.array(xdes).flatten(),
     ]
 
     fuel_used = cp.Problem(cp.Minimize(objective), constraints).solve()
```
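The `np.array(...).flatten()` wrapper in the constraints fixes a shape mismatch: the widget value arrives as a nested (2-D) list, while the cvxpy slice `X[:, 0]` is 1-D. A minimal sketch of the coercion (the widget value shown is a made-up placeholder):

```python
import numpy as np

# A matrix widget typically hands back a nested list, e.g. a column vector.
x0_widget_value = [[0.0], [1.0], [2.0], [3.0]]

as_is = np.array(x0_widget_value)               # 2-D, shape (4, 1)
flattened = np.array(x0_widget_value).flatten()  # 1-D, shape (4,)

# Only the flattened form lines up elementwise with a 1-D slice like X[:, 0].
```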
```diff
@@ -1,11 +1,11 @@
 # /// script
 # requires-python = ">=3.13"
 # dependencies = [
-#     "cvxpy
+#     "cvxpy-base",
 #     "marimo",
-#     "matplotlib==3.10.
-#     "numpy==2.
-#     "wigglystuff==0.
+#     "matplotlib==3.10.8",
+#     "numpy==2.4.3",
+#     "wigglystuff==0.2.37",
 # ]
 # ///
@@ -24,7 +24,7 @@ def _():
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""
-    # Quadratic
+    # Quadratic Program
 
     A quadratic program is an optimization problem with a quadratic objective and
     affine equality and inequality constraints. A common standard form is the
```
```diff
@@ -1,12 +1,12 @@
 # /// script
 # requires-python = ">=3.13"
 # dependencies = [
-#     "cvxpy
+#     "cvxpy-base",
 #     "marimo",
-#     "matplotlib==3.10.
-#     "numpy==2.
-#     "scipy==1.
-#     "wigglystuff==0.
+#     "matplotlib==3.10.8",
+#     "numpy==2.4.3",
+#     "scipy==1.17.1",
+#     "wigglystuff==0.2.37",
 # ]
 # ///
@@ -25,7 +25,7 @@ def _():
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""
-    # Portfolio
+    # Portfolio Optimization
     """)
     return
 
@@ -145,7 +145,7 @@ def _(mo, np):
 def _(mu_widget, np):
     np.random.seed(1)
     n = 10
-    mu = np.array(mu_widget.matrix)
+    mu = np.array(mu_widget.matrix).flatten()
     Sigma = np.random.randn(n, n)
     Sigma = Sigma.T.dot(Sigma)
     return Sigma, mu, n
@@ -153,7 +153,7 @@ def _(mu_widget, np):
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md("""
+    mo.md(r"""
     Next, we solve the problem for 100 different values of $\gamma$
     """)
     return
@@ -187,7 +187,7 @@ def _(cp, gamma, np, prob, ret, risk):
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md("""
+    mo.md(r"""
     Plotted below are the risk return tradeoffs for two values of $\gamma$ (blue squares), and the risk return tradeoffs for investing fully in each asset (red circles)
     """)
     return
```
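The `mo.md("""` to `mo.md(r"""` changes matter because the markdown contains LaTeX, and in a plain string Python interprets backslash escapes before the text ever reaches marimo. A quick illustration of the difference:

```python
# In a plain string, the "\n" in "\nabla" becomes a real newline;
# the raw string keeps the backslash for the LaTeX renderer.
plain = "\nabla f(x)"
raw = r"\nabla f(x)"
```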
```diff
@@ -1,9 +1,9 @@
 # /// script
 # requires-python = ">=3.13"
 # dependencies = [
-#     "cvxpy
+#     "cvxpy-base",
 #     "marimo",
-#     "numpy==2.
+#     "numpy==2.4.3",
 # ]
 # ///
@@ -22,7 +22,7 @@ def _():
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""
-    # Convex
+    # Convex Optimization
 
     In the previous tutorials, we learned about least squares, linear programming,
     and quadratic programming, and saw applications of each. We also learned that these problem
```