Spaces:

marimo-team
/

marimo-learn

Runtime error

marimo-learn / queueing /08_inspectors_paradox.py

Greg Wilson

feat: hiding a few cells

8eccedb 12 days ago

10.8 kB

	# /// script
	# requires-python = ">=3.13"
	# dependencies = [
	# "altair",
	# "asimpy",
	# "marimo",
	# "polars==1.24.0",
	# ]
	# ///

	import marimo

	__generated_with = "0.20.4"
	app = marimo.App(width="medium")


	@app.cell(hide_code=True)
	def _():
	import marimo as mo
	import random
	import statistics

	import altair as alt
	import polars as pl

	from asimpy import Environment, Process

	return Environment, Process, alt, mo, pl, random, statistics


	@app.cell(hide_code=True)
	def _(mo):
	mo.md(r"""
	# The Inspector's Paradox

	## Why the Bus Is Always Late

	Buses arrive at a stop with some average headway (gap between buses) of $\mu$ minutes. A passenger arrives at a uniformly random time and waits for the next bus. How long do they wait? The naive answer is $\mu / 2$: on average you land in the middle of a gap. The correct answer is almost always longer—sometimes much longer.

	The expected wait is not $\mu/2$ but:

	$$E[\text{wait}] = \frac{\mu}{2} + \frac{\sigma^2}{2\mu}$$

	where $\sigma^2 = \text{Var}[\text{headway}]$. The second term is always non-negative, so higher variance always means longer expected waits, even when the mean headway is unchanged.

	### Three Bus Schedules with Mean Headway $\mu = 10$

	\| Schedule \| $\sigma^2$ \| Predicted wait \| Naive wait \|
	\|-------------\|-----------\|----------------\|-----------\|
	\| Regular \| 0 \| 5.0 \| 5.0 \|
	\| Exponential \| 100 \| 10.0 \| 5.0 \|
	\| Clustered \| 64 \| 8.2 \| 5.0 \|

	For exponentially distributed headways, $\sigma^2 = \mu^2$, so:

	$$E[\text{wait}] = \frac{\mu}{2} + \frac{\mu^2}{2\mu} = \mu$$

	A passenger waits on average for an entire mean headway — twice the naive expectation.

	## Why This Happens: Length-Biased Sampling

	A passenger arriving at a random time is more likely to land inside a long gap than a short one, because long gaps occupy more time on the clock. This is called length-biased sampling. The interval containing your arrival is not a random headway: it is drawn from the length-biased distribution with density:

	$$f^*(h) = \frac{h \cdot f(h)}{\mu}$$

	The mean of this biased distribution is $\mu + \sigma^2/\mu$, and you arrive uniformly within it, giving expected wait $(\mu + \sigma^2/\mu)/2$.

	The same phenomenon explains why the average class size experienced by a student exceeds the average class size reported by the university (large classes have more students to report them).

	## Why "Inspector's Paradox"?

	The name comes from quality control, where an inspector arrives at a random time to sample a production process and systematically encounters longer-than-average intervals. The paradox is that a random observer is more likely to land inside a long gap than a short one, so their experienced mean interval exceeds the true mean interval. It feels paradoxical because you'd expect a random arrival to see the average gap, but length-biased sampling guarantees they see worse-than-average gaps whenever there's any variance at all.
	""")
	return


	@app.cell(hide_code=True)
	def _(mo):
	mo.md(r"""
	## Implementation

	A `BusService` process generates buses under three headway distributions (regular, exponential, clustered bimodal) and records their arrival times. After the simulation, passenger wait times are estimated by sampling $N$ uniformly random arrival times and finding the next bus for each, without needing explicit `Passenger` processes.
	""")
	return


	@app.cell(hide_code=True)
	def _(mo):
	sim_time_slider = mo.ui.slider(
	start=0,
	stop=100_000,
	step=1_000,
	value=20_000,
	label="Simulation time",
	)

	mean_headway_slider = mo.ui.slider(
	start=5.0,
	stop=30.0,
	step=1.0,
	value=10.0,
	label="Mean headway",
	)

	seed_input = mo.ui.number(
	value=192,
	step=1,
	label="Random seed",
	)

	run_button = mo.ui.run_button(label="Run simulation")

	mo.vstack([
	sim_time_slider,
	mean_headway_slider,
	seed_input,
	run_button,
	])
	return mean_headway_slider, seed_input, sim_time_slider


	@app.cell
	def _(mean_headway_slider, seed_input, sim_time_slider):
	SIM_TIME = int(sim_time_slider.value)
	MEAN_HEADWAY = float(mean_headway_slider.value)
	SEED = int(seed_input.value)
	N_PASSENGERS = 20_000
	return MEAN_HEADWAY, N_PASSENGERS, SEED, SIM_TIME


	@app.cell
	def _(MEAN_HEADWAY, Process, random):
	class BusService(Process):
	def init(self, mode, bus_arrivals):
	self.mode = mode
	self.bus_arrivals = bus_arrivals

	async def run(self):
	while True:
	if self.mode == "regular":
	headway = MEAN_HEADWAY
	elif self.mode == "exponential":
	headway = random.expovariate(1.0 / MEAN_HEADWAY)
	elif self.mode == "clustered":
	headway = MEAN_HEADWAY * 0.2 if random.random() < 0.5 else MEAN_HEADWAY * 1.8
	else:
	raise ValueError(f"Unknown mode: {self.mode}")
	await self.timeout(headway)
	self.bus_arrivals.append(self.now)

	return (BusService,)


	@app.cell
	def _(BusService, Environment, SIM_TIME):
	def collect_buses(mode):
	bus_arrivals = []
	env = Environment()
	BusService(env, mode, bus_arrivals)
	env.run(until=SIM_TIME)
	return bus_arrivals

	return (collect_buses,)


	@app.cell
	def _(N_PASSENGERS, random, statistics):
	def expected_wait(bus_arrivals, n=N_PASSENGERS):
	max_t = bus_arrivals[-1]
	waits = []
	for _ in range(n):
	t = random.uniform(0.0, max_t * 0.95)
	for b in bus_arrivals:
	if b > t:
	waits.append(b - t)
	break
	return statistics.mean(waits) if waits else 0.0

	return (expected_wait,)


	@app.cell
	def _(statistics):
	def headway_variance(bus_arrivals):
	headways = [b - a for a, b in zip(bus_arrivals, bus_arrivals[1:])]
	return statistics.variance(headways) if len(headways) > 1 else 0.0

	return (headway_variance,)


	@app.cell(hide_code=True)
	def _(MEAN_HEADWAY, mo):
	mu = MEAN_HEADWAY
	naive = MEAN_HEADWAY / 2.0
	var_exp = mu ** 2
	var_clustered = 0.5 * (mu * 0.2 - mu) ** 2 + 0.5 * (mu * 1.8 - mu) ** 2
	mo.md(f"""
	## Results

	Mean headway: {MEAN_HEADWAY} → naive expected wait = {naive:.1f}

	- Exponential (Var ≈ {var_exp:.1f}): predicted = {mu / 2 + var_exp / (2 * mu):.1f} (= full mean headway!)
	- Clustered (Var ≈ {var_clustered:.1f}): predicted = {mu / 2 + var_clustered / (2 * mu):.1f}
	""")
	return (naive,)


	@app.cell
	def _(MEAN_HEADWAY, collect_buses, expected_wait, headway_variance, naive, pl):
	def run_models():
	rows = []
	for mode in ["regular", "exponential", "clustered"]:
	buses = collect_buses(mode)
	var_h = headway_variance(buses)
	mean_w = expected_wait(buses)
	rows.append({
	"mode": mode,
	"var_headway": round(var_h, 4),
	"mean_wait": round(mean_w, 4),
	"predicted": round(MEAN_HEADWAY / 2.0 + var_h / (2.0 * MEAN_HEADWAY), 4),
	"ratio": round(mean_w / naive, 4),
	})
	return pl.DataFrame(rows)

	return (run_models,)


	@app.cell
	def _(SEED, random, run_models):
	random.seed(SEED)
	df = run_models()
	df
	return (df,)


	@app.cell
	def _(alt, df, naive, pl):
	chart = (
	alt.Chart(df)
	.mark_bar()
	.encode(
	x=alt.X("mode:N", title="Bus schedule type"),
	y=alt.Y("mean_wait:Q", title="Mean passenger wait"),
	color=alt.Color("mode:N", legend=None),
	tooltip=["mode:N", "mean_wait:Q", "ratio:Q"],
	)
	.properties(title="Inspector's Paradox: Mean Wait by Schedule Type")
	)
	naive_line = (
	alt.Chart(pl.DataFrame({"naive": [naive]}))
	.mark_rule(strokeDash=[4, 4], color="gray")
	.encode(y="naive:Q")
	)
	(chart + naive_line)
	return


	@app.cell(hide_code=True)
	def _(mo):
	mo.md(r"""
	## Understanding the Math

	### Length-biased sampling

	Suppose buses run on an irregular schedule where gaps between buses are either 2 minutes or 18 minutes, each with probability 1/2. The mean gap is $\mu = (2 + 18)/2 = 10$ minutes. Now ask: if you arrive at a completely random moment, which gap are you most likely to land inside?

	A 2-minute gap occupies only 2 minutes on the clock, but an 18-minute gap occupies 18. Out of every 20 minutes of clock time on average, 2 minutes belong to a short gap and 18 to a long one. So a random arrival lands in a short gap with probability $2/(2+18) = 1/10$ and in a long gap with probability $18/20 = 9/10$. The expected gap length you experience is:

	$$E[\text{gap experienced}] = \frac{1}{10} \cdot 2 + \frac{9}{10} \cdot 18 = 0.2 + 16.2 = 16.4 \text{ minutes}$$

	That is far above the mean gap of 10 minutes. You are disproportionately likely to land inside a long gap simply because it takes up more time.

	### The wait formula

	Once you are inside a gap, you arrive uniformly within it, so on average you land in the middle. Your expected wait is half the gap length you experience. The full formula is:

	$$E[\text{wait}] = \frac{\mu}{2} + \frac{\sigma^2}{2\mu}$$

	Here $\mu$ is the mean gap and $\sigma^2 = \text{Var}[\text{gap}]$ is the variance of gap lengths. The first term, $\mu/2$, is what you would get if every gap were exactly $\mu$ (deterministic buses — arrive in the middle every time). The second term, $\sigma^2/(2\mu)$, is the extra waiting from length-biased sampling. It is always non-negative, so irregular buses always make you wait longer than regular buses with the same mean headway.

	### Why variance matters

	The variance $\sigma^2$ measures how spread out the gap sizes are. A perfectly regular bus schedule has $\sigma^2 = 0$ and gives the naive answer $\mu/2$. An exponentially distributed schedule has $\sigma^2 = \mu^2$, which doubles the expected wait to $\mu$. More irregular buses, higher penalty.

	### Connecting to expected values

	The formula arises from a standard result: the expected length of the gap containing a random arrival is $\mu + \sigma^2/\mu$. You can think of this as the mean gap plus a correction term proportional to the variance divided by the mean. Dividing by 2 (uniform arrival within the gap) gives the wait formula above.
	""")
	return


	if __name__ == "__main__":
	app.run()