| .. currentmodule:: socceraction.data.statsbomb |
|
|
| ========================= |
| Loading StatsBomb data |
| ========================= |
|
|
| The :class:`StatsBombLoader` class provides an API client enabling you to |
| fetch `StatsBomb event stream data`_ as Pandas DataFrames. This document provides |
| an overview of the available data sources and how to access them. |
|
|
| ------ |
| Setup |
| ------ |
|
|
| To be able to load StatsBomb data, you'll first need to install a few |
| additional dependencies which are not included in the default installation of |
| socceraction. You can install these additional dependencies by running: |
|
|
| .. code-block:: console |
|
|
| $ pip install "socceraction[statsbomb]" |
|
|
|
|
| -------------------------- |
| Connecting to a data store |
| -------------------------- |
|
|
| First, you have to create a :class:`StatsBombLoader` object and configure it |
| for the data store you want to use. The :class:`StatsBombLoader` supports |
| loading data from the StatsBomb Open Data repository, from the official |
| StatsBomb API, and from local files. |
|
|
|
|
| Open Data repository |
| ==================== |
|
|
| StatsBomb has made event stream data of certain leagues freely available for |
| public non-commercial use at https://github.com/statsbomb/open-data. This open |
| data can be accessed without the need of authentication, but its use is |
| subject to a `user agreement`_. The code below shows how to setup an API client |
| that can fetch data from the repository. |
|
|
| .. code-block:: python |
|
|
| # optional: suppress warning about missing authentication |
| import warnings |
| from statsbombpy.api_client import NoAuthWarning |
| warnings.simplefilter('ignore', NoAuthWarning) |
|
|
| from socceraction.data.statsbomb import StatsBombLoader |
|
|
| api = StatsBombLoader(getter="remote", creds=None) |
|
|
|
|
| .. note:: |
| If you publish, share or distribute any research, analysis or insights based |
| on this data, StatsBomb requires you to state the data source as StatsBomb |
| and use their logo. |
|
|
|
|
| StatsBomb API |
| ============= |
|
|
| API access is for paying customers only. Authentication can be done by setting |
| environment variables named ``SB_USERNAME`` and ``SB_PASSWORD`` to your login |
| credentials. Alternatively, the constructor accepts an argument ``creds`` to |
| pass your login credentials in the format ``{"user": "", "passwd": ""}``. |
|
|
| .. code-block:: python |
|
|
| from socceraction.data.statsbomb import StatsBombLoader |
|
|
| # set authentication credentials as environment variables |
| import os |
| os.environ["SB_USERNAME"] = "your_username" |
| os.environ["SB_PASSWORD"] = "your_password" |
| api = StatsBombLoader(getter="remote") |
|
|
| # or provide authentication credentials as a dictionary |
| api = StatsBombLoader(getter="remote", creds={"user": "", "passwd": ""}) |
|
|
|
|
| Local directory |
| =============== |
|
|
| A final option is to load data from a local directory. This local directory |
| can be specified by passing the ``root`` argument to the constructor, |
| specifying the path to the local data directory. |
|
|
| .. code-block:: python |
|
|
| from socceraction.data.statsbomb import StatsBombLoader |
|
|
| api = StatsBombLoader(getter="local", root="data/statsbomb") |
|
|
| Note that the data should be organized in the same way as the StatsBomb Open |
| Data repository, which corresponds to the following file hierarchy: |
|
|
| .. code-block:: |
|
|
| root |
| βββ competitions.json |
| βββ events |
| β βββ <match_id>.json |
| β βββ ... |
| β βββ ... |
| βββ lineups |
| β βββ <match_id>.json |
| β βββ ... |
| βββ matches |
| β βββ <competition_id> |
| β β βββ <season_id>.json |
| β β βββ ... |
| β βββ ... |
| βββ three-sixty |
| βββ <match_id>.json |
| βββ ... |
|
|
|
|
|
|
| ------------ |
| Loading data |
| ------------ |
|
|
| Next, you can load the match event stream data and metadata by calling the |
| corresponding methods on the :class:`StatsBombLoader` object. |
|
|
|
|
| :func:`StatsBombLoader.competitions()` |
| ====================================== |
|
|
| .. code-block:: python |
|
|
| df_competitions = api.competitions() |
|
|
| .. csv-table:: |
| :class: dataframe |
| :header: season_id,competition_id,competition_name,country_name,competition_gender,season_name |
|
|
| 106,43,FIFA World Cup,International,male,2022 |
| 30,72,Women's World Cup,International,female,2019 |
| 3,43,FIFA World Cup,International,male,2018 |
|
|
|
|
| :func:`StatsBombLoader.games()` |
| =============================== |
|
|
| .. code-block:: python |
|
|
| df_games = api.games(competition_id=43, season_id=3) |
|
|
|
|
| .. csv-table:: |
| :class: dataframe |
| :header: game_id,season_id,competition_id,competition_stage,game_day,game_date,home_team_id,away_team_id,home_score,away_score,venue,referee_id |
|
|
| 8658,3,43,Final,7,2018-07-15 17:00:00,771,785,4,2,Stadion Luzhniki,730 |
| 8657,3,43,3rd Place Final,7,2018-07-14 16:00:00,782,768,2,0,Saint-Petersburg Stadium,741 |
|
|
| :func:`StatsBombLoader.teams()` |
| =============================== |
|
|
| .. code-block:: python |
|
|
| df_teams = api.teams(game_id=8658) |
|
|
| .. csv-table:: |
| :class: dataframe |
| :header: team_id,team_name |
| :align: left |
|
|
| 771,France |
| 785,Croatia |
|
|
|
|
|
|
| :func:`StatsBombLoader.players()` |
| ================================= |
|
|
| .. code-block:: python |
|
|
| df_players = api.players(game_id=8658) |
|
|
|
|
| .. csv-table:: |
| :class: dataframe |
| :header: game_id,team_id,player_id,player_name,nickname,jersey_number,is_starter,starting_position_id,starting_position_name,minutes_played |
|
|
| 8658,771,3009,Kylian MbappΓ© Lottin,Kylian MbappΓ©,10,True,12,Right Midfield,95 |
| 8658,785,5463,Luka ModriΔ,,10,True,13,Right Center Midfield,95 |
|
|
|
|
| :func:`StatsBombLoader.events()` |
| ================================ |
|
|
| .. code-block:: python |
|
|
| df_events = api.events(game_id=8658) |
|
|
| .. csv-table:: |
| :class: dataframe |
| :header: event_id,index,period_id,timestamp,minute,second,type_id,type_name,possession,possession_team_id,possession_team_name,play_pattern_id,play_pattern_name,team_id,team_name,duration,extra,related_events,player_id,player_name,position_id,position_name,location,under_pressure,counterpress,game_id |
|
|
| 47638847-fd43-4656-b49c-cff64e5cfc0a,1,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,771,France,0.0,"{...}",[],,,,,,False,False,8658 |
| 0c04305d-5615-4520-9be5-7c232829954b,2,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,785,Croatia,1.412,"{...}",[],,,,,,False,False,8658 |
| c5e17439-efe2-480b-9cff-1600998674d7,3,1,1900-01-01,0,0,18,Half Start,1,771,France,1,Regular Play,771,France,0.0,{},['7e1460eb-c572-4059-8cd4-cec4857f818d'],,,,,,False,False,8658 |
|
|
|
|
| If `360 data snapshots`_ are available for the game, they can be loaded by |
| passing ``load_360=True`` to the ``events()`` method. This will add two columns |
| to the events dataframe: ``visible_area_360`` and ``freeze_frame_360``. The |
| former contains the visible area of the pitch in the 360 snapshot, while the |
| latter contains the player locations in the 360 snapshot. |
|
|
| .. code-block:: python |
|
|
| df_events = api.events(game_id=3788741, load_360=True) |
|
|
|
|
| .. _StatsBomb event stream data: https://statsbomb.com/what-we-do/soccer-data/ |
| .. _statsbombpy: https://pypi.org/project/statsbombpy/ |
| .. _user agreement: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf |
| .. _360 data snapshots: https://statsbomb.com/what-we-do/soccer-data/360-2/ |
|
|