| Quickstart |
| =========== |
|
|
| Eager to start valuing soccer actions? This page gives a quick introduction |
| to the core workflow. |
|
|
| Installation |
| ------------ |
|
|
| First, make sure that socceraction is installed: |
|
|
| .. code-block:: console |
|
|
| $ pip install socceraction[statsbomb] |
|
|
| For other installation options, check out our detailed |
| :doc:`installation instructions <install>`. |
|
|
| Loading event stream data |
| ------------------------- |
|
|
| First of all, you will need some data. Luckily, both `StatsBomb <https://github.com/statsbomb/open-data>`_ and |
| `Wyscout <https://www.nature.com/articles/s41597-019-0247-7>`_ provide a small freely available dataset. |
| The :ref:`data module<api-data>` of socceraction makes it trivial to load these datasets as |
| `Pandas DataFrames <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__. |
| In this short introduction, we will work with StatsBomb's dataset of the 2018 World Cup. |
|
|
| .. code-block:: python |
|
|
| import pandas as pd |
| from socceraction.data.statsbomb import StatsBombLoader |
|
|
| # Set up the StatsBomb data loader |
| SBL = StatsBombLoader() |
|
|
| # View all available competitions |
| df_competitions = SBL.competitions() |
|
|
| # Create a dataframe with all games from the 2018 World Cup |
| df_games = SBL.games(competition_id=43, season_id=3).set_index("game_id") |
|
|
|
|
| .. note:: |
| Keep in mind that by using the public StatsBomb data you are agreeing to their `user agreement <https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf>`__. |
|
|
| For each game, you can then retrieve dataframes containing the teams, all |
| players who participated, and all events that were recorded in that game. |
| Specifically, we'll load the data from the third-place play-off between |
| England and Belgium. |
|
|
| .. code-block:: python |
|
|
| game_id = 8657 |
| df_teams = SBL.teams(game_id) |
| df_players = SBL.players(game_id) |
| df_events = SBL.events(game_id) |
|
|
|
|
| Converting to SPADL actions |
| --------------------------- |
|
|
| The event stream format is not well-suited for data analysis: some of the |
| recorded information is irrelevant for valuing actions, each vendor uses their |
| own custom format and definitions, and the events are stored as unstructured |
| JSON objects. Therefore, socceraction uses the :doc:`SPADL format |
| <spadl/index>` for describing actions on the pitch. With the code below, you |
| can convert the events to SPADL actions. |
|
|
| .. code-block:: python |
|
|
| import socceraction.spadl as spadl |
|
|
| home_team_id = df_games.at[game_id, "home_team_id"] |
| df_actions = spadl.statsbomb.convert_to_actions(df_events, home_team_id) |
|
|
| With the `matplotsoccer package <https://github.com/TomDecroos/matplotsoccer>`_, you can try plotting some of these |
| actions: |
|
|
| .. code-block:: python |
|
|
| import matplotsoccer as mps |
|
|
| # Select relevant actions |
| df_actions_goal = df_actions.loc[2196:2200] |
| # Replace result, actiontype and bodypart IDs by their corresponding name |
| df_actions_goal = spadl.add_names(df_actions_goal) |
| # Add team and player names |
| df_actions_goal = df_actions_goal.merge(df_teams).merge(df_players) |
| # Create the plot |
| mps.actions( |
| location=df_actions_goal[["start_x", "start_y", "end_x", "end_y"]], |
| action_type=df_actions_goal.type_name, |
| team=df_actions_goal.team_name, |
| result=df_actions_goal.result_name == "success", |
| label=df_actions_goal[["time_seconds", "type_name", "player_name", "team_name"]], |
| labeltitle=["time", "actiontype", "player", "team"], |
| zoom=False |
| ) |
|
|
| .. figure:: spadl/eden_hazard_goal_spadl.png |
| :align: center |
|
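The two chained ``merge`` calls in the plotting code above pass no ``on``
argument, so pandas joins on every column name the two frames have in common.
The sketch below illustrates this behavior on toy frames. The column names
``team_id`` and ``player_id`` and all values are made up for illustration; the
sketch only assumes that the action, team and player tables share such ID columns.

```python
import pandas as pd

# Toy stand-ins for df_actions, df_teams and df_players (hypothetical values).
actions = pd.DataFrame({
    "team_id": [1, 1, 2],
    "player_id": [10, 11, 20],
    "type_name": ["pass", "dribble", "shot"],
})
teams = pd.DataFrame({"team_id": [1, 2], "team_name": ["Home FC", "Away FC"]})
players = pd.DataFrame({
    "team_id": [1, 1, 2],
    "player_id": [10, 11, 20],
    "player_name": ["Player A", "Player B", "Player C"],
})

# Without `on`, merge() uses the intersection of the column names as join keys:
# ["team_id"] for the first merge, ["team_id", "player_id"] for the second.
named = actions.merge(teams).merge(players)
print(named[["type_name", "team_name", "player_name"]])
```

Because the default is an inner join, actions whose IDs are missing from one of
the lookup tables would be dropped without warning; passing ``how="left"``
keeps them.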
|
|
|
| Valuing actions |
| --------------- |
|
|
| We can now assign a numeric value to each of these individual actions that |
| quantifies how much the action contributed towards winning the game. |
| Socceraction implements three frameworks for doing this: xT, VAEP and |
| Atomic-VAEP. In this quickstart guide, we will focus on the xT framework. |
|
|
| The expected threat or xT model overlays an :math:`M \times N` grid on the |
| pitch in order to divide it into zones. Each zone :math:`z` is |
| then assigned a value :math:`xT(z)` that reflects how threatening teams are at |
| that location, in terms of scoring. An example grid is visualized below. |
|
|
| .. image:: valuing_actions/default_xt_grid.png |
| :width: 600 |
| :align: center |
|
|
| The code below loads league-wide xT values from the 2017-18 Premier League |
| season (the 12x8 grid shown above). Instructions on how to train your own |
| model can be found in the |
| :doc:`detailed documentation about xT <valuing_actions/xT>`. |
|
|
| .. code-block:: python |
|
|
| import socceraction.xthreat as xthreat |
|
|
| url_grid = "https://karun.in/blog/data/open_xt_12x8_v1.json" |
| xT_model = xthreat.load_model(url_grid) |
|
|
|
|
|
|
| The model can then be used to value actions that successfully move the ball |
| between two zones: the value of an action is the threat value at its end |
| location minus the threat value at its start location. The xT framework does |
| not assign a value to failed actions, shots, or defensive actions such as tackles. |
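As a rough illustration of the computation described above, the sketch below
rates a few hand-made actions against a made-up grid. It is not socceraction's
implementation: the helper names ``zone_value`` and ``rate_moves``, the toy
grid values, the choice of pass, dribble and cross as ball-moving action types,
and the 105 x 68 pitch with its origin in the bottom-left corner are all
assumptions made for the example.

```python
import numpy as np
import pandas as pd

PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # assumed pitch size in meters


def zone_value(grid, x, y):
    """Look up the grid value of the zone containing each (x, y) point."""
    n_rows, n_cols = grid.shape
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Scale coordinates to cell indices; clip so points on the far
    # boundary (x == 105 or y == 68) fall into the last cell.
    col = np.clip((x / PITCH_LENGTH * n_cols).astype(int), 0, n_cols - 1)
    row = np.clip((y / PITCH_WIDTH * n_rows).astype(int), 0, n_rows - 1)
    return grid[row, col]


def rate_moves(actions, grid):
    """Value successful ball-moving actions as xT(end zone) - xT(start zone)."""
    start = zone_value(grid, actions["start_x"], actions["start_y"])
    end = zone_value(grid, actions["end_x"], actions["end_y"])
    is_move = actions["type_name"].isin(["pass", "dribble", "cross"])
    is_success = actions["result_name"] == "success"
    # Everything else (failed actions, shots, tackles, ...) gets no value.
    values = np.where(is_move & is_success, end - start, np.nan)
    return pd.Series(values, index=actions.index, name="xT_value")


# Toy 12x8 grid (8 rows, 12 columns): threat grows toward the opponent's goal.
grid = np.tile(np.linspace(0.0, 0.11, 12), (8, 1))

actions = pd.DataFrame({
    "type_name": ["pass", "dribble", "pass", "shot"],
    "result_name": ["success", "success", "fail", "success"],
    "start_x": [30.0, 60.0, 50.0, 95.0],
    "start_y": [34.0, 34.0, 34.0, 34.0],
    "end_x": [60.0, 50.0, 90.0, 105.0],
    "end_y": [34.0, 34.0, 40.0, 34.0],
})
values = rate_moves(actions, grid)
print(values)
```

In this toy data the forward pass receives a positive value, the backward
dribble a negative one, and the failed pass and the shot stay unrated.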
|
|
| .. code-block:: python |
|
|
| df_actions_ltr = spadl.play_left_to_right(df_actions, home_team_id) |
| df_actions["xT_value"] = xT_model.rate(df_actions_ltr) |
|
|
|
|
| .. image:: valuing_actions/eden_hazard_goal_xt.png |
| :align: center |
|
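The ``play_left_to_right`` call in the code above matters because a single
grid is only meaningful if every team attacks in the same direction. The
sketch below shows the idea on toy data by mirroring the away team's
coordinates. The function name ``mirror_away_team``, the 105 x 68 pitch size
and the sample values are assumptions for illustration, not the library's code.

```python
import pandas as pd

PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # assumed pitch size in meters


def mirror_away_team(actions, home_team_id):
    """Return a copy in which the away team's coordinates are point-mirrored."""
    out = actions.copy()
    away = out["team_id"] != home_team_id
    for col in ["start_x", "end_x"]:
        out.loc[away, col] = PITCH_LENGTH - out.loc[away, col]
    for col in ["start_y", "end_y"]:
        out.loc[away, col] = PITCH_WIDTH - out.loc[away, col]
    return out


# Team 1 is the home team; team 2 attacks in the opposite direction.
actions = pd.DataFrame({
    "team_id": [1, 2],
    "start_x": [20.0, 80.0],
    "start_y": [10.0, 60.0],
    "end_x": [40.0, 70.0],
    "end_y": [30.0, 50.0],
})
ltr = mirror_away_team(actions, home_team_id=1)
print(ltr)
```

After mirroring, the away team's action also runs toward increasing ``x``, so
both teams can be rated against the same grid. The input frame is left
untouched because the function works on a copy.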
|
|
|
| ----------------------- |
| |
| Ready for more? Check out the detailed documentation about the |
| :doc:`data representation <spadl/index>` and |
| :doc:`action value frameworks <valuing_actions/index>`. |
| |