| --- |
| title: LAMBDA |
| app_file: LAMBDA.py |
| sdk: gradio |
| sdk_version: 6.14.0 |
| --- |
| <div align="center"> |
| |
| # LAMBDA - LArge Model-based Data Analysis System |
| [](https://ama-cmfai.github.io/LAMBDA-Docs/#/) |
| [](https://www.polyu.edu.hk/ama/cmfai/lambda.html) |
| [](https://arxiv.org/pdf/2407.17535) |
| [](https://github.com/AMA-CMFAI/LAMBDA/releases/download/app/LAMBDA-MacOS-beta-v0.0.2.zip) |
| [](https://github.com/AMA-CMFAI/LAMBDA/releases/download/app/LAMBDA-Windows-beta-v0.0.2.zip) |
|
|
| </div> |
|
|
| <body> |
| <!-- <img src="https://github.com/user-attachments/assets/df454158-79e4-4da4-ae03-eb687fe02f16" style="width: 80%"> --> |
| <!-- <p align="center"> |
| <img src="https://github.com/user-attachments/assets/6f6d49ef-40b7-46f2-88ae-b8f6d9719c3a" style="width: 600px;"> |
|  |
| </p> --> |
| |
|  |
|
|
|
|
| We introduce **LAMBDA**, a novel open-source, code-free multi-agent data analysis system that harnesses the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through the use of innovatively designed data agents that operate iteratively and generatively using natural language. |
|
|
| ## News |
| - LAMBDA App for macOS and Windows has been released. Details can be found in [Released](https://github.com/AMA-CMFAI/LAMBDA/releases/tag/app). (Hint: There are some problems with the kernel installation in the APP. You should run `ipython kernel install --name lambda --user` to install the kernel in advance.) |
| - [Docs site](https://ama-cmfai.github.io/LAMBDA-Docs/#/) is available! |
|
|
| ## Key Features |
|
|
| - **Code-Free Data Analysis**: Perform complex data analysis tasks through human language instruction. |
| - **Multi-Agent System**: Utilizes two key agent roles, the programmer and the inspector, to generate and debug code seamlessly. |
| - **User Interface**: This includes a robust user interface that allows direct user intervention in the operational loop. |
| - **Model Integration**: Flexibly integrates external models and algorithms to cater to customized data analysis needs. |
| - **Automatic Report Generation**: Concentrate on high-value tasks, rather than spending time and resources on report writing and formatting. |
| - **Jupyter Notebook Exporting**: Export the code and the results to Jupyter Notebook for reproduction and further analysis flexibly. |
|
|
| ## Getting Started |
| ### Installation |
| First, clone the repository. |
|
|
| ```bash |
| git clone https://github.com/AMA-CMFAI/LAMBDA.git |
| cd LAMBDA |
| ``` |
|
|
| Then, we recommend creating a [Conda](https://docs.conda.io/en/latest/) environment for this project and installing the dependencies by following the commands: |
| ```bash |
| conda create -n lambda python=3.10 |
| conda activate lambda |
| ``` |
|
|
| Then, install the required packages: |
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| Next, you should install the Jupyter kernel to create a local Code Interpreter: |
| ```bash |
| ipython kernel install --name lambda --user |
| ``` |
|
|
| ### Configuration to Easy Start |
| 1. To use the Large Language Models, you should have an API key from [OpenAI](https://openai.com/api/pricing/) or other companies. Besides, we support OpenAI-Style interface for your local LLMs once deployed, available frameworks such as [Ollama](https://ollama.com/), [LiteLLM](https://docs.litellm.ai/docs/), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). |
| > Here are some products that offer free APIkeys for your reference: [OpenRouter](https://openrouter.ai/) and [SILICONFLOW](https://siliconflow.cn/) |
| 2. Set your API key, models and working path in the config.yaml: |
| ```bash |
| #================================================================================================ |
| # Config of the LLMs |
| #================================================================================================ |
| conv_model : "gpt-4.1-mini" # Choose the model you want to use. We highly recommned using the advanced model. |
| programmer_model : "gpt-4.1-mini" |
| inspector_model : "gpt-4.1-mini" |
| api_key : "sk-xxxxxxx" # The API Keys you buy. |
| base_url_conv_model : 'https://api.openai.com/v1' # The base url from the provider. |
| base_url_programmer : 'https://api.openai.com/v1' |
| base_url_inspector : 'https://api.openai.com/v1' |
| |
| |
| #================================================================================================ |
| # Config of the system |
| #================================================================================================ |
| streaming : True |
| project_cache_path : "cache/conv_cache/" # Local cache path |
| max_attempts : 5 # The max attempts of self-correcting |
| max_exe_time: 18000 # The maximum time for the execution |
| |
| #knowledge integration |
| retrieval : False # Whether to start a knowledge retrieval. If you don't create your knowledge base, you should set it to False |
| ``` |
|
|
|
|
| Finally, run the following command to start the LAMBDA with GUI: |
| ```bash |
| python lambda_app.py |
| ``` |
|
|
|
|
| ## Demonstration Videos |
|
|
| The performance of LAMBDA in solving data science problems is demonstrated in several case studies, including: |
| - **[Data Analysis](https://www.polyu.edu.hk/ama/cmfai/files/lambda/lambda.mp4)** |
| - **[Integrating Human Intelligence](https://www.polyu.edu.hk/ama/cmfai/files/lambda/knw.mp4)** |
| - **[Education](https://www.polyu.edu.hk/ama/cmfai/files/lambda/LAMBDA_education.mp4)** |
|
|
|
|
| ## Planning Works |
| - [ ] Create a Logger for log. |
| - [ ] Pre-installation of popular packages in the kernel. |
| - [ ] Replace Gradio UI with OpenWebUI. |
| - [ ] Refactor the Knowledge Integration and Knowledge base module by ChromaDB. |
| - [ ] Add a Docker image for easier use. |
| - [x] Docsite. |
|
|
|
|
| ## Updating History |
| See [Docs site](https://ama-cmfai.github.io/LAMBDA-Docs/#/). |
|
|
|
|
| ## Related Works |
| If you are interested in Data Agent, you can take a look at : |
| - Our survey paper [[A Survey on Large Language Model-based Agents for Statistics and Data Science]](https://www.arxiv.org/pdf/2412.14222) |
| - and a reading list: [[Paper List of LLM-based Data Science Agents]](https://github.com/Stephen-SMJ/Reading-List-of-Large-Language-Model-Based-Data-Science-Agent) |
|
|
|
|
| ## License |
|
|
| This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |
|
|
|
|
|
|
| ## Acknowledgements |
|
|
| Thank the contributors and the communities for their support and feedback. |
|
|
| --- |
|
|
| > If you find our work useful in your research, consider citing our paper by: |
|
|
|
|
|
|
| ```bash |
| @article{sun2025lambda, |
| title={Lambda: A large model based data agent}, |
| author={Sun, Maojun and Han, Ruijian and Jiang, Binyan and Qi, Houduo and Sun, Defeng and Yuan, Yancheng and Huang, Jian}, |
| journal={Journal of the American Statistical Association}, |
| pages={1--13}, |
| year={2025}, |
| publisher={Taylor \& Francis} |
| } |
| |
| @article{sun2025survey, |
| title={A survey on large language model-based agents for statistics and data science}, |
| author={Sun, Maojun and Han, Ruijian and Jiang, Binyan and Qi, Houduo and Sun, Defeng and Yuan, Yancheng and Huang, Jian}, |
| journal={The American Statistician}, |
| pages={1--14}, |
| year={2025}, |
| publisher={Taylor \& Francis} |
| } |
| ``` |
| ## Star History |
|
|
| [](https://www.star-history.com/#AMA-CMFAI/LAMBDA&Date) |
| </body> |
|
|