| import streamlit as st | |
| # Set page configuration | |
| st.set_page_config(page_title="ML Lifecycle", layout="centered") | |
| # Custom CSS for dark theme design | |
| st.markdown(""" | |
| <style> | |
| body { | |
| background-color: #121212; | |
| color: #E0E0E0; | |
| font-family: Arial, sans-serif; | |
| } | |
| .stApp { | |
| background: #1F1F1F; | |
| padding: 30px; | |
| border-radius: 12px; | |
| box-shadow: 0 4px 8px rgba(0, 0, 0, 0.5); | |
| } | |
| .stButton > button { | |
| display: block; | |
| margin: 12px auto; | |
| width: 80%; | |
| background: linear-gradient(90deg, #ff6b6b, #f06595); | |
| color: white; | |
| border: none; | |
| padding: 12px 25px; | |
| font-size: 16px; | |
| border-radius: 8px; | |
| font-weight: bold; | |
| box-shadow: 0 4px 6px rgba(0, 0, 0, 0.3); | |
| cursor: pointer; | |
| transition: all 0.3s ease-in-out; | |
| } | |
| .stButton > button:hover { | |
| background: linear-gradient(90deg, #f06595, #ff6b6b); | |
| transform: translateY(-3px); | |
| } | |
| h1 { | |
| color: #ff6b6b; | |
| font-size: 40px; | |
| text-align: center; | |
| font-weight: bold; | |
| margin-bottom: 20px; | |
| } | |
| h2, h3 { | |
| color: #f06595; | |
| text-align: center; | |
| } | |
| .hr { | |
| border: 0; | |
| height: 2px; | |
| background: linear-gradient(to right, #ff6b6b, #f06595); | |
| margin: 20px 0; | |
| } | |
| .arrow { | |
| font-size: 24px; | |
| text-align: center; | |
| color: #f06595; | |
| margin: 10px 0; | |
| } | |
| </style> | |
| """, unsafe_allow_html=True) | |
| # Initialize session state for page navigation | |
| if 'page' not in st.session_state: | |
| st.session_state.page = 'main' | |
| if 'previous_page' not in st.session_state: | |
| st.session_state.previous_page = 'main' | |
| # Function to render the main page | |
| def main_page(): | |
| st.title("Machine Learning Project Lifecycle") | |
| steps = [ | |
| "1. Problem Statement", | |
| "2. Data Collection", | |
| "3. Simple EDA", | |
| "4. Data Preprocessing", | |
| "5. EDA", | |
| "6. Feature Engineering", | |
| "7. Training the Model", | |
| "8. Testing the Model", | |
| "9. Deployment", | |
| "10. Monitoring" | |
| ] | |
| for i, step in enumerate(steps): | |
| if st.button(step): | |
| st.session_state.previous_page = 'main' | |
| st.session_state.page = step.replace(".", "").replace(" ", "_").lower() | |
| # Add a downward arrow between steps | |
| if i < len(steps) - 1: | |
| st.markdown("<p class='arrow'>⬇️</p>", unsafe_allow_html=True) | |
| # Data Collection Page | |
| def data_collection_page(): | |
| st.title("Data Collection") | |
| st.write("Data analysis means gathering, cleaning, and then analyzing data to extract valuable insights. Here we work with data using the Python programming language. Let's start by exploring what data means.") | |
| st.header("What is Data?") | |
| st.write("Put simply, data is a collection of information: facts or observations that can be recorded and measured. It comes in various forms, such as:") | |
| st.markdown("- IMAGE") | |
| st.markdown("- TEXT") | |
| st.markdown("- VIDEO") | |
| st.markdown("- AUDIO") | |
| st.write("Not all data is created equal. Data can come in various forms, and knowing how to classify it is essential for choosing the right tools and methods to analyze it. Broadly, data can be classified into three main categories:") | |
| st.markdown("- Structured") | |
| st.markdown("- Semi-Structured") | |
| st.markdown("- Unstructured") | |
| st.write("Each type of data has its own characteristics, advantages, and challenges when it comes to processing and extracting insights.") | |
| st.subheader("Structured Data:") | |
| st.write("Structured data is the most organized and easily accessible form of data. It refers to information that is highly organized and formatted in a way that can be easily stored, accessed, and processed by machines. Think of structured data as data that fits neatly into rows and columns, like in a spreadsheet or a relational database.") | |
| st.write("Examples:") | |
| st.markdown("- Databases: Tables in SQL databases where each column represents a different attribute (e.g., name, age, salary), and each row represents a record.") | |
| st.markdown("- Excel Sheets: Rows and columns filled with categorical and numerical data.") | |
| st.image('https://k21academy.com/wp-content/uploads/2020/10/structured-data-1.png', width=400) | |
| st.subheader("Semi-Structured Data:") | |
| st.write("Semi-structured data doesn't fit as neatly into the traditional table format as structured data, but it still follows an organizational framework. It contains tags, markers, or attributes that give it some organization without strictly conforming to a table format.") | |
| st.write("Examples:") | |
| st.markdown("- JSON and XML Files") | |
| st.markdown("- NoSQL Databases") | |
| st.image("https://www.imediacto.com/wp-content/uploads/2021/02/xml-csv-json-data-formats.png",width=400) | |
| st.subheader("Unstructured Data:") | |
| st.write("Unstructured data is the most complex and least organized form of data. It does not follow a specific format or structure, making it difficult to process and analyze using traditional methods. Unstructured data includes a wide range of formats, such as text, images, videos, and more.") | |
| st.write("Examples:") | |
| st.markdown("- Text Files: Documents, emails, and written reports.") | |
| st.markdown("- Multimedia: Photos, videos, and audio files.") | |
| st.markdown("- Social Media: Tweets, posts, and comments on platforms like Facebook, Twitter, or Instagram.") | |
| st.image("https://k21academy.com/wp-content/uploads/2020/10/unstructured.png",width = 600) | |
| st.subheader("Data Collection Methods") | |
| st.subheader("Dataset Websites") | |
| st.write(""" | |
| - Explore platforms like Kaggle, Data.gov, and UCI Machine Learning Repository for relevant datasets. | |
| """) | |
| st.subheader("APIs") | |
| st.write(""" | |
| - Utilize APIs offered by companies or organizations to access real-time structured data for analysis. | |
| """) | |
| st.subheader("Databases") | |
| st.write(""" | |
| - Connect to relational or NoSQL databases where structured data is stored and retrieve the necessary information. | |
| """) | |
| st.subheader("Web Scraping") | |
| st.write(""" | |
| - Extract information from websites using tools like BeautifulSoup or Scrapy to gather unstructured or semi-structured data. | |
| """) | |
| st.subheader("Manual Collection") | |
| st.write(""" | |
| - Collect data manually through surveys, questionnaires, interviews, or direct observations. | |
| """) | |
| # Buttons for types of data | |
| st.subheader("Data Types") | |
| if st.button("Structured Data"): | |
| st.session_state.previous_page = '2_data_collection' | |
| st.session_state.page = 'structured_data' | |
| if st.button("Semi-Structured Data"): | |
| st.session_state.previous_page = '2_data_collection' | |
| st.session_state.page = 'semi_structured_data' | |
| if st.button("Unstructured Data"): | |
| st.session_state.previous_page = '2_data_collection' | |
| st.session_state.page = 'unstructured_data' | |
| # Back button to return to main page | |
| if st.button("Back to Main"): | |
| st.session_state.previous_page = '2_data_collection' | |
| st.session_state.page = 'main' | |
| def structured_data_page(): | |
| st.title("Structured Data") | |
| st.write(""" | |
| Structured data refers to data that is organized into a tabular format with rows and columns, such as in a database or spreadsheet. | |
| """) | |
| # Additional buttons for types of structured data | |
| st.subheader("Types of Structured Data") | |
| if st.button("Excel Files"): | |
| st.session_state.page = 'excel_data' | |
| if st.button("SQL Databases"): | |
| st.session_state.page = 'sql_data' | |
| if st.button("Back to Data Collection"): | |
| st.session_state.page = '2_data_collection' | |
| def semi_structured_data_page(): | |
| st.title("Semi-Structured Data") | |
| st.write(""" | |
| Semi-structured data has some form of organization but is not as rigid as structured data. It may include elements such as tags, metadata, etc. | |
| """) | |
| # Additional buttons for types of semi-structured data | |
| st.subheader("Types of Semi-Structured Data") | |
| if st.button("CSV Files"): | |
| st.session_state.page = 'csv_data' | |
| if st.button("JSON Files"): | |
| st.session_state.page = 'json_data' | |
| if st.button("XML Files"): | |
| st.session_state.page = 'xml_data' | |
| if st.button("HTML Files"): | |
| st.session_state.page = 'html_data' | |
| if st.button("Back to Data Collection"): | |
| st.session_state.page = '2_data_collection' | |
| def unstructured_data_page(): | |
| st.title("Unstructured Data") | |
| st.write(""" | |
| Unstructured data does not have a predefined format or structure. It includes data such as images, videos, and text. | |
| """) | |
| # Additional buttons for types of unstructured data | |
| st.subheader("Types of Unstructured Data") | |
| if st.button("Images"): | |
| st.session_state.page = 'image_data' | |
| if st.button("Videos"): | |
| st.session_state.page = 'video_data' | |
| if st.button("Audio"): | |
| st.session_state.page = 'audio_data' | |
| if st.button("Text"): | |
| st.session_state.page = 'text_data' | |
| if st.button("Back to Data Collection"): | |
| st.session_state.page = '2_data_collection' | |
| # Individual Data Pages (Examples) | |
| def excel_data_page(): | |
| st.title("Handling Excel Files") | |
| st.header("Understanding Data Format:") | |
| st.markdown("""- Usually created with spreadsheet applications like Microsoft Excel, though libraries such as pandas can also write them. | |
| - It holds structured data, organized in rows and columns. | |
| - XLSX files are also called Workbooks because they can contain multiple sheets.""") | |
| st.subheader("Workbook and Sheets") | |
| st.markdown("""*An XLSX file is similar to a Book:* | |
| - The Workbook acts as the book itself. | |
| - Each Sheet inside the workbook is like a Page. | |
| - Each Sheet can be thought of as an individual CSV file. | |
| """) | |
| st.markdown("""*Why Use XLSX Instead of CSV?* | |
| - When the same data is available as both CSV and XLSX, XLSX is often the easier file to load because: | |
| - It stores cell types and declares its text encoding inside the file, so it avoids many of the delimiter-parsing and encoding errors that are common with CSV files. | |
| - It keeps the data in a structured grid of rows and columns. | |
| """) | |
| st.subheader("Default Extension and Handling of XLSX") | |
| st.markdown(""" | |
| - The default extension for Excel files is :blue-background[.xlsx]. | |
| - A workbook is a collection of one or more sheets. | |
| - Each sheet in an XLSX file can be processed separately. | |
| """) | |
| st.subheader("Reading XLSX Files into a DataFrame Using Pandas") | |
| st.markdown("""To work with Excel files in Python, you can use the pandas library (reading .xlsx files also requires the openpyxl package): | |
| - *Read a Single Sheet* | |
| - Use the :blue-background[pd.read_excel()] function to read an XLSX file into a DataFrame. | |
| - By default, it reads the first sheet :blue-background[(index 0)]. | |
| """) | |
| code = '''import pandas as pd | |
| df = pd.read_excel('file.xlsx', sheet_name=0) # Reads the first sheet | |
| print(df) | |
| ''' | |
| st.code(code, language="python") | |
| st.markdown(""" | |
| - *Key Notes:* | |
| - Each sheet in the XLSX file can be loaded as a single DataFrame. | |
| - Sheet indices start from 0 (zero-based indexing). | |
| """) | |
| st.subheader("Converting Multiple Sheets to CSV Files") | |
| st.write("If you want to save each sheet in an XLSX file as a separate CSV file:") | |
| st.write("*Step 1:* Read the workbook and load sheets.") | |
| code = '''import pandas as pd | |
| xlsx_file = 'file.xlsx' | |
| xls = pd.ExcelFile(xlsx_file)  # Open the workbook once; xls.sheet_names lists its sheets | |
| ''' | |
| st.code(code, language="python") | |
| st.write("*Step 2:* Loop through all the sheets and save them as separate CSV files:") | |
| code = ''' | |
| for sheet_name in xls.sheet_names:  # Loop through all sheet names | |
|     df = xls.parse(sheet_name)  # Reuse the open workbook instead of re-reading the file | |
|     df.to_csv(f'{sheet_name}.csv', index=False)  # Save each sheet as a CSV | |
| ''' | |
| st.code(code, language="python") | |
| st.write("Result: Each sheet is now saved as an individual CSV file.") | |
| if st.button("Back to Structured Data"): | |
| st.session_state.page = 'structured_data' | |
| def sql_data_page(): | |
| st.title("SQL Databases") | |
| st.write("SQL databases store data in structured tables, and data is queried using SQL commands.") | |
| if st.button("Back to Structured Data"): | |
| st.session_state.page = 'structured_data' | |
| def csv_data_page(): | |
| st.title("CSV Files") | |
| st.write("CSV files store tabular data as plain text, one record per line with fields separated by commas. They carry no enforced schema or data types, which is why they are grouped with semi-structured data here.") | |
| if st.button("Back to Semi-Structured Data"): | |
| st.session_state.page = 'semi_structured_data' | |
| def json_data_page(): | |
| st.title("JSON Files") | |
| st.write("JSON files store data in a key-value format and are widely used in web development.") | |
| if st.button("Back to Semi-Structured Data"): | |
| st.session_state.page = 'semi_structured_data' | |
| def xml_data_page(): | |
| st.title("XML Files") | |
| st.write("XML files store data in a hierarchical format and are often used for semi-structured data.") | |
| if st.button("Back to Semi-Structured Data"): | |
| st.session_state.page = 'semi_structured_data' | |
| def html_data_page(): | |
| st.title("HTML Files") | |
| st.write("HTML files are used to store web page content and are typically semi-structured data.") | |
| if st.button("Back to Semi-Structured Data"): | |
| st.session_state.page = 'semi_structured_data' | |
| def image_data_page(): | |
| st.title("Images") | |
| st.write("Images are a form of unstructured data, often stored in formats like JPEG, PNG, and TIFF.") | |
| if st.button("Back to Unstructured Data"): | |
| st.session_state.page = 'unstructured_data' | |
| def video_data_page(): | |
| st.title("Videos") | |
| st.write("Videos are a form of unstructured data, often stored in formats like MP4, AVI, and MKV.") | |
| if st.button("Back to Unstructured Data"): | |
| st.session_state.page = 'unstructured_data' | |
| def audio_data_page(): | |
| st.title("Audio") | |
| st.write("Audio files are a form of unstructured data, often stored in formats like MP3, WAV, and AAC.") | |
| if st.button("Back to Unstructured Data"): | |
| st.session_state.page = 'unstructured_data' | |
| def text_data_page(): | |
| st.title("Text") | |
| st.write("Text data is unstructured and can come from sources like emails, documents, and social media.") | |
| if st.button("Back to Unstructured Data"): | |
| st.session_state.page = 'unstructured_data' | |
| # Main logic to render pages based on session state | |
| # Map each page key to the function that renders it | |
| pages = { | |
|     'main': main_page, | |
|     '2_data_collection': data_collection_page, | |
|     'structured_data': structured_data_page, | |
|     'semi_structured_data': semi_structured_data_page, | |
|     'unstructured_data': unstructured_data_page, | |
|     'excel_data': excel_data_page, | |
|     'sql_data': sql_data_page, | |
|     'csv_data': csv_data_page, | |
|     'json_data': json_data_page, | |
|     'xml_data': xml_data_page, | |
|     'html_data': html_data_page, | |
|     'image_data': image_data_page, | |
|     'video_data': video_data_page, | |
|     'audio_data': audio_data_page, | |
|     'text_data': text_data_page, | |
| } | |
| page_at_start = st.session_state.page | |
| # Lifecycle steps without a page yet fall back to the main page instead of rendering nothing | |
| pages.get(page_at_start, main_page)() | |
| # A button click only updates session state during this run, so rerun once to show the new page | |
| if st.session_state.page != page_at_start: | |
|     st.rerun() | |
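The app records `previous_page` in session state but never reads it; every "Back" button hard-codes its target instead. One possible alternative is a small history stack, sketched below. This is only an illustration: `navigate` and `go_back` are hypothetical helper names that do not exist in the app, and a plain dict stands in for `st.session_state`, which is assumed here to support the same dict-style access.

```python
# Hypothetical navigation helpers; `state` is a plain dict standing in
# for st.session_state (an assumption made for this sketch).

def navigate(state, page):
    """Switch to `page` and remember the page we came from."""
    history = state.setdefault("history", [])
    current = state.get("page", "main")
    if page != current:
        history.append(current)
        state["page"] = page


def go_back(state, default="main"):
    """Return to the most recently visited page, or `default` if none is recorded."""
    history = state.setdefault("history", [])
    state["page"] = history.pop() if history else default


state = {"page": "main"}
navigate(state, "2_data_collection")
navigate(state, "structured_data")
navigate(state, "excel_data")
go_back(state)  # leaves the Excel page for 'structured_data'
go_back(state)  # then returns to '2_data_collection'
print(state["page"])
```

With helpers like these, a single generic "Back" button could call `go_back` on every page, so a page reachable from several places would return to wherever the user actually came from.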