# efficient-context Documentation

## Overview

`efficient-context` is a Python library designed to optimize the handling of context for Large Language Models (LLMs) in CPU-constrained environments. It addresses the challenges of using LLMs with limited computational resources by providing efficient context management strategies.

## Key Features

1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Go beyond token-based approaches for more effective context management
3. **Retrieval Optimization**: Minimize context size through intelligent retrieval strategies
4. **Memory Management**: Handle large contexts on limited hardware resources

## Installation

```bash
pip install efficient-context
```
|
|
## Core Components

### ContextManager

The central class that orchestrates all components of the library.

```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents

# Generate context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```
|
|
### Context Compression

The compression module reduces the size of content while preserving key information.

```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,                 # Similarity threshold for deduplication
    embedding_model="lightweight",  # Use a lightweight embedding model
    min_sentence_length=10,         # Minimum length of sentences to consider
    importance_weight=0.3           # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```
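
To make the idea concrete, here is a minimal, library-independent sketch of threshold-based deduplication. It uses word-overlap (Jaccard) similarity as a crude stand-in for the embedding similarity the library computes; the function names here are illustrative, not part of the `efficient-context` API:

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (a stand-in for embedding similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def deduplicate(sentences: list[str], threshold: float = 0.85) -> list[str]:
    """Keep a sentence only if it is not too similar to any already-kept sentence."""
    kept: list[str] = []
    for sentence in sentences:
        if all(jaccard(sentence, k) < threshold for k in kept):
            kept.append(sentence)
    return kept
```

The real `SemanticDeduplicator` compares embedding vectors rather than word sets, but the keep-or-drop decision against a similarity threshold follows the same shape — which is also why lowering the threshold removes more content.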
|
|
### Semantic Chunking

The chunking module divides content into semantically coherent chunks.

```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,           # Target size for chunks in tokens
    chunk_overlap=50,         # Number of tokens to overlap between chunks
    respect_paragraphs=True,  # Avoid breaking paragraphs across chunks
    min_chunk_size=100,       # Minimum chunk size in tokens
    max_chunk_size=1024       # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```
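
As an illustration of what paragraph-respecting chunking means (not the library's actual implementation), a greedy packer that never splits a paragraph can be sketched in a few lines, using word count as a rough proxy for tokens:

```python
def chunk_by_paragraphs(text: str, chunk_size: int = 512) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most chunk_size words."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in paragraphs:
        words = len(paragraph.split())
        # Start a new chunk when adding this paragraph would exceed the limit
        if current and count + words > chunk_size:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The `SemanticChunker` additionally considers token counts, overlap, and semantic coherence, but the core loop — accumulate units until a size budget is reached, then cut at a boundary — is the same.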
|
|
### Retrieval Optimization

The retrieval module finds the most relevant chunks for a query.

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",  # Use a lightweight embedding model
    similarity_metric="cosine",     # Metric for comparing embeddings
    use_batching=True,              # Batch embedding operations
    batch_size=32,                  # Size of batches for embedding
    max_index_size=5000             # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
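
Under the hood, cosine-similarity retrieval reduces to scoring every indexed vector against the query and sorting. A self-contained sketch with toy vectors (in the library, an embedding model would produce these) looks like:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve_top_k(query_vec, index, top_k=5):
    """index is a list of (chunk_id, vector) pairs; return ids of the top_k best matches."""
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

Batching (`use_batching`, `batch_size`) speeds up the embedding step, and `max_index_size` bounds how many `(id, vector)` pairs are held in memory at once.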
|
|
### Memory Management

The memory module helps optimize memory usage during operations.

```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,    # Target memory usage percentage
    aggressive_cleanup=False,     # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024*1024):.2f} MB")
```
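
The pattern behind `optimize_memory()` — run a block, then reclaim memory on exit even if the block raised — can be sketched with the standard library alone. This is an illustration of the pattern, not the library's code:

```python
import gc
from contextlib import contextmanager

@contextmanager
def optimize_memory():
    """Run the wrapped block, then force a garbage-collection pass on exit."""
    try:
        yield
    finally:
        gc.collect()  # reclaim unreachable objects, even if the block raised

# Usage: temporary objects created inside the block are collected promptly on exit
with optimize_memory():
    total = sum(range(1_000_000))
```

Wrapping the cleanup in `finally` is what makes the context-manager form safer than calling `gc.collect()` by hand after each operation.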
|
|
## Advanced Usage

### Customizing the Context Manager

```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```
|
|
### Integration with LLMs

```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```
|
|
## Performance Considerations

- **Memory Usage**: The library is designed to be memory-efficient, but be aware that embedding models may still require significant memory.
- **CPU Performance**: Choose the embedding model to match your CPU capabilities. The `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the `batch_size` parameter in retrieval to balance memory usage against processing speed.
- **Context Size**: Setting an appropriate `max_context_size` can significantly impact performance, especially when working with limited resources.
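
Taken together, a setup tuned for a constrained environment might look like the following configuration sketch. The parameter values are illustrative starting points drawn from the options documented above, not benchmarked recommendations:

```python
from efficient_context import ContextManager
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Smaller chunks, small batches, and a bounded index keep peak memory low.
context_manager = ContextManager(
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=25),
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",
        batch_size=16,
        max_index_size=2000,
    ),
    memory_manager=MemoryManager(target_usage_percent=70.0),
    max_context_size=2048,
)
```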
|
|
## Extending the Library

You can create custom implementations of the base classes to adapt the library to your specific needs:

```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Your custom compression logic here; as a trivial placeholder,
        # truncate to target_size words when a target is given.
        if target_size is not None:
            return " ".join(content.split()[:target_size])
        return content
```
|
|
## Troubleshooting

**High Memory Usage**
- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory

**Slow Processing**
- Increase `batch_size` (balancing with memory constraints)
- Lower `threshold` in the `SemanticDeduplicator` so deduplication removes more near-duplicate content
- Reduce `chunk_overlap` to minimize redundant processing
|
|
## Example Applications

- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources
|
|