tritesh committed · Commit 7aca493 · verified · 1 parent: 0433390

Update ML Intern artifact metadata

Files changed (1): README.md (+24, −0)
README.md CHANGED
@@ -1,3 +1,7 @@
+---
+tags:
+- ml-intern
+---
 # DFlash-MLX-M2ProMax-96GB: Block Diffusion Speculative Decoding for MLX on Apple Silicon
 
 > **Tested on M2 Pro Max (96GB Unified Memory)** — Apple Silicon optimized implementation of DFlash speculative decoding for MLX.
@@ -345,3 +349,23 @@ MIT License — same as the original DFlash project.
 **Get 6× faster LLM inference on your M2 Pro Max (96GB) today!** 🚀
 
 > *Tested on M2 Pro Max, 38 GPU cores, 96GB unified memory, macOS 15+.*
+
+<!-- ml-intern-provenance -->
+## Generated by ML Intern
+
+This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
+
+- Try ML Intern: https://smolagents-ml-intern.hf.space
+- Source code: https://github.com/huggingface/ml-intern
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = 'tritesh/dflash-mlx-universal'
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
+```
+
+For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
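
The last line of the added README suggests swapping in the generic `AutoModel` class for non-causal checkpoints. A minimal sketch of that pattern, using a tiny random encoder-only test checkpoint (`hf-internal-testing/tiny-random-bert`, an assumption chosen only as a stand-in; this repository's actual model id may not be non-causal):

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical stand-in for a non-causal (encoder-only) checkpoint;
# substitute the real repository id of your non-causal model.
model_id = 'hf-internal-testing/tiny-random-bert'

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # generic class, no LM head

# Encoder models return hidden states rather than next-token logits.
inputs = tokenizer('hello world', return_tensors='pt')
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```

`AutoModel.from_pretrained` resolves the concrete architecture from the checkpoint's `config.json`, so the same two lines work across encoder families without naming the class explicitly.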