Gali
galinilin
AI & ML interests
Deep Learning, Bioinformatics and Proteomics
Recent Activity
upvoted an article about 21 hours ago
The ML Engineer's Guide to Protein AI
reacted to MaziyarPanahi's post with 🔥 about 21 hours ago
DNA, mRNA, proteins, AI. I spent the last year going deep into computational biology as an ML engineer. This is Part I of what I found. 🧬
In 2024, AlphaFold won the Nobel Prize in Chemistry.
By 2026, the open-source community had built alternatives that outperform it.
That's the story I find most interesting about protein AI right now. Not just the science (which is incredible), but the speed at which open source caught up. Multiple teams independently reproduced and then exceeded AlphaFold 3's accuracy, under permissive licenses. The field went from prediction to generation: we're not just modeling known proteins anymore; we're designing new ones.
I spent months mapping this landscape for ML engineers. What the architectures actually are (spoiler: transformers and diffusion models), which tools to use for what, and which ones you can actually ship commercially.
New post on the Hugging Face blog: https://huggingface.co/blog/MaziyarPanahi/protein-ai-landscape
Hope you all enjoy! 🤗
reacted to MaziyarPanahi's post with 🤗 about 21 hours ago
Training mRNA Language Models Across 25 Species for $165
We built an end-to-end protein AI pipeline covering structure prediction, sequence design, and codon optimization. After comparing multiple transformer architectures for codon-level language modeling, CodonRoBERTa-large-v2 emerged as the clear winner with a perplexity of 4.10 and a Spearman CAI correlation of 0.40, significantly outperforming ModernBERT. We then scaled to 25 species, trained 4 production models in 55 GPU-hours, and built a species-conditioned system that no other open-source project offers. Complete results, architectural decisions, and runnable code below.
https://huggingface.co/blog/OpenMed/training-mrna-models-25-species
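The codon-level modeling the post describes rests on two simple primitives: splitting an mRNA sequence into 3-nucleotide codon tokens, and scoring sequences with the Codon Adaptation Index (the CAI in the Spearman correlation above), defined as the geometric mean of per-codon relative-adaptiveness weights. A minimal sketch with made-up weights; real w values are derived from highly expressed genes of the target species, and the function names here are illustrative, not from the OpenMed codebase:

```python
from math import exp, log

# Hypothetical relative-adaptiveness weights (w values) for a few codons.
# In practice these are computed per species from codon-usage tables.
WEIGHTS = {"GCU": 1.0, "GCC": 0.8, "AUG": 1.0, "AAA": 0.7, "AAG": 1.0}

def codon_tokenize(mrna: str) -> list[str]:
    """Split an mRNA sequence into codon tokens (3 nucleotides each),
    dropping any trailing partial codon."""
    end = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, end, 3)]

def cai(mrna: str, weights: dict[str, float] = WEIGHTS) -> float:
    """Codon Adaptation Index: geometric mean of codon weights,
    computed in log space for numerical stability."""
    logs = [log(weights[c]) for c in codon_tokenize(mrna) if c in weights]
    if not logs:
        return 0.0
    return exp(sum(logs) / len(logs))

seq = "AUGGCUAAGGCC"
print(codon_tokenize(seq))   # ['AUG', 'GCU', 'AAG', 'GCC']
print(round(cai(seq), 3))    # 0.946
```

A codon language model's fitness for optimization is then often judged by how well its per-sequence scores (e.g. log-likelihood) rank-correlate with CAI, which is what a Spearman correlation of 0.40 measures.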