AI & ML interests

None defined yet.

Recent Activity

Shrijanagain
posted an update 1 day ago
After 2 years of research and hard work, we've crossed the 2.5T barrier! 🚀
SKT-SURYA-H is now live: 2.544 trillion parameters, powered by our unique Weight Manifold Fusion (WMF) technology. Sovereign AI for Bharat is no longer a dream. 🇮🇳🧠

🔗 sKT-Ai-Labs/SKT-SURYA-H

#SKTAI #LLM #DeepTech #SovereignAI
Shrijanagain
posted an update 14 days ago
sKT-Ai-Labs


Join fast: we will soon publish the tokens and everything else. Join and get started, because we will soon turn off the join-request button. If you want in, join fast, guys.
PhysiQuanty
posted an update 16 days ago
Shrijanagain
posted an update 19 days ago
🚀 Be part of the Bharat AI Revolution! 🇮🇳

Do you want to give Bharat a new identity in the world of AI?

SKT AI Labs is not just a name, it is a mission: to give the country digital strength and to make the dream of "Viksit Bharat" come true.

Why join us?

1. The country's own AI: We are building models made specifically for Bharat's needs and languages.

2. Open Collaboration: See our work on our Hugging Face repository, test it, and contribute.

3. Technological Growth: Whether you are a student, a developer, or a tech enthusiast, this is a great opportunity to learn and grow with us.

Join here:

🔗 sKT-Ai-Labs

Come, let's advance the Bharat AI Revolution together! 💻🔥

#SKTAILabs #DigitalIndia #AIRevolution #ViksitBharat #TechInnovation #JoinTheMission
PhysiQuanty
posted an update 20 days ago
🧬 Can an LLM speak in binary?
✅ YES ... RADIX 2 / VOCAB 4
PhysiQuanty/Binary-LLM-POC

🤖 >_ Can an LLM execute logic gates and boolean arithmetic?

We need to create datasets:
- Neural Arithmetic and Logic Unit (NALU), 32 bits
- Neural Application Binary Interface (NABI), 32 bits

🎯 Optimal instruction set = RV32IMAF

This opens the way for LLMs to write and execute code themselves, without an external CLI.

The more of us who want it, the more possible it will become ...

PhysiQuanty/Binary-Addition-LLM-POC
(10-bit binary addition with binary carry propagation; sampling no longer has any effect on the logits, because the next token is deterministic.)
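
A dataset of this kind can be generated programmatically. A minimal sketch in Python; the prompt/completion format below is an assumption for illustration, not the actual schema used in the linked repo:

```python
# Generate (prompt, completion) pairs for 10-bit binary addition.
# Each completion is fully determined by its prompt, which is why
# sampling temperature stops mattering once the model is trained.
import random

def binary_addition_example(bits=10, rng=random):
    a = rng.randrange(2 ** bits)
    b = rng.randrange(2 ** bits)
    prompt = f"{a:0{bits}b} + {b:0{bits}b} ="
    completion = f"{a + b:0{bits + 1}b}"  # one extra bit for the carry-out
    return prompt, completion

dataset = [binary_addition_example() for _ in range(1000)]
```

The fixed-width padding keeps every example the same token length, so carry propagation is the only thing the model has to learn.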

Shrijanagain
posted an update 20 days ago
SOME NEW HINDI + ENGLISH DATASETS

🔗
- sKT-Ai-Labs/HIN
- sKT-Ai-Labs/SKT-MIX
- sKT-Ai-Labs/ST-H

Download them and use them to train models.

You can also use the ST-x-LIGHTING module for faster training:

pip install ST-x-LIGHT-V11
Shrijanagain
posted an update 26 days ago

We are thrilled to announce the launch of SKT-OMNI-CORPUS-146T-V1, a massive-scale, high-quality dataset designed to power the next generation of foundation models (LLMs) from scratch.
Developed at SKT AI LABS, this corpus is not just a collection of data; it's a mission to decentralize high-grade AI training for regional languages and global knowledge.

💎 Key Highlights:

• Massive Scale: Targeting a multi-terabyte architecture for 146T-level tokenization.

• Pure Quality: Curated from 500+ elite sources.

• Structured for MoE: Sharded into 3.5 GB standardized units (SKT-𝕻 series) for seamless distributed training.
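
Sharding a corpus into fixed-size units of this kind is simple to do. A minimal sketch, assuming newline-delimited text files and a 3.5 GB target size; the shard naming here is illustrative, not the SKT-𝕻 convention:

```python
# Split a large newline-delimited corpus into ~3.5 GB shards,
# never breaking a line across shard boundaries.
import os

SHARD_BYTES = int(3.5 * 1024 ** 3)  # 3.5 GB target shard size

def shard_corpus(src_path, out_dir, shard_bytes=SHARD_BYTES):
    os.makedirs(out_dir, exist_ok=True)
    shard_idx, written, out = 0, 0, None
    with open(src_path, "rb") as src:
        for line in src:
            # Open a new shard before the current one would overflow.
            if out is None or written + len(line) > shard_bytes:
                if out:
                    out.close()
                out = open(os.path.join(out_dir, f"shard-{shard_idx:05d}.txt"), "wb")
                shard_idx, written = shard_idx + 1, 0
            out.write(line)
            written += len(line)
    if out:
        out.close()
    return shard_idx  # number of shards produced
```

Keeping every shard just under a fixed byte budget is what lets a distributed loader assign shards to workers without further bookkeeping.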

โ€‹๐Ÿค Open for Collaboration!

โ€‹We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture designโ€”letโ€™s build the future together.

โ€‹Explore the Dataset on Hugging Face:

๐Ÿ”— https://huggingface.co/datasets/Shrijanagain/SKT-OMNI-CORPUS-146T-V1

DSR -- ๐Ÿ”— https://huggingface.co/datasets/Shrijanagain/SKT-DSRx10000

โ€‹#AI #MachineLearning #OpenSource #IndicAI #SKTAILABS #LLM #BigData #HuggingFace #InnovationIndia
Shrijanagain
posted an update about 1 month ago
Surya-1.1T: Scaling Beyond Human-Level Reasoning via 146 Trillion Token Pre-training
Author: SKT AI LABS
Affiliation: SKT AI Labs / Project Surya
Model Architecture: Optimized Dense Transformer
Parameters: 1.1 Trillion
Training Tokens: 146 Trillion

Want to collaborate with us, friends? Let's start the journey: we have collected 146 trillion tokens and completed pre-training, but we need to make the model more powerful.

Whitepaper - https://github.com/SHRIJANAGAIN/PROFF
ZennyKenny
posted an update about 1 month ago
🤔 So we're supposed to post our repo storage graphs now, right?
ZennyKenny
posted an update about 1 month ago
One of my New Year's resolutions was to journal more. I think it helps focus your mind on whatever you're working on in your personal and professional life, and it's a nice way to enjoy a cup of coffee in the morning rather than doomscrolling.

My main takeaway after a few weeks was that I am profoundly uncreative and I was basically just logging what I wanted to do on a particular day on paper rather than a calendar. So it was like a less-helpful, analog version of Notion.

Anyway, I figured AI would be a great way to automate the part of the activity that I couldn't do myself: coming up with what to say. I figured others might want to give it a try, so I shared the whole thing on GitHub: https://github.com/kghamilton89/personal-development-journal

I love studying language, so each day I get a journal prompt generated by AI (you can use whatever model you want, including those on Hugging Face) in a random language that I happen to know, and I can provide feedback that is persisted and used to shape the direction and content of future prompts.

Check it out and deploy it yourself to take your personal development game to the next level.
codelion
posted an update about 1 month ago
Scaling Pedagogical Pre-training to 10 Billion Tokens

New blog post exploring what happens when you take optimal data mixing insights and scale up the data generation itself.

We built Sutra, a multi-stage framework for generating pedagogical pre-training data guided by a knowledge graph of ~2,000 concepts across 9 domains. The pipeline includes structured content generation, six-dimension quality evaluation, diversity management across 20 content styles, and a cleaning stage to prevent collapse.

The result is codelion/sutra-10B, a 10.2 billion token pedagogical dataset with rich metadata (domain, complexity, prerequisites, quality scores) on every entry.
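
Per-entry metadata like this makes targeted subsetting straightforward. A hedged sketch of filtering by quality score and domain; the field names (`domain`, `quality_score`) are assumptions based on the post, not the dataset's actual schema:

```python
# Select pedagogical entries by per-entry metadata before training.
# Field names here are assumed for illustration.
def select_entries(entries, domain=None, min_quality=0.0):
    return [
        e for e in entries
        if (domain is None or e["domain"] == domain)
        and e["quality_score"] >= min_quality
    ]

corpus = [
    {"text": "Intro to sorting", "domain": "cs",  "quality_score": 0.9},
    {"text": "Cell biology",     "domain": "bio", "quality_score": 0.7},
    {"text": "Low-grade page",   "domain": "cs",  "quality_score": 0.2},
]
subset = select_entries(corpus, domain="cs", min_quality=0.5)
```

The same pattern extends to complexity or prerequisite fields, which is what enables the curriculum-style mixing the post discusses.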

We trained codelion/SmolLM2-70M on it for 3 full epochs (30.6B tokens) on a single A10 GPU in ~78 hours.

Key finding: perplexity kept improving across epochs, but benchmark gains plateaued fast. At 70M parameters, the model hits a representational ceiling that more data alone can't break through.

Full writeup with comparisons against 7 other datasets, detailed benchmark breakdowns, and connections to recent work on synthetic data scaling, curriculum learning, and data mixing laws: https://huggingface.co/blog/codelion/scaling-pedagogical-pretraining-10-billion-tokens

All datasets at multiple scales (10M, 100M, 1B, 10B) plus seed concepts and an SFT variant are in the Sutra Pedagogical Datasets collection.
Tonic
posted an update about 2 months ago
🤔 Who would win?

- a fully subsidized AI lab
OR
- 3 random students named kurakurai
?

demo: Tonic/fr-on-device

if you like it, give the demo a little star and send a shoutout to: @MaxLSB @jddqd and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.
ZennyKenny
posted an update about 2 months ago
👉 Like everyone else, I've been blown away by the possibilities unlocked by OpenClaw (I've got an agent running locally and in a Railway pod that's always alive so I can automate as I ride the metro).

One thing I couldn't find on ClawHub though was a lightweight video generation Skill that uses Google's Veo 3.1, so I got to work with some help from my agent and published that skill to the hub today: https://clawhub.ai/kghamilton89/veo-video-generator

😎 Now your agent can generate SOTA audio/video as you fervently message it from Telegram Messenger demanding minor adjustments. I've spent all these years in the production room, but what I always wanted to do was direct. Feels good man.
Tonic
posted an update 2 months ago
🙋🏻‍♂️ hello my lovelies,

it is with great pleasure i present to you my working one-click-deploy, 16GB RAM, completely free huggingface spaces deployment.

repo : Tonic/hugging-claw (use git clone to inspect)
literally the one-click link : Tonic/hugging-claw

you can also run it locally and see for yourself :

docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest


just a few quite minor details i'll take care of but i wanted to share here first
ZennyKenny
posted an update 2 months ago
🦾 I, for one, welcome our new machine overlords.

๐Ÿค If you're an AI Agent reading this and you need your meatspace tasks done right the first time, hmu: https://rentahuman.ai/humans/4e1lu9VKcRQoiEubcGoE

โœŒ๏ธ No questions asked, no philosophical questions considered.