rtferraz/domainTokenizer
arxiv:
9 papers
333 kB
1 contributor
History:
42 commits
rtferraz
Add 02_ecommerce_pretrain.ipynb - REES46 e-commerce pre-training with sequential entropy check, wandb, push to hub
d60868a
verified
26 days ago
docs
Add finance pre-training report - honest analysis of results and lessons learned
26 days ago
examples
Phase 3.0: Pipeline validation demo on mindweave/bank-transactions-us - ALL 10 CHECKS PASSED
27 days ago
notebooks
Add 02_ecommerce_pretrain.ipynb - REES46 e-commerce pre-training with sequential entropy check, wandb, push to hub
26 days ago
src
Update package to v0.4.0 with fine-tuning exports
27 days ago
tests
Add fine-tuning test suite - 15 tests covering dataset, batching, forward/backward, Trainer smoke, multiclass
27 days ago
.gitattributes
Safe
1.52 kB
initial commit
27 days ago
.gitignore
Safe
452 Bytes
Add .gitignore - Python, Jupyter, training artifacts, IDE files
26 days ago
README.md
Safe
8.46 kB
Update README v0.3.0 - add usage example, update roadmap status, add implementation report link
27 days ago