Hugging Face
Pietro Lesci
pietrolesci
19 followers
·
34 following
https://pietrolesci.github.io/
pietro_lesci
pietrolesci.bsky.social
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
Organizations
pietrolesci's activity
New activity in
cmeister/multilingual-tok-corpus
10 months ago
Create README.md
#2 opened 10 months ago by
pietrolesci
New activity in
JeanKaddour/minipile
over 1 year ago
Domain and provenance annotation
9
#1 opened over 2 years ago by
haukur
New activity in
HuggingFaceTB/SmolLM-135M
over 1 year ago
Trapezoidal scheduler with cooldown phase
👍
1
3
#4 opened over 1 year ago by
maveriq
New activity in
EleutherAI/pythia-160m
almost 2 years ago
Tokenizer `merges.txt` files
3
#5 opened almost 2 years ago by
pietrolesci
New activity in
EleutherAI/pile-deduped-pythia-preshuffled
over 2 years ago
Sequence "packing" logic
👍
2
2
#2 opened over 2 years ago by
pietrolesci
Pad-only sequences from mmap'ed dataset after a certain index
#1 opened over 2 years ago by
pietrolesci
New activity in
EleutherAI/pile-duped-pythia-random-sampled
over 2 years ago
Add full sequences (beyond the first 64 tokens)
3
#1 opened over 2 years ago by
pietrolesci
New activity in
JeanKaddour/minipile
over 2 years ago
Domain and provenance annotation
9
#1 opened over 2 years ago by
haukur