Add SPECTER2 embedding-based deduplication (replaces Jaccard word overlap)
Commit ecdb8ec (verified), committed by nkshirsa 15 days ago
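To illustrate the change named in the commit message, here is a minimal sketch that contrasts Jaccard word overlap with embedding-based deduplication. It is not the code from this commit. The function names, the 0.9 cosine threshold, and the greedy keep-first strategy are assumptions made for illustration, and the toy 3-dimensional vectors only stand in for real SPECTER2 embeddings, which would come from the model itself.

```python
import math


def jaccard_word_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty strings are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def cosine_similarity(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def deduplicate_by_embedding(embeddings, threshold=0.9):
    """Greedy dedup: keep an item only if it is not too similar to any kept item.

    `threshold` is a hypothetical cutoff, not a value taken from the commit.
    Returns the indices of the items that are kept, in input order.
    """
    kept = []
    for i, vec in enumerate(embeddings):
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Two paraphrased titles and one unrelated title.
titles = [
    "Deep learning for protein folding",
    "Neural networks predict protein structure",
    "A survey of graph databases",
]

# Toy vectors standing in for SPECTER2 embeddings (assumption: a real
# pipeline would encode each title or abstract with the model).
toy_embeddings = [
    [0.90, 0.10, 0.00],
    [0.88, 0.12, 0.02],
    [0.00, 0.20, 0.95],
]

# Word overlap misses the paraphrase: the two titles share only "protein".
print(jaccard_word_overlap(titles[0], titles[1]))

# The embedding comparison treats the first two as duplicates.
print(deduplicate_by_embedding(toy_embeddings))  # [0, 2]
```

The sketch shows the motivation for the swap: paraphrases with few shared words score low on Jaccard overlap but can sit close together in embedding space. The greedy loop is quadratic in the number of items, so a larger collection would likely need batching or an approximate nearest-neighbor index.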