AI & ML interests
Reinforcement learning
Organizations
None yet
NicholasCorrado/tinyllama-1.1b-chat-v1.0-ui-dpo
Updated
NicholasCorrado/tinyllama-1.1b-chat-v1.0-hh-dpo
Text Generation
• Updated • 2
NicholasCorrado/tinyllama-1.1b-chat-v1.0-arena-dpo
Text Generation
• Updated • 2
NicholasCorrado/uf-rlced-conifer-zephyr-7b-group-dpo-no-clip
Text Generation
• Updated • 3
NicholasCorrado/zephyr-7b-hh-dpo
Text Generation
• Updated • 6
NicholasCorrado/tulu-2-7b-hh-dpo
Text Generation
• Updated • 5
NicholasCorrado/hh-tulu-2-7b-dpo
Updated
NicholasCorrado/uf-rlced-conifer-zephyr-7b-group-dpo-full
Text Generation
• Updated • 2
NicholasCorrado/uf-tulu-2-7b-dpo-full
Text Generation
• Updated • 4
NicholasCorrado/rlced-conifer-tulu-2-7b-dpo-full
Text Generation
• Updated
NicholasCorrado/uf-rlced-conifer_tulu-2-7b-dpo-full
Text Generation
• Updated • 4
NicholasCorrado/uf-rlced-conifer-3-1-tinyllama-1.1b-chat-v1.0-dpo-full
Text Generation
• Updated • 2
NicholasCorrado/uf-rlced-conifer-tinyllama-1.1b-chat-v1.0-dpo-full
Updated
NicholasCorrado/uf-tulu-2-7b-dpo-full-2
Updated
NicholasCorrado/rlced-conifer-zephyr-7b-dpo-full
Text Generation
• Updated
NicholasCorrado/uf-rlced-conifer-zephyr-7b-dpo-full
Text Generation
• Updated • 11
NicholasCorrado/uf-gemma-2b-it-dpo-full
Updated
NicholasCorrado/uf-rlced-50k-zephyr-7b-dpo-full
Updated
NicholasCorrado/ultrafeedback-binarized-tulu-2-7b-dpo-full
Text Generation
• Updated • 7
NicholasCorrado/ultrafeedback-binarized-tulu-2-7b-dpo-full-2
Updated
NicholasCorrado/uf_rlced_conifer_tulu-2-7b-dpo-full
Updated
NicholasCorrado/hh-rlhf-gemma-2b-dpo-full
Updated
NicholasCorrado/rlced_conifer_zephyr-7b-dpo-full
Text Generation
• Updated • 6
NicholasCorrado/conifer_zephyr-7b-dpo-full
Updated
NicholasCorrado/zephyr-7b-group-dpo-full-4
Text Generation
• Updated
NicholasCorrado/zephyr-7b-group-dpo-full-3
Text Generation
• Updated
NicholasCorrado/zephyr-7b-group-dpo-full-2
Updated
NicholasCorrado/zephyr-7b-group-dpo-full
Updated
NicholasCorrado/zephyr-7b-dpo-full
Text Generation
• Updated • 3
NicholasCorrado/mixed_ref_losses
Updated