Active filters: rlhf
nvidia/NV-Llama2-70B-RLHF-Chat
Text Generation
• Updated • 5
tasksource/deberta-small-long-nli
Zero-Shot Classification
• 0.1B • Updated • 11.3k
• 50
percyraskova/llm-training
Text Generation
• Updated • 1
BlueGod01/medibot-llama-3.2-1b-grpo
Text Generation
• 1B • Updated • 33
• 2
AMAImedia/CodeRM-GRPO-Selection-8B-NOESIS-AWQ-INT4
Text Classification
• 8B • Updated • 19
• 1
Text Generation
• 1B • Updated • 16
• 1
sileod/deberta-v3-base-tasksource-nli
Zero-Shot Classification
• 0.2B • Updated • 6.99k
• 133
stanfordnlp/SteamSHP-flan-t5-xl
Updated • 15
• 43
stanfordnlp/SteamSHP-flan-t5-large
Updated • 28
• 33
sileod/deberta-v3-large-tasksource-nli
Zero-Shot Classification
• 0.4B • Updated • 986
• 40
sileod/deberta-v3-large-tasksource-rlhf-reward-model
Text Classification
• Updated • 710
• 11
trl-lib/llama-7b-se-rl-peft
Updated • 103
trl-lib/llama-7b-se-rm-peft
toloka/gpt2-large-rl-prompt-writing
Text Generation
• 0.8B • Updated • 11
• 3
AdamG012/chat-opt-1.3b-rlhf-actor-deepspeed
Text Generation
• Updated • 11
• 5
AdamG012/chat-opt-1.3b-rlhf-critic-deepspeed
Text Generation
• Updated • 9
• 3
AdamG012/chat-opt-1.3b-rlhf-actor-ema-deepspeed
Text Generation
• Updated • 7
• 8
sileod/mdeberta-v3-base-tasksource-nli
Zero-Shot Classification
• 0.3B • Updated • 55
• 18
Text Generation
• Updated • 9
• 5
Text Generation
• Updated • 9
• 3
Text Generation
• Updated • 12
• 6
argilla/roberta-base-reward-model-falcon-dolly
Text Classification
• Updated • 16
• 4
Text Generation
• Updated • 6
PKU-Alignment/beaver-7b-v1.0
Reinforcement Learning
• 7B • Updated • 36
• 13
lyogavin/Anima33B-DPO-Belle-1k
Text Generation
• Updated • 1
lyogavin/Anima33B-DPO-Belle-1k-merged
Text Generation
• Updated • 13
• 12
PKU-Alignment/beaver-7b-v1.0-reward
Reinforcement Learning
• 7B • Updated • 1.17k
• 17
PKU-Alignment/beaver-dam-7b
Updated • 6.47k
• 17
PKU-Alignment/beaver-7b-v1.0-cost
Reinforcement Learning
• 7B • Updated • 1.16k
• 10