Hyperparameter optimisation with Optuna and Claude Code on Hugging Face Jobs
Last week, @burtenshaw and @evalstate showed how to get Claude Code to fine-tune a model on Hugging Face Jobs. That got me thinking: if Claude can run a single training job, why not let it run dozens -- systematically searching for optimal hyperparameters?
I'm not really a fan of vibe coding (no judgment -- I'm just faster without!), but Claude Code excels at something different: coordinating complex, multi-step processes. It's essentially a universal agent orchestrator. And hyperparameter optimisation is exactly the kind of tedious, repetitive coordination task that's perfect for delegation.
So I built an optuna-hpo skill that lets Claude Code run distributed HPO studies on Hugging Face Jobs. Here's what that looks like in practice.
The Setup
We're going to fine-tune Qwen/Qwen2.5-0.5B on a 2k subsample of wikitext-2-raw-v1, searching for optimal:
- Learning rate (log scale: 1e-5 to 5e-4)
- Batch size (4 or 8)
- LoRA rank (8, 16, or 32)
- Weight decay (0 to 0.1)
- Warmup ratio (0 to 0.1)
Budget: $5. Hardware: T4 GPU ($0.75/hr on HF Jobs).
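If you're curious what that search space looks like in code, here's a rough sketch of how it maps onto Optuna's suggest API (the parameter names are mine; the skill's internals may name things differently):

```python
import optuna


def suggest_hyperparameters(trial: optuna.Trial) -> dict:
    """Sample one configuration from the search space described above."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("batch_size", [4, 8]),
        "lora_r": trial.suggest_categorical("lora_r", [8, 16, 32]),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.1),
    }
```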
The Magic: One Prompt
Here's the entire interaction:
> Hi, Claude. Can you finetune Qwen/Qwen2.5-0.5B on a 2k subsample of
> wikitext/wikitext-2-raw-v1 using your optuna-hpo skill to figure out
> the ideal training hyperparameters? Your budget is $5. Do it on
> Hugging Face Jobs. And please launch the Gradio dashboard for me, too.
Claude responds:
I'm using the optuna-hpo skill to guide hyperparameter optimization
for fine-tuning Qwen2.5-0.5B on wikitext-2-raw-v1.
And then it just... does it. Creates an Optuna study. Generates trial scripts. Submits jobs to HF Jobs. Polls for completion. Extracts metrics. Launches a dashboard. All automatically.
What Actually Happens
Under the hood, Claude:
- Verifies prerequisites -- checks HF authentication, estimates costs
- Creates an Optuna study with TPE sampling and MedianPruner for early stopping
- Generates a training script for each trial (complete with TRL/SFT, LoRA, callbacks)
- Submits each trial as a HF Job using `run_uv_job()`
- Polls job status and extracts `eval_loss` from logs
- Reports back to Optuna so it can suggest smarter hyperparameters
- Tracks budget and stops when exhausted
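The trial script mentioned above is roughly of this shape -- a hedged sketch using TRL's `SFTTrainer` with a LoRA config, shown here with trial 1's values hard-coded. The real generated script is fuller (callbacks, logging, argument parsing) and receives its hyperparameters from the orchestrator:

```python
# /// script
# dependencies = ["trl", "peft", "datasets"]
# ///
"""One HPO trial: SFT with LoRA on a wikitext subsample (illustrative sketch)."""
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

train_ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:2000]")
eval_ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation[:500]")

args = SFTConfig(
    output_dir="trial-1",
    learning_rate=9.78e-5,
    per_device_train_batch_size=4,
    weight_decay=0.092,
    warmup_ratio=0.064,
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=50,
    logging_steps=10,
    report_to="none",
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
)
trainer.train()
print(trainer.evaluate())  # eval_loss ends up in the job logs for the orchestrator to read
```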
Each trial runs on its own cloud GPU. The orchestrator handles rate limits, retries, and timeouts.
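The coordination itself boils down to Optuna's ask/tell loop with a budget check. The sketch below leans on hypothetical helpers for job submission and log parsing (the real skill wraps `run_uv_job()` from `huggingface_hub` with its own polling, retries, and cleanup):

```python
import optuna

BUDGET_USD = 5.00
T4_RATE_USD_PER_HOUR = 0.75  # HF Jobs T4 pricing quoted above

# TPE sampling plus MedianPruner, as described above
study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.MedianPruner(),
)

spent = 0.0
while spent < BUDGET_USD:
    trial = study.ask()
    params = suggest_hyperparameters(trial)  # from the earlier sketch

    # Hypothetical helpers: render the trial script with these params, submit it
    # as an HF Job (e.g. via run_uv_job), poll until it finishes, then pull
    # eval_loss and the billed runtime out of the job logs.
    job = submit_trial_job(params, flavor="t4")
    eval_loss, runtime_hours = wait_for_result(job)

    study.tell(trial, eval_loss)  # feeds TPE so the next suggestion is smarter
    spent += runtime_hours * T4_RATE_USD_PER_HOUR

print("Best so far:", study.best_value, study.best_params)
```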
Real results
After just two trials (~$0.18 spent):
| Trial | Learning Rate | Batch | LoRA r | Weight Decay | Warmup | eval_loss |
|---|---|---|---|---|---|---|
| 0 | 1.28e-05 | 4 | 32 | 0.085 | 0.044 | 3.056 |
| 1 | 9.78e-05 | 4 | 8 | 0.092 | 0.064 | 2.869 |
Trial 1 already found a 6% improvement by using a higher learning rate and smaller LoRA rank. Optuna's TPE sampler is learning.
The Gradio dashboard (launched automatically at localhost:7860) shows optimisation history, parameter importance, and trial details -- refreshing as new results come in.
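For a sense of how little code such a dashboard needs, here's a minimal sketch built from Optuna's Plotly figures wrapped in Gradio. The study name and SQLite path are assumptions on my part, and this is not the skill's actual dashboard code:

```python
# Requires plotly for the figures (and scikit-learn for parameter importances).
import gradio as gr
import optuna
from optuna.visualization import plot_optimization_history, plot_param_importances


def load_plots():
    # Assumes the study is persisted to SQLite under these (made-up) names.
    study = optuna.load_study(study_name="qwen-hpo", storage="sqlite:///hpo_study.db")
    return plot_optimization_history(study), plot_param_importances(study)


with gr.Blocks() as demo:
    gr.Markdown("## HPO study progress")
    refresh = gr.Button("Refresh")
    history_plot = gr.Plot(label="Optimization history")
    importance_plot = gr.Plot(label="Parameter importance")
    refresh.click(load_plots, outputs=[history_plot, importance_plot])
    demo.load(load_plots, outputs=[history_plot, importance_plot])

demo.launch(server_port=7860)
```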
Why this matters
The boring answer: HPO is tedious. You write scripts, submit jobs, wait, parse logs, record results, repeat. Automating this saves hours.
The interesting answer: This is what agent-assisted development should look like. Not "AI writes my code" but "AI handles the coordination while I do the thinking." I decided the search space. I set the budget. Claude handled the plumbing. The skill system makes this repeatable: the same prompt next week gets the same careful workflow -- cost estimates, budget checks, proper cleanup. It's not a one-off hack but a reusable capability.
Try it yourself
Prerequisites
- Hugging Face account with Jobs access (Pro/Team/Enterprise)
- `HF_TOKEN` exported in your shell

```bash
export HF_TOKEN=hf_...
```
Install the skill
```bash
# Add the HF skills marketplace (if not already added)
claude plugin marketplace add huggingface/skills

# Install the HPO skill
claude plugin install optuna-hpo@huggingface-skills
```
Run it
Just ask Claude to optimise your model. Be specific about:
- Model and dataset
- Budget (in dollars)
- What hyperparameters to search
- Hardware preference
The skill will handle the rest.
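For example, a prompt along these lines covers all four points (the model, dataset, and numbers are placeholders):

> Use your optuna-hpo skill to find good training hyperparameters for
> fine-tuning my-org/my-model on my-org/my-dataset on Hugging Face Jobs.
> Budget: $10. Search learning rate, batch size, and LoRA rank. Use an A10G.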
What's next
The current implementation supports SFT only -- DPO and GRPO are on the roadmap and should land soon. The architecture is extensible: adding a new training method means little more than writing a new script template, which shouldn't be terribly arduous. The Gradio UI does the job, but -- let's not mince words -- it's god-awful ugly. Consider it the project's ugly Christmas sweater, to be shed come the new year for something nicer.
You may also be interested in this great post by @sigridjineth on using Claude Code as an experiment orchestrator.

