Hyperparameter optimisation with Optuna and Claude Code on Hugging Face Jobs

Community Article · Published December 13, 2025

Last week, @burtenshaw and @evalstate showed how to get Claude Code to fine-tune a model on Hugging Face Jobs. That got me thinking: if Claude can run a single training job, why not let it run dozens -- systematically searching for optimal hyperparameters?

I'm not really a fan of vibe coding (no judgment -- I'm just faster without!), but Claude Code excels at something different: coordinating complex, multi-step processes. It's essentially a universal agent orchestrator. And hyperparameter optimisation is exactly the kind of tedious, repetitive coordination task that's perfect for delegation.

So I built an optuna-hpo skill that lets Claude Code run distributed HPO studies on Hugging Face Jobs. Here's what that looks like in practice.

The Setup

We're going to fine-tune Qwen/Qwen2.5-0.5B on a 2k subsample of wikitext-2-raw-v1, searching for optimal:

  • Learning rate (log scale: 1e-5 to 5e-4)
  • Batch size (4 or 8)
  • LoRA rank (8, 16, or 32)
  • Weight decay (0 to 0.1)
  • Warmup ratio (0 to 0.1)

Budget: $5. Hardware: T4 GPU ($0.75/hr on HF Jobs).
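In Optuna terms, that search space maps onto a handful of suggest calls. Here's a rough sketch (the skill generates its own trial scripts, so treat the parameter names as illustrative):

# Rough sketch of the search space above, expressed with Optuna's suggest API.
# Parameter names are illustrative; the optuna-hpo skill writes its own scripts.
import optuna

def suggest_hyperparameters(trial: optuna.Trial) -> dict:
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [4, 8]),
        "lora_r": trial.suggest_categorical("lora_r", [8, 16, 32]),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.1),
    }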

The Magic: One Prompt

Here's the entire interaction:

> Hi, Claude. Can you finetune Qwen/Qwen2.5-0.5B on a 2k subsample of
> wikitext/wikitext-2-raw-v1 using your optuna-hpo skill to figure out
> the ideal training hyperparameters? Your budget is $5. Do it on
> Hugging Face Jobs. And please launch the Gradio dashboard for me, too.

Claude responds:

I'm using the optuna-hpo skill to guide hyperparameter optimization
for fine-tuning Qwen2.5-0.5B on wikitext-2-raw-v1.

And then it just... does it. Creates an Optuna study. Generates trial scripts. Submits jobs to HF Jobs. Polls for completion. Extracts metrics. Launches a dashboard. All automatically.

What Actually Happens

Under the hood, Claude:

  1. Verifies prerequisites -- checks HF authentication, estimates costs
  2. Creates an Optuna study with TPE sampling and MedianPruner for early stopping
  3. Generates a training script for each trial (complete with TRL/SFT, LoRA, callbacks)
  4. Submits each trial as an HF Job using run_uv_job()
  5. Polls job status and extracts eval_loss from logs
  6. Reports back to Optuna so it can suggest smarter hyperparameters
  7. Tracks budget and stops when exhausted

Each trial runs on its own cloud GPU. The orchestrator handles rate limits, retries, and timeouts.
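Conceptually, the loop looks something like the sketch below. It's simplified and partly hypothetical: run_uv_job() is the real huggingface_hub entry point the skill uses, but the script path, the t4-small flavor name, and the two helper functions are stand-ins for details the skill handles itself.

# Simplified, partly hypothetical sketch of the ask/tell loop. run_uv_job() is
# real; the flavor name, env passing, and the helper functions are assumptions.
import optuna
from huggingface_hub import run_uv_job

study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.MedianPruner(),
)

n_trials = 10  # in practice the loop stops once the $5 budget is spent

for _ in range(n_trials):
    trial = study.ask()
    params = suggest_hyperparameters(trial)   # search-space sketch from earlier

    # Each trial becomes its own Hugging Face Job on a T4.
    job = run_uv_job(
        "trial_script.py",                    # per-trial training script
        flavor="t4-small",                    # assumed flavor name for a T4
        env={k.upper(): str(v) for k, v in params.items()},
    )

    wait_for_completion(job)                  # hypothetical: poll job status
    eval_loss = extract_eval_loss(job)        # hypothetical: parse eval_loss from logs

    study.tell(trial, eval_loss)              # TPE uses this to pick the next config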

Real results

After just two trials (~$0.18 spent):

| Trial | Learning rate | Batch size | LoRA r | Weight decay | Warmup ratio | eval_loss |
|-------|---------------|------------|--------|--------------|--------------|-----------|
| 0     | 1.28e-05      | 4          | 32     | 0.085        | 0.044        | 3.056     |
| 1     | 9.78e-05      | 4          | 8      | 0.092        | 0.064        | 2.869     |

Trial 1 already found a 6% improvement by using a higher learning rate and smaller LoRA rank. Optuna's TPE sampler is learning.


The Gradio dashboard (launched automatically at localhost:7860) shows optimisation history, parameter importance, and trial details -- refreshing as new results come in.
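For a feel of how such a dashboard can be wired together, here's a minimal sketch that serves Optuna's built-in Plotly figures through Gradio. The study name and SQLite storage path are assumptions, not necessarily how the skill persists its study:

# Minimal Gradio dashboard over an Optuna study (illustrative only; the study
# name and storage path are assumptions).
import gradio as gr
import optuna
from optuna.visualization import plot_optimization_history, plot_param_importances

def load_plots():
    study = optuna.load_study(
        study_name="qwen-hpo",                   # assumed study name
        storage="sqlite:///optuna_study.db",     # assumed storage location
    )
    return plot_optimization_history(study), plot_param_importances(study)

with gr.Blocks() as demo:
    gr.Markdown("## HPO progress")
    refresh = gr.Button("Refresh")
    history = gr.Plot()
    importance = gr.Plot()
    refresh.click(load_plots, outputs=[history, importance])
    demo.load(load_plots, outputs=[history, importance])

demo.launch(server_port=7860)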

Why this matters

The boring answer: HPO is tedious. You write scripts, submit jobs, wait, parse logs, record results, repeat. Automating this saves hours.

The interesting answer: This is what agent-assisted development should look like. Not "AI writes my code" but "AI handles the coordination while I do the thinking." I decided the search space. I set the budget. Claude handled the plumbing. The skill system makes this repeatable: the same prompt next week gets the same careful workflow -- cost estimates, budget checks, proper cleanup. It's not a one-off hack but a reusable capability.

Try it yourself

Prerequisites

  • Hugging Face account with Jobs access (Pro/Team/Enterprise)
  • HF_TOKEN exported in your shell
export HF_TOKEN=hf_...
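If you want to confirm the token is picked up before any paid jobs launch, a quick check with huggingface_hub works:

# Optional sanity check that HF_TOKEN is valid before launching paid jobs.
from huggingface_hub import whoami

print(whoami()["name"])  # prints your username if authentication works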

Install the skill

# Add the HF skills marketplace (if not already added)
claude plugin marketplace add huggingface/skills

# Install the HPO skill
claude plugin install optuna-hpo@huggingface-skills

Run it

Just ask Claude to optimise your model. Be specific about:

  • Model and dataset
  • Budget (in dollars)
  • What hyperparameters to search
  • Hardware preference

The skill will handle the rest.
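If it helps, a minimal prompt shape looks like this (fill in your own model, dataset, and numbers):

> Use your optuna-hpo skill to tune <model> on <dataset> on Hugging Face
> Jobs. Budget: $<N>. Search learning rate, batch size, and LoRA rank.
> Prefer <hardware>.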


What's next

The current implementation supports SFT only; DPO and GRPO are on the roadmap and should arrive soon. The architecture is extensible: adding a new training method is mostly a matter of writing a new script template. The Gradio UI does the job but is, let's not mince words, god-awful ugly. Consider it the dashboard's ugly Christmas sweater, to be shed in the new year for something nicer.

You may also be interested in this great post by @sigridjineth on using Claude Code as an experiment orchestrator.
