SonarSource/SonarSweep-java-gpt-oss-20b

@@ -217,6 +217,8 @@ We trained LoRA adapters across all linear layers of the experts and attention b
 ## Evaluation
+For a comprehensive analysis with detailed metrics and additional comparisons between the base model and fine-tuned model, see our [detailed evaluation report](https://huggingface.co/SonarSource/SonarSweep-java-gpt-oss-20b/blob/main/report.pdf).
 ### Code Quality
 We used SonarQube to evaluate the quality, verbosity, and complexity of Java code generated for the [ComplexCodeEval](https://github.com/ComplexCodeEval/ComplexCodeEval) and [MultiPL-E Java](https://huggingface.co/datasets/nuprl/MultiPL-E/viewer/humaneval-java) benchmarks.
@@ -225,7 +227,7 @@ The fine-tuned and base models achieve a similar pass@1 metric for code generati
 The fine-tuned model achieves this metric while generating fewer lines of code.
-For code quality, we see a dramatic reduction in both the number and density of Sonar issues, split among bugs, security vulnerabilities, and code smells (see the [Glossary](#glossary) for definitions).
+For code quality, we see a dramatic reduction in both the number and density of Sonar issues, split among bugs, security vulnerabilities, and code smells (see the [Glossary](#glossary) for definitions). For granular breakdowns by issue type and severity, refer to the [detailed evaluation report](https://huggingface.co/SonarSource/SonarSweep-java-gpt-oss-20b/blob/main/report.pdf).
 | Metric | Base Model | Fine-tuned Model |
 |--------|------------|------------------|