AgentRel Benchmark
Run: 2026-04-16T19:38:17.705378+00:00 · Judge: google/gemini-2.5-flash · Strategy: promptfoo
All sources combined
Questions: 1240
Control Avg: 2.22
With Skill Avg: 2.69
Δ Delta: +0.47
✅ Pass: 38% (+10pp vs ctrl)
🟡 Partial: 33%
❌ Fail: 30%
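The summary figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the delta is the with-skill average minus the control average and that "+10pp vs ctrl" means the with-skill pass rate minus the control pass rate. The report itself does not state either definition.

```python
def score_delta(control_avg: float, with_skill_avg: float) -> float:
    """Average-score change, assumed to be (with skill - control), to two decimals."""
    return round(with_skill_avg - control_avg, 2)


def control_pass_rate(with_skill_pass_pct: int, pass_delta_pp: int) -> int:
    """Implied control pass rate, assuming the pp figure is (with skill - control)."""
    return with_skill_pass_pct - pass_delta_pp


if __name__ == "__main__":
    # Figures taken from the summary: Control Avg 2.22, With Skill Avg 2.69.
    print(score_delta(2.22, 2.69))    # 0.47
    # Pass 38% with skill, reported as +10pp vs ctrl.
    print(control_pass_rate(38, 10))  # 28
```

Under these assumptions the reported +0.47 delta is consistent with the two averages, and the implied control pass rate is 28%.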
Score by Category
Top Skills by Impact (Δ)
| Skill | Control | With Skill | Δ Impact | Pass% | Pass Δ (vs ctrl) | Questions |
|---|---|---|---|---|---|---|
| monad/network-config ✅ | 0.75 | 4.30 | +3.55 | 75% | +75pp | 20 |
| mantle/mantle-network-primer ✅ | 2.44 | 4.28 | +1.84 | 78% | +50pp | 18 |
| dev-tooling/ethers-vs-viem ✅ | 2.88 | 4.00 | +1.12 | 75% | +37pp | 8 |
| mantle/mantle-risk-evaluator 🟡 | 1.00 | 2.88 | +1.88 | 38% | +32pp | 16 |
| mantle/mantle-address-registry-navigator 🟡 | 1.06 | 2.75 | +1.69 | 38% | +32pp | 16 |
| mantle/mantle-smart-contract-deployer 🟡 | 0.92 | 3.00 | +2.08 | 25% | +25pp | 12 |
| protocols/uniswap-v3-integration 🟡 | 0.88 | 2.13 | +1.25 | 25% | +25pp | 8 |
| base/l2-dev 🟡 | 2.45 | 3.50 | +1.05 | 55% | +25pp | 40 |
| ethereum/defi-math 🟡 | 2.33 | 3.08 | +0.75 | 58% | +25pp | 12 |
| dev-tooling/hardhat-vs-foundry 🟡 | 3.25 | 3.50 | +0.25 | 50% | +25pp | 8 |