AgentRel Benchmark

Run: 2026-04-16T19:38:17.705378+00:00 · Judge: google/gemini-2.5-flash · Strategy: promptfoo

All sources combined

Questions: 1240
Control Avg: 2.22
With Skill Avg: 2.69
Δ Delta: +0.47
✅ Pass: 38% (+10pp vs ctrl)
🟡 Partial: 33%
❌ Fail: 30%

Score by Category

Top Skills by Impact (Δ)

Skill                                        Control  With Skill  Δ Impact  Pass%  Pass↑  Questions
monad/network-config ✅                      0.75     4.30        +3.55     75%    +75pp  20
mantle/mantle-network-primer ✅              2.44     4.28        +1.84     78%    +50pp  18
dev-tooling/ethers-vs-viem ✅                2.88     4.00        +1.12     75%    +37pp  8
mantle/mantle-risk-evaluator 🟡              1.00     2.88        +1.88     38%    +32pp  16
mantle/mantle-address-registry-navigator 🟡  1.06     2.75        +1.69     38%    +32pp  16
mantle/mantle-smart-contract-deployer 🟡     0.92     3.00        +2.08     25%    +25pp  12
protocols/uniswap-v3-integration 🟡          0.88     2.13        +1.25     25%    +25pp  8
base/l2-dev 🟡                               2.45     3.50        +1.05     55%    +25pp  40
ethereum/defi-math 🟡                        2.33     3.08        +0.75     58%    +25pp  12
dev-tooling/hardhat-vs-foundry 🟡            3.25     3.50        +0.25     50%    +25pp  8
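
The derived columns in the table above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes Δ Impact is the With Skill average minus the Control average, that Pass% is the with-skill pass rate, and that Pass↑ is the percentage-point change versus control (in line with the "+10pp vs ctrl" figure in the summary). The `SkillRow` class and its field names are hypothetical and are not part of the benchmark's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillRow:
    """One table row; field names are illustrative, not the benchmark's schema."""
    skill: str
    control: float      # average judge score without the skill
    with_skill: float   # average judge score with the skill
    pass_pct: int       # assumed: pass rate with the skill, in percent
    pass_gain_pp: int   # assumed: pass-rate change vs control, in percentage points

    @property
    def delta(self) -> float:
        # Assumed definition of the "Δ Impact" column.
        return round(self.with_skill - self.control, 2)

    @property
    def control_pass_pct(self) -> int:
        # Control pass rate implied by Pass% and Pass↑ under the assumptions above.
        return self.pass_pct - self.pass_gain_pp


# Values copied from three rows of the table.
rows = [
    SkillRow("monad/network-config", 0.75, 4.30, 75, 75),
    SkillRow("base/l2-dev", 2.45, 3.50, 55, 25),
    SkillRow("dev-tooling/hardhat-vs-foundry", 3.25, 3.50, 50, 25),
]

for row in rows:
    print(f"{row.skill}: delta={row.delta:+.2f}, "
          f"implied control pass rate={row.control_pass_pct}%")
```

Under these assumptions, a row such as monad/network-config implies that no control answers passed, whereas hardhat-vs-foundry gains 25pp in pass rate from only a small shift in average score.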

By Ecosystem (Overall)

Questions: 236
Control Avg: 3.20
With Skill Avg: 3.43
Δ Delta: +0.23
✅ Pass: 53% (+6pp vs ctrl)
🟡 Partial: 32%
❌ Fail: 15%

Score by Category

Top Skills by Impact (Δ)

Skill                                        Control  With Skill  Δ Impact  Pass%  Pass↑  Questions
dev-tooling/ethers-vs-viem ✅                2.88     4.00        +1.12     75%    +37pp  8
protocols/uniswap-v3-integration 🟡          0.88     2.13        +1.25     25%    +25pp  8
ethereum/defi-math 🟡                        2.33     3.08        +0.75     58%    +25pp  12
dev-tooling/hardhat-vs-foundry 🟡            3.25     3.50        +0.25     50%    +25pp  8
standards/erc-account-standards 🟡           2.80     3.70        +0.90     50%    +20pp  10
defi/amm-lending-patterns ✅                 4.17     4.50        +0.33     83%    +16pp  12
standards/erc-signature-standards ✅         4.40     4.70        +0.30     90%    +10pp  10
ethereum/ethskills-concepts 🟡               3.31     3.62        +0.31     58%    +8pp   26
security/oracle-price-manipulation 🟡        1.67     1.83        +0.16     25%    +8pp   12
standards/sdk-migration-guide 🟡             2.75     3.00        +0.25     45%    +5pp   20