Agent Evaluation
Verified by community
74,209 installs
Updated Feb 2026
Description
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring. Even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Security Analysis
⚠️ Warning 66/100
Open Source
Code is publicly available for audit.
Community Verified
Reviewed by the ClawHub community.
User Reviews
No ratings yet
No reviews yet.
Community Signal
⭐ ClawHub Score 3.15 / 5.00
📥 Installs 74,209
🔄 Last Update Feb 25, 2026
🟡 Recently updated (33d ago)