
Agent Evaluation

VERIFIED

by community

(0 reviews)
74,209 installs
Updated Feb 2026

Description

Testing and benchmarking LLM agents, covering behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents score below 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

Security Analysis

⚠️ Warning: 66/100

Open Source

Code is publicly available for audit.

Community Verified

Reviewed by the ClawHub community.

User Reviews

No ratings yet

No reviews yet.

Community Signal

ClawHub Score: 3.15 / 5.00
📥 Installs: 74,209
🔄 Last Update: Feb 25, 2026
🟡 Recently updated (33 days ago)
