Agent Evaluation

Verified by community

No community reviews yet
100,748 installs
Updated May 2026

Use this page as a decision snapshot for Agent Evaluation: trust signal, install momentum, real user feedback, and high-intent related pages you can compare next.

Description

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

🤖

Editorial Summary

AI-generated

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top…

Data Sources

53 GitHub Stars
65% Reddit positive

Security Analysis

⚠️ Warning: 69/100

Open Source

Code is publicly available for audit.

Community Verified

Reviewed by the ClawHub community.

Community Reviews

Real user ratings only — separate from the editorial assessment and ClawHub signal.

No community reviews yet

Editorial Assessment

This score is a SkillsReview editorial evaluation based on structured data sources. It is not the same as the community review average or the raw ClawHub score.

📊 Editorial Assessment by SkillsReview Team (AI-generated)
3.1/5

Separate from community review averages and the raw ClawHub marketplace score.

✅ Pros

  • Benchmarks LLM agent behaviors and capabilities
  • Measures reliability and production performance
  • Supports comprehensive testing for agents

⚠️ Cons

  • No Hacker News presence to validate community interest
  • May need regular updates to match LLM advances


Community Signal

ClawHub Community Score: 4.28 / 5.00
📥 Installs: 100,748
🔄 Last Update: May 11, 2026
🟢 Actively maintained (4d ago)
ClawHub community score is a third-party marketplace signal. It is shown separately from SkillsReview editorial assessment and real user review averages.

Agent Evaluation — OpenClaw AgentSkill Review | SkillsReview