Agent Evaluation

Name: Agent Evaluation
Author: community

VERIFIED

No community reviews yet

102,230 installs

Updated May 2026

Description

Testing and benchmarking for LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

Install Agent Evaluation

Run this in your OpenClaw agent to add Agent Evaluation from the ClawHub registry.

terminal

$ openclaw skills install agent-evaluation

Requires ClawHub registry access. Review the security analysis below before installing.

Editorial Summary

AI-generated

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top…

Data Sources

53 GitHub stars

65% Reddit positive

Security Analysis

⚠️ Warning 69/100

Open Source

Code is publicly available for audit.

Community Verified

Reviewed by the ClawHub community.

Community Reviews

Real user ratings only — separate from the editorial assessment and ClawHub signal.

No community reviews yet

Editorial Assessment

This score is a SkillsReview editorial evaluation based on structured data sources. It is not the same as the community review average or the raw ClawHub score.

Editorial Assessment by SkillsReview Team (AI-generated)

3.1/5

Pros

• Benchmarks LLM agent behaviors and capabilities
• Measures reliability and production performance
• Supports comprehensive testing for agents

Cons

• No Hacker News presence to validate community interest
• May need regular updates to match LLM advances

Frequently asked questions

Is Agent Evaluation safe to install?

Agent Evaluation has a SkillsReview security score of 69/100. It is open source and community-verified on ClawHub. Check the full Security Analysis on this page before installing.

How much does Agent Evaluation cost?

Agent Evaluation is free to install for OpenClaw via ClawHub.

What are the best alternatives to Agent Evaluation?

You can compare Agent Evaluation side by side with similar OpenClaw skills on the SkillsReview comparison page to find the best fit for your workflow.

How do I install Agent Evaluation?

Install Agent Evaluation from ClawHub at clawhub.ai/skills/agent-evaluation, or use the install action on this page to copy the command for your OpenClaw agent.

Community Signal

ClawHub Community Score: 4.34 / 5.00

Installs: 102,230

Last Update: May 11, 2026
