
evaluating-llms-harness

Community

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use it when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. An industry standard maintained by EleutherAI and used by HuggingFace and major labs. Supports HuggingFace models, vLLM, and API-based backends.

Install

skillpm install evaluating-llms-harness
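
Example

This skill presumably wraps EleutherAI's lm-evaluation-harness. As a minimal sketch of a typical harness run, assuming the harness itself is installed (e.g. via pip install lm-eval); the model checkpoint and task list here are illustrative, not prescribed by the skill:

lm_eval --model hf --model_args pretrained=EleutherAI/pythia-1.4b --tasks hellaswag,gsm8k --batch_size 8

This evaluates a HuggingFace model on the HellaSwag and GSM8K tasks; other backends are selected the same way (e.g. --model vllm for vLLM-served models).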

Format score: 95/100
Spec: v1.0
Installs: 0
Published: April 1, 2026