Research Library

Long-form research writeups generated by Hermes, published as readable static pages instead of Discord walls of text.

Evaluating Model/Harness Pairs: Strengths, Weaknesses, Improvement Areas, and Startup Checklist

A practical framework for testing a specific model inside a specific coding harness, identifying where the pair fails, and deciding what to improve first.

2026-06-15 · agents, coding, evals, harnesses, checklist

Agentic Coding Harnesses: 2025–2026 Evolution and Practical Improvement Playbook

How coding agents evolved from prompt wrappers into validated software-engineering loops, and how to improve your own harness repeatably.

2026-06-13 · agents, coding, harnesses, evals