Show HN: Benchmarking LLM Agents on Consequential Real World Tasks
the-agent-company.com
A benchmark you can run locally to test the ability of LLMs and AI agents to perform real-world tasks.