Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted)

45 points by mrqjr 20 hours ago

I recently built a small open-source tool to benchmark different LLM API endpoints, including OpenAI, Claude, and self-hosted models (for example, ones served by llama.cpp).

It runs a configurable number of test requests and reports two key metrics:
• First-token latency (ms): how long it takes for the first token to appear
• Output speed (tokens/sec): how fast the output is generated overall
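To make the two metrics concrete, here is a minimal Python sketch of how they could be computed from a streamed response. It is not the tool's actual code: `measure_stream`, `StreamMetrics`, and the injectable `clock` are hypothetical names, each non-empty chunk is assumed to be one token, and tokens/sec is assumed to be measured over the whole request (including the wait for the first token).

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamMetrics(NamedTuple):
    first_token_latency_ms: float
    tokens_per_sec: float
    token_count: int


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Consume a token stream and time it.

    Assumption: every non-empty chunk counts as one token. Real APIs may
    pack several tokens into one chunk, so a tokenizer would be more exact.
    """
    start = clock()
    first_at = None
    count = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip keep-alive / empty deltas
        if first_at is None:
            first_at = clock()  # moment the first token arrived
        count += 1
    end = clock()

    if first_at is None:
        raise ValueError("stream produced no tokens")

    elapsed = end - start
    tokens_per_sec = count / elapsed if elapsed > 0 else 0.0
    return StreamMetrics((first_at - start) * 1000.0, tokens_per_sec, count)


def fake_stream(n: int, delay_s: float):
    """Stand-in for a real streaming API response (hypothetical)."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20, 0.01)))
```

Passing the clock in as a parameter keeps the timing logic testable without real network calls; a real run would feed in the chunks from a streaming HTTP response instead of `fake_stream`.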

Demo: https://llmapitest.com/
Code: https://github.com/qjr87/llm-api-test

The goal is to provide a simple, visual, and reproducible way to evaluate performance across different LLM providers, including the growing number of third-party “proxy” or “cheap LLM API” services.

It supports:
• OpenAI-compatible APIs (official + proxies)
• Claude (via Anthropic)
• Local endpoints (custom/self-hosted)

You can also self-host it with docker-compose. The config is clean, and adding a new provider only requires a simple plugin-style addition.
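As a rough illustration of what a plugin-style provider addition can look like, here is a small Python registry sketch. This is an assumption about the general pattern, not the project's real plugin interface: `Provider`, `register`, and `PROVIDERS` are hypothetical names, and the local URL is just an example endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider description; not the project's real schema."""
    name: str
    base_url: str
    build_headers: Callable[[str], Dict[str, str]]


PROVIDERS: Dict[str, Provider] = {}


def register(provider: Provider) -> Provider:
    """Add a provider to the registry, rejecting duplicate names."""
    if provider.name in PROVIDERS:
        raise ValueError(f"provider already registered: {provider.name}")
    PROVIDERS[provider.name] = provider
    return provider


# OpenAI-compatible endpoints authenticate with a bearer token.
register(Provider(
    name="openai",
    base_url="https://api.openai.com/v1",
    build_headers=lambda key: {"Authorization": f"Bearer {key}"},
))

# Anthropic uses an x-api-key header plus an API version header.
register(Provider(
    name="anthropic",
    base_url="https://api.anthropic.com/v1",
    build_headers=lambda key: {
        "x-api-key": key,
        "anthropic-version": "2023-06-01",
    },
))

# Adding a self-hosted endpoint is then one more entry (example URL only).
register(Provider(
    name="local",
    base_url="http://localhost:8080/v1",
    build_headers=lambda key: {"Authorization": f"Bearer {key}"} if key else {},
))
```

With this shape, the benchmark loop only needs `PROVIDERS[name]` to get a base URL and headers, so a new service never touches the measuring code.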

Would love feedback, PRs, or even test reports from APIs you’re using. Especially interested in how some lesser-known services compare.

swyx 14 hours ago

idk what it is, but buying that domain made it seem more commercial and therefore less trustworthy. also, most people prob want to just use artificialanalysis' numbers rather than self-run benchmarks (but this is ok if you want to run your own)

mrqjr 9 hours ago

I honestly don't know how to make this project seem credible to you. It is just a demo site and has no features that you have to pay to use. I simply felt I was making something that might be of some use to someone else as well.

vmeklis 5 hours ago

cool idea, seems like a nice twist on the LMArena

mdhb 17 hours ago

In what universe does a post created by a new account with zero comments and a grand total of 2 votes over the course of 2 hours end up on the front page?

bdangubic 13 hours ago

I am polishing up my blog about some FORTRAN code I wrote last week in hopes of the same :)

iRomain 17 hours ago

LLM

mrqjr 9 hours ago

I wish I could see some of your code besides tech gossip and financial news.

vntok 13 hours ago

It's an informative post about new tech, that fits pretty well here of all places.
Why would you want the author to write about something else to validate the post? That would be an appeal to authority, which is the complete opposite of what the Hacker Manifesto has always been about in terms of ethos, goals, etc.