jerf 4 days ago

Another thing I have seen sneak up on dynamic scripting language programs is that maybe at first, your code is mostly blocked on the database. You prototype something that mostly gets some stuff from the DB and, barely even looking at it, blasts it out over optimized C-based code paths (like JSON encoding) that are fairly fast.

But then you need some authentication. And you need authorization, and more deeply than just "yea" or "nay", but doing non-trivial work to determine what is and is not permitted on some complex resource tree. And then you're taking a couple of queries and putting them together with some API call results in scripting code with some non-trivial logic. And then it grows from your dev prototype of 10 values to 100,000. And you use a few convenient features of your language without realizing the multiplicative slowdown they impose on your code. You add some automated plugins for turning DB values into these other things without realizing how much that costs. And you add some metadata-driven interpreter action based on what the user is asking for. And one feature at a time, you just keep stacking more and more on.

And it can really sneak up on you, but in a mature scripting language codebase it's really easy to end up not just unblocked on database IO, but with IO ultimately being such a tiny portion of the runtime that even if the IO were completely free, you'd still have a slow page.
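
To make that concrete, here is a hedged micro-benchmark sketch (hypothetical data and decorations, not from any particular app): each per-row convenience is cheap in isolation, but stacked across 100,000 rows they dwarf what the database round trip would have cost.

    require "json"
    require "benchmark"

    # 100,000 rows, as if they'd just come back from the DB.
    rows = Array.new(100_000) { |i| { id: i, name: "user#{i}", role: "admin" } }

    cost = Benchmark.measure do
      rows.map { |r| r.transform_keys(&:to_s) }                    # plugin-style key mangling
          .map { |r| r.merge("url" => "/users/#{r["id"]}") }       # metadata-driven decoration
          .map(&:to_json)                                          # serialization layer
    end
    puts cost
    # Each step looks harmless alone; together they're three full passes of
    # per-row Ruby work that no amount of free IO will hide.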

Computers are very fast, and very good at what they do, and there's a lot of power to play with in a lot of cases... but it isn't always quite as much as programmers think it is.

  • swatcoder 4 days ago

    You're not really wrong, but what you're talking about is not really related to language choice.

    What it's related to is the narrative you shared in your second paragraph, which I think you wrote mostly for color, perhaps because it seems to you like the only or necessary story of how software gets made.

    "And then", "And then", "And then"... "One feature at a time, you just keep stacking more and more on."

    There's no foresight. There's no planning. There's no architecture. There's no readiness. There's no preparation. Tomorrow's needs are always beyond the horizon and today's are to be approached strictly on their own terms.

    If you've taught software development to anyone, you're used to seeing this behavior. Your student doesn't know what they don't know (because you haven't taught them yet), and they're overwhelmed by it, so they just scramble together whatever they can to make it past the next most immediate hurdle.

    Much (not all) of the industry has basically enshrined that as the way for whole teams and organizations to work now, valorizing it as an expression of agility, nimbleness, efficiency, and humility.

    But it inevitably results in code and modules and programs and services and systems where nothing fully snaps together and operational efficiency is lost at every interface, and implementations for new features/challenges need to get awkwardly jammed in as new code rather than elegantly fleshed out as prefigured opportunities.

    So you're right that most modern projects eventually just become swamped by these inefficiencies, with no ripe ways to resolve them. But it's not because of Rails vs Go or something, it's because pretty much everyone now aspires to build cathedrals without committing to any plans for them ahead of time.

    What they gain for that is some comfort against FOMO, because they'll never face yesterday's plan conflicting with tomorrow's opportunity (since there is no plan). But they lose a lot too, as you call out very well here.

    • jerf 2 days ago

      Dynamic scripting languages are orders of magnitude worse than static languages at this problem, though, for several reasons. Their culture tends to encourage these sorts of operations. They're slower than static languages in the first place, meaning every additional layer is that much more expensive on the clock. Their feature sets afford this style of programming, whereas static languages afford much lower cost features, if not outright zero-cost features.

      If you're going to be concerned about the sort of things you mention, then as an engineer, you need to turn those concerns into significant negative value placed on the option of dynamic scripting languages.

      Which is not to say that static languages are immune... indeed, I marvel at times at the effort and ingenuity put into bringing these issues at great cost into the static languages. Log4Shell, for instance, fundamentally stems from Java, with great effort, importing a very dynamic-scripting-language style reflection feature from the dynamic world, and then getting bitten hard by it. That's not a performance issue in this case, just a vivid example of that sort of thing. You can with enough effort layer enough frameworks and indirection in any language to make anything slow.

    • nrr 4 days ago

      "There's no foresight. There's no planning." Couple that with "as an expression of agility," and it really rings true to me. I've worked in enough shops where the contractual obligations preclude any ability to slow down and put together a plan. A culture where you're forced to go from an angry phone call from the suits to something running in production in mere hours is a culture that finds building bookcases out of mashed potatoes acceptable.

      The best environment I've ever worked in was, ironically enough, fully invested in Scrum, but it wasn't what's typical in the industry. Notably, we had no bug tracker[0], and for the most part, everyone was expected to work on one thing together[1]. We also spent an entire quarter out of the year doing nothing but planning, roleplaying, and actually working in the business problem domain. Once we got the plan together, the expectation was to proceed with it, with the steps executed in the order we agreed to, until we had to re-plan[2].

      With the rituals built in for measuring and re-assessing whether our plan was the right one through, e.g., sprint retrospectives, we were generally able to work tomorrow's opportunity into the plan that we had. With the understanding that successfully delivering everything we'd promised at the end of the sprint was a coin toss, if we were succeeding a lot, it gave us the budget to blow a sprint or two on chasing FOMO and documenting what we learned.

      0: How did we address bugs without a bug tracker? We had a support team that could pull our andon cord for us whenever they couldn't come up with a satisfactory workaround (based on how agonizing it was for everyone involved) to behavior that was causing someone a problem. Their workarounds got added to the product documentation, and we got a product backlog item, usually put at the top of the backlog so it'd be addressed in the next sprint, to make sure that the workaround was, e.g., tested enough such that it wouldn't break in subsequent revisions of the software. Bad enough bugs killed the sprint and sent us to re-plan. We tracked the product backlog with Excel.

      1: Think pairing but scaled up. It's kinda cheesy at first, but with everyone working together like this, you really do get a lot done in a day, and mentoring comes for free.

      2: As it went: Re-planning is re-work, and re-work is waste.

      • randysalami 4 days ago

        Sounds amazing! Do you still work there now?

        • nrr 3 days ago

          No, I left the industry.

    • ljm 4 days ago

      This has been a real challenge in the early startup setting where everything is built to a prototypical standard with the belief that you’re basically buying time until runway is no longer a problem.

      And whatever truth there is in that, it’s all out of balance and lacks pragmatism. Sometimes the pragmatic choice isn’t to keep slapping shit together, it’s to step back and see what the big picture is. Even if you don’t have PMF yet, you need to design for something on the off chance it succeeds or you need to pivot.

      And it’s not even a pure tech issue, it can be as simple as thinking how a team of engineers could contribute, how you could onboard someone new and have them up and running quickly, what happens if you fire someone who kept everything in their head…

  • hinkley 4 days ago

    I worked on a NodeJS app that was cpu bound long before I got there. Wages of making a zero code SaaS and thinking they could dynamically render all pages and charge customers a premium for it forever. Started losing out to simpler alternatives.

    At one point I shaved almost 20% off TTFB just by cutting the setup overhead of all that telemetry, tracing, logging, feature toggle, etc code that had become idiomatic and set up first thing every time you instantiated an object. By my estimate I cut the overhead by less than 2/3 (couple big dumb mistakes, and less than 1/2 of what was left after), so that’s still over 1/8th of the overall page weight. All before a single service call. But at least I think I pulled the slope of the perf regression line down below the combined trend of faster VMs and faster EC2 instances.

    Bookkeeping and other crosscutting concerns add up, particularly when every new feature gets it. It’s that old joke about how you can put so many gauges on an engine that you know its precise speed at all times, but that speed is 0.00.

  • snovv_crash 4 days ago

    I guess this is part of the reason for C++ having zero-cost abstractions.

    • pjmlp 3 days ago

      Kind of. The real reason is that C with Classes was Bjarne Stroustrup's "TypeScript for C"; he wasn't going to use bare-bones C after his Simula-to-BCPL downgrade experience.

      However, to be able to do that, C with Classes' abstractions needed to perform just as well as writing raw C.

freedomben 4 days ago

I was skeptical given the title that the article could defend its claim, but the article was strong and made great points. The only thing I would add (which I concur probably doesn't belong in the article itself):

Most databases make it trivial to see how long your queries are taking. Use that to help you identify the performance bottlenecks! If a request/response cycle takes 200 ms from the entrypoint of your rails server, and the database query reported (from the db) took 1 ms, that's a whole lot different than if the db query took 150 ms.
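
As a hedged sketch of how to surface that on the Rails side (the timings themselves come from the database adapter; this assumes a Rails app with Active Support's instrumentation available):

    # Log the duration of every SQL statement Active Record runs.
    ActiveSupport::Notifications.subscribe("sql.active_record") do |name, start, finish, id, payload|
      duration_ms = ((finish - start) * 1000).round(1)
      Rails.logger.info("#{payload[:name]} (#{duration_ms} ms): #{payload[:sql]}")
    end

    # Compare those numbers against the total request time in the request log:
    # 1 ms of SQL inside a 200 ms response points at Ruby, not the database.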

On scaling in general, in my experience inefficient queries are the vast majority of performance problems for most apps. It's usually fairly easy to identify these, as different endpoints will perform very differently, so you can get an idea of the baseline and identify aberrations. If that's happening, then check your queries. If the whole app (baseline) is slow, TFA has phenomenal guidance.

scosman 4 days ago

“CPU starvation looks like IO to most eyes”

Yes! And add in that you are holding a database connection for longer than needed, putting scaling pressure on your DB, which is the harder thing to scale.

  • WJW 4 days ago

    Having the connection for longer than needed only puts more pressure on databases on which connections are a scarce resource (basically only postgres, and even then it's easily fixed with pgbouncer). When using a database with a better connection model, holding a connection too long will actually decrease total load on the database because each connection is idle for a larger percentage of the time.

    • CoolCold 4 days ago

      Doesn't pgbouncer impose extra limitations, like on prepared statement use? Genuine question.

  • PaulHoule 4 days ago

    I'm not sure if it is the same phenomenon, but I was really disappointed with the performance of a python aiohttp server [1] when I went from the application serving an html page with maybe 2-3 small images that would stick in the cache (icons) to serving 20-50 photographic images per page, particularly over the wrong direction of an ADSL connection.

    I thought this was low CPU, but it was not low CPU enough. In the end I switched to gunicorn (running in WSL) with IIS fronting and serving the images (it would have been nginx in a pure-Linux environment). I think the tiny amount of CPU needed to serve the images was still getting blocked by requests that used a little more CPU. With gunicorn I'm not afraid to add tasks that crunch on the CPU a little harder, too.

    [1] built a lot of mini-apps that worked great before

    • CoolCold 4 days ago

      Everything is better with Nginx!

      As a sysadmin, I've seen many cases where not letting the backend serve static files drops latency and load significantly. In the worst case, X-accel-redirect is still better than serving through most of the frameworks.

mike_hearn 4 days ago

That's basically right except it's worth noting the belief that RDBMS can't scale horizontally is wrong. Most RDBMS engines can't do that, but an example of one that can is Oracle. RAC clusters scale write masters horizontally up to fairly large cluster sizes (e.g. 32 write masters works OK) and obviously read replicas can be added more or less indefinitely.

RAC isn't sharding. The Oracle JDBC drivers do support automatic sharding across independent databases or clusters too, but RAC is full horizontal scaling of full SQL including joins.

It's also worth noting that if your Ruby app is CPU constrained you could take a look at TruffleRuby. It runs on the JVM, has a full JIT compiler more powerful than YJIT, can deal with native modules, and doesn't have a global interpreter lock (threads do really run in parallel).

  • packetlost 4 days ago

    This is what I mean when I say Oracle's DB product is actually pretty good. Say what you will about Oracle the company, but their core products are good at their respective domains.

    • pjmlp 3 days ago

      My favourite DBs are Oracle and SQL Server, and no FOSS religion is going to change my mind.

      Too many folks lose out on great technology because of hating the man, 70's hippie style.

      And yes, I have used Postgres and MySQL and such.

      It is similar to all the programming languages that never moved beyond raw command-line and basic editor tooling, which is only better nowadays because Microsoft (whom they hate) made VSCode (driven by Erich Gamma of Eclipse fame) and LSP a common thing for all workloads.

      • ksec 3 days ago

        >My favourite DBs are Oracle and SQL Server, and no FOSS religion is going to change my mind.

        I mean, even mentioning that MySQL is better than Postgres in some areas isn't welcome on most of the tech internet, including HN. MySQL reached 9.0 after so many years, and the news didn't even reach HN's front page.

        >Too many folks lose on great technology because of hating the man, 70's hippie style......... It is similar to all the programming languages

        And it is not just PLs. It is pretty much the same across the tech spectrum, and even beyond tech. I am a tech enthusiast, not a PL, editor, or DB enthusiast. We should be able to admit Oracle and SQL Server are better right now, while striving to make a similar, if not better, DB that is open source.

    • mike_hearn 4 days ago

      Yeah, there are a lot of other neat tricks you can do with it. Someone else in this thread observes that a major issue for most Rails apps is latency: loading a page does 100 blocking queries serially, which means 100 round trips (minimum). The latency from that adds up, and before you know it your fast computers are taking seconds to render a page.

      Although it often gets lost inside app frameworks, you can use the Oracle drivers to dispatch every query at once in one request, or even just execute a stored procedure that has all the queries for a page registered inside the DB then work through the results from each query client-side. So you can really optimize the latency heavily if you can structure your app in such a way that it can exploit such things. JDBC supports this for instance but Active Record doesn't.

  • tcoff91 4 days ago

    Due to Amdahl's law, however, you surely don't get 32x write throughput from having 32 write masters.

    • mike_hearn 4 days ago

      It depends a lot on what the query patterns are. If your writes are naturally spread out, then yes, you can get close, because the 32 masters will be writing to 32 different parts of the database and there won't be cache blocks ping-ponging around. If all the writes are contending on the same parts of the same tables, then you indeed won't get 32x, because your queries will be waiting on each other to release locks, etc.

      At least that's my understanding. I've not tuned an app that runs on such a cluster.

    • yuliyp 4 days ago

      Amdahl's law doesn't talk about throughput, but latency. You can absolutely get 32x more throughput by having 32 parallel backends. Each individual write won't be 32x faster e2e, but that's not really the point.

      • tcoff91 3 days ago

        But what about when multiple nodes have to coordinate to do a write?

  • pas 2 days ago

    Even 10 years ago, MySQL Galera Cluster was providing pretty good write scalability. (Without the need for a PhD in Oracleology, and without donating your kidney to Larry.)

  • Andys 4 days ago

    CockroachDB can too, although its licensing has made it unpopular.

phamilton 4 days ago

Beyond application performance, I've also found a non-trivial drag from the complexity of scaling Rails horizontally. Databases need connection pools, which must be external. The processes-vs-threads ratio needs to be tuned and retuned as the application evolves. Deployments take longer when they have to fully roll out a hundred containers. It's a bunch of tiny bits of complexity that add up.

Moving to a fully concurrent runtime (rust, golang, JVM, beam, etc) suddenly makes so much of that no longer a concern.

  • pqdbr 3 days ago

    For someone that is experienced with Rails but not with the other runtimes mentioned, how do they eliminate the need for db connection pools and speed up deployments across nodes?

    • phamilton 2 days ago

      To handle 25k qps with Rails, assuming 500ms per request, you need 12k Ruby processes. That probably means you run on 200 boxes with 64 CPUs each. Maybe you get a little bit of oversubscribing of processes to CPUs due to I/O. Conservatively it's still 100 boxes.
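
      Back-of-the-envelope, via Little's law (same assumed numbers as above):

          # Little's law: requests in flight = arrival rate x average latency.
          qps       = 25_000          # assumed request rate
          latency_s = 0.5             # assumed 500ms per request
          in_flight = qps * latency_s # => 12_500 concurrent requests
          # With one request per Ruby process that's ~12k processes,
          # i.e. roughly 200 boxes at 64 processes per box.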

      To handle 25k qps with Rust, you can run 10 boxes. Maybe less.

      Deployments are complicated (even when we wrap them in simple abstractions). A "rare" delay that impacts 1% of boxes during a deploy will become common on 100 boxes but still somewhat rare on only 10.

      Running a single rust process per box allows a connection pooler to be in-process. It's more efficient, it supports postgres prepared statements, it's one less operational headache, etc.

    • throw_aw100 2 days ago

      I don't have experience with Rails, but I have experience with the other runtimes.

      I read their comment as needing an external database pool to scale, not that they don't need a pool.

      When running JVM or the other runtimes, the database pool can be part of the application itself because it uses a different threading model.

      • choilive 2 days ago

        Rails also has an application-level connection pool for the database, but as I understand it, a connection pooler (pgbouncer et al.) between PG and the application servers is still necessary when scaling horizontally.

pdhborges 4 days ago

Very nice discussion. A lot of times I see people assuming that web apps are IO bound anyway to justify technical decisions. Nevertheless, here I am looking at our stats: 60% is Python, 40% is IO, and Postgres is getting faster at a quicker pace than Python is.

nokun7 4 days ago

Great breakdown of the "IO bound Rails app" myth! I really appreciate the focus on profiling and understanding actual application behavior rather than relying on assumptions. The points about threading, database interactions, and context-switch overhead are spot on and offer a fresh perspective on how performance issues are often misunderstood.

It's interesting how easy it is to default to surface-level fixes instead of addressing the root cause. Have you thought about adding a case study or some benchmarks for common workloads? It could really help teams put these ideas into practice.

pjmlp 3 days ago

Fully spot on. That is why, even though using Tcl with lots of native extensions might have been a great solution in the late-90s dotcom wave, we quickly validated with our rewrite to .NET (still beta at the time) the huge difference that having a JIT in the box makes.

Hence why, afterwards, I have never been a fan of deploying any language to production without AOT or JIT tooling in the reference implementation.

jupp0r 4 days ago

When people say these apps are IO bound, what they actually mean is that they are memory bound. While Rails is waiting for IO due to its lack of concurrency support and the ubiquitous use of global variables in the ecosystem, a passenger worker is taking up >300mb of memory for a medium sized Rails app. Server memory limits the amount of workers you can have waiting like that, which in turn limits overall throughput per server.

  • byroot 4 days ago

    That may be true for very memory constrained platforms like Heroku, but is a non issue at larger scale.

    Taking your figure of 300MiB per process, and assuming you might want to run 1.5 processes per core to account for IO-wait, that's 450MiB per core (ignoring copy-on-write).

    If you look at various hosting offering, most offer something like 4GiB of RAM per core (e.g. EC2 "general purpose" instances).

    As for the lack of concurrency support, Active Record has had asynchronous queries for a few versions now, and is even async/fiber compatible. So if you truly have an app that would benefit from more concurrency, you can do it.
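
    A minimal sketch of what that looks like in practice (Rails 7+ load_async; the model names are hypothetical, and it needs the async query executor configured):

        # Both relations are dispatched immediately on background threads...
        posts    = Post.where(published: true).load_async
        comments = Comment.order(created_at: :desc).limit(50).load_async

        # ...and the calling thread only blocks when a result is first used.
        payload = { posts: posts.to_a, comments: comments.to_a }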

  • marcosdumay 4 days ago

    They mean the app takes data from one place, does a trivial computation, and pushes it to another place. So pushing data around is all the application does.

    And they don't realize both of those places are actually faster than the trivial computation, because their code runs way slower than the code at the origin and destination.

  • pydry 4 days ago

    No, it just means that they spend most of their time waiting around for database and network queries to finish.

karmakaze 4 days ago

The discussion in the referenced Rails issue "Set a new default for the Puma thread count"[0] to me is much more telling of IO vs CPU than this simplified post.

There are many assumptions here, especially not considering anything about the database itself. The Rails issue also considers benchmarks, which, since they're benchmarking Rails, would configure the database to not be the bottleneck. That's not true of real systems. The advice to use async queries for an IO-bound app could backfire if the reason the queries are slow is that the database is overloaded: adding concurrent queries only increases its thrashing and latency.

The best thing to do is consider the whole system. Don't lump everything that's not the Rails app together as IO. Is it actually doing network IO, or is it CPU, memory, or IO bound in the database? Maybe it's not even a lack of CPU proper on the database, but CPU being wasted on write contention/locks. Only then will you be able to choose the right course. Another way to go is to blindly try different configurations and use what works well without full understanding, which is fine until you have an outage you can't explain and then scale everything just in case.

The ways I've seen this play out:

- small apps (few users) just need good schemas and queries

- medium apps scale(*) webapp instances, which solves client CPU & memory, and vertically scale a single database writer instance

- large apps scale database via sharding or federation/microservices

- other apps may not be able to shard a single tenant so also scale via federation/microservices

(*) Even if webapps are the CPU/memory bottleneck, overscaling them can create the problem of too many connections to the database holding long-lived transactions.

[0] https://github.com/rails/rails/issues/50450

hamilyon2 4 days ago

There is a certain scale at which this may ring true. It may also be a tautological statement: when your architecture is right, your tables are well designed, your queries are optimized to the limit, and you are smart with caching, then of course CPU is now your problem.

hyperman1 4 days ago

I have a chain of applications. A Postgres database spits out data. A Java application makes a CSV from it. An Apache+PHP application takes that CSV, removes the first column (we don't want to publish the id), then sends it to the destination. Both Postgres and Java do significant transformations, but the bottleneck of that chain is the PHP code. I already optimized it, and it is now literally 3 lines of code: read line, remove everything before the first ',', write line. This speeds things up enormously compared to the previous PHP doing fgetcsv/remove column/fputcsv, but still, removing the PHP (keeping only Apache) from the chain doubles the speed of the CSV download.

  • weaksauce 4 days ago

    why does the java application/postgres output the id instead of omitting it in the first place?

    • aetherson 4 days ago

      Adding PHP to your stack in order to drop a column from a result is a WILD decision.

      • throw_aw100 2 days ago

        I don't want to be a "back seat driver", but it seems strange to me as well.

        It could be that the original files are used by other processes, and that they for some reason don't want to create two separate files. Maybe an issue with office politics (works on a different team), or an attempt at saving disk space

gtirloni 4 days ago

I didn't get the point about YJIT only helping 15-30% in real apps while IO-less benchmarks see 2-3x. It seems like a counterpoint to the main argument of the article.

Instead of guessing if the DB is performing badly by looking at Rails, why not go directly to the DB?

  • SkiFire13 4 days ago

    The point was that 15-30% is a very big speedup for a change that impacts what is supposed to take a very small part of the total time. See also Amdahl's law [1]

    Suppose that the CPU-intensive part of the program was actually sped up by 3x. This means that it previously took 1.5x as much time as the time that was saved, so roughly on the order of 22-45% of the total time of your program. In practice the speedup was probably lower than 3x, which means the 22-45% figure would be even higher.

    These are really, really high numbers for a system that was supposed to spend most of its time (>90%) blocked on IO (which was not sped up by YJIT).

    For comparison, if the CPU-intensive part actually took 10% of the time, then 10% would be the maximum speedup you could get (which is way lower than 15-30%), and a 3x speedup of that part would only result in a ~6% overall speedup.
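
    As a rough worked version of that arithmetic (a sketch, plugging in the middle of the reported range and the assumed 3x):

        # Amdahl's law: if a fraction p of the time is sped up by a factor s,
        # the time saved is p * (s - 1) / s.
        observed_saving = 0.20   # middle of the reported 15-30% range
        s = 3.0                  # assumed speedup of the pure-Ruby part
        p = observed_saving * s / (s - 1.0)   # => 0.3
        # So ~30% of the request must have been Ruby execution, not <10%.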

    [1]: https://en.m.wikipedia.org/wiki/Amdahl%27s_law

  • byroot 4 days ago

    Someone else pointed out the same thing on Bluesky, so I guess this part of my post isn't very clear.

    First, I/O bound isn't necessarily a very well defined term, but generally speaking, it implies that you can't substantially speed up such a system by speeding up code execution. The fact that YJIT did yield substantial performance improvements in the real world suggests many Rails apps aren't in fact strictly I/O bound.

    Now, about the 15-30% vs 2-3x: what I mean by that is that the benchmarks where YJIT yields this much are mostly micro-benchmarks; on more complex and heterogeneous code, the gains are much smaller, hence we can assume the "Ruby" part of these applications wasn't improved by that much.

    So for YJIT to be able to yield `15-30%` latency gains, this latency must have been in large part composed of Ruby execution. One last thing to note is that YJIT can only speed up pure Ruby code; in a Rails application, a large part of the CPU time isn't in pure Ruby code, but in various methods implemented in C that YJIT can't speed up.

    Ultimately the lobste.rs maintainer kindly offered to run an experiment in production, so we should soon see if my assumptions hold true or if I was way off, at least for that particular app: https://github.com/lobsters/lobsters/pull/1442

    Edit:

    > Instead of guessing if the DB is performing badly by looking at Rails

    That isn't really the topic though. The question is more about how much Ruby's GVL is actually released, and its implications.

  • ksec 3 days ago

    If your application is 50/50 CPU/IO on a 100ms request, getting a 30% reduction in total response time would mean the request is now only 70ms: the DB stays the same at 50ms, and your CPU part is now only 20ms. That is a 2.5x speedup compared to the 50ms previously, exactly in line with the 2-3x IO-less benchmarks.

alkonaut 4 days ago

The article gives an example of how apps are usually measured by logging queries. But in a real-world scenario, wouldn't you just analyze what the app is actually doing and see what fraction of its time it actually spends waiting for IO, waiting for sync primitives, or doing something else? Why do logging for this? It's the performance equivalent of printf debugging.

  • byroot 4 days ago

    I use simple logging code to show how the data is collected.

    In a real production setup, that timing would be reported into Datadog / NewRelic or whatever APM service, but collected in a similar way.

    • alkonaut 4 days ago

      Sure, but if you look at that logging data, which you might be collecting regardless, and you see anything that in the slightest way suggests a performance issue, or if you look at your hosting bill and wonder where the money is going, or if you need to decide between fast CPUs or fast disks, etc., wouldn't getting good recorded perf data be one of the first things you'd do?

      • byroot 4 days ago

        I guess I really don't understand what you are trying to say.

        • alkonaut 4 days ago

          I'm saying you'd just record perf stats with a normal profiler, and never base any sort of decision or analysis on what you read in a log.

rf15 4 days ago

Usually it's the inefficiencies of architectural decisions (your algos, layouts, etc.) you put into that rails app.

  • whstl 4 days ago

    IME the Rails ecosystem and community push teams into architectural decisions that don't scale very well.

    In small apps it's easy to keep track of the number of queries going to the DB per http request and their cost, but with bigger apps what I often find is ActiveRecord, services, policies, all calling the database way too much and too often.

    Monitoring which individual queries are slow is easy, but it doesn't help if operations are overcomplicated. Even if you chase all obvious N+1s, you might still have "sets of queries that could have been joins" but are in different classes, so it's difficult to notice.

  • ndriscoll 4 days ago

    Right. Simple test: check the throughput your database can do with bulk queries. Do a single query select of an xml or json struct that outputs the page rails would return (so one rendered page per row). Select enough rows for the query to take 1-10s to get a sense for rendered pages/second. Do the same for inserts/updates. If you're not close to what postgres alone can do with bulk queries, you're obviously not IO bound.
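
    A hedged sketch of that test against a hypothetical posts/comments schema (plain SQL through Active Record; any JSON- or XML-building query that mirrors your page works):

        require "benchmark"

        sql = <<~SQL
          SELECT json_build_object(
            'id', p.id,
            'title', p.title,
            'comments', (SELECT coalesce(json_agg(json_build_object('id', c.id, 'body', c.body)), '[]'::json)
                         FROM comments c WHERE c.post_id = p.id)
          )
          FROM posts p
          LIMIT 10000
        SQL

        pages = nil
        elapsed = Benchmark.realtime { pages = ActiveRecord::Base.connection.select_values(sql) }
        puts "#{(pages.size / elapsed).round} 'pages'/second from Postgres alone"
        # If the app serves far fewer pages per second than this ceiling,
        # the database isn't the bottleneck.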

    Other sanity check: nvme drives are ~10000x faster than hard disks were. Are you doing 10000x more useful disk IO than people were doing with hard disks?

    • anarazel 4 days ago

      For typical Rails-like applications that are "bottlenecked on the database", latency is a much bigger factor than actual query processing performance, even if you're careful about placement of the database and application servers.

      Just as an example, here's the results of pgbench -S of a single client, with pgbench running on different servers (a single pkey lookup):

      pgbench on database server: 36233 QPS

      pgbench on a different system (local 10gbit network): 2860 QPS

      It's rare to actually have that low latency to the database in the real world IME.

      If the same pkey lookup is executed utilizing pipelining, I get ~145k QPS from both locally and remotely.

      • ndriscoll 4 days ago

        That's still an architectural thing. If CPU isn't an issue, why would you be running your application server on a different machine from your database? Connecting to a unix domain socket will give better latency and throughput and security as long as you can scale vertically.

        Even if you have it on another machine, it's on the same switch, right (also an architectural choice)? So latency should be sub-millisecond, and irrelevant for a single request. So we assume we're talking about many requests, but then why not combine/batch your queries (also an architectural choice)? Now latency (and other things like database latches and fsyncs, which are very important) is amortized and again irrelevant. You should be able to hit close to the pure pg bulk query case (and e.g. a Rust application server can do this in practice for me with a 4-core machine and an old SATA SSD with ~50k read IOPS).

        Point is, the optimal case for building a web page is probably to do pg queries that return the xml of the page (letting you do subselects to combine everything into one query). So see how fast that is, and if your application can't match that, the database is not your bottleneck.

        • anarazel 4 days ago

          > That's still an architectural thing. If CPU isn't an issue, why would you be running your application server on a different machine from your database?

          Because that'll be too much of a scalability limitation. Rails etc are rather CPU heavy. In cloud environments it's also typically much more feasible to scale the stateless parts up and down than the database.

          > Even if you have it on another machine, it's on the same switch, right (also an architectural choice)?

          Yep, was on the same switch in my example.

          > So latency should be sub millisecond, and irrelevant for a single request.

          In my testcase it was well below a millisecond (2.8k QPS on a single non-pipelined connection would not be possible otherwise; it implies an RTT <= 0.35ms), but I don't at all agree that that makes it irrelevant for a single request.

  • scosman 4 days ago

    Sometimes. But if you have any real processing to do, it can be hard/impossible to do it quickly in ruby. You end up writing c extensions, or another app server in another language for the hard stuff.

FridgeSeal 4 days ago

I’ve seen “oh it’s IO-bound, there’s no point optimising for performance” rolled out so many times that I’m convinced it’s basically lost any intended meaning and is simply a cover-all defence. Many devs don’t appreciate how quick current databases and IO actually are.

  • anarazel 4 days ago

    The absurd thing about it is that IO-bound code IME is often way easier to optimize than CPU-bound code. Adding batching or pipelining is often almost mechanical work and can give huge speedups.
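
    For example, the batched version of a classic N+1 is nearly a find-and-replace (hypothetical model, Rails-flavored):

        # Before: one round trip per id.
        totals = order_ids.map { |id| Order.find(id).total }

        # After: one round trip for all of them (ignoring result ordering).
        totals = Order.where(id: order_ids).pluck(:total)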

juwfass 4 days ago

Optimizing Rails is zero-sum (it's Ruby, after all). Do yourself a favor and use pod affinity to schedule Rails right next to the database while your coworker rewrites it in an Elixir/Zig stack.

  • citizenpaul 4 days ago

    Your comment is confusing. Are you recommending pod scheduling as a band-aid while you rewrite the app in something else? Or making fun of trying to rewrite?

    • juwfass 4 days ago

      Forgot to add /s, but yes, making fun of both. This is my team, all ex-FAANG.