I'm partial to this non-jokey take on two hard things:
Phil Karlton famously said there are only two hard things in Computer Science: cache invalidation and naming things. I gather many folks suppose "naming things" is about whether to use camel-case or not, or picking specific symbols we use to name things, which is obviously trivial and mundane. But I always assumed Karlton meant the problem of making references work: the task of relating intension (names and ideas) and extension (things designated) in a reliable way, which is also the same topic as cache invalidation when that's about when to stop the association once invalid.
https://web.archive.org/web/20130805122711/http://lambda-the...
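To make that analogy concrete, here is a minimal Python sketch of my own (not from the quoted comment): a cache is a set of name-to-value associations, and the genuinely hard part is deciding when an association no longer designates the current thing.

    # Toy cache: each entry associates a name with a value plus the version of
    # the underlying thing it designated at the time it was stored.
    class StaleEntry(Exception):
        pass

    class Cache:
        def __init__(self):
            self._entries = {}  # name -> (value, version)

        def put(self, name, value, version):
            self._entries[name] = (value, version)

        def get(self, name, current_version):
            value, version = self._entries[name]
            if version != current_version:
                # The name no longer designates the current thing, so the
                # association must be dropped. Knowing *when* to do this,
                # without a convenient version number, is the hard part.
                raise StaleEntry(name)
            return value

    cache = Cache()
    cache.put("user:42", {"name": "Ada"}, version=1)
    print(cache.get("user:42", current_version=1))  # still valid
    # Once the underlying record changes to version 2, the old entry is stale
    # and cache.get("user:42", current_version=2) raises StaleEntry.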
While the quote switches to terminology unfamiliar to me (the intension vs extension of a word), I've never heard anyone misinterpret "naming things is hard" as a question of camelCase or which symbols to use.

It was always about coming up with short but sufficiently expressive names that greatly improve code legibility, i.e. names that represent the intent well enough to avoid confusion.

But I am not sure how it relates to "stopping the association once invalid"? Is this about renaming things when the names are no longer suitable? That's only a very special case of "naming is hard", but I do believe naming is hard even when starting from scratch (having seen many a junior engineer come up with what they thought were descriptive names, but which mostly recorded how their understanding developed over time rather than the synthesised understanding once they had it fully).
ah, ye olde two hard things, namely: cache invalidation, naming things, and off-by-one errors.
Credit where credit is due. https://x.com/secretGeek/status/7269997868
with apologies to Leon, I think I first saw it from Martin (he gives full credit to the sources), so I'll post the link here for completeness:
https://martinfowler.com/bliki/TwoHardThings.html
Note that link credits Leon. (I’d swear I heard this variation much earlier, but maybe I’m totally wrong.)
That's the title link, too.
I like telling it as 3 hard things: caches and naming things.
In the interest of DRY, naming things is hard because when you want to reuse code in a method or library, it should be easy and intuitive to find what you need.
Mostly, though, devs name things by what they do, rather than by how they might be found.
For example, naming a function calculateHaversine won't help someone looking for a function that calculates the distance between two lat/long points unless they already know that the haversine formula does that.
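To illustrate, here is a hypothetical Python sketch (the names and the alias are mine, not from any real library): the same function, named for the algorithm versus named for what a caller would actually search for.

    from math import asin, cos, radians, sin, sqrt

    def calculate_haversine(lat1, lon1, lat2, lon2, radius_km=6371.0):
        """Great-circle distance between two lat/long points, in kilometres."""
        dlat = radians(lat2 - lat1)
        dlon = radians(lon2 - lon1)
        a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
        return 2 * radius_km * asin(sqrt(a))

    # A caller grepping the library for "distance" finds this name without
    # needing to know that the haversine formula is what computes it.
    distance_between_coordinates = calculate_haversine

    print(distance_between_coordinates(48.8566, 2.3522, 51.5074, -0.1278))  # Paris -> London, ~343 km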
Or they default to the shortest possible name: Atan, asin, Pow, for example.
At some point you just have to browse the library, and learn conventional names for algorithms.
If you want to synthesize this type of knowledge on the fly because you don't like learning other people's conventions, just feed the docs to ChatGPT and ask whether there's a function that solves your problem.
This is why a formal education is so important, and why books like the "Gang of Four" book are something of a standard. They've given names to some common patterns, allowing a more efficient form of communication and a higher level of thinking. Are the patterns actually good? Are the names actually good? That is beside the point.
As counter-inspired by a fellow HNer, the encoding of this problem (cribbed from Dirac/von Neumann, so any inanity is all mine!) ought to be:
What are <A|B> and f[C]?
The solution to the problem would reveal their identities :)
Just in case: https://en.wikipedia.org/wiki/Bra%E2%80%93ket_notation#Hermi...
More seriously.. https://en.wikipedia.org/wiki/Binding_(linguistics)
And its derivatives in CS
Etc
I've come to understand 'naming things' to be glossing over many actually difficult things: knowing, explicitly, the concepts you're working with; choosing where and when to use abstractions; knowing foundational CS terminology; and, maybe most commonly, not following the single responsibility principle. There are many others, I'm sure. Failure to name a thing indicates you may not actually know what you're doing or why.

TL;DR: "naming things" itself was the joke all along.
I always interpreted the hard part as being not looking at something and naming it accurately, but the impossible ask of predicting future use and context (much of it not in your control) and somehow getting that right.
True. We make a best guess and, if it turns out to be ineffective, refactor, which means to choose a new factor. The other common cause of difficult naming is 'refactoring' without knowing what the new factor is, i.e. blind deduplication or splitting along arbitrary seams. My strategy is to delay abstraction, if possible, until we have an opinion one way or another. Others prefer to have clean/abstracted code beyond current understanding.

Not being able to make a good guess is a lack of understanding of the problem domain, nicely rolled up into this catch-all term.
About naming things: in the computer science world there is this often-repeated saying, as well as countless books and articles about naming things, often contradicting each other, from coding style guides to URL schemes.
But on the other side, there is administration. I work on projects with names like FRPPX21, in category PX23, same idea for tax forms, and just about everything administrative. Should I write code with variable names like that, I would be yelled at by the unfortunate guy who gets to read it.
There’s two hard problems in computer science: cacsynchronizing shared access to the same resource.he invalidation, and
Naming things is harder
I've always liked the version where there are 3 hard things:
I find “co-ordinating distributed transactions” to be very hard. Getting any kind of optimum cooperation between self-centred agents is tricky.
Also: "Jevons' paradox". That one is nasty! For example: just about anything we do to decrease the use of fossil fuels by some small percent makes the corresponding process more efficient and more profitable, and thus makes it happen more. That's a nasty, nasty problem. I guess it's not specific to computer science, but to all engineering.
Always reminds me of the JWZ quote regarding regular expressions:
>Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Everybody knows the two hard things are timezones and unicode.
What's hard about cache invalidation?
Caching timestamps with timezones and translated timezone names?
Eh, only one hard thing then, because as hard as Unicode is, timezones are way harder.
The history of this "There are only two hard things in Computer Science: cache invalidation and naming things" quote attributed to Phil Karlton is slightly interesting.
1) According to Tom Bajzek on Phil Karlton's son's blog, the saying goes back to Phil Karlton's time at CMU in the 1970s: https://www.karlton.org/2017/12/naming-things-hard/#comment-...
2) How did Phil recognize this difficulty around cache invalidation before he even entered the workforce (going to Xerox PARC, DEC, SGI, Netscape)?
Answer: as a grad student, he was contributing to discussions of the Hydra filesystem being designed at CMU at that time. The following 1978 paper credits discussions with him by name, which is probably a good hint as to where he learned about the difficulties of cache invalidation: https://dl.acm.org/doi/pdf/10.5555/800099.803221 He may have started out more interested in the math side of things: https://dl.acm.org/doi/pdf/10.1145/359970.359989
3) Also mildly coincidental to me is that one of SGI's core technical accomplishments in its waning years (about the time Phil left them for Netscape so he likely was not personally involved; I don't know) was dealing with memory caching in highly scalable single-system-image SMP (symmetric multiprocessing) servers when you go from 16+ CPU SMPs to a memory subsystem needing to support 512-1024 CPUs...
Answer: you have to
A) make the memory non-uniform (non-"symmetric") in its latency to different CPUs (NUMA: https://en.wikipedia.org/wiki/Non-uniform_memory_access), and
B) invent new ways of handling the resulting cache coherency problems to mask the fact that some CPUs have closer access to memory than others, keeping the programming model more like SMP and less like pure separate-memory clustering.
Here's a paper outlining how that was done in 1998: https://courses.cs.washington.edu/courses/cse549/07wi/files/... which in turn was based on Stanford's FLASH multiprocessor work: https://dl.acm.org/doi/pdf/10.1145/191995.192056
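As a rough illustration of the directory-based idea behind those designs (a toy Python sketch of my own, not code from the SGI or FLASH papers): instead of broadcasting invalidations to every CPU, a per-line directory records exactly which CPUs hold a copy, so a write only needs to invalidate the actual sharers.

    # Toy directory-based coherence: one entry per cache line, tracking sharers.
    class DirectoryEntry:
        def __init__(self):
            self.sharers = set()  # CPUs holding a read-only copy
            self.owner = None     # CPU holding the line exclusively, if any

    class Directory:
        def __init__(self):
            self.entries = {}     # cache-line address -> DirectoryEntry

        def read(self, cpu, addr):
            e = self.entries.setdefault(addr, DirectoryEntry())
            if e.owner is not None and e.owner != cpu:
                # Another CPU holds the line exclusively: demote it to shared.
                e.sharers.add(e.owner)
                e.owner = None
            e.sharers.add(cpu)

        def write(self, cpu, addr):
            e = self.entries.setdefault(addr, DirectoryEntry())
            # Invalidate every other copy before granting exclusive ownership.
            for other in e.sharers - {cpu}:
                print(f"invalidate line {addr:#x} in CPU {other}'s cache")
            e.sharers = {cpu}
            e.owner = cpu

    d = Directory()
    d.read(cpu=0, addr=0x1000)
    d.read(cpu=1, addr=0x1000)
    d.write(cpu=2, addr=0x1000)  # invalidates only the copies held by CPUs 0 and 1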
This cache-coherent NUMA (ccNUMA) technique went on to be used in AMD Opteron, Itanium, and Xeon SMP systems, and remains in use to this very day.