jokoon 2 minutes ago

ML is interesting, but honestly I have trouble seeing where it's headed, and whether I should learn the techniques to land a job or risk becoming obsolete.

There is certainly some hype; a lot of what is on the market is just not viable.

armcat 38 minutes ago

One of the most interesting mathematical aspects to me is that LLMs are logit emitters, and associated with that output is uncertainty. A lot of people talk about networks of agents, but what you are doing there is accumulating uncertainty: every model in the chain introduces its own uncertainty on top of what it inherits. In some situations I've seen a complete collapse after 3 LLM calls chained together. Hence a lot of people recommend "human in the loop" as much as possible to try to reduce that uncertainty (shift the posterior, if you will); or they recommend more of a workflow approach, where you have a single orchestrator that decides which function to call, and most of the emphasis (and context engineering) is placed on that orchestrator. But it all ties together in the maths of LLMs.
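
A back-of-the-envelope way to see it (toy per-call reliability numbers, not measured from any real system):

    # Toy model: each LLM call in a chain is "reliable" with probability p, and a
    # wrong intermediate result is never recovered downstream (the worst case).
    p_per_call = 0.90

    for n_calls in (1, 2, 3, 5):
        p_chain = p_per_call ** n_calls
        print(f"{n_calls} chained call(s): P(still on track) ~ {p_chain:.2f}")

    # 1 -> 0.90, 2 -> 0.81, 3 -> 0.73, 5 -> 0.59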

  • cpldcpu 3 minutes ago

    The output uncertainties / errors of individual tokens actually do not accumulate, since the models can correct for them in later steps. This is an often cited misconception about autoregressive models.

    The error tolerance of more complex LLM-based systems depends on the architecture. Implementing this is an engineering job, not a fundamental weakness of the approach: you can use an orchestrator to perform verification, use data for grounding, add redundancy, etc. Any system can have faults and errors.
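
    A rough sketch of what that orchestration can look like (call_llm and verify are stand-ins here, not any particular framework):

        def call_llm(prompt):
            # stand-in for a real model call
            return "42"

        def verify(draft):
            # stand-in for grounding: check the draft against data, a schema, tests, ...
            return draft == "42", "expected 42"

        def orchestrate(task, max_retries=3):
            # the orchestrator loops until a draft passes verification
            for _ in range(max_retries):
                draft = call_llm(task)
                ok, feedback = verify(draft)
                if ok:
                    return draft
                task += "\nReviewer feedback: " + feedback
            raise RuntimeError("no verified answer after retries")

        print(orchestrate("What is 6 * 7?"))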

zahlman 9 minutes ago

It appears that the "softmax" is found (as I hypothesized by looking at the results, before clicking the link) by exponentiating each value and normalizing to a sum of 1. It would be worthwhile to be explicit. The exponential function is also "high-school maths", and an explanation like that is much easier to follow than the Wikipedia article (since not a lot of rigour is required here).
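
To spell it out, the whole thing is a few lines of plain Python (no rigour needed):

    import math

    def softmax(values):
        exps = [math.exp(v) for v in values]  # exponentiate each value
        total = sum(exps)
        return [e / total for e in exps]      # normalize so they sum to 1

    print(softmax([2.0, 1.0, 0.1]))
    # [0.659..., 0.242..., 0.098...] -- the largest input gets most of the probability mass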

ryanchants 20 minutes ago

I'm currently working through the Mathematics for Machine Learning and Data Science Specialization from DeepLearning.AI [0]. It's been the best intro to Linear Algebra I've found, and it's worth the $50 a month just for the quizzes, labs, etc. I'm simultaneously working through the book Math and Architectures of Deep Learning [1], which is helping reinforce and flesh out the ideas from the course.

[0] https://www.coursera.org/specializations/mathematics-for-mac...
[1] https://www.manning.com/books/math-and-architectures-of-deep...

ozgung 2 hours ago

This is not about _Large_ Language models though; it explains the math for word vectors and token embeddings. I see this as the source of confusion for many people: they think LLMs just do this to statistically predict the next word. That was pre-2020s. They ignore the 1.8+ trillion-parameter Transformer network. Embeddings are just the input of that giant machine. We don't know exactly what is going on in those trillions of parameters.

  • cranx an hour ago

    But we do. A series of mathematical functions is applied to predict the next tokens. It's not magic, although it seems like it is. People are acting like it's the dark ages and Merlin made a rabbit disappear into a hat.

    • ekunazanu 23 minutes ago

      Depends on your definition of knowing. Sure, we know it is predicting next tokens, but do we understand why LLMs output the things they do? I am not well versed in LLMs, but I assume that even for smaller models interpretability is a big challenge.

      • lazide 17 minutes ago

        For any given set of model weights and inputs? Yes, we definitely do understand them.

        Do we understand the emergent properties of almost-intelligence they appear to present, and what that means about them and us, etc. etc.?

        No.

  • ants_everywhere 2 hours ago

    But surely you need this math to start understanding LLMs. It's just not the math you need to finish understanding them.

    • HSO 2 hours ago

      "necessary but not sufficient"

  • baxtr 2 hours ago

    Wait, so you're saying it's not a high-dimensional matrix multiplication?

    • dmd 2 hours ago

      Everything is “just” ones and zeros, but saying that doesn’t help with understanding.

InCom-0 2 hours ago

These are technical details of computations that are performed as part of LLMs.

Completely pointless to anyone who is not writing the lowest-level ML libraries (so basically everyone). This does not help anyone understand how LLMs actually work.

This is as if you started explaining how an ICE car works by diving into the chemical properties of petrol. Yeah, that really is the basis of it all, but no, it is not where you start when explaining how a car works.

  • jasode an hour ago

    >This is as if you started explaining how an ICE car works by diving into chemical properties of petrol.

    But wouldn't explaining the chemistry actually be acceptable if the title were "The chemistry you need to start understanding Internal Combustion Engines"?

    That's analogous to what the author did. The title was "The maths ..." -- and then the body of the article fulfills the title by explaining the math relevant to LLMs.

    It seems like you wished the author wrote a different article that doesn't match the title.

    • InCom-0 40 minutes ago

      'The maths you need to start understanding LLMs'.

      You don't need that math to start understanding LLMs. In fact, I'd argue it's harmful to start there unless your goal is to 'take me on an epic journey of all the things mankind needed to figure out to make LLMs work, from the absolute basics'.

  • bryanrasmussen an hour ago

    >Completely pointless to anyone who is not writing the lowest-level ML libraries (so basically everyone). This does not help anyone understand how LLMs actually work.

    Maybe this is the target group of people who would need particular "maths" to start understanding LLMs.

  • 49pctber an hour ago

    Anyone who would like to run an LLM needs to perform their computations on hardware. So picking hardware that is good at matrix multiplication is important for them, even if they didn't develop their LLM from scratch. Knowing the basic math also explains some of the rush to purchase GPUs and TPUs in recent years.

    All that is kind of missing the point though. I think people being curious and sharpening their mental models of technology is generally a good thing. If you didn't know an LLM was a bunch of linear algebra, you might have some distorted views of what it can or can't accomplish.
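
    A rough back-of-the-envelope for why the hardware matters (hypothetical hidden size, but in the right ballpark for large models):

        # one dense layer applied to one token: y = W @ x with a d x d weight matrix
        d = 4096                       # hypothetical hidden size
        flops = 2 * d * d              # one multiply and one add per weight
        print(f"{flops / 1e6:.0f} MFLOPs per token, per matrix")  # ~34 MFLOPs

        # multiply by several matrices per layer, dozens of layers, and every token
        # generated, and matmul throughput (GPUs/TPUs) quickly becomes the bottleneck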

    • InCom-0 an hour ago

      Being curious is good ... nothing wrong with that. What I took issue with above is (what I see as) an attempt to derail people into low-level math when that is not the crux of the question at all.

      Also: nobody who wants to run LLMs will write their own matrix multiplications. Nobody doing ML / AI comes close to that stuff ... it's all abstracted away and not something anyone actually thinks about (except the few people who actually write the underlying libraries, i.e. at Nvidia).

  • ivape an hour ago

    Also, people need to accept that they've been doing regular-ass programming for many years and can't just jump into whatever they want. The idea that developers were well-rounded general engineers is a myth mostly propagated from within the bubble.

    Most people’s educations right here probably didn’t even involve Linear Algebra (this is a bold claim, because the assumption is that everyone here is highly educated, no cap).

stared 4 hours ago

Well, in short: basic linear algebra, basic probability, analysis (functions like exp), and gradients.

At some point I tried to create a step-by-step introduction where people can interact with these concepts and see how to express them in PyTorch:

https://github.com/stared/thinking-in-tensors-writing-in-pyt...
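
As a minimal taste of those pieces together (a toy sketch in that spirit, not taken from the repo):

    import torch

    # one linear "neuron", a squared-error loss, and a single gradient step
    w = torch.tensor([0.5, -0.3], requires_grad=True)
    x = torch.tensor([1.0, 2.0])
    target = torch.tensor(1.0)

    loss = (w @ x - target) ** 2   # linear algebra plus a simple loss
    loss.backward()                # autograd computes d(loss)/dw

    with torch.no_grad():
        w -= 0.1 * w.grad          # gradient descent: nudge w downhill
    print(w)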

  • MichaelRazum 3 hours ago

    Although, is it really "understanding", or just being able to write down the formulas?

    • stared an hour ago

      Being able to use a formula is the first, and necessary, step for understanding.

      Then comes being able to work at different levels of abstraction and to find analogies. But at that point, in my view, "understanding" is a never-ending well.

      • MichaelRazum 37 minutes ago

        How about elliptic curve cryptography then? I just think that coming up with a formula is not really understanding. Actually, most often the "real" formula is the end step of understanding reached through derivation. ML does it upside down in this regard.

    • misternintendo 3 hours ago

      In some way it is true. Like understanding how a car works purely from the laws of physics.

d_sem an hour ago

I think the author did a sufficient job caveating his post without being verbose.

While reading through past posts I stumbled on a multi-part "Writing an LLM from scratch" series that was an enjoyable read. I hope they keep writing more fun content.

paradite 2 hours ago

I recently did a livestream on trying to understand attention mechanism (K, Q, V) in LLM.

I think it went pretty well (I was able to understand most of the logic and maths), and I touched on some of these terms.

https://youtube.com/live/vaJ5WRLZ0RE?feature=share
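
For anyone who just wants the core of it, single-head scaled dot-product attention is only a few lines (toy sizes, no masking):

    import torch
    import torch.nn.functional as F

    seq_len, d_k = 4, 8                  # toy sequence length and head dimension
    Q = torch.randn(seq_len, d_k)        # queries
    K = torch.randn(seq_len, d_k)        # keys
    V = torch.randn(seq_len, d_k)        # values

    scores = Q @ K.T / d_k ** 0.5        # how strongly each token attends to every other token
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    output = weights @ V                 # weighted mix of the values

    print(output.shape)                  # torch.Size([4, 8])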

  • gozzoo 2 hours ago

    The constant scrolling is very distracting. I couldn't follow along.

    • paradite 2 hours ago

      Thanks for the feedback!

kingkongjaffa 4 hours ago

The steps in this article are the same process you follow for RAG as well.

You compute an embedding vector for your documents or chunks of documents, then you compute the vector for your user's prompt, and then use the cosine distance to find the most semantically relevant documents. There are other tricks, like reranking the documents once you find the top N documents relating to the query, but that's basically it.

Here’s a good explanation

http://wordvec.colorado.edu/website_how_to.html
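
In code the retrieval step is roughly this (using the sentence-transformers library and the all-MiniLM-L6-v2 model just as one example; any embedding model works the same way):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = ["Cats are small carnivorous mammals.",
            "The 2008 financial crisis began in the US.",
            "Transformers apply attention over token embeddings."]

    doc_vecs = model.encode(docs)                   # one vector per document/chunk
    query_vec = model.encode("How do LLMs work?")   # vector for the user's prompt

    # cosine similarity between the query and every document
    sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))

    top_n = np.argsort(-sims)[:2]                   # most semantically relevant chunks
    print([docs[i] for i in top_n])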

11101010001100 3 hours ago

Apologies for the metacomment, but HN is a funny place. There is a certain type of learning that is deemed good ('math for AI') and a certain type of learning that is deemed bad ('leetcode for AI').

  • raincole 2 hours ago

    What's leetcode for AI, and which site is deemed bad by HN? Without a concrete example it's just a strawman. It could be that the site is deemed bad for other reasons. It could be a few vocal negative comments. It could be just not happening.

  • boppo1 2 hours ago

    What would leetcode for AI be?

  • sgt101 3 hours ago

    Could you give an example of "HN would not like this AI leetcode"?

  • enjeyw 3 hours ago

    I mean, I kind of get it. Overgeneralising (and projecting my own feelings), but I think HN favours introducing and discussing foundational concepts over things that are closer to memorising / rote learning. I think AI math vs. leetcode broadly fits into that category.

  • apwell23 2 hours ago

    Honestly, I would love 'leetcode for AI'. I am just so sick of all the videos and articles about it.

kekebo 2 hours ago

I've been having the best time with Andrej Karpathy's YouTube intros to LLM math, but I haven't compared their scope or quality to this submission.

oulipo2 2 hours ago

Additions and multiplications. People are making it sound like it's complicated, but NNs have the most basic and simple maths behind them.

The only thing is that nobody understands why they work so well. There are a few function approximation theorems that apply, but nobody really knows how to make them behave as we would like.

So basically AI research is 5% "maths", 20% data sourcing and engineering, 50% compute power, and 25% trial and error.
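
To be concrete about the "additions and multiplications", here is one whole neuron and one whole layer (toy numbers):

    import numpy as np

    x = np.array([1.0, 2.0, -1.0])     # inputs
    w = np.array([0.5, -0.25, 0.1])    # weights
    b = 0.3                            # bias

    neuron = max(0.0, w @ x + b)       # multiply, add, then a ReLU -- one neuron

    W = np.random.randn(4, 3)          # a layer is just many neurons stacked into a matrix
    layer = np.maximum(0.0, W @ x)     # the same multiplications and additions, batched

    print(neuron, layer.shape)         # 0.2 (4,)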

  • amelius an hour ago

    Gradient descent is like pounding on a black box until it gives you the answers you were looking for. There is little more we know about it. We're basically doing Alchemy 2.0.

    The hard technology that makes this all possible is in semiconductor fabrication. Outside of that, math has comparatively little to do with our recent successes.

apwell23 2 hours ago

> Actually coming up with ideas like GPT-based LLMs and doing serious AI research requires serious maths.

Does it? I don't think so. All the math involved is pretty straightforward.

  • ants_everywhere 2 hours ago

    It depends on how you define the math involved.

    Locally it's all just linear algebra with an occasional nonlinear function. That is all straightforward. And by straightforward I mean you'd cover it in an undergrad engineering class -- you don't need to be a math major or anything.

    Similarly, CPUs are composed of simple logic operations that are each easy to understand. I'm willing to believe that designing a CPU requires more math than understanding the operations, and likewise that designing an LLM could require more math. Although in practice I haven't seen any difficult math in LLM research papers yet. It's mostly trial and error and the above linear algebra.

    • apwell23 2 hours ago

      Yeah, I would love to see what complicated math all this came out of. I thought rigorous math was actually an impediment to AI progress. Did any math actually predict or prove that scaling data would create current AI?