Ask HN: How to learn AI from first principles?

118 points by HardikVala 4 days ago

A variant of this question seems to get asked every 6 months, but so far, I haven't seen it tackled directly: If I want to learn the concepts and fundamentals of AI from first principles, what educational resources should I use?

I'm not interested in hands-on guides (e.g. how to train a DNN classifier in TensorFlow) or LLM-centric resources.

So far, I've put together the following curriculum:

1. Artificial Intelligence: A Modern Approach (https://aima.cs.berkeley.edu/) - Great for learning the breadth of foundational concepts, e.g. local search algorithms, building up to modern AI.

2. Probabilistic Machine Learning: An Introduction (https://probml.github.io/pml-book/book1.html) - Going more in-depth into ML.

3. Dive into Deep Learning (https://d2l.ai/) - Going deep into DL, including contemporary ideas like Transformers and diffusion models.

4. Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/) could also be a great resource, but the content probably overlaps significantly with 3.

Would anybody add/update/remove anything? (Don't have to limit recommendations to textbooks. Also open to courses, papers, etc.)

Sorry for the semi-redundant post.

noduerme 3 days ago

The following is not a take that will get you a job or teach you precisely how LLMs work, because you can look that up yourself. However, it may inspire you and you may create something that has a better-than-lottery-ticket chance of being an improvement over the AI status quo:

Without reading about how it's done now, just think about how you think a neural network should function. It ostensibly has input, output, and something in the middle. Maybe its input is a 64x64-pixel handwritten character, and its output is a Unicode number. In between the input pixels (a 64x64 array) and the output are a bunch of neurons, in layers, that talk to each other and learn or un-learn (are rewarded or punished).

Build that. Build a cube where one side is a pixel grid and the other side delivers a number. Decide how the neurons influence each other and how they train their weights to deliver the result at the other end, however you think it should go. Just raw-code it with arrays in whatever dimensions you want and make it work; you can do it in JavaScript or BASIC. Link the neurons however you want. Don't worry about performance, because you can assume that whatever marginally works can be tested on a massive scale and show "impressive" results.
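If it helps to see the shape of it, here's a minimal sketch in Python (numpy arrays instead of raw loops; the layer sizes, learning rate, and XOR toy task are my own arbitrary choices, not the "right" design):

    import numpy as np

    # Toy task: learn XOR. A tiny stand-in for the pixel-grid idea.
    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
    y = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

    W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input -> hidden layer
    W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(5000):
        # forward: the layers "talk" to each other
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward: reward/punish by nudging weights downhill on the error
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)
        W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)

    print(out.round(3))  # should end up close to [[0], [1], [1], [0]]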

  • HardikVala 3 days ago

    Interesting idea. I like it.

InkCanon 4 days ago

The question depends on what you mean by first principles. Usage of the phrase "first principles" has sprawled into many different things since (I think) Musk first mentioned it as a way to learn. The original, philosophical meaning of a first principle was a fundamental truth which could be used to derive others. Much of the philosophising of thinkers like Aristotle or Descartes was aimed at uncovering these truths (e.g. "I think, therefore I am"). In physics and other sciences, it means calculations using established laws, rather than approximations or assumptions. Then it got borrowed into certain circles of the tech crowd with the vague meaning of thinking about what's important or true and ignoring the rest. Then it trickled down into the learning/self-help world as a hack of some sort.

If we take the original meaning of first principles, there aren't many absolute truths in machine learning. It is a very empirical, approximate and engineering-oriented endeavor. Most of the research involves thinking of a new approach, building it and trying it on new datasets.

The other big question is why you want to learn it. If you want to learn ML in itself, then anything, including the search algorithms you mentioned (which used to be considered core to ML a long time ago), is part of that. But if you want to learn ML to contribute to modern developments like LLMs, then search algorithms are virtually useless. And if you aren't going to be engineering any ML or ML products, what you want is to gain some insight into its future and the business of it, so learning things like the transformer architecture is going to be far less useful than, say, reading about the economics of compute clusters.

Given the empirical/engineering quality of current ML, I'd say building it from scratch is really good for getting at the handful of possible first principles (the fundamental functions involved, data cleaning, training, etc.).

  • kingkongjaffa 3 days ago

    > Usage of the phrase "first principles" has sprawled into many different things since (I think) Musk first mentioned it as a way to learn

    In pop culture post-2010, sure, but he was essentially parroting Feynman, IIRC.

    "How to learn AI from first principles?"

    Start with https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_t... and eventually you'll get to AI, exercise left to the reader ;)

  • HardikVala 3 days ago

    Ya, the phrase "first principles" is vague... I meant starting from an axiomatic and actionable definition of AI and learning from there. The first chapter of AIMA does a swell job of enumerating different definitions of AI and then explicitly declaring which one is used, along with the foundational premises for the concepts and methods to follow. And it doesn't define AI and then jump straight to neural networks; it gradually layers on more atomic concepts, like agents (which, I know, have been bastardized) and environments, until it gets to machine learning.

    > The other big question is why you want to learn it.

    Good question. I'm just looking for a wider context to understand contemporary AI. I don't know if this serves any practical purpose but I'm someone who likes to understand the "why" behind everything and starting from "first principles" helps uncover that.

    • B-Con 3 days ago

      By "first principles" do you mean something long "learn from the ground up" or " from basic building blocks"?

      I like learning things starting from small, atomic pieces, then building up and learning higher layers of abstraction and functionality later. I tend to find hands-on tutorials too "top down" in the sense that they start with all the tools in place and then give you a cursory look into what's actually happening.

      Personally I feel like most things in the world aren't really that complicated when you understand the building blocks. There are a few core ideas and then a bunch of layers on top to organize and utilize those ideas for different applications. So if I have an interest in something I want to learn from the ground up.

grepLeigh 2 days ago

As a learning exercise, I enjoyed Neural Networks From Scratch: https://nnfs.io/

There's also a world of statistics and machine learning outside of deep learning. I think the best way to get started on that end is an undergrad survey course like CS189: https://people.eecs.berkeley.edu/~jrs/189/

  • HardikVala 2 days ago

    Was not aware of these resources. Thanks for sharing!

CamperBob2 3 days ago

Watch Karpathy's 'Zero to Hero' videos on YouTube.

If you want a historical perspective, which is very worthwhile, start by reading about the mid-century work of McCulloch and Pitts, and then Minsky, Papert and their colleagues at MIT after that.

There will be a dry spell after Minsky and Papert because of their conclusion that the OG neural-network topology that everyone was familiar with, the so-called "perceptron", was a dead end. That conclusion was premature to say the least, but in any event the hardware and training techniques weren't available to support any serious progress.

Adding hidden layers and nonlinear activation functions to the perceptron network seemed promising, in that they worked around some of Minsky's technical objections. The multi-layer perceptron was now a "universal approximator" capable of modeling any linear or nonlinear function. In retrospect that should have been considered a bigger deal than it was, but the MLP was still a pain to train, and it didn't seem very useful at the scales achievable in hardware at the time. Anything a neural net could do, specialized code could usually do better and cheaper.
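To make the hidden-layer point concrete: XOR was the classic single-layer failure case, and one hidden layer plus a nonlinearity handles it. A toy sketch with hand-picked (not learned) weights, purely my own illustration:

    import numpy as np

    # Minsky & Papert's classic objection: a single-layer perceptron
    # can't compute XOR. One hidden layer plus a nonlinearity can.
    def step(z):                      # threshold nonlinearity
        return (z > 0).astype(float)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])       # hidden units compute OR and AND
    h = step(X @ W1 + b1)

    W2 = np.array([[1.0], [-1.0]])    # output: OR AND NOT(AND) == XOR
    b2 = np.array([-0.5])
    print(step(h @ W2 + b2).ravel())  # [0. 1. 1. 0.]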

Then, in 2012, AlexNet dusted off some of the older ideas and used them to win the ImageNet image-recognition benchmark competition, not by a small margin but by blowing everybody else into the weeds. That brought the multi-layer perceptron back into vogue, and almost everything that has happened since can be traced back to that work.

The Karpathy videos are the best intro to the MLP concept I've run across. Understanding the MLP is the key prereq if you want to understand current-gen AI from first principles.

jmholla 2 days ago

It isn't first principles, but I would recommend 3blue1brown's ongoing series about neural networks [0]. I think there's a benefit to seeing a high-level overview: it helps you understand the purpose of the pieces as you're learning them, which can help with motivation. And watching overviews like this after the fact can help bridge connections that theory alone may not elucidate.

[0]: https://www.3blue1brown.com/topics/neural-networks

  • HardikVala 2 days ago

    3blue1brown's content on NNs is awesome -- the explanations are super intuitive. But I'm also looking to see, as you say, the big picture and understand where NNs fit.

andyjohnson0 2 days ago

Back in 2018 I did Andrew Ng's course in Machine Learning on Coursera. It was pretty much "from first principles" in that you learned a bit of linear algebra and then implemented algorithms in Octave, working up to MNIST, etc. I felt like I came out of it with a good understanding of the basics, and that ML is maths, not magic.

Looks like the course has turned into a multi-course "specialization" and I have no idea if any of it is the same as the course I did. But it might be a place to start.

  • jonnycoder 2 days ago

    I've taken part 1 of 3 of Andrew Ng's machine learning specialization, which covers the math for supervised learning, linear regression, etc. As I started part 2 (neural networks), it built on the math from part 1, such as the sigmoid activation function. This is what I think of when the OP refers to learning ML from first principles. I highly recommend Andrew Ng's course, and I feel like I need to take it again to really understand those basic building blocks.
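    For a taste of those building blocks, here's a rough sketch of the kind of exercise part 1 builds toward (the toy data and hyperparameters are my own choices, not the course's):

        import numpy as np

        # Logistic regression by gradient descent, with the sigmoid
        # activation. Data and learning rate are arbitrary toy choices.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(100, 2))
        y = (X[:, 0] + X[:, 1] > 0).astype(float)    # linearly separable labels

        w = np.zeros(2); b = 0.0
        for _ in range(1000):
            p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid activation
            grad_w = X.T @ (p - y) / len(y)          # gradient of the log loss
            grad_b = (p - y).mean()
            w -= 0.1 * grad_w; b -= 0.1 * grad_b

        print(((p > 0.5) == y).mean())  # training accuracy, near 1.0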

riwsky 3 days ago

1. Brush up on linear algebra (matrix and vector arithmetic, etc)

2. Draw the rest of the fucking owl

  • uncomplexity_ 3 days ago

    can confirm, worked for me.

    i am now applying to current yc batch.

    • dartos 2 days ago

      Hell, all you need to do for YC is fork continue.dev and name your company after a fruit.

Maro 2 days ago

Hi, it depends on what you mean by "first principles".

If you don't have a solid background in math, then that's what you should improve upon (calculus, linear algebra, discrete math, probability theory, information theory). Some of the books you mention do cover this at the beginning, but most people take separate courses on these topics at university, with lots of homework, etc.

Also, the first book on your list is the classic textbook by Russell and Norvig, but I don't think it's actually very good. I remember reading it in my college AI course 25 years ago and it was painful back then (anybody remember "wumpus"?). It's a big book that covers too much; it's like printing out a lot of Wikipedia pages. You're better off finding books with a smaller scope that focus on something you actually care about or that's relevant to the way the field has developed.

  • HardikVala 2 days ago

    AIMA is wide-ranging, a lot of which is not "must-have", but "nice-to-have" knowledge. But I do like its breadth-over-depth approach to get a full scope of the AI landscape.

ipnon 3 days ago

https://a16z.com/ai-canon/

I prefer the a16z AI canon for this purpose. It's useful and historical. It's structured to begin with no prerequisites and work up to cutting-edge research papers. And best of all, it's free and open source.

talles 2 days ago

For deep learning:

1. Linear algebra. Be comfortable with vectors, matrices, and linear transformations on a vector space. This is the framework for understanding how data is represented and what is going on inside the model.

2. Calculus. Specifically derivatives, up to partial derivatives and the chain rule. This is needed later to understand backpropagation, i.e. the learning (see the sketch at the end of this comment). It's fine to skip integrals.

3. Vanilla neural network. Study how a simple feed-forward, fully connected neural network works, in detail. Every single bit about it.

I wouldn't worry about or plan anything further ahead until you have those three. After number 3 you'll have different branches to follow and will be better equipped to pick a path.
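To illustrate how 2 and 3 connect, here's a minimal sketch of the chain rule computing a gradient for a single "neuron" (the numbers are arbitrary; a numerical check confirms the analytic gradient):

    import math

    # One "neuron": out = sigmoid(w*x + b), squared-error loss.
    # The point: backpropagation is just the chain rule.
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    w, b, x, target = 0.7, -0.2, 1.5, 1.0

    def loss(w):
        out = sigmoid(w * x + b)
        return 0.5 * (out - target) ** 2

    # chain rule: dL/dw = dL/dout * dout/dz * dz/dw
    out = sigmoid(w * x + b)
    dL_dw = (out - target) * out * (1 - out) * x

    # numerical check by finite differences
    eps = 1e-6
    numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    print(dL_dw, numeric)  # the two should agree closely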

cdicelico 2 days ago

I loved AI: A Modern Approach; still the best all-around textbook on AI, imho. I'd only add Susanna Epp's Discrete Mathematics with Applications, and I'd just focus on methodically and thoroughly working through those two, personally. First Discrete Mathematics, then AI: A Modern Approach. This is exactly what I did and it was a great experience, super helpful.

  • HardikVala 2 days ago

    Will check out Discrete Mathematics, thanks!

Bjartr 2 days ago

I've been a fan of The Little Learner for the first principles side of things. It builds up the theory from almost nothing, step by step. It's got a conversational style that may turn some off, but I quite enjoyed it.

https://www.thelittlelearner.com/

3abiton 3 days ago

The question is a bit flawed without knowing your background, which determines what to suggest. That being said, I would argue that a lot of the novelty and recent advances in AI/ML are not yet documented in books, but rather in dense scientific papers.

wodenokoto 3 days ago

In the Coursera course accompanying the statlearning book, the instructor starts by introducing a nearest-neighbour algorithm and from that develops linear regression.

I think that is "first principles of AI". Like, what does it even mean when we ask an algorithm to "learn" from data?
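A nearest-neighbour "learner" makes that question concrete. Here's a minimal sketch (toy data of my own), where learning is just memorizing examples and answering with the closest one:

    import numpy as np

    # 1-nearest-neighbour: "learning" is memorizing the training data
    # and answering with the label of the closest remembered example.
    X_train = np.array([[1.0, 1.0], [2.0, 2.5], [8.0, 8.0], [9.0, 7.5]])
    y_train = np.array([0, 0, 1, 1])              # toy labels

    def predict(x):
        dists = np.linalg.norm(X_train - x, axis=1)
        return y_train[np.argmin(dists)]

    print(predict(np.array([1.5, 1.5])))  # -> 0
    print(predict(np.array([8.5, 8.0])))  # -> 1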

Mr-Frog 4 days ago

I really enjoyed the concepts in "Artificial Intelligence: A Modern Approach", which builds a first-principles foundation for automated reasoning. Warning: the first 80% of the book doesn't have any sexy new deep learning approaches, but I still think it is very valuable to see the history.

  • HardikVala 4 days ago

    +1, I'm a few chapters in and it's highly instructive. Gives me a deeper appreciation for the modern deep learning regime. Also, as we enter the agent supercycle, I think many of the basic algorithms for search, planning, etc. will make a comeback in a huge way.

animesh 2 days ago

Does anyone know how to purchase the PDF version of the book: "Artificial Intelligence, a Modern Approach"?

  • HardikVala 2 days ago

    No, but there is a PDF version, ahem, floating around on Reddit.

rcarr 2 days ago

Heard nothing but good things about Andrej Karpathy's series on YouTube.

  • coolThingsFirst 2 days ago

    Yeah, but once micrograd is built, he doesn't expand on that and jumps to bigrams, etc.

charlieyu1 3 days ago

I want to study game AI but I haven't found much material yet. I guess it's too niche now.

crimsoneer 2 days ago

Going to plug the thoroughly excellent fast.ai courses here

markus_zhang 2 days ago

IMO learning AI from first principles means a Math/CS Master's from a reputable CS university, preferably top 20.

swah a day ago

Ask them?

gamblor956 2 days ago

Step 1: Abandon the concept of "first principles." First principles don't exist for most areas of study outside of pure mathematics, and for newer fields like AI they haven't been established yet, so you'd just be hamstringing yourself.

Step 2: The steps above are a good plan for learning about traditional AI and the traditional approaches, which were based on an attempt to model human thought processes. Machine learning was what the industry turned to in the early 2000s because we didn't have the hardware capabilities then to meaningfully model neural networks. We do now, but machine learning has taken over, so there's very little research into modeling neural networks... about the same as there was when I was an undergrad.