An excellent book for fundamentals. Still haven't found a good textbook that covers the next level, that takes you from a student to competent practitioner. Advanced knowledge that I've picked up in this field has been from coworkers, painfully gained experience, and reading Kaggle writeups.
It gets specialized after that, so you need to be more specific about the area you're interested in; computer vision is a very broad field. For newer topics there are often no textbooks yet: books take time to write, and the methods and practices change too fast to have stood the test of time. Your best bet for learning the latest things is arXiv and GitHub.
Object detection / segmentation, human pose (2D/3D), 3D human motion tracking and modeling, multi-object tracking, re-identification and metric learning, action recognition, OCR, handwriting, face and biometrics, open-vocabulary recognition, 3D geometry and vision-language-action models, autonomous driving, epipolar geometry, triangulation, SLAM, PnP, bundle adjustment, structure-from-motion, 3D reconstruction (meshes, NeRFs, Gaussian splatting, point clouds), depth/normal/optical flow estimation, 3D scene flow, recovering material properties, inverse rendering, differentiable rendering, camera calibration, sensor fusion, IMUs, LiDAR, bird's-eye-view perception. Generative modeling, text-to-image diffusion, video generation and editing, question answering, un- and self-supervised representation learning (contrastive, masked modeling), semi/weak supervision, few-shot and meta-learning, domain adaptation, continual learning, active learning, synthetic data, test-time augmentation strategies, low-level image processing and computational photography, event cameras, denoising, deblurring, super-resolution, frame interpolation, dehazing, HDR, color calibration, medical imaging, remote sensing, industrial inspection, edge deployment, quantization, distillation, pruning, architecture search, AutoML, distributed training, inference systems, evaluation/benchmarking, metric design, explainability, etc.
You can't put all that into a single generic textbook.
Plus photogrammetric scale recovery, rolling-shutter & generic-camera (fisheye, catadioptric) geometry, vanishing-point and Manhattan-world estimation, non-rigid / template-based SfM, reflectance/illumination modelling (photometric stereo, BRDF/BTDF, inverse rendering beyond NeRF), polarisation, hyperspectral, fluorescence, X-ray/CT/microscopy, active structured light, ToF waveform decoding, coded-aperture lensless imaging, shape-from-defocus, transparency & glass segmentation, layout/affordance/physics prediction, crowd & group activity, hand/eye/gaze performance capture, sign language, document structure & vectorisation, charts, font/writer identification, 2D/3D primitive fitting, robust RANSAC variants, photometric corrections (rolling-shutter rectification, radial distortion, HDR glare, hot-pixel mapping), adversarial/corruption robustness, fairness auditing, on-device streaming perception and learned codecs, formal verification for safety-critical vision, plus reproducibility protocols and statistical methods for benchmarks.
It's astounding how much there is to this field.
Genuinely curious: is it even still relevant today? I've got the impression that there were a lot of these elaborate techniques and algorithms before around 2016, some of which I even learned, which were subsequently replaced by some single NN model trained somewhere at Facebook, which you maybe need to fine-tune to your specific task. So it's all got boring, and learning these techniques today is, at best, akin to learning the abacus or finding antiderivatives by hand.
That’s a great question. While NNs are revolutionary, they’re just one tool. In industrial Machine Vision, tasks like measurement, counting, code reading, and pattern matching often don’t need NNs.
In fact, illumination and hardware setup are often more important than complex algorithms. Classical techniques remain highly relevant, especially when speed and accuracy are critical.
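To make the counting case concrete, here's a minimal sketch in the classical style: threshold, label connected components, filter by area. The filename and the 50-pixel minimum area are made-up placeholders, and in a real setup the controlled lighting does most of the work so the thresholding stays trivial.

```python
# Classical counting: threshold, label connected blobs, filter by area.
# "parts.png" and the 50-pixel minimum area are illustrative placeholders.
import cv2

img = cv2.imread("parts.png", cv2.IMREAD_GRAYSCALE)
assert img is not None, "could not read image"

# Otsu picks a global threshold automatically; with controlled lighting
# the foreground/background split is clean enough for this to just work.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Label connected components; stats[i] holds bounding box and area.
n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
count = sum(1 for i in range(1, n_labels)            # label 0 is background
            if stats[i, cv2.CC_STAT_AREA] >= 50)
print(f"detected {count} parts")
```

No training data, no GPU, and every step is inspectable when something goes wrong on the line.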
And usually you need determinism within tight bounds. The only way to get that with an NN is to have a more classical algorithm verify the NN's solution, using boring things like least-squares fits and statistics on the residuals. Once that's in place, you can then skip the NN entirely, and you're done. That's my experience.
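A minimal sketch of that verification idea, assuming the NN reports points that are supposed to lie along a straight edge; the 0.5 px tolerance is a made-up placeholder, and the simple y-on-x fit breaks down for near-vertical edges:

```python
# Verify an NN's output with a least-squares fit and residual statistics:
# if points claimed to lie on a straight edge don't fit a line within a
# tight, fixed bound, reject the detection.
import numpy as np

def verify_edge(points, tol_rms=0.5):
    """points: (N, 2) array of (x, y) detections from the NN."""
    x, y = points[:, 0], points[:, 1]
    # Least-squares line fit y = a*x + b (assumes a non-vertical edge).
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - (a * x + b)
    rms = float(np.sqrt(np.mean(residuals**2)))
    # Deterministic accept/reject: the bound holds no matter what the
    # upstream model produced.
    return rms <= tol_rms, rms

ok, rms = verify_edge(np.array([[0, 0.1], [1, 1.0], [2, 2.1], [3, 2.9]]))
print(ok, round(rms, 3))   # True 0.061
```

With a gate like this on every output, the system's worst case is bounded by the verifier rather than by the model.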
If your problem is well suited to “computer vision” without neural nets, these methods are a godsend. Some of them can even be implemented with ultra-low latency on RTOS MCUs, which is great for real-time control of physical actuators.
Those NN models are monstrosities that eat cycles (and watts). If your task fits neatly into one of the algorithms presented (as may be the case in industrial design-automation settings), then yes, you are most definitely better off using them instead of a neural-net-based solution.
Yes. It’s not the way you detect cats in a photo. But detecting patterns in images is a very common problem.
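For a concrete example of that kind of pattern detection, here's a normalized cross-correlation template-matching sketch; the filenames and the 0.8 score threshold are illustrative placeholders.

```python
# Classical pattern detection: normalized cross-correlation template
# matching. "scene.png", "template.png", and the 0.8 score threshold
# are illustrative placeholders.
import cv2
import numpy as np

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
assert scene is not None and template is not None

# Score every placement of the template over the scene, in [-1, 1].
scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)

# Report every location whose score clears the threshold.
ys, xs = np.where(scores >= 0.8)
for x, y in zip(xs, ys):
    print(f"match at ({x}, {y}), score {scores[y, x]:.2f}")
```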
This is a great book - learned a lot from the first edition back in the day, and got the second edition as soon as it came out. It's always fun to just leaf through a random chapter.
Seen this post on HN so many times...
Would love to see or hear if there are any undergrad/grad-level courses that follow this book (or others) covering computer vision, from basic to advanced.
Thanks!
https://execonline.cs.cmu.edu/computer-vision?utm_source=np_...
Perhaps this
It's right there on the linked website under "Slide sets and lectures".
Thanks
I must be blind
This is the right area for you to be in, at least.
Touché :-)
Any updates using AI? One-shot camera calibration?
This is great, but why is it posted here like it's new? This is from 2022.
There's even an HN post from almost exactly five years ago, with 0 comments:
Computer Vision: Algorithms and Applications, 2nd ed (szeliski.org)
https://news.ycombinator.com/item?id=24945823
But anyway, why not? Yes, add (2020) to the title, by all means.
It is a good thing that links to useful resources like these are reposted every now and then. For many, like myself, this could be the first time seeing it. Perhaps a date tag would add some clarity for those who have already seen it.