krapht 2 days ago

An excellent book for fundamentals. Still haven't found a good textbook that covers the next level, one that takes you from student to competent practitioner. The advanced knowledge I've picked up in this field has come from coworkers, painfully gained experience, and reading Kaggle writeups.

  • bonoboTP 2 days ago

    It gets specialized after that. You need to be more specific about the area you are interested in; computer vision is a very broad field. For newer topics there are often no textbooks yet: writing a book takes time, and the methods and practices change so fast that few have had a chance to stand the test of time. Your best bet for learning the latest things is arXiv and GitHub.

    Object detection / segmentation, human pose (2D/3D), 3D human motion tracking and modeling, multi-object tracking, re-identification and metric learning, action recognition, OCR, handwriting, face and biometrics, open-vocabulary recognition, 3D geometry and vision-language-action models, autonomous driving, epipolar geometry, triangulation, SLAM, PnP, bundle adjustment, structure-from-motion, 3D reconstruction (meshes, NeRFs, Gaussian splatting, point clouds), depth/normal/optical flow estimation, 3D scene flow, recovering material properties, inverse rendering, differentiable rendering, camera calibration, sensor fusion, IMUs, LiDAR, bird's-eye-view perception. Generative modeling, text-to-image diffusion, video generation and editing, question answering, un- and self-supervised representation learning (contrastive, masked modeling), semi/weak supervision, few-shot and meta-learning, domain adaptation, continual learning, active learning, synthetic data, test-time augmentation strategies, low-level image processing and computational photography, event cameras, denoising, deblurring, super-resolution, frame interpolation, dehazing, HDR, color calibration, medical imaging, remote sensing, industrial inspection, edge deployment, quantization, distillation, pruning, architecture search, AutoML, distributed training, inference systems, evaluation/benchmarking, metric design, explainability, etc.

    You can't put all that into a single generic textbook.

    • greenavocado 2 days ago

      Plus photogrammetric scale recovery, rolling-shutter & generic-camera (fisheye, catadioptric) geometry, vanishing-point and Manhattan-world estimation, non-rigid / template-based SfM, reflectance/illumination modelling (photometric stereo, BRDF/BTDF, inverse rendering beyond NeRF), polarisation, hyperspectral, fluorescence, X-ray/CT/microscopy, active structured-light, ToF waveform decoding, coded-aperture lensless imaging, shape-from-defocus, transparency & glass segmentation, layout/affordance/physics prediction, crowd & group activity, hand/eye/gaze performance capture, sign-language, document structure & vectorisation charts, font/writer identification, 2-D/3-D primitive fitting, robust RANSAC variants, photometric corrections (rolling-shutter rectification, radial distortion, HDR glare, hot-pixel mapping), adversarial/corruption robustness, fairness auditing, on-device streaming perception and learned codecs, formal verification for safety-critical vision, plus reproducibility protocols and statistical methods for benchmarks.

      • thenobsta 2 days ago

        It's astounding how much there is to this field.

krick 2 days ago

Genuinely curious: is it even still relevant today? I've got the impression that up to around 2016 there were a lot of these elaborate techniques and algorithms, some of which I even learned, which were subsequently replaced by a single NN model trained somewhere at Facebook, which you maybe need to fine-tune to your specific task. So it's all got boring, and learning them today is akin to learning the abacus, or at best to finding antiderivatives by hand.

  • vincenthwt a day ago

    That’s a great question. While NNs are revolutionary, they’re just one tool. In industrial Machine Vision, tasks like measurement, counting, code reading, and pattern matching often don’t need NNs.

    In fact, illumination and hardware setup are often more important than complex algorithms. Classical techniques remain highly relevant, especially when speed and accuracy are critical.
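A minimal sketch of the kind of classical pipeline vincenthwt describes, here for a counting task: threshold the image, then count connected components, with no learning involved. The tiny "image", the threshold value of 128, and the blob layout below are all invented for illustration; a real system would use a library such as OpenCV rather than this pure-Python version.

```python
# Toy classical counting pipeline: thresholding + 4-connected
# component labeling, implemented with no dependencies.

def threshold(img, t):
    """Binarize: 1 where the pixel value is >= t, else 0."""
    return [[1 if v >= t else 0 for v in row] for row in img]

def count_blobs(binary):
    """Count 4-connected components of 1-pixels via flood fill."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                count += 1                      # new blob found
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if (0 <= cy < h and 0 <= cx < w
                            and binary[cy][cx] and not seen[cy][cx]):
                        seen[cy][cx] = True
                        stack += [(cy + 1, cx), (cy - 1, cx),
                                  (cy, cx + 1), (cy, cx - 1)]
    return count

# Synthetic 6x8 "image": two bright blobs on a dark background.
img = [
    [10, 10, 200, 210, 10, 10,  10,  10],
    [10, 10, 205, 220, 10, 10,  10,  10],
    [10, 10, 10,  10,  10, 10,  10,  10],
    [10, 10, 10,  10,  10, 190, 200, 10],
    [10, 10, 10,  10,  10, 195, 210, 10],
    [10, 10, 10,  10,  10, 10,  10,  10],
]
print(count_blobs(threshold(img, 128)))  # -> 2
```

As the comment notes, in practice getting the illumination right so that a fixed threshold like this works at all is usually the harder part.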

    • nomel a day ago

      And usually you need determinism, within tight bounds. The only way to get that with an NN is to have a more classical algorithm verify the NN's solution, using boring things like least-squares fits and statistics on the residuals. Once you have that in place, you can then skip the NN entirely, and you're done. That's my experience.
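The verification loop nomel describes can be sketched in a few lines: fit a line by least squares to the points a detector claims lie on a straight edge, then accept or reject based on residual statistics. The sample points and the 0.05 tolerance below are made-up numbers for illustration, not a real spec.

```python
# Deterministic sanity check for detected edge points:
# closed-form least-squares line fit + residual statistics.
import math

def fit_line(pts):
    """Least-squares fit of y = a*x + b over (x, y) points."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def residual_stats(pts, a, b):
    """RMS and worst-case absolute residual of the fit."""
    res = [y - (a * x + b) for x, y in pts]
    rms = math.sqrt(sum(r * r for r in res) / len(res))
    return rms, max(abs(r) for r in res)

# Points a detector claims lie on a straight edge (near y = 2x + 1).
pts = [(0, 1.02), (1, 2.98), (2, 5.01), (3, 6.99), (4, 9.00)]
a, b = fit_line(pts)
rms, worst = residual_stats(pts, a, b)
accepted = worst < 0.05  # hard, deterministic pass/fail bound
```

The point is that the accept/reject decision is a fixed bound on the residuals, so the overall system behaves deterministically regardless of what produced the points.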

  • nerdsniper a day ago

    If your problem is well-suited to “computer vision” without neural nets, these methods are a godsend. Some of them can even be implemented with ultra-low latency on RTOS MCUs, which is great for real-time control of physical actuators.

  • EarlKing 2 days ago

    Those NN models are monstrosities that eat cycles (and watts). If your task fits neatly into one of the algorithms presented (such as may be the case in industrial design automation settings), then yes, you are most definitely better off using it instead of a neural-net-based solution.

  • monkeyelite a day ago

    Yes. It’s not the way you detect cats in a photo. But detecting patterns in images is a very common problem.
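A toy sketch of one common classical approach to detecting a known pattern in an image: template matching by zero-mean normalized cross-correlation (NCC). The 2x2 template and tiny image below are invented for illustration; real pipelines would typically call something like OpenCV's cv2.matchTemplate, but the underlying math is the same.

```python
# Toy template matching by zero-mean normalized cross-correlation.
import math

def ncc(patch, tmpl):
    """Zero-mean NCC of two equal-size patches; 1.0 = perfect match."""
    p = [v for row in patch for v in row]
    t = [v for row in tmpl for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) *
                    sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0

def best_match(img, tmpl):
    """Slide the template over the image; return (score, (row, col))."""
    th, tw = len(tmpl), len(tmpl[0])
    best = (-2.0, (0, 0))
    for y in range(len(img) - th + 1):
        for x in range(len(img[0]) - tw + 1):
            patch = [row[x:x + tw] for row in img[y:y + th]]
            best = max(best, (ncc(patch, tmpl), (y, x)))
    return best

tmpl = [[9, 1],
        [1, 9]]
img = [[0, 0, 0, 0, 0],
       [0, 0, 9, 1, 0],
       [0, 0, 1, 9, 0],
       [0, 0, 0, 0, 0]]
score, loc = best_match(img, tmpl)  # exact copy sits at row 1, col 2
```

The zero-mean normalization is what makes this robust to uniform brightness shifts, which is why NCC variants remain a workhorse for industrial pattern matching.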

dimatura 2 days ago

This is a great book - learned a lot from the first edition back in the day, and got the second edition as soon as it came out. It's always fun to just leaf through a random chapter.

aanet 2 days ago

Seen this post on HN so many times...

Would love to see / hear if there are any undergrad- or grad-level courses that follow this book (or others) and cover computer vision, from basic to advanced.

Thanks!

brcmthrowaway 2 days ago

Any updates using AI? One shot camera calibration?

lacoolj 2 days ago

This is great, but why is it posted here like it's new? This is from 2022.

  • pthreads 2 days ago

    It is a good thing that links to useful resources like these are reposted every now and then. For many, like myself, this could be the first time seeing it. Perhaps a date tag would add some clarity for those who have already seen it.