I'm amused by how quickly he shifts from "all existing image viewers have silly bugs, no-one knows what they're doing, I want to understand everything" to "here's some code I copy-pasted that seems to work". What a deeply quixotic approach to, uh, everything in this blog post.
after having read the entire thing, i don't find the same amusement. In fact, you concatenated disparate quotes and put them out of order.
1: Image viewer [libraries] have silly bugs
a. no one knows what they're doing?
2. Copy and paste code to get images on the screen
a. not very accurately
b. very slowly
3. I want to understand everything
a. here's sRGB, here's Y'UV
b. here's tone mapping
c. etc
the prose and overarching methodology reminds me of frinklang's data file[0] (https://frinklang.org/frinkdata/units.txt) which is... hilariously and educationally opinionated about ... units and measurements and conversions.
the writing style comports with how i approach problems and may not jibe with other methodologies, maybe? If i have some problem i generally will try copying and pasting (or copilot.exe or whatever) just to see if the problem space has any state of the art. If that does what i want, great. I'm done. if i need to do what i want 10,000 times, i usually will dig in and speed it up, or add features (event handling, etc)
I also like scathing commentary about annoyances in technology, and this article had that, for sure. the webp aside was biting!
[0] huh, this file has been updated massively since i last viewed it (mid 2010s) - it looks like earth has made some advancements in definitions of primitives - like Ampere - and things might be more exact, now. Hooray!
My favourite part of TIFF is what they do about y-ordering.
See, sometimes people think images start at the top, so the first data at y=0 is at the top of the image. Other people think they start at the bottom, like a graph you draw in maths, where y=0 is clearly at the bottom of the image.
So TIFF says: That should be a parameter of the image.
Why? Well, TIFF came into existence because of scanners. When scanners were first invented, each scanner would have its own data format - you're not going to store all this data because that's expensive - who owns that much tape? But when it's scanned, clearly the bits you get have some sort of arrangement: maybe a bright white area is 1 and black is 0, maybe the opposite. That's kind of annoying, let's agree on a standard.
OK, so as Scanner Maker A my proposed standard is: Exactly what my popular A9 Scanner does
No! As scanner maker B, clearly the standard should be what our BZ-20 model does
No! Everybody at scanner maker C knows the obvious thing to do is derive the standard from the behaviour of our popular C5 and C10 scanners!
Result: The TIFF standard says all of the above are OK, just add header data explaining what's going on. Since some scanners would scan a page from the top, those say y=0 is at the top, those vendors whose scanner works the other way say y=0 is at the bottom!
Right, it's fine if your image format has an opinion about this, what's infuriating is that TIFF lets each individual image have its own opinion, so a decent TIFF decoder needs to be able to do either for each image.
Flipping the image sometimes doesn't sound that bad. IIRC it's common for JPEGs to have rotation metadata, because sometimes you want to rotate or flip a camera image and you don't want to recompress it when you do.
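To make the "able to do either for each image" part concrete, here is a minimal sketch. It assumes the scanlines are already decoded into a list and that the header has been boiled down to a hypothetical `bottom_up` flag. Real TIFF expresses this through an Orientation tag with eight possible values, which this does not model.

```python
def normalize_rows(rows, bottom_up):
    """Return rows ordered top-to-bottom regardless of how the file stored them.

    `rows` is a list of scanlines; `bottom_up` is a hypothetical flag derived
    from the file's header saying the first stored row is the bottom one.
    """
    return rows[::-1] if bottom_up else list(rows)


stored = ["row at y=0", "row at y=1", "row at y=2"]
top_down = normalize_rows(stored, bottom_up=True)
```

The flip itself is cheap; the annoying part is that the decision has to be made per file instead of once per format.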
Just to add some closure to this, the libTIFF library[0] comes fairly close to a universal reader. It’s been around since the 1980s, so it has had time to sand off the rough edges. It’s still being maintained and extended.
It's fairly "low-level," so could probably benefit from Façades.
my limited understanding is it's basically just raw RGB values as binary, with just enough metadata to put it on the screen. BMP from my understanding is, like most things microsoft from that era, just a memory dump of the structure/array holding the image in memory.
I wonder if grandparent meant something else, but i don't know enough this instant to guess at what other format?
Haha, if only. Read the wiki article on BMP sometime. There's RLE, huffman coding, all kinds of bitdepths and channel masks. Configurable halftoning algorithms. Embedded JPEGs and PNGs? 64-bit BMPs in some kind of fixed point format? And that's on top of the usual Microsoft DWORD BITMAPV2INFOHEADER; struct-dump gunk.
The real kicker is it's not even useful. Just say "unsupported" to everything that isn't a dumb RGB dump and no one will ever notice!
Yeah, when I tried to implement BMP I couldn't even find a good specification. In the end it was easier to write a PNG loader/writer than deal with BMP. I've also got much better compatibility with other software.
The writer was so trivial that even with the need to encapsulate the uncompressed data in a Deflate compatible bitstream it was just easier than trying to do a BMP output.
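For anyone curious what such a writer might look like, here is a rough sketch, not the poster's actual code. It assumes 8-bit RGB input with the top row first and wraps the scanlines in stored (uncompressed) Deflate blocks inside a zlib stream. The function names are made up for illustration.

```python
import struct
import zlib


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def _stored_zlib(raw):
    """Wrap bytes in a zlib stream that uses only stored (uncompressed) blocks."""
    out = bytearray(b"\x78\x01")  # zlib header: deflate, 32K window, no dictionary
    blocks = [raw[i:i + 65535] for i in range(0, len(raw), 65535)] or [b""]
    for index, block in enumerate(blocks):
        final = 1 if index == len(blocks) - 1 else 0
        # BFINAL bit with BTYPE=00, then LEN and its one's complement (little-endian)
        out += struct.pack("<BHH", final, len(block), len(block) ^ 0xFFFF)
        out += block
    out += struct.pack(">I", zlib.adler32(raw) & 0xFFFFFFFF)
    return bytes(out)


def write_png_rgb(width, height, pixels):
    """Encode 8-bit RGB pixels (bytes, row-major, top row first) as a PNG."""
    stride = width * 3
    assert len(pixels) == stride * height
    # Every scanline is prefixed with filter type 0 (None).
    raw = b"".join(b"\x00" + pixels[y * stride:(y + 1) * stride]
                   for y in range(height))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", _stored_zlib(raw)) + _chunk(b"IEND", b""))
```

The output is larger than the raw pixels, but any PNG reader should accept it, since stored blocks are a regular part of Deflate.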
huh, i've been living under a misapprehension for a very long time. Now that you mention it i do remember run length and Huffman. I wonder if i am confusing it with a different file format that is basically just the RGB values with the WxH metadata?
You can create BMP files like that - I wrote some image generators that do that and they render fine in all image viewers. But the spec also supports much more, so if you're writing a BMP viewer then things get hairy.
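A sketch of the "dumb RGB dump" kind of BMP, under the assumption that 24-bit uncompressed output with a 40-byte BITMAPINFOHEADER is all you want. `write_bmp_rgb` is a made-up name and the 2835 pixels-per-metre value (about 72 DPI) is an arbitrary choice.

```python
import struct


def write_bmp_rgb(width, height, pixels):
    """Encode 8-bit RGB pixels (bytes, row-major, top row first) as a 24-bit BMP."""
    stride = width * 3
    assert len(pixels) == stride * height
    padding = b"\x00" * (-stride % 4)  # rows are padded to a multiple of 4 bytes
    body = bytearray()
    for y in range(height - 1, -1, -1):  # positive height: bottom row is stored first
        row = pixels[y * stride:(y + 1) * stride]
        for x in range(0, stride, 3):
            r, g, b = row[x:x + 3]
            body += bytes((b, g, r))  # BMP stores blue, green, red
        body += padding
    header_size = 14 + 40
    file_header = struct.pack("<2sIHHI", b"BM", header_size + len(body),
                              0, 0, header_size)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                              len(body), 2835, 2835, 0, 0)
    return file_header + info_header + bytes(body)
```

Writing this subset is easy. Reading everything else the format allows is where it gets hairy.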
Is that even possible? I once tried to make a matched pyramid (JPEG-in-)TIFF reader/writer pair and to retain some compatibility with other people who do this sort of thing (GIS, medical images, etc.). Virtually nobody agrees on how you’re supposed to do it[1]: some prescribe storing smaller versions as siblings (a NextIFD chain started by the main image), some as children of the largest one (a NextIFD chain pointed to by the SubIFD link of the main image), some as children of each other (a SubIFD chain started by the main image), some as both simultaneously (a chain of identical SubIFD and NextIFD fields started by the main image). And I mean, I could decide on something for my writer. But now I’m a reader and I get a TIFF file with some IFDs somehow linked by the NextIFD and/or SubIFD fields. WTF am I supposed to do with it? Is it a pyramid? Is it a multipage document? Is it a birdplane^W a sequence of pyramidal pages? I suppose I can walk the whole thing and construct a DAG, but again, how the hell can I tell what the DAG means?
(And don’t take this as a knock against TIFF in general—as far as I know, it’s one of the few image formats that takes the possibility of large and larger-than-memory images seriously. I think HEIF also does? But ISO paywalled it after first making it publicly available, so, hard pass.)
We did a similar format, to allow editable "JPEGs." The IFD of a JFIF container was normal, except that we added a second entry, with the raw source of the image.
Traditional readers saw a JPEG (although an unusually obese one), but our software could access the second entry, which contained the raw source, and the control parameters for all the processing steps that resulted in the JPEG, so we could treat the image as nondestructive, and reversible.
It was never actually released, if I remember, but it may well be patented.
TIFF was originally designed to store drum scanner data, in realtime, so it uses strips, as opposed to tiles.
This was in the ‘90s. About the time the ink was still wet on that spec.
It was in C++, and I couldn’t do 100%, but I probably got about 80% (but not so performant). The weirdest thing, if I remember correctly, was pixel data with different sizes between stored components.
Ah yes, that was always entertaining. All the different ways additional metadata could be encoded was so much fun if you were dealing with geographical data.
PNG and JPEG are simple enough that a single person can write a useful decoder for them from scratch in a weekend or two.
The newer formats achieve better compression by adding more and more stuff. They're year-long projects to reimplement, which makes almost everyone stick to their reference implementations.
Nah the space savings can significantly cut down on bandwidth costs at scale. They'll get (and have been?) pushed by Google and friends for that reason.
ICC isn't too complex itself, but the bolted-on design of color profiles makes them annoying to handle, and easy to ignore.
You can't just handle pixels, you need to handle pixels in a context of input and output profiles. That's a pain like code-page based text encodings before Unicode, and we haven't established a Unicode equivalent for pixels yet.
The problem colour profiles solve is about how the monitor should display those colours. It’s so that what you see on the screen is going to be exactly the same shade of CMYK as what gets printed.
It’s a big problem for magazine (and equivalent) publishing. Movies too. But much less of an issue for other media industries which are targeting end user devices like smart phones and laptops.
The equivalent in typefaces would be the font rasterisation itself (like Microsoft Clear Type) rather than code pages.
"Unicode for pixels" would be something like Rec.2020 color (with some specific high depth and HDR solution defined) used in all APIs that take pixels. Currently sRGB is the closest to a universal default, but that's ASCII of pixels.
You need a monitor profile because the display protocol takes dumb numeric values that are interpreted in monitor-specific way, instead of being sent in some universal color space, and converted to monitor's internal format by the monitor itself.
In this analogy monitors are like pre-Unicode printers, where characters were just bytes, and the bytes mapped to whatever 8-bit language-specific font the printer had.
You’re assuming that monitors and printers can be trusted to accurately reproduce the colour space even if there was a profile attached (which, by the way, most monitors do actually have).
This isn’t true. Particularly with monitors where people can adjust the contrast and brightness.
The reason colour profiles exist is so that computers can be calibrated to support the monitor output.
You are also ignoring the fact that environmental factors can have an effect too. Ie how the room is lit.
Comparing something standardised like writing glyphs with something highly individual (monitor calibration) doesn’t make a whole lot of sense.
Sixel is a fun side quest as well, if you want to encode it on your own. You get to work on a color quantization and/or dithering algorithm in $CURRENT_YEAR.
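For a taste of the dithering half of that side quest, here is a minimal Floyd-Steinberg sketch. It is simplified to grayscale in and 1-bit out, so it is an illustration of error diffusion and not something a real Sixel encoder (which wants a colour palette) could use unchanged.

```python
def floyd_steinberg_1bit(gray, width, height):
    """Dither a grayscale image (row-major list of values 0..255) to 0 or 255."""
    work = [float(v) for v in gray]
    out = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            i = y * width + x
            old = work[i]
            new = 255 if old >= 128 else 0
            out[i] = new
            err = old - new
            # Push the quantization error onto neighbours not yet visited.
            if x + 1 < width:
                work[i + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[i + width - 1] += err * 3 / 16
                work[i + width] += err * 5 / 16
                if x + 1 < width:
                    work[i + width + 1] += err * 1 / 16
    return out
```

Swapping the threshold for a nearest-palette lookup and tracking the error per channel gives the colour version.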
And if you're doing text with images, integrating both Sixel and Kitty is especially painful because they have completely different display models.
(In particular, Sixel is cell based, so you get Z-ordering for free - with the caveat that writing text on top of images destroys the cells.
Meanwhile, Kitty has Z-ordering, but it's per-image, so e.g. to draw a menu on top of an image that partially covers text, you must send a new image with space for the menu erased...)
> It may be worth noting that Chafa uses more of the block symbols, but the printouts it makes look ugly to me, like a JPEG image compressed with very low quality.
It may be worth noting that Chafa can be configured to only use ASCII 219
Yeah, that's one of many things I love about chafa. The charsets are very flexible allowing me to get something decentish working even in the limited default fonts and no fallbacks in putty on windows. Or blacklist stuff that looked bad in my preferred linux font.
Another nice thing about it is the author made an ffmpeg patch (playable using -c:v rawvideo -pix_fmt rgba -f chafa) that was pretty darn handy for quickly triaging videos on a remote server without having to relay them to a more usable terminal.
And it has sixel support too if you happen to be in a terminal that supports that.
And since he's delegating to imagemagick, it has loaded every image format I've thrown at it, including RAW.
I think you misunderstood. The video is using ffmpeg. I was talking about RAW from a camera which my imagemagick delegates to ufraw. But running ldd on my copy of chafa I see it now links to a ton of graphic libs directly.
ASCII renditions of photos have been around basically forever. Certainly printing these out on a line printer (80x24 screen was too small) was a thing in the mid 70's, and I'd bet go back to the first scanners.
Maybe the first quantized picture credit should go to Roman mosaics which quality wise are about the same as a lo-res JPEG.
The author questions whether anyone is using the modern web formats. As I see it, nobody should be using these formats directly; they should just be served by content delivery networks when JPG and PNG images are requested. The idea being that graphic artists and programmers work in JPG and PNG, and the browser requests these image formats but actually gets webp, AVIF or whatever is the latest thing.
Now, if you do right click, save image, it should then make another request with a different header that says webp, AVIF or whatever is not accepted, for the original JPG in high resolution with minimal compression to be downloaded.
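The server side of that idea could look roughly like the sketch below. It is an assumption-laden illustration: `pick_image_format` is a made-up helper, it ignores q-values other than q=0, and it deliberately treats wildcards such as `*/*` as "send the original".

```python
def pick_image_format(accept_header,
                      available=("image/avif", "image/webp", "image/jpeg")):
    """Return the first available MIME type the Accept header explicitly lists.

    `available` is ordered by server preference and its last entry is the
    original format, used as the fallback. Entries with q=0 count as rejected.
    """
    accepted = set()
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        mime = parts[0].lower()
        rejected = any(p.replace(" ", "") in ("q=0", "q=0.0") for p in parts[1:])
        if mime and not rejected:
            accepted.add(mime)
    for mime in available:
        if mime in accepted:
            return mime
    return available[-1]
```

A "save image" request that leaves webp and AVIF out of its Accept header would then fall through to the original JPG.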
> [about the (R)IFF format] Having a generic container for everything doesn’t make sense. It’s a cool concept on paper, but when you have to write file loaders, it turns out that you have very specific needs, and none of those needs play well with “anything can happen lol”.
Well, it's not like PNG and SVG are any different.
RIFF (used for .wav and .avi files) was just a pure container format. The actual payload content was compressed/represented by an open-ended set of CODECs, as indicated by the "FOURCC" (four character code) present in the file.
I know, but what is the practical difference for the file loader? In both cases you have formats with open-ended extensions (PNG chunks and XML elements), and in both cases you have to make sure that the overall format matches what you expect.
For SVG there are standards that define how to interact with unknown elements, and a set of standard elements which are actually part of the spec. With PNG, there's a minimum set of chunks that are required, and you can safely stick with those and get something that works for the vast majority of cases.
This is totally different from a container which can contain any type of data in any type of format. If you get a valid PNG, you can load something meaningful from it. If you get a generic container, you might be able to inspect it at a surface level but there's no guarantee that the contents are even conceptually meaningful to you.
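As a sketch of what "stick with the required chunks" can mean in practice: PNG marks each chunk as critical or ancillary in the case of the first letter of its type, so a loader can skip anything ancillary it doesn't recognise. The function below is illustrative and does not verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data):
    """Yield (type, payload, is_critical) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        kind = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        # Bit 5 of the first type byte clear (uppercase letter) marks a
        # critical chunk; set (lowercase letter) marks an ancillary one.
        is_critical = not (kind[0] & 0x20)
        yield kind, payload, is_critical
        pos += 12 + length  # length field + type + payload + CRC
        if kind == b"IEND":
            break
```

A loader built on this can refuse files with unknown critical chunks and silently ignore unknown ancillary ones, which is the set expectation a generic container doesn't give you.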
Effectively the same is true for IFF-compliant formats like ILBM and AIFF, which are usually distinguished either by file extension or by file-system metadata.
It’s a bit like criticizing a format for being XML-based, under the argument that XML can represent all kinds of payloads.
That's exactly what I'm working towards. It's the difference between just using an XML file and using an XML file with a unique extension tacked onto it. It represents a subset with set expectations for behavior. That expectation is what matters.
For generic containers, there are no expectations on what's inside. If I'm making a video player, do I support MKV? Well, some of them. Not all of them. You have to try it and find out. Heck, even within a single codec it's hard to find complete support. Last I checked, browsers won't touch an H264 stream unless it's also YUV420p which also places certain restrictions on physical dimensions of the data and probably some other stuff that's escaping my mind.
If I'm making an image viewer, do I support SVG or PNG? For this example, let's say I only support PNG, but I can tell from the outset what's inside any file I might try to load. I can confidently say I can meaningfully load and display anything that uses the established standards of the PNG format to store image data. Sure, there are extensible bits, but they're not part of those core expectations surrounding the core part of the format, the image data I care about. And I didn't have to try and open the PNG first to see if it was actually an SVG. I can recognize not-bitmaps at a glance without having to inspect them any deeper than their extension.
With RIFF you not only need to be able to handle the container format, but also the specific type of payload content, which for video varied from simple uncompressed YUV formats like Y41P to proprietary compressed ones like WMV1 (Windows Media Video). Being able to handle the RIFF format therefore had no bearing on whether you'd be able to extract data from it.
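To illustrate how little the container layer tells you, here is a minimal RIFF walker (a sketch: it handles only a top-level RIFF form and does not descend into nested LIST chunks). All it returns is FOURCCs and opaque payloads.

```python
import struct


def parse_riff(data):
    """Return (form_type, [(fourcc, payload), ...]) for a RIFF byte string."""
    magic, size, form_type = struct.unpack("<4sI4s", data[:12])
    if magic != b"RIFF":
        raise ValueError("not a RIFF file")
    end = min(8 + size, len(data))
    chunks = []
    pos = 12
    while pos + 8 <= end:
        fourcc, length = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((fourcc, data[pos + 8:pos + 8 + length]))
        pos += 8 + length + (length & 1)  # chunks are padded to an even size
    return form_type, chunks
```

Getting this far is the easy part. Whether you can do anything with the payloads depends entirely on having a decoder for whatever the FOURCC names.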
> Finally, remember that the only legal text you are bound by (if at all) is the actual text of the GPL license. It does not matter what Stallman says he intended with the GPL, or what the GPL FAQ says should happen in some fantasy land.
Tell that to the lawyers when they send you a cease and desist.
The reason non-gpl compliant software don't touch GPL is not because there might be a loophole, it's that there is ni precident set in court and they don't be the ones needing to do it. This requires lawyers with expertise in both copyright law and contract law. It doesn't matter what is copyrightable if you agreed to a contract that you wouldn't do that and that is what the GPL is, a contract that you agree to that mentions how you are allowed to use the code in question.
In the end, whether the GPL is enforceable in these edge cases is up to the courts, not your interpretation of it, and if your project becomes a roaring success, do you really want to spend time and money on lawyers that you could rather spend on development?
The author quotes Google vs Oracle where the case was about using headers for compatibility: IIRC to provide an alternative implementation.
This is different from vv which uses the headers to link to the GPLed code.
In most jurisdictions the GPL is a license, not a contract, and is definitely designed not to be a contract.
That said, as far as I can see vv is in breach of the GPL. This is a case of someone who wants there to be a loophole convincing themselves there is one.
I would definitely not redistribute vv because of that. More importantly I think it likely that people packaging software for Linux repos are not going to want to take the risk, and many will object to trying to find a loophole in GPL on principle too.
> it's that there is ni precident set in court and they don't be the ones needing to do it.
Ah yes, "ni precident". I would suggest people instead get advice from someone who has some idea what they're talking about and a good grasp of the English language to communicate it.
I once tried to get an image on the screen using the Linux framebuffer device, using Cairo in Python. It was for an embedded device. Turned out that the framebuffer supported only BGR ordering while Cairo only did RGB. Which was disappointing because I expected more flexibility.
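When two components disagree on channel order like this, the usual workaround is a swap pass over the buffer before writing it out. A generic sketch, making no claim about what Cairo or that particular framebuffer actually expect:

```python
def swap_red_blue(pixels, bytes_per_pixel=4):
    """Return a copy of a packed pixel buffer with channels 0 and 2 exchanged.

    `pixels` is a bytes-like object whose length is a multiple of
    `bytes_per_pixel`; any other channels (e.g. alpha) are left in place.
    """
    out = bytearray(pixels)
    out[0::bytes_per_pixel] = pixels[2::bytes_per_pixel]
    out[2::bytes_per_pixel] = pixels[0::bytes_per_pixel]
    return bytes(out)
```

It costs an extra copy per frame, which matters more on an embedded device than on a desktop.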
With due respect, why are these requirements instead of nice-to-haves? The ability to perform an action, as an output, seems unrelated to target outcomes such as you proposed (be they morally justified, fads, or any other thing under the sun).
Whose morality? Ultimately, what we value is subjective. There are some seeming universal morals that most excuse or dismiss without a thought because they feel justified in the abrogation of natural rights and social contracts (to identify the spectrum).
My read is you're trying to be fun and agreeable here, but unfortunately, cats are not universally beloved and for some are abominations (either due to status as pets, because they prefer another pet, or the flavor/quality of the meat is off, or other elements we could identify in a universal census).
That morals aren't universal is pretty universal :/
Thank you for reminding me of Frink. Alan Eliasen's work is a delightful rabbit hole, and I'm so glad he's still maintaining and improving[1] Frink.
[1] https://frinklang.org/experimental.html#FrinkTNG
This comment exemplifies the best of HN - respectful and informative.
The scathing, self-righteous arrogance, coupled with the evident genuine insight and talent, is why I have to either laugh or cry when he goes down the route of building yet another image viewer that will very clearly come with its own share of issues, rather than putting in the legwork to actually improve the situation.
I guess we're reading two very different posts then.
He missed JPEG 2000.
It's used by the US National Map, the National Geospatial Intelligence Agency, CAT scanners, and Second Life. JPEG 2000 lets you zoom in and access higher levels of detail without reading the whole file. Or not zoom in and read a low-rez version from a small part of the file. So it's good for map data.
Deep zooming doesn't come up much in web usage, and browsers don't support JPEG 2000.
The JPEG 2000 decoder situation isn't good. OpenJPEG is slow and has had several reported vulnerabilities. There are better expensive decoders, and GPUs can be used to help with decoding, but the most popular decoders are slow.
Arguably the largest usage of JPEG 2000 is as the format movies are distributed to theaters (Digital Cinema Package), and there's been recent work to make decoding faster with HTJ2K.
That's a different codec for a different application. "Unlike J2K-1, the HT coder is not fully embedded and hence quality scalability is largely sacrificed." [1] The outer file format with the metadata has some commonality, but that's about it.
[1] https://ds.jpeg.org/whitepapers/jpeg-htj2k-whitepaper.pdf
Also RED camera after some obfuscation.
PDFs support j2k, so every PDF renderer includes a j2k decoder. I used this fact for a while because j2k significantly outperforms most JPEG encoders on line-art (e.g. comics). I switched back to JPEG recently though, as there is now a JPEG encoder that targets high-bpp-only uses and is only like 15% larger, while being fire and forget (as opposed to OpenJPEG, where I needed to adjust the quality factor depending on the source material).
What are your thoughts on JPEG XL? Since you mentioned browser support, JPEG XL is supported in Safari, but of course nothing else. My understanding is that it pretty much has all the same functionality as JPEG 2000.
From his footnotes:
> I was able to find a reasonable amount of AVIF HDR images, but HEIC HDR is nowhere to be found.
Anything taken by an iPhone camera is an HDR HEIF image… sort of. For backwards compatibility reasons it's an SDR image with an additional attached image called a gain map that HDRifies it. (This is mathematically worse for a reason I don't remember; it causes any picture taken at a concert with blacklights or blue spotlights to look bad. Once you see this you'll never unsee it.)
I believe the very newest Sony cameras can also save HEIF images, however I don't feel like spending $2500 to upgrade my second-to-newest A7 to a newest A7 to find out.
Lightroom also recently added HDR editing so maybe it can export them now?
> (This is mathematically worse for a reason I don't remember; it causes any picture taken at a concert with blacklights or blue spotlights to look bad.
The iPhone camera sensor is prone to saturating and clipping the blue channel when strong light from a blue LED is in the frame. Once the blue channel clips at the maximum value, a typical HDR gain map won't do anything to restore more nuance to it because they're not designed to add high-frequency detail to a blob of clipped pixels with identical values in the base image.
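A toy version of that arithmetic, heavily simplified: real gain map formats add offsets and min/max parameters, and `apply_gain_map` here is a made-up helper that only models the multiplicative core.

```python
def apply_gain_map(base, gain_stops):
    """Scale linear base pixels by a per-pixel gain given in stops (log2 units)."""
    return [value * 2.0 ** stops for value, stops in zip(base, gain_stops)]


# Three blue-channel samples: the scene values differed, but the sensor
# clipped all of them to 1.0 before the base image was produced.
scene = [1.4, 2.0, 3.1]
base = [min(v, 1.0) for v in scene]
hdr = apply_gain_map(base, [1.0, 1.0, 1.0])  # a smooth gain over the blob
```

Every sample in `hdr` comes out identical: the gain makes the blob brighter, but the differences that existed in the scene are gone.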
Lightroom seems determined not to support HEIC export, even though iOS has trouble with all of Lightroom’s other HDR formats. From my testing:
- JPEG with HDR gain map is not supported
- AVIF and JXL HDR files look fine on initial export, but do not survive being sent anywhere from the Photo Library.
So far I haven’t found a way to export an HDR file from Lightroom and then share it with anyone, iOS or Android.
Every time I read a story involving terminals, I am amazed that they function in the first place. The sheer amount of absurd workarounds for backwards compatibility’s sake is mind-boggling.
Actually, what's really amazing is that the wheel has been reinvented so many times except for terminals.
Seriously, make something new that works either with ssh or WireGuard and cement your name in fame.
There's a lot of things that could be reworked in much better ways if you drop backwards compatibility and think a bit about usability, but the real problem is the time it takes versus how "just fine" things work once you finally understand how to improve on old tools.
To make things worse, tools get much better once you start thinking hard about re-implementing them. It seems it takes a lot of hatred to start such a project, and starting is the easy part as it only needs the first 80% of the effort.
Several options already exist:
- HTTP management interfaces
- RPCs such as what’s used in the Microsoft Windows ecosystem
- Agent-based configuration management / IaC tools such as Puppet
Each has their own strengths and weaknesses. But for all the criticisms people make about the terminal, and many of the complaints are completely justified, it’s often those weird eccentricities that also make the terminal such a powerful interface.
The reason terminal-land is like this, is precisely because the wheel has been reinvented a few hundred times. Basically XKCD 927, perpetually.
Not really, because backwards compatibility with the most basic terminals has been maintained throughout it all. Instead of fragmentation, what we have is a very inconsistent soup.
Idea: since most of us are looking at terminals through a framebuffer, why not just have a terminal image command that works as follows:
1. Prints a bunch of blank lines corresponding to the height of the image
2. Renders the image/video using the fbdev in the space provided.
3. Scrolling the terminal whilst viewing is handled by moving/cropping the fbdev-rendered media as appropriate?
You want an idea of how tough it is, write a universal TIFF[0] reader (not writer -writers are simple).
Fun stuff. BTDT. Got the T-shirt.
[0] https://www.itu.int/itudoc/itu-t/com16/tiff-fx/docs/tiff6.pd...
Isn't TIFF one of those file formats that can contain just about anything, making this just about impossible?
Like, DNG files are TIFFs, so now you need a raw camera decoder, which is basically subjective.
My favourite part of TIFF is what they do about y-ordering.
See, sometimes people think images start at the top, so the first data at y=0 is at the top of the image, other people think they start at the bottom like a graph you draw in maths, where y=0 is clearly at the bottom of the image.
So TIFF says: That should be a parameter of the image.
Why? Well, TIFF came into existence because of scanners, when scanners were first invented each scanner would have its own data format - you're not going to store all this data because that's expensive - who owns that much tape? But when it's scanned clearly the bits you get have some sort of arrangement, maybe a bright white area is 1 and black is 0, maybe the opposite. That's kind of annoying, let's agree on a standard.
OK, so as Scanner Maker A my proposed standard is: Exactly what my popular A9 Scanner does
No! As scanner maker B, clearly the standard should be what our BZ-20 model does
No! Everybody at scanner maker C knows the obvious thing to do is derive the standard from the behaviour of our popular C5 and C10 scanners!
Result: The TIFF standard says all of the above are OK, just add header data explaining what's going on. Since some scanners would scan a page from the top, those say y=0 is at the top, those vendors whose scanner works the other way say y=0 is at the bottom!
> My favourite part of TIFF is what they do about y-ordering.
Windows BMP (DIB) also has the convention that y=0 is at the bottom. https://en.wikipedia.org/wiki/BMP_file_format
Right, it's fine if your image format has an opinion about this, what's infuriating is that TIFF lets each individual image have its own opinion, so a decent TIFF decoder needs to be able to do either for each image.
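To make the "do either for each image" part concrete, here is a minimal sketch, assuming the decoder has already read the Orientation tag (274) and the rows in file order. The function name rows_top_down is made up for illustration, and only the two orientations discussed here are handled; the spec defines eight.

```python
# Orientation values as defined for TIFF tag 274. Only the two cases
# discussed in this thread are handled; the other six are left out.
TOPLEFT = 1  # row 0 of the file is the top of the image
BOTLEFT = 4  # row 0 of the file is the bottom of the image


def rows_top_down(rows, orientation=TOPLEFT):
    """Return the rows ordered top to bottom, whatever the file declared.

    `rows` is a list of scanlines in file order. This helper is
    hypothetical and not part of any TIFF library.
    """
    if orientation == TOPLEFT:
        return list(rows)
    if orientation == BOTLEFT:
        return list(reversed(rows))
    raise ValueError(f"orientation {orientation} not handled in this sketch")
```

The flip itself is trivial; the annoyance is that every code path touching pixel rows has to go through something like this first.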
Apple's QuickDraw GX did that, as well (I believe -it's been a long time). I think it was because it was based on display Postscript.
What a huge PItA.
Flipping the image sometimes doesn't sound that bad. IIRC it's common for JPEGs to have rotation metadata, because sometimes you want to rotate or flip a camera image and you don't want to recompress it when you do.
Just to add some closure to this, the libTIFF library[0] comes fairly close to a universal reader. It’s been around since the 1980s, so it has had time to sand off the rough edges. It’s still being maintained and extended.
It's fairly "low-level," so could probably benefit from Façades.
[0] https://libtiff.gitlab.io/libtiff/
Yup. Couldn’t do the whole spec, even after about six months of continuous work.
I was young and stupid, back then.
I learned about not biting off more than I could chew. Important lesson in humility.
BMP is pretty gross too, though fortunately far less useful.
What's gross about BMP, it's one of the easiest image formats out there.
my limited understanding is it's basically just raw RGB values as binary, with just enough metadata to put it on the screen. BMP from my understanding is, like most things microsoft from that era, just a memory dump of the structure/array holding the image in memory.
I wonder if grandparent meant something else, but i don't know enough this instant to guess at what other format?
Haha, if only. Read the wiki article on BMP sometime. There's RLE, huffman coding, all kinds of bitdepths and channel masks. Configurable halftoning algorithms. Embedded JPEGs and PNGs? 64-bit BMPs in some kind of fixed point format? And that's on top of the usual Microsoft DWORD BITMAPV2INFOHEADER; struct-dump gunk.
The real kicker is it's not even useful. Just say "unsupported" to everything that isn't a dumb RGB dump and no one will ever notice!
Embedded png was vista+ iirc (that would be BITMAPV4INFOHEADER+), and is used primarily in .ico files to contain multi-resolution icons.
I don't think I ever saw an embedded jpeg.
Yeah, when I tried to implement BMP I couldn't even find a good specification. In the end it was easier to write a PNG loader/writer than deal with BMP. I've also got much better compatibility with other software.
The writer was so trivial that even with the need to encapsulate the uncompressed data in a Deflate compatible bitstream it was just easier than trying to do a BMP output.
I've even released my code under public domain if someone is interested: http://public-domain.advel.cz/
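For anyone curious what "uncompressed data in a Deflate compatible bitstream" can look like, here is a rough sketch. It is not the code linked above. It assumes 8-bit RGB input and leans on zlib at compression level 0, which emits stored (uncompressed) Deflate blocks, rather than building those blocks by hand. The names png_chunk and write_png_rgb are made up for illustration.

```python
import struct
import zlib


def png_chunk(kind, data):
    """Frame one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def write_png_rgb(width, height, rows):
    """Return PNG bytes for 8-bit RGB rows (each row is width * 3 bytes)."""
    # Every scanline is prefixed with a filter-type byte; 0 means "no filter".
    raw = b"".join(b"\x00" + row for row in rows)
    # Bit depth 8, colour type 2 (truecolour), default compression/filter,
    # no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Level 0 wraps the data in stored Deflate blocks inside a zlib stream.
    idat = zlib.compress(raw, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )
```

Writing the result of write_png_rgb(1, 1, [b"\xff\x00\x00"]) to a file should give a single red pixel. The output is larger than the raw data, which is the price of skipping compression.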
huh, i've been living under a misapprehension for a very long time. Now that you mention it i do remember run length and Huffman. I wonder if i am confusing it with a different file format that is basically just the RGB values with the WxH metadata?
You can create BMP files like that - I wrote some image generators that do that and they render fine in all image viewers. But the spec also supports much more, so if you're writing a BMP viewer then things get hairy.
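The "dumb RGB dump" flavour is small enough to sketch. This assumes the plain 24-bit uncompressed variant with a 40-byte BITMAPINFOHEADER and nothing else from the spec; write_bmp_24 is a made-up name and the 2835 pixels-per-metre value (about 72 DPI) is an arbitrary choice.

```python
import struct


def write_bmp_24(width, height, pixels):
    """Return BMP bytes for `pixels`: rows top to bottom of (r, g, b) tuples."""
    row_size = (width * 3 + 3) & ~3  # each row is padded to a multiple of 4
    image_size = row_size * height
    offset = 14 + 40  # file header + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,   # header size, dimensions (positive height = bottom-up)
        1, 24, 0,            # planes, bits per pixel, compression (0 = none)
        image_size,
        2835, 2835,          # pixels per metre, arbitrary (~72 DPI)
        0, 0,                # no palette
    )
    body = bytearray()
    for row in reversed(pixels):  # bottom row is stored first
        line = bytearray()
        for r, g, b in row:
            line += bytes((b, g, r))  # channels are stored as BGR
        line += b"\x00" * (row_size - width * 3)
        body += line
    return file_header + info_header + bytes(body)
```

That covers the writer side. A reader that wants to accept arbitrary BMPs has to deal with everything else listed upthread.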
Is that even possible? I once tried to make a matched pyramid (JPEG-in-)TIFF reader/writer pair and to retain some compatibility with other people who do this sort of thing (GIS, medical images, etc.). Virtually nobody agrees on how you’re supposed to do it[1]: some prescribe storing smaller versions as siblings (a NextIFD chain started by the main image), some as children of the largest one (a NextIFD chain pointed to by the SubIFD link of the main image), some as children of each other (a SubIFD chain started by the main image), some as both simultaneously (a chain of identical SubIFD and NextIFD fields started by the main image). And I mean, I could decide on something for my writer. But now I’m a reader and I get a TIFF file with some IFDs somehow linked by the NextIFD and/or SubIFD fields. WTF am I supposed to do with it? Is it a pyramid? Is it a multipage document? Is it a birdplane^W a sequence of pyramidal pages? I suppose I can walk the whole thing and construct a DAG, but again, how the hell can I tell what the DAG means?
(And don’t take this as a knock against TIFF in general—as far as I know, it’s one of the few image formats that takes the possibility of large and larger-than-memory images seriously. I think HEIF also does? But ISO paywalled it after first making it publicly available, so, hard pass.)
[1] Here’s a writeup that comes to similar conclusions: https://dpb587.me/entries/tiff-ifd-and-subifd-20240226
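The "walk the whole thing" step is the easy half, for what it's worth. A rough sketch, assuming classic (non-Big) TIFF and a SubIFDs tag (330) stored as LONG or IFD type; read_ifd and walk_ifds are made-up names. It only records the links and deliberately makes no attempt to say whether the result is a pyramid, a multipage document, or both.

```python
import struct

SUBIFDS = 330  # SubIFDs tag


def read_ifd(data, offset, bo):
    """Return (sub_ifd_offsets, next_ifd_offset) for the IFD at `offset`."""
    (count,) = struct.unpack_from(bo + "H", data, offset)
    subs = []
    for i in range(count):
        tag, typ, n, value = struct.unpack_from(bo + "HHII", data, offset + 2 + 12 * i)
        if tag == SUBIFDS and typ in (4, 13):  # LONG or IFD
            if n == 1:
                subs = [value]  # a single offset fits inline
            else:
                subs = list(struct.unpack_from(f"{bo}{n}I", data, value))
    (next_ifd,) = struct.unpack_from(bo + "I", data, offset + 2 + 12 * count)
    return subs, next_ifd


def walk_ifds(data):
    """Map each IFD offset to its SubIFD links and its NextIFD link."""
    bo = {b"II": "<", b"MM": ">"}[data[:2]]  # byte order marker
    magic, first = struct.unpack_from(bo + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    graph, todo = {}, [first]
    while todo:
        off = todo.pop()
        if off == 0 or off in graph:  # 0 ends a chain; skip repeats
            continue
        subs, nxt = read_ifd(data, off, bo)
        graph[off] = {"sub": subs, "next": nxt}
        todo.extend(subs)
        todo.append(nxt)
    return graph
```

What comes back is just a graph of offsets. All four layouts described above produce perfectly valid graphs, which is exactly the problem.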
We did a similar format, to allow editable "JPEGs." The IFD of a JFIF container was normal, except that we added a second entry, with the raw source of the image.
Traditional readers saw a JPEG (although an unusually obese one), but our software could access the second entry, which contained the raw source, and the control parameters for all the processing steps that resulted in the JPEG, so we could treat the image as nondestructive, and reversible.
It was never actually released, if I remember, but it may well be patented.
TIFF was originally designed to store drum scanner data, in realtime, so it uses strips, as opposed to tiles.
This was in the ‘90s. About the time the ink was still wet on that spec.
It was in C++, and I couldn’t do 100%, but I probably got about 80% (but not so performant). The weirdest thing, if I remember correctly, was pixel data with different sizes between stored components.
Ah yes, that was always entertaining. All the different ways additional metadata could be encoded was so much fun if you were dealing with geographical data.
PNG and JPEG are simple enough that a single person can write a useful decoder for them from scratch in a weekend or two.
The newer formats achieve better compression by adding more and more stuff. They're year-long projects to reimplement, which makes almost everyone stick to their reference implementations.
The newer ones are destined to failure by complexity then?
Newer image formats are based on video codecs, so if you already have the video codec around then theoretically it's not too bad.
Nah the space savings can significantly cut down on bandwidth costs at scale. They'll get (and have been?) pushed by Google and friends for that reason.
Same for GIF. I've written decoders for all 3.
PNG and JPEG both have ICC color profiles, which complicates things.
Even most Windows programs (including Windows Explorer thumbnails) don't display images correctly, which is infuriating.
ICC isn't too complex itself, but the bolted-on design of color profiles makes them annoying to handle, and easy to ignore.
You can't just handle pixels, you need to handle pixels in a context of input and output profiles. That's a pain like code-page based text encodings before Unicode, and we haven't established a Unicode equivalent for pixels yet.
A “Unicode for pixels” is still just pixels.
The problem colour profiles solve is about how the monitor should display those colours. It’s so that what you see on the screen is going to be exactly the same shade of CMYK as what gets printed.
It’s a big problem for magazine (and equivalent) publishing. Movies too. But much less of an issue for other media industries which are targeting end user devices like smart phones and laptops.
The equivalent in typefaces would be the font rasterisation itself (like Microsoft Clear Type) rather than code pages.
"Unicode for pixels" would be something like Rec.2020 color (with some specific high depth and HDR solution defined) used in all APIs that take pixels. Currently sRGB is the closest to a universal default, but that's ASCII of pixels.
You need a monitor profile because the display protocol takes dumb numeric values that are interpreted in monitor-specific way, instead of being sent in some universal color space, and converted to monitor's internal format by the monitor itself.
In this analogy monitors are like pre-Unicode printers, where characters were just bytes, and the bytes mapped to whatever 8-bit language-specific font the printer had.
You’re assuming that monitors and printers can be trusted to accurately reproduce the colour space even if there was a profile attached (which, by the way, most monitors do actually have).
This isn’t true. Particularly with monitors where people can adjust the contrast and brightness.
The reason colour profiles exist is so that computers can be calibrated to support the monitor output.
You are also ignoring the fact that environmental factors can have an effect too. Ie how the room is lit.
Comparing something standardised like writing glyphs with something highly individual (monitor calibration) doesn’t make a whole lot of sense.
Sixel is a fun side quest as well, if you want to encode it on your own. You get to work on a color quantization and/or dithering algorithm in $CURRENT_YEAR.
And if you're doing text with images, integrating both Sixel and Kitty is especially painful because they have completely different display models.
(In particular, Sixel is cell based, so you get Z-ordering for free - with the caveat that writing text on top of images destroys the cells.
Meanwhile, Kitty has Z-ordering, but it's per-image, so e.g. to draw a menu on top of an image that partially covers text, you must send a new image with space for the menu erased...)
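On the encode-it-yourself side quest: once the image is quantized, the Sixel part itself is short. A minimal sketch, assuming the quantization and dithering are already done, so the input is rows of palette indices. It skips run-length compression and raster attributes, and encode_sixel is a made-up name.

```python
def encode_sixel(indexed, palette):
    """Encode rows of palette indices as a Sixel escape sequence.

    `palette` is a list of (r, g, b) tuples in 0..255.
    """
    height = len(indexed)
    width = len(indexed[0])
    out = ["\x1bPq"]  # DCS introducer for Sixel data
    for i, (r, g, b) in enumerate(palette):
        # Colour registers take RGB as percentages (0..100), not 0..255.
        out.append(f"#{i};2;{r * 100 // 255};{g * 100 // 255};{b * 100 // 255}")
    for top in range(0, height, 6):  # each band is six pixel rows tall
        band = indexed[top:top + 6]
        used = sorted({c for row in band for c in row})
        for c in used:
            out.append(f"#{c}")  # select colour, then draw its pixels
            for x in range(width):
                bits = 0
                for dy, row in enumerate(band):
                    if row[x] == c:
                        bits |= 1 << dy  # lowest bit is the top row of the band
                out.append(chr(63 + bits))
            out.append("$")  # return to the start of this band
        out.append("-")  # advance to the next band
    out.append("\x1b\\")  # string terminator
    return "".join(out)
```

Each colour gets its own pass over the band, which is why the palette size matters so much and why the quantization step is where the real work is.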
For those who are interested, the best resource I've found on this is the notcurses author's wiki: https://nick-black.com/dankwiki/index.php/Theory_and_Practic...
> It may be worth noting that Chafa uses more of the block symbols, but the printouts it makes look ugly to me, like a JPEG image compressed with very low quality.
It may be worth noting that Chafa can be configured to only use ASCII 219
Yeah, that's one of many things I love about chafa. The charsets are very flexible allowing me to get something decentish working even in the limited default fonts and no fallbacks in putty on windows. Or blacklist stuff that looked bad in my preferred linux font.
Another nice thing about it is the author made an ffmpeg patch (playable using -c:v rawvideo -pix_fmt rgba -f chafa) that was pretty darn handy for quickly triaging videos on a remote server without having to relay them to a more usable terminal.
And it has sixel support too if you happen to be in a terminal that supports that.
And since he's delegating to imagemagick, it has loaded every image format I've thrown at it, including RAW.
imagemagick does not support RAW video formats.
I think you misunderstood. The video is using ffmpeg. I was talking about RAW from a camera which my imagemagick delegates to ufraw. But running ldd on my copy of chafa I see it now links to a ton of graphic libs directly.
ASCII renditions of photos have been around basically forever. Certainly printing these out on a line printer (80x24 screen was too small) was a thing in the mid 70's, and I'd bet go back to the first scanners.
Maybe the first quantized picture credit should go to Roman mosaics which quality wise are about the same as a lo-res JPEG.
The author questions whether anyone is using the modern web formats. As I see it, nobody should be using these formats, they should just be served by content delivery networks when JPG and PNG images are requested. The idea being that graphic artists and programmers work in JPG and PNG and the browser requests these image formats to actually get webp, AVIF or whatever is the latest thing.
Now, if you do right click, save image, it should then make another request with a different header that says webp, AVIF or whatever is not accepted, for the original JPG in high resolution with minimal compression to be downloaded.
And TFA instead just serves AVIF with no fallback :(
I just use ImageMagick. I put the following function in my ~/.bash_aliases:
> [about the (R)IFF format] Having a generic container for everything doesn’t make sense. It’s a cool concept on paper, but when you have to write file loaders, it turns out that you have very specific needs, and none of those needs play well with “anything can happen lol”.
Well, it's not like PNG and SVG are any different.
RIFF (used for .wav and .avi files) was just a pure container format. The actual payload content was compressed/represented by an open-ended set of CODECs, as indicated by the "FOURCC" (four character code) present in the file.
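To show how little the container layer tells you, here is a minimal sketch of walking the top-level chunks of a RIFF file, assuming the whole file is in memory and ignoring nested LIST chunks. The name riff_chunks is made up.

```python
import struct


def riff_chunks(data):
    """Return (form_type, [(fourcc, payload), ...]) for a RIFF file."""
    magic, size, form = struct.unpack_from("<4sI4s", data, 0)
    if magic != b"RIFF":
        raise ValueError("not a RIFF file")
    chunks = []
    pos, end = 12, 8 + size  # `size` counts everything after the first 8 bytes
    while pos + 8 <= end:
        fourcc, n = struct.unpack_from("<4sI", data, pos)
        chunks.append((fourcc, data[pos + 8:pos + 8 + n]))
        pos += 8 + n + (n & 1)  # chunks are padded to an even length
    return form, chunks
```

The loop hands back four-character codes and opaque payloads. Whether anything can decode those payloads is a separate question that the container does not answer.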
I know, but what is the practical difference for the file loader? In both cases you have formats with open-ended extensions (PNG chunks and XML elements), and in both cases you have to make sure that the overall format matches what you expect.
For SVG there are standards that define how to interact with unknown elements, and a set of standard elements which are actually part of the spec. With PNG, there's a minimum set of chunks that are required, and you can safely stick with those and get something that works for the vast majority of cases.
This is totally different from a container which can contain any type of data in any type of format. If you get a valid PNG, you can load something meaningful from it. If you get a generic container, you might be able to inspect it at a surface level but there's no guarantee that the contents are even conceptually meaningful to you.
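For the PNG side, the "safely stick with the minimum set" rule is encoded in the chunk names themselves: a lowercase first letter marks a chunk as ancillary, so a loader may skip it if it doesn't recognise it. A rough sketch of a loader applying that rule, assuming the file is in memory and skipping CRC checks; essential_png_chunks and the KNOWN set are made up for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KNOWN = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}  # the critical chunks


def essential_png_chunks(data):
    """Return the known chunks, skipping ancillary ones we don't understand."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    kept, pos = [], 8
    while pos < len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        kind = data[pos + 4:pos + 8]
        ancillary = bool(kind[0] & 0x20)  # lowercase first letter
        if kind in KNOWN:
            kept.append((kind, data[pos + 8:pos + 8 + n]))
        elif not ancillary:
            # An unknown critical chunk means we cannot render this file.
            raise ValueError(f"unknown critical chunk {kind!r}")
        pos += 12 + n  # length + type + data + CRC (CRC not verified here)
    return kept
```

The extension points exist, but the format tells the loader up front which ones it can ignore.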
Effectively the same is true for IFF-compliant formats like ILBM and AIFF, which are usually distinguished either by file extension or by file-system metadata.
It’s a bit like criticizing a format for being XML-based, under the argument that XML can represent all kinds of payloads.
That's exactly what I'm working towards. It's the difference between just using an XML file and using an XML file with a unique extension tacked onto it. It represents a subset with set expectations for behavior. That expectation is what matters.
For generic containers, there are no expectations on what's inside. If I'm making a video player, do I support MKV? Well, some of them. Not all of them. You have to try it and find out. Heck, even within a single codec it's hard to find complete support. Last I checked, browsers won't touch an H264 stream unless it's also YUV420p which also places certain restrictions on physical dimensions of the data and probably some other stuff that's escaping my mind.
If I'm making an image viewer, do I support SVG or PNG? For this example, let's say I only support PNG, but I can tell from the outset what's inside any file I might try to load. I can confidently say I can meaningfully load and display anything that uses the established standards of the PNG format to store image data. Sure, there are extensible bits, but they're not part of those core expectations surrounding the core part of the format, the image data I care about. And I didn't have to try and open the PNG first to see if it was actually an SVG. I can recognize not-bitmaps at a glance without having to inspect them any deeper than their extension.
With RIFF you not only need to be able to handle the container format, but also the specific type of payload content, which for video varied from simple uncompressed YUV formats like Y41P to proprietary compressed ones like WMV1 (Windows Media Video). Being able to handle the RIFF format therefore had no bearing on whether you'd be able to extract data from it.
> Finally, remember that the only legal text you are bound by (if at all) is the actual text of the GPL license. It does not matter what Stallman says he intended with the GPL, or what the GPL FAQ says should happen in some fantasy land.
Tell that to the lawyers when they send you a cease and desist.
The reason non-gpl compliant software don't touch GPL is not because there might be a loophole, it's that there is ni precident set in court and they don't be the ones needing to do it. This requires lawyers with expertise in both copyright law and contract law. It doesn't matter what is copyrightable if you agreed to a contract that you wouldn't do that and that is what the GPL is, a contract that you agree to that mentions how you are allowed to use the code in question.
In the end whether the GPL is enforceable in these edge cases is up to the courts, not your interpretation of it, and if your project becomes a roaring success do you really want to spend time and money on lawyers that you could rather spend on development?
The author quotes Google vs Oracle where the case was about using headers for compatibility: IIRC to provide an alternative implementation.
This is different from vv which uses the headers to link to the GPLed code.
In most jurisdictions the GPL is a license, not a contract, and is definitely designed not to be a contract.
That said, as far as I can see vv is in breach of the GPL. This is a case of someone who wants there to be a loophole convincing themselves there is one.
I would definitely not redistribute vv because of that. More importantly I think it likely that people packaging software for Linux repos are not going to want to take the risk, and many will object to trying to find a loophole in GPL on principle too.
> it's that there is ni precident set in court and they don't be the ones needing to do it.
Ah yes, "ni precident". I would suggest people instead get advice from someone who has some idea what they're talking about and a good grasp of the English language to communicate it.
I must protest, my good sir, I reckon that the Monty Pythons have a perfect grasp of the English language!
https://m.youtube.com/watch?v=zIV4poUZAQo
I shall quote "Ni" again to you unless you appease me.
I once tried to get an image on the screen using the Linux framebuffer device, using Cairo in Python. It was for an embedded device. Turned out that the framebuffer supported only BGR ordering while Cairo only did RGB. Which was disappointing because I expected more flexibility.
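If the mismatch really is just channel order, the fallback is to swap the bytes yourself before writing to the device. A minimal sketch, assuming a packed 3-bytes-per-pixel buffer; real framebuffers are often 16 or 32 bits per pixel, so treat the layout as an assumption. rgb_to_bgr is a made-up helper.

```python
def rgb_to_bgr(buf):
    """Swap the R and B channels in a packed 3-bytes-per-pixel buffer."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    out = bytearray(buf)
    out[0::3] = buf[2::3]  # blue moves to the first byte of each pixel
    out[2::3] = buf[0::3]  # red moves to the third
    return bytes(out)
```

The slice assignments keep the per-pixel work out of a Python loop, which matters on an embedded device.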
[dead]
[flagged]
Let me guess, didn't read the article?
[flagged]
With due respect, why are these requirements instead of nice-to-haves? The ability to perform an action, as an output, seems unrelated to target outcomes such as you proposed (be they morally justified, fads, or any other thing under the sun).
Morality is not something optional. It is a question of survival and self preservation!
You will not get killed, become homeless, or imprisoned if your program crashes. But if it does something morally wrong, maybe you will...
Whose morality? Ultimately, what we value is subjective. There are some seeming universal morals that most excuse or dismiss without a thought because they feel justified in the abrogation of natural rights and social contracts (to identify the spectrum).
Cat pictures are the "universal morals" everybody agrees on! Anything else is questionable (or might be in future/past).
My read is you're trying to be fun and agreeable here, but unfortunately, cats are not universally beloved and for some are abominations (either due to status as pets, because they prefer another pet, or the flavor/quality of the meat is off, or other elements we could identify in a universal census).
That morals aren't universal is pretty universal :/