The bit in the article about the recovery procedure, which involves dumping the tape's contents into '100-ish GB of RAM' and then using software to analyze it, stuck out to me.
This video on the linked github page for the analysis software[1] is interesting:
https://www.youtube.com/watch?v=7YoolSAHR5w&t=4200s
[1] https://github.com/LenShustek/readtape
Well, the tape may not survive its second pass across the read head, so it's good to capture the analog waveform in as much fidelity as possible.
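For a rough sense of where the '100-ish GB of RAM' comes from, here's a back-of-envelope sketch. The tape speed, per-track sample rate, and sample width below are my assumptions for illustration, not figures from the article or from readtape's actual capture setup.

```python
# Back-of-envelope estimate of the buffer needed to hold one analog pass
# over a 9-track tape. All parameters are assumptions for illustration.

TAPE_LENGTH_FT = 1200         # length of the reel
TAPE_SPEED_IPS = 25           # read speed, inches per second (assumed)
TRACKS = 9                    # 9-track tape: 8 data tracks + 1 parity
SAMPLE_RATE_HZ = 10_000_000   # per-track ADC sample rate (assumed)
BYTES_PER_SAMPLE = 2          # 16-bit samples (assumed)

seconds_per_pass = TAPE_LENGTH_FT * 12 / TAPE_SPEED_IPS
total_bytes = seconds_per_pass * SAMPLE_RATE_HZ * TRACKS * BYTES_PER_SAMPLE

print(f"one pass: ~{seconds_per_pass:.0f} s of tape")
print(f"capture buffer: ~{total_bytes / 1e9:.0f} GB")   # ~104 GB at these settings
```

At those (made-up) settings a single end-to-end pass lands right around the 100 GB mark, which is presumably why the capture is staged in RAM.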
I wonder if they'll find it suitable to bake [0] the tape first, which is quite popular in the audio restoration world but I'm not sure how much it applies to computer tape.
[0] https://en.wikipedia.org/wiki/Sticky-shed_syndrome
I don't know much about tapes but it seems to apply, since the post mentions it:
> I'm hoping I don't have to bake it, since that takes a day
What a fantastic talk! Thanks for sharing.
> This is rare enough that I'm pushing the recovery of it up near the top of my project queue.
The reader is left to wonder what the software librarian at the Computer History Museum could possibly have found recently that warrants placement ahead of Unix v4 in their project queue. A copy of Atlantean Unix from the ancient Library of Alexandria?
Perhaps a prior promise to someone else?
Or you know... just the definition of a FIFO Queue. I mean, using a FILO Stack to organize your work would certainly be a choice...
Definitionally if they're "pushing it near the top" they're not only using FIFO, there's a priority ordering involved...
My guess is there's stuff in progress, and maybe they need to arrange access to, or set up, readers for a tape that old and of potentially unknown format.
Yes it's probably better to finish in progress stuff first.
So much of this is old and potentially delicate, and they don't have unlimited space to work in, so they'd have to pack up some other in-progress digitization project to set up the tape flux digitizer, and maybe also arrange to get the correct one for this type of tape.
Interesting article. I agree it is kind of a big deal. Certainly worth the effort to try to restore it.
Please let there be an ultimate force in the universe that spared this tape from degradation and/or loss of magnetization, so that it can be read and extracted into a raw dump that we can preserve for all time. (fingers crossed)
Tapes from back then haven’t held up over the years. It all depends on the environment it was stored in.
I remember reading we're nearing a timeframe where VHS and cassette tapes made in the <=1980s will start degrading pretty seriously. So if you own lots of VHS or camcorder tapes you have a relatively short window to save old family videos... or just deal with fuzzy images and bad audio.
Somewhat related, there are people doing amazing things by modifying VHS players and tapping into the raw output from the tape heads (bypassing all of the player's other electronics), and then using modern signal processing techniques to extract unbelievably good footage from old tapes.
Check out this extraction/decoding of a 1987 VHS recording of The Cure:
https://www.youtube.com/watch?v=ks1wE_NXWv8
Play it full screen at the highest resolution your screen can take advantage of. It's amazing! Check out the quality of the big headshots of Robert Smith; the resolution of details like his hair is way beyond what I believed VHS to be capable of, based on growing up recording similar music acts in the 70s and 80s.
Here's the software (and descriptions of the hardware and VHS player mods) they use:
https://github.com/oyvindln/vhs-decode
I suspect the recording technique/format on those is a similar analog signal on the tape, and from The Reg's article (quoting various sources) it sounds like they're already planning a similar approach:
"The software librarian at the CHM is the redoubtable Al Kossow of Bitsavers, who commented in the thread that he is on the case. On the TUHS mailing list, he explained how he plans to do it:
taping off the head read amplifier, using a multi-channel high speed analog to digital converter which dumps into 100-ish gigabytes of RAM, then an analysis program Len Shustek wrote: https://github.com/LenShustek/readtape
It is a '70s 1200ft 3M tape, likely 9 track, which has a pretty good chance of being recoverable."
I recently went through this. The problem is finding VCRs... there aren't any.
And the ones with manual tracking, which you likely need as the tracks degrade, are even rarer, or out of the price range.
There is one parity bit per 8 data bits, which is decently resilient, plus the recovery is pretty simple on the occasional bit flip. Combine that with the fact that you can reference other sources to make up for missing/corrupted files, and I think the chance that this is recoverable is pretty high, provided the machine reading it is high quality. Checksumming on the source was unfortunately not commonplace until Unix V7, so it's unlikely there were any software-level integrity checks here. The tape looks like it was stored in a sealed container, which is a very good sign. Those older tapes are actually more resilient than the later generation of tapes, and don't usually degrade the same way even with exposure to humidity.
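For anyone unfamiliar with 9-track framing: each byte goes across the tape as 8 data bits plus 1 parity bit, so a single flipped bit in a frame is detectable (though not locatable) from the frame alone. A minimal sketch of that check; odd parity is assumed here, and real formats also add CRC/LRC check characters per block that this ignores.

```python
# Per-frame (vertical) parity on a 9-track-style frame: 8 data bits + 1
# parity bit. Odd parity is assumed for illustration.

def parity_bit(byte: int) -> int:
    """Parity bit that gives the 9-bit frame an odd number of 1s."""
    return 0 if bin(byte & 0xFF).count("1") % 2 == 1 else 1

def frame_ok(byte: int, p: int) -> bool:
    """Accept the frame if the total number of 1 bits is odd."""
    return (bin(byte & 0xFF).count("1") + p) % 2 == 1

data = b"main()"                                  # some recovered bytes
frames = [(b, parity_bit(b)) for b in data]

good_byte, p = frames[0]
bad_byte = good_byte ^ 0b0000_0100                # flip one data bit
print(frame_ok(good_byte, p))                     # True:  frame as written
print(frame_ok(bad_byte, p))                      # False: single-bit error detected
```

Parity only flags an odd number of flips per frame and says nothing about which bit, which is why cross-referencing against other surviving copies matters so much for actually repairing damage.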
Someone in the Mastodon thread mentioned the Andrew Tanenbaum quote: "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."
So I wondered about a modern-day equivalent, looked up 1 TB micro-SD cards (sold locally for the Nintendo Switch), and calculated that there'd be roughly space for 400 exabytes of data in a shipping container filled to the brim with SD cards.
(A micro-SD card being 1 TB at 1.092 x 1.499 x 0.102 cm, and a shipping container being 1203 x 235 x 239 cm inside, so holding roughly 400 million SD cards.)
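Redoing that arithmetic explicitly, using the dimensions above and decimal units (1 TB = 10^12 bytes), and ignoring packing losses and weight limits entirely:

```python
# Naive "fill a 40 ft container with micro-SD cards" arithmetic.

card_cm = (1.092, 1.499, 0.102)       # micro-SD dimensions, cm
container_cm = (1203, 235, 239)       # container interior, cm
tb_per_card = 1

card_vol = card_cm[0] * card_cm[1] * card_cm[2]
container_vol = container_cm[0] * container_cm[1] * container_cm[2]
cards = container_vol / card_vol

print(f"~{cards / 1e6:.0f} million cards")                 # ~405 million
print(f"~{cards * tb_per_card / 1e6:.0f} EB of capacity")  # ~405 EB
```

Call it 400 million cards and 400 EB, in line with the figures above.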
Actually a shipping container full of micro-SD cards hurtling down the highway has lower overall bandwidth than a 56k modem.
That's because whoever's attempting to load an ideal 400 million micro-SD cards into one will take approximately forever carefully trying to line up even one row of them on the floor of a shipping container, before having the whole thing fall over like dominoes.
And even if they manage that, the whole thing will tumble over once they need to deal with the first row of the container's side corrugation. Nobody at the department of Spherical Cows in Vacuums thought to account for those dimensions[1] not lining up with the size of micro-SD cards.
If they do manage some approximation of this it'll take forever just to drive this down the road, let alone get the necessary permits to take the thing on the highway.
Turns out not a lot of semi truck trailers or roads are prepared to deal with a 40 ft container weighing around 100 metric tons (the weight of one packed to the brim with sand, a close approximation).
The good news is that such transportation gets more fuel efficient the longer the trip is.
The bad news is that the container will arrive mostly empty, as it's discovered that shipping container door panel gaps and road vibrations conspire to spread a steady stream of micro-SD cards behind you the entire way there.
Commuters in snowy areas held up behind the slowly moving "OVERSIZED LOAD!" with a mandatory police escort wonder if it's a trial for a new type of road salt that makes a pleasant crunchy sound as you drive over it.
Finally, an attempt to recover the remaining data fails. The sharding strategy chosen didn't account for failure due to road salt ingression into the container, cards at the bottom of the container being crushed to dust by the weight of those above, or that the leased container hadn't been thoroughly cleaned since last transporting, wait, what is that smell?
1. https://www.discovercontainers.com/wp-content/uploads/contai...
100 metric tons is nothing, really, when it comes to trucking. Logging trucks in Sweden commonly hurtle down dirt roads at 70-80 km/h with 70-ton loads (mostly limited to that because it's currently the maximum allowed weight without oversize escorts), and the Finns are experimenting with 100-ton loads for logging purposes (for environmental reasons).
That's not even mentioning Australian road trains that seem to commonly pull around 150 tons with some being up to 200 tons (The load would be slightly spread out to more containers but still one truck-load).
Still, 400 million SD cards is a silly experiment.
The weight limits are for public roads. Private logging roads can run whatever they want and Canada had(have?) some impressive rigs routinely hauling 100+ tons. Just do an image search for Hayes/Pacific logging truck.
Well, just replace micro-SD card with HDDs and it is pretty much how it works in the real life for absolutely massive data sizes.
Also, packing it up "taking forever" is irrelevant, that's latency, not bandwidth.
I tried setting that up, but now the trucker's union is refusing to talk to me, citing concerns that the platters will all spin up due to road vibration, derailing the truck in a ditch due to the cumulative gyroscopic forces.
They remain unconvinced that chatGPT has told me it "should be fine", and have inquired as to whether I don't have better things to do than trying to win increasingly obscure and contrived arguments on HN. Please advise.
AFAIK it's discontinued since, but AWS had a service where you could send data via a huge truck https://aws.amazon.com/blogs/aws/aws-snowmobile-move-exabyte... , it's absolutely not a fantasy.
Fascinating, I wasn't aware of that. They still offer an "AWS Snowball", which is 200 TB instead of 100 PB, but around the size of half a full size suitcase instead of a semi truck. You then ship that back and forth.
If you need 100 PB then moving 500 of those around seems a lot easier for everyone involved than managing a special snowflake truck.
Dan's Data (RIP that website, apparently) did an update of this, with backup tapes swapped for microSD cards, about 15 years ago.
He started with "Well, first we need to know how big our station wagon is. I hereby arbitrarily declare it to be a 1985 Volvo 240, which has 2.2 cubic metres of cargo capacity." and "I'm also going to assume that the wagon isn't really packed totally full of memory cards, such that they cascade into the front whenever you brake and will avalanche out of the tailgate when it's opened. Let's say they are packed almost to the roof of the car, but in cardboard boxes, which reduce the usable cargo capacity to a nice round two cubic metres."
He calculated: "Assuming uniform and perfect stacking of objects of this volume, with zero air space, you can fit 24,242,424 of them into two cubic metres."
But he also addressed the packing problem, saying:
"In the real world there'd obviously be air spaces, even if you painstakingly stack the tiny cards in perfect layers. My size approximation, that ignores the more-than-0.5mm height of the thick end of the card, could make the perfect-layers calculation quite inaccurate. But if you're just shovelling cards into the boxes and not stacking them, though, there will be even more empty space between cards, and the thicker ends won't matter much.
To use a few words you may have to hit Wikipedia about - I know I did - a random close pack of monodisperse microSD-shaped objects will be considerably tighter than one for, say, spheres. I wouldn't be surprised if it only reduced the theoretical no-air-space density by 20%, provided you shake the boxes while you're filling them.
So let's stick with a 20% density reduction from random packing, giving 0.8 times the theoretical density of perfectly-packed cards. Or nineteen million, three hundred and ninety-three thousand, nine hundred and thirty-nine cards, in the boxes, in the station wagon."
He was writing in 2015, and settled on 16 GB cards and being reasonable, getting 275 pebibytes. If we switched to the 1 TB cards mentioned upthread, that'd be about 17 exbibytes in a 2 cubic meter station wagon cargo area, or 575 EiB in a 67 cubic meter shipping container. And that's the "load with a shovel and shake to pack down" number, so perhaps 720 EiB if someone took that forever to carefully pack them.
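A quick unit check on that scaling, keeping Dan's 19,393,939-card random-packing figure, treating card capacities as decimal (16 GB = 16x10^9 bytes, 1 TB = 10^12 bytes) and PiB/EiB as binary; it lands within rounding of the numbers above:

```python
# Re-running the Dan's Data scaling with explicit units.

CARDS_IN_WAGON = 19_393_939        # Dan's shaken-down 2 m^3 figure
WAGON_M3, CONTAINER_M3 = 2, 67
PIB, EIB = 2**50, 2**60

print(16e9 * CARDS_IN_WAGON / PIB)                              # ~275 PiB (16 GB cards)
print(1e12 * CARDS_IN_WAGON / EIB)                              # ~16.8 EiB (1 TB cards)
print(1e12 * CARDS_IN_WAGON * CONTAINER_M3 / WAGON_M3 / EIB)    # ~564 EiB per container
```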
Your 100-ton problem is real, though: shipping containers (both 20 and 40 foot) seem to top out at a cargo payload of about 28 tons. So let's call it "only" 161 EiB per shovel-loaded container.
The font of all hallucinations and incompetent math tells me "The total amount of data on the internet is estimated to be around 40 zettabytes as of 2025, which is equivalent to 40,000 exabytes." So you'd only need 250 shipping containers or so to store a copy of the entire internet. And that's barely 1% of the capacity of a modern large cargo ship. I guess for reliability you'd use 500 shipping containers in redundant mirrored RAID1 config, each half travelling on a different ship.
Dan also noted: "Unfortunately, even if your cards and card readers could all manage 50 mebibytes per second of read and write speed, getting all of that data onto and off of the cards at each end of the wagon-trip in no more than 24 hours would require around 68,400 parallel copy operations, at each end."
That works out to 2.3 million readers for one parallel copy of one container's worth of data in one day, and 570 million for 250 containers' worth.
https://web.archive.org/web/20250313181659/http://dansdata.c...
> he also addressed the packing problem, saying[...]
If we're going to take this "packing problem" a tad more seriously, then the notion that someone might spend on the order of $2.5 billion on micro-SD cards for their station wagon (assuming 1 TB at $100/card), but isn't in a position to contact an SD card manufacturer to solve the problem for them, is a bit absurd.
I used to be a digital hoarder (now less so, but I still have what I hoarded, built up over a 15-year period). When I moved overseas, I shipped my 120 HDs (~600 TB) in a container; 120 HDs don't take up "that" much space. They all arrived in one piece, though 5 have since died (only 3 unrecoverable). After the first one died, I made sure to image each drive and then write it back, since bitrot was a problem.
Anyway, it took 2-3 months to arrive (and most of that time it was waiting in one port or the other), but by my calculation I would have needed to transfer at a consistent 80 MB/s or so (close to gigabit) to net the equivalent transfer rate.
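That figure checks out. A quick sanity check, assuming ~600 TB (decimal) and the 2-3 month transit mentioned above:

```python
# Effective bandwidth of ~600 TB travelling by container ship.

data_bytes = 600e12
for months in (2, 2.5, 3):
    seconds = months * 30 * 24 * 3600
    print(f"{months} months -> ~{data_bytes / seconds / 1e6:.0f} MB/s")
# 2 months ~116 MB/s, 2.5 months ~93 MB/s, 3 months ~77 MB/s
```

So at the slow end of that window you'd need a sustained link of roughly 80 MB/s, i.e. close to saturated gigabit, just to break even with the container.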
I managed about 1.5gb/s in 2007, between our colo and the office. With a server strapped to the back of a motorcycle.
https://flic.kr/p/4bQ8jz
Relevant (what-if) xkcd
https://what-if.xkcd.com/31/
> It is a '70s 1200ft 3M tape, likely 9 track, which has a pretty good chance of being recoverable.
Not old enough to have this kind of knowledge or confidence. I wonder if instead one day I'll be helping some future generation read old floppies, CDs, and IDE/ATA disks *slaps top of AT tower*.
You might be able to use that old floppy drive. But you won't be able to use that old Pentium machine the drive is in.
Because you will need several hundred gigabytes of RAM and a very fast IO bus.
The gold standard today for archiving magnetic media is to make a flux image.
The media is treated as if it were an analog recording and sampled at such a high rate that the smallest details are captured. Interpretation is done later, in software. The only antique electronics involved are often the tape or drive head, directly connected to a high speed digitizer.
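As a toy illustration of the "interpret later, in software" step: on 9-track NRZI tape a flux transition within a bit cell means 1 and no transition means 0. The sketch below decodes one track of an idealized capture under that convention, with a fixed, known bit-cell length. Real tools such as readtape work on raw ADC samples and have to do peak detection, per-track clock recovery, skew correction, and parity/CRC checking; none of that is attempted here, and the synthetic waveform is mine, not real tape data.

```python
# Toy decode of one track of an idealized flux capture (NRZI-style:
# a level transition inside a bit cell = 1, no transition = 0).

def decode_track(samples, samples_per_bit, threshold=0.0):
    """Turn one track's analog readback into bits, one per bit cell."""
    polarity = [1 if s > threshold else 0 for s in samples]
    bits = []
    for start in range(0, len(polarity) - samples_per_bit + 1, samples_per_bit):
        cell = polarity[start:start + samples_per_bit]
        # A 1 is any change of polarity somewhere inside the cell.
        changed = any(a != b for a, b in zip(cell, cell[1:]))
        bits.append(1 if changed else 0)
    return bits

# Synthetic track: 4 bit cells of 8 samples each, encoding 1, 0, 1, 1,
# with the level toggling mid-cell for every 1.
wave = ([-1]*4 + [+1]*4) + [+1]*8 + ([+1]*4 + [-1]*4) + ([-1]*4 + [+1]*4)
print(decode_track(wave, samples_per_bit=8))   # [1, 0, 1, 1]
```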
And indeed that appears to be the plan Al Kossow has for the tape: https://www.tuhs.org/pipermail/tuhs/2025-November/032765.htm...
As for CDs, I don't see the rush; the ones that were properly made will likely outlast human civilization.
> As for CDs, I don't see the rush; the ones that were properly made will likely outlast human civilization.
Pressed ones will last a lot longer, but writable ones can degrade to an unreadable state in a few years. I lost countless of them years ago, including the double backup of a project I worked on. Branded discs, written and verified, properly stored: no sunlight, no objects or anything above them, no moisture or whatever. All gone, just because of time. As I read other horror stories like mine, I just stopped using them altogether and never looked back.
Don't forget about the mold that eats CDs and DVDs for breakfast. I've seen a few DVDs damaged by it.
>As for CDs, I don't see the rush; the ones that were properly made will likely outlast human civilization.
Recordable CD-Rs and DVD-Rs do not last anywhere close to that long, and those are the ones that hold the only copies of certain bits (original versions of software, etc.) that people are most interested in not losing.
Manufactured CDs and DVDs hold commercial music and films that are, for the most part, not rare at all.
Yes, good distinction. Recordable media will most likely contain data an individual intended to save. But because it's recordable, the dyes and structures on the disc aren't as stable.
Long-lasting, good quality mastered optical media is probably mass produced and has many copies, including a distinct and potentially well-preserved source.
It's probably fair to say that a lot of mixtapes (mix CDs?) from the early 2000s are lost to dye issues...
> lost to dye issues...
Not that it helps to recover older data, but things are better with Blu-ray today; at least if you buy decent quality discs. Advertised lifespans are multiple decades, up to 100 years, or even 500 years for "M" discs. And in the "M" disc case, it's achieved by using a non-organic dye, to avoid the degradation issues.
> structures on the disc aren't as stable.
Which is why the format has generous error correction built in.
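The error correction on CDs (CIRC) pairs Reed-Solomon codes with interleaving, so one physical burst of damage becomes a small, correctable error in many different codewords. A toy sketch of just the interleaving half of that idea; the sizes below are made up and the Reed-Solomon part is omitted entirely:

```python
# Why interleaving helps against burst errors (toy sizes, not real CIRC).

DEPTH, FRAME = 4, 8    # 4 codewords of 8 symbols each

codewords = [[f"w{w}s{s}" for s in range(FRAME)] for w in range(DEPTH)]

# Write symbols to the "disc" column by column (interleaved).
disc = [codewords[w][s] for s in range(FRAME) for w in range(DEPTH)]

# A scratch wipes out 4 consecutive symbols on the disc...
for i in range(12, 16):
    disc[i] = None

# ...but after de-interleaving each codeword has lost only one symbol,
# which a code with modest redundancy can correct.
recovered = [[disc[s * DEPTH + w] for s in range(FRAME)] for w in range(DEPTH)]
for w, cw in enumerate(recovered):
    print(f"codeword {w}: {sum(x is None for x in cw)} erased symbol(s)")
```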
'That were properly made' is doing a lot of work in that sentence. I've got a bunch of Sega Saturn and AKAI/Zero-G sample CDs that are basically unreadable already due to disc rot. There was a lot of cheaply made optical media floating around in the late '90s.
Also, we've started seeing some of the first wave of "properly made" CDs from the early 1980s degrade. 40-plus years is a good, long run, but it is certainly not "forever".
and anything you "burnt" yourself has an even shorter life.
In many ways our storage media has become more ephemeral as capacities have increased. The exception is LTO, which seems to keep up with storage demands, price, and durability; LTO is eternal (or at least lasts long enough to migrate from LTO-(N-4) to LTO-N).
It’s so obvious in retrospect but I never considered they would do this! Thanks for sharing.
Just anecdata, but I had this concern when I worked in academia and we backed up all our data to writable DVDs. I was there 10 years after the start of the project and I periodically checked the old DVDs to make sure they weren't corrupted.
After 10 years, which was longer than the assumed shelf life of writable/rewritable DVDs at the time, I never found a single corrupt file on the disks. They were stored in ideal conditions though, in a case, in a closed climate controlled shelf, and rarely if ever removed or used.
Also, just because I think it's funny: the archive was over 4000 DVDs. (We had redundant copies of the data, compressed and uncompressed; I think it was something like 3000 uncompressed and 1000 compressed.) There was also an offsite redundant copy we put on portable IDE (and eventually SATA) drives.
Thank your procurement agent and hvac guy.
My team used to maintain go-kits for continuity of operations for a government org. We ran into a few scenarios where the dye on optical media would just go, and another where replacement foam for the Pelican cases off-gassed and reacted with the media!
I was the procurement guy for many years, and we had no HVAC guy; we were at a state university. There was nothing special about the DVDs we bought, they were from Newegg and other retail places, though we did buy the most expensive ones because our grants allowed us to, so maybe that's a factor.
I have no doubts (hence my anecdata statement) that there could be bad DVDs in there, or that maybe over a longer time horizon the media would be cooked.
Wow! That's pretty interesting. I can imagine wanting to store optical media in Pelican cases or similar for shock protection, ability to padlock, etc. But yeah -- what's the interaction between whatever interior foam they chose and the CD-R media and dyes? Especially after 10+ years of continuous contact?
Optical media is probably best stored well-labeled, in a metal or cardboard box, on a shelf in a basement that will rarely be disturbed.
It was a really fun project. We basically made these disaster kits, with small MFPs, tools, laptops, cell radios and INMARSAT terminals hooked to Cisco switches (this was circa 2002-3) and a little server. We had a deal that let us stow them in unusual places like highway rest stops.
We’d deploy them to help respond to floods or other disasters.
One of the techs cooked up a great idea — use Knoppix or something like it to let us use random computers if needed. Bandwidth was tight, but enough for terminal emulators and things like registration software that ran off the little server. So that’s where we got into the CD/DVD game. We had way more media problems than we expected!
Most of the CDs we burned at home in the 1998-2005 era were still good in recent years, some DVDs in there too. Luck, I guess. No delamination or rot. Really, my main problems were figuring out file types without extensions (burned on classic Mac OS) and... finding appropriate programs to open them (old Painter limited edition from 1998 needs... the same thing, pretty much).
OTOH, some 12 years ago I worked IT at a newspaper and we were moving offices. The archivist got an intern in a room in our section of the building, and together they spent a month or two scanning, then committing whatever physical media to burned CDs (maybe DVDs) before chucking the former in the bin. Maybe a year after the move, a ticket was opened and I went to check the discs. None of them worked; CRC failures all over. I don't think they even considered testing them or burning duplicates. Or maybe they used a really bad drive that produced media unreadable by anything else, although I'm only aware of that being a thing with floppies, for example.
Cool tale! I have observed a mix of viable and unreadable user-burned CD media from the late 90s and early 2000s. It definitely depends on the quality of the media, the quality of the burn/drive/laser, and how well it was stored in the interim.
My oldest disc is some bright blue Verbatim disk my childhood friend made for me so I could play our favorite game at home pre-2000. I have a bit-perfect copy, but the actual disc still reads fine in 2025 when I last tested it.
Yep, quality is definitely a factor here, as much as it can be. We had NSF funding pre-2008, so there was plenty of budget for quality media. We spared no expense, and while I stayed in a $60/night hostel in SF for conferences, our rewritable DVDs were the best money could buy at the time lol.
Take a look at floppy disk controllers like the AppleSauce, Greaseweazle, and Kryoflux for preserving floppies by recording at the flux-transition level.
This seems to be how a lot of modern history is found.
I recently got to talk to a big-ish name in the Boston music scene, who republished one of his band's original 1985 demos after cleaning the signal up with AI. He told me that he found that tape in a bedroom drawer.
I remember at one point I browsed tuhs.org in an attempt to find the source code for the original B (the language predating C) compiler. I don't think it should be in the 4th edition. I still wonder if there's a copy somewhere. I know there are a few modern implementations, but it would be interesting to look at the original.
The 'B' compiler was written in the TMG (Transmogrifier) compiler-compiler.
https://github.com/amakukha/tmg
https://news.ycombinator.com/item?id=26722097
""Douglas McIlroy ported TMG to an early version of Unix. According to Ken Thompson, McIlroy wrote TMG in TMG on a piece of paper and "decided to give his piece of paper his piece of paper," hand-compiling assembly language that he entered and assembled on Thompson's Unix system running on PDP-7."
We are not worthy, friends. We are not worthy."
Tons of info, but not much source:
"The first B compiler was written by Ken Thompson in the TMG language around 1969. Thompson initially used the TMG compiler to create a version of B for the PDP-7 minicomputer, which generated threaded code. The B compiler was later rewritten in BCPL and cross-compiled on a GE 635 mainframe to produce object code, which was then re-written in B itself to create a self-hosting compiler. "
So... a B compiler would use GE 635/Multics as an OS.
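On the "threaded code" that the first PDP-7 B compiler generated: instead of emitting machine instructions, the compiler emits a table of addresses of small runtime routines, and a tiny dispatcher walks that table. A minimal, purely illustrative sketch of the idea; this is not B and bears no resemblance to Thompson's actual PDP-7 layout:

```python
# Minimal sketch of threaded code: the "compiled" program is a sequence
# of routine references plus operands, run by a trivial dispatch loop.

stack = []

def push(operand):
    stack.append(operand)

def add(_):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def print_top(_):
    print(stack[-1])

# "Compiled" form of: print(2 + 3)
program = [(push, 2), (push, 3), (add, None), (print_top, None)]

for routine, operand in program:    # the dispatcher
    routine(operand)                # prints 5
```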
Of course I found it, AFTER I hit post...
https://retrocomputingforum.com/t/b-a-simple-interpreter-com...
OT - Mastodon is seriously cool. If you haven't yet bothered, I suggest to everyone that you spend a bit of time exploring.
Mastodon is just a part of the larger Fediverse.
Yeah I guess I still haven't wrapped my mind around that other part.
I was on Mastodon for three years. I deleted my account. When I found out that Charlie Kirk was murdered, my second thought was "well, best create yet another filter on Mastodon so I don't have to watch people celebrate Charlie Kirk being murdered" and when I caught myself having that thought I realised that being on Mastodon was a net negative for my wellbeing.
(I didn't like the guy either, by the way, or at least I knew enough about him that I knew I have much better things to do than listen to him. There are more than a few people like that, all of whom I wish find some peace in their hearts, and none of whom I wish to come to any harm.)
Mastodon is packed to the brim with literal psychopaths and people pretending to be psychopaths for imaginary Internet points. It is not an experience I suggest for anyone who is neither of those things.
I'm on mathstodon.xyz (mastodon for maths) and haven't seen any of that. So I guess it's the people you subscribe to.
But I have the freedom to decide what I want to consume.
From early in my Mastodon journey I made it something of a rule to not follow anyone who doesn't CW politics, and to mute or block many of the accounts that post politics on main, unfiltered.
I don't need that many filters if people make good use of Subject lines (I do like to joke that CW is the short Welsh for Cwbject). It means I don't see a lot of "celebrities" in my feed that cross-post from one of the other sites and don't add CWs because their client or cross-poster doesn't support them, but that seems to be so much the better. It also often means I remove Boost privileges in my feeds from people that will boost stuff without CWs.
That sort of curation is a lot of little bits of work over years. I can definitely understand the feeling that the easiest way to catch up on that curation is to just quit. It's why I quit Twitter (when it was still Twitter). It's why I don't bother with BlueSky or Threads. Mastodon gives me enough curation tools and I've used them for long enough that I feel happy with Mastodon.
> I'm on mathstodon.xyz (mastodon for maths) and haven't seen any of that. So I guess it's the people you subscribe to.
I was on an automotive-focused instance. I did see a lot of that.
> But I have the freedom to decide what I want to consume.
As do I; I had the freedom to delete my account, thus avoiding the need for any active measures to make my life free of schizoposting.
TIL there's many Charlie Kirk fans amongst fans of cars?
If this is intended to accuse me of being a Charlie Kirk fan I can only conclude that you either did not read what I wrote (in which case, you should refrain from replying to it), or you are being dishonest on purpose.
I don't see how you made that leap. It was directly in reference to your reply:
> I was on an automotive-focused instance. I did see a lot of that.
They were more or less just rewriting what you wrote.
> They were more or less just rewriting what you wrote.
The literal opposite of what I wrote, actually, because "that" refers to "celebrating the murder of Charlie Kirk", an activity not much associated with fans of Charlie Kirk.
Thanks for the clarification.
TIL there's many Charlie Kirk haters amongst fans of cars?
Ouch, yeah, thanks for the explanation. Total reading comprehension failure on my part.
What makes it cool?
Well, I won't debate whether "cool" is the right word. But ideologically I think federation is better than centralization when it can be made to work in practical terms, and Mastodon works.
No "algorithm" shoving ads down your throat. Just a timeline of the accounts you follow with the posts in chronooogical order.
It shoves dark mode down your throat whether you want it or not. What could be cooler than that?
There's an option
Which puts it one up over Hacker News!
Really. Where? Unhelpful answers of the form "Just install extension X for browser Y and use it to run script Z" are all anyone has ever been able to suggest when I've asked this question elsewhere.
If you click on Preferences, it's under Appearance which is the first section you see. You can change the site theme to a light option. At least on Mastodon.social.
There's no Preferences button. If there is, they've either hidden it well, or it's not visible without a login.
There should simply be a button -- a conspicuous one -- that toggles the color scheme. It's trivial to add such a button. It doesn't need to be tied to a user ID; it doesn't even need to set a cookie. The fact that no such button exists is a choice someone made, a poor choice that disregards decades of human-machine interface research.
Failure to go full Karen about goofy things like this has made the Web a little worse for almost everyone in one way or another. So... there ya go.
I'm on two instances and both have a Preferences button.
Is it possible that your instance moved it away from the default place? What instance are you on, I can help you find what you need to click.
It's the link in the article. But I'm diverting the thread at this point and arguably making this a worse place by doing so. The other replies indicate that the question has been raised and is under consideration, which is all I can ask given that I'm not prepared to jump on Github and send them a PR myself. Thanks for your reply and the rest of the input people have offered!
Preferences shows up in the sidebar (that has the login and signup buttons) if you have an account.
Otherwise it's up to the instance admin to choose the default.
And the admin should probably use the automatic setting. There is a feature request for a user preference when not logged in.[1]
[1]: https://github.com/mastodon/mastodon/issues/30193
I really, really hope data can be recovered from this. I’ve read a bunch of the original sources, and such an ancient C would be especially interesting to study.
Very proud to have had this found at my University :-)
That is a big deal; I don't remember anything that old being available on tuhs.org.
Check here https://www.tuhs.org/Archive/Distributions/Research/
Will not be much different to the existing v5 source code, we can assume. https://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/...
This is amazing news for UNIX fans. Really hope the source can be recovered and put alongside the other historical UNIX source that's out there.
From 1973. See https://en.wikipedia.org/wiki/List_of_Unix_systems
Very interesting storage format too. Those tapes actually held quite a bit of data (comparatively), around 45 MB, although this one is shorter (~1000 ft) and probably carries about 10-15 MB, which is close to the combined size of V4's source code, binaries, and documentation.
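The capacity is easy to sanity-check from tape length and recording density, since a 9-track tape stores one byte per frame along its length. The 1600 bpi density below is my assumption (a common figure for '70s 9-track tape; 800 bpi NRZI was also widespread), and inter-record gaps, which eat a sizeable fraction of the tape, are ignored:

```python
# Rough 9-track capacity: bytes ~= recording density (frames per inch)
# * tape length in inches. 1600 bpi assumed; inter-record gaps ignored.

BPI = 1600
for length_ft in (2400, 1200, 1000):
    raw = BPI * length_ft * 12
    print(f"{length_ft} ft at {BPI} bpi -> ~{raw / 1e6:.0f} MB raw")
# 2400 ft -> ~46 MB, 1200 ft -> ~23 MB, 1000 ft -> ~19 MB
```

A full 2400 ft reel at 1600 bpi is where the "around 45 MB" figure comes from; a ~1000 ft reel gives ~19 MB raw, and once block gaps are subtracted the 10-15 MB usable estimate looks about right.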
What are the odds that a medium like that has successfully stored the full data without error?
From the information I've read, quite likely, given that Utah is pretty dry. Also the original data might be stored in its uncompressed form, so even if there were some non-extensive damage it might still be possible to recover some data based upon guessing with context (if it contains text source code, otherwise if it is just the binaries then not that easy).
For context, I'm a geezer who got early access reading Lions' Commentary (6th Edition) and comparing it with 7th Edition source (running on a PDP-11/something with no more than 128 KiB RAM). That was 1985, as Unix was spreading its way through universities. SIGSEGV haunts me to this day.
That was a full 40 years ago. And yet, 4th Edition is ancient history even to me.
Finally we can see the naughty stuff they recorded!
Other posts on this subject, none with discussion:
https://news.ycombinator.com/item?id=45846438
https://news.ycombinator.com/item?id=45844876
https://news.ycombinator.com/item?id=45842643
I've added https://oldbytes.space/@bitsavers/115505135441862982 and https://www.theregister.com/2025/11/07/unix_fourth_edition_t... to the toptext as well. Thanks!
Also this post from Rob Pike with interesting thread of a bit more information about tape recovery https://www.tuhs.org/pipermail/tuhs/2025-November/032758.htm...
Added up there too. Thanks!
https://news.ycombinator.com/item?id=45857695 has discussion
It does now. :-) It didn't when I posted earlier
I wrote that article, but I no longer post my stories to HN because I am subject to a block and anything of my own I submit is flagged [dead]. I do not know why, and I have written to ask with no reply.
I only write 5-10 articles a week so I don't exactly spam the site at high frequency, and if I don't feel I can add more context or insight to a story, I don't write it -- except at the very slowest times of the year, and I wouldn't post those stories here.
Ah well.
Anyway, since nobody much seems to realise this is quite a big deal, I will share the explainer I wrote yesterday:
https://www.theregister.com/2025/11/07/unix_fourth_edition_t...
Unix V4 is otherwise lost. It was the first version in C.