I have used perkeep. I still do at least in theory. I love the concept of it but it’s become… not quite abandonware, but it never gained enough traction to really take on a full life of its own before the primary author moved on. A bit of a tragedy because the basic idea is pretty compelling.
I evaluated it for a home server a few years ago and yeah, compelling in concept, but a system like this lives or dies by the quality of its integrations with other systems: the ability to automatically ingest photos and notes from your phone, or documents from your computer, or your tax returns from Dropbox.
A permanent private data store needs to have straightforward ways to get that data into it, and then search and consume it again once there.
I'm in the same boat. It's well designed, works great, and I really can't get it out of my head as a well-engineered project and a great idea.
But it really is nearly abandoned, and outside of the happy-path the primary author uses it for, it's desolate. There is no community around growing its usage, and pull requests have sat around for months before the maintainer replies. Which is fine if that's what the author wants (he's quite busy!), but disappointing to potential adopters. I've looked at using it, but with data types that sit outside the author's use case, and you'd really need to fork it and change code all over the repo to effectively use it. It just never hit the ideal of "store everything" it promises when it has hard-coded data types for indexing and system support.
(and yes, I did look at forking it and creating my own indexer, but some things just aren't meant to be)
> There is no community around growing its usage
I just added support for perkeep in Filestash last week (https://github.com/mickael-kerjean/filestash)
Looks nice, thanks!
I've been similarly half-interested in it for... more than a decade now. The new release (which is what I assume prompted this post) looks pretty impressive (https://github.com/perkeep/perkeep/releases/tag/v0.12).
The quality of code and reputation of the authors is excellent in this new release.
I’ve never looked at it before but this seems pretty solid, definitely worth keeping an eye on or testing.
I immediately thought about how this would be awesome if it worked with Tailscale - pretty complementary tech I think.
Why would this need to work with Tailscale? It just needs to be running on a machine in your tailnet to be accessible, what other integration is necessary?
Primarily using Tailscale for authentication as well, replacing perkeep's other auth methods.
It appears that it does integrate with Tailscale for auth (but not using tsidp via OIDC like I expected): https://perkeep.org/doc/server-config#simplemode
I'm a co-author of tsidp, btw. You don't need tsidp with a Tailscale-native app: you already know the identity of the peer. tsidp is useful for bridging from Tailscale auth to something that's unaware of Tailscale.
I use `tsnet` and `tsidp` heavily to safely expose a bunch of services to my client devices, they've been instrumental for my little self-hosted cloud of services. Thanks for building `tsidp` (and Perkeep!) :).
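For readers who haven't used these: here's a minimal sketch of the `tsnet` side, written from memory rather than from either project's docs (the hostname and handler are placeholders), showing how a Tailscale-native Go service already knows who's calling without any extra auth layer:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"tailscale.com/tsnet"
)

func main() {
	// The service joins the tailnet as its own node (hostname is a placeholder).
	s := &tsnet.Server{Hostname: "myservice"}
	defer s.Close()

	ln, err := s.Listen("tcp", ":80")
	if err != nil {
		log.Fatal(err)
	}

	lc, err := s.LocalClient()
	if err != nil {
		log.Fatal(err)
	}

	log.Fatal(http.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// WhoIs maps the connection's remote address to a tailnet identity,
		// so the app knows the peer without passwords, cookies, or OIDC.
		who, err := lc.WhoIs(r.Context(), r.RemoteAddr)
		if err != nil {
			http.Error(w, "unknown peer", http.StatusForbidden)
			return
		}
		fmt.Fprintf(w, "hello, %s\n", who.UserProfile.LoginName)
	})))
}
```

The WhoIs call is the point being made above: the tailnet itself tells you the peer's identity, so tsidp/OIDC is only needed when fronting software that doesn't speak Tailscale.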
I think @kamranjon means that, before this Tailscale-compatible release happened, they had thought about how cool it would be if it worked directly with Tailscale.
They released a new version today, the first release in 5 years. It looks like it was more or less dead until September.
Nice. I checked multiple times over the last few years to see whether the project was dead or not. I would love to use it, but it seemed to be rotting away.
That's not really a surprise: the website and documentation are awful and don't really sell the project well. I also get the impression that there isn't much customization possible, no integration of external stuff, just a monolithic blob doing something. This kind of software can't easily succeed without an open architecture, or at least documentation that sells how to adapt it to your own needs.
Kinda sad, as this looks interesting.
Reminds me of Timelinize https://news.ycombinator.com/item?id=45504973 https://github.com/timelinize/timelinize
Thanks for the mention!
Indeed, big fan of the idea of Perkeep, and its authors (I learned a lot about writing network code in Go from reading Brad Fitzpatrick's contributions).
Where Perkeep uses a super cool blob server design that abstracts the underlying storage, Timelinize keeps things simpler by just using regular files on disk and a sqlite DB for the index (and to contain small text items, so as not to litter your file system).
Perkeep's storage architecture is probably better thought out. Timelinize's is still developing, but in principle I prefer to keep it simple.
I'm also hoping that, with time, Timelinize will be more accessible to a broader, less-technical audience.
I don't really understand the goal here. It feels like "wouldn't it be nice if instead of organizing a library, we just kept all of the information in a giant unsorted pile of looseleaf paper?"
How is this better than a filesystem with automated replication?
The overview is very comprehensive: https://perkeep.org/doc/overview
Consider this example that he gives:
> If I take a bunch of photos, those don’t have filenames (or not good ones, and not unique). They just exist. They don’t need a directory or a name.
So how are you supposed to find anything? Sure, I take photos. Most of them aren't needed after they serve their immediate purpose, but I can't be bothered to delete them, or sort or name the ones that do have a longer purpose. But at least they are organized automatically by date. For permanence, OwnCloud archives them for me automatically, from where they get sucked into my regular backups.
Why would I want to toss them all into an even less-organized pile?
> [run] search queries over my higher-level objects. e.g. a “recent” directory of recent photos
How, exactly, are those search queries supposed to work? Sure, maybe date is retained in meta-info, but at best he is regaining the functionality he lost by tossing those pictures into a pile. If he is expecting actual image recognition, that could work anyway, without the pile.
> It would be nice if we were a bit more in control. At least, it would be nice if we had a reliable backup of all our content. Once we have all our content, it’s then nice to search it, view it, and directly serve it or share it out to others
Sure, and that's exactly what you achieve with OwnCloud (or NextCloud, or whatever).
As for reliable backups, that's a completely different issue, which still has to be solved separately. You have got to periodically copy your data to offline storage, or you don't have real backups.
Seriously, I'm just not seeing it...
> So how are you supposed to find anything?
They don't mean the photos can't have names. They just observe that in-camera photos usually don't have particularly useful names: IMG_4321.JPG, same as all the other IMG_4321.JPGs your camera has produced and will produce if it sees enough use.
Also, the storage doesn't address a blob (or photo) by its name, but by its hash / digest. You're welcome to store photo metadata alongside the hashes, perhaps even a good name if you care for one, in a database, on web pages, or whatever you use, if that makes it easier for you to retrieve the right photo. You probably should.
Storing and retrieving the content objects (cumbersome blobs) is then separate from the problem of remembering what is what (small data).
Splitting storage from retrieval is a powerful abstraction. You can then build retrieval indexes over whatever property you like, paying the O(N) pass over your data once and amortizing it across many queries.
Concretely, you could search by metadata (timestamp, geotag, which camera device, etc.) or by content (all photos of Joe, or photos with the Eiffel Tower in the background, or boating at dusk...). For the latter, you just need to process your corpus with a vision language model (VLM) and generate embeddings. This is not outlandish, by the way; there are already photo apps with this capability if you search a bit online.
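To make the storage/retrieval split concrete, here's a minimal sketch of the idea in Go. It illustrates the principle only, not Perkeep's actual schema or API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// BlobStore addresses content only by digest; nothing human-meaningful
// lives here.
type BlobStore map[string][]byte // digest -> content

func (s BlobStore) Put(content []byte) string {
	sum := sha256.Sum256(content)
	ref := "sha256-" + hex.EncodeToString(sum[:])
	s[ref] = content // idempotent: same content always yields the same ref
	return ref
}

func (s BlobStore) Get(ref string) []byte { return s[ref] }

// Index maps searchable properties (dates, people, tags...) to blob refs.
// It's derived data: losing it costs a rescan of the store, never content.
type Index map[string][]string // property -> refs

func main() {
	store := BlobStore{}
	index := Index{}

	ref := store.Put([]byte("...jpeg bytes..."))
	index["year:2024"] = append(index["year:2024"], ref)
	index["person:joe"] = append(index["person:joe"], ref)

	for _, r := range index["person:joe"] {
		fmt.Println(r, len(store.Get(r)), "bytes")
	}
}
```

Because the index is rebuildable, new search capabilities (dates today, VLM embeddings tomorrow) can be bolted on later without touching the blobs themselves.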
> If I take a bunch of photos, those don’t have filenames (or not good ones, and not unique). They just exist. They don’t need a directory or a name.
At least all the photos I take have a date and place attached to them. That is usually all the info I need to find them.
This desperately needs to be on the main page to explain what this actually does, and not buried under "Docs", which isn't at all where I would expect to find this kind of thing.
Seriously. They should just straight up replace the front page with this.
Yeah, I think you perfectly nailed why this is kind of pointless. Better to split this thing into two functions -- file organization and backup -- because that second thing is easily solved.
And here I'm still looking for a way, with one click, to create an offline backup of the webpages each of my bookmarks points to. Such that the offline version looks and works exactly like the online version in (say) Google Chrome (e.g. the CTRL+F feature works fine). And such that I can use some key-combo and click a bookmark in my bookmarks manager (in Chrome) to open a webpage from the backup (or the backup can have its own copy of the bookmarks manager... it needs a catalog of some sort or it won't be useful).
Have you tried ArchiveBox https://github.com/ArchiveBox/ArchiveBox ? It's a pretty solid implementation of that pattern.
I love ArchiveBox but the headless Chromium they use has some annoying "will break randomly and GFL trying to figure out why/how to fix it" problems (like it'll just randomly stop working because the profile is locked except the lock file isn't there and even if you tweak things to make 100% sure the profile lock is removed before and after every archive request, it'll still randomly fail on a locked profile and WHAT THE HELL IS GOING ON?!)
Although, to be fair, running it in Docker seems less fraught and breaks less often (and it's a lot easier to restart when it does break.)
(I've got a pipeline from Instapaper -> {IFTTT -> {Pinboard -> Linkhut, Dropbox, Webhook -> ArchiveBox}} which works well most of the time for archiving random pages. Used to be Pocket until Mozilla decided to be evil.)
https://github.com/karakeep-app/karakeep
https://github.com/gildas-lormeau/SingleFile
I used SingleFile for a while but now I've switched to WebScrapBook because a lot of the pages that I save have the same images. Then I run rdfind to hard link all the identical files and save space.
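For anyone curious what that dedup step amounts to, here's a rough Go sketch of the hash-and-hardlink technique (in practice just use rdfind; the directory name here is made up):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

// hashFile returns the SHA-256 digest of a file's contents.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	seen := map[string]string{} // content hash -> first path seen
	root := "archives"          // placeholder: directory of saved pages

	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || !d.Type().IsRegular() {
			return err
		}
		sum, err := hashFile(path)
		if err != nil {
			return err
		}
		if first, ok := seen[sum]; ok {
			// Identical content: replace this copy with a hard link
			// to the first one, so the bytes are stored only once.
			if err := os.Remove(path); err != nil {
				return err
			}
			return os.Link(first, path)
		}
		seen[sum] = path
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```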
Thanks. I've tried SingleFile. I made some backups using the Chrome Extension. I was unable to open them a couple of years later. So I abandoned it.
Will try karakeep.
Author of SingleFile here. Sorry, this is obviously not normal. Please feel free to report any bugs here https://github.com/gildas-lormeau/SingleFile/issues.
Anecdotally (not to diminish any bug the parent had), SingleFile is one of my favorite extensions. Been using it for years and it's saved my ass multiple times. Thank you!
Edit: What's the best way to support the project? I'm seeing there's an option through the Mozilla store and through GitHub. Is there a preference?
Thank you also for the kind words! Regarding support, you can choose whichever method you prefer; it makes no difference to me, actually.
I've been using SingleFile for five years and I've never had this issue, for what it's worth. I keep a directory called Archives on my Synology that I expose with copyparty, and I routinely back up web pages and then drop the result into my copyparty instance for safekeeping.
I would look into what happened with the SingleFile copies you made that didn't work, because that is highly unusual.
I have SingleFile configured to post full archives to Karakeep with an HTTP POST; this enables archiving pages from my browser that Karakeep cannot scrape and bookmark due to paywalls or bot protection.
https://docs.karakeep.app/guides/singlefile/
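For the curious, the handoff amounts to an authenticated multipart POST of the saved HTML. The sketch below is illustrative only; the endpoint path, form field name, and auth scheme are assumptions on my part, and in practice you configure this in SingleFile's extension options (per the guide above) rather than in code:

```go
package main

import (
	"bytes"
	"log"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	// A page previously saved by SingleFile (placeholder filename).
	page, err := os.ReadFile("saved-page.html")
	if err != nil {
		log.Fatal(err)
	}

	// Build a multipart body containing the archive. The field name
	// "file" is an assumption; check the Karakeep guide for the real one.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, _ := w.CreateFormFile("file", "saved-page.html")
	part.Write(page)
	w.Close()

	// Endpoint path and bearer-token auth are assumptions as well.
	req, _ := http.NewRequest("POST",
		"https://karakeep.example/api/v1/bookmarks/singlefile", &body)
	req.Header.Set("Content-Type", w.FormDataContentType())
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```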
Thanks for mentioning it, I was about to hack something together myself.
Also works with Linkding
On Firefox, but I still feel the need to reply. You might find it handy, or other readers here might like it. Maybe it's also available for Chrome, I don't know.
I've been using an extension called WebScrapBook to locally save copies of interesting webpages. I use the basic functionality, but it comes with tons of options and settings.
FWIW I've had success with self-hosted [LinkDing](https://github.com/sissbruecker/linkding) and the firefox SingleFile plugin (so it archives what I'm seeing / gets around logins etc). LinkDing also links directly to Internet Archive for any URL.
I happened upon a bit of an unconventional approach to this with Zotero. It’s obviously more focused on academic research but it takes snapshots and works as a more general purpose archive tool really well.
WebRecorder [0] is the best implementation of this that I've tested. It runs as an extension in your browser, intercepting HTTP streams, so as long as you open a page in your browser, the data is captured to reproduce it exactly. It outputs WARC files that are (in theory) compatible with the rest of the web archiving ecosystem, and has a WARC explorer interface to browse captured archives.
For pages with dynamic content that can't be trivially reproduced from their HTTP streams (e.g., opening the archive triggers GETs with a mismatched timestamp, even if the file it's looking for is in the WARC under a different URI), there's always SingleFile [1], and Chromium's built-in MHTML Ctrl+S export, which "bake" the content into a static page.
0: https://chromewebstore.google.com/detail/webrecorder-archive...
1: https://github.com/gildas-lormeau/SingleFile
No options?
Previously:
Keep Your Stuff, for Life - https://news.ycombinator.com/item?id=23676350 - June 2020 (109 comments)
Perkeep: personal storage system for life - https://news.ycombinator.com/item?id=18008240 - Sept 2018 (62 comments)
Perkeep – Open-source data modeling, storing, search, sharing and synchronizing - https://news.ycombinator.com/item?id=15928685 - Dec 2017 (105 comments)
They've been around for 8 years and are still on 0.12?!
What's wrong with that? That seems like more than one release per year, and all roughly compatible with each other.
They just released 0.12 today or yesterday (5 years to the day), which is probably a reason the project is on HN.
I wish bradfitz had more time to work on it.
Well good news, he's writing the latest commits
I've worked on and off on my own personal system which leaves the filesystem stuff to filesystems, and focuses on verifying backups/mirrors and recursing into archive formats. Also interested in warning of near-obsolete formats, like my collection of RealAudio files that are hard to decode these days.
Interesting idea. Pretty timely as I recently started working (again) on a concept cross-platform "superapp" and have been trying to think of a decent state/storage sync solution.
I just use syncthing. Works well. A bit wasteful, but I have many things in syncthing in triplicate. (Phone, laptop, desktop).
I've been using syncthing for a few years myself and it's been great, except for when conflicts occur in my org files, which are the primary things I use it to keep synced. Perkeep may make that a complete non-issue, though I'm not 100% certain.
Beyond that, though, I'm thinking this would be nice for syncing state for a cross-platform app whose multiple incarnations stay in sync to a decent extent wherever they run. I'd just need to create a Perkeep client library for the language it's in (Python).
I think many of us build the same idea nowadays with many different tools and services. It's become the "project car" of tech enthusiasts. But it's complicated and subjective enough that I guess it can't be abstracted down this way. We'd need some common platform, something like what Synology was vaguely going for.
First new release in 5 years?
There seem to be a lot of folks who'd want this, but are hesitant because of (a) there not being more people using it or (b) there not being more releases.
This is strange in the sense that (a) didn't stop the Linux kernel from becoming more popular - if the tool satisfies the itch, use it, otherwise not. And the lack of releases could be fine if the bugs reported are minor.
Is the tool robust (no data loss)?
What has stopped other folks on here from, e.g., writing more importers (if that is the main shortcoming)?
edit: typo corrected
>This is strange in the sense that (a) didn't stop the Linux kernel from becoming more popular
I think this is a strange comparison. "I'm going to use this system to store all my digital stuff, and it's 1991" is altogether different from "I'm going to use this system to store all my digital stuff, and it's 2025".
I feel like there have been a number of attempts in this content addressed space and that nobody has gotten it quite right, not that the underlying idea is unsound.
At first glance, this looks like way too much to trust in the long run. I've been using git-annex for roughly 10 years to archive files I don't want to lose. It does everything I want and is pretty simple for what it gives me: a checksum for every file, per-file replication, and it doesn't dictate the underlying filesystem I use. Full syncs are rather slow, but in reality it doesn't really matter whether I have to wait 3 hours or 2 days; I just let it run in the background and do its thing.
I was looking at various options to archive my data (photos, documents, code), and had been looking at Perkeep for a while, but then started using Git-Annex.
However, I regret this decision. Git-Annex is no longer usable on my data because the number of files has grown so much (millions) and Git-Annex is just too slow (some Git operations take minutes or even hours, and the FS is decently fast). I assume I would not have had those problems with Perkeep.
Do you back up your .git artefacts? Is it even optimal? Sounds like an interesting idea.
I like this... right now I'm using a Raspberry Pi 3 or 4 as a file server and it seems to mostly work?
What kind of storage are you using? (SSD, CompactFlash, etc.)
Can it be used with AI to create your personal context?
(Sorry for the shameless self-promotion.) I'm building an app _conceptually similar_, but with an AI on top, so you get a chat/assistant with your personal context. https://github.com/superegodev/superego (Warning: still in alpha.)
This looks fantastic!
I've been thinking about building a similar application for a while now, and you gave me some great ideas.
Will try it out today.
Like a NAS?
"Blob servers" are essentially leverage cloud provider like AWS/Azure/GCP, not sure how this will help making "your data is entirely under your control".