In the past, browsers used an algorithm which only denied setting wide-ranging cookies for top-level domains with no dots (e.g. com or org). However, this did not work for top-level domains where only third-level registrations are allowed (e.g. co.uk). In these cases, websites could set a cookie for .co.uk which would be passed onto every website registered under co.uk.
Since there was and remains no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain (the policies differ with each registry), the only method is to create a list. This is the aim of the Public Suffix List.
(https://publicsuffix.org/learn/)
So, once they realized web browsers are all inherently flawed, their solution was to maintain a static list of domain suffixes.
If you're going to host user content on subdomains, then you should probably have your site on the Public Suffix List (https://publicsuffix.org/list/). That should eventually make its way into various services so they know that a tainted subdomain doesn't taint the entire site.
I think it's somewhat tribal webdev knowledge that if you host user-generated content you need to be on the PSL; otherwise you'll eventually end up where Immich is now.
I'm not sure how people who haven't already hit this very issue are supposed to know about it beforehand, though. It's one of those things you don't really come across until you're hit by it.
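To make the mechanics concrete, here is a minimal sketch using the `psl` npm package (an assumption; any PSL implementation behaves the same way) to compute the registrable domain a consumer of the list would see:

```typescript
import psl from "psl";

// Today, ".cloud" is the public suffix, so every preview subdomain rolls
// up to the same registrable domain:
console.log(psl.get("pr-123.preview.internal.immich.cloud"));
// => "immich.cloud" -- a flag on one preview subdomain taints the whole site

// If "immich.cloud" itself were added to the PSL, it would become the
// public suffix, and list consumers would treat "internal.immich.cloud"
// as an independent registrable domain instead.
```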
This is the first time I've heard about https://publicsuffix.org
You're in good company! From 12 days ago: https://news.ycombinator.com/item?id=45538760
I've been doing this for at least 15 years and it's the first I've heard of this.
It's fun learning new things so often, but I'd never once heard of the Public Suffix List.
That said, I do know the other best practices mentioned elsewhere.
so it's a skill issue??? or just google being bad????
I will go with Google being bad / evil for 500.
Google of the 90s to 2010 is nothing like Google 2025. There is a reason they removed "Don't be evil"... being evil and authoritarian makes more money.
God I hate the web. The engineering equivalent of a car made of duct tape.
"The engineering equivalent of a car made of duct tape"
Kind of. But do you have a better proposition?
They aren't hosting user content; it was their pull request preview domains that were triggering it.
This is very clearly just bad code from Google.
The root cause is bad behaviour by google. This is merely a workaround.
Remember, this is a free service that Google is offering for even their competitors to use.
And it is an incredibly valuable thing. You might not think it is, but the internet is filled with utterly dangerous, scammy, phishy, malware-ridden websites, and every day Safe Browsing (via Chrome, Firefox, and Safari - yes, Safari uses Safe Browsing) keeps users safe.
If Immich didn't follow best practice, that's Google's fault? You're showing your naivety and bias here.
Please point me to where GoDaddy or any other hosting site mentions the public suffix list, or where Apple or Google or Mozilla publish a list of hosting best practices that includes avoiding false positives from Safe Browsing…
>You might not think it is, but the internet is filled with utterly dangerous, scammy, phishy, malware-ridden websites
Google is happy to take their money and show scammy ads. Google ads are the most common vector for fake software support scams. Most people google something like "microsoft support" and end up there. Has Google ever banned their own ad domains?
Google is the last entity I would trust to be neutral here.
Oh c’mon. Google does not offer free services. Everyone should know that by now.
I thought this story would be about some malicious PR that convinced their CI to build a page featuring phishing, malware, porn, etc. It looks like Google is simply flagging their legit, self-created Preview builds as being phishing, and banning the entire domain. Getting immich.cloud on the PSL is probably the right thing to do for other reasons, and may decrease the blast radius here.
Is that actually relevant when only images are user content?
Normally I see the PSL in context of e.g. cookies or user-supplied forms.
> Is that actually relevant when only images are user content?
Yes. For instance in circumstances exactly as described in the thread you are commenting in now and the article it refers to.
Services like Google's bad-site warning system may use it as a signal that they shouldn't consider a whole domain harmful when only a small number of its subdomains are, where otherwise they would. It is no guarantee, of course.
I think this is only true if you host independent entities. If you simply construct deep names about yourself, with a demonstrable chain of authority back to you, I don't think the PSL wants to know. Otherwise there is no hierarchy: the dots are just convenience strings, and it's a flat namespace the size of the PSL.
Does Google use this for Safe Browsing though?
Looks like it? https://developers.google.com/safe-browsing/reference/URLs.a...
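For the curious, here's a hedged sketch of a lookup against that API, with the request shape following Google's public v4 Lookup API docs at the link above; the API key and checked URL are placeholders:

```typescript
const API_KEY = "YOUR_API_KEY"; // placeholder

async function checkUrl(url: string) {
  const res = await fetch(
    `https://safebrowsing.googleapis.com/v4/threatMatches:find?key=${API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        client: { clientId: "example-client", clientVersion: "1.0" },
        threatInfo: {
          threatTypes: ["MALWARE", "SOCIAL_ENGINEERING"],
          platformTypes: ["ANY_PLATFORM"],
          threatEntryTypes: ["URL"],
          threatEntries: [{ url }],
        },
      }),
    },
  );
  // An empty object means no match; flagged URLs come back in a "matches" array.
  return res.json();
}
```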
This is not about user content, but about their own preview environments! Google decided their preview environments were impersonating... Something? And decided to block the entire domain.
Aw. I saw Jothan Frakes and briefly thought my favorite Starfleet first officer's actor had gotten into writing software later in life.
Be sure to see the team's whole list of Cursed Knowledge. https://immich.app/cursed-knowledge
The Postgres query parameters one is funny. 65k parameters is not enough for you?!
> PostgreSQL USER is cursed
> The USER keyword in PostgreSQL is cursed because you can select from it like a table, which leads to confusion if you have a table named user as well.
is even funnier :D
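For anyone who wants to see the curse in action, a small demonstration with node-postgres (connection details assumed via the usual PG* environment variables):

```typescript
import { Client } from "pg";

const client = new Client(); // assumes PG* env vars for the connection
await client.connect();

// The bare word "user" resolves to the SQL USER keyword, not your table:
const cursed = await client.query("SELECT * FROM user");
console.log(cursed.rows); // e.g. [ { user: 'postgres' } ] -- the current role

// To reach an actual table named "user", quote the identifier:
const intended = await client.query('SELECT * FROM "user"');
console.log(intended.rows);

await client.end();
```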
As it says, bulk inserts with large datasets can fail. Inserting a few thousand rows into a table with 30 columns will hit the limit. You might run into this if you were synchronising data between systems or running big batch jobs.
SQLite used to have a limit of 999 query parameters, which was much easier to hit. It's now a roomy 32k.
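The usual workaround, sketched below, is to size each batch so that rows × columns stays under the cap (65535 for Postgres, 32766 for recent SQLite):

```typescript
// Split rows into batches small enough for one multi-row INSERT each.
function chunkForInsert<T>(rows: T[], columns: number, maxParams = 65535): T[][] {
  const rowsPerBatch = Math.floor(maxParams / columns); // 30 columns -> 2184 rows
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += rowsPerBatch) {
    batches.push(rows.slice(i, i + rowsPerBatch));
  }
  return batches;
}
```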
Right, for Postgres I would use unnest for inserting a non-static number of rows.
COPY is often a usable alternative.
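A sketch of the unnest approach with node-postgres (the `photos` table and its columns are hypothetical): arrays are bound as one parameter per column, so the parameter count no longer scales with the number of rows.

```typescript
import { Client } from "pg";

async function bulkInsert(client: Client, rows: { id: string; name: string }[]) {
  // Two bind parameters total, regardless of rows.length.
  await client.query(
    `INSERT INTO photos (id, name)
     SELECT * FROM unnest($1::text[], $2::text[])`,
    [rows.map((r) => r.id), rows.map((r) => r.name)],
  );
}
```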
Insane that one company can dictate what websites you're allowed to visit. Telling you what apps you can run wasn't far enough.
I really don't know how they got nerds to think scummy advertising is cool. If you think about it, the thing they make money on is something no user actually wants or ever wants to see. Somehow Google has some sort of nerd cult where people think it's cool to join such an unethical company.
Turns out it's cool to make lots of money
The one thing I never understood about these warnings is how they don't run afoul of libel laws. They are directly calling you a scammer and "attacker". The same for Microsoft with their unknown executables.
They used to be more generic, saying "We don't know if it's safe", but now they are quite assertive in stating you are indeed an attacker.
This may not be a huge issue depending on mitigating controls, but are they saying that anyone can submit a PR (containing anything) to Immich, tag the PR with `preview`, and have the contents of that PR hosted on https://pr-<num>.preview.internal.immich.cloud?
Doesn't that effectively let anyone host anything there?
I think only collaborators can add labels on GitHub, so not quite. It does seem a bit hazardous though (you could submit a legit PR, get the label, and then commit whatever you want?).
Exposure also extends not just to the owner of the PR but to anyone with write access to the branch from which it was submitted. GitHub pushes are SSH-authenticated and often automated in many workflows.
Excellent idea for cost-free phishing.
There's a reason GitHub uses github.io for user content.
A friend / client of mine used some kind of WordPress-type hosting service with a simple redirect. The host got onto the bad-sites list.
This also polluted their own domain, even when the redirect was removed, and had the odd side effect that Google would no longer accept email from them. We requested a review and passed it, but the email blacklist appears to be permanent. (I already checked and there are no spam problems with the domain.)
We registered a new domain. Google’s behaviour here incidentally just incentivises bulk registering throwaway domains, which doesn’t make anything any better.
Wow. That scares me. I've been using my own domain for 25 years; it got (wrongly) blacklisted this week, and I can't imagine having email impacted on top of that.
Them maintaining a page of gotchas is a really cool idea - https://immich.app/cursed-knowledge
> There is a user in the JavaScript community who goes around adding "backwards compatibility" to projects. They do this by adding 50 extra package dependencies to your project, which are maintained by them.
This is a spicy one, would love to know more.
Tangential to the flagging issue, but is there any documentation on how Immich is doing the PR site generation feature? That seems pretty cool, and I'd be curious to learn more.
Pretty sure Immich is on github, so I assume they have a workflow for it, but gitlab has first-class support for this which I've been using for years: https://docs.gitlab.com/ci/review_apps/
It's open source, you can find this trivially yourself in less than a minute.
https://github.com/immich-app/devtools/tree/a9257b33b5fb2d30...
Wow. What a rude way to answer.
If you block those internal subdomains from search with robots.txt, does Google still whine?
I've heard anecdotes of people using an entirely internal domain like "plex.example.com": even if it's never exposed to the public internet, Google might flag it as impersonating Plex. Google will sometimes block a site based only on its name, if they think the name is impersonating another service.
It's unclear exactly what conditions cause a site to get blocked by Safe Browsing. My nextcloud.something.tld domain has never been flagged, but I've seen support threads of other people having issues, and the domain name is the best guess.
I'm almost positive GMail scanning messages is one cause. My domain got put on the list for a URL that would have been unknowable to anyone but GMail and my sister who I invited to a shared Immich album. It was a URL like this that got emailed directly to 1 person:
https://photos.example.com/albums/xxxxxxxx-xxxx-xxxx-xxxx-xx...
Then suddenly the domain is banned even though there was never a way to discover that URL besides GMail scanning messages. In my case, the server is public so my siblings can access it, but there's nothing stopping Google from banning domains for internal sites that show up in emails they wrongly classify as phishing.
Think of how Google and Microsoft destroyed self hosted email with their spam filters. Now imagine that happening to all self hosted services via abuse of the safe browsing block lists.
If it was just the domain: remember that there is a Certificate Transparency log for all TLS certs issued nowadays by valid CAs, which is probably also what Google is using to discover new active domains.
It doesn’t seem like email scanning is necessary to explain this. It appears that simply having a “bad” subdomain can trigger this. Obviously this heuristic isn’t working well, but you can see the naive logic of it: anything with the subdomain “apple” might be trying to impersonate Apple, so let’s flag it. This has happened to me on internal domains on my home network that I've exposed to no one. This also has been reported at the jellyfin project: https://github.com/jellyfin/jellyfin-web/issues/4076
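A naive guess at what such a heuristic might look like (an illustration only, not Google's actual logic):

```typescript
const BRANDS = ["apple", "paypal", "gmail", "plex", "nextcloud"];

function looksLikeImpersonation(host: string): boolean {
  // Crudely treat the last two labels as the registrable domain.
  const subLabels = host.split(".").slice(0, -2);
  return subLabels.some((label) => BRANDS.some((b) => label.includes(b)));
}

looksLikeImpersonation("plex.example.com");   // true -- the reported false positive
looksLikeImpersonation("photos.example.com"); // false
```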
Well, that's potentially horrifying. I would love for someone to attempt this in as controlled a manner as possible. I would assume it's possible for anyone using Google DNS servers to also trigger some type of metadata inspection resulting in this type of situation as well.
Also - when you say banned, you're speaking of the "red screen of death", right? Not a broader ban on the domain using Google Workspace services, yeah?
Chrome sends visited URLs to Google (YMMV depending on the settings and consents you have given).
This seems related to another hosting site that got caught out by this recently:
https://news.ycombinator.com/item?id=45538760
Not quite the same (other than being an abuse of the same monopoly) since this one is explicitly pointing to first-party content, not user content.
Regarding how Google Safe Browsing actually works under the hood, here is a good writeup from the Chromium team:
https://blog.chromium.org/2021/07/m92-faster-and-more-effici...
Not sure if this is exactly the scenario from the discussed article but it's interesting to understand it nonetheless.
TL;DR the browser regularly downloads a dump of color profile fingerprints of known bad websites. Then when you load whatever website, it calculates the color profile fingerprint of it as well, and looks for matches.
(This could be outdated and there are probably many other signals.)
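For a feel of what a color-profile fingerprint could look like, here is a purely conceptual sketch (not Chromium's implementation): reduce the rendered page's pixels to a coarse, normalized color histogram that can be matched against known-bad fingerprints.

```typescript
// rgba: raw pixel data, e.g. from CanvasRenderingContext2D.getImageData().data
function colorFingerprint(rgba: Uint8ClampedArray, buckets = 8): number[] {
  const hist: number[] = new Array(buckets ** 3).fill(0);
  const step = 256 / buckets;
  for (let i = 0; i < rgba.length; i += 4) {
    const r = Math.floor(rgba[i] / step);
    const g = Math.floor(rgba[i + 1] / step);
    const b = Math.floor(rgba[i + 2] / step);
    hist[(r * buckets + g) * buckets + b]++; // ignore alpha at i + 3
  }
  const pixelCount = rgba.length / 4;
  return hist.map((count) => count / pixelCount); // normalize out page size
}
```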
Is there any linkage to the semi-factoid that the Immich web GUI looks very like Google Photos, or is that just one of those coincidences?
Not a coincidence, Immich was started as a personal replacement for Google Photos.
I tried to submit this, but the direct link here is probably better than the Reddit thread I linked to:
https://old.reddit.com/r/immich/comments/1oby8fq/immich_is_a...
I had my personal domain I use for self-hosting flagged. I've had the domain for 25 years and it's never had a hint of spam, phishing, or even unintentional issues like compromised sites / services.
It's impossible to know what Google's black box is doing, but, in my case, I suspect my flagging was the result of failing to use a large email provider. I use MXRoute for locally hosted services and network devices because they do a better job of giving me simple, hard limits for sending accounts. That way, if anything I run ever gets compromised, the damage in terms of spam will be limited to (e.g.) 10 messages every 24h.
I invited my sister to a shared Immich album a couple days ago, so I'm guessing that GMail scanned the email notifying her, used the contents + some kind of not-google-or-microsoft sender penalty, and flagged the message as potential spam or phishing. From there, I'd assume the linked domain gets pushed into another system that eventually decides they should blacklist the whole domain.
The thing that really pisses me off is that I just received an email in reply to my request for review, and the whole thing is a gaslighting extravaganza: "Google systems indicate your domain no longer contains harmful links or downloads. Keep yourself safe in the future by..." blah blah blah.
Umm. No! It's actually Google's crappy, non-deterministic, careless detection that flagged my legitimate resources as malicious. Then I have to spend my time running it down and double-checking everything before submitting a request to have the false positive on Google's end fixed.
Convince me that Google won't abuse this to make self hosting unbearable.
> I suspect my flagging was the result of failing to use a large email provider.
This seems like the flagging was a result of the same login page detection that the Immich blog post is referencing? What makes you think it's tied to self-hosted email?
Wonder if there would be any way to redress this in small claims court.
google: we make going to the DMV look delightful by comparison!
They are not the government and should not have this vast monopoly power with no accountability and no customer service.
the government probably shouldn't either?
I think the other very interesting thing in the reddit thread[0] for this is that if you do well-known-domain.yourdomain.tld then you're likely to get whacked by this too. It makes sense I guess. Lots of people are probably clicking gmail.shady.info and getting phished.
0: https://old.reddit.com/r/immich/comments/1oby8fq/immich_is_a...
So we can't use photos or immich or images or pics as a sub-domain, but anything nondescript will be considered obfuscated and malicious. Awesome!
As someone who doesn't like Google and absolutely thinks they need to be broken up, no probably not. Google's algorithms around security are so incompetent and useless that stupidity is far more likely than malice here.
Incompetently or "coincidentally" abusing your monopoly in a way that "happens" to suppress competitors (while whitelisting your own sites) probably won't fly in court. Unless you buy the judge of course.
Intent does not always matter to the law ... and if a C&D is sent, doesn't that imply that intent is subsequently present?
Defamation laws could also apply independently of monopoly laws.
Callous disregard for the wellbeing of others is not stupidity, especially when demonstrated by a company ostensibly full of very intelligent people. This behavior - in particular, implementing an overly eager mechanism for damaging the reputation of other people - is simply malicious.