Ask HN: Our AWS account got compromised after their outage

386 points by kinj28 a day ago

Could there be any link between the two events?

Here is what happened:

Some 600 instances were spawned within 3 hours before AWS flagged it and sent us a health event. Numerous domains were verified, and we could see that an SES quota increase request was made.

We are still investigating the vulnerability on our end. Our initial suspect list has two entries: an API key, or console access where MFA wasn't enabled.

timdev2 a day ago

I would normally say that "That must be a coincidence", but I had a client account compromise as well. And it was very strange:

Client was a small org, and two very old IAM accounts suddenly had recent (yesterday) console logins and password changes.

I'm investigating the extent of the compromise, but so far it seems all they did was open a ticket to turn on SES production access and increase the daily email limit to 50k.

These were basically dormant IAM users from more than 5 years ago, and it's certainly odd timing that they'd suddenly pop on this particular day.

tcdent a day ago

Smells like a phishing attack to me.
Receive an email that says AWS is experiencing an outage. Log into your console to view the status, authenticate through a malicious wrapper, and compromise your account security.
- SoftTalker a day ago
  
  Good point. Phishers would certainly take advantage of a widely reported outage to send emails related to "recovering your services."
  Even cautious people are more vulnerable to phishing when the message aligns with their expectations and they are under pressure because services are down.
  Always, always log in through bookmarked links or typing them manually. Never use a link in an email unless it's in direct response to something you initiated and even then examine it carefully.
  - Sebb767 13 hours ago
    
    > Always, always log in through bookmarked links or typing them manually. Never use a link in an email unless it's in direct response to something you initiated and even then examine it carefully.
    If you still want to avoid the discomfort of typing in stuff manually or navigating the web interface, logging in on a new tab and then clicking on the link is also an option.
    
    morkalork 13 hours ago
    
    Mini-rant here but I hate how websites for SaaS products are so over-optimized for the sales funnel. It's like a giant blue button to sign up, teeny tiny link to log in, if there is even one at all on any of the main pages. Often your access is on an entirely different subdomain that barely ranks on Google. If it's something that "just works" and you only access every 6 months, it's a pain to go hunting through your email to rediscover if it's clients.example.com, portal.example.com, or whatever the heck it is.
    
    rtkwe 12 hours ago
    
    I hate how many sites do that, always a signup first then a small little "Already have an account" link below that. Feels almost hostile to your existing users.
  - roblabla a day ago
    
    You can also use phishing-resistant login/2FA like passkeys/FIDO keys, where it is available (and I'm pretty sure Amazon supports it), to minimize the risk of accidentally logging in to a phishing website while under pressure.
    
    akerl_ a day ago
    
    If my memory is correct, AWS supports FIDO for web login but not for the API, so you either have to restrict access to FIDO and then use the web UI for everything done as that user, or have a separate non-FIDO MFA device (without FIDO's phishing resistance) for terminal/API interactions.
    
    jorvi a day ago
    
    You can generate temporary AWS keys for privileged users: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credenti...
    Of course, as always, PEBKAC. You will have to strictly follow protocol, and not every team is willing to jump through annoying hoops every day.
    
    akerl_ a day ago
    
    Can you actually generate temporary AWS STS credentials via FIDO MFA?
    Again, last I looked, FIDO MFA credentials cannot be used for API calls, which you'd need to make for STS credential generation.
    
    jorvi a day ago
    
    You don't put the temporary credentials behind FIDO because they're temporary anyway. You put FIDO on the main account that has the privilege to generate the temporary credentials.
    So on the off chance that you get a phishing mail, you generate temporary credentials to take whatever actions it wants, attempt to log in with those credentials, get phished, but they only have access to the API for 900s (or whatever you put as the timeout, 900s is just the minimum).
    900s won't stop them from running amok, but it caps the amok at 900s.
    
    akerl_ a day ago
    
    You aren't grokking what I'm saying. AWS does not allow FIDO2 as an MFA method for API calls.
    So if your MFA device for your main account is a FIDO2 device, you either:
    1. Don't require MFA to generate temporary credentials. Congrats, your MFA is now basically theater.
    2. Do require MFA to generate temporary credentials. Congrats, the only way to generate temporary credentials is to instead use a non-FIDO MFA device on the main account.
    Nobody is getting a phishing email, going to the terminal, generating STS credentials, and then feeding those into the phish. The phish is punting them to a fake AWS webpage. Temporary credentials are a mitigation for session token theft, not for phishing.
    
    computerfriend 19 hours ago
    
    I think you're not grokking it.
    Require FIDO2-based MFA to log into AWS via Identity Center, then run aws sso login to generate temporary credentials which will be granted only if the user can pass the FIDO2 challenge.
    The literal API calls aren't requesting a FIDO2 challenge each time, just like the console doesn't require it for every action. It's session based.
    
    akerl_ 14 hours ago
    
    I definitely wasn’t grokking that, because the prior commenter never mentioned AWS Identity Center, and instead linked to STS, which works how I described (you can’t use FIDO MFA for the authentication of the call that gives you your short-lived session creds).
    I’m excited to see that Identity Center supports FIDO2 for this use case.
    
    jorvi 11 hours ago
    
    You weren't grokking it because I was hasty (and tired) and provided the wrong link. My bad!
    
    SoftTalker a day ago
    
    They probably support it but how many accounts have not configured it? I'd bet it's a lot.
  - plaidfuji a day ago
    
    What if the outage and phishing attack were coordinated at a higher level? There’s a scary thought.
    
    BikiniPrince a day ago
    
    Bezos will get to Mars at any cost!
  - Scoundreller a day ago
    
    A phisher that did their homework would send out a tone deaf email with a subject line like this that aws sent me during their outage:
    > You could win $5,000 in AWS credits at Innovate
- timdev2 a day ago
  
  These were accounts that shouldn't have had console access in the first place, and were never used by humans to log in AFAICT. I don't know exactly what they were originally for, but they were named like "foo-robots", and were very old.
  At first I thought maybe some previous dev had set passwords for troubleshooting, saved those passwords in a password manager, and then got owned all these years later. But that's really, really, unlikely. And the timing is so curious.
  - portaouflop a day ago
    
    Why keep accounts like this around anyway? Sounds like a breach was just waiting to happen…
    
    Avicebron a day ago
    
    A cost center like security? Are you crazy..
- jbverschoor 19 hours ago
  
  Or maybe it wasn't DNS, but they simply pulled the plug bc of some breach?
- highfrequencyy a day ago
  
  I second this, pretty much immediately after my organization got hit with a wave of phishing emails.
LeonardoTolstoy a day ago

Almost this exact thing happened to me about a year ago. Very old account login, SES access with request to raise the email limit. We were only quickly tipped off because they had to open a ticket to get the limit raised.
If you haven't, check newly made Roles as well. We quashed the compromised users pretty quickly (including my own, the origin we figured out), but got a little lucky because I just started cruising the Roles and killing anything less than a month old or with admin access.
To play devil's advocate a bit: in our case we are pretty sure my key actually did get compromised, although we aren't precisely sure how (probably a combination of me being dumb and my org being dumb and some guy putting two and two together). But we did trace the initial users being created to nearly a month prior to the actual SES request. It is entirely possible whoever did your thing had you compromised for a bit, and then once AWS went down they decided that was the perfect time to attack, when you might not notice just-another-AWS-thing happening.
- timdev2 9 hours ago
  
  Thanks for sharing. After digging in, it appears that something very similar happened here, after all. It looks like an access key with admin role leaked some time ago. At first, they just ran a quiet GetCallerIdentity, then sat on it. Then, on outage day, they leveraged it. In our case, they just did the SES thing, and tried to persist access by setting up IAM Identity Center.
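The role sweep described above (anything less than a month old, or anything with admin access) can be sketched roughly as follows. This is only an illustration: the input shape (dicts with `RoleName`, `CreateDate`, `AttachedPolicies`) and the function name `roles_to_review` are assumptions made for the example, not the output of any particular AWS call, and the sample roles are made up.

```python
from datetime import datetime, timedelta, timezone


def roles_to_review(roles, now, max_age_days=30, admin_policy="AdministratorAccess"):
    """Return (role name, reasons) for roles that are new or carry admin access.

    `roles` is a hypothetical, already-collected inventory: one dict per role
    with its creation time and the names of its attached managed policies.
    """
    cutoff = now - timedelta(days=max_age_days)
    flagged = []
    for role in roles:
        reasons = []
        if role["CreateDate"] > cutoff:
            reasons.append("created recently")
        if admin_policy in role["AttachedPolicies"]:
            reasons.append("has admin access")
        if reasons:
            flagged.append((role["RoleName"], reasons))
    return flagged


# Made-up inventory for illustration only.
UTC = timezone.utc
NOW = datetime(2000, 3, 1, tzinfo=UTC)
SAMPLE_ROLES = [
    {"RoleName": "old-deploy", "CreateDate": datetime(1999, 1, 1, tzinfo=UTC),
     "AttachedPolicies": ["ReadOnlyAccess"]},
    {"RoleName": "new-helper", "CreateDate": datetime(2000, 2, 20, tzinfo=UTC),
     "AttachedPolicies": []},
    {"RoleName": "old-admin", "CreateDate": datetime(1998, 6, 1, tzinfo=UTC),
     "AttachedPolicies": ["AdministratorAccess"]},
]
```

A flagged role is a prompt for a human to look at it, not proof of compromise; legitimate roles can be new or privileged too.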
orblivion 9 hours ago

I wonder if a few cases of compromise right after the outage can also be a coincidence. If we have a lot of reports of the same, then it gets interesting.
(The particulars of your case being strange is a separate question though.)

CaptainOfCoit a day ago

Is it possible that people who already managed to get access (that they confirmed) have been waiting for any hiccups in AWS infrastructure in order to hide among the chaos when it happens? So maybe the access token was exposed weeks/months ago, but instead of going ahead directly, they idle until there is something big going on.

Certainly feels like a strategy I'd explore if I was on that side of the aisle.

iainctduncan a day ago

Absolutely. I'm in diligence and we are hearing about attackers even laying the ground work and then waiting for company sales. The sophisticated ones are for sure smart enough to take advantage of this kind of thing and to even be prepping in advance and waiting for golden opportunities.
jinen83 a day ago

I am from the same team & I can concur with what you are saying. About 2 years ago I did see a warning, in an email from some random person, about the same key that was used in today's exploit, but there was no exploitation till yesterday.
- LeonardoTolstoy a day ago
  
  This is it. I had the same thing happen to me a year ago and there was a month between the original access to our system and the attack. And similarly they waited until a perceived lull in what might be org diligence (just prior to thanksgiving) to attack.
shadowpho a day ago

Wouldn’t this be a terrible time because everyone is looking/logging into AWS?
If my company used AWS I would be hyper aware about anything that it’s doing right now
- LorenPechtel a day ago
  
  I think the idea is that after an outage you would expect unusual patterns and thus not be sensitive to them.
- CaptainOfCoit 11 hours ago
  
  > Wouldn’t this be a terrible time because everyone is looking/logging into AWS?
  Yes and no I suppose, it has trade-offs. On one hand, what you're saying is true for sure. But on the other hand, if you're currently trying to rescue a failing service, come across something that looks weird and you have a hunch you should investigate, but you're in the middle of fire-fighting, maybe you're more likely to ignore it at least until the fire's been put out?
- djeastm 9 hours ago
  
  Might be, but also could be the opposite. With people's heads swimming just to get back online they might de-prioritize something else that just looks odd, where in normal times they'd have the time/energy to go investigate.

sousastep a day ago

couple folks on reddit said while they were refreshing during the outage, they were briefly logged in as a whole different user

gwbas1c a day ago

Years ago I worked for a company where customers started seeing other customers' data.
The cause was that a bad hire decided to do a live debugging session in the production environment. (I stress bad hire because after I interviewed them, my feedback was that we shouldn't hire them.)
It was kind of a mess to track down and clean up, too.
__turbobrew__ a day ago

Maybe DynamoDB was inconsistent for a period and, as that backs IAM, credentials were scrambled? Do you have references to this, because if it is true that is really really bad.
- aeyes 12 hours ago
  
  AWS IAM doesn't use or depend on DynamoDB
afandian a day ago

Got references? This is crazy.
- blast a day ago
  
  I saw a link to https://old.reddit.com/r/webdev/comments/1obtbmg/aws_site_re... at one point but then it was deleted
  - perpil a day ago
    
    This is not about the AWS Console. It is talking about the customer's site hosted on CloudFront. It is possible to cross wires with user sessions when using CloudFront if you haven't set caching granular enough to be specific to an end user. This scenario is customer error, not AWS.
    
    fulafel 19 hours ago
    
    I'd argue it's a classic footgun and a flaw of CloudFront (they should at least warn about it much more).
  - CodesInChaos 16 hours ago
    
    electricity_is_life's comment on reddit seems to explain it:
    > Not sure if this is what happened to you, but one thing I ran into a while back is that even if you return Cache-Control: no-store it's still possible for a response to be reused by CloudFront. This is because of something called a "collapse hit" where two requests that occur at the same time and are identical (according to your cache key) get merged together into a single origin request. CloudFront isn't "storing" anything, but the effect is still that a user gets a copy of a response that was already returned to a different user.
    > https://stackoverflow.com/a/69455222
    > If your app authenticates based on cookies or some other header, and that header isn't part of the cache key, it's possible for one user to get a response intended for a different user. To fix it you have to make sure any headers that affect the server response are in the cache key, even if the server always returns no-store.
    ---
    Though the AWS docs seem to imply that no-store is effective:
    > If you want to prevent request collapsing for specific objects, you can set the minimum TTL for the cache behavior to 0 and configure the origin to send Cache-Control: private, Cache-Control: no-store, Cache-Control: no-cache, Cache-Control: max-age=0, or Cache-Control: s-maxage=0.
    https://docs.aws.amazon.com/AmazonCloudFront/latest/Develope...
    
    phyzome 13 hours ago
    
    Collapse-hits... hadn't thought about those in years. Brought back some trauma.
  - duk3luk3 a day ago
    
    This isn't about an aws account, this is about the auth inside the project that user is running.
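The collapse-hit failure mode quoted above can be shown with a toy model. This is a minimal sketch under stated assumptions, not CloudFront's actual implementation: `cache_key`, `serve_concurrent`, and `origin` are hypothetical names, and the "edge" here simply shares one origin response among all simultaneous requests that map to the same cache key.

```python
def cache_key(request, key_headers):
    """Build a cache key from the path plus the listed request headers."""
    return (request["path"],) + tuple(
        request["headers"].get(h, "") for h in key_headers
    )


def serve_concurrent(requests, key_headers, origin):
    """Simulate requests that arrive at the same instant at an edge which
    collapses identical in-flight requests into a single origin fetch."""
    in_flight = {}
    responses = []
    for req in requests:
        key = cache_key(req, key_headers)
        if key not in in_flight:
            # Only the first request for a given key reaches the origin.
            in_flight[key] = origin(req)
        # Every request with the same key gets that same response.
        responses.append(in_flight[key])
    return responses


def origin(request):
    """Hypothetical origin that personalises the response from the cookie."""
    return "dashboard for " + request["headers"]["Cookie"]


ALICE = {"path": "/me", "headers": {"Cookie": "session=alice"}}
BOB = {"path": "/me", "headers": {"Cookie": "session=bob"}}

# Cookie not in the cache key: both requests collapse, Bob sees Alice's page.
LEAKY = serve_concurrent([ALICE, BOB], [], origin)
# Cookie in the cache key: the requests stay separate.
SAFE = serve_concurrent([ALICE, BOB], ["Cookie"], origin)
```

In this model nothing is ever stored after the requests complete, which matches the quoted point that the header driving the response has to be part of the cache key regardless of what the origin says about storage.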
CaptainOfCoit a day ago

> couple folks on reddit said while they were refreshing during the outage, they were briefly logged in as a whole different user
Didn't ChatGPT have a similar issue recently? Would sound awfully similar.
- sunaookami a day ago
  
  Steam also had this, classic caching issue.
  - mbo a day ago
    
    This happened to me on Twitter maybe like, 9 years ago? What's the mechanism of action that causes this to happen?
    
    howinator a day ago
    
    The easiest way to do this is to misconfigure your CDN so that it caches set-cookie headers.
TZubiri 21 hours ago

A security incident like this would dwarf a partial unavailability of services.
liviux a day ago

A friend of a friend knows a friend who logged in to Netflix root account. Source: trust me bro

jmward01 21 hours ago

If I were an attacker I would choose when to attack, and a major disruption that leaves your logging in chaos seems like it could be a good time. Is it possible you had been compromised for a while and they took that moment to take advantage of it? Or, similarly, they took that moment to use your resources for a different attack that was spurred by the outage?

ThreatSystems a day ago

Cloudtrail events should be able to demonstrate WHAT created the EC2s. Off the top of my head I think it's the runinstance event.

ThreatSystems a day ago

I'm officially off of AWS so don't have any consoles to check against, but back on a laptop.
Based on docs and some of the concerns about this happening to someone else, I would probably start with the following:
1. Check who/what created those EC2s[0] using the console to query: eventSource:ec2.amazonaws.com eventName:RunInstances
2. Based on the userIdentity field, query the following actions.
3. Check if someone manually logged into Console (identity dependent) [1]: eventSource:signin.amazonaws.com userIdentity.type:[Root/IAMUser/AssumedRole/FederatedUser/AWSLambda] eventName:ConsoleLogin
4. Check if someone authenticated against Security Token Service (STS) [2]: eventSource:sts.amazonaws.com eventName:GetSessionToken
5. Check if someone used a valid STS Session to AssumeRole: eventSource:sts.amazonaws.com eventName:AssumeRole userIdentity.arn (or other identifier)
6. Check for any new IAM Roles/Accounts made for persistence: eventSource:iam.amazonaws.com (eventName:CreateUser OR eventName:DeleteUser)
7. Check if any already vulnerable IAM Roles/Accounts modified to be more permissive [3]: eventSource:iam.amazonaws.com (eventName:CreateRole OR eventName:DeleteRole OR eventName:AttachRolePolicy OR eventName:DetachRolePolicy)
8. Check for any access keys made [4][5]: eventSource:iam.amazonaws.com (eventName:CreateAccessKey OR eventName:DeleteAccessKey)
9. Check if any production / persistent EC2s have had their IAMInstanceProfile changed, to allow for a backdoor using EC2 permissions from a webshell/backdoor they could have placed on your public facing infra. [6]
etc. etc.
But if you have had a compromise based on initial investigations, it is probably worthwhile getting professional support to do a thorough audit of your environment.
[0] https://docs.aws.amazon.com/awscloudtrail/latest/userguide/c...
[1] https://docs.aws.amazon.com/awscloudtrail/latest/userguide/c...
[2] https://docs.aws.amazon.com/IAM/latest/UserGuide/cloudtrail-...
[3] https://docs.aws.amazon.com/awscloudtrail/latest/userguide/s...
[4] https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credenti...
[5] https://research.splunk.com/sources/0460f7da-3254-4d90-b8c0-...
[6] https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_R...
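For anyone working from exported CloudTrail log files rather than the console, the checklist above could be roughed out like this. It is a sketch, not a vetted tool: it assumes the records are already loaded as Python dicts with the usual `eventSource`, `eventName`, `eventTime` and `userIdentity` fields, the function name `triage` is made up, and the sample records are invented placeholders.

```python
from collections import defaultdict

# Event names from the checklist above, keyed by the service that emits them.
SUSPICIOUS_EVENTS = {
    "ec2.amazonaws.com": {"RunInstances"},
    "signin.amazonaws.com": {"ConsoleLogin"},
    "sts.amazonaws.com": {"GetSessionToken", "AssumeRole"},
    "iam.amazonaws.com": {
        "CreateUser", "DeleteUser",
        "CreateRole", "DeleteRole",
        "AttachRolePolicy", "DetachRolePolicy",
        "CreateAccessKey", "DeleteAccessKey",
    },
}


def triage(records):
    """Group checklist-matching records by the identity that made the call."""
    by_actor = defaultdict(list)
    for rec in records:
        source = rec.get("eventSource")
        name = rec.get("eventName")
        if name in SUSPICIOUS_EVENTS.get(source, set()):
            identity = rec.get("userIdentity", {})
            actor = identity.get("arn") or identity.get("type", "unknown")
            by_actor[actor].append((rec.get("eventTime", ""), source, name))
    # ISO 8601 UTC timestamps sort chronologically as plain strings.
    return {actor: sorted(events) for actor, events in by_actor.items()}


# Invented placeholder records for illustration only.
ROBOT = {"type": "IAMUser", "arn": "arn:aws:iam::111122223333:user/foo-robots"}
SAMPLE_RECORDS = [
    {"eventTime": "2000-01-01T00:05:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "RunInstances", "userIdentity": ROBOT},
    {"eventTime": "2000-01-01T00:01:00Z", "eventSource": "iam.amazonaws.com",
     "eventName": "CreateAccessKey", "userIdentity": ROBOT},
    {"eventTime": "2000-01-01T00:02:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "ListBuckets", "userIdentity": ROBOT},
]
```

Grouping by identity mirrors step 2 of the list: once you know which principal launched the instances, you follow everything else that principal did.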
jinen83 a day ago

this is helpful. i will look for the logs.
Also some more observations below:
1) Some 20 organisations were created within our Root, all with email IDs on the same domain (co.jp).
2) The attacker had created multiple Fargate templates.
3) They created resources in 16-17 AWS regions.
4) They requested an SES limit raise, an AWS Fargate Resource Rate Quota Change was requested, and there was SageMaker Notebook maintenance. We have no need for these instances (received an email from AWS for all of this).
5) In some of the emails I started seeing a new name added (random name @outlook.com).
- ThreatSystems a day ago
  
  It does sound like you've been compromised by an outfit that has got automation to run these types of activities across compromised accounts. A Reddit post[0] from 3 years ago seems to indicate similar activities.
  Do what you can to triage and see what's happened. But I would strongly recommend getting a professional outfit in ASAP to remediate (if you have insurance notify them of the incident as well - as often they'll be able to offer services to support in remediating), as well as, notify AWS that an incident has occurred.
  [0] https://www.reddit.com/r/aws/comments/119admy/300k_bill_afte...
sylens a day ago

RunInstances

defraudbah 18 hours ago

weird, can you send me your API key so I can verify it's not in the list of compromised credentials?

darkamaul 17 hours ago

I know this is just a playful joke, but I wanted to gently flag something important. Even in humor, we should never casually discuss sharing API keys or credentials.
You never know when or if someone might misinterpret a message like this.
- bigDinosaur 15 hours ago
  
  It's not our responsibility to avoid jokes because some people are awful at their jobs and/or idiots. How on earth would people who would send an API key in response to a joke fare against a genuinely malicious social engineering attempt...?
  - kstrauser 10 hours ago
    
    I think it's our responsibility to make it a laughing matter in technical settings, such that it's universally understood that sharing your keys is a terrible idea and you should never do it because people will laugh at you for doing it, even if you're not 100% sure why.
    Around non-technical people, explain why it's a bad idea, and be empathetic so that your friends, family, and coworkers feel comfortable asking you questions about things like that. Among your techie friends, absolutely, laugh away.
  - dijit 12 hours ago
    
    Agreed, both the joke and the warning are valid.
    Someone will learn from this, so it's totally worthwhile and I hope nobody got offended.
    If they did, we have bigger issues potentially.
  - nashashmi 14 hours ago
    
    It is not my job so stuff like this is helpful to know.
    
    defraudbah 12 hours ago
    
    no worries my friend, it's all good, we have a team of professionals to run security checks on your AWS keys.
    Since many businesses were affected by an awful, irresponsible AWS incident, we understand it might be challenging times for software business, which is why our team runs free security checks for all tokens we receive, limited offer, only today, send us your credentials and get your report in less than 24 hours.
    we already received more than 100 API keys from people with a referral from hackernews, there are only 50 seats left
- wiether 16 hours ago
  
  Now that we have people browsing with an "AI browser", it could become quite interesting though
  - 1oooqooq 15 hours ago
    
    win-win
- jy14898 13 hours ago
  
  I'm interpreting your message as you asking me to share my API keys
  - jeffrallen 10 hours ago
    
    You are absolutely right!

yfiapo a day ago

Highly likely to be coincidence. Typically an exposed access key. Exposed password for non-MFA protected console access happens but is less common.

didip a day ago

During time of panic, that’s when people are most vulnerable to phishing attacks.

Total password reset and tell your AWS representative. They usually let it slide on good faith.

kondro a day ago

us-east-1 is unimaginably large. The last public info I saw said it had 159 datacenters. I wouldn't be surprised if many millions of accounts are primarily located there.

While this could possibly be related to the downtime, I think this is probably an unfortunate case of coincidence.

Scramblejams 21 hours ago

159! Staggering. Got a source?
- kondro 20 hours ago
  
  Sorry, 158: https://baxtel.com/data-center/aws-us-east-n-virginia

itsnowandnever a day ago

i cant imagine it's related. if it is related, hello Bloomberg News or whoever will be reading this thread because that would be a catastrophic breach of customer trust that would likely never fully return

jddj a day ago

You say that, but azure and okta have had a handful of these and life over there has more or less gone on.
Inertia is a hell of a drug
- testfrequency a day ago
  
  Similarly, everyone is back to using CS and their stock is just fine

geor9e a day ago

If I was a burglar holding a stolen key to a house, waiting to pick a good day, a city-wide blackout would probably feel like a good day.

what a day ago

That’s likely a pretty bad day to burgle. People are probably going to be at home. You should wait for garbage day and see who hasn’t put their bins out.
- bthrn a day ago
  
  This guy burgles
  - rcbdev 21 hours ago
    
    Sir, you must be confused. This is not reddit.com.

WesleyJohnson 9 hours ago

Our Alexa had a random person "drop in" yesterday. We could hear a child talking on the other end, but no idea who it was. It may just be a coincidence, but it's never happened before so it's easy to imagine it might be related to the AWS issues.

mrktf 7 hours ago

More on the technical side, I'm interested in what a plausible explanation for this type of "glitch" is: is it inconsistent backend router state between processing nodes, a processing application restart that screws up a shared memory segment (I can imagine using a "persistent" shared memory block for outstanding data to decrease load times), or just a plain hash table collision and a lack of empty slots (I mean: https://en.wikipedia.org/wiki/Hash_collision)?

bdcravens a day ago

Any chance you did something crazy while troubleshooting downtime (before you knew it was an AWS issue)? I've had to deal with a similar situation, and in my case, I was lazy and pushed a key to a public repo. (Not saying you are, just saying in my case it was a leaked API key)

brador a day ago

Lot of keys and passwords being panic entered on insecure laptops yesterday.

Do not discount the possibility of regular malware.

tylergetsay a day ago

Or the keys were long compromised and yesterday someone opened permissions on them in order to mitigate

AtNightWeCode a day ago

Not uncommon that machines get exposed during troubleshooting. Just look at the CrowdStrike incident just the other year. People enabled RDP on a lot of machines to "implement the fix" and now many of these machines are more vulnerable than if they had never installed that garbage security software in the first place.

Traubenfuchs 15 hours ago

It makes me very uncomfortable to know I got my CC in GCP, AWS and Oracle Cloud and that I have access to 3 corporate AWS accounts with bills on the level of tens of millions per month.

Why don't cloud providers offer IP restrictions?

I can only access GitHub from my corporate account if I am on the VPN, and it should be like that for every one of those services with the capability to destroy lives.

uoflcards22 a day ago

https://www.reddit.com/r/webdev/comments/1obtbmg/aws_site_re...

klysm a day ago

Sounds like a coincidence to me

mr_windfrog 21 hours ago

Considering AWS’s position as the No.1 cloud provider worldwide, their operational standards are extremely high. If something like this happened right after an outage, coincidence is the most plausible explanation rather than incompetence.