Amazon Web Services, one of the biggest suppliers of cloud-based infrastructure, reported a major outage on Monday that took down several major websites.
Several websites came back online within hours. However, Downdetector showed another rise in user reports of Amazon, AWS, and Alexa outages at noon ET.
In its most recent update, at 6:53 p.m. ET, the company said that “all AWS services returned to normal operations” shortly after 6 p.m. ET.
AWS said some of its services still had a backlog of queued messages that would be processed over the next few hours.
The company said in a note: “We will share a detailed AWS post-event summary.”
The update came after outages and delays continued into Monday afternoon, with the company noting “increased error rates” for customers trying to spin up new instances on EC2, Amazon's popular cloud service that provides virtual server capacity.
The company wrote, “We are working to fully restore service as quickly as possible.”
At approximately 1:30 p.m. ET, AWS reported “early signs” of EC2 recovery in some areas and said it was applying mitigations to the remaining ones, “at which point we expect launch errors and network connectivity issues to subside.”
Amazon also confirmed that the outage affected Amazon.com, some of its subsidiaries, and AWS customer support services.
The outage was first reported at 3:11 a.m. ET in US-East-1, AWS's primary region, located in northern Virginia.
A message on the AWS status page said the company was seeing DNS issues affecting DynamoDB, its database service, which forms the foundation of most other AWS applications.
DNS, the Domain Name System, translates website names into IP addresses so that browsers and other applications can find and load them.
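To make the lookup step concrete, here is a deliberately simplified sketch in Python. It models DNS as an in-memory table that maps a hostname to IP addresses; real DNS is a distributed, hierarchical system, and this is not how AWS's resolvers actually work. The hostname follows DynamoDB's regional endpoint naming, but the addresses are placeholders from the 192.0.2.0/24 documentation range, and the functions `resolve` and `describe_lookup` are hypothetical names chosen for illustration. The second lookup shows how a service can be running yet unreachable when its address record is missing; it illustrates the general failure mode, not a reconstruction of Monday's fault.

```python
from typing import Dict, List, Optional

# Toy stand-in for DNS: hostname -> list of IP addresses.
# Hypothetical placeholder addresses from the 192.0.2.0/24 documentation range.
dns_records: Dict[str, List[str]] = {
    "dynamodb.us-east-1.amazonaws.com": ["192.0.2.10", "192.0.2.11"],
}


def resolve(hostname: str, records: Dict[str, List[str]]) -> Optional[str]:
    """Return the first address on record for hostname, or None if there is none."""
    addresses = records.get(hostname, [])
    return addresses[0] if addresses else None


def describe_lookup(hostname: str, records: Dict[str, List[str]]) -> str:
    """Describe what a client would do after trying to resolve hostname."""
    ip = resolve(hostname, records)
    if ip is None:
        # The service behind the name may be healthy, but clients cannot find it.
        return f"{hostname}: lookup failed, service unreachable"
    return f"{hostname}: connect to {ip}"


# Healthy case: the name resolves, so clients know where to connect.
print(describe_lookup("dynamodb.us-east-1.amazonaws.com", dns_records))

# Broken case (hypothetical): the record exists but holds no addresses.
broken_records: Dict[str, List[str]] = {"dynamodb.us-east-1.amazonaws.com": []}
print(describe_lookup("dynamodb.us-east-1.amazonaws.com", broken_records))
```

Running it prints `dynamodb.us-east-1.amazonaws.com: connect to 192.0.2.10` and then `dynamodb.us-east-1.amazonaws.com: lookup failed, service unreachable`. Nothing about the data changed between the two calls; only the record that tells clients where to find it did.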
In an update at 5:01 a.m. ET, AWS cited an “operational issue” that affected more than 70 of its own services and said it was “working on multiple parallel paths to accelerate recovery.”
At 6:35 a.m. ET, AWS said the DNS issue had been “fully mitigated” and that service operations at AWS were “succeeding normally.”
According to Synergy Research Group, AWS is the largest cloud infrastructure provider, with roughly a third of the market, ahead of Microsoft and Google.
Millions of businesses and organizations rely on AWS for cloud computing services, including servers and storage.
Several companies were affected by the outage
Downdetector showed user reports of problems with services such as Disney+, Lyft, the McDonald’s app, The New York Times, Reddit, Ring doorbells, Robinhood, Snapchat, United Airlines, T-Mobile, and Venmo.
According to Downdetector, the British government websites Gov.uk and HM Revenue and Customs were also having problems.
A government spokesperson told CNBC: “We are aware of an incident affecting Amazon Web Services, and several online services which rely on their infrastructure. Through our established incident response arrangements, we are in contact with the company, who are working to restore services as quickly as possible.”
Lloyds Banking Group confirmed that its services had been disrupted and asked customers “to bear with us” while it worked to restore them. About 20 minutes later, it reported that services had been restored.
Key internal Amazon tools were also affected. Warehouse and delivery workers, as well as Amazon Flex drivers, reported on Reddit that internal systems were down at many locations.
Some warehouse workers were told to wait in break rooms and loading bays during their shifts. They also could not access the Amazon Anytime Pay app, which lets workers draw part of their paycheck instantly.
The outage also halted Seller Central, the portal third-party sellers use to run their businesses on Amazon.
A Reddit spokesperson told CNBC that the company was “working on scaling Reddit back to 100 percent as we speak.”
Some United Airlines and Delta Air Lines customers complained on social media that they were unable to find their reservations, check in, or drop off bags.
A T-Mobile spokesperson said its customers were having trouble accessing other sites or services because of the AWS disruption, but that there “was no outage or service disruption” at the carrier.
Canvas, an online learning platform used to share course materials and submit assignments, also said it was affected by the “ongoing AWS incident.”
Social media users also reported disruptions to cloud-based games such as Roblox and Fortnite, and crypto exchange Coinbase said a large number of users could not access its service because of the outage.
Canva, a graphic design tool, reported that it was “experiencing significantly increased error rates, which are impacting functionality on Canva. There is a major issue with our underlying cloud provider.”
The generative artificial intelligence search tool Perplexity was also affected. In a post on X (formerly known as Twitter), CEO Aravind Srinivas said, “The root cause is an AWS issue. We’re working on resolving it.”
This is not the first time in recent years that large companies have been hit by a major technical failure.
In July 2024, a faulty software update from cybersecurity company CrowdStrike exposed the fragility of the world’s technology infrastructure when it crashed computers running Microsoft Windows, causing millions of dollars’ worth of havoc and grounding thousands of flights. Hospitals and banks were also affected.
AWS has had other outages in recent years. A disruption in 2023 took many websites offline, and a more serious outage in 2021 affected websites and services worldwide, briefly shutting down Amazon's own delivery operations.
Amazon, Microsoft, and Google have competed largely for enterprise customers.
Following a Microsoft outage affecting its suite of productivity software in mid-May of this year, Google tried to capitalize on the service failure by offering its own software and a business continuity plan that would run its Workspace service alongside Microsoft 365.
In a blog post last week, Google wrote: “Just because Microsoft 365 goes down — and it’s a question of when and for how long, not if — doesn’t mean that your teams need to go back to using pen and paper.”
In June, Google Cloud suffered a lengthy outage that affected many large service providers, such as OpenAI and Shopify. The company said the outage was caused by flaws in newly deployed updates.
The AWS outage on Monday did not appear to be the result of a cyberattack but rather a “technical fault affecting one of Amazon’s main data centres,” the chief digital officer of cybersecurity company NymVPN said in a statement.
He added: “These issues can happen when systems become overloaded or a key part of the network goes down, and because so many websites and apps rely on AWS, the impact spreads quickly.”
When contacted, an Amazon spokesperson referred questions to the AWS service health dashboard.
Mike Chapple, an IT professor at the University of Notre Dame’s Mendoza College of Business and a former computer scientist at the National Security Agency, said in a statement: “DynamoDB isn’t a term that most consumers know. However, it is one of the record-keepers of the modern Internet.”
He continued: “We’ll learn more in the hours and days ahead, but early reports indicate that this wasn’t actually a problem with the database itself. The data appears to be safe. Instead, something went wrong with the records that tell other systems where to find their data.”
He also said: “This episode serves as a reminder of how dependent the world is on a handful of major cloud service providers: Amazon, Microsoft, and Google. When a major cloud provider sneezes, the Internet catches a cold.”