The company is working to restore service as users report problems

The AWS outage shows how fragile our infrastructure is, says David Kennedy of TrustedSec

Amazon Web Services, a leader in the cloud infrastructure market, reported a major outage on Monday that brought down numerous large websites.

Although many websites were back online within a few hours, Downdetector showed another increase in user reports of outages at Amazon, AWS and Alexa around midday ET.

The company’s latest update at 5:48 p.m. ET noted a recovery in parts of EC2, a very popular cloud service that provides virtual server capacity.

“There is a backlog of analytics and reporting data that we need to process and we expect to clear the backlog in the next two hours,” the company said in a statement.

Amazon pointed to “increased error rates” for customers when trying to launch new instances in EC2 in a blog post Monday afternoon.

“We are working to fully restore service as quickly as possible,” the company wrote at the time.

At around 1:30 p.m. ET, AWS said it was seeing “early signs” of EC2 recovery in some regions and was making corrections in the remaining areas. “At this point, we expect startup failures and network connectivity issues to subside.”

Amazon also confirmed that the outage impacted Amazon.com, some of its subsidiaries and AWS customer service.

The outage was first reported at 3:11 a.m. ET in AWS’s main US-East-1 region in northern Virginia. A note on AWS’s status page said it was experiencing DNS issues with DynamoDB, the database service that powers many other AWS applications.

DNS (Domain Name System) translates website names into IP addresses so that browsers and other applications can load.
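
As a rough illustration of that lookup step, the short Python sketch below models DNS as a simple table of names and addresses. It is a toy model, not how AWS or a real resolver works: real lookups query DNS servers over the network, and the hostname, address and function names here are hypothetical placeholders. It shows why a missing record can make a service unreachable even when the service itself is running.

```python
# Toy model only: a real resolver queries DNS servers over the network.
# The hostname and address below are hypothetical placeholders.
records = {
    "database.example.com": "192.0.2.10",
}


def resolve(hostname, table):
    """Return the IP address recorded for hostname, or None if there is no record."""
    return table.get(hostname)


def connect(hostname, table):
    """Describe what a client can do given the current set of DNS records."""
    address = resolve(hostname, table)
    if address is None:
        # The service may be healthy, but the client has no address to reach it.
        return f"cannot reach {hostname}: no DNS record"
    return f"connecting to {hostname} at {address}"


print(connect("database.example.com", records))
print(connect("database.example.com", {}))
```

The second call passes an empty table to stand in for a lost record: the lookup fails before any connection is attempted.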

AWS cited an “operational issue” affecting multiple services in an update at 5:01 a.m. ET and said it was “working on multiple parallel paths to speed recovery.” More than 70 of its own services were affected.

AWS said in an update at 6:35 a.m. ET that the DNS issue had been “fully resolved” and that AWS service operations were “normally successful.”

AWS is the leading provider of cloud infrastructure technology, accounting for around a third of the market, ahead of Microsoft and Google, according to Synergy Research Group. Millions of companies and organizations rely on AWS for cloud computing services such as servers and storage.

Large companies are affected

Downdetector displayed user reports indicating problems on websites and services including Disney+, Lyft, the McDonald’s app, The New York Times, Reddit, Ring doorbells, Robinhood, Snapchat, United Airlines, T-Mobile and Venmo.

According to Downdetector, British government websites Gov.uk and HM Revenue and Customs also experienced problems.

A government spokesperson told CNBC: “We are aware of an incident affecting Amazon Web Services and several online services that rely on its infrastructure. Through our established incident response arrangements, we are in contact with the company as they work to restore services as quickly as possible.”

Lloyds Banking Group confirmed that some of its services had been affected and asked customers to “be patient with us” while it worked to restore those services. About 20 minutes later it said services were back online.

The outage also paralyzed important tools at Amazon. Warehouse and delivery employees, along with drivers for Amazon’s Flex service, reported on Reddit that internal systems at many locations were offline. Some warehouse workers were told to stand by in break rooms and loading areas during their shifts, and they were unable to load Amazon’s Anytime Pay app, which gives employees instant access to a portion of their pay.

Seller Central, the hub through which Amazon’s third-party sellers manage their businesses, was also disabled by the outage.

Reddit is also currently working to bring its service back to 100 percent, a spokesperson told CNBC.

Some United and Delta Air Lines customers reported on social media that they could not find their reservations online, check in or drop off luggage.

A T-Mobile spokesman said its customers had problems using other websites or services due to the AWS outage, but there was “no outage or service interruption” with the carrier.

Canvas, an online teaching platform for hosting course information and submitting assignments, said it was also hit by the “ongoing AWS incident.”

Other social media users reported disruptions to cloud-based games including Roblox and Fortnite, while crypto exchange Coinbase said many users were unable to access the service due to the outage.

Graphic design tool Canva said it is “experiencing significantly elevated error rates that are impacting Canva’s functionality. There is a major issue with our underlying cloud provider.”

The generative artificial intelligence search tool Perplexity was also affected. “The root cause is an AWS issue. We are working to resolve it,” said CEO Aravind Srinivas in a post on X.

Centralized software

It is not the first time in recent history that large companies have been affected by a technical problem. In July 2024, a faulty software update from cybersecurity firm CrowdStrike revealed the fragility of the global technology infrastructure when it caused Microsoft Windows systems to fail, creating millions of dollars’ worth of chaos and grounding thousands of flights. Hospitals and banks were also affected.

AWS has also experienced other outages in recent years. A disruption in 2023 took many websites offline for several hours, while a more serious outage in 2021 affected websites and services around the world, including some of Amazon’s own delivery operations, which briefly came to a halt.

Amazon, Microsoft and Google have long been fighting to win corporate customers. After a failure of Microsoft’s productivity software suite earlier this month, Google tried to capitalize on the service outage by introducing its own tools and a business continuity plan that runs its Workspace service alongside Microsoft 365.

In a blog post last week, Google wrote: “Just because Microsoft 365 is down – and it’s a matter of when and how long, not if – doesn’t mean your teams have to go back to pen and paper.”

Google’s cloud services went down in June for an extended period of time, disrupting several major service providers such as OpenAI and Shopify. The company said the outage was caused by multiple levels of faulty recent updates.

Monday’s AWS outage does not appear to have been caused by a cyberattack, but rather is a “technical error affecting one of Amazon’s main data centers,” Rob Jardin, chief digital officer at cybersecurity firm NymVPN, said in a statement.

“These issues can occur when systems become overloaded or a key part of the network fails. Because so many websites and apps rely on AWS, the impact spreads quickly,” he added.

An Amazon spokesperson referred to AWS’s service health dashboard when asked for comment.

“DynamoDB is not a term that most consumers are familiar with,” Mike Chapple, an IT professor at the University of Notre Dame’s Mendoza College of Business and a former computer scientist at the National Security Agency, said in a statement. However, it is “one of the record keepers of the modern Internet.”

“We will learn more in the coming hours and days, but early reports suggest that this was not actually a problem with the database itself. The data appears to be safe. Instead, something went wrong with the records that tell other systems where to find their data,” he added.

“This episode is a reminder of how dependent the world is on a handful of major cloud service providers: Amazon, Microsoft and Google. When one major cloud provider sneezes, the Internet catches a cold.”

— CNBC’s Leslie Josephs and Jennifer Elias contributed to this report.

Clarification: This article has been updated to clarify that there was no service interruption with T-Mobile.
