Facebook, Instagram, WhatsApp Suffer Widespread Outage
Social Media Giant Confirms Incident via Twitter; Analysis Suggests DNS Issue
Update (Oct. 8): The site DownDetector.com began tracking additional outages on Friday at around 3 p.m. Eastern. The disruption did not appear to be as widespread as the six-hour incident on Oct. 4, and reports subsided by 4:30 p.m. ISMG's team was able to access Facebook via both app and browser on Friday. The platform acknowledged the outage on its Twitter page.
Update (Oct. 4): As of 5:57 p.m. Eastern Time on Monday, Facebook service had been restored.
Social media giant Facebook experienced a global outage on Monday that also took down its other properties, including Instagram, Messenger and WhatsApp, according to outage tracker DownDetector.com.
According to ThousandEyes, Cisco's internet analysis division, the tech giant experienced a Domain Name System issue. DNS is the service that translates human-readable domain names into the numeric IP addresses that computers use to connect. The incident reportedly hindered access to Facebook's tools and apps.
Late on Monday, Santosh Janardhan, Facebook's vice president of infrastructure, apologized for the problems in a short blog post. He said the root cause was a configuration change to the backbone routers that coordinate network traffic between Facebook's data centers.
That change then "impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem," Janardhan writes.
"This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt," Janardhan writes.
No user data was at risk, Janardhan writes. "We apologize to all those affected, and we're working to understand more about what happened today so we can continue to make our infrastructure more resilient," he writes.
Starting at around 11 a.m. Eastern on Monday, the outage site began displaying user-submitted problem reports, which soon snowballed to tens of thousands and reached around 125,000 by noon Eastern. By around 3 p.m. Eastern, the number had subsided to some 34,000 reports. The total number of affected users was likely higher than the tracker's tally, since the service counts only the reports it collects, including those submitted by users.
The Facebook website returned an error message at the time of initial publication.
In a prepared statement shared with Information Security Media Group on Monday afternoon, Facebook CTO Mike Schroepfer said, "Sincere apologies to everyone impacted by outages of Facebook-powered services. We are experiencing networking issues and teams are working as fast as possible to debug and restore."
Dane Knecht, vice president of the web infrastructure and security firm Cloudflare, took to Twitter midday Monday, suggesting that Facebook's Border Gateway Protocol, or BGP, routes "have been withdrawn from the internet."
BGP is the system that lets networks across the internet exchange routing information. If a malfunction or misconfiguration causes Facebook's routes to be withdrawn, other networks no longer know how to reach Facebook's DNS servers, leaving them inaccessible.
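To make that dependency concrete, here is a toy Python model. It is a sketch under stated assumptions, not a description of Facebook's actual network. The address blocks are hypothetical prefixes taken from the RFC 5737 documentation ranges, and the helper names make_table, lookup_route, withdraw and resolve are illustrative inventions. The model shows that once the prefix covering the nameserver is withdrawn, names stop resolving even though a route to the web servers themselves still exists.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical prefixes from the RFC 5737 documentation ranges.
# These are NOT Facebook's real address blocks.
DNS_PREFIX = ipaddress.ip_network("198.51.100.0/24")  # hypothetical nameserver block
WEB_PREFIX = ipaddress.ip_network("203.0.113.0/24")   # hypothetical web-server block

# Hypothetical authoritative nameserver address and zone data.
NAMESERVER_IP = ipaddress.ip_address("198.51.100.53")
ZONE = {"www.example.com": "203.0.113.10"}

RoutingTable = Dict[ipaddress.IPv4Network, str]


def make_table() -> RoutingTable:
    """Build a toy routing table: prefix -> next hop learned from BGP announcements."""
    return {DNS_PREFIX: "peer-A", WEB_PREFIX: "peer-A"}


def lookup_route(table: RoutingTable, addr: ipaddress.IPv4Address) -> Optional[str]:
    """Longest-prefix match; returns None when no announced prefix covers addr.

    The table has no catch-all default route, mirroring how routers in the
    internet's default-free zone treat a prefix that nobody announces.
    """
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


def withdraw(table: RoutingTable, prefix: ipaddress.IPv4Network) -> None:
    """Simulate a BGP withdrawal: the prefix simply disappears from the table."""
    table.pop(prefix, None)


def resolve(table: RoutingTable, name: str) -> Optional[str]:
    """Answer a DNS query only if the authoritative nameserver is reachable."""
    if lookup_route(table, NAMESERVER_IP) is None:
        return None  # no route to the nameserver, so the query cannot be answered
    return ZONE.get(name)


if __name__ == "__main__":
    table = make_table()
    print(resolve(table, "www.example.com"))  # resolves while routes are announced
    withdraw(table, DNS_PREFIX)
    print(resolve(table, "www.example.com"))  # None: nameserver prefix withdrawn
    print(lookup_route(table, ipaddress.ip_address("203.0.113.10")))  # web route still present
```

The point of separating routing from resolution in the sketch is that a DNS outage does not require the DNS software itself to fail: removing the path to the nameservers is enough to make every name under them unresolvable.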
A Reddit user claiming to be a Facebook employee familiar with the outage and recovery suggested that the incident likely stemmed from a configuration change, and that the fix would have to come from technicians with physical access to the routers, Ars Technica reported.
Facebook Takes to Twitter
Following the outage, Facebook spokesman Andy Stone took to competitor platform Twitter, saying: "We're aware that some people are having trouble accessing our apps and products. We're working to get things back to normal as quickly as possible, and we apologize for any inconvenience."
The same message was posted to Facebook's main Twitter profile, and similar messages appeared on the WhatsApp and Instagram Twitter accounts.
Twitter users were quick to bemoan the Facebook outage, using the now-trending hashtag #facebookdown.
Bill Lawrence, a former cybersecurity instructor at the U.S. Naval Academy and currently CISO with the firm SecurityGate, says of the incident, "Outages like this show that, for all that was learned since the DDoS attack on Dyn in October of 2016, five years later the internet remains fragile when services like DNS get interrupted for some reason."
Jake Williams, formerly of the National Security Agency's elite hacking team and currently CTO at BreachQuest, says, "The Facebook outages are certainly BGP-related. What we don't yet know is what happened. … [But] the fact that Facebook hasn't corrected the issue yet is odd."
Third parties that rely on Facebook credentials, including games such as Match Masters, also experienced issues. Match Masters took to Twitter on Monday to write: "Hold on tight! If [your] game isn't running as usual please note that there's been an issue with Facebook login servers and the moment this gets fixed all will be back to normal!"
Facebook suffered similar outages in March and July, news wire service Reuters previously reported.
John Bambenek, principal threat hunter at the firm Netenrich, adds that aging internet protocols "were not designed with the scale of the internet as it exists today" and are therefore "very susceptible to human error," which observers have speculated was the cause in this case.
Bambenek notes, "This problem will [likely] get worse as these protocols are taken for granted."
Facebook shares fell more than 5% on Monday, one of the stock's worst days in nearly a year.
The outage comes just one day after "60 Minutes" aired a segment featuring Facebook whistleblower Frances Haugen, who referenced the company's internal research and its alleged knowledge of, and alleged inaction around, hateful and/or violent content and misinformation shared across the platform.
In July, a similar outage hit content delivery network provider Akamai, temporarily knocking several corporate websites offline, including those of Delta Air Lines, Amazon Web Services and AT&T (see: Resiliency Is Key to Surviving a CDN Outage).
At the time, the company said its rollout of a new software configuration for its Edge DNS service triggered a bug in the DNS system, which caused a disruption affecting the availability of some customers' websites. After about an hour, the company resumed normal operations.