Fortnite developers, Epic Games has attributed last weekend’s outtage to severe overload of their servers, which caused recurring interruptions for players. It was reported that the game hit a peak of 3.4 million concurrent users which mainly led to the disruption – definitely a problem other companies would kill to have. Official announcement and description which was released on the dev’s website read (in case you’re into these things) :

The extreme load caused 6 different incidents between Saturday and Sunday, with a mix of partial and total service disruptions to Fortnite.

“… Fortnite has a service called MCP (remember the Tron nemesis?) which players contact in order to retrieve game profiles, statistics, items, matchmaking info and more. It’s backed by several sets of databases used to persistently store this data. The Fortnite game service is our largest database to date.

The primary MCP database is comprised of 9 MongoDB shards, where each shard has a writer, two read replicas, and a hidden replica for redundancy. At a high level user specific data is spread across 8 shards, whereas the remaining shard contains matchmaking sessions, shared service caches, and runtime configuration data.

The MCP is architected such that each service has a db connection pool to a sidecar process that in turn maintains a connection pool to all of our shards. At peak the MCP handles 124k client requests per second, which translates to 318k database reads and 132k database writes per second with a sub 10ms average database response time. Of that, matchmaking requests account for roughly 15% of all db queries and 11% of all writes across a single shard. In addition our current matchmaking implementation requires data to be in a single collection.

At peak we see an issue where the matchmaking shard begins queuing writes waiting on available writer resources. This can cause db update times to spike in the 40k+ ms range per operation causing MCP threads to block. Players experience unusually long wait times not just attempting to matchmake, but with all operations.  We have investigated this in detail and it is currently unclear to us and support why our writes are being queued in this way but we are working towards a root cause.

This issue does not recover and the db process soon becomes unresponsive, at which point we need to perform a manual primary failover in order to restore functionality. During these outages this procedure was being repeated multiple times per hour. Each failover causing a brief window of matchmaking instability followed by recovery.”

TL;DR – Our system could not handle so many players at once.

However, Epic did not disclose how many percent of these players logged-in were playing the free-to-play version of Fortnite, PUBG rival, Battle Royale mode and whether they purchased the the full game. Regardless, expect improvements on Epic Games’ end as they sought to put full focus on the game having shut down its MOBA, Paragon, and moving the team to work on Fortnite.


Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s