
The Failure of D2R: Database Server Overload and “Legacy Code” | XFastest News

Diablo II still holds a special place in the hearts of many longtime players. Recently, however, “Diablo II: Resurrected” (D2R) has suffered from server disconnections, rollbacks, and even players being unable to connect at all. Blizzard community manager PezRadar has published a lengthy forum post explaining the cause of the incident, which gives a general picture of D2R's current situation.

Server problem:

First, D2R uses a single global database as the one true source of every player character's progress. In addition, North America, Europe, and Asia each have an independent regional database. Players connect mainly through their regional database, which gives them better ping and a more stable connection, and the regional databases are regularly synced back to the global database.
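PezRadar's post describes this layout only in prose. To make the data flow concrete, here is a minimal Python sketch of the pattern, assuming in-memory dicts stand in for each database and that a periodic sync simply copies changed characters upward. All class and method names are hypothetical illustrations, not Blizzard's code.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalDatabase:
    """Hypothetical single source of truth for every character's progress."""
    characters: dict = field(default_factory=dict)
    writes: int = 0  # counts writes, to show how often the global DB is hit

    def save(self, name, progress):
        self.characters[name] = dict(progress)
        self.writes += 1


@dataclass
class RegionalDatabase:
    """Hypothetical regional store (e.g. North America, Europe, Asia) that players talk to."""
    region: str
    global_db: GlobalDatabase
    characters: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)  # characters changed since the last sync

    def save(self, name, progress):
        # Low-latency write close to the player; mark it for the next sync.
        self.characters[name] = dict(progress)
        self.dirty.add(name)

    def load(self, name):
        # Prefer the regional copy; fall back to the global source of truth.
        if name in self.characters:
            return self.characters[name]
        return self.global_db.characters.get(name)

    def sync_to_global(self):
        # Run periodically: push only characters that changed since the last sync.
        for name in sorted(self.dirty):
            self.global_db.save(name, self.characters[name])
        self.dirty.clear()
```

In this sketch, two saves of the same character between syncs cost only one global write, which is the point of putting a regional layer in front of the global database.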

On Saturday, October 9 (Pacific Time), which fell during Taiwan's Double Ten (October 10) long weekend, the servers were hit with more traffic than at the game's launch, leaving them unable to handle the load and causing widespread disconnections. Making matters worse, the day before, D2R had deployed an update intended to improve game creation (opening game rooms).

In other words, the higher-than-expected traffic combined with the new update caused a global outage, which is why the D2R team decided to roll back the servers to the previous game version.

The next day, Sunday, the same server overload occurred again: hundreds of thousands of game rooms were created within a few tens of minutes, overloading the servers once more. During this period, the team was repairing and optimizing an (offline) backup global database, preparing to switch from the online global database to the backup on October 11.

However, the switch left the backup server operating in the wrong backup mode. The team also found that a previously used “query” was too expensive for server performance; after removing it, they optimized the eligibility check that runs when a player joins a game to reduce the load on the server. (Editor's aside: isn't this exactly what happens when you carry over the old code, hogging server resources until everyone gets disconnected?)
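The post does not say what the removed query actually did. Purely as an illustration of the general idea of making a join check cheaper, the Python sketch below contrasts a check that scans every open game with one that uses a maintained index. The `GameDirectory` class and its methods are hypothetical; the 8-player cap matches Diablo II's per-game limit.

```python
class GameDirectory:
    """Hypothetical directory of open games used for join-eligibility checks."""

    def __init__(self):
        self.games = {}        # game name -> set of character names in it
        self.player_game = {}  # character name -> game name (index kept in sync)

    def create(self, game):
        self.games.setdefault(game, set())

    def can_join_slow(self, game, character, max_players=8):
        # Expensive style: scan every game to see whether the character is already in one.
        already_playing = any(character in members for members in self.games.values())
        return game in self.games and not already_playing and len(self.games[game]) < max_players

    def can_join(self, game, character, max_players=8):
        # Cheap style: constant-time lookups against the maintained index.
        return (game in self.games
                and character not in self.player_game
                and len(self.games[game]) < max_players)

    def join(self, game, character, max_players=8):
        if not self.can_join(game, character, max_players):
            return False
        self.games[game].add(character)
        self.player_game[character] = game  # keep the index consistent with the games map
        return True
```

Both checks return the same answer; the difference is that the slow one grows with the number of open games, which matters when hundreds of thousands of rooms exist at once.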

Then came October 12. With as many as 100,000 connections in each region, Blizzard had to focus on modifying and optimizing the core of the game, and brought in other Blizzard teams as well as third-party engineers to help with the repairs.

Why are there so many problems:

Here is the key point: the root cause of D2R's server failures and disconnections is legacy code. Because the developers wanted to stay faithful to the original game, a lot of old code was kept, and it naturally cannot cope with what a modern online game demands.

Simply put, one particular service has to handle: creating and joining games; updating, reading, and filtering the game list; verifying the health of game servers; and reading characters from the database so that each player sees the correct character list.

This service can only run as a single instance, so that every player sees up-to-date game data. Even though the code for this service has been optimized, too much legacy code remains, and quite a few problems are still surfacing.
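To show why a single-instance service becomes a bottleneck, here is a minimal Python sketch of one object that owns all the responsibilities listed above and serializes them behind one lock. It is a hypothetical illustration, assuming in-memory dicts in place of real databases and game servers; none of the names come from Blizzard.

```python
import threading


class GameCoordinator:
    """Hypothetical single-instance service (an assumption, not Blizzard's code).
    Every operation takes the same lock, so all players see one consistent state,
    but every request in the region is serialized through this one object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._games = {}       # game name -> list of characters in it
        self._healthy = {}     # game server id -> last reported health
        self._characters = {}  # account -> character names (stand-in for a DB read)

    def report_health(self, server_id, healthy):
        with self._lock:
            self._healthy[server_id] = healthy

    def create_game(self, name, server_id):
        with self._lock:
            # Refuse duplicate names and servers that have not reported healthy.
            if name in self._games or not self._healthy.get(server_id, False):
                return False
            self._games[name] = []
            return True

    def join_game(self, name, character):
        with self._lock:
            if name not in self._games:
                return False
            self._games[name].append(character)
            return True

    def list_games(self, name_filter=""):
        with self._lock:
            return sorted(g for g in self._games if name_filter in g)

    def load_characters(self, account, names):
        with self._lock:
            self._characters[account] = list(names)

    def character_list(self, account):
        with self._lock:
            return list(self._characters.get(account, []))
```

With a hypothetical cost of 1 ms per request while holding the lock, this design tops out at roughly 1,000 requests per second in total, no matter how much hardware sits around it, because there is only one instance to send requests to.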

In short, the game writes to the global database too frequently. Instead, progress should first be saved to the regional database and written back to the global database only when a character needs to be unlocked (in Blizzard's words, “only saving you to the global database when we need to unlock you”).
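A minimal Python sketch of that idea follows, assuming a character is "locked" to one region while being played there and that dicts stand in for both databases. The `CharacterStore` class and its methods are hypothetical; the post does not describe Blizzard's actual locking scheme in this detail.

```python
class CharacterStore:
    """Hypothetical sketch: while a character is locked to a region, saves stay regional;
    the global database is written only when the lock is released."""

    def __init__(self):
        self.global_db = {}      # character -> progress (source of truth)
        self.regional_db = {}    # character -> progress (fast, local copy)
        self.locked_region = {}  # character -> region currently holding the lock
        self.global_writes = 0

    def lock(self, character, region):
        if character in self.locked_region:
            raise RuntimeError(f"{character} is already locked to {self.locked_region[character]}")
        self.locked_region[character] = region
        # Seed the regional copy from the source of truth.
        self.regional_db[character] = dict(self.global_db.get(character, {}))

    def save(self, character, progress):
        if character not in self.locked_region:
            raise RuntimeError(f"{character} is not locked to any region")
        self.regional_db[character] = dict(progress)  # no global write here

    def unlock(self, character):
        if character not in self.locked_region:
            raise RuntimeError(f"{character} is not locked to any region")
        # Only now is progress copied back to the global database.
        self.global_db[character] = self.regional_db.pop(character)
        del self.locked_region[character]
        self.global_writes += 1
```

However many times a character saves during a session, this sketch touches the global database once, at unlock, instead of on every save or every sync interval.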

This should substantially reduce the load on the servers. The team is currently reworking the architecture and code, but redesigning, testing, and actually rolling out the update will take time.

Progress loss:

As for players who lost progress because of the rollbacks, including those who had a great run and saw all kinds of gear rolled back, all I can say is sorry…

Currently in progress:

D2R is currently taking two measures on the connection side. The first is rate limiting: for example, if you create a game room again within 20 seconds, an error message pops up saying you cannot connect to the game server; this is meant to slow down how quickly players create rooms. The second is a login queue: when a large number of players are trying to play, logins are queued, just like what you may have experienced in “World of Warcraft”.
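Both measures are standard techniques, and a minimal Python sketch of each is shown below. It assumes the 20-second limit applies per account and that the queue is a simple first-in, first-out list with a fixed number of online slots; both are assumptions, and all names are hypothetical rather than taken from D2R.

```python
import time
from collections import deque


class GameCreationLimiter:
    """Hypothetical per-account limiter: at most one game creation per time window."""

    def __init__(self, window_seconds=20.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock       # injectable so the behaviour can be tested
        self.last_created = {}   # account -> time of the last accepted creation

    def try_create(self, account):
        now = self.clock()
        last = self.last_created.get(account)
        if last is not None and now - last < self.window:
            return False  # the client would show a "cannot connect" style error
        self.last_created[account] = now
        return True


class LoginQueue:
    """Hypothetical FIFO login queue with a fixed number of online slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.online = set()
        self.waiting = deque()

    def request_login(self, account):
        if len(self.online) < self.capacity:
            self.online.add(account)
            return 0                  # logged in immediately
        self.waiting.append(account)
        return len(self.waiting)      # 1-based position in the queue

    def logout(self, account):
        self.online.discard(account)
        # Admit the next queued player when a slot frees up.
        if self.waiting and len(self.online) < self.capacity:
            self.online.add(self.waiting.popleft())
```

In this sketch a rejected attempt does not reset the timer, so a player only has to wait out the window measured from their last successful creation.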

In addition, the team is splitting the large core service into smaller services. This work is under way, and once the major functions are split apart, the game servers can be managed more effectively and the overall load will drop.
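The post does not say how the service will be divided. As one hypothetical illustration, the Python sketch below routes each request type to its own pool of service instances, so a read-heavy function like the game list can run several replicas while game creation stays a single writer. The service names, replica counts, and round-robin routing are all assumptions.

```python
from itertools import cycle


class Router:
    """Hypothetical front door that sends each request type to its own service pool."""

    def __init__(self, pools):
        # pools: request type -> list of replica names for that service
        self._pools = {kind: cycle(replicas) for kind, replicas in pools.items()}

    def route(self, kind):
        # Round-robin across the replicas of the service that owns this request type.
        return next(self._pools[kind])


router = Router({
    "create_game": ["creation-1"],                 # authoritative writer, still one instance
    "list_games": ["list-1", "list-2", "list-3"],  # read-only replicas scaled out
    "character_list": ["chars-1", "chars-2"],
})
```

Compared with one monolithic service, a flood of game-list refreshes in this layout lands on the list replicas and no longer waits behind game creation.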

Too much pain, too little gain

source: us.forums.blizzard.com
