GameNet’s RGS (Reliable Gaming Service) Module
Introduction:
The RGS module was conceived to let game programmers easily develop reliable networked games with advanced features such as failover during server crashes. It provides these services through a very simple game programming API consisting of the functions Discover(), Register(), Join(), Leave(), Send(), Recv(), and Shutdown(). The application also needs to register an event callback, through which it is notified of Join/Exit events.
For the advanced feature of failover support, the application needs to register a Failover Event Callback and a Package State Callback. The application must also satisfy two properties. First, it must be able to handle client exits without communicating with the server (i.e., clients must adjust their states consistently when an exit happens, without server communication). Second, given all the client states at any given time, it must be possible to reconstruct the server state.
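The two properties can be illustrated with a minimal Java sketch. Everything in it is hypothetical and not part of the RGS API: the `ClientState` class, its fields, and `reconstructServerState` are invented for illustration, and the sketch assumes a toy game in which each client holds its own score plus a roster of known players.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical per-client game state; none of these names come from the RGS API.
final class ClientState {
    final String playerId;
    final int score;
    // Every client tracks the same roster, so an exit can be applied locally.
    final Set<String> roster = new TreeSet<>();

    ClientState(String playerId, int score, Set<String> roster) {
        this.playerId = playerId;
        this.score = score;
        this.roster.addAll(roster);
    }

    // Property 1: an exit is handled deterministically, with no server round trip.
    void onPlayerExit(String exitedId) {
        roster.remove(exitedId);
    }
}

final class StateReconstruction {
    // Property 2: the server state (here, a scoreboard) can be rebuilt
    // from the collection of client states alone.
    static Map<String, Integer> reconstructServerState(Iterable<ClientState> clients) {
        Map<String, Integer> scoreboard = new TreeMap<>();
        for (ClientState c : clients) {
            scoreboard.put(c.playerId, c.score);
        }
        return scoreboard;
    }
}
```

In this toy model, every client that applies the same exit ends up with the same roster, and the scoreboard can be rebuilt after a crash from whatever clients remain.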
The advanced feature described above is available only for client-server applications. For peer-to-peer games, RGS provides only reliable Join/Exit services.
Model & Implementation:
RGS runs over the Reliable Group Membership Protocol (RGMP) provided by JavaGroups/Ensemble. It uses Ensemble’s reliability guarantees to provide the application with reliable Join/Exit notification, and it uses Ensemble’s reliable transport to provide reliable transport to the application. However, it differs from Ensemble in two key ways: its Join/Exit notification latency and its server failover support.
Event Notification:
In JavaGroups, new views are only passed up to the client when a receive is called on the channel. In a gaming application, if the app does not call receive for some time, it gets no notification about the status of the group during that period. The RGS module, however, has a separate worker thread that performs the receive operation repeatedly, with a certain delay DELTA between calls. This achieves two things. First, view changes are received quickly, so clients can be notified of Join/Exit events quickly (with a delay of at most about DELTA). Second, the RGS protocol can run internally, transparently to the client. Whenever a received message is not an RGS internal message, it is simply placed on a RecvQueue, which the application’s receive function checks whenever it wants to do a receive.
The internal protocol of RGS is necessary for determining whether the application is joining a channel on which a real game exists, or a channel that is filled with newcomers because the original game crashed some time ago. RGS handles this through InitialAck messages. Once an RGS instance has become a full member of a game, it sends InitialAck messages to newcomers. When a newcomer gets an InitialAck, it knows that a game exists, and it too becomes a full member. In Client-Server application mode, the InitialAck messages are also used to transfer the server address to newcomers.
Furthermore, since the worker thread allows quicker notification of view changes, the RGS sub-systems detect server crashes quickly as well, and the entire failover procedure in essence happens inside the thread.
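The worker-thread arrangement described above can be sketched in Java as follows. This is an illustrative sketch under stated assumptions, not the RGS implementation: the `Incoming` message type, the `RgsWorkerSketch` class, and the in-memory `channel` queue standing in for the group channel are all hypothetical, and the handling of internal messages such as InitialAck is reduced to a comment.

```java
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical message shape; a stand-in for what the group channel delivers.
final class Incoming {
    enum Kind { VIEW_CHANGE, RGS_INTERNAL, APPLICATION }
    final Kind kind;
    final String payload;
    Incoming(Kind kind, String payload) { this.kind = kind; this.payload = payload; }
}

final class RgsWorkerSketch {
    // Stand-in for the underlying channel (assumption: the real module polls the channel's receive).
    final BlockingQueue<Incoming> channel = new LinkedBlockingQueue<>();
    private final Queue<String> recvQueue = new ConcurrentLinkedQueue<>();
    private final Consumer<String> eventCallback;
    private final long deltaMillis;
    private volatile boolean running = true;

    RgsWorkerSketch(long deltaMillis, Consumer<String> eventCallback) {
        this.deltaMillis = deltaMillis;
        this.eventCallback = eventCallback;
    }

    // One polling pass: drain whatever has arrived and dispatch by kind.
    void pollOnce() {
        Incoming msg;
        while ((msg = channel.poll()) != null) {
            switch (msg.kind) {
                case VIEW_CHANGE:
                    eventCallback.accept(msg.payload); // Join/Exit notification
                    break;
                case RGS_INTERNAL:
                    // e.g. InitialAck: consumed inside RGS, never shown to the application
                    break;
                default:
                    recvQueue.add(msg.payload); // held until the application asks for it
            }
        }
    }

    // Starts the worker thread, which polls and then sleeps for DELTA.
    Thread start() {
        Thread t = new Thread(() -> {
            while (running) {
                pollOnce();
                try {
                    Thread.sleep(deltaMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "rgs-worker");
        t.setDaemon(true);
        t.start();
        return t;
    }

    void stop() { running = false; }

    // Application-facing receive: non-blocking, returns null when nothing is queued.
    String recv() { return recvQueue.poll(); }
}
```

Separating `pollOnce()` from the thread loop keeps the dispatch logic testable without timing dependencies. A smaller DELTA lowers notification latency at the cost of more frequent polling.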
Failovers:
As mentioned, the second key difference between an application using RGS and an application using Ensemble is the ability to fail over. RGS provides built-in, automatic failover support for Client-Server style applications. When a server failure occurs, it is noticed by all the existing full-member clients in the same order (RGS depends on Ensemble’s total ordering guarantee for this). Each client then constructs what is called a Freeze View. The Freeze View consists of the clients that were already full members before the view denoting the server crash came in. Consequently, all full-member RGS instances construct the same Freeze View. From here onwards, the Client-side failover state machine runs (please see the attached Client-side failover state machine). Essentially, the first client in the view goes out to the GDS to get a server list. It then negotiates with each server in the group. The first server to join the failed game becomes the "Failover Server". The clients then transfer their states to the server. If crashes happen during the state transfer, the server sends a NACK and the clients retransfer their states. After the state transfer completes successfully, the server reconstructs the complete game state and sends an ACK, after which the entire system resumes normal operation.
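As a rough illustration of the Freeze View construction, consider the Java sketch below. The class and method names are hypothetical, members are represented as plain strings, and the sketch assumes that "the first client in the view" refers to the first entry of the Freeze View.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the RGS API.
final class FreezeViewSketch {
    // crashView: ordered client membership of the view that reported the server crash.
    // fullMembers: clients that were already full members before that view arrived.
    // Because all clients see the same views in the same order, each computes the same result.
    static List<String> buildFreezeView(List<String> crashView, Set<String> fullMembers) {
        List<String> freeze = new ArrayList<>();
        for (String member : crashView) {
            if (fullMembers.contains(member)) {
                freeze.add(member); // newcomers that never became full members are left out
            }
        }
        return freeze;
    }

    // Assumption: the first entry of the Freeze View is the client that contacts the GDS.
    static boolean isCoordinator(List<String> freezeView, String self) {
        return !freezeView.isEmpty() && freezeView.get(0).equals(self);
    }
}
```

Since the election is just "first in the agreed view", no extra messages are needed to choose which client contacts the GDS.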
When a server receives a Negotiation message from a client, it either agrees or declines to fail over. If it agrees, it disconnects from its existing channel and joins the failed game’s channel. It then advertises itself and waits for the first state transfer message to arrive. The first round of state transfer messages to the server contains the Freeze View. The server compares the Freeze View against the latest view and trims it accordingly. If it actually has to trim the Freeze View, it knows that some client crashes occurred after the state transfer began. In that case it collects all the states, throws them away, and sends out a NACK. When it receives its own NACK back, it records the Freeze View at that time. States are then collected again, and if the Freeze View has changed by the time the last state is collected, the process is repeated. If the Freeze View has not changed when the last state is collected, the server sends out an ACK. Until it receives its own ACK, it handles all new client crashes by passing them up to the application (the server is now consistent with the clients, so it will make the same state changes for a given crash as everyone else). When the ACK has been received by everyone, normal operations are resumed.
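The server-side collection loop described above can be approximated by the Java sketch below. All names are hypothetical. As a simplification, the sketch records the trimmed Freeze View at the moment it decides to send the NACK, rather than when the server receives its own NACK back, and it models states as opaque strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the Failover Server's state collection; not RGS code.
final class FailoverServerSketch {
    enum Reply { WAITING, NACK, ACK }

    private List<String> freezeView = new ArrayList<>();
    private final Map<String, String> states = new HashMap<>();

    // Keep only the Freeze View members still present in the latest view.
    static List<String> trim(List<String> freezeView, Set<String> latestView) {
        List<String> kept = new ArrayList<>(freezeView);
        kept.retainAll(latestView);
        return kept;
    }

    // Called when the first state transfer message delivers the Freeze View.
    void beginRound(List<String> initialFreezeView) {
        freezeView = new ArrayList<>(initialFreezeView);
        states.clear();
    }

    // Called for each state message; latestView is the current group membership.
    Reply onState(String client, String state, Set<String> latestView) {
        states.put(client, state);
        List<String> trimmed = trim(freezeView, latestView);
        if (!states.keySet().containsAll(trimmed)) {
            return Reply.WAITING; // still missing a state from a live member
        }
        if (trimmed.size() != freezeView.size()) {
            // A client crashed during the transfer: discard this round and retry.
            states.clear();
            freezeView = trimmed; // simplification: recorded now, not on NACK self-delivery
            return Reply.NACK;
        }
        return Reply.ACK; // Freeze View unchanged: the collected states are consistent
    }

    Map<String, String> collectedStates() { return states; }
}
```

For example, with a Freeze View of a, b, c where c crashes mid-transfer, the first round ends in a NACK once a and b have reported, and the second round ends in an ACK.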
(Please see the attached Client-side failover state machine.)
This completes a general overview of RGS operations. For a more technical description of the RGS API for clients, please see the RGS JavaDoc provided with the GameNet installation package. For further in-depth information, please study the RGS source code provided with the installation package.