Table of Contents
Data synchronization differs highly from case to case.
Let’s say we have multiple databases each serving multiple clients. There are plenty of different reasons for having multiple databases.
One example would be a high latency from some clients because of the location of the database. Another example would be some of your clients might not have a steady internet connection and need a database in their network. In this case any modified data by one client only exists only on the database he is connected to.
This results into multiple databases with unique data. This is what we want to change.
We are going to implement a synchronization process that synchronzises multiple databases over one master database.
The goal is to have one master-database to which any other database can connect to whenever they want. Because there might not be a constant network connection between the databases the user can manually start the synchronization by clicking a button.
The structure of the planned system would look similar to this:
As an example let’s say we have three databases on different locations each serving three clients. Each of the three databases only save the data for their clients.
The goal is that the data content is the same across all databases.
To archieve this we need a database being the master-database. The master-database has to be reachable by any of the client-databases. This could be implemented by using one of the three databases as the master-database.
In our example we take an additional database which is implemented in the cloud.
Any synchronization runs between the master-database and one of the three client-databases. After every synchronization the two synchronized databases contain the same data.
So in order to have the same data on all three databases five synchronizations would need to happen. These five steps are displayed below.
The first important change will be on the entities.
It is not possible to transfer deleted entities without changing the data structure.
The reason for this is the following. Imagine a system with two databases.
Database 1 creates a new entity. Now this entity exists only on one database.
As soon as synchronization is run, the entity is available on both sides as required.
If database 1 now deletes this entity, it is only present on one database again.
As soon as database 2 changes this entity and the synchronization is started, the entity would be transferred again. This ensures that it is impossible to synchronize deleted entities without changing the data structure.
There are two ways to solve this problem. Either we create a new table that stores all IDs of all deleted entities. Or we give each entity a “IsDeleted” flag.
We decided to use the entity flag.
Any entity that has to be synchronized needs to implement an interface identifying them as syncable.
The Interface has 5 Properties:
- Id: Identifies the entity.
- CreatedAt: Saves the entity creation time. Needed to spot newly added entities that still needs to be synced.
- ChangedAt: Saves the time of the last entity modification. Needed to spot newly modified entities that still needs to be synced.
- SyncedAt: Saves the time when the entity was synchronized last. Needed to compare which entities need to be synced. If SyncedAt on the local database is earlier than on the master database we need to update the local entity. If SyncedAt is later than on the master database we need to update the master entity.
- IsDeleted: Saves whether or not the entity is deleted. This Flag is needed to synchronize deleted entities. This way we update ChangedAt on the database so the remote database will synchronize the entity and also mark the entity as deleted.
The EntityFramework.DbContext that manages your connection to the database needs to Log any changes of the newly saved entities.
To achieve this we override the SaveChanges() and SaveChangesAsync() function inside our DbContext:
We also need to implement the LogTimeStamp() method:
Important to notice is that the DbContext class now also needs a Boolean-Flag named “KeepTimeStamps” that is set to false by default.
We will set this to true during the synchronization so newly added entities from the sync won’t change their added and changed dates.
Ordering of tables
Before we can synchronize the entities, we need to order them by their dependencies. The reason for this is that we insert or update entities one after the other and if a new entity has a required dependency to a not yet created entity, EF will crash.
Let’s say we have a database with 4 tables. The first table “Shelf” has a foreign key to the second table “Book”. The table “Book” has a FK to the third table “Page”. The fourth table “Person” has no dependencies. All these foreign keys are required.
This would lead to the following order:
The entity Person can be placed before, after or in between that order.
Note that before an entity has been added to the database all dependencies were created.
Let’s take the ordering example and see a working implementation.
Firstly, we create the 4 entities. Each of these entities need to implement the “ISyncableEntity”-Interface that we discussed in chapter 3.1.
In addition, Shelf needs a reference to Book and Book needs a reference to Page.
The “Shelf”-entity looks as follows:
Now we need the actual Synchronization-function.
First the full code will be displayed. After that we will take a walk through the code.
The parameters of the function are:
- strategy: An enum that defines what happens in case of a data-conflict.
- from: The start date and time for the sync. Any data changes before this time is not being included in the synchronization.
- to: The end date and time for the sync. Any data changes after this time is not being included in the synchronization.
- hereTable: The table of the local database that needs to be synchronized.
- thereTable: The table of the remote database that needs to be synchronized.
- excludes: Any columns that are excluded from the synchronization.
Firstly, we get all entities of both sides and get rid of any duplicates. After that we check for any conflicts by comparing their Id.
If any conflicts exist (if any entity got modified on both sides during that timeframe), the conflict-strategy decides on what to do with these conflicts.
In this example we are going to use the OverrideOld-strategy that simply discards changes that occurred before the latest change.
Next up we set the SyncedAt property of all locally changed entities to the current time.
This way all synchronized entities on the server have the valid SyncedAt-Time.
To add or update each entity on the other side we create a hard copy of the entity so it does not have any connection to the current dbContext.
The Mapper is mapping every property from the changed entity to the newly created target entity while excluding all specified properties.
Then we use the copied entity for the IDbSet.AddOrUpdate()-Method. AddOrUpdate compares the entities with the same id based on their properties.
If no entity with the id exists the entity gets added to the database. Otherwise the entity will be updated.
Now we need to call the function for each table in the database. In this case it will be four times.
This example synchronizes data day by day and reads and writes the LastSyncedTime on the Filesystem.
Note that we are synchronizing the Pages-table before Books and the Books-table before Shelves. After each of those calls a SaveChanges() needs to be called in order to execute the SQL Transaction. This is only important for tables that depend on a different table as discussed in chapter 4.3. (Ordering of tables) That is also why there is no need to call SaveChanges after the Persons-table.
Make sure that there is a SaveChanges() called after the last table.
We managed to dynamically synchronize data between multiple SQL-Databasese with just a few code adjustments. The code is easily expandable and can be called from any source.
If you have any questions please leave them below.