As our core database grows, we're looking for opportunities to segment the data, and spread the load across multiple systems. In some cases, there may be requirements at the application layer for data to be loosely referenced from more than one database. Should we experience a failure that requires a rollback or recovery from a previously backed up copy of the database, I want to make sure that I can get all databases back to a "synchronized" state.
One method we're considering is leveraging "snapshots" at the storage layer, ensuring that all databases are in standby mode, taking the snapshots for all the databases, then returning the databases to normal production mode.
We will also be working to minimize the dependencies across multiple databases, and make the application robust enough to deal with unsynchronized data appropriately.
What other strategies have you used and would you recommend?
Well I will play as a devils advocate here and ask You why synchronize databases at all? This task, as You might know best working on such a highly load site like LN, might become a serious bottleneck. Historically database sharing/replication is rather though towards availability, but not necessarily towards scalability.
You should definitely look at recently hot, and wide discussed topic - Shards. I know LN uses java, and there are some interesting initiaties in this area for some time. As a example You might take a look at Hibernate shards implementation (though I'm not sure Hibernate or any kind of ORM is the best approach for You, except transparent caching).
Fragmenting your database in order to spread the load over multiple servers is probably a poor choice, at least in my view. Most major database software vendors provide a way for you to run a single database over multiple servers. Oracle, the software I am most familiar with, provides a feature called Real Application Clusters (RAC), which provides this functionality. A single database is stored on shared storage (on, say, an EMC Symmetrix, CLARiiON or Celerra array; shameless plug here). Then multiple servers are allowed to mount and open the database simultaneously. Concurrency on the data layer is provided by the database kernel using a feature called cache synchronization. This permits locking of blocks across the cluster so that no machine is allowed to overwrite the pending writes of any other machine in the cluster.
This actually works very well. I would strongly suggest that you consider a shared storage cluster environment like Oracle RAC 10g rather than fragmenting your database into multiple islands of data.
Having said that, EMC does provide the functionality that you are looking for. It is called consistency groups. No one else in the storage industry has this feature, so I am actually working against my employer's interest by recommending something like Oracle RAC 10g instead. However, I would be dishonest if I did not do so, since I really do believe that it is a better solution.
Consistency groups are basically the notion that you can create a replicated set of data on multiple LUNs, volumes, or even arrays which is in the same state from a data standpoint. That is, they are all created simultaneously in terms of I/O. No write to any of the objects in the consistency group is allowed to be later than any other write. They all represent the exact same moment in time from the standpoint of the data.
What that means to you is that you can take, say, a snapshot of multiple LUNs spanning multiple volumes or even arrays, and the data ordering for that snapshot will be maintained for all of the objects in the consistency group. For all practical purposes, it is exactly like you took a snapshot on each object simultaneously.
Consistency groups are explained in one of the posts on my blog. See the first URL that I point to below. The second URL is the blog itself. Please check it out. I think you will find it interesting.
Hope this helps. Let me know if you would like further information on any of this.
Links:
http://oraclestorageguy.typepad.com/oraclestorageguy/2007/07/why-emc-techno...
http://oraclestorageguy.typepad.com/oraclestorageguy
I would have a look a today's storage devices (NAS etc). These appliances have gone far beyond the concept of just storing data.
A good implementation of that approach should give you a secure and performant platform. Not cheap though ...
Recommend considering Oracle "Streams" which very much fits the bill for what you are trying to do. We use this technique extensively, to keep multiple databases in sync.
We have written an extensive writeup on some things to be aware of, however. See the web site below
When you say synchronized, how far behind can you live with? Obviously it can't be instantaneous because of latency in replication. The only to solve this is to mirror the data between your primary and secondary. Snapshot obviously has lots of latency because you have to space out your snapshot intervals. The only to really be "synchronized" is to do dual-writes (or dual loads), 1 to the primary and 1 to the replicated instance. As for being able to reference more than 1 DB, it depends largely on how your app is designed.
It's pretty hard to give you a good answer from a short email. But I hope it's clear and it helps in someway
I do not know how big are your databases, but Oracle RAC (Real Application Cluster) with Partitioning and Flashback could be solution for you. Standby Database you can use for reports and test. With this features you can restore date more precisely.
Maybe you think in MSSQL therms where database is like schema in Oracle. With proper design schemas and tablespaces you have 100% control over data application (without stop OS/DB).
Restore/Synchronize all in one step it always hard and it should not happen
Based on your question, it sounds like you have multiple applications running, each with its own database, and at least one "meta application" that needs data from two or more of these application databases. An example of this would be having an HR database and a Financials database that are each referenced by their own application, and both of them are referenced by a data warehouse application. This synchronization issue is a common problem faced in disaster recovery too, and some of the best solutions I've seen are covered in various disaster recovery systems.
There are a number of solutions with different costs and capabilities... The biggest difference between the solutions is how closely they manage to synchronize the data at the point of backup (ranging from days to milliseconds), with cost increasing very rapidly as latency decreases.
The simplest answer is to simply quiesce all of the databases (by forcing the users to read-only or no access), then doing backups of all the databases while they are in the quiesced state. This is easy, but it means that you need to "lock the door" on the users, often at a critical processing time, which does not make you many friends in the business!
Some database engines have ways to synchronize the differences between database backups. This allows you to essentially manage the backup process so that the users do not perceive any delay or lockout from their data. While this is very attractive, and has little "cash cost" if your vendor supports it, problems can be truly catastrophic to the business.
In the DataWarehousing community, the most enthusiastically supported solution is a fully redundant site, so that hardware forced downtime becomes practically impossible. This is usually coupled with extreme dev/qa/production separation so that software and human error conditions become almost unheard of too.
FWIW, very few if any sites truly reach the stage of 100% synchronization this way. It is far easier and cheaper to support multiple schemas within a single database, and if every system that needs to be synchronized can run in a single database, then the synchronization of backups and restores of those systems becomes a non-issue... Everything is in one container, so everything is implicitly synchronized!
A person with two watches cannot tell what time it is.
In other words - you can only have one source of truth for a business entity. If you must seperate your information into multiple persistant storage areas - one must be the source of truth for a given entity, the others have copies for processing/performance convenience (though perhaps web service syncronous access to the information may be better than replication)
There are a variety of ways to publish information from the source of truth to any interested processing entities. I believe the key is to make these interfaces self-healing - that is, if a subscriber of this information goes off line for a second or a month, when it is restored, the interface must have the smarts to have the subscriber request from the publisher (the source of truth) for any updates since the last successfull sync. It makes for cheap production support too.
most companies do interfaces poorly. they either
a) have daily batch full record dumps - what a waste of processing
b) the infinite 'delta' - every time interval the publisher sends out updates
for what happend over the past interval. Unfortunately, this is bound to cause headaches due to missed data and production support nightmares when certain systems need to be taken down,or corruption occurs, or for any other of a hundred reasons why the subscriber could become out of sync with the publisher.
my recommendation - a variable time window delta - where the subscriber can alter the delta as needed. if you need a full dump - the delta starts at the beginning of time. if not - just grab the last interval.
finally, don't implement any kind of a stateful interface. if a user or customer record was updated - send a complete logical record for that customer or user
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment