High availability: Oracle RAC vs. RAC One Node vs. Data Guard
I will try to keep this as simple as possible. If you have to find an HA solution for an Oracle database, these are the alternatives. I collected as much information as I could about the available technologies: RAC, RAC One Node and Data Guard. This article does not explain each technology in depth; instead it compares them with each other and summarizes their features, advantages and disadvantages. One other important point: I focus only on high availability and do not cover other criteria. So, let’s start with a basic introduction:
Solutions
1. Oracle RAC
The idea is to utilize the additional resources of multiple machines to satisfy higher load demands (scalability) as well as provide a higher level of availability since connections can be directed to any available instance.
– multiple servers connect to the same database (shared storage) simultaneously
– the servers are usually in the same location (“in one room”)
– RAC is mainly for load balancing
– the application should be “cluster ready”
2. Oracle RAC One (one node RAC)
It uses the same infrastructure as classic RAC, but only a single RAC instance runs on one node of the cluster while the second node is in cold standby mode. It provides a cold failover solution for an Oracle database.
– built-in cluster failover for HA, but no load balancing, unlike regular RAC
– useful for maintenance tasks such as rolling upgrades or proactive upgrades
– it can be upgraded online to full RAC
3. Oracle Data Guard
Data Guard provides continuity of operations: if the room in which your RAC cluster resides “goes away” (fire, flood, major hardware failure, whatever), Data Guard is “somewhere else, ready to take over” (the failover site).
– designed as a cost-effective disaster recovery and business continuity solution
– DG provides many extra ways to use the secondary site database (reporting, testing, UAT environment, etc.)
– it can replicate the whole database (physical standby) or just part of it (logical standby)
Hardware/Network aspect
1. Oracle RAC
– the infrastructure is complex and requires many network interfaces
– servers communicate over the interconnect (latency between nodes is critical because of Cache Fusion)
– long distances between nodes require special hardware components (stretch cluster)
– RAC generally has significant overhead compared with a classic infrastructure (not ideal in a cloud environment)
– nodes of similar capacity are recommended
2. Oracle RAC One (one node RAC)
– it requires the same infrastructure components as a regular RAC environment
– it has less overhead than regular RAC
3. Oracle Data Guard
– the servers connect to different databases at different sites
– the servers need only a standard network connection to transfer the transaction logs (archive logs)
– shared storage can be used but is not required
– primary and secondary nodes can be different
Availability
1. Oracle RAC
– it protects against instance failures but not against data or storage failures (those need storage-level replication)
– data recovery takes as long as in a non-clustered setup (although a parallel recovery option is available)
– limited protection against human errors
2. Oracle RAC One (one node RAC)
– RAC One provides fast server relocation but not 100% continuous availability
– it is not designed for DR and is not suitable for mission-critical applications without a DR solution
– this technology sits somewhere between RAC and DG
3. Oracle Data Guard
– primarily a backup solution in the event of a failure at the primary database (human errors, corruption, etc.); recovery can take just a few minutes. Human error is responsible for more than 75 percent of Oracle outages (human error 79%, hardware error 21%).
– DG can protect against human errors (it can run in delayed apply mode)
Switch over after failure
1. Oracle RAC
– zero downtime for instance level failure
– open sessions are relocated automatically; open transactions have to be repeated
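The last point above has a practical consequence for the application: a transaction that was open when its instance failed has to be run again from the start. A minimal sketch of such retry logic follows. The names `ConnectionLost`, `run_with_retry` and `transfer` are hypothetical and not part of any Oracle driver; a real application would catch its driver's own connection error instead.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error raised when the session's instance goes away."""


def run_with_retry(transaction, max_attempts=3, delay_seconds=0.0):
    """Run `transaction` (a callable) again if the connection is lost.

    The whole transaction is repeated from the start, because the work
    done before the failure is assumed to be lost with the failed instance.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except ConnectionLost:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(delay_seconds)  # give the cluster time to relocate the session


# Usage: a fake transaction that fails once, then succeeds on the second try.
calls = {"count": 0}


def transfer():
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionLost("instance failed mid-transaction")
    return "committed"


result = run_with_retry(transfer)
print(result, "after", calls["count"], "attempts")
```

This only works safely if the transaction is written so that repeating it is harmless, which is part of what “cluster ready” means for an application.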
2. Oracle RAC One (one node RAC)
– switchover takes less than 5 minutes
– on failure, it first tries to restart the service on the primary node, then automatically switches over if the restart did not succeed
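The restart-then-switch-over behaviour described above can be sketched as a small decision function. This is only an illustration of the order of the steps, not how Oracle Clusterware is implemented; `recover_service` and its two callable arguments are hypothetical hooks.

```python
def recover_service(restart_in_place, start_on_standby, restart_attempts=1):
    """Try a local restart first; switch over only if that fails.

    Both arguments are callables that return True on success
    (hypothetical hooks standing in for the cluster's own actions).
    Returns a string describing where the service ended up.
    """
    for _ in range(restart_attempts):
        if restart_in_place():
            return "restarted in place"
    if start_on_standby():
        return "switched over to standby node"
    return "service down"


# Usage: the local restart fails, so the service moves to the standby node.
outcome = recover_service(lambda: False, lambda: True)
print(outcome)
```

The time spent on the failed restart attempts is part of the total switchover time, which is one reason it is measured in minutes rather than seconds.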
3. Oracle Data Guard
– switchover takes a few minutes, typically less than 5, depending on the settings
– switchover can happen automatically but is usually a manual operation
Cost
1. Oracle RAC
– RAC is usually the more expensive solution
– RAC is very popular in spite of its complexity, because the Oracle marketing machine is very effective
2. Oracle RAC One (one node RAC)
– Oracle has a separate pricing policy for this solution; it costs less than classic RAC but is still expensive
3. Oracle Data Guard
– no extra option cost, but you have to buy a license for the secondary site as well
– the standby site can be used for reporting, but that requires buying an additional option (called “Active Data Guard”)
Dynamic computing, cloud ready
1. Oracle RAC
– RAC requires a clustering infrastructure, which is not trivial to provide
– virtualization has overhead and the clustering solution adds considerable overhead of its own; together they create a significant performance disadvantage (what will Oracle’s answer be in version 12c?)
2. Oracle RAC One (one node RAC)
– it requires the same (complex) infrastructure as classic RAC and can have performance issues
– (anyway, it is a good idea to start the service on just one node and extend it to others on demand, but not for availability)
3. Oracle Data Guard
– more suitable for a cloud (dynamic) environment and for switching between data centres
Conclusion
Both Data Guard and RAC have their strengths and weaknesses, and some sites even use both. That is why Oracle recommends combining them in its “Maximum Availability Architecture”. If we put them together, RAC + DG (+ logical standby), system availability could reach 99.9999%, but the cost is very high. To understand this result, we have to look at the strengths (these technologies complement each other):
– RAC offers less downtime for OS/DB upgrades and instance-level failures
– DG can protect against human and storage errors
– a logical (DG) standby can eliminate the downtime for every OS/DB upgrade (but the applications must support it)
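To put the 99.9999% figure in perspective, an availability percentage can be converted into the downtime it allows per year. The short sketch below does only that arithmetic; the function name is my own and a non-leap year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Convert an availability percentage into allowed downtime per year."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability levels.
for percent in (99.9, 99.99, 99.999, 99.9999):
    minutes = downtime_minutes_per_year(percent)
    print(f"{percent}% -> {minutes:.2f} minutes/year ({minutes * 60:.0f} seconds)")
```

At 99.9999% the budget is about half a minute of downtime per year, compared with almost nine hours at 99.9%, which helps explain why the combined architecture is so expensive.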