Deploying Oracle 11G RAC on New Hardware
No matter how well you plan or how prepared you are, upgrading core production components tends to be a difficult task. This is exactly why people run older software and hardware for years, trying to avoid upgrades even as hardware and software vendors push hard for them, threatening clients that they will no longer support older products.
There are multiple challenges in upgrading to a newer version, but they all come down to three major questions.
- Will applications work with the new version without any new bugs or outages?
- Will the new version perform the same way or better?
- How are we actually going to do that?
Digital Edge is helping one of its clients move from Oracle 10G RAC to Oracle 11G RAC. Oracle RAC powers the client's most critical business applications and is the central repository for all its data.
First let me explain WHY we had to do it even though I think it is a bad move.
Why we had to do it:
a. We purchased a new EMC VNX SAN, and originally wanted to migrate Oracle ASM and Oracle Clusterware to the new EMC SAN.
b. The client wanted to get more performance out of Oracle, as they were starting to hit CPU bottlenecks. We were thinking about new hardware and started talking to Oracle.
c. Oracle suggested that to be in compliance with licensing we would have to change the CPU architecture. Here is the quote:
“Your current production cluster is based on 2 x R900 servers each with 2 x Quad Core Xeon CPUs each CPU running at 2.4Ghz
The recommended architecture is based on a primary 2 node server cluster for prod and a 2 node cluster for DR.
Each R710 is configured with a single very FAST quad core processor, this was specifically recommended and laid out in the
presentation so you Oracle License costs could be managed, if this was not configured in this manner and you continued
to run servers with 2 x Quad core configuration, you would be looking a license exposure well above $700K.”
So now we ended up with a 2-node Oracle 11G RAC in production and a 2-node Oracle 11G RAC in DR, with a plan for physical Data Guard between them.
Why it is bad:
a. I feel that the servers we run on are pretty good. They are Dell R900s with 2 CPUs; we had to populate only 2 of the 4 available CPU sockets to stay in compliance with our Oracle license.
b. Oracle 10G has been on production boxes across the industry for a long time. It has its glitches and problems, but it is a known devil.
c. The main application and all supporting and secondary applications are tuned for 10G performance.
d. In an attempt to comply with Oracle's new license model, we had to move to more powerful processors but ended up with less total processing power when counting cores.
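To make the licensing math behind points a and d concrete, here is a rough sketch. The Intel core factor of 0.5 and the per-processor list price below are my assumptions based on Oracle's published licensing rules at the time, not figures from any actual contract, and the helper name is hypothetical.

```python
import math

# Rough Oracle Enterprise Edition processor-license arithmetic
# (illustrative only). CORE_FACTOR and LIST_PRICE are assumed values,
# not figures from our contract.
CORE_FACTOR = 0.5        # assumed Oracle core factor for Intel Xeon
LIST_PRICE = 47_500      # assumed EE list price per processor license, USD

def licenses_needed(nodes, sockets_per_node, cores_per_socket,
                    core_factor=CORE_FACTOR):
    """Processor licenses = total cores x core factor, rounded up."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    return math.ceil(total_cores * core_factor)

# Old production layout: 2 x R900, each with 2 quad-core Xeons
old = licenses_needed(nodes=2, sockets_per_node=2, cores_per_socket=4)
# New production layout: 2 x R710, each with a single quad-core CPU
new = licenses_needed(nodes=2, sockets_per_node=1, cores_per_socket=4)

print(f"old: {old} licenses, new: {new} licenses, "
      f"list-price delta about ${(old - new) * LIST_PRICE:,}")
```

Under these assumptions the two-socket layout needs twice the processor licenses of the single-socket one; multiply that gap across RAC and DR nodes plus per-processor option pricing and a six-figure exposure like the one quoted becomes plausible. The same arithmetic also shows the trade-off in point d: the new layout halves the raw core count from 16 to 8.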
Here is what we have to do now:
1. Find hardware to deploy a TEST environment in parallel with the PRODUCTION and QA environments we already have, keeping QA for new software releases. We also need enough hardware to generate adequate load and see how the new implementation behaves.
2. Make sure that business users will not confuse the TEST environment with actual PRODUCTION, as the two will look almost the same.
3. Make sure that we run all business cases while at the same time performing stress testing and measuring performance baselines.
4. Plan the switchover process and measure how long it takes.
5. During the client tests, gather baseline stats and tune statements for the new 11G optimizer.
As everybody knows, hardware cost these days is largely about the price of power, and electricity costs have increased over the last two years. So bringing parallel infrastructure up for a month or two requires additional power, processing capacity, and so on.
"This is a great application for the public cloud," someone relatively new to financial processing might jump in. Not so fast. The data has to stay inside a secured perimeter; we cannot simply build client servers somewhere outside, and the latency between cloud servers and Oracle would invalidate any real stress test.
Certainly, our virtualized application layer helped a lot. We had enough resources in our VM clusters to build most of the parallel environment, though we still had to power up a few older physical servers to compensate.
Lastly, even with strategic planning, this big upgrade put multiple dependent infrastructure projects on hold for at least a month or two.
By the end of the project I will try to calculate the cost of the pre-production test effort, to get an idea of what technology upgrades actually cost the end client and whether the whole thing is worth it.