Microsoft's "Thinking Oracle RAC? Think Again" Whitepaper
Now that I have actually (kind of) read this paper, I will comment on it. I say "kind of" because I cannot focus on non-technical matters, in the same way that I get really frustrated trying to explain something to a person who just cannot understand the difference between an argument substantiated by hard analysis and an argument that sounds logical but does not describe the predominant underlying effect.
First, Microsoft competes with Oracle, and MS gets hit with the RAC question all the time, so of course they need to collect their best arguments in one place. On a side note, many Microsoft whitepapers list the author(s). When an artist paints a masterpiece, he signs it because he takes pride in his work. Not every talented artist can paint what he wants; some must accept commercial work (having food to eat is not overrated). In such cases, the artist does not want his peers to know that he had to stoop to such work and prefers it be anonymous. There is no listed author for this paper.
The first argument in the paper is valid: only a small percentage of customers have actually deployed RAC. Back in the OPS days, one prominent expert said that he had never personally seen an OPS deployment that achieved positive scaling, though he did hear from someone he considered competent on the matter that one customer had. RAC is much better than OPS (to get something right, you do have to screw it up once or twice or thrice). The Oracle people I talk with (more hurling insults and ridicule back and forth than actual talking) do say that the technical skills needed to deploy RAC are not common among Oracle DBAs.
But the fact that Oracle RAC is not really required by most people does not stop sales and marketing from making a big deal out of it. Funny how CIOs are influenced by the scaling argument (men are most susceptible to the size thing) even though it is of no consequence in their specific environment.
Of course RAC is expensive. The alternative is buying big iron, which is also expensive. If a solution is not really painfully expensive, is it any good? If one project manager turns what should have been a $1M project into a $10M project while another manager delivers his project for $1M, who will know it should have been a $1M project? Who will get the higher ranking come evaluation time, that is, the next promotion and the big raise?
When a project is so hideously expensive, the CIO must go to the CEO, who in turn may need to go to the board for approval. If this project gets really messed up, the big bosses are not inclined to declare it a failure, because they would look bad as well, having endorsed the project. When a small project runs into difficulty, no matter who is at fault, it is not hard for the CEO/CIO to pin the blame on a lowly project manager, i.e., fire him.
If what you need is SQL Server cluster-style fail-over redundancy, then the expense of RAC licensing does not make sense. So RAC only makes sense if scaling performance is needed. The paper talks about scale-out OLTP [by which I mean with distributed partitioned views]. Scale-out on any DBMS is not simple, and not just for the reasons described. If you understand in detail how the [SQL Server] cost-based optimizer (CBO) works with respect to local and remote data, you will understand the horrible implications. [In execution plans involving remote data, the row count estimates are mostly 1,000 or 10,000 rows. When two remote sources estimated at 1,000 rows each are joined, the estimated output is 1M rows; when that is then joined to a large local table, the plan is frequently a table scan. See the sketch below.] [Technically, RAC scales out on hardware, but does not use a DPV database design.]
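To make that bracketed point concrete, here is a purely illustrative sketch of how the default remote-row guesses snowball into a scan plan. The fixed 1,000-row estimate and the seek-vs-scan threshold below are assumptions standing in for the real costing logic, not SQL Server internals:

```python
# Illustrative model only -- real optimizer costing is far more involved.
DEFAULT_REMOTE_ESTIMATE = 1_000  # stand-in for the fixed row-count guess on remote data

def join_estimate(left_rows, right_rows):
    # With no usable statistics on the remote members, the join estimate
    # effectively multiplies the two input guesses together.
    return left_rows * right_rows

def pick_access_path(estimated_probe_rows, local_table_rows):
    # Crude stand-in for the seek-vs-scan decision: once the estimated probes
    # become a meaningful fraction of the table, a scan looks cheaper.
    return "table scan" if estimated_probe_rows > 0.01 * local_table_rows else "index seek"

remote_a = DEFAULT_REMOTE_ESTIMATE  # two remote members of the partitioned view
remote_b = DEFAULT_REMOTE_ESTIMATE
probe_rows = join_estimate(remote_a, remote_b)  # 1,000 x 1,000 = 1,000,000 estimated rows
print(probe_rows, pick_access_path(probe_rows, local_table_rows=50_000_000))
# -> 1000000 table scan, even if the true remote row counts are tiny
```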
If one wanted to point out an issue with Oracle RAC for OLTP, the most apparent is the near complete absence of published benchmark results. There is one RAC TPC-C (there are no Oracle TPC-E results, period) and it was done long ago (12/2003). The TPC-C RAC publication used 16 HP Integrity rx5670 servers, each with 4 Itanium 2 1.5GHz processors for a total of 64 cores, scoring 1,184,893 tpm-C at a total cost of $6.5M, versus the contemporary result for an HP Superdome with the same 64 Itanium 2 processors, which scored 1,008,144 tpm-C at a cost of $8.4M. The RAC system had 17% better performance at 22% lower cost. Looking closely at the price detail, the cost of memory for the rx5670 cluster was $1.4M versus $5M on the Superdome. It is a little difficult to compare pricing because the Superdome discount was about 45% versus 25% for the clustered rx5670. The major software licensing difference was $640K for RAC and $320K for partitioning. So a big chunk of the price advantage is because of a memory pricing anomaly.
| Nodes | 1 | 16 |
| --- | --- | --- |
| System | Superdome | rx5670 |
| Database | Oracle 10g | Oracle 10g+RAC+Part |
| Report Date | 11/04/2003 | 12/08/2003 |
| Tpm-C | 1,008,144 | 1,184,893 |
| Total System Cost | $8,397,262 | $6,541,770 |
| Price/Performance | $8.33 per tpm-C | $5.52/tpm-C |
| Processors | 64 Itanium 2 1.5GHz 6MB | 16 x 4 Itanium 2 1.5GHz 6MB |
| Memory | 1024GB | 768GB (16 x 48GB) |
| Disks | 2100+120 | 672+1344+224 |
| HBA | 28 FC 2Gb/s | 64 x FC |
| Costs | | |
| Processors | $1,280,000 ($40K ea) | $528,000 ($8.25K ea) |
| Memory | $4,998,400 ($39K per 8GB) | $1,440,000 ($7,500 per 4GB) |
| Server Subtotal | $7,085,433 | (combined below) |
| Storage | $5,032,188 | (combined below) |
| Server+Storage | (itemized above) | $4,694,618 |
| Oracle 10g | $1,280,000 | $1,280,000 |
| RAC+Partitioning | | $960,000 |
| Discounts | $7,000,000 | $1,900,000 |
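A quick back-of-envelope check of the percentages quoted above, using only the figures in the table (the rounding is mine):

```python
# Back-of-envelope check of the TPC-C comparison above.
superdome = {"tpmC": 1_008_144, "cost": 8_397_262}
rac       = {"tpmC": 1_184_893, "cost": 6_541_770}

perf_gain = rac["tpmC"] / superdome["tpmC"] - 1   # ~0.175
cost_drop = 1 - rac["cost"] / superdome["cost"]   # ~0.221
print(f"RAC performance advantage: {perf_gain:.1%}")   # ~17.5%
print(f"RAC cost advantage:        {cost_drop:.1%}")   # ~22.1%
print(f"$/tpm-C: {superdome['cost'] / superdome['tpmC']:.2f} vs {rac['cost'] / rac['tpmC']:.2f}")
# -> 8.33 vs 5.52, matching the price/performance row in the table
```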
Since then, there have been no Oracle RAC TPC-C publications. That usually means there is no good news. If MS wants to criticize RAC for OLTP, I am OK with that. In other posts, I argued that going forward, the new Intel QPI and existing AMD HT interconnects should allow building big-iron (scale-up) systems with better scaling than RAC, on account of the higher bandwidth and lower latency that can be achieved compared with RAC going over InfiniBand. This is a theoretical argument that needs actual measurements to assess its validity.
On the Data Warehouse side, there are many Oracle RAC TPC-H publications. From the results, I think RAC has decent scalability, and I am really happy that with a scale-out system I can bring a better balance of processor power, memory, and storage than I can with big iron, or rather a maxed-out server system (i.e., expensive big-capacity DIMMs and high-priced storage). So apparently my arguments above on interconnect bandwidth and latency are not as important for DW. MS does mention they will soon have their own MPP solution, so that is good, because I am too old to learn Oracle, and I am relatively happy (meaning I am bitching a lot) doing big SQL Server projects.
I will expand on what I mean by balance. Let's compare two recent TPC-H 1000GB results. A 64-node BL460c cluster with 2 quad-core Xeon X5450 3.0GHz processors and 32GB memory per node (128 sockets, 512 cores, 2TB memory) scored 1,166,976 QphH at a total cost of $6.3M, compared to a 32-socket, 64-core Superdome with 384GB memory that scored 123,323 QphH at a cost of $2.5M. Put aside for this discussion the fact that Itanium is still dual-core on a 90nm process while Xeon is quad-core on 45nm. For 8X the number of cores, the performance gain is 9.5X. Keeping in mind that the Xeon X5450 core is about 50% faster than the Itanium 1.6GHz based on SPEC CPU int 2006, this is about right. Also, the RAC system used Exadata storage, which offloads some processing, but probably not too much.
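Putting rough numbers on that (the 1.5X per-core factor is the SPEC-based estimate just mentioned, and the "ideal" here is nothing more than a naive cores-times-per-core-speed ceiling):

```python
# Rough check of the core-count vs throughput ratio discussed above.
superdome_qphh, cluster_qphh   = 123_323, 1_166_976
superdome_cores, cluster_cores = 64, 512
per_core_factor = 1.5  # assumed Xeon X5450 vs Itanium 1.6GHz per-core factor (SPEC CPU int 2006)

core_ratio  = cluster_cores / superdome_cores   # 8x the cores
perf_ratio  = cluster_qphh / superdome_qphh     # ~9.5x the QphH
ideal_ratio = core_ratio * per_core_factor      # ~12x if per-core speed is also counted

print(f"cores: {core_ratio:.0f}x, throughput: {perf_ratio:.1f}x, "
      f"vs per-core-adjusted ideal {ideal_ratio:.0f}x ({perf_ratio / ideal_ratio:.0%})")
# -> cores: 8x, throughput: 9.5x, vs per-core-adjusted ideal 12x (79%)
```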
The cost breakdown for the Itanium system is $736K for processors, $800K for memory, $1.3M for storage (EVA SAN), and $470K for Oracle (with a 54% discount on everything). In the RAC-Xeon system, it is $160K for processors, $85K for memory, $200K for InfiniBand, $540K for storage, and $6M for Oracle (30% discount on Oracle only). OK, this is not what I was really getting at. If you max out a system, it means buying the 8GB DIMMs, which cost 4X more than the 4GB DIMMs but do not contribute proportionately higher performance. The big gain in the RAC system is that it is possible to configure enough memory to fit the database and make do with a less powerful storage system. This argument goes out the window if the RAC memory cannot encompass the entire DB plus working space. Still, it is good to have this avenue. (The memory cost per GB is worked out right after the table below.)
| System | Superdome | BL460c Cluster |
| --- | --- | --- |
| Database | Oracle 11g + Partitioning | Oracle 11gR2, RAC |
| QphH@1000GB | 123,323 | 1,166,976 |
| TPC-H Power | 118,577 | 782,608 |
| TPC-H Throughput | 128,259 | 1,740,122 |
| Total System Cost | $2,532,527 | $6,320,001 |
| Processors | 32 Itanium 9140 1.6GHz | 128 X5450 3GHz |
| Cores | 64 | 512 |
| Memory | 384GB | 2080GB |
| Disks | 768 | 128+6x12 |
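As noted above, here is the memory cost per GB from the two cost breakdowns (just the quoted memory line item divided by the configured capacity, so treat it as a rough figure given the different discount treatment):

```python
# Memory dollars-per-GB, from the cost breakdowns and configurations quoted above.
superdome_mem_cost, superdome_mem_gb = 800_000, 384
cluster_mem_cost,  cluster_mem_gb    = 85_000, 2_080

print(f"Superdome:      ${superdome_mem_cost / superdome_mem_gb:,.0f} per GB")  # ~$2,083/GB
print(f"BL460c cluster: ${cluster_mem_cost / cluster_mem_gb:,.0f} per GB")      # ~$41/GB
# The gap is why configuring enough memory to hold the database is practical
# on the scale-out cluster but painfully expensive on a maxed-out big-iron box.
```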