Nehalem, Shanghai, and Dunnington performance notes

Friday, 14 August 2009, Posted by owner at 00:34

Back in March 2008, I discussed the SAP SD 2-tier results for some AMD Opteron and Intel Xeon systems. Here are some more recent results. All Opteron and Xeon processors below are on the 45nm process except for the Opteron 8360, which is a 65nm product. For some reason, HP has not posted a 4-socket result for the 45nm Opteron 8384, where the bigger 6M L3 cache is known to improve network round-trip performance. The Itanium is a dual-core 90nm product, which is at a serious disadvantage (it is to be replaced by Tukwila, a 65nm quad-core). Note that the Xeon X7460 is a six-core processor. My expectation is that the 4-way Opteron 8384 should be around 22-23K SAPS, which would be very competitive for a quad-core going against a six-core.

 

System     Processors                     Users     SAPS
DL785G5    8 x 2.7GHz   Opteron 8384      7,101     35,400 (O10,Lin)
DL580G5    4 x 2.66GHz  Xeon X7460        5,155     25,830 (S2K5)
DL585G5    4 x 2.5GHz   Opteron 8360      3,801     19,020 (S2K5)

DL380G6    2 x 2.93GHz  Xeon X5570        4,995     25,000 (S2K5)
DL385G5    2 x 2.7GHz   Opteron 2384      2,752     13,780 (S2K5)
DL380G5    2 x 3.33GHz  Xeon 5470         2,518     12,600 (S2K5)
BL860C     2 x 1.66GHz  Itanium 9140M       501      5,850
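To compare systems with different socket and core counts, it can help to normalize the SAPS figures. The following is a minimal sketch that only does the division on the numbers in the table above. The core counts are partly an assumption: the text states that the X7460 is six-core and the Itanium is dual-core, and the remaining parts are taken here to be quad-core. The names `RESULTS`, `saps_per_socket` and `saps_per_core` are made up for illustration.

```python
# SAP SD 2-tier figures from the table above:
# (system, sockets, cores per socket, SAPS).
# Cores per socket: six for the X7460 and two for the Itanium come from the
# text; four for the other parts is an assumption (quad-core parts).
RESULTS = [
    ("DL785G5 Opteron 8384", 8, 4, 35400),
    ("DL580G5 Xeon X7460", 4, 6, 25830),
    ("DL585G5 Opteron 8360", 4, 4, 19020),
    ("DL380G6 Xeon X5570", 2, 4, 25000),
    ("DL385G5 Opteron 2384", 2, 4, 13780),
    ("DL380G5 Xeon 5470", 2, 4, 12600),
    ("BL860C Itanium 9140M", 2, 2, 5850),
]


def saps_per_socket(sockets, saps):
    """SAPS divided by the number of processor sockets."""
    return saps / sockets


def saps_per_core(sockets, cores_per_socket, saps):
    """SAPS divided by the total number of cores in the system."""
    return saps / (sockets * cores_per_socket)


if __name__ == "__main__":
    for name, sockets, cores, saps in RESULTS:
        print(f"{name:<22} "
              f"{saps_per_socket(sockets, saps):>9.1f} SAPS/socket "
              f"{saps_per_core(sockets, cores, saps):>9.1f} SAPS/core")
```

On these figures the 2-socket X5570 comes in at roughly twice the SAPS of the 2-socket Xeon 5470 (25,000 versus 12,600).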

 

Of special note is the exceptional result for the 2-socket Xeon X5570, based on the new Nehalem processor, expected to be available some time in the first half of 2009. This is not entirely unexpected. First, the integrated memory controller probably helps in network round-trip intensive operations. Second is the return of Hyper-Threading, as Nehalem was designed in Oregon, while the Core 2 architecture is an Israeli design (each design team has its own opinion of various micro-architecture features).

On the last Oregon design, the Pentium 4 NetBurst core, I noted that Hyper-Threading improved network round-trip performance by 15-20%, but did not actually improve any SQL operation outside of the network round-trip. Later, I measured 40-50% performance gains for HT on LiteSpeed backup compression tests. I have heard that while the theory behind HT is sound, certain operations such as acquiring locks (at the C/C++ level) can cause problems. The compression algorithm has no such issues, and probably represents the upper bound on what could be achieved with HT. Supposedly the Itanium HT, which was introduced with Montecito after the last NetBurst, had some improvements over the NetBurst implementation. Now that the Oregon team has had 8+ years to investigate HT characteristics, we should expect a much improved HT with Nehalem.

 

My expectation is that HT has the biggest benefit in SAP-type environments, i.e. stored procedure calls that retrieve a single row (50-100 CPU-microseconds per call), a moderate benefit in TPC-C and TPC-E type environments (on the order of 1-3 CPU-milliseconds per call), and little or no benefit in large TPC-H type queries. When I get the Nehalem Xeon 55xx system, I will look into this.

 

Unisys to focus on Xeon over Itanium?

On a side note, see the article regarding Unisys.

http://news.cnet.com/8301-13924_3-10167332-64.html?part=rss&subj=news&tag=2547-1_3-0-20

Unisys just posted a 10TB TPC-H result for a 16-socket Xeon X7460 system. While the X7460 has six cores, Unisys enabled only four cores per socket, in part because the current versions of Windows and SQL Server support only up to 64 cores. The result is 26% higher than that of a 32-socket, 64-core Itanium 2 system. The presumption is that later this year, Intel will release the quad-core Itanium, codename Tukwila, so this result might be representative of 16-socket systems in late 2009. Even if Tukwila can achieve 2.0GHz, it will probably just be comparable to the X7460. After Windows Server 2008 R2 releases, the 16-way X7460 will have all 96 cores (16 sockets x 6 cores) available for the throughput portion of TPC-H.

 

64-core TPC-H 10TB

Xeon X7460 six-core 2.66GHz (Dunnington, 4 cores used)    80,172.7 (ES7600R)
Itanium 9140N dual-core 1.6GHz (Montecito)                63,650.9 (Superdome)
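The 26% figure quoted above can be checked directly from the two scores. This is a small sketch, not part of the published results; the function name `percent_gain` and the constant names are made up for illustration. Both systems ran with 64 cores enabled (16 sockets x 4 cores used, and 32 sockets x 2 cores), so the ratio is also a per-core comparison.

```python
# TPC-H 10TB scores from the listing above.
XEON_X7460_SCORE = 80172.7      # 16 sockets, 4 of 6 cores enabled per socket
ITANIUM_9140N_SCORE = 63650.9   # 32 sockets, dual-core

XEON_CORES_USED = 16 * 4        # cores enabled on the Xeon system
ITANIUM_CORES = 32 * 2          # cores on the Itanium system


def percent_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    gain = percent_gain(XEON_X7460_SCORE, ITANIUM_9140N_SCORE)
    print(f"Cores: Xeon {XEON_CORES_USED}, Itanium {ITANIUM_CORES}")
    print(f"Xeon result is {gain:.1f}% higher")
```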

 

Dell TPC-E results for Shanghai versus Dunnington

Both systems have 4 sockets and 64GB memory.

4 x six-core Dunnington 2.66GHz, 16M L3     671.35 tpsE
4 x quad-core Shanghai 2.7GHz, 6M L3        635.43 tpsE

So the quad-core Shanghai competes very well against the six-core Dunnington. The larger L3 cache relative to Barcelona (2M L3) really helped.
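To put a number on "competes very well", the two tpsE results can be divided by the total core count of each system. This is a rough sketch using only the figures above; `tpse_per_core` and the dictionary layout are illustrative names, not anything from the TPC-E reports.

```python
# Dell TPC-E results quoted above: sockets, cores per socket, tpsE.
SYSTEMS = {
    "Dunnington": {"sockets": 4, "cores_per_socket": 6, "tpse": 671.35},
    "Shanghai": {"sockets": 4, "cores_per_socket": 4, "tpse": 635.43},
}


def tpse_per_core(system):
    """tpsE divided by the total number of cores in the system."""
    total_cores = system["sockets"] * system["cores_per_socket"]
    return system["tpse"] / total_cores


if __name__ == "__main__":
    for name, system in SYSTEMS.items():
        print(f"{name:<11} {system['tpse']:>7.2f} tpsE total, "
              f"{tpse_per_core(system):>6.2f} tpsE per core")
    ratio = SYSTEMS["Shanghai"]["tpse"] / SYSTEMS["Dunnington"]["tpse"]
    print(f"Shanghai reaches {ratio:.0%} of the Dunnington result with 16 cores versus 24")
```

Per core, that is roughly 40 tpsE for Shanghai against roughly 28 tpsE for Dunnington.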

 

SSD Test Platform

Anyway, I am all set to buy the new 2-socket Xeon 5500 series as soon as one becomes available. I will look into Nehalem performance relative to Core 2, with and without HT. I will try to configure this system with 2 PCI-E SAS RAID controllers and 8 SSDs (2 per x4 SAS port, 4 devices per controller) initially, and then expand to 4 RAID controllers with 24-32 SSDs as budget allows (probably 3 SSDs and 1 HDD per x4 SAS port). I should also get a Shanghai platform, as my last good numbers for Opteron are now very old, but this is my own money.

Business is down with the economy, and too many consultants are dropping their rates to get business. I am not inclined to do so. So I should have time to bring my past performance papers, many of which pertain to SQL Server 2000, up to date. I will also try to re-release some of my performance tools, like SQL Clone, on www.qdpma.com. I should also be able to release new tools, one for Profiler Trace analysis and another for performance tuning using dm_db_index_usage_stats and dm_exec_query_stats.

