
Published in Reviews

Core i7 965 in the lab reaches 4GHz

on 04 November 2008


Preview: New Architecture with tricky settings


You may have wondered why we did not come out with the Core i7 review yesterday, but rest assured we are testing it intensively. While the 45nm Core 2 Duos were just an evolutionary step, Nehalem is a new architecture. Everything new brings problems, or at least unexpected results, so we had to dig much deeper to understand how it works. Therefore, today we give you an overview and some background information, and the benchmarks will follow once our extensive tests are complete.





After two years and three months, Intel brings a new architecture to the masses. While the Core 2 Duos and Quads do well in the desktop market, Intel struggled with its server platform, especially in systems with more than two CPUs. To solve this, Intel had to abandon its FSB design, because it does not scale efficiently beyond two CPUs. Of course, Intel did not adopt HyperTransport, which AMD developed ages ago; instead it developed its own counterpart, QPI. In fact, QPI is quite similar to HyperTransport, but it spares Intel from joining an alliance controlled by AMD and, of course, nVidia. Joining would have made it easy for its rivals to develop chipsets for Intel CPUs, and Intel is fond of its chipset income.





Cache:

Another disadvantage was the way Intel built its quad-cores: by putting two dual-core dies into one package. Connecting two dies this way is easy, but because they talk to each other and to the memory controller over the FSB, latencies are quite high. So Intel again followed AMD's path and developed a native quad-core with a cache layout similar to its rival's. The new L3 cache is, of course, slower than the massive 2x6MB L2 cache of the QX quads, partly because it runs at a multiplier of only 20, which gives you 2.66GHz compared to the nominal 3.2GHz core clock of the i7 965 Extreme. The smaller cache also saves a lot of transistors, which increases yield and therefore reduces costs: the transistor count went down from 820 million to 731 million.

The cache latencies change as well: the L1 caches now need 4 instead of 3 cycles, the smaller L2 cache drops from 15 to 11 cycles, and the new L3 cache needs 39 cycles. On the plus side, the TLB was increased from 16 to 64 entries, and the memory controller in the CPU itself reduces latencies by about 50% compared to the previous access via the FSB to the Northbridge chip. Overall, this decision should not hurt performance: in the worst case Nehalem is only slightly behind its predecessors, while in most cases it is faster, because only a few applications depend on fitting into the 6MB L2 cache of a Penryn CPU.


Memory Controller:

For some reason Intel decided to design a triple-channel memory controller. We think the memory subsystem is antiquated at best and a genuinely new solution was overdue, because the step from DDR2 to DDR3 is quite minor and inadequate, as we showed some time ago. While DDR3 offers lower power consumption, this advantage is negated by high-speed modules that require as much as 2.1V. At least memory companies are now going the right way by introducing faster modules with lower voltage requirements, because Intel stated that its integrated memory controller can't handle more than 1.65V; higher voltages may damage the CPU. Board vendors will circumvent this by introducing memory voltage controllers decoupled from the CPU, but in the OEM market such costs will always be avoided.

With DDR3-1066, each channel delivers about 8.5GB/s, so the third channel adds only 8.5GB/s to the roughly 17GB/s of a dual-channel setup, for a theoretical peak of 25.6GB/s. To make it really worthwhile, Intel would have needed a quad-channel interface, but space on mainboards is limited, and Intel won't introduce a new board-size standard after its BTX fiasco. In the review we will, of course, show you whether going triple-channel instead of dual-channel makes any performance difference.
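The bandwidth figures follow from simple arithmetic: a DDR3 channel is 64 bits (8 bytes) wide, so peak bandwidth is the transfer rate times 8 bytes times the number of channels. The sketch below assumes DDR3-1066, the officially supported speed for the i7 965, and gives theoretical peaks only; measured throughput will be lower.

```python
# Theoretical peak bandwidth of a DDR3 memory setup.
# Assumptions: each channel is 64 bits (8 bytes) wide and the transfer
# rate is given in MT/s (e.g. ~1066 for DDR3-1066).

BUS_WIDTH_BYTES = 8  # one 64-bit DDR3 channel


def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_second = transfer_rate_mts * 1e6 * BUS_WIDTH_BYTES * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    # DDR3-1066 actually runs at 1066.67 MT/s (8 x the 133.33 MHz host clock)
    rate = 8 * 400 / 3
    for ch in (1, 2, 3):
        print(f"{ch} channel(s): {peak_bandwidth_gbs(rate, ch):.1f} GB/s")
```

The same function also shows why faster modules matter as much as an extra channel: two channels of a faster DDR3 grade can match three channels of DDR3-1066 on paper.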




SMT aka Hyperthreading:

An old acquaintance made it back into Nehalem: Hyperthreading. On the P4, this feature was important to squeeze some juice out of the Netburst architecture; Nehalem does not need it at all, because the CPU is fast on its own. Still, Hyperthreading, or SMT as Intel calls it, requires only about 5% more transistors and can increase performance by up to 30%. With some applications, however, it considerably slows things down, so you have to test whether your own applications benefit from it; there is no general rule for which setting is better. This slowed down our benchmarking, because some results were inexplicably slow and tracking down the cause took considerable time. At least now you know, and we will show you the results in our review.

 

Design with Power Consumption in mind:


A big difference from previous designs is how Intel approached performance improvements. The old CPUs were designed under a 1:1 rule: a feature that improves performance by 1% may cost up to 1% additional power. Nehalem is "greener", because now a feature must deliver at least 2% more performance for every 1% of additional power consumption. In our tests, idle power consumption was great, but overall power consumption can be higher than with current Penryn CPUs.
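The 1:1 and 2:1 rules amount to a simple acceptance test on each proposed feature. The sketch below is illustrative only: the function name and the single `ratio` parameter are assumptions used to express the rule, not Intel's actual design-review process.

```python
# Hypothetical helper expressing the performance-per-power rules above:
# a feature is accepted if its performance gain (in percent) is at least
# `ratio` times its additional power draw (in percent).


def feature_accepted(perf_gain_pct: float, power_cost_pct: float,
                     ratio: float = 2.0) -> bool:
    """Return True if the feature meets the perf:power ratio."""
    if power_cost_pct <= 0:
        # No extra power (or even a saving): any non-negative gain passes.
        return perf_gain_pct >= 0
    return perf_gain_pct >= ratio * power_cost_pct


if __name__ == "__main__":
    # A feature giving 1.5% more performance for 1% more power ...
    print(feature_accepted(1.5, 1.0, ratio=1.0))  # True under the old 1:1 rule
    print(feature_accepted(1.5, 1.0, ratio=2.0))  # False under the 2:1 rule
```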

 

Overclocking and Turbo Mode:


Overclocking is our only concern, as things went quite wrong there. We were used to raising the FSB to increase performance. With QPI there is no FSB, just as on Athlon CPUs, but there is still a host clock. It is back at the 133MHz we knew from before the quad-pumped FSB was introduced, and all other frequencies are now derived from it through multipliers.
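Because every clock domain is now a multiple of the host clock, the stock frequencies can be reproduced with a few multiplications. The multipliers below are assumptions chosen to match a stock Core i7 965 (3.2GHz core, 2.66GHz uncore/L3, 6.4GT/s QPI, DDR3-1066); a board's BIOS may label or expose them differently.

```python
# All Nehalem clocks are multiples of the ~133 MHz host clock (BCLK).
# Assumed stock multipliers for a Core i7 965:
# core x24, uncore/L3 x20, QPI x48, DDR3 x8.

from fractions import Fraction

BCLK_MHZ = Fraction(400, 3)  # 133.33 MHz, kept exact to avoid rounding drift

STOCK_MULTIPLIERS = {
    "core (MHz)": 24,
    "uncore/L3 (MHz)": 20,
    "QPI (MT/s)": 48,
    "DDR3 (MT/s)": 8,
}


def derived_clocks(bclk_mhz, multipliers):
    """Multiply the host clock by each domain's multiplier."""
    return {name: bclk_mhz * mult for name, mult in multipliers.items()}


if __name__ == "__main__":
    for name, value in derived_clocks(BCLK_MHZ, STOCK_MULTIPLIERS).items():
        print(f"{name:>16}: {float(value):.0f}")
```

For example, 4GHz at the stock host clock corresponds to a core multiplier of 30 (30 x 133.33MHz).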

With an Extreme CPU it's somewhat easier. Intel introduced a Turbo Mode, which lets the CPU overclock itself when demanding applications run, while idle power stays at its lowest level. By raising the TDP limit and the maximum current and using dynamic overvoltage, we reached 4GHz easily.


For non-Extreme editions, things get tricky. You have to increase the host clock, reduce the memory and QPI clocks, and give the chipset more voltage. On our "Smackover" Intel board this is a real mess, because every setting is on a different BIOS screen and the BIOS does not calculate any frequencies for you, so a spreadsheet will help. We hope other vendors will handle this in a smarter way.
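A minimal sketch of the kind of table calculator we mean: given a raised host clock, it picks the highest memory and QPI multipliers that keep each bus at or below a chosen limit. The multiplier lists, the limits, and the x20 core multiplier in the example are hypothetical values for illustration, not settings read from the Smackover BIOS.

```python
# Hypothetical "table calculator" for overclocking a locked (non-Extreme) i7.
# The multiplier options below are illustrative assumptions, not BIOS data.

MEM_MULTIPLIERS = (6, 8, 10, 12)      # assumed DDR3 ratios offered by a BIOS
QPI_MULTIPLIERS = (32, 36, 44, 48)    # assumed QPI ratios offered by a BIOS


def best_multiplier(bclk_mhz, options, limit):
    """Return the highest multiplier keeping bclk * m <= limit, or None."""
    fitting = [m for m in options if bclk_mhz * m <= limit]
    return max(fitting) if fitting else None


def overclock_table(bclk_mhz, core_mult, mem_limit_mts, qpi_limit_mts):
    """Compute core clock plus the best memory/QPI settings for a host clock."""
    mem = best_multiplier(bclk_mhz, MEM_MULTIPLIERS, mem_limit_mts)
    qpi = best_multiplier(bclk_mhz, QPI_MULTIPLIERS, qpi_limit_mts)
    return {
        "core MHz": bclk_mhz * core_mult,
        "mem multiplier": mem,
        "mem MT/s": bclk_mhz * mem if mem is not None else None,
        "qpi multiplier": qpi,
        "qpi MT/s": bclk_mhz * qpi if qpi is not None else None,
    }


if __name__ == "__main__":
    # A locked x20 part with the host clock raised to 180 MHz, keeping
    # DDR3 at or below 1600 MT/s and QPI at or below 6400 MT/s.
    for key, value in overclock_table(180, 20, 1600, 6400).items():
        print(f"{key:>15}: {value}")
```

A `None` entry means no offered multiplier keeps that bus within its limit at the chosen host clock, which is exactly the case where you have to back off the host clock again.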

 

Stay tuned for the review, which will be online as soon as possible.

 

 

Last modified on 06 November 2008