Back in 2007, we wrote that Larrabee was initially designed as a discrete graphics engine that could also run highly parallel applications while preserving x86 programmability. In May 2010, Bill Kircos, Intel's Director of Product and Technology Media Relations, announced that the Larrabee project would never materialize as a discrete GPU part and would instead be transitioned into a new architecture leveraging both Larrabee and Intel's many-core research projects.
At ISC 2010, that architecture became known to the HPC crowd as Intel MIC (Many Integrated Core). In its official announcement, Intel outlined plans to ship a MIC development kit platform, known as Knights Ferry, to select customers. According to Slide 34 of Skaugen's keynote presentation, Knights Ferry is an x86-based design with 32 cores on a single chip, each with four threads, a 32KB L1 instruction cache, a 32KB L1 data cache, and a 256KB L2 cache. In total, the chip has 8MB of shared L2 cache, which some analysts consider an interesting design point, as many highly parallel applications do not require such a large on-chip cache.
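As a quick sanity check on those figures, the per-core numbers can be multiplied out to chip-wide totals. The sketch below uses only the values quoted from the keynote slide; the variable names are our own, and it assumes every core is identical.

```python
# Per-core figures as quoted from the keynote slide.
# Variable names are illustrative, not Intel terminology.
CORES = 32
THREADS_PER_CORE = 4
L1_INSTRUCTION_KB = 32
L1_DATA_KB = 32
L2_KB = 256

# Chip-wide totals, assuming all cores are identical.
total_threads = CORES * THREADS_PER_CORE
total_l1_kb = CORES * (L1_INSTRUCTION_KB + L1_DATA_KB)
total_l2_mb = CORES * L2_KB / 1024

print(f"hardware threads: {total_threads}")
print(f"aggregate L1:     {total_l1_kb} KB")
print(f"aggregate L2:     {total_l2_mb:g} MB")
```

The 8MB total quoted above is simply 32 cores times 256KB of L2 each.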
Each core has a very wide 512-bit vector unit that can perform 16 single-precision floating-point operations in a single instruction, with double-precision operations running at half that throughput.
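The 16 and 8 figures follow directly from dividing the vector width by the element size. A minimal sketch of that arithmetic follows; the helper name `vector_lanes` is hypothetical, and the 32-bit and 64-bit element sizes assume the standard IEEE 754 single- and double-precision formats.

```python
def vector_lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one vector register."""
    if vector_bits % element_bits != 0:
        raise ValueError("element size must divide the vector width evenly")
    return vector_bits // element_bits


VECTOR_BITS = 512  # vector unit width quoted in the article

single_lanes = vector_lanes(VECTOR_BITS, 32)  # IEEE 754 single precision
double_lanes = vector_lanes(VECTOR_BITS, 64)  # IEEE 754 double precision

print(f"single-precision lanes per instruction: {single_lanes}")
print(f"double-precision lanes per instruction: {double_lanes}")
```

Because a double-precision value is twice as wide, half as many fit in the same register, which is where the half-throughput figure comes from.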
Although the Knights Ferry development kit looks very much like a GPU card, we are reminded to mention that it isn't a GPU because it has x86 cores. Besides, Intel would never do such a thing. Nevertheless, the card comes with a dual-slot heatsink, features up to 2GB of GDDR5 memory, and connects to a standard PCI-Express 2.0 motherboard slot. Intel advertises MIC as an "Intel Co-Processor Architecture," so by nature it can become drop-in compatible with an Intel Xeon chip without the need to rewrite application code in another language.
HPCwire.com has published a detailed architecture comparison between Intel's Knights Ferry, based on the MIC architecture, and Nvidia's Tesla products, based on the Fermi architecture. As noted by Michael Wolfe, Slide 33 from Skaugen's keynote presentation depicts the Knights Ferry architecture layout with remarkable similarity to the 2008 SIGGRAPH article describing Larrabee.
Figure 1: Schematic of the Larrabee many-core architecture: The number of CPU cores and the number and type of co-processors and I/O blocks are implementation-dependent, as are the positions of the CPU and non-CPU blocks on the chip.
Figure 2: Schematic of the Intel MIC (Many Integrated Core) architecture
Because Knights Ferry is not a commercially available product, it remains unclear how much of its design it will share with Knights Corner, the first MIC product Intel plans to launch. According to official plans, Knights Corner will be manufactured on the 22nm half-pitch process node, will contain over 50 cores, and will be released sometime in 2011. All in all, we expect the next 16 months in the HPC sector to hold many interesting application performance competitions among Intel, AMD and Nvidia. While Intel touts its x86 instruction set as offering maximum compatibility with existing applications, with no need for dual-language programming across processor and co-processor, AMD and Nvidia focus their efforts on maximizing floating-point throughput by pairing CPUs with their Evergreen and Fermi GPU architectures.