Published in PC Hardware

Google’s AI chip takes normal chips to the cleaners

06 April 2017


At least for voice recognition


Google’s attempts to solve its voice recognition problems might have resulted in an AI chip which cleans the clocks of the big chip makers.

Four years ago, Google faced a problem: if its users each hit its voice recognition services for just three minutes a day, the company would need to double its number of data centres to handle the load.

According to PC World, Google decided to splash out on research to create dedicated hardware for running machine-learning applications like voice recognition.

Its boffins came up with something called a Tensor Processing Unit (TPU), a chip that is designed to accelerate the inference stage of deep neural networks.

According to a paper Google published yesterday, the chip out-performs CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed.

A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
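The "performance per watt" metric in those comparisons is simply useful work done divided by power consumed. A minimal sketch of the arithmetic is below; the throughput and power figures are hypothetical placeholders for illustration, not numbers from Google's paper.

```python
# Illustrative sketch of a performance-per-watt comparison.
# All throughput and power figures below are hypothetical, NOT from Google's paper.

def perf_per_watt(ops_per_second: float, watts: float) -> float:
    """Useful work done per unit of power consumed."""
    return ops_per_second / watts

# Hypothetical inference throughput (operations/second) and chip power (watts).
chips = {
    "CPU": (1.0e12, 145.0),
    "GPU": (2.8e12, 300.0),
    "TPU": (2.3e13, 75.0),
}

baseline = perf_per_watt(*chips["CPU"])
for name, (ops, watts) in chips.items():
    ratio = perf_per_watt(ops, watts) / baseline
    print(f"{name}: {ratio:.1f}x the CPU's performance per watt")
```

With numbers like these, a chip can win the perf/watt race even at equal raw speed, which is why the metric matters so much at data-centre scale.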

Google has used TPUs in its data centres since 2015 and they’ve been put to use improving the performance of applications including translation and image recognition. The TPUs are particularly useful when it comes to energy efficiency, which is an important metric related to the cost of using hardware at massive scale.

TPUs are application-specific integrated circuits (ASICs), which means you could never use them in something as flexible as a PC; they are more like a chip designed to run just one aspect of a single game. The TPUs also offer rather better latency than other silicon.

Google's Norm Jouppi said that machine learning systems need to respond quickly in order to provide a good user experience.

“The point is, the internet takes time, so if you’re using an internet-based server, it takes time to get from your device to the cloud, it takes time to get back. Networking and various things in the cloud — in the data centre — they take some time. So that doesn’t leave a lot of [time] if you want near-instantaneous responses.”

Google tested the chips on six different neural network inference applications, representing 95 percent of all such applications in Google’s data centres.

The next step for the technology is to give the TPU the GDDR5 memory that’s present in an Nvidia K80 GPU, as the company’s research found that memory bandwidth constrained the performance of several applications.

Google’s paper claims that there is room for additional software optimisation to increase performance, and singles out one of the convolutional neural network applications as a candidate.

While neural networks mimic the way neurons transmit information in the human brain, CNNs are modelled specifically on how the brain processes visual information.
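At the heart of a CNN is the convolution operation: a small filter slides over an image and each output value is a weighted sum of the patch beneath it. The toy sketch below shows only that core operation; real CNNs stack many such filters with non-linearities, and the filter values here are an illustrative edge detector, not anything from Google's applications.

```python
# Minimal sketch of the 2-D convolution at the heart of a CNN: a small
# filter slides over the image, and each output pixel is a weighted sum
# of the patch beneath it.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1       # output height (no padding, stride 1)
    ow = len(image[0]) - kw + 1    # output width
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out

# A vertical-edge detector applied to a tiny image: the filter responds
# strongly where pixel values change from left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, kernel))  # → [[0, 2, 0], [0, 2, 0]]
```

The strong responses land exactly where the image goes from dark to light, which is the sense in which a CNN "sees" visual structure.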

“As CNN1 currently runs more than 70 times faster on the TPU than the CPU, the CNN1 developers are already very happy, so it’s not clear whether or when such optimizations would be performed,” the authors wrote.

Last modified on 06 April 2017