Published in News

Hugging Face and chums launch StarCoder 2

on 29 February 2024


Billed as the ultimate code generator

Hugging Face and its mates ServiceNow and Nvidia have cooked up StarCoder 2, an open-source code generator that they claim lets you write code like you know what you are doing.

The original came out last year, and they've been working on a sequel ever since. StarCoder 2 isn't just one code generator but a whole family.

Now it comes in three flavours, the first two of which can run on most modern gadgets: a 3-billion-parameter (3B) model trained by ServiceNow, a 7-billion-parameter (7B) model trained by Hugging Face, and a 15-billion-parameter (15B) model trained by Nvidia, the newest member of the StarCoder gang.

StarCoder 2 can help you finish your code or find bits of code when you ask it in plain English, like most other code generators. Trained on roughly four times more data than the original StarCoder (drawn from a 67.5-terabyte source dataset, versus 6.4 terabytes for the original), StarCoder 2 delivers what Hugging Face, ServiceNow and Nvidia say is "much" better performance at lower costs.
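The code-completion side of this is typically driven by a "fill-in-the-middle" style prompt, where the model is given the code before and after the cursor and asked to fill the gap. As a rough sketch only -- assuming StarCoder 2 keeps the sentinel tokens used by the original StarCoder family, which is not confirmed in this article -- the prompt a client would assemble looks like this:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt.

    The <fim_prefix>/<fim_suffix>/<fim_middle> sentinel tokens below are
    the ones used by the original StarCoder; whether StarCoder 2 uses the
    same spellings is an assumption here, so check the model card before
    relying on them.
    """
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


# Example: ask the model to fill in the body of a function.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n",
)
print(prompt)
```

The string returned by `build_fim_prompt` would then be sent to the model, which generates the missing middle section until it emits an end-of-text token.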

StarCoder 2 can be fine-tuned "in a few hours" on a GPU such as the Nvidia A100 using your own or someone else's data, to build apps like chatbots and personal coding helpers. And because it was trained on a broader and more varied dataset than the original StarCoder (spanning some 619 programming languages), StarCoder 2 can make more spot-on, intelligent predictions -- or so they claim.

StarCoder 2's coding performance depends on the benchmark, but it appears to outpace at least one version of Code Llama. Hugging Face says that StarCoder 2 15B matches Code Llama 33B on some code-completion tasks at twice the speed.

However, Hugging Face didn't say which tasks those are. As an open-source family of models, StarCoder 2 also has the edge of running locally and "learning" a developer's code or codebase -- an appealing option for devs and companies who don't want to hand their code to a cloud-based AI. Hugging Face, ServiceNow and Nvidia also claim that StarCoder 2 is fairer and less risky than its rivals.

Unlike code generators such as GitHub Copilot, StarCoder 2 was trained only on data cleared for use from Software Heritage, the non-profit that archives source code. Before StarCoder 2's training, BigCode, the team behind much of StarCoder 2's development, gave code owners a chance to opt out of the training set if they wanted. As with the original StarCoder, StarCoder 2's training data is available for developers to copy, reproduce or audit.

TechCrunch's Kyle Wiggers said StarCoder 2's licence may still be a problem for some, as it is released under the BigCode Open RAIL-M v1.0 licence, which aims to promote responsible use by putting "light touch" limits on model users and downstream users.

"While less limiting than many other licenses, RAIL-M isn't truly 'open' because it doesn't let developers use StarCoder 2 for anything they want (medical advice-giving apps are a no-no, for example). Some commentators say RAIL-M's rules may be too vague to follow in any case -- and that RAIL-M could clash with AI rules like the EU AI Act."

Last modified on 29 February 2024