Stability AI launches StableCode, a large language model for code generation

Stability AI is well known for its Stable Diffusion image generation model, but that’s not all the generative AI startup is interested in. The company is now moving into code generation as well.

Stability AI today announced the first public release of StableCode, a new open large language model (LLM) designed to help users generate code. StableCode is available in three variants: a base model for general use cases, an instruction model, and a long-context-window model that supports up to 16,000 tokens.

The StableCode model builds on an initial set of programming language data from the open source BigCode project, with additional filtering and fine-tuning by Stability AI. Initially, StableCode will support development in the Python, Go, Java, JavaScript, C, and C++ programming languages.

“With StableCode, we would like to create a model like Stable Diffusion that has helped every person in the world become an artist,” said Christian Laforte, head of research at Stability AI. “Essentially let anyone who has good ideas and maybe has a problem write a program that just solves that problem.”

Training any LLM relies on data, and for StableCode, this data comes from the BigCode project. Using BigCode as the basis for a generative AI code model is not a new idea. Back in May, Hugging Face and ServiceNow launched the open StarCoder LLM, which is also based on BigCode.

Nathan Cooper, lead scientist at Stability AI, explained that training StableCode involved significant filtering and cleaning of the BigCode data.

“We love BigCode, they do an amazing job in data management, model management and model training,” said Cooper. “We took their datasets, applied additional filters to improve the quality, as well as build a version of the model with a larger context window, and then trained it on our cluster.”

Stability AI has also taken a number of training steps beyond the core BigCode model, Cooper said. These steps included successive training on specific programming languages.

“It’s very similar to the natural language approach where you start by pre-training a generic model and then refining it on a specific set of problems, or in this case, languages,” Cooper said.

Looking beyond BigCode, the long context version of StableCode can provide significant benefits to users.

The long context window version of StableCode has a context window of 16,000 tokens, which Stability AI claims is larger than that of any other model. Cooper explained that the longer context window allows for more specialized and complex code generation prompts. It also means that a user can have StableCode look at a medium-sized codebase spanning several files to help it understand and generate new code.

“You can use this longer context window to let the model know more about your codebase and what functions are defined in other files,” Cooper said. “That way, when it suggests code, it can be more tailored to your codebase and your needs.”
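To make the idea concrete, here is a minimal sketch in Python of how several files might be packed into a prompt that stays within a 16,000-token budget. It is an illustration only, not Stability AI's tooling: the function names, the "# file:" header format, and the whitespace-based token count are all assumptions, and a real setup would count tokens with the model's own tokenizer.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer (assumption): counts
    whitespace-separated words rather than model tokens."""
    return len(text.split())


def pack_context(files, prompt, budget=16000):
    """Build a prompt that places source files ahead of the request.

    `files` is a list of (path, source) pairs in priority order.
    Files that would exceed the remaining budget are skipped.
    The "# file:" header is a hypothetical convention, not a
    format the model is known to expect.
    """
    remaining = budget - count_tokens(prompt)
    parts = []
    for path, source in files:
        block = f"# file: {path}\n{source}\n"
        cost = count_tokens(block)
        if cost > remaining:
            continue  # this file does not fit; try the next one
        parts.append(block)
        remaining -= cost
    return "".join(parts) + prompt
```

Calling pack_context with a list of (path, source) pairs and the start of a new function returns a single string in which the files that fit come first and the request comes last, so definitions from other files are visible to the model when it completes the code.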

StableCode, like all modern generative AI models, is based on a transformer neural network. Rather than using the ALiBi (Attention with Linear Biases) approach to encode token positions in the model, an approach taken by, for example, StarCoder for its open generative model, StableCode uses an approach known as RoPE (Rotary Position Embedding).

Cooper noted that the ALiBi approach in transformer models tends to weight recent tokens more heavily than earlier ones. In his opinion, this is not ideal for code because, unlike natural language, code does not have a set narrative structure with a beginning, middle, and end. Functions can be defined at any point in an application’s flow.

“I don’t think coding fits this idea of weighing the present more important than the past, so we use… RoPE, which doesn’t have that kind of bias where you weigh the present more than the past.”
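The difference between the two position schemes can be sketched in a few lines of Python. This is a simplified illustration, not StableCode's implementation: the function names and the slope value of 0.5 are assumptions chosen for the example. ALiBi adds a penalty to the attention score that grows linearly with the distance between tokens, while RoPE rotates the query and key vectors so that their dot product depends on the relative offset, with no explicit additive penalty.

```python
import math


def alibi_bias(query_pos, key_pos, slope=0.5):
    """ALiBi-style additive bias: a penalty that grows linearly with
    the distance between query and key. The slope is illustrative."""
    return -slope * (query_pos - key_pos)


def rope_rotate(vec, pos, base=10000.0):
    """RoPE-style rotation: rotate each consecutive pair of dimensions
    by an angle proportional to the token position `pos`."""
    dim = len(vec)  # must be even
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def rope_score(q, k, query_pos, key_pos):
    """Attention score between a rotated query and a rotated key."""
    return dot(rope_rotate(q, query_pos), rope_rotate(k, key_pos))
```

With these helpers, alibi_bias(10, 0) is a larger penalty than alibi_bias(10, 9), which is the recency preference Cooper describes. By contrast, rope_score gives the same result for positions (5, 2) and (103, 100), up to floating-point error, because only the offset of 3 matters, and the rotation leaves the length of each vector unchanged.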

The goal of the initial release is to see how developers receive and use the model.

“We’re going to engage and work with the community to see what interesting directions they come up with and explore the generative developer space,” Cooper said.