IBM Offers AI to Translate COBOL Code to Java

IBM sees a future in broader AI tools for code generation as well, intending to compete with applications such as GitHub Copilot and Amazon CodeWhisperer.

COBOL, or Common Business-Oriented Language, is one of the oldest programming languages still in use, having appeared around 1959. According to a 2022 study, more than 800 billion lines of COBOL code run in production systems, up from about 220 billion in 2017.

However, COBOL has a reputation for being difficult to learn and inefficient. So why not migrate to a more modern language? For large organizations, this is usually a complex and expensive undertaking, given how few COBOL specialists there are in the world. In 2012, Commonwealth Bank of Australia began replacing its core COBOL platform, a process that took five years and cost over $700 million.

In an effort to offer a new solution to the challenge of modernizing COBOL applications, IBM today introduced Code Assistant for IBM Z, which uses an artificial intelligence model to translate COBOL code into Java. Code Assistant will become available in Q4 2023 and will be previewed at the IBM TechXchange conference in Las Vegas in early September of this year.

Code Assistant is designed to help enterprises refactor mainframe applications, ideally while maintaining performance and security, according to IBM Research chief scientist Ruchir Puri. Code Assistant runs both on-premises and in the cloud as a managed service. It is based on CodeNet, a code generation model that can understand about 80 programming languages in addition to COBOL and Java.

“IBM has created a new, state-of-the-art generative AI model for code that enables legacy COBOL programs to be converted to enterprise Java with highly natural generated code,” Puri told TechCrunch. “In addition to code conversion, Code Assistant supports the full application modernization lifecycle and helps developers understand, refactor, transform, and validate translated code in a modern architecture.”

According to Puri, CodeNet, trained on 1.5 trillion tokens and with 20 billion parameters, was designed with a large context window of 32,000 tokens to “capture a wider context” for “a more efficient transformation of COBOL to Java”. Parameters are the parts of the model learned from historical training data; they essentially define the skill of the model in solving a problem, such as generating text. “Tokens” are pieces of raw text (for example, “fan”, “tas” and “tic” for the word “fantastic”). The context window is the text that the model considers before generating additional text.
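To make the idea of tokens and a context window concrete, here is a minimal Python sketch. It is a toy illustration only: the fixed three-character splitter and the function names `toy_tokenize` and `fit_context` are assumptions made for this example and are not how CodeNet or any IBM tokenizer works. Real tokenizers learn their splits from training data.

```python
def toy_tokenize(text, piece_len=3):
    """Split each word into fixed-size pieces (a toy stand-in for a learned tokenizer)."""
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return tokens


def fit_context(tokens, window):
    """Keep only the most recent `window` tokens, i.e. what the model can 'see'."""
    return tokens[-window:] if window > 0 else []


CONTEXT_WINDOW = 32_000  # the window size quoted in the article

print(toy_tokenize("fantastic"))  # ['fan', 'tas', 'tic']

# With a tiny window of 4, only the last four tokens remain visible.
print(fit_context(["a", "b", "c", "d", "e", "f"], 4))  # ['c', 'd', 'e', 'f']
```

A larger window means more of a long program stays visible at once, which is the sense in which a 32,000-token window can “capture a wider context.”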

Today, many tools, applications, and services convert COBOL applications to Java syntax, and some of them are fully automated. Puri acknowledges this but argues that, unlike some competing offerings on the market, Code Assistant does not sacrifice COBOL capabilities while still providing cost savings and code that is easy to maintain.

“IBM created Code Assistant to be able to mix and match COBOL and Java services,” Puri said. “If the ‘understanding’ and ‘refactoring’ capabilities of the system recommend that a given application sub-service should remain in COBOL, it will be left as such, and other sub-services will be converted to Java.”
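The mix-and-match approach Puri describes can be pictured with a short Python sketch. Everything here is hypothetical: the `SubService` class, the `keep_in_cobol` flag, and the service names are invented for illustration and do not reflect Code Assistant's actual interface or output. The sketch only shows the decision rule in the quote: a sub-service recommended to stay in COBOL is left alone, and the rest are queued for conversion to Java.

```python
from dataclasses import dataclass


@dataclass
class SubService:
    name: str
    keep_in_cobol: bool  # hypothetical flag standing in for the tool's recommendation


def plan_migration(services):
    """Group sub-service names by target language based on the recommendation flag."""
    plan = {"cobol": [], "java": []}
    for service in services:
        target = "cobol" if service.keep_in_cobol else "java"
        plan[target].append(service.name)
    return plan


# Invented example services, not taken from any real application.
services = [
    SubService("ledger-batch", keep_in_cobol=True),
    SubService("account-lookup", keep_in_cobol=False),
    SubService("statement-render", keep_in_cobol=False),
]

print(plan_migration(services))
# {'cobol': ['ledger-batch'], 'java': ['account-lookup', 'statement-render']}
```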

This does not mean that Code Assistant is flawless. A recent Stanford study found that software engineers who use code-generating AI systems are more likely to introduce vulnerabilities in the applications they develop. Moreover, Puri cautions against deploying code created by Code Assistant before it has been reviewed by human experts.

“As with any other AI system, enterprise COBOL applications can have unique usage patterns that Code Assistant may not have mastered yet,” Puri said. “In order to keep code secure, it needs to be scanned with modern vulnerability scanners.”

Risks aside, IBM clearly sees tools like Code Assistant as important to its future growth. Today, about 84% of IBM mainframe customers use COBOL, mostly in the financial and government sectors. While mainframes are still a significant part of IBM’s overall business, the company sees them as a bridge to the vast and profitable hybrid computing environments it also provides and supports.

IBM sees a future in broader AI tools for code generation as well, intending to compete with applications such as GitHub Copilot and Amazon CodeWhisperer. In May, IBM launched fm.model.code as part of its Watsonx AI service, powering Watson Code Assistant, which lets developers generate code from plain-English prompts across various programs.