21 06

IBM’s AI research division has released a 14-million-sample dataset for developing machine learning models that can help with programming tasks.

While there’s a scant chance that machine learning models built on the CodeNet dataset will make human programmers redundant, there’s reason to be hopeful that they will make developers more productive.

With Project CodeNet, the researchers at IBM have tried to create a multi-purpose dataset that can be used to train machine learning models for various tasks.

The researchers at IBM have also gone to great lengths to make sure the dataset is balanced along several dimensions, including programming language, acceptance status, and error type.

CodeNet is not the only dataset for training machine learning models on programming tasks.

There are several ways CodeNet can be used to develop machine learning models for programming tasks.

Since each coding challenge in the dataset includes submissions in various programming languages, data scientists can use it to create machine learning models that translate code from one language to another.
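As a rough illustration of how such translation data might be assembled, the sketch below groups accepted submissions by problem and pairs solutions written in two languages. It assumes each submission is a record with `problem_id`, `language`, `status`, and `code` fields; these names, the function `translation_pairs`, and the sample records are hypothetical and are not taken from CodeNet's actual metadata format.

```python
from collections import defaultdict
from itertools import product

# Hypothetical submission records. The field names (problem_id, language,
# status, code) are assumptions for illustration, not CodeNet's schema.
submissions = [
    {"problem_id": "p001", "language": "C++", "status": "Accepted",
     "code": "#include <iostream>\nint main() { int a, b; std::cin >> a >> b; std::cout << a + b; }"},
    {"problem_id": "p001", "language": "Python", "status": "Accepted",
     "code": "a, b = map(int, input().split())\nprint(a + b)"},
    {"problem_id": "p001", "language": "Python", "status": "Wrong Answer",
     "code": "a, b = map(int, input().split())\nprint(a - b)"},
    {"problem_id": "p002", "language": "Java", "status": "Accepted",
     "code": "class Main { public static void main(String[] a) {} }"},
]


def translation_pairs(records, src_lang, tgt_lang):
    """Return (problem_id, source_code, target_code) tuples for one language pair."""
    # problem_id -> language -> list of accepted solutions
    by_problem = defaultdict(lambda: defaultdict(list))
    for r in records:
        # Keep only accepted submissions, on the assumption that solutions
        # accepted for the same problem are functionally equivalent.
        if r["status"] == "Accepted":
            by_problem[r["problem_id"]][r["language"]].append(r["code"])

    pairs = []
    for problem_id, langs in by_problem.items():
        # Pair every source-language solution with every target-language one.
        for src, tgt in product(langs.get(src_lang, []), langs.get(tgt_lang, [])):
            pairs.append((problem_id, src, tgt))
    return pairs


pairs = translation_pairs(submissions, "C++", "Python")
print(f"{len(pairs)} pair(s)")
```

The rejected Python submission is excluded, so only one C++/Python pair survives. Pairing all solutions against each other grows quadratically per problem, so a larger pipeline would likely sample or deduplicate.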

Because CodeNet includes a wealth of metadata on memory usage and execution time, data scientists can also use it to develop code optimization systems.
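One way such metadata could feed an optimization model is to pair slow and fast accepted solutions to the same problem in the same language, giving examples of code before and after a speedup. The sketch below assumes hypothetical `cpu_time` (milliseconds) and `memory` (kilobytes) fields, a hypothetical `slow_fast_pairs` helper, and an arbitrary 2x speedup threshold; none of these come from the article.

```python
from collections import defaultdict

# Hypothetical records; cpu_time (ms) and memory (KB) are assumed field names and units.
submissions = [
    {"id": "s1", "problem_id": "p001", "language": "Python", "status": "Accepted", "cpu_time": 900, "memory": 9000},
    {"id": "s2", "problem_id": "p001", "language": "Python", "status": "Accepted", "cpu_time": 120, "memory": 8800},
    {"id": "s3", "problem_id": "p001", "language": "Python", "status": "Accepted", "cpu_time": 450, "memory": 9100},
    {"id": "s4", "problem_id": "p001", "language": "Python", "status": "Time Limit Exceeded", "cpu_time": 2000, "memory": 9000},
    {"id": "s5", "problem_id": "p002", "language": "C++", "status": "Accepted", "cpu_time": 10, "memory": 3000},
]


def slow_fast_pairs(records, min_speedup=2.0):
    """Pair the slowest and fastest accepted solution per (problem, language)."""
    groups = defaultdict(list)
    for r in records:
        # Only correct programs are useful as optimization examples.
        if r["status"] == "Accepted":
            groups[(r["problem_id"], r["language"])].append(r)

    pairs = []
    for group in groups.values():
        if len(group) < 2:
            continue  # nothing to compare against
        # Sort by execution time, breaking ties with memory usage.
        group.sort(key=lambda r: (r["cpu_time"], r["memory"]))
        fastest, slowest = group[0], group[-1]
        # Keep the pair only if the speedup is large enough to be meaningful.
        if slowest["cpu_time"] >= min_speedup * fastest["cpu_time"]:
            pairs.append((slowest["id"], fastest["id"]))
    return pairs


pairs = slow_fast_pairs(submissions)
print(pairs)  # [('s1', 's2')]
```

The timed-out submission is ignored because it is not a correct program, and the lone C++ solution is skipped because it has no peer to compare against.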

Or they can use the error-type metadata to train machine learning systems that flag potential flaws in source code.
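A minimal sketch of preparing such training data, assuming each record carries a judge verdict in a hypothetical `status` field: the hypothetical `balanced_error_dataset` helper samples up to a fixed number of submissions per verdict so that no error type dominates, and attaches both the verdict and a binary "flawed" label. The verdict strings and records are illustrative assumptions, and the classifier that would consume this data is left out.

```python
import random
from collections import Counter, defaultdict

# Hypothetical records; the verdict strings are assumptions modeled on
# typical online-judge results, not an official CodeNet label set.
submissions = [
    {"code": "print(sum(map(int, input().split())))", "status": "Accepted"},
    {"code": "a, b = map(int, input().split())\nprint(a + b)", "status": "Accepted"},
    {"code": "print(int(input()) // 0)", "status": "Runtime Error"},
    {"code": "print(int(input()) - 1)", "status": "Wrong Answer"},
    {"code": "print(int(input()) * 2)", "status": "Wrong Answer"},
]


def balanced_error_dataset(records, per_class, seed=0):
    """Return (code, verdict, is_flawed) triples with at most per_class per verdict."""
    by_status = defaultdict(list)
    for r in records:
        by_status[r["status"]].append(r)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    dataset = []
    for status, group in sorted(by_status.items()):
        # Downsample large classes; keep small classes whole.
        for r in rng.sample(group, min(per_class, len(group))):
            dataset.append((r["code"], status, status != "Accepted"))
    return dataset


data = balanced_error_dataset(submissions, per_class=1)
print(Counter(status for _, status, _ in data))
```

Keeping the verdict alongside the binary label lets the same data serve either a "does this code have a flaw?" model or a model that predicts the specific error type.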

CodeNet is a rich library of textual descriptions of problems and their corresponding source code.
