IBM’s AI research division has released a 14-million-sample dataset to develop machine learning models that can help in programming tasks.
While there’s scant chance that machine learning models built on the CodeNet dataset will make human programmers redundant, there’s reason to hope they will make developers more productive.
With Project CodeNet, the researchers at IBM have tried to create a multi-purpose dataset that can be used to train machine learning models for various tasks.
The researchers have also gone to great lengths to make sure the dataset is balanced along several dimensions, including programming language, acceptance, and error types.
CodeNet is not the only dataset for training machine learning models on programming tasks.
There are several ways CodeNet can be used to develop machine learning models for programming tasks.
Since each coding challenge in the dataset contains submissions in various programming languages, data scientists can use it to create machine learning models that translate code from one language to another.
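As a rough illustration of how such translation data might be assembled, the sketch below pairs accepted solutions to the same problem across two languages. The record layout (problem ID, language, status, source code) and the sample values are assumptions made for the example, not CodeNet's actual file format.

```python
from collections import defaultdict
from itertools import product

# Hypothetical records: (problem_id, language, status, source_code).
# The field layout and values are illustrative assumptions.
submissions = [
    ("p001", "Python", "Accepted", "print(sum(map(int, input().split())))"),
    ("p001", "C++", "Accepted", "int main(){int a,b;std::cin>>a>>b;std::cout<<a+b;}"),
    ("p001", "C++", "Wrong Answer", "int main(){return 0;}"),
    ("p002", "Python", "Accepted", "print(input()[::-1])"),
]


def translation_pairs(records, source_lang, target_lang):
    """Pair accepted solutions to the same problem across two languages."""
    by_problem = defaultdict(lambda: defaultdict(list))
    for problem_id, language, status, code in records:
        # Only accepted solutions are known to solve the problem.
        if status == "Accepted":
            by_problem[problem_id][language].append(code)

    pairs = []
    for problem_id in sorted(by_problem):
        langs = by_problem[problem_id]
        sources = langs.get(source_lang, [])
        targets = langs.get(target_lang, [])
        # Every source solution is paired with every target solution.
        for src, tgt in product(sources, targets):
            pairs.append((problem_id, src, tgt))
    return pairs


pairs = translation_pairs(submissions, "Python", "C++")
```

Here only problem `p001` yields a pair, because it is the only problem with an accepted solution in both languages. Rejected submissions are filtered out so that a model is not trained to translate into incorrect code.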
Since CodeNet has a wealth of metadata about memory and execution-time metrics, data scientists can also use it to develop code optimization systems.
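One way this metadata could feed an optimization system is by pairing slower accepted solutions with the most efficient accepted solution to the same problem. The following is a minimal sketch under stated assumptions: the field names (`cpu_time_ms`, `memory_kb`), their units, and the sample records are hypothetical.

```python
# Hypothetical records; field names, units, and values are assumptions.
runs = [
    {"problem_id": "p001", "submission_id": "s1", "status": "Accepted",
     "cpu_time_ms": 120, "memory_kb": 9000},
    {"problem_id": "p001", "submission_id": "s2", "status": "Accepted",
     "cpu_time_ms": 15, "memory_kb": 8500},
    {"problem_id": "p001", "submission_id": "s3", "status": "Time Limit Exceeded",
     "cpu_time_ms": 2000, "memory_kb": 9100},
    {"problem_id": "p002", "submission_id": "s4", "status": "Accepted",
     "cpu_time_ms": 40, "memory_kb": 4000},
]


def cost(record):
    """Order submissions by execution time first, then by memory use."""
    return (record["cpu_time_ms"], record["memory_kb"])


def fastest_accepted(records):
    """Return the most efficient accepted submission for each problem."""
    best = {}
    for record in records:
        if record["status"] != "Accepted":
            continue
        current = best.get(record["problem_id"])
        if current is None or cost(record) < cost(current):
            best[record["problem_id"]] = record
    return best


def optimization_pairs(records):
    """Pair each slower accepted submission with the best one for its problem."""
    best = fastest_accepted(records)
    pairs = []
    for record in records:
        target = best.get(record["problem_id"])
        if record["status"] == "Accepted" and target is not record:
            pairs.append((record["submission_id"], target["submission_id"]))
    return pairs


best = fastest_accepted(runs)
slow_to_fast = optimization_pairs(runs)
```

Ranking by time before memory is an arbitrary choice for this sketch; a real system might weight the two metrics differently or compare only submissions written in the same language.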
They can also use the error-type metadata to train machine learning systems that flag potential flaws in source code.
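To make the idea concrete, the sketch below turns submission outcomes into training labels for a flaw detector. The status strings and code snippets are assumed for illustration, and the resulting labels would still need to be fed to a classifier of the reader's choice.

```python
from collections import Counter

# Hypothetical (source_code, status) records; the status strings are assumed.
labeled = [
    ("print(1 / 0)", "Runtime Error"),
    ("print(int(input()) * 2)", "Accepted"),
    ("while True: pass", "Time Limit Exceeded"),
    ("print(undefined_name)", "Runtime Error"),
]


def to_training_examples(records):
    """Map each record to (code, is_flawed, error_type)."""
    examples = []
    for code, status in records:
        is_flawed = status != "Accepted"
        # Accepted submissions carry no error type.
        error_type = status if is_flawed else None
        examples.append((code, is_flawed, error_type))
    return examples


examples = to_training_examples(labeled)

# The label distribution shows how balanced the error types are.
label_counts = Counter(error for _, _, error in examples if error is not None)
```

Checking `label_counts` before training is a cheap way to spot error types that are too rare for a model to learn from.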
CodeNet is a rich library of textual descriptions of problems and their corresponding source code.