Traditional storage and database systems are designed to work for a wide range of applications because of how long it can take to build them.
As a first step toward this vision, Kraska and colleagues developed Tsunami and Bao.
Tsunami uses machine learning to automatically re-organize a dataset’s storage layout based on the types of queries that its users make.
Bao produces query plans that run up to 50 percent faster than those created by the PostgreSQL optimizer, meaning that it could help to significantly reduce the cost of cloud services.
By fusing the two systems together, Kraska hopes to build the first instance-optimized database system that can provide the best possible performance for each individual application without any manual tuning.
Traditionally, the systems we use to store data are limited to only a few storage options and, because of this, they cannot provide the best possible performance for a given application.
Tsunami can dynamically change the structure of the data storage based on the kinds of queries it receives, and it can create new ways to store data that are not feasible with more traditional approaches.
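To make the idea of a workload-driven layout concrete, here is a deliberately small Python sketch. It is not Tsunami's algorithm, which the article does not describe. It is a toy that assumes a table held as a list of dictionaries and a workload made of range filters, and it picks the single sort column that would have minimized the rows scanned by the observed queries. The function names (`rows_scanned`, `choose_layout`, `reorganize`) and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def rows_scanned(rows, sort_col, query):
    """Estimate rows read for one range query if `rows` were sorted by `sort_col`.

    `query` is a (column, low, high) tuple with inclusive bounds.
    """
    col, low, high = query
    if col != sort_col:
        # The layout does not help this query: assume a full scan.
        return len(rows)
    keys = sorted(r[col] for r in rows)
    # With sorted data, only the matching contiguous range is read.
    return bisect_right(keys, high) - bisect_left(keys, low)


def choose_layout(rows, workload):
    """Pick the sort column with the lowest total scan cost over the workload.

    Assumes `rows` is non-empty and every row has the same columns.
    """
    columns = list(rows[0].keys())
    return min(
        columns,
        key=lambda c: sum(rows_scanned(rows, c, q) for q in workload),
    )


def reorganize(rows, workload):
    """Return the chosen column and a copy of the table sorted by it."""
    col = choose_layout(rows, workload)
    return col, sorted(rows, key=lambda r: r[col])


# Hypothetical table and query log: most filters are on `year`.
rows = [
    {"price": 30, "year": 2019},
    {"price": 10, "year": 2021},
    {"price": 20, "year": 2020},
    {"price": 40, "year": 2018},
]
workload = [("year", 2020, 2021), ("year", 2019, 2019), ("price", 10, 20)]

chosen, table = reorganize(rows, workload)
print(chosen)
print(table)
```

If the query log shifted toward filters on `price`, calling `reorganize` again would select `price` instead. That is the adaptive behavior the paragraph above describes, reduced here to a single sort order rather than the richer storage structures a real system would consider.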
Kraska says that in contrast to other learning-based approaches to query optimization, Bao learns much faster and can outperform open-source and commercial optimizers with as little as one hour of training time.