EEUM researcher presents solution to optimise data analytics operations

The research started from the many situations in which a transactional system fails to guarantee isolation in a database, compromising its efficiency. The work focuses on designing and implementing isolation in the query engine, preparing it for hybrid workloads.

Databases that support sectors such as banking or commerce handle high transactional loads every day, and update hotspots are likely to cause disruptions: conflicts, waiting times and degraded performance. The transactional system is responsible for ensuring isolation, i.e., preventing two or more simultaneous transactions from causing such conflicts. When a conflict arises, isolation is enforced by cancelling one of the transactions, which preserves the system’s consistency.
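The conflict handling described above can be illustrated with a minimal Python sketch. It is a toy model, not the TiQuE implementation or the behaviour of any specific database: the `Store`, `Transaction` and `ConflictError` names are hypothetical, and the rule used (the first transaction to commit wins, the other is cancelled) is just one common way to resolve write-write conflicts.

```python
class ConflictError(Exception):
    """Raised when a transaction is cancelled to preserve consistency."""


class Store:
    """Toy key-value store that remembers when each key was last committed."""

    def __init__(self):
        self.data = {}           # key -> committed value
        self.versions = {}       # key -> commit counter of the last write
        self.commit_counter = 0  # incremented on every successful commit

    def begin(self):
        return Transaction(self, self.commit_counter)


class Transaction:
    """Hypothetical transaction that buffers writes until commit."""

    def __init__(self, store, start_version):
        self.store = store
        self.start_version = start_version
        self.writes = {}

    def read(self, key):
        # Simplification: reads see the transaction's own writes, otherwise
        # the latest committed value (no snapshot reads in this sketch).
        if key in self.writes:
            return self.writes[key]
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Cancel if another transaction committed a write to the same key
        # after this transaction started (first committer wins).
        for key in self.writes:
            if self.store.versions.get(key, 0) > self.start_version:
                raise ConflictError(f"concurrent update on {key!r}")
        self.store.commit_counter += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.commit_counter


# Usage: two simultaneous transactions update the same hotspot.
store = Store()
setup = store.begin()
setup.write("balance", 100)
setup.commit()

t1 = store.begin()
t2 = store.begin()
t1.write("balance", t1.read("balance") - 30)
t2.write("balance", t2.read("balance") - 50)

t1.commit()  # succeeds
try:
    t2.commit()  # conflicts with t1 and is cancelled
    outcome = "t2 committed"
except ConflictError:
    outcome = "t2 cancelled"
```

In this sketch the second transaction is cancelled rather than silently overwriting the first, so the stored balance reflects only the committed update. A real system would typically retry the cancelled transaction.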

However, transactional isolation is not always efficient, flexible and simple in analytical databases. These systems offer exceptional analytical performance, but they either provide limited transactional guarantees (supporting only transactions that modify a single row, or only a lower isolation level) or deliver low transactional performance.

To address this problem, Nuno Faria, a researcher at HASLab/INESC TEC, developed the TiQuE solution, which designs and implements transactional isolation at the level of the query engine, the database component responsible for optimising and executing the requests made to the database. This prepares the engine for hybrid workloads, which combine a transactional and an analytical component. The transactional component is typically made up of many short read-write transactions, for instance a purchase in an online store or a customer login. The analytical component consists of read-only queries that process a large volume of data, e.g., a weekly report.
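The two components of a hybrid workload can be sketched on a single database as follows. This is only an illustration using Python's built-in `sqlite3` module, not MonetDB or TiQuE; the `orders` table and the `place_order` and `weekly_report` helpers are hypothetical names chosen for the example.

```python
import sqlite3

# One in-memory database serves both kinds of work.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, week INTEGER)"
)


def place_order(conn, customer, amount, week):
    """Transactional component: a short read-write transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (customer, amount, week) VALUES (?, ?, ?)",
            (customer, amount, week),
        )


def weekly_report(conn):
    """Analytical component: a read-only query that scans the whole table."""
    return conn.execute(
        "SELECT week, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY week ORDER BY week"
    ).fetchall()


place_order(conn, "alice", 20.0, 1)
place_order(conn, "bob", 35.5, 1)
place_order(conn, "alice", 10.0, 2)

report = weekly_report(conn)  # one (week, order count, total) row per week
```

Here the purchases and the report run against the same table, so the report always reflects the most recent committed orders. That is the property a hybrid workload relies on.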

The solution, proposed in the paper TiQuE: Improving the Transactional Performance of Analytical Systems for True Hybrid Workloads, was presented at the 49th International Conference on Very Large Data Bases (VLDB), an A*-ranked conference considered one of the world’s leading events in the field of Computer Science. This year’s edition took place in Vancouver, Canada. The development process was supported by MonetDB Solutions, whose database was used as a proof of concept.

One of the problems the solution addresses is running hybrid workloads on a single database with low overhead (in computational terms, the combined time and resources required to perform a task). An alternative would be to create separate databases for the different workloads. However, according to the researcher, this would require more effort, since the same data would have to be replicated across databases, leaving the analytical data lagging behind the source, and it would increase complexity and management costs.

The TiQuE solution allows transactional isolation to be implemented without requiring changes to the database engine code, so it can be turned on or off for specific tables and can support various isolation levels. At the same time, it enables efficient transactional isolation for analytical workloads, such as data cleaning.

According to the researcher at the INESC TEC High-Assurance Software Laboratory (HASLab), designing transactional isolation through the query engine is the main innovation of this solution. “It is important not only because it facilitates its implementation, but also because it allows transactional isolation to be optimised by the workload-based query engine.”

Implementing the TiQuE solution, namely in the applications of banking institutions, which run hybrid workloads that combine large volumes of customer orders with low-latency requirements, simplifies the execution of complex and time-consuming analytical operations on data, with an impact on future decision making.

The researcher mentioned in this news piece is associated with INESC TEC and UMinho.