Uber Introduces AresDB: GPU-Powered, Open-Source, Real-Time Analytics Engine
Uber recently introduced AresDB, an open-source real-time analytics engine that leverages an unconventional power source – graphics processing units (GPUs) – to meet the growing demands of analysis at scale while unifying, simplifying and improving Uber's existing solutions.
AresDB is written in C++ and Golang and was released in November 2018. It is an addition to Uber's repertoire of open-source contributions.
The realm of real-time analysis has many existing technologies, some of which – Apache Pinot, Elasticsearch – have been used by Uber, but as company engineers stated in their "Introducing AresDB" post, no single solution simultaneously addressed all of Uber's functional, scalability, performance, cost, and operational requirements.
To tackle this problem, Uber focused on GPUs. Typical real-time analytical queries at Uber power dashboards that monitor business metrics and drive automated decisions (like trip pricing and fraud detection) based on the metrics collected, and they involve filtering and aggregating millions or even billions of records. The fast parallel-processing model of general-purpose GPUs is well suited to this kind of computation, because the work on each record can be done in parallel.
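To see why filter-and-aggregate queries parallelize well, here is a minimal CPU-only sketch in Python. It is not AresDB code: the column names, the sample data and the chunking scheme are all assumptions made for illustration. Each chunk of rows is filtered and aggregated independently, and the partial results are merged at the end, which is the property that lets such work be spread across many GPU threads.

```python
# Hypothetical columnar data: one list per column, aligned by row index.
city_id = [1, 2, 1, 3, 2, 1, 3, 1]
status = ["completed", "canceled", "completed", "completed",
          "completed", "canceled", "completed", "completed"]
fare = [10.0, 0.0, 12.5, 8.0, 20.0, 0.0, 7.5, 9.0]


def partial_aggregate(start, end):
    """Filter and aggregate one slice of rows, independently of all other slices."""
    totals = {}
    for i in range(start, end):
        if status[i] == "completed":  # filter step
            totals[city_id[i]] = totals.get(city_id[i], 0.0) + fare[i]  # aggregate step
    return totals


def aggregate(chunk_size):
    """Merge per-chunk partial results; each chunk could run on its own worker."""
    result = {}
    for start in range(0, len(fare), chunk_size):
        end = min(start + chunk_size, len(fare))
        for city, total in partial_aggregate(start, end).items():
            result[city] = result.get(city, 0.0) + total
    return result


fare_by_city = aggregate(chunk_size=3)
print(fare_by_city)
```

Because no chunk depends on another, the chunk size only changes how the work is divided, not the answer.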
AresDB uses GPUs only at query time. It handles data ingestion using CPUs (data is stored in host memory) and handles recovery via disks. At query time, it transfers data from host memory to GPU memory for parallel processing, as shown in the following high-level overview diagram of AresDB's architecture:
Image source: https://eng.uber.com/aresdb/
AresDB's design has the following features: column-based storage, real-time upsert, and GPU-powered query processing.
Column-based storage has been implemented to enable compression for storage and query efficiency. There are two categories of stores – a Live store for recently ingested data stored in an uncompressed, unsorted format and an Archive store for mature, sorted and compressed data.
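The difference between the two stores can be sketched in a few lines of Python. This is an illustrative toy, not AresDB's storage format: the article does not say which compression scheme AresDB uses, so run-length encoding is swapped in here as an assumption, and the column names and data are hypothetical. The point is that sorting a column groups equal values together, which is what makes it compress well.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into a flat column."""
    return [value for value, count in runs for _ in range(count)]


# Live store: columns in arrival order, unsorted and uncompressed.
live_city = [3, 1, 2, 1, 3, 1]
live_fare = [8.0, 10.0, 20.0, 12.5, 7.5, 9.0]

# Archiving: sort rows by city (stable sort), then compress the sorted column.
order = sorted(range(len(live_city)), key=lambda i: live_city[i])
archive_city = [live_city[i] for i in order]
archive_fare = [live_fare[i] for i in order]
compressed_city = run_length_encode(archive_city)

print(compressed_city)
```

On the unsorted live column the same encoding would produce six runs of length one, so it would save nothing; after sorting, three runs are enough.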
Real-time upsert with primary key deduplication has been implemented to increase data accuracy and provide "near real-time data freshness" within seconds. As part of real-time ingestion, AresDB classifies each record as "late" or not. Records considered "late" are put into the archive store, whereas fresh records go into the live store. A scheduled archiving process also periodically takes records from the live store (once they can be considered mature) and merges them into the archive store.
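The ingestion flow described above can be approximated with a small Python sketch. Everything here is hypothetical: the `Table` class, the `archive_cutoff` rule for deciding lateness, and the record fields are assumptions for illustration, not AresDB's implementation. The sketch also assumes that a given primary key always arrives with the same event time.

```python
class Table:
    """Toy table with a live store and an archive store, both keyed by primary key."""

    def __init__(self, archive_cutoff):
        self.archive_cutoff = archive_cutoff  # event times below this count as "late"
        self.live = {}
        self.archive = {}

    def upsert(self, key, event_time, values):
        """Insert or overwrite a record; returns the name of the store written to."""
        record = dict(values, event_time=event_time)
        if event_time < self.archive_cutoff:
            self.archive[key] = record  # late record: goes straight to the archive
            return "archive"
        self.live[key] = record  # fresh record: same key overwrites (deduplication)
        return "live"

    def run_archiving(self, new_cutoff):
        """Move live records older than new_cutoff into the archive store."""
        mature = [k for k, r in self.live.items() if r["event_time"] < new_cutoff]
        for key in mature:
            self.archive[key] = self.live.pop(key)
        self.archive_cutoff = new_cutoff


table = Table(archive_cutoff=100)
table.upsert("trip-1", 150, {"fare": 10.0})
table.upsert("trip-1", 150, {"fare": 12.0})  # same key: replaces the earlier row
table.upsert("trip-0", 50, {"fare": 5.0})    # older than the cutoff: late
table.upsert("trip-2", 300, {"fare": 7.0})
table.run_archiving(new_cutoff=200)
print(sorted(table.live), sorted(table.archive))
```

Keying both stores by primary key is what makes a repeated write an update rather than a duplicate row.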
GPU-powered query processing relies on highly parallelized data processing by GPUs to provide low query latencies. To run queries against AresDB, users need to use the Ares Query Language (AQL), in which queries are specified as JSON, YAML, or Go objects. According to the introduction post, a benefit of not using a SQL-like language is that "In JSON-format, AQL provides better programmatic query experience than SQL for dashboard and decision system developers, because it allows them to easily compose and manipulate queries using code without worrying about issues like SQL injection."
However, as stated in the announcement post, supporting SQL for querying is one of the future steps which the Uber engineering team plans to take to improve the user experience.
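The argument for a JSON-based query language can be illustrated with a short Python sketch. The query shape below is hypothetical: the field names (`table`, `dimensions`, `measures`, `filters`) and the `build_query` helper are assumptions for illustration and are not taken from the AQL specification, which is documented in the AresDB wiki. What the sketch shows is the general idea that a query built as structured data carries user input as a value rather than splicing it into query text.

```python
import json


def build_query(table, group_by, status):
    """Compose a query as plain data; caller input only ever becomes a JSON value."""
    return {
        "table": table,
        "dimensions": [group_by],
        "measures": ["count(*)"],
        # Hypothetical structured filter: the value is data, not query syntax.
        "filters": [{"column": "status", "operator": "=", "value": status}],
    }


# Input that would be dangerous if concatenated into a SQL string.
user_input = "completed' OR '1'='1"

query = build_query("trips", "city_id", user_input)
query["dimensions"].append("status")  # queries are easy to manipulate in code
payload = json.dumps(query, indent=2)
print(payload)
```

The serializer handles all quoting, so the suspicious input survives the round trip unchanged as an ordinary string value.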
AresDB is open-sourced under the Apache 2.0 license and is used at Uber to extract business insights in real time, enabling data-driven decision making to improve the user experience on the Uber platform. The introduction post also states that, in the future, Uber intends to make the following improvements to the project:
- Distributed design to improve scalability and reduce operational costs.
- Developer support, including more intuitive tooling and richer documentation for an improved onboarding experience.
- Expanding feature set to include functionality such as window functions and nested loop joins.
- Query engine optimization using techniques like Low Level Virtual Machine (LLVM) and GPU memory caching.
More information can be found at the AresDB project GitHub wiki.