* An open source big data warehouse system: Apache Tajo


Tajo is an advanced open source data warehouse system on Hadoop, and it has evolved rapidly over the past couple of years. In this talk, I'll give an overview of Tajo and show how it has improved over that time. In particular, I will introduce the new features of the recent major release, Tajo 0.10: HBase storage support, a thin JDBC driver, direct JSON support, and Amazon EMR support. I will then present upcoming features the Tajo community is currently working on:

- a multi-tenant scheduler (take 1), which allows multiple users to submit multiple queries to a single cluster;
- nested schema support, which lets users handle complex data types directly without flattening them;
- more advanced SQL features, such as window frames and subqueries.

Takeaways for the audience:

Tajo provides ANSI SQL support and scalable, low-latency SQL processing on various data sources, such as HDFS, S3, OpenStack Swift, RDBMSs, and HBase. Attendees will learn why Tajo is the best fit for a unified SQL-on-Hadoop system that serves both batch and low-latency workloads. They will also learn that Tajo is a good solution for RDBMS users who want to introduce a Hadoop-based data warehouse or migrate their existing RDBMSs to Hadoop.


Hyunsik Choi, VP of Apache Tajo