Tajo is an advanced open source data warehouse system on Hadoop that has evolved rapidly over the past couple of years. In this talk, I will give an overview of Tajo and show how it has improved over that time. In particular, I will introduce the new features of the recent major release, Tajo 0.10: HBase storage support, a thin JDBC driver, direct JSON support, and Amazon EMR support. Then I will present the upcoming features that the Tajo community is currently working on: a first take at a multi-tenant scheduler, which allows multiple users to submit multiple queries to one cluster; nested schema support, which allows users to handle complex data types directly without flattening; and more advanced SQL features such as window frames and subqueries.
Takeaways for the audience:
Tajo provides ANSI SQL support and scalable, low-latency SQL processing on various data sources such as HDFS, S3, OpenStack Swift, RDBMSs, and HBase. Attendees will learn why Tajo is well suited to serve as a unified SQL-on-Hadoop system for both batch and low-latency workloads. They will also learn that Tajo can be a good fit for users who already run RDBMSs and want either to introduce a Hadoop-based data warehouse system or to migrate existing RDBMSs to Hadoop.
Hyunsik Choi, VP of Apache Tajo