Berkeley Data Analytics Stack: Introducing Tachyon

Abstract

The Berkeley Data Analytics Stack (BDAS) is an open source software stack that integrates software components being built by the AMPLab to make sense of Big Data. Many systems in the stack provide orders of magnitude better performance than other big data analytics tools, such as Hadoop. Today, BDAS components are used by numerous companies and institutions. In this talk, I will present an overview of BDAS with a focus on Tachyon, a distributed filesystem that provides reliable file sharing at memory speed across cluster frameworks.

Bio

Haoyuan Li is a Computer Science Ph.D. student in the AMPLab at UC Berkeley, working with Prof. Scott Shenker and Prof. Ion Stoica, with a focus on computer systems and big data research. During his Ph.D. studies, he has worked on various components of BDAS. In particular, he leads the Tachyon project and co-created DStream (Spark Streaming). He is also a founding committer of Apache Spark. Before Berkeley, he worked at Conviva and Google on big data processing. His earlier work, the PFP algorithm, has been adopted by Apache Mahout. Haoyuan holds an M.S. from Cornell University and a B.S. from Peking University.