(1)    How Big Data changes Statistical Machine Learning.
Dr. Léon Bottou, Facebook AI Research, New York
This presentation illustrates how big data forces change on algorithmic techniques and the goals of machine learning, bringing along challenges and opportunities.
1.     The theoretical foundations of statistical machine learning traditionally assume that training data is scarce. If one assumes instead that data is abundant and that the bottleneck is the computation time, stochastic algorithms with poor optimization performance become very attractive learning algorithms. These algorithms quickly became the backbone of large-scale machine learning and are the object of very active research.
2.     Increasing the training set size cannot improve average errors indefinitely. However, this diminishing returns problem vanishes if we measure instead the diversity of conditions in which the trained system performs well. In other words, big data is not an opportunity to increase the average accuracy, but an opportunity to increase coverage. Machine learning research must broaden its statistical framework in order to embrace all the (changing) aspects of real big data problems. Transfer learning, causal inference, and deep learning are successful steps in this direction.
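As a rough illustration of the first point above, the sketch below runs plain stochastic gradient descent for a single pass over a large synthetic data set. It is not taken from the talk: the linear model, the synthetic data, the learning rate, and the function names are all assumptions chosen for illustration. Each update looks at one example only, so it is a poor optimizer per step, yet with abundant data a single cheap pass is enough to land near the underlying parameters.

```python
import random


def make_examples(n, seed=0):
    """Hypothetical data source: y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)
        examples.append((x, y))
    return examples


def sgd_linear_regression(examples, learning_rate=0.05):
    """One pass of plain SGD for the model y = w * x + b with squared loss."""
    w, b = 0.0, 0.0
    for x, y in examples:
        error = (w * x + b) - y          # residual on this single example
        w -= learning_rate * error * x   # gradient of 0.5 * error**2 w.r.t. w
        b -= learning_rate * error       # gradient of 0.5 * error**2 w.r.t. b
    return w, b


# Assumed setting: data is abundant, so each example is visited only once.
w, b = sgd_linear_regression(make_examples(20000))
print(f"estimated w={w:.3f}, b={b:.3f}")
```

The constant learning rate keeps the sketch short; the estimates keep fluctuating slightly around the true values instead of converging exactly, which is the kind of optimization imprecision the abstract describes as acceptable when computation time is the bottleneck.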
Léon Bottou received the Diplôme d’Ingénieur de l’École Polytechnique (X84) in 1987, the Magistère de Mathématiques Fondamentales et Appliquées et d’Informatique from École Normale Supérieure in 1988, and a doctorat from Université de Paris-Sud in 1991. His research career took him to AT&T Bell Laboratories, AT&T Labs Research, NEC Labs America and Microsoft. He joined Facebook AI Research in 2015.
The long-term goal of Léon’s research is to understand how to build human-level intelligence. Although reaching this goal requires conceptual advances that cannot be anticipated at this point, it certainly entails clarifying how to learn and how to reason. Léon Bottou’s best-known contributions are his work on neural networks in the 1990s, his work on large-scale learning in the 2000s, and possibly his more recent work on causal inference in learning systems. Léon is also known for the DjVu document compression technology.
(2)    Moving Past the "Wild West" Era for Big Data
H. V. Jagadish, Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science, University of Michigan
The potential of Big Data is widely recognized and many are seeking fortunes with Big Data today, just as they once sought fortunes by heading West in America.
While success is initially limited only by creativity and passion, over time we need civilization, with all its accompanying benefits and constraints.
As the field of Big Data matures, it is approaching the end of the "Wild West" era. In this talk, I will suggest what the "civilized" era may look like.
Hosagrahar Visvesvaraya Jagadish (Jag) is a computer scientist in the field of database systems research. He is the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan at Ann Arbor and a Senior Scientific Director of the National Center for Integrative Biomedical Informatics established by the National Institutes of Health. Prior to joining the Michigan faculty, he spent over a decade at AT&T Bell Laboratories as a research scientist, where he eventually became head of the Database division.
Jagadish earned his bachelor's degree from the Indian Institute of Technology, Delhi and a doctorate in Electrical Engineering from Stanford University in 1985.
He was elected fellow of the Association for Computing Machinery in 2003 and trustee of the VLDB Endowment in 2004. He was the founding editor of the Proceedings of the VLDB Endowment (PVLDB) in 2008.
(3)    Conquering Big Data with Spark
Prof. Ion Stoica, UC Berkeley, USA
Today, big and small organizations alike collect huge amounts of data, and they do so with one goal in mind: extract "value" through sophisticated exploratory analysis, and use it as the basis to make decisions as varied as personalized treatment and ad targeting. To address this challenge, we have developed Berkeley Data Analytics Stack (BDAS), an open source data analytics stack for big data processing.
In this talk I'll focus on the execution engine in BDAS: Apache Spark. Apache Spark is a cluster computing engine that is optimized for in-memory processing, and unifies support for a variety of workloads, including batch, streaming, and iterative computations. Spark is now the most active big data project in the open source community, and is already being used by over one thousand organizations.
Ion Stoica is a Professor in the EECS Department at the University of California at Berkeley. He received his PhD from Carnegie Mellon University, and his B.S. from Polytechnic Institute Bucharest. He does research on cloud computing and networked computer systems. Past work includes the Dynamic Packet State (DPS), Chord DHT, Internet Indirection Infrastructure (i3), declarative networks, replay-debugging, and multi-layer tracing in distributed systems. His current research focuses on resource management and scheduling for data centers, cluster computing frameworks, and network architectures. He is an ACM Fellow and has received numerous awards, including the SIGCOMM Test of Time Award (2011) and the ACM Doctoral Dissertation Award (2001). In 2006, he co-founded Conviva, a startup to commercialize technologies for large-scale video distribution, and in 2013, he co-founded Databricks, a startup to commercialize technologies for Big Data processing.