(1) Never-Ending Language Learning
Tom Mitchell - E. Fredkin University Professor,
Machine Learning Department, Carnegie Mellon University
We will never really understand learning until we can build
machines that learn many different things, over years, and become better
learners over time. We describe our research to build a Never-Ending Language
Learner (NELL) that runs 24 hours per day, forever, learning to read the web.
Each day NELL extracts (reads) more facts from the web, into its growing
knowledge base of beliefs. Each day NELL also learns to read better than the
day before. NELL has been running 24 hours/day for over four years now. The
result so far is a collection of 70 million interconnected beliefs (e.g.,
servedWith(coffee, applePie)) that NELL holds at different levels of
confidence, along with millions of learned phrasings, morphological features,
and web page structures that NELL uses to extract beliefs from the web. NELL
is also learning to reason over its extracted knowledge, and to automatically
extend its ontology. Track NELL's progress at http://rtw.ml.cmu.edu,
or follow it on Twitter at @CMUNELL.
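Beliefs like servedWith(coffee, applePie) can be pictured as confidence-weighted relational triples. The following is a minimal, hypothetical sketch of such a knowledge base (it is not NELL's actual implementation; all names and values here are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Belief:
    """A relational assertion, e.g. servedWith(coffee, applePie)."""
    predicate: str
    subject: str
    obj: str
    confidence: float  # beliefs are held at different levels of confidence

class KnowledgeBase:
    def __init__(self):
        self.beliefs = set()

    def add(self, predicate, subject, obj, confidence):
        self.beliefs.add(Belief(predicate, subject, obj, confidence))

    def query(self, predicate, min_confidence=0.0):
        """Return beliefs for a predicate at or above a confidence threshold."""
        return [b for b in self.beliefs
                if b.predicate == predicate and b.confidence >= min_confidence]

kb = KnowledgeBase()
kb.add("servedWith", "coffee", "applePie", 0.9)
kb.add("servedWith", "tea", "scone", 0.4)
high = kb.query("servedWith", min_confidence=0.5)
```

Each day's reading both adds triples like these and revises their confidences, which is what makes the knowledge base "growing" and "interconnected."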
Tom M. Mitchell founded and chairs the Machine Learning
Department at Carnegie Mellon University, where he is the E. Fredkin
University Professor. His research uses machine learning to develop computers
that are learning to read the web, and uses brain imaging to study how the
human brain understands what it reads. Mitchell is a member of the U.S.
National Academy of Engineering, a Fellow of the American Association for the
Advancement of Science (AAAS), and a Fellow and Past President of the
Association for the Advancement of Artificial Intelligence (AAAI). He
believes the field of machine learning will be the fastest growing branch of
computer science during the 21st century.
This keynote's slides can be downloaded here.
(2) Smart Data - How you and I
will exploit Big Data for personalized digital health and many other
Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis - Wright State University
Big Data has captured a lot of interest in industry, with emphasis on the
challenges of the four Vs of Big Data: Volume, Variety, Velocity, and
Veracity, and on applying it to drive value for businesses.
Recently, there has been rapid growth in situations where the big data
challenge concerns making individually relevant decisions. A key example is
personalized digital health, which relates to making better decisions about our
health, fitness, and well-being. Consider, for instance, understanding
the reasons for and avoiding an asthma attack based on Big Data in the form
of personal health signals (e.g., physiological data measured by
devices/sensors or the Internet of Things around, on, and inside the human
body), public health signals (e.g., information coming from the healthcare
system such as hospital admissions), and population health signals (such as
Tweets by people related to asthma occurrences and allergens, Web services
providing pollen and smog information). However, no individual has the
ability to process all these data without the help of appropriate technology,
and each human has a different set of relevant data!
In this talk, I will describe Smart Data that is realized by
extracting value from Big Data, to benefit not just large companies but each
individual. If my child is an asthma patient, for all the data relevant to my
child with the four V-challenges, what I care about is simply, “How is her
current health, and what is the risk of having an asthma attack in her
current situation (now and today), especially if that risk has changed?” As I
will show, Smart Data that gives such personalized and actionable information
will need to utilize metadata, use domain specific knowledge, employ
semantics and intelligent processing, and go beyond traditional reliance on
ML and NLP. I will motivate the need for a synergistic combination of
techniques similar to the close interworking of the top brain and the bottom
brain in cognitive models.
For harnessing Volume, I will discuss the concept of Semantic Perception,
that is, how to convert massive amounts of data into information, meaning,
and insight useful for human decision-making. For dealing with Variety, I
will discuss experience in using agreement represented in the form of
ontologies, domain models, or vocabularies, to support semantic interoperability
and integration. For Velocity, I will discuss somewhat more recent work on
semantics, which seeks to use dynamically created models of new objects,
concepts, and relationships, using them to better understand new cues in the
data that capture rapidly evolving events and situations.
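As a toy illustration of the Semantic Perception idea, raw numeric observations can be lifted into symbolic abstractions a person can act on, using background domain knowledge. The thresholds, labels, and combination rule below are invented purely for illustration:

```python
# Hypothetical domain knowledge: thresholds for abstracting raw pollen
# counts into qualitative levels. Values are illustrative only.
POLLEN_LEVELS = [(200, "high"), (50, "moderate"), (0, "low")]

def abstract_pollen(count):
    """Lift a raw sensor reading into a symbolic, human-meaningful level."""
    for threshold, label in POLLEN_LEVELS:
        if count >= threshold:
            return label
    return "unknown"

def asthma_risk(pollen_count, admissions_rising):
    """Combine a population/environment signal (pollen) with a public
    health signal (hospital admissions trend) into actionable advice."""
    level = abstract_pollen(pollen_count)
    if level == "high" or (level == "moderate" and admissions_rising):
        return "elevated"
    return "normal"
```

The point of the sketch is the shape of the computation: masses of low-level data are reduced, via domain knowledge, to the one personalized answer the user actually asked for ("has her risk changed today?").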
Smart Data applications in development at Kno.e.sis come from
the domains of personalized health, energy, disaster response, and smart
cities. I will present examples from a couple of these.
Amit P. Sheth (http://knoesis.org/amit) is an educator,
researcher, and entrepreneur. He is the LexisNexis Eminent Scholar and
founder/executive director of the Ohio Center of Excellence in
Knowledge-enabled Computing (Kno.e.sis) at
Wright State University. Kno.e.sis conducts research in
social/sensor/semantic data and Web 3.0 with real-world applications and
multidisciplinary solutions for translational research, healthcare and life
sciences, cognitive science, material sciences, and others. Kno.e.sis'
activities have resulted in Wright State University being recognized as
one of the top organizations in the world for research impact on the
World Wide Web. Prof. Sheth is among the top authors in
Computer Science, World Wide Web, and databases (cf. Microsoft Academic Search;
Google H-index). His research has led to several commercial products, many
real-world applications, and two earlier companies with two more in early
stages of development. One of these was Taalee/Voquette/Semagix, which was
likely the first company (founded in 1999) that developed Semantic Web
enabled search and analysis, and semantic application development platforms.
This keynote's slides can be downloaded here.
(3) Addressing Human Bottlenecks in Big Data
Joseph M. Hellerstein, Chancellor's
Professor of Computer Science, University of California, Berkeley and Trifacta
We live in an era when compute is cheap, data is plentiful, and
system software is being given away for free.
Today, the critical bottlenecks in data-driven organizations are human
bottlenecks, measured in the costs of software developers, IT professionals,
and data analysts. How can computer
science remain relevant in this context?
The Big Data ecosystem presents two archetypal settings for answering
this question: NoSQL distributed databases, and
analytics on Hadoop.
In the case of NoSQL, developers are
being asked to build parallel programs for global-scale systems that cannot
even guarantee the consistency of a single register of memory. How can this possibly be made to work? I’ll talk about what we have seen in the
wild in user deployments, and what we’ve learned from developers and their
design patterns. Then I’ll present
theoretical results—the CALM Theorem—that shed light on what’s possible here,
and what requires layering more expensive coordination tools on top.
Finally, I will highlight some new approaches to writing and testing
software—exemplified by the Bloom language—that can help developers of
distributed software avoid expensive coordination when possible, and have the
coordination logic synthesized for them automatically when necessary.
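The intuition behind the CALM Theorem (Consistency As Logical Monotonicity) is that monotone programs, whose outputs only grow as inputs arrive, can run coordination-free and still converge. A minimal sketch of that intuition, using a grow-only set (this is an illustrative Python toy, not the Bloom language):

```python
class GSet:
    """A grow-only replicated set. Union is commutative, associative, and
    idempotent, so replicas converge regardless of message delivery order;
    no coordination is required."""
    def __init__(self):
        self.items = set()

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        # Monotone merge: the set only ever grows.
        self.items |= other.items

# Two updates originate at different replicas...
r1, r2 = GSet(), GSet()
r1.add("a")
r2.add("b")

# ...and are delivered to two observers in opposite orders.
obs1, obs2 = GSet(), GSet()
obs1.merge(r1); obs1.merge(r2)
obs2.merge(r2); obs2.merge(r1)
# obs1 and obs2 end up identical despite the reordering.
```

A non-monotone operation, such as deletion or a count that must stop growing, loses this order-insensitivity; that is where coordination, or coordination logic synthesized automatically as in Bloom, comes in.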
In the Hadoop context, the key
bottlenecks lie with data analysts and data engineers, who are routinely
asked to work with data that cannot possibly be loaded into tools for
statistical analytics or visualization.
Instead, they have to engage in time-consuming data “wrangling”—to try to
figure out what’s in their data, whip it into a rectangular shape for
analysis, and figure out how to clean and integrate it for use. I’ll discuss what we heard talking with
data analysts in both academic interviews and commercial engagements. Then I’ll talk about how techniques from
human-computer interaction, machine learning, and database systems can be
brought together to address this human bottleneck, as exemplified by our work
on various systems including the Data Wrangler project and Trifacta's platform for data transformation.
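As a tiny illustration of the kind of transformation wrangling involves, consider turning ragged log lines into a rectangular table. The records and formats below are invented, and tools like Wrangler infer such rules interactively from examples rather than requiring hand-written regexes:

```python
import re

# Invented messy input: inconsistent date separators, stray whitespace,
# optional units, and a record that doesn't parse at all.
raw = [
    "2014-07-01| Alice |  temp=98.6",
    "2014/07/02|Bob|temp=99.1F",
    "bad record",
]

row_pat = re.compile(
    r"(\d{4})[-/](\d{2})[-/](\d{2})\|\s*(\w+)\s*\|\s*temp=([\d.]+)F?"
)

def wrangle(lines):
    """Normalize messy lines into uniform rows (a 'rectangular' shape)."""
    rows = []
    for line in lines:
        m = row_pat.match(line)
        if not m:
            continue  # real tools surface unparsed records for inspection
        y, mo, d, name, temp = m.groups()
        rows.append({"date": f"{y}-{mo}-{d}",
                     "name": name,
                     "temp": float(temp)})
    return rows

rows = wrangle(raw)
```

Only after this step can the data be loaded into statistical or visualization tools; the research question is how much of this tedium can be inferred and automated.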
Joseph M. Hellerstein is a
Chancellor's Professor of Computer Science at the University of California,
Berkeley, whose research focuses on data-centric systems and the way they
drive computing. A Fellow of the ACM, his work has been recognized via awards
including an Alfred P. Sloan Research Fellowship, MIT Technology Review's
TR10 and TR100 lists, Fortune Magazine's "Smartest in Tech" list,
and three ACM-SIGMOD "Test of Time" awards. In 2012, Joe co-founded Trifacta,
where he currently serves as Chief Strategy Officer.
This keynote's slides can be downloaded here.