Keynote Speeches


(1) Never-Ending Language Learning

Tom Mitchell - E. Fredkin University Professor, Machine Learning Department, Carnegie Mellon University


We will never really understand learning until we can build machines that learn many different things, over years, and become better learners over time. We describe our research to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web into its growing knowledge base of beliefs. Each day NELL also learns to read better than the day before. NELL has been running 24 hours/day for over four years now. The result so far is a collection of 70 million interconnected beliefs (e.g., servedWith(coffee, applePie)) that NELL is considering at different levels of confidence, along with millions of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web. NELL is also learning to reason over its extracted knowledge, and to automatically extend its ontology. Track NELL's progress on its website, or follow it on Twitter at @CMUNELL.
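The abstract describes NELL's knowledge base as a set of relation(subject, object) beliefs held at different confidence levels. A minimal sketch of that representation is below; the field names, example beliefs, and the 0.9 promotion threshold are illustrative assumptions, not NELL's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Belief:
    """A candidate fact extracted from the web: relation(subject, object)."""
    relation: str
    subject: str
    obj: str
    confidence: float  # beliefs are tracked at different confidence levels

# Hypothetical candidate beliefs of the kind described in the abstract
candidates = [
    Belief("servedWith", "coffee", "applePie", 0.92),
    Belief("cityInCountry", "pittsburgh", "usa", 0.99),
    Belief("servedWith", "tea", "pizza", 0.31),
]

# Promote only high-confidence candidates to the stable knowledge base
stable = [b for b in candidates if b.confidence >= 0.9]
```

As each day's reading adds candidates and revises confidences, the stable set grows, which matches the abstract's picture of a continuously growing belief store.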


Tom M. Mitchell founded and chairs the Machine Learning Department at Carnegie Mellon University, where he is the E. Fredkin University Professor. His research uses machine learning to develop computers that are learning to read the web, and uses brain imaging to study how the human brain understands what it reads. Mitchell is a member of the U.S. National Academy of Engineering, a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI). He believes the field of machine learning will be the fastest growing branch of computer science during the 21st century.

The slides for this keynote can be downloaded here.




(2) Smart Data - How you and I will exploit Big Data for personalized digital health and many other activities

Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis - Wright State University


Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and on their applications to drive value for businesses. Recently, there has been rapid growth in situations where the big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for, and avoiding, an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or the Internet of Things around, on, and inside the human body), public health signals (e.g., information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, or Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each person has a different set of relevant data!

In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, then for all the data relevant to my child with the four V-challenges, what I care about is simply, "How is her current health, and what is the risk of an asthma attack in her current situation (now and today), especially if that risk has changed?" As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in cognitive models.

For harnessing Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies to support semantic interoperability and integration. For Velocity, I will discuss more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
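The Semantic Perception idea above, turning raw measurements into symbolic observations a person can act on, can be sketched in a few lines. The sensor, the thresholds, and the labels here are hypothetical placeholders chosen for illustration, not the actual Kno.e.sis abstraction machinery.

```python
def perceive_air_quality(pm25_ug_m3: float) -> str:
    """Abstract a raw PM2.5 sensor reading (micrograms per cubic meter)
    into a symbolic observation useful for human decision-making.
    Thresholds are illustrative, not an official standard."""
    if pm25_ug_m3 <= 12.0:
        return "good"
    if pm25_ug_m3 <= 35.4:
        return "moderate"
    return "unhealthy"

# A stream of raw readings becomes a stream of actionable observations
readings = [8.3, 22.1, 61.0]
observations = [perceive_air_quality(r) for r in readings]
```

The point of the sketch is the direction of the mapping: high-volume numeric data in, a small vocabulary of decision-relevant symbols out; a real system would draw that vocabulary from a domain ontology rather than hard-coded thresholds.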

Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.


Amit P. Sheth is an educator, researcher, and entrepreneur. He is the LexisNexis Eminent Scholar and founder/executive director of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University. Kno.e.sis conducts research in social/sensor/semantic data and Web 3.0, with real-world applications and multidisciplinary solutions for translational research, healthcare and life sciences, cognitive science, material sciences, and other areas. Kno.e.sis' activities have resulted in Wright State University being recognized as one of the top organizations in the world in World Wide Web research impact. Prof. Sheth is one of the top authors in Computer Science, World Wide Web, and databases (cf. Microsoft Academic Search; Google H-index). His research has led to several commercial products, many real-world applications, and two earlier companies, with two more in early stages of development. One of these was Taalee/Voquette/Semagix, which was likely the first company (founded in 1999) to develop Semantic Web enabled search and analysis, and semantic application development platforms.

The slides for this keynote can be downloaded here.


(3) Addressing Human Bottlenecks in Big Data

Joseph M. Hellerstein, Chancellor's Professor of Computer Science, University of California, Berkeley and Trifacta


We live in an era when compute is cheap, data is plentiful, and system software is being given away for free.  Today, the critical bottlenecks in data-driven organizations are human bottlenecks, measured in the costs of software developers, IT professionals, and data analysts.  How can computer science remain relevant in this context?  The Big Data ecosystem presents two archetypal settings for answering this question: NoSQL distributed databases, and analytics on Hadoop.

In the case of NoSQL, developers are being asked to build parallel programs for global-scale systems that cannot even guarantee the consistency of a single register of memory.  How can this possibly be made to work?  I’ll talk about what we have seen in the wild in user deployments, and what we’ve learned from developers and their design patterns.  Then I’ll present theoretical results—the CALM Theorem—that shed light on what’s possible here, and what requires more expensive tools for coordination on top of the typical NoSQL offerings.  Finally, I will highlight some new approaches to writing and testing software—exemplified by the Bloom language—that can help developers of distributed software avoid expensive coordination when possible, and have the coordination logic synthesized for them automatically when necessary.
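The CALM intuition sketched above, that monotone logic is consistent without coordination, can be illustrated with a toy example (in Python, not the Bloom language itself; the program and data are hypothetical):

```python
import random

def monotone_merge(batches):
    """Accumulate observed facts as a growing set union. Because the
    output only ever grows, message delivery order does not matter:
    the program is monotone, hence consistent without coordination."""
    facts = set()
    for batch in batches:
        facts |= set(batch)
    return facts

batches = [["a", "b"], ["b", "c"], ["a"]]
shuffled = list(batches)
random.shuffle(shuffled)

# Any arrival order yields the same final state
assert monotone_merge(batches) == monotone_merge(shuffled)

# By contrast, a non-monotone question such as "have ALL nodes reported?"
# can flip its answer as more messages arrive, which is why it needs an
# expensive coordination mechanism (e.g., a barrier) on top of NoSQL stores.
```

This is the distinction the talk draws: a tool like Bloom can detect which parts of a distributed program are monotone and confine coordination to the parts that are not.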

In the Hadoop context, the key bottlenecks lie with data analysts and data engineers, who are routinely asked to work with data that cannot possibly be loaded into tools for statistical analytics or visualization.  Instead, they have to engage in time-consuming data “wrangling”—to try and figure out what’s in their data, whip it into a rectangular shape for analysis, and figure out how to clean and integrate it for use.  I’ll discuss what we heard talking with data analysts in both academic interviews and commercial engagements.  Then I’ll talk about how techniques from human-computer interaction, machine learning, and database systems can be brought together to address this human bottleneck, as exemplified by our work on various systems including the Data Wrangler project and Trifacta's platform for data transformation.


Joseph M. Hellerstein is a Chancellor's Professor of Computer Science at the University of California, Berkeley, whose research focuses on data-centric systems and the way they drive computing. A Fellow of the ACM, his work has been recognized via awards including an Alfred P. Sloan Research Fellowship, MIT Technology Review's TR10 and TR100 lists, Fortune Magazine's "Smartest in Tech" list, and three ACM-SIGMOD "Test of Time" awards. In 2012, Joe co-founded Trifacta, Inc., where he currently serves as Chief Strategy Officer.

The slides for this keynote can be downloaded here.




Last update: 13 Nov. 2014