Abstract: Technological advances and novel applications, such as sensors, cyber-physical systems, smart mobile devices, cloud systems, data analytics, and social networks, are making it possible to capture, and to quickly process and analyze, huge amounts of data from which to extract information critical for security-related tasks. In the area of cyber security, such tasks include user authentication, access control, anomaly detection, user monitoring, and protection from insider threats. By analyzing and integrating data collected on the Internet and Web, one can identify connections and relationships among individuals that may in turn help with homeland protection. By collecting and mining data concerning user travels and disease outbreaks, one can predict the spread of disease across geographical areas. And those are just a few examples; there are certainly many other domains where data technologies can play a major role in enhancing security. The use of data for security tasks is, however, raising major privacy concerns. Collected data, even if anonymized by removing identifiers such as names or social security numbers, may, when linked with other data, lead to the re-identification of the individuals to which specific data items relate. Also, as organizations, such as governmental agencies, often need to collaborate on security tasks, data sets are exchanged across different organizations, resulting in these data sets being available to many different parties. Apart from the use of data for analytics, security tasks such as authentication and access control may require detailed information about users. An example is multi-factor authentication, which may require, in addition to a password or a certificate, user biometrics. Recently proposed continuous authentication techniques extend such access control systems. This information, if misused or stolen, can lead to privacy breaches. It would then seem that in order to achieve security we must give up privacy.
However, this may not necessarily be the case. Recent advances in cryptography are making it possible to work on encrypted data – for example, to perform analytics on encrypted data. However, much more needs to be done, as the specific data privacy techniques to use depend heavily on the specific use of the data and the security tasks at hand. Also, current techniques are still not able to meet the efficiency requirements of big data sets. In this talk we will discuss methods and techniques to make this reconciliation possible and identify research directions.
Elisa Bertino is professor of computer science at Purdue University, and serves as Research Director of the Center for Education and Research in Information Assurance and Security (CERIAS). She is also an adjunct professor of Computer Science & Information Technology at RMIT. Prior to joining Purdue in 2004, she was a professor and department head at the Department of Computer Science and Communication of the University of Milan. She has been a visiting researcher at the IBM Research Laboratory (now Almaden) in San Jose, at the Microelectronics and Computer Technology Corporation, at Rutgers University, and at Telcordia Technologies. Her recent research focuses on data security and privacy, digital identity management, policy systems, and security for the Internet-of-Things. She is a Fellow of ACM and of IEEE. She received the IEEE Computer Society 2002 Technical Achievement Award, the IEEE Computer Society 2005 Kanai Award, and the ACM SIGSAC 2014 Outstanding Contributions Award. She is currently serving as Editor-in-Chief of IEEE Transactions on Dependable and Secure Computing.
Abstract: Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to turn such massive unstructured data into structured data, and then into structured networks and actionable knowledge. We propose a data-intensive text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, that types can be extracted from massive text data with distant supervision, and that relationships among entities can be discovered by meta-path guided network embedding. Finally, we propose a D2N2K (i.e., data-to-network-to-knowledge) paradigm: first turn data into relatively structured information networks, and then mine such text-rich and structure-rich networks to generate useful knowledge. We show that such a paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.
Jiawei Han is the Abel Bliss Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. He has been researching data mining, information network analysis, database systems, and data warehousing, with over 700 journal and conference publications. He has chaired or served on many program committees of international conferences, including as PC co-chair for the KDD, SDM, and ICDM conferences, and as Americas Coordinator for the VLDB conferences. He also served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data and as Director of the Information Network Academic Research Center supported by the U.S. Army Research Lab, and is co-Director of KnowEnG, an NIH-funded Center of Excellence in Big Data Computing. He is a Fellow of ACM and a Fellow of IEEE, and received the 2004 ACM SIGKDD Innovations Award, the 2005 IEEE Computer Society Technical Achievement Award, and the 2009 M. Wallace McDowell Award from the IEEE Computer Society. His co-authored book "Data Mining: Concepts and Techniques" has been widely adopted as a textbook worldwide.
Manufacturing is a critical component of the U.S. economy, responsible for 12.5% of GDP, direct employment for over 12 million people, and close to 75% of U.S. exports of goods. The U.S. manufacturing sector, while it produces 17% of the world's manufacturing output, also represents a quarter of the country's energy consumption. On the R&D side, it is responsible for 70% of all private sector R&D performed (in 2010 and 2011) and nearly 60% of patent applications. A number of emerging technologies are driving shifts in traditional manufacturing, in particular the convergence of information and communication technology with the materials and process technologies of manufacturing. Particularly for energy intensive and energy-dependent industries, harnessing IT to reduce energy usage while simultaneously making companies more competitive is essential to the future of U.S. manufacturing, competitiveness, and productivity.
This talk will review the Advanced Manufacturing Office's work to leverage high performance computing and smart manufacturing approaches for the U.S. clean energy manufacturing sector, through targeted R&D in modeling and simulation and partnerships with industry, academia, technology incubators, and other stakeholders.
Mark Johnson, Ph.D. serves as the Director of the Advanced Manufacturing Office (AMO) in the Office of Energy Efficiency and Renewable Energy (EERE). AMO is focused on creating a fertile innovation environment for advanced manufacturing, enabling vigorous domestic development of new energy-efficient manufacturing processes and materials technologies to reduce the energy intensity and life-cycle energy consumption of manufactured products.
Previously, Mark served as a Program Director in the Advanced Research Projects Agency–Energy (ARPA-E), where he had the longest tenure in that post—from ARPA-E's formation in 2010 to mid-2013. At ARPA-E, Mark led initiatives to advance energy storage and critical materials, as well as projects in small business, advanced semiconductors, novel wind architectures, superconductors, and electric machines.
He also served as the Industry and Innovation Program Director for the Future Renewable Electric Energy Delivery and Management (FREEDM) Systems Center. This is a National Science Foundation Gen-III Engineering Research Center targeting the convergence of power electronics, energy storage, renewable resource integration, and information technology for electric power systems.
Mark joins EERE on assignment from North Carolina State University, where he is an Associate Professor of Materials Science and Engineering. His research has focused on crystal growth and device fabrication of compound semiconductor materials with electronic and photonic applications. Mark also taught in the Technology, Entrepreneurship and Commercialization program jointly between the NC State Colleges of Management and Engineering. In addition to his academic career, Mark is an entrepreneur and early stage leader in Quantum Epitaxial Designs (now International Quantum Epitaxy), EPI Systems (now Veeco) and Nitronex (now GaAs Labs).
Mark has a bachelor's degree from MIT and a Ph.D. from NC State, both in Materials Science and Engineering.
Database Decay and How to Avoid It
Dr. Michael Stonebraker, Paradigm4/MIT, USA
Abstract: The traditional wisdom for designing database schemas is to use a design tool (typically based on a UML or ER model) to construct an initial data model for one's data. When one is satisfied with the result, the tool will automatically construct a collection of 3rd normal form relations for the model. Applications are then coded against this relational schema. When business circumstances change (as they do frequently), one should run the tool again to produce a new data model and a new resulting collection of tables. The new schema is populated from the old schema, and the applications are altered to work on the new schema, using relational views whenever possible to ease the migration. In this way, the database remains in 3rd normal form, which represents a "good" schema, as defined by DBMS researchers. "In the wild", schemas often change once a quarter or more often, and the traditional wisdom is to repeat the above exercise for each alteration. In this paper we report that the traditional wisdom appears to be rarely-to-never followed "in the wild" for large, multi-department applications. Instead, DBAs appear to attempt to minimize application maintenance (and hence schema changes) rather than maximize schema quality. This leads to schemas that quickly diverge from their ER or UML models, and actual database semantics tend to drift farther and farther from 3rd normal form. We term this divergence of reality from 3rd normal form principles database decay. Obviously, this is a very undesirable state of affairs. In this paper we explore the reasons for database decay and tactics to avoid it, including defensive schemas, defensive application programs, and a different model for interacting with a database.
Dr. Stonebraker has been a pioneer of database research and technology for more than forty years. He was the main architect of the INGRES relational DBMS and the object-relational DBMS POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. More recently at M.I.T. he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, the SciDB array DBMS, and the Data Tamer data curation system. Presently he serves as Chief Technology Officer of Paradigm4 and Tamr, Inc. Professor Stonebraker was awarded the ACM System Software Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual SIGMOD Innovation Award in 1994, and was elected to the National Academy of Engineering in 1997. He was awarded the IEEE John von Neumann Medal in 2005 and the 2014 Turing Award, and is presently an Adjunct Professor of Computer Science at M.I.T., where he is co-director of the Intel Science and Technology Center focused on big data.
Abstract: This talk will introduce NSF's vision for moving beyond initial, isolated approaches to data science research, services, and infrastructure, towards a cohesive, federated, national-scale approach to harness the data revolution and transform US science, engineering, and education over the next decade and beyond.
Chaitan Baru is Senior Advisor for Data Science in the Computer and Information Science and Engineering (CISE) Directorate at the National Science Foundation. He is there on assignment from the San Diego Supercomputer Center, UC San Diego, where he is Associate Director for Data Initiatives. At NSF, he coordinates the cross-Foundation BIGDATA research program, advises the NSF Big Data Hubs and Spokes program, assists in strategic planning, and participates in interdisciplinary and inter-agency Data Science-related activities. He co-chairs the Big Data Inter-agency Working Group, and is co-author of the US Federal Big Data R&D Strategic Plan (https://www.whitehouse.gov/sites/default/files/microsites/ostp/NSTC/bigdatardstrategicplan-nitrd_final-051916.pdf), released in May 2016 under the auspices of the Networking and Information Technology R&D (NITRD) group of the National Coordination Office, White House Office of Science and Technology Policy.
Abstract: In the last decade, the availability of massive amounts of new data, the development of new machine learning technologies, and the availability of scalable computing infrastructure have given rise to a new class of computing systems. These "Cognitive Systems" learn from data, reason from models, and interact naturally with us, to perform complex tasks better than either humans or machines can do by themselves. These tasks range from answering questions conversationally, to extracting knowledge for discovering insights, to evaluating options for difficult decisions. These cognitive systems are designed to create new partnerships between people and machines to augment and scale human expertise in every industry, from healthcare to financial services to education. This talk will provide an overview of cognitive computing, the technology breakthroughs that are enabling this trend, and the practical applications of this technology that are transforming every industry.
Dr. Guru Banavar is vice president and chief science officer for cognitive computing at IBM. He is responsible for advancing the next generation of cognitive technologies and solutions with IBM's global scientific ecosystem, including academia, government agencies and other partners. Most recently, he led the team responsible for creating new AI technologies and systems in the family of IBM Watson, designed to augment human expertise in all industries. Previously, as chief technology officer for IBM's Smarter Cities initiative, Banavar designed and implemented big data and analytics systems to help make cities, such as Rio de Janeiro and New York, more livable and sustainable. Prior to that, he was director of IBM Research in India, where he and his team received a presidential award for improving technology access with the Spoken Web project. Across his career, Banavar and his team have delivered a range of products and solutions for IBM and its clients. He has also served on task forces such as NY Governor Cuomo's commission to improve resilience to natural disasters. He holds more than 25 patents and has published extensively, with his work featured in media outlets around the world.