
IEEE BigData 2014 Program Schedule                                                                                    

Washington DC
USA
Oct 27-30, 2014

Program

 

 October 27, 2014
 October 28, 2014
 October 29, 2014
 October 30, 2014

 


 

Keynote Lecture
Main conference regular paper: 25 minutes (about 20 minutes for talk and 5 minutes for Q and A)
Main conference short paper: 15 minutes (about 11 minutes for talk and 4 minutes for Q and A)

 


 

 

26-Oct

15:30-20:00

Registration

Venue:

Ballroom Foyer and Ballroom Coatroom

 

 

27-Oct

07:30-18:00

Registration

Venue:

Ballroom Foyer and Ballroom Coatroom

10:00-10:20 and 15:20-15:40


Coffee Break at Meeting Room Foyer


08:00-18:30

Special session I: From Data to Insight: Big Data and Analytics for Smart Manufacturing Systems
Session Chairs: Sudarsan Rachuri, Ronay Ak
Venue: JUDICIARY SUITE

Full-day event: Doctoral Consortium
Session Chair: Jingrui He
Venue: Patuxent

Full-day workshops:

#2 The 2nd Workshop on Scalable Machine Learning: Theory and Applications
Session Chair: Zenglin Xu
Venue: Waterford

#3 1st International Workshop on High Performance Big Graph Data Management, Analysis, and Mining
Session Chair: Fengguang Song
Venue: Lalique

#8 The 2nd International Workshop of BigData in Bioinformatics and Healthcare Informatics
Session Chair: Jun Huan
Venue: Haverford

#13 First Hands-On Workshop on Leveraging High Performance Computing Resources for Managing Large Datasets
Session Chair: Ritu Arora
Venue: Baccarat

#18 Large Scale Data Analytics in Transportation and Railway Infrastructure
Session Chair: Nii Attoh-Okine
Venue: CARTIER/TIFFANY

#20 Big Humanities Data
Session Chair: Mark Hedges
Venue: CABINET SUITE

#22 IEEE NIST Big Data PWG Workshop on Big Data: Challenges, Practices and Technologies
Session Chair: Nancy Grady
Venue: Embassy


08:00-12:00

Morning workshops:

#1 Scholarly Big Data: Challenges & Issues
Session Chair: Ingemar J. Cox
Venue: Diplomat

#15 Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH)
Session Chair: Weijia Xu
Venue: Ambassador

#17 Big Data in Computational Epidemiology
Session Chair: Jiangzhuo Chen
Venue: Severn

#19 2nd Workshop on Scalable Cloud Data Management
Session Chair: Felix Gessert
Venue: Susquehanna

#21 Complexity for Big Data
Session Chair: Guozhu Dong
Venue: Potomac


13:30-18:30

Special session II: Big Data Representation and Processing in Data Science
Session Chair: T.Y. Lin
Venue: Susquehanna

Tutorial: Big Data Stream Mining (Albert Bifet)
Venue: Severn

Afternoon workshops:

#11 CASK-14: 1st International Workshop on Collaborative methodologies to Accelerate Scientific Knowledge discovery in big data
Session Chair: Chen Jin
Venue: Diplomat

#16 IEEE Big Data Workshop on Semantics for Big Data on the Internet of Things (SemBIoT 2014)
Session Chair: Kemafor Ogan
Venue: Ambassador

 

 

 

 

 

 

28-Oct

07:30-18:00

Registration

Venue:

Ballroom Foyer and Ballroom Coatroom

08:30-08:45

Opening and Welcoming Speech

Conference Co-Chairs:

Charu Aggarwal, Nick Cercone, Vasant Honavar

Program Co-Chairs:

Jimmy Lin, Jian Pei

Industry Program Co-Chairs:

Wo Chang, Raghunath Nambiar

BigData Steering Committee Chair:

Xiaohua Tony Hu (Drexel University)

Venue:

CRYSTAL BALLROOM

08:45-09:45

Session Chair: Jian Pei

Keynote Speech 1: Never-Ending Language Learning

Tom Mitchell - E. Fredkin University Professor, Machine Learning Department, Carnegie Mellon University

Venue:

CRYSTAL BALLROOM


09:45-10:00


Coffee Break at Meeting Room Foyer

Poster session setup and display: Meeting Room Foyer

10:00-12:30

S 1: Visual analytics, time, and space
Session Chair: Arash Jalal Zadeh Fard
Venue: CABINET SUITE

S 2: Cloud computing and systems (1)
Session Chair: Amy Apon
Venue: DIPLOMAT/AMBASSADOR

S 3: Graphs and networks
Session Chair: Luke Huan
Venue: JUDICIARY SUITE

Tutorial: Big ML Software for Modern ML Algorithms (Qirong Ho, Eric Xing)
Venue: EMBASSY/PATUXENT


12:30-14:00


Lunch provided by the conference at BALLROOM FOYER (Seating inside the Crystal Ballroom)

Poster session setup and display: Meeting Room Foyer

14:00-16:05

L 1: Graphs and networks (1)
Session Chair: Conrad S. Tucker
Venue: CABINET SUITE

L 2: Scalable systems
Session Chair: Weijia Xu
Venue: DIPLOMAT/AMBASSADOR

L 3: Storage
Session Chair: Steven Y. Ko
Venue: JUDICIARY SUITE

I&G 1: Industry & Government
Session Chair: Wo Chang
Venue: EMBASSY/PATUXENT


16:05-16:20


Coffee Break at Meeting Room Foyer


16:20-18:00

L 4: Image processing
Session Chair: Lin-Ching Chang
Venue: CABINET SUITE

L 5: Data streams and time series
Session Chair: Bo Luo
Venue: DIPLOMAT/AMBASSADOR

L 6: Regression and machine learning
Session Chair: Jiang Zheng
Venue: JUDICIARY SUITE

I&G 2: Industry & Government
Session Chair: Raghunath Nambiar
Venue: EMBASSY/PATUXENT


19:00-20:30

Banquet

Venue:

CRYSTAL BALLROOM

 

 

29-Oct

07:30-18:00

Registration

Venue:

Ballroom Foyer, Ballroom Coatroom

08:30-09:30

Session Chair: Vasant Honavar

Keynote Speech 2: Smart Data - How you and I will exploit Big Data for personalized digital health and many other activities

Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis - Wright State University

Venue:

CRYSTAL BALLROOM


09:30-10:00


Coffee Break at Meeting Room Foyer

Poster session setup and display: Meeting Room Foyer

10:00-12:30

Panel with Program Directors: Big Data Challenges and Opportunities
Panelists: Dr. Chaitanya Baru (NSF), Dr. Yuan Liu (NIH), Dr. David Kuehn (DoT), Dr. Tsengdar Lee (NASA), Dr. Sudarsan Rachuri (NIST), Mr. Matti Vakkuri (DIGILE)
Session Chair: Xiaohua Tony Hu
Venue: CRYSTAL BALLROOM

Tutorial: Large-scale Heterogeneous Learning in Big Data Analytics (Jun Huan)
Venue: OLD GEORGETOWN


12:30-14:00


Lunch provided by the conference at BALLROOM FOYER (Seating inside the Crystal Ballroom)

 

Poster session setup and display: Meeting Room Foyer

14:00-16:05

L 7: Distributed systems
Session Chair: Yicheng Tu
Venue: OLD GEORGETOWN

L 8: Visualization/bioinformatics
Session Chair: Saumyadipta Pyne
Venue: CABINET SUITE

L 9: Cloud computing
Session Chair: Ada Fu
Venue: JUDICIARY SUITE

I&G 3: Industry & Government
Session Chair: Raghunath Nambiar
Venue: DIPLOMAT/AMBASSADOR


16:05-16:20


Coffee Break at Meeting Room Foyer

16:20-18:00

L 10: Privacy and security
Session Chair: Christoph Schommer
Venue: OLD GEORGETOWN

L 11: Graphs and networks (2)
Session Chair: Hao Howie Huang
Venue: CABINET SUITE

I&G 4: Industry & Government
Session Chair: Wo Chang
Venue: DIPLOMAT/AMBASSADOR

Tutorial: Big Data Benchmarking (Chaitan Baru, Tilmann Rabl)
Venue: JUDICIARY SUITE

 

 

 

 

 

30-Oct

07:30-18:00

Registration

Venue:

Ballroom Foyer, Ballroom Coatroom

08:30-09:30

Session Chair: Jimmy Lin

Keynote Speech 3: Addressing Human Bottlenecks in Big Data

Joseph M. Hellerstein, Chancellor's Professor of Computer Science, University of California, Berkeley and Trifacta

Venue:

CRYSTAL BALLROOM

 

09:30-10:00

 

Coffee Break at Meeting Room Foyer

Poster session setup and display: Meeting Room Foyer

10:00-12:30

S 4: Cloud computing and systems (2)
Session Chair: Feng Luo
Venue: JUDICIARY SUITE

S 5: Applications
Session Chair: Mathias Johanson
Venue: OLD GEORGETOWN

S 6: Data mining and learning
Session Chair: Xiaohua Tony Hu
Venue: CABINET SUITE


08:00-12:00

Morning workshops:

#6 The Second Workshop on Distributed Storage Systems and Coding for Big Data
Session Chair: Bing Zhu
Venue: Diplomat

#7 First IEEE International Workshop on Big Data Security and Privacy (BDSP 2014)
Session Chair: Tyrone W A Grandison
Venue: Ambassador


13:30-18:30

Afternoon workshop:

#9 Solar Astronomy Big Data (SABiD) – 1st Workshop on Management, Search and Mining of Massive Repositories of Solar Astronomy Data
Session Chair: Rafal Angryk
Venue: Diplomat

 

 

 

 

 

 

 

Keynote Speeches: 3

 

Keynote 1:


Title: Never-Ending Language Learning

Speaker:

Tom Mitchell - E. Fredkin University Professor, Machine Learning Department, Carnegie Mellon University


Abstract:

We will never really understand learning until we can build machines that learn many different things, over years, and become better learners over time. We describe our research to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web, into its growing knowledge base of beliefs. Each day NELL also learns to read better than the day before. NELL has been running 24 hours/day for over four years now. The result so far is a collection of 70 million interconnected beliefs (e.g., servedWith(coffee, applePie)) that NELL is considering at different levels of confidence, along with millions of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web. NELL is also learning to reason over its extracted knowledge, and to automatically extend its ontology. Track NELL's progress at http://rtw.ml.cmu.edu, or follow it on Twitter at @CMUNELL.

 

Short Bio:

Tom M. Mitchell founded and chairs the Machine Learning Department at Carnegie Mellon University, where he is the E. Fredkin University Professor. His research uses machine learning to develop computers that are learning to read the web, and uses brain imaging to study how the human brain understands what it reads. Mitchell is a member of the U.S. National Academy of Engineering, a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI). He believes the field of machine learning will be the fastest growing branch of computer science during the 21st century.

 

 

 

 

Keynote 2:


Title: Smart Data - How you and I will exploit Big Data for personalized digital health and many other activities

Speaker:

Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis - Wright State University


Abstract:

Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (e.g., information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!

 

In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, “How is her current health, and what is the risk of having an asthma attack in her current situation (now and today), especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in the cognitive models.

 

For harnessing Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which uses dynamically created models of new objects, concepts, and relationships to better understand new cues in the data that capture rapidly evolving events and situations.

 

Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.


Short Bio:

Amit P. Sheth (http://knoesis.org/amit) is an educator, researcher, and entrepreneur. He is the LexisNexis Eminent Scholar and founder/executive director of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University. Kno.e.sis conducts research in social/sensor/semantic data and Web 3.0 with real-world applications and multidisciplinary solutions for translational research, healthcare and life sciences, cognitive science, material sciences, and others. Kno.e.sis' activities have resulted in Wright State University being recognized as one of the top organizations in the world for research impact on the World Wide Web. Prof. Sheth is one of the top authors in Computer Science, World Wide Web, and databases (cf. Microsoft Academic Search; Google H-index). His research has led to several commercial products, many real-world applications, and two earlier companies with two more in early stages of development. One of these was Taalee/Voquette/Semagix, which was likely the first company (founded in 1999) that developed Semantic Web enabled search and analysis, and semantic application development platforms.

 

 

 

 

Keynote 3:


Title: Addressing Human Bottlenecks in Big Data

Speaker:

Joseph M. Hellerstein, Chancellor's Professor of Computer Science, University of California, Berkeley and Trifacta 


Abstract:

We live in an era when compute is cheap, data is plentiful, and system software is being given away for free.  Today, the critical bottlenecks in data-driven organizations are human bottlenecks, measured in the costs of software developers, IT professionals, and data analysts.  How can computer science remain relevant in this context?  The Big Data ecosystem presents two archetypal settings for answering this question: NoSQL distributed databases, and analytics on Hadoop.

 

In the case of NoSQL, developers are being asked to build parallel programs for global-scale systems that cannot even guarantee the consistency of a single register of memory.  How can this possibly be made to work?  I’ll talk about what we have seen in the wild in user deployments, and what we’ve learned from developers and their design patterns.  Then I’ll present theoretical results—the CALM Theorem—that shed light on what’s possible here, and what requires more expensive tools for coordination on top of the typical NoSQL offerings.  Finally, I will highlight some new approaches to writing and testing software—exemplified by the Bloom language—that can help developers of distributed software avoid expensive coordination when possible, and have the coordination logic synthesized for them automatically when necessary.

 

In the Hadoop context, the key bottlenecks lie with data analysts and data engineers, who are routinely asked to work with data that cannot possibly be loaded into tools for statistical analytics or visualization.  Instead, they have to engage in time-consuming data “wrangling”—to try and figure out what’s in their data, whip it into a rectangular shape for analysis, and figure out how to clean and integrate it for use.  I’ll discuss what we heard talking with data analysts in both academic interviews and commercial engagements.  Then I’ll talk about how techniques from human-computer interaction, machine learning, and database systems can be brought together to address this human bottleneck, as exemplified by our work on various systems including the Data Wrangler project and Trifacta's platform for data transformation.


Short Bio:

Joseph M. Hellerstein is a Chancellor's Professor of Computer Science at the University of California, Berkeley, whose research focuses on data-centric systems and the way they drive computing. A Fellow of the ACM, his work has been recognized via awards including an Alfred P. Sloan Research Fellowship, MIT Technology Review's TR10 and TR100 lists, Fortune Magazine's "Smartest in Tech" list, and three ACM-SIGMOD "Test of Time" awards. In 2012, Joe co-founded Trifacta, Inc. (http://www.trifacta.com/), where he currently serves as Chief Strategy Officer.

 

 

 

 

 

Conference Paper Presentations

 

L 1: Graphs and networks (1)

Regular

BigD210 "4S: Learning to Estimate Pairwise Distances in Large Graphs"
Maria Christoforaki and Torsten Suel

Regular

BigD304 "Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization"
Ryan Compton, David Jurgens, and David Allen

Regular

BigD357 "GRAPHiQL: A Graph Intuitive Query Language for Relational Databases"
Alekh Jindal and Samuel Madden

Regular

BigD395 "PULP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks"
George Slota, Siva Rajamanickam, and Kamesh Madduri

Regular

BigD436 "Synergistic Partitioning in Multiple Large Scale Social Networks"
Songchang Jin, Jiawei Zhang, Philip S. Yu, Shuqiang Yang, and Aiping Li

 

L 2: Scalable systems

Regular

BigD216 "FusionFS: Toward Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems"
Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Rob Ross, and Ioan Raicu

Regular

BigD253 "Sparse computation for large-scale data mining"
Dorit S. Hochbaum and Philipp Baumann

Regular

BigD306 "BASIC: an Alternative to BASE for Large-Scale Data Management System"
Lengdong Wu, Li-Yan Yuan, and Jia-Huai You

Regular

BigD336 "Facilitating Twitter Data Analytics: Platform, Language, and Functionality"
Ke Tao, Claudia Hauff, Geert-Jan Houben, Fabian Abel, and Guido Wachsmuth

Regular

BigD444 "Large-scale Distributed Sorting for GPU-based Heterogeneous Supercomputers"
Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, and Satoshi Matsuoka

 

L 3: Storage 

Regular

BigD271 "BurstMem: A High-Performance Burst Buffer System for Scientific Applications"
Teng Wang, Sarp Oral, Yandong Wang, Brad Settlemyer, Scott Atchley, and Weikuan Yu

Regular

BigD313 "Meeting Predictable Buffer Limits in the Parallel Execution of Event Processing Operators"
Ruben Mayer, Boris Koldehofe, and Kurt Rothermel

Regular

BigD398 "Effective Caching Techniques for Accelerating Pattern Matching Queries"
Arash Fard, Satya Manda, Lakshmish Ramaswamy, and John Miller

Regular

BigD407 "Provenance-Based Object Storage Prediction Scheme for Scientific Big Data Applications"
Dong Dai, Yong Chen, Dries Kimpe, and Rob Ross

Regular

BigD215 "Virtual Chunks: On Supporting Random Accesses to Scientific Data in Compressible Storage Systems"
Dongfang Zhao, Jian Yin, Kan Qiao, and Ioan Raicu

 

L 4: Image processing

Regular

BigD316 "Metadata Extraction and Correction for Large-Scale Traffic Surveillance Videos"
Xiaomeng Zhao, Huadong Ma, Haitao Zhang, Yi Tang, and Guangping Fu

Regular

BigD360 "Structure Recognition from High Resolution Images of Ceramic Composites"
Daniela Ushizima, Talita Perciano, Harinarayan Krishnan, Burlen Loring, Hrishikesh Bale, Dilworth Parkinson, and James Sethian

Regular

BigD379 "Evaluating Density-based Motion for Big Data Visual Analytics"
Ronak Etemadpour, Paul Murray, and Angus Forbes

Regular

BigD421 "Locating Visual Storm Signatures from Satellite Images"
Yu Zhang, Stephen Wistar, Jose A. Piedra-Fernández, Jia Li, Michael Steinberg, and James Z. Wang

 

L 5: Data streams and time series

Regular

BigD234 "Distributed Adaptive Model Rules for Mining Big Data Streams"
Anh Thu Vu, Gianmarco De Francisci Morales, Joao Gama, and Albert Bifet

Regular

BigD382 "Interpretable Streaming Regression Models with Local Performance Guarantees"
Ulf Johansson, Cecilia Sönströd, and Henrik Linusson

Regular

BigD451 "Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing"
Hao Li, Di Yu, Anand Kumar, and Yicheng Tu

Regular

BigD445 "TRISTAN: Real-Time Analytics on Massive Time Series Using Sparse Dictionary Compression"
Alice Marascu, Pascal Pompey, Eric Bouillet, Michael Wurst, Olivier Verscheure, Martin Grund, and Philippe Cudre-Mauroux

 

L 6: Regression and machine learning 

Regular

BigD402 "Predicting Glaucoma Progression using Multi-task Learning with Heterogeneous Features"
Shigeru Maya, Kai Morino, and Kenji Yamanishi

Regular

BigD283 "Examination of Data, Rule Generation and Detection of Phishing URLs using Online Logistic Regression"
Mohammed Nazim Feroz and Susan Mengel

Regular

BigD454 "Large-scale Logistic Regression and Linear Support Vector Machines Using Spark"
Chieh-Yen Lin, Cheng-Hao Tsai, Ching-Pei Lee, and Chih-Jen Lin

Regular

BigD465 "BayesWipe: A Multimodal System for Data Cleaning and Consistent Query Answering on Structured Data"
Sushovan De, Yuheng Hu, Yi Chen, and Subbarao Kambhampati

 

L 7: Distributed systems 

Regular

BigD318 "Partial Rollback-based Scheduling on In-memory Transactional Data Grids"
Junwhan Kim

Regular

BigD337 "Main Memory Evaluation of Recursive Queries on Multicore Machines"
Mohan Yang and Carlo Zaniolo

Regular

BigD391 "Distributed Algorithms for k-truss Decomposition"
Ming-Syan Chen, Pei-Ling Chen, and Chung-Kuang Chou

Regular

BigD434 "Parallel Breadth First Search on GPU Clusters"
Zhisong Fu, Harish Dasari, Martin Berzins, and Bryan Thompson

Regular

BigD471 "Optimizing Load Balancing and Data-Locality with Data-aware Scheduling"
Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, and Ioan Raicu

 

L 8: Visualization/bioinformatics

Regular

BigD258 "Topic Similarity Networks: Visual Analytics for Large Document Sets"
Arun Maiya

Regular

BigD303 "Web-based Visual Analytics for Extreme Scale Climate Science"
Chad Steed, Katherine Evans, John Harney, Brian Jewell, Galen Shipman, Brian Smith, Peter Thornton, and Dean Williams

Regular

BigD338 "Visual Fusion of Mega-City Big Data: An Application to Traffic and Tweets Data Analysis of Metro Passengers"
Masahiko Itoh, Daisaku Yokoyama, Masashi Toyoda, Yoshimitsu Tomita, Satoshi Kawamura, and Masaru Kitsuregawa

Regular

BigD277 "Random Projection Based Clustering for Population Genomics"
Sotiris Tasoulis, Lu Cheng, Niko Välimäki, Nicholas Croucher, Simon Harris, William Hanage, Teemu Roos, and Jukka Corander

Regular

BigD460 "Identification of SNP Interactions Using Data-Parallel Primitives on GPUs"
Can Altinigneli, Bettina Konte, Dan Rujescu, Christian Boehm, and Claudia Plant

 

L 9: Cloud computing 

Regular

BigD380 "Combining Hadoop and GPU to Preprocess Large Affymetrix Microarray Data"
Sufeng Niu, Guangyu Yang, Nilim Sarma, Melissa Smith, Pradip Srimani, and Feng Luo

Regular

BigD423 "Detecting and Identifying System Changes in the Cloud via Discovery by Example"
Hao Chen, Sastry Duri, Vasanth Bala, Nilton Bila, Canturk Isci, and Ayse Coskun

Regular

BigD426 "PigOut: Making Multiple Hadoop Clusters Work Together"
Kyungho Jeon, Sharath Chandrashekhara, Feng Shen, Shikhar Mehra, Oliver Kennedy, and Steven Ko

Regular

BigD432 "Accurate and Efficient Selection of the Best Consumption Prediction Method in Smart Grids"
Marc Frincu, Charalampos Chelmis, Muhammad Noor, and Viktor Prasanna

Regular

BigD244 "E-Sketch: Gathering Large-scale Energy Consumption Data Based on Consumption Patterns"
Zhichuan Huang, Hongyao Luo, David Skoda, Ting Zhu, and Yu Gu

 

L 10: Privacy and security 

Regular

BigD260 "Hierarchical Management of Large-Scale Malware Data"
Lee Kellogg, Brian Ruttenberg, Alison O'Connor, Michael Howard, and Avi Pfeffer

Regular

BigD294 "MR-TRIAGE: Scalable Multi-Criteria Clustering for Big Data Security Intelligence Applications"
Yun Shen and Olivier Thonnard

Regular

BigD383 "Using Data Content to Assist Access Control for Large-Scale Content-Centric Databases"
Wenrong Zeng, Yuhao Yang, and Bo Luo

 

 

 

L 11: Graphs and networks (2)

Regular

BigD301 "Efficient Breadth-First Search on a Heterogeneous Processor"
Mayank Daga, Mark Nutter, and Mitesh Meswani

Regular

BigD419 "Clique Guided Community Detection"
Diana Palsetia, Mostofa Patwary, William Hendrix, Ankit Agrawal, and Alok Choudhary

Regular

BigD441 "Increasing the Veracity of Event Detection on Social Media Networks Through User Trust Modeling"
Todd Bodnar, Conrad Tucker, Kenneth Hopkinson, and Sven Bilén

Regular

BigD455 "NVM-based Hybrid BFS with Memory Efficient Data Structure"
Keita Iwabuchi, Hitoshi Sato, Yuichiro Yasui, Katsuki Fujisawa, and Satoshi Matsuoka

 

 

 

 

I&G: Industry & Government (1)

Regular

N211

Spatial Computations over Terabyte-Sized Images on Hadoop Platforms

Peter Bajcsy, Phuong Nguyen, Antoine Vandecreme, and Mary Brady

Regular

N223

Astro: A Predictive Model for Anomaly Detection and Feedback-based Scheduling on Hadoop

Chaitali Gupta, Mayank Bansal, Tzu-Cheng Chuang, Ranjan Sinha, and Sami Ben-romdhane

Regular

N222

ALOJA: a Systematic Study of Hadoop Deployment Variables to Enable Automated Characterization of Cost-Effectiveness

Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Nikola Vujic, Daron Green, José Blakeley, Sergio Mendoza, Yolanda Becerra, Jordi Torres, Eduard Ayguadé, and Jesús Labarta

Regular

N217

Lightweight Approximate Top-k for Distributed Settings

Vinay Deolalikar and Kave Eshghi

Regular

N230

Recommending Similar Items in Large-scale Online Marketplaces

Jayasimha Reddy Katukuri, Tolga Konik, Rajyashree Mukherjee, and Santanu Kolay

 

I&G: Industry & Government (2)

Regular

N216

Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon

Khalifeh Aljadda, Mohammed Korayem, Trey Grainger, and Chris Russell

Regular

N224

Heterogeneous Stream Processing for Disaster Detection and Alarming

Francois Schnitzler, Thomas Liebig, Shie Mannor, Gustavo Souto, Sebastian Bothe, and Hendrik Stange

Regular

N201

Recall Estimation for Rare Topic Retrieval from Large Corpuses

Praveen Bommannavar, Alek Kolcz, and Anand Rajaraman

Regular

N236

Identifying top Chinese network buzzwords from social media big data set based on time-distribution features

Yongli Tang, Tingting He, Bo Li, and Xiaohua Hu

Regular

N218

Query Revision During Cluster Based Search on Large Unstructured Corpora

Vinay Deolalikar

 

I&G: Industry & Government (3)

Regular

N213

A Scalable and Efficient Community Detection Algorithm

Dhaval C. Lunagariya, Somayajulu D.V.L.N., and Radha Krishna P.

Regular

N202

Future Directions of Humans in Big Data Research

Celeste Lyn Paul, Chris Argenta, William Elm, and Alex Endert

Regular

N228

An Initial Study of Predictive Machine Learning Analytics on Large Volumes of Historical Data for Power System Applications

Jiang Zheng and Aldo Dagnino

Regular

N207

In Unity There is Strength: Showcasing a Unified Big Data Platform with MapReduce Over both Object and File Storage

Renu Tewari, Dean Hildebrand, and Rui Zhang

Regular

N203

Bridging High Velocity and High Volume Industrial Big Data Through Distributed In-Memory Storage & Analytics

Jenny Weisenberg Williams, Kareem Aggour, John Interrante, Justin McHugh, and Eric Pool

 

I&G: Industry & Government (4)

Regular

N232

Big Data Predictive Analytics for Proactive Semiconductor Equipment Maintenance

Sathyan Munirathinam

Regular

N219

Automating Data Integration with HiperFuse

Eric Huang, Andres Quiroz, and Luca Ceriani

Regular

N215

Explore Efficient Data Organization for Large Scale Graph Analytics and Storage

Yinglong Xia, Ilie Tanasa, Lifeng Nai, Wei Tan, Yanbin Liu, Jason Crawford, and Ching-Yung Lin

Regular

N209

Increasing the Accessibility to Big Data Systems via a Common Services API

Rohan Malcolm, Cherrelle Morrison, Tyrone Grandison, Sean Thorpe, Kimron Christie, Akim Wallace, Damian Green, Julian Jarrett, and Arnett Campbell

 

 

S 1: Visual analytics, time, and space

Short

BigD204 "The Role of Visual Analysis in the Regulation of Electronic Order Book Markets"
Mark Paddrik, Richard Haynes, Andrew Todd, William Scherer, and Peter Beling

Short

BigD217 "Preferences over Time"
Noriaki Kawamae

Short

BigD227 "Online Temporal-Spatial Analysis for Detection of Critical Events in Cyber-Physical Systems"
Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou, and Zhang Fu

Short

BigD252 "In-Situ Visualization and Computational Steering for Large-Scale Simulation of Turbulent Flows in Complex Geometries"
Hong Yi, Michel Rasquin, Jun Fang, and Igor Bolotnov

Short

BigD288 "Large-Scale Network Traffic Monitoring with DBStream, a System for Rolling Big Data Analysis"
Arian Bär, Alessandro Finamore, Pedro Casas, Lukasz Golab, and Marco Mellia

Short

BigD387 "Immersive and collaborative data visualization using virtual reality platforms"
Ciro Donalek, S.G. Djorgovski, Scott Davidoff, Alex Cioc, Anwell Wang, Giuseppe Longo, Jeffrey S. Norris, Jerry Zhang, Elizabeth Lawler, and Stacy Yeh

Short

BigD411 "On Scaling Time Dependent Shortest Path Computations for Dynamic Traffic Assignment"
Amit Gupta, Weijia Xu, Kenneth Perrine, Dennis Bell, and Natalia Ruiz-Juri

Short

BigD413 "High Volume Geospatial Mapping for Internet-of-Vehicle Solutions with In-Memory Map-Reduce Processing"
Tao Zhong, Kshitij Doshi, Gang Deng, Xiaoming Yang, and Hegao Zhang

Short

BigD431 "The Adaptive Projection Forest: Using Adjustable Exclusion and Parallelism in Metric Space Indexes"
Lee Thompson, Weijia Xu, and Daniel Miranker

Short

BigD440 "Low Complexity Sensing for Big Spatio-Temporal Data"
Dongeun Lee and Jaesik Choi

 

S 2: Cloud computing and systems (1)

Short

BigD242 "Scheduling MapReduce Tasks on Virtual MapReduce Clusters from a Tenant’s Perspective"
Jia-Chun Lin, Ming-Chang Lee, and Ramin Yahyapour

Short

BigD311 "Minimizing Data Movement through Query Transformation"
Patrick Leyshock, David Maier, and Kristin Tufte

Short

BigD364 "Automated Workload-aware Elasticity of NoSQL Clusters in the Cloud"
Evie Kassela, Christina Boumpouka, Ioannis Konstantinou, and Nectarios Koziris

Short

BigD384 "Multilevel Partitioning of Large Unstructured Grids"
Oyindamola Akande and Philip Rhodes

Short

BigD392 "On the Performance of MapReduce: A Stochastic Approach"
Sarker Ahmed and Dmitri Loguinov

Short

BigD428 "VENU: Orchestrating SSDs in Hadoop Storage"
Krish K.R., M. Safdar Iqbal, and Ali Butt

Short

BigD438 "In-Memory I/O and Replication for HDFS with Memcached: Early Experiences"
Nusrat Islam, Xiaoyi Lu, Md. Rahman, Raghunath Rajachandrasekar, and Dhabaleswar Panda

Short

BigD448 "Scaling Up Prioritized Grammar Enumeration for Scientific Discovery in the Cloud"
Tony Worm and Kenneth Chiu

Short

BigD469 "In-advance Data Analytics for Reducing Time to Discovery"
Jialin Liu, Yin Lu, and Yong Chen

Short

BigD475 "Enabling Composite Applications through an Asynchronous Shared Memory Interface"
Douglas Otstott, Noah Evans, Latchesar Ionkov, Ming Zhao, and Michael Lang

 

S 3: Graphs and networks

Short

BigD225 "Random Walks on Adjacency Graphs for Mining Lexical Relations from Big Text Data"
Shan Jiang and Chengxiang Zhai

Short

BigD284 "MMap: Fast Billion-Scale Graph Computation on a PC via Memory Mapping"
Zhiyuan Lin, Minsuk Kahng, Kaeser Md. Sabrin, Duen Horng Chau, Ho Lee, and U Kang

Short

BigD287 "Building k-nn graphs from large text data"
Thibault Debatty, Pietro Michiardi, Olivier Thonnard, and Wim Mees

Short

BigD331 "Empowering users of social networks to assess their privacy risks"
Vladimir Estivill-Castro, Md Zahidul Islam, and Peter Hough

Short

BigD333 "Matching Approximate Patterns in Richly-Attributed Graphs"
Robert Pienta, Acar Tamersoy, Hanghang Tong, and Duen Horng Chau

Short

BigD346 "A Unified Approach to Network Anomaly Detection"
Tara Babaie, Sanjay Chawla, and Sebastien Ardon

Short

BigD365 "Big Data: Myths, Misconceptions and Opportunities"
Mark Lycett and Asmat Monaghan

 

S 4: Cloud computing and systems (2)

Short

BigD230 "A Cross-job Framework for MapReduce Scheduling"
Xuejie Xiao, Jian Tang, Zhenhua Chen, Jielong Xu, and Chonggang Wang

Short

BigD247 "Rainbow: A Distributed and Hierarchical RDF Triple Store with Dynamic Scalability"
Rong Gu, Yihua Huang, and Wei Hu

Short

BigD259 "MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and Its Application to SNOMED CT"
Guo-Qiang Zhang, Wei Zhu, Mengmeng Sun, Shiqiang Tao, Olivier Bodenreider, and Licong Cui

Short

BigD264 "FlexDAS: A Flexible Direct Attached Storage for I/O Intensive Applications"
Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi Miyoshi, and Jun Tanaka

Short

BigD296 "Perldoop: Efficient Execution of Perl Scripts on Hadoop Clusters"
Jose M. Abuin, Juan C. Pichel, Tomas F. Pena, Pablo Gamallo, and Marcos Garcia

Short

BigD315 "Evaluating the Performance and Scalability of the Ceph Distributed Storage System"
Diana Gudu, Marcus Hardt, and Achim Streit

Short

BigD347 "Incremental Window Aggregates over Array Database"
Jiang Li, Hideyuki Kawashima, and Osamu Tatebe

Short

BigD350 "Analyzing the Language of Food on Social Media"
Daniel Fried, Mihai Surdeanu, Stephen Kobourov, Melanie Hingle, and Dane Bell

Short

BigD362 "BigCache for Big-data Systems"
Michel Roger, Yiqi Xu, and Ming Zhao

Short

BigD476 "k-balanced sorting and skew join in MPI and MapReduce"
Silu Huang and Ada Fu

 

S 5: Applications

Short

BigD232 "Big Automotive Data - Leveraging large volumes of data for knowledge-driven product development"
Mathias Johanson, Stanislav Belenki, Jonas Jalminger, Magnus Fant, and Mats Gjertz

Short

BigD233 "On the Impact of Socio-economic Factors on Power Load Forecasting"
Yufei Han, Xiaolan Sha, Etta Grover-Silva, and Pietro Michiardi

Short

BigD239 "Toward Personalized and Scalable Voice-Enabled Services Powered by Big Data"
Jong Hoon Ahnn

Short

BigD270 "A Two-Sided Market Mechanism for Trading Big Data Computing Commodities"
Lena Mashayekhy, Mahyar Movahed Nejad, and Daniel Grosu

Short

BigD310 "Department of Energy Strategic Roadmap for Earth System Science Data Integration"
Dean Williams, Giri Palanisamy, Galen Shipman, Thomas Boden, and Jimmy Voyles

Short

BigD312 "Synthetic Data Generation for the Internet of Things"
Jason Anderson, Ken Kennedy, Linh Ngo, Andre Luckow, and Amy Apon

Short

BigD324 "Learning to Predict Subject-Line Opens for Large-Scale Email Marketing"
Raju Balakrishnan and Rajesh Parech

Short

BigD366 "Using Geometric Structures to Improve the Error Correction Algorithm of High-Throughput Sequencing Data on MapReduce Framework"
Wei-Chun Chung, Yu-Jung Chang, D. T. Lee, and Jan-Ming Ho

Short

BigD376 "Knowledge-based Clustering of Ship Trajectories Using Density-based Approach"
Bo Liu, Erico N. de Souza, Stan Matwin, and Marcin Sydow

Short

BigD409 "Empowering Personalized Medicine with Big Data and Semantic Web Technology: Promises, Challenges, Pitfalls, and Use Cases"
Maryam Panahiazar, Vahid Taslimi, Ashutosh Jadhav, Amit Sheth, and Jyotishman Pathak

 

S 6: Data mining and learning

Short

BigD238 "Entity Resolution Using Inferred Relationships and Behavior"
Jonathan Mugan, Ranga Chari, Laura Hitt, Eric McDermid, Marsha Sowell, Yuan Qu, and Thayne Coffman

Short

BigD291 "Dynamic Pre-training of Deep Recurrent Neural Networks for Predicting Environmental Monitoring Data"
Bun Theang Ong, Komei Sugiura, and Koji Zettsu

Short

BigD293 "Scaling up M-estimation via sampling designs: the Horvitz-Thompson stochastic gradient descent"
Stéphan Clémençon, Patrice Bertail, and Emilie Chautru

Short

BigD327 "Metadata Capital: Simulating the Predictive Value of Self-Generated Health Information (SGHI)"
Jane Greenberg, Adrian Ogletree, Angela Murillo, Thomas Caruso, and Herbie Huang

Short

BigD339 "Bootstrapping K-means for Big data analysis"
Jungkyu Han and Min Luo

Short

BigD343 "Representative Subsets For Big Data Learning using k-NN graphs"
Raghvendra Mall, Vilen Jumutc, Rocco Langone, and Johan Suykens

Short

BigD356 "Towards Building and Evaluating a Personalized Location-Based Recommender System"
Rubing Duan

Short

BigD361 "Distributed Adaptive Importance Sampling on Graphical Models using MapReduce"
Ahsanul Haque, Swarup Chandra, Latifur Khan, and Charu Aggarwal

Short

BigD401 "PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems"
Khalifeh Aljadda, Mohammed Korayem, Camilo Ortiz, Trey Grainger, John Miller, and William York

Short

BigD410 "Distributed Class Dependent Feature Analysis - A Big Data Approach"
Khoa Luu, Chenchen Zhu, and Marios Savvides

 

 

 

 

 

 

 

 

 

 

 

Workshops:

 

Workshop details can be found here: Workshops

 

 

Workshop 1:  Scholarly Big Data: Challenges & Issues

 

07:30-08:00

Coffee and cake

08:00-08:10

Opening

08:10-08:50

Guest talk from Microsoft Academic Search

08:50-09:15

Why Name Ambiguity Resolution Matters for Scholarly Big Data Research by Jinseok Kim, Jana Diesner, Amirhossein Aleyasen (University of Illinois at Urbana-Champaign), Heejun Kim (University of North Carolina at Chapel Hill) and Hwan-Min Kim (Korea Institute of Science and Technology Information).

09:15-09:40

The OceanLink Project by Tom Narock (Marymount University), Adila Krisnadhi, Pascal Hitzler, Michelle Cheatham (Wright State University), Robert Arko, Suzanne Carbotte (Columbia University) and Timothy Finin (University of Maryland, Baltimore County).

09:40-10:05

Evolution of Scientific Collaboration Networks by Gaurav Madaan (Thapar University) and Shivakumar Jolad (IIT Gandhinagar).

10:05-10:25

Coffee and cake

10:25-10:50

Managing the Academic Data Lifecycle:

10:50-11:30

Guest talk from UK CORE project

11:30-12:10

Guest talk from Semantic Scholar (AllenAI Labs)

12:10-12:50

Guest talk by Dewey Murdick (IARPA)

12:50-13:00

Closing

13:00-14:00

Lunch and networking

 

 

 

 

Workshop 2:  The 2nd Workshop on Scalable Machine Learning: Theories and Applications

 

27-Oct-14

Venue: Waterford

 

08:30-08:35

Opening Remarks

08:35-09:50

Session I

08:35-09:20

Eric Xing, Carnegie Mellon University

09:20-09:35

S2207: Towards Scalable Graph Computation on Mobile Devices

Yiqi Chen, Zhiyuan Lin, Robert Pienta, Minsuk Kahng, and Duen Horng Chau

09:35-09:50

S2210: An Improved Memory Management Scheme for Large Scale Graph Computing Engine GraphChi

Yifang Jiang, Kai Chen, Yi Zhou, Diao Zhang, Qu Zhou, and Jianhua He

09:50-10:00

BigD470: FS^3: A sampling based method for top-k frequent subgraph mining

Tanay Kumar Saha and Mohammad Hasan

BigD446: Fast Algorithm for Computing Weighted Projection Quantiles and Data Depth for High-Dimensional Large Data

Ujjal Mukherjee and Ansu Chatterjee

10:00-10:30

Break and Poster Session

10:30-12:00

Session II

10:30-11:15

Mikhail Bilenko, Microsoft Research

11:15-11:30

BigD278: Boosting Stochastic Newton Descent for Bigdata Mining and Classification

Roberto D'Ambrosio, Wafa Belhajali, and Michel Barlaud

11:30-11:45

BigD309: Parameterized Multilayer Perceptron for Fast Learning in Big Data

Chandra B and Rajesh Kumar Sharma

11:45-12:00

S2213: A Multi-View Two-level Classification Method for Generalized Multi-instance Problems

Xiaoguang Wang, Xuan Liu, Stan Matwin, Nathalie Japkowicz, and Hongyu Guo

S2215: Applying Instance-weighted Support Vector Machines to Class Imbalanced Datasets

Xiaoguang Wang, Xuan Liu, and Stan Matwin

S2205: Computing Fuzzy Rough Approximations in Large Scale Information Systems

Hasan Asfoor, Rajagopalan Srinivasan, Gayathri Vasudevan, Nele Verbiest, Chris Cornelis, Matthew Tolentino, Ankur Teredesai, and Martine De Cock

12:00-14:00

Lunch on your own and Poster Session (13:30-14:00)

14:00-15:15

Session III

14:00-14:45

Duen Horng (Polo) Chau, Georgia Tech

14:45-15:00

BigD208: Calculating Feature Importance in Data Streams with Concept Drift using Online Random Forest

Andrew Cassidy and Frank Deviney

15:00-15:15

S2206: Feature Selection for Text Clustering in Limited Memory Using Monte Carlo Wrapper

Vinay Deolalikar

S2208: WS^2F: A Weakly Supervised Framework for Data Stream Filtering

Cailing Dong and Arvind Agarwal

S2212: A Clustering Based Scalable Hybrid Approach for Web Page Recommendation

Mohammad Sharif and Vijay Raghavan

BigD317: Multiresolution analysis of incomplete rankings with applications to prediction

Eric Sibony, Stéphan Clémençon, and Jérémie Jakubowicz

15:15-15:50

Break and Poster Session

15:50-17:20

Session IV

15:50-16:35

Ping Li, Rutgers University

16:35-16:50

BigD467: Pairwise Topic Model via Relation Extraction

Xiaoli Song, Yue Shang, Yuan Ling, Mengwen Liu, and Xiaohua Hu

16:50-17:00

S2205: Computing Fuzzy Rough Approximations in Large Scale Information Systems

Hasan Asfoor, Rajagopalan Srinivasan, Gayathri Vasudevan, Nele Verbiest, Chris Cornelis, Matthew Tolentino, Ankur Teredesai, and Martine De Cock

S2201: Scalable Big Data Computing for the Personalization of Machine Learned Models and its Application to Automatic Speech Recognition Service

Jong Hoon Ahnn

17:00-18:00

Free Discussion and Poster Session

 

 

 

 

Workshop 3:  1st International Workshop on High Performance Big Graph Data Management, Analysis, and Mining  

 

08:40 - 08:55

Opening remarks

09:00 - 10:00

Keynote talk

10:00 - 10:20

Coffee break

10:20 - 12:00

Session I

 

10:20 - 10:45 Christian L. Staudt, Henning Meyerhenke, and Yassine Marrakchi
S3201: Detecting Communities Around Seed Nodes in Complex Networks

10:45 - 11:10 William Eberle and Lawrence Holder
S3205: A Partitioning Approach to Scaling Anomaly Detection in Graph Streams

11:10 - 11:35 Ghizlane Echbarthi and Hamamache Kheddouci
S3204: Fractional Greedy and Partial Restreaming Partitioning: New Methods for Massive Graph Partitioning

11:35 - 12:00 Angen Zheng, Alexandros Labrinidis, and Panos Chrysanthis
S3207: Architecture-Aware Graph Repartitioning for Data-Intensive Scientific Computing

12:00 - 13:30

Lunch (on your own; please put up your poster)

13:30 - 15:10

Session II

 

13:30 - 13:55 Ichitaro Yamazaki, Theo Mary, Jakub Kurzak, Stanimire Tomov, and Jack Dongarra
S3202: Access-averse Framework for Computing Low-rank Matrix Approximations

13:55 - 14:10 S M Faisal, Srinivasan Parthasarathy, and P Sadayappan
S3214: Global Graphs: A Middleware for Large Graph Processing

14:10 - 14:35 David Mizell, Kristyn Maschhoff, and Steve Reinhardt
S3208: Extending SPARQL with graph functions

14:35 - 15:10 Josephine Namayanja and Vandana Janeja
BigD280: Change Detection in Temporally Evolving Computer Networks: A Big Data Framework

15:10 - 15:45

Coffee break and poster session

15:45 - 17:00

Session III

 

15:45 - 16:10 Naga Shailaja Dasari, Ranjan Desh, and Zubair M Park
S3212: An Efficient Algorithm of k-core Decomposition on Multicore Processors

16:10 - 16:35 Amlan Chatterjee, Sridhar Radhakrishnan, and Chandra N. Sekharan
S3209: Connecting the dots: Triangle completion and related problems on large data sets using GPUs

16:35 - 17:00 Ronald Hagan, Charles Phillips, Kai Wang, Gary Rogers, and Michael Langston
S3213: Toward an Efficient, Highly Scalable Maximum Clique Solver for Massive Graphs

17:00 - 18:00

Free discussion and poster session

 

 

 

 

Workshop 6:  The Second Workshop on Distributed Storage Systems and Coding for Big Data

Date: Oct. 30th, 2014

Venue: Diplomat, Hyatt Regency Bethesda, Washington DC, USA

 

Time

Workshop Schedule

09:00-09:10

Plenary

 

09:10-09:40

Keynote: A New Zigzag MDS Code with Optimal Encoding and Efficient Decoding

Prof. Hui Li (Peking University, China)

 

09:40-10:00

Parity Declustering for Fault-Tolerant Storage Systems via t-designs

Son Hoang Dau (Singapore University of Technology and Design, Singapore),

Yan Jia, Chao Jin, Weiya Xi, and Kheong Sann Chan (Data Storage Institute, Singapore)

10:00-10:20

Coffee Break

 

10:20-10:40

A C Library of Repair-Efficient Erasure Codes for Distributed Data Storage Systems

Chao Tian (University of Tennessee at Knoxville, United States)

 

 

10:40-11:00

STORE: Data Recovery with Approximate Minimum Network Bandwidth and Disk I/O in Distributed Storage Systems

Tai Zhou, Hui Li, Bing Zhu, Yumeng Zhang, Hanxu Hou, and Jun Chen (Peking University Shenzhen Graduate School, China)

 

11:00-11:20

ReCT: Improving MapReduce Performance under Failures with Resilient Checkpointing Tactics

Hao Wang, Haopeng Chen, and Fei Hu (Shanghai Jiao Tong University, China)

 

11:20-11:40

An Efficient Scheme to Ensure Data Availability for a Cloud Service Provider

Seungmin Kang (National University of Singapore, Singapore), Bharadwaj Veeravalli, Khin Mi Mi Aung, and Chao Jin (Data Storage Institute, Singapore)

 

 

 

 

Workshop 7:  First IEEE International Workshop on Big Data Security and Privacy (BDSP 2014)

 

8:00 - 8:05

Opening Remarks and Keynote Introduction

8:05 - 9:05

Keynote: "Privacy in Big Data: Thinking Outside the Anonymity/Confidentiality Box"

                - Chris Clifton.

9:05 - 9:30

"Location Prediction Attacks Using Tensor Factorization and Optimal Defenses"

    - Takao Murakami and Hajime Watanabe.

9:30 - 10:00

Coffee Break

10:00 - 10:25

"Secure Data Storage in Distributed Cloud Environments".

    - Renata Jordão, Valério Martins, Fábio Buiati, Rafael Timóteo de Sousa Júnior, and Flávio Elias de Deus.

10:25 - 10:50

"A Practical Security Framework for Cloud Storage and Computation".

    - Kavya Premkumar, Aditya Suresh Kumar, and Saswati Mukherjee.

10:50 - 11:15

"Privacy-aware Filter-based Feature Selection".

    - Yasser Jafer, Stan Matwin, and Marina Sokolova.

11:15 - 11:40

"A PLOTAM Model for Analyzing Potential Relationships in Social Media and the Web in Big Data Perspective".

    - Huaping Zhang and Yanping Zhao.

11:40 - 11:50

Closing Remarks

 

 

 

 

Workshop 8:  The 2nd International Workshop of BigData in Bioinformatics and Healthcare Informatics

 

Time

Description

08:30 am

Workshop Registration and Workshop Opening

09:00 am

Keynote Speech (George Strawn)

10:00 am

Poster Session in Conjunction with Coffee Break

10:20 am

Invited Talk (Subha Madhavan): “Data Science Platforms are Integral to Help Drive Molecularly Targeted Therapy Development and Personalized Medicine Research”

11:00 am

s8204: “Predicting a Biological Response of Molecules from Their Chemical Properties Using Diverse and Optimized Ensembles of Stochastic Gradient Boosting Machine”

11:15 am

s8214: “Towards Integrating the Detection of Genetic Variants into an In-Memory Database”

11:30 am

s8205: “Redundancy Feature Selection with Grouped Variables and Its Application to Healthcare Data”

11:45 am

s8213: “Understanding the Effects of Concussion using Big Data”

12:00 pm

Lunch

01:30 pm

Invited Talk (Keith Crandall): “Translating Genomics to Personalized Medicine through Computation”

02:00 pm

s8210: “Big Data in Genomics: An Overview”

02:15 pm

s8212: “TIDE: Inter-chromosomal Translocation and Insertion Detection using Embeddings”

02:30 pm

s8202: “Pharmacological Class Data Representation in the Web Ontology Language (OWL)”

02:45 pm

Invited Talk (Marc Overhage): “‘Big’ Methods for Big Data: Lessons Learned from the Observational Medical Outcomes Partnership”

03:20 pm

Poster Session in Conjunction with Coffee Break

03:50 pm

s8208: “A General Supervised Approach to Segmentation of Clinical Texts”

04:05 pm

s8206: “Protective Effects of Rheumatoid Arthritis in Septic ICU Patients”

04:20 pm

Big Data Panel
Panelists: Vinay Pai, Roger Mark, Marilyn Matz, and Thomas Klein. Moderated by Mengling Feng and Shipeng Yu.

05:10 pm

Workshop Closing

 

 

 

 

 

Workshop 9:  Solar Astronomy Big Data (SABiD) – 1st Workshop on Management, Search and Mining of Massive Repositories of Solar Astronomy Data

All accepted SABiD papers will be presented as Regular Papers, that is, 25 minutes per paper (about 20 minutes for the talk and 5 minutes for Q and A).

13:30-13:55

Massive Labeled Solar Image Data Benchmarks for Automated Feature Recognition
Michael Schuh & Rafal Angryk

13:55-14:20

A computer vision approach to mining big solar data
Simon Felix & André Csillaghy

14:20-14:45

Scalable Solar-Image Retrieval with Lucene
Juan Banda & Rafal Angryk

14:45-15:10

Stream Processing for Solar Physics: Applications and Implications for Big Solar Data
Karl Battams

15:10-15:40

Break

15:40-16:05

Iterative Refinement of Multiple Targets Tracking of Solar Events
Dustin Kempton, Karthik Ganesan Pillai & Rafal Angryk

16:05-16:30

Spatiotemporal Indexing Techniques for Efficiently Mining Spatiotemporal Co-occurrence Patterns
Berkay Aydin, Dustin Kempton, Vijay Akkineni, Shaktidhar Gopavaram, Karthik Ganesan Pillai & Rafal Angryk

16:30-16:55

Improved data exploitation for DKIST and high-resolution solar observations
Kevin Reardon & Steve Berukoff

16:55-17:20

Closing Discussion

 

 

Workshop 13:  First Hands-On Workshop on Leveraging High Performance Computing Resources for Managing Large Datasets

 

8:30 AM - 12:10 PM

Morning Session

8:30 AM - 8:45 AM

Opening Remarks

8:45 AM - 9:15 AM

Computing and Data Management at the Joint Genome Institute, by Kirsten Fagnan (NERSC)

9:15 AM - 9:45 AM

Unlocking the Power of High Performance Computing for Big Humanities Data Curation, by Jessica Trelogan (Institute of Classical Archaeology, UT Austin)

9:45 AM - 10:15 AM

IDC Update on How Big Data Is Redefining High Performance Computing, by Earl Joseph (IDC)

10:15 AM - 10:30 AM

Break

10:30 AM - 11 AM

Mitigating Big Data Management Challenges Using HPC, by Ritu Arora (TACC)

11 AM - 12:10 PM

Introduction to Linux & TACC resources, hands-on session, by Ritu Arora (TACC)

12:10 PM - 1:30 PM

Lunch

Workshop participants who are not supported by the NSF travel award are on their own for lunch.

Networking and mentoring activities for students sponsored by the NSF travel grant, coordinated by Elizabeth Bautista (NERSC) and Valerie Shilling (TACC)

1:30 PM - 6:00 PM

Afternoon Session

1:30 PM - 1:40 PM

Introduction to the test case to be used for further exercises, by Ritu Arora (TACC)

1:40 PM - 2:50 PM

Hands-on exercises on data transfer, checksum calculation, and metadata extraction, by Ritu Arora (TACC)

2:50 PM - 3:20 PM

Design and development of data management workflows on HPC resources, by Ritu Arora (TACC)

3:20 PM - 3:35 PM

Break

3:35 PM - 4:00 PM

Improv Session, by Raquell Holmes (Improvscience)

4:00 PM - 6:00 PM

Hackathon Sessions, coordinated by TACC and NERSC staff

 

 

 

 

 

Workshop 15:  Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH)

 

8:00-8:45

Opening and invited talk from Texas Advanced Computing Center

8:45-10:00

Session 1: Methods and Designs for Big Data Systems

 

Guangchen Ruan, Hui Zhang, and Beth Plale, Parallel and Quantitative Sequential Pattern Mining for Large-scale Interval-based Temporal Data

Yongen Yu, Hongbo Zou, Wei Tang, and Liwei Liu, A CCG Virtual System for Big Data Application Communication Costs Analysis

Wuheng Luo, Bo Liu, and Allie Watfa, An Open Schema for XML Data in Hive

Greg Sand, Leonidas Tsitouras, George Dimitrakopoulos, and Vassillis Chatzigiannakis, A Big Data Aggregation, Analysis and Exploitation Integrated Platform for Increasing Social Management Intelligence

Julian Krumeich, Dirk Werth, Jens Schimmelpfennig, Sven Jacobi, and Peter Loos, Advanced Planning and Control of Manufacturing Processes in Steel Industry through Big Data Analytics: Case Study and Architecture Proposal

10:00-10:20

Coffee break

10:20-11:00

Invited talk from Oracle Research Team 

11:00-12:00  

Session 2: Practical Big Data Use Cases

 

Jian Zou and Hui Zhang, High-Frequency Financial Statistics with Parallel R and Intel Xeon Phi Coprocessor

Sabrina Azzi, Cindy Dallaire, Abdenour Bouzouane, Bruno Bouchard, and Sylvain Giroux, Human activity recognition in big data smart home context

Jennifer Shin, Investigating the Accuracy of the openFDA API using the FDA Adverse Event Reporting System (FAERS)

Hilary Cheng, Yi Chuan Lu, and Chih-Cheng Hsu, A Visualized Data Analysis for Bogus Business Entities Detection

 

 

 

 

Workshop 16:  IEEE Big Data Workshop on Semantics for Big Data on the Internet of Things (SemBIoT 2014)

 

1:30 - 1:35

Opening Remarks and Keynote Introduction

1:35 - 2:30

Keynote Talk: "Big Data and Semantic Web Meet Applied Ontology"

-       Ram Sriram (NIST)

2:30 – 3:00

Handling smart environment devices, data and services at the semantic level with the FI-WARE core platform (Regular paper)

 -       Fano Ramparany, Fermin Galan Marquez, Javier Soriano, and Tarek Elsaleh.

3:00 – 3:25

Situation Aware Computing for Big Data  (Short paper)

-       Eric Chan, Dieter Gawlick, Adel Ghoneimy, and Zhen Liu

3:25 – 3:45

Coffee Break

3:45 – 4:40

Invited Talk:  Transforming Outcomes in Vertical Industries in the IoE era: From Seeing to Foreseeing

-       Rajesh Vargheese (Cisco)

4:40 – 5:10

Topic-Specific Post Identification in Microblog Streams  (Regular Paper)

-       Shanika Karunasekera, Aaron Harwood, Sameendra Samarawickrama, Kotagiri Ramamohanarao, and Garry Robins

5:10 – 5:40

An IoT/IoE Enabled Architecture Framework for Precision On Shelf Availability: Enhancing Proactive Shopper Experience  (Regular Paper)

-       Rajesh Vargheese and Hazim Dahir

 

 

 

 

Workshop 17:  The First IEEE Workshop on Big Data in Computational Epidemiology

 

Date: Oct. 27th, 2014

Time: 08:00 - 12:00

Venue: Severn

 

8:00-10:00

Session I: Invited Talks (Session Chair: Sandeep Gupta)

8:10-8:45

Computational Biosecurity via Modeling of Immunophenotypic Heterogeneity

Saumyadipta Pyne (University of Hyderabad Campus, India)

8:45-9:20

Time-Critical Ebola Modeling and Response: A Big-Data Informatics Approach

Keith Bisset (Virginia Tech, USA)

9:20-9:55

Data Driven Methods for Disease Forecasting

Prithwish Chakraborty (Virginia Tech, USA)

10:00-10:20

Coffee Break

10:20-11:50

Session II: Paper Presentations (Session Chair: Jiangzhuo Chen)

 

Learning Machines for Computational Epidemiology

Magnus Boman and Daniel Gillblad

 

Big data problems on discovering and analyzing causal relationships in epidemiological data

Yiheng Liang and Armin Mikler

 

Epidemiological Modeling of Bovine Brucellosis in India

Gloria Kang, L Gunaseelan, and Kaja Abbas

 

Spatial Big Data Analytics of Influenza Epidemic in Vellore, India

Daphne Lopez, M. Gunasekaran, B. Senthil Murugan, Harpreet Kaur, and Kaja Abbas

11:50-12:00

Closing Remarks and Discussions

 

 

 

 

Workshop 18:  Large Scale Data Analytics in Transportation and Railway Infrastructure

 

TIME

TOPIC/PAPER

SPEAKERS

8:00- 8:10

INTRODUCTION

8:10-8:40

Keynote Address

Nii Attoh-Okine

8:40- 9:00

Predicting flight arrival times with a multistage model

Gabor Takacs

9:00- 9:20

Efficient Traffic Speed Forecasting Based on Massive Heterogeneous Historical Data

Xing-Yu Chen, Hsing-Kuo Pao, and Yuh-Jye Lee

9:20- 9:40

Multi-Objective Optimization for Resilient Airline Networks Using Socioeconomic-Environmental Data

Hidefumi Sawai and Aki-Hiro Sato

9:40- 10:00

A Dynamic Programming Approach for 4D Flight Route Optimization

Christian Kiss-Tóth and Gabor Takacs

10:00-10:20

COFFEE BREAK

10:20-10:40 

Evaluating Structural Engineering Finite Element Analysis Data Using Multiway Analysis

 

Matija Radovic and Jennifer McConnell

10:40-11:00

Multiway Analysis of Bridge Structural Types in the National Bridge Inventory (NBI)

Offei Adarkwa, Thomas Schumacher, and Nii Attoh-Okine

11:00-11:20

Topological Models of Document-Query Sets in Retrieval for Enterprise Information Management

Vinay Deolalikar

11:20-11:40

Metaheuristics in Big Data: An Approach to Railway Engineering

Silvia Galvan Nunez and Nii Attoh-Okine

11:40-12:00

Facilitating Maintenance Decisions on the Dutch Railways Using Big Data: The ABA Case Study

Alfredo Núñez, Jurjen Hendriks, Zili Li, Bart De Schutter, and Rolf Dollevoet

12:00-12:40 

LUNCH

12:40-13:00

Applications of Linked Data in the Rail Domain

Christopher Morris, John Easton, and Clive Roberts

13:00 – 13:20

Ontology-driven Data Integration for Railway Asset Monitoring Applications

Jonathan Tutcher

13:20 – 13:40

Some Examples of Big Data in Railroad Engineering

Allan Zarembski

13:40 – 14:00

Spatial Data Analysis of Complex Urban System

Farideddin Peiravian, Amirhassan Kermanshah, Sybil Derrible, and Clio Andris

14:00 – 14:20

Big Data Challenges in Railway Engineering

Nii Attoh-Okine

14:20 – 14:40

CLOSING REMARKS

 

 

 

 

Workshop 19:  2nd Workshop on Scalable Cloud Data Management

 

8:00-8:50

Invited Talk by Schahram Dustdar

8:50-10:05

Session I

 

The Best of Two Worlds: Integrating IBM InfoSphere Streams with Apache YARN by Zubair Nabi, Rohit Wagle, and Eric Bouillet

Taking an Electronic Ticketing System to the Cloud: Design and Discussion by Filipe Araujo, Marilia Curado, Pedro Furtado, and Raul Barbosa

A Relational Database Schema on the Transactional Key-Value Store Scalaris by Nico Kruber, Florian Schintke, and Michael Berlin

10:05-10:20

Coffee Break

10:20-12:00

Session II

 

A Contention Aware Hybrid Evaluator for Schedulers of Big Data Applications in Computer Clusters by Shouvik Bardhan and Daniel Menasce

RuleMR: Classification Rule Discovery with MapReduce by Vasilis Kolias, Constantinos Kolias, Ioannis Anagnostopoulos, and Eleftherios Kayafas

Temporal Bipartite Projection and Link Prediction for Online Social Networks by Tsunghan Wu, Sheau-Harn Yu, Wanjiun Liao, and Cheng-Shang Chang

Community Structure Analysis in Big Climate Data by Michael McGuire and Nam Nguyen

 

 

 

 

Workshop 20:  Big Humanities Data

 

Chairs:  Mark Hedges, Tobias Blanke, Richard Marciano

  • Brett Bobley: NEH, Director of the Office of Digital Humanities, USA
  • Bob Horton: IMLS, Associate Deputy Director for Library Services, Discretionary Programs, USA
  • Crystal Sissons: SSHRC, Senior Program Officer at Research Grants and Partnerships Division, CANADA
  • Christie Walker: AHRC, Strategy & Development Manager, UK

 

 

9:00-9:15

Welcome

9:15-10:00

Keynote (30 min + 15 min discussion)

Opportunities from Big Humanities Data for Holocaust Research and Education

Michael LEVY, Director of Digital Collections

Michael HALEY GOLDMAN, Director of Global Classroom and Evaluation

United States Holocaust Memorial Museum

Overview: The United States Holocaust Memorial Museum continues to collect and digitize the material evidence of the best-documented crime of the 20th century – building the Collection of Record for the Holocaust. But as more and more of this collection becomes digital – and as we create greater quantities of historical data about this history – we have the opportunity to go beyond traditional approaches to scholarship and education. How can the new techniques and tools being developed for big data change how we explore what happened during the Holocaust? And how does digital history offer new contexts for learning about the Holocaust? In this brief presentation, we will describe the current state of Museum collections, project how this collection will grow in the next decade, and raise questions about what the future of Holocaust research and education can be.

10:00-10:20

Coffee Break

10:20-12:10

Morning Session (1 hour 50 min)

 

THEMES: complexity / scale / historical analysis

  1. Scaling Historical Text Re-use (Marco BÜCHLER, Emily Franzini, Greta Franzini, and Maria Moritz)
      • LONG PAPER — SLIDES
  2. The Infra-City: The Exceptional and the Everyday in Social Media (Lev Manovich, Alise TIFENTALE, Mehrdad Yazdani, and Jay Chow)
      • LONG PAPER — SLIDES
  3. Revolutionary Entities: Turning Data into Knowledge to Drive Personalized Exploration of The Irish Rising of 1916 (Owen CONLAN, Alexander O’Connor, Órla Ní Loinsigh, Gary Munnelly, Séamus Lawless, and Rachel Murphy)
      • LONG PAPER — SLIDES

 

11:05-11:15   Break

 

THEMES: news / film

  1. On the Coverage of Science in the Media: A Big Data Study on the Impact of the Fukushima Disaster (Thomas LANSDALL-WELFARE, Saatviga Sudhahar, Giuseppe Veltri, and Nello Cristianini)
    • LONG PAPER — SLIDES
  2. The DEEP FILM Access Project: Ontology and Metadata Design for Digital Film Production Assets (Sarah ATKINSON, Roger EVANS, and Jos Lehmann)
    • SHORT PAPER — SLIDES

THEMES: frameworks / infrastructure

  1. Probabilistic Estimates of Attribute Statistics and Match Likelihood for People Entity Resolution (Xin WANG, Ang Sun, Hakan Kardes, Siddharth Agrawal, Lin Chen, and Andrew Borthwick)
    • LONG PAPER — SLIDES
  2. BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences (Muhammed Asif Saleem, Blesson VARGHESE, and Adam Barker)
    • LONG PAPER — SLIDES

12:10-1:10

Lunch (not provided)

1:10-3:15

Afternoon Session

 

THEMES: geospatial / mobile

  1. Dealing with Heterogeneous Big Data When Geoparsing Historical Corpora (C.J. Rupp, Paul RAYSON, Ian Gregory, Andrew Hardie, Amelia Joulain, and Daniel Hartmann)
    • SHORT PAPER — SLIDES
  2. Mining Mobile Youth Cultures (Tobias BLANKE, Giles Greenway, Jennifer Pybus, and Mark Cote)
    • SHORT PAPER — SLIDES

Projects from the ‘Digging into Data’ program (7 total):

  1. Mining Microdata: Economic Opportunity and Spatial Mobility in Britain and the United States, 1850-1881 (Peter Baskerville, Lisa Dillon, Kris Inwood, Evan ROBERTS, Steven Ruggles, and John Robert Warren)
    • LONG PAPER — SLIDES
  2. Understanding the Role of Medical Experts during a Public Health Crisis: Digital Tools and Library Resources for Research on the 1918 Spanish Influenza (E. Thomas EWING, Samah Gad, Naren Ramakrishnan, and Jeffrey S. Reznick)
    • LONG PAPER — SLIDES

 

2:00-2:15    Break

 

  1. Scaled Entity Search: A Method for Media Historiography and Response to Critiques of Big Humanities Data Research (Eric HOYT, Kit Hughes, Derek Long, Kevin Ponto, and Anthony Tran)
    • LONG PAPER — SLIDES
  2. A Computational Pipeline for Crowdsourced Transcriptions of Ancient Greek Papyrus Fragments (Alex Williams, John Wallin, Haoyu Yu, Marco Perale, Hyrum Carroll, Anne-Francoise Lamblin, Lucy Fortson, Dirk Obbink, Chris Lintott, and James BRUSUELAS)
    • LONG PAPER — SLIDES
  3. Scientific Findings as Big Data for Research Synthesis: The metaBUS Project (Frank BOSCO, Krista Uggerslev, and Piers Steel)
    • SHORT PAPER — SLIDES
  4. Metadata Infrastructure for the Analysis of Parliamentary Proceedings (Richard GARTNER)
    • SHORT PAPER — SLIDES
  5. Integrating Data Mining and Data Management Technologies for Scholarly Inquiry (Ray Larson, Paul Watry, Richard MARCIANO, John Harrison, Chien-Yi Hou, Luis Aguilar, Shreyas, and Jerome Fuselier)
    • SHORT PAPER — SLIDES

3:15-3:45

Coffee Break

3:45-4:45

Funders Panel and discussion on the future of big data in the humanities


Workshop 21: Complexity for Big Data

 

Erin-Elizabeth Durham, Andrew Rosen, and Robert Harrison, "A Model Architecture for Big Data applications using Relational Databases"

  

Ruoqian Liu, Ankit Agrawal, Wei-Keng Liao, and Alok Choudhary, "Search Space Preprocessing in Solving Complex Optimization Problems"

    

Walid Shalaby, Wlodek Zadrozny, and Sean Gallagher, "Knowledge Based Dimensionality Reduction for Technical Text Mining"

    

Xiaoguang Wang, Xuan Liu, and Stan Matwin, "A Distributed Instance-weighted SVM Algorithm on Large-scale Imbalanced Datasets"

 

Coffee Break

Philippe Calvez and Eddie Soulier, "Sustainable Assemblage for Energy (SAE) inside Intelligent Urban Areas: How massive heterogeneous data could help to reduce energy footprints and promote sustainable practices and an ecological transition"

   

Xuan Liu, Xiaoguang Wang, Bo Liu, and Stan Matwin, "Vessel Route Anomaly Detection with Hadoop MapReduce"

 

Jiazhen Nian, Shan Jiang, and Yan Zhang, "HBGSim: A Structural Similarity Measurement over Heterogeneous Big Graph"

 

Eric L. Goodman, Edward Jimenez, Cliff Joslyn, David Haglin, Sinan al-Saffar, and Dirk Grunwald, "Optimizing Graph Queries with Graph Joins and Sprinkle SPARQL"


Posters

 

 Poster ID

Poster Title

P201

Biva Shrestha, Ranjeet Devarakonda, and Giriprakash Palanisamy, An open source framework to add spatial extent and geospatial visibility to Big Data

 

P202

Tugdual Sarazin, Mustapha Lebbah, and Hanane Azzag, Biclustering using Spark-MapReduce

 

P204

Joshua Westgard, The Bot Will Serve You Now: Automating Access to Archival Materials

 

P205

Jose Teixeira, Developing a Cloud Computing Platform for Big Data: The OpenStack Nova case

 

P206

Bing Zhu, Hui Li, and Kenneth Shum, Repair Efficient Storage Codes via Combinatorial Configurations

 

P209

Nick Manfredi, Darakhshan J. Mir, Shannon Lu, and Dominick Sanchez, Differentially Private Models of Tollgate Usage in Metropolitan Areas: The Milan Tollgate Data Set

 

P210

Ranjeet Devarakonda, Biva Shrestha, and Giriprakash Palanisamy, OME: A tool for generating and managing metadata to handle BigData

 

P212

Syunya Okuno, Hiroki Asai, and Hayato Yamana, A Challenge of Authorship Identification for Ten-thousand-scale Microblog Users

 

P214

Niall Gaffney, Christopher Jordan, Tommy Minyard, and Dan Stanzione, Building Wrangler: A Transformational Data Intensive Resource for the Open Science Community

 

P215

Akira Kinoshita, Atsuhiro Takasu, and Jun Adachi, Real-Time Traffic Incident Detection Using Probe-Car Data on the Tokyo Metropolitan Expressway

 

P216

Fan Jiang, Michael Shoffner, Claris Castillo, and Charles Schmitt, Enabling Genomic Analysis on Federated Clouds

 

P217

Anirudh Kadadi, Rajeev Agrawal, Christopher Nyamful, and Rahman Atiq, Challenges of Data Integration and Interoperability in Big Data

 

P218

Robert Warren and Bo Liu, Language, Cultural Influences and Intelligence in Historical Gazetteers of the Great War

 

P219

Vandana P. Janeja, Ali Azari, Josephine Namayanja, and Brian Heilig, B-dIDS: Mining Anomalies in a Big-distributed Intrusion Detection System

 

P220

Zubair Shah and Abdun Mahmood, A Summarization Paradigm for Big Data

 

P221

Jin Soung Yoo and Douglas Boulware, Incremental and Parallel Spatial Association Mining

 

P222

Julien Amelot, Peter Bajcsy, Anne Plant, and Mary Brady, Machine Learning and Interactive Visualization applied to TB-sized Images of Stem Cells

 

P223

Gaël Chareyron, Jerome Da-Rugna, and Thomas Raimbault, Big Data: a new challenge for tourism

 

P224

Thomas Raimbault, Gaël Chareyron, and Corinne Krzyzanowski-Guillot, Cognitive Map of Tourist Behavior based on Tripadvisor

 

P225

Thomas Hassan, Rafael Peixoto, Christophe Cruz, Aurélie Bertaux, and Nuno Silva, Semantic HMC for Big Data Analysis

 

P226

Ioannis Mytilinis, Ioannis Giannakopoulos, Ioannis Konstantinou, Katerina Doka, and Nectarios Koziris, MoDisSENSE: A Distributed Platform for Social Networking Services over Mobile Devices

 

P227

Madian Khabsa, Pucktada Treeratpituk, and C. Lee Giles, Large Scale Author Name Disambiguation in Digital Libraries

 

P229

Roberto Espinosa, Larisa Garriga, Jose Jacobo Zubcoff, and Jose-Norberto Mazon, Linked Open Data Mining for Democratization of Big Data

 

P230

Saima Aman, Charalampos Chelmis, and Viktor Prasanna, Addressing Data Veracity in Big Data Applications

 

P232

Ashiq Imran, Rajeev Agrawal, Jessie Walker, and Anthony Gomes, A Layer Based Architecture for Provenance in Big Data

 

P235

Erin-Elizabeth Durham, Chinua Umoja, J.T. Torrance, Andrew Rosen, and Robert Harrison, A Novel Approach to Determine Docking Locations Using Fuzzy Logic and Shape Determination

 

P236

Haozhen Zhao, Sharding for Literature Search via Cutting Citation Graphs

 

P237

Andy Doyle, Graham Katz, Kristen Summers, Chris Ackermann, Ilya Zavorin, Zunsik Lim, Sathappan Muthiah, Liang Zhao, Chang-Tien Lu, Patrick Butler, Rupinder Paul Khandpur, Youssef Fayed, and Naren Ramakrishnan, The EMBERS Architecture for Streaming Predictive Analytics

 

P239

Ioannis Giannakopoulos, CELAR: Automated Application Elasticity Platform


Special Session I: “From Data to Insight: Big Data and Analytics for Smart Manufacturing Systems”

 

Session Agenda (Tentative)


Session Chairs: Sudarsan Rachuri and Ronay AK

Venue: JUDICIARY SUITE

Time: 8:30-17:00

 

 

 

Time

Title

Opening and Welcoming Speech

08:30 – 09:00

(30 min)

Dr. Sudarsan Rachuri, NIST

Invited Speaker

09:00-09:40

Dr. Athulan Vijayaraghavan, CTO, System Insights

Title: The Internet of Manufacturing Things

09:40-10:00

Q&A

 

10:00-10:20

Coffee Break

Keynote Lectures

10:20 – 10:40

(20 min)

Dr. Ashit Talukder, NIST

Title: Data-Driven Smart Manufacturing: Challenges and Solutions

10:40 – 11:00

(20 min)

Matteo Bellucci, GE Global Research

Title: Brilliant Factory at GE

11:00 – 11:20

(20 min)

Juergen Heit, Robert Bosch North America

Panel

11:20 – 12:30

(70 min)

Panel Title: Current Issues, Challenges and Opportunities in Deploying Big Data Analytics for Smart Manufacturing Systems.

Panelists:

Dr. Ashit Talukder, NIST

Prof. Kincho H. Law, Stanford University

Prof. Sankaran Mahadevan, Vanderbilt University

Dr. Athulan Vijayaraghavan, CTO, System Insights

Matteo Bellucci, GE Global Research

Juergen Heit, Robert Bosch North America

 

12:30 – 14:00

Lunch

Paper Presentations

14:00-14:20

Paper #1 “CloudMan: A Platform for Portable Cloud Manufacturing Services” by Soheil Qanbari, Samira Mahdi Zadeh, Soroush Vedaie, and Schahram Dustdar, TUW, Austria.

14:20 – 14:40

Paper #2 “An Intelligent Machine Monitoring System Using Gaussian Process Regression for Energy Prediction” by Raunak Bhinge, Jinkyoo Park, Nishant Biswas, Moneer Helu, David Dornfeld, Kincho Law, and Sudarsan Rachuri, Berkeley, USA

14:40 – 15:00

Paper #3 “Building a Rigorous Foundation for Performance Assurance Assessment Techniques for ‘Smart’ Manufacturing Systems” by Utpal Roy, Yunpeng Li, and Bicheng Zhu, Syracuse University, USA

15:00 – 15:20

Paper #4 “Towards a Domain-Specific Framework for Predictive Analytics in Manufacturing” by David Lechevalier, Anantha Narayanan, and Sudarsan Rachuri, NIST, USA

 

15:20 – 15:40

Coffee Break

Paper Presentations

15:40-16:00

Paper #5 “Uncertainty Quantification in Performance Evaluation of Manufacturing Processes” by Saideep Nannapaneni and Sankaran Mahadevan, Vanderbilt University, USA

16:00 – 16:20

Paper #6 “A System Architecture for Manufacturing Process Analysis based on Big Data and Process Mining Techniques” by Hanna Yang, Minseok Song, and Seongjoo Kim, UNIST, South Korea

16:20 – 16:40

Paper #7 “Toward Smart Manufacturing: Monitoring, Analysis, Planning and Execution using Decision Guidance Analytics” by Alexander Brodsky, Mohan Krishnamoorthy, Daniel Menasce, Guodong Shao, and Sudarsan Rachuri, George Mason University, USA

Poster Session

16:40 – 17:00

“Smart Manufacturing Systems Design and Analysis (SMSDA) - Big Data Analytics in Manufacturing” by Ronay AK, Sudarsan Rachuri, and Seung-Jun Shin, NIST, USA


Special Session II on Big Data Representation and Processing in Data Science

 

Session Agenda (Tentative)

 

Session Chair: T.Y. Lin

Venue: Susquehanna

Time: 13:30-18:30

 

 

Session Opening

- Featured Talk #1


Stochastic Finite Automata for the Translation of DNA to Protein

Speakers: Tsau-Young Lin and Asmi H. Shah

 

 

BDRP206

Researching Persons & Organizations AWAKE: From Text to an Entity-Centric Knowledge Base

Elizabeth Boschee, Marjorie Freedman, Saurabh Khanwalker, Anoop Kumar, Amit Srivastava, and Ralph Weischedel

 

BDRP210

Integrating Existing Large Scale Medical Laboratory Data Into the Semantic Web Framework

Newres Al Haider, Samina Abidi, William van Woensel, and Syed SR Abidi

 


- Featured Talk #2


Data mining and sharing tool for high content screening large scale biological image data

Speaker: Asmi H. Shah

 

 

BDRP213

Path Knowledge Discovery: Association Mining Based on Multi-category Lexicons

Chen Liu, Wesley W. Chu, Fred Sabb, D. Stott Parker, and Joseph Korpela

 

BDRP208

Extracting Discriminative Features in Multivariate Data from Heterogeneous Sensors

Om Patri, Abhishek Sharma, Haifeng Chen, Guofei Jiang, Anand Panangadan, and Viktor Prasanna

 

BDRP204

Statistical Technique for Online Anomaly Detection Using Spark Over Heterogeneous Data from Multi-source VMware Performance Data

Mohiuddin Solaimani, Mohammed Iftekhar, Latifur Khan, and Bhavani Thuraisingham

 

BDRP203

A Building Performance Evaluation & Visualization System

Georgios Stavropoulos, Stelios Krinidis, Dimosthenis Ioannidis, Konstantinos Moustakas, and Dimitris Tzovaras

 


Discussion of Future Plans


Adjourn & Refreshments


Doctoral Consortium

 

Session Agenda (Tentative)

 

Session Chair: Jingrui He (Arizona State University)

Venue: Patuxent

Time: 8:25-15:00

 

 

8:25 - 8:30

Opening remarks

8:30 - 9:00

Presentation

by Lena Mashayekhy (Wayne State University)

9:00 - 9:15

Comments

from mentor Steven Y. Ko (the University at Buffalo, the State University of New York)

9:15 - 9:45

Presentation

by Jialin Liu (Texas Tech University)

9:45 - 10:00

Comments

from mentor Sastry S Duri (IBM Research)

10:00 - 10:30

Coffee break

10:30 - 11:00

Presentation

by Debdipto Misra (George Mason University)

11:00 - 11:15

Comments

from mentor Duen Horng (Polo) Chau (Georgia Tech)

11:15 - 11:45

Presentation

by Hiba Baroud (University of Oklahoma)

11:45 - 12:00

Comments

from mentor Chih-Jen Lin (National Taiwan University)

12:00 - 13:00

Lunch on your own

13:00 - 14:00

Individual discussions

between students and mentors

14:00 - 15:00

Open discussion (attended by all)

15:00

Adjourn


Tutorials (4)

 

 

TUTORIAL 1: Big Data Stream Mining

Presenters: Gianmarco De Francisci Morales, Joao Gama, Albert Bifet, and Wei Fan

 

Summary:

The challenge of deriving insights from big data has been recognized as one of the most exciting and important opportunities for both academia and industry. Advanced analysis of big data streams is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution of such data streams over time, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. This tutorial is a gentle introduction to mining big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part discusses data stream mining on distributed engines such as Storm, S4, and Samza.

 

 

Content:

Fundamentals and Stream Mining Algorithms

     Stream mining setting

     Concept drift

     Classification and Regression

     Clustering

     Frequent Pattern mining

Distributed Big Data Stream Mining

     Distributed Stream Processing Engines

     Classification

     Regression

 

 

Short Bio.  

 

Gianmarco De Francisci Morales's Profile

Gianmarco De Francisci Morales is a Research Scientist at Yahoo Labs Barcelona. He received his Ph.D. in Computer Science and Engineering from the IMT Institute for Advanced Studies of Lucca in 2012. His research focuses on large scale data mining and big data, with a particular emphasis on web mining and Data Intensive Scalable Computing systems. He is an active member of the open source community of the Apache Software Foundation working on the Hadoop ecosystem, and a committer for the Apache Pig project. He is the co-leader of the SAMOA project, an open-source platform for mining big data streams.        

 

Joao Gama's Profile

Joao Gama is a Researcher at LIAAD, University of Porto, working in the Machine Learning group. His main research interest is in Learning from Data Streams. He has published more than 80 articles. He served as Co-Chair of ECML 2005, DS09, ADMA09, and a series of Workshops on KDDS and Knowledge Discovery from Sensor Data with ACM SIGKDD. He is serving as Co-Chair of the upcoming ECML-PKDD 2015. He is the author of a recent book on Knowledge Discovery from Data Streams.

 

Albert Bifet's Profile

Albert Bifet is a Research Scientist at Huawei. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams.

 

Wei Fan's Profile   

Wei Fan is the associate director of Huawei Noah’s Ark Lab. He received his PhD in Computer Science from Columbia University in 2001. His main research interests and experience are in various areas of data mining and database systems, such as stream computing, high performance computing, extremely skewed distributions, cost-sensitive learning, risk analysis, ensemble methods, easy-to-use nonparametric methods, graph mining, predictive feature discovery, feature selection, sample selection bias, transfer learning, time series analysis, bioinformatics, social network analysis, novel applications, and commercial data mining systems. His co-authored paper received the ICDM 2006 Best Application Paper Award, and he led the team that used his Random Decision Tree method to win the 2008 ICDM Data Mining Cup Championship. He received the 2010 IBM Outstanding Technical Achievement Award for his contribution to IBM InfoSphere Streams. He is an associate editor of ACM Transactions on Knowledge Discovery from Data (TKDD). Since joining Huawei in August 2012, he has led his colleagues in developing Huawei StreamSMART, a streaming platform for online and real-time processing, querying, and mining of very fast streaming data. He has also led his colleagues in developing a real-time processing and analysis platform for Mobile Broadband (MBB) data.


TUTORIAL 2: Big ML Software for Modern ML Algorithms

Presenters: Qirong Ho and Eric P. Xing

 

Summary:

Many Big Data practitioners are familiar with classical Machine Learning techniques such as Naive Bayes, Decision Trees, K-means, PCA, and Collaborative Filtering (to name but a few), and with their implementations on popular Big Data systems such as Hadoop. Going beyond these classic techniques, a new generation of ML algorithms (for example, topic models, nonparametric Bayesian models, deep neural networks, and sparse regression) has been gaining popularity in both academia and industry, because these algorithms improve performance on existing tasks like recommendation and prediction, or even enable completely new ones such as topical visualization and image object detection. Initially, these algorithms were the exclusive privilege of large companies with the engineering resources to build their own cluster implementations from scratch. Today, however, new open-source software platforms, such as GraphLab, Petuum and Spark, have democratized some or all of these advanced algorithms, putting them within reach of individual researchers and data analysts who do not mind getting their hands a little dirty. In this tutorial, you will learn about these emerging ML algorithms, the software platforms that can run them today, the ML-centric theory, principles and design of an ideal parallel ML system and how today’s platforms fit that ideal, and the open research opportunities that have sprouted in this space between advanced ML and distributed systems.

 

 

Content:

Advanced, emerging ML algorithms:

     e.g. Deep Neural Networks, topic models, sparse regression

Open source software platforms that can run some or all of these algorithms at scale:

     e.g. GraphLab, Petuum and Spark

Principles, design and theory of an algorithmic and systems interface to BigML

     Pros and cons of each platform: when should you favor one over the other

Research opportunities in the space between advanced ML and distributed systems


TUTORIAL 3: Large-scale Heterogeneous Learning in Big Data Analytics

Presenter: Jun Huan

 

Summary:

Heterogeneous learning deals with data from complex real-world applications such as social networks, biological networks, and the Internet of Things, among others. Heterogeneity can be found in multi-task learning, multi-view learning, multi-label learning, and multi-instance learning. In this tutorial, we will present recent progress by our group and others in designing and implementing large-scale heterogeneous learning algorithms, including multi-task learning, multi-view learning, and transfer learning algorithms. Applications of this work in social network analysis and bioinformatics will also be discussed.

 

Content:

We cover recent progress in the following areas:

Multi-task learning (MTL) aims to train multiple related learning tasks together to reduce generalization error. MTL has been widely used in many application domains, including bioinformatics, social network analysis, and image processing, among others.

Multi-view learning (MVL) aims to learn a model from data collected from different sources (a.k.a. views). There is an intense discussion on how, and to what extent, multiple views may help.

Multi-label learning (MLL) aims to build classifiers that assign multiple labels to an instance. It has wide applications in image annotation, recommender systems, and other areas.

 

We cover the theoretical foundations of MTL/MVL/MLL algorithms using penalized maximum likelihood estimation, Bayesian MTL, and Gaussian processes. We also cover related algorithms such as MTL with known task relationships, multi-task & multi-view learning, and learning with structured input and output. We also discuss an important but less investigated area: scaling these learning algorithms to large-scale data. We plan to cover a few platforms that are suitable for supporting large-scale heterogeneous learning. Applications of heterogeneous learning in bioinformatics, health care informatics, drug discovery, and social network analysis will be reviewed.

 

 

Short Bio.  

 

Dr. Jun (Luke) Huan's Profile

Dr. Jun (Luke) Huan is a Professor in the Department of Electrical Engineering and Computer Science at the University of Kansas. He directs the Bioinformatics and Computational Life Sciences Laboratory at the KU Information and Telecommunication Technology Center (ITTC) and the Cheminformatics core at the KU Specialized Chemistry Center, funded by NIH. He holds courtesy appointments at the KU Bioinformatics Center and the KU Bioengineering Program, an adjunct professorship from the Department of Internal Medicine in the KU Medical School, and a visiting professorship from GlaxoSmithKline plc. Dr. Huan received his Ph.D. in Computer Science from the University of North Carolina.

 

Dr. Huan works on data science, machine learning, data mining, big data, and interdisciplinary topics including bioinformatics. He has published more than 80 peer-reviewed papers in leading conferences and journals and has graduated more than ten graduate students, including six PhDs. Dr. Huan serves on the editorial boards of several international journals, including the Springer Journal of Big Data, Elsevier's Big Data Research, and the International Journal of Data Mining and Bioinformatics. He regularly serves on the program committees of top-tier international conferences on machine learning, data mining, big data, and bioinformatics.

 

Dr. Huan's research is recognized internationally. He was a recipient of the prestigious National Science Foundation Faculty Early Career Development Award in 2009. His group won the Best Student Paper Award at the IEEE International Conference on Data Mining in 2011 and the Best Paper Award (runner-up) at the ACM International Conference on Information and Knowledge Management in 2009. His work has appeared in mass media including Science Daily, R&D Magazine, and EurekAlert (sponsored by AAAS). Dr. Huan's research has been supported by NSF, NIH, DoD, and the University of Kansas.

TUTORIAL 4: Big Data Benchmarking

Presenters: Chaitan Baru and Tilmann Rabl

 

Summary:

This tutorial will introduce the audience to the broad set of issues involved in defining big data benchmarks, with the aim of creating auditable industry-standard benchmarks that consider performance as well as price/performance. Big data benchmarks must capture the essential characteristics of big data applications and systems, including heterogeneous data, e.g., structured, semi-structured, unstructured, graphs, and streams; large-scale and evolving system configurations; varying system loads; processing pipelines that progressively transform data; and workloads that include queries as well as data mining and machine learning operations and algorithms. Different benchmarking approaches will be introduced, from micro-benchmarks to application-level benchmarking.

 

Since May 2012, five workshops on Big Data Benchmarking have been held, with participation from industry and academia. One of the outcomes of these meetings has been the creation of the industry’s first big data benchmark, viz., TPCx-HS, the Transaction Processing Performance Council’s benchmark for Hadoop Systems. During these workshops, a number of other proposals have been put forward for more comprehensive big data benchmarking. The tutorial will present and discuss salient points and essential features of such benchmarks that have been identified in these meetings by experts in big data as well as benchmarking. Two key approaches are now being pursued. One, called BigBench, is based on extending the TPC Decision Support (TPC-DS) benchmark with big data application characteristics. The other, called the Deep Analytics Pipeline, is based on modeling the processing that is routinely encountered in real-life big data applications. Both will be discussed.

 

We conclude with a discussion of a number of future directions for big data benchmarking.

 

Content:

Introduction

     Introduction to benchmarking: What are TPC and SPEC; what is each organization’s role and approach to benchmarking.

     Characteristics of good industry standard benchmarks: Why has TPC-C lasted so long? Brief overview of TPCx-HS.

     Overview of big data benchmarking approaches: From micro-benchmarks to application-level pipelines.

     Application scenarios and use cases: Big data scenarios and use cases that help define the application-level benchmark.

BigBench: In-depth discussion of an example of a proposed big data benchmark

     Data generation: Synthetic data generation for big data.

     The Benchmarking process: Steps involved in setting up, executing, and verifying end-to-end benchmarks.

     Benchmark metrics: Existing metrics for industry standards, and appropriate metrics for big data benchmarks.

     Discussion of performance results on a small 6-node cluster at Intel and a large, 540-node cluster at Pivotal.

     Possible extensions to BigBench

Benchmarking challenges and future directions

     Modeling system failures in the benchmark; extrapolating from one scale factor to the next; benchmarking for new application scenarios, e.g. the Internet of Things.

Q&A.

 

 

Short Bio.  

 

Dr. Chaitan Baru's and Dr. Tilmann Rabl's Profiles

The tutorial presenters are Dr. Chaitan Baru from the San Diego Supercomputer Center, UC San Diego, and Dr. Tilmann Rabl, from the Middleware Systems Research Group, University of Toronto.

 

Dr. Baru and Dr. Rabl have been collaborating since 2012 on the topic of big data benchmarking. They were both instrumental in starting the Workshops on Big Data Benchmarking series, and serve on the Steering Committee for these workshops. Five workshops have been held so far: May 2012 (San Jose), December 2012 (Pune, India), July 2013 (Xi’an, China), October 2013 (San Jose), and August 2014 (Potsdam, Germany). They are co-editors of the Springer Verlag Lecture Notes in Computer Science series on Specifying Big Data Benchmarks. They have also co-authored three papers:

Big Data Benchmarking and the BigData Top100 List, C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl, Big Data Journal, Vol. 1, No. 1, Mary Ann Liebert Inc. Publishers, http://online.liebertpub.com/toc/big/1/1.

Setting the Direction for Big Data Benchmark Standards, C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl, TPC Technical Conference, VLDB 2012, Aug 27-30, Istanbul, Turkey. http://link.springer.com/chapter/10.1007/978-3-642-36727-4_14.

Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data, Baru, Bhandarkar, Curino, Danisch, Frank, Gowda, Jacobsen, Jie, Kumar, Nambiar, Poess, Raab, Rabl, Ravi, Sachs, Sen, Yi, Youn, Proceedings of the TPC Technical Conference, VLDB 2014, September, Hangzhou, China.

 

Furthermore, Dr. Rabl is the Chair of the recently formed SPEC Research Group on Big Data Benchmarking, and Dr. Baru is co-Chair. Thus, even though the tutorial instructors are from different institutions, they have worked together closely for several years, and have a continuing working relationship.


Panel with Program Directors: Big Data Challenges and Opportunities

 

Panelists:

1) Dr. Chaitanya Baru (NSF)

2) Dr. Yuan Liu (NIH)

3) Dr. David Kuehn (DoT)

4) Dr. Tsengdar Lee (NASA)

5) Dr. Sudarsan Rachuri (NIST)

6) Mr. Matti Vakkuri (DIGILE)

 

 

Bios of Panelists

Dr. Chaitanya Baru (NSF)

Chaitan Baru currently serves as Senior Advisor for Data Science in the CISE Directorate at the National Science Foundation. He is on assignment from the San Diego Supercomputer Center, UC San Diego, where he is Distinguished Scientist and Associate Director of Data Initiatives. He has served in leadership positions in a number of national-scale cyberinfrastructure R&D initiatives across a wide range of science and engineering disciplines, including earth science, ecology, earthquake engineering, and biomedical informatics. In 2012, he initiated an industry-academia effort to define big data benchmarks via the Workshops for Big Data Benchmarking (WBDB). This has resulted in the recent formation of the SPEC Research Group on Big Data Benchmarking, which he co-chairs. He is co-editor of the Lecture Notes in Computer Science series entitled Specifying Big Data Benchmarks, published by Springer Verlag. He co-chairs the National Institute of Standards and Technology’s (NIST) Public Working Group on Big Data. He is a member of the teaching faculty for the Masters in Advanced Studies program in Data Science and Engineering (MAS-DSE) in the Computer Science Department at UC San Diego.

Baru has co-edited the book, Geoinformatics: Cyberinfrastructure for the Solid Earth Sciences with Prof. Randy Keller, University of Oklahoma, published by Cambridge University Press (ISBN: 9780521897150).

Baru has a B.Tech (Electronics Engineering) from IIT Madras and an M.E. and Ph.D. (Electrical Engineering) from the University of Florida.

 

Dr. Yuan Liu (NIH)

Dr. Yuan Liu is the Chief of the Office of International Activities, and the Director of Computational Neuroscience and Neuroinformatics Program at the National Institute of Neurological Disorders and Stroke (NINDS), National Institutes of Health (NIH).

Dr. Liu leads the NINDS’ international activities, which focus on fostering international and global health research, training and collaborations. She also oversees the computational neuroscience and neuroinformatics program, which promotes collaborations between experimental, computational and informatics neuroscientists to advance the understanding of nervous system structure and function, and the mechanisms underlying nervous system disorders.

In addition, Dr. Liu has been serving as the NINDS representative on over 30 international, interagency, and trans-NIH committees and working groups that develop international, computational biology, and bioinformatics related programs, including many interagency initiatives (e.g., CRCNS, IMAG, and BigData), trans-NIH programs (e.g., Roadmap, BISTI, and BD2K), and Blueprint for Neuroscience activities (e.g., NIF, NITRC, and Human Connectome Project). For her achievements and contributions, she has received several NIH Director’s Awards and NIH Blueprint Neuroscience Research Directors Awards.

Dr. Liu received her bachelor's and master's degrees in neurophysiology from Peking University in P.R. China, and her Ph.D. in neuroscience, under the mentorship of Prof. John G. Nicholls, from the Biozentrum, Universität Basel in Switzerland. Her research career focused on neurophysiology at the single-channel, synaptic, and systems levels. Between 1999 and 2004, she managed the research portfolio centered on channels, synapses, and circuits grants at NINDS. Prior to joining the NINDS, Dr. Liu was Program Director for Basic Neuroscience Research at

 

Dr. David Kuehn (DoT)

David Kuehn is the Program Manager for the Federal Highway Administration (FHWA) Exploratory Advanced Research Program. The Program Manager serves as the senior advisor to agency leadership on the communication and coordination of exploratory advanced research activities and fosters partnerships with other Federal agencies, national scientific societies and organizations, and the academic community in support of the Program. The program focuses on longer-term and higher-risk research with the potential for transformational improvements to the transportation system. David entered federal service as a Presidential Management Fellow. Before working at the federal level, David worked in local government and as a consultant in southern California. He holds a Master of Public Administration from the University of Southern California and a B.A. from the University of California, Irvine, and is a member of the American Institute of Certified Planners (AICP).

 

Dr. Tsengdar Lee (NASA)

Dr. Tsengdar Lee manages the High-End Computing Program from NASA Headquarters. He is responsible for maintaining the high-end computing capability to support the agency's aeronautics research, human exploration, scientific discovery, and space operations missions. Lee is also the manager of the NASA Weather Data Analysis Program, focusing on the transition of research results into the operational forecast centers and the acceleration of operational use of research data. Two major activities include the multi-agency Joint Center for Satellite Data Assimilation and the Short-term Prediction Research and Transition Center.

In 2011, Lee served as Acting Chief Technology Officer (CTO) for Information Technology (IT) in the NASA Office of the Chief Information Officer. In this capacity, Lee funded agency-wide IT research and advanced prototyping and created NASA's IT Labs. He also chaired the CTO-IT Council.

Lee joined NASA in 2001 as the High-End Computing Program Manager for the Earth Science Enterprise. He was responsible for Earth science computational modeling needs, primarily weather and climate modeling. Between 2002 and 2006, Lee also managed the Earth Science Global Modeling Program, funding research on global climate change, weather forecasting, and hurricane prediction.

Prior to 2001, Lee held positions as Senior Technical Advisor with Northrop Grumman Information Technology and Senior Staff Engineer with Litton PRC. He worked on the Advanced Weather Information Processing System (AWIPS) project for the National Weather Service, where he was responsible for the rapid development, integration, and commercialization of the AWIPS client-server system. Lee was also a principal engineer on the effort to develop the AWIPS network monitoring and control system.

During his short tenure with the Science Applications International Corporation between 1994 and 1996, he was a Research Scientist working on the dispersion of bio-chemical agents. Lee received two graduate degrees from Colorado State University: an MS in Civil Engineering in 1988 and a PhD in Atmospheric Science in 1992. Trained as a short-term weather modeler, he focused on integrating weather and ancillary geographical information data into weather models to produce reliable forecasts. His research pioneered the modeling of land surface hydrology's impact on weather forecasting.

 

Dr. Sudarsan Rachuri (NIST)

Dr. Sudarsan Rachuri is the Program Manager for the Smart Manufacturing Design and Analysis program at NIST. Prior to joining NIST, he was a research professor at George Washington University. His primary research objectives are to develop, and transfer to industry, knowledge about information models for sustainable and smart manufacturing, green products, big data analytics for manufacturing, system-level analysis, and knowledge representation. He focuses specifically on identifying the integration and technology issues that promote industry acceptance of information models and standards that will enable designers to develop products that are sustainable and manufactured using smart technologies in a distributed and collaborative environment. Dr. Rachuri's primary areas of interest are smart and sustainable manufacturing, scientific computing, CAD/CAM/CAE, design for sustainability, data analytics, object-oriented modeling, and ontology.

Dr. Rachuri is an ASME Fellow, having been elected in 2012 for his significant contributions in the areas of information and semantic modeling of product life cycle management, and the application of measurement science for sustainable manufacturing.

 

Mr. Matti Vakkuri (DIGILE)

Mr. Matti Vakkuri, Program Director, Big Data, Tieto, and Focus Area Director, DIGILE's Data-to-Intelligence Programme

Mr. Matti Vakkuri graduated from the Finnish Military Academy in 1993. He has 20 years of experience in management, leadership, business development, security, quality, human resource management, project management, program management, offering development, crisis management, and consulting in both the governmental and private sectors.

In his current position at Tieto, his tasks are to harness the power of Big Data and its enormous impact on customers' businesses; develop and ramp up the offering, sales, and delivery capabilities; build competences in Big Data, Hadoop, and data science; ensure cross-organizational collaboration and networking; and evaluate partners, suppliers, and competitors in the Big Data market. His tasks also include advocating for Big Data's possibilities, internally and externally, in business operations, research, and product development.

Since April 2013, in addition to his job at Tieto, he has held a part-time position as Focus Area Director (the Head of the Program) for DIGILE's Data to Intelligence research program. The program focuses on Big Data, data reserves, and user-centric service development. Its aim is to develop, together with companies and research institutions, intelligent tools and methods for managing, refining, and utilizing diverse data. The results of the program enable innovative business models and services. One of the program's targets is to develop methods for Big Data analytics that handle complexity through the fusion of heterogeneous data sources, with adaptivity, context-sensitivity, scalability, and user relevance as the main methodological objectives.

Since January 2014, he has been a full member of the Big Data working group of Finland's Ministry of Transport and Communications, which wrote the draft of Finland's national Big Data strategy in June 2014.

His motto is "Management by leadership".


Last update: 22 Oct. 2014