IEEE BigData 2013 Program Schedule

Hyatt Regency Santa Clara
CA, USA
Oct 6-9, 2013

Program

 

 October 5, 2013
 October 6, 2013
 October 7, 2013

 October 8, 2013
 October 9, 2013

 


 

Keynote Lecture: 60 minutes (about 45 minutes for talk and 15 minutes for Q and A)
Main conference regular paper: 25 minutes (about 20 minutes for talk and 5 minutes for Q and A)
Main conference short paper: 20 minutes (about 16 minutes for talk and 4 minutes for Q and A)

 


 

5-Oct

 

17:00-20:00

Registration: Ballroom E Foyer

 

 

6-Oct

 

07:30-18:00

Registration: Hotel Lobby West

Venue:

Ballroom AB (Ba-AB), Ballroom C (Ba-C), Ballroom D (Ba-D), Ballroom E (Ba-E),

Ballroom F (Ba-F), Ballroom G (Ba-G), Ballroom H (Ba-H)

08:30-12:10

Workshop

5

 

The First Workshop on Big Data Visualization

Workshop

6

Big Data and Science: Infrastructure and Services

Workshop

7

 

Scalable Machine Learning: Theory and Applications

  Workshop

8

 

Big Data in Bioinformatics and Health Informatics

Workshop

12

 

Knowledge management and Big Data Analytics

Workshop

9

 

Scholarly Big Data: Challenges & issues

Tutorial 1

 

 

Online Learning for Big Data Analytics  (8-10am)

 

 

Tutorial 2

 

 

Large-Scale Click-stream and transaction log mining in practice

(10:20am-12:20pm)

Session Chairs

Kwan-Liu Ma

Shane Canon

Haiqin Yang

  Juan Huan et al.

Qing Liu

  Ingemar J. Cox

 

 

Venue:

Ba-AB

 

Ba-C

Ba-D

 

Ba-E

 

Ba-F

Ba-G

Ba-H

Ba-H

 

Coffee break:     10:00-10:20   Foyer

12:10-13:30

Lunch on your own

13:30-18:00

Workshop

5

 

The First Workshop on Big Data Visualization

Workshop

6

 

Big Data and Science: Infrastructure and Services

Workshop

7

 

Scalable Machine Learning: Theory and Applications

Workshop

8

 

Big Data in Bioinformatics and Health Informatics

Workshop

12

 

Knowledge management and Big Data Analytics

Workshop

9

 

Scholarly Big Data: Challenges & issues

Workshop 10

 

Scalable Cloud Data Management

 

   

 

 

 

 

 

 

 

 

 

Session Chairs

Kwan-Liu Ma

Shane Canon

Haiqin Yang

Juan Huan et al.

Qing Liu

Ingemar J. Cox

Norbert Ritter

        

Venue:

Ba-AB

Ba-C

Ba-D

Ba-E

 

Ba-F

Ba-G

Ba-H

 

 

Coffee break:     15:40-16:00   Foyer

 

7-Oct

 

07:30-18:00

 

Registration: Hotel Lobby West

Venue:

Ballroom AB (Ba-AB), Ballroom C (Ba-C), Ballroom D (Ba-D), Ballroom E (Ba-E), Ballroom F (Ba-F)

08:10-08:25

Opening and Welcoming Speech

Conference Co-Chairs:

T.Y. Lin, Vijay Raghavan, Benjamin Wah

Program Co-Chairs:

Ricardo Baeza-Yates, Geoffrey Fox, Cyrus Shahabi, Matthew Smith, Qiang Yang

Industry Program Co-Chairs:

Rayid Ghani, Wei Han, Ronny Lempel, Raghunath Nambiar

BigData Steering Committee Chair:  

Xiaohua Tony Hu (Drexel University)

Venue:

Ba-AB

08:25-09:25

Session Chair:  Geoffrey Fox

Keynote Lecture 1:  The Berkeley Data Analytics Stack: Present and Future

 

Prof. Mike Franklin, AMP Lab, UC Berkeley, USA

Venue:

Ba-AB

09:25-09:45

Coffee Break: Foyer

Poster session setup: Ballroom Foyer

09:45-12:00

Session AB1

 

Algorithms and Systems for Big Data Search

 

Session C1

 

Cloud/Grid/Stream Computing for Big Data

Session D1

 

Complex Big Data Applications

Workshop 1

 

Distributed Storage Systems and Coding for Bigdata

Workshop 3

 

Workshop on Big Data and Society

Session Chair

Umit Catalyurek

Natasha Balac

Qunzhi Zhou

Hui Li et al.

Yike Guo et al.

 

Venue

Ba-AB

Ba-C

Ba-D

Ba-E

Ba-F

12:00-13:20

Lunch on your own

Poster session setup: Ballroom Foyer

13:20-15:20

Session AB2

 

Algorithms and Systems for Big Data Search

 

Session C2

 

High Performance/ Parallel Computing Platforms for Big Data

 

Session D2

 

Complex Big Data Applications

Workshop 1

 

Distributed Storage Systems and Coding for Bigdata

Workshop 3

 

Workshop on Big Data and Society

Session Chair

Michael Goodrich

Eugen Feller

Saumyadipta Pyne

Hui Li et al.

Yike Guo et al.

 

Venue:

Ba-AB

Ba-C

Ba-D

Ba-E

Ba-F

15:20-15:40

Coffee Break

 

 

 

 

15:40-17:40

Session AB3

 

Big Data Search Architectures, Scalability and Efficiency

 

Session C3

 

High Performance/ Parallel Computing Platforms for Big Data

Session D3

 

Complex Big Data Applications

Workshop 1

 

Distributed Storage Systems and Coding for Bigdata

Workshop 3

 

Workshop on Big Data and Society

Session Chair

Peter Sanders

Toshimori Honjo

En-hui Yang

Hui Li et al.

Yike Guo et al.

 

Venue:

Ba-AB

Ba-C

Ba-D

Ba-E

Ba-F

18:30-20:30

Banquet:

Santa Clara Ballroom

 

 

 

 

 

 

 

 

8-Oct

 

08:00-18:00

Registration: Hotel Lobby West

Venue:

Ballroom AB (Ba-AB), Ballroom C (Ba-C), Ballroom D (Ba-D), Ballroom E (Ba-E), Ballroom F (Ba-F)

8:30-9:30

Session Chair:   Cyrus Shahabi

Keynote Lecture 2:  Using Crowdsourcing for Data Analytics

Prof. Hector Garcia-Molina, Stanford University, USA

Venue:

Ba-AB

9:30-9:50

Coffee Break: Ballroom Foyer

9:50-12:00

Session AB4

 

Large-scale Recommendation Systems and Social Media Systems

 

Session C4

 

Energy-efficient Computing for Big Data

Session D4

 

Data Preservation, Information Integration and Heterogeneous and Multi-structured Data Integration

 

Workshop 2

 

Big Data and the Humanities

Workshop 4

 

BPOE 2013

Session Chair

Noriaki Kawamae

Leonardo Bautista

Yong Chen

Mark Hedges et al.

Jianfeng Zhan et al.

Venue:

Ba-AB

Ba-C

Ba-D

Ba-E

Ba-F

12:00-13:20

Lunch provided by conference: TERRA COURTYARD

13:20-15:20

Session AB5

 

Link and Graph Mining

Session C5

 

New Computational Models for Big Data

Session D5

 

Spatiotemporal and Stream Data Management, Scientific Data Management

 

Workshop 2

 

Big Data and the Humanities

Workshop 4

 

BPOE 2013

Session Chair:

Qi Liao

Denis Shestakov

Frank Dehne

Mark Hedges et al.

Jianfeng Zhan et al.

Venue:

Ba-AB

Ba-C

Ba-D

Ba-E

Ba-F

15:20-15:40

Coffee Break: Ballroom Foyer

15:40-17:40

Session AB6

 

Link and Graph Mining, Mobility and Big Data

 

Session C6

 

Novel Theoretical Models for Big Data

Session D6

 

Scientific Data Management

Workshop 2

 

Big Data and the Humanities

Workshop 4

 

BPOE 2013

Session Chair

Abhirup Chakraborty

Weijia Xu

Andreas Rauber

Mark Hedges et al.

Jianfeng Zhan et al.

Venue:

Ba-AB

Ba-C

Ba-D

Ba-E

Ba-F

 

9-Oct

 

08:00-15:00

Registration:

Venue:

Ballroom AB (Ba-AB), Ballroom C (Ba-C), Ballroom D (Ba-D), Ballroom E (Ba-E), Ballroom F (Ba-F)

08:30-09:30

Session Chair: T.Y. Lin

Keynote Lecture 3:  Security – A Big Question for Big Data

 

Prof. Roger Schell, University of Southern California, USA

 

Venue:

Ba-AB

09:30-09:50

Coffee Break: Ballroom Foyer

9:50-12:00

Key Issues in Big Data Research Panel

Session E1

 

Industry and Government Program

Session D7

 

Database Management Challenges: Architecture, Storage, User Interfaces

 

Workshop 11

 

Big Data and Smarter Cities

Session E2

 

Industry and Government Program

Session Chair

T.Y. Lin

Avigdor Gal

Mihajlo Grbovic

Sambit Sahu

Nikos Papailiou

Venue:

Ba-AB

Ba-C

Ba-D

Ba-E

Ba-F

12:00-13:25

Lunch provided by conference: TERRA COURTYARD

13:30-14:30

Session Chair:  Raghunath Nambiar

Keynote Lecture 4:  Key Usage Patterns for Apache Hadoop in the Enterprise

Dr. Amr Awadallah, CTO, Cloudera, USA

Venue:

Ba-AB

 

 

 

 

 

Workshop 11

Big Data and Smarter Cities

Session AB7

Privacy Preserving Big Data Collection/Analytics, Threat Detection using Big Data Analytics

Session Chair

Sambit Sahu

Simon Chan

Venue:

Ba-C

Ba-F

14:30-14:50 

Coffee Break:

Ballroom Foyer

 

 

 

14:40-16:50

 

 

 

 

 

Big Data Funding Program Panel: Challenges and Opportunities

Workshop 11

Big Data and Smarter Cities

Session Chair

Vijay Raghavan

Sambit Sahu

Venue:

Ba-AB

Ba-C

 

 

 

 

 

 

 

 

 

I Keynote Lectures: 4


Keynote 1:

 

Title:  The Berkeley Data Analytics Stack: Present and Future

 

Speaker:

Prof. Mike Franklin, UC Berkeley, USA

 

Abstract:

The Berkeley AMPLab was founded on the idea that the challenges of emerging Big Data applications require a new approach to analytics systems. Launching in early 2011, the project set out to rethink the traditional analytics stack, breaking down technical and intellectual barriers that had arisen during decades of evolutionary development. The vision of the lab is to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (such as machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and en masse, as with crowdsourced human computation). To pursue this goal, we assembled a research team with diverse interests across computer science, forged relationships with domain experts on campus and elsewhere, and obtained the support of leading industry partners and major government sponsors. The lab is realizing its ideas through the development of a freely-available Open Source software stack called BDAS: the Berkeley Data Analytics Stack. In the nearly three years the lab has been in operation, we've released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS shows up prominently in many industry discussions of the future of the Big Data analytics ecosystem - a rare degree of impact for an ongoing academic project. Given this initial success, the lab is continuing on its research path, moving "up the stack" to better integrate and support deep machine learning and to make people a full-fledged resource for making sense of Big Data.

In this talk, I'll first outline the motivation and insights behind our research approach and describe how we have organized to address the cross-disciplinary nature of Big Data challenges. I will then describe the current state of BDAS with an emphasis on the key components listed above and will address our current efforts on machine learning scalability and ease of use, and hybrid human/computer processing. Finally I will present our current views of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost and quality requirements throughout the analytics lifecycle.

 

Short Bio:

Michael Franklin is the Thomas M. Siebel Professor of Computer Science at UC Berkeley, where he also serves as Director of the Algorithms, Machines and People Lab (AMPLab). The Berkeley AMPLab is a collaboration of over 60 researchers supported by Founding Sponsors Amazon Web Services, Google, and SAP, along with 17 other leading companies, the DARPA XData program, and an NSF Expeditions in Computing award. The latter was announced as part of the Obama Administration's Big Data research initiative in 2012. His research interests include large-scale data management and analytics, data integration, and hybrid human/computer data processing systems. He was founder and CTO of Truviso, a real-time data analytics company acquired by Cisco Systems in 2012. He is an ACM Fellow and two-time winner of the ACM SIGMOD Test of Time Award (2013 and 2004). He also recently received the Best Paper awards at ICDE 2013 and NSDI 2012, a "Best of VLDB 2012" selection, Best Demo awards at SIGMOD 2012 and VLDB 2011 and the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. He is a committee member on the U.S. National Academy of Sciences study on Analysis of Massive Data and a Transportation Research Board committee on long-term data stewardship. Prof. Franklin received his Ph.D. in Computer Science from the University of Wisconsin-Madison in 1993.

 

Keynote 2:

Title: Using Crowdsourcing for Data Analytics

Speaker:

Prof. Hector Garcia-Molina, Stanford University, USA

 

Abstract:

It may sound contradictory to use humans to analyze big data, since humans cannot process huge amounts of data, may be error prone and are relatively slow. However, humans can do certain tasks much better than machines, e.g., tasks that involve image analysis or natural language.

In this talk I will discuss how humans can be judiciously used to improve data analytics by cleansing, clustering and filtering critical data. I will also briefly describe ongoing work at our Stanford InfoLab in this area.

Short Bio:

Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in the Departments of Computer Science and Electrical Engineering at Stanford University, Stanford, California. He was the chairman of the Computer Science Department from January 2001 to December 2004. From 1997 to 2001 he was a member of the President's Information Technology Advisory Committee (PITAC). From 1979 to 1991 he was on the faculty of the Computer Science Department at Princeton University, Princeton, New Jersey. He received a BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico, in 1974. From Stanford University, Stanford, California, he received in 1975 an MS in electrical engineering and a PhD in computer science in 1979. He holds an honorary PhD from ETH Zurich (2007). Garcia-Molina is a Fellow of the Association for Computing Machinery and of the American Academy of Arts and Sciences; is a member of the National Academy of Engineering; received the 1999 ACM SIGMOD Innovations Award; is a Venture Advisor for Onset Ventures, and is a member of the Board of Directors of Oracle.

 

 

Keynote 3:


Title: Security – A Big Question for Big Data

 

Speaker:

Prof. Roger Schell, University of Southern California, USA

 

Abstract:

Big data implies performing computation and database operations for massive amounts of data, remotely from the data owner’s enterprise. Since a key value proposition of big data is access to data from multiple and diverse domains, security and privacy will play a very important role in big data research and technology. The limitations of standard IT security practices are well-known, making the ability of attackers to use software subversion to insert malicious software into applications and operating systems a serious and growing threat whose adverse impact is intensified by big data. So, a big question is what security and privacy technology is adequate for controlled assured sharing for efficient direct access to big data. Making effective use of big data requires access from any domain to data in that domain, or any other domain it is authorized to access. Several decades of trusted systems developments have produced a rich set of proven concepts for verifiable protection to substantially cope with determined adversaries, but this technology has largely been marginalized as “overkill” and vendors do not widely offer it. This talk will discuss pivotal choices for big data to leverage this mature security and privacy technology, while identifying remaining research challenges.


Short Bio:

Dr. Roger R. Schell recently joined USC/ISI supporting their Masters of Cyber Security degree program. He is internationally recognized for originating several key modern security design and evaluation techniques, and he holds patents in cryptography, authentication and trusted workstation. For more than a decade he has been co-founder and President of Aesec Corporation, a start-up company providing verifiably secure platforms. Previously Dr. Schell was co-founder and vice president for Gemini Computers, Inc., where he directed development of their highly secure (what NSA called “Class A1”) commercial product, the Gemini Multiprocessing Secure Operating System (GEMSOS). He was also the founding Deputy Director of NSA’s National Computer Security Center. He has been referred to as the "father" of the Trusted Computer System Evaluation Criteria (the "Orange Book"). Dr. Schell is a retired USAF Colonel. He received a Ph.D. in Computer Science from MIT, an M.S.E.E. from Washington State, and a B.S.E.E. from Montana State. The NIST and NSA have recognized Dr. Schell with the National Computer System Security Award. In 2012 he was inducted into the inaugural class of the National Cyber Security Hall of Fame.

 

Keynote 4:

Title: Key Usage Patterns for Apache Hadoop in the Enterprise

Speaker:

Dr. Amr Awadallah, CTO, Cloudera, USA

 

Abstract:

Advances in computing capabilities are palpably evident throughout many industries manifest by unprecedented, large-scale data integration and inferencing. Branded as “big-data” in many cases, the question of whether such techniques can leverage advances in biomedicine and clinical practice are obvious. High-throughput clinical analytics, synthesizing genomic and clinical attributes of a particular patient, portends predictive models that can directly influence clinical care decisions. However, to make this widely shared vision practical and scalable, barriers attributable to data heterogeneity dominate. Methods and strategies to increase the comparability and consistency of healthcare related data will be discussed.


Short Bio:

Before co-founding Cloudera in 2008, Amr (@awadallah) was an Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel he served as Vice President of Product Intelligence Engineering at Yahoo!, and ran one of the very first organizations to use Hadoop for data analysis and business intelligence. Amr joined Yahoo after they acquired his first startup, VivaSmart, in July of 2000. Amr holds Bachelor’s and Master’s degrees in Electrical Engineering from Cairo University, Egypt, and a Doctorate in Electrical Engineering from Stanford University.

 

 

I Conference Paper Presentations

 

Session AB1: Algorithms and Systems for Big Data Search

Regular

BigD220 "4S: Scalable Subspace Search Scheme"
Hoang Vu Nguyen, Emmanuel Müller, and Klemens Böhm

Regular

BigD254 "Computing Betweenness Centrality in External Memory"
Lars Arge, Michael Goodrich, and Freek van Walderveen

Regular

BigD282 "NUMA-optimized Parallel Breadth-first Search on Multicore Single-node System"
Yuichiro Yasui, Katsuki Fujisawa, and Kazushige Goto

Short

BigD248 "A Distributed Tree Data Structure For Real-Time OLAP On Cloud Architectures"
Frank Dehne, Quan Kong, Andrew Rau-Chaplin, Hamidreza Zaboli, and Rebecca Zhou

Short

BigD323 "Group-Scheme: A Universal SIMD-based Compression Scheme"
Xudong Zhang, Xin Zhao, Dongdong Shan, and Hongfei Yan

 

Session C1: Cloud/Grid/Stream Computing for Big Data

Short

BigD287 "On the Performance and Energy Efficiency of Hadoop Deployment Models"
Eugen Feller, Lavanya Ramakrishnan, and Christine Morin

Short

BigD355 "Scalable and Robust Key Group Size Estimation For Reducer Load Balancing in MapReduce"
Wei Yan, Yuan Xue, and Bradley Malin

Short

BigD359 "Robot: An Efficient Model For Big Data Storage Systems Based On Erasure Coding"
Chao Yin, Jianzong Wang, Changsheng Xie, Jiguang Wan, and Changlin Long

Short

BigD450 "Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing"
Qunzhi Zhou, Yogesh Simmhan, and Viktor Prasanna

Short

BigD453 "DDSN: Duplicate Detection to Reduce Both Storage and Bandwidth Consumption"
Jiaran Zhang, Xiaohui Yu, and Liwei Lin

Short

BigD414 "An Infrastructure for Automating Large-scale Performance Studies and Data Processing"
Deepal Jayasinghe, Josh Kimball, Tao Zhu, Siddharth Choudhary, and Calton Pu

 

Session D1: Complex Big Data Applications 

Regular

BigD288 "The BTWorld Use Case for Big Data Analytics: Description, MapReduce Logical Workflow, and Empirical Evaluation"
Tim Hegeman, Bogdan Ghit, Mihai Capotă, Jan Hidders, Dick Epema, and Alexandru Iosup

Regular

BigD311 "Modeling Heterogeneous Time Series Dynamics to Profile Big Sensor Data in Complex Physical Systems"
Bin Liu

Regular

BigD332 "Efficiently Extracting Frequent Subgraphs using MapReduce"
Wei Lu, Gang Chen, Anthony Tung, and Feng Zhao

Regular

BigD342 "Opinion mining with word order"
Noriaki Kawamae

Short

BigD252 "HIG – An In-memory Database Platform Enabling Real-time Analyses of Genome Data"
Matthieu-P. Schapranow and Hasso Plattner

 

Session E1: Industry and Government Program

Regular

N203 "Terabyte-sized Image Computations on Hadoop Cluster Platforms"
Peter Bajcsy, Antoine Vandecreme, Julien Amelot, Phuong Nguyen, Joe Chalfoun, and Mary Brady

Regular

N206 "A Fast and Scalable Method for Threat Detection in Large-scale DNS Logs"
Ron Begleiter, Yuval Elovici, Yona Hollander, Ori Mendelson, Lior Rokach, and Roi Saltzman

Regular

N207 "Hourglass: a Library for Incremental Processing on Hadoop"
Matthew Hayes and Sam Shah

Regular

N209 "Correlation-based Performance Analysis for Full-System MapReduce Optimization"
Qi Guo, Yan Li, Tao Liu, Kun Wang, Guancheng Chen, Xiaoming Bao, and Wentao Tang

Regular

N217 "Large Scale Ad Latency Analysis"
Mihajlo Grbovic, Jon Malkin, and Hirakendu Das

 

Session AB2: Algorithms and Systems for Big Data Search

Regular

BigD315 "A Distributed Vertex-Centric Approach for Pattern Matching in Massive Graphs"
Arash Fard, M. Usman Nisar, Lakshmish Ramaswamy, John A. Miller, and Matthew Saltz

Regular

BigD318 "Fast Scalable Selection Algorithms for Large Scale Data"
Lee Thompson, Weijia Xu, and Daniel Miranker

Regular

BigD410 "Distributed Confidence-Weighted Classification on MapReduce"
Nemanja Djuric, Mihajlo Grbovic, and Slobodan Vucetic

Short

BigD350 "A Streaming Partitioning Approach to Processing Large Scale Distributed Graph Datasets"
Rui Wang and Kenneth Chiu

Short

BigD402 "Scaling Concurrency of Personalized Semantic Search over Large RDF Data"
Haizhou Fu, Hyeongsik Kim, and Kemafor Anyanwu

 

Session C2: High Performance/Parallel Computing Platforms for Big Data

Regular

BigD279 "HFSP: Size-based Scheduling for Hadoop"
Mario Pastorelli, Antonio Barbuzzi, Damiano Carra, Matteo Dell'Amico, and Pietro Michiardi

Regular

BigD314 "An Evaluation Study of BigData Frameworks for Graph Processing"
Benedikt Elser and Alberto Montresor

Short

BigD225 "Hardware acceleration of Hadoop MapReduce"
Toshimori Honjo and Kazuki Oikawa

Short

BigD339 "Algebraic Dataflows for Big Data Analysis"
Jonas Dias, Eduardo Ogasawara, Daniel de Oliveira, Fabio Porto, Patrick Valduriez, and Marta Mattoso

Short

BigD360 "Multilevel Active Storage for Big Data Applications in High Performance Computing"
Chao Chen and Yong Chen

 

Session D2: Complex Big Data Applications 

Regular

BigD341 "Explaining the Product Range Effect in Purchase Data"
Diego Pennacchioli, Michele Coscia, Salvatore Rinzivillo, Dino Pedreschi, and Fosca Giannotti

Regular

BigD372 "Parallel Deterministic Annealing Clustering and its Application to LC-MS Data Analysis"
Geoffrey Fox, D. R. Mani, and Saumyadipta Pyne

Regular

BigD378 "Terabyte-scale image similarity search: experience and best practice"
Diana Moise, Denis Shestakov, Gylfi Gudmundsson, and Laurent Amsaleg

Short

BigD266 "Real-time streaming mobility analytics"
Andras Garzo, Csaba Sidlo, Daniel Tahara, Erik Wyatt, and Andras Bencur

Short

BigD320 "QuPARA: Query-Driven Large-Scale Portfolio Aggregate Risk Analysis on MapReduce"
Andrew Rau-Chaplin, Blesson Varghese, Duane Wilson, Zhimin Yao, and Norbert Zeh

 

Session E2: Industry and Government Program

Regular

N218 "Accelerating semantic graph databases on commodity clusters"
Alessandro Morari, Vito Giovanni Castellana, Oreste Villa, David Haglin, John Feo, Jesse Weaver, and Antonino Tumeo

Regular

N219 "Practical Distributed Classification using the Alternating Direction Method of Multipliers Algorithm"
Peter Lubell-Doughtie and Jon Sondag

Regular

N225 "Scaling Deep Social Feeds at Pinterest"
Varun Sharma

Regular

N226 "Big Data Analytics on High Velocity Streams: A Case Study"
Thibaud Chardonnens, Philippe Cudre-Mauroux, Martin Grund, and Benoit Perroud

 

Session AB3: Big Data Search Architectures, Scalability and Efficiency

Regular

BigD260 "A Parallel Computing Platform for Training Large Scale Neural Networks"
Rong Gu, Furao Shen, and Yihua Huang

Regular

BigD330 "An NML-based Model Selection Criterion for General Relational Data Modeling"
Yoshiki Sakai and Kenji Yamanishi

Regular

BigD411 "Scalable Context-Aware Role Mining with MapReduce"
Zhiwei Yu, Raymond Wong, and Chi-Hung Chi

Short

BigD297 "Sparse Poisson Coding for High Dimensional Document Clustering"
Chenxia Wu, Haiqin Yang, Jianke Zhu, Jiemi Zhang, Irwin King, and Michael R. Lyu

Short

BigD465 "Parallel Subgroup Discovery on Computing Clusters -- First Results"
Daniel Trabold and Henrik Grosskreutz

 

Session C3: High Performance/Parallel Computing Platforms for Big Data

Regular

BigD331 "Storing and manipulating environmental big data with JASMIN"
Bryan Lawrence, Victoria Bennett, Jonathan Churchill, Martin Juckes, Philip Kershaw, Stephen Pascoe, Sam Pepler, Matt Pritchard, and Ag Stephens

Regular

BigD455 "Locality-driven High-level I/O Aggregation for Processing Scientific Datasets"
Jialin Liu, Bradly Crysler, and Yong Chen

Short

BigD363 "GPU Accelerated Item-Based Collaborative Filtering for Big-Data Applications"
Chandima Hewa Nadungodage, Yuni Xia, John Lee, Myungcheol Lee, and Choon Seo Park

Short

BigD427 "Kylin: An Efficient and Scalable Graph Data Processing System"
Li-Yung Ho, Tsung-Han Li, Jan-Jan Wu, and Pangfeng Liu

Short

BigD454 "A Reconfigurable Computing Architecture for Semantic Information Filtering"
Aalap Tripathy, Ka Chon Ieong, Atish Patra, and Rabi Mahapatra

 

Session D3: Complex Big Data Applications

Regular

BigD437 "Demand Response Targeting Using Big Data Analytics"
Jungsuk Kwac and Ram Rajagopal

Regular

BigD353 "Large-scale Predictive Analytics for Real Time Energy Management"
Natasha Balac, Tamara Sipes, Nicole Wolter, Kenneth Nunes, Robert Sinkovits, and Homa Karimabadi

Short

BigD405 "Constructing User Profiles from Social Media Data"
Mauricio Hernandez, Kirsten Hildrum, Prateek Jain, Chitra Venkatramani, Rohit Wagle, Bogdan Alexe, and Ioana Roxana Stanoi

Short

BigD431 "CloudRS: An Error Correction Algorithm of High-Throughput Sequencing Data based on Scalable Framework"
Chien-Chih Chen, Yu-Jung Chang, Wei-Chun Chung, Der-Tsai Lee, and Jan-Ming Ho

Short

BigD444 "Building dynamic thermal profiles of energy consumption for individuals and neighborhoods"
Adrian Albert and Ram Rajagopal

 

Session AB4: Large-scale Recommendation Systems and Social Media Systems

Regular

BigD211 "Continuous Hyperparameter Optimization for Large-scale Recommender Systems"
Simon Chan, Philip Treleaven, and Licia Capra

Regular

BigD334 "Parallel Matrix Factorization for Binary Response"
Rajiv Khanna, Liang Zhang, Deepak Agarwal, and Bee-Chung Chen

Regular

BigD400 "CallCab: A Unified Recommendation System for Carpooling and Regular Taxicab Services"
Desheng Zhang and Tian He

Short

BigD361 "Scalable Distributed Event Detection for Twitter"
Richard McCreadie, Craig Macdonald, Iadh Ounis, Miles Osborne, and Sasa Petrovic

Short

BigD233 "Massively Scalable Near Duplicate Detection in Streams of Documents using MDSH"
Paul Bogen, Christopher Symons, Amber McKenzie, Robert Patton, and Rob Gillen

 

Session C4: Energy-efficient Computing for Big Data

Regular

BigD345 "Efficient Gear-shifting for a Power-proportional Distributed Data-placement Method"
Hieu Hanh Le, Satoshi Hikida, and Haruo Yokota

Regular

BigD413 "Building a Generic Platform for Big Sensor Data Application"
Chun-Hsiang Lee, David Birch, Chao Wu, Dilshan Silva, Orestis Tsinalis, Yang Li, Shulin Yan, Moustafa Ghanem, and Yike Guo

Regular

BigD354 "Agrios: A Hybrid Approach to Big Array Analytics"
Patrick Leyshock, David Maier, and Kristin Tufte

Short

BigD215 "clusiVAT: A Mixed Visual/Numerical Clustering Algorithm for Big Data"
Dheeraj Kumar, James Bezdek, Sutharshan Rajasegarar, Marimuthu Palaniswami, Christopher Leckie, and Timothy Havens

Short

BigD298 "Feliss: Flexible distributed computing framework with light-weight checkpointing"
Takuya Araki, Kazuyo Narita, and Hiroshi Tamano

 

Session D4: Data Preservation, Information Integration and Heterogeneous and Multi-structured Data Integration

Regular

BigD253 "CORE: Cross-Object Redundancy for Efficient Data Repair in Storage Systems"
Kyumars Sheykh Esmaili, Lluis Pamies Juarez, and Anwitaman Datta

Regular

BigD217 "Iteration Aware Prefetching For Unstructured Grids"
Oyindamola Akande and Philip Rhodes

Short

BigD278 "Scalable Data Citation in Dynamic, Large Databases: Model and Reference Implementation"
Stefan Pröll and Andreas Rauber

Short

BigD344 "Self-Adaptive Event Recognition for Intelligent Transport Management"
Alexander Artikis, Matthias Weidlich, Avigdor Gal, Vana Kalogeraki, and Dimitrios Gunopoulos

Short

BigD375 "Robust Crowdsourced Learning"
Zhiquan Liu, Luo Luo, and Wu-Jun Li

 

Session AB5: Link and Graph Mining

Regular

BigD267 "Self-Tuned Kernel Spectral Clustering for Large Scale Networks"
Raghvendra Mall, Rocco Langone, and Johan Suykens

Regular

BigD403 "Top-K aggregation over a Large Graph Using Shared-Nothing Systems"
Abhirup Chakraborty

Short

BigD241 "Incremental Algorithms for Network Management and Analysis based on Closeness Centrality"
Ahmet Erdem Sariyuce, Kamer Kaya, Erik Saule, and Umit V. Catalyurek

Short

BigD247 "Classification of Big Velocity Data via Cross-Domain Canonical Correlation Analysis"
Bo Zhang and Zhongzhi Shi

Short

BigD212 "Elver: Recommending Facebook Pages in Cold Start Situation Without Content Features"
Yusheng Xie, Alok Choudhary, Zhengzhang Chen, and Ankit Agrawal

 

Session C5: New Computational Models for Big Data

Regular

BigD399 "Map-Based Graph Analysis on MapReduce"
Upa Gupta and Leonidas Fegaras

Regular

BigD216 "P-DOT: A Model of Computation for Big Data"
Tao Luo, Yin Liao, Yunquan Zhang, and Guoliang Chen

Short

BigD285 "Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor"
Mian Lu, Lei Zhang, Huynh Phung Huynh, Zhongliang Ong, Yun Liang, Bingsheng He, Rick Siow Mong Goh, and Richard Huynh

Short

BigD289 "Optimizing Throughput on Guaranteed-Bandwidth WAN Networks for the Large Synoptic Survey Telescope (LSST)"
Mike Freemon

Short

BigD390 "GPU-Accelerated Adaptive Compression Framework for Genomics Data"
Guixin Guo, Shuang Qiu, Mian Lu, BingQiang Wang, Lin Fang, and Simon See

 

Session D5: Spatiotemporal and Stream Data Management, Scientific Data Management

Regular

BigD423 "Spatio-temporal Indexing in Non-relational Distributed Databases"
Anthony Fox, Chris Eichelberger, James Hughes, and Skylar Lyon

Regular

BigD245 "Measuring Inter-Site Engagement"
Elad Yom-Tov, Mounia Lalmas, Ricardo Baeza-Yates, Georges Dupret, Janette Lehmann, and Pinar Donmez

Regular

BigD312 "Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures"
Austin Benson, David Gleich, and James Demmel

Short

BigD243 "Scientific Discovery through Weighted Sampling"
Lefteris Sidirourgos, Martin Kersten, and Peter Boncz

Short

BigD294 "On the Use of Shared Storage in Shared-Nothing Environments"
Krishnaraj Ravindranathan, Aleksander Khasymski, Guanying Wang, Ali Butt, and Gaurav Makkar

 

Session AB6: Link and Graph Mining, Mobility and Big Data

Short

BigD335 "Efficient Large Graph Pattern Mining for Big Data in the Cloud"
Chun-Chieh Chen, Kuan-Wei Lee, Chih-Chieh Chang, De-Nian Yang, and Ming-Syan Chen

Short

BigD417 "A Hypergraph-Partitioned Vertex Programming Approach for Large-scale Consensus Optimization"
Hui Miao, Xiangyang Liu, Bert Huang, and Lise Getoor

Short

BigD366 "Analysis of GSM calls data for understanding user mobility behavior"
Chiara Renso, Barbara Furletti, Lorenzo Gabrielli, and Salvatore Rinzivillo

Short

BigD448 "A Higher-Order Data Flow Model for Heterogeneous Big Data"
Simon Price and Peter Flach

Short

BigD284 "DL-MPI: Enabling Data Locality Computation for MPI-based Data-Intensive Applications"
Jiangling Yin, Andrew Foran, and Jun Wang

Short

BigD308 "Fast OLAP Query Execution in Main Memory on Large Data in a Cluster"
Martin Weidner, Jonathan Dees, and Peter Sanders

 

Session C6: Novel Theoretical Models for Big Data

Regular

BigD358 "Communication Efficient Algorithms for Fundamental Big Data Problems"
Peter Sanders, Ingo Müller, and Sebastian Schlag

Regular

BigD244 "On-Line Learning Gossip Algorithm in Multi-Agent Systems with Local Decision Rules"
Stephan Clemencon, Pascal Bianchi, Gemma Morral, and Jeremie Jakubowicz

Short

BigD229 "Transparent Composite Model For Large Scale Image/Video Processing"
Enhui Yang and Xiang Yu

Short

BigD319 "Elastic Algorithms for Guaranteeing Quality Monotonicity in Big Data Mining"
Rui Han, Lei Nie, Moustafa M. Ghanem, and Yike Guo

 

Session D6: Scientific Data Management

Regular

BigD338 "Adaptive File Management for Scientific Workflows on the Azure Cloud"
Radu Tudoran, Alexandru Costan, Ramin Rad Rezai, Goetz Brasche, and Gabriel Antoniu

Regular

BigD407 "Model-View Sensor Data Management in the Cloud"
Tian Guo, Thanasis G. Papaioannou, and Karl Aberer

Short

BigD373 "Using Pattern-Models to Guide SSD Deployment for Big Data in HPC systems"
Junjie Chen, Yong Chen, and Philip C. Roth

Short

BigD365 "Improving Floating Point Compression through Binary Masks"
Leonardo Bautista Gomez and Franck Cappello

Short

BigD445 "Segmented Analysis for Reducing Data Movement"
Jialin Liu, Surendra Byna, and Yong Chen

 

Session AB7: Privacy Preserving Big Data Collection/Analytics, Threat Detection using Big Data Analytics

Regular

BigD269 "DP-WHERE: Differentially Private Modeling of Human Mobility"
Darakhshan Mir, Sibren Isaacman, Ramón Cáceres, Margaret Martonosi, and Rebecca Wright

Regular

BigD305 "Malicious URLs Filtering - A Big Data Application"
Min-Sheng Lin, Chien-Yi Chiu, Yuh-Jye Lee, and Hsing-Kuo Pao

Regular

BigD328 "Zero-Knowledge Private Graph Summarization"
Maryam Shoaran, Alex Thomo, and Jens Weber

Short

BigD230 "Scalable Network Traffic Visualization Using Compressed Graphs"
Lei Shi, Qi Liao, and Xiaohua Sun

Short

BigD391 "Breaking the Arc: Risk Control for Big Data"
Duncan Hodges and Sadie Creese

 

Session D7: Database Management Challenges: Architecture, Storage, User Interfaces

Regular

BigD249 "A Selective Checkpointing Mechanism for Query Plans in a Parallel Database System"
Ting Chen and Kenjiro Taura

Regular

BigD270 "H2RDF+: High-performance Distributed Joins over Large-scale RDF Graphs"
Nikolaos Papailiou, Ioannis Konstantinou, Dimitrios Tsoumakos, Panagiotis Karras, and Nectarios Koziris

Short

BigD447 "Knowledge Cubes - A Proposal for Scalable and Semantically-Guided Management of Big Data"
Amgad Madkour, Walid Aref, and Saleh Basalamah

 

 

 

Workshops

 

Workshop 1: Distributed Storage Systems and Coding for Big Data

Paper List

1

S1209 "The Code Rebalancing Problem for a Storage-Flexible Data Center Network"
Iryna Andriyanova, Alan Jule and Emina Soljanin

2

S1211 "suvfs: A virtual file system in userspace that supports large files"
Wasim Ahmad Bhat and S.M.K. Quadri

3

S1213 "Reliability of Erasure Coded Storage Systems: A Geometric Approach"
Antonio Campello and Vinay Vaishampayan

4

S1210 "Distributed Storage Evaluation on a Three-Wide Inter-Data Center Deployment"
Yih-Farn Chen, Scott Daniels, Marios Hadjieleftheriou, Pingkai Liu, Chao Tian and Vinay Vaishampayan

5

S1201 "Paired-Replicas with Constant Repair Time: Loss Functions and Memorylessness"
Vinay Deolalikar

6

S1202 "Efficient Updates in Cross-Object Erasure-Coded Storage Systems"
Kyumars Sheykh Esmaili, Aatish Chiniah and Anwitaman Datta

7

S1208 "Construction of Exact-BASIC Codes for Distributed Storage Systems at the MSR Point"
Hanxu Hou, Kenneth W. Shum and Hui Li

8

S1205 "Minimum Storage BASIC Codes: A System Perspective"
Xianxia Huang, Hui Li, Tai Zhou, Yumeng Zhang, Han Guo, Hanxu Hou, Huayu Zhang, Kai Pan and Kai Lei

9

S1207 "Layout-Aware I/O Scheduling for Terabits Data Movement"
Youngjae Kim, Scott Atchley, Geoffroy R. Vallee and Galen M. Shipman

Schedule

Date

October 7, 2013

Location

Ballroom E

Time

Schedule

9:00-9:15

Plenary

9:15-10:00

Invited Talk

10:00-11:00

S1208 "Construction of Exact-BASIC Codes for Distributed Storage Systems at the MSR Point"
S1213 "Reliability of Erasure Coded Storage Systems: A Geometric Approach"
S1209 "The Code Rebalancing Problem for a Storage-Flexible Data Center Network"

11:00-11:20

Coffee Time

11:20-12:00

S1202 "Efficient Updates in Cross-Object Erasure-Coded Storage Systems"
S1211 "suvfs: A virtual file system in userspace that supports large files"

12:00-14:00

Lunch on your own

14:00-14:30

Invited Talk: S1205 "Minimum Storage BASIC Codes: A System Perspective"

14:30-15:30

S1210 "Distributed Storage Evaluation on a Three-Wide Inter-Data Center Deployment"
S1201 "Paired-Replicas with Constant Repair Time: Loss Functions and Memorylessness"
S1207 "Layout-Aware I/O Scheduling for Terabits Data Movement"

 

Workshop 2: Big Data and the Humanities

Paper List

1

S2203 "Robustness of emotion extraction from 20th century English books"
Alberto Acerbi, Vasileios Lampos and Alexander Bentley

2

S2228 "VisualPage: Towards Large Scale Analysis of Nineteenth-Century Print Culture"
Neal Audenaert and Natalie Houston

3

S2210 "Back to our Data – Experiments with NoSQL Technologies in the Humanities"
Tobias Blanke, Michael Bryant and Mark Hedges

4

S2234 "The Human Face of Crowdsourcing: A Citizen-led Crowdsourcing Case Study"
Sheryl Grant, Kristan Shawgo, Richard Marciano, Jeff Heard and Priscilla Ndiaye

5

S2219 "Visualization and Rhetoric: Key Concerns for Utilizing Big Data in Humanities"
Kathleen Kerr, Bernice Hausman, Samah Gad and Waqas Javed

6

S2224 "Humanities 'Big Data': Myths, challenges, and lessons"
Amalia S. Levi

7

S2229 "Digging into Human Rights Violations: Data Modeling Collective Memory"
Ben Miller, Ayush Shrestha, Jason Derby, Jennifer Olive, Fuxin Li, Yanjun Zhao and Karthikeyan Umapathy

8

S2231 "The Royal Birth of 2013: Analysing and Visualising Public Sentiment in the UK Using Twitter"
Vu Dung Nguyen, Blesson Varghese and Adam Barker

9

S2221 "Bibliographic Records as Humanities Big Data"
Andrew Prescott

10

S2209 "Customising Geoparsing and Georeferencing for Historical Texts"
C.J. Rupp, Paul Rayson, Alistair Baron, Christopher Donaldson, Ian Gregory, Andrew Hardie and Patricia Murrieta-Flores

11

S2208 "A Concept of Generic Workspace for Big Data Processing in Humanities"
Jedrzej Rybicki, Benedikt von St. Vieth and Daniel Mallmann

12

S2220 "From Assets to Stories via the Google Cultural Institute Platform"
William Seales, Steve Crossan, Mark Yoshitake and Sertan Girgin

13

S2223 "The Curious Identity of Michael Field and its Implications for Humanities Research with the Semantic Web"
Susan Brown and John Simpson

14

S2222 "Infectious Texts: Modeling Text Reuse in Nineteenth-Century Newspapers"
David Smith, Ryan Cordell and Elizabeth Maddock Dillon

15

S2204 "Mapping Mutable Genres in Structurally Complex Volumes"
Ted Underwood, Michael Black, Loretta Auvil and Boris Capitanu

16

S2214 "CKM: A Shared Visual Analytical Tool for Large-Scale Analysis of Audio-Video Interviews"
Lu Xiao, Yan Luo and Steven High

17

S2218 "A Case Study on Entity Resolution for Distant Processing of Big Humanities Data"
Weijia Xu, Maria Esteva, Jessica Trelogan and Todd Swinson

Schedule

Date

October 8, 2013

Location

Ballroom E

Time

Schedule

9:30-9:50

Coffee Time

9:50-12:00

S2222 "Infectious Texts: Modeling Text Reuse in Nineteenth-Century Newspapers"
S2204 "Mapping Mutable Genres in Structurally Complex Volumes"
S2231 "The Royal Birth of 2013: Analysing and Visualising Public Sentiment in the UK Using Twitter"
S2228 "VisualPage: Towards Large Scale Analysis of Nineteenth-Century Print Culture"
S2218 "A Case Study on Entity Resolution for Distant Processing of Big Humanities Data"
S2208 "A Concept of Generic Workspace for Big Data Processing in Humanities"

12:00-13:20

Lunch (not provided)

13:20-15:20

S2229 "Digging into Human Rights Violations: Data Modeling Collective Memory"
S2214 "CKM: A Shared Visual Analytical Tool for Large-Scale Analysis of Audio-Video Interviews"
S2219 "Visualization and Rhetoric: Key Concerns for Utilizing Big Data in Humanities"
S2221 "Bibliographic Records as Humanities Big Data"
S2210 "Back to our Data – Experiments with NoSQL Technologies in the Humanities"
S2209 "Customising Geoparsing and Georeferencing for Historical Texts"
S2234 "The Human Face of Crowdsourcing: A Citizen-led Crowdsourcing Case Study"
S2224 "Humanities 'Big Data': Myths, challenges, and lessons"
S2203 "Robustness of emotion extraction from 20th century English books"

15:20-15:40

Coffee Time

15:40-17:40

S2223 "The Curious Identity of Michael Field and its Implications for Humanities Research with the Semantic Web"
S2220 "From Assets to Stories via the Google Cultural Institute Platform"

 

Workshop 3: Workshop on Big Data and Society
       -- Data Economy, Real-Time Mining and Analytics, Mining Techniques for Online and Customer Service in the Big Data Era

Paper List

1

S6207 "Enterprise Pre-Sales Forums: A Preliminary Study of Metadata and Content"
Vinay Deolalikar

2

S6212 "Advancing value creation and value capture in data-intensive contexts"
Roman Ferrando-Llopis, David Lopez-Berzosa and Catherine Mulligan

3

S6203 "A Cloud Service for the Evaluation of Company's Financial Health Using XBRL-based Financial Statements"
Wen-Chiao Hsu, Jyun-Yao Huang, Chi-Hao Chen, Chien-Yu Su, Hsiao-Chen Shih, Tzu-Ya Liao and I-En Liao

4

S6209 "Real-Time Data Analysis in ClowdFlows"
Janez Kranjc, Vid Podpečan and Nada Lavrač

5

S6210 "ma3tch - privacy and knowledge - dynamic networked collective intelligence"
Udo Kroon

6

S6202 "Business Model Canvas Perspective on Big Data Applications"
Fatma Canan Pembe Muhtaroglu, Seniz Demir, Murat Obali and Canan Girgin

7

S6215 "Understanding the value of (Big) data"
Pantelis Koutroumpis and Aija Leiponen

8

S6214 "OpenFridge: A Platform for Data Economy for Energy Efficiency Data"
Slobodanka Dana Kathrin Tomic and Anna Fensel

9

S6201 "A Study of Innovation Network Database Construction by Using Big Data and An Enterprise Strategy Model"
Zhou Wen, Ye Shu Tao and Lu Xiao Long

10

S6213 "Enhanced User Data Privacy with Pay-by-Data Model"
Chao Wu and Yike Guo

11

S6206 "Query Optimization over a Heterogeneously Distributed Scientific Database"
Helen Xiang

12

S6204 "Enterprise Data Economy: A Hadoop-Driven Model and Strategy"
Wuheng Luo

Schedule

Date

October 7, 2013

Location

Ballroom F

Time

Schedule

8:00-8:40

Registration (Hotel Lobby West)

8:40-9:35

Invited talk: What's around the corner in social commerce? (Jaideep Srivastava)

9:35-10:00

S6212 "Advancing value creation and value capture in data-intensive contexts"

10:00-10:25

Coffee Time

10:25-12:30

S6202 "Business Model Canvas Perspective on Big Data Applications"
S6203 "A Cloud Service for the Evaluation of Company's Financial Health Using XBRL-based Financial Statements"
S6207 "Enterprise Pre-Sales Forums: A Preliminary Study of Metadata and Content"
S6204 "Enterprise Data Economy: A Hadoop-Driven Model and Strategy"
S6213 "Enhanced User Data Privacy with Pay-by-Data Model"

12:30-13:30

Lunch on your own

13:30-14:25

Invited talk: Large Scale Mining and Modeling of Telecommunication Carrier's Big Data (Wei Fan)

14:25-16:05

S6206 "Query Optimization over a Heterogeneously Distributed Scientific Database"
S6210 "ma3tch - privacy and knowledge - dynamic networked collective intelligence"
S6201 "A Study of Innovation Network Database Construction by Using Big Data and An Enterprise Strategy Model"
S6209 "Real-Time Data Analysis in ClowdFlows"

16:05-16:30

Coffee Time

16:30-17:25

S6215 "Understanding the value of (Big) data"
S6214 "OpenFridge: A Platform for Data Economy for Energy Efficiency Data"

 

Workshop 4: The First Workshop on Benchmarks, Performance Optimization, and Emerging Hardware of Big Data Systems and Applications (BPOE 2013)

Paper List

1

S7210 "Optimizing a MapReduce Module of Preprocessing High-Throughput DNA Sequencing Data"
Wei-Chun Chung, Yu-Jung Chang, Chien-Chih Chen, Der-Tsai Lee and Jan-Ming Ho

2

BigD370 "Hash in a Flash: Hash Tables for Flash Devices"
Tyler Clemons, S M Faisal, Shirish Tatikonda, Charu Aggarwal and Srinivasan Parthasarathy

3

S7202 "Memory system characterization of Big Data workloads"
Martin Dimitrov, Karthik Kumar, Patrick Lu and Vish Viswanathan

4

S7211 "Performance Evaluation of R with Intel Xeon Phi Coprocessor"
Yaakoub El-Khamra, Niall Gaffney, David Walling, Eric Wernert, Weijia Xu and Hui Zhang

5

S7216 "The Implications from Benchmarking Three Big Data Systems"
Quan Jing, Shi Yingjie, Zhao Ming and Wei Yang

6

S7205 "A Performance Evaluation of Hive for Scientific Data Management"
Taoying Liu, Jing Liu, Hong Liu and Wei Li

7

S7214 "Evaluating Task Scheduling in Hadoop-based Cloud Systems"
Shengyuan Liu, Jungang Xu, Zongzhen Liu and Xu Liu

8

BigD397 "Efficient Near-Duplicate Document Detection using FPGAs"
Xi Luo, Walid Najjar and Vagelis Hristidis

9

BigD389 "Workload-Aware Aggregate Maintenance in Columnar In-Memory Databases"
Stephan Müller, Lars Butzmann, Stefan Klauck and Hasso Plattner

10

S7206 "Virtualization I/O Optimization Based on Shared Memory"
Fengfeng Ning, Chuliang Weng and Yuan Luo

11

S7209 "An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform"
Chen Pengfei, Qi Yong, Li Xinyi and Li Su

12

S7207 "A Reconfigurable Stream Compression Hardware based on Static Symbol-Lookup Table"
Shinichi Yamagiwa and Hiroshi Sakamoto

13

S7201 "NativeTask: A Hadoop Compatible Framework for High Performance"
Dong Yang, Xiang Zhong, Dong Yan, Fangqin Dai, Xusen Yin, Cheng Lian, Zhongliang Zhu, Weihua Jiang and Gansha Wu

14

S7212 "On Mixing High-Speed Updates and In-Memory Queries: A Big-Data Architecture for Real-time Analytics"
Tao Zhong, Kshitij Doshi, Xi Tang, Ting Lou, Zhongyan Lu and Hong Li

15

S7215 "AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers"
Runlin Zhou, Yingjie Shi and Chunge Zhu

16

S7217 "A Characterization of Big Data Benchmarks"
Wen Xiong and Zhibin Yu

Schedule

Date

October 8, 2013

Location

Ballroom F

Time

Schedule

9:00-12:00

Opening remarks: Jianfeng Zhan and Weijia Xu

Session one: Performance optimization of big data systems (Session Chair: Xiaoyi Lu, OSU)
BigD389 "Workload-Aware Aggregate Maintenance in Columnar In-Memory Databases"
S7201 "NativeTask: A Hadoop Compatible Framework for High Performance"
S7210 "Optimizing a MapReduce Module of Preprocessing High-Throughput DNA Sequencing Data"
S7212 "On Mixing High-Speed Updates and In-Memory Queries: A Big-Data Architecture for Real-time Analytics"
S7206 "Virtualization I/O Optimization Based on Shared Memory"
S7205 "A Performance Evaluation of Hive for Scientific Data Management"

12:00-13:20

Lunch

13:20-15:20

Session two: Big Data Benchmarks and Workload characterization (Session Chair: Jianfeng Zhan, ICT, CAS)
Invited Talk: TBD
S7202 "Memory system characterization of Big Data workloads"
S7211 "Performance Evaluation of R with Intel Xeon Phi Coprocessor"
S7214 "Evaluating Task Scheduling in Hadoop-based Cloud Systems"
S7217 "A Characterization of Big Data Benchmarks"
S7209 "An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform"

15:20-15:40

Break

15:40-17:40

Session two: Big Data Benchmarks and Workload characterization (Session Chair: Jianfeng Zhan, ICT, CAS)
Invited Talk: TBD
S7207 "A Reconfigurable Stream Compression Hardware based on Static Symbol-Lookup Table"
BigD397 "Efficient Near-Duplicate Document Detection using FPGAs"
BigD370 "Hash in a Flash: Hash Tables for Flash Devices"
S7216 "The Implications from Benchmarking Three Big Data Systems"
S7215 "AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers"

Closing remarks (Weijia Xu and Jianfeng Zhan)

 

Workshop 5: The First Workshop on Big Data Visualization

Paper List

1

S9209 "Dynamic Reduction of Query Result Sets for Interactive Visualization"
Leilani Battle, Michael Stonebraker and Remco Chang

2

S9211 "Overplotting: Unified solutions under Abstract Rendering"
Joseph Cottam, Andrew Lumsdaine and Peter Wang

3

S9205 "Typograph: Multiscale Spatial Exploration of Text Documents"
Alexander Endert, Russ Burtner, Nick Cramer, Ralph Perko, Shawn Hampton and Kristin Cook

4

S9204 "VisReduce: Fast and responsive incremental information visualization of large datasets"
Jean-Francois Im, Felix Giguere Villegas and Michael J. McGuffin

5

S9208 "A System for Large-Scale Visualization of Streaming Doppler Data"
Peter Kristof, Bedrich Benes, Carol X. Song and Lan Zhao

6

S9210 "Visualization of Streaming Data: Observing Change and Context in Information Visualization Techniques"
Milos Krstajic and Daniel A. Keim

7

S9202 "CompactMap: A Mental Map Preserving Visual Interface for Streaming Text Data"
Xiaotong Liu, Yifan Hu, Stephen North and Han-Wei Shen

8

S9207 "Egocentric Storylines for Visual Analysis of Large Dynamic Graphs"
Chris W. Muelder, Tarik Crnovrsanin, Arnaud Sallaberry and Kwan-Liu Ma

9

S9206 "GPU-Accelerated Incremental Correlation Clustering of Large Data in the Cloud with Visual Feedback"
Eric Papenhausen, Bing Wang, Sungsoo Ha, Alla Zelenyuk, Dan Imre and Klaus Mueller

10

S9201 "Visualization of Big SPH Simulations via Compressed Octree Grids"
Florian Reichl, Marc Treib and Rüdiger Westermann

11

S9203 "A Novel Visual Analysis Approach for Clustering Large-Scale Social Data"
Zhangye Wang, Juanxia Zhou, Wei Chen, Chang Chen, Jiyuan Liao and Ross Maciejewski

12

S9212 "DriveSense: Contextual Handling of Large-scale Route Map Data for the Automobile"
Frederik Wiehr, Vidya Setlur and Alark Joshi

Schedule

Date

October 6, 2013

Location

Ballroom AB

Time

Schedule

8:00-8:40

Opening

8:40-9:30

Keynote: "Big Picture" Mixed-Initiative Visual Analytics of Big Data, Michelle Zhou, IBM Research

9:30-10:00

Invited talk: Data Intensive Visualization and Analysis of Numerically Intensive Applications, Chris Mitchell, Los Alamos National Laboratory

10:00-10:30

Coffee Time

10:30-12:00

Text Data
S9210 "Visualization of Streaming Data: Observing Change and Context in Information Visualization Techniques"
S9202 "CompactMap: A Mental Map Preserving Visual Interface for Streaming Text Data"
S9205 "Typograph: Multiscale Spatial Exploration of Text Documents"

12:00-13:30

Lunch

13:30-14:30

Rendering
S9211 "Overplotting: Unified solutions under Abstract Rendering"
S9212 "DriveSense: Contextual Handling of Large-scale Route Map Data for the Automobile"

14:30-15:30

Visual Analysis
S9203 "A Novel Visual Analysis Approach for Clustering Large-Scale Social Data"
S9207 "Egocentric Storylines for Visual Analysis of Large Dynamic Graphs"

15:30-16:00

Coffee Time

16:00-17:30

Scientific Data
S9201 "Visualization of Big SPH Simulations via Compressed Octree Grids"
S9208 "A System for Large-Scale Visualization of Streaming Doppler Data"
S9209 "Dynamic Reduction of Query Result Sets for Interactive Visualization"

17:30-18:30

Fast, Incremental Visualization
S9206 "GPU-Accelerated Incremental Correlation Clustering of Large Data in the Cloud with Visual Feedback"
S9204 "VisReduce: Fast and responsive incremental information visualization of large datasets"

 

Workshop 6: Big Data and Science: Infrastructure and Services

Paper List

1

SC210 "A big data analytics framework for scientific data management"
Sandro Fiore, Cosimo Palazzo, Alessandro D'Anca, Ian Foster, Dean Williams and Giovanni Aloisio

2

BigD337 "Searching Inter-disciplinary Scientific Big Data based on Latent Correlation Analysis"
Eloy Gonzales, Bun Theang Ong and Koji Zettsu

3

SC209 "Complete Storm Identification Algorithms from Big Raw Rainfall Data Using MapReduce Framework"
Kulsawasd Jitkajornwanich, Upa Gupta, Sakthi Kumaran Shanmuganathan, Ramez Elmasri, Leonidas Fegaras and John McEnery

4

BigD409 "A Scalable Data Analysis Platform for Metagenomics"
Wei Tang, Jared Wilkening, Narayan Desai, Wolfgang Gerlach, Andreas Wilke and Folker Meyer

5

BigD426 "Rethinking Data Management for Big Data Scientific Workflows"
Karan Vahi, Mats Rynge, Gideon Juve, Rajiv Mayani and Ewa Deelman

6

BigD377 "SciFlow: A Dataflow-Driven Model Architecture for Scientific Computing using Hadoop"
Pengfei Xuan, Yueli Zheng, Sapna Sarupria and Amy Apon

7

SC204 "perfSONAR: On-board Diagnostics for Big Data"
Jason Zurawski, Sowmya Balasubramanian, Aaron Brown, Ezra Kissel, Andrew Lake, Martin Swany, Brian Tierney and Matt Zekauskas

Schedule

Date

October 6, 2013

Location

Ballroom C

Time

Schedule

9:00

Introduction

9:10-10:00

Keynote: Noel Gorelick (Google) - Google Earth Engine

10:00-10:20

Break

10:20-12:00

SC204 "perfSONAR: On-board Diagnostics for Big Data"
BigD426 "Rethinking Data Management for Big Data Scientific Workflows"
BigD377 "SciFlow: A Dataflow-Driven Model Architecture for Scientific Computing using Hadoop"
SC210 "A big data analytics framework for scientific data management"

12:00-13:30

Lunch

13:30-14:15

SC209 "Complete Storm Identification Algorithms from Big Raw Rainfall Data Using MapReduce Framework"
BigD337 "Searching Inter-disciplinary Scientific Big Data based on Latent Correlation Analysis"
BigD409 "A Scalable Data Analysis Platform for Metagenomics"

14:15-14:45

Lightning Talks (1)
Tom Plunket - Analyzing Cancer-Genome Relationships
Eugen Feller - TBD

14:45-15:00

Break

15:00-15:45

Invited Speaker:
Dula Parkinson (Lawrence Berkeley National Laboratory)
"Web interfaces and High-Performance Computing: Solutions to Data Management, Processing, and Analysis Challenges at the Advanced Light Source X-ray Facility"

15:45-16:00

Break

16:00-17:00

Lightning Talks (2)
Yong Chen - Fast Data Analysis with Integrated Statistical Metadata in Scientific Datasets
Chaitanya Baru - Lessons Learned from Gordon
Discussion
Close Out

 

Workshop 7: Scalable Machine Learning: Theory and Applications

Paper List

1

BigD351 "Assessment of Dimensionality Reduction Based on Communication Channel Model; Application to Immersive Information Visualization"
Mohammadreza Babaee, Mihai Datcu and Gerhard Rigoll

2

BigD227 "Hierarchical Feature Learning from Sensorial Data by Spherical Clustering"
Bonny Banerjee and Jayanta Dutta

3

BigD436 "Efficient Learning from Explanation of Prediction Errors in Streaming Data"
Bonny Banerjee and Jayanta Dutta

4

SD211 "Distributed Pivot Clustering with Arbitrary Distance Functions"
L. Karl Branting

5

SD202 "Nearest Neighbor Classification Using Bottom-k Sketches"
Søren Dahlgaard and Christian Igel

6

SD206 "Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets"
Ciro Donalek, Arun Kumar, Ashish Mahabal, S. George Djorgovski, Andrew Drake, Matthew Graham, Sajeet Philip, Thomas Fuchs and Michael

7

SD217 "How Data Partitioning Strategies and Subset Size Influence the Performance of an Ensemble?"
Majed Farrash and Wenjia Wang

8

SD207 "Fast Change Point Detection for Electricity Market Analysis"
William Gu, Jaesik Choi, Ming Gu, Horst Simon and Kesheng Wu

9

SD209 "A Novel Integrated Method for Human Multiplex Protein Subcellular Localization Prediction"
Hong Gu and Junzhe Cao

10

BigD435 "Learning from Multiple Data Sets with Different Missing Attributes and Privacy Policies: Parallel Distributed Fuzzy Genetics-Based Machine Learning Approach"
Hisao Ishibuchi, Masakazu Yamane and Yusuke Nojima

11

SD214 "Data Chaos: An Entropy based MapReduce Framework for Scalable Learning"
Jiaoyan Chen, Huajun Chen, Xi Chen, Guozhou Zheng and Zhaohui Wu

12

SD218 "Exploring Sketches for Probability Estimation with Sublinear Memory"
Anthony Kleerekoper, Mikel Lujan and Gavin Brown

13

SD216 "Agglomerative Co-Clustering for Synonymous Phrases Based on Common Effects and Influences"
Koji Kumanami, Kazuhiro Seki and Kuniaki Uehara

14

SD201 "Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC"
Zhiyuan Lin, Duen Horng Chau and U Kang

15

SD220 "Scalable Sentiment Classification for Big Data Analysis Using Naïve Bayes Classifier"
Bingwei Liu, Erik Blasch, Yu Chen, Dan Shen and Genshe Chen

16

SD221 "Meta-learning for Large Scale Machine Learning with MapReduce"
Xuan Liu, Xiaoguang Wang, Stan Matwin and Nathalie Japkowicz

17

SD223 "Frequent Itemset Mining for Big Data"
Sandy Moens, Emin Aksehirli and Bart Goethals

18

BigD317 "Evaluating Parallel Logistic Regression Models"
Haoruo Peng, Ding Liang and Cyrus Choi

19

SD203 "Approximate triangle counting algorithms on Multi-cores"
Mahmudur Rahman and Mohammad Al Hasan

20

SD212 "Tree Labeled LDA: A Hierarchical Model for Web Summaries"
Anton Slutsky, Xiaohua Hu and Yuan An

21

SD205 "Nearest Neighbour Regression Outperforms Model-based Prediction of Specific Star Formation Rate"
Kristoffer Stensbo-Smidt, Christian Igel, Andrew Zirm and Kim Steenstrup Pedersen

22

BigD304 "MapReduce Implementation of Variational Bayesian Probabilistic Matrix Factorization Algorithm"
Naveen Tewari, Hari Koduvely, Sarbendu Guha, Arun Yadav and Gladbin David

23

BigD394 "A Unified Framework for Predicting Attributes and Links in Social Networks"
Xusen Yin, Bin Wu and Xiuqin Lin

24

SD219 "Scalable Approximation of Kernel Fuzzy c-Means"
Zijian Zhang and Timothy Havens

25

SD204 "Large-scale Restricted Boltzmann Machines on Single GPU"
Yun Zhu, Yanqing Zhang and Yi Pan

Schedule

Date

October 6, 2013

Location

Ballroom D

Time

Schedule

8:30-9:20

Opening Remarks: Prof. Irwin King

9:20-9:35

Invited talk: Alex Smola, Carnegie Mellon University

9:20-9:50

SD204 "Large-scale Restricted Boltzmann Machines on Single GPU"
BigD304 "MapReduce Implementation of Variational Bayesian Probabilistic Matrix Factorization Algorithm"

9:50-10:30

Break and Poster Session

10:30-11:15

Invited talk: Joseph Gonzalez, University of California, Berkeley

11:15-12:00

SD217 "How Data Partitioning Strategies and Subset Size Influence the Performance of an Ensemble?"
SD201 "Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC"
SD203 "Approximate triangle counting algorithms on Multi-cores"

12:00-13:30

Lunch

13:30-14:00

Poster Session

14:00-14:45

Invited talk: Mikhail Bilenko, Microsoft Research

14:45-15:30

SD221 "Meta-learning for Large Scale Machine Learning with MapReduce"
BigD436 "Efficient Learning from Explanation of Prediction Errors in Streaming Data"
SD202 "Nearest Neighbor Classification Using Bottom-k Sketches"

15:30-16:00

Break and Poster Session

16:00-16:45

Invited talk: Alek Kolcz, Twitter Inc.

16:45-17:00

SD212 "Tree Labeled LDA: A Hierarchical Model for Web Summaries"

17:00-18:00

Panel Discussion

 

Workshop 8: Big Data in Bioinformatics and Health Informatics

Paper List

1

SE207 "Lung Transplant Outcome Prediction using UNOS Data"
Ankit Agrawal, Reda Al-Bahrani, Mark Russo, Jaishankar Raman and Alok Choudhary

2

SE206 "Colon cancer survival prediction using ensemble data mining on SEER data"
Reda Al-Bahrani, Ankit Agrawal and Alok Choudhary

3

SE204 "A Look at Challenges and Opportunities of Big Data Analytics in Healthcare"
Ruchie Bhardwaj, Adhiraaj Sethi, Rajesh Vargheese and Raghunath Nambiar

4

SE201 "Multidimensional Analysis of Fetal Growth Curves"
Mario Bochicchio, Lucia Vaira, Antonella Longo, Antonio Malvasi and Andrea Tinelli

5

BigD296 "OWL Reasoning over Big Biomedical Data"
Xi Chen, Huajun Chen, Ningyu Zhang, Jiaoyan Chen and Zhaohui Wu

6

SE205 "KUChemBio: A database of computational chemical biology data sets hosted at the University of Kansas"
Aaron Smalter Hall and Jun Huan

7

SE202 "Parallel and Memory-efficient Burrows-Wheeler Transform"
Shinya Hayashi and Kenjiro Taura

8

SE209 "Content-based Assessment of the Credibility of Online Healthcare Information"
Meeyoung Park, Hariprasad Sampathkumar, Bo Luo and Xue-wen Chen

9

BigD382 "BIG DATA Infrastructures for Pharmaceutical Research"
Christian Seebode, Matthias Ort, Martin Peuker and Christian Regenbrecht

10

SE208 "Big Data Solutions for Predicting Risk-of-Readmission for Congestive Heart Failure Patients"
Kiyana Zolfaghar, Naren Meadem, Ankur Teredesai, Senjuti Basu Roy, Si-Chi Chin and Brian Muckian

Schedule

Date

October 6, 2013

Location

Ballroom E

Time

Schedule

8:30-8:40

Introduction

8:40-9:40

Keynote talk: Dr. Belinda Seto, Deputy Director, NIBIB, NIH

9:40-10:00

SE201 "Multidimensional Analysis of Fetal Growth Curves"

10:00-10:20

Coffee Time

10:20-11:05

Invited talk: Dr. Ida Sim, UC San Francisco

11:05-11:25

SE208 "Big Data Solutions for Predicting Risk-of-Readmission for Congestive Heart Failure Patients"

11:25-12:10

Health Informatics Panel

12:10-13:30

Lunch Break

13:30-14:15

Invited talk: Dr. Mark Musen, Stanford University
"Big Data for Big Data: Metadata to Manage Access and Analysis of Large Biomedical Datasets"

14:15-15:40

SE206 "Colon cancer survival prediction using ensemble data mining on SEER data"
BigD296 "OWL Reasoning over Big Biomedical Data"
SE207 "Lung Transplant Outcome Prediction using UNOS Data"
SE209 "Content-based Assessment of the Credibility of Online Healthcare Information"

15:40-16:00

Coffee Time

16:00-17:10

SE202 "Parallel and Memory-efficient Burrows-Wheeler Transform"
SE205 "KUChemBio: A database of computational chemical biology data sets hosted at the University of Kansas"
SE204 "A Look at Challenges and Opportunities of Big Data Analytics in Healthcare"
BigD382 "BIG DATA Infrastructures for Pharmaceutical Research"
Industrial presentation by Dr. Shipeng Yu, Siemens

17:10-17:55

Bioinformatics Panel

 

Workshop 9: Scholarly Big Data: Challenges & issues

Paper List

1

PID2931527 "The Microsoft Academic Search Challenges at KDD Cup 2013"
Martine De Cock, Senjuti Basu Roy, Swapna Savvana, Vani Mandava, Brian Dalessandro, Claudia Perlich, William Cukierski and Ben Hamner

2

sbd "Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems"
Philipp Mayr and Peter Mutschke

3

PID2929325 "Academic Publishing as a Social Media Paradigm"
Michael E. Payne, Linh B. Ngo and Amy W. Apon

4

SF201_9005 "Big Spatial Data Mining"
Wang Shuliang, Ding Gangyi and Zhong Ming

Schedule

Date

October 6, 2013

Location

Ballroom G

Time

Schedule

8:00-8:30

Registration

8:30-8:45

Welcome

8:45-10:00

PID2931527 "The Microsoft Academic Search Challenges at KDD Cup 2013"

10:45-11:10

sbd "Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems"

11:10-11:35

SF201_9005 "Big Spatial Data Mining"

11:35-12:00

PID2929325 "Academic Publishing as a Social Media Paradigm"

12:00-13:30

Lunch

13:30-15:40

Discussion and breakout sessions

15:40-16:00

Coffee Time

16:00-16:20

Closing

 

Workshop 10: Scalable Cloud Data Management

Paper List

1

SG206 "Modeling and Querying Data in NoSQL Databases"
Karamjit Kaur and Rinkle Rani

2

SG201 "Elastic Data Partitioning for Cloud-based SQL Processing Systems"
Lipyeow Lim

3

SG203 "Parallel SECONDO: Practical and Efficient Mobility Data Processing in the Cloud"
Jiamin Lu and Ralf Hartmut Güting

4

SG204 "Index-Based Join Operations in Hive"
Mahsa Mofidpoor, Nematollaah Shiri and T. Radhakrishnan

5

SG205 "SLA data management criteria"
Katerina Stamou, Verena Kantere and Jean-Henry Morin

Schedule

Date

October 6, 2013

Location

Ballroom H

Time

Schedule

13:30-14:30

Invited talk: Keynote by Peter Bailis

14:30-14:40

Break

14:40-15:40

Session I
SG206 "Modeling and Querying Data in NoSQL Databases"
SG201 "Elastic Data Partitioning for Cloud-based SQL Processing Systems"

15:40-16:00

Break

16:00-18:00

Session II
SG203 "Parallel SECONDO: Practical and Efficient Mobility Data Processing in the Cloud"
SG204 "Index-Based Join Operations in Hive"
SG205 "SLA data management criteria"

 

Workshop 11: Big Data and Smarter Cities

Paper List

1

SI205 "Fast Solution of Load Shedding Problems via a Sequence of Linear Programs"
Harish S. Bhat, Garnet J. Vaz, Juan C. Meza

2

SI202 "Alarm Prediction in Large-Scale Sensor Networks - A Case Study in Railroad"
Hongfei Li, Buyue Qian, Dhaivat Parikh, Arun Hampapur

3

BigD419 "MiSTRAL: An Architecture for Low-Latency Analytics on Massive Time Series"
Alice Marascu, Pascal Pompey, Eric Bouillet, Olivier Verscheure, Michael Wurst, Martin Grund, Philippe Cudre-Mauroux

4

SI204 "Yellow Cabs as Red Corpuscles"
Tim Savage, Huy Vo

5

BigD440 "Scalable Prediction of Energy Consumption using Incremental Time series Clustering"
Yogesh Simmhan, Muhammad Usman Noor

6

SI207 "A Big Data Driven Model for Taxi Drivers' Airport Pick-up Decisions in New York City"
Anil Yazici, Camille Kamga, Abhishek Singhal

Schedule

Date

October 9, 2013

Location

Ballroom E

Time

Schedule

 

Workshop 12: Knowledge management and Big Data Analytics

Paper List

1

SJ213 "Managing Massive Graphs in Relational DBMS"
Ruiwen Chen

2

SJ211 "A Distributed Approach for Graph-Oriented Multidimensional Analysis"
Benoît Denis, Amine Ghrab, Sabri Skhiri

3

SJ206 "Constructing E-Tourism Platform Based on Service Value Broker: A Knowledge Management"
Yucong Duan, Jinpeng Wei, Ajay Kattepur, Wencai Du

4

SJ202 "ADraw: A novel social network visualization tool with attribute-based layout and coloring"
Zhenwen Wang, Weidong Xiao, Bin Ge, Hao Xu

5

SJ207 "IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management Applications"
Yongzhi Wang, Jinpeng Wei, Mudhakar Srivatsa, Yucong Duan, Wencai Du

6

SJ212 "Local Join Optimization over a Heterogeneously Distributed Scientific Database"
Helen Xiang

7

SJ205 "Core-based Community Evolution in Mobile Social Networks"
Hao Xu, Weidong Xiao, Daquan Tang, Jiuyang Tang

8

BigD333 "Super-sequence Frequent Pattern Mining on Sequential Dataset"
Xinran Yu, Turgay Korkmaz

9

SJ203 "Exploring Big Data in Small Forms: A Multi-layered Knowledge Extraction of Social Networks"
Yun Wei Zhao, Willem-Jan van den Heuvel, Xiaojun Ye

10

SJ204 "Provenance Comparison for Large-Scale Knowledge Discovery"
Xiang Zhao, Bin Ge, Jiuyang Tang, Weidong Xiao, Haichuan Shang

Schedule

Date

October 6, 2013

Location

Ballroom F

Time

Schedule

8:30-8:50

Introduction (Chi-Hung Chi)

8:50-9:50

SJ203 "Exploring Big Data in Small Forms: A Multi-layered Knowledge Extraction of Social Networks"
SJ205 "Core-based Community Evolution in Mobile Social Networks"

9:50-10:20

Coffee Time

10:20-11:50

SJ211 "A Distributed Approach for Graph-Oriented Multidimensional Analysis"
SJ204 "Provenance Comparison for Large-Scale Knowledge Discovery"
SJ213 "Managing Massive Graphs in Relational DBMS"

11:50-13:30

Lunch

13:30-15:30

SJ202 "ADraw: A novel social network visualization tool with attribute-based layout and coloring"
SJ206 "Constructing E-Tourism Platform Based on Service Value Broker: A Knowledge Management"
SJ207 "IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management Applications"
SJ212 "Local Join Optimization over a Heterogeneously Distributed Scientific Database"

15:30-16:00

Coffee Time

16:00-16:30

BigD333 "Super-sequence Frequent Pattern Mining on Sequential Dataset"

 

Posters

 

1

P201 "Re-projection of Terabyte-Sized Images"
Peter Bajcsy, Antoine Vandecreme, Mary Brady

2

P207 "Tile Based Visual Analytics for Twitter Big Data Exploratory Analysis"
Daniel Cheng, Peter Schretlen, Nathan Kronenfeld, Neil Bozowsky, William Wright

3

P216 "Optimizing Queries over Semantically Integrated Datasets on MapReduce Platforms"
HyeongSik Kim, Kemafor Anyanwu

4

P211 "Secure Decoupled Linkage (SDLink) System for Building a Social Genome"
Hye-Chung Kum, Ashok Krishnamurthy, Darshana Pathak, Michael Reiter, Stanley Ahalt

5

P206 "Risk Adjustment of Patient Expenditures: A Big Data Analytics Approach"
Lin Li, Saeed Bagheri, Helena Goote, Asif Hasan, Gregg Hazard

6

P214 "Parallel Auto-encoder for Efficient Outlier Detection"
Yunlong Ma, Peng Zhang, Yanan Cao, Li Guo

7

mohPID29228727 "New Factors for Identifying Influential Bloggers"
Teng-Sheng Moh, SivaNaga Prasad Shola

8

P205 "A Scalable Infrastructure of Interactive Evolutionary Computation to Evolve Services Online with Data"
Masaharu Munetomo, Shitaro Bando

9

P215 "Big Data for Business Managers - Bridging the gap between Potential and Value"
Anmol Rajpurohit

10

tsumoto_PID2930649 "Granularity-based Temporal Data Mining in Hospital Information System"
Shusaku Tsumoto, Shoji Hirano, Haruko Iwata

11

P212 "Observation of Matthew Effects in Sina Weibo Microblogger"
Mengmeng Yang, Yi Zhou, Qu Zhou, Kai Chen, Jianhua He, Xiaokang Yang

12

P213 "A framework of Spatial Co-location Mining on MapReduce"
Jin Soung Yoo, Douglas Boulware

13

P210 "Access Control for Big Data using Data Content"
Wenrong Zeng, Yuhao Yang, Bo Luo

 

Tutorials

 

TUTORIAL 1: Online Learning for Big Data Analytics

Presenters: Irwin King, Michael R. Lyu and Haiqin Yang,

Department of Computer Science & Engineering

The Chinese University of Hong Kong

 

Summary:

Big data has ushered in a new era: science, engineering and technology now produce ever larger data streams daily, at petabyte and exabyte scales. Moreover, massive data capturing human activity are online and available to analyze and to build business models for providing personalized services in commerce. Learning from big data is a novel topic that expands the area of machine learning. Many new learning techniques need to be developed to increase the effectiveness and efficiency of learning from the data. Among them, online learning, which we have investigated in depth for several years, is one of the most promising techniques for learning from big data.

The tutorial will investigate several important components of online learning techniques for big data. First, a brief introduction to the basic concepts of big data and big data analytics will be given. The basic concepts of different learning paradigms and online learning will be provided to give a complete map of the techniques developed in this area. Second, the connection between online learning techniques and big data will be addressed. Third, some motivating examples will be presented to illustrate the promise of online learning techniques. Fourth, we will present different online learning techniques for non-sparse learning models, sparse learning models, unsupervised learning models, etc. Some hands-on demos may be given in the tutorial.

The tutorial will conclude by summarizing and reflecting on the trends in online learning techniques for big data, an exciting and dynamic research area that may reshape the whole field and is worthy of more detailed investigation for many years to come.

 

Content:

1. Introduction
   1.1 Basic concept of big data and big data analytics
   1.2 Basic concept of online learning and its applications
2. Online Learning Algorithms
   2.1 Perceptron
   2.2 Online non-sparse learning
   2.3 Online sparse learning
   2.4 Online unsupervised learning
3. Discussion and Q & A

Short Bio.  

 

Prof. King's Profile

Prof. King's research interests include machine learning, social computing, web intelligence, data mining, and multimedia information processing. In these research areas, he has over 210 technical publications in journals and conferences. In addition, he has contributed over 20 book chapters and edited volumes. Moreover, Prof. King has over 30 research and applied grants. One notable patented system he has developed is the VeriGuide System, previously known as the CUPIDE (Chinese University Plagiarism IDentification Engine) system, which detects similar sentences and performs readability analysis of text-based documents in both English and in Chinese to promote academic integrity and honesty.

Prof. King is the Book Series Editor for "Social Media and Social Computing" with Taylor and Francis (CRC Press). He is also an Associate Editor of the ACM Transactions on Knowledge Discovery from Data (ACM TKDD) and a former Associate Editor of the IEEE Transactions on Neural Networks (TNN) and IEEE Computational Intelligence Magazine (CIM). He is a member of the Editorial Board of the Open Information Systems Journal, Journal of Nonlinear Analysis and Applied Mathematics, and Neural Information Processing Letters and Reviews Journal (NIP-LR). He has also served as Special Issue Guest Editor for Neurocomputing, International Journal of Intelligent Computing and Cybernetics (IJICC), Journal of Intelligent Information Systems (JIIS), and International Journal of Computational Intelligent Research (IJCIR). He is a senior member of IEEE and a member of ACM, International Neural Network Society (INNS), and Asian Pacific Neural Network Assembly (APNNA). Currently, he is serving the Neural Network Technical Committee (NNTC) and the Data Mining Technical Committee under the IEEE Computational Intelligence Society (formerly the IEEE Neural Network Society). He is also a member of the Board of Governors of INNS and a Vice-President and Governing Board Member of APNNA. He also serves INNS as the Vice-President for Membership in the Board of Governors.

Prof. King is an associate dean of the engineering faculty and a professor at the Department of Computer Science and Engineering, The Chinese University of Hong Kong. He received his B.Sc. degree in Engineering and Applied Science from California Institute of Technology, Pasadena and his M.Sc. and Ph.D. degrees in Computer Science from the University of Southern California, Los Angeles.

 

Prof. Lyu's Profile

Prof. Lyu's research interests include software reliability engineering, distributed systems, fault-tolerant computing, web technologies, mobile networks, digital video library, multimedia processing, and video searching and delivery. He has participated in more than 30 industrial projects in these areas, and helped to develop many commercial systems and software tools. He has been frequently invited as a keynote or tutorial speaker to conferences and workshops in the U.S., Europe, and Asia.

Prof. Lyu has published over 400 refereed journal and conference papers in his research areas. He initiated the first International Symposium on Software Reliability Engineering (ISSRE) in 1990. He was the Program Chair for ISSRE'96, Program co-Chair for WWW10, General Chair for ISSRE'2001, General co-Chair for PRDC'2005, and has served on program committees for many conferences. He is the editor of two book volumes: Software Fault Tolerance, published by Wiley in 1995, and the Handbook of Software Reliability Engineering, published by IEEE and McGraw-Hill in 1996. These books have received an overwhelming response from both academia and industry. He was an Associate Editor of IEEE Transactions on Reliability, IEEE Transactions on Knowledge and Data Engineering, and Journal of Information Science and Engineering. He is currently on the editorial board of Wiley Software Testing, Verification and Reliability Journal. He was elected IEEE Fellow (2004) and AAAS Fellow (2007) for his contributions to software reliability engineering and software fault tolerance. He was also named Croucher Senior Research Fellow in 2008 and IEEE Reliability Society Engineer of the Year in 2010.

Prof. Lyu is currently a Professor in the Computer Science and Engineering department of the Chinese University of Hong Kong. He received his B.S. in Electrical Engineering from National Taiwan University, his M.S. in Computer Science from University of California, Santa Barbara, and his Ph.D. in Computer Science from University of California, Los Angeles.

Dr. Yang's Profile

Dr. Haiqin Yang's research interests include machine learning, data mining, and financial engineering. In these areas, he has over 30 technical publications in journals (JMLR, IEEE TNN, Neurocomputing, IEEE BME, IEEE SMC) and conferences (ICML, CIKM, IJCNN, ICONIP, etc.). In addition, he has written two books and four book chapters, and has been granted seven patents. He has served as a reviewer for many journals and on program committees for many conferences, e.g., CIKM, ACML, IEEE BigData 2013, and IEEE BDSE 2013. He has also received many awards, including the "First Prize" postgraduate paper award in the IEEE Hong Kong Section 2010, the PCCW Foundation Scholarship, and The Global Scholarship Programme for Research Excellence. Dr. Yang is currently a Postdoctoral Fellow at The Chinese University of Hong Kong. He received his B.S. degree in Computer Science and Technology from Nanjing University, and his M.Phil. and Ph.D. degrees in Computer Science and Engineering from The Chinese University of Hong Kong.



TUTORIAL 2: Large-Scale Click-stream and transaction log mining in practice

Presenters: Uwe Mayer, Nish Parikh, Gyanit Singh

eBay

This tutorial will summarize state-of-the-art approaches in the growing area of large-scale click-stream mining. It will give data scientists, researchers and engineers with diverse backgrounds an opportunity to familiarize themselves with practical platforms, approaches and tools for extracting actionable insights and building products from big and diverse data sources. The organizers will accomplish this goal using three real-life stories from the field (large-scale data initiatives at eBay, one of the world's largest e-commerce platforms). The tutorial will feature transaction mining, behavior log mining and time-series mining. We will talk about building robust recommendation systems over MapReduce clusters (query suggestions, shipping fee recommendations). The talk will also cover topics such as removing user bias from data, using heuristics to make intractable algorithms practical, and appropriate de-noising and normalization of diverse data sets. The audience is expected to be familiar with MapReduce (preferably Hadoop) and to be working or grappling with data problems. Some basic background in algorithms and statistics would be beneficial.

 

Content:

We will present the tutorial through real applications built at eBay. We will present three case studies.

·  Shipping Recommendation System

·  Mining large-scale temporal dynamics with Hadoop

·  Query Suggestions at scale with Hadoop

 

Short Bio.  

Uwe Mayer (http://labs.ebay.com/people/uwe-mayer/)

Prior to joining eBay, Uwe Mayer was a senior research scientist at Yahoo, and was a director of Analytic Science at FICO. He has been a professor of mathematics at universities in both the U.S. and in Germany.

Uwe received his MA and PhD in mathematics from the University of Utah where he was a Fulbright scholar, with an extended research stay at the Institute for Advanced Studies at Princeton. He carried out his undergraduate studies with a double major in Mathematics and Computer Sciences in Germany. Bringing his academic career full circle from computer sciences to mathematics back to computers, Uwe also has co-advised a PhD student in data mining at the University of California, San Diego, and has published in several data mining/machine learning conferences including KDD.

 

Nish Parikh (http://labs.ebay.com/people/nish-parikh/)

Nish Parikh joined eBay Research Labs in February 2008 and currently is the Head of Data Sciences Research. At eBay Research Labs, he leads efforts on query analysis, recommender systems and large-scale data processing from a data science perspective. Prior to joining eBay Research Labs he was part of the team that launched eBay's Next Generation Search Engine Voyager which supported near real-time indexing of products and served billions of search queries every week. Prior to joining eBay, Nish received an M.S. in Computer Science from University of Southern California and a B.S. in Electrical Engineering from Gujarat University where he was awarded a gold medal for academic excellence. Nish has published in premier conferences such as SIGIR, KDD, WWW, CIKM and WSDM. In addition to the research community engagement, Nish is a frequent speaker in industry and big data forums such as the Hadoop Summit and XLDB.

 

Gyanit Singh (http://labs.ebay.com/people/gyanit-singh/)

Gyanit Singh is a Research Scientist at eBay Research Labs. His research interests are in large-scale data mining, query log mining and large-scale data platforms. At eBay he has worked on problems like query suggestion and recovery from null search. He has also worked on an in-house Map-Reduce data platform called Mobius. Prior to joining eBay, Gyanit completed his master's in Computer Science at the University of Washington, Seattle. Before that he was at the Indian Institute of Technology, Delhi, pursuing his bachelor's in Computer Science. Gyanit has published in premier conferences such as SIGIR, WWW, APPROX-RANDOM and WSDM. In addition to the research community engagement, Gyanit is a frequent speaker in industry and big data forums such as the Hadoop Summit, Hadoop World, ACM Data Mining Camp, and Bay Area Search Forum.

 

 

Panel: "Key Issues in Big Data"

Panelists:
(1) Dr. Roger R. Schell, USC
(2) Dr. Amr Awadallah, Cloudera, Inc.
(3) Dr. Peter G. Neumann, SRI
(4) Dr. Tomoyuki Higuchi
(5) Dr. Sylvia Osborn, University of Western Ontario
(6) Dr. Justin Zhan, North Carolina A&T State University
(7) Dr. T. Y. Lin, San Jose State University (Chair)

Bios of Panelists

Dr. Roger R. Schell

Dr. Roger R. Schell recently joined USC/ISI supporting their Masters of Cyber Security degree program. He is internationally recognized for originating several key modern security design and evaluation techniques, and he holds patents in cryptography, authentication and trusted workstation. For more than a decade he has been co-founder and President of Aesec Corporation, a start-up company providing verifiably secure platforms. Previously Dr. Schell was co-founder and vice president for Gemini Computers, Inc., where he directed development of their highly secure (what NSA called "Class A1") commercial product, the Gemini Multiprocessing Secure Operating System (GEMSOS). He was also the founding Deputy Director of NSA's National Computer Security Center. He has been referred to as the "father" of the Trusted Computer System Evaluation Criteria (the "Orange Book"). Dr. Schell is a retired USAF Colonel. He received a Ph.D. in Computer Science from MIT, an M.S.E.E. from Washington State, and a B.S.E.E. from Montana State. The NIST and NSA have recognized Dr. Schell with the National Computer System Security Award. In 2012 he was inducted into the inaugural class of the National Cyber Security Hall of Fame.

Dr. Amr Awadallah

Dr. Amr Awadallah is the CTO and co-founder of Cloudera, Inc. Before co-founding Cloudera in 2008, Amr (@awadallah) was an Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel he served as Vice President of Product Intelligence Engineering at Yahoo!, and ran one of the very first organizations to use Hadoop for data analysis and business intelligence. Amr joined Yahoo! after they acquired his first startup, VivaSmart, in July of 2000. Amr holds Bachelor's and Master's degrees in Electrical Engineering from Cairo University, Egypt, and a Doctorate in Electrical Engineering from Stanford University.

3)	Dr. Peter G. Neumann

Peter G. Neumann (Neumann@CSL.sri.com) has doctorates from Harvard and Darmstadt. After 10 years at Bell Labs in Murray Hill, New Jersey, in the 1960s, during which he was heavily involved in the Multics development jointly with MIT and Honeywell, he has been in SRI's Computer Science Lab since September 1971, where he is a Senior Principal Scientist. He is concerned with computer systems and networks, trustworthiness/dependability, high assurance, security, reliability, survivability, safety, and many risks-related issues such as election-system integrity, crypto applications and policies, health care, social implications, and human needs, especially those including privacy. He is currently PI on two DARPA projects: clean-slate trustworthy hosts for the CRASH program with new hardware and new software, and clean-slate networking for the Mission-oriented Resilient Clouds program. He moderates the ACM Risks Forum (http://www.risks.org), was responsible for CACM's Inside Risks columns monthly from 1990 to 2007 and tri-annually since then, chairs the ACM Committee on Computers and Public Policy, and has chaired the National Committee for Voting Integrity (http://www.votingintegrity.org), which is about to be disbanded in lieu of many other efforts. He created ACM SIGSOFT's Software Engineering Notes in 1976, was its editor for 19 years, and still contributes the RISKS section. He is on the editorial board of IEEE Security and Privacy. He has participated in four studies for the National Academies of Science: Multilevel Data Management Security (1982), Computers at Risk (1991), Cryptography's Role in Securing the Information Society (1996), and Improving Cybersecurity for the 21st Century: Rationalizing the Agenda (2007). His 1995 book, Computer-Related Risks, is still timely. He is a Fellow of the ACM, IEEE, and AAAS, and is also an SRI Fellow.
He received the National Computer System Security Award in 2002, the ACM SIGSAC Outstanding Contributions Award in 2005, and the Computing Research Association Distinguished Service Award in 2013. In 2012, he was elected to the newly created National Cybersecurity Hall of Fame as one of the first set of inductees. He is a member of the U.S. Government Accountability Office Executive Council on Information Management and Technology, and vestigially the California Office of Privacy Protection advisory council (although that group has been dormant due to the CA budget crunch). He co-founded People For Internet Responsibility (PFIR, http://www.PFIR.org). He has taught courses at Darmstadt, Stanford, U.C. Berkeley, and the University of Maryland. See his website (http://www.csl.sri.com/neumann) for testimonies for the U.S. Senate and House and California state Senate and Legislature, papers, bibliography, further background, etc. See also the Illustrative Risks annotated index of earlier risks incidents, which is more or less up-to-date regarding items relating to election integrity.

4)	Dr. Tomoyuki Higuchi

Tomoyuki Higuchi has been Director-General of The Institute of Statistical Mathematics (ISM) and an Executive Director of the Research Organization of Information and Systems (ROIS) since April 2011. He completed his Ph.D. in Geophysics, Faculty of Science, at the University of Tokyo in 1989. Since joining ISM in 1989, he has consistently worked on developing statistical modeling grounded in real-world problems, and has made outstanding achievements in applied research on Bayesian modeling, in particular sequential data assimilation. He is a member of the International Statistical Institute (ISI) and the American Geophysical Union (AGU).

5)	Dr. Sylvia Osborn

Sylvia Osborn received her PhD in Computer Science from the University of Waterloo. Since 1977, she has been a faculty member in the Computer Science Department at the University of Western Ontario in London, Ontario, Canada. She is the author of numerous research papers, starting in the database field with dependency theory and object-oriented databases. More recently she has been active in research into role-based access control, including comparison of access control models, administration of access control, and delegation. Recently, she has been focusing on the integration of privacy issues with access control, and how the consideration of privacy of individuals' data does or does not differ from access control.

6)	Dr. Justin Zhan

Dr. Justin Zhan is the director of ILAB, which is an Interdisciplinary Research Institute at North Carolina A&T State University. He is a faculty member in the Department of Computer Science, College of Engineering, North Carolina A&T State University. He has previously been a faculty member at Carnegie Mellon University and National Center for the Protection of Financial Infrastructure in South Dakota State. His research interests include Big Data, Information Assurance, Social Computing, and Health Science. He is a steering chair of IEEE International Conference on Social Computing (SocialCom) and IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT). He is currently an editor-in-chief of International Journal of Privacy, Security and Integrity, International Journal of Social Computing and Cyber-Physical Systems, and managing editor of SCIENCE journal. He has served as a conference general chair, a program chair, a publicity chair, a workshop chair, or a program committee member for 200 international conferences and an editor-in-chief, an editor, an associate editor, a guest editor, an editorial advisory board member, or an editorial board member for 30 journals. In recent years, he has published extensively in peer-reviewed journals and conferences. His research has been funded by NSF, DoD, NIH, NSA, etc.

Dr. T. Y. Lin

Tsau Young (T.Y.) Lin received his PhD in Mathematics from Yale University. He is a Professor of Computer Science at San Jose State University and a fellow of the Berkeley Initiative in Soft Computing, University of California. He is the President of International Granular Computing Society and the Founding President of International Rough Set Society. He is one of the Editors-in-Chief of International Journal of Granular Computing, Rough Sets and Intelligent Systems. He has served in various roles for reputable international journals and conferences. His interests include data/text/web mining, data security and granular/rough/soft computing. He received the best contribution awards from ICDM01 and International Rough Set Society (2005), a best service award from IEEE/WIC/ACM WI-IAT 2007 and a pioneer award from GrC 2008.

Panel: "Big Data Projects Funding: Challenges and Opportunities"

Panelists:
(1) Wo Chang, Digital Data Advisor, NIST Information Technology Laboratory
(2) Vasant Honavar, Frymoyer Chair Professor of IST, Penn State Univ.
(3) David Kuehn, Program Manager, Federal Highway Administration
(4) Piyush Mehrotra, Division Chief, NASA Advanced Supercomputing (NAS) Division

Moderator:
Vijay Raghavan, UL Lafayette (raghavan@louisiana.edu)

Panel Statement:
Big data and data analytics are among the hottest IT themes in both academia and industry worldwide. At the same time, many governmental agencies are in the midst of great financial uncertainties and significant budget cuts. In this panel, the panelists will present their points of view on funding/federal initiatives for research on Big Data. The discussion will leverage a diverse set of experiences and viewpoints, since the panel consists not only of current and former PDs from several agencies, but also of other experts knowledgeable about the various alliances being forged to face big data challenges and to creatively fund such projects. Panelists may share controversial points of view and provocative positions on issues, including but not limited to those above, during the panel presentation and discussion. The following structure will be followed:
(1) Welcome, panel mechanics for discussion and Q & A, introduction of panel members
(2) Presentations from panelists (10-12 min. each, including any quick questions/comments)
(3) Moderator-directed panel Q & A
(4) Questions from the audience and open discussion


Bios of Panelists

Wo Chang

Mr. Wo Chang, Digital Data Advisor for the NIST Information Technology Laboratory (ITL), Chair of ISO/IEC JTC/1 SC29 WG11 (MPEG) Multimedia Preservation AHG

Mr. Wo Chang is Digital Data Advisor for the NIST Information Technology Laboratory (ITL). His responsibilities include, but are not limited to, promoting a vital and growing Big Data community at NIST with external stakeholders in commercial, academic, and government sectors. Mr. Chang currently chairs the ISO/IEC JTC/1 SC29 WG11 (MPEG) Multimedia Preservation AHG. Prior to joining the ITL Office, Mr. Chang was manager of the Digital Media Group in ITL, and his duties included overseeing several key projects including digital data, long-term preservation and management of EHRs, motion image quality, and multimedia standards. In the past, Mr. Chang was the Deputy Chair for the US National Body for MPEG (INCITS L3.1) and chaired several other key projects for MPEG, including MPQF, MAF, MPEG-7 Profiles and Levels, and co-chaired the JPEG Search project. Mr. Chang was one of the original members of the W3C's SMIL WG and developed one of the SMIL reference software implementations. Furthermore, Mr. Chang also participated in the HL7 and ISO/IEC TC215 for health informatics and IETF for the protocols development of SIP, RTP/RTCP, RTSP, and RSVP. Mr. Chang's research interests include digital data preservation, cloud computing, big data analytics, content metadata description, digital file formats, multimedia synchronization, and Internet protocols.

Vasant Honavar

Dr. Vasant Honavar received his Ph.D. in Computer Science and Cognitive Science in 1990 from the University of Wisconsin-Madison, specializing in Artificial Intelligence. From 1990 to 2013, he served on the faculty of Computer Science and of Bioinformatics and Computational Biology at Iowa State University (ISU). At ISU, he directed the Artificial Intelligence Research Laboratory (which he founded in 1990) and the Center for Computational Intelligence, Learning & Discovery (which he founded in 2005) and served as the associate chair (2001-2003) and chair (2003-2005) of the ISU Bioinformatics and Computational Biology Graduate Program, which he helped establish in 1999 with support from an Integrative Graduate Education and Research Training (IGERT) award.

Honavar served as a program director in the Information and Intelligent Systems Division of the Computer and Information Sciences and Engineering directorate of the National Science Foundation (NSF) during 2010-2013 while maintaining his research program at ISU. He led the Big Data Science and Engineering Program, established the NSF-OFR collaboration in Computational and Information Processing Approaches to and Infrastructure in support of, Financial Research and Analysis and Management, contributed to Smart and Connected Health, Information Integration and Informatics, Expeditions in Computing, Science of Learning Centers, Integrative Graduate Education and Research Training, Computing Research Infrastructure Programs. In September 2013, Honavar joined the faculty of Penn State University where he will serve as the Frymoyer Chair Professor of Information Science and Technology and lead new research and educational initiatives in Data Sciences and contribute to initiatives in Life Sciences.

Honavar's current research and teaching interests include Artificial Intelligence, Machine Learning, Bioinformatics, Big Data Analytics, Computational Molecular Biology, Data Mining, Discovery Informatics, Information Integration, Knowledge Representation and Inference, Semantic Technologies, Social Informatics, Security Informatics, and Health Informatics. Honavar has served on, or currently serves on, the editorial boards of several journals and the program committees of several major research conferences in these areas.

Honavar has led research projects funded by NSF, NIH, and USDA that have resulted in foundational research contributions (documented in over 250 peer-reviewed publications) in algorithms for constructing predictive models from sequence, image, text, multi-relational, and graph-structured data; scalable algorithms for building predictive models from large, distributed, semantically disparate data (big data); knowledge base and service federation; representing and reasoning about preferences; and applications in bioinformatics, computational biology, immunoinformatics, energy informatics, health informatics, and social informatics.

Honavar received the Iowa Board of Regents Award for Faculty Excellence in 2007, the Iowa State University College of Liberal Arts and Sciences Award for Research Excellence in 2008, and the Iowa State University Margaret Ellen White Graduate Faculty Award in 2011. Honavar received the NSF Director’s Award for Superior Accomplishment in 2013 for his leadership of the NSF Big Data Program. However, he considers the 28 Ph.D. students that he has mentored and trained during his academic career his proudest accomplishments.

David Kuehn

David Kuehn is the Program Manager for the Federal Highway Administration (FHWA) Exploratory Advanced Research Program. The Program Manager serves as the senior advisor to agency leadership on the communication and coordination of exploratory advanced research activities and fosters partnerships with other Federal agencies, national scientific societies and organizations, and the academic community in support of the Program. The Program focuses on longer-term, higher-risk research with the potential for transformational improvements to the transportation system. David entered federal service as a Presidential Management Fellow. Before working at the federal level, David worked in local government and as a consultant in southern California. He holds a Master of Public Administration from the University of Southern California and a B.A. from the University of California, Irvine, and is a member of the American Institute of Certified Planners (AICP).

Piyush Mehrotra

As Division Chief of the NASA Advanced Supercomputing (NAS) Division, Dr. Piyush Mehrotra oversees the full range of high-performance computing services for NASA's premier supercomputing center. The Division focuses on the advanced computing needs of NASA scientists and engineers, including in the areas of accelerator technologies, collaborative computing, cloud computing, data analytics, and quantum computing. Dr. Mehrotra has over 30 years of R&D experience in parallel programming languages, including compilers and runtime systems for shared- and distributed-memory systems, and middleware infrastructure for grid environments. Recently, his research has focused on performance characterization, benchmarking, and effective utilization of parallel systems, including HPC clouds. He has published over 100 articles in journals and conferences, edited two books, and served as editor for several issues of international computer science journals.