IEEE
BigData 2014 Program Schedule
Washington DC
USA
Oct 27-30, 2014
|
Program
|
|
|
• October 27, 2014
• October 28, 2014
• October 29, 2014
• October 30, 2014
|
|
|
|
|
|
Keynote Lecture
Main conference regular paper: 25 minutes (about 20 minutes for talk and
5 minutes for Q and A)
Main conference short paper: 15 minutes (about 11 minutes for talk and 4
minutes for Q and A)
|
|
|
|
|
|
|
26-Oct
|
|
15:30-20:00
|
Registration
|
|
Venue:
|
Ballroom Foyer and Ballroom Coatroom
|
|
27-Oct
|
|
07:30-18:00
|
Registration
|
|
Venue:
|
Ballroom Foyer and Ballroom Coatroom
|
|
10:00-10:20
and
15:20-15:40
|
Coffee Break
at Meeting Room Foyer
|
|
08:00-18:30
|
|
Sessions
|
Session Chair
|
Venue
|
|
Special session I
|
From
Data to Insight: Big Data and Analytics for Smart Manufacturing Systems
|
Sudarsan
Rachuri
Ronay
AK
|
JUDICIARY SUITE
|
|
Full day event
|
Doctoral Consortium
|
Jingrui He
|
Patuxent
|
|
Full-day workshop
|
#2
|
The
2nd Workshop on Scalable Machine Learning: Theory and Applications
|
Zenglin Xu
|
Waterford
|
|
#3
|
1st International Workshop on High
Performance Big Graph Data Management, Analysis, and Mining
|
Fengguang
Song
|
Lalique
|
|
#8
|
The
2nd International Workshop of BigData in Bioinformatics and Healthcare
Informatics
|
Jun Huan
|
Haverford
|
|
#13
|
First
Hands-On Workshop on Leveraging High Performance Computing Resources
for Managing Large Datasets
|
Ritu Arora
|
Baccarat
|
|
#18
|
Large
Scale Data Analytics in Transportation and Railway Infrastructure
|
Nii Attoh-Okine
|
CARTIER/TIFFANY
|
|
#20
|
Big
Humanities Data
|
Mark Hedges
|
CABINET
SUITE
|
|
#22
|
IEEE
NIST Big Data PWG Workshop on Big Data: Challenges, Practices and
Technologies
|
Nancy Grady
|
Embassy
|
|
08:00-12:00
|
|
Sessions
|
Session Chair
|
Venue
|
|
Morning workshop
|
#1
|
Scholarly Big Data: Challenges &
Issues
|
Ingemar
J. Cox
|
Diplomat
|
|
#15
|
Workshop on Advances in Software and
Hardware for Big Data to Knowledge Discovery (ASH)
|
Weijia
Xu
|
Ambassador
|
|
#17
|
Big Data in Computational Epidemiology
|
Jiangzhuo
Chen
|
Severn
|
|
#19
|
2nd Workshop on Scalable Cloud Data
Management
|
Felix
Gessert
|
Susquehanna
|
|
|
#21
|
Complexity for Big Data
|
Guozhu
Dong
|
Potomac
|
|
13:30-18:30
|
|
Sessions
|
Session Chair
|
Venue
|
|
Special session II
|
Big Data Representation and Processing in Data Science
|
T.Y. Lin
|
Susquehanna
|
|
Tutorial
|
Big Data Stream Mining
|
Albert Bifet
|
Severn
|
|
Afternoon
workshop
|
#11
|
CASK-14 : 1st International Workshop on
Collaborative methodologies to Accelerate Scientific Knowledge
discovery in big data
|
Chen
Jin
|
Diplomat
|
|
#16
|
IEEE Big Data Workshop on Semantics
for Big Data on the Internet of Things (SemBIoT 2014)
|
Kemafor
Ogan
|
Ambassador
|
|
|
|
|
|
|
28-Oct
|
|
|
07:30-18:00
|
Registration
|
|
Venue:
|
Ballroom
Foyer and Ballroom Coatroom
|
|
08:30-08:45
|
Opening and Welcoming Speech
Conference Co-Chairs:
Charu Aggarwal, Nick Cercone, Vasant
Honavar
Program Co-Chairs:
Jimmy Lin, Jian Pei
Industry Program co-Chairs:
Wo Chang, Raghunath Nambiar
BigData Steering Committee Chair:
Xiaohua Tony
Hu (Drexel University)
|
|
Venue:
|
CRYSTAL BALLROOM
|
|
08:45-09:45
|
Session Chair: Jian Pei
Keynote Speech 1: Never-Ending
Language Learning
Tom Mitchell - E. Fredkin University
Professor, Machine Learning Department, Carnegie Mellon University
|
|
Venue:
|
CRYSTAL BALLROOM
|
|
09:45-10:00
|
Coffee Break at Meeting Room Foyer
Poster session
setup and display: Meeting Room Foyer
|
|
10:00-12:30
|
S 1
Visual analytics, time, and space
|
S 2
Cloud computing and systems (1)
|
S 3
Graphs and networks
|
Tutorial
Big ML Software for Modern ML
Algorithms
|
|
Session Chair
|
Arash Jalal Zadeh Fard
|
Amy Apon
|
Luke Huan
|
Qirong Ho, Eric Xing
|
|
Venue
|
CABINET SUITE
|
DIPLOMAT/AMBASSADOR
|
JUDICIARY SUITE
|
EMBASSY/PATUXENT
|
|
12:30-14:00
|
Lunch provided by the conference at BALLROOM FOYER (Seating inside the Crystal Ballroom)
Poster session setup and display: Meeting Room Foyer
|
|
|
|
14:00-16:05
|
L 1
Graphs and networks (1)
|
L 2
Scalable systems
|
L 3
Storage
|
I&G 1
Industry & Government
|
|
Session
Chair
|
Conrad
S. Tucker
|
Weijia
Xu
|
Steven
Y. Ko
|
Wo
Chang
|
|
Venue:
|
CABINET SUITE
|
DIPLOMAT/AMBASSADOR
|
JUDICIARY SUITE
|
EMBASSY/PATUXENT
|
|
16:05-16:20
|
Coffee Break at Meeting
Room Foyer
|
|
16:20-18:00
|
L 4
Image processing
|
L 5
Data streams and time series
|
L 6
Regression and machine learning
|
I&G 2
Industry & Government
|
|
Session
Chair
|
Lin-Ching
Chang
|
Bo
Luo
|
Jiang
Zheng
|
Raghunath
Nambiar
|
|
Venue:
|
CABINET
SUITE
|
DIPLOMAT/AMBASSADOR
|
JUDICIARY
SUITE
|
EMBASSY/PATUXENT
|
|
19:00-20:30
|
Banquet
|
|
Venue:
|
CRYSTAL BALLROOM
|
|
|
|
|
|
|
|
|
|
|
|
|
29-Oct
|
|
|
07:30-18:00
|
Registration
|
|
Venue:
|
Ballroom Foyer, Ballroom Coatroom
|
|
08:30-09:30
|
Session Chair: Vasant Honavar
Keynote
Speech 2: Smart Data - How you
and I will exploit Big Data for personalized digital health and many
other activities
Amit
Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis - Wright State
University
|
|
Venue:
|
CRYSTAL BALLROOM
|
|
09:30-10:00
|
Coffee Break at Meeting
Room Foyer
Poster session setup and display: Meeting Room Foyer
|
|
10:00-12:30
|
Panel with Program Directors: Dr. Chaitanya Baru (NSF), Dr. Yuan Liu (NIH), Dr. David Kuehn (DoT), Dr. Tsengdar Lee (NASA), Dr. Sudarsan Rachuri (NIST), Mr. Matti Vakkuri (DIGILE):
Big Data Challenges and Opportunities
|
Tutorial
Large-scale
Heterogeneous Learning in Big Data Analytics
|
|
Session Chair
|
Xiaohua Tony Hu
|
Jun Huan
|
|
Venue:
|
CRYSTAL
BALLROOM
|
OLD
GEORGETOWN
|
|
12:30-14:00
|
Lunch provided by the conference at BALLROOM FOYER (Seating inside the Crystal Ballroom)
|
|
|
Poster session setup and display: Meeting Room Foyer
|
|
14:00-16:05
|
L 7
Distributed
systems
|
L 8
Visualization/bioinformatics
|
L 9
Cloud
computing
|
I&G 3
Industry
& Government
|
|
Session Chair
|
Yicheng Tu
|
Saumyadipta Pyne
|
Ada Fu
|
Raghunath Nambiar
|
|
Venue:
|
OLD
GEORGETOWN
|
CABINET
SUITE
|
JUDICIARY
SUITE
|
DIPLOMAT/AMBASSADOR
|
|
16:05-16:20
|
Coffee Break at Meeting
Room Foyer
|
|
16:20-18:00
|
L 10
Privacy and
security
|
L 11
Graphs and
networks (2)
|
I&G 4
Industry
& Government
|
Tutorial
Big Data
Benchmarking
|
|
Session
Chair
|
Christoph
Schommer
|
Hao Howie
Huang
|
Wo Chang
|
Chaitan
Baru, Tilmann Rabl
|
|
Venue:
|
OLD
GEORGETOWN
|
CABINET
SUITE
|
DIPLOMAT/AMBASSADOR
|
JUDICIARY
SUITE
|
|
|
|
|
|
|
|
|
|
|
|
|
30-Oct
|
|
|
07:30-18:00
|
Registration
|
|
Venue:
|
Ballroom
Foyer, Ballroom Coatroom
|
|
08:30-09:30
|
Session Chair: Jimmy Lin
Keynote Speech 3: Addressing Human Bottlenecks in Big
Data
Joseph
M.
Hellerstein, Chancellor's Professor of Computer Science, University of
California, Berkeley and Trifacta
|
|
Venue:
|
CRYSTAL
BALLROOM
|
|
09:30-10:00
|
Coffee Break
at Meeting Room Foyer
Poster session setup and display: Meeting Room Foyer
|
|
10:00-12:30
|
S 4
Cloud
computing and systems (2)
|
S 5
Applications
|
S 6
Data mining
and learning
|
|
Session
Chair
|
Feng Luo
|
Mathias
Johanson
|
Xiaohua Tony
Hu
|
|
Venue:
|
JUDICIARY
SUITE
|
OLD
GEORGETOWN
|
CABINET
SUITE
|
|
08:00-12:00
|
|
Sessions
|
Session Chair
|
Venue
|
|
Morning workshop
|
#6
|
The Second Workshop on Distributed
Storage Systems and Coding for Big Data
|
Bing
Zhu
|
Diplomat
|
|
#7
|
First IEEE International Workshop on
Big Data Security and Privacy (BDSP 2014)
|
Tyrone
W A Grandison
|
Ambassador
|
|
13:30-18:30
|
|
Sessions
|
Session Chair
|
Venue
|
|
Afternoon
workshop
|
#9
|
Solar Astronomy Big Data (SABiD) – 1st
Workshop on Management, Search and Mining of Massive Repositories of
Solar Astronomy Data
|
Rafal Angryk
|
Diplomat
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Keynote 1:
Title: Never-Ending Language Learning
Speaker: Tom Mitchell - E. Fredkin University Professor, Machine Learning
Department, Carnegie Mellon University
Abstract:
We will never really understand learning until we can build machines that
learn many different things, over years, and become better learners over
time. We describe our research to build a Never-Ending Language Learner
(NELL) that runs 24 hours per day, forever, learning to read the web.
Each day NELL extracts (reads) more facts from the web, into its
growing knowledge base of beliefs. Each day NELL also learns to read
better than the day before. NELL has been running 24 hours/day for over
four years now. The result so far is a collection of 70 million
interconnected beliefs (e.g., servedWith(coffee, applePie)) that NELL is
considering at different levels of confidence, along with millions of
learned phrasings, morphological features, and web page structures that
NELL uses to extract beliefs from the web. NELL is also learning to
reason over its extracted knowledge, and to automatically extend its
ontology. Track NELL's progress at http://rtw.ml.cmu.edu, or follow it
on Twitter at @CMUNELL.
Short Bio:
Tom M. Mitchell founded and chairs the Machine Learning Department at
Carnegie Mellon University, where he is the E. Fredkin University
Professor. His research uses machine learning to develop computers that
are learning to read the web, and uses brain imaging to study how the
human brain understands what it reads. Mitchell is a member of the U.S.
National Academy of Engineering, a Fellow of the American Association
for the Advancement of Science (AAAS), and a Fellow and Past President
of the Association for the Advancement of Artificial Intelligence
(AAAI). He believes the field of machine learning will be the fastest
growing branch of computer science during the 21st century.
Keynote 2:
Title: Smart Data - How you and I will exploit Big Data for
personalized digital health and many other activities
Speaker: Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis - Wright State
University
Abstract:
Big Data has captured a lot of interest in industry, with the emphasis on
the challenges of the four Vs of Big Data: Volume, Variety, Velocity,
and Veracity, and their applications to drive value for
businesses. Recently, there has been
rapid growth in situations where a big data challenge relates to making
individually relevant decisions.
A key example is personalized digital health, which relates to
making better decisions about our health, fitness, and well-being. Consider, for instance, understanding
the reasons for and avoiding an asthma attack based on Big Data in the
form of personal health signals (e.g., physiological data measured by
devices/sensors or Internet of Things around humans, on the humans, and
inside/within the humans), public health signals (e.g., information
coming from the healthcare system such as hospital admissions), and
population health signals (such as Tweets by people related to asthma
occurrences and allergens, Web services providing pollen and smog
information). However, no
individual has the ability to process all these data without the help
of appropriate technology, and each human has a different set of relevant
data!
In this talk, I will describe Smart Data that is realized by extracting
value from Big Data, to benefit not just large companies but each
individual. If my child is an asthma patient, for all the data relevant
to my child with the four V-challenges, what I care about is simply,
“How is her current health, and what is the risk of having an asthma
attack in her current situation (now and today), especially if that
risk has changed?” As I will show, Smart Data that gives such
personalized and actionable information will need to utilize metadata,
use domain specific knowledge, employ semantics and intelligent
processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a
synergistic combination of techniques similar to the close interworking
of the top brain and the bottom brain in the cognitive models.
For harnessing Volume, I will discuss the concept of Semantic Perception,
that is, how to convert massive amounts of data into information,
meaning, and insight useful for human decision-making. For dealing with
Variety, I will discuss experience in using agreement represented in
the form of ontologies, domain models, or vocabularies, to support
semantic interoperability and integration. For Velocity, I will discuss somewhat
more recent work on Continuous Semantics, which seeks to use
dynamically created models of new objects, concepts, and relationships,
using them to better understand new cues in the data that capture
rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of
personalized health, energy, disaster response, and smart city. I will
present examples from a couple of these.
Short Bio:
Amit P. Sheth (http://knoesis.org/amit) is an educator, researcher, and
entrepreneur. He is the LexisNexis Eminent Scholar and
founder/executive director of the Ohio Center of Excellence in Knowledge-enabled
Computing (Kno.e.sis) at Wright State University. Kno.e.sis conducts
research in social/sensor/semantic data and Web 3.0 with real-world
applications and multidisciplinary solutions for translational
research, healthcare and life sciences, cognitive science, material
sciences, and others. Kno.e.sis' activities have resulted in Wright
State University being recognized as a top organization in the world for
World Wide Web research impact. Prof. Sheth is one of the top authors in
Computer Science, World Wide Web, and databases (cf: Microsoft Academic
Search; Google H-index). His research has led to several commercial
products, many real-world applications, and two earlier companies with
two more in early stages of development. One of these was Taalee/Voquette/Semagix,
which was likely the first company (founded in 1999) that developed
Semantic Web enabled search and analysis, and semantic application
development platforms.
Keynote 3:
Title: Addressing Human Bottlenecks in Big Data
Speaker: Joseph M. Hellerstein, Chancellor's Professor of Computer Science, University
of California, Berkeley and Trifacta
Abstract:
We live in an era when compute is cheap, data is plentiful, and system
software is being given away for free.
Today, the critical bottlenecks in data-driven organizations are
human bottlenecks, measured in the costs of software developers, IT
professionals, and data analysts.
How can computer science remain relevant in this context? The Big Data ecosystem presents two
archetypal settings for answering this question: NoSQL distributed
databases, and analytics on Hadoop.
In the case of NoSQL, developers are being asked to build parallel
programs for global-scale systems that cannot even guarantee the
consistency of a single register of memory. How can this possibly be made to
work? I’ll talk about what we
have seen in the wild in user deployments, and what we’ve learned from
developers and their design patterns.
Then I’ll present theoretical results—the CALM Theorem—that shed
light on what’s possible here, and what requires more expensive tools
for coordination on top of the typical NoSQL offerings. Finally, I will highlight some new
approaches to writing and testing software—exemplified by the Bloom
language—that can help developers of distributed software avoid
expensive coordination when possible, and have the coordination logic
synthesized for them automatically when necessary.
In the Hadoop context, the key bottlenecks lie with data analysts and data
engineers, who are routinely asked to work with data that cannot
possibly be loaded into tools for statistical analytics or
visualization. Instead, they
have to engage in time-consuming data “wrangling”—to try and figure out
what’s in their data, whip it into a rectangular shape for analysis,
and figure out how to clean and integrate it for use. I’ll discuss what we heard talking
with data analysts in both academic interviews and commercial
engagements. Then I’ll talk
about how techniques from human-computer interaction, machine learning,
and database systems can be brought together to address this human
bottleneck, as exemplified by our work on various systems including the
Data Wrangler project and Trifacta's platform for data transformation.
Short Bio:
Joseph M. Hellerstein is a Chancellor's Professor of
Computer Science at the University of California, Berkeley, whose
research focuses on data-centric systems and the way they drive
computing. A Fellow of the ACM, his work has been recognized via awards
including an Alfred P. Sloan Research Fellowship, MIT Technology
Review's TR10 and TR100 lists, Fortune Magazine's "Smartest in
Tech" list, and three ACM-SIGMOD "Test of Time"
awards. In 2012, Joe co-founded
Trifacta, Inc (http://www.trifacta.com/),
where he currently serves as Chief Strategy Officer.
|
|
Conference Paper Presentations
|
|
L 1:
Graphs and networks (1)
|
|
Regular
|
BigD210 "4S:
Learning to Estimate Pairwise Distances in Large Graphs"
Maria Christoforaki and Torsten Suel
|
|
Regular
|
BigD304 "Geotagging
One Hundred Million Twitter Accounts with Total Variation
Minimization"
Ryan Compton, David Jurgens, and David Allen
|
|
Regular
|
BigD357
"GRAPHiQL: A Graph Intuitive Query Language for Relational
Databases"
Alekh Jindal and Samuel Madden
|
|
Regular
|
BigD395
"PULP: Scalable Multi-Objective Multi-Constraint Partitioning for
Small-World Networks"
George Slota, Siva Rajamanickam, and Kamesh Madduri
|
|
Regular
|
BigD436
"Synergistic Partitioning in Multiple Large Scale Social
Networks"
Songchang Jin, Jiawei Zhang, Philip S. Yu, Shuqiang Yang, and Aiping Li
|
|
L 2:
Scalable systems
|
|
Regular
|
BigD216
"FusionFS: Toward Supporting Data-Intensive Scientific
Applications on Extreme-Scale High-Performance Computing Systems"
Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries
Kimpe, Philip Carns, Rob Ross, and Ioan Raicu
|
|
Regular
|
BigD253
"Sparse computation for large-scale data mining"
Dorit S. Hochbaum and Philipp Baumann
|
|
Regular
|
BigD306 "BASIC:
an Alternative to BASE for Large-Scale Data Management System"
Lengdong Wu, Li-Yan Yuan, and Jia-Huai You
|
|
Regular
|
BigD336
"Facilitating Twitter Data Analytics: Platform, Language, and
Functionality"
Ke Tao, Claudia Hauff, Geert-Jan Houben, Fabian Abel, and Guido
Wachsmuth
|
|
Regular
|
BigD444
"Large-scale Distributed Sorting for GPU-based Heterogeneous
Supercomputers"
Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, and
Satoshi Matsuoka
|
|
L 3:
Storage
|
|
Regular
|
BigD271
"BurstMem: A High-Performance Burst Buffer System for Scientific
Applications"
Teng Wang, Sarp Oral, Yandong Wang, Brad Settlemyer, Scott Atchley, and
Weikuan Yu
|
|
Regular
|
BigD313
"Meeting Predictable Buffer Limits in the Parallel Execution of
Event Processing Operators"
Ruben Mayer, Boris Koldehofe, and Kurt Rothermel
|
|
Regular
|
BigD398
"Effective Caching Techniques for Accelerating Pattern Matching Queries"
Arash Fard, Satya Manda, Lakshmish Ramaswamy, and John Miller
|
|
Regular
|
BigD407
"Provenance-Based Object Storage Prediction Scheme for Scientific
Big Data Applications"
Dong Dai, Yong Chen, Dries Kimpe, and Rob Ross
|
|
Regular
|
BigD215
"Virtual Chunks: On Supporting Random Accesses to Scientific Data in
Compressible Storage Systems"
Dongfang Zhao, Jian Yin, Kan Qiao, and Ioan Raicu
|
|
L 4: Image
processing
|
|
Regular
|
BigD316
"Metadata Extraction and Correction for Large-Scale Traffic Surveillance
Videos"
Xiaomeng Zhao, Huadong Ma, Haitao Zhang, Yi Tang, and Guangping Fu
|
|
Regular
|
BigD360
"Structure Recognition from High Resolution Images of Ceramic Composites"
Daniela Ushizima, Talita Perciano, Harinarayan Krishnan, Burlen Loring,
Hrishikesh Bale, Dilworth Parkinson, and James Sethian
|
|
Regular
|
BigD379
"Evaluating Density-based Motion for Big Data Visual Analytics"
Ronak Etemadpour, Paul Murray, and Angus Forbes
|
|
Regular
|
BigD421
"Locating Visual Storm Signatures from Satellite Images"
Yu Zhang, Stephen Wistar, Jose A. Piedra-Fernández, Jia Li, Michael
Steinberg, and James Z. Wang
|
|
L 5:
Data streams and time series
|
|
Regular
|
BigD234
"Distributed Adaptive Model Rules for Mining Big Data
Streams"
Anh Thu Vu, Gianmarco De Francisci Morales, Joao Gama, and Albert Bifet
|
|
Regular
|
BigD382
"Interpretable Streaming Regression Models with Local Performance
Guarantees"
Ulf Johansson, Cecilia Sönströd, and Henrik Linusson
|
|
Regular
|
BigD451
"Performance Modeling in CUDA Streams - A Means for
High-Throughput Data Processing"
Hao Li, Di Yu, Anand Kumar, and Yicheng Tu
|
|
Regular
|
BigD445
"TRISTAN: Real-Time Analytics on Massive Time Series Using Sparse
Dictionary Compression"
Alice Marascu, Pascal Pompey, Eric Bouillet, Michael Wurst, Olivier
Verscheure, Martin Grund, and Philippe Cudre-Mauroux
|
|
L 6:
Regression and machine learning
|
|
Regular
|
BigD402
"Predicting Glaucoma Progression using Multi-task Learning with Heterogeneous
Features"
Shigeru Maya, Kai Morino, and Kenji Yamanishi
|
|
Regular
|
BigD283
"Examination of Data, Rule Generation and Detection of Phishing
URLs using Online Logistic Regression"
Mohammed Nazim Feroz and Susan Mengel
|
|
Regular
|
BigD454
"Large-scale Logistic Regression and Linear Support Vector
Machines Using Spark"
Chieh-Yen Lin, Cheng-Hao Tsai, Ching-Pei Lee, and Chih-Jen Lin
|
|
Regular
|
BigD465
"BayesWipe: A Multimodal System for Data Cleaning and Consistent Query
Answering on Structured Data"
Sushovan De, Yuheng Hu, Yi Chen, and Subbarao Kambhampati
|
|
L 7:
Distributed systems
|
|
Regular
|
BigD318
"Partial Rollback-based Scheduling on In-memory Transactional Data
Grids"
Junwhan Kim
|
|
Regular
|
BigD337 "Main
Memory Evaluation of Recursive Queries on Multicore Machines"
Mohan Yang and Carlo Zaniolo
|
|
Regular
|
BigD391
"Distributed Algorithms for k-truss Decomposition"
Ming-Syan Chen, Pei-Ling Chen, and Chung-Kuang Chou
|
|
Regular
|
BigD434 "Parallel
Breadth First Search on GPU Clusters"
Zhisong Fu, Harish Dasari, Martin Berzins, and Bryan Thompson
|
|
Regular
|
BigD471
"Optimizing Load Balancing and Data-Locality with Data-aware
Scheduling"
Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, and
Ioan Raicu
|
|
L 8: Visualization/bioinformatics
|
|
Regular
|
BigD258
"Topic Similarity Networks: Visual Analytics for Large Document Sets"
Arun Maiya
|
|
Regular
|
BigD303
"Web-based Visual Analytics for Extreme Scale Climate Science"
Chad Steed, Katherine Evans, John Harney, Brian Jewell, Galen Shipman,
Brian Smith, Peter Thornton, and Dean Williams
|
|
Regular
|
BigD338
"Visual Fusion of Mega-City Big Data:
An Application to Traffic and Tweets Data Analysis of Metro Passengers"
Masahiko Itoh, Daisaku Yokoyama, Masashi Toyoda, Yoshimitsu Tomita,
Satoshi Kawamura, and Masaru Kitsuregawa
|
|
Regular
|
BigD277
"Random Projection Based Clustering for Population Genomics"
Sotiris Tasoulis, Lu Cheng, Niko Välimäki, Nicholas Croucher, Simon
Harris, William Hanage, Teemu Roos, and Jukka Corander
|
|
Regular
|
BigD460
"Identification of SNP Interactions Using Data-Parallel Primitives on
GPUs"
Can Altinigneli, Bettina Konte, Dan Rujescu, Christian Boehm, and
Claudia Plant
|
|
L 9:
Cloud computing
|
|
Regular
|
BigD380
"Combining Hadoop and GPU to Preprocess Large Affymetrix
Microarray Data"
Sufeng Niu, Guangyu Yang, Nilim Sarma, Melissa Smith, Pradip Srimani,
and Feng Luo
|
|
Regular
|
BigD423
"Detecting and Identifying System Changes in the Cloud via
Discovery by Example"
Hao Chen, Sastry Duri, Vasanth Bala, Nilton Bila, Canturk Isci, and Ayse
Coskun
|
|
Regular
|
BigD426
"PigOut: Making Multiple Hadoop Clusters to Work Together"
Kyungho Jeon, Sharath Chandrashekhara, Feng Shen, Shikhar Mehra, Oliver
Kennedy, and Steven Ko
|
|
Regular
|
BigD432
"Accurate and Efficient Selection of the Best Consumption
Prediction Method in Smart Grids"
Marc Frincu, Charalampos Chelmis, Muhammad Noor, and Viktor Prasanna
|
|
Regular
|
BigD244
"E-Sketch: Gathering Large-scale Energy Consumption Data Based on
Consumption Patterns"
Zhichuan Huang, Hongyao Luo, David Skoda, Ting Zhu, and Yu Gu
|
|
L 10: Privacy and security
|
|
Regular
|
BigD260
"Hierarchical Management of Large-Scale Malware Data"
Lee Kellogg, Brian Ruttenberg, Alison O'Connor, Michael Howard, and Avi
Pfeffer
|
|
Regular
|
BigD294
"MR-TRIAGE: Scalable Multi-Criteria Clustering for Big Data
Security Intelligence Applications"
Yun Shen and Olivier Thonnard
|
|
Regular
|
BigD383 "Using Data Content to
Assist Access Control for Large-Scale Content-Centric Databases"
Wenrong Zeng, Yuhao Yang, and Bo Luo
|
|
|
|
|
L 11: Graphs and networks (2)
|
|
Regular
|
BigD301
"Efficient Breadth-First Search on a Heterogeneous Processor"
Mayank Daga, Mark Nutter, and Mitesh Meswani
|
|
Regular
|
BigD419 "Clique
Guided Community Detection"
Diana Palsetia, Mostofa Patwary, William Hendrix, Ankit Agrawal, and
Alok Choudhary
|
|
Regular
|
BigD441 "Increasing the Veracity
of Event Detection on Social Media Networks Through User Trust
Modeling"
Todd Bodnar, Conrad Tucker, Kenneth Hopkinson, and Sven Bilén
|
|
Regular
|
BigD455
"NVM-based Hybrid BFS with Memory Efficient Data Structure"
Keita Iwabuchi, Hitoshi Sato, Yuichiro Yasui, Katsuki Fujisawa, and
Satoshi Matsuoka
|
|
I&G: Industry & Government
(1)
|
|
Regular
|
N211
Spatial Computations over Terabyte-Sized Images on
Hadoop Platforms
Peter Bajcsy, Phuong Nguyen, Antoine Vandecreme, and
Mary Brady
|
|
Regular
|
N223
Astro: A Predictive Model for Anomaly Detection and
Feedback-based Scheduling on Hadoop
Chaitali Gupta, Mayank Bansal, Tzu-Cheng Chuang,
Ranjan Sinha, and Sami Ben-romdhane
|
|
Regular
|
N222
ALOJA: a Systematic Study of Hadoop Deployment
Variables to Enable Automated Characterization of Cost-Effectiveness
Nicolas Poggi, David Carrera, Aaron Call, Rob
Reinauer, Nikola Vujic, Daron Green, José Blakeley, Sergio Mendoza,
Yolanda Becerra, Jordi Torres, Eduard Ayguadé, and Jesús Labarta
|
|
Regular
|
N217
Lightweight Approximate Top-k for Distributed
Settings
Vinay Deolalikar and Kave Eshghi
|
|
Regular
|
N230
Recommending Similar Items in Large-scale Online
Marketplaces
Jayasimha Reddy Katukuri, Tolga Konik, Rajyashree
Mukherjee, and Santanu Kolay
|
|
I&G: Industry & Government
(2)
|
|
Regular
|
N216
Crowdsourced Query Augmentation through Semantic
Discovery of Domain-specific Jargon
Khalifeh Aljadda, Mohammed Korayem, Trey Grainger,
and Chris Russell
|
|
Regular
|
N224
Heterogeneous Stream Processing for Disaster
Detection and Alarming
Francois Schnitzler, Thomas Liebig, Shie Mannor,
Gustavo Souto, Sebastian Bothe, and Hendrik Stange
|
|
Regular
|
N201
Recall Estimation for Rare Topic Retrieval from
Large Corpuses
Praveen Bommannavar, Alek Kolcz, and Anand Rajaraman
|
|
Regular
|
N236
Identifying top Chinese network buzzwords from
social media big data set based on time-distribution features
Yongli Tang, Tingting He, Bo Li, and Xiaohua Hu
|
|
Regular
|
N218
Query Revision During Cluster Based Search on Large
Unstructured Corpora
Vinay Deolalikar
|
|
I&G: Industry & Government
(3)
|
|
Regular
|
N213
A Scalable and Efficient Community Detection
Algorithm
Dhaval C. Lunagariya, Somayajulu D.V.L.N., and Radha
Krishna P.
|
|
Regular
|
N202
Future Directions of Humans in Big Data Research
Celeste Lyn Paul, Chris Argenta, William Elm, and
Alex Endert
|
|
Regular
|
N228
An Initial Study of Predictive Machine Learning
Analytics on Large Volumes of Historical Data for Power System
Applications
Jiang Zheng and Aldo Dagnino
|
|
Regular
|
N207
In Unity There is Strength: Showcasing a Unified Big
Data Platform with MapReduce Over both Object and File Storage
Renu Tewari, Dean Hildebrand, and Rui Zhang
|
|
Regular
|
N203
Bridging High Velocity and High Volume Industrial
Big Data Through Distributed In-Memory Storage & Analytics
Jenny Weisenberg Williams, Kareem Aggour, John
Interrante, Justin McHugh, and Eric Pool
|
|
I&G: Industry & Government
(4)
|
|
Regular
|
N232
Big Data Predictive Analytics for Proactive
Semiconductor Equipment Maintenance
Sathyan Munirathinam
|
|
Regular
|
N219
Automating Data Integration with HiperFuse
Eric Huang, Andres Quiroz, and Luca Ceriani
|
|
Regular
|
N215
Explore Efficient Data Organization for Large Scale
Graph Analytics and Storage
Yinglong Xia, Ilie Tanasa, Lifeng Nai, Wei Tan,
Yanbin Liu, Jason Crawford, and Ching-Yung Lin
|
|
Regular
|
N209
Increasing the Accessibility to Big Data Systems via
a Common Services API
Rohan Malcolm, Cherrelle Morrison, Tyrone Grandison,
Sean Thorpe, Kimron Christie, Akim Wallace, Damian Green, Julian
Jarrett, and Arnett Campbell
|
|
S 1:
Visual analytics, time, and space
|
|
Short
|
BigD204 "The
Role of Visual Analysis in the Regulation of Electronic Order Book
Markets"
Mark Paddrik, Richard Haynes, Andrew Todd, William Scherer, and Peter
Beling
|
|
Short
|
BigD217
"Preferences over Time"
Noriaki Kawamae
|
|
Short
|
BigD227
"Online Temporal-Spatial Analysis for Detection of Critical Events
in Cyber-Physical Systems"
Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou, and Zhang Fu
|
|
Short
|
BigD252
"In-Situ Visualization and Computational Steering for Large-Scale
Simulation of Turbulent Flows in Complex Geometries"
Hong Yi, Michel Rasquin, Jun Fang, and Igor Bolotnov
|
|
Short
|
BigD288
"Large-Scale Network Traffic Monitoring with DBStream, a System
for Rolling Big Data Analysis"
Arian Bär, Alessandro Finamore, Pedro Casas, Lukasz Golab, and Marco
Mellia
|
|
Short
|
BigD387 "Immersive
and collaborative data visualization using virtual reality
platforms"
Ciro Donalek, S.G. Djorgovski, Scott Davidoff, Alex Cioc, Anwell Wang,
Giuseppe Longo, Jeffrey S. Norris, Jerry Zhang, Elizabeth Lawler, and
Stacy Yeh
|
|
Short
|
BigD411 "On Scaling
Time Dependent Shortest Path Computations for Dynamic Traffic
Assignment"
Amit Gupta, Weijia Xu, Kenneth Perrine, Dennis Bell, and Natalia
Ruiz-Juri
|
|
Short
|
BigD413 "High
Volume Geospatial Mapping for Internet-of-Vehicle Solutions with
In-Memory Map-Reduce Processing"
Tao Zhong, Kshitij Doshi, Gang Deng, Xiaoming Yang, and Hegao Zhang
|
|
Short
|
BigD431 "The
Adaptive Projection Forest: Using Adjustable Exclusion and Parallelism
in Metric Space Indexes"
Lee Thompson, Weijia Xu, and Daniel Miranker
|
|
Short
|
BigD440 "Low
Complexity Sensing for Big Spatio-Temporal Data"
Dongeun Lee and Jaesik Choi
|
|
S 2:
Cloud computing and systems (1)
|
|
Short
|
BigD242 "Scheduling
MapReduce Tasks on Virtual MapReduce Clusters from a Tenant’s
Perspective"
Jia-Chun Lin, Ming-Chang Lee, and Ramin Yahyapour
|
|
Short
|
BigD311
"Minimizing Data Movement through Query Transformation"
Patrick Leyshock, David Maier, and Kristin Tufte
|
|
Short
|
BigD364
"Automated Workload-aware Elasticity of NoSQL Clusters in the
Cloud"
Evie Kassela, Christina Boumpouka, Ioannis Konstantinou, and Nectarios
Koziris
|
|
Short
|
BigD384
"Multilevel Partitioning of Large Unstructured Grids"
Oyindamola Akande and Philip Rhodes
|
|
Short
|
BigD392 "On
the Performance of MapReduce: A Stochastic Approach"
Sarker Ahmed and Dmitri Loguinov
|
|
Short
|
BigD428
"VENU: Orchestrating SSDs in Hadoop Storage"
Krish K.R., M. Safdar Iqbal, and Ali Butt
|
|
Short
|
BigD438
"In-Memory I/O and Replication for HDFS with Memcached: Early
Experiences"
Nusrat Islam, Xiaoyi Lu, Md. Rahman, Raghunath Rajachandrasekar, and
Dhabaleswar Panda
|
|
Short
|
BigD448
"Scaling Up Prioritized Grammar Enumeration for Scientific
Discovery in the Cloud"
Tony Worm and Kenneth Chiu
|
|
Short
|
BigD469
"In-advance Data Analytics for Reducing Time to Discovery"
Jialin Liu, Yin Lu, and Yong Chen
|
|
Short
|
BigD475 "Enabling
Composite Applications through an Asynchronous Shared Memory
Interface"
Douglas Otstott, Noah Evans, Latchesar Ionkov, Ming Zhao, and Michael
Lang
|
|
S 3:
Graphs and networks
|
|
Short
|
BigD225 "Random
Walks on Adjacency Graphs for Mining Lexical Relations from Big Text
Data"
Shan Jiang and Chengxiang Zhai
|
|
Short
|
BigD284
"MMap: Fast Billion-Scale Graph Computation on a PC via Memory
Mapping"
Zhiyuan Lin, Minsuk Kahng, Kaeser Md. Sabrin, Duen Horng Chau, Ho Lee,
and U Kang
|
|
Short
|
BigD287
"Building k-nn graphs from large text data"
Thibault Debatty, Pietro Michiardi, Olivier Thonnard, and Wim Mees
|
|
Short
|
BigD331
"Empowering users of social networks to assess their privacy
risks"
Vladimir Estivill-Castro, Md Zahidul Islam, and Peter Hough
|
|
Short
|
BigD333
"Matching Approximate Patterns in Richly-Attributed Graphs"
Robert Pienta, Acar Tamersoy, Hanghang Tong, and Duen Horng Chau
|
|
Short
|
BigD346 "A
Unified Approach to Network Anomaly Detection"
Tara Babaie, Sanjay Chawla, and Sebastien Ardon
|
|
Short
|
BigD365 "Big
Data: Myths, Misconceptions and Opportunities"
Mark Lycett and Asmat Monaghan
|
|
S 4:
Cloud computing and systems (2)
|
|
Short
|
BigD230 "A Cross-job
Framework for MapReduce Scheduling"
Xuejie Xiao, Jian Tang, Zhenhua Chen, Jielong Xu, and Chonggang Wang
|
|
Short
|
BigD247
"Rainbow: A Distributed and Hierarchical RDF Triple Store with
Dynamic Scalability"
Rong Gu, Yihua Huang, and Wei Hu
|
|
Short
|
BigD259
"MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and Its
Application to SNOMED CT"
Guo-Qiang Zhang, Wei Zhu, Mengmeng Sun, Shiqiang Tao, Olivier
Bodenreider, and Licong Cui
|
|
Short
|
BigD264 "FlexDAS:
A Flexible Direct Attached Storage for I/O Intensive Applications"
Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi
Miyoshi, and Jun Tanaka
|
|
Short
|
BigD296
"Perldoop: Efficient Execution of Perl Scripts on Hadoop Clusters"
Jose M. Abuin, Juan C. Pichel, Tomas F. Pena, Pablo Gamallo, and Marcos
Garcia
|
|
Short
|
BigD315
"Evaluating the Performance and Scalability of the Ceph
Distributed Storage System"
Diana Gudu, Marcus Hardt, and Achim Streit
|
|
Short
|
BigD347 "Incremental
Window Aggregates over Array Database"
Jiang Li, Hideyuki Kawashima, and Osamu Tatebe
|
|
Short
|
BigD350
"Analyzing the Language of Food on Social Media"
Daniel Fried, Mihai Surdeanu, Stephen Kobourov, Melanie Hingle, and
Dane Bell
|
|
Short
|
BigD362 "BigCache for Big-data Systems"
Michel Roger, Yiqi Xu, and Ming Zhao
|
|
Short
|
BigD476
"k-balanced sorting and skew join in MPI and MapReduce"
Silu Huang and Ada Fu
|
|
S 5:
Applications
|
|
Short
|
BigD232 "Big Automotive
Data - Leveraging large volumes of data for knowledge-driven product
development"
Mathias Johanson, Stanislav Belenki, Jonas Jalminger, Magnus Fant, and
Mats Gjertz
|
|
Short
|
BigD233 "On
the Impact of Socio-economic Factors on Power Load Forecasting"
Yufei Han, Xiaolan Sha, Etta Grover-Silva, and Pietro Michiardi
|
|
Short
|
BigD239
"Toward Personalized and Scalable Voice-Enabled Services Powered
by Big Data"
Jong Hoon Ahnn
|
|
Short
|
BigD270 "A Two-Sided
Market Mechanism for Trading Big Data Computing Commodities"
Lena Mashayekhy, Mahyar Movahed Nejad, and Daniel Grosu
|
|
Short
|
BigD310
"Department of Energy Strategic Roadmap for Earth System Science
Data Integration"
Dean Williams, Giri Palanisamy, Galen Shipman, Thomas Boden, and Jimmy
Voyles
|
|
Short
|
BigD312
"Synthetic Data Generation for the Internet of Things"
Jason Anderson, Ken Kennedy, Linh Ngo, Andre Luckow, and Amy Apon
|
|
Short
|
BigD324 "Learning
to Predict Subject-Line Opens for Large-Scale Email Marketing"
Raju Balakrishnan and Rajesh Parekh
|
|
Short
|
BigD366
"Using Geometric Structures to Improve the Error Correction
Algorithm of High-Throughput Sequencing Data on MapReduce Framework"
Wei-Chun Chung, Yu-Jung Chang, D. T. Lee, and Jan-Ming Ho
|
|
Short
|
BigD376
"Knowledge-based Clustering of Ship Trajectories Using
Density-based Approach"
Bo Liu, Erico N. de Souza, Stan Matwin, and Marcin Sydow
|
|
Short
|
BigD409
"Empowering Personalized Medicine with Big Data and Semantic Web
Technology: Promises, Challenges, Pitfalls, and Use Cases"
Maryam Panahiazar, Vahid Taslimi, Ashutosh Jadhav, Amit Sheth, and
Jyotishman Pathak
|
|
S 6:
Data mining and learning
|
|
Short
|
BigD238 "Entity
Resolution Using Inferred Relationships and Behavior"
Jonathan Mugan, Ranga Chari, Laura Hitt, Eric McDermid, Marsha Sowell,
Yuan Qu, and Thayne Coffman
|
|
Short
|
BigD291
"Dynamic Pre-training of Deep Recurrent Neural Networks for
Predicting Environmental Monitoring Data"
Bun Theang Ong, Komei Sugiura, and Koji Zettsu
|
|
Short
|
BigD293
"Scaling up M-estimation via sampling designs: the
Horvitz-Thompson stochastic gradient descent"
Stéphan Clémençon, Patrice Bertail, and Emilie Chautru
|
|
Short
|
BigD327
"Metadata Capital: Simulating the Predictive Value of
Self-Generated Health Information (SGHI)"
Jane Greenberg, Adrian Ogletree, Angela Murillo, Thomas Caruso, and
Herbie Huang
|
|
Short
|
BigD339 "Bootstrapping K-means
for Big data analysis"
Jungkyu Han and Min Luo
|
|
Short
|
BigD343 "Representative Subsets
For Big Data Learning using k-NN graphs"
Raghvendra Mall, Vilen Jumutc, Rocco Langone, and Johan Suykens
|
|
Short
|
BigD356 "Towards Building and Evaluating
a Personalized Location-Based Recommender System"
Rubing Duan
|
|
Short
|
BigD361 "Distributed Adaptive
Importance Sampling on Graphical Models using MapReduce"
Ahsanul Haque, Swarup Chandra, Latifur Khan, and Charu Aggarwal
|
|
Short
|
BigD401 "PGMHD: A Scalable
Probabilistic Graphical Model for Massive Hierarchical Data
Problems"
Khalifeh Aljadda, Mohammed Korayem, Camilo Ortiz, Trey Grainger, John
Miller, and William York
|
|
Short
|
BigD410 "Distributed Class
Dependent Feature Analysis - A Big Data Approach"
Khoa Luu, Chenchen Zhu, and Marios Savvides
|
|
|
|
Workshop details can be found here: Workshops
Workshop 1:
Scholarly Big Data: Challenges & Issues
|
07:30-08:00
|
Coffee
and cake
|
|
08:00-08:10
|
Opening
|
|
08:10-08:50
|
Guest talk from Microsoft Academic Search
|
|
08:50-09:15
|
Why Name Ambiguity Resolution Matters for Scholarly Big
Data Research by Jinseok Kim, Jana Diesner, Amirhossein Aleyasen (University
of Illinois at Urbana-Champaign), Heejun Kim (University of North
Carolina at Chapel Hill) and Hwan-Min Kim (Korea Institute of Science
and Technology Information).
|
|
09:15-09:40
|
The OceanLink Project by Tom Narock (Marymount
University), Adila Krisnadhi, Pascal Hitzler, Michelle Cheatham (Wright
State University), Robert Arko, Suzanne Carbotte (Columbia University)
and Timothy Finin (University of Maryland, Baltimore County).
|
|
09:40-10:05
|
Evolution of Scientific Collaboration Networks by Gaurav
Madaan (Thapar University) and Shivakumar Jolad (IIT Gandhinagar).
|
|
10:05-10:25
|
Coffee
and cake
|
|
10:25-10:50
|
Managing the Academic Data Lifecycle:
|
|
10:50-11:30
|
Guest talk from UK CORE project
|
|
11:30-12:10
|
Guest talk from Semantic Scholar from the AllenAI Labs
|
|
12:10-12:50
|
Guest talk from Dewey Murdick from IARPA
|
|
12:50-13:00
|
Closing
|
|
13:00-14:00
|
Lunch and
networking
|
Workshop 2: The 2nd Workshop on Scalable Machine
Learning: Theories and Applications
27-Oct-14
Venue: Waterford
|
08:30-08:35
|
Opening Remarks
|
|
08:35-09:50
|
Session I
|
|
08:35-09:20
|
Eric
Xing, Carnegie Mellon University
|
|
09:20-09:35
|
S2207: Towards Scalable Graph
Computation on Mobile Devices
Yiqi
Chen, Zhiyuan Lin, Robert Pienta, Minsuk Kahng, and Duen Horng Chau
|
|
09:35-09:50
|
S2210: An Improved Memory Management
Scheme for Large Scale Graph Computing Engine GraphChi
Yifang Jiang,
Kai Chen, Yi Zhou, Diao Zhang, Qu Zhou, and Jianhua He
|
|
09:50-10:00
|
BigD470: FS^3: A sampling based
method for top-k frequent subgraph mining
Tanay Kumar Saha and Mohammad Hasan
BigD446: Fast Algorithm for
Computing Weighted Projection Quantiles and Data Depth for
High-Dimensional Large Data
Ujjal
Mukherjee and Ansu Chatterjee
|
|
10:00-10:30
|
Break and Poster Session
|
|
10:30-12:00
|
Session II
|
|
10:30-11:15
|
Mikhail
Bilenko, Microsoft Research
|
|
11:15-11:30
|
BigD278: Boosting Stochastic Newton
Descent for Bigdata Mining and Classification
Roberto
D'Ambrosio, Wafa Belhajali, and Michel Barlaud
|
|
11:30-11:45
|
BigD309: Parameterized Multilayer
Perceptron for Fast Learning in Big Data
Chandra
B and Rajesh Kumar Sharma
|
|
11:45-12:00
|
S2213: A Multi-View Two-level
Classification Method for Generalized Multi-instance Problems
Xiaoguang Wang, Xuan Liu, Stan
Matwin, Nathalie Japkowicz, and Hongyu Guo
S2215: Applying Instance-weighted Support
Vector Machines to Class Imbalanced Datasets
Xiaoguang Wang, Xuan Liu, and Stan
Matwin
S2205: Computing Fuzzy Rough
Approximations in Large Scale Information Systems
Hasan Asfoor, Rajagopalan
Srinivasan, Gayathri Vasudevan, Nele Verbiest, Chris Cornelis, Matthew
Tolentino, Ankur Teredesai, and Martine De Cock
|
|
12:00-14:00
|
Lunch on your own and Poster Session
(13:30-14:00)
|
|
14:00-15:15
|
Session III
|
|
14:00-14:45
|
Duen
Horng (Polo) Chau, Georgia Tech
|
|
14:45-15:00
|
BigD208: Calculating Feature
Importance in Data Streams with Concept Drift using Online Random
Forest
Andrew
Cassidy and Frank Deviney
|
|
15:00-15:15
|
S2206: Feature Selection for Text
Clustering in Limited Memory Using Monte Carlo Wrapper
Vinay Deolalikar
S2208: WS^2F: A Weakly Supervised
Framework for Data Stream Filtering
Cailing Dong and Arvind Agarwal
S2212: A Clustering Based Scalable
Hybrid Approach for Web Page Recommendation
Mohammad Sharif and Vijay Raghavan
BigD317: Multiresolution analysis of
incomplete rankings with applications to prediction
Eric
Sibony, Stéphan Clémençon, and Jérémie Jakubowicz
|
|
15:15-15:50
|
Break and Poster Session
|
|
15:50-17:20
|
Session IV
|
|
15:50-16:35
|
Ping
Li, Rutgers University
|
|
16:35-16:50
|
BigD467: Pairwise Topic Model via
Relation Extraction
Xiaoli Song, Yue Shang, Yuan Ling, Mengwen Liu, and
Xiaohua Hu
|
|
16:50-17:00
|
S2205: Computing Fuzzy Rough
Approximations in Large Scale Information Systems
Hasan Asfoor, Rajagopalan Srinivasan,
Gayathri Vasudevan, Nele Verbiest, Chris Cornelis, Matthew Tolentino,
Ankur Teredesai, and Martine De Cock
S2201: Scalable Big Data Computing
for the Personalization of Machine Learned Models and its Application
to Automatic Speech Recognition Service
Jong Hoon Ahnn
|
|
17:00-18:00
|
Free Discussion and Poster Session
|
Workshop 3: 1st International Workshop on High
Performance Big Graph Data Management, Analysis, and Mining
|
08:40 - 08:55
|
Opening remarks
|
|
09:00 - 10:00
|
Keynote talk
|
|
10:00 - 10:20
|
Coffee break
|
|
10:20 - 12:00
|
Session I
|
|
|
10:20
- 10:45 Christian L. Staudt, Henning Meyerhenke, and Yassine Marrakchi
S3201: Detecting Communities Around Seed Nodes in Complex Networks
10:45
- 11:10 William Eberle and Lawrence Holder
S3205: A Partitioning Approach to Scaling Anomaly Detection in Graph
Streams
11:10
- 11:35 Ghizlane Echbarthi and Hamamache Kheddouci
S3204: Fractional Greedy and Partial Restreaming Partitioning: New
Methods For Massive Graph Partitioning
11:35
- 12:00 Angen Zheng, Alexandros Labrinidis, and Panos Chrysanthis
S3207: Architecture-Aware Graph Repartitioning for Data-Intensive
Scientific Computing
|
|
12:00 - 13:30
|
Lunch (lunch on your own, please put up your poster)
|
|
13:30 - 15:10
|
Session II
|
|
|
13:30
- 13:55 Ichitaro Yamazaki, Theo Mary, Jakub Kurzak, Stanimire Tomov,
and Jack Dongarra
S3202: Access-averse Framework for Computing Low-rank Matrix
Approximations
13:55
- 14:10 S M Faisal, Srinivasan Parthasarathy, and P Sadayappan
S3214: Global Graphs: A Middleware for Large Graph Processing
14:10
- 14:35 David Mizell, Kristyn Maschhoff, and Steve Reinhardt
S3208: Extending SPARQL with graph functions
14:35
- 15:10 Josephine Namayanja and Vandana Janeja
BigD280: Change Detection in Temporally Evolving Computer Networks: A
Big Data Framework
|
|
15:10 - 15:45
|
Coffee break and
poster session
|
|
15:45 - 17:00
|
Session III
|
|
|
15:45
- 16:10 Naga Shailaja Dasari, Ranjan Desh, and Zubair M Park
S3212: An Efficient Algorithm of k-core Decomposition on Multicore
Processors
16:10
- 16:35 Amlan Chatterjee, Sridhar Radhakrishnan, and Chandra N.
Sekharan
S3209: Connecting the dots: Triangle completion and related problems on
large data sets using GPUs
16:35
- 17:00 Ronald Hagan, Charles Phillips, Kai Wang, Gary Rogers, and
Michael Langston
S3213: Toward an Efficient, Highly Scalable Maximum Clique Solver for
Massive Graphs
|
|
17:00 - 18:00
|
Free discussion and
poster session
|
Workshop 6: The Second Workshop on Distributed
Storage Systems and Coding for Big Data
Date: Oct. 30, 2014
Venue: Diplomat, Hyatt
Regency Bethesda, Washington DC, USA
|
Time
|
Workshop Schedule
|
|
09:00-09:10
|
Plenary
|
|
09:10-09:40
|
Keynote:
A New Zigzag MDS Code with Optimal Encoding and Efficient Decoding
Prof. Hui Li
(Peking University, China)
|
|
09:40-10:00
|
Parity
Declustering for Fault-Tolerant Storage Systems via t-designs
Son Hoang Dau (Singapore University of
Technology and Design, Singapore),
Yan Jia, Chao Jin, Weiya Xi, and Kheong Sann
Chan (Data Storage Institute, Singapore)
|
|
10:00-10:20
|
Coffee
Break
|
|
10:20-10:40
|
A C
Library of Repair-Efficient Erasure Codes for Distributed Data Storage
Systems
Chao Tian (University of Tennessee at Knoxville,
United States)
|
|
10:40-11:00
|
STORE:
Data Recovery with Approximate Minimum Network Bandwidth and Disk I/O
in Distributed Storage Systems
Tai Zhou, Hui Li, Bing Zhu, Yumeng Zhang, Hanxu
Hou, and Jun Chen (Peking University Shenzhen Graduate School, China)
|
|
11:00-11:20
|
ReCT:
Improving MapReduce Performance under Failures with Resilient
Checkpointing Tactics
Hao Wang, Haopeng Chen, and Fei Hu (Shanghai
Jiao Tong University, China)
|
|
11:20-11:40
|
An
Efficient Scheme to Ensure Data Availability for a Cloud Service
Provider
Seungmin Kang (National University of Singapore,
Singapore), Bharadwaj Veeravalli, Khin Mi Mi Aung, and Chao Jin (Data
Storage Institute, Singapore)
|
Workshop 7: First IEEE International Workshop on
Big Data Security and Privacy (BDSP 2014)
|
8:00 - 8:05
|
Opening Remarks and Keynote
Introduction
|
|
8:05 - 9:05
|
Keynote: "Privacy in
Big Data: Thinking Outside the Anonymity/Confidentiality Box"
- Chris Clifton.
|
|
9:05 - 9:30
|
"Location Prediction Attacks
Using Tensor Factorization and Optimal Defenses"
- Takao Murakami
and Hajime Watanabe.
|
|
9:30 - 10:00
|
Coffee Break
|
|
10:00 - 10:25
|
"Secure Data Storage in
Distributed Cloud Environments".
- Renata Jordão,
Valério Martins, Fábio Buiati, Rafael Timóteo de Sousa Júnior, and
Flávio Elias de Deus.
|
|
10:25 - 10:50
|
"A Practical Security
Framework for Cloud Storage and Computation".
- Kavya Premkumar,
Aditya Suresh Kumar, and Saswati Mukherjee.
|
|
10:50 - 11:15
|
"Privacy-aware Filter-based
Feature Selection".
- Yasser Jafer,
Stan Matwin, and Marina Sokolova.
|
|
11:15 - 11:40
|
"A PLOTAM Model for Analyzing
Potential Relationships in Social Media and the Web in Big Data
Perspective".
- Huaping Zhang and
Yanping Zhao.
|
|
11:40 - 11:50
|
Closing Remarks
|
Workshop 8: The 2nd International Workshop of BigData
in Bioinformatics and Healthcare Informatics
Workshop 9: Solar Astronomy Big Data (SABiD) – 1st Workshop
on Management, Search and Mining of Massive Repositories of Solar
Astronomy Data
All
accepted SABiD papers will be presented as Regular Papers, that is, 25
minutes per paper (about 20 minutes for talk and 5 minutes for Q and
A)
|
13:30-13:55
|
Massive
Labeled Solar Image Data Benchmarks for Automated Feature Recognition
Michael Schuh & Rafal Angryk
|
|
13:55-14:20
|
A
computer vision approach to mining big solar data
Simon Felix & André Csillaghy
|
|
14:20-14:45
|
Scalable
Solar-Image Retrieval with Lucene
Juan Banda & Rafal Angryk
|
|
14:45-15:10
|
Stream
Processing for Solar Physics: Applications and Implications for Big
Solar Data
Karl Battams
|
|
15:10-15:40
|
Break
|
|
15:40-16:05
|
Iterative
Refinement of Multiple Targets Tracking of Solar Events
Dustin Kempton, Karthik Ganesan Pillai & Rafal Angryk
|
|
16:05-16:30
|
Spatiotemporal
Indexing Techniques for Efficiently Mining Spatiotemporal Co-occurrence
Patterns
Berkay Aydin, Dustin Kempton, Vijay Akkineni, Shaktidhar Gopavaram,
Karthik Ganesan Pillai & Rafal Angryk
|
|
16:30-16:55
|
Improved
data exploitation for DKIST and high-resolution solar observations
Kevin Reardon & Steve Berukoff
|
|
16:55-17:20
|
Closing
Discussion
|
Workshop 13: First Hands-On Workshop on Leveraging
High Performance Computing Resources for Managing Large Datasets
|
8:30 AM - 12:10 PM
|
Morning Session
|
|
8:30 AM - 8:45 AM
|
Opening Remarks
|
|
8:45 AM
- 9:15 AM
|
Computing and Data Management at the Joint Genome Institute, by
Kirsten Fagnan (NERSC)
|
|
9:15 AM
- 9:45 AM
|
Unlocking
the Power of High Performance Computing for Big Humanities Data Curation,
by Jessica Trelogan (Institute of Classical Archaeology, UT Austin)
|
|
9:45 AM
- 10:15 AM
|
IDC
Update on How Big Data Is Redefining High Performance Computing, by
Earl Joseph (IDC)
|
|
10:15 AM - 10:30 AM
|
Break
|
|
10:30 AM
- 11:00 AM
|
Mitigating
Big Data Management Challenges Using HPC, by Ritu Arora (TACC)
|
|
11:00 AM -
12:10 PM
|
Introduction
to Linux & TACC resources, hands-on session, by Ritu Arora (TACC)
|
|
12:10 PM - 1:30 PM
|
Lunch
Workshop participants who are not supported by the
NSF travel award are on their own for lunch.
Networking and mentoring activities for students
sponsored by the NSF travel grant, coordinated by Elizabeth Bautista
(NERSC) and Valerie Shilling (TACC)
|
|
1:30 PM - 6:00 PM
|
Afternoon Session
|
|
1:30 PM - 1:40 PM
|
Introduction to the test-case to be used for
further exercises, by Ritu Arora (TACC)
|
|
1:40 PM - 2:50 PM
|
Hands-on exercises on data transfer, calculating
checksum, and metadata extraction by Ritu Arora (TACC)
|
|
2:50 PM - 3:20 PM
|
Design and development of data management
workflows on HPC resources, by Ritu Arora (TACC)
|
|
3:20 PM -
3:35 PM
|
Break
|
|
3:35 PM - 4:00 PM
|
Improv Session, by Raquell Holmes
(Improvscience)
|
|
4:00 PM - 6:00 PM
|
Hackathon Sessions, coordinated by TACC and
NERSC staff
|
Workshop 15: Workshop on Advances in Software and
Hardware for Big Data to Knowledge Discovery (ASH)
|
8:00-8:45
|
opening/
Invited talk from Texas Advanced Computing Center
|
|
8:45
-10:00
|
Session 1, Methods
and Designs for Big Data System
|
|
|
Guangchen Ruan, Hui Zhang, and Beth Plale, Parallel
and Quantitative Sequential Pattern Mining for Large-scale
Interval-based Temporal Data
Yongen Yu, Hongbo Zou, Wei Tang, and Liwei
Liu, A CCG Virtual System for Big Data Application
Communication Costs Analysis
Wuheng Luo, Bo Liu, and Allie Watfa, An
Open Schema for XML Data in Hive
Greg Sand, Leonidas Tsitouras, George
Dimitrakopoulos, and Vassillis Chatzigiannakis, A Big Data Aggregation,
Analysis and Exploitation Integrated Platform for Increasing Social
Management Intelligence
Julian Krumeich, Dirk Werth, Jens Schimmelpfennig,
Sven Jacobi, and Peter Loos, Advanced Planning and Control of
Manufacturing Processes in Steel Industry through Big Data Analytics:
Case Study and Architecture Proposal
|
|
10:00-10:20
|
coffee
break
|
|
10:20
-11:00
|
Invited
talk from Oracle Research Team
|
|
11:00-12:00
|
Session 2, Practical Big Data Use Cases
|
|
|
Jian Zou and Hui Zhang, High-Frequency
Financial Statistics with Parallel R and Intel Xeon Phi Coprocessor
Sabrina Azzi, Cindy Dallaire, Abdenour Bouzouane,
Bruno Bouchard, and Sylvain Giroux, Human activity recognition
in big data smart home context
Jennifer Shin, Investigating the Accuracy
of the openFDA API using the FDA Adverse Event Reporting System (FAERS)
Hilary Cheng, Yi Chuan Lu, and Chih-Cheng
Hsu, A Visualized Data Analysis for Bogus Business Entities
Detection
|
Workshop 16:
IEEE Big Data Workshop on Semantics for Big Data on the Internet
of Things (SemBIoT 2014)
|
1:30 - 1:35
|
Opening Remarks and
Keynote Introduction
|
|
1:35 - 2:30
|
Keynote Talk: "Big Data and
Semantic Web Meet Applied Ontology"
-
Ram Sriram - (NIST)
|
|
2:30 – 3:00
|
"Handling smart
environment devices, data and services at the semantic level with the
FI-WARE core platform" (Regular paper)
-
Fano Ramparany, Fermin Galan Marquez, Javier Soriano, and Tarek
Elsaleh.
|
|
3:00 – 3:25
|
Situation Aware Computing
for Big Data (Short paper)
-
Eric Chan, Dieter Gawlick, Adel Ghoneimy, and Zhen Liu
|
|
3:25 – 3:45
|
Coffee Break
|
|
3:45 – 4:40
|
Invited Talk: Transforming
Outcomes in Vertical Industries in the IoE era: From Seeing to
Foreseeing
-
Rajesh Vargheese – (CISCO)
|
|
4:40 – 5:10
|
Topic-Specific Post
Identification in Microblog Streams (Regular Paper)
-
Shanika Karunasekera, Aaron Harwood, Sameendra Samarawickrama, Kotagiri
Ramamohanarao, and Garry Robins
|
|
5:10 – 5:40
|
An IoT/IoE Enabled
Architecture Framework for Precision On Shelf Availability: Enhancing
Proactive Shopper Experience (Regular Paper)
-
Rajesh Vargheese and Hazim Dahir
|
Workshop 17: The First IEEE Workshop on Big Data in
Computational Epidemiology
Date:
Oct. 27th, 2014
Time:
08:00 - 12:00
Venue:
Severn
|
8:00-10:00
|
Session
I: Invited Talks (Session Chair: Sandeep Gupta)
|
|
8:10-8:45
|
Computational Biosecurity via Modeling of
Immunophenotypic Heterogeneity
Saumyadipta Pyne (University of Hyderabad Campus,
India)
|
|
8:45-9:20
|
Time-Critical Ebola Modeling and Response: A
Big-Data Informatics Approach
Keith Bisset (Virginia Tech, USA)
|
|
9:20-9:55
|
Data Driven Methods for Disease Forecasting
Prithwish Chakraborty (Virginia Tech, USA)
|
|
10:00-10:20
|
Coffee
Break
|
|
10:20-11:50
|
Session II: Paper
Presentations (Session Chair: Jiangzhuo Chen)
|
|
|
Learning Machines for Computational Epidemiology
Magnus Boman and Daniel Gillblad
Big data problems on discovering and analyzing causal
relationships in epidemiological data
Yiheng Liang and Armin Mikler
Epidemiological Modeling of Bovine Brucellosis in India
Gloria Kang, L Gunaseelan, and Kaja Abbas
Spatial Big Data Analytics of Influenza Epidemic in Vellore,
India
Daphne Lopez, M. Gunasekaran, B. Senthil Murugan, Harpreet Kaur,
and Kaja Abbas
|
|
11:50-12:00
|
Closing Remarks and
Discussions
|
Workshop 18: Large Scale Data Analytics in
Transportation and Railway Infrastructure
|
TIME
|
TOPIC/PAPER
|
SPEAKERS
|
|
8:00- 8:10
|
INTRODUCTION
|
|
8:10-8:40
|
Keynote Address
|
Nii Attoh-Okine
|
|
8:40-
9:00
|
Predicting flight arrival times with a
multistage model
|
Gabor Takacs
|
|
9:00-
9:20
|
Efficient Traffic Speed Forecasting Based on
Massive Heterogeneous Historical Data
|
Xing-Yu Chen, Hsing-Kuo Pao, and Yuh-Jye Lee
|
|
9:20-
9:40
|
Multi-Objective Optimization for Resilient
Airline Networks Using Socioeconomic-Environmental Data
|
Hidefumi Sawai and Aki-Hiro Sato
|
|
9:40-
10:00
|
A Dynamic Programming Approach for 4D Flight
Route Optimization
|
Christian Kiss-Tóth and Gabor Takacs
|
|
10:00-10:20
|
COFFEE
BREAK
|
|
10:20-10:40
|
Evaluating Structural Engineering Finite Element
Analysis Data Using Multiway Analysis
|
Matija Radovic and Jennifer McConnell
|
|
10:40-11:00
|
Multiway Analysis of Bridge Structural Types in
the National Bridge Inventory (NBI)
|
Offei Adarkwa, Thomas Schumacher, and Nii
Attoh-Okine
|
|
11:00-11:20
|
Topological Models of Document-Query Sets in
Retrieval for Enterprise Information Management
|
Vinay Deolalikar
|
|
11:20-11:40
|
Metaheuristics in Big Data: An Approach to
Railway Engineering
|
Silvia Galvan Nunez and Nii Attoh-Okine
|
|
11:40-12:00
|
Facilitating Maintenance Decisions on the Dutch
Railways Using Big Data: The ABA Case Study
|
Alfredo Núñez, Jurjen Hendriks, Zili Li, Bart De
Schutter, and Rolf Dollevoet
|
|
12:00-12:40
|
LUNCH
|
|
12:40-13:00
|
Applications of Linked Data in the Rail Domain
|
Christopher Morris, John Easton, and Clive
Roberts
|
|
13:00 – 13:20
|
Ontology-driven Data Integration for Railway
Asset Monitoring Applications
|
Jonathan Tutcher
|
|
13:20 – 13:40
|
Some Examples of Big Data in Railroad
Engineering
|
Allan Zarembski
|
|
13:40 – 14:00
|
Spatial Data Analysis of Complex Urban System
|
Farideddin Peiravian, Amirhassan Kermanshah,
Sybil Derrible, and Clio Andris
|
|
14:00 – 14:20
|
Big Data Challenges in Railway Engineering
|
Nii Attoh-Okine
|
|
14:20 –
14:40
|
CLOSING
REMARKS
|
Workshop 19: 2nd Workshop on Scalable Cloud Data
Management
|
8:00-8:50
|
Invited
Talk by Schahram Dustdar
|
|
8:50-10:05
|
Session I
|
|
|
The Best of Two Worlds: Integrating IBM InfoSphere
Streams with Apache YARN by Zubair Nabi, Rohit Wagle, and Eric Bouillet
Taking an Electronic Ticketing System to the
Cloud: Design and Discussion by Filipe Araujo, Marilia Curado, Pedro
Furtado, and Raul Barbosa
A Relational Database Schema on the
Transactional Key-Value Store Scalaris by Nico Kruber, Florian
Schintke, and Michael Berlin
|
|
10:05-10:20
|
Coffee
Break
|
|
10:20-12:00
|
Session
II
|
|
|
A Contention Aware Hybrid Evaluator for Schedulers
of Big Data Applications in Computer Clusters by Shouvik Bardhan and
Daniel Menasce
RuleMR: Classification Rule Discovery with
MapReduce by Vasilis Kolias, Constantinos Kolias, Ioannis
Anagnostopoulos, and Eleftherios Kayafas
Temporal Bipartite Projection and Link Prediction
for Online Social Networks by Tsunghan Wu, Sheau-Harn Yu, Wanjiun Liao,
and Cheng-Shang Chang
Community Structure Analysis in Big Climate Data
by Michael McGuire and Nam Nguyen
|
Workshop 20: Big Humanities Data
Chairs:
Mark Hedges, Tobias Blanke, Richard Marciano
- Brett
Bobley: NEH, Director of the Office of Digital Humanities,
USA
- Bob
Horton: IMLS,
Associate Deputy Director for Library Services, Discretionary
Programs, USA
- Crystal
Sissons: SSHRC,
Senior Program Officer at Research Grants and Partnerships Division,
CANADA
- Christie
Walker: AHRC,
Strategy & Development Manager, UK
|
9:00-9:15
|
Welcome
|
|
9:15-10:00
|
Keynote
(30 min + 15 min discussion)
Opportunities
from Big Humanities Data for Holocaust Research and Education
Michael LEVY Director of Digital
Collections
Michael HALEY GOLDMAN Director of Global Classroom and Evaluation
United States Holocaust Memorial Museum
Overview: The United
States Holocaust Memorial Museum continues to collect and digitize the
material evidence of the best documented crime of the 20th century –
building the Collection of Record for the Holocaust. But as more and
more of this collection becomes digital – and as we create greater
quantities of historical data about this history – we have the
opportunity to go beyond traditional approaches to scholarship and
education. How can the new techniques and tools being developed for big
data change how we explore what happened during the Holocaust? And how
does digital history offer new contexts for learning about the
Holocaust? In this brief presentation, we will describe the current
state of Museum collections, project how this collection will grow in
the next decade, and raise questions about what the future of Holocaust
research and education can be.
|
|
10:00-10:20
|
Coffee Break
|
|
10:20-12:10
|
Morning Session (1 hour 50 min)
|
|
|
THEMES: complexity / scale /
historical analysis
- Scaling
Historical Text Re-use (Marco BÜCHLER, Emily
Franzini, Greta Franzini, and Maria Moritz)
- The Infra-City:
The Exceptional and the Everyday in Social Media (Lev
Manovich, Alise TIFENTALE, Mehrdad Yazdani, and Jay Chow)
- Revolutionary
Entities: Turning Data into Knowledge to Drive Personalized
Exploration of The Irish Rising of 1916 (Owen
CONLAN, Alexander O’Connor, Órla Ní Loinsigh, Gary Munnelly,
Séamus Lawless, and Rachel Murphy)
|
|
|
11:05-11:15 Break
|
|
|
THEMES:
news / film
- On the Coverage
of Science in the Media: A Big Data Study on the Impact of the
Fukushima Disaster (Thomas LANSDALL-WELFARE, Saatviga
Sudhahar, Giuseppe Veltri, and Nello Cristianini)
- The DEEP FILM
Access Project: Ontology and Metadata Design for Digital Film
Production Assets (Sarah ATKINSON, Roger EVANS, and
Jos Lehmann)
THEMES:
frameworks / infrastructure
- Probabilistic
Estimates of Attribute Statistics and Match Likelihood for People
Entity Resolution (Xin WANG, Ang Sun, Hakan Kardes,
Siddharth Agrawal, Lin Chen, and Andrew Borthwick)
- BigExcel: A
Web-Based Framework for Exploring Big Data in Social Sciences (Muhammed
Asif Saleem, Blesson VARGHESE, and Adam Barker)
|
|
12:10-1:10
|
Lunch (not provided)
|
|
1:10-3:15
|
Afternoon Session
|
|
|
THEMES:
geospatial / mobile
- Dealing with
Heterogeneous Big Data When Geoparsing Historical Corpora (C.J.
Rupp, Paul RAYSON, Ian Gregory, Andrew Hardie, Amelia
Joulain, and Daniel Hartmann)
- Mining Mobile
Youth Cultures (Tobias BLANKE, Giles Greenway,
Jennifer Pybus, and Mark Cote)
Projects from ‘Digging into Data’ program
(7 total):
- Mining
Microdata: Economic Opportunity and Spatial Mobility in Britain
and the United States, 1850-1881 (Peter Baskerville, Lisa Dillon, Kris
Inwood, Evan ROBERTS, Steven Ruggles, and John Robert
Warren)
- Understanding
the Role of Medical Experts during a Public Health Crisis: Digital
Tools and Library Resources for Research on the 1918 Spanish
Influenza
(E. Thomas EWING, Samah Gad, Naren Ramakrishnan, and
Jeffrey S. Reznick)
|
|
|
2:00-2:15 Break
|
|
|
- Scaled Entity
Search: A Method for Media Historiography and Response to
Critiques of Big Humanities Data Research (Eric
HOYT, Kit Hughes, Derek Long, Kevin Ponto, and Anthony Tran)
- A Computational
Pipeline for Crowdsourced Transcriptions of Ancient Greek Papyrus
Fragments (Alex
Williams, John Wallin, Haoyu Yu, Marco Perale, Hyrum Carroll,
Anne-Francoise Lamblin, Lucy Fortson, Dirk Obbink, Chris Lintott,
and James BRUSUELAS)
- Scientific
Findings as Big Data for Research Synthesis: The metaBUS Project (Frank
BOSCO, Krista Uggerslev, and Piers Steel)
- Metadata
Infrastructure for the Analysis of Parliamentary Proceedings (Richard
GARTNER)
- Integrating Data
Mining and Data Management Technologies for Scholarly Inquiry (Ray
Larson, Paul Watry, Richard MARCIANO, John Harrison,
Chien-Yi Hou, Luis Aguilar, Shreyas, and Jerome Fuselier)
|
|
3:15-3:45
|
Coffee Break
|
|
3:45-4:45
|
Funders Panel and discussion on the future of
big data in the humanities
|
Workshop 21: Complexity for Big Data
|
Erin-Elizabeth Durham, Andrew Rosen, and Robert
Harrison, "A Model Architecture for Big Data applications using
Relational Databases"
Ruoqian Liu, Ankit Agrawal, Wei-Keng Liao, and
Alok Choudhary, "Search Space Preprocessing in Solving Complex
Optimization Problems"
Walid Shalaby, Wlodek Zadrozny, and Sean
Gallagher, "Knowledge Based Dimensionality Reduction for Technical
Text Mining"
Xiaoguang Wang, Xuan Liu, and Stan Matwin, "A
Distributed Instance-weighted SVM Algorithm on Large-scale Imbalanced
Datasets"
|
|
Coffee Break
|
|
Philippe Calvez and Eddie Soulier,
"Sustainable Assemblage for Energy (SAE) inside Intelligent Urban
Areas: How massive heterogeneous data could help to reduce energy
footprints and promote sustainable practices and an ecological
transition"
Xuan Liu, Xiaoguang Wang, Bo Liu, and Stan Matwin,
"Vessel Route Anomaly Detection with Hadoop MapReduce"
Jiazhen Nian, Shan Jiang, and Yan Zhang,
"HBGSim: A Structural Similarity Measurement over Heterogeneous
Big Graph"
Eric L. Goodman, Edward Jimenez, Cliff Joslyn,
David Haglin, Sinan al-Saffar, and Dirk Grunwald, "Optimizing
Graph Queries with Graph Joins and Sprinkle SPARQL"
|
|
Poster ID
|
Poster Title
|
|
|
|
P201
|
Biva Shrestha, Ranjeet Devarakonda, and Giriprakash
Palanisamy, An open source framework to add spatial extent and geospatial
visibility to Big Data
|
|
|
P202
|
Tugdual Sarazin, Mustapha Lebbah, and Hanane
Azzag, Biclustering using Spark-MapReduce
|
|
|
P204
|
Joshua Westgard, The Bot Will Serve You Now:
Automating Access to Archival Materials
|
|
|
P205
|
Jose Teixeira, Developing a Cloud Computing
Platform for Big Data: The OpenStack Nova case
|
|
|
P206
|
Bing Zhu, Hui Li, and Kenneth Shum, Repair
Efficient Storage Codes via Combinatorial Configurations
|
|
|
P209
|
Nick Manfredi, Darakhshan J. Mir, Shannon Lu, and
Dominick Sanchez, Differentially Private Models of Tollgate
Usage in Metropolitan Areas: The Milan Tollgate Data Set
|
|
|
P210
|
Ranjeet Devarakonda, Biva Shrestha, and Giriprakash
Palanisamy, OME: A tool for generating and managing metadata to
handle BigData
|
|
|
P212
|
Syunya Okuno, Hiroki Asai, and Hayato Yamana, A
Challenge of Authorship Identification for Ten-thousand-scale Microblog
Users
|
|
|
P214
|
Niall Gaffney, Christopher Jordan, Tommy Minyard, and
Dan Stanzione, Building Wrangler: A Transformational Data
Intensive Resource for the Open Science Community
|
|
|
P215
|
Akira Kinoshita, Atsuhiro Takasu, and Jun Adachi, Real-Time
Traffic Incident Detection Using Probe-Car Data on the Tokyo
Metropolitan Expressway
|
|
|
P216
|
Fan Jiang, Michael Shoffner, Claris Castillo, and
Charles Schmitt, Enabling Genomic Analysis on Federated Clouds
|
|
|
P217
|
Anirudh Kadadi, Rajeev Agrawal, Christopher Nyamful, and
Rahman Atiq, Challenges of Data Integration and
Interoperability in Big Data
|
|
|
P218
|
Robert Warren and Bo Liu, Language, Cultural
Influences and Intelligence in Historical Gazetteers of the Great War
|
|
|
P219
|
Vandana P. Janeja, Ali Azari, Josephine Namayanja, and
Brian Heilig, B-dIDS: Mining Anomalies in a Big-distributed
Intrusion Detection System
|
|
|
P220
|
Zubair Shah and Abdun Mahmood, A Summarization
Paradigm for Big Data
|
|
|
P221
|
Jin Soung Yoo and Douglas Boulware, Incremental
and Parallel Spatial Association Mining
|
|
|
P222
|
Julien Amelot, Peter Bajcsy, Anne Plant, and Mary
Brady, Machine Learning and Interactive Visualization applied
to TB-sized Images of Stem Cells
|
|
|
P223
|
Gaël Chareyron, Jerome Da-Rugna, and Thomas
Raimbault, Big Data: a new challenge for tourism
|
|
|
P224
|
Thomas Raimbault, Gaël Chareyron, and Corinne
Krzyzanowski-Guillot, Cognitive Map of Tourist Behavior based
on Tripadvisor
|
|
|
P225
|
Thomas Hassan, Rafael Peixoto, Christophe Cruz, Aurélie
Bertaux, and Nuno Silva, Semantic HMC for Big Data Analysis
|
|
|
P226
|
Ioannis Mytilinis, Ioannis Giannakopoulos, Ioannis
Konstantinou, Katerina Doka, and Nectarios Koziris, MoDisSENSE: A
Distributed Platform for Social Networking Services over Mobile Devices
|
|
|
P227
|
Madian Khabsa, Pucktada Treeratpituk, and C Lee
Giles, Large Scale Author Name Disambiguation in Digital
Libraries
|
|
|
P229
|
Roberto Espinosa, Larisa Garriga, Jose Jacobo Zubcoff,
and Jose-Norberto Mazon, Linked Open Data Mining for
Democratization of Big Data
|
|
|
P230
|
Saima Aman, Charalampos Chelmis, and Viktor Prasanna, Addressing
Data Veracity in Big Data Applications
|
|
|
P232
|
Ashiq Imran, Rajeev Agrawal, Jessie Walker, and Anthony
Gomes, A Layer Based Architecture for Provenance in Big Data
|
|
|
P235
|
Erin-Elizabeth Durham, Chinua Umoja, J.T. Torrance,
Andrew Rosen, and Robert Harrison, A Novel Approach to Determine
Docking Locations Using Fuzzy Logic and Shape Determination
|
|
|
P236
|
Haozhen Zhao, Sharding for Literature Search via
Cutting Citation Graphs
|
|
|
P237
|
Andy Doyle, Graham Katz, Kristen Summers, Chris Ackermann,
Ilya Zavorin, Zunsik Lim, Sathappan Muthiah, Liang Zhao, Chang-Tien Lu,
Patrick Butler, Rupinder Paul Khandpur, Youssef Fayed, and Naren
Ramakrishnan, The EMBERS Architecture for Streaming Predictive
Analytics
|
|
|
P239
|
Ioannis Giannakopoulos, CELAR: Automated
Application Elasticity Platform
|
|
|
Special
Session I: “From Data to
Insight: Big Data and Analytics for Smart Manufacturing Systems”
|
Session Agenda (Tentative)
Session Chair: Sudarsan
Rachuri and Ronay AK
Venue: JUDICIARY SUITE
Time: 8:30-17:00
|
|
Time
|
Title
|
|
Opening and Welcoming Speech
|
08:30 – 09:00
(30 min)
|
Dr. Sudarsan Rachuri, NIST
|
|
Invited Speaker
|
09:00 – 09:40
|
Dr. Athulan Vijayaraghavan, CTO, System
Insights
Title: The Internet of Manufacturing Things
|
|
09:40-10:00
|
Q&A
|
|
|
10:00-10:20
|
Coffee Break
|
|
Keynote Lectures
|
10:20 – 10:40
(20 min)
|
Dr. Ashit Talukder, NIST
Title: Data-Driven Smart Manufacturing:
Challenges and Solutions
|
|
10:40 – 11:00
(20 min)
|
Matteo Bellucci, GE Global Research
Title: Brilliant Factory at GE
|
|
11:00 – 11:20
(20 min)
|
Juergen Heit, Robert Bosch North America
|
|
Panel
|
11:20 – 12:30
(70 min)
|
Panel Title: Current Issues, Challenges and
Opportunities in Deploying Big Data Analytics for Smart Manufacturing
Systems.
Panelists:
Dr. Ashit Talukder, NIST
Prof. Kincho H. Law, Stanford University
Prof. Sankaran Mahadevan, Vanderbilt University
Dr. Athulan Vijayaraghavan, CTO, System Insights
Matteo Bellucci, GE Global Research
Juergen Heit, Robert Bosch North America
|
|
|
12:30 – 14:00
|
Lunch
|
|
Paper Presentations
|
14:00-14:20
|
Paper #1 “CloudMan:
A Platform for Portable Cloud Manufacturing Services” by Soheil
Qanbari, Samira Mahdi Zadeh, Soroush Vedaie, and Schahram Dustdar, TUW,
Austria.
|
|
14:20 – 14:40
|
Paper #2 “An Intelligent Machine Monitoring System Using Gaussian Process
Regression for Energy Prediction” by Raunak Bhinge, Jinkyoo Park,
Nishant Biswas, Moneer Helu, David Dornfeld, Kincho Law, and Sudarsan
Rachuri, Berkeley, USA
|
|
14:40 – 15:00
|
Paper #3 “Building
a Rigorous Foundation for Performance Assurance Assessment Techniques
for ‘Smart’ Manufacturing Systems” by Utpal Roy, Yunpeng Li, and
Bicheng Zhu, Syracuse University, USA
|
|
15:00 – 15:20
|
Paper #4 “Towards a Domain-Specific
Framework for Predictive Analytics in Manufacturing” by David
Lechevalier, Anantha Narayanan, and Sudarsan Rachuri, NIST, USA
|
|
|
15:20 – 15:40
|
Coffee Break
|
|
Paper Presentation
|
15:40-16:00
|
Paper #5 “Uncertainty Quantification in Performance Evaluation of
Manufacturing Processes” by Saideep Nannapaneni and Sankaran
Mahadevan, Vanderbilt University, USA
|
|
16:00 – 16:20
|
Paper #6 “A System Architecture for Manufacturing
Process Analysis based on Big Data and Process Mining Techniques” by
Hanna Yang, Minseok Song, and Seongjoo Kim, UNIST, South Korea
|
|
16:20 – 16:40
|
Paper #7 “Toward Smart Manufacturing: Monitoring, Analysis, Planning and Execution
using Decision Guidance Analytics” by Alexander Brodsky, Mohan
Krishnamoorthy, Daniel Menasce, Guodong Shao, and Sudarsan Rachuri,
George Mason University, USA
|
|
Poster Session
|
16:40 – 17:00
|
“Smart
Manufacturing Systems Design And Analysis (SMSDA) − Big Data Analytics
In Manufacturing” by Ronay AK, Sudarsan Rachuri and Seung-Jun Shin,
NIST, USA
|
|
Special Session II on Big Data Representation
and Processing in Data Science
|
Session Agenda (Tentative)
Session Chair: T.Y. Lin
Venue: Susquehanna
Time: 13:30-18:30
|
Session
Opening
-
Featured
Talk #1
|
Stochastic Finite
Automata for the Translation of DNA to Protein
Speaker: Tsau-Young
Lin and Asmi H. Shah
|
|
BDRP206
Researching Persons & Organizations AWAKE: From Text to
an Entity-Centric Knowledge Base
Elizabeth Boschee, Marjorie
Freedman, Saurabh Khanwalker, Anoop Kumar, Amit Srivastava, and Ralph
Weischedel
BDRP210
Integrating Existing Large Scale Medical Laboratory Data
Into the Semantic Web Framework
Newres Al Haider, Samina Abidi,
William van Woensel, and Syed SR Abidi
|
|
- Featured Talk #2
|
Data mining and sharing tool for high content screening large
scale biological image data
Speaker: Asmi
H. Shah
|
|
BDRP213
Path Knowledge Discovery: Association Mining Based on
Multi-category Lexicons
Chen Liu, Wesley W. Chu, Fred
Sabb, D. Stott Parker, and Joseph Korpela
BDRP208
Extracting Discriminative Features in Multivariate Data
from Heterogeneous Sensors
Om Patri, Abhishek Sharma,
Haifeng Chen, Guofei Jiang, Anand Panangadan, and Viktor Prasanna
BDRP204
Statistical Technique for Online Anomaly Detection Using
Spark Over Heterogeneous Data from Multi-source VMware Performance Data
Mohiuddin Solaimani, Mohammed
Iftekhar, Latifur Khan, and Bhavani Thuraisingham
BDRP203
A Building Performance Evaluation & Visualization
System
Georgios Stavropoulos, Stelios
Krinidis, Dimosthenis Ioannidis, Konstantinos Moustakas, and Dimitris
Tzovaras
|
|
Discussion of future
plans
|
|
Adjourn &
Refreshments
|
Doctoral Consortium
Session Agenda (Tentative)
Session Chair: Jingrui He (Arizona State University)
Venue: Patuxent
Time: 8:25-15:00
|
8:25 - 8:30
|
Opening remarks
|
|
8:30 - 9:00
|
Presentation
by Lena
Mashayekhy (Wayne State University)
|
|
9:00 - 9:15
|
Comments
from mentor Steven
Y. Ko (the University at Buffalo, the State
University of New York)
|
|
9:15 - 9:45
|
Presentation
by Jialin Liu
(Texas Tech University)
|
|
9:45 - 10:00
|
Comments
from mentor Sastry
S Duri (IBM Research)
|
|
10:00 - 10:30
|
Coffee break
|
|
10:30 - 11:00
|
Presentation
by Debdipto Misra
(George Mason University)
|
|
11:00 - 11:15
|
Comments
from mentor Duen
Horng (Polo) Chau (Georgia Tech)
|
|
11:15 - 11:45
|
Presentation
by Hiba Baroud
(University of Oklahoma)
|
|
11:45 - 12:00
|
Comments
from mentor Chih-Jen Lin
(National Taiwan University)
|
|
12:00 - 13:00
|
Lunch on your own
|
|
13:00 - 14:00
|
Individual discussions
between students and
mentors
|
|
14:00 - 15:00
|
Open discussion (attended by all)
|
|
15:00
|
Adjourn
|
|
TUTORIAL 1: Big Data Stream
Mining
Presenters: Gianmarco
De Francisci Morales, Joao Gama, Albert Bifet, and Wei Fan
Summary:
The
challenge of deriving insights from big data has been recognized as one
of the most exciting and key opportunities for both academia and
industry. Advanced analysis of big data streams is bound to become a
key area of data mining research as the number of applications
requiring such processing increases. Dealing with the evolution over
time of such data streams, i.e., with concepts that drift or change
completely, is one of the core issues in stream mining. This tutorial
is a gentle introduction to mining big data streams. The first part
introduces data stream learners for classification, regression,
clustering, and frequent pattern mining. The second part discusses data
stream mining on distributed engines such as Storm, S4, and Samza.
Content:
Fundamentals and Stream Mining
Algorithms
– Stream mining setting
– Concept drift
– Classification
and Regression
– Clustering
– Frequent Pattern mining
Distributed Big Data Stream Mining
– Distributed Stream Processing Engines
– Classification
– Regression
Short
Bio.
Gianmarco De Francisci Morales's
Profile
Gianmarco
De Francisci Morales is a Research Scientist
at Yahoo Labs Barcelona. He received his Ph.D. in Computer Science and
Engineering from the IMT Institute for Advanced Studies of Lucca in
2012. His research focuses on large scale data mining and big data,
with a particular emphasis on web mining and Data Intensive Scalable
Computing systems. He is an active member of the open source community
of the Apache Software Foundation working on the Hadoop ecosystem, and
a committer for the Apache Pig project. He is the co-leader of the
SAMOA project, an open-source platform for mining big data
streams.
Joao
Gama's Profile
Joao
Gama is a Researcher at LIAAD, University of
Porto, working in the Machine Learning group. His main research
interest is in Learning from Data Streams. He has published more than 80
articles. He served as Co-chair of ECML 2005, DS09, ADMA09 and a series
of Workshops on KDDS and Knowledge Discovery from Sensor Data with ACM
SIGKDD. He is serving as Co-Chair of ECML-PKDD 2015. He is the author
of a recent book on Knowledge Discovery from Data Streams.
Albert
Bifet's Profile
Albert
Bifet is a Research Scientist at Huawei. He is
the author of a book on Adaptive Stream Mining and Pattern Learning and
Mining from Evolving Data Streams. He is one of the leaders of the MOA and
SAMOA software environments for implementing algorithms and running
experiments for online learning from evolving data streams.
Wei
Fan's Profile
Wei Fan is
the associate director of Huawei Noah’s Ark Lab. He received his PhD in
Computer Science from Columbia University in 2001. His main research
interests and experiences are in various areas of data mining and
database systems, such as stream computing, high performance
computing, extremely skewed distribution, cost-sensitive learning, risk
analysis, ensemble methods, easy-to-use nonparametric methods, graph
mining, predictive feature discovery, feature selection, sample
selection bias, transfer learning, time series analysis,
bioinformatics, social network analysis, novel applications and
commercial data mining systems. His co-authored paper received the
ICDM 2006 Best Application Paper Award, and he led
the team that used his Random Decision Tree method to win the 2008 ICDM
Data Mining Cup Championship. He received a 2010 IBM Outstanding
Technical Achievement Award for his contribution to IBM InfoSphere
Streams. He is an associate editor of ACM Transactions on Knowledge
Discovery from Data (TKDD). Since he joined Huawei in August
2012, he has led his colleagues to develop Huawei StreamSMART – a
streaming platform for online and real-time processing, query and
mining of very fast streaming data. In addition, he also led his
colleagues to develop a real-time processing and analysis platform of
Mobile Broad Band (MBB) data.
TUTORIAL 2: Big ML Software for Modern ML Algorithms
Presenters:
Qirong Ho and Eric P. Xing
Summary:
Many
Big Data practitioners are familiar with classical Machine Learning
techniques such as Naive Bayes, Decision Trees, K-means, PCA, and
Collaborative Filtering (to name but a few), and their implementations
on popular Big Data systems such as Hadoop. Going beyond these classic
techniques, a new generation of ML algorithms (for example, topic
models, nonparametric Bayesian models, deep neural networks, and sparse
regression) has been gaining popularity in both academia and industry,
because they improve performance on existing tasks like recommendation
and prediction, or even enable completely new ones such as topical
visualization and image object detection. Initially, these algorithms
were the exclusive privilege of large companies with the engineering
resources to build their own cluster implementations from scratch.
Today, however, new open-source software platforms, such as GraphLab,
Petuum and Spark, have democratized some or all of these advanced
algorithms, putting them within reach of individual researchers and
data analysts who do not mind getting their hands a little dirty. In
this tutorial, you will learn about these emerging ML algorithms, the
software platforms that can run them today, the ML-centric theory,
principles and design of an ideal parallel ML system and how today’s
platforms fit that ideal, and the open research opportunities that have
sprouted in this space between advanced ML and distributed systems.
Content:
Advanced, emerging ML algorithms:
– e.g. Deep Neural Networks,
topic models, sparse regression
Open source software platforms that can run
some or all of these algorithms at scale:
– e.g. GraphLab, Petuum and Spark
Principles, design and theory of an
algorithmic and systems interface to Big ML
– Pros and cons of each platform: when should you
favor one over the other
Research opportunities in the space between
advanced ML and distributed systems
TUTORIAL 3: Large-scale Heterogeneous Learning in Big Data
Analytics
Presenters: Jun
Huan
Summary:
Heterogeneous
learning deals with data from complex real-world applications such as
social networks, biological networks, and the Internet of Things, among others.
The heterogeneity can be found in multi-task learning, multi-view
learning, multi-label learning, and multi-instance learning. In this talk we will present our and
other groups’ recent progress in designing and implementing
large-scale heterogeneous learning algorithms, including multi-task
learning, multi-view learning, and transfer
learning algorithms. The applications of this work in social network
analysis and bioinformatics will be discussed as well.
Content:
We cover recent progress on the
following aspects:
– Multi-task
learning (MTL) aims to train multiple related learning tasks together
to reduce generalization error. MTL has been widely utilized in many
application domains, including bioinformatics, social network analysis,
and image processing, among others.
– Multi-view
learning (MVL) aims to identify a model where data are collected from
different sources (a.k.a. views). There is an
intense discussion on how and to what extent multiple views may help.
– Multi-label
learning (MLL) aims to build classifiers that assign multiple labels to an
instance. It has wide applications in image annotation, recommender
systems, etc.
We cover the theoretical foundations of
MTL/MVL/MLL learning algorithms using penalized maximum likelihood
estimation, Bayesian MTL, and Gaussian processes. We also cover
related algorithms such as MTL with known task relationships, multi-task
& multi-view learning, and learning with structured input and output.
We also want to discuss a very important but less investigated area:
scaling those learning algorithms to large-scale data. We plan to cover
a few platforms that are suitable to support large-scale heterogeneous
learning. Applications of heterogeneous learning in bioinformatics,
health care informatics, drug discovery, and social network analysis will
be reviewed.
Short Bio.
Dr. Jun (Luke) Huan's Profile
Dr. Jun (Luke) Huan is
a Professor in the Department of Electrical Engineering and Computer
Science at the University of Kansas. He directs the Bioinformatics and
Computational Life Sciences Laboratory at KU Information and
Telecommunication Technology Center (ITTC) and the Cheminformatics core
at KU Specialized Chemistry Center, funded by NIH. He holds courtesy
appointments at the KU Bioinformatics Center, the KU Bioengineering
Program, an adjunct professorship from the Department of Internal
Medicine in the KU Medical School, and a visiting professorship from
GlaxoSmithKline plc. Dr. Huan received his
Ph.D. in Computer Science from the University of North Carolina.
Dr.
Huan works on data science, machine learning, data mining, big data,
and interdisciplinary topics including bioinformatics. He has published
more than 80 peer-reviewed papers in leading conferences and journals
and has graduated more than ten graduate students including six PhDs.
Dr. Huan serves on the editorial boards of several international journals
including the Springer Journal of Big Data, Elsevier Journal of Big
Data Research, and the International Journal of Data Mining and
Bioinformatics. He regularly serves on the program committees of top-tier
international conferences on machine learning, data mining, big data,
and bioinformatics.
Dr. Huan's research is recognized
internationally. He was a recipient of the prestigious National Science
Foundation Faculty Early Career Development Award in 2009. His group
won the Best Student Paper Award at the IEEE International Conference
on Data Mining in 2011 and the Best Paper Award (runner-up) at the ACM
International Conference on Information and Knowledge Management in
2009. His work has appeared in mass media including Science Daily, R&D
magazine, and EurekAlert (sponsored by AAAS). Dr. Huan's research was
supported by NSF, NIH, DoD, and the University of Kansas.
TUTORIAL 4: Big Data Benchmarking
Presenters:
Chaitan Baru and Tilmann Rabl
Summary:
This tutorial will introduce the audience to the
broad set of issues involved in defining big data benchmarks, for
creating auditable industry-standard benchmarks that consider
performance as well as price/performance. Big data benchmarks must
capture the essential characteristics of big data applications and
systems, including heterogeneous data, e.g. structured,
semi-structured, unstructured, graphs, and streams; large-scale and evolving
system configurations; varying system loads; processing pipelines that
progressively transform data; workloads that include queries as well as
data mining and machine learning operations and algorithms. Different
benchmarking approaches will be introduced, from micro-benchmarks to
application-level benchmarking.
Since May 2012, five workshops have been held on Big
Data Benchmarking including participation from industry and academia.
One of the outcomes of these meetings has been the creation of
industry’s first big data benchmark, viz., TPCx-HS, the Transaction
Processing Performance Council’s benchmark for Hadoop Systems. During
these workshops, a number of other proposals have been put forward for
more comprehensive big data benchmarking. The tutorial will present and
discuss salient points and essential features of such benchmarks that
have been identified in these meetings, by experts in big data as well
as benchmarking. Two key approaches are now being pursued. One, called BigBench, is based on extending the TPC Decision
Support (TPC-DS) benchmark with big data application characteristics. The other, called Deep Analytics
Pipeline, is based on modeling processing that is routinely encountered
in real-life big data applications. Both will be discussed.
We conclude with a discussion of a number of future directions for
big data benchmarking.
Content:
Introduction
– Introduction to benchmarking: What are
TPC and SPEC; what is each organization’s role and approach to
benchmarking.
– Characteristics
of good industry standard benchmarks: Why has TPC-C lasted so long?
Brief overview of TPCx-HS.
– Overview
of big data benchmarking approaches: From micro-benchmarks to
application-level pipelines.
– Application
scenarios and use cases: Big data scenarios and use cases that help
define the application-level benchmark.
BigBench:
In-depth discussion of an example, proposed big data benchmark
– Data
generation: Synthetic data generation for big data.
– The
Benchmarking process: Steps involved in setting up, executing, and
verifying end-to-end benchmarks.
– Benchmark
metrics: Existing metrics for industry standards, and appropriate
metrics for big data benchmarks.
– Discussion
of performance results on a small 6-node cluster at Intel and a large,
540-node cluster at Pivotal.
– Possible
extensions to BigBench
Benchmarking
challenges and future directions
– Modeling
system failures in the benchmark; extrapolating from one scale factor
to the next; benchmarking for new application scenarios, e.g. the Internet of Things.
Q&A.
Short
Bio.
Dr. Chaitan Baru and Dr. Tilmann Rabl's Profile
The tutorial presenters are Dr. Chaitan Baru from the San Diego Supercomputer
Center, UC San Diego, and Dr. Tilmann
Rabl, from the Middleware Systems Research
Group, University of Toronto.
Dr.
Baru and Dr. Rabl have been
collaborating since 2012 on the topic of big data benchmarking. They
were both instrumental in starting the Workshops on Big Data
Benchmarking series, and serve on the Steering Committee for these
workshops. Five workshops have been held so far, in May 2012 (San Jose),
December 2012 (Pune, India), July 2013 (Xi’an, China), October 2013
(San Jose), and August 2014 (Potsdam, Germany). They are co-editors of
the Springer Verlag Lecture Notes in Computer Science series on
Specifying Big Data Benchmarks. They have also co-authored three papers:
• Big Data Benchmarking and the BigData Top100 List, C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl, Big Data
Journal, Vol.1, No.1, Mary Ann Liebert Inc. Publishers, http://online.liebertpub.com/toc/big/1/1.
• Setting the Direction for
Big Data Benchmark Standards, C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl, TPC Technical
Conference, VLDB 2012, Aug 27-30, Istanbul, Turkey.
http://link.springer.com/chapter/10.1007/978-3-642-36727-4_14.
• Discussion of BigBench: A Proposed
Industry Standard Performance Benchmark for Big Data, Baru, Bhandarkar,
Curino, Danisch, Frank, Gowda, Jacobsen, Jie, Kumar, Nambiar, Poess,
Raab, Rabl, Ravi, Sachs, Sen, Yi, Youn, Proceedings of the TPC
Technical Conference, VLDB 2014, September, Hangzhou, China.
Furthermore, Dr. Rabl is the Chair of the recently
formed SPEC Research Group on Big Data Benchmarking, and Dr. Baru is
co-Chair. Thus, even though the tutorial instructors are from different
institutions, they have worked together closely for several years, and
have a continuing working relationship.
|
Panel with Program Directors: Big Data
Challenges and Opportunities
|
Panelists:
1)
Dr. Chaitanya Baru (NSF)
2)
Dr. Yuan Liu (NIH)
3)
Dr. David Kuehn (DoT)
4)
Dr. Tsengdar Lee (NASA)
5)
Dr. Sudarsan Rachuri (NIST)
6)
Mr. Matti Vakkuri (DIGILE)
Bios of
Panelists
Dr. Chaitanya
Baru (NSF)
Chaitan Baru currently serves as Senior Advisor for
Data Science in the CISE Directorate at the National Science
Foundation. He is on assignment from the San Diego Supercomputer
Center, UC San Diego, where he is Distinguished Scientist and Associate
Director of Data Initiatives. He has served in leadership positions in
a number of national-scale cyberinfrastructure R&D initiatives
across a wide range of science and engineering disciplines including
earth science, ecology, earthquake engineering, and biomedical
informatics. In 2012, he initiated an industry-academia effort to
define big data benchmarks via the Workshops for Big Data Benchmarking
(WBDB). This has resulted in the recent formation of the SPEC Research
Group on Big Data Benchmarking, which he co-chairs. He is co-editor of
the Lecture Notes in Computer Science series entitled, Specifying Big
Data Benchmarks, published by Springer Verlag. He co-chairs the
National Institute of Standards and Technology’s (NIST) Public Working
Group on Big Data. He is a member of the teaching faculty for the
Master of Advanced Studies program in Data Science and Engineering
(MAS-DSE) in the Computer Science Department at UC San Diego.
Baru has co-edited the book, Geoinformatics:
Cyberinfrastructure for the Solid Earth Sciences with Prof. Randy
Keller, University of Oklahoma, published by Cambridge University Press
(ISBN: 9780521897150).
Baru has a B.Tech (Electronics Engineering) from IIT
Madras and an M.E. and Ph.D. (Electrical Engineering) from the
University of Florida.
Dr. Yuan Liu
(NIH)
Dr. Yuan Liu is the Chief of the Office of International
Activities, and the Director of Computational Neuroscience and
Neuroinformatics Program at the National Institute of Neurological
Disorders and Stroke (NINDS), National Institutes of Health (NIH).
Dr. Liu leads the NINDS’ international activities,
which focus on fostering international and global health research,
training and collaborations. She also oversees the computational
neuroscience and neuroinformatics program, which promotes
collaborations between experimental, computational and informatics
neuroscientists to advance the understanding of nervous system
structure and function, and the mechanisms underlying nervous system
disorders.
In addition, Dr. Liu has been serving as the NINDS
representative on over 30 international, interagency and trans-NIH
committees and working groups that develop international and
computational biology and bioinformatics related programs, including
many inter-agency initiatives (e.g., CRCNS, IMAG, and BigData),
trans-NIH programs (e.g., Roadmap, BISTI, and BD2K), and Blueprint for
Neuroscience activities (e.g., NIF, NITRC and Human Connectome
Project). For her achievement and contribution, she received several
NIH Director’s Awards and NIH Blueprint Neuroscience Research Directors
Awards.
Dr. Liu received her bachelor’s and master’s degrees in
neurophysiology from Peking University in P.R. China, and her Ph.D. in
neuroscience, under the mentorship of Prof. John G. Nicholls, from the
Biozentrum, Universität Basel in Switzerland. Her research career was
focused on the area of neurophysiology at single channel, synaptic and
systems levels. Between 1999 and 2004, she managed the research
portfolio centered on channels, synapses and circuit grants at NINDS.
Prior to joining the NINDS, Dr. Liu was Program Director for Basic
Neuroscience Research at
Dr. David Kuehn
(DoT)
David Kuehn is the Program Manager for the Federal
Highway Administration (FHWA) Exploratory Advanced Research Program.
The Program Manager serves as the senior advisor to agency leadership
on the communication and coordination of exploratory advanced research
activities and fosters partnerships with other Federal agencies,
national scientific societies and organizations, and the academic
community in support of the Program.
The program focuses on longer term and higher risk research with
the potential for transformational improvements to the transportation
system. David entered federal
service as a Presidential Management Fellow. Before working at the federal level,
David worked in local government and as a consultant in southern
California. He holds a Master
of Public Administration from the University of Southern California and
a B.A. from the University of California, Irvine and is a member of the
American Institute of Certified Planners (AICP).
Dr. Tsengdar Lee
(NASA)
Dr. Tsengdar Lee manages the High-End Computing
Program from NASA Headquarters. He is responsible for maintaining the
high-end computing capability to support the agency's aeronautics
research, human exploration, scientific discovery, and space operations
missions. Lee is also the manager of the NASA Weather Data Analysis
Program, focusing on the transition of research results into the
operational forecast centers and the acceleration of operational use of
research data. Two major activities include the multi-agency Joint
Center for Satellite Data Assimilation and the Short-term Prediction
Research and Transition Center.
In 2011, Lee served as Acting Chief Technology Officer
(CTO) for Information Technology (IT) in the NASA Office of the Chief
Information Officer. In this capacity, Lee funded agency-wide IT
research and advanced prototyping and created NASA's IT Labs. He also
chaired the CTO-IT Council. Lee joined NASA in 2001 as the High-End Computing
Program Manager for the Earth Science Enterprise. He was responsible
for the Earth science computational modeling needs, primarily focusing
on weather and climate modeling. Between 2002 and 2006, Lee also
managed the Earth Science Global Modeling Program. He funded research
efforts to study global climate change, weather forecasting, and
hurricane prediction problems. Prior to 2001, Lee held positions as
Senior Technical Advisor with Northrop Grumman Information Technology
and Senior Staff Engineer with Litton PRC. He worked on the Advanced
Weather Information Processing System (AWIPS) project for the National
Weather Service. He was responsible for the rapid development,
integration, and commercialization of the AWIPS client-server system.
Lee also was a principal engineer on the effort to develop the AWIPS
network monitoring and control system.
He was a Research Scientist and worked on the
dispersion problem of bio-chemical agents during his short tenure with
the Science Applications International Corporation between 1994 and
1996. Lee received two graduate degrees from Colorado State University,
a PhD in Atmospheric Science in 1992 and an MS in Civil Engineering in
1988. Trained as a short-term weather modeler, his work focused on the
integration of weather and ancillary geographical information data into
weather models to produce reliable forecasts. His research pioneered
the modeling of land surface hydrology’s impact on weather forecasting.
Dr.
Sudarsan Rachuri (NIST)
Dr. Sudarsan Rachuri is the Program Manager for the Smart
Manufacturing Design and Analysis program at NIST. Prior to joining
NIST, he was a research professor at George Washington University. His
primary research objectives are to develop and transfer knowledge to
industry about information models for sustainable and smart
manufacturing, green products, big data
analytics for manufacturing, system level analysis, and knowledge
representation. A specific focus
is on identifying integration and technology issues that promote
industry acceptance of information models and standards
that will enable designers to develop products that are
sustainable and manufactured using smart technologies in a distributed
and collaborative environment. Dr. Rachuri's primary areas of interest
are smart and sustainable manufacturing, scientific computing,
CAD/CAM/CAE, design for sustainability, data analytics, object-oriented
modeling, and ontology.
Dr. Rachuri is an ASME Fellow, having been elected in
2012 for his significant contributions in the areas of information and
semantic modeling of product life cycle management, and the application
of measurement science for sustainable manufacturing.
Mr. Matti
Vakkuri (DIGILE)
Mr. Matti Vakkuri, Program Director, Big Data, Tieto
& Focus Area Director, DIGILE’s Data-to-Intelligence Programme
Mr. Matti Vakkuri graduated from the Finnish Military
Academy in 1993. He has 20 years
of experience in management, leadership, business
development, security, quality, human resource management, project management,
program management, offering development, crisis management and
consulting in both the governmental and private sectors.
In his current position at Tieto, his tasks are to
enable the power of Big Data and its enormous impact on the customers’
businesses, develop and ramp up the offering, sales and delivery
capabilities, build competences in Big Data, Hadoop and data sciences,
assure cross-organizational collaboration and networking, and evaluate
partners, suppliers and competitors in the Big Data market. His tasks include advocating
and lobbying for Big Data’s possibilities internally and externally in
business operations, research and product development. Since April 2013, in addition to his
job at Tieto, he has held a part-time position as Focus Area Director
(the Head of the Program) for DIGILE’s Data to Intelligence research
program. The program is focused on Big Data, data reserves and
user-centric service development. The aim of the program is to develop,
together with companies and research institutions, intelligent tools
and methods for managing, refining and utilizing diverse data. The
results of the program enable innovative business models and services.
One of the program’s targets is to develop methods for Big Data
analytics that handle complexity through fusion of heterogeneous data
sources, and use adaptivity, context-sensitivity, scalability, and user
relevance as the main methodological objectives.
Since January 2014, he has been a full member of the Big Data working
group of Finland’s Ministry of Transport and Communications,
which wrote the draft of Finland’s national Big Data strategy
in June 2014.
His motto is "Management by leadership".
|
|
|
|
|
|
|
Last update: 22 Oct. 2014
|
|