IEEE Big Data 2016 Accepted Papers

Main Conference

Regular Paper
Ricardo Calix and Rajesh Sankaran
On the Feasibility of an Embedded Machine Learning Processor for Intrusion Detection
Jinwei Liu, Haiying Shen, and Husnu Narman
CCRP: Customized Cooperative Resource Provisioning for High Resource Utilization in Clouds
Xiao Pan, Jiawei Zhang, Fengjiao Wang, and Philip S. Yu
DistSD: Distance-based Social Discovery with Personalized Posterior Screening
Christian Böhm, Martin Perdacher, and Claudia Plant
Cache-oblivious Loops Based on a Novel Space-filling Curve
Yida Wang, Bryn Keller, Mihai Capota, Michael Anderson, Narayanan Sundaram, Jonathan Cohen, Kai Li, Nicholas Turk-Browne, and Ted Willke
Real-time Full Correlation Matrix Analysis of fMRI Data
Heqing Huang, Cong Zheng, Junyuan Zeng, Wu Zhou, Sencun Zhu, Peng Liu, Suresh Chari, and Ce Zhang
Android Malware Development on Public Malware Scanning Platforms: A Large-scale Data-driven Study
Yuan Yuan, Meisam Fathi, Kaibo Wang, Rubao Lee, and Xiaodong Zhang
Spark-GPU: High Performance In-Memory Data Processing with GPUs
Jingyuan Zhang, Chun-Ta Lu, Mianwei Zhou, Sihong Xie, Yi Chang, and Philip S. Yu
HEER: Heterogeneous Graph Embedding for Emerging Relation Detection from News
Hongfu Liu, Yuchao Zhang, Bo Deng, and Yun Fu
Outlier Detection via Sampling Ensemble
Kai Zhao, Denis Khryashchev, Juliana Freire, Claudio Silva, and Huy Vo
Predicting Taxi Demand at High Spatial Resolution: Approaching the Limit of Predictability
Christian Beecks and Alexander Graß
Multi-step Threshold Algorithm for Efficient Feature-based Query Processing in Large-scale Multimedia Databases
Xiaoli Song and Xiaohua Hu
Pairwise Topic Model and its Application to Topic Transition and Evolution
Fang Zhou, Mohamed Ghalwash, and Zoran Obradovic
A Fast Structured Regression for Large Networks
Hao Zhang, Yuanyuan Zhu, Lu Qin, Hong Cheng, and Jeffrey Xu Yu
Efficient Triangle Listing for Billion-Scale Graphs
Angen Zheng, Alexandros Labrinidis, Panos Chrysanthis, and Jack Lange
ARGO: Architecture-Aware Graph Partitioning
Yongyi Xian, Yan Liu, and Chuanfei Xu
Parallel Gathering Discovery over Big Trajectory Data
Cheong Hee Park and Youngsoon Kang
An Active Learning Method for Data Streams with Concept Drift
Zhuozhao Li, Haiying Shen, Jeffrey Denton, and Walter Ligon
Comparing Application Performance on HPC-based Hadoop Platforms with Local Storage and Dedicated Storage
Ngot Bui, Thanh Le, and Vasant Honavar
Labeling Actors in Multi-view Social Networks by Integrating Information From Within and Across Multiple Views
Victor Giannakouris, Nikolaos Papailiou, Dimitrios Tsoumakos, and Nectarios Koziris
MuSQLE: Distributed SQL Query Execution Over Multiple Engine Environments
Yanan Bao, Huasen Wu, Tianxiao Zhang, Albara Ramli, and Xin Liu
Shooting a Moving Target: Motion-Prediction-Based Transmission for 360-Degree Videos
Hariton Efstathiades, Demetris Antoniades, George Pallis, Marios Dikaiakos, Zoltán Szlávik, and Robert-Jan Sips
Online Social Network Evolution: Revisiting the Twitter Graph
Xiaoyu Ge, Yanbing Xue, Zhipeng Luo, Mohamed Sharaf, and Panos Chrysanthis
REQUEST: An Interactive Big Data Exploration Framework
Francesco Versaci, Luca Pireddu, and Gianluigi Zanetti
Scalable genomics: from raw data to aligned reads on Apache YARN
Ata Turk, Hao Chen, Anthony Byrne, John Knollmeyer, Sastry Duri, Canturk Isci, and Ayse Coskun
DeltaSherlock: Identifying Changes in the Cloud
Michael Anderson, Mihai Capotă, Javier Turek, Xia Zhu, Ted Willke, Yida Wang, Po-Hsuan Chen, Jeremy Manning, Peter Ramadge, and Kenneth Norman
Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets
Hu Xu, Sihong Xie, Lei Shu, and Philip S. Yu
Sentence-level Extraction of Complementary Entities using Large Unlabeled Product Reviews
Yixian Zheng, Wenchao Wu, Haipeng Zeng, Nan Cao, Huamin Qu, Mingxuan Yuan, Jia Zeng, and Lionel M. Ni
TelcoFlow: Visual Exploration of Collective Behaviors Based on Telco Data
Zhichuan Huang and Ting Zhu
Leveraging Multi-Granularity Energy Data for Accurate Energy Demand Forecast in Smart Grids
Ahsanul Haque, Zhuoyi Wang, Swarup Chandra, Latifur Khan, and Charu Aggarwal
Sampling based Distributed Kernel Mean Matching using Spark
Liuhua Chen and Haiying Shen
Towards Resource-Efficient Cloud Systems: Avoiding Over-Provisioning in Demand-Prediction Based Resource Provisioning
Subhadeep Karan and Jaroslaw Zola
Exact Structure Learning of Bayesian Networks by Optimal Path Extension
Tomoki Yoshihisa and Takahiro Hara
A Low-Load Stream Processing Scheme for IoT Environments
Hui Li, Jiangtao Cui, Xiaobin Lin, and Jianfeng Ma
Improving the Utility in Differential Private Histogram Publishing: Theoretical Study and Practice
Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, and Satoshi Matsuoka
Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers
Panagiotis Liakos, Alexandros Ntoulas, and Alex Delis
Scalable Link Community Detection: A Local Dispersion-aware Approach
Karuna Joshi, Aditi Gupta, Sudip Mittal, Claudia Pearce, Anupam Joshi, and Tim Finin
Semantic Approach to Automating Management of Big Data Privacy Policies
Sayan Goswami, Arghya Kusum Das, Richard Platania, Kisung Lee, and Seung-Jong Park
Lazer: Distributed Memory-Efficient Assembly of Large-Scale Genomes
Kaiji Chen and Yongluan Zhou
Materialized View Selection in Feed Following Systems
Chunqiu Zeng, Qing Wang, Wentao Wang, Tao Li, and Larisa Shwartz
Online Inference for Time-varying Temporal Dependency Discovery from Time Series
Jingyuan Yang, Chuanren Liu, Mingfei Teng, Hui Xiong, and March Liao
Buyer Targeting Optimization: A Unified Customer Segmentation Perspective
Nguyen Ho, Huy Vo, and Mai Vu
An Adaptive Information-Theoretic Approach for Identifying Temporal Correlations in Big Data Sets
Katerina Doka, Nikolaos Papailiou, Victor Giannakouris, Dimitrios Tsoumakos, and Nectarios Koziris
Mix 'n Match Multi-Engine Analytics
Xiaowei Jia, Ankush Khandelwal, James Gerber, Kimberly Carlson, Paul West, and Vipin Kumar
Learning Large-scale Plantation Mapping from Imperfect Annotators
Mehrdad Yazdani, Bryn Taylor, Justine Debelius, Weizhong Li, Rob Knight, and Larry Smarr
Using Machine Learning to Identify Major Shifts in Human Gut Microbiome Protein Family Abundance in Disease
Sarasi Lalithsena, Pavan Kapanipathi, and Amit Sheth
Harnessing Relationships for Domain-specific Subgraph Extraction: A Recommendation Use Case
Chao Huang and Dong Wang
Towards Unsupervised Home Location Inference from Online Social Media
Eleazar Leal, Le Gruenwald, and Jianting Zhang
Handling Uncertainty in Trajectories of Moving Objects in Unconstrained Outdoor Spaces
Yanan Xu and Yanmin Zhu
When Remote Sensing Data meet Ubiquitous Urban Data: Fine-Grained Air Quality Inference
Azad Naik and Huzefa Rangwala
Embedding Feature Selection for Large-scale Hierarchical Classification
Yin Huang, Yelena Yesha, Milton Halem, Yaacov Yesha, and Shujia Zhou
inMem: a distributed parallel indexed in-memory computation system for large scale data analytics
Wei Jiang, Juan Rodriguez, and Torsten Suel
Improved Methods for Static Index Pruning
Yudian Ji, Yuda Zang, Wuman Luo, Xibo Zhou, Ye Ding, and Lionel M. Ni
Clockwise Compression for Trajectory Data under Road Network Constraints
Cuong Nguyen and Philip Rhodes
Accelerating Large-scale Unstructured Mesh Queries
Walaa Eldin Moustafa, Vicky Papavasileiou, Ken Yocum, and Alin Deutsch
Giragraphy: Scaling Datalog Graph Analytics on Giraph
Timo Bingmann, Michael Axtmann, Emanuel Jöbstl, Sebastian Lamm, Huyen Chau Nguyen, Alexander Noe, Sebastian Schlag, Matthias Stumpp, Tobias Sturm, and Peter Sanders
Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++
Quan Zhang, Mu Qiao, Ramani Routray, and Weisong Shi
H2O: A Hybrid and Hierarchical Outlier Detection Method for Large Scale Data Protection
Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael Mahoney, and Mr Prabhat
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
Athanasios N. Nikolakopoulos, Antonia Korba, and John D. Garofalakis
Random Surfing on Multipartite Graphs
Naman Shah, Harshil Shah, Matthew Malensek, Sangmi Lee Pallickara, and Shrideep Pallickara
Network Analysis for Identifying and Characterizing Disease Outbreak Influence from Voluminous Epidemiology Data
Darja Krushevskaja, S. Muthukrishnan, and William Simpson
Ad Allocation with Secondary Metrics
Nesreen Ahmed, Ted Willke, and Ryan Rossi
Local Graphlet Estimation
Yating Zhang, Adam Jatowt, and Katsumi Tanaka
How Good are Word Embeddings? Automatically Explaining Similarity of Terms
Shiblee Sadik, Le Gruenwald, and Eleazar Leal
In Pursuit of Outliers in Multi-dimensional Data Streams
Morteza Zihayat, Zane Zhenhua Hu, Aijun An, and Yonggang Hu
Distributed and Parallel High Utility Sequential Pattern Mining
Benjamin Sirb and Xiaojing Ye
Consensus Optimization with Delayed and Stochastic Gradients on Decentralized Networks
Yuan Yuan, Sihong Xie, Chun-Ta Lu, Philip S. Yu, and Jie Tang
Interpretable and Effective Opinion Spam Detection via Temporal Patterns Mining across Websites
Mansurul Bhuiyan and Mohammad Al Hasan
PRIIME: A Generic Framework for Interactive Personalized Interesting Pattern Discovery
Jianliang Gao, Bo Song, Ping Liu, Weimao Ke, Jianxin Wang, and Xiaohua Hu
Parallel Top-k Subgraph Query in Massive Graphs: Computing from the Perspective of Single Vertex
Nusrat Islam, Md Wasi-ur-Rahman, Xiaoyi Lu, and Dhabaleswar K. Panda
Efficient Data Access Strategies for Hadoop and Spark on HPC Cluster with Heterogeneous Storage
Poonam Goyal, Jagat Sesh Challa, Nikhil S, Aditya Mangla, Sundar S Balasubramaniam, and Navneet Goyal
DD-RTREE: A dynamic distributed data structure for efficient data distribution among cluster nodes for spatial data mining algorithms
Xiaoyi Lu, Dipti Shankar, Shashank Gugnani, and Dhabaleswar K. Panda
High-Performance Design of Apache Spark with RDMA and Its Benefits on Various Workloads
Ravikant Dindokar, Neel Choudhury, and Yogesh Simmhan
A Meta-graph Approach to Analyze Subgraph-centric Distributed Programming Models
Wooyeol Kim, Younghoon Kim, and Kyuseok Shim
Parallel Computation of k-Nearest Neighbor Joins Using MapReduce
Ke Zhang, Jianwu Xu, Martin Renqiang Min, Guofei Jiang, Konstantinos Pelechrinis, and Hui Zhang
Automated IT System Failure Prediction: A Deep Learning Approach
Jianpeng Xu, Jiayu Zhou, Pang-Ning Tan, Xi Liu, and Lifeng Luo
WISDOM: Weighted Incremental Spatio-Temporal Multi-Task Learning via Tensor Decomposition
Chun Guo and Xiaozhong Liu
Dynamic Feature Generation and Selection on Heterogeneous Graph for Music Recommendation
Xiaokai Wei, Bokai Cao, Weixiang Shao, Chun-Ta Lu, and Philip S. Yu
Community Detection with Partially Observable Links and Node Attributes
Charles Siegel, Jeff Daily, and Abhinav Vishnu
Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems
Short Paper
Jinwei Liu and Haiying Shen
 A Popularity-aware Cost-effective Replication Scheme for High Data Durability in Cloud Storage
Mike Lakoju, Alan Serrano-Rico, and Mark Lycett
A Strategic Approach for Visualizing the Value of Big data (SAVV-BIGD) Framework
Chun-Chich Chen, Chih-Ya Shen, and Ming-Syan Chen
 Massive Parallelism for Non-linear and Non-stationary Data Analysis with GPGPU
Hans Vandierendonck, Karen Murphy, Mahwish Arif, and Dimitrios Nikolopoulos
HPTA: High-Performance Text Analytics
Ting Wu, Chen Zhang, Lei Chen, Pan Hui, and Siyuan Liu
 Object Identification with Pay-As-You-Go Crowdsourcing
Bas van Stein, Matthijs van Leeuwen, and Thomas Bäck
 Local Subspace-Based Outlier Detection using Global Neighbourhoods
Jonathan Stokes and Steven Weber
The self-avoiding walk-jump (SAWJ) algorithm for finding maximum degree nodes in large graphs
Rocco Langone and Johan A. K. Suykens
Efficient multiple scale kernel classifiers
Haofu Liao, Yuncheng Li, Tianran Hu, and Jiebo Luo
 Inferring Restaurant Styles by Mining Crowd Sourced Photos from User-Review Websites
Uwe Jugel, Zbigniew Jerzak, and Volker Markl
 Big data on a few pixels
Shuo Wang, Richard Sinnott, and Surya Nepal
 Protecting the Location Privacy of Mobile Social Media Users
Jorge Veiga, Roberto R. Expósito, Xoán C. Pardo, Guillermo L. Taboada, and Juan Touriño
 Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics
Yali Zhao
SLA-Based Profit Optimization for Resource Management of Big Data Analytics-as-a-Service Platforms in Cloud Computing Environments
Xiaoli Song and Xiaohua Hu
 Semantic Pattern Mining for Text Mining
Tong Yu, Ole Mengshoel, Alvin Jude, Eugen Feller, Julien Forgeat, and Nimish Radia
 Incremental Learning for Matrix Factorization in Recommender Systems
Chuan Shi, Bowei He, Menghao Zhang, Fuzheng Zhuang, and Philip S. Yu
 Expenditure Aware Rating Prediction for Recommendation
Natalia Arzamasova, Martin Schäler, and Klemens Böhm
Cleaning Antipatterns in an SQL Query Log
Kenji Yamanishi and Kohei Miyaguchi
Detecting Gradual Changes from Data Stream Using MDL-Change Statistics
Nicolas Kourtellis, Gianmarco De Francisci Morales, Albert Bifet, and Arinto Murdopo
 VHT: Vertical Hoeffding Tree
Chang Liu and Bin Wu
Multiple Submodels Parallel Support Vector Machine on Spark
Christoforos Svingos, Theofilos Mailis, Herald Kllapi, Lefteris Stamatogiannakis, Yannis Kotidis, and Yannis Ioannidis
Real Time Processing of Streaming and Static Information
Matthew Edwards, Stephen Wattam, Paul Rayson, and Awais Rashid
Sampling Labelled Profile Data for Identity Resolution
Wanying Ding, Yue Zhang, Chaomei Chen, and Xiaohua Hu
 Semi-Supervised Dirichlet-Hawkes Process with Applications of Topic Detection and Tracking in Twitter
Steven Morse, Marta Gonzalez, and Natasha Markuzon
Persistent Cascades: Measuring Fundamental Communication Structure in a Social Network
Rui Ren, Zhen Jia, Lei Wang, Tianxu Yi, and Jianfeng Zhan
 BDTune: Hierarchical Correlation-based Performance Analysis and Rule-based Diagnosis for Big Data System
Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, and Takatsugu Ono
 Evaluating the Impacts of Code-Level Performance Tunings on Power Efficiency
Ramyar Saeedi, Hassan Ghasemzadeh, and Assefaw Gebremedhin
 Transfer Learning Algorithms for Autonomous Reconfiguration of Wearable Systems
Weimao Ke and Javed Mostafa
 Scalability Analysis of Distributed Search in Large Peer-to-peer Networks
Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel Martin, Kesheng Wu, Bin Dong, Scott Klasky, and Nagiza Samatova
 Exploring Memory Hierarchy and Network Topology for Runtime AMR Data Sharing Across Scientific Applications
Kareem Aggour and Bulent Yener
 Adapting to Data Sparsity for Efficient Parallel PARAFAC Tensor Decomposition in Hadoop
Da-Chuan Zhang, Mei Li, and Chang-Dong Wang
 Point of Interest Recommendation with Social and Geographical Influence
Jinfeng Li, James Cheng, Yunjian Zhao, Fan Yang, Yuzhen Huang, and Haipeng Chen
A Comparison of General-Purpose Distributed Systems for Data Processing
Mei Saouk, Christos Doulkeridis, Akrivi Vlachou, and Kjetil Noervaag
Efficient Processing of Top-k Joins in MapReduce
Ioanna Tsalouchidou, Gianmarco De Francisci Morales, Francesco Bonchi, and Ricardo Baeza-Yates
 Streaming Tensor Summarization
Sergey Nepomnyachiy and Torsten Suel
 Efficient Index Updates for Mixed Update and Query Loads
Stathis Maroulis, Ioannis Boutsis, and Vana Kalogeraki
Context-Aware Point of Interest Recommendation using Tensor Factorization
Hông-Ân Cao, Tri Kurniawan Wijaya, Karl Aberer, and Nuno Nunes
 Estimating Human Interactions with Electrical Appliances for Activity-based Energy Savings Recommendations
Frank Pallas, Johannes Günther, and David Bermbach
Pick Your Choice in HBase: Security or Performance
Chunkun Bo, Ke Wang, Jefferey Fox, and Kevin Skadron
Entity Resolution Acceleration using Micron's Automata Processor
Zhichuan Huang, Ting Zhu, and Jianwu Wang
 Application-Driven Sensing Data Reconstruction and Selection Based on Correlation Mining and Dynamic Feedback
Kyle Chard, Mike D'Arcy, Ben Heavner, Ian Foster, Carl Kesselman, Ravi Madduri, Alexis Rodriguez, Stian Soiland-Reyes, Carole Goble, Eric Deutsch, Ivo Dinov, Kristi Clark, Nathan Price, and Arthur Toga
 I'll Take That to Go: Big Data Bags and Minimal Identifiers for Exchange of Large, Complex Data
Michael Rilee, Kwo-Sen Kuo, Thomas Clune, Amidu Oloso, Paul Brown, and Hongfeng Yu
 Addressing the Big-Earth-Data Variety Challenge with the Hierarchical Triangular Mesh
Weixiang Shao, Lifang He, Chun-Ta Lu, and Philip S. Yu
 Online Multi-view Clustering with Incomplete Views
Joaquim Silva, Carlos Goncalves, and Jose Cunha
 A Theoretical Model for n-gram Distribution in Big Data Corpora
Philip Chan and Ebad Ahmadzadeh
 Improving Efficiency of Maximizing Spread in the Flow Authority Model for Large Sparse Networks
Saliya Ekanayake, Supun Kamburugamuve, Pulasthi Wickramasinghe, and Geoffrey Fox
 Java Thread and Process Performance for Parallel Machine Learning on Multicore HPC Clusters
Rongda Zhu, Aston Zhang, Jian Peng, and Chengxiang Zhai
 Exploiting Temporal Divergence of Topic Distributions for Event Detection
Gheorghi Guzun, Josiah McClurg, Guadalupe Canahuate, and Raghuraman Mudumbai
 Power efficient big data analytics algorithms through low-level operations
Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, and Satoshi Matsuoka
 Efficient Breadth-First Search on Massively Parallel and Distributed Memory Machines
Ioanna Filippidou and Yannis Kotidis
Effective and Efficient Graph Augmentation in Large Graphs
Ling Cen and Dymitr Ruta
 A Map based Gender Prediction Model for Big E-Commerce Data
Jinna Lv, Bin Wu, Shuai Yang, and Bingjing Jia
 Efficient Large Scale Near-Duplicate Video Detection Base on Spark
Xiaopeng Li, Ming Cheung, and James She
Connection Discovery using Shared Images by Gaussian Relational Topic Model
Ariel Bar, Bracha Shapira, Lior Rokach, and Moshe Unger
Scalable Attack Propagation Model and Algorithms for Honeypot Systems
Pascal Welke, Alexander Markowetz, Torsten Suel, and Maria Christoforaki
 Three-Hop Distance Estimation in Social Graphs
A. Pavan, P. Quint, S. Scott, N. V. Vinodchandran, and J. Smith
 Computing Triangle and Open-Wedge Heavy-Hitters in Large Networks
Stratos Dimopoulos, Chandra Krintz, and Rich Wolski
 Big Data Framework Interference In Restricted Private Cloud Settings
Susanna Pirttikangas, Ekaterina Gilman, Xiang Su, Teemu Leppänen, Anja Keskinarkaus, Mika Rautiainen, Mikko Pyykkönen, and Jukka Riekki
 Experiences with Smart City Traffic Pilot
Mohammad Mahdi Kamani, Farshid Farhat, Stephen Wistar, and James Z. Wang
 Shape Matching using Skeleton Context for Automated Bow Echo Detection
Abir Zayani, Chiheb-Eddine Ben N'Cir, and Nadia Essoussi
 Parallel clustering method for Non-Disjoint Partitioning of Large-Scale Data based on Spark Framework
Xiaowei Jia, Xi Chen, Anuj Karpatne, and Vipin Kumar
 Identifying Dynamic Changes with Noisy Labels in Spatial-temporal Data: A Study on Large-scale Water Monitoring Application
Aman Gupta, S. Muthukrishnan, and Smita Wadhwa
Optimizing Callout in Unified Ad Markets
Yadu Babuji, Kyle Chard, Aaron Gerow, and Eamon Duede
 Cloud Kotta: Enabling Secure and Scalable Data Analytics in the Cloud
Elyas Sabeti and Anders Host-Madsen
 How Interesting Images Are: An Atypicality Approach For Social Networks
Ville Hyvönen, Teemu Pitkänen, Sotiris Tasoulis, Elias Jääsaari, Risto Tuomainen, Liang Wang, Jukka Corander, and Teemu Roos
 Fast Nearest Neighbor Search through Sparse Random Projections and Voting
Hitoshi Sato, Ryo Mizote, Satoshi Matsuoka, and Hirotaka Ogawa
 I/O Chunking and Latency Hiding Approach for Out-of-core Sorting Acceleration using GPU and Flash NVM
Adiska Fardani Haryadi, Marijn Janssen, Joris Hulstijn, Haiko Voort, and Agung Wahyudi
 Requirements on and Antecedents of Big Data Quality: An Empirical Examination to Improve Big Data Quality in Financial Service Organizations
Daniel Zhang, Rungang Han, and Dong Wang
 On Robust Truth Discovery in Sparse Social Media Sensing
Joseph Jupin and Yuan Shi
 PSH: A Probabilistic Signature Hash Method with Hash Neighborhood Candidate Generation for Fast Edit-Distance String Comparison on Big Data
Khoa Doan, Amidu Oloso, Kwo-Sen Kuo, Thomas Clune, and Hongfeng Yu
 Evaluating the Impact of Data Placement to Spark and SciDB with an Earth Science Use Case
Luis Pineda-Morales, Ji Liu, Alexandru Costan, Esther Pacitti, Gabriel Antoniu, Patrick Valduriez, and Marta Mattoso
 Managing Hot Metadata for Scientific Workflows on Multisite Clouds
Dipti Shankar, Xiaoyi Lu, and Dhabaleswar K. Panda
Boldio: A Hybrid and Resilient Burst-Buffer Over Lustre for Accelerating Big Data I/O
Zexi Chen, Ranga Vatsavai, Bharathkumar Ramachandra, Qiang Zhang, Nagendra Singh, and Sreenivas Sukumar
Scalable Nearest Neighbor Based Hierarchical Change Detection Framework for Crop Monitoring
Sreenivas Sukumar, Michael Matheson, Ramakrishnan Kannan, and Seung-Hwan Lim
 Mini-Apps for High Performance Scientific Data Analysis
Mai Nguyen, Dylan Uys, Daniel Crawl, Charles Cowart, and Ilkay Altintas
 A Scalable Approach for Location-Specific Detection of Santa Ana Conditions
Tathagata Mukherjee, Biswas Parajuli, Piyush Kumar, and Eduardo Pasiliao
 TruthCore: Non-parametric Estimation of Truth from a Collection of Authoritative Sources
Xiang Liu and Torsten Suel
 What Makes A Group Fail: Modeling Social Group Behavior in Event-Based Social Networks
Gopi Chand Nutakki and Olfa Nasraoui
 Compartmentalized Adaptive Topic Mining on Social Media Streams
Farrukh Ahmed, Michele Samorani, Colin Bellinger, and Osmar R. Zaiane
 Advantage of Integration in Big Data: Feature Generation in Multi-Relational Databases for Imbalanced Learning
Yuh-Jye Lee, Hsing-Kuo Pao, Shueh-Han Shih, Jing-Yao Lin, and Xin-Rong Chen
 Compressed Learning for Time Series Classification
Saman Biookaghazadeh, Yiqi Xu, Shujia Zhou, and Ming Zhao
 Kaleido: Enabling Efficient Scientific Data Processing on Big-Data Systems
Said Jabbour, Nizar Mhadhbi, Abdesattar Mhadhbi, Badran Raddaoui, and Lakhdar Sais
 Summarizing Large Graphs by Means of Pseudo-Boolean Constraints

Industry and Government

Regular Paper
Syed Yousaf Shah, Brent Paulovicks, and Petros Zerfos
Data-at-Rest Security for Spark
Nathaniel Huber-Fliflet, Jianping Zhang, Haozhen Zhao, Robert Keeling, and Rishi Chhatwal
Empirical Evaluations of Preprocessing Parameters' Impact on Predictive Coding's Effectiveness
Zhenyun Zhuang, Haricharan Ramachandra, Badri Sridharan, Brandon Duncan, Kishore Gopalakrishna, and Jean-Francois Im
SmartCache: Application Layer Caching to Improve Performance of Large-scale Memory Mapping
Raya Horesh, Kush Varshney, and Jinfeng Yi
Information Retrieval, Fusion, Completion, and Clustering for Employee Expertise Estimation
Xuchao Zhang, Zhiqian Chen, Weisheng Zhong, Arnold P. Boedihardjo, and Chang-Tien Lu
Storytelling in Heterogeneous Twitter Entity Network based on Hierarchical Cluster Routing
Juergen Heit, Jiayi Liu, and Mohak Shah
An Architecture for the Deployment of Statistical Models for Big Data Era
Adetokunbo Makanju, Zahra Farzanyar, Aijun An, Nick Cercone, Zane Hu, and Yonggang Hu
Deep Parallelization of Parallel FP-Growth Using Parent-Child MapReduce
Bradford Littooy, Sophie Loire, Michael Georgescu, and Igor Mezic
Dynamic Pattern Recognition and Classification of HVAC Faults in Commercial Buildings
Hui Wu, Yi Fang, Huming Wu, and Shenhong Zhu
A Diversified Trending Topic Discovery System
Ganesh Venkataraman, Abhimanyu Lad, Lin Guo, and Shakti Sinha
Fast, Lenient and Accurate: Building Personalized Instant Search Experience at LinkedIn
Pavel Dmitriev, Brian Frasca, Somit Gupta, Ron Kohavi, and Garnet Vaz
Pitfalls of Long-Term Online Controlled Experiments
Michele Samorani, Farrukh Ahmed, and Osmar Zaiane
Automatic Generation of Relational Attributes: An Application to Product Returns
Tomasz Tajmajer, Malwina Spławińska, Piotr Wasilewski, and Stan Matwin
Predicting Annual Average Daily Highway Traffic from Large Data and Very Few Measurements
Ruoyu Wang, Daniel Sun, Muhammad Atif, and Surya Nepal
LogProv: Logging Events as Provenance of Big Data Analytics Pipelines with Trustworthiness Evaluation
Wenjun Zhou, Yun Zhu, Faizan Javed, Mahmudur Rahman, Janani Balaji, and Matt McNair
Quantifying Skill Relevance to Job Titles
Mylene Simon, Joe Chalfoun, Mary Brady, and Peter Bajcsy
Do We Trust Image Measurements?
Sreenivas Sukumar, Michael Matheson, Ramakrishnan Kannan, and Seung-Hwan Lim
Mini-Apps for High Performance Scientific Data Analysis
Elissa Redmiles, Emily Grace, Ankit Rai, and Rayid Ghani
Detecting Fraud, Corruption, and Collusion in International Development Contracts
Nicolas Poggi, Josep Ll. Berral, David Carrera, Jose Blakeley, and Nikola Vujic
The state of SQL-on-Hadoop in the Cloud
Zahra Zohrevand, Uwe Glässer, Hamed Yaghoubi Shahir, and Mohammad A. Tayebi
Hidden Markov Based Anomaly Detection for Water Supply Systems
Short Paper
Archana Ganapathi and Yanpei Chen
Data Quality: Experiences and Lessons from Operationalizing Big Data
Teruyoshi Zenmyo, Satoshi Iijima, and Ichiro Fukuda
Managing a Complicated Workflow based on Dataflow-based Workflow Scheduler
Issei Sato, Masahiro Kazama, Haruaki Yatabe, Tairiku Ogihara, Tetsuro Onishi, and Hiroshi Nakagawa
Company Recommendation for New Graduates via Implicit Feedback Multiple Matrix Factorization with Bayesian Optimization
Derrick Spell, Ling-Yong Wang, Richard Shomer, Bahador Nooraei, Jarrell Waggoner, Xiao-Han Zeng, Jae Chung, Kai-Chen Cheng, and Daniel Kirsche
QED: Groupon's ETL management and curated feature catalog system for machine learning
Thibaud Nesztler, Don Kasper, Michael Georgescu, Sophie Loire, and Igor Mezic
Uniformization, organization, association and use of metadata from multiple content providers and manufacturers: A close look at the Building Automation System (BAS) sector
Ilaria Bordino, Andrea Ferretti, Marco Firrincieli, Francesco Gullo, Marcello Paris, Stefano Pascolutti, and Gianluca Sabena
Hermes: A distributed-messaging tool for NLP
Jiejun Xu, Samuel Johnson, and Kang-Yu Ni
Cross-Modal Event Summarization: A Network of Networks Approach
Hongfeng Chai, Hao Liu, Xibo Zhou, Yanjun Xu, Shuo He, Jinzhi Hua, Dongjie He, and Weihuai Liu
UStore: An Optimized Storage System for Enterprise Data Warehouses at UnionPay
Rajaraman Kanagasabai, Anitha Veeramani, Hu Shangfeng, Kajanan Sangaralingam, Ying Li, and Giuseppe Manai
Classification of Massive Mobile Web Log URLs for Customer Profiling & Analytics
Vinay Deolalikar
Extensive Large-Scale Study of Error Surfaces in Sampling-Based Distinct Value Estimators for Databases
Li Zhou, Yinglong Xia, Hui Zang, Jian Xu, and Mingzhen Xia
An Edge-Set Based Large Scale Graph Processing System
Leonardo Millefiori, Dimitrios Zissis, Luca Cazzanti, and Gianfranco Arcieri
A distributed approach to estimating sea port operational regions from lots of AIS data
Ljiljana Stojanovic, Marko Dinic, Nenad Stojanovic, and Aleksandar Stojadinovic
Big-data-driven Anomaly Detection in Industry (4.0): an approach and case study
Luca Cazzanti, Antonio Davoli, and Leonardo Millefiori
Automated Port Traffic Statistics: From Raw Data to Visualisation
Nancy Grady
Knowledge Discovery in Data Science: KDD Meets Big Data
Amita Gajewar, Jignesh Parmar, Lizhong Wu, and Ramana Yerneni
Forecasting Squatting of Demand in Display Advertising
Yiming Kong, Hui Zang, and Xiaoli Ma
Human Network Usage Patterns Revealed by Telecom Data