IEEE Big Data 2017 Accepted Papers

Main Conference

Paper ID    Regular Papers
BigD216Jonghyun Bae, Hakbeom Jang, Wenjing Jin, Jun Heo, Jaeyoung Jang, Joo-Young Hwang, Sangyeun Cho, and Jae W. Lee,
Jointly Optimizing Task Granularity and Concurrency for In-Memory MapReduce Frameworks
BigD224Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, 
Sampling Algorithms to Update Truncated SVD
BigD233Guyue Han and Harish Sethu, 
Closed Walk Sampler: An Efficient Method for Estimating the Spectral Radius of Large Graphs
BigD238Yunming Zhang, Vladimir Kiriansky, Charith Mendis, Matei Zaharia, and Saman Amarasinghe, 
Making Caches Work for Graph Analytics
BigD252Ioannis Giannakopoulos, Dimitrios Tsoumakos, and Nectarios Koziris, 
A Decision Tree Based Approach Towards Adaptive Modeling of Big Data Applications
BigD254Bo Peng, Bingjing Zhang, Langshi Chen, Mihai Avram, Robert Henschel, Craig Stewart, Shaojuan Zhu, Emily Mccallum, Lisa Smith, Tom Zahniser, Jon Omer, and Judy Qiu, 
HarpLDA+: Optimizing Latent Dirichlet Allocation for Parallel Efficiency
BigD257Diego Marrón, Eduard Ayguadé, José R. Herrero, Jesse Read, and Albert Bifet, 
Low-latency Multi-threaded Ensemble Learning for Dynamic Big Data Streams
BigD260Alessandro Lulli, Luca Oneto, and Davide Anguita, 
Crack Random Forest for Arbitrary Large Datasets
BigD274Giambattista Amati, Simone Angelini, Giorgio Gambosi, Gianluca Rossi, and Paola Vocca, 
Estimation of distance-based metrics for very large graphs with MinHash Signatures
BigD276Jingyuan Zhang, Chun-Ta Lu, Bokai Cao, Yi Chang, and Philip S. Yu, 
Connecting Emerging Relationships from News via Tensor Factorization
BigD301Hiroshi Inoue, 
Fast Interpolation of Grid Data at a Non-Grid Point
BigD303Lei Zheng, Bokai Cao, Vahid Noroozi, Philip S. Yu, and Nianzu Ma, 
Hierarchical Collaborative Embedding for Context-Aware Recommendations
BigD313Pankaj Goel, Prerna Jain, Aniruddha Datta, and M. Sam Mannan, 
Application of Big Data analytics in process safety and risk management
BigD314Sreyasee Das Bhattacharjee, Ashit Talukder, and Bala Venkatram Balantrapu, 
Active Learning Based News Veracity Detection with Feature Weighting and Deep-Shallow Fusion
BigD316Pasan Karunaratne, Masud Moshtaghi, Shanika Karunasekera, Aaron Harwood, and Trevor Cohn, 
Multi-step Prediction with Missing Smart Sensor Data using Multi-task Gaussian Processes
BigD327Natascha Harth and Christos Anagnostopoulos, 
Quality-aware Aggregation & Predictive Analytics at the Edge
BigD333Luna Xu, Seung-Hwan Lim, Min Li, Ali R. Butt, and Ramakrishnan Kannan, 
Scaling Up Data-Parallel Analytics Platforms: Linear Algebraic Operation Cases
BigD343Haoyu Wang, Jiaqi Gong, Yan Zhuang, Haiying Shen, and John Lach, 
HealthEdge: Task Scheduling for Edge Computing with Health Emergency and Human Behavior Consideration in Smart Homes
BigD345Xiaowei Jia, Yifan Hu, Ankush Khandelwal, Anuj Karpatne, and Vipin Kumar, 
Joint Sparse Auto-encoder: A Semi-supervised Spatio-temporal Approach in Mapping Large-scale Croplands
BigD350Shahab Helmi and Farnoush Banaei-Kashani, 
Spatiotemporal Range Pattern Queries on Large-scale Co-movement Pattern Datasets
BigD351Lichao Sun, Xiaokai Wei, Jiawei Zhang, Lifang He, Philip S. Yu, and Witawas Srisa-an, 
Contaminant Removal for Malware Detection on Android
BigD357Abhinav Maurya and Rahul Telang, 
Bayesian Multi-View Models for Member-Job Matching and Personalized Skill Recommendations
BigD365Mai Nguyen, Daniel Crawl, Jianxin Li, Dylan Uys, and Ilkay Altintas, 
Automated Scalable Detection of Location-Specific Santa Ana Conditions from Weather Data using Unsupervised Learning
BigD368Konstantinos Lolos, Ioannis Konstantinou, Verena Kantere, and Nectarios Koziris, 
Elastic Management of Cloud Applications using Adaptive Markov Models
BigD375Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, and D Sculley, 
The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
BigD376Roohollah Etemadi and Jianguo Lu, 
Bias Correction in Clustering Coefficient Estimation
BigD379Sheng Li and Yun Fu, 
Robust Multi-Label Semi-Supervised Classification
BigD393Mohammad Asghari and Cyrus Shahabi, 
On On-line Task Assignment in Spatial Crowdsourcing
BigD394Valerie Hayot-Sasson, Yongping Gao, Yuhong Yan, and Tristan Glatard, 
Sequential algorithms to split and merge ultra-high resolution 3D images
BigD395Yuan Zhang, Chen Lin, Min Chi, Julie Ivy, Muge Capan, and Jeanne M. Huddleston, 
LSTM for Septic Shock: Adding Unreliable Labels to Reliable Predictions
BigD399Philipp Baumann, Dorit Hochbaum, and Quico Spaen, 
High-Performance Geometric Algorithms for Sparse Computation in Big Data Analytics
BigD402Panagiotis Liakos, Alexandros Ntoulas, and Alex Delis, 
CoEuS: Community Detection via Seed-set Expansion on Graph Streams
BigD404Panagiotis Liakos, Alexandros Ntoulas, and Alex Delis, 
Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams
BigD409Xiaoli Li, Sai Nivedita Chandrasekaran, and Jun Huan, 
Lifelong Multi-Task Multi-View Learning Using Latent Spaces
BigD410Tong Yang, Binchao Yin, Hang Li, Muhammad Shahzad, Steve Uhlig, Bin Cui, and Xiaoming Li, 
Rectangular Hash Table: Bloom Filter and Bitmap Assisted Hash Table with High Speed
BigD418Yueyao Wang, Qinmin Vivian Hu, Yang Song, and Liang He, 
Potentiality of Healthcare Big data: Improving Search by Automatic Query Reformulation
BigD425Ilir Fetai, Alexander Stiemer, and Heiko Schuldt, 
QuAD: A Quorum Protocol for Adaptive Data Management in the Cloud
BigD432Foteini Katsarou, Nikos Ntarmos, and Peter Triantafillou, 
Hybrid Algorithms for Subgraph Pattern Queries in Graph Databases: An Evaluation
BigD433Chuxu Zhang, Lu Yu, Xiangliang Zhang, and Nitesh Chawla, 
ImWalkMF: Joint Matrix Factorization and Implicit Walk Integrative Learning for Recommendation
BigD437Axel-Cyrille Ngonga Ngomo, Michael Hoffmann, Ricardo Usbeck, and Kunal Jha, 
Holistic and Scalable Ranking of RDF Data
BigD439Alex Watson, Deepigha Vittal Babu, and Suprio Ray, 
Sanzu: A Data Science Benchmark
BigD448Natalia Ponomareva, Thomas Colthurst, Gilbert Hendry, Salem Haykal, and Soroush Radpour, 
Compact Multi-Class Boosted Trees
BigD449Chandramani Chaudhary, Poonam Goyal, and Yi-Ping Phoebe Chen, 
Exploiting Visual and Textual Neighborhood Information to Improve Image-Tag Relevance
BigD457Alec Heifetz, Vaikkunth Mugunthan, and Lalana Kagal, 
Shade: A Differentially-Private Wrapper For Enterprise Big Data
BigD462Baoxin Zhao, Cheng-Zhong Xu, and Siyuan Liu, 
A Data-Driven Congestion Diffusion Model for Characterizing Traffic in Metrocity Scales
BigD476Sheikh Motahar Naim, Arnold Boedihardjo, and M. Shahriar Hossain, 
A Scalable Model for Tracking Topical Evolution in Large Document Collections
BigD491Cheng-Chin Tu, Mi-Yen Yeh, and Tei-Wei Kuo, 
A Fast Non-Volatile Memory aware Algorithm for Generating Random Scale-Free Networks
BigD497Haekyu Park, Jinhong Jung, and U Kang, 
A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems
BigD505Nguyen Vo, Kyumin Lee, and Thanh Tran, 
MRAttractor: Detecting Communities from Large-Scale Graphs
BigD512Jianfeng Jia, Chen Li, and Michael Carey, 
Drum: A Rhythmic Approach to Interactive Analytics on Large Data
BigD516Kalyan Veeramachaneni, Thomas Swearingen, and Arun Ross, 
Delphi: A multi-user, multi-method cloud based model exploration system
BigD520Meng-Fen Chiang, Ee-Peng Lim, Wang-Chien Lee, and Agus Trisnajaya Kwee, 
BTCI: a New Framework for Identifying Congestion Cascades Using Bus Trajectory Data
BigD525Mehrnaz Najafi, Lifang He, and Philip S. Yu, 
Error-Robust Multi-View Clustering
BigD526Daniel (Yue) Zhang, Dong Wang, and Yang Zhang, 
Constraint-Aware Dynamic Truth Discovery in Big Data Social Media Sensing
BigD527Nathanael Cheriere and Gabriel Antoniu, 
How Fast Can One Scale Down a Distributed File System?
BigD539Lorenzo De Stefani, Erisa Terolli, and Eli Upfal, 
Tiered Sampling: An Efficient Method for Approximate Counting Sparse Motifs in Massive Graph Streams
BigD549Jim Pivarski, Peter Elmer, Brian Bockelman, and Zhe Zhang, 
Fast Access to Columnar, Hierarchical Data via Code Transformation
BigD552Xinli Yu, Zheng Chen, Jianliang Gao, Bo Song, Wei-Shih Yang, Bo Ji, Xiaohua Hu, and Erjia Yan, 
Large-Scale Joint Topic, Sentiment & User Preference Analysis for Online Reviews
BigD560Suchismit Mahapatra and Varun Chandola, 
S-Isomap++: Multi Manifold Learning from Streaming Data
BigD564Jun Hu, Yuxin Wang, and Ping Li, 
Online City-scale Hyper-local Event Detection via Analysis of Social Media and Human Mobility
BigD571Srinivasan Venkatramanan, Sichao Wu, Bowen Shi, Achla Marathe, Madhav Marathe, Stephen Eubank, Lalit Sah, A.P. Giri, Luke Colavito, Nitin S, Sridhar V, Asokan R, Rangaswamy Muniappan, George Norton, and Abhijin Adiga, 
Towards Robust Models of Food Flows and Their Role in Invasive Species Spread
BigD590Vachik Dave, Nesreen Ahmed, and Mohammad Hasan, 
E-CLoG: Counting Edge-Centric Local Graphlets
BigD594Balaji Palanisamy, Chao Li, and Prashant Krishnamurthy, 
Group Privacy-aware Disclosure of Association Graph Data
BigD596Yuki Ito, Ryo Matsumiya, and Toshio Endo, 
ooc_cuDNN: Accommodating Convolutional Neural Networks over GPU Memory Capacity
BigD602Sarasi Lalithsena, Sujan Perera, Pavan Kapanipathi, and Amit Sheth, 
Domain-specific Hierarchical Subgraph Extraction: A Recommendation Use Case
BigD604Chao Shang, Aaron Palmer, Jiangwen Sun, Ko-Shin Chen, Jin Lu, and Jinbo Bi, 
VIGAN: Missing View Imputation with Generative Adversarial Networks
BigD608Yizhou Yan, Lei Cao, and Elke Rundensteiner, 
Distributed Top-N Local Outlier Detection in Big Data
BigD612Ahsanul Haque, Bo Dong, Yifan Li, Yang Gao, Latifur Khan, and Mohammad Masud, 
Multistream Regression with Asynchronous Concept Drift Detection
BigD613Limeng Cui, Jiawei Zhang, Zhensong Chen, Yong Shi, and Philip S. Yu, 
Inverse Extreme Learning Machine for Learning with Label Proportions
BigD635Lei Huang, Weijia Xu, Si Liu, Venktesh Pandey, and Natalia Ruiz Juri, 
Enabling Versatile Analysis of Large Scale Traffic Video Data with Deep Learning and HiveQL
BigD643Xiaodong Yu, Kaixi Hou, Hao Wang, and Wu-chun Feng, 
Hierarchical Automata Construction for Approximate Pattern Matching on Automata Processors
BigD645Shashank Gugnani, Xiaoyi Lu, Houliang Qi, Li Zha, and Dhabaleswar K. Panda, 
Characterizing and Accelerating Indexing Techniques on Distributed Ordered Tables
BigD646Feng Chen, Chunpai Wang, and Jin-Hee Cho, 
Collective Subjective Logic: Scalable Uncertainty-based Opinion Inference
BigD648Ismini Lourentzou, Alex Morales, and Chengxiang Zhai, 
Text-based Geolocation Prediction of Social Media Users with Neural Networks
BigD649Arnab K. Paul, Arpit Goyal, Feiyi Wang, Sarp Oral, Ali R. Butt, Michael J. Brim, and Sangeetha B. Srinivasa, 
I/O Load Balancing for Big Data HPC Applications
BigD650HyeongSik Kim, Padmashree Ravindra, and Kemafor Anyanwu, 
A Semantics-Aware Storage Framework for Scalable Processing of Knowledge Graphs on Hadoop
BigD657Xiaoyi Lu, Haiyang Shi, Dipti Shankar, and Dhabaleswar K. Panda, 
Performance Characterization and Acceleration of Big Data Workloads on OpenPOWER System
BigD664Ryoya Kaneko, Kohei Miyaguchi, and Kenji Yamanishi, 
Detecting Changes in Streaming Data with Information-Theoretic Windowing
BigD669Xi Zhang, Yu Zeng, Xiao-Bo Jin, Zhi-Wei Yan, and Guang-Gang Geng, 
Boosting the Phishing Detection Performance by Semantic Analysis
Paper ID    Short Papers
BigD213Dingwen Tao, Sheng Di, Zizhong Chen, and Franck Cappello, 
In-Depth Exploration of Single-Snapshot Lossy Compression Techniques for N-Body Simulations
BigD219Mohammad Ghassemi, Willow Jarvis, Tuka Alhanai, Emery Brown, Roger Mark, and M. Brandon Westover, 
An Open-Source Tool For The Transcription of Paper-Spreadsheet Data
BigD220Giuseppe Cuccu, Somayeh Danafar, Philippe Cudré-Mauroux, Martin Gassner, Stefano Bernero, and Krzysztof Kryszczuk, 
A Data-Driven Approach to Predict NOx-Emissions of Gas Turbines
BigD230Isabelle Comyn-Wattiau and Jacky Akoka, 
Model-Driven Reverse Engineering of NoSQL Property Graph Databases
BigD231Allard Jan-Jaap van Altena, Perry D Moerland, Aeilko H Zwinderman, and Sílvia Delgado Olabarriaga, 
Analysis of the Term 'Big Data': Usage in Biomedical Publications
BigD234Juan Colmenares, Reza Dorrigiv, and Daniel Waddington, 
A Single-Node Datastore for High-Velocity Multidimensional Sensor Data
BigD236Michael Mercier, David Glesser, Yiannis Georgiou, and Olivier Richard, 
Big Data and HPC collocation: Using HPC idle resources for Big Data Analytics
BigD237Christian Schmid and Bruce Desmarais, 
Exponential Random Graph Models with Big Networks: Maximum Pseudolikelihood Estimation and the Parametric Bootstrap
BigD239Diego Granziol and Stephen Roberts, 
Entropic Determinants
BigD247Zhihua Zhu, Di Yao, Jianhui Huang, and Jingping Bi, 
Sub-trajectory- and Trajectory-Neighbor-based Outlier Detection over Trajectory Streams
BigD264Tingyang Xu, Tan Yan, Dongjin Song, Wei Cheng, Haifeng Chen, Geoff Jiang, and Jinbo Bi, 
Identifying and Quantifying Nonlinear Structured Relationships in Complex Manufactural Systems
BigD266Xibo Zhou, Ye Ding, Fengchao Peng, Qiong Luo, and Lionel M. Ni, 
Detecting Unmetered Taxi Rides from Trajectory Data
BigD294Hoang Anh Dau, Diego Furtado Silva, François Petitjean, Germain Forestier, Anthony Bagnall, and Eamonn Keogh, 
Judicious Setting of Dynamic Time Warping’s Warping Window Width allows more Accurate Classification of Time Series
BigD309Ankit Desai and Sanjay Chaudhary, 
Distributed Decision Tree v.2.0
BigD310Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, and Mustapha Lebbah, 
A Distributed Rough Set Theory based Algorithm for an Efficient Big Data Pre-processing under the Spark Framework
BigD317Marzieh Bakhshandeh, Dennis M.M. Schunselaar, Henrik Leopold, and Hajo A. Reijers, 
Improving Cost and Quality of Care by Predicting Treatment Iterations in the Dental Implantology Process
BigD318Heng Zhou and Haiying Shen, 
CStorage: An Efficient Classification-based Image Storage System in Cloud Datacenters
BigD321Dong Chen and David Irwin, 
Weatherman: Exposing Weather-based Privacy Threats in Big Energy Data
BigD323Jiuyong Li, Jixue Liu, Lin Liu, Thuc Le, Saisai Ma, and Yizhao Han, 
Discrimination detection by causal effect estimation
BigD331Chenwei Zhang, Nan Du, Wei Fan, Yaliang Li, Chun-Ta Lu, and Philip S. Yu, 
Bringing Semantic Structures to User Intent Detection in Online Medical Queries
BigD335Byron Gao, Robert Tung, and Yong Yang, 
Iterative Matrix Correlation for Bisection Clustering
BigD349Xueying Guo, George Trimponias, Xiaoxiao Wang, Zhitang Chen, Yanhui Geng, and Xin Liu, 
Cellular Network Configuration via Online Learning and Joint Optimization
BigD356Fengchao Peng, Yudian Ji, Qiong Luo, and Lionel M. Ni, 
Event-Based Non-Parametric Clustering of Team Sport Trajectories
BigD361Jian Cao, 
Personalized Flight Recommendations via Paired Choice Modeling
BigD362Ashish Tapdiya and Daniel Fabbri, 
A comparative analysis of state-of-the-art SQL-on-Hadoop systems for interactive analytics
BigD369Dapeng Dong and John Herbert, 
Compressed Domain-Specific Data Processing and Analysis
BigD373Helge Holzmann, Vinay Goel, and Emily Novak Gustainis, 
Universal Distant Reading through Metadata Proxies with ArchiveSpark
BigD374Ilia Pietri, Yannis Chronis, and Yannis Ioannidis, 
Multi-objective Optimization of Scheduling Dataflows on Heterogeneous Cloud Resources
BigD382Alexander Denzler and Michael Kaufmann, 
Toward Granular Knowledge Analytics for Data Intelligence
BigD385Sheng Li, Hongfu Liu, Zhiqiang Tao, and Yun Fu, 
Multi-View Graph Learning with Adaptive Label Propagation
BigD387Liang Ma, Guohong Cao, and Lance Kaplan, 
Graphical Approach for Influence Maximization in Social Networks Under Generic Threshold-based Non-submodular Model
BigD391Hasan Kurban and Mehmet Dalkilic, 
A novel approach to optimization of iterative machine learning algorithms: over heap structure
BigD392Xiao Meng and Lukasz Golab, 
Optimal Reducer Placement to Minimize Data Transfer in MapReduce-Style Processing
BigD400Masato Asahara and Ryohei Fujimaki, 
Distributed Bayesian Piecewise Sparse Linear Models
BigD403Yuchang Xu and Jian Cao, 
OTPS: A Decision Support Service for Optimal Airfare Ticket Purchase
BigD415Xinhui Tian, Yuanqing Guo, and Jianfeng Zhan, 
Towards Memory and Computation Efficient Graph Processing on Spark
BigD416Shuo Wang, Richard Sinnott, and Surya Nepal, 
Privacy-protected Place of Activity Mining on Big Location Data
BigD417Shuo Wang, Richard Sinnott, and Surya Nepal, 
Sensitive Gazetteer Discovery and Protection for Mobile Social Media Users
BigD428Masahiro Yokoyama, Takahiro Hara, and Sanjay Madria, 
Efficient Diversified Set Monitoring for Mobile Sensor Stream Environments
BigD429Ramyar Saeedi, Skyler Norgaard, and Assefaw Gebremedhin, 
A Closed-loop Deep Learning Architecture for Robust Activity Recognition using Wearable Sensors
BigD430Mohammad Hossein Namaki, Peng Lin, and Yinghui Wu, 
Event Pattern Discovery by Keywords in Graph Streams
BigD441Ebad Ahmadzadeh and Philip Chan, 
Mining Pros and Cons of Actions from Social Media for Decision Support
BigD442Celestine Dünner, Thomas Parnell, Kubilay Atasu, Manolis Sifalakis, and Haralampos Pozidis, 
Understanding and Optimizing the Performance of Distributed Machine Learning Applications on Apache Spark
BigD445Daniel (Yue) Zhang, Dong Wang, Hao Zheng, Xin Mu, Qi Li, and Yang Zhang, 
Large-scale Point-of-Interest Category Prediction Using Natural Language Processing Models
BigD446Angelo Furno, Nour Eddin El Faouzi, Rajesh Sharma, and Eugenio Zimeo, 
Two-level Clustering Fast Betweenness Centrality Computation for Requirement-driven Approximation
BigD450Tianqing Zhu, Ping Xiong, Gang Li, Wanlei Zhou, and Philip S. Yu, 
Differentially Private Query Learning: from Data Publishing to Model Publishing
BigD454Xing Su, Yuan Yao, Qing He, Jie Lu, and Hanghang Tong, 
Personalized Travel Mode Detection with Smartphone Sensors
BigD455Chuishi Meng, Yu Cui, Qing He, Lu Su, and Jing Gao, 
Travel Purpose Inference with GPS Trajectories, POIs, and Geo-tagged Social Media Data
BigD463Ming Zeng, Tong Yu, Xiao Wang, Le T. Nguyen, Ole J. Mengshoel, Ian Lane, and Joy Zhang, 
Semi-Supervised Convolutional Neural Networks for Human Activity Recognition
BigD464Yangwen Yu, James J. Q. Yu, Victor O. K. Li, and Jacqueline C. K. Lam, 
Low-rank Singular Value Thresholding for Recovering Missing Air Quality Data
BigD469Zhitang Chen, Ke He, Jian Li, and Yanhui Geng, 
Seq2Img: A Sequence-to-Image based Approach Towards IP Traffic Classification using Convolutional Neural Networks
BigD471Naama Kraus, David Carmel, and Idit Keidar, 
Fishing in the Stream: Similarity Search over Endless Data
BigD473Nikos Zacheilas, Stathis Maroulis, and Vana Kalogeraki, 
Dione: Profiling Spark Applications Exploiting Graph Similarity
BigD475Lars Arge, Mathias Rav, Svend C. Svendsen, and Jakob Truelsen, 
External Memory Pipelining Made Easy With TPIE
BigD493Bilal Akil and Uwe Roehm, 
On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science
BigD496Jiankun Huang and Wenjun Wu, 
T-BMIRT: Estimating Representations of Student Knowledge and Educational Components in Online Education
BigD501Amit Pande and Vishal Ahuja, 
WEAC: Word Embeddings for Anomaly Classification from Event Logs
BigD506Xian Wu, Yuxiao Dong, Jun Tao, Chao Huang, and Nitesh Chawla, 
Reliable Fake Review Detection via Modeling Temporal and Behavioral Patterns
BigD511Hung Tran-The and Koji Zettsu, 
Discovering Co-occurrence Patterns of Heterogeneous Events from Unevenly-distributed Spatiotemporal Data
BigD515Tatsuru Kobayashi, Shin Matsushima, Lee Taito, and Kenji Yamanishi, 
Discovering Potential Traffic Risk in Japan using Supervised Learning Approach
BigD528Martin Koehler, Alex Bogatu, Cristina Civili, Nikolaos Konstantinou, Edward Abel, Alvaro A A Fernandes, John Keane, Leonid Libkin, and Norman W. Paton, 
Data Context Informed Data Wrangling
BigD531Wenbo Zhang, Dheeraj Kumar, and Satish Ukkusuri, 
Exploring the Dynamics of Surge Pricing in Mobility-on-Demand Taxi Services
BigD541Sam Wood, Rohit Muthyala, Yi Jin, Hua Gao, Yixing Qin, Amit Rai, and Nilaj Rukadikar, 
Automated Industry Classification with Deep Learning
BigD547Michael Nelson, Sridhar Radhakrishnan, Amlan Chatterjee, and Chandra Sekharan, 
Queryable Compression on Streaming Social Networks
BigD557Kubilay Atasu, Thomas Parnell, Celestine Duenner, Manolis Sifalakis, Haralampos Pozidis, Vasileios Vasileiadis, Michail Vlachos, Cesar Berrospi, and Abdel Labbi, 
Scalable Document Similarity Using Linear-Complexity Word Mover's Distance
BigD559Aritra Mandal and Mohammad Al Hasan, 
A Distributed k-Core Decomposition Algorithm on Spark
BigD579Mohiuddin Solaimani, Sayeed Salam, and Latifur Khan, 
RePAIR: Recommend Political Actors In Real-time From News Websites
BigD581Robert Bridges, Jessie Jamieson, and Joel Reed, 
Setting the threshold for high throughput detectors: A mathematical approach for ensembles of dynamic, heterogeneous, probabilistic anomaly detectors
BigD583Takeaki Uno, Hiroki Maegawa, Takanobu Nakahara, Yukinobu Hamuro, Ryo Yoshinaka, and Makoto Tatsuta, 
Micro-Clustering by Data Polishing
BigD585Xinjiang Lu, Zhiwen Yu, Chuanren Liu, Yanchi Liu, Hui Xiong, and Bin Guo, 
Forecasting the Rise and Fall of Volatile Point-of-Interests
BigD591Axel Oehmichen, Florian Guitton, Kai Sun, Jean Grizet, Thomas Heinis, and Yike Guo, 
eTRIKS Analytical Environment: A Modular High Performance Framework for Medical Data Analysis
BigD593Chung Ming Cheung, Palash Goyal, Viktor K. Prasanna, and Arash Saber Tehrani, 
OReONet: Deep Convolutional Network for Oil Reservoir Optimization
BigD618Jose Cadena, Saliya Ekanayake, and Anil Vullikanti, 
Fast Graph Scan Statistics Optimization Using Algebraic Fingerprints
BigD622Mohammed Alawad, Hong-Jun Yoon, and Georgia Tourassi, 
Energy Efficient Stochastic-Based Deep Spiking Neural Networks for Sparse Datasets
BigD633Poonam Goyal, Jagat Sesh Challa, Shivin Shrivastava, and Navneet Goyal, 
AnyFI: An Anytime Frequent Itemset Mining Algorithm for Data Streams
BigD638Sumit Purohit, Lawrence Holder, and Sutanay Choudhury, 
Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection
BigD641Er-Chen Huang, Hsing-Kuo Pao, and Yuh-Jye Lee, 
Big Active Learning
BigD661Jennifer Sleeman, Milton Halem, Tim Finin, and Mark Cane, 
Big Data Cross-Domain Dynamic Topic Modeling Analytics for Discovering Scientific Influence
BigD662Hu Xu, Sihong Xie, Lei Shu, and Philip S. Yu, 
Product Function Need Recognition via Semi-supervised Attention Network
BigD663MD S Q Zulkar Nine, Kemal Guner, Ziyun Huang, Xiangyu Wang, Jinhui Xu, and Tevfik Kosar, 
Big Data Transfer Optimization Based on Offline Knowledge Discovery and Adaptive Real-time Sampling
BigD665Md Wasi-ur-Rahman, Nusrat Islam, Xiaoyi Lu, and Dhabaleswar Panda, 
NVMD: Non-Volatile Memory Assisted Design for Accelerating MapReduce and DAG Execution Frameworks on HPC Systems
BigD673Lina Yu, Michael Rilee, Yu Pan, Feiyu Zhu, Kwo-Sen Kuo, and Hongfeng Yu, 
Visual Analytics with Unparalleled Variety Scaling for Big Earth Data
BigD677Alexander Ulanov, Manish Marwah, Mijung Kim, Roshan Dathathri, Carlos Zubieta, and Jun Li, 
Sandpiper: Scaling Probabilistic Inferencing to Large Scale Graphical Models
BigD678Emanuele Massaro, Stanislav Sobolevsky, Iva Bojic, Juan Murillo Arias, and Carlo Ratti, 
Predicting regional economic indices using big data of individual bank card transactions
BigD679Ricardo Baeza-Yates and Zeinab Liaghat, 
Quality-Efficiency Trade-offs in Machine Learning for Text Processing
BigD680Peter Baumann, 
Standardizing Big Earth Datacubes
BigD682Salima Benbernou and Mourad Ouziri, 
Enhancing Data Quality by Cleaning Inconsistent Big RDF Data

Industry and Government

Paper ID    Regular Papers
N205Lay Wai Kong, 
Performance Optimization In Scale-out Storage Using Design Of Experiment As Heuristic
N209Yihua Astle, Xuning Tang, and Craig Freeman, 
Application of Dynamic Logistic Regression with Unscented Kalman Filter in Predictive Coding
N210Mansurul Bhuiyan and Mohammad Hasan, 
RAVEN: Web-based Smart Home Exploration System Through Interactive Pattern Discovery
N214Ishita Khan, Prathyusha Senthil Kumar, Daniel Miranda, and David Goldberg, 
What is Skipped: Finding Desirable Items in E-Commerce Search by Discovering the Worst Title Tokens
N215Warut D. Vijitbenjaronk, Jinho Lee, and Toyotaro Suzumura, 
Scalable Time-Versioning Support for Property Graph Databases
N217Xuchao Zhang, Liang Zhao, Zhiqian Chen, Arnold Boedihardjo, Dai Jing, and Chang-Tien Lu, 
Trendi: Tracking Stories in News and Microblogs via Emerging, Evolving and Fading Topics
N218Zhiwei Zhang, Ning Chen, Jun Wang, and Luo Si, 
SMART: Sponsored Mobile App RecommendaTion by Balancing App Downloads and Appstore Profit
N220Justin McHugh, Paul Cuddihy, Jenny Williams, Kareem Aggour, Vijay Kumar, and Varish Mulwad, 
Integrated Access to Big Data Polystores through a Knowledge-driven Framework
N221Jacob Montiel, Albert Bifet, and Talel Abdessalem, 
Predicting Over-Indebtedness on Batch and Streaming Data
N223Syed Yousaf Shah, Zengwen Yuan, Songwu Lu, and Petros Zerfos, 
Dependency Analysis of Cloud Applications for Performance Monitoring using Recurrent Neural Networks
N226Derrick Spell, Xiao-Han Zeng, Jae-Young Chung, Bahador Nooraei, Ricki Shomer, Ling-Yong Wang, James Gibson, and Daniel Kirsche, 
Flux: Groupon's automated, scalable machine learning platform
N229Wen-Yuan Zhu, Wen-Yueh Shih, Ying-Hsuan Lee, Wen-Chih Peng, and Jiun-Long Huang, 
A Gamma-based Regression for Winning Price Estimation in Real-Time Bidding Advertising
N231Nenad Stojanovic, Marko Dinic, and Ljiljana Stojanovic, 
A data-driven approach for multivariate contextualized anomaly detection: industry use case
N233Timothy Kennedy, Robert Provence, James Broyan, Patrick Fink, Phong Ngo, and Lazaro Rodriguez, 
Topic Models for RFID Data Modeling and Localization
N236Ye Ouyang and Zhongyuan Li, 
APP-SON: Application Characteristics Driven SON to Optimize 4/5G Network Performance and Quality of Experience
N239Karthikeyan Natesan Ramamurthy, Dennis Wei, Emily Ray, Moninder Singh, Vijay Iyengar, Dmitriy Katz-Rogozhnikov, Jingwei Yang, Kevin Tran, and Gigi Yuen-Reed, 
A Configurable, Big Data System for On-Demand Healthcare Cost Prediction
N240Nathaniel Huber-Fliflet, Jianping Zhang, Haozhen Zhao, Robert Keeling, and Rishi Chhatwal, 
Empirical Evaluations of Active Learning Strategies in Legal Document Review
N241Zheng Chen, Xinli Yu, Chi Zhang, Jin Zhang, Cui Lin, Xiaohua Hu, Erjia Yan, and Wei-Shih Yang, 
Fast Botnet Detection From Streaming Logs Using Online Lanczos Method
N242Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Sameena Shah, Robert Martin, and John Duprey, 
Reuters Tracer: Toward Automated News Production Using Large Scale Social Media Data
N243Yuheng Du, Alexander Herzog, Andre Luckow, Ramu Nerella, and Amy Apon, 
Representativeness of Latent Dirichlet Allocation Topics Estimated from Data Samples with Application to Common Crawl
N250Simon Bin, Patrick Westphal, Jens Lehmann, and Axel-Cyrille Ngonga Ngomo, 
Implementing Scalable Structured Machine Learning for Big Data in the SAKE Project
N251Vijil Chenthamarakshan, Dharmashankar Subramanian, Debarun Bhattacharjya, Ruben Torrado, Jeff Kephart, and Jesus Rios, 
A Cognitive Assistant for Risk Identification and Modeling
N252Youngho Kim, Petros Zerfos, Vadim Sheinin, and Nancy Greco, 
Ranking the Importance of Ontology Concepts Using Document Summarization Techniques
N260Hyunjong Lee, Youngin Jo, Sanghyuk Chun, and Gwangseop Gim, 
A Study on Intelligent Personalized Push Notification with User History
Paper ID    Short Papers
N206 George Mathew, 
Architectural Considerations for Highly Scalable Computing to Support On-demand Video Analytics
N211 Giannis Spiliopoulos, Konstantinos Chatzikokolakis, Dimitrios Zissis, Evmorfia Biliri, Dimitrios Papaspyros, and Giannis Tsapelas, 
Knowledge extraction from maritime spatiotemporal data: An evaluation of clustering algorithms on Big Data
N224 Martin Ringsquandl, Steffen Lamparter, Evgeny Kharlamov, Raffaello Lepratti, Daria Stepanova, Peer Kröger, and Ian Horrocks, 
On Event-Driven Learning of Knowledge in Smart Factories: The Case of Siemens
N232 Xuchao Zhang, Zhiqian Chen, Liang Zhao, Arnold Boedihardjo, and Chang-Tien Lu, 
TRACES: Generating Twitter Stories via Shared Subspace and Temporal Smoothness
N234 Leonardo Maria Millefiori, Paolo Braca, and Gianfranco Arcieri, 
Scalable Distributed Change Detection and its Application to Maritime Traffic
N245 Nirupama Appikatala, Miao Chen, Michael Natkovich, and Joshua Walters, 
Demystifying Dark Matter for Online Experimentation
N247 Russell Chen, Miao Chen, Mahendrasinh Ramsinh Jadav, Joonsuk Bae, and Don Matheson, 
Faster Online Experimentation by Eliminating Traditional A/A Validation
N249 Kevin Pratt, 
Linking Many Unusual Co-Incidences
N255 Neela Avudaiappan, Alexander Herzog, Sneha Kadam, Yuheng Du, Jason Thatcher, and Ilya Safro, 
Detecting and Summarizing Emergent Events in Microblogs and Social Media Streams by Dynamic Centralities
N257 Ferosh Jacob, Ilamgumaran Karunanithi, Pramod Salian, and Ravi Sambhu, 
BBC: A DSL for Designing Cloud-based Heterogeneous Bigdata Pipelines
N258 Emmanuel Oyekanlu, 
Predictive Edge Computing for Time Series of Industrial IoT and Large Scale Critical Infrastructure based on Open-source Software Analytics of Big Data
N262 A Nambiar, N Reddy, and Debo Dutta, 
Connected Health: Opportunities and Challenges