IEEE Big Data 2016 Accepted Papers
Main Conference
Regular Paper |
---|
Ricardo Calix and Rajesh Sankaran On the Feasibility of an Embedded Machine Learning Processor for Intrusion Detection |
Jinwei Liu, Haiying Shen, and Husnu Narman CCRP: Customized Cooperative Resource Provisioning for High Resource Utilization in Clouds |
Xiao Pan, Jiawei Zhang, Fengjiao Wang, and Philip S. Yu DistSD: Distance-based Social Discovery with Personalized Posterior Screening |
Christian Böhm, Martin Perdacher, and Claudia Plant Cache-oblivious Loops Based on a Novel Space-filling Curve |
Yida Wang, Bryn Keller, Mihai Capota, Michael Anderson, Narayanan Sundaram, Jonathan Cohen, Kai Li, Nicholas Turk-Browne, and Ted Willke Real-time Full Correlation Matrix Analysis of fMRI Data |
Heqing Huang, Cong Zheng, Junyuan Zeng, Wu Zhou, Sencun Zhu, Peng Liu, Suresh Chari, and Ce Zhang Android Malware Development on Public Malware Scanning Platforms: A Large-scale Data-driven Study |
Yuan Yuan, Meisam Fathi, Kaibo Wang, Rubao Lee, and Xiaodong Zhang Spark-GPU: High Performance In-Memory Data Processing with GPUs |
Jingyuan Zhang, Chun-Ta Lu, Mianwei Zhou, Sihong Xie, Yi Chang, and Philip S. Yu HEER: Heterogeneous Graph Embedding for Emerging Relation Detection from News |
Hongfu Liu, Yuchao Zhang, Bo Deng, and Yun Fu Outlier Detection via Sampling Ensemble |
Kai Zhao, Denis Khryashchev, Juliana Freire, Claudio Silva, and Huy Vo Predicting Taxi Demand at High Spatial Resolution: Approaching the Limit of Predictability |
Christian Beecks and Alexander Graß Multi-step Threshold Algorithm for Efficient Feature-based Query Processing in Large-scale Multimedia Databases |
Xiaoli Song and Xiaohua Hu Pairwise Topic Model and its Application to Topic Transition and Evolution |
Fang Zhou, Mohamed Ghalwash, and Zoran Obradovic A Fast Structured Regression for Large Networks |
Hao Zhang, Yuanyuan Zhu, Lu Qin, Hong Cheng, and Jeffrey Xu Yu Efficient Triangle Listing for Billion-Scale Graphs |
Angen Zheng, Alexandros Labrinidis, Panos Chrysanthis, and Jack Lange ARGO: Architecture-Aware Graph Partitioning |
Yongyi Xian, Yan Liu, and Chuanfei Xu Parallel Gathering Discovery over Big Trajectory Data |
Cheong Hee Park and Youngsoon Kang An Active Learning Method for Data Streams with Concept Drift |
Zhuozhao Li, Haiying Shen, Jeffrey Denton, and Walter Ligon Comparing Application Performance on HPC-based Hadoop Platforms with Local Storage and Dedicated Storage |
Ngot Bui, Thanh Le, and Vasant Honavar Labeling Actors in Multi-view Social Networks by Integrating Information From Within and Across Multiple Views |
Victor Giannakouris, Nikolaos Papailiou, Dimitrios Tsoumakos, and Nectarios Koziris MuSQLE: Distributed SQL Query Execution Over Multiple Engine Environments |
Yanan Bao, Huasen Wu, Tianxiao Zhang, Albara Ramli, and Xin Liu Shooting a Moving Target: Motion-Prediction-Based Transmission for 360-Degree Videos |
Hariton Efstathiades, Demetris Antoniades, George Pallis, Marios Dikaiakos, Zoltán Szlávik, and Robert-Jan Sips Online Social Network Evolution: Revisiting the Twitter Graph |
Xiaoyu Ge, Yanbing Xue, Zhipeng Luo, Mohamed Sharaf, and Panos Chrysanthis REQUEST: An Interactive Big Data Exploration Framework |
Francesco Versaci, Luca Pireddu, and Gianluigi Zanetti Scalable genomics: from raw data to aligned reads on Apache YARN |
Ata Turk, Hao Chen, Anthony Byrne, John Knollmeyer, Sastry Duri, Canturk Isci, and Ayse Coskun DeltaSherlock: Identifying Changes in the Cloud |
Michael Anderson, Mihai Capotă, Javier Turek, Xia Zhu, Ted Willke, Yida Wang, Po-Hsuan Chen, Jeremy Manning, Peter Ramadge, and Kenneth Norman Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets |
Hu Xu, Sihong Xie, Lei Shu, and Philip S. Yu Sentence-level Extraction of Complementary Entities using Large Unlabeled Product Reviews |
Yixian Zheng, Wenchao Wu, Haipeng Zeng, Nan Cao, Huamin Qu, Mingxuan Yuan, Jia Zeng, and Lionel M. Ni TelcoFlow: Visual Exploration of Collective Behaviors Based on Telco Data |
Zhichuan Huang and Ting Zhu Leveraging Multi-Granularity Energy Data for Accurate Energy Demand Forecast in Smart Grids |
Ahsanul Haque, Zhuoyi Wang, Swarup Chandra, Latifur Khan, and Charu Aggarwal Sampling based Distributed Kernel Mean Matching using Spark |
Liuhua Chen and Haiying Shen Towards Resource-Efficient Cloud Systems: Avoiding Over-Provisioning in Demand-Prediction Based Resource Provisioning |
Subhadeep Karan and Jaroslaw Zola Exact Structure Learning of Bayesian Networks by Optimal Path Extension |
Tomoki Yoshihisa and Takahiro Hara A Low-Load Stream Processing Scheme for IoT Environments |
Hui Li, Jiangtao Cui, Xiaobin Lin, and Jianfeng Ma Improving the Utility in Differential Private Histogram Publishing: Theoretical Study and Practice |
Yosuke Oyama, Akihiro Nomura, Ikuro Sato, Hiroki Nishimura, Yukimasa Tamatsu, and Satoshi Matsuoka Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers |
Panagiotis Liakos, Alexandros Ntoulas, and Alex Delis Scalable Link Community Detection: A Local Dispersion-aware Approach |
Karuna Joshi, Aditi Gupta, Sudip Mittal, Claudia Pearce, Anupam Joshi, and Tim Finin Semantic Approach to Automating Management of Big Data Privacy Policies |
Sayan Goswami, Arghya Kusum Das, Richard Platania, Kisung Lee, and Seung-Jong Park Lazer: Distributed Memory-Efficient Assembly of Large-Scale Genomes |
Kaiji Chen and Yongluan Zhou Materialized View Selection in Feed Following Systems |
Chunqiu Zeng, Qing Wang, Wentao Wang, Tao Li, and Larisa Shwartz Online Inference for Time-varying Temporal Dependency Discovery from Time Series |
Jingyuan Yang, Chuanren Liu, Mingfei Teng, Hui Xiong, and March Liao Buyer Targeting Optimization: A Unified Customer Segmentation Perspective |
Nguyen Ho, Huy Vo, and Mai Vu An Adaptive Information-Theoretic Approach for Identifying Temporal Correlations in Big Data Sets |
Katerina Doka, Nikolaos Papailiou, Victor Giannakouris, Dimitrios Tsoumakos, and Nectarios Koziris Mix 'n Match Multi-Engine Analytics |
Xiaowei Jia, Ankush Khandelwal, James Gerber, Kimberly Carlson, Paul West, and Vipin Kumar Learning Large-scale Plantation Mapping from Imperfect Annotators |
Mehrdad Yazdani, Bryn Taylor, Justine Debelius, Weizhong Li, Rob Knight, and Larry Smarr Using Machine Learning to Identify Major Shifts in Human Gut Microbiome Protein Family Abundance in Disease |
Sarasi Lalithsena, Pavan Kapanipathi, and Amit Sheth Harnessing Relationships for Domain-specific Subgraph Extraction: A Recommendation Use Case |
Chao Huang and Dong Wang Towards Unsupervised Home Location Inference from Online Social Media |
Eleazar Leal, Le Gruenwald, and Jianting Zhang Handling Uncertainty in Trajectories of Moving Objects in Unconstrained Outdoor Spaces |
Yanan Xu and Yanmin Zhu When Remote Sensing Data meet Ubiquitous Urban Data: Fine-Grained Air Quality Inference |
Azad Naik and Huzefa Rangwala Embedding Feature Selection for Large-scale Hierarchical Classification |
Yin Huang, Yelena Yesha, Milton Halem, Yaacov Yesha, and Shujia Zhou inMem: a distributed parallel indexed in-memory computation system for large scale data analytics |
Wei Jiang, Juan Rodriguez, and Torsten Suel Improved Methods for Static Index Pruning |
Yudian Ji, Yuda Zang, Wuman Luo, Xibo Zhou, Ye Ding, and Lionel M. Ni Clockwise Compression for Trajectory Data under Road Network Constraints |
Cuong Nguyen and Philip Rhodes Accelerating Large-scale Unstructured Mesh Queries |
Walaa Eldin Moustafa, Vicky Papavasileiou, Ken Yocum, and Alin Deutsch Giragraphy: Scaling Datalog Graph Analytics on Giraph |
Timo Bingmann, Michael Axtmann, Emanuel Jöbstl, Sebastian Lamm, Huyen Chau Nguyen, Alexander Noe, Sebastian Schlag, Matthias Stumpp, Tobias Sturm, and Peter Sanders Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++ |
Quan Zhang, Mu Qiao, Ramani Routray, and Weisong Shi H2O: A Hybrid and Hierarchical Outlier Detection Method for Large Scale Data Protection |
Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael Mahoney, and Mr Prabhat Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies |
Athanasios N. Nikolakopoulos, Antonia Korba, and John D. Garofalakis Random Surfing on Multipartite Graphs |
Naman Shah, Harshil Shah, Matthew Malensek, Sangmi Lee Pallickara, and Shrideep Pallickara Network Analysis for Identifying and Characterizing Disease Outbreak Influence from Voluminous Epidemiology Data |
Darja Krushevskaja, S. Muthukrishnan, and William Simpson Ad Allocation with Secondary Metrics |
Nesreen Ahmed, Ted Willke, and Ryan Rossi Local Graphlet Estimation |
Yating Zhang, Adam Jatowt, and Katsumi Tanaka How Good are Word Embeddings? Automatically Explaining Similarity of Terms |
Shiblee Sadik, Le Gruenwald, and Eleazar Leal In Pursuit of Outliers in Multi-dimensional Data Streams |
Morteza Zihayat, Zane Zhenhua Hu, Aijun An, and Yonggang Hu Distributed and Parallel High Utility Sequential Pattern Mining |
Benjamin Sirb and Xiaojing Ye Consensus Optimization with Delayed and Stochastic Gradients on Decentralized Networks |
Yuan Yuan, Sihong Xie, Chun-Ta Lu, Philip S. Yu, and Jie Tang Interpretable and Effective Opinion Spam Detection via Temporal Patterns Mining across Websites |
Mansurul Bhuiyan and Mohammad Al Hasan PRIIME: A Generic Framework for Interactive Personalized Interesting Pattern Discovery |
Jianliang Gao, Bo Song, Ping Liu, Weimao Ke, Jianxin Wang, and Xiaohua Hu Parallel Top-k Subgraph Query in Massive Graphs: Computing from the Perspective of Single Vertex |
Nusrat Islam, Md Wasi-ur-Rahman, Xiaoyi Lu, and Dhabaleswar K. Panda Efficient Data Access Strategies for Hadoop and Spark on HPC Cluster with Heterogeneous Storage |
Poonam Goyal, Jagat Sesh Challa, Nikhil S, Aditya Mangla, Sundar S Balasubramaniam, and Navneet Goyal DD-RTREE: A dynamic distributed data structure for efficient data distribution among cluster nodes for spatial data mining algorithms |
Xiaoyi Lu, Dipti Shankar, Shashank Gugnani, and Dhabaleswar K. Panda High-Performance Design of Apache Spark with RDMA and Its Benefits on Various Workloads |
Ravikant Dindokar, Neel Choudhury, and Yogesh Simmhan A Meta-graph Approach to Analyze Subgraph-centric Distributed Programming Models |
Wooyeol Kim, Younghoon Kim, and Kyuseok Shim Parallel Computation of k-Nearest Neighbor Joins Using MapReduce |
Ke Zhang, Jianwu Xu, Martin Renqiang Min, Guofei Jiang, Konstantinos Pelechrinis, and Hui Zhang Automated IT System Failure Prediction: A Deep Learning Approach |
Jianpeng Xu, Jiayu Zhou, Pang-Ning Tan, Xi Liu, and Lifeng Luo WISDOM: Weighted Incremental Spatio-Temporal Multi-Task Learning via Tensor Decomposition |
Chun Guo and Xiaozhong Liu Dynamic Feature Generation and Selection on Heterogeneous Graph for Music Recommendation |
Xiaokai Wei, Bokai Cao, Weixiang Shao, Chun-Ta Lu, and Philip S. Yu Community Detection with Partially Observable Links and Node Attributes |
Charles Siegel, Jeff Daily, and Abhinav Vishnu Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems |
Short Paper |
---|
Jinwei Liu and Haiying Shen A Popularity-aware Cost-effective Replication Scheme for High Data Durability in Cloud Storage |
Mike Lakoju, Alan Serrano-Rico, and Mark Lycett A Strategic Approach for Visualizing the Value of Big data (SAVV-BIGD) Framework |
Chun-Chich Chen, Chih-Ya Shen, and Ming-Syan Chen Massive Parallelism for Non-linear and Non-stationary Data Analysis with GPGPU |
Hans Vandierendonck, Karen Murphy, Mahwish Arif, and Dimitrios Nikolopoulos HPTA: High-Performance Text Analytics |
Ting Wu, Chen Zhang, Lei Chen, Pan Hui, and Siyuan Liu Object Identification with Pay-As-You-Go Crowdsourcing |
Bas van Stein, Matthijs van Leeuwen, and Thomas Bäck Local Subspace-Based Outlier Detection using Global Neighbourhoods |
Jonathan Stokes and Steven Weber The self-avoiding walk-jump (SAWJ) algorithm for finding maximum degree nodes in large graphs |
Rocco Langone and Johan A. K. Suykens Efficient multiple scale kernel classifiers |
Haofu Liao, Yuncheng Li, Tianran Hu, and Jiebo Luo Inferring Restaurant Styles by Mining Crowd Sourced Photos from User-Review Websites |
Uwe Jugel, Zbigniew Jerzak, and Volker Markl Big data on a few pixels |
Shuo Wang, Richard Sinnott, and Surya Nepal Protecting the Location Privacy of Mobile Social Media Users |
Jorge Veiga, Roberto R. Expósito, Xoán C. Pardo, Guillermo L. Taboada, and Juan Touriño Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics |
Yali Zhao SLA-Based Profit Optimization for Resource Management of Big Data Analytics-as-a-Service Platforms in Cloud Computing Environments |
Xiaoli Song and Xiaohua Hu Semantic Pattern Mining for Text Mining |
Tong Yu, Ole Mengshoel, Alvin Jude, Eugen Feller, Julien Forgeat, and Nimish Radia Incremental Learning for Matrix Factorization in Recommender Systems |
Chuan Shi, Bowei He, Menghao Zhang, Fuzheng Zhuang, and Philip S. Yu Expenditure Aware Rating Prediction for Recommendation |
Natalia Arzamasova, Martin Schäler, and Klemens Böhm Cleaning Antipatterns in an SQL Query Log |
Kenji Yamanishi and Kohei Miyaguchi Detecting Gradual Changes from Data Stream Using MDL-Change Statistics |
Nicolas Kourtellis, Gianmarco De Francisci Morales, Albert Bifet, and Arinto Murdopo VHT: Vertical Hoeffding Tree |
Chang Liu and Bin Wu Multiple Submodels Parallel Support Vector Machine on Spark |
Christoforos Svingos, Theofilos Mailis, Herald Kllapi, Lefteris Stamatogiannakis, Yannis Kotidis, and Yannis Ioannidis Real Time Processing of Streaming and Static Information |
Matthew Edwards, Stephen Wattam, Paul Rayson, and Awais Rashid Sampling Labelled Profile Data for Identity Resolution |
Wanying Ding, Yue Zhang, Chaomei Chen, and Xiaohua Hu Semi-Supervised Dirichlet-Hawkes Process with Applications of Topic Detection and Tracking in Twitter |
Steven Morse, Marta Gonzalez, and Natasha Markuzon Persistent Cascades: Measuring Fundamental Communication Structure in a Social Network |
Rui Ren, Zhen Jia, Lei Wang, Tianxu Yi, and Jianfeng Zhan BDTune: Hierarchical Correlation-based Performance Analysis and Rule-based Diagnosis for Big Data System |
Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, and Takatsugu Ono Evaluating the Impacts of Code-Level Performance Tunings on Power Efficiency |
Ramyar Saeedi, Hassan Ghasemzadeh, and Assefaw Gebremedhin Transfer Learning Algorithms for Autonomous Reconfiguration of Wearable Systems |
Weimao Ke and Javed Mostafa Scalability Analysis of Distributed Search in Large Peer-to-peer Networks |
Wenzhao Zhang, Houjun Tang, Stephen Ranshous, Surendra Byna, Daniel Martin, Kesheng Wu, Bin Dong, Scott Klasky, and Nagiza Samatova Exploring Memory Hierarchy and Network Topology for Runtime AMR Data Sharing Across Scientific Applications |
Kareem Aggour and Bulent Yener Adapting to Data Sparsity for Efficient Parallel PARAFAC Tensor Decomposition in Hadoop |
Da-Chuan Zhang, Mei Li, and Chang-Dong Wang Point of Interest Recommendation with Social and Geographical Influence |
Jinfeng Li, James Cheng, Yunjian Zhao, Fan Yang, Yuzhen Huang, and Haipeng Chen A Comparison of General-Purpose Distributed Systems for Data Processing |
Mei Saouk, Christos Doulkeridis, Akrivi Vlachou, and Kjetil Noervaag Efficient Processing of Top-k Joins in MapReduce |
Ioanna Tsalouchidou, Gianmarco De Francisci Morales, Francesco Bonchi, and Ricardo Baeza-Yates Streaming Tensor Summarization |
Sergey Nepomnyachiy and Torsten Suel Efficient Index Updates for Mixed Update and Query Loads |
Stathis Maroulis, Ioannis Boutsis, and Vana Kalogeraki Context-Aware Point of Interest Recommendation using Tensor Factorization |
Hông-Ân Cao, Tri Kurniawan Wijaya, Karl Aberer, and Nuno Nunes Estimating Human Interactions with Electrical Appliances for Activity-based Energy Savings Recommendations |
Frank Pallas, Johannes Günther, and David Bermbach Pick Your Choice in HBase: Security or Performance |
Chunkun Bo, Ke Wang, Jefferey Fox, and Kevin Skadron Entity Resolution Acceleration using Micron's Automata Processor |
Zhichuan Huang, Ting Zhu, and Jianwu Wang Application-Driven Sensing Data Reconstruction and Selection Based on Correlation Mining and Dynamic Feedback |
Kyle Chard, Mike D'Arcy, Ben Heavner, Ian Foster, Carl Kesselman, Ravi Madduri, Alexis Rodriguez, Stian Soiland-Reyes, Carole Goble, Eric Deutsch, Ivo Dinov, Kristi Clark, Nathan Price, and Arthur Toga I'll Take That to Go: Big Data Bags and Minimal Identifiers for Exchange of Large, Complex Data |
Michael Rilee, Kwo-Sen Kuo, Thomas Clune, Amidu Oloso, Paul Brown, and Hongfeng Yu Addressing the Big-Earth-Data Variety Challenge with the Hierarchical Triangular Mesh |
Weixiang Shao, Lifang He, Chun-Ta Lu, and Philip S. Yu Online Multi-view Clustering with Incomplete Views |
Joaquim Silva, Carlos Goncalves, and Jose Cunha A Theoretical Model for n-gram Distribution in Big Data Corpora |
Philip Chan and Ebad Ahmadzadeh Improving Efficiency of Maximizing Spread in the Flow Authority Model for Large Sparse Networks |
Saliya Ekanayake, Supun Kamburugamuve, Pulasthi Wickramasinghe, and Geoffrey Fox Java Thread and Process Performance for Parallel Machine Learning on Multicore HPC Clusters |
Rongda Zhu, Aston Zhang, Jian Peng, and Chengxiang Zhai Exploiting Temporal Divergence of Topic Distributions for Event Detection |
Gheorghi Guzun, Josiah McClurg, Guadalupe Canahuate, and Raghuraman Mudumbai Power efficient big data analytics algorithms through low-level operations |
Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, and Satoshi Matsuoka Efficient Breadth-First Search on Massively Parallel and Distributed Memory Machines |
Ioanna Filippidou and Yannis Kotidis Effective and Efficient Graph Augmentation in Large Graphs |
Ling Cen and Dymitr Ruta A Map based Gender Prediction Model for Big E-Commerce Data |
Jinna Lv, Bin Wu, Shuai Yang, and Bingjing Jia Efficient Large Scale Near-Duplicate Video Detection Based on Spark |
Xiaopeng Li, Ming Cheung, and James She Connection Discovery using Shared Images by Gaussian Relational Topic Model |
Ariel Bar, Bracha Shapira, Lior Rokach, and Moshe Unger Scalable Attack Propagation Model and Algorithms for Honeypot Systems |
Pascal Welke, Alexander Markowetz, Torsten Suel, and Maria Christoforaki Three-Hop Distance Estimation in Social Graphs |
A. Pavan, P. Quint, S. Scott, N. V. Vinodchandran, and J. Smith Computing Triangle and Open-Wedge Heavy-Hitters in Large Networks |
Stratos Dimopoulos, Chandra Krintz, and Rich Wolski Big Data Framework Interference In Restricted Private Cloud Settings |
Susanna Pirttikangas, Ekaterina Gilman, Xiang Su, Teemu Leppänen, Anja Keskinarkaus, Mika Rautiainen, Mikko Pyykkönen, and Jukka Riekki Experiences with Smart City Traffic Pilot |
Mohammad Mahdi Kamani, Farshid Farhat, Stephen Wistar, and James Z. Wang Shape Matching using Skeleton Context for Automated Bow Echo Detection |
Abir Zayani, Chiheb-Eddine Ben N'Cir, and Nadia Essoussi Parallel clustering method for Non-Disjoint Partitioning of Large-Scale Data based on Spark Framework |
Xiaowei Jia, Xi Chen, Anuj Karpatne, and Vipin Kumar Identifying Dynamic Changes with Noisy Labels in Spatial-temporal Data: A Study on Large-scale Water Monitoring Application |
Aman Gupta, S. Muthukrishnan, and Smita Wadhwa Optimizing Callout in Unified Ad Markets |
Yadu Babuji, Kyle Chard, Aaron Gerow, and Eamon Duede Cloud Kotta: Enabling Secure and Scalable Data Analytics in the Cloud |
Elyas Sabeti and Anders Host-Madsen How Interesting Images Are: An Atypicality Approach For Social Networks |
Ville Hyvönen, Teemu Pitkänen, Sotiris Tasoulis, Elias Jääsaari, Risto Tuomainen, Liang Wang, Jukka Corander, and Teemu Roos Fast Nearest Neighbor Search through Sparse Random Projections and Voting |
Hitoshi Sato, Ryo Mizote, Satoshi Matsuoka, and Hirotaka Ogawa I/O Chunking and Latency Hiding Approach for Out-of-core Sorting Acceleration using GPU and Flash NVM |
Adiska Fardani Haryadi, Marijn Janssen, Joris Hulstijn, Haiko Voort, and Agung Wahyudi Requirements on and Antecedents of Big Data Quality: An Empirical Examination to Improve Big Data Quality in Financial Service Organizations |
Daniel Zhang, Rungang Han, and Dong Wang On Robust Truth Discovery in Sparse Social Media Sensing |
Joseph Jupin and Yuan Shi PSH: A Probabilistic Signature Hash Method with Hash Neighborhood Candidate Generation for Fast Edit-Distance String Comparison on Big Data |
Khoa Doan, Amidu Oloso, Kwo-Sen Kuo, Thomas Clune, and Hongfeng Yu Evaluating the Impact of Data Placement to Spark and SciDB with an Earth Science Use Case |
Luis Pineda-Morales, Ji Liu, Alexandru Costan, Esther Pacitti, Gabriel Antoniu, Patrick Valduriez, and Marta Mattoso Managing Hot Metadata for Scientific Workflows on Multisite Clouds |
Dipti Shankar, Xiaoyi Lu, and Dhabaleswar K. Panda Boldio: A Hybrid and Resilient Burst-Buffer Over Lustre for Accelerating Big Data I/O |
Zexi Chen, Ranga Vatsavai, Bharathkumar Ramachandra, Qiang Zhang, Nagendra Singh, and Sreenivas Sukumar Scalable Nearest Neighbor Based Hierarchical Change Detection Framework for Crop Monitoring |
Sreenivas Sukumar, Michael Matheson, Ramakrishnan Kannan, and Seung-Hwan Lim Mini-Apps for High Performance Scientific Data Analysis |
Mai Nguyen, Dylan Uys, Daniel Crawl, Charles Cowart, and Ilkay Altintas A Scalable Approach for Location-Specific Detection of Santa Ana Conditions |
Tathagata Mukherjee, Biswas Parajuli, Piyush Kumar, and Eduardo Pasiliao TruthCore: Non-parametric Estimation of Truth from a Collection of Authoritative Sources |
Xiang Liu and Torsten Suel What Makes A Group Fail: Modeling Social Group Behavior in Event-Based Social Networks |
Gopi Chand Nutakki and Olfa Nasraoui Compartmentalized Adaptive Topic Mining on Social Media Streams |
Farrukh Ahmed, Michele Samorani, Colin Bellinger, and Osmar R. Zaiane Advantage of Integration in Big Data: Feature Generation in Multi-Relational Databases for Imbalanced Learning |
Yuh-Jye Lee, Hsing-Kuo Pao, Shueh-Han Shih, Jing-Yao Lin, and Xin-Rong Chen Compressed Learning for Time Series Classification |
Saman Biookaghazadeh, Yiqi Xu, Shujia Zhou, and Ming Zhao Kaleido: Enabling Efficient Scientific Data Processing on Big-Data Systems |
Said Jabbour, Nizar Mhadhbi, Abdesattar Mhadhbi, Badran Raddaoui, and Lakhdar Sais Summarizing Large Graphs by Means of Pseudo-Boolean Constraints |
Industry and Government
Regular Paper |
---|
Syed Yousaf Shah, Brent Paulovicks, and Petros Zerfos Data-at-Rest Security for Spark |
Nathaniel Huber-Fliflet, Jianping Zhang, Haozhen Zhao, Robert Keeling, and Rishi Chhatwal Empirical Evaluations of Preprocessing Parameters' Impact on Predictive Coding's Effectiveness |
Zhenyun Zhuang, Haricharan Ramachandra, Badri Sridharan, Brandon Duncan, Kishore Gopalakrishna, and Jean-Francois Im SmartCache: Application Layer Caching to Improve Performance of Large-scale Memory Mapping |
Raya Horesh, Kush Varshney, and Jinfeng Yi Information Retrieval, Fusion, Completion, and Clustering for Employee Expertise Estimation |
Xuchao Zhang, Zhiqian Chen, Weisheng Zhong, Arnold P. Boedihardjo, and Chang-Tien Lu Storytelling in Heterogeneous Twitter Entity Network based on Hierarchical Cluster Routing |
Juergen Heit, Jiayi Liu, and Mohak Shah An Architecture for the Deployment of Statistical Models for Big Data Era |
Adetokunbo Makanju, Zahra Farzanyar, Aijun An, Nick Cercone, Zane Hu, and Yonggang Hu Deep Parallelization of Parallel FP-Growth Using Parent-Child MapReduce |
Bradford Littooy, Sophie Loire, Michael Georgescu, and Igor Mezic Dynamic Pattern Recognition and Classification of HVAC Faults in Commercial Buildings |
Hui Wu, Yi Fang, Huming Wu, and Shenhong Zhu A Diversified Trending Topic Discovery System |
Ganesh Venkataraman, Abhimanyu Lad, Lin Guo, and Shakti Sinha Fast, Lenient and Accurate: Building Personalized Instant Search Experience at LinkedIn |
Pavel Dmitriev, Brian Frasca, Somit Gupta, Ron Kohavi, and Garnet Vaz Pitfalls of Long-Term Online Controlled Experiments |
Michele Samorani, Farrukh Ahmed, and Osmar Zaiane Automatic Generation of Relational Attributes: An Application to Product Returns |
Tomasz Tajmajer, Malwina Spławińska, Piotr Wasilewski, and Stan Matwin Predicting Annual Average Daily Highway Traffic from Large Data and Very Few Measurements |
Ruoyu Wang, Daniel Sun, Muhammad Atif, and Surya Nepal LogProv: Logging Events as Provenance of Big Data Analytics Pipelines with Trustworthiness Evaluation |
Wenjun Zhou, Yun Zhu, Faizan Javed, Mahmudur Rahman, Janani Balaji, and Matt McNair Quantifying Skill Relevance to Job Titles |
Mylene Simon, Joe Chalfoun, Mary Brady, and Peter Bajcsy Do We Trust Image Measurements? |
Sreenivas Sukumar, Michael Matheson, Ramakrishnan Kannan, and Seung-Hwan Lim Mini-Apps for High Performance Scientific Data Analysis |
Elissa Redmiles, Emily Grace, Ankit Rai, and Rayid Ghani Detecting Fraud, Corruption, and Collusion in International Development Contracts |
Nicolas Poggi, Josep Ll. Berral, David Carrera, Jose Blakeley, and Nikola Vujic The state of SQL-on-Hadoop in the Cloud |
Zahra Zohrevand, Uwe Glässer, Hamed Yaghoubi Shahir, and Mohammad A. Tayebi Hidden Markov Based Anomaly Detection for Water Supply Systems |
Short Paper |
---|
Archana Ganapathi and Yanpei Chen Data Quality: Experiences and Lessons from Operationalizing Big Data |
Teruyoshi Zenmyo, Satoshi Iijima, and Ichiro Fukuda Managing a Complicated Workflow based on Dataflow-based Workflow Scheduler |
Issei Sato, Masahiro Kazama, Haruaki Yatabe, Tairiku Ogihara, Tetsuro Onishi, and Hiroshi Nakagawa Company Recommendation for New Graduates via Implicit Feedback Multiple Matrix Factorization with Bayesian Optimization |
Derrick Spell, Ling-Yong Wang, Richard Shomer, Bahador Nooraei, Jarrell Waggoner, Xiao-Han Zeng, Jae Chung, Kai-Chen Cheng, and Daniel Kirsche QED: Groupon's ETL management and curated feature catalog system for machine learning |
Thibaud Nesztler, Don Kasper, Michael Georgescu, Sophie Loire, and Igor Mezic Uniformization, organization, association and use of metadata from multiple content providers and manufacturers: A close look at the Building Automation System (BAS) sector |
Ilaria Bordino, Andrea Ferretti, Marco Firrincieli, Francesco Gullo, Marcello Paris, Stefano Pascolutti, and Gianluca Sabena Hermes: A distributed-messaging tool for NLP |
Jiejun Xu, Samuel Johnson, and Kang-Yu Ni Cross-Modal Event Summarization: A Network of Networks Approach |
Hongfeng Chai, Hao Liu, Xibo Zhou, Yanjun Xu, Shuo He, Jinzhi Hua, Dongjie He, and Weihuai Liu UStore: An Optimized Storage System for Enterprise Data Warehouses at UnionPay |
Rajaraman Kanagasabai, Anitha Veeramani, Hu Shangfeng, Kajanan Sangaralingam, Ying Li, and Giuseppe Manai Classification of Massive Mobile Web Log URLs for Customer Profiling & Analytics |
Vinay Deolalikar Extensive Large-Scale Study of Error Surfaces in Sampling-Based Distinct Value Estimators for Databases |
Li Zhou, Yinglong Xia, Hui Zang, Jian Xu, and Mingzhen Xia An Edge-Set Based Large Scale Graph Processing System |
Leonardo Millefiori, Dimitrios Zissis, Luca Cazzanti, and Gianfranco Arcieri A distributed approach to estimating sea port operational regions from lots of AIS data |
Ljiljana Stojanovic, Marko Dinic, Nenad Stojanovic, and Aleksandar Stojadinovic Big-data-driven Anomaly Detection in Industry (4.0): an approach and case study |
Luca Cazzanti, Antonio Davoli, and Leonardo Millefiori Automated Port Traffic Statistics: From Raw Data to Visualisation |
Nancy Grady Knowledge Discovery in Data Science: KDD Meets Big Data |
Amita Gajewar, Jignesh Parmar, Lizhong Wu, and Ramana Yerneni Forecasting Squatting of Demand in Display Advertising |
Yiming Kong, Hui Zang, and Xiaoli Ma Human Network Usage Patterns Revealed by Telecom Data |