Reading List
Useful References
Background
Analytical Workloads
-
Paper #3:
Jeffrey Dean and Sanjay Ghemawat.
MapReduce: Simplified Data Processing on Large Clusters.
OSDI 2004.
-
Paper #4:
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins.
Pig Latin: A Not-So-Foreign Language for Data Processing.
SIGMOD 2008.
-
Paper #5:
Yingyi Bu, Bill Howe, Magdalena Balazinska, and Michael D. Ernst.
HaLoop: Efficient Iterative Data Processing on Large Clusters.
VLDB 2010.
-
Paper #6:
Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, and Alexander Rasin.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
VLDB 2009.
-
Paper #7:
Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski.
Pregel: A System for Large-Scale Graph Processing.
SIGMOD 2010.
-
Paper #8:
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin.
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.
OSDI 2012.
-
Paper #9:
Herodotos Herodotou and Shivnath Babu.
Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs.
VLDB 2011.
-
Paper #10:
Nodira Khoussainova, Magdalena Balazinska, and Dan Suciu.
PerfXplain: Debugging MapReduce Job Performance.
VLDB 2012.
-
Paper #11:
Jens Dittrich, Jorge-Arnulfo Quiané-Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, and Jörg Schad.
Only Aggressive Elephants are Fast Elephants.
VLDB 2012.
-
Paper #12:
Rares Vernica, Michael J. Carey, and Chen Li.
Efficient Parallel Set-Similarity Joins Using MapReduce.
SIGMOD 2010.
-
Paper #13:
Amol Ghoting, Rajasekar Krishnamurthy, Edwin P.D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, and Shivakumar Vaithyanathan.
SystemML: Declarative Machine Learning on MapReduce.
ICDE 2011.
Transactional Workloads
-
Paper #14:
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber.
Bigtable: A Distributed Storage System for Structured Data.
OSDI 2006.
-
Paper #15:
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels.
Dynamo: Amazon's Highly Available Key-value Store.
SOSP 2007.
-
Paper #16:
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni.
PNUTS: Yahoo!'s Hosted Data Serving Platform.
VLDB 2008.
-
Paper #17:
Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, and Christos Karamanolis.
Sinfonia: A New Paradigm for Building Scalable Distributed Systems.
SOSP 2007.
-
Paper #18:
Stacy Patterson, Aaron J. Elmore, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi.
Serializability, not Serial: Concurrency Control and Availability in Multi-Datacenter Datastores.
VLDB 2012.
-
Paper #19:
Rui Liu, Ashraf Aboulnaga, and Kenneth Salem.
DAX: A Widely Distributed Multi-tenant Storage Service for DBMS Hosting.
VLDB 2013.
-
Paper #20:
James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford.
Spanner: Google's Globally-Distributed Database.
OSDI 2012.
-
Paper #21:
Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, and Amr El Abbadi.
Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms.
SIGMOD 2011.
-
Paper #22:
Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden.
Schism: a Workload-Driven Approach to Database Replication and Partitioning.
VLDB 2010.
-
Paper #23:
Carlo Curino, Evan P.C. Jones, Samuel Madden, and Hari Balakrishnan.
Workload-Aware Database Monitoring and Consolidation.
SIGMOD 2011.
-
Paper #24:
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi.
Calvin: Fast Distributed Transactions for Partitioned Database Systems.
SIGMOD 2012.
-
Paper #25:
Umar Farooq Minhas, Shriram Rajagopalan, Brendan Cully, Ashraf Aboulnaga, Kenneth Salem, and Andrew Warfield.
RemusDB: Transparent High Availability for Database Systems.
VLDB 2011.