Reading List
The papers we will discuss in class are listed
under five topics, with some further readings provided for each topic.
I have provided links to some of the papers. The others are available
on-line through sources such as:
- ACM Digital Library for ACM
conferences (e.g., SIGMOD, SOSP) and journals (e.g., TODS).
- IEEE Xplore for IEEE
conferences (e.g., ICDE) and journals (e.g., TKDE).
- Springer LINK for Springer
and Kluwer publications (e.g., Lecture Notes in Computer Science).
- USENIX Events for
USENIX conferences (e.g., OSDI).
- Michael
Ley's DBLP bibliography server for a comprehensive computer science
bibliography containing links to on-line papers for many conferences and
journals (e.g., VLDB).
General Background
Further Reading
-
The Claremont Report on Database
Research.
-
Luiz Barroso, Jeffrey Dean, and Urs Hölzle.
Web Search for a Planet: The Google Cluster Architecture. IEEE Micro 23(2),
2003.
-
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, Mahadev
Satyanarayanan, Robert N. Sidebotham, and Michael J. West. Scale and Performance
in a Distributed File System. ACM Trans. Computer Systems 6(1), 1988.
-
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B.
Kumar, and M. Muralikrishna. GAMMA - A High Performance Dataflow Database
Machine. VLDB 1986.
-
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker,
Hui-I Hsiao, and Rick Rasmussen. The Gamma Database Machine Project. IEEE
Trans. Knowledge and Data Eng. 2(1), 1990.
-
Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken,
John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, and Roger
Wattenhofer. FARSITE: Federated, Available, and Reliable Storage for an
Incompletely Trusted Environment. OSDI 2002.
Programming and Execution Frameworks
-
Paper #4:
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large
Clusters. OSDI 2004.
-
Paper #5:
David J. DeWitt, Eric Robinson, Srinath Shankar, Erik Paulson, Jeffrey Naughton,
Andrew Krioukov, and Joshua Royalty. Clustera: An Integrated Computation and Data
Management System. VLDB 2008.
-
Paper #6:
Parag Agrawal, Daniel Kifer, and Christopher Olston. Scheduling Shared Scans of
Large Data Files. VLDB 2008.
-
Paper #7: Lei Chen, Christopher
Olston, and Raghu Ramakrishnan. Parallel Evaluation
of Composite Aggregate Queries. ICDE 2008.
Further Reading
-
Hadoop.
-
Hadoop Summit at Yahoo! Research.
-
Eric Robinson and David J. DeWitt. Turning Cluster Management into Data
Management: A System Overview. CIDR 2007.
-
Srinath Shankar and David J. DeWitt. Data Driven Workflow Planning in Cluster
Management Systems. HPDC 2007.
-
Andrew W. McNabb, Christopher K. Monson, and Kevin D. Seppi. MRPSO: MapReduce Particle
Swarm Optimization. Genetic and Evolutionary Computation Conference,
2007.
-
Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and Douglas Stott Parker Jr.
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters.
SIGMOD 2007.
-
Christopher Olston, Benjamin Reed, Adam Silberstein, and Utkarsh Srivastava.
Automatic Optimization of Parallel Dataflow Programs. USENIX Annual
Conference 2008.
-
Tamer Elsayed, Jimmy Lin, and Douglas Oard. Pairwise Document Similarity in
Large Collections with MapReduce. Proc. Annual Meeting of the Association for
Computational Linguistics, 2008.
Data Processing Languages
-
Paper #8: Rob Pike, Sean Dorward, Robert Griesemer,
and Sean Quinlan. Interpreting the Data:
Parallel Analysis with Sawzall. Scientific Programming 13(4), 2005.
-
Paper #9: Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar,
and Andrew Tomkins. Pig Latin: A Not-So-Foreign
Language for Data Processing. SIGMOD 2008.
-
Paper #10: Ronnie Chaiken, Bob Jenkins, Paul Larson, Bill Ramsey, Darren Shakib, Simon
Weaver, and Jingren Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive
Data Sets. VLDB 2008.
Further Reading
Virtualization and Resource Sharing
-
Paper #11:
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf
Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the Art of Virtualization.
SOSP 2003.
-
Paper #12: David E. Irwin, Jeffrey S.
Chase, Laura E. Grit, Aydan R. Yumerefendi, David Becker, and Ken Yocum.
Sharing Networked Resources with Brokered Leases.
USENIX Annual Conference 2006.
-
Paper #13:
Lavanya Ramakrishnan, David E. Irwin, Laura E. Grit, Aydan R. Yumerefendi,
Adriana Iamnitchi, and Jeffrey S. Chase. Toward a
Doctrine of Containment: Grid Hosting with Adaptive Resource Control. SC 2006.
-
Paper #14:
Matthias Brantner, Daniela Florescu, David A. Graf, Donald Kossmann, and Tim Kraska.
Building a Database on S3. SIGMOD 2008.
-
Paper #15: Ahmed Soror, Umar Farooq Minhas, Ashraf Aboulnaga, Kenneth Salem, Peter
Kokosielis, and Sunil Kamath. Automatic Virtual
Machine Configuration for Database Workloads. SIGMOD 2008.
Further Reading
-
Amazon Web Services.
-
Constantine P. Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim Chow, Monica S. Lam,
and Mendel Rosenblum. Optimizing the Migration of Virtual Computers. OSDI
2002.
-
Jeffrey S. Chase, David E. Irwin, Laura E. Grit, Justin D. Moore, and Sara
Sprenkle. Dynamic Virtual Clusters in a Grid Site Manager. HPDC 2003.
-
Yun Fu, Jeffrey S. Chase, Brent N. Chun, Stephen Schwab, and Amin Vahdat. SHARP:
An Architecture for Secure Resource Peering. SOSP 2003.
-
Ivan Krsul, Arijit Ganguly, Jian Zhang, Jose A.B. Fortes, and Renato J.
Figueiredo. VMPlants: Providing and Managing Virtual Machine Execution
Environments for Grid Computing. SC 2004.
-
Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, and Jan Rittinger.
Multi-tenant Databases for Software as a Service: Schema-mapping Techniques. SIGMOD
2008.
Consistent Storage and Retrieval
-
Paper #16: Christian Plattner
and Gustavo Alonso. Ganymed: Scalable Replication
for Transactional Web Applications. Middleware 2004.
-
Paper #17:
Khuzaima Daudjee and Kenneth Salem. Lazy Database Replication with Snapshot
Isolation. VLDB 2006.
-
Paper #18:
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,
Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall,
and Werner Vogels. Dynamo: Amazon's Highly Available
Key-Value Store. SOSP 2007.
-
Paper #19:
Marcos Kawazoe Aguilera, Arif Merchant, Mehul A. Shah, Alistair C. Veitch,
and Christos T. Karamanolis. Sinfonia: A New Paradigm for
Building Scalable Distributed Systems. SOSP 2007.
-
Paper #20:
Marcos Aguilera, Wojciech Golab, and Mehul Shah. A Practical Scalable Distributed
B-Tree. VLDB 2008.
-
Paper #21:
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach,
Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber.
Bigtable: A Distributed Storage System for
Structured Data. OSDI 2006.
-
Paper #22:
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil
Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni.
PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.
-
Paper #23:
Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni,
and Raghu Ramakrishnan. Efficient Bulk Insertion Into a
Distributed Ordered Table. SIGMOD 2008.
Further Reading
-
CouchDB.
-
Cassandra.
-
Susan B. Davidson, Hector Garcia-Molina, and Dale Skeen. Consistency in a
Partitioned Network: A Survey. ACM Computing Surveys 17(3), 1985.
-
Yue Zhuge, Hector Garcia-Molina, Joachim Hammer, and Jennifer Widom. View
Maintenance in a Warehousing Environment. SIGMOD 1995.
-
Jim Gray, Pat Helland, Patrick E. O'Neil, and Dennis Shasha. The Dangers of
Replication and a Solution. SIGMOD 1996.
-
YongChul Kwon, Magdalena Balazinska, and Albert Greenberg. Fault-tolerant Stream
Processing using a Distributed, Replicated File System. VLDB 2008.