Reading List
The papers we will discuss in class are listed under five topics, along with
further readings for most topics. I have provided links to some of the papers.
All the other papers are available on-line through sources such as:
- ACM Digital Library for ACM conferences (e.g., SIGMOD, SOSP) and journals (e.g., TODS).
- IEEE Xplore for IEEE conferences (e.g., ICDE) and journals (e.g., TKDE).
- Springer LINK for Springer and Kluwer publications (e.g., Lecture Notes in Computer Science).
- USENIX Events for USENIX conferences (e.g., OSDI).
- Michael Ley's DBLP bibliography server for a comprehensive computer science bibliography containing links to on-line papers for many conferences and journals (e.g., VLDB).
 
General Background
Further Reading
- 
The Claremont Report on Database 
Research.
- 
Luiz Barroso, Jeffrey Dean, and Urs Hölzle. 
Web Search for a Planet: The Google Cluster Architecture. IEEE Micro 23(2), 
2003.
- 
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, Mahadev 
Satyanarayanan, Robert N. Sidebotham, and Michael J. West. Scale and Performance 
in a Distributed File System. ACM Trans. Computer Systems 6(1), 1988.
- 
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. 
Kumar, and M. Muralikrishna. GAMMA - A High Performance Dataflow Database 
Machine. VLDB 1986.
- 
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, 
Hui-I Hsiao, and Rick Rasmussen. The Gamma Database Machine Project. IEEE 
Trans. Knowledge and Data Eng. 2(1), 1990.
- 
Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken, 
John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, and Roger 
Wattenhofer. FARSITE: Federated, Available, and Reliable Storage for an 
Incompletely Trusted Environment. OSDI 2002.
 
Programming and Execution Frameworks
- 
Paper #4:
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large 
Clusters. OSDI 2004.
- 
Paper #5:
David J. DeWitt, Eric Robinson, Srinath Shankar, Erik Paulson, Jeffrey Naughton, 
Andrew Krioukov, and Joshua Royalty. Clustera: An Integrated Computation and Data 
Management System. VLDB 2008.
- 
Paper #6:
Parag Agrawal, Daniel Kifer, and Christopher Olston. Scheduling Shared Scans of 
Large Data Files. VLDB 2008.
- 
Paper #7: Lei Chen, Christopher 
Olston, and Raghu Ramakrishnan. Parallel Evaluation 
of Composite Aggregate Queries. ICDE 2008.
Further Reading
- 
Hadoop.
- 
Hadoop Summit at Yahoo! Research.
- 
Eric Robinson and David J. DeWitt. Turning Cluster Management into Data 
Management: A System Overview. CIDR 2007.
- 
Srinath Shankar and David J. DeWitt. Data Driven Workflow Planning in Cluster 
Management Systems. HPDC 2007.
- 
Andrew W. McNabb, Christopher K. Monson, and Kevin D. Seppi. MRPSO: MapReduce Particle 
Swarm Optimization. Genetic and Evolutionary Computation Conference, 
2007.
- 
Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and Douglas Stott Parker Jr. 
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. 
SIGMOD 2007.
- 
Christopher Olston, Benjamin Reed, Adam Silberstein, and Utkarsh Srivastava. 
Automatic Optimization of Parallel Dataflow Programs. USENIX Annual Technical 
Conference 2008.
- 
Tamer Elsayed, Jimmy Lin, and Douglas Oard. Pairwise Document Similarity in 
Large Collections with MapReduce. Proc. Annual Meeting of the Association for 
Computational Linguistics, 2008.
 
Data Processing Languages
- 
Paper #8: Rob Pike, Sean Dorward, Robert Griesemer, 
and Sean Quinlan. Interpreting the Data: 
Parallel Analysis with Sawzall. Scientific Programming 13(4), 2005.
- 
Paper #9: Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, 
and Andrew Tomkins. Pig Latin: A Not-So-Foreign 
Language for Data Processing. SIGMOD 2008.
- 
Paper #10: Ronnie Chaiken, Bob Jenkins, Paul Larson, Bill Ramsey, Darren Shakib, Simon 
Weaver, and Jingren Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive 
Data Sets. VLDB 2008.
Further Reading
 
Virtualization and Resource Sharing
- 
Paper #11:
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf 
Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the Art of Virtualization.
SOSP 2003.
- 
Paper #12: David E. Irwin, Jeffrey S. 
Chase, Laura E. Grit, Aydan R. Yumerefendi, David Becker, and Ken Yocum.
Sharing Networked Resources with Brokered Leases.
USENIX Annual Technical Conference 2006.
- 
Paper #13:
Lavanya Ramakrishnan, David E. Irwin, Laura E. Grit, Aydan R. Yumerefendi, 
Adriana Iamnitchi, and Jeffrey S. Chase. Toward a 
Doctrine of Containment: Grid Hosting with Adaptive Resource Control. SC 2006.
- 
Paper #14:
Matthias Brantner, Daniela Florescu, David A. Graf, Donald Kossmann, and Tim Kraska.
Building a Database on S3. SIGMOD 2008.
- 
Paper #15: Ahmed Soror, Umar Farooq Minhas, Ashraf Aboulnaga, Kenneth Salem, Peter 
Kokosielis, and Sunil Kamath. Automatic Virtual 
Machine Configuration for Database Workloads. SIGMOD 2008.
Further Reading
- 
Amazon Web Services.
- 
Constantine P. Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim Chow, Monica S. Lam, 
and Mendel Rosenblum. Optimizing the Migration of Virtual Computers. OSDI 
2002.
- 
Jeffrey S. Chase, David E. Irwin, Laura E. Grit, Justin D. Moore, and Sara 
Sprenkle. Dynamic Virtual Clusters in a Grid Site Manager. HPDC 2003.
- 
Yun Fu, Jeffrey S. Chase, Brent N. Chun, Stephen Schwab, and Amin Vahdat. SHARP: 
An Architecture for Secure Resource Peering. SOSP 2003.
- 
Ivan Krsul, Arijit Ganguly, Jian Zhang, Jose A.B. Fortes, and Renato J. 
Figueiredo. VMPlants: Providing and Managing Virtual Machine Execution 
Environments for Grid Computing. SC 2004.
- 
Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, and Jan Rittinger. 
Multi-tenant Databases for Software as a Service: Schema-mapping Techniques. SIGMOD 
2008.
 
Consistent Storage and Retrieval
- 
Paper #16: Christian Plattner 
and Gustavo Alonso. Ganymed: Scalable Replication 
for Transactional Web Applications. Middleware 2004.
- 
Paper #17: 
Khuzaima Daudjee and Kenneth Salem. Lazy Database Replication with Snapshot 
Isolation. VLDB 2006.
- 
Paper #18: 
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, 
Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, 
and Werner Vogels. Dynamo: Amazon's Highly Available 
Key-Value Store. SOSP 2007.
- 
Paper #19: 
Marcos Kawazoe Aguilera, Arif Merchant, Mehul A. Shah, Alistair C. Veitch, 
and Christos T. Karamanolis. Sinfonia: A New Paradigm for 
Building Scalable 
Distributed Systems. SOSP 2007.
- 
Paper #20: 
Marcos Aguilera, Wojciech Golab, and Mehul Shah. A Practical Scalable Distributed 
B-Tree. VLDB 2008.
- 
Paper #21: 
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, 
Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber.
Bigtable: A Distributed Storage System for 
Structured Data. OSDI 2006.
- 
Paper #22: 
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil 
Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni.
PNUTS: 
Yahoo!'s Hosted Data Serving Platform. VLDB 2008.
- 
Paper #23: 
Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, 
and Raghu Ramakrishnan. Efficient Bulk Insertion Into a 
Distributed Ordered Table. SIGMOD 2008.
Further Reading
- 
CouchDB.
- 
Cassandra.
- 
Susan B. Davidson, Hector Garcia-Molina, and Dale Skeen. Consistency in 
Partitioned Networks. ACM Computing Surveys 17(3), 1985.
- 
Yue Zhuge, Hector Garcia-Molina, Joachim Hammer, and Jennifer Widom. View 
Maintenance in a Warehousing Environment. SIGMOD 1995.
- 
Jim Gray, Pat Helland, Patrick E. O'Neil, and Dennis Shasha. The Dangers of 
Replication and a Solution. SIGMOD 1996.
- 
YongChul Kwon, Magdalena Balazinska, and Albert Greenberg. Fault-tolerant Stream 
Processing using a Distributed, Replicated File System. VLDB 2008.