CS 550 Programming Assignment #1
1 The problem
This project has two purposes: first, to familiarize you with sockets, processes, threads, and makefiles; second, to teach you the design and internals of a Napster-style peer-to-peer (P2P) file sharing system.
You can be creative with this project. You may use C, C++, or Java. You may use abstractions such as sockets and threads, but you may not use RPCs, RMIs, or web services. You are free to use any computer for your development, but evaluation must be done under Ubuntu Linux 22.04 LTS. Your assignment will be graded in a Linux environment, and you will lose points if it does not compile and run correctly.
In this project, you need to design a simple P2P system that has two components:
1. A central indexing server. This server indexes the contents of all of the peers that register with it. It also provides a search facility to peers. In our simple version, you don't need to implement sophisticated searching algorithms; an exact match will be fine. Remember that the central index does not store the actual data, only the metadata about the files stored on the peers. Minimally, the server should provide the following interface to the peer clients:
• registry(peer id, file name, ...) -- invoked by a peer to register all its files with the indexing server. The server then builds the index for the peer.
• search(file name) -- this procedure should search the index and return all the matching peers to the requestor.
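As a rough illustration of the two calls above, the index can be kept as an in-memory map from file name to the set of peers holding that file. This is only a sketch: the class name FileIndex, the use of plain strings for peer ids, and the sorted result are assumptions, not part of the assignment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory index: file name -> ids of the peers that hold it.
// Only metadata is stored here, never file contents.
final class FileIndex {
    private final Map<String, Set<String>> peersByFile = new ConcurrentHashMap<>();

    // registry(peer id, file name, ...): record that peerId holds each file.
    void registry(String peerId, List<String> fileNames) {
        for (String name : fileNames) {
            peersByFile
                .computeIfAbsent(name, k -> ConcurrentHashMap.newKeySet())
                .add(peerId);
        }
    }

    // search(file name): exact-match lookup, returned as a sorted snapshot.
    List<String> search(String fileName) {
        Set<String> peers = peersByFile.get(fileName);
        if (peers == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>(peers);
        Collections.sort(result);
        return result;
    }
}
```

The concurrent map and concurrent key set let several registering and searching threads touch the index without an explicit lock; a plain HashMap guarded by a lock would be an equally reasonable choice.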
2. A peer. A peer is both a client and a server. As a client, the user submits a file name to the indexing server using "lookup". The indexing server returns a list of all other peers that hold the file. The user can pick one such peer, and the client then connects to this peer and downloads the file. As a server, the peer waits for requests from other peers and sends the requested file when it receives a request. Minimally, the peer server should provide the following interface to the peer client:
• obtain(file name) -- invoked by a peer to download a file from another peer.
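One possible way to carry a file over a socket for obtain is to send an 8-byte length followed by the raw bytes, so the same code path works for text and binary files. The sketch below assumes that framing; the class FileTransfer and its method names are hypothetical, and the streams would normally come from a Socket and a file.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical length-prefixed transfer: [8-byte length][raw file bytes].
final class FileTransfer {
    private static final int BUFFER_SIZE = 64 * 1024;

    // Serving peer: write the length, then exactly `length` bytes of the file.
    static void send(InputStream file, long length, OutputStream socketOut) throws IOException {
        DataOutputStream out = new DataOutputStream(socketOut);
        out.writeLong(length);
        copyExactly(file, out, length);
        out.flush();
    }

    // Requesting peer: read the length, then copy that many bytes to the file.
    static long receive(InputStream socketIn, OutputStream file) throws IOException {
        DataInputStream in = new DataInputStream(socketIn);
        long length = in.readLong();
        copyExactly(in, file, length);
        file.flush();
        return length;
    }

    // Copies exactly `length` bytes in fixed-size chunks, so a 1GB file is
    // never held in memory at once.
    private static void copyExactly(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        long remaining = length;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes missing");
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
    }
}
```

Because the bytes are copied untouched, no character encoding is applied to the payload; that is the main reason to avoid line-oriented readers for the file body.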
Other requirements:
• Both the indexing server and a peer server should be able to accept multiple client requests at the same time. This can be done easily using threads. Be aware of thread synchronization issues to avoid inconsistency or deadlock in your system.
• For full credit, your P2P system must support any type of file (e.g. text, binary, etc.).
• Add support for data resilience by allowing a configurable replication factor (system wide). The replication can take place at the time of the registry call.
• You may assume that directory contents do not change after a peer has registered all its files; no need to do sophisticated algorithms for automatic indexing of changed files.
• You do not need shared file systems (e.g. NFS) for this assignment; your assignment will be graded in an environment with no NFS between the VMs.
• No GUIs are required. Simple command line interfaces are fine.
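The requirement above that both servers accept multiple client requests at the same time can be met in several ways. A minimal sketch, assuming one acceptor thread that hands each connection to a fixed-size thread pool, is shown here; ConcurrentServer and its Handler interface are hypothetical names, and the pool size is an arbitrary parameter.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical server skeleton: one acceptor thread, a pool of worker threads.
final class ConcurrentServer implements AutoCloseable {
    interface Handler {
        void handle(Socket client) throws IOException;
    }

    private final ServerSocket listener;
    private final ExecutorService workers;

    ConcurrentServer(int port, int threads, Handler handler) throws IOException {
        listener = new ServerSocket(port); // port 0 lets the OS pick a free port
        workers = Executors.newFixedThreadPool(threads);
        Thread acceptor = new Thread(() -> acceptLoop(handler));
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return listener.getLocalPort();
    }

    private void acceptLoop(Handler handler) {
        while (!listener.isClosed()) {
            try {
                Socket client = listener.accept();
                // Each connection is served on a worker thread, so a slow
                // download does not block new clients from connecting.
                workers.submit(() -> {
                    try (Socket c = client) {
                        handler.handle(c);
                    } catch (IOException e) {
                        // A failed request should not take down the server.
                    }
                });
            } catch (IOException e) {
                // accept() fails once the listener is closed; the loop then exits.
            }
        }
    }

    @Override
    public void close() throws IOException {
        listener.close();
        workers.shutdown();
    }
}
```

Any state shared between handlers, such as the index on the indexing server, still needs its own synchronization; the pool only provides the concurrency.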
2 Evaluation and Measurement
Deploy 2 peers and 1 indexing server over 3 VMs. Each peer holds the following datasets in its shared directory, all of which are indexed at the indexing server:
• 1M: 1KB text files
• 1K: 1MB text files
• 10: 1GB binary files
Name your files uniquely on each node. All files should reside in a single directory on each peer. Replicated data among peers should be placed in the same directory as the rest of the data. If two files have the same name (they should not, but if they do), simply overwrite the original file.
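The replication factor and the placement rule above leave the details open. The sketch below shows one possible interpretation: a factor of R means each file should end up on R peers in total, replicas go to the next peers in a fixed ordering, and an incoming replica is written into the shared directory, overwriting any file with the same name. ReplicaStore, chooseTargets, and storeReplica are hypothetical names, and the owner is assumed to appear in the peer list.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical replication helpers; the policy here is an assumption.
final class ReplicaStore {
    // Picks the peers that should receive a copy so that the file exists on
    // min(replicationFactor, allPeers.size()) peers, counting the owner.
    static List<String> chooseTargets(String owner, List<String> allPeers, int replicationFactor) {
        List<String> targets = new ArrayList<>();
        int start = allPeers.indexOf(owner);
        int n = allPeers.size();
        for (int step = 1; step < n && targets.size() < replicationFactor - 1; step++) {
            targets.add(allPeers.get((start + step) % n));
        }
        return targets;
    }

    // Writes a replica into the shared directory, replacing a same-named file.
    static Path storeReplica(Path sharedDir, String fileName, InputStream data) throws IOException {
        Path dir = sharedDir.normalize();
        Path target = dir.resolve(fileName).normalize();
        if (!dir.equals(target.getParent())) {
            throw new IOException("file name escapes the shared directory: " + fileName);
        }
        Files.copy(data, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

With peers p1, p2, p3 and owner p2, a factor of 2 selects p3 and a factor of 3 selects p3 and then p1; a factor of 1 selects no targets.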
Conduct a simple experimental study to evaluate the behavior of your system.
Do a weak scaling study to measure the search time of 10K requests per peer, on 1 node and 2 nodes. Report the average and standard deviation. Plot your data in figures.
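For the average and standard deviation, a small helper over the recorded per-request times is enough. The sketch below assumes the times are collected in a double array (for example, milliseconds derived from System.nanoTime differences) and uses the sample standard deviation, which divides by n - 1; the class name Stats is hypothetical.

```java
// Hypothetical helper for summarizing measured request times.
final class Stats {
    // Arithmetic mean of the samples.
    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least two samples.
    static double stdDev(double[] samples) {
        double m = mean(samples);
        double squaredDiffs = 0;
        for (double s : samples) {
            squaredDiffs += (s - m) * (s - m);
        }
        return Math.sqrt(squaredDiffs / (samples.length - 1));
    }
}
```

If the population standard deviation is wanted instead, the divisor becomes n; with 10K samples the two values are nearly identical.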
Do a strong scaling study that measures the search and transfer time of 10K small files (1KB), on 1 node and 2 nodes. Repeat the study on 1K medium files (1MB). Repeat the study on 8 large files (1GB). Report the average and standard deviation. Plot your data in figures. Can you deduce that your centralized P2P system is scalable up to 2 nodes? Does it scale well for some file sizes, but not for others? Based on the data you have so far, what would happen if you had 1K peers with small, medium, and large files? What would happen if you had 1 billion peers?
You may use tools such as pssh to coordinate the bootstrapping of your P2P system, as well as to automate and conduct the performance evaluation concurrently across your small cluster of VMs.