Assignment 2: Federated Learning
The main goal of this assignment is to implement a simple Federated Learning (FL) system. This project can be done in a group of two students, with only one team member submitting the work on behalf of the group. You need to register a group on the CANVAS page: COMP3221 → People → Group - A2.
1 Learning Objectives
Figure 1: An example of a Federated Learning system with 1 server and 5 clients.
Your assignment involves developing a Federated Learning (FL) system that consists of one server and five clients as depicted in Fig. 1. Each client possesses its own private dataset used for training a local model, and then contributes its local model to the server in order to build a global model.
On completing this assignment you will gain practical knowledge in:
• Federated Learning principles: Understand how to scale machine learning across multiple devices while ensuring data privacy and minimizing central data storage needs.
• Client-Server programming: Master the basics of network communications using sockets, including setting up connections, designing protocols, and handling network issues.
• Machine learning programming: Learn to implement, train, and evaluate machine learning models, focusing on practical aspects such as data handling and model and performance optimization.
2 Assignment Guidelines
2.1 Simulation Environment
Because no physical network is available for deployment, you will simulate the FL system on a single computer for both implementation and evaluation. This simulation requires running a separate instance of your program for each entity in the client-server architecture, using ’localhost’ for communication. Specifically, each entity (the server and every client) will run in its own terminal window on your machine.
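To make the localhost setup concrete, the following is a minimal sketch of one message exchange between a server and a client over a TCP socket. It is not the required protocol: the JSON message format, the field names `client_id` and `data_size`, the `ACK` reply, and the helper names are all assumptions made for illustration. The two sides run as threads in one process here only so the example is self-contained; in the assignment each entity would run in its own terminal.

```python
import json
import socket
import threading

HOST = "localhost"  # every entity runs on the same machine


def recv_all(conn):
    """Read bytes until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_one(listener, inbox):
    """Accept one client, store its decoded JSON message, and acknowledge it."""
    conn, _addr = listener.accept()
    with conn:
        inbox.append(json.loads(recv_all(conn).decode("utf-8")))
        conn.sendall(b"ACK")  # hypothetical reply, not part of the spec


def send_message(port, message):
    """Connect to the server, send one JSON message, and return the reply."""
    with socket.create_connection((HOST, port)) as sock:
        sock.sendall(json.dumps(message).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell the server the message is complete
        return recv_all(sock)


def demo():
    # Port 0 asks the OS for any free port; a real run would use fixed ports.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((HOST, 0))
    listener.listen()
    port = listener.getsockname()[1]

    inbox = []
    server = threading.Thread(target=serve_one, args=(listener, inbox))
    server.start()
    # Hypothetical message fields, chosen only for this illustration.
    reply = send_message(port, {"client_id": "client1", "data_size": 100})
    server.join()
    listener.close()
    return inbox, reply
```

Calling `demo()` returns the list of messages the server received and the reply the client got back. Closing the sending side with `shutdown` is one simple way to mark the end of a message; a length prefix is a common alternative when a connection must carry several messages.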
2.2 Federated Learning Algorithm
In this assignment, we will use the Federated Averaging (FedAvg) algorithm, a key approach in Federated Learning where client devices collaboratively train a model by computing updates locally and averaging these updates on a central server to improve the global model. The workings of FedAvg are elaborated in Algorithm 1. Here, K represents the total number of clients participating in the training process. T is the total number of global communication rounds between the clients and the server. $w_t$ refers to the global model’s parameters at iteration t, while $w_{t+1}^{(k)}$ denotes the local model’s parameters of client k at iteration t + 1. E is the number of local epochs, i.e. the number of times each client goes through its entire dataset to train the model locally before sending updates to the global model. For local model training, clients can use either Gradient Descent (GD) or Mini-Batch GD as optimization methods.
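The averaging step and the round structure described above can be sketched as follows. Algorithm 1 is not reproduced in this text, so this is an illustration under assumptions rather than the reference procedure: the function names are hypothetical, models are plain lists of floats, and weighting each client by its number of training samples is an assumption (pass no counts to get a plain mean instead).

```python
def fedavg_aggregate(local_models, sample_counts=None):
    """Average K local parameter vectors into one global vector.

    If sample_counts is given, client k is weighted by its data size
    (an assumption here); otherwise every client has equal weight.
    """
    if not local_models:
        raise ValueError("need at least one local model")
    if sample_counts is None:
        sample_counts = [1] * len(local_models)
    total = sum(sample_counts)
    dim = len(local_models[0])
    return [
        sum(n * w[j] for w, n in zip(local_models, sample_counts)) / total
        for j in range(dim)
    ]


def run_fedavg(w0, client_sizes, T, local_update):
    """Run T global rounds over K = len(client_sizes) clients.

    local_update(k, w) is a hypothetical callback that trains client k
    for E local epochs starting from the global model w and returns the
    client's new parameters.
    """
    w = list(w0)
    for _t in range(T):
        # Each client starts from a copy of the current global model.
        local_models = [local_update(k, list(w)) for k in range(len(client_sizes))]
        w = fedavg_aggregate(local_models, client_sizes)
    return w
```

For example, `fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]])` gives the plain mean of the two vectors, while passing `sample_counts=[1, 3]` pulls the result toward the second client. In the real system `local_update` would be replaced by sending the global model to each client and receiving its trained model back.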
2.3 Dataset and Model
Figure 2: Samples of the California Housing Dataset.
For this assignment, we work with the California Housing Dataset, a widely recognized dataset used in machine learning for predicting house prices based on various features. The dataset contains 20640 data samples, each with 8 features (median income, housing median age, average rooms, average bedrooms, population, average occupancy, latitude, and longitude) and 1 target variable (median house value) for different blocks in California. The dataset is insightful for understanding how house values vary by location and other factors.
To simulate an FL environment that reflects the heterogeneous nature of real-world data, we have distributed the dataset across K = 5 clients. Each client receives a portion of the dataset, varying in size, to mimic the diversity in data distribution one might encounter in practical FL scenarios. The federated dataset is prepared and accessible in FLData.zip, available for download on the page CANVAS → Assignment 2. For every client, we provide two CSV files: one for the training set and one for the testing set. For instance, the training and testing data for Client 1 are named "calhousing_train_client1.csv" and "calhousing_test_client1.csv", respectively.
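As an illustration of the file naming scheme, here is a small sketch for locating and loading one client's data using only the standard library. Several details are assumptions not stated in this text: that the archive is extracted into a folder named `FLData`, that the target (median house value) is the last column of each row, and that any non-numeric row (such as a header) can be skipped. Check the actual files before relying on any of these. The helper names are hypothetical.

```python
import csv
from pathlib import Path


def client_files(client_id, data_dir="FLData"):
    """Return the (train, test) CSV paths for one client.

    data_dir is an assumed extraction folder for FLData.zip.
    """
    base = Path(data_dir)
    return (
        base / f"calhousing_train_client{client_id}.csv",
        base / f"calhousing_test_client{client_id}.csv",
    )


def load_xy(path):
    """Load a CSV into (features, targets) as lists of floats.

    Assumes the target is the LAST column of each row; rows that are
    empty or contain non-numeric values (e.g. a header) are skipped.
    """
    features, targets = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            try:
                values = [float(v) for v in row]
            except ValueError:
                continue  # header or malformed row
            if not values:
                continue  # blank line
            features.append(values[:-1])
            targets.append(values[-1])
    return features, targets
```

A client with ID 1 would call `client_files(1)` and pass each returned path to `load_xy`. The number of rows in the training file is also the client's data size, which is useful if the server weights clients during aggregation.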
Since the objective is a regression problem (predicting house values), a Linear Regression model is a suitable choice for this task. It models the linear relationship between house features and their prices. The ultimate goal is to train a Linear Regression model optimized across the distributed datasets.
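A linear regression model with one local training epoch can be sketched as below. This is a minimal illustration under assumptions, not a prescribed implementation: it uses mean squared error as the loss, plain Python lists instead of a numerical library so that it runs anywhere, and hypothetical function names. Leaving `batch_size` unset performs full-batch Gradient Descent (one update per epoch); setting it performs Mini-Batch GD.

```python
import random


def predict(w, b, x):
    """Linear model: weighted sum of the features plus a bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(w, b, X, y):
    """Mean squared error over a dataset."""
    return sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


def train_epoch(w, b, X, y, lr=0.01, batch_size=None, rng=None):
    """One pass over the data, returning the updated (w, b).

    batch_size=None uses the whole dataset per update (GD); otherwise the
    data is shuffled and processed in mini-batches of that size.
    """
    n = len(y)
    idx = list(range(n))
    if batch_size is None:
        batch_size = n
    else:
        (rng or random).shuffle(idx)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for i in batch:
            err = predict(w, b, X[i]) - y[i]
            for j, xj in enumerate(X[i]):
                grad_w[j] += 2 * err * xj  # d(err^2)/dw_j
            grad_b += 2 * err              # d(err^2)/db
        m = len(batch)
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b
```

Calling `train_epoch` E times in a row corresponds to E local epochs on one client. On the real dataset the features have very different scales (for example population versus median income), so some form of feature scaling or a carefully chosen learning rate is likely to be needed for the updates to remain stable.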