Distributed Systems

In Spring'25, I took a course on Distributed Systems that covered topics such as fault tolerance and scalability. This project was part of that course and let me put those concepts into practice.

A Distributed RAG Pipeline

This project builds a distributed Retrieval-Augmented Generation (RAG) pipeline using open-source tools. It lets users upload documents and run natural-language queries against them. The documents are chunked and embedded into vectors, which are stored in the Milvus vector database.

When a user submits a query, the query is embedded and a similarity search is run against the corpus stored in the vector database, which returns the most similar chunks. These chunks are added to the query as context, giving the LLM a more complete picture.
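The chunk, embed, search, and augment steps described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the word-count `embed` function is a toy stand-in for the real embedding model served by Ollama, the in-memory `index` list stands in for Milvus, and all function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy embedding: lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Augment the query with the retrieved chunks as context."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical sample corpus; the list plays the role of the vector database.
documents = [
    "Milvus stores embedding vectors and supports similarity search.",
    "Kubernetes restarts failed pods to keep services available.",
    "Langfuse records traces and metrics for LLM applications.",
]
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

question = "Which component stores the vectors?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real deployment the prompt would then be sent to the LLM; the sketch stops at prompt construction because that is where retrieval ends.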

The project aims to provide Scalability, Fault Tolerance, and Observability for RAG pipelines.

It utilizes:

  1. Ollama: as LLM (qwen2.5) and Embedder (all-mini-lm)
  2. Langfuse: for Observability (tracing, metrics, etc.)
  3. Milvus: as a Vector Database
  4. Kubernetes: for orchestration
  5. Chatbot Ollama: for frontend

Read the full report here.

Keywords: LLM, RAG, Vector Database, Chatbot, Kubernetes, Fault Tolerance, Scalability, Observability, Distributed Systems


Procman

Procman is my own take on Docker and Kubernetes. I am building a series of projects that mimic their functionality. This is ongoing work, currently focused on building a containerization platform from scratch. The intended result is a set of tools for orchestrating processes in a cluster. Though it started as a way for me to learn the inner workings of Docker and Kubernetes, I am learning new technologies every day and plan to add my own customizations, such as process migration with CRIU, a CNI using eBPF, and native DPDK/SPDK/RDMA support.

The Procman CLI

This is the Go-based interface for the Procman tool. It helps set up the environment for containerization using user-provided inputs. It provides the following features:

  • Manages images using Alpine Minirootfs
  • Creates the filesystem for containers

Keywords: Go, CLI, FileSystems

The Procman Daemon

This will be the long-running daemon for Procman. It sets up isolation for the container processes and starts them. It is an important part of the Procman ecosystem because it has complete control over the underlying container processes. It provides the following features:

  • Triggers container runs
  • Manages isolation (e.g. network and mount namespaces, chroot)
  • Sets up networking for the containers (via veth)
  • Monitors running processes and restarts them if required
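The monitor-and-restart feature can be illustrated with a small supervisor loop. This is a sketch in Python for brevity, not the daemon's Rust code; the `supervise` function, its `max_restarts` parameter, and the restart policy (restart only on a nonzero exit code) are assumptions made for the example.

```python
import subprocess
import sys

def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it after each nonzero exit, up to max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()          # block until the child exits
        exit_codes.append(code)
        if code == 0:               # clean exit: nothing to restart
            break
    return exit_codes

# Hypothetical workloads: one always fails, one exits cleanly.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
healthy = [sys.executable, "-c", "pass"]

failing_history = supervise(failing, max_restarts=2)
healthy_history = supervise(healthy)
print(failing_history, healthy_history)
```

Capping the number of restarts keeps a permanently broken process from looping forever; a real daemon would likely also add a backoff delay between attempts.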

Keywords: Rust, Linux, Namespaces, Networking


OS Projects

These projects were a part of the Advanced Operating Systems course that I took in Spring'25.

Unix Nameserver

A nameserver that listens for connections on a named Unix domain socket. A client connecting to the nameserver can perform one of three actions:

  • Publish resource
  • Request resource
  • Exit

The server maintains a list of active connections and the associated resources they provide. It uses epoll to listen on the named domain socket and on existing client connections. The named domain socket is used to establish new connections and the existing connections are used for PUB/REQ.
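The event loop described above can be sketched as follows. The original is written in C with epoll; this Python version uses the standard `selectors` module (which is backed by epoll on Linux) to show the same structure. The text protocol (`PUB <name>`, `REQ <name>`, `EXIT`), the reply strings, and the `NameServer` class are assumptions for illustration, and the sketch assumes each `recv` delivers exactly one command.

```python
import os
import selectors
import socket
import tempfile
import threading

class NameServer:
    def __init__(self, path):
        self.sel = selectors.DefaultSelector()   # epoll-backed on Linux
        self.resources = {}                      # resource name -> publisher socket
        self.listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.listener.bind(path)
        self.listener.listen()
        self.listener.setblocking(False)
        self.sel.register(self.listener, selectors.EVENT_READ)
        self.running = True

    def handle(self, conn, line):
        """Apply one command; return the reply, or None to close the connection."""
        cmd, _, name = line.partition(" ")
        if cmd == "PUB":
            self.resources[name] = conn
            return "OK"
        if cmd == "REQ":
            return "FOUND" if name in self.resources else "MISSING"
        return None                              # EXIT or an unknown command

    def drop(self, conn):
        """Forget a connection and every resource it published."""
        self.sel.unregister(conn)
        self.resources = {n: c for n, c in self.resources.items() if c is not conn}
        conn.close()

    def serve(self):
        while self.running:
            for key, _ in self.sel.select(timeout=0.1):
                sock = key.fileobj
                if sock is self.listener:        # new connection on the named socket
                    conn, _ = sock.accept()
                    self.sel.register(conn, selectors.EVENT_READ)
                    continue
                data = sock.recv(1024)           # PUB/REQ on an existing connection
                reply = self.handle(sock, data.decode().strip()) if data else None
                if reply is None:
                    self.drop(sock)
                else:
                    sock.sendall((reply + "\n").encode())

def ask(sock, msg):
    """Send one command and wait for the one-line reply."""
    sock.sendall(msg.encode())
    return sock.recv(1024).decode().strip()

path = os.path.join(tempfile.mkdtemp(), "ns.sock")
server = NameServer(path)
thread = threading.Thread(target=server.serve, daemon=True)
thread.start()

publisher = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
publisher.connect(path)
requester = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
requester.connect(path)

replies = [ask(publisher, "PUB printer"),
           ask(requester, "REQ printer"),
           ask(requester, "REQ scanner")]
publisher.sendall(b"EXIT")
server.running = False
thread.join()
publisher.close()
requester.close()
print(replies)
```

A single selector watches both the listening socket and every client socket, so one thread can accept new clients and serve PUB/REQ traffic without blocking on any one of them.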

Keywords: C, Sockets, Nameserver, OS, File Descriptors

Unikernel Proxy

This is a reverse proxy written and built for Unikraft. A reverse proxy is a service that hides the identity of backend servers from external clients. The proxy also provides authentication and load-balancing features for the services.

Unikraft is a toolkit for building unikernels. Unikernels are kernels specialized for a single application. This specialization makes unikernels extremely lightweight, as only the required components are included.
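The authentication and load-balancing roles of a reverse proxy can be sketched as a routing decision. This is an illustration only: the project itself is written in C, and the round-robin policy, the token check, the `Proxy` class, and the backend addresses are all assumptions, since the description does not say which schemes the proxy uses.

```python
import itertools

class Proxy:
    def __init__(self, backends, tokens):
        self._cycle = itertools.cycle(backends)  # round-robin over backends (assumed policy)
        self._tokens = set(tokens)               # accepted client tokens (assumed auth scheme)

    def route(self, token):
        """Return the backend for an authenticated request, or None if rejected."""
        if token not in self._tokens:
            return None
        return next(self._cycle)

# Hypothetical backends and token.
proxy = Proxy(["10.0.0.1:8080", "10.0.0.2:8080"], tokens={"secret"})
routes = [proxy.route("secret") for _ in range(3)]
rejected = proxy.route("wrong-token")
print(routes, rejected)
```

Clients only ever see the proxy's address; the backend list stays internal, which is how the proxy hides the servers behind it.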

Keywords: C, Sockets, Unikraft, Unikernel, OS


High Performance HTTP Benchmarking

As part of my research work at the Cloud Systems Lab at The George Washington University, I worked on the following projects under Dr. Timothy Wood to build infrastructure for load testing web servers and gathering metrics. This work started with an evaluation of multiple HTTP benchmarking tools and led to a framework that helps with load-testing experiments in a research setting.

A qualitative and quantitative analysis of HTTP Benchmarking Tools

This work presents a thorough comparison of different HTTP Benchmarking tools. It evaluates and compares the following tools:

Read the full report on how these tools compare against each other here.

Keywords: Bash, HTTP, Nginx, Load Testing, SSH, Experimentation

Load testing tool for experiments

A framework for load testing your web services with different configurations.

This is the result of the research work I did as part of my Research course during my Master's at GWU. It provides a repository that lets researchers load test their web servers with varying configurations and gather metrics for the entire duration of the load test.

  • Uses K6 for generating loads
  • Allows reproducibility of experiments
  • Collects metrics for both load generator and the target system
  • Generates timestamped metrics
  • Generates plots for analysis
  • Allows easy management of configurations and experiments

Keywords: Bash, HTTP, Nginx, Load Testing, SSH, Experimentation


Cloudlab - Research Tools

As part of my research work at The George Washington University, I use the CloudLab platform extensively. This platform provides researchers with the resources to repeatedly test their work and reproduce it as and when required. As someone who constantly retries his work with multiple configurations and breaks the system while playing around with different tools, I have found this platform extremely helpful.

Though this work has been built and tested on CloudLab, it provides utilities and tools to work with any Linux machine that can be accessed via SSH.

CloudLab eBPF

Tools and experiment setup for working with eBPF on CloudLab. I built this tool because I was developing eBPF programs on a Mac and had no way to compile them, or even run go generate-style operations, without manually running rsync or scp against the remote machine. This tool takes care of that: it synchronizes your code to the remote machine, runs the relevant go generate commands, and copies the generated files back to your local machine. This way you can work on eBPF from anywhere. This tool provides the following features:

  • Remote compilation
  • Copying of the generated code
  • Dependency installation
  • Experimentation
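The sync, generate, and copy-back cycle can be sketched as three commands. This is a guess at the shape of the workflow, not the tool's actual implementation: the `remote_build_plan` function, the host name, the directory paths, and the exact rsync flags are assumptions. The sketch only builds the command lines and does not execute them.

```python
import shlex

def remote_build_plan(host, local_dir, remote_dir):
    """Return the three commands for one remote build cycle, as argument lists."""
    local = local_dir.rstrip("/") + "/"
    remote = f"{host}:{remote_dir.rstrip('/')}/"
    generate = f"cd {shlex.quote(remote_dir)} && go generate ./..."
    return [
        ["rsync", "-az", local, remote],   # 1. push local sources to the remote machine
        ["ssh", host, generate],           # 2. run go generate where the eBPF toolchain lives
        ["rsync", "-az", remote, local],   # 3. pull the generated files back
    ]

# Hypothetical host and paths.
plan = remote_build_plan("user@node0.example.net", "./ebpf", "/tmp/ebpf")
for cmd in plan:
    print(shlex.join(cmd))
```

To actually run the cycle, each entry could be passed to `subprocess.run(cmd, check=True)` so that a failed step stops the sequence.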

Keywords: eBPF, Linux, Kernel, C, Go, SSH

CloudLab Kubernetes

A repository with setup scripts for creating a 3-node Kubernetes cluster with minimum configuration and just a single make command. This tool provides the following features:

  • Creates a 3-Node Kubernetes cluster
  • Copies the .kubeconfig to the local machine
  • Sets up the CNI
  • Can be customized to manage more or fewer nodes in the cluster

The initial setup required for Kubernetes can be overwhelming for newcomers. This tool bridges that gap and lets them focus on their goals.

Keywords: Kubernetes, Bash

CloudLab DPDK

A repository with setup scripts for preparing a machine to build and run DPDK. This tool provides the following features:

  • Installs DPDK dependencies
  • Allocates Hugepages
  • Loads the Kernel Modules
  • Builds DPDK
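As an illustration of the hugepage allocation step, the sketch below computes how many hugepages are needed for a given amount of memory and which Linux sysfs file controls that pool. The `hugepage_plan` function and the 1024 MiB target are assumptions for the example; the repository's scripts may size and reserve hugepages differently. The sketch only prints the command, since writing to sysfs requires root.

```python
import math

# Per-size hugepage pool control file exposed by the Linux kernel.
SYSFS = "/sys/kernel/mm/hugepages/hugepages-{size_kb}kB/nr_hugepages"

def hugepage_plan(total_mib, page_kib=2048):
    """Return (sysfs_path, page_count) needed to reserve total_mib of hugepage memory."""
    count = math.ceil(total_mib * 1024 / page_kib)
    return SYSFS.format(size_kb=page_kib), count

# Hypothetical target: 1024 MiB backed by 2 MiB pages.
path, count = hugepage_plan(1024)
print(f"echo {count} > {path}")
```

Rounding up ensures the reservation is never smaller than the requested amount, which matters for large page sizes such as 1 GiB.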

Keywords: DPDK, Linux, Networking, Hugepages, C

CloudLab Tools

This repository is the building block of every CloudLab project that I have built. It contains the necessary scripts and commands to get working with CloudLab (or any remote machine). It can be used as a submodule in your experimentation repository and integrates easily while providing tools for ease of development. I made it as lightweight as possible. It provides setup scripts for the following:

  • eBPF
  • DPDK
  • Kubernetes
  • Docker
  • Go

Keywords: Automation, eBPF, Docker, DPDK, Kubernetes, Go