About me
Hi, I'm Jade! I work at SPIN under Amir Houmansadr on Privacy-Enhancing Technologies (PETs), censorship circumvention, and censorship measurement.
PACKET
PACKET stands for "Privacy And Censorship Knowledge Extraction Team". I don't run my own lab, but I'm establishing the name because it's cool :3
Publications
CensorLab: A Testbed for Censorship Experimentation
Abstract
Censorship and censorship circumvention are closely connected, and each is constantly making decisions in reaction to the other. When censors deploy a new Internet censorship technique, the anti-censorship community scrambles to find and develop circumvention strategies against the censor's new strategy, i.e., by targeting and exploiting specific vulnerabilities in the new censorship mechanism. We believe that over-reliance on such a reactive approach to circumvention has given the censors the upper hand in the censorship arms race, becoming a key reason for the inefficacy of in-the-wild circumvention systems. Therefore, we argue for a proactive approach to censorship research: the anti-censorship community should be able to proactively develop circumvention mechanisms against hypothetical or futuristic censorship strategies. To facilitate proactive censorship research, we design and implement CensorLab, a generic platform for emulating Internet censorship scenarios. CensorLab aims to complement currently reactive circumvention research by efficiently emulating past, present, and hypothetical censorship strategies in realistic network environments. Specifically, CensorLab aims to (1) support all censorship mechanisms previously or currently deployed by real-world censors; (2) support the emulation of hypothetical (not-yet-deployed) censorship strategies including advanced data-driven censorship mechanisms (e.g., ML-based traffic classifiers); (3) provide an easy-to-use platform for researchers and practitioners enabling them to perform extensive experimentation; and (4) operate efficiently with minimal overhead. We have implemented CensorLab as a fully functional, flexible, and high-performance platform, and showcase how it can be used to emulate a wide range of censorship scenarios, from traditional IP blocking and keyword filtering to hypothetical ML-based censorship mechanisms.
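Since the abstract is dense, here is a minimal Python sketch of the general idea behind emulating a keyword-filtering censor. This is emphatically not CensorLab's actual API; the blocklist and actions are illustrative assumptions:

```python
# Toy per-packet decision function for an emulated keyword-filtering censor.
# Illustrative only -- NOT CensorLab's API; keywords and actions are assumptions.

BLOCKED_KEYWORDS = [b"forbidden.example.com", b"banned-term"]  # hypothetical blocklist

def censor_decision(payload: bytes) -> str:
    """Return the emulated censor's action for a single packet payload."""
    for keyword in BLOCKED_KEYWORDS:
        if keyword in payload:
            return "INJECT_RST"  # emulate a censor that tears down matching flows
    return "FORWARD"             # unmatched traffic passes through untouched

if __name__ == "__main__":
    print(censor_decision(b"GET / HTTP/1.1\r\nHost: forbidden.example.com\r\n"))  # INJECT_RST
    print(censor_decision(b"GET / HTTP/1.1\r\nHost: allowed.example.org\r\n"))    # FORWARD
```

A real emulator would run rules like this inline on live flows and also support stateful and ML-based classifiers, which is exactly the space CensorLab is built to cover.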
ProxyGPT: Enabling Anonymous Queries in AI Chatbots with (Un)Trustworthy Browser Proxies
Abstract
AI-powered chatbots (ChatGPT, Claude, etc.) require users to create an account using their email and phone number, thereby linking their personally identifiable information to their conversational data and usage patterns. As these chatbots are increasingly being used for tasks involving sensitive information, privacy concerns have been raised about how chatbot providers handle user data. To address these concerns, we present ProxyGPT, a privacy-enhancing system that enables anonymous queries in popular chatbot platforms. ProxyGPT leverages volunteer proxies to submit user queries on their behalf, thus providing network-level anonymity for chatbot users. The system is designed to support key security properties such as content integrity via TLS-backed data provenance, end-to-end encryption, and anonymous payment, while also ensuring usability and sustainability. We provide a thorough analysis of the privacy, security, and integrity of our system and identify various future research directions, particularly in the area of private chatbot query synthesis. Our human evaluation shows that ProxyGPT offers users a greater sense of privacy compared to traditional AI chatbots, especially in scenarios where users are hesitant to share their identity with chatbot providers. Although our proof-of-concept has higher latency than popular chatbots, our human interview participants consider this to be an acceptable trade-off for anonymity. To the best of our knowledge, ProxyGPT is the first comprehensive proxy-based solution for privacy-preserving AI chatbots. Our codebase is available at https://github.com/dzungvpham/proxygpt.
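For intuition, here is a hedged sketch of the core relay step: the user hands a query to a volunteer proxy, which submits it to the chatbot so the provider only sees the proxy's network identity. This is not ProxyGPT's code; the endpoint and message fields are illustrative assumptions:

```python
# Minimal sketch of proxied query submission. NOT ProxyGPT's implementation;
# the proxy URL and JSON fields below are hypothetical.
import json
import urllib.request

PROXY_ENDPOINT = "https://proxy.example.org/relay"  # assumed volunteer-proxy URL

def submit_via_proxy(query: str) -> str:
    """Send a chatbot query through a volunteer proxy and return the reply."""
    body = json.dumps({"query": query}).encode()
    req = urllib.request.Request(
        PROXY_ENDPOINT, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:     # the proxy forwards to the chatbot
        return json.loads(resp.read())["answer"]  # assumed response field
```

The real system layers the security properties from the abstract (TLS-backed provenance, end-to-end encryption, anonymous payment) on top of this basic relay step.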
Investigating Traffic Analysis Attacks on Apple iCloud Private Relay
Abstract
The iCloud Private Relay (PR) is a new feature introduced by Apple in June 2021 that aims to enhance online privacy by protecting a subset of web traffic from both local eavesdroppers and websites that use IP-based tracking. The service is integrated into Apple’s latest operating systems and uses a two-hop architecture where a user’s web traffic is relayed through two proxies run by disjoint entities. PR’s multi-hop architecture resembles traditional anonymity systems such as Tor and mix networks. Such systems, however, are known to be susceptible to a vulnerability known as traffic analysis: an intercepting adversary (e.g., a malicious router) can attempt to compromise the privacy promises of such systems by analyzing characteristics (e.g., packet timings and sizes) of their network traffic. In particular, previous works have widely studied the susceptibility of Tor to website fingerprinting and flow correlation, two major forms of traffic analysis. In this work, we are the first to investigate the threat of traffic analysis against the recently introduced PR. First, we explore PR’s current architecture to establish a comprehensive threat model of traffic analysis attacks against PR. Second, we quantify the potential likelihood of these attacks against PR by evaluating the risks imposed by real-world AS-level adversaries through empirical measurement of Internet routes. Our evaluations show that some autonomous systems are in a particularly strong position to perform traffic analysis on a large fraction of PR traffic. Finally, having demonstrated the potential for these attacks to occur, we evaluate the performance of several flow correlation and website fingerprinting attacks over PR traffic. Our evaluations show that PR is highly vulnerable to state-of-the-art website fingerprinting and flow correlation attacks, with both attacks achieving high success rates. We hope that our study will shed light on the significance of traffic analysis to the current PR deployment, convincing Apple to perform design adjustments to alleviate the risks.
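As a worked example of what flow correlation means here, the sketch below compares inter-packet timing on the two sides of a relay. Real attacks are far more sophisticated, and the traffic in this example is synthetic:

```python
# Toy flow-correlation check: Pearson correlation of inter-packet delays
# between an ingress flow and a candidate egress flow. Illustrative only.
import numpy as np

def timing_correlation(ts_in: np.ndarray, ts_out: np.ndarray) -> float:
    """Correlate the inter-packet delays of two packet-timestamp arrays."""
    n = min(len(ts_in), len(ts_out))
    ipd_in = np.diff(ts_in[:n])    # delays on the ingress side
    ipd_out = np.diff(ts_out[:n])  # delays on the egress side
    return float(np.corrcoef(ipd_in, ipd_out)[0, 1])

# Matched flows correlate strongly even through relay jitter:
ingress = np.cumsum(np.random.exponential(0.05, 200))      # synthetic timestamps
egress = ingress + 0.02 + np.random.normal(0, 0.001, 200)  # relay delay + jitter
print(timing_correlation(ingress, egress))                 # close to 1.0
```

An AS-level adversary that sees both sides of PR traffic can run this kind of matching at scale, which is why the route measurements above matter.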
Adversarially Enhanced Traffic Obfuscation
Abstract
As the Internet becomes increasingly crucial to distributing information, Internet censorship has become more pervasive and advanced. A common way to circumvent Internet censorship is Tor, a network that provides anonymity by routing traffic through various servers around the world before it reaches its destination. However, adversaries are capable of identifying and censoring access to Tor due to identifying features in its traffic. Meek, a traffic obfuscation method, protects Tor users from censorship by hiding Tor traffic inside an HTTPS connection to a permitted host. This approach provides a defense against censors using basic deep packet inspection (DPI), but machine learning attacks using side-channel information against Meek pose a significant threat to its ability to obfuscate traffic. In this thesis, we develop a method to (1) efficiently gather reproducible packet captures from both normal HTTPS and Meek traffic, (2) aggregate statistical signatures from these packet captures, and (3) train a generative adversarial network (GAN) to minimally modify statistical signatures in a way that hinders classification. Our GAN successfully decreases the efficacy of trained classifiers, increasing their mean false positive rate (FPR) from 0.183 to 0.834 and decreasing their mean area under the precision-recall curve (PR-AUC) from 0.990 to 0.414.
Improving Meek with Adversarial Techniques
Abstract
As the internet becomes increasingly crucial to distributing information, internet censorship has become more pervasive and advanced. Tor aims to circumvent censorship, but adversaries are capable of identifying and blocking access to Tor. Meek, a traffic obfuscation method, protects Tor users from censorship by hiding traffic to the Tor network inside an HTTPS connection to a permitted host. However, machine learning attacks using side-channel information against Meek pose a significant threat to its ability to obfuscate traffic. In this work, we develop a method to efficiently gather reproducible packet captures from both normal HTTPS and Meek traffic. We then aggregate statistical signatures from these packet captures. Finally, we train a generative adversarial network (GAN) to minimally modify statistical signatures in a way that hinders classification. Our GAN successfully decreases the efficacy of trained classifiers, increasing their mean false positive rate (FPR) from 0.183 to 0.834 and decreasing their mean area under the precision-recall curve (PR-AUC) from 0.990 to 0.414.
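To illustrate the adversarial training idea shared by both versions of this work, here is a simplified sketch: a generator learns bounded perturbations of per-flow statistical signatures that push a traffic classifier toward the "normal HTTPS" label. It trains against a fixed classifier rather than running a full GAN loop, and the feature dimension, bound, and architectures are assumptions, not the thesis code:

```python
# Simplified adversarial-perturbation sketch (not the full GAN from the work).
# Dimensions, bounds, and networks are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM = 16  # assumed number of statistical features per flow signature
EPSILON = 0.1  # cap on how far each (normalized) feature may move

generator = nn.Sequential(
    nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM), nn.Tanh()
)
# Stand-in classifier; in the actual setup this would be pre-trained on
# Meek-vs-HTTPS signatures and held fixed (or trained adversarially).
classifier = nn.Sequential(nn.Linear(FEAT_DIM, 32), nn.ReLU(), nn.Linear(32, 1))

opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    feats = torch.randn(64, FEAT_DIM)               # stand-in Meek flow signatures
    perturbed = feats + EPSILON * generator(feats)  # Tanh keeps edits within +/- EPSILON
    # Train the generator so perturbed Meek flows score as normal HTTPS (label 0):
    loss = bce(classifier(perturbed), torch.zeros(64, 1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Minimally perturbing signatures this way is what drives the classifier degradation reported above (FPR 0.183 to 0.834, PR-AUC 0.990 to 0.414).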
Spring 2025 Reverse Engineering & Understanding Exploit Development - Teaching Assistant
This course was developed as a revamp of CS390R, and I took on significant responsibility this semester. My primary role was developing the Gradescope autograders that test student-submitted exploits, which posed numerous challenges; I also played major roles in assignment development and course logistics.
Spring 2024 Reverse Engineering & Vulnerability Analysis - Teaching Assistant
For this course, I was primarily tasked with grading and holding office hours.
Spring 2020 Introduction to Informatics - Teaching Assistant
For this course, I was primarily tasked with grading, holding office hours to assist students, and creating an assignment introducing students to cybersecurity.