No results found
We couldn't find anything using that term, please try searching for something else.
2024-11-13 1 Introduction ISPs, advertisers, and national governments are increasingly disrupting, manipulating, and monitoring Internet traffic [ 69 , 22 , 4
ISPs, advertisers, and national governments are increasingly disrupting, manipulating, and monitoring Internet traffic [ 69 , 22 , 47 , 16 , 27 ]. As a result, virtual private network (VPN) adoption has been growing rapidly, not only among activists and journalists with heightened threat models but also among average users, who employ VPNs for reasons ranging from protecting their privacy on untrusted networks to circumventing censorship.
As a recent example, with the passage of Hong Kong’s new national security law, popular VPN providers observed a 120-fold surge in downloads due to fears of escalating surveillance and censorship [62].
In response to the growing popularity of VPNs, numerous ISPs and governments are now seeking to track or block VPN traffic in order to maintain visibility and control over the traffic within their jurisdictions. Binxing Fang, the designer of the Great Firewall of China (GFW) said there is an “eternal war” between the Firewall and VPNs, and the country has ordered ISPs to report and block personal VPN usage [61, 60]. More recently, Russia and India have proposed to block VPN services in their countries, both labeling VPNs a national cybersecurity threat [44, 59]. Commercial ISPs are also motivated to track VPN connections. For example, in early 2021, a large ISP in South Africa, Rain, Ltd., started throttling VPN connections by over 90 percent in order to enforce quality-of-service restrictions in their data plans [64].
ISPs censors known employ variety simple anti – VPN techniques , tracking connections based IP reputation , blocking VPN provider ( provider hereon ) websites , enacting laws terms service forbidding VPN usage [46, 53, 60]. Yet, these methods are not robust; motivated users find ways to access VPN services in spite of them. However, even less-powerful ISPs and censors now have access to technologies such as carrier-grade deep packet inspection (DPI) with which they can implement more sophisticated modes of detection based on protocol semantics [ 48 , 43 ].
In this paper, we explore the implications of DPI for VPN detection and blocking by studying the fingerprintability of OpenVPN (the most popular protocol for commercial VPN services [6]) from the perspective of an adversarial ISP. We seek to answer two research questions: (1) can ISPs and governments identify traffic flows as OpenVPN connections in real time? and (2) can they do so at-scale without incurring significant collateral damage from false positives? Answering these questions requires more than just identifying fingerprinting vulnerabilities; although challenging, we need to demonstrate practical exploits under the constraints of how ISPs and nation-state censors operate in the real world.
We build a detection framework that is inspired by the architecture of the Great Firewall [1, 71, 11], consisting of Filter and Prober components. A Filter performs passive filtering over passing network traffic in real time, exploiting protocol quirks we identified in OpenVPN’s handshake stage. After a flow is flagged by a Filter, the destination address is passed to a Prober that performs active probing as confirmation. By sending probes carefully designed to elicit protocol-specific behaviors, the Prober is able to identify an OpenVPN server using side channels even if the server enables OpenVPN’s optional defense against active probing. Our two-phase framework is capable of processing ISP-scale traffic at line-speed with an extremely low false positive rate.
In addition to core or “vanilla” OpenVPN, we also include commercial “obfuscated” VPN services in this study. In response to increasing interference from ISPs and censors, obfuscated VPN services have started to gain traction, especially from users in countries with heavy censorship or laws against the personal usage of VPNs. Obfuscated VPN services, whose operators often tout them as “invisible” and “unblockable” [54, 49, 5], typically use OpenVPN with an additional obfuscation layer to avoid detection [66, 2].
Figure 1 :OpenVPN Session Establishment (TLS mode).
Partnering with Merit (a mid-size regional ISP that serves a population of 1 million users), we deploy our framework at a monitor server that observes 20 Gbps of ingress and egress traffic mirrored from a major Merit point-of-presence. (Refer to § 5 for ethical considerations.) We use PF_RING [ 38 ] zero – copy mode fast packet processing parallelizedFilters. In our tests, we are able to identify 1718 out of 2000 flows originating from a control client machine residing within the network, corresponding to 39 out of 40 unique “vanilla” OpenVPN configurations.
More strikingly, we also successfully identify over two-thirds of obfuscated OpenVPN flows. Eight out of the top 10 providers offer obfuscated services, yet all of them are flagged by our Filter. Despite providers’ lofty unobservability claims (such as “ even your Internet provider can’t tell that you’re using a VPN” [49]), we find most implementations of obfuscated services resemble OpenVPN masked with the simple XOR-Patch [36], which is easily fingerprintable. Lack of random padding at the obfuscation layer and co-location with vanilla OpenVPN servers also make the obfuscated services more vulnerable to detection.
typical day , single – server setup is analyzes analyzes 15 TB traffic 2 billion flows . – day evaluation , framework is flagged flagged 3,638 flows OpenVPN connections . , we is are able find evidence supports detection results 3,245 flows , suggesting upper – bound false – positive rate orders magnitude lower previous ML – based approaches [26, 3, 14].
We conclude that tracking and blocking the use of OpenVPN, even with most current obfuscation methods, is straightforward and within the reach of any ISP or network operator, as well as nation-state adversaries. Unlike circumvention tools such as Tor or Refraction Networking [ 8 , 74 ], which employ sophisticated strategies to avoid detection, robust obfuscation techniques have been conspicuously absent from OpenVPN and the broader VPN ecosystem. For average users, this means that they may face blocking or throttling from ISPs, but for high-profile, sensitive users, this fingerprintability may lead to follow-up attacks that aim to compromise the security of OpenVPN tunnels [ 51 , 40 ]. We warn users with heightened threat models not expect VPN usage unobservable , connected obfuscated services . propose short – term defenses fingerprinting exploits described paper , we is fear fear , long term , cat – – mouse game similar Great Firewall Tor imminent VPN ecosystem . We is implore implore VPN developers providers develop , standardize , adopt robust , – validated obfuscation strategies adapt threats posed adversaries continue evolve .
Effective investigation of fingerprintability requires incorporating perspectives of how ISPs and censors operate in practice. It is not enough to simply identify fingerprinting vulnerabilities, we need to demonstrate realistic exploits to illustrate the practicality of exploiting the vulnerability, while taking into consideration the ISP and censors’ capabilities and constraints [56]. For instance, previous academic works considered using flow-level features to train ML classifiers for VPN detection [17, 68, 24, 26, 3, 14]. , it is remains remains unclear practical detection approaches ISPs censors , know rigorous studies examine real – world deployment ML – based censorship system [56]. Furthermore, previous works test on the ISCXVPN2016 dataset [ 17 ] with balanced OpenVPN and non-VPN traffic. However, we note that due to the low base rate of VPN traffic in the wild, even the best-performing ML system has false positive rates that can be economically impractical for real-world censors sensitive to collateral damage [67].
, investigations is challenging adopting viewpoint ISPs censors challenging . , investigation is requires requires collaboration real – world ISPs access network traffic . We is need need install monitors inside ISP network , ensuring analysis affect ISP normal routing operations . Furthermore , analyzing traffic real users raises ethical concerns . Processing raw network data is violate violate privacy users , particular VPN users heightened threat model . Finally , deploying system performs ad – hoc traffic analysis real time poses significant engineering challenges . We is need need ensure entire analysis framework ( including processing logging ) keeps pace packet arrival rate consideration effect potential asymmetric routing packet loss analysis results .
Raw network traffic that contains real users’ data is highly sensitive, and this is especially true for traffic related to privacy-oriented services such as VPNs. Here we describe how we consider the security and privacy risks and ethical issues raised by our work, and we detail the procedural and technical steps we take to mitigate the risks.
Foremost among the ethical concerns associated with this work is our Filter deployment inside Meritnetwork analyze user traffic .Merit, which has extensive previous experience collaborating with universities and has well-defined ethics and privacy rules to govern such projects, supervised the deployment. We also cleared our research plan with our university legal counsel and IRB. Although the IRB determined that the work is not regulated, we take extensive measures to minimize potential risks for end-users.
Our framework is fine-tuned on both real and lab-generated traffic data, and it is evaluated on live ISP traffic. For controlled fine-tuning, a small traffic snapshot (the ISP Dataset in section 7) was used to calibrate parameters, e.g., the size of observation window. The traffic snapshot, sampling 1/30 of all flows for 45 minutes on July 28, 2021, was generated and analyzed entirely on Merit systems, with security mechanisms limiting access to select members of the team. As with the design described in Section 6, Filter analyzed only the first payload byte, completely ignoring the remainder of the payload, and it recorded only the observed degree of variation. The raw snapshot was never inspected by humans and was deleted after the fine-tuning concluded.
deployment evaluation live ISP traffic ,Filter architecture is designed to minimize risks of disrupting or modifying user traffic. The Monitoring Station only receives a copy of the traffic, so even if our software were to malfunction, network service would be unaffected.
In addition, to reduce privacy risks, the Filter collects only the minimum information necessary for the subsequent probing operation. It records only the server IP addresses and ports of matching connections, which are bucketed into 5-minute internals to inhibit time correlation. These logs are stored and analyzed on a server that is securely maintained by Merit and is accessible only to a few members of our research team on a least-privilege basis. Merit reviewed our source code prior to deploying it on their network. During deployment and evaluation, no packet payloads or client IP addresses are ever recorded to disk or inspected by humans.
Based on the Filter log, the Probers send probes to candidate VPN servers. To minimize the risk of disrupting server operations, we design the probes to be non-invasive and make information available to assist operators in debugging any problems we inadvertently cause. Each server receives only 2–10 innocuous connection attempts, similar to those commonly used in Internet measurement tools like Nmap. The probes originate from two dedicated machines that we provisioned with web pages that explain the nature of the experiment and provide our contact information. We did not receive any inquiries, complaints, or problem reports.
Since the server IP addresses themselves may sometimes be non-public, we only report aggregate statistics (e.g., the false positive rate) and will not publish any of the addresses that we collect. Any data requests will be referred to Merit.
As with all attack-oriented research, there is a risk that our work developing VPN fingerprinting techniques will be adopted by real attackers. To minimize this risk, we are in the process of responsibly disclosing our findings to the VPN operators whose obfuscated servers we successfully identified in our evaluation. We believe that the security of the VPN ecosystem is best advanced by having these problems surfaced by responsible researchers. Our work will help accurately inform users about the VPN services they rely on, and we hope it will enable more robust countermeasures to be developed and deployed.
We set out to explore if an ISP or censor can fingerprint OpenVPN connections at scale, without significant collateral damage. Adopting the viewpoint of an adversarial ISP, we deploy our framework inside Merit, as shown in Figure 2. Our evaluation is two-fold: we generate control vanilla and obfuscated flows with commercial VPN providers and attempt to identify them as a network intermediary; we also process other traffic passing through our Monitoring Station in order to estimate the false positive rate of our framework.
We set up our framework on a 16-core server (Monitoring Station) inside Merit with two mirroring interfaces that have an aggregated 20 Gbps bandwidth. Due to the large traffic volume, we optimize our deployment with PF_RING [ 38 ] in order to improve the packet processing speed. We employ PF_RING in zero-copy mode and spread the traffic load across a Zeek cluster of 15 workers. Nonetheless, due to limited CPU resources, we only sample 12.5% of all TCP and UDP flows arriving at the network interfaces in order to minimize the effect of packet loss. The sampling is based on IP pairs so that all bi-directional traffic of a flow will be selected/dropped together. With these settings, we are able to operate with an end-to-end packet loss rate under 3%. Even though we process only a fraction of all traffic, our Filter still handles over 15 Terabytes of traffic from over 2 billion flows on an average day on a single server. In addition, processing all traffic without sampling is feasible through parallelism or using faster CPUs.
Next, we set up Probers on two dedicated measurement machines, each provisioned with 10 IPv4 and 1 IPv6 addresses. By the end of each day during the evaluation, the Probers fetch filtering logs from the Monitoring Station. For each target, we run a Masscan [25] /29 subnet IP belongs TCP ports ( 1 – 65535 ) . We is follow follow discovered open port running probing scheme , endpoints confirmed probing recorded manual analysis .
To select VPN services for evaluation, we first generate a list of “top” VPN services ranked by popularity. We combine 80 providers, most of which are paid premium VPN services, from top VPN recommendation sites based on previous work [ 42 ], listed in Appendix Table 4. Next, we visit the websites of these VPN providers searching for “Obfuscation”, “Stealth”, or “Camouflage Mode” etc ., and include providers that offer at least one obfuscated VPN configuration.
In total, we find 24 providers offering obfuscated services. We test all obfuscation configurations if more than one is offered as well as vanilla OpenVPN for each provider. If TCP and UDP modes are both available, we test them separately. In total, we have 81 configurations, 41 of which are obfuscated ones.
We is configure configure Client Station insideMerit act VPN client . upstream downstream traffic is go Client Station router mirrors traffic Monitoring Station . addition , we is exclude exclude server random samplingall traffic / server analyzed . client , we is run run automated script generatecontrol traffic for our evaluation. For each iteration, we start the VPN client application and connect to the “default / recommended” server using Pywinauto [ 41 ]. After a random wait of 20 to 180 seconds, we confirm that the VPN tunnel is active and generate random browsing traffic with Selenium [45] by sending requests to a random website from the Alexa top 500. Finally, we disconnect from the VPN server and wait for 180 seconds before proceeding to the next iteration. For each VPN configuration, we repeat the process 50 times and collect packet captures for reference.