April 10, 2015
This post describes our analysis of China’s “Great Cannon,” our term for an attack tool that we identify as separate from, but co-located with, the Great Firewall of China. The first known usage of the Great Cannon is in the recent large-scale novel DDoS attack on both GitHub and servers used by GreatFire.org.
Authors: Bill Marczak1,2,3 (Lead), Nicholas Weaver2,3 (Lead), Jakub Dalek,1 Roya Ensafi,4 David Fifield,2 Sarah McKune,1 Arn Rey, John Scott-Railton,1 Ronald Deibert,1 Vern Paxson.2,3
Affiliations: (1) Citizen Lab, Munk School of Global Affairs, University of Toronto; (2) International Computer Science Institute; (3) University of California, Berkeley; (4) Princeton University
Section 1: Introduction, Key Findings
On March 16, GreatFire.org observed that servers they had rented to make blocked websites accessible in China were being targeted by a Distributed Denial of Service (DDoS) attack. On March 26, two GitHub pages run by GreatFire.org also came under the same type of attack. Both attacks appear targeted at services designed to circumvent Chinese censorship. A report released by GreatFire.org fingered malicious Javascript returned by Baidu servers as the source of the attack.1 Baidu denied that their servers were compromised.2
Several previous technical reports3 have suggested that the Great Firewall of China orchestrated these attacks by injecting malicious Javascript into Baidu connections. This post describes our analysis of the attack, which we were able to observe until April 8, 2015.
We show that, while the attack infrastructure is co-located with the Great Firewall, the attack was carried out by a separate offensive system, with different capabilities and design, that we term the “Great Cannon.” The Great Cannon is not simply an extension of the Great Firewall, but a distinct attack tool that hijacks traffic to (or presumably from) individual IP addresses, and can arbitrarily replace unencrypted content as a man-in-the-middle.
The operational deployment of the Great Cannon represents a significant escalation in state-level information control: the normalization of widespread use of an attack tool to enforce censorship by weaponizing users. Specifically, the Cannon manipulates the traffic of “bystander” systems outside China, silently programming their browsers to create a massive DDoS attack. While employed for a highly visible attack in this case, the Great Cannon clearly has the capability for use in a manner similar to the NSA’s QUANTUM system,4 affording China the opportunity to deliver exploits targeting any foreign computer that communicates with any China-based website not fully utilizing HTTPS.
Report Structure
We organize our Report as follows:
Section 2 locates and characterizes the Great Cannon as a separate system;
Section 3 analyzes DDoS logs and characterizes the distribution of affected systems;
Section 4 presents our attribution of the Great Cannon to the Government of China;
Section 5 addresses the policy context and implications;
Section 6 addresses the possibility of using the Great Cannon for targeted exploitation of individual users.
Section 2: The Firewall & The Cannon: Separate Systems, Significant Similarities
Figure 1. Simplified logical topology of the Great Cannon and Great Firewall
In general, a firewall serves as an in-path barrier between two networks: all traffic between the networks must flow through the firewall. In contrast, an on-path system like the Chinese “Great Firewall” (GFW) sits off to the side: it eavesdrops on traffic between China and the rest of the world (TAP in Figure 1), and terminates requests for banned content (for example, upon seeing a request for “http://www.google.com/?falun”,5 regardless of actual destination server) by injecting a series of forged TCP Reset (RST) packets that tell both the requester and the destination to stop communicating (INJECT RST in Figure 1).6 On-path systems have architectural advantages for censorship, but are less flexible and stealthy than in-path systems as attack tools, because while they can inject additional packets, they cannot prevent in-flight packets (packets that have already been sent) from reaching their destination.7 Thus, one generally can identify the presence of an on-path system by observing anomalies resulting from the presence of both injected and legitimate traffic.8
The GFW keeps track of connections and reassembles the packets (“TCP bytestream reassembly”) to determine if it should block traffic. This reassembly process requires additional computational resources, as opposed to considering each packet in isolation, but allows better accuracy in blocking. While a web request often fits within a single packet, web replies may be split across several packets, and the GFW needs to reassemble these packets to understand whether a web reply contains banned content.
On any given physical link (e.g., a fiber optic cable), the GFW runs its reassembly and censorship logic in multiple parallel processes9 (perhaps running on a cluster of many different computers). Each process handles a subset of the link’s connections, with all packets on a connection going to the same process. This load-balanced architecture reflects a common design decision when a physical link carries more traffic than a single computer can track. Each GFW process also exhibits a highly distinctive side-channel — it maintains a counter, and numbers the forged TCP Reset packets it injects (via the value of the IP TTL field).
The Great Cannon (GC) differs from the GFW: as we will show, the GC is an in-path system, capable of not only injecting traffic but also directly suppressing traffic, acting as a full “man-in-the-middle” for targeted flows. The GC does not actively examine all traffic on the link, but only intercepts traffic to (or presumably from) a set of targeted addresses. It is plausible that this reduction of the full traffic stream to just a (likely small) set of addresses significantly aids with enabling the system to keep up with the very high volume of traffic: the GC’s full processing pipeline only has to operate on the much smaller stream of traffic to or from the targeted addresses. In addition, in contrast to the GFW, the GC only examines individual packets in determining whether to take action, which avoids the computational costs of TCP bytestream reassembly. The GC also maintains a flow cache of connections that it uses to ignore recent connections it has deemed no longer requiring examination.
The GC however also shares several features with the GFW. Like the GFW, the GC is also a multi-process cluster, with different source IP addresses handled by distinct processes. The packets injected by the GC also have the same peculiar TTL side-channel as those injected by the GFW, suggesting that both the GFW and the GC likely share some common code. Taken together, this suggests that although the GC and GFW are independent systems with different functionality, there are significant structural relationships between the two.
In the attack on GitHub and GreatFire.org, the GC intercepted traffic sent to Baidu infrastructure servers that host commonly used analytics, social, or advertising scripts. If the GC saw a request for certain Javascript files on one of these servers, it appeared to probabilistically take one of two actions: it either passed the request onto Baidu’s servers unmolested (roughly 98.25% of the time), or it dropped the request before it reached Baidu and instead sent a malicious script back to the requesting user (roughly 1.75% of the time). In this case, the requesting user is an individual outside China browsing a website making use of a Baidu infrastructure server (e.g., a website with ads served by Baidu’s ad network). The malicious script enlisted the requesting user as an unwitting participant in the DDoS attack against GreatFire.org and GitHub.