Friday, August 30, 2013

Does CSE receive cash from NSA?

Among the recent Edward Snowden revelations was the news that Britain's SIGINT agency, GCHQ, has received approximately £100 million ($150 million U.S.) from the NSA over the last three years ("US paid £15.5m for upgrade at Cornwall spy base," Western Morning News, 2 August 2013).

The cash payments were intended at least in part to support GCHQ's recent cable-tapping operations, through which the UKUSA allies gain access to major sources of Internet and telephone traffic. Subsidies of this kind have long been alleged within the UKUSA community, however.

Desmond Ball and Jeffrey Richelson's groundbreaking book The Ties That Bind: Intelligence Cooperation Between the UKUSA Countries (Allen & Unwin, 1985), for instance, reported that "the US subsidizes, to a great extent, both British and Australian signals intelligence activities and, presumably, those of Canada as well" (p. 8).

Canada's geographic location makes it much less important than the U.K. as a possible location for intercepting cable traffic (although presumably some such interception is done). But it is possible that some CSE activities are receiving or have received support from the NSA in return for meeting various U.S. needs.

The Canadian government has never acknowledged receiving such support, but it may be significant that a specific account exists within the government's Financial Reporting Accounts to record payments that CSE receives from foreign governments. According to Public Works and Government Services Canada, this account, no. 23415, "is used by Communications Security Establishment to record funds received from foreign governments, to cover expenditures to be made on their behalf, in accordance with the provisions of agreements with the Government of Canada."

Thursday, August 29, 2013

Capacity of the Utah Data Center

The Utah Data Center (UDC), a one-million-square-foot data storage warehouse being built by the NSA at Camp Williams, Utah, is scheduled for completion next month. Estimates of the storage capacity of this facility range from as high as a yottabyte or more of data, enough to store this year’s global Internet traffic more than one thousand times over, to as low as three exabytes. However, an assessment based on the facility’s size and projected power consumption suggests that its initial capacity is likely to be around 7 to 10 exabytes. A 7- to 10-exabyte data warehouse would be an extremely large facility, but even at 10 exabytes it would be only one hundred-thousandth the size of a yottabyte facility.

How big is the Utah Data Center?

The NSA has not released any information about the data storage capacity of the UDC. But it has released some details about its size and physical infrastructure.

According to the NSA, the facility will contain four data halls with a total of 100,000 square feet of raised floor. Other structures totaling approximately 900,000 square feet will host support functions such as materials storage, administration, cooling, and backup power.

The NSA has also reported that the facility is being built with the capability to deliver 65 megawatts “technical load to the raised floor”. This figure has often been taken to represent the total electrical power consumption of the entire facility, but it is likely that this careful wording refers only to the power delivered to the IT equipment in the data halls. Data centers have other power needs, most significantly with respect to providing cooling to the IT equipment.

The scale of the cooling requirements at the UDC can be gleaned from the projected water consumption at the facility. According to the NSA, the UDC’s water usage at full load will be “approx. 1.7 million gal/day”. Cooling towers lose water, principally through evaporation, as part of their normal operation. Typically, about two gallons of water are lost per hour to provide one “ton” of cooling. This suggests that the cooling towers at the UDC will have a cooling capacity of about 36,000 tons. Photos of the facility show that 36 cooling tower cells, each with a 12-foot fan, have been constructed (18 at each end of the complex), which suggests that each cell can provide about 1000 tons of cooling. Another estimate puts the cooling capacity at the facility at 60,000 tons, but this does not seem likely unless additional, alternative cooling technologies that do not require water consumption are also used.

One ton of cooling typically also consumes about 0.6 kW of electrical power, which suggests that the UDC will require approximately 22 MW for cooling when running at its full 36,000-ton load (or 36 MW if 60,000 tons is accurate). Assuming the lower figure is correct, the total power consumption (IT, cooling, and other loads) of the facility at peak is likely to be around 90–95 MW.
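For readers who want to check the arithmetic, the cooling and power estimates above can be reproduced in a few lines of Python. (This is a back-of-the-envelope sketch: the gallons-per-ton and kW-per-ton figures are the generic rules of thumb cited above, not UDC-specific data.)

```python
# Rough reconstruction of the UDC cooling and power estimates.

GAL_PER_DAY = 1_700_000     # NSA's projected water usage at full load
GAL_PER_TON_HOUR = 2        # typical cooling-tower water loss per "ton" of cooling
KW_PER_TON = 0.6            # typical electrical draw per ton of cooling
IT_LOAD_MW = 65             # NSA's stated "technical load to the raised floor"

tons = GAL_PER_DAY / 24 / GAL_PER_TON_HOUR      # ~35,400 tons, i.e. about 36,000
cooling_mw = tons * KW_PER_TON / 1000           # ~21 MW
total_mw = IT_LOAD_MW + cooling_mw              # ~86 MW before other site loads

print(f"{tons:,.0f} tons of cooling, {cooling_mw:.0f} MW for cooling, "
      f"{total_mw:.0f} MW IT + cooling")
```

Adding the facility's remaining loads (lighting, administration, distribution losses) to the roughly 86 MW of IT and cooling power is what brings the peak total into the 90–95 MW range.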

Not all of this power will be needed immediately, however. According to the NSA, the UDC will require only 30 MW in its data halls when it first opens, which suggests an initial total power requirement of around 40 MW at the facility. This lower initial power requirement probably relates to the initial rate at which data will be delivered to the facility and accessed for processing by NSA. It would not make sense to run, or even to install, the facility’s full potential capacity months or years before it needs to be utilized. It will probably be a couple of years before the NSA needs to run the UDC at or near its full power.

Moore’s Law

And even at that point the facility is unlikely to be “full”. As long as storage technology continues to advance, the replacement of older storage drives with newer generations will enable the capacity of the UDC to continue to increase. Hard drive storage capacity per unit cost has grown by a factor of 10 every 4.2 years for the last three decades, and storage capacity per Watt has grown at a similar rate. If these growth rates can be sustained, the capacity of the UDC could reach the yottabyte range—an almost unthinkably huge amount of data today—in as little as 20 years’ time.
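The 20-year figure follows directly from the historical growth rate. A quick sketch in Python (assuming, hypothetically, a 10-exabyte starting capacity and sustained 10x-per-4.2-years growth):

```python
import math

YEARS_PER_10X = 4.2     # historical growth rate of storage per unit cost/power
start_eb = 10           # assumed initial UDC capacity, in exabytes
target_eb = 1e6         # one yottabyte = 1,000,000 exabytes

# Number of factor-of-ten steps needed, times the years per step.
years = math.log10(target_eb / start_eb) * YEARS_PER_10X
print(f"~{years:.0f} years from 10 EB to a yottabyte")  # ~21 years
```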

On-going growth in storage capacity does appear to be the NSA’s intent. As the Salt Lake Tribune reported in June 2013, NSA officials will not provide “exact numbers on how much data the NSA is preparing to store at Bluffdale,” but they will confirm that “they built the center’s capacity with an eye on Moore’s law”.

How much data can the UDC hold now?

Assuming that the UDC will operate roughly at, but not significantly beyond, the best level of technology available to large commercial data centers, the question of how much data can be stored at the facility can probably be answered within a reasonable range of accuracy by looking at comparable commercial large-scale storage capabilities.

In June 2013 the Internet Archive’s Brewster Kahle estimated that current cloud storage typically uses about 5 kW per petabyte, or 5 MW per exabyte (IT load only). He later reduced this estimate to about 4.2 MW per exabyte (IT load only). Kahle’s newer estimate suggests that the initial storage capacity of the UDC may be about 7.2 exabytes, or perhaps as high as 15 exabytes if the site’s full projected power consumption is used as the basis of calculation. Based on the size of its data halls and perhaps also his power consumption estimates, Kahle himself estimated a 12-exabyte capacity for the data center.
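Kahle's power-per-exabyte rule of thumb translates into capacity estimates as follows (a sketch using only the figures quoted above):

```python
MW_PER_EB = 4.2        # Kahle's revised estimate, IT load only
OPENING_IT_MW = 30     # NSA's stated data-hall load at opening
FULL_IT_MW = 65        # full "technical load to the raised floor"

opening_eb = OPENING_IT_MW / MW_PER_EB   # ~7.1 EB at opening
full_eb = FULL_IT_MW / MW_PER_EB         # ~15.5 EB at full power
print(f"~{opening_eb:.1f} EB at opening, ~{full_eb:.1f} EB at full power")
```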

In September 2011, Chris Gladwin of Cleversafe Inc. projected that a 10-exabyte storage facility built in 2014 or 2015 would use 10-TB drives and would require only 11 MW of power, or about 1.1 MW per exabyte (IT load only). Rolling Gladwin’s projection back towards the present suggests that a 10-exabyte system built with today’s technology (4-TB drives) would require around 27.5 MW, which is about the amount of power the NSA expects to be using when the UDC opens.
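The rollback is a simple proportionality argument: the same 10 exabytes on drives two-and-a-half times smaller needs two and a half times as many spindles, and roughly two and a half times the power. In Python:

```python
PROJECTED_MW = 11       # Gladwin: a 10-EB facility built in 2014-15 on 10-TB drives
FUTURE_DRIVE_TB = 10
TODAY_DRIVE_TB = 4      # largest drives commonly available in 2013

# More, smaller drives for the same capacity -> proportionally more power.
today_mw = PROJECTED_MW * FUTURE_DRIVE_TB / TODAY_DRIVE_TB
print(f"~{today_mw:.1f} MW for 10 EB on 4-TB drives")  # 27.5 MW
```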

Gladwin was recently quoted by Forbes as saying that a facility capable of storing 10 exabytes would have required about two million square feet in January 2012 but would need only one million square feet now, which also suggests that 10 exabytes may be a good estimate of the current storage capacity of the one-million-square-foot UDC.

The overall size of a facility is at best a very crude measure of its storage capacity, however. A more promising approach to the question might be to compare the amount of raised-floor space available in the facility’s data halls to the physical space required to house a given level of storage capacity with today’s technology.

Disk drives and their associated servers are commonly installed in 19”-wide equipment racks. An efficiently laid out IT room can require as little as 25–30 square feet per rack, including aisles and other space for accessing the equipment, which suggests that the UDC’s data halls could hold as many as 3,300–4,000 racks in total. The amount of storage that can be accommodated in a single rack is constantly growing, but numbers in excess of 2 petabytes are already feasible. The manufacturer Xyratex Ltd., for example, announced in 2012 that its products could accommodate up to 2.5 petabytes per rack. At that density, the UDC may be capable of storing as much as 8.3–10 exabytes of data, about the same as the capacity suggested by the facility’s power consumption numbers.
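The floor-space arithmetic can be sketched the same way, using the 25–30 square feet per rack and 2.5 petabytes per rack figures cited above:

```python
RAISED_FLOOR_SQFT = 100_000   # total raised floor in the four data halls
PB_PER_RACK = 2.5             # Xyratex's 2012 per-rack figure

for sqft_per_rack in (30, 25):  # efficient layouts, including aisle space
    racks = RAISED_FLOOR_SQFT / sqft_per_rack
    capacity_eb = racks * PB_PER_RACK / 1000
    print(f"{racks:,.0f} racks at {sqft_per_rack} sq ft/rack -> {capacity_eb:.1f} EB")
```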

At least one expert believes that storage of this scale is not yet feasible at the facility. Paul Vixie, also quoted in the Forbes article, estimates that there will be “less than three exabytes of data capacity” at the UDC when it opens. However, the facility’s projected power consumption suggests that Vixie’s estimate is likely to be low. If it were correct, it would mean that even at its low initial power consumption, the UDC would require more than 10 MW of power for every exabyte of data stored (and closer to 15 MW once cooling requirements are included). This seems implausibly inefficient for an all-new facility that is seeking LEED Silver certification.

Thus, the most plausible range for the UDC’s initial storage capacity seems likely to be roughly 7 to 10 exabytes.

What about tape storage?

The numbers used up to this point assume that all of the data stored at the UDC will be stored on disk. This will undoubtedly be the case for data that the NSA needs to have instantaneous access to, such as the metadata used for data mining or indexing intercepted communications. But for cost reasons the agency may prefer to store data that does not require rapid or frequent access on tapes that are loaded into a tape drive only when required. According to a recent Clipper Group study, the purchase cost of storage tapes, tape drives, and automated tape library systems is significantly lower than the cost of comparable disk storage systems. Furthermore, for data that is rarely accessed, the energy costs of tape storage can be as low as 1% of those for disk storage.

The Clipper Group study estimated that tape storage uses roughly one-quarter of the data hall space required for the same amount of disk storage. If, for the sake of analysis, we were to assume that the UDC dedicated half of its data hall space to tape storage, we might expect the total storage capacity at the facility to be around 2.5 times the amount that could be provided solely using disk systems. If the amount of disk-based storage that can currently be accommodated in half of the UDC’s data hall space is around 5 exabytes, for example, the expected overall storage capacity of the UDC would be around 25 exabytes.
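Here is that hypothetical half-and-half split worked through in Python (the 50% disk share is purely an assumption for the sake of analysis):

```python
ALL_DISK_EB = 10            # estimated capacity if the data halls held only disk
TAPE_DENSITY_FACTOR = 4     # tape needs ~1/4 the floor space per byte (Clipper Group)
DISK_FLOOR_SHARE = 0.5      # assumed fraction of floor space devoted to disk

disk_eb = ALL_DISK_EB * DISK_FLOOR_SHARE
tape_eb = ALL_DISK_EB * (1 - DISK_FLOOR_SHARE) * TAPE_DENSITY_FACTOR
total_eb = disk_eb + tape_eb
print(f"{disk_eb} EB disk + {tape_eb} EB tape = {total_eb} EB total")  # 25.0 EB
```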

This number is probably too high, however. There are probably significant limits to the proportion of its data that the NSA is willing to consign to tape, and the extensive power and cooling distribution systems constructed at the UDC, together with its large projected power consumption, argue against the possibility that tape will displace disk-based systems for a large proportion of the site’s storage. Thus, although it is possible that the NSA has opted to include the use of tape at the UDC, tape storage seems unlikely to significantly exceed the scale of disk storage at the site. Allowing for the possibility of tape storage does mean, however, that the initial storage capacity of the site could plausibly be as high as 15–20 exabytes.

NSA whistleblower William Binney provided a somewhat similar estimate of the facility’s size in a September 2012 legal declaration, stating that the facility is likely to store “in the range of multiples of ten exebytes [sic]” of data. (Binney has also been associated with much larger estimates of the facility’s size, but it seems likely that he was referring to its possible future growth in those comments.)

What will NSA store at the UDC?

As the revelations from another whistleblower, Edward Snowden, have confirmed, the torrent of information flowing through the global Internet is now NSA’s largest target. The 7–10 exabytes (or more) that NSA may soon be able to store at the UDC is a very large amount of data, but it is minuscule in comparison to the total amount of information flowing through the Internet, which according to an estimate recently cited by the NSA is nearly 670 exabytes per year. Even 20 exabytes represents only 3% of that gargantuan annual total.

But 7–10 exabytes is not minuscule in comparison to the amount of Internet data that the same NSA document says the agency “touches” (i.e., pulls out of the data streams flowing past) per year. According to that document, NSA currently processes in one form or another about 29 petabytes of Internet data per day—or about 10.6 exabytes per year. (A much smaller proportion of that processed data is actually viewed by analysts, of course.)

And that 10.6 exabytes of data, although it represents only 1.6% of the total data flow, probably comprises a fairly significant proportion of the Internet data that NSA is actually interested in monitoring and is capable of accessing.

Even if it were capable of doing so, NSA would have no interest in copying everything transmitted through the Internet. It is estimated, for example, that various forms of video (movies, TV shows, YouTube) constitute roughly 85% of all Internet traffic. No intelligence agency needs to store a separate, complete copy of Batman Returns every time someone streams the movie to her home. Music and online gaming, which are also responsible for large data flows, would also be of little interest.

Similarly, although NSA might want to record the web-browsing histories of individuals, it would not need to record a separate, complete copy of the front page of The New York Times every time one of that paper’s millions of readers downloaded it. One copy of each webpage, updated whenever the page was changed, would be enough to record its content. Metadata documenting the webpages visited is all that would be needed for most individual files.

The main data that the NSA would be interested in recording is original, user-generated content. Authoritative numbers for such traffic are hard to come by, but it is safe to conclude that original voice, text, chat, e-mail, and document traffic comprise only a small proportion of Internet data flows.

Cisco Systems estimates that global consumer Voice Over IP (VOIP) traffic, for example, currently accounts for approximately 159 petabytes per month [Cisco Visual Networking Index: Forecast and Methodology, 2011–2016, Cisco White Paper, 30 May 2012], which would total about 1.9 exabytes per year if it could all be accessed.

E-mail volumes are more difficult to estimate, but an indication of the scale of this traffic can be found in a recent report by the market research company Radicati Group, which estimated that 183 billion e-mails are sent every day by the world’s 2.4 billion e-mail users. Since this amounts to 76 e-mails per user per day, this total must include spam and other multiple-recipient traffic. If we generously assume that 10% of e-mails are original texts written by the sender (Symantec recently estimated that approximately 70% of e-mail is spam, and mailing lists and other forms of duplicated mail must account for a substantial part of the remainder), the maximum number of e-mails that NSA might wish to store, assuming it could access them all, would probably be no more than 7 trillion per year. If the average size of these e-mails, excluding attachments, is approximately 75 kilobytes (Microsoft implied in this graphic that the 150 petabytes of stored e-mail that it migrated from Hotmail to in 2013 was equivalent to about 2.2 trillion e-mails), the total volume it might wish to store would be something like 0.5 exabytes, assuming no further compression of the data were possible. Storing original attachments as well would of course raise this number substantially, perhaps adding several exabytes to the total.
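The e-mail estimate compounds several guesses, so it is worth laying them out explicitly. A sketch in Python, with each assumption labelled:

```python
EMAILS_PER_DAY = 183e9       # Radicati Group estimate
ORIGINAL_FRACTION = 0.10     # generous guess: non-spam, non-duplicated mail
AVG_SIZE_KB = 75             # approximate size excluding attachments

originals_per_year = EMAILS_PER_DAY * ORIGINAL_FRACTION * 365
exabytes = originals_per_year * AVG_SIZE_KB * 1000 / 1e18
print(f"~{originals_per_year / 1e12:.1f} trillion e-mails/yr, ~{exabytes:.2f} EB")
```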

Tweets and other forms of texting would take much less space. More than 400 million tweets are sent every day. But storing a tweet probably takes less than one kilobyte, even when associated metadata is included, so a year’s worth of the world’s tweets would probably take less than 150 terabytes to store. Chat and instant messaging texts presumably pose similar storage requirements.

These statistics suggest that the 10 or so exabytes of Internet traffic currently being processed by the NSA may contain a large proportion of the original, user-generated content being transmitted over the part of the Internet to which NSA and its allies have access. At the moment, the vast majority of that data is reportedly deleted after three days, presumably for lack of space to store it. Completion of the UDC will give the NSA the option to save most of that data indefinitely.

The Internet is not the only source of data that the NSA might wish to save, of course. Non-Internet telephony intercepts, to name perhaps the most important example, might also run into the exabyte range. An indication of the scale of this traffic can be gleaned from the report How Much Information?, which estimated that the total data content of all U.S. domestic phone calls (voice only) in 2008 was 1.36 exabytes. This figure counted each conversation twice, however: once for each participant (see footnote 45 on page 36 of the report). Storing each phone call once—a total of 39 billion hours of phone conversation—would thus take only 680 petabytes. Furthermore, if all of the calls were stored with the quality and compression of cell phone speech, it might be possible to reduce this number to approximately 180 petabytes.

It does not appear likely that the NSA is currently storing U.S. domestic phone calls. (Only the storage of phone call metadata has been confirmed.) But these statistics give a sense of the scale of the storage task posed by the world’s telephone communications. Assuming that the global per capita use of voice telephony is no greater than that of the United States, the storage space required to save all the phone calls made by everyone in the world could be as little as 3.6 exabytes per year.
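The telephony arithmetic, restated in Python (the 20-to-1 world-to-U.S. population ratio is my rough assumption for 2008):

```python
US_CALLS_EB_DOUBLE_COUNTED = 1.36   # all 2008 U.S. calls, each counted twice
CELL_QUALITY_PB = 180               # same calls at cell-phone quality/compression
WORLD_TO_US_POPULATION = 20         # rough 2008 ratio (assumption)

single_copy_pb = US_CALLS_EB_DOUBLE_COUNTED / 2 * 1000   # 680 PB
world_eb = CELL_QUALITY_PB * WORLD_TO_US_POPULATION / 1000
print(f"{single_copy_pb:.0f} PB (one copy per U.S. call), "
      f"~{world_eb:.1f} EB/yr worldwide at cell quality")
```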

The Swedish telecommunications company Ericsson recently estimated that worldwide mobile voice traffic comprises about 180 petabytes per month, or roughly 2.2 exabytes per year. Addition of landline traffic (not estimated by Ericsson) would make this number significantly larger, but it probably wouldn’t double it, suggesting that a 3.6-exabyte estimate for all non-Internet voice traffic may well be in the right ballpark.

These examples are not meant to be exhaustive. Telephony and Internet-based communications are not the only forms of data/metadata that NSA might wish to target for acquisition and storage. But these examples do demonstrate that the data streams generated by human communications and online activities are not impossibly large: it is becoming feasible for spy agencies with big budgets to store much of the voice and text data that humans generate.

With an initial capacity probably in the 7- to 10-exabyte range, the Utah Data Center is probably not large enough to store even a single year’s worth of the Internet data, telephony, and other information currently being processed (and mostly discarded) by NSA. But it may well have the room to store the material that NSA deems to be of potential current or future interest.

And within just a few years, if Moore’s Law holds, the UDC’s storage capacity will be large enough—and be growing fast enough—to accommodate the entire ten or more exabytes of Internet data acquired every year and all of the additional telephony, purloined computer files, and other data that the NSA obtains and might want to save. At that point the most significant limitation on the amount of data being stored at the UDC may well lie in the NSA’s ability to access the data it seeks and the capacity of its communications circuits to haul it all back to Utah from the various intercept points around the world.

[Update 11:00 pm: Corrected two instances where I wrote "terabytes" instead of "petabytes". Thanks for catching that, Brewster!]

Friday, August 23, 2013

CSE Commissioner's report raises legality questions

Outgoing CSE Commissioner Robert Decary's 2012-2013 annual report has raised questions about the legality of some of CSE's monitoring activities involving Canadians.

The Commissioner does not state that the activities were unlawful, but he does declare that because of incomplete records, he is not able to determine whether the activities broke the law or not. The report was tabled in Parliament by Defence Minister Rob Nicholson on Wednesday, while Parliament is conveniently not in session.

More details here (Lee Berthiaume & Jason Fekete, "Canadians may be victims of illicit spying," Postmedia News, 22 August 2013):
Canada’s super-secret electronic spy agency may have illegally targeted Canadians over the past year, a government watchdog has concluded.

The findings, contained in a report tabled by retired judge Robert Decary in Parliament Wednesday, are particularly explosive now given revelations prompted by whistleblower Edward Snowden about the U.S. government conducting widespread snooping of its citizens.

Decary, who has served as independent watchdog for the Communications Security Establishment Canada (CSEC) since 2010, said he discovered the potentially illicit spying during a routine review of the electronic surveillance agency’s activities over the past year.

“A small number of records suggested the possibility that some activities may have been directed at Canadians, contrary to the law,” Decary wrote in his report.

But Decary said he was unable to determine conclusively whether the snooping was legal or not because “a number of CSEC records relating to these activities were unclear or incomplete.”

“After (an) in-depth and lengthy review, I was unable to reach a definitive conclusion about compliance or non-compliance with the law.” ...

Decary has also completed a study into whether CSEC has pressed its American, British, Australian and New Zealand spy agency counterparts to respect long-standing promises not to snoop on Canadians.

That could shed light on what Canadian authorities knew about a massive telephone and Internet surveillance program in the U.S. called Prism.

However, it was not included in his report Wednesday because of an administrative error. ...

Decary also slammed the Conservative government for dragging its heels on implementing what he says are badly needed changes to the National Defence Act that will fix ambiguities in the legislation.

Following the Sept. 11, 2001 terrorist attacks in the United States, the federal government adopted the Anti-terrorism Act, which amended the National Defence Act and created legislative frameworks for both the commissioner and CSEC.

Repeated CSEC watchdogs have said clarification is needed to terms and definitions related to CSEC’s legislated authority, which would assist them in interpreting CSEC’s mandate and reviewing how it is applied.

“I started my mandate with the expectation that the legislative amendments to the National Defence Act proposed by my predecessors would soon be introduced in Parliament, but this has yet to happen,” Decary wrote in his report.

“I am deeply disappointed at the lack of action by the government, which is no longer in a minority situation, to address the ambiguities identified by my predecessors and myself.

“These amendments — as I have said many times before — would improve the provisions that were hastily enacted in the aftermath of September 11, 2001. The proposals to address the issues raised by commissioners should not, in my opinion, be controversial.”
CSE, unsurprisingly, assures us that no laws have been broken (Douglas Quan, "In wake of spying allegations, Communications Security Establishment Canada insists it didn’t break law," Postmedia News, 22 August 2013):
Ryan Foreman, a CSEC spokesman, said Thursday that the records in question dated back to the early 2000s and were related to spy activities directed at a “remote foreign location.”

“This conclusion does not indicate that CSEC has acted unlawfully,” Foreman said. “It indicates that certain material upon which the commissioner would have relied for his assessment was incomplete or not available for a number of reasons.”

Foreman said CSEC has since upgraded several of its systems to store and retain information better.
We are also, of course, given the misleading boilerplate assurances:
The agency is forbidden from spying on Canadians no matter where they are in the world. It is also prohibited from eavesdropping on individuals within Canada.

“CSEC respects this prohibition,” Foreman said.

Julie Di Mambro, Nicholson’s spokeswoman, echoed that statement, saying in an email that the privacy of Canadians is of “utmost importance.”

“CSEC is prohibited by law from directing its activities at Canadians anywhere in the world or at any person in Canada,” she said.
Everyone please repeat after me.

CSE has a three-part mandate. When acting under the first two parts of this mandate the agency is forbidden from directing its activities at Canadians and persons in Canada. But when it is acting under the third part, provision of support to security and law enforcement agencies, it can indeed monitor Canadians, as long as those agencies have the lawful authority themselves to do so.

Alles klar? Excellent.

I do not understand why these spokesthings think we will be more reassured if they tell us obvious lies. And, yes, telling half-truths with the intent of misleading is LYING.

Tuesday, August 20, 2013

July 2013 CSE staff size


(If you click through on the link and get a different figure, it's probably because the Canada Public Service Agency has updated its website; they update the numbers once a month.)

Wednesday, August 14, 2013

NSA fails at math

NSA’s recently released backgrounder The National Security Agency: Missions, Authorities, Oversight and Partnerships (9 August 2013) has a short but very interesting section on the “Scope and Scale of NSA Collection”:
According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However, of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world’s traffic in conducting their mission—that’s less than one part in a million. Put another way, if a standard basketball court represented the global communications environment, NSA’s total collection would be represented by an area smaller than a dime on that basketball court.
Now, there is a lot that could be said about this paragraph, but the first thing that should be said is that its numbers don’t add up.

Let’s do the math. The world’s Internet traffic is estimated at 1,826 petabytes per day. Fair enough. NSA “touches” about 1.6% of that traffic, or about 29.2 petabytes per day. Call it 30 petabytes (i.e., 1.64%), which is probably the round number NSA was working with. Fine. Of those 30 petabytes, 0.025%, or 7.5 terabytes, is “selected for review” per day by NSA analysts. OK. The net effect, we’re then told, is that NSA analysts look at 0.00004% of the world’s traffic, or less than one part in a million. Um, no. The net effect, if the first numbers are accurate, is that NSA analysts look at 0.0004% of the world’s traffic—that’s three zeroes after the decimal point, not four, and the overall math works out to four parts in a million, not "less than one". The NSA’s bottom-line number disagrees with its other numbers by a factor of ten.
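For the record, here's the arithmetic, so readers can check it themselves:

```python
INTERNET_PB_PER_DAY = 1826      # NSA's figure for global Internet traffic
TOUCHED_FRACTION = 0.016        # NSA "touches" 1.6% of that
REVIEWED_FRACTION = 0.00025     # 0.025% of touched data "selected for review"

touched_pb = INTERNET_PB_PER_DAY * TOUCHED_FRACTION         # ~29.2 PB/day
reviewed_tb = touched_pb * REVIEWED_FRACTION * 1000         # ~7.3 TB/day
overall_fraction = TOUCHED_FRACTION * REVIEWED_FRACTION     # 4e-06
print(f"NSA analysts look at {overall_fraction:.6%} of world traffic")
# -> 0.000400%, i.e. four parts per million, not "less than one"
```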

With all the mathematicians working at NSA, you would think that the agency could get grade-school arithmetic right in what is, after all, a rather important public document. I guess everyone makes mistakes now and again.

The purpose of the paragraph, of course, is to suggest that the proportion of Internet traffic that the NSA monitors isn’t really very big, and you might consider the point pretty convincing even if the actual number is 0.0004% instead of 0.00004%. I think that number is a lot more significant than it appears, but I’ll leave that question for a future post.

[Update 30 August 2013: The Atlantic Wire also noticed NSA's math (Philip Bump, "The NSA Searches Ten Times as Much of the Internet as It Said It Does," Atlantic Wire, 19 August 2013).]