Saturday, January 06, 2007

Data mining for national security

Some links to information on CSE and data mining (previously blogged about here and here):
  • The Mathematics of Information Technology and Complex Systems (MITACS) project, a network of academia, industry and the public sector, including CSE, that was created to "harness the power of the mathematical sciences to address the inherent complexity of modern industrial and societal problems for the benefit of all Canadians". Projects include Semi-Supervised Learning in Large Graphs:
    As part of ongoing collaborations with the Communications Security Establishment (CSE), we are applying unsupervised and semi-supervised learning methods to understand transactions on large dynamic networks, such as telephone and email networks. When viewed as a graph, the nodes correspond to individuals that send or receive messages, and edges correspond to the messages themselves. The graphs we address can be observed in real-time, include from hundreds to hundreds of thousands of nodes, and feature thousands to millions of transactions. There are two goals associated with this project: firstly, there is the semi-supervised learning task, and rare-target problem, in which we wish to identify certain types of nodes; secondly, there is the unsupervised learning task of detecting anomalous messages. For reasons of efficiency, we have restricted our attention to meta-data of message transactions, such as the time, sender, and recipient, and ignored the contents of messages themselves. In collaboration with CSE, we are studying the problem of counter-terrorism, a semi-supervised problem in which some terrorists in a large network are labeled, but most are not.... Another common feature of counter-terrorism problems is the fact that large volumes of data are often "streamed" through various collection sites, in order to provide maximal information in a timely fashion. A consequence of efficient collection of transactions on very large graphs is that the data itself can only be stored for a short time. This leads to a nonstandard learning problem, since most learning algorithms assume that the full dataset can be accessed for training purposes. Working in conjunction with CSE, we will devise on-line learning algorithms that scale efficiently with increasing volume, and need only use each example once.
  • The National Program on Complex Data Structures project on Data Mining with Complex Data Structures, a team of academic researchers partnered with CSE and Generation 5 Technologies.

  • Examples of recent Canadian work in the field:

  • CSE career opportunities in data mining

  • A contrary view of the intelligence value of data mining: Effective Counterterrorism and the Limited Role of Predictive Data Mining (PDF file), Jeff Jonas and Jim Harper.


Blogger Pete's Blog said...

Interesting, Bill.

The MITACS description is very explicit (I don't know whether CSE would have been delighted about that).

Interesting, the English requirement in the CSE career ad. Do the Quebecois get tetchy about such a seeming barrier to equality?


January 07, 2007 10:21 am  
Blogger Bill Robinson said...

Hey, I was born in Quebec myself (although I'm not Quebecois).

Interesting question about the language requirement. I'm not aware of any major complaints. A francophone has to be fluent in English in order to work in most jobs in the Canadian public service, just as anglophones must be fluent in French for most such jobs. English-French bilingualism is certainly necessary for those hoping to be promoted to the higher ranks of the public service.

Where CSE differs is that French is not required for many of its jobs while English still is. This is of course a consequence of CSE's membership in the UKUSA community, where the working language is -- no surprise -- English. Nonetheless, bilingualism is necessary for some jobs at CSE, e.g., positions dealing with other government departments, and some (most?) of the agency's management positions. So even there it counts as an excellent career booster.

The government's Official Languages Commissioner audited CSE at least once (in 1992), reporting that "CSE has achieved progress in recent years in attaining more equitable acceptance of French as a language of work." The Commissioner also noted, however, that "Essential improvements still need to be made in all areas covered by the audit,... especially in active offer and in the provision of services in both languages." CSE's own website notes its commitment "to create a working environment known for its [respect for] Canada’s two official languages."

None of that means that CSE is or will remain immune from the politics of language in this country, but Quebecois concern (at least what I've seen expressed) has tended more to focus on the fear that CSE is being or has been used to monitor the Quebec separatist movement. In that respect, a shortage of French speakers at the agency may even provide some small measure of reassurance.

January 09, 2007 11:04 pm  
Blogger Pete's Blog said...

Thanks for the explanation Bill. Coming from Australia, which is about 99.7% English speaking (for the whole continent), the concept of official bilingualism is really alien.


January 30, 2007 10:06 am  

