Monday, August 15, 2011

Research: critiques of author recognition

With the high-profile Olympics less than 12 months away, and after the recent looting and rioting that involved the use of voicemail, instant messaging and web-based media, "cybercrime" has become the new label for digital investigation and the seizure of evidence of alleged culpability. The research papers below cover some useful ground for evaluating methodology not previously suspected of being fallible, and may be applicable in author recognition cases where the evidence is found on mobile phones and computers.

Authors vs. Speakers: A Tale of Two Subfields

The best part of Monday's post on the Facebook authorship-authentication controversy ("High-stakes forensic linguistics", 7/25/2011) was the contribution in the comments by Ron Butters, Larry Solan, and Carole Chaski. It's interesting to compare the situation they describe — and the frustration that they express about it — with the history of technologies for answering questions about the source of bits of speech rather than bits of text.

Practical Attacks Against Authorship Recognition Techniques

The use of statistical AI techniques in authorship recognition (or stylometry) has contributed to literary and historical breakthroughs. These successes have led to the use of these techniques in criminal investigations and prosecutions. However, few have studied adversarial attacks and their devastating effect on the robustness of existing classification methods. This paper presents a framework for adversarial attacks including obfuscation attacks, where a subject attempts to hide their identity, and imitation attacks, where a subject attempts to frame another subject by imitating their writing style. The major contribution of this research is that it demonstrates that both attacks work very well. The obfuscation attack reduces the effectiveness of the techniques to the level of random guessing and the imitation attack succeeds with 68-91% probability depending on the stylometric technique used. These results are made more significant by the fact that the experimental subjects were unfamiliar with stylometric techniques, without specialized knowledge in linguistics, and spent little time on the attacks. This paper also provides another significant contribution to the field in using human subjects to empirically validate the claim of high accuracy for current techniques (without attacks) by reproducing results for three representative stylometric methods.
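
To make the idea of stylometric attribution more concrete, here is a minimal Python sketch of one simple approach: comparing how often a text uses common function words. It is an illustration only and is not one of the three methods evaluated in the paper. The word list, the function names (`profile`, `cosine`, `attribute`) and the sample texts are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
# Real stylometric methods use far richer features; this is an assumption.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_samples):
    """Return the author whose sample profile is most similar to the unknown text.

    known_samples maps an author name to a sample of that author's writing.
    """
    target = profile(unknown_text)
    return max(
        known_samples,
        key=lambda author: cosine(target, profile(known_samples[author])),
    )


if __name__ == "__main__":
    samples = {
        "A": "the the the of the",
        "B": "it is it was it is",
    }
    print(attribute("the cat of the house", samples))
```

In a scheme like this, an obfuscation attack amounts to a writer shifting their word habits so the profile no longer resembles their own samples, and an imitation attack amounts to shifting it toward someone else's. The sketch does not reproduce the paper's experiments or its reported figures.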
