[liberationtech] Linguistics identifies anonymous users

Shava Nerad shava23 at
Wed Jan 9 05:34:02 PST 2013

Such a framework can be social engineered as easily as SEO.  I make a small
living as a ghost writer and speech writer - the informal version of that
very process. Several of my clients say my writing sounds more like them in
print than they do, because they are less facile writers - but that is a
fault that could be avoided in competent forgeries. ;)

On Jan 9, 2013 8:25 AM, "Eugen Leitl" <eugen at> wrote:

> Linguistics identifies anonymous users
> By Darren Pauli on Jan 9, 2013 9:49 AM
> Researchers reveal carders, hackers on underground forums.
> Up to 80 percent of certain anonymous underground forum users can be
> identified using linguistics, researchers say.
> The techniques compare user posts to track them across forums and could even
> unveil authors of thesis papers or blogs who had taken to underground
> networks.
> "If our dataset contains 100 users we can at least identify 80 of them,"
> researcher Sadia Afroz told an audience at the 29C3 Chaos Communication
> Congress in Germany.
> "Function words are very specific to the writer. Even if you are writing a
> thesis, you'll probably use the same function words in chat messages.
> "Even if your text is not clean, your writing style can give you away."
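
A minimal sketch of the function-word idea quoted above, not the researchers' pipeline and not JStylo's API. It assumes a tiny hand-picked word list (FUNCTION_WORDS) and a simple cosine-similarity nearest match; the names profile, cosine and closest_author are hypothetical and exist only for this illustration.

```python
from collections import Counter
import math
import re

# Illustrative list only (an assumption); real stylometric feature sets
# use far more function words plus many other features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "but", "with", "on", "as", "not"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_author(unknown_text, known_texts):
    """Return the key of known_texts whose function-word profile is most similar."""
    target = profile(unknown_text)
    return max(known_texts,
               key=lambda name: cosine(target, profile(known_texts[name])))
```

With a dict mapping each known author to a writing sample, closest_author(sample, known) returns the best-matching name. On samples as short as a forum post this is unreliable, which is consistent with the article's point about needing thousands of words per author.
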
> The analysis techniques could also reveal botnet owners and malware tool
> authors, and provide insight into the size and scope of underground markets,
> making the research appealing to law enforcement.
> To achieve their results the researchers used techniques including
> stylometric analysis, the authorship attribution framework JStylo, and Latent
> Dirichlet allocation, which can distinguish a conversation on stolen credit
> cards from one on exploit-writing, and similarly help identify interesting
> people.
> The analysis was applied across millions of posts from tens of thousands of
> users of a series of multilingual underground websites including
>,,, and
> It found up to 300 distinct discussion topics in the forums, with some of the
> most popular being carding, encryption services, password cracking and
> blackhat search engine optimisation tools.
> While successful, the work faces a series of challenges. Analysis could only
> be performed using a minimum of 5000 words (this research used the "gold
> standard" of 6500 words), which culled the list of potential targets from
> tens of thousands to mere hundreds.
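
The word-count cutoff described above can be sketched as a simple filter. This is an assumption-laden illustration, not the researchers' code: the posts_by_user mapping, the eligible_users name, and whitespace-based word counting are all hypothetical choices; only the 5000 and 6500 figures come from the article.

```python
MIN_WORDS = 5000  # minimum cited in the article; the study itself used 6500

def eligible_users(posts_by_user, min_words=MIN_WORDS):
    """Return the sorted users whose combined posts reach min_words words.

    posts_by_user maps a user name to a list of post strings (hypothetical
    layout). Words are counted by splitting on whitespace, a rough proxy.
    """
    eligible = []
    for user, posts in posts_by_user.items():
        total = sum(len(post.split()) for post in posts)
        if total >= min_words:
            eligible.append(user)
    return sorted(eligible)
```

Calling eligible_users(posts_by_user, min_words=6500) applies the stricter threshold the article calls the "gold standard".
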
> It also needs to separate discussion on product information like credit
> cards, exploits and drugs from conversational text in order to facilitate
> machine learning to automate the process, according to researcher Aylin
> Caliskan Islam.
> And posts must be translated to English, a process which boosted author
> identification from 66 to around 80 per cent but was imperfect using freely
> available tools like Google and Bing.
> However, both of these tasks were performed successfully, and further
> development, including the use of "exclusive" language translation tools,
> would only serve to boost the identification accuracy.
> Leetspeak, an alternative alphabet popular in some forum circles, cannot be
> translated.
> The project is ongoing and future work promises to increase the capacity to
> unmask users. This, Islam said, would include temporal information which
> would exploit users who logged into forums from the same IP addresses and
> wrote posts at around the same time.
> [Image caption: Antichat user analysis]
> "They might finish work, come home and log in," Islam said.
> It could also tie user identities to the topics they write about and produce
> a map of their interactions, identify multiple accounts held by a single
> author, and combine forum messages with internet relay chat (IRC) data sets.
> "We want to automate the whole process."
> Afroz said that while the work appeals to law enforcement and government
> agencies, it is not designed to catch users out.
> "We aren't trying to identify users, we are trying to show them that this is
> possible," she said.
> To this end, the researchers released tools last year, updated last December,
> which help users to anonymise their writing.
> One tool, Anonymouth, takes a 500 word sample of a user's writing to identify
> unique features such as function words which could make them identifiable.
> The other, JStylo, is the machine learning engine which powers Anonymouth.
> The Drexel and George Mason universities research team is composed of Sadia
> Afroz, Aylin Caliskan Islam, Ariel Stolerman, Rachel Greenstadt, and Damon
> McCoy.