Social Physics and Cybercrime Detection

The common social laws found in human behavioral data may help solve cybercrimes.

MIT IDE
MIT Initiative on the Digital Economy

By Irving Wladawsky-Berger

A few weeks ago I wrote about social physics, a new discipline that aims to help us better understand and predict the behavior of human groups. Social physics is based on the premise that all event data representing human activity (e.g., phone call records, credit card purchases, taxi rides, web activity) contain a special set of group behavior patterns. As long as the data involves human activity, regardless of the type of data, the demographics of the users or the size of the data sets, similar behavioral dynamics apply. These patterns can be used to detect emerging behavioral trends before they can be observed by other data analytics techniques.

Physics, biology and other natural sciences have long relied on universal patterns or principles to detect a faint signal within a large data set, i.e., the proverbial needle in a haystack. It’s what has enabled the discovery of very short-lived elementary particles in physics, like the Higgs boson in 2012, amid the huge amounts of data generated by high-energy particle accelerators. In biology, it’s given rise to DNA sequencing and its growing list of applications in medicine, biotechnology, and other disciplines.

Studying the Behavior of Human Crowds

It’s not surprising that evolutionary biology and natural selection have led to similar universal patterns in the behavior of human crowds. Humans and our ancestors have evolved with the drive to learn from each other because it’s been a major part of our survival over millions of years. And if a new behavior, whether the result of an innovative idea like the discovery of tools or a mutation like a larger brain size, helps a human group better adapt to a changing environment, natural selection will favor the survival of that group over others.

Social physics originated in MIT’s Human Dynamics Lab based on research by professor Alex (Sandy) Pentland, his then postdoctoral associate Yaniv Altshuler and their various collaborators.

In 2014, Pentland and Altshuler co-founded Endor, an Israel-based startup that leverages social physics methods to make fast, accurate predictions by analyzing data derived from human behavior.

In my earlier social physics article, I wrote about its application to trading based on the analysis of data from the social trading platform eToro. I would now like to discuss the application of social physics to cybercrime, as described in a recent paper by Altshuler and Pentland published in New Solutions for Cybersecurity.

Social Physics vs Machine Learning

The paper starts out by explaining how social physics differs from and complements machine learning methods. Machine learning and related algorithms like deep learning have played a central role in AI’s recent achievements. These advanced statistical methods have enabled the creation of AI algorithms that can be trained with large numbers of sample inputs instead of being explicitly programmed. They’ve been most successful when applied to complex problems like machine translation and image and voice recognition, where a huge body of data is available and the data is fairly static, that is, the training data (e.g., pictures of cats or the English language) change very infrequently.

Data derived from human behavior is quite different. It’s dynamic, highly versatile, ever-changing and influenced by complex social interactions. Human behaviors exhibit a high degree of variance, which makes them hard to predict and subject to emergence, where the whole might well be different from the sum of the parts. Predicting human behavior requires the ability to frequently analyze relatively small data sets collected over short periods of time.

“Social physics approaches data from a completely different angle,” write Altshuler and Pentland. “Instead of deriving patterns from input data itself, it is based on the discovery that all human behavioral data is guaranteed to contain within it a set of common social behavioral laws — mathematical relationships that emerge whenever a large enough number of people operate in the same space.”

A few key capabilities differentiate social physics from other analytic methods:

  • It’s content agnostic — you don’t need to know what question to ask, just give examples of the entities of interest (EOI) to search for in the form “here is an example X, find me more of X.”
  • Entities similar to the defined EOIs are searched for within the data, based mainly on temporal correlations, which can be done much more quickly and accurately than with machine learning algorithms.
  • It’s able to detect dynamic behaviors that correlate with the EOIs in real time, which might indicate emerging or hidden patterns.
  • Social physics searches for patterns, not content, thus it can analyze fully encrypted data sets, enabling financial companies, health care providers or blockchains to maintain data privacy.
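
To make the “here is an example X, find me more of X” idea concrete, here is a minimal sketch in Python. It is not Endor’s engine or the method from the paper: it simply assumes that each entity is an opaque ID with a count of events per time bucket, and it ranks the remaining entities by their average Pearson correlation with the seed examples. The function names, the toy data and the choice of Pearson correlation are all assumptions made for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat series carries no temporal signal
    return cov / sqrt(var_a * var_b)

def find_more_like(activity, seeds, top_k=3):
    """Rank non-seed entities by mean temporal correlation with the seeds.

    activity: dict of opaque entity ID -> event counts per time bucket.
    seeds: IDs of the known entities of interest (the "example X").
    """
    scores = {}
    for entity, series in activity.items():
        if entity in seeds:
            continue
        total = sum(pearson(series, activity[s]) for s in seeds)
        scores[entity] = total / len(seeds)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical toy data: IDs are opaque tokens, so no content is inspected.
activity = {
    "a1": [5, 0, 0, 4, 0, 6],
    "a2": [4, 0, 1, 5, 0, 5],
    "b7": [6, 1, 0, 5, 0, 7],
    "c3": [0, 3, 4, 0, 5, 0],
    "d9": [2, 2, 2, 2, 2, 2],
}
candidates = find_more_like(activity, seeds={"a1", "a2"}, top_k=1)
print(candidates)
```

Because the sketch only looks at when each ID is active, it would work the same way if the IDs were hashed or encrypted, which is the property the last bullet describes.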

Finding Hidden Threats

The paper discusses two concrete applications of social physics for the detection of cybercrimes. The first application is about detecting ISIS activities on Twitter. Recently, an intelligence agency provided Endor with the metadata of 15 million Tweets for analysis in the Endor platform. As a test of the platform’s capabilities, the agency revealed the identity of 50 accounts known to belong to ISIS activists whose Tweets were included in the input data, and asked Endor to detect an additional 74 accounts that were hidden in the data.

Endor’s analytics engine identified 80 Twitter accounts as potential EOIs because they were similar enough to the positive samples that the agency provided. Forty-five were correct matches, part of the list of 74 hidden accounts, while 35 were false positives. Such a low false-positive rate makes it possible for human experts to further investigate the targets.
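
The figures in this test can be restated as the standard retrieval metrics of precision and recall. The paper, as quoted here, does not use those terms; the short Python calculation below is only a restatement of the numbers given above.

```python
flagged = 80          # accounts the engine returned as potential EOIs
true_positives = 45   # flagged accounts that were on the hidden list
hidden_total = 74     # hidden accounts the agency asked Endor to find

false_positives = flagged - true_positives   # 35
precision = true_positives / flagged         # share of flagged accounts that were correct
recall = true_positives / hidden_total       # share of hidden accounts that were found

print(f"false positives: {false_positives}")
print(f"precision: {precision:.2%}, recall: {recall:.2%}")
# prints "precision: 56.25%, recall: 60.81%"
```

In other words, a bit more than half of the flagged accounts were real matches, and the engine recovered about 61 percent of the hidden accounts, which leaves analysts with a list of 80 accounts to review by hand.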

Three key reasons make social physics an ideal tool for detecting such hidden threats in the cyber environment, as was the case in this particular application:

1. “The ability to connect to structured data streams in a semantics agnostic way enabled the social physics engine to efficiently process streams written in foreign languages, such as Arabic, Urdu, or Farsi, that many mainstream data-analysis tools cannot easily digest.”

2. “Similarly, the use of code-words, evasive behavior or any other attempt to mask one’s intentions, activity, or social ties by metadata or language manipulations — frequent in cyber-terrorism and intelligence use cases — can easily be deciphered (or more accurately, bypassed altogether) using social physics.”

3. “Traditional intelligence analysis often resembles a long process of locating numerous pieces of a single puzzle and meticulously putting them together, unraveling a hidden story. Using social physics, on the other hand… the Social Physics engine receives a ‘loose thread’ from the analyst as input, and automatically sifts hundreds of the most relevant pieces, ready for the analyst to quickly browse through them, and build the complete global picture.”

The second application entailed the detection of fraudulent bitcoin activity. Since bitcoin transactions don’t involve a central authority or trusted third party, bitcoin has become a payment method of choice for a variety of cybercrime players. In addition, while bitcoin’s blockchain-based infrastructure is highly secure, bitcoin exchanges have been repeatedly hacked over the years. And once bitcoins are stolen, it’s nearly impossible to retrieve them. If you’ve stored bitcoins in an exchange that’s been hacked, they’re essentially lost.

The entire history of bitcoin transactions is publicly available in the bitcoin blockchain, although all identity information is encrypted. However, social physics can be used to analyze such encrypted bitcoin blockchains, looking for clusters of transactions that appear too correlated. “This is done by detecting Bitcoin transactions patterns that social physics dictates are highly unlikely to spontaneously emerge. These behavioral correlations can then be matched against a given set of positive labels (for example, a small set of Bitcoin accounts known to be in possession of stolen Bitcoins) resulting in the detection of behavioral correlations (each representing a ‘real world commonality’) that are associated with the stolen Bitcoins in question.”
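
The quoted passage does not spell out the algorithm, so the following Python sketch only illustrates the two steps it names: find groups of addresses whose activity is too correlated, then keep the groups that contain a known bad address. As a stand-in for whatever statistical test the authors actually use, it measures the Jaccard overlap of the time buckets in which each address transacted. The function names, the threshold and the toy data are all hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two sets of active time buckets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def correlated_clusters(active, threshold=0.5):
    """Group addresses whose activity windows overlap at or above threshold."""
    parent = {addr: addr for addr in active}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of addresses that is active in mostly the same buckets.
    for a, b in combinations(active, 2):
        if jaccard(active[a], active[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for addr in active:
        clusters.setdefault(find(addr), set()).add(addr)
    # Singletons are not clusters; keep only groups of two or more.
    return [c for c in clusters.values() if len(c) > 1]

def clusters_matching_labels(clusters, labeled):
    """Keep only clusters that contain at least one known bad address."""
    return [c for c in clusters if c & labeled]

# Hypothetical data: opaque address -> hour buckets in which it transacted.
active = {
    "addr_A": {1, 2, 3, 8},
    "addr_B": {1, 2, 3, 8},
    "addr_C": {1, 2, 3, 9},
    "addr_D": {4, 6, 11},
    "addr_E": {5, 7, 12},
}
clusters = correlated_clusters(active, threshold=0.5)
suspects = clusters_matching_labels(clusters, labeled={"addr_A"})
print(suspects)
```

With the toy data, the three addresses that keep transacting in the same hours end up in one cluster, and labeling just one of them as holding stolen coins surfaces the other two. A real analysis would need a principled baseline for what counts as “highly unlikely to spontaneously emerge,” which a fixed threshold does not provide.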

Originally published at blog.irvingwb.com on October 1, 2018.
