The rise of novel Twitter social spambots

SoBigData day @EUI, Florence, 11-10-2017

Marinella Petrocchi IIT-CNR, Pisa, Italy

SPAMBOTS & SOCIAL NETWORKS: AN OPEN PROBLEM
Spambots: (semi-)automated accounts with (often) harmful intentions
Misinformation spreading, theft of personal data, stock market manipulation, infiltration of political discourse

spambot

THE RISE OF THE SOCIAL BOTS
They escape detection techniques by evolving. On Twitter:
•  fake followers (till 2012)
•  1st evolution (2012-2014)
•  current (?) wave (2015-2017)
New spambots are almost indistinguishable from genuine accounts
E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, “The rise of social bots,” Communications of the ACM, vol. 59, no. 7, pp. 96–104, 2016

FAKE FOLLOWERS

NAIVE FAKE ACCOUNTS WERE EASY TO BUY

The new wave

SOCIAL SPAMBOTS


Indistinguishable from genuine accounts when analyzed one by one

Analysis of the online behavior of large groups of users, with the goal of detecting possible spambots among them

The idea

MODELING THE ONLINE BEHAVIOR OF USERS
Behaviour: the sequence of actions performed by an account
Digital DNA: each type of action is associated with a character (e.g., A, B, C)
The online behaviour of an account is modeled as a sequence of characters (i.e., a string, similar to biological DNA) that follows the sequence of actions performed by that account


MODELING THE ONLINE BEHAVIOR OF USERS Timeline of a Twitter account

Encoding: T = tweet, R = retweet, P = reply

R P R T R R

…RRTRPR
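The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `encode_timeline`, the action labels (`"tweet"`, `"retweet"`, `"reply"`), and the sample timeline are assumptions made for the example; only the T/R/P alphabet comes from the slide.

```python
from typing import Iterable

# Alphabet from the slide: T = tweet, R = retweet, P = reply.
ACTION_TO_BASE = {"tweet": "T", "retweet": "R", "reply": "P"}


def encode_timeline(actions: Iterable[str]) -> str:
    """Map a list of action types (assumed oldest first) to a digital DNA string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


# Hypothetical timeline, oldest action first.
timeline = ["retweet", "retweet", "tweet", "retweet", "reply", "retweet"]
dna = encode_timeline(timeline)
print(dna)  # RRTRPR
```

An unknown action type raises a `KeyError` here; a real pipeline would have to decide how to handle action types outside the chosen alphabet.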

DIGITAL DNA VS BIOLOGICAL DNA

Digital DNA: T, R, P (tweet, retweet, reply)

…RRTRPRTPRRPRTPRPTPRRTRPR
…RPRTPTTRPTRPTPRRRRTPPRPP
…TTTRRRPPTPRPTPRTRPTRRRTP
…PRTRPRTPPPPRTPRRPRTPPRRT
…TRTRPRTPRRPRTPRPTPTPPRTT
…TRPPRTPPTRPPTPRRTTTPPRPR

Biological DNA: A, G, T, C (adenine, guanine, thymine, cytosine)

…AGTCTCCATTTTCAGGTCGTA
…GTTTAAGATCGCCTCATCACC
…AGGCAATTCGCCTGAACTGG
…AGTCTCGATCCTTTCCTCGTT
…AAAATCGAACGCCTTGTCGG
…ATTCTCCATCGCCTAAACAAC

Spambots characterization

SIMILARITY BETWEEN DIGITAL DNA SEQUENCES
Intuition: automated accounts (spambots) have similar DNA sequences
LCS (longest common substring): the longest substring shared by N sequences of digital DNA

…TRRRPRRTRRPRTPRPTPRRTRPR
…RPRTPTTRRRPRRTPRRRRTPPRP
…TTTRRRPRRRPRRTRTRPTRRRTP
…PRTRPRTPPPPRTPRRRRRPRRTR

RRRPRRT (length: 7 characters)

M. Arnold and E. Ohlebusch, “Linear time algorithms for generalizations of the longest common substring problem,” Algorithmica, vol. 60, no. 4, pp. 806–818, 2011
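To make the LCS definition concrete, the sketch below finds the longest substring shared by all of the four sequences shown on the slide. It is a deliberately naive brute-force search, not the linear-time algorithm of Arnold and Ohlebusch cited above, and the function name `longest_common_substring` is an assumption for illustration.

```python
from typing import Sequence


def longest_common_substring(seqs: Sequence[str]) -> str:
    """Brute force: try every substring of the shortest sequence, longest first."""
    if not seqs:
        return ""
    shortest = min(seqs, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            # Keep the first candidate that occurs in every sequence.
            if all(candidate in s for s in seqs):
                return candidate
    return ""


# The four digital DNA fragments from the slide (leading ellipses dropped).
slide_sequences = [
    "TRRRPRRTRRPRTPRPTPRRTRPR",
    "RPRTPTTRRRPRRTPRRRRTPPRP",
    "TTTRRRPRRRPRRTRTRPTRRRTP",
    "PRTRPRTPPPPRTPRRRRRPRRTR",
]
lcs = longest_common_substring(slide_sequences)
print(lcs, len(lcs))  # RRRPRRT 7
```

The brute-force search is only practical for short strings and small groups; analysing thousands of long timelines would need a suffix-tree or suffix-array approach such as the one in the cited paper.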


LCS: SPAMBOTS VS HUMANS

LCS: similarity measure

Spambots detection

LCS: SPAMBOTS + HUMANS (MIXED GROUP)
1. accounts with high similarity
2. steep decrease in similarity
3. accounts with low similarity
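One way to obtain such a similarity profile is an LCS curve. The sketch below assumes the curve gives, for each group size k, the length of the longest substring found in at least k accounts; that reading, the function name `lcs_curve`, and the toy sequences are assumptions for illustration, and the brute-force enumeration is only suitable for tiny inputs.

```python
from collections import Counter
from typing import Dict, List


def lcs_curve(seqs: List[str]) -> Dict[int, int]:
    """For each k in 2..N, length of the longest substring found in at least k sequences."""
    support: Counter = Counter()
    for s in seqs:
        # Use a set so each substring is counted once per sequence.
        substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        support.update(substrings)

    # longest[c] = longest substring that occurs in exactly c sequences.
    longest = [0] * (len(seqs) + 1)
    for substring, count in support.items():
        longest[count] = max(longest[count], len(substring))

    # "At least k" is the running maximum taken from k = N down to k = 2.
    curve: Dict[int, int] = {}
    best = 0
    for k in range(len(seqs), 1, -1):
        best = max(best, longest[k])
        curve[k] = best
    return dict(sorted(curve.items()))


# Toy group: three near-identical "bot-like" sequences and one unrelated one.
toy_group = ["RRRRRR", "RRRRRR", "RRRRRT", "TPTPTP"]
print(lcs_curve(toy_group))  # {2: 6, 3: 5, 4: 0}
```

In the toy output the LCS stays long while only the similar accounts are involved (k = 2, 3) and collapses once the unrelated account must also share the substring (k = 4), which mirrors the three regions listed above.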


DETECTION TECHNIQUES
1. Unsupervised approach
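The slide does not spell out the unsupervised procedure. One plausible reading, sketched below under that assumption, is to take an LCS curve (group size k mapped to the length of the longest substring shared by at least k accounts) and cut it where the LCS length falls most sharply. The function name `steepest_drop` and the curve values are hypothetical.

```python
from typing import Dict


def steepest_drop(curve: Dict[int, int]) -> int:
    """Return the group size k just before the sharpest fall in LCS length."""
    ks = sorted(curve)
    if len(ks) < 2:
        raise ValueError("need at least two points to find a drop")
    # Drop between each k and the next larger group size.
    drops = {k: curve[k] - curve[k_next] for k, k_next in zip(ks, ks[1:])}
    return max(drops, key=drops.get)


# Hypothetical LCS curve for a mixed group of 8 accounts.
curve = {2: 40, 3: 39, 4: 38, 5: 38, 6: 9, 7: 8, 8: 8}
cut = steepest_drop(curve)
print(cut)  # 5
```

Under this reading, the 5 accounts that share the long substring would be flagged as a suspected spambot group and the remaining accounts treated as genuine. No labelled data is needed, which is what makes the approach unsupervised.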


DETECTION TECHNIQUES

2. Supervised approach


DATASETS

Evaluation datasets:
1. Mixed1 (1982 accounts): 50% Bot1, 50% human
2. Mixed2 (928 accounts): 50% Bot2, 50% human


EVALUATION

C. Yang, R. Harkreader, and G. Gu, “Empirical evaluation and new design for fighting evolving Twitter spammers,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 8, pp. 1280–1293, 2013

Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, “Twitter spammer detection using data stream clustering,” Information Sciences, vol. 260, pp. 64–73, 2014

F. Ahmed and M. Abulaish, “A generic statistical approach for spam detection in online social networks,” Computer Communications, vol. 36, no. 10, pp. 1120–1129, 2013

TAKE-HOME MESSAGES
•  New evolutionary wave: social spambots
•  Current techniques fail to detect them
•  Detection via digital DNA analysis: effective and efficient (lightweight features, no graphs, linear-complexity algorithms)

Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race”, WWW 2017

Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “Social Fingerprinting: Detection of spambot groups through DNA-inspired behavioral modeling”, IEEE Transactions on Dependable and Secure Computing, 2017

Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi: “Exploiting digital DNA for the analysis of similarities in Twitter behaviours”, IEEE Data Science and Analytics, 2017

Questions?

THANK YOU!

Marinella Petrocchi
http://mib.projects.iit.cnr.it/dataset.html