Processing Flows of Information: From Data Stream to Complex Event Processing

GIANPAOLO CUGOLA and ALESSANDRO MARGARA
Dip. di Elettronica e Informazione, Politecnico di Milano, Italy
A large number of distributed applications require continuous and timely processing of information as it flows from the periphery to the center of the system. Examples are intrusion detection systems, which analyze network traffic in real-time to identify possible attacks; environmental monitoring applications, which process raw data coming from sensor networks to identify critical situations; and applications performing on-line analysis of stock prices to identify trends and forecast future values. Traditional DBMSs, which need to store and index data before processing it, can hardly fulfill the requirements of timeliness coming from such domains. Accordingly, during the last decade different research communities developed a number of tools, which we collectively call Information Flow Processing (IFP) Systems, to support these scenarios. They differ in their system architecture, data model, rule model, and rule language. In this paper we survey these systems to help researchers, often coming from different backgrounds, understand how the various approaches they adopt may complement each other. In particular, we propose a general, unifying model to capture the different aspects of an IFP system and use it to provide a complete and precise classification of the systems and mechanisms proposed so far.

Categories and Subject Descriptors: H.4 [Information Systems Applications]: Miscellaneous; I.5 [Pattern Recognition]: Miscellaneous; H.2.4 [Database Management]: Systems—Query Processing; A.1 [General]: Introductory and Survey

General Terms: Design, Documentation

Additional Key Words and Phrases: Complex Event Processing, Publish-Subscribe, Stream Processing
An increasing number of distributed applications require processing continuously flowing data from geographically distributed sources at unpredictable rates to obtain timely responses to complex queries. Examples of such applications come from the most disparate fields: from wireless sensor networks to financial tickers, from traffic management to click-stream inspection. In the following we collectively refer to these applications as the Information Flow Processing (IFP) domain. Likewise, we call Information Flow Processing (IFP) engine a tool capable of timely processing large amounts of information as it flows from the periphery to the center of the system. The concepts of “timeliness” and “flow processing” are crucial to justify the need for a new class of systems. Indeed, traditional DBMSs: (i) require data to be (persistently) stored and indexed before it can be processed, and (ii) process data only when explicitly asked by the users, i.e., asynchronously with respect to its arrival. Both aspects contrast with the requirements of IFP applications. As an example, consider the need to detect fire in a building by using temperature

ACM Journal Name, Vol. V, No. N, Month 20YY, Pages 1–70.
and smoke sensors. On the one hand, a fire alert has to be notified as soon as the relevant data becomes available. On the other, there is no need to store sensor readings that are not relevant for fire detection, and even the relevant readings can be discarded as soon as the fire is detected: any information they carry that matters for the application, such as the area where the fire occurred, should become part of the fire alert itself. These requirements led to the development of a number of systems specifically designed to process information as a flow (or a set of flows) according to a set of pre-deployed processing rules. Despite having a common goal, these systems differ in a wide range of aspects, including architecture, data models, rule languages, and processing mechanisms. In part, this is due to the fact that they were the result of the research ef
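To make the fire-detection scenario concrete, the following is a minimal sketch of an IFP-style pre-deployed rule, written in Python. All names (`Reading`, `FireRule`, the window size, and the temperature threshold) are hypothetical choices for illustration, not part of any system surveyed here: the point is only that readings are processed as they arrive, kept just for the duration of a time window, and discarded once the alert — carrying the relevant information such as the area — has been produced.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Reading:
    sensor: str   # hypothetical sensor kind: "temp" or "smoke"
    area: str     # area of the building the reading refers to
    value: float  # measured value (degrees, or smoke presence > 0)
    ts: float     # arrival timestamp, in seconds

class FireRule:
    """Toy IFP rule: raise a fire alert when a high temperature and a
    smoke detection occur in the same area within WINDOW seconds.
    Readings are processed on arrival and never persisted."""
    WINDOW = 10.0          # assumed correlation window, seconds
    TEMP_THRESHOLD = 60.0  # assumed critical temperature

    def __init__(self) -> None:
        self.recent: deque[Reading] = deque()  # only in-window readings

    def process(self, r: Reading):
        # evict readings that fell out of the time window
        while self.recent and r.ts - self.recent[0].ts > self.WINDOW:
            self.recent.popleft()
        self.recent.append(r)
        hot = any(x.sensor == "temp" and x.area == r.area
                  and x.value >= self.TEMP_THRESHOLD for x in self.recent)
        smoke = any(x.sensor == "smoke" and x.area == r.area
                    and x.value > 0 for x in self.recent)
        if hot and smoke:
            self.recent.clear()  # raw readings no longer needed
            return {"alert": "fire", "area": r.area, "ts": r.ts}
        return None
```

A temperature reading alone yields no output; a smoke reading arriving in the same area within the window triggers the alert, after which the buffered raw readings are dropped — mirroring the store-nothing, react-immediately behavior that distinguishes IFP engines from DBMSs.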