TH`ESE - Institut de Recherche en Informatique de Toulouse

1 downloads 187 Views 2MB Size Report
dures of certification, and software errors in the sectors where these systems are ...... by auto. This function sums al
` THESE En vue de l’obtention du

´ DE TOULOUSE DOCTORAT DE L’UNIVERSITE D´ elivr´ e par : l’Universit´e Toulouse 3 Paul Sabatier (UT3 Paul Sabatier)

Pr´ esent´ ee et soutenue le 10/12/2014 par :

Nadezhda BAKLANOVA Formally Verified Analysis of Resource Sharing Conflicts in Multithreaded Java

Louis FERAUD Stephan MERZ Jean-Paul RIGAULT Martin STRECKER

JURY Universit´e de Toulouse INRIA Nancy - LORIA Universit´e de Nice Universit´e de Toulouse

Directeur de Th`ese Rapporteur Rapporteur Directeur de Th`ese

´ Ecole doctorale et sp´ ecialit´ e: MITT : Domaine STIC : Suret´e de logiciel et calcul de haute performance Unit´ e de Recherche : IRIT (UMR 5505) Directeur(s) de Th` ese : ´ Louis FERAUD et Martin STRECKER Rapporteurs : Stephan MERZ et Jean-Paul RIGAULT

Acknowledgements This thesis has been written due to a lot of efforts of many people. I thank the members of the defense committee Jean-Paul Rigault and Stephan Merz that they have accepted to be the reviewers. Thanks to Jean-Paul Rigault who played the double role: as the reviewer and as the President of the defence Committee. Many thanks to Stephan Merz for his valuable comments in the whole text of the thesis. My advisors are the people who I would like to thank the most. Thanks to them this thesis has been started, and they have put a lot of efforts in order that it would be finished. I thank Louis F´eraud and Martin Strecker for all their time and support. Many thanks to Martin for the time he has spent in explaining me the basics of the life in France, for his help in all administrational activities and for all our scientific discussions. I thank all members of the ACADIE and MACAO teams for their live interest and support. I would like to thank my past and present office neighbors: Selma, Elie, Iulia, Wilmer for the advice and discussions we have had. Finally, I thank my parents for their support during these years.

Abstract Multithreaded real-time systems become widespread nowadays. Correctness is critical for real-time applications but it is difficult to ensure by usual methods like testing. Formal verification helps to find possible errors. One kind of errors are resource sharing conflicts which lead to data corruption. A common solution is exclusive locking which can lead to unpredictable delays in execution or even deadlocks in the worst case. Program verification is often done by model checking. A popular model checking formalism for real-time programs are timed automata. It allows to verify certain timing properties in a model of a program and to find a sequence of actions which lead to an error. There exist effective verification algorithms for timed automata which are implemented in widely used model checking tools. We have developed a tool for static analysis of multithreaded Java programs which finds possible resource sharing conflicts. Java programs are annotated with timing information, and a model of the program is built based on the annotations. The model is a system of timed automata which is verified by the Uppaal model checker, and possible resource sharing conflicts are found. A case study has been developed to illustrate the approach. The analysis is complete: whenever a resource sharing conflict occurs in a Java program, it is detected by the our analysis. The abstract model may also output “false positive” warnings which do not correspond to a reachable configuration in the source Java program. In order to make sure that the abstraction of Java programs to timed automata is correct, we have formalized the translation is the Isabelle proof assistant. We have proved that the translation preserve correspondence between a program and its model. For this, we have developed a formal semantics both of multithreaded Java with annotations and of timed automata. The proofs show that the model simulates the behavior of the source Java program. This means that each semantic step made in the Java code has a corresponding sequence of steps in the model which has the same effect on the state, i.e. variable values, time or locked objects. The verified code for translation is generated from the formalized translation using the Isabelle code generator. Then our tool uses the verified code to generate a model of a Java program.

R´ esum´ e Les syst`emes multi-tˆ aches temps-r´eels deviennent de plus en plus r´epandus de nos jours. La correction des syst`emes multi-tˆ aches est difficile `a assurer, pourtant, la correction est critique pour les logiciels temps-r´eels. La v´erification formelle aide `a trouver les erreurs potentielles. Les conflits de partage de ressources qui peuvent produire une incoh´erence des donn´ees sont une sorte d’erreurs. Une solution est le verrouillage exclusif des ressources partag´ees qui peut mener ` a des temps d’execution difficile ` a pr´edire, ou `a l’interblocage dans le pire cas. La v´erification des programmes est souvent effectu´ee par model checking. Un formalisme r´epandu de model checking des programmes temps-r´e´els sont les automates temporis´es. Ils permettent de v´erifier certaines propri´et´es temporelles dans le mod`ele du programme et de trouver la s´equence d’actions qui m`enent ` a l’erreur. Il existe des algorithmes de v´erification effectives pour des automates temporis´es qui sont r´ealis´es dans des outils de model checking largement utilis´es. Nous avons d´evelopp´e un outil pour l’analyse statique de programmes Java multi-tˆaches qui trouve des conflits de partage de ressources possibles. Les programmes Java sont annot´es avec des informations temporelles, et le mod`ele du programme est construit, en se basant sur les annotations. Le mod`ele est un syst`eme d’automates temporis´es qui est v´erifi´e par le model checker Uppaal. Des conflits de partage de ressources possibles sont trouv´es par la v´erification. Nous avons d´evelopp´e une ´etude de cas pour illustrer cette approche. L’analyse est compl`ete: lorsqu’un conflit de partage de ressources peut apparaˆıtre dans un programme Java, il est d´etect´e par notre analyse. Le mod`ele abstrait peut aussi sortir des alertes “faux positifs” qui ne correspondent pas `a une configuration accessible dans le programme Java original. Pour s’assurer que la traduction des programmes Java vers des automates temporis´es est correcte, nous avons formalis´e la traduction dans l’assistant de preuves Isabelle. Nous avons prouv´e que la traduction pr´eserve la correspondance entre le programme et son mod`ele. Pour cela, nous avons d´evelopp´e une s´emantique formelle de Java multi-tˆache avec des annotations et des automates temporis´es. Les preuves montrent que le mod`ele simule le comportement du programme Java original. Cela veut dire que chaque pas de la s´emantique du code Java a une s´equence de pas correspondante dans le mod`ele qui a le mˆeme effet sur l’´etat, c.`a.d. des valeurs des variables, temps ou objets verrouill´es. Le code v´erifi´e de traduction est g´en´er´e `a partir de la traduction formalis´ee en utilisant le g´en´erateur de code d’Isabelle. Puis notre outil utilise le code v´erifi´e pour g´en´erer le mod`ele d’un programme Java.

Contents 1 Introduction 1.1 Motivation . . . . . . . . . . . . . . 1.2 Contributions . . . . . . . . . . . . . 1.2.1 Methodological contributions 1.2.2 Technical Contributions . . . 1.3 Structure of the Thesis . . . . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

1 1 2 2 2 3

2 Foundations 2.1 Introduction . . . . . . . . . . . . . . . . . . 2.2 Concurrent Systems . . . . . . . . . . . . . 2.2.1 Real-time Systems . . . . . . . . . . 2.2.2 Time-triggered Architectures . . . . 2.2.3 Synchronous Languages . . . . . . . 2.2.4 Scheduling in Real-time Systems . . 2.2.5 Real-time System Implementations . 2.3 Synchronization Mechanisms . . . . . . . . 2.3.1 Locking Primitives . . . . . . . . . . 2.3.2 Message-based Synchronization . . . 2.3.3 Lock-free and Wait-free Algorithms . 2.4 Formal Verification . . . . . . . . . . . . . . 2.4.1 Model Checking . . . . . . . . . . . 2.4.2 Isabelle Proof Assistant . . . . . . . 2.4.3 Formal Execution Models . . . . . . 2.5 Conclusions . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

5 5 6 6 7 7 8 11 14 14 15 16 16 16 24 29 30

3 Abstracting from Java to Timed 3.1 Timing Annotations . . . . . . 3.2 Verification . . . . . . . . . . . 3.2.1 Parser . . . . . . . . . . 3.2.2 Preprocessing . . . . . . 3.2.3 Syntax checks . . . . . . 3.2.4 Translation . . . . . . . 3.2.5 Model Checking . . . . 3.2.6 Discussion . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

31 31 33 34 34 34 35 35 36

. . . . .

. . . . .

. . . . .

Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

ii

Contents 3.3 3.4

Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

4 Formal Model 4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Java Semantics . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Java Statements . . . . . . . . . . . . . . . . . . . . 4.2.2 Java State . . . . . . . . . . . . . . . . . . . . . . . . 4.2.3 Single-threaded Semantics . . . . . . . . . . . . . . . 4.2.4 Multithreaded Semantics . . . . . . . . . . . . . . . 4.3 Abstract Java Semantics . . . . . . . . . . . . . . . . . . . . 4.3.1 Single-threaded Semantics . . . . . . . . . . . . . . . 4.3.2 Multithreaded Semantics . . . . . . . . . . . . . . . 4.4 Timed Automata . . . . . . . . . . . . . . . . . . . . . . . . 4.4.1 Timed Automata Syntax . . . . . . . . . . . . . . . 4.4.2 Timed Automata Semantics . . . . . . . . . . . . . . 4.5 Translation from Java to Timed Automata . . . . . . . . . . 4.5.1 Translation from Java to Abstract Java . . . . . . . 4.5.2 Translation from Abstract Java to Timed Automata 4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

37 39 41 41 42 42 43 45 51 53 54 57 60 60 62 64 64 66 83

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

5 Formal Correctness of Abstraction 5.1 Correctness Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Correctness Proof of the Translation from Java to Abstract Java . . . . . . . 5.2.1 Evaluation Step Simulation . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2 Platform Step Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.3 Combination of Evaluation and Platform Steps . . . . . . . . . . . . . 5.2.4 Scheduling Step Simulation . . . . . . . . . . . . . . . . . . . . . . . . 5.2.5 Full Step Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Correctness Proof of the Translation from Abstract Java to Timed Automata 5.3.1 Automata Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.2 Simulation Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.3 Scheduler Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

85 . 85 . 86 . 86 . 88 . 88 . 90 . 91 . 91 . 93 . 96 . 99 . 100 . 101

6 Conclusions 103 6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.2 Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 7 Version franc ¸aise de la th` ese 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 7.1.1 Motivation . . . . . . . . . . . . . . . . . . . 7.1.2 Contributions . . . . . . . . . . . . . . . . . . 7.2 Etat de l’Art . . . . . . . . . . . . . . . . . . . . . . 7.3 Abstraction de Java vers des Automates Temporis´es 7.3.1 Annotations temporelles . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

107 107 107 108 110 110 111

Contents

7.4

7.5 7.6

7.3.2 V´erification . . . . . . . . . . Mod`ele Formel . . . . . . . . . . . . 7.4.1 S´emantique formelle . . . . . 7.4.2 Traduction . . . . . . . . . . Correction Formelle de l’Abstraction Conclusion . . . . . . . . . . . . . . 7.6.1 R´esultats . . . . . . . . . . . 7.6.2 Perspectives . . . . . . . . . .

iii . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

111 112 112 113 114 116 116 117

Glossary

119

Bibliography

121

iv

Contents

List of Figures 2.1 2.2 2.3 2.4 2.5 2.6

Priority inversion. . . . . . . . . . . . . . Chained blocking. . . . . . . . . . . . . . . Deadlock caused by priority inheritance. . Deadlock example. . . . . . . . . . . . . . An example of a timed automaton . . . . The same proof written in apply-style and

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

9 10 10 15 20 25

3.1 3.2 3.3 3.4 3.5 3.6 3.7

Examples of annotated statements. . . . . . . . . . . . . . . . . . . . . . . Execution timeline in multi- and monoprocessor models. . . . . . . . . . . Program transformation process implemented in the informal prototype. . Program transformation process implemented in the formally verified tool. Producer and consumer threads. . . . . . . . . . . . . . . . . . . . . . . . Execution flow leading to resource sharing conflict. . . . . . . . . . . . . . The generated TA model. . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

31 32 33 33 37 38 38

4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14

Sequence of substeps in a full semantic step. . . . . . . . . . . . . . . . . . . . . . . . The types of Java expressions and statements. . . . . . . . . . . . . . . . . . . . . . . Datatypes for local state and global state. . . . . . . . . . . . . . . . . . . . . . . . . General form of reduction rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Java evaluation semantics. Control structures. . . . . . . . . . . . . . . . . . . . . . . Java evaluation semantics. Work with locks. . . . . . . . . . . . . . . . . . . . . . . . Java evaluation semantics. Work with time. . . . . . . . . . . . . . . . . . . . . . . . Java platform semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Workaround for WCET computation for conditions. . . . . . . . . . . . . . . . . . . Scheduler semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Scheduler platform semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Scheduler full step semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Functions extracting a program statement and a local state from a full state. . . . . Function injecting a program statement and a local state into corresponding components of full state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Full step single-threaded semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abstract Java statements and state . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abstract Java evaluation semantics. Control structures. . . . . . . . . . . . . . . . . Abstract Java evaluation semantics. Work with locks. . . . . . . . . . . . . . . . . .

41 42 44 45 46 47 48 50 50 52 52 52 53

4.15 4.16 4.17 4.18

v

. . . . . . . . . . . . . . . . . . . . . . . . . in Isar.

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

53 53 54 55 56

vi

List of Figures 4.19 4.20 4.21 4.22 4.23 4.24

4.44 4.45 4.46 4.47 4.48 4.49 4.50 4.51 4.52 4.53 4.54 4.55 4.56 4.57 4.58 4.59

Abstract Java evaluation semantics. Work with time. . . . . . . . . . . . . . . . . . . Abstract Java platform semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abstract Java scheduler semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abstract Java scheduler platform semantics. . . . . . . . . . . . . . . . . . . . . . . . Scheduler full step semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Functions extracting an AJ program statement and local state from an AJ global state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Function injecting AJ local components to a global AJ state. . . . . . . . . . . . . . Single-threaded full step AJ semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . Timed automata structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Available variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Timed automata actions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modeling urgent nodes with clock constraints. . . . . . . . . . . . . . . . . . . . . . . Timed automata state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Timed automata semantic rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Timed automata with committed nodes semantic rules. . . . . . . . . . . . . . . . . Timed automata with committed nodes multiple step semantic rules. . . . . . . . . . Translation function from Java to Abstract Java. . . . . . . . . . . . . . . . . . . . . Translation function for states from Java to Abstract Java. . . . . . . . . . . . . . . Translation of full states from Java to AJ. . . . . . . . . . . . . . . . . . . . . . . . . Schematic automata for AJ statements. . . . . . . . . . . . . . . . . . . . . . . . . . Possible situation in case of complex condition statements. . . . . . . . . . . . . . . . Successor statements computation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peeling function for statements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The execution flow and the corresponding abstract machine for an example program The execution flow and the corresponding abstract machine for an example program with subexpression identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Structural snippet for measuring exact time interval. . . . . . . . . . . . . . . . . . . AJEmpty automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJAssign n automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJCompos n e1 e2 automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJCond n e1 e2 automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJLoop n e2 automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJAnnot n t automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJInAnnot n t automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJSleep n t automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJInSleep n t automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJSync n obj e1 automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJInSync n obj e1 automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AJDVio automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Scheduler. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Translation of local AJ state to TA state. . . . . . . . . . . . . . . . . . . . . . . . . Translation of global AJ state to TA state. . . . . . . . . . . . . . . . . . . . . . . .

72 72 73 73 73 74 75 75 76 76 77 77 78 79 79 81 82

5.1 5.2

Simulation diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simulation definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

85 85

4.25 4.26 4.27 4.28 4.29 4.30 4.31 4.32 4.33 4.34 4.35 4.36 4.37 4.38 4.39 4.40 4.41 4.42 4.43

57 58 59 59 59 59 59 60 60 61 61 62 62 63 63 64 65 66 66 67 69 70 70 71

List of Figures 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 5.13 7.1 7.2 7.3 7.4 7.5 7.6 7.7

encA for different semantic step depth. . . . . . . . . . . . . . . . . . Simulation proof between the Java and AJ evaluation steps . . . . . Combination of the two simulation proofs between the Java and AJ mantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Single-threaded Java simulation proof assumptions. . . . . . . . . . . Simulation proof between the Java and AJ full single-threaded steps Combination of the two simulation proofs for the scheduling step. . . Combination of the simulation proofs for the full multithreaded step. Simulation proof idea. . . . . . . . . . . . . . . . . . . . . . . . . . . Weak equality relation. . . . . . . . . . . . . . . . . . . . . . . . . . . Evolution of isEligible status in AJ and TA. . . . . . . . . . . . . . An example of how automata embedding works. . . . . . . . . . . .

vii . . . . . . . . . . . . . . . . . . local-state se. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Examples of annotated statements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Processus de transformation du programme r´ealis´e dans le prototype informel. . . . Processus de transformation du programme r´ealis´e dans l’outil formellement v´erifi´e. S´equence des sous-´etapes dans une ´etape s´emantique enti`ere. . . . . . . . . . . . . . Automates sch´ematique pour les d´eclarations AJ. . . . . . . . . . . . . . . . . . . . . Diagramme de simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Combinaison de deux preuves de simulation entre Java et AJ s´emantiques des ´etats locaux. jT oAJ est la fonction de traduction de Java vers AJ pour les ´etats locaux. . 7.8 Combinaison de deux preuves de simulation pour l’´etape multithread. f sT oAJ est la fonction de traduction de Java vers AJ pour les ´etats complets. . . . . . . . . . . . 7.9 Id´ee de la preuve de simulation. ajStateT oT A est la fonction de traduction pour les ´etats AJ vers les ´etats TA, ref est la fonction de r´ef´erence. . . . . . . . . . . . . . .

86 88 89 89 89 90 91 92 92 93 95 111 112 112 113 114 114 115 115 116

viii

List of Figures

Chapter 1

Introduction 1.1

Motivation

Multithreading became a mainstream in user application programming in the 2000s, and nowadays it comes to real-time application programming. A lot of real-time applications are also safetycritical, i.e. their erroneous behavior can have catastrophic consequences. Therefore, the requirements for such applications are much stricter than those for usual user applications. Moreover, real-time applications must often be certified and verified. Multithreaded programming has additional potential pitfalls which do not exist in monoprocessor programs, in particular, management of simultaneous access to resources. Shared resources can be accessed by several processes or threads which can change the resource at the same time. In this case, data consistency can be broken. Therefore, only one process must be authorized to change a shared resource at each moment of time. One of the most common solutions implemented in a lot of programming languages are critical sections which must be executed sequentially. Critical sections guarantee that a process finishes all its work with the resource before another process starts working with the resource. If a process has to wait in order to access a critical section because another process executes its critical section, a resource sharing conflict occurs. Usage of critical sections can be harmful for real-time applications because deadlines can be missed while a process waits in the queue for entering a critical section. Even ranking processes by their priorities does not fully solve this problem due to priority inversion (see section 2.2.4). Additional testing is always required which is complex and cannot check all necessary situations. Resource sharing conflicts can lead to deadlocks which are one of the most severe errors in multithreaded applications. A deadlocked program hangs and cannot be fixed by other means except restarting. The real reasons of incorrect program behavior are difficult to discover, and some automated tool would be desired. In this thesis we propose a static analysis method and a tool for finding potential resource sharing conflicts for programs written in Java. However, it can be adapted to any imperative language which supports explicit synchronization sections. The tool discovers possible resource sharing conflicts for multithreaded Java programs by building a model of program execution. Model checking is a widely used verification technique, especially popular in real-time systems verification. We use timed automata for building a model of a program by an abstraction from 1

2

Chapter 1: Introduction

existing Java code and mapping it to timed automata. The model is later verified with a model checker. As a modeling tool we used the model checker Uppaal. The translation from Java to timed automata itself must be correct and, thus, verified. We want to ensure that both a program and its model have similar behavior. Similar behavior means that each semantic step on the Java level has a corresponding sequence of steps on the timed automata level, and the resulting changes of these steps on both levels have the same effect. The verification is done with a proof assistant which helps to formalize execution semantics of both systems and then is used to establish a correspondance between the two semantics and for verification of correctness. With formal proofs we can claim that the model of a Java program models correctly the necessary behavior. The core of our tool is verified with the Isabelle proof assistant. Verification ensures correctness of modeling.

1.2 1.2.1

Contributions1 Methodological contributions

In the last years formal developments have become more widespread, however, they concern mostly formalization of mathematics and verification of specific algorithms, i.e. things which have already been expressed with a formal language. There are numerous tools that implement static analysis and verification of programs, however, only a few of them are formally verified. Verification tools are obliged to be correct, otherwise their results cannot be considered as relevant. Verification tools and prototypes presented in the computer science community usually cannot be considered as properly tested tools, therefore, the problem of verification correctness persists. This thesis is one of the first contributions which performs analysis of programs based on the formal semantics of programming languages and provides necessary foundations like TA semantics. From this point of view the thesis has a scientific interest and is a technical novelty such as a formal model of multithreaded semantics of Java which takes time into account.

1.2.2

Technical Contributions

On the technical level, in this thesis we have achieved the following: • created a prototype tool, where experimented with translation rules and modeling, figured out syntactic requirements for correct translation, • developed a formal semantics of multithreaded Java taking into account the timing aspect using the Isabelle proof assistant, • developed a formal semantics of timed automata using the Isabelle proof assistant, • introduced an intermediate semantic layer called Abstract Java and developed its semantics, • converted the informal translation to the formally expressed one, • formally proved correctness of the translation, 1 This

thesis was partially supported by the VeriSync project (ANR-10-BLAN-0310).

1.3. Structure of the Thesis

3

• extracted formally verified executable code for the translation and syntax checks, • injected the verified translation core into unverified setup operations such as parsing or pretty printing of timed automata. The results have been presented in the following publications: 1. Nadezhda Baklanova, Wilmer Ricciotti, Jan-Georg Smaus, and Martin Strecker. “Abstracting an operational semantics to finite automata”. In: CoRR abs/1409.7841 (2014) 2. Nadezhda Baklanova and Martin Strecker. “A Formal Model of Resource Sharing Conflicts in Multithreaded Java”. In: Proceedings of the 9th International Conference on ICT in Education, Research and Industrial Applications: Integration, Harmonization and Knowledge Transfer, Kherson, Ukraine, June 19-22, 2013. Ed. by Vadim Ermolayev et al. Vol. 1000. CEUR Workshop Proceedings. CEUR-WS.org, 2013, pp. 550–564 3. Nadezhda Baklanova and Martin Strecker. “Abstraction and Verification of Properties of a Real-Time Java”. In: ICT in Education, Research, and Industrial Applications. Ed. by Vadim Ermolayev, Heinrich C. Mayr, Mykola Nikitchenko, Aleksander Spivakovsky, and Grygoriy Zholtkevych. Vol. 347. Communications in Computer and Information Science. Springer Berlin Heidelberg, 2013, pp. 1–18 4. Nadezhda Baklanova, Martin Strecker, and Louis F´eraud. Resource Sharing Conflicts Checking in Multithreaded Java Programs. Tech. rep. Presentation at Journ´ees FAC’2012. Universit´e de Toulouse, IRIT, 2012

1.3

Structure of the Thesis

The thesis contains four chapters followed by conclusions. Chapter 2 presents the main problems in multithreaded programming and their possible solutions, basics of model checking, in particular, timed automata, and basics of the Isabelle proof assistant. Chapter 3 contains an informal description of the program transformation we propose and explains reasons why some restrictions to source programs were introduced. In chapter 4 the formal models are described. It contains formal semantics of three semantic layers we use in the formalization and the formal translation procedure for Java programs. Chapter 5 contains a general description of the proofs made for verification of the translation. We explain the meaning of the made assumptions and formal constructions used in the proofs. The proofs are available online at www.irit.fr/˜Nadezhda.Baklanova/.

4

Chapter 1: Introduction

Chapter 2

Foundations 2.1

Introduction

Distributed and parallel systems become more and more popular in all areas of software development. Real-time and safety-critical systems are now often developed with multithreading support. However, multithreaded programming is often error-prone, and the programs are difficult to debug. It is a sad fact for real-time and safety-critical systems since they must often pass complex procedures of certification, and software errors in the sectors where these systems are deployed (nuclear power plants, air traffic) often have catastrophic consequences. Formal methods come to help with verification of program correctness. Multiple threads can access and update the same resource so the problem of data consistency comes to play. There are several approaches to the resource sharing problem, one of them is to mark regions of code where a thread needs an exclusive access to the resource. Severe consequences of such an approach like deadlock or deadline miss should be completely excluded, and formal verification of the program is desired. A lot of research is carried out in the domain of formal verification of multithreaded programs. A very popular approach is model checking when an abstract model of a program is built and then formally verified. It is often used together with specially designed runtime environments which facilitate modeling and verification. This chapter gives an overview of concurrent systems design, existing specifications and architectures of parallel systems. It describes a time-based approach to design and verification of parallel systems. This general idea is used further in our development for verification of Java programs. Then, possible mechanisms for preserving data consistency in multithreaded programs are considered. A number of methods such as exlusive locking or message-based communication, their advantages and disadvantages are discussed. Later, we give some basics of model checking, in particular, timed automata, which we use for modeling of Java program behavior. Also, we give basics of the Isabelle proof assistant which will be widely used in our formal development. Apart from that we consider formalizations of execution environments of C and Java where we take some ideas for the formal development. 5

6

Chapter 2: Foundations

2.2 2.2.1

Concurrent Systems Real-time Systems

Real-time systems can be met everywhere in the real world. Often they control critical systems, therefore the verification of correctness of these systems is necessary [24]. Examples of real-time systems are plant control systems, flight and other transport control systems, telecommunication, military systems and many others. Real-time systems must react within a known time interval to the external events coming from the real world. The main property required for real-time systems is predictability. Real-time tasks have deadlines meaning the maximum time when the task execution should be finished. The result produced after the missed deadline is often wrong or useless. Depending on the consequences of a missed deadline two classes of real-time tasks are distinguished: • hard real-time tasks: a missed deadline may cause catastrophic consequences on the environment under control, • soft real-time tasks: a missed deadline may cause performance cutback but does not break correct system behavior. Usually, real-time systems include both soft and hard tasks, so they must handle both. Real-time systems normally support multitasking and, consequently, mechanisms for process communication and synchronization. Also, multitasking often requires priority-based scheduling since different tasks may have different priorities. External interrupts should have higher priorities than the process priorities but it may cause deadline misses for basic processes. The clock in a real-time system should also support real-time activities and be very precise. Real-time systems supporting critical applications must satisfy some important basic properties: • timeliness —results have to be correct not only in their value but also in the time domain, • design for peak load —real-time systems must not collapse when they are peak-loaded, • predictability —the system must be able to predict the consequences of any scheduling decisions; if some task cannot be guaranteed within their time constraints, the system must notify in advance, so that alternative actions can be planned to handle the event, • fault tolerance —single hardware and software failures should not cause the system to crash, • maintainability —the architecture of a real-time system should be designed to be modular in order to make future modifications possible. System predictability is achieved by properly designing all the components of the system. The processor is the main component that affects predictability. Its internal characteristics such as instruction prefetch, pipelining, cache memory and direct memory access mechanism must be designed so that the internal state of the processor could be predicted at each moment of time, even in the middle of task execution. Scheduling, synchronization and communication mechanisms, memory management, interrupt handling also need to be predictable. System predictability makes it possible to build models based on timing information since duration of particular actions is known exactly, i.e. it does not vary depending on cache state, scheduler fluctuations etc. Time-based verification of real-time systems is a wide domain of research which includes time-based architecture design and modeling.

2.2. Concurrent Systems

2.2.2

7

Time-triggered Architectures

Large distributed systems require additional effort to ensure the correctness of computed results and the synchronization between computing nodes. Due to the CAP theorem (Brewer’s theorem) [21] only two of the three base properties of distributed systems: Consistency, Availability and Partition tolerance — can be satisfied. An architect of a distributed system can select only two properties from the three to be guaranteed in the system. Consistency is harder to guarantee. One of the suggested solutions are time-triggered architectures described in [55, 42]. A system consists of several nodes which can send messages to each other and do some computations. The idea is that each computational node has a predefined time interval of activity when it can communicate with other nodes. During the intervals of silence no global significant events are registered. Such a system is deterministic and therefore can be real-time simulated and tested. An example of such a system is Giotto [42, 43]. Giotto is a time-triggered programming model for embedded control systems with real-time constraints. Giotto programs have periodic tasks combined into modes. Tasks communicate with each other and with external sensors providing synchronous communication. Periods of tasks are specified by a programmer, message sending happens instantaneously. Restrictions on time of the task activity simplify the schedulability analysis and verification which is often critical for real-time systems. Such an architecture provides a consistent system and is relatively easy to verify. A formal model of the system is described in [44]. In our work we use an approach somewhat inspired by the idea of time-based verification. We utilize the main idea of system events that happen in predefined time moments, although in our approach execution time is assigned to pieces of code.

2.2.3

Synchronous Languages

Synchronous programming languages are optimized for programming reactive systems. Reactive systems continuously interact with the physical environment when the environment cannot synchronize logically with the system [39]. Response times of the system must meet requirements induced by the environment. Other classes of systems are transformational systems where data is available at the beginning, and the results must be provided upon termination, and interactive systems which interact continuously with environments that possess synchronization capabilities, e.g. operating systems. Reactive systems distinguish themselves from the other types of systems. Usually reactive systems must possess parallelism since the interaction with the environment occurs in parallel jobs, however, they are not necessary multiprocessor systems, that means, parallelism can be emulated by the system single procesor. Anyway, time constraints induced by the environment must be satisfied for all processes since those systems are mostly safety-critical like automation control on power plants or aircraft controllers, and therefore they must be satisfied for all processes. A number of synchronous languages have been developed since the 80s like Esterel [17], Lustre [39], Lucid Synchrone [27], Polychrony [37] or, more recently, Prelude [36]. These languages adopt different programming approaches but all of them use the idea of synchrony. In most synchronous languages computational units receive infinite data flows updated with a predefined period which can be different for different flows. In the Lustre language [39] each variable or expression denotes a flow, i.e. a possibly infinite sequence of values of a given type and a clock representing a sequence of times. Variables not assigned to inputs should be given an explicit single value. Expressions do not have side effects, therefore, variables can be assigned in any order. Lustre has usual arithmetic and boolean operators which can be applied to the elements of flows

8

Chapter 2: Foundations

and a set of temporal operators. A Lustre program represents a network of operators. A network can be encapsulated as a new reusable operator called a node. A node is an interface specification (input and output parameters), internal variable declarations and a body made of equations and assertions defining outputs and internal variables based on inputs. Synchronous languages are widely used for programming embedded systems [16]. They cooperate with model-driven engineering. From the other side, researchers try to include synchronous technology into commonly used languages like C. Unfortunately, the data types supported by synchronous languages are only primitive data types like integer or boolean variables. Lucid synchrone [27] has inductive data types, however, full support of complex data structures in most synchronous languages is still lacking. It makes impossible to implement many algorithms based on data structures, which, however, are desired.

2.2.4

Scheduling in Real-time Systems

When a single processor has to execute a set of concurrent tasks, the CPU time has to be assigned to tasks according to predefined rules called scheduling policy [24]. An operation of allocating the CPU to a task by the scheduling algorithm is called dispatching. A task that can potentially execute on the processor is called an active task. A task waiting for the processor is a ready task, whereas a task in execution is a running task. All ready tasks waiting for the processor form a queue called ready queue. In many systems with dynamic task activation the running task can be interrupted at any point, so that a more important task can immediately gain the processor. The operation of suspending the running task is called preemption. Given a set of tasks J = {J1 , ...Jn } a schedule is an assignment of tasks to the processor so that each task is executed until completion. When a new task starts executing, the processor performs a context switch. A preemptive schedule is a schedule where the running task can be arbitrarily suspended at any time to assign the CPU to another task according to the scheduling policy. A schedule is said to be feasible if all tasks can be completed according to a set of specified constraints. A set of tasks is said to be schedulable if there exists at least one algorithm that can produce a feasible schedule. Task constraints are of three types: timing constraints, precedence relations and mutual exclusion constraints on shared resources. Real-time systems usually have strict timing constraints to be satisfied. Task deadlines are timing constraints which represent time before which the task execution should be completed. Depending on the rules of task activation, tasks are divided into periodic and aperiodic tasks. Periodic tasks consist of an infinite sequence of identical activities called jobs that are regularly activated at a constant rate. Aperiodic tasks also consist of an infinite sequence of identical jobs, however, their activation is not regular. Aperiodic tasks where consecutive jobs are separated by a minimum interarrival time are called sporadic tasks. In certain applications computational activities cannot be executed in arbitrary order but have to respect some precedence relations. Such precedence relations are usually described in a directed acyclic graph G built at the design stage. Tasks are represented by nodes in this graph, and precedence relations by arrows. A precedence graph G induces a partial order on the task set. If G contains a directed path from node Ja to node Jb , task Ja is called a predecessor of task Jb . If G contains an edge from node Ja to node Jb , task Ja is called an immediate predecessor of Jb . Tasks with no predecessors are called beginning tasks. As all predecessors of task Ja are completed, Ja can start.

2.2. Concurrent Systems

9

Computational tasks can have access to several resources that need to be shared between tasks. A resource may be a data structure, a variable, a file, a set of registers in a peripheral device etc. To maintain data consistency many shared resources do not allow simultaneous access but require mutual exclusion among competing tasks. A piece of code executed under mutual exclusion constraints is called a critical section. To ensure sequential access to mutually exclusive resources, operating systems usually provide a synchronization mechanism that can be used by tasks to create critical sections. A task waiting for a mutually exclusive resource is said to be blocked on that resource. Given a set of tasks, a set of timing constraints associated with each task, a set of processors, a set of resources and, probably, a graph describing precedence relations, scheduling needs to assign processors and resources to tasks in order to complete all tasks under imposed constraints. This problem in general form has been shown to be NP-complete [24]. Scheduling algorithms can be of different types: • preemptive — the running task can be interrupted at any time to assign the processor to another active task, • non-preemptive — a task, once started, is executed until its completion, • static — scheduling decisions are based on fixed parameters assigned to tasks before their activation, • dynamic — scheduling decisions are based on dynamic parameters that may change during execution, • off line — scheduling is executed on the entire task set before actual task activation; the generated schedule is stored in a table and used by a dispatcher, • on line — scheduling decisions are taken at runtime every time a new task enters the system or when a running task terminates, • optimal — the scheduling algorithm minimizes some given cost function defined over the task set, • heuristic — the scheduling algorithm searches for a feasible schedule using an objective function; the schedule is not guaranteed to be optimal. Systems allowing preemption have a potential problem when mutually exclusive resources are used. Consider two tasks J1 and J2 sharing a mutually exclusive resource R (fig. 2.1). To guarantee the mutual exclusion, both tasks must have critical sections for work with this resource. If preemption is allowed, and J1 has a higher priority than J2 , then J1 can be blocked if J2 has entered

J1 J2

Figure 2.1: Priority inversion. Green – active execution, red – execution in critical section, light red – process blocked.

10

Chapter 2: Foundations

the critical section before J1 (see fig. 2.1). J1 preempts J2 , however, it gets blocked on the critical section since J2 is currently executing it. If more than two tasks have a critical section on the same resource, the situation gets even more complex and hardly predictable. The higher priority task may get blocked on critical sections which can be unacceptable for real-time tasks because it can easily lead to deadline miss. A priority inversion is said to occur if a higher priority task waits for the execution of a lower-priority task in a critical section. Several approaches have been developed to deal with the priority inversion problem. A simple way is to forbid preemption during the execution of critical sections. However, this method is appropriate for short critical sections only because it creates unnecessary blocking. Priority inheritance is a simple solution of the problem of priority inversion. The priority of the tasks that cause blocking is modified. When a task blocks one or more higher-priority tasks, it temporary inherits the highest priority of the blocked tasks. It prevents low-priority tasks from preempting higher-priority tasks. However, priority inheritance has some disadvantages: with priority inheritance a chain of blocking can be formed, and it does not prevent deadlocks. Consider three tasks J1 , J2 and J3 with decreasing priorities that share two semaphores Sa and Sb . J1 accesses Sa and Sb , J2 accesses Sb , J3 accesses Sa (fig. 2.2). Suppose during execution J3 locks Sa and is then preempted by J2 . J2 locks Sb and is preempted by J1 . Then priority inheritance works for J3 and J2 so they finish the execution of their critical sections. As a result, J1 is blocked for the duration of two critical sections waiting for J3 and J2 until they free the locked semaphores. It is called chained blocking. In the worst case if J1 accesses n distinct semaphores that have been locked by n lower-priority tasks, J1 will be blocked for the duration of n critical sections. J1 J2 J3

Figure 2.2: Chained blocking. Green – active execution, red – execution in the Sa critical section, orange – execution in the Sb critical section, light red – process blocked. Deadlock appears if two tasks use two semaphores in a nested fashion but in reversed order. Suppose J2 locks semaphore Sb , then J1 preempts J2 before it can lock Sa . J1 locks Sa but gets blocked on Sb (see fig. 2.3). Then J2 resumes and continues the execution at the priority of J1 . When J2 attempts to lock Sa , deadlock occurs. blocked on Sb J1 blocked on Sa J2

Figure 2.3: Deadlock caused by priority inheritance. Green – active execution, red – execution in the Sa critical section, orange – execution in the Sb critical section, light red – process blocked.

2.2. Concurrent Systems

11

In order to solve the two mentioned problems, another mechanism called priority ceiling has been developed. The idea of priority ceiling is to disallow a task to enter a critical sections if there are locked semaphores that could block it. Consequently, once a task enters its first critical section, it can never be blocked by lower-priority tasks until its completion. This rule prevents multiple blocking and deadlocks. In the implementation each semaphore is assigned a priority ceiling equal to the priority of the highest-priority task that can lock it. Then a task is allowed to enter a critical section only if its priority is higher than all priority ceilings of the semaphores currently locked by other tasks. Implementation of different scheduling strategies can dramatically change the behavior of a system, so the selection of the scheduling strategy is usually done on the early stages of architecture design. Modeling a scheduling strategy correctly is an important phase in system modeling. Related Work There is a number of works devoted to schedulability analysis. A lot of them use model checking tools (see section 2.4.1). Timed automata formalism is quite popular in this domain. In [28] the authors use timed automata for extensive search of feasible schedules. The authors of [20] build a system of timed automata from Java bytecode, and then perform schedulability analysis. In [1] the schedulability problem is reduced to finding the shortest path in system modeled as a stopwatch automaton which is an extension of timed automata. Although they are not decidable in general case [22], the authors’ particular model can be successfully verified. Another extension, task automata is presented in [33]. Schedulability analysis is reduced to reachability analysis in a task automaton [35, 34].

2.2.5

Real-time System Implementations

Real-Time Specification for Java The success of the Java platform ([4]) lead to the demand of a real-time extension of Java for use in embedded systems. The desired features to add were ([75]) • fixed priority and round-robin scheduling, • mutual exclusion locking avoiding priority inversion, • inter-thread communication, e.g. semaphores, • user-defined interrupt handlers and device drivers, • timeouts and aborts on running thread. Real-Time Specification for Java (RTSJ) is intended to address the execution of real-time Java programs on single-processor systems. RTSJ does not extend Java syntax and is backwardcompatible with non-real-time Java programs [14]. In RTSJ there are several kinds of heap memory. They are distinguished in order to exclude garbage collection since the real-time garbage collector should still be used carefully. Usage of memory from certain classes gives an application additional information how long an object is going to be alive and how much time it takes to create or destroy it.

12

Chapter 2: Foundations

Time is treated with finer granularity in comparison with usual Java. A concept of relative time which can be measured with nanoseconds precision has been introduced. Absolute time is measured as time relative to some epoch. Predictable scheduling is one of the key requirements to real-time systems. RTSJ provides an interface where the necessary parameters for a thread can be specified. The notion of thread is generalized to the notion of schedulable object. Each schedulable object must specify release requirements (periodic, aperiodic and related parameters), memory requirements (e.g. the rate at which it will allocate memory on the heap), scheduling requirements (priority). The default scheduler in RTSJ is a fixed preemptive priority-based scheduler. Other schedulers can be defined by programmer. RTSJ platform checks thread deadlines and can undertake some activities in case it is missed. Schedulability analysis can be carried out before execution or during it. The missed deadlines are reported and the corresponding event handlers are fired. In general, RTSJ supports asynchronous event handlers which are reactions to external interrupts or they can be linked to a timer. RTSJ provides an ability to a schedulable object to transmit the control to another object, so called asynchronous transfer of control. For real-time application it is an important feature which cannot be properly implemented with the standard Java mechanisms. RTSJ supports priority inheritance algorithms which elevate the priority of a thread being in a critical section. However, it works well for real-time schedulable objects only. If a schedulable object can be blocked by a non-real-time thread, the interaction with garbage collector must be also taken into account. It would be inacceptable in most cases, therefore such a situation should be avoided by a programmer. In order to allow communication in such a case, RTSJ provides a non-blocking communication mechanism for use between non-real-time threads and schedulable objects. For resource sharing conflicts analysis essential RTSJ features are locking mechanism which coincides with usual Java locking mechanism and scheduling policy. The default RTSJ scheduler suits well for most of the tasks; for this reason in our work we considered a similar but simplified scheduler of such kind. However, RTSJ allows too much freedom in interaction between threads. A single system may contain all kinds of schedulable tasks: periodic, aperiodic, time-dependent and eventdependent. It complicates modeling and analysis, so we investigated another Java specification for safety-critical systems. Safety-Critical Java The Real-Time Java profile appeared to be too complex to be fully implemented and verified. Safety-Critical Java (SCJ) was aimed to provide a simpler specification which would be easier to verify [41] and be backward compatible with the RTSJ. SCJ has predictable performance, behavior and high reliability, appropriate for safety-critical systems. The Java virtual machine for SCJ contains a minimal set of libraries to have smaller size and be easier to verify [62]. SCJ imposes limitations on how a program should be structured and supports only a few models of concurrency, synchronization, memory. Safety-critical applications must conform to strict certification requirements, therefore they use much simpler programming models than standard Java programs allowed by usual Java specification or RTSJ. A basic execution unit in SCJ is a mission. A SCJ application consists of one or more missions. A mission consists of a bounded set of schedulable objects. A schedulable object is a sequence of statements which can be scheduled by a fixed-priority scheduler. SCJ schedulable objects behavior

2.2. Concurrent Systems

13

is taken from RTSJ schedulable objects specification. For each mission a block of memory called mission memory is defined. Objects created in the mission memory persist until the mission is terminated, and their memory cannot be reused until the end of the mission. Each mission starts in an initialization phase when objects can be allocated in the mission memory and immortal memory. When a mission initialization is completed, its execution phase is entered. When a mission is terminated, a cleanup phase begins. During cleanup each schedulable object completes its execution, and an application’s set of cleanup methods is run. In order to address different complexity levels of applications and to structurize efforts needed for application certification, three levels of complexity are introduced. At Level 0, the application programming model is described as a cyclic executive model. A mission is thought as a set of computations, each of them executed periodically with a fixed time period. Periodic event handlers must have a period, a priority and a start time relative to its period. The schedule of all periodic event handlers is constructed by an application designer. The entire schedule is repeated at a fixed period (i.e. major cycle). It is possible to run several missions sequentially. At Level 1, the an application is a single mission with a set of concurrent computations. The computation is performed in a set of periodic and aperiodic event handlers (PEH and APEH) constructing a schedulable object. The application shares objects in mission memory and immortal memory among its PEHs and APEHs using synchronized methods to maintain integrity of the objects. Condition synchronization is not supported: the wait and notify methods are not available. The fixed priority preemptible scheduler executes schedulable objects in priority order. A higher priority schedulable object may preempt a lower priority object at any time. At Level 2, the an application starts with a single mission but may create and execute concurrently additional missions. Computation is performed in ManagedSchedulable objects consisting of PEHs, APEHs and no-heap real-time threads. Each child mission has its own mission memory and can create and execute other child missions. The SCJ specification was designed so that the applications written using SCJ could be certified according to international and governmental standards to safety-critical application. Also the SCJ specification has been made compatible with RTSJ where possible. There is a number of existing SCJ implementations [41], one of them is Java optimized processor (JOP) [68, 66]. JOP is a hardware which executes the Java bytecode natively. It supports SCJ and partially standard Java libraries. It also provides a set of tools for verification, in particular, worst-case execution time (WCET) analyzer for JOP [69]. WCET analyzer computes WCET of Java programs annotated with some additional information like the maximum number of cycles in a loop. Then it ouputs the WCET for blocks of code if the program was executed on JOP. In the thesis we essentially modeled the Level 1 of SCJ specification. As for the scheduler we did not include the support of priorities, and, consequently, the support of priority ceiling emulation. In our development a scheduler is a fixed generic preemptive scheduler without priority support. It means that any thread can be preempted by another one. The idea was not to narrow too much the class of schedulers that can be supported in our model. We believe that priority support could be implemented by replacing the generic scheduler with a more restrictive one without changing the rest of our model for threads.

14

2.3 2.3.1

Chapter 2: Foundations

Synchronization Mechanisms Locking Primitives

One approach to guaranteeing consistency of multiprocessor systems is to use exclusive access to the shared resources which can be corrupted if several agents may change them simultaneously. The description of the main locking primitives is taken from [23]. The simplest way to implement exclusive access is to create a flag which is raised by the process which uses a resource, and other processes willing to also have exclusive access to this resource must check from time to time if the flag is down. It is called spin lock or busy waiting, and its disadvantage is evident. Such an algorithm is highly inefficient since it requires processor cycles while it does not perform any useful work. A possible solution for this problem is condition synchronization. The process waiting to obtain a lock is suspended if the condition for acquiring the lock does not hold. A simple mechanism implementing this idea are semaphores introduced in [31]. A semaphore is a non-negative integer variable which can be updated by two procedures: P(S) and V(S). P(S) decrements the value of the semaphore S if it is greater than zero, otherwise the process is delayed until S is greater than zero. V(S) increments the value of the semaphore S. General semaphores are called counting semaphores since their operations increment or decrement an interger count. P and V operations are atomic, meaning that two processes executing P and V operations on the same semaphore cannot interfere with each other. A process cannot fail during the execution of a semaphore operation. Condition synchronization and mutual exclusion can easily be programmed using semaphores. If a semaphore is 0, the calling process is suspended until the semaphore becomes greater than 0. Suspended processes are removed from the scheduler, i.e. they cannot be selected for execution. When a semaphore becomes greater than 0, suspended processes are resumed. It is a much more efficient algorithm than busy lock, and it is used in real runtime system implementations. The commonly used semaphores are binary semaphores or mutexes. They permit one process at a time to hold the semaphore. A general semaphore can be implemented using two binary semaphores and an integer [23]. A weakness of semaphores is that a lock or release can be easily forgotten which leads to errors. A more structured locking mechanism would be desired. It is achieved by the idea of conditional critical regions (CCR). A critical region is a section of code which is guaranteed to be executed in mutual exclusion. In a CCR variables that must be used exclusively by processes are grouped into resources. A process cannot enter a region where another process is active. Condition synchronization is provided by guards on regions. A potential performance drawback is that each suspended process must reevaluate its guards every time a CCR reports that the resource is left. Conditional regions can be dispersed throughout the program. Another synchronization primitive called monitor provides more structured control regions ([49]). The critical regions are written as procedures and are placed into a single module called a monitor. All procedure calls in the module are guaranteed to execute with mutual exclusion. Condition synchronization within the monitor is provided by a condition variable. A condition variable can be changed by two operators: P and V. When a process issues the P operation, it is suspended and placed on the queue associated with the condition variable. A suspended process releases its mutually exclusive hold on the monitor allowing another process to enter. When a process executes the V operation, it will release one

2.3. Synchronization Mechanisms

15

suspended process. The use of condition variables in monitors can make understanding the code difficult. A more structured approach based on guards instead of condition variables is proposed by a kind of monitor called a protected object. A protected object collects data items and allows access to them only by protected subprograms or protected entries. These subprograms or entries will be executed so that the data is updated under mutual exclusion. Condition synchronization is provided by guards on entries which must evaluate to true before a task is allowed to enter. Protected objects are like monitors and conditional critical regions. The Java language provides mechanisms for mutual exclusion using monitors (synchronized keyword) and methods for condition synchronization like wait and notify. Java monitors encapsulate the low-level mechanism of locking, so that the user must indicate an object and a critical section where this object must be updated mutually exclusively. The use of blocking synchronization can potentially lead to errors, which cause programs to hang or prevent programs from executing correctly. The most serious error is deadlock when each process is in such a state that it cannot proceed. On fig. 2.4 a possibly deadlocking situation is shown. P1 lock ( S1 ); lock ( S2 ); ... unlock ( S2 ); unlock ( S2 );

P2 lock ( S2 ); lock ( S1 ); ... unlock ( S1 ); unlock ( S2 );

Figure 2.4: Deadlock example. Two processes P1 and P2 are executing in parallel, and they contain the pieces of code listed on fig. 2.4. P1 first locks a semaphore S1, then S2. P2 locks the same semaphores in the reversed order. If P1 locks the semaphore S1 and at the same time P2 locks S2, both processes cannot proceed since they need to lock another semaphore but it is already locked by another process. Nobody can predict when such a situation may appear, and such an error can be solved only by rewriting the code. Another possible situation is when a process wishing to access a resource in a critical section can never do this because there are always other processes gaining access before it. It is called starvation and is highly critical especially for real-time systems. If a process if free from deadlocks and starvation, it possesses liveness. Liveness implies that if a process wants to do some action, it will eventually be allowed to finish it. Error-free multithreaded applications must satisfy liveness property for all their processes.

2.3.2

Message-based Synchronization

As synchronization through locks always has attendant problems like proof of process liveness, another approach to interprocess synchronization was adopted in several programming languages like Ada ([52]) or Erlang ([6]). Shared variable synchronization and process communication are done by passing messages between processes. The implicit synchronization between processes is

16

Chapter 2: Foundations

that a receiver process cannot obtain a message before it has been sent. If a process is waiting for a message, it becomes suspended until the message arrives. Several approaches are used to define the sender process behavior: • asynchronous send —the sender proceeds immediately after having sent a message, regardless of whether a message has been received or not, • synchronous send —the sender proceeds only after the message has been received, • remote invocation —the sender proceeds only when a reply has been returned from the receiver. The asynchronous send can be used to construct synchronous sent or remove invocation [23], so it gives more flexibility to the user. However, asynchronous programs are more complex to write and to debug due to a high degree of indetermenism of the global state. Also, message queues in asynchronously sending processes can be unboundly long which creates additional problems for message storage. The whole system correctness is difficult to be proved. A popular implementation of message-based synchronization is presented in languages supporting the actor model [48]. Actors are viewed as entities which can send and receive asynchronous messages, and they can do some computations based on the internal information. Many languages have support of actors like Erlang, Scala, Java, C++.

2.3.3

Lock-free and Wait-free Algorithms

One more approach to data consistency is to completely deny the usage of locks and apply instead the algorithms specially designed to allow several agents to change the state of shared resources without explicit locking but preserving the data consistency. Lock-free algorithms ensure that at least one thread makes progress within a bounded number of steps whereas wait-free algorithms ensure that each thread makes progress within a bounded number of steps [72]. Lock-freedom guarantees the progress of the system (i.e. no deadlocks), while wait-freedom also prevents starvation of threads. The examples of implementation describe lock-free and wait-free algorithms for linked lists [40], [47], [72]. They support basic operations on lists like find, insert and delete an element. These algorithms are based on use of the low-level instruction CAS (compare-and-swap) which must be executed atomically on a processor. Lock-free algorithms are easier but they may require several attempts to finish an operation. For this reason lock-free algorithms do not suit well for real-time systems since they cannot guarantee that operation finished within a certain deadline. A popular C++ library Boost contains implementations of some lock-free data structures [18]. Wait-free algorithms are more complex and are rarely met. The Java concurrent library contains wait-free algorithms for a concurrent queue [71, 63].

2.4 2.4.1

Formal Verification Model Checking

Reliable computer systems need to be verified formally. A program to be verified must possess certain properties. These properties are usually simple to express like “a system cannot be in a

2.4. Formal Verification

17

certain state under certain conditions”, so they are obtained from the system specification. Formal methods make system design and verification process closer to each other. Model checking uses system models expressed by precise mathematical means; then a model checking procedure explores all possible system states. Model checking can answer qualitative questions like “is this state reachable?” but also quantitative questions about time like “is the response received within 10 minutes after having been sent?”. The system model is usually automatically generated from a model description specified in some programming language (C, Java) or a hardware description language. Models of systems are often expressed using finite-state automata consisting of a finite number of states and transitions between them. The state can store current values of variables, the previously executed statement etc. Transitions describe how a system can move from one state to another. In order to improve the quality of a model, simulation can be done. Simulation helps to find modeling errors, if carried out before model checking, it will reduce cost of verification efforts. Properties are usually described in a temporal logic. When a model checker runs to check a property, it can either say if the property is satisfied or not in the model, or the model is too large to be verified on this particular computer. Model checking can help to find design errors, and it has a strict mathematical base. However, several drawbacks of model checking need to be kept in mind when doing verification. Model checking checks the correctness of a model, not of the initial program. Not all properties can be verified due to general decidability issues. If a property can be verified, an additional difficulty is that the verification algorithms often have exponential complexity in time or in space. Efficiency is hardly achievable for those algorithms. State explosion is one of the most severe disadvantages of model checking when the number of states in the model grows so rapid that it exceeds the available computer memory. One more disadvantage is that a model inevitably contains abstractions and simplifications, therefore the process of modeling needs to be thoroughly checked by other means [7]. Despite the listed issues, model checking is a quickly growing field of study, and it takes its place in industrial development process. Transition Systems Transition systems are often used to model the behavior of different systems. Transition systems are directed graphs where nodes represent states, and edges represent transitions between states. A state describes a certain state of the system. Definition. A transition system T S is a tuple (S, Act, →, I, AP, L) where • S is a set of states, • Act is a set of actions, • →⊆ S × Act × S is a transition relation, • I ⊆ S is a set of initial states, • AP is a set of atomic propositions, • L : S → 2AP is a labeling function. T S is called finite if S, Act and AP are finite.

18

Chapter 2: Foundations α

Transitions are denoted s − → s0 instead of (s, α, s0 ) ∈→. A transition systems starts in some initial state s0 ∈ I and evolves according to the transition relation →. If s is a current state, a α transition s − → s0 is selected nondeterministically and taken, so the action α is performed, and the transition system is moved to the state s0 . The labeling function L assigns a set L(s) ∈ 2AP of atomic propositions to any state s. L(s) stands for exactly those atomic propositions which are statisfied by state s. Definition. Let T S = (S, Act, →, I, AP, L) be a transition system. For s ∈ S and α ∈ Act the set of direct α-successors of s is defined as α

P ost(s, α) = {s0 ∈ S | s − → s0 }, [ P ost(s) = P ost(s, α). α∈Act

Each state s0 ∈ P ost(s) is called a direct successor of s. Definition. Let T S = (S, Act, →, I, AP, L) be a transition system. A finite execution fragment ρ of T S is an alternating sequence of states and actions ending with a state αi+1

ρ = s0 α1 s1 α2 ...αn sn such that si −−−→ si+1 for all 0 6 i < n, where n > 0. An infinite execution fragment ρ of T S is an infinite alternating sequence of states and actions: αi+1

ρ = s0 α1 s1 α2 s2 α3 ... such that si −−−→ si+1 for all 0 6 i. Timed Automata Different formal systems are used for modeling. One of the most popular are timed automata (TA) used for model checking of real-time systems [2, 3]. In this thesis they are also used for model checking of multithreaded programs. The following TA theory is taken from [7]. Timed automata are finite automata with a set of clock variables and their semantics. Clocks are different from other variables: they can be only read or reset to 0, and their values are increasing altogether. Conditions on clock values are used to enable transitions: a transition can be taken if and only if the condition on it is true. Conditions on clock values are called clock constraints. Definition. A clock constraint over set C is formed according to the grammar g ::= true | x < c | x 6 c | g ∧ g | ¬g where c ∈ N and x ∈ C. CC(C) denotes the set of clock constraints over C. Definition. A timed automaton (TA) is a tuple T A = (Loc, Act, C, ,→, Loc0 , Inv, AP, L) where • Loc is a finite set of locations, • Loc0 ⊆ Loc is a set of initial conditions, • Act is a finite set of actions,

2.4. Formal Verification

19

• C is a finite set of clocks, • ,→⊆ Loc × CC(C) × Act × 2C × Loc is a transition relation, • Inv : Loc → CC(C) is an invariant-assignment function, • AP is a finite set of atomic propositions, • L : Loc → 2AP is a labeling function for locations. Edges of TA are labeled with tuples (g, α, D) where g is a clock constraint on the clocks of the TA, α is an action to be performed, D ⊆ C is a set of clocks to reset. The intuitive interpretation of g,α,D l ,−−−−→ l0 is that the TA can move from the location l to the location l0 when the clock constraint g holds. When moving from l to l0 all clocks in D will be reset to 0 and the action α will be performed. Function Inv assigns to each location a location invariant which specifies how long a timed automaton may stay in this location. The location l must be left before Inv(l) becomes invalid, otherwise further transitions are not possible, it is called a timelock. The function L associates to each location a set of atomic propositions which are true in this location. The semantics of timed automata requires to define a satisfaction relation for guards and invariants. Definition. A clock valuation η for a set C of clocks is a function η : C → R>0 assigning to each clock x ∈ C its current value η(x). Using the definition of a clock valuation, a satisfaction relation for clock constraints can be defined. Definition. For a set C of clocks, x ∈ C, a clock valuation η for C, c ∈ N and g, g 0 ∈ CC(C) the satisfaction relation  is defined as following: η  true η  x < c iff η(x) < c η  x 6 c iff η(x) 6 c η  ¬g iff η 2 g η  g ∧ g 0 iff η  g ∧ η  g 0 For a clock valuation η and a positive real d, η + d denotes the clock valuation where all clocks in η are increased by d. reset x in η denotes the clock valuation where all clocks from η are equal to η except the clock x which is 0. TA semantics defines two types of semantic steps: discrete transition when a transition in the timed automaton is taken and delay transition when time progresses while the automaton stays in the same location. Definition. Let T A = (Loc, Act, C, ,→, Loc0 , Inv, AP, L) be a timed automaton. The transition relation → is defined by following: α

— discrete transition: hl, ηi − → hl0 , η 0 i if the following conditions hold:

20

Chapter 2: Foundations g,α,D

(a) there is a transition l ,−−−−→ l0 in TA (b) η  g (c) η 0 = reset D in η (d) η 0  Inv(l0 ) d

— delay transition: hl, ηi − → hl, η + di for d ∈ R>0 if (a) η + d  Inv(l) c>2 a

b

c64

c=0

Figure 2.5: An example of a timed automaton The automaton on fig. 2.5 has two locations: a and b, a is the initial location. The invariant of a allows the automaton to stay in this location as long as the clock c is less than 4. It is obliged to leave this location before that time expires. The guard on the outgoing edge allows to take the transition when the clock c is greater than 2. The automaton may stay in the location b as long as it wants, and when it returns to the location a the clock c is reset to 0. A possible trace for this automaton would look like loc : a ◦ c=0

2

loc : a ◦ c=2

loc : b ◦ c=2

6

loc : b loc : a reset c ◦ ◦ c=8 c=0

...

where loc : a means that the automaton is currently in location a and c = 2 means that the current value of clock c is 2. The labels above the arrows denote action labels according to the semantics definition. Property verification often requires measuring of execution time for some path in a TA. Execution time of discrete transitions is 0, execution time of delay transitions equals to its duration. τ0 τ1 Execution time of an infinite execution fragment ρ = s0 −→ s1 −→ s2 ... of an automaton T A where τi ∈ Act ∪ R>0 is calculated as ExecT ime(ρ) =

∞ X

ExecT ime(τi ).

i=0

Definition. The infinite path fragment π is time-divergent if ExecT ime(π) = ∞ otherwise π is time-convergent. For state s of T A let P athsdiv (s) = {π ∈ P aths(s) | πis time-divergent}. Time-convergent paths are infinite paths with finite execution time, it means, the duration of actions decreases infinitely. It contradicts with the systems from the real world which are modeled by TA, therefore in the following only time-divergent paths will be considered. A network of timed automata is the parallel composition of a set of timed automata A1 , ...An . The parallel composition operator kH is parametrized with a set of handshaking actions H.

2.4. Formal Verification

21

Definition. Let T Ai = (Loci , Acti , Ci , ,→i , Loc0,i , Invi , APi , Li ), i = 1, 2 be timed automata with H ⊆ Act1 ∩ Act2 , C1 ∩ C2 = ∅ and AP1 ∩ AP2 = ∅. The timed automaton T A1 kH T A2 is defined as (Loc1 × Loc2 , Act1 ∪ Act2 , C1 ∪ C2 , ,→, Loc0,1 × Loc0,2 , Inv, AP1 ∪ AP2 , L) where L(hl1 , l2 i) = L1 (l1 )∪L2 (l2 ) and Inv(hl1 , l2 i) = Inv1 (l1 )∧Inv2 (l2 ). The transition relation ,→ is defined by the following rules: • for α ∈ H: g2 ,α,D2

g1 ,α,D1

l1 ,−−−−−→1 l10 ∧ l2 ,−−−−−→2 l20 g1 ∧g2 ,α,D1 ∪D2

hl1 , l2 i ,−−−−−−−−−−→ hl10 , l20 i • for α ∈ / H: g,α,D

l1 ,−−−−→1 l10 g,α,D

hl1 , l2 i ,−−−−→ hl10 , l2 i

g,α,D

and

l2 ,−−−−→2 l20 g,α,D

hl1 , l2 i ,−−−−→ hl1 , l20 i

The location invariant of a composite location is simply the conjunction of the location invariants of its parts. For α ∈ H the transition in the resulting timed automaton is guarded by the conjunction of the guards of the individual automata. Therefore, an action in H can be taken only if it is enabled in both automata. Let T A1 kH T A2 kH ...kH T An denote the parallel composition of timed automata T A1 through T An where H ⊆ Act1 ∩ ... ∩ Actn . An extension of traditional TA is used in the TA model checker Uppaal [15]. It does not add anything to expressiveness and does not change decidability of the classical timed automata but makes modeling easier. Among those extensions are shared variables and committed locations which will be used later in the development. Uppaal adds bounded integer and boolean variables and arrays of them to the background state. Predicates over those variables can be used as guards on edges and location invariants. Variable values can be updated using resets on edges. Guard and invariant expressions are formed like C expressions and cannot have side effects. The semantics is extended naturally. During delay transitions variables may not be updated so the rule for delay transition remains unchanged. Action transitions must satisfy the edge guard including the expressions with variables, and when the edge is taken, the variable values are updated according to the updates specified for this edge. To model atomic sequences of actions Uppaal supports committed locations. A committed location is a location where no delay is allowed, and in a network only action transitions starting from such a committed location are allowed. Let T Ai = (Loci , Acti , Ci , ,→i , Loc0,i , Invi , APi , Li ), i = 1..n be timed automata which are synchronized over set H. Let l = hl1 , ...ln i where l1 ∈ Loc1 , ...ln ∈ Locn be a tuple of locations in the corresponding automata. Let Comm(l) be a set of committed locations in l. Let l[li /li0 ] denote a tuple where the i-th coordinate li is replaced with li0 and the others remain the same. The transition semantics of automata with committed locations →c updates the transition semantics as following: d

d

• hl, ηi − →c hl, η + di if hl, ηi − → hl, η + di and Comm(l) = ∅,

22

Chapter 2: Foundations α

• hl, ηi − →c hl[li /li0 ], η 0 i if α

– hl, ηi − → hl[li /li0 ], η 0 i and – either li ∈ Comm(l) or Comm(l) = ∅, α

• hl, ηi − →c hl[li /li0 ][lj /lj0 ], η 0 i if α

– hl, ηi − → hl[li /li0 ][lj /lj0 ], η 0 i and – either li ∈ Comm(l), lj ∈ Comm(l) or Comm(l) = ∅. Temporal Logic Time computational tree logic (TCTL) is a real-time variant of CTL ([26]) aimed to express properties of timed automata. TCTL state formulas over the set AP of atomic propositions and set C of clocks are formed according to the following grammar: Φ ::= true | a | g | Φ ∧ Φ | ¬Φ | ∃ϕ | ∀ϕ where a ∈ AP , g ∈ CC(C) and ϕ is a path formula defined by ϕ ::= ΦUJ Φ where J ⊆ R>0 is an interval whose bounds are natural numbers. ΦUJ Ψ asserts that a Ψ-state is reached within t ∈ J time units while only visiting Φ-states before reaching the Ψ-state. Timed variants of modal operators ♦ and  are defined as follows: • ♦J Φ = true UJ Φ • ∃J Φ = ¬∀♦J ¬Φ • ∀J Φ = ¬∃♦J ¬Φ. ∃♦J Φ and ∀♦J Φ mean that there exists a path (or for all paths correspondingly) such that ♦J Φ holds. Similarly, ∃J Φ and ∀J Φ hold if there exists a path (or for all paths correspondingly) such that J Φ holds. Let us consider again the automaton on fig. 2.5. The property ∀(a → ∀[0,2) ¬b) expresses that the automaton will stay in location a between 0 and strictly less than 2 time units. Another property ∀♦[5,6] b expresses that for all paths there is a state between 5 and 6 time units such that the automaton will be in location b. Definition. Let T A = (Loc, Act, C, ,→, Loc0 , Inv, AP, L) be a timed automaton, a ∈ AP , g ∈ CC(C) and J ⊆ R>0 . For state s = hl, ηi and TCTL state formulas Φ and Ψ and TCTL path formula ϕ the satisfaction relation  is defined as following: s  true s  a iff a ∈ L(l)

2.4. Formal Verification

23

s  g iff η  g s  ¬Φ iff not s  Φ s  Φ ∧ Ψ iff s  Φ and s  Ψ s  ∃ϕ iff π  ϕ for some π ∈ P athsdiv (s) s  ∀ϕ iff π  ϕ for all π ∈ P athsdiv (s) d

d

0 1 For time-divergent path π ∈ s0 = ⇒ s1 = ⇒ ... the satisfaction relation  for path formulas is defined by Pi−1 π  ΦUJ Ψ iff ∃i > 0.si + d  Ψ for some d ∈ [0, di ] with k=0 dk + d ∈ J and

∀j 6 i.sj + d0  Φ ∧ Ψ for any d0 ∈ [0, dj ] with

j−1 P

dk + d0 6

k=0

i−1 P

dk + d

k=0

where for si = hli , ηi i and d > 0 we have si + d = hli , ηi + di. The interpretations for atomic propositions, negation and conjunction are natural. Clock constraint g holds in hl, ηi if the values of the clocks in η satisfy g. A formula ∀(aU(4,∞) b) would be satisfied for the automaton from fig. 2.5 since after 4 time units it is obliged to be in location b, and before this time it can be either in a or in b. Simulation Different models of the same system can have a different level of abstraction. At high abstraction levels many details are left unspecified. If one model has higher abstraction level than another one, it is said to be an abstraction of the other. Simulation relation is used for proving that some abstract model has the same semantical behavior as a more fine-grained transition system. If it holds, the abstraction models the transition system correctly. A simulation relation is a preorder on the state spaces of models requiring that if s0 simulates s then state s0 can mimic all stepwise behavior of s. Simulation can be lifted to entire transition systems by comparing their initial states. Definition. Let T Si = (Si , Acti , →i , Ii , AP, Li ), i = 1, 2 be transition systems over AP . A simulation for (T S1 , T S2 ) is a binary relation (R) ⊆ S1 × S2 such that — ∀s1 ∈ I1 .(∃s2 ∈ I2 .(s1 , s2 ) ∈ (R)) — for all (s1 , s2 ) ∈ (R) the following holds: (a) L1 (s1 ) = L2 (s2 ) (b) if s01 ∈ P ost(s1 ) then there exists s02 ∈ P ost(s2 ) with (s01 , s02 ) ∈ (R). T S2 simulates T S1 (T S1  T S2 ) if there exists a simulation (R) for (T S1 , T S2 ). The simulation relation is reflexive and transitive.

24

Chapter 2: Foundations

Related Work Model checking is widely used for verifying programs written in imperative programming languages. Verification with timed automata helps in real-time cases when time frames of certain actions are well predictable. A model of a program is built and then correctness properties are verified. Real-time programs written in SystemC are verified in [46, 45] by generating a network of timed automata. Deadlocks and timing constraints are the properties that are checked. In the similar way schedulability analysis of real-time Java programs is carried out in [19]. Also general schedulability analysis using TA is carried out in [28]. For particular problems extensions of timed automata are used. An important topic is decidability of the model, so the extension description is usually accompanied with the decidability proof. Timed I/O automata [29], stopwatch automata [1], task automata [34] are applied for finding proper schedules for multiprocessor programs. Other formalisms based on automata are developed for modeling and verification of distributed and parallel systems. Modeling of real-time systems with periodic finite state machines is described in [65]. In the paper [73] a formal model for static analysis of indeterminism in multiprocessor programs and a verification tool are built. A model checker for multithreaded Java programs Java PathFinder (JPF) has been developed by NASA [74] and is based on the Spin model checker [51]. It is a general verification tool combining model checking, program analysis and testing. JPF builds a tree of possible execution states that is searched for errors such as deadlocks. Model-based development of real-time Java programs is shown in [38]. The authors generate real-time Java code from TA models.

2.4.2

Isabelle Proof Assistant

Isabelle is a proof assistant for modeling formal languages and code generation. Isabelle can be used for formally defining transition systems and proving relations between them. We use Isabelle for proving semantic correctness of an abstract model of Java programs. The core of Isabelle is the Pure logic framework, which is further extended to support different logics like HOL (higher-order logic) or FOL (first-order logic) [64]. Terms in Isabelle/Pure are built with the constructs [76]: • α ⇒ β — functions, V • x.B(x) — universal quantification, • A =⇒ B — implication. Isabelle/HOL contains the usual logical constructs: • ∧, ∨, → — conjunction, disjunction and implication, • ∀x.P (x), ∃x.P (x) — universal and existential quantification. Isar Proof Language Theorems in Isabelle can be written either by using directly Isabelle/Pure constructs (so-called apply-style) or using the Isar high-level language. In apply-style the proof goals are simplified stepby-step using appropriate proof methods until they become trivial. The Isar language produces structured proofs which are more comprehensible and more suitable for complex cases [76].

2.4. Formal Verification lemma (¬¬A =⇒ A) apply (rule classical) apply (erule notE) apply assumption done

25 lemma assumes a:(¬¬A) shows A proof − { assume (¬A) with a have False by(rule notE) } then show ?thesis apply(rule ccontr) — contradiction by assumption qed

Figure 2.6: The same proof written in apply-style and in Isar. The general form of Isar reasoning is from facts have subgoals by proof method (see fig. 2.6). Here facts are assumptions or other facts obtained before in the proof, subgoals facts that are currently proved, proof method is either an automatical proof method or rule, erule, assumption etc. Facts can be labeled with a name and then be referenced by this name (see the label a in the example above). Curly braces enclose proof blocks which will output a fact in the form assumptions =⇒ goals. assume or assumes introduces new assumptions in the proof body or in the theorem statement respectively. have states a fact that needs to be proved, show states the final goal of the theorem which is often replaced by ?thesis proof variable. If the fact after show is proved, the theorem proof is finished. then takes the previous fact as an assumption, with takes the previous fact and other facts referenced by their labels. Types The built-in types in Isabelle/HOL are • unit — singleton type, • bool, nat, integer, real — booleans, natural numbers, signed integers and real numbers • 0a × 0b — pairs, • 0a option — optional values, can be None or Some x, • 0a list, 0a set — lists and sets, • char — characters. 0

a,0 b are type parameters used as type constructors in parametrized types similar to ML. New types can be constructed from the existing ones with a datatype construct. A datatype provides a set of constructors for itself which can have terms of other types as parameters. datatype 0a tree = Leaf 0a | Node ( 0a tree) 0a ( 0a tree)

The datatype 0a tree has two constructors accepting parameters. Constructors can be recursive. For example, terms like Node (Node (Leaf (Suc 0 )) 4 (Leaf 8 )) 16 (Leaf 32 ) have type nat tree.

26

Chapter 2: Foundations

Parametrized types are often used for defining collections. The two collections commonly used in Isabelle development are 0a set and 0a list. Sets are unordered collections of elements of type 0a. Each element can be presented in a set only once. Lists are ordered collections of elements of type 0 a. List elements can be accessed by index and they may repeat in the list. Commonly used operations on sets are: • usual set union, intersection, subtraction defined naturally, • insertion of an element into a set: insert a S. Empty list is designated []. Common list operations are: • insertion of an element into the beginning of a list: x # xs, • head of a list: hd (x # xs) = x, • tail of a list: tl [] = [], tl (x # xs) = xs, • get a set of all elements in a list: set [] = {}, set (x # xs) = insert x (set xs), • apply a function to all elements of a list: map f [] = [], map f (x # xs) = f x # map f xs, • append two lists: [] @ ys = ys, (x # xs) @ ys = x # xs @ ys, • length of a list length xs, • distinctness predicate: distinct [] = True, distinct (x # xs) = (x ∈ / set xs ∧ distinct xs), • get element on the n-th place: (x # xs) ! n = (case n of 0 ⇒ x | Suc k ⇒ xs ! k), • index of an element in a list: index xs x. Data structures with several named fields are called records. Records allow to fetch or update a certain field of the data structure. record point = X :: int Y :: int

Terms like r = (|X = 4 , Y = −1 |) represent concrete record instances. In order to access record fields, Isabelle defines functions X and Y corresponding to field names so X r = 4. Field update is written like r(|Y := 8 |). Isabelle generates update functions X-update, Y-update for each field. A commonly used interface for them is update of a concrete record field with a concrete value r(|Y := 8 |). In specific cases the general functions can be used, for example, if the new field value depends on the old value. In the previous example the update can be written like r(|Y := 8 |) where the first argument of Y-update function is a rule for updating, and the second one is the record to update. In this case any value in the Y field of the record r will be overwritten with 8. Records can be extended with additional fields. The defined type point can be extended to three-dimensional space. record point3D = point + Z :: int

2.4. Formal Verification

27

Each record of the type point3D can be treated as a record of the type point but not vice versa. In order to define a function for all types which can be obtained by record extension, a generic type 0 f point-scheme should be used. Here 0f is a type parameter of the extension. It is replaced by a concrete type like int in the point3D type. definition projX ::( 0f point-scheme ⇒ int) where (projX p = X p)

The function projX defines a projection onto X-axis which would be the same for all possible record extensions, so it can be applied to objects of type point as well as to objects of type point3D. Functions Functions can be defined in several ways. For non-recursive functions a keyword definition can be used. definition add::(nat ⇒ nat ⇒ nat) where (add a b = a + b)

Primitively recursive functions can be defined with a primrec. They can recurse only on datatype recursive arguments. There can be only one equation for each constructor of the datatype. primrec height::( 0a tree ⇒ nat) where (height (Leaf a) = 1 ) |(height (Node t1 a t2 ) = 1 + max (height t1 ) (height t2 ))

A general recursive function is defined with a keyword fun. It can have more than one equation for a constructor of the datatype. Its termination is proved automatically by lexicographic order in case Isabelle can do it. fun isLargeTree::( 0a tree ⇒ bool) where (isLargeTree (Leaf a) = False) |(isLargeTree (Node (Leaf a1 ) a2 (Leaf a3 )) = False) |(isLargeTree - = True)

This function checks if the tree has more than one node (of course, it could be written by primitive recursion). The pattern − matches everything that has not been matched in the previous clauses. If Isabelle cannot prove the function termination automatically, a more general function keyword can be used. The user must provide a termination proof for the defined function and a proof of pattern exhaustiveness [56]. function sum :: (nat ⇒ nat ⇒ nat) where (sum i N = (if i>N then 0 else i + sum (Suc i) N )) by pat-completeness auto termination sum apply(relation (measure (λ(i,N ). N +1 −i))) by auto

This function sums all natural numbers up to N using i as a counter. The pattern exhaustiveness is proved by pat completeness method. The termination proof requires manually specifying the measure function which strictly decreases during recursive calls. The rest of the proof is done automatically. Some useful Isabelle functions include

28

Chapter 2: Foundations • image of a set: f ‘ S = {x. ∃ y∈S. x = f y}, • relational image of a set: r ‘‘ s = {y. ∃ x∈s. (x, y) ∈ r} • domain of a function: dom m = {a. m a 6= None}, • range of a function: ran m = {b. ∃ a. m a = Some b}, • function restriction on a set: mA = (λx. if x ∈ A then m x else None), • get the value of an option: the (Some x) = x, • injectivity predicate: inj-on f A = (∀ x∈A. ∀ y∈A. f x = f y −→ x = y).

Locales Locales are used to write parametric theories [12]. Locales introduce some locale parameters and assumptions on them. Localized proofs can use these assumptions as preconditions. locale partial-order = fixes le:: ( 0a ⇒ 0a ⇒ bool) (infixl (v) 50 ) assumes refl: (xvx) and antisym: (xvy =⇒ yvx =⇒ x=y) and trans: (xvy =⇒ yvz =⇒ xvz)

A proof writer can say that the proof is for some locale by enclosing it into a context block. context partial-order begin

... proofs ... end

Locales can be interpreted in theories by providing suitable parameters which satisfy the locale assumptions. interpretation int-order: partial-order (op ≤ :: int ⇒ int ⇒ bool) by unfold-locales auto

Here locale partial-order is interpreted with usual ordering on integers. Proof of locale assumptions follows directly from the locale definition unfolded by unfold-locales method. Then locale assumptions for this interpretation can be used in theories like int order.ref l which gives x ≤ x. Code Generation Isabelle has a built-in code extractor which extracts executable code from functions and inductive predicates. It is highly extensible and provides a flexible way of generating code for common functional languages like SML, OCaml, Scala, Haskell. Isabelle sets its basic types into correspondance with the native language types, and other extracted functions use directly language types instead of generating them from scratch every time. Code generation is not entirely sure because Isabelle itself can contain errors, but it has an architecture with a small reliable kernel which is thoroughly tested. Testing a small kernel and generating code for larger user application is easier than testing each of the large applications. The verified code generated from Isabelle formalizations less likely contains errors than if it was directly written by a developer.

2.4. Formal Verification

2.4.3

29

Formal Execution Models

Jinja and JinjaThreads The Jinja project [54] is a formally verified compiler of a subset of Java. The formal development was done in Isabelle. Jinja does not make a distinction between expressions and statements, i.e. program is an expression which can contain only other sub-expressions. Expressions may have side effects, and they always return a value. In case of control constructs such as loops or if-statement, the returned expression is an empty value U nit. The Jinja state is a pair of heap and local variables. The heap stores references from memory addresses to objects, local variables are stack variables of primitive types. Jinja defines two operational semantics: big-step and small-step. The big-step semantics evaluates an expression up to the end until it can no longer be reduced, and updates the state accordingly. The small-step semantics evaluates expressions step-by-step visiting all intermediate expressions and states. Later, the authors show the equivalence of big-step and small-step semantics. The semantics of Java Virtual Machine is also formally defined. Finally, the compiler is extracted from the compilation function which is formally verified by means of Isabelle. Jinja was one of the first works when a compiler for a realistic language has been formally verified. However, the number of limitations still remained open, mainly multithreading support. It was what the extension of Jinja, JinjaThreads, was aimed to complete. JinjaThreads [60, 61, 59] is a multithreaded extension of Jinja. It supports synchronization, wait-notify mechanism, thread creation and joining. The semantics of multithreaded Java has two levels: the small-step single-threaded semantics partially inherinted from Jinja and the interleaving semantics which is responsible for proper thread selection. JinjaThreads includes three semantic levels: Java source code level, JVM bytecode level and an intermediate language J1 . The correctness of the compiler is proved by proving a bisimulation property between different semantic levels. The extracted compiler code is executable, however, it does not include optimizations. A drawback of the JinjaThreads project is that it does not work with time, so that the multithreaded semantics is purely event-based, and not time-based. Java constructs like Thread.sleep(n) or Thread.wait(n) do not make any sense, since the time advancement is ignored. For full multithreading support it would be desirable to add time support to the developed semantics. CompCert In the CompCert project [57] a formally verified optimizing compiler of a subset of C called CMinor was developed. The target architecture is PowerPC due to its popularity in safety-critical avionic systems. The compiler consists of several stages like construction of a control-flow graph, register allocation, linearization etc. Each step of the compilation is proved to preserve execution semantics of the initial program. The semantics used is a small-step semantics which uses continuations where it stores statements to be evaluated in the future steps. The compiler itself is executable. The concurrency is not fully supported. In [50] a concurrent semantics for CMinor has been developed, supporting threads and locks. This semantics models programs which are race-free, i.e. implement proper mutual exclusion.

30

2.5

Chapter 2: Foundations

Conclusions

We considered common problems encountered in multithreaded programming and different approaches to the program verification. Model checking is a popular and promising way to verify real-time programs. Time-based models are widely used for verification of programs written in different programming languages for a number of system architectures. Some architectures are designed so that program models could be easily built. However, programs in popular imperative languages like C or Java are too complex to be modeled and verified in general case, although there is a lot of software written in these languages, and it requires verification. In order to make verification possible, additional information can be added to the programs. In this thesis we propose a method of verification of Java programs annotated with additional timing information. A model of a program is built based on the annotations, and then model checking is run. The verification method tries to find possible resource sharing conflicts which can lead to deadlock or deadline miss for real-time tasks.

Chapter 3

Abstracting from Java to Timed Automata In this thesis we make a tool for finding resource sharing conflicts in multithreaded programs written in Java. The tool performs static analysis of programs by building a model based on timing information which should be provided in the program. In this chapter we present an informal description of the tool architecture and its components.

3.1

Timing Annotations

The programs considered for verification should be annotated with timing information. The annotations are treated by the verification tool as exact execution time of the annotated statement. Annotations have the form of comments in Java code, so the annotated Java programs can be run in the usual manner. Annotated statements may themselves have complex structure. On fig. 3.1 examples of different annotated statement are shown. The first statement is assumed to take 4 time units for execution, the second whole statement should take 18 time units for execution. // @ 4 @ // v = dict . toString ; // @ 18 @ // if ( list . isEmpty ) l =0; else l = list . size ();

Figure 3.1: Examples of annotated statements. Execution time on multiprocessor systems depends on the quantity of processors. Initially, we considered two models: a monoprocessor model which has a single processor, i.e. only one thread can execute at any moment of time, and a multi-processor model where each thread has its own 31

32

Chapter 3: Abstracting from Java to Timed Automata Thread 1

Thread 1

Thread 2

Thread 2

Figure 3.2: Execution timeline in multi- and monoprocessor models. Green - active execution, grey - sleeping, white - waiting for processor time.

processor to execute on (see fig. 3.2). In such a multi-processor system threads do not wait for the processor time, once they are eligible for execution, they start running. The scheduler in such a system would have nothing to do since there is no need to share processor time. Such kind of multi-processor systems is less realistic because usually processor time needs to be shared between several threads. Due to lack of time, we decided to concentrate on the single-processor model only since it corresponds better to the reality of real-time systems. The most realistic model would contain a bounded number of processors and a number of threads which can be greater than the number of processors. We restrict ourselves to the single-processor model which is good enough for our modeling purposes, but another model can also appear to be useful in some situations. Timing annotations are expected to be received from worst-case execution time computation [77]. WCET analyzers analyze basic blocks of a program and provide the time estimate. However, an important difference between timing annotations and WCET is that the annotations are considered as exact execution time, and not the worst case. It means that if an annotated statement has finished its execution earlier than it has been claimed in its annotation, it has to wait until the annotation expires. It makes WCET of a statement the exact execution time of the annotated statement. This approach is useful when the WCET annotations are the best possible estimates. The precise computation of WCET makes sense for real-time systems only since they have clearly defined policies for cache, scheduling etc. which are important for WCET computation. In the beginning of the work we experimented with formalizing RTSJ [14]. Unfortunately, this specification is extremely complex and contains a lot of features which are not directly connected with the purposes of our work. The existing implementations of RTSJ like JamaicaVM [70, 53] are operable but they do not implement some features described in RTSJ. For these reasons we investigated the SCJ specification [62] which took essential features from RTSJ but is more modular. It has three levels with increasing complexity; levels 0 and 1 are suitable for verification. All the real-time specifications define their behavior for systems with a single processor. Multi-processor systems are much less predictable and may produce situations when the behavior of the runtime is undefined; such situations in multi-processor systems can hardly be covered in a specification on the present day. For the reasons mentioned above the presented verification tool models single-processor systems. The main verification model represents execution of annotated Java programs in the single-processor environment. We concentrated on basic Java features for now and do not support specific SCJ constructs like Mission or Schedulable objects. However, the modeling we do is applicable to programs running in a real-time predictable environment since it is based on WCET computations and exact time measurement. We do not take into account system interrupts, and such models are possible only in real-time systems. We selected SCJ as a reference system due to its modular level structure which allows to exclude systems which are too complex for verification. We modeled all features of SCJ Level 1 which are relevant to resource sharing and exclusive access. Although, full integration with

3.2. Verification

33

SCJ environment still remains to do. The locking mechanism of shared resources is the same in standard Java and in SCJ, so we believe, the integration of our tool with SCJ could be done quite easily and will not exceed changes for initial preparation of a program to verification, e.g. instead of Thread the tool for SCJ should use MissionSchedulable.

3.2

Verification

Verification concentrates on two aspects: execution time of statements and locks acquired by threads. The goal is to find places where threads may have conflicting access to shared resources. It is done by modeling the program execution flow based on timing information about statements in the program. Java programs with timing annotations are translated to timed automata (TA) for verification. The obtained system of timed automata is then model checked. The output of model checking contains the result if there are resource sharing conflicts and a trace to the conflicting configuration, if conflicts are found. While writing this thesis, we first have developed a prototype in OCaml. There we designed the architecture of the tool, wrote the code which interacts with the physical world such as parsing or printing to file, experimented with translation rules. On the second stage we formalized the translation part and verified it using the Isabelle proof assistant. The verified Isabelle code was extracted to executable OCaml code and replaced the informal code. The main stages of translation are (fig. 3.3): parse and produce the abstract syntax tree (AST), extract the executable parts from class declarations and prepare to translation, check syntactic correctness of a program for verification, translate to TA, print the generated automata using the Uppaal -compatible syntax. The architecture is slightly different for the informal prototype (fig. 3.3) and the formally verified version (fig. 3.4). In the informal prototype syntax checks are done separately, they are not integrated into the whole translation process. In the formally verified tool both syntax checks and translation are extracted from the verified Isabelle code, and syntax checks run obligatory before the translation. They do not change the AST but may abort the translation if any of them fails. syntax checks parsing Java

preprocessing AST

AST

translation

print TA

Uppaal

Figure 3.3: Program transformation process implemented in the informal prototype. Red — unverified transformation. syntax checks

Java→AJ→TA

generate code parsing Java

preprocessing AST

syntax checks AST

AST

translation

print TA

Uppaal

Figure 3.4: Program transformation process implemented in the formally verified tool. Red — unverified transformation, green — formally verified transformation, blue — formal Isabelle development.

34

Chapter 3: Abstracting from Java to Timed Automata

It ensures that the output of the verified code would always contain a correct model. Both formal and informal parts use the same code for parsing, preprocessing and printing. The implementation details of particular stages are discussed in the following sections.

3.2.1

Parser

The described timing annotations can annotate any statement. Unfortunately, the existing Java annotation mechanism that is a part of the Java standard [5] does not allow to annotate statements. Recently, a new proposal to Java has been accepted allowing to annotate Java types [32]. The earlier versions of this proposal also allow to annotate conditions of if and while statements. However, it is still too narrow set, hence we developed a Java parser recognizing annotated statements also. The parser has been developed with OcamlYACC [58], and it produces a valid OCaml expression as an AST. The parser supports main constructions of the Java language: class declarations, including nested declarations, method declarations, all kinds of statements, annotated statements, try...catch constructs. It does not support generics and arrays except array declarations. Support of array declarations has been added because each Java program contains the main method which accepts an array of strings as an argument. However, the full support of arrays was not planned since it is a secondary feature for discovering resource sharing conflicts. The same parser is used for both the prototype and the formally verified tool.

3.2.2

Preprocessing

Java programs contain class declarations which contain method declarations. Method bodies contain executable code which is to be translated. Preprocessing extracts code from class declarations for the further translation procedure. The translation is correct only for the statements with defined semantics. We did not define the semantics for dynamic Java features like exceptions or using the keyword this in the argument of a synchronized statement since those statements cannot be properly translated. Although in the presence of the keyword this the further translation does not make sense, translation of exception bodies would be desirable. Moreover, Java requires to wrap any sleep statement with a try...catch block (or change the declaration of the method), so the presence of try...catch is inevitable in a large part of programs. We assume that a correct program execution does not throw any exceptions, and therefore the code in catch clauses is never executed. In order to generate a correct model for the code, try...catch blocks are replaced with the body of try, and the catch block is thrown away. Under the previous assumption such a transformation does not change the behavior of the program.

3.2.3

Syntax checks

The annotated Java programs must satisfy several conditions in order to be correctly translated to TA. Programs not satisfying some of these conditions cannot be translated to TA at all or the resulting system of timed automata does not model correctly the Java program. • The whole syntax tree must be covered with annotations. Any leaf of the syntax tree must have an annotation in some ancestor. If some part of the program does not have timing information, its behavior is undefined, and the timing analysis does not make sense.

3.2. Verification

35

• Arguments of synchronized statements must be object names, not expressions. In the TA model objects are identified by their names. Therefore, a connection between a name and an object must be known at compilation time. this object is not allowed also since the object to be locked cannot be defined at compilation time. • Shared objects cannot have aliases. It follows from the same fact that names are identifiers for objects in the TA model. An object must be tied to its name during the execution of the program, and different names are considered as different objects. • Nested synchronized statements on the same object are not allowed. This limitation was done in the prototype for simplicity and also because it did not add a lot to the semantics of synchronized statements. However, Java allows reentrant locks on the same object, and therefore in the formally verified tool this restriction has been eliminated. • Synchronized blocks inside annotated statements are not allowed. Annotated statements in the TA model are treated as single atomic statements. However, the synchronized statements have their own semantics which will be lost if a synchronized statement is put inside an annotated statement, and the obtained TA model will be incorrect. • Nested annotated statements or sleep statements inside annotated statements are not allowed. The outermost annotated statement are translated as atomic statement, so the timing information of the inner annotated and sleep statements would be lost. The semantics of sleep statements is different from the semantics of annotated statements, hence the translated TA model would be incorrect. • The main method must be in the first declared class of the program. It has been done for translation simplicity but later can be easily generalized. • Program threads must be created with the constructor Thread(Runnable r, String name). Thread name and thread logic contained in the Runnable object are necessary for the translation. Other thread parameters one can specify on thread creation do not play a role in the translation, and therefore their support was not done. Of course, other constructor support can be added quite easily. If all the listed properties are true for a Java program, it can be correctly translated to timed automata and then model checked. All these requirements must hold for the formally verified version also except the one where the contrary is explicitly indicated. In the formally verified tool the code for syntax checks is generated together with the translation procedure, and the syntax checks are executed before the translation. In case the syntax checks fail, the error is reported, and the translation is aborted.

3.2.4

Translation

The translation details are described later in the section 4.5.

3.2.5

Model Checking

The property to model check must ensure that there are no resource sharing conflicts on any execution path. Information about conflicts in TA model is stored in a two-dimensional array

36

Chapter 3: Abstracting from Java to Timed Automata

conflicts[obj][aut] which has an element for each automaton and each shared object. If the element conflicts[obj][aut] is set to true, the automaton with index aut wants to acquire a lock of the object with index obj but it failed because this object has already been locked by another thread. Therefore, the property to verify should check the values of conflicts array element for each shared object and each automaton for every node on every path. Formally expressed in TCTL logic (see section 2.4.1) the property looks like ∀  ∀(i : int[0, numObjs]) ∀(j : int[0, numAuts]) ¬conf licts[i][j]. Here numObjs, numAuts is the total number of shared objects and thread automata respectively. The quantified variables i and j range over a finite domain, so they can be replaced by a finite conjunction, and the body of the formula can be expressed by a formula without quantifiers. If this property is true, then no resource sharing conflicts are possible for any execution path of the program. Otherwise, Uppaal can provide a trace which leads to the property violation. Based on the trace the possibly conflicting places in the program can be analyzed and, probably, changed by the developer. Our abstraction extends the set of possible execution paths since we abstract from some information like expressions in condition and loop statements. It is the price for making an abstraction. At the same time, all execution paths which are possible for Java programs are also valid in the model. This statement has been proved formally, the proofs are described in section 5. Due to this fact, if an initial Java program contains any resource sharing conflicts, they will be found by model checking. However, model checking can also output false warnings which do not correspond to a resource sharing conflict because the execution path found by model checking is valid in the model but it is not valid in the initial Java program.

3.2.6

Discussion

Overall, the full verified tool supports some more essential features of Java and the execution model which were not modeled in the prototype tool. In the prototype only the annotated statements take time for execution, but in the model produced by the verified tool each action which makes automata configuration change must take time, including scheduling. It is closer to reality since any change of the runtime state takes some amount of time (possibly, small). Another feature that was added to the verified tool is reentrant locking. Java allows the same thread to lock one object several times. The number of locks must be equal to the number of lock releases. The timed automata generated by the verified tool implement mechanisms similar to counting semaphores (see section 2.3.1). The same thread is authorized to acquire the lock on the object it has already acquired. Then each acquired lock must be released separately. One more feature added on the formal stage is scheduling that recognizes the threads which have terminated their execution. If the automaton which has terminated its execution is scheduled again, the system will be deadlocked since the scheduled automaton cannot move from its final node. In the prototype a workaround was implemented: a looping edge without conditions was added to the final node of each automaton. If an automaton in a final node was scheduled, this edge has been taken, and the control returned back to the scheduler without any changes to the whole configuration since this action did not take time. However, in case scheduling takes time, such a workaround does not work any more since every fake scheduling increases the time, although the whole configuration should not change. This problem was solved by adding flags for each automaton

3.3. Case Study

37

indicating if this automaton has finished its execution or not. The solution is explained more in detail in the section 4.5.2.

3.3

Case Study

A case study for justifying the approach has been developed. We present here the formally verified variant which can give different results than the prototype due to difference in the models. The example was developed to show the main verification principles. The program under verification is a primitive producer-consumer buffer with two threads: producer and consumer. The producer generates a value and puts it into the shared buffer. The consumer takes this value from the buffer. The buffer can contain a single value and does not check whether its value has changed since the last access of the consumer or not. It is possible that producer puts a new value when the old one has not been taken or when consumer takes the same value twice. Implementing entirely correct producer-consumer buffer was not our top-priority goal, thus we sacrificed the correct processing of all cases to simplicity and clearness of the program.

public void run (){ int value , i ; // @ 1 @ // i =0; while (i