Improving Source Code Locality
Jonathan Sillito
Ph.D. Thesis Proposal
December 9, 2004

Abstract

Developers working with source code have a strong sense of locality defined by both the modularity of the system and the development environment. For a particular change task some subset of the system will be relevant, and when that subset is not a localized subset, it can be difficult for the developer to identify the relevant entities and to understand how they fit together. This is due to an effect sometimes called cognitive tunnel vision. We propose to study and formalize this notion of locality for software development and produce an interactive visualization that supports a more flexible locality based on navigation activity and multiple relationship types. This should make relevant entities more localized and easier to identify and work with.

Contents

1 Introduction
2 Problem
2.1 Modularity
2.2 Information Locality
2.3 Tunnel Vision
3 Proposed Solution
3.1 Interactive Visualization
3.2 Neighbourhood Model
3.3 Tableau Eclipse Plugin
4 Sample Scenario
4.1 First Steps
4.2 Later in the Task
5 Related Work
5.1 Degree of Interest Based Views
5.2 Code Browsing Tools
5.3 Advanced Source Code Editing Environments
5.4 Software Visualization
6 Evaluation
6.1 First User Study
6.2 Second User Study
7 Time Line
8 Contributions

1 Introduction

Developers working with source code have a strong sense of entities being local or not. In our view, this locality is defined by the development environment. A group of items will be considered more or less local, relative to a particular view, depending on how much information about those items can be seen simultaneously, how easily a user can switch between the items, and how much set up and maintenance effort the view requires. For tasks that are based on entities that are not local, there is additional navigation overhead, and it is difficult to see and understand relationships and patterns. Remembering and (mentally) integrating various pieces of important information becomes more difficult. The consequence is that it is more difficult to identify the relevant code entities and to understand how they fit together.

Our proposed solution to this problem has two parts. The first is an interactive visualization of source code that is flexible, so that an arbitrary subset of the source code can be localized. The second is a model of the neighbourhoods in the system. This model will be used to determine which subset of the system to make the most prominent in the visualization at any point in time. The proposed neighbourhood model is based on navigation history and multiple types of relationships between source code entities. In particular, the thesis of the research is that:

A developer’s navigation actions and relationship information derived from source code can be used to produce a view that localizes source code dynamically. This localized view will make relevant entities easier to identify and work with.

The evaluation will focus first on showing that we can better localize relevant information for a range of tasks, and second on showing that this locality can help a developer when performing change tasks on large systems. In particular we will evaluate the tool with developers who are experienced at software construction, but are newcomers to the source code targeted by the task.

2 Problem

Developers need wide access to many parts of the program [26]. Rather than reading the program line by line, they visit different parts to make comparisons, establish dependencies and so on; and they tend to do this in repeated cycles that cover functionally relevant entities. When those entities are not well localized they are more difficult to identify, understand and work with.

In this proposal we use modularity and locality as distinct terms. The modularity of a group of entities can be judged based on the decomposition of the system into nested components. Entities that are close together in terms of the decomposition tree would be considered well modularized. For example, in a Java program, a group of entities may be considered modularized if they are in the same package. Locality, on the other hand, is defined by the interaction with and presentation of those entities in the development environment interface. We believe that the locality of a group of entities, in a particular view, depends on how much information about those items can be seen simultaneously, how easily a user can switch between the items, and how much set up and maintenance effort the view requires.

There is an interaction between the modularity and the locality of a group of entities. In fact, the primary presentation of source code in standard development environments (a package explorer and a file based source code editor, for example) closely follows its decomposition. So, generally, entities that are well localized by the environment are exactly those that are well modularized by the decomposition.

Work in the area of software modularity is reviewed in Section 2.1. A discussion about the locality provided by development environments can be found in Section 2.2. The challenges encountered by software developers when working with poorly localized entities are discussed further in Section 2.3.

2.1 Modularity

The issue of modularity is central to a large body of software engineering research. A careful modularization of a system is considered crucial for managing the complexity of software systems. Software design and development follow certain principles such as maximizing cohesion of modules, and minimizing coupling between modules [22], with the aim of producing a system which can be easily understood in pieces and extended with changes limited to just one module. In an object-oriented language like Java this involves decomposing the system into packages and classes based on a certain set of concerns.

However, it is often the case that the entities relevant to a given change task are scattered across multiple modules [16]. This is not necessarily an indication that the system is not well modularized. It may simply be a consequence of the fact that only one decomposition of a system is possible and so some concerns are clearly modularized at the expense of others being left scattered [32]. When working on a task based on entities that are not well modularized, it can be difficult for a developer to identify all of the relevant entities and to understand how they interact. In such cases, incorrect or inefficient modifications [19] or modifications not respecting the system’s design [23] become more likely.

Figure 1: A screen shot of the Eclipse Development Environment, with several parts labelled: (A) package explorer, (B) source code editor with tabs, (C) outline view, and (D) call hierarchy browser.

2.2 Information Locality

Here we discuss the issue of locality as related to the information presentation and interaction provided by development environments. We will consider different sets of entities that differ in how well they are modularized by the system’s decomposition. For each of these sets we will talk about how the developer might work with them and then summarize with a discussion of what constitutes locality. We use the Eclipse Java Development Environment (http://www.eclipse.org) as representative of today’s state of the art.

First consider two or three short methods next to each other in the same source file; the ultimate in modularized entities. With the editor scrolled appropriately, all of the methods can be seen in full simultaneously. It is possible to look at one while working on another. Navigation between them is trivial: the cursor can be placed directly at any point in any of the methods, and in the process there is no context switch. Something similar can be accomplished for a few methods in two or three different source files, using multiple editors, carefully arranged. The effort required to set up this view will likely act as a disincentive and will contribute to the entities feeling less local than if the view were somehow natural or automatic.

Scaling this up a bit, consider a few methods that are in the same source file, but not adjacent. Now not all information about the given entities can be seen simultaneously; however, Eclipse provides an outline view (labelled C in Figure 1) in which signature information about each of them will be visible. This outline view will help a developer navigate between the entities, though less precisely than in the previous case. In both cases there will be noise, in that the outline view, by default, shows all members in the source file.

Now consider a number of entities that are located in a number of source files that are contained in the same package. Eclipse provides a package explorer (labelled A in Figure 1), which in some cases will, with a bit of set up, allow the names of the files containing those entities to be simultaneously visible (though generally not the entities themselves) in the tree. Moving between entities is generally a multi-stage process: select the file in the package explorer, select the entity in the outline view (or scroll), then move the cursor to the correct place in the entity. Something similar can be said for a group of entities that are located in the same inheritance hierarchy, when the hierarchy explorer is used instead of the package explorer.

For a group of entities that are not contained in the same package, the package explorer becomes more awkward, as scrolling is required, which affects how much information can be seen and how easy it is to move between the entities. In this case, after an initial set up the editor tabs may be used, at least to help with navigation. For editor tabs to remain useful this view will need to be maintained carefully, since it does not scale beyond a few source files. An alternative to using tabs is to use the back-and-forward navigation. Like tabs, this does not provide information about the entities that can be seen side by side. It does help with moving between entities, though this movement must be strictly linear and does not scale beyond just a few entities. No context is provided to get back to entities that are lost when a branch is taken.

In summary, we have shown that entities that are well modularized are generally easier to interact with in the development environment. The reason for this is that they are made more local as defined by three measures: first, the amount of information about the entities that can be seen simultaneously; second, the ease with which a developer can move between arbitrary entities in the group; and third, the effort required to set up and maintain the view (i.e. how natural or automatic it is).

2.3 Tunnel Vision

In Section 2.1 we discussed software modularity, including the observation that working with tasks based on information that is not well modularized is difficult for developers. In Section 2.2 we discussed information locality and showed a correspondence between the modularity of a system and the locality provided by the development environment. Here we discuss several cognitive considerations that we believe help explain the difficulties developers face.

Green et al. propose a cognitive dimensions framework, which captures some high-level aspects of usability for programming environments [9]. Two of the dimensions in this framework relate to the notion of locality that we have just described, and to the problem of tunnel vision generally: visibility and hidden dependencies. Visibility captures how much cognitive work is required to make entities accessible. Hidden dependencies occur when one part is affected by another in a way that is not made explicit, which can lead to errors.

Program comprehension falls into the category of complex problem solving and Albers points out that, for complex problem solving, “understanding a situation requires mentally integrating many pieces of information. It requires understanding both that the information exists and how it is interrelated to the situational context and other pieces of information” [1]. These problem solving activities are made more difficult if the information is not localized. When information is presented in pieces, a user’s need to navigate across these pieces is increased, and the user’s ability to see them in context is decreased [35].

In studies of people’s ability to perform mental simulation, including understanding software systems, working memory has been found to be a key limitation [18, pages 52–53]. People appear to be limited to a simulation of approximately six parts, and possibly fewer if there are interactions between these parts. When an environment, such as a development environment, presents the parts in separate pieces, pressure is placed on the user’s working memory and the user’s ability to develop a mental model that captures all of the entities essential to a task is affected. It becomes more difficult to see and understand relationships and patterns and to integrate various pieces of important information. This phenomenon is sometimes called “cognitive tunnel vision”; see [33] and [29] for example.

In summary, the information locality provided by development environments is closely tied to the modularization of the system. Then, since not all concerns can be cleanly modularized by any one decomposition, many change tasks will be based on entities that are not well localized. In these situations, this lack of locality makes it difficult for a developer to: identify the entities that are relevant, understand how those entities interact, and work with those entities.

3 Proposed Solution

A significant body of existing work focuses on improving the modularity of software systems. While all of this work has value, it also has limitations as discussed in Section 2.1. So we propose instead to focus on locality, that is, on the information interaction and presentation provided by the development environment. More precisely, our solution has two parts. The first is an interactive visualization of source code (see Section 3.1) that is more flexible, so that an arbitrary subset of the source code can be localized. The second is a model of the neighbourhoods in the system (see Section 3.2) for determining which subset of the system to make the most prominent in the visualization at any point in time, or in other words, what to localize.

The neighbourhood model allows for entities that are not modularized (i.e. are not siblings or cousins relative to the tree based decomposition of the system) to be considered neighbours, by considering navigation activity and multiple relationship types. The visualization will present these neighbourhoods in the context of a view of the entire system. This use of an overview leverages a developer’s spatial memory and provides access to information over and above the modelled neighbourhood, recognizing that the model will not be able to perfectly capture all of the relevant information.

We believe that working with source code in this way will give better locality to a larger percentage of the relevant entities for a wide range of change tasks, where locality is based on three measures: the amount of information that can be seen simultaneously, how easy it is to navigate between those entities, and how much effort is invested in setting up the view. We further believe that this improved locality will help a developer avoid the problem of tunnel vision, making it easier to recognize what information is relevant and to piece that information together.

Figure 2: On the left is a standard node link representation of a tree. On the right is that same tree displayed as a treemap.

3.1 Interactive Visualization

One of the keys to our proposed solution is an interactive visualization of source code that is flexible in the subset of the system that can be localized. This visualization should show as much information as possible about this subset, support navigating between these entities, and require little or no effort to set up and maintain. The details given here represent our current thoughts on how best to do this; however, some of the particulars may change.

The visualization revolves around a treemap [14, 34] based interactive spatial layout of an entire software system, geared toward large displays. Treemaps are a technique for space-constrained visualization of hierarchies. For a simple example see Figure 2. The tree on the left describes the hierarchy and the nested rectangles on the right show how this hierarchy could be displayed as a treemap. We will base our layout on the decomposition of the system. So in an object-oriented language such as Java, methods, fields and inner classes will be nested in classes; and classes will be nested in packages.

The reason for selecting a treemap is that it is a space filling approach, that is, the amount of space used to present the nodes (and in particular the leaf nodes) is maximized and the hierarchical structure is simply implied by the nesting. Each parent node displayed will be shown as a label above a rectangular space for its children. Leaf nodes (primarily methods and fields) displayed will show as much source code as will fit in the allocated rectangle. This approach to the display of source code can be described as a kind of semantic, as opposed to geometric or fisheye, zoom mechanism [31]. Figure 3 illustrates one way this may work. This can also be combined with elision techniques [10], with some entities not shown at all.

Figure 3: The same class (DiskDatabase, with its remount, getTableDirectory and defrag methods) shown at three different zoom levels. Demonstrates one possible approach to semantically zooming source code.

We will use an ordered treemap layout algorithm [3] to help give entities a more stable relative position, making it easier for the developer to navigate between entities. We further believe that this will help a developer see and remember entities and relationships between entities.

The size of the rectangles in the visualization, or in other words the space allocated to a particular entity, will be informed by the neighbourhood model described in Section 3.2. The basic idea is to base space allocation on the developer’s navigation activity and multiple relationship types between entities, with the goal of emphasizing, as far as possible, the entities most relevant to the developer at the moment.

In addition to visualizing source code information, the visualization will attempt to communicate information about the underlying model. That is, techniques will be used to help explain the composition of a neighbourhood. A simple example of this is highlighting to show the navigation history captured by the model, which should help the developer better understand the exploration and avoid unnecessary re-explorations [11].
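As a rough illustration of how a hierarchy maps to nested rectangles, the sketch below lays out a small tree with the basic slice-and-dice treemap scheme. This is a simpler scheme than the ordered treemap layout [3] proposed here and is used only because it is short. The class names, the weights and the 100 by 60 canvas are assumptions made for the example and are not part of Tableau; the sketch also reserves no space for parent labels.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Slice-and-dice treemap sketch (not the ordered layout of [3]). */
class TreemapSketch {

    /** A node in the decomposition tree; only leaves carry a weight. */
    static class Node {
        final String name;
        final double weight;
        final List<Node> children = new ArrayList<>();

        Node(String name, double weight) { this.name = name; this.weight = weight; }

        Node add(Node child) { children.add(child); return this; }

        /** Total weight of the leaves below this node. */
        double total() {
            if (children.isEmpty()) return weight;
            double sum = 0;
            for (Node c : children) sum += c.total();
            return sum;
        }
    }

    /** Axis-aligned rectangle. */
    static class Rect {
        final double x, y, w, h;
        Rect(double x, double y, double w, double h) { this.x = x; this.y = y; this.w = w; this.h = h; }
        double area() { return w * h; }
    }

    /**
     * Assigns a rectangle to every node. Children divide the parent's
     * rectangle in proportion to their weight, and the split direction
     * alternates from one level of the tree to the next.
     */
    static void layout(Node node, Rect r, boolean splitHorizontally, Map<String, Rect> out) {
        out.put(node.name, r);
        double total = node.total();
        if (node.children.isEmpty() || total <= 0) return;
        double offset = 0;
        for (Node c : node.children) {
            double share = c.total() / total;
            Rect cr = splitHorizontally
                    ? new Rect(r.x + offset, r.y, r.w * share, r.h)
                    : new Rect(r.x, r.y + offset, r.w, r.h * share);
            offset += splitHorizontally ? r.w * share : r.h * share;
            layout(c, cr, !splitHorizontally, out);
        }
    }

    /** Hypothetical package with two classes and three members. */
    static Map<String, Rect> example() {
        Node root = new Node("pkg", 0)
                .add(new Node("ClassA", 0)
                        .add(new Node("ClassA.m1", 3))
                        .add(new Node("ClassA.m2", 1)))
                .add(new Node("ClassB", 0)
                        .add(new Node("ClassB.f", 4)));
        Map<String, Rect> out = new LinkedHashMap<>();
        layout(root, new Rect(0, 0, 100, 60), true, out);
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Rect> e : example().entrySet()) {
            Rect r = e.getValue();
            System.out.printf("%-10s x=%.1f y=%.1f w=%.1f h=%.1f%n", e.getKey(), r.x, r.y, r.w, r.h);
        }
    }
}
```

In this example the leaf areas come out in the ratio 3 : 1 : 4, matching the leaf weights, which is the property a weight-driven space allocation relies on.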

Figure 4: A graph representing source code entities and relationships between those entities. The solid lines represent relationships such as calls, references and overrides. The dotted lines represent navigation relationships. So in this case the developer moved from node E to node H and from node H to node K.

3.2 Neighbourhood Model

Here we propose a simple neighbourhood model that is richer and more dynamic than one based strictly on the hierarchical decomposition of the system. Our model is based on the developer’s navigation activity and on a view of a program as a graph with typed edges. Nodes in the graph are source code entities: classes, methods and fields, for example. The edges in the graph are relationships between those entities: references, calls and overrides, for example. We also consider the developer action of navigating from one node to another as introducing a relationship between those nodes, with that relationship decaying over time. The goal of this is to capture conceptual relationships discovered by the developer, but not explicit in the graph of the system. We also expect that the developer’s navigation activity will help in various situations to decide which entities to emphasize and which to deemphasize.

Based on this graph structure, we can model the neighbourhood of the node that is currently selected or focused on, in the context of the developer’s navigation history. This is based on a simple distance function, with those nodes that are closest (as defined by that function) to the selected node forming the neighbourhood. So, entities can be more or less in the neighbourhood of a given entity, and this will be reflected in the space allocated in the visualization.

The details of this distance function still need to be worked out; however, for the sake of concreteness we show how this might work relative to the graph illustrated in Figure 4. Assume that the node K is the currently selected node, with node H being the previously selected node, and node E selected before that. The dotted lines in the diagram capture relationships introduced by this navigation. Assume also that the distance function considers the distance between two entities connected by a navigational relationship to be 0.4, and for all other relationships 1.0. Then the shortest distances between K and each node in the graph are shown in the following table.

Distance    Node(s)
0           K
0.4         H
0.8         E
1.0         I, J, M
1.4         D, G
1.8         A, F
2.0         B, L
2.4         C

So, roughly speaking, the neighbourhood of node K, at this point in time, might be considered to include nodes H, E, I, J, M, D and G. These nodes then are the nodes most likely to be presented as local by the interface described in the previous section, with the hope that these nodes will represent a meaningful and helpful context to make available to a developer focused on K.

Decomposing a system into modules that are internally cohesive and loosely coupled with other modules has been found to make the system easier to change and understand. The goal of our neighbourhood model will be to produce groups of conceptually cohesive entities that are possibly scattered across the decomposition. Such groups of related entities can be seen as cooperatively implementing some slice of the system’s functionality. To understand or change that implementation often requires understanding and working with all, or many, of the entities in such a grouping.
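To make the worked example concrete, the following sketch computes shortest distances from the selected node using Dijkstra's algorithm over a graph with weighted, undirected edges. The distances 0.4 (navigational) and 1.0 (all other relationships) come from the text above. The structural edge list is hypothetical: the edges of Figure 4 are not listed in the text, so the ones below were chosen only so that the result agrees with the table. The 1.5 cutoff used to form the neighbourhood, and the decision to ignore decay of navigational relationships, are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Shortest-path sketch of the neighbourhood distance function. */
class NeighbourhoodSketch {

    static final double NAVIGATION = 0.4;   // distance given in the text
    static final double STRUCTURAL = 1.0;   // calls, references, overrides

    /** A weighted edge end point; also used as a priority queue entry. */
    static class Edge {
        final String to;
        final double distance;
        Edge(String to, double distance) { this.to = to; this.distance = distance; }
    }

    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    /** Adds an undirected relationship between two entities. */
    void relate(String a, String b, double distance) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(new Edge(b, distance));
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(new Edge(a, distance));
    }

    /** Dijkstra's algorithm: shortest distance from source to each reachable node. */
    Map<String, Double> distancesFrom(String source) {
        Map<String, Double> best = new TreeMap<>();
        PriorityQueue<Edge> queue =
                new PriorityQueue<>((x, y) -> Double.compare(x.distance, y.distance));
        queue.add(new Edge(source, 0.0));
        while (!queue.isEmpty()) {
            Edge current = queue.poll();
            if (best.containsKey(current.to)) continue;   // already settled
            best.put(current.to, current.distance);
            for (Edge e : adjacency.getOrDefault(current.to, new ArrayList<>())) {
                if (!best.containsKey(e.to)) {
                    queue.add(new Edge(e.to, current.distance + e.distance));
                }
            }
        }
        return best;
    }

    /** Nodes other than the source whose distance is at most the cutoff, sorted by name. */
    List<String> neighbourhood(String source, double cutoff) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : distancesFrom(source).entrySet()) {
            if (!e.getKey().equals(source) && e.getValue() <= cutoff) result.add(e.getKey());
        }
        return result;
    }

    /** Navigation edges from the text plus a HYPOTHETICAL set of structural edges. */
    static NeighbourhoodSketch example() {
        NeighbourhoodSketch g = new NeighbourhoodSketch();
        g.relate("E", "H", NAVIGATION);
        g.relate("H", "K", NAVIGATION);
        String[][] structural = {
            {"K", "I"}, {"K", "J"}, {"K", "M"}, {"H", "D"}, {"H", "G"},
            {"E", "A"}, {"E", "F"}, {"I", "B"}, {"J", "L"}, {"D", "C"}
        };
        for (String[] pair : structural) g.relate(pair[0], pair[1], STRUCTURAL);
        return g;
    }

    public static void main(String[] args) {
        NeighbourhoodSketch g = example();
        for (Map.Entry<String, Double> e : g.distancesFrom("K").entrySet()) {
            System.out.printf("%s %.1f%n", e.getKey(), e.getValue());
        }
        System.out.println("Neighbourhood of K: " + g.neighbourhood("K", 1.5));
    }
}
```

Because each relationship type is just an edge weight, a real distance function could assign a different weight per type (calls, references, overrides) without changing the search itself.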

3.3 Tableau Eclipse Plugin

To develop and empirically evaluate our proposed solution we will implement an Eclipse plugin, called Tableau, that enhances the Eclipse development environment. At this stage, we have completed a proof of concept implementation of our visualization. A screen shot of this is in Figure 5. The layout featured in the prototype is based on a standard ordered treemap algorithm (using a pivot-by-size scheme). The information density of this will need to be improved for us to reach our information locality goals. One simple modification we propose is to modify the layout to prefer wider rectangles to tall thin rectangles. We also intend to fill the rectangles with additional source entities and source code (for methods and fields) as space permits.

Figure 5: A screen shot of a prototype version of the Tableau plugin.

In addition to this prototype, we propose to create two versions of the plugin. The first version, like the prototype, will feature an interactive source code visualization capable of being used in place of the standard package explorer. In this implementation, our display will be based on the Eclipse JDT Java model hierarchy: Workspace, Project, Package Root, Package Fragment, Class, etc. Our goal is to be able to support workspaces with at least 100,000 lines of source code. The visualization and the screen real estate will determine how much of that information can be localized, and the task will determine how much of that information should be localized. We hope to be able to more or less localize 20 to 30 entities.

As will be described in Section 6, we propose to perform a small user study using this first version. We will then implement a second version based on that experience. We expect that the second version of Tableau will add features towards making this a more complete Focus+Context [25] visualization. To do this we will need at least to make the source code displayed in the rectangles sufficiently detailed and to support editing. For some tasks then, this version of the visualization should be capable of replacing both the package explorer and the source code editor.

A feature of both versions will be the use of navigation information. When a developer navigates from one source code entity to another, a navigational relationship is introduced between those entities. To do this our implementation will monitor Eclipse events such as the following of hyperlinks, use of back and forward arrows, contextual searches, and direct navigation in Tableau or other views.

The model we propose is based on various types of relationships between entities. In addition to the navigational relationships we will consider relationships that are available via the Eclipse JDT. So for example, when a developer is focused on method A, our plugin could lazily determine which entities are called by or call A, which fields are referenced by A, and which methods override or are overridden by A. For each of the methods and fields returned, this same operation could be performed recursively (though only for a small number of recursive steps). This then gives us the ability to describe the neighbourhood of the method A based on a distance function that is configured with a distance for each relationship type. The visualization then will perform space allocation in a way that allocates more space for A and the entities that are closest to A. The consequence is that more information will be displayed about those nearest neighbours, with information about more distant neighbours being semantically zoomed smaller, or elided altogether.
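One way to turn such distances into screen space is sketched below: each entity receives a weight that falls off with its distance from the focused entity, entities beyond a cutoff are elided, and the remaining weights are normalized into fractions of the display area. The 1 / (1 + d) falloff, the cutoff value, the entity names and the class name are all assumptions for illustration; the proposal leaves the actual allocation scheme open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: convert neighbourhood distances into fractions of display area. */
class SpaceAllocationSketch {

    /**
     * Returns the fraction of the display area for each entity.
     * Entities farther than the cutoff are elided (left out of the result).
     * The 1 / (1 + distance) falloff is an assumed, illustrative choice.
     */
    static Map<String, Double> allocate(Map<String, Double> distances, double cutoff) {
        Map<String, Double> shares = new LinkedHashMap<>();
        double total = 0;
        for (Map.Entry<String, Double> e : distances.entrySet()) {
            if (e.getValue() > cutoff) continue;          // elide distant entities
            double weight = 1.0 / (1.0 + e.getValue());
            shares.put(e.getKey(), weight);
            total += weight;
        }
        // Normalize so that the shares of the displayed entities sum to 1.
        for (Map.Entry<String, Double> e : shares.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return shares;
    }

    /** Hypothetical distances from a focused method A. */
    static Map<String, Double> example() {
        Map<String, Double> distances = new LinkedHashMap<>();
        distances.put("A", 0.0);          // the focused method
        distances.put("calleeOfA", 1.0);  // one structural step away
        distances.put("fieldOfA", 1.0);
        distances.put("farAway", 3.0);    // beyond the cutoff
        return allocate(distances, 2.0);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : example().entrySet()) {
            System.out.printf("%-10s %.2f%n", e.getKey(), e.getValue());
        }
    }
}
```

With these inputs the focused method receives half of the area and each of its two direct neighbours a quarter. Shares of this kind could serve as leaf weights for the treemap layout.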

4 Sample Scenario

Here we describe a short interaction with Tableau. This discussion follows a developer performing a few steps of a change task on the Axion system (http://axion.tigris.org), which is an open source relational database written in Java. The task is to implement sub-selects (i.e. selects nested in other commands), and the developer begins by looking at how to add sub-select support to insert commands.

During this discussion we will argue that the interaction and visualization provided by Tableau solve the problem we have posed. We will show that the visualization localizes relevant information, making it easier to explore and build an understanding of it. Further we discuss our expectation that putting the information in the context of the entire system gives the developer useful contextual information and will allow the interaction to leverage spatial memory.

The figures used to illustrate this task are very rough mock ups of the Tableau plugin. We expect to evaluate Tableau on the 4800x2800 display in the Imager lab, which will allow a display of much more information than is shown in the mock ups used here. Measuring the sizes based on number of characters across and lines down, the mock up supports 140x70 while Tableau running on the large display may support 570x280. Although some space will be consumed by boxes, the surrounding context, and other visual elements, if 50% of the space can be used to show source code, and the average method is 10 lines long, then more than 40 methods could be shown simultaneously on the large display.
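The estimate can be checked with a short calculation. The sketch below packs method-sized blocks into the character grid and then applies the usable fraction. The grid sizes, the 10-line method length and the 50% figure are taken from the text; the 80-character block width is an assumption that the text does not state, as is the class name.

```java
/** Sketch: how many average-sized methods fit on a character grid. */
class DisplayCapacitySketch {

    /**
     * Counts whole blocks of methodWidth x methodLines characters that fit in a
     * grid of columns x lines, then keeps only the usable fraction of them.
     */
    static int methodsThatFit(int columns, int lines,
                              int methodWidth, int methodLines,
                              double usableFraction) {
        int across = columns / methodWidth;   // whole blocks per row
        int down = lines / methodLines;       // whole blocks per column
        return (int) Math.floor(across * down * usableFraction);
    }

    public static void main(String[] args) {
        int assumedWidth = 80;  // ASSUMPTION: characters per method line
        System.out.println("mock up (140x70): "
                + methodsThatFit(140, 70, assumedWidth, 10, 0.5));
        System.out.println("large display (570x280): "
                + methodsThatFit(570, 280, assumedWidth, 10, 0.5));
    }
}
```

Under the 80-character assumption this gives 3 methods for the mock up and 98 for the large display, so the "more than 40" figure holds with room to spare; a wider assumed block would lower the count.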

4.1 First Steps

Using a search facility, the developer finds the InsertCommand class, and uses this as a starting point for exploring the code, with a goal of understanding how instances of this class are created. In Figure 6 the developer has selected the first constructor of that class. This constructor calls the third constructor and is called by the second constructor. These second and third constructors, then, are the closest neighbours to the selected entity. The class TableIdentifier is the declared type of the constructor’s argument so it is also a close neighbour. Other neighbours include entities that are close to these closest neighbours, including the methods addColumn and addValue, and the fields cols, vals and table in the InsertCommand class; a number of methods in AxionSqlParser, TestInsertCommand, TestDatabase and TestDiskDatabase; and the types ColumnIdentifier and Selectable.

Figure 6: Tableau mock up with InsertCommand’s first constructor selected.

In Figure 7 the developer has moved the focus to the third constructor. Several entities are less closely related to this constructor than to the first: the methods in TestInsertCommand for example. Also several entities are more closely related: three fields in the InsertCommand class, and the methods in AxionSqlParser, TestDatabase and TestDiskDatabase. These differences are reflected in the space allocated to these entities.

Figure 7: Tableau mock up with InsertCommand’s third constructor selected.

Notice that the set of entities localized by these two views are the constructors, the methods and fields referenced by the constructors, and methods that directly construct instances of this class. Each of these is relatively visible and easy to navigate to, which we believe will aid a developer in developing an understanding of how and where instances are created.

4.2 Later in the Task

Later in the task, the developer is determining how to modify InsertCommand’s execute methods to handle the sub-select case. In Figure 8 the developer has selected the class’s executeUpdate method. The navigation history (going backward in time) is:

1. InsertCommand.executeUpdate() (currently selected),
2. InsertCommand.execute(),
3. SelectCommand.executeQuery(),
4. SelectCommand.execute(),
5. AxionCommand.execute(), and
6. InsertCommand.InsertCommand() (an added fourth constructor).

Each of these entities is more or less in the neighbourhood of the current entity, with InsertCommand.execute() being the closest neighbour. Other neighbours include the resolve method of the InsertCommand class, the executeUpdate method in the AxionStatement class, several methods in the TestDatabase and TestDiskDatabase classes, and the evaluate method in the Selectable interface, among others.

In Figure 9, the developer has moved the focus to the resolve method, which is called from the executeUpdate method. A few items are more closely related to the resolve method than to the executeUpdate method. These include the resolveSelectable method in the Database interface and the resolveSelectable method in the BaseAxionCommand class. Again, these differences are reflected in the space allocated to these entities.

The result is a view on the system that emphasizes the various execute and resolve methods. Much of the relevant source code can be seen simultaneously and navigated between easily. We believe that this view will help a developer understand how these methods work together and therefore what modifications need to be made to handle the sub-select case.

Figure 8: Tableau mock up with InsertCommand’s executeUpdate method selected.

Figure 9: Tableau mock up with InsertCommand’s resolve method selected.

Notice that in the two transitions we have shown (from Figure 6 to Figure 7, and from Figure 8 to Figure 9) the new neighbourhood is a small modification of the previous neighbourhood. One reason for this is that the neighbourhood model is based on relationships, and we believe that most navigation is between related entities. When a developer navigates between entities that are not related via a relationship captured by the model, the modification to the neighbourhood will be greater; however, this will be tempered by the introduction of a navigational relationship. In other words, the newly selected entity will be related to the entities in the previous neighbourhood. We expect the visualization, for the most part, to move smoothly from neighbourhood to neighbourhood, allowing the developer’s spatial memory to help with remembering “where” entities are located. Tracking will be further aided by highlighting or other mechanisms aimed at emphasizing entities and communicating neighbourhood composition.
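As a rough illustration of why a change of focus tends to produce only a small change in the neighbourhood, the following Python sketch treats the neighbourhood of an entity as everything within a distance budget in a weighted graph. The relationship types, their distances, the budget, the `neighbourhood` helper and most of the edges are assumptions made for illustration; they are not the model of Section 3.2. Only the entity names and the call from executeUpdate to resolve come from the scenario.

```python
import heapq

# Hypothetical per-relationship distances; the proposal leaves these values open.
DISTANCE = {"calls": 1.0, "navigation": 2.0}


def neighbourhood(edges, focus, budget):
    """Return {entity: distance} for entities within `budget` of `focus`.

    `edges` is a list of (entity_a, entity_b, relationship_type) tuples,
    treated as undirected. Distances are shortest-path sums (Dijkstra).
    """
    graph = {}
    for a, b, kind in edges:
        graph.setdefault(a, []).append((b, DISTANCE[kind]))
        graph.setdefault(b, []).append((a, DISTANCE[kind]))

    best = {focus: 0.0}
    queue = [(0.0, focus)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, []):
            cand = dist + weight
            if cand <= budget and cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(queue, (cand, nxt))
    return best


# Illustrative edges; only the executeUpdate -> resolve call is stated in the text.
edges = [
    ("InsertCommand.executeUpdate", "InsertCommand.resolve", "calls"),
    ("InsertCommand.execute", "InsertCommand.executeUpdate", "calls"),
    ("InsertCommand.resolve", "Database.resolveSelectable", "calls"),
    # A navigation step between entities with no structural relationship.
    ("InsertCommand.execute", "SelectCommand.executeQuery", "navigation"),
]

before = neighbourhood(edges, "InsertCommand.executeUpdate", budget=2.0)
after = neighbourhood(edges, "InsertCommand.resolve", budget=2.0)
shared = sorted(set(before) & set(after))
```

Under these assumed weights the two neighbourhoods contain the same entities, but Database.resolveSelectable moves closer once resolve is the focus, which is the kind of gradual shift the scenario describes.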

5 Related Work

Our proposed research builds on, or relates to, work in a number of different areas, including: degree of interest based views, code browsing tools, advanced source code editing environments and software visualization. We highlight some of this related work in the following subsections.

5.1 Degree of Interest Based Views

In [8] Furnas presents generalized fisheye views and shows how they can be applied to displaying source code. The idea of a fisheye view is to present detailed information about a local “neighbourhood”, while presenting less detail (perhaps just some major landmarks) about things that are farther away. The neighbourhood model comes from a tree-based degree of interest function. Our proposed neighbourhood model can be seen as an extension of this, adding navigation information and multiple relationships. Rather than presenting a tree of information, we aim to localize information scattered across a tree.

A degree-of-interest model for viewing software has been used more recently in Mylar [15]. Mylar’s model is based on a developer’s navigation and editing activity. Using highlighting and filtering, structural views in the IDE such as the package explorer are updated to emphasize the most interesting information. The goal of Mylar is to minimize the amount of time a developer spends navigating the source code. In other work on capturing a developer’s interest, context was inferred by analyzing structural navigation paths [27]. This context was captured as a concern graph (there is more about concern graphs in Section 5.2).

Our proposed neighbourhood model differs in that, rather than defining a global degree of interest, it defines a neighbourhood for a given node. Further, our model includes both relationships and navigation paths, though it omits much of the information used by Mylar. Our choice of model is based on the desire to help developers discover information (i.e. identify relevant information) and to localize a cohesive set of entities. Perhaps more importantly, we will focus on developing a novel visualization that localizes information based on the model.
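To make the tree-based degree of interest concrete, here is a minimal Python sketch of the fisheye function as it is commonly stated for trees: a priori importance (taken as minus the depth of a node) minus the tree distance to the focus. The containment tree and the function names are hypothetical and serve only to illustrate the idea.

```python
def path_to_root(parent, node):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(parent, a, b):
    """Number of edges on the tree path between `a` and `b`."""
    up_a = path_to_root(parent, a)
    up_b = path_to_root(parent, b)
    steps_from_a = {n: i for i, n in enumerate(up_a)}
    for steps_from_b, n in enumerate(up_b):
        if n in steps_from_a:  # lowest common ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def degree_of_interest(parent, node, focus):
    """Fisheye DOI: a priori importance (minus depth) minus distance to focus."""
    depth = len(path_to_root(parent, node)) - 1
    return -depth - tree_distance(parent, node, focus)


# Hypothetical containment tree: package -> class -> method.
parent = {
    "axion": None,
    "InsertCommand": "axion",
    "SelectCommand": "axion",
    "InsertCommand.execute": "InsertCommand",
    "InsertCommand.resolve": "InsertCommand",
    "SelectCommand.execute": "SelectCommand",
}

focus = "InsertCommand.execute"
doi = {node: degree_of_interest(parent, node, focus) for node in parent}
```

In this toy tree the focus and its ancestors receive the highest interest, while SelectCommand.execute receives the lowest even though it may be relevant to the task. This is the situation in which localizing information scattered across the tree becomes useful.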

5.2 Code Browsing Tools

Using FEAT [28] a developer can add interesting elements to a concern graph. A concern graph describes concerns in source code in terms of relationships between program elements. The created concern graph is displayed as a tree that includes signature and some relationship information. The tree serves as a navigation aid, allowing a developer to move easily between entities in the concern graph.

JQuery [12] allows a developer to explore source code using contextual queries. The query path is saved as a tree, which can be iteratively extended using additional queries stemming from nodes in the tree. This tree facilitates recalling entities returned as query results, puts them in a meaningful context, and facilitates navigation between those entities.

Both FEAT and JQuery can improve the locality of a group of relevant entities; however, these views are built manually. We propose to dynamically build a view based on our neighbourhood model, with less effort required by the developer. It may be the case that the views we build will less precisely capture the information of interest to a developer; however, we hope that they will be more fluid and able to adjust smoothly as the task progresses. We also believe that the visualization we are proposing will be able to more fully localize the information of interest.

Lemma [20] is a set of tools for the visual representation of code abstraction and code navigation. Lemma supports navigating forward and backward through the code along various paths. As one example, it is possible to navigate from a variable to the first use of that variable and then to step between subsequent uses in the code. While some entities can be accessed more easily, the entities cannot be seen simultaneously.

5.3 Advanced Source Code Editing Environments

Stellation [5] allows a developer to work with virtual source files, which are editor windows that contain pieces from different source files. The idea is to provide a localized view of information that is not necessarily modularized. In the context of Stellation, no work has been done on how to build these views. In our proposed approach, by contrast, the entities to localize will be automatically selected based on the model described in Section 3.2.

Using virtual source files, Decal [13] provides two different types of views on software systems: one is the standard class view, and the other is a module view that crosscuts the class view. Having these two views, both of which can be edited, likely improves the locality of relevant information for particular types of tasks.

Traits [4] are an object-oriented programming language construct that allows groups of methods to be named and reused in an inheritance hierarchy. Tool support for working with traits allows a programmer to choose between a structured view, a flattened view in which all the internal structure is elided, and a partially structured point between these extremes. For a narrow range of tasks this can lead to improved locality for the relevant entities.

The Decal and Traits tools support a fixed set of views on the underlying software systems. In our approach we aim to be more flexible by basing the views on multiple relationships and the developer’s navigation activity. As a result, the information localized will change as the developer’s task progresses.

MView [24] is a source-code editor, implemented as an Eclipse plugin, that provides dynamic crosscutting views of source code. These views are source code fragments presented together in a kind of source file so that the fragments can be edited together.

Our proposed work also differs from Stellation, Decal, Traits and MView in that we will provide a visualization that goes beyond the scroll-based source file metaphor. In particular, our treemap-based approach leaves the entities in context and makes as much of the source code as possible visible at one time. This is as opposed to moving the entities out of context to one flat source file and allowing access via a scrolling or overview approach.

5.4 Software Visualization

Tools such as Rigi [21] and SHriMP [30] visualize the structure of software systems. SNiFF+ [17] performs static analysis to identify references to symbols and visualize inter-module relationships. Our goals differ from those of these projects, and so our visualization differs in a number of ways. First, we focus on presenting source code in flexible pieces, and so have decided to use a treemap rather than a node-link visualization. Second, rather than attempting to show an overview of the system’s structure, we aim to localize information that is scattered across that structure.

The Jaba development environment [6, 7] is an example of a source code editing environment that uses a fisheye-based visualization. This environment increases the locality of entities contained in the same source file. However, again, we aim to increase the locality of entities that are spread across the decomposition.

The SeeSys [2] system is for visualizing statistics associated with code. SeeSys uses a treemap to present these statistics on an overview of the system. We, on the other hand, focus on a scattered subset of the system. Also, we aim to allow the developer to work with the source code using our tool.

6 Evaluation

Tableau realizes our ideas about locality as an Eclipse plugin, so we propose to evaluate our claims by evaluating Tableau. In particular, we will investigate how well the tool localizes relevant information and helps the developer identify and work with that information.

The model we described in Section 3.2 needs to be refined. Which relationships are helpful? What distance should be associated with each relationship type? How quickly should navigation relationships decay? As a first step toward answering these questions we will perform an analysis study based on user navigation data from a previous experiment. Using the data, we will simulate a developer using our tool for various settings in our model. We will then make some initial predictions about which settings will lead to better locality for relevant information.

In addition, we propose to perform two user studies. The first will follow up on the analysis study and help prepare for the final user study. The primary question the first user study will answer is: how well localized is the relevant information? The second user study will attempt to measure the effect this locality has on developers performing change tasks based on entities that are not well modularized.

For both user studies, the subjects will be software developers with some minimum level of experience at software construction. The tasks the subjects are asked to perform will target software systems that the subjects have no experience with. Subjects will use a high resolution display (4800x2800) during the study. There are more details about each study below.
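One hypothetical way to compare decay settings offline is sketched below in Python. It assumes an exponential form in which a navigation link visited `age` steps ago has strength `decay ** age` and is dropped once the strength falls below a threshold. The form of the decay, the threshold, the candidate settings and the function name are all assumptions; the proposal leaves these choices to the analysis study.

```python
def surviving_links(history, decay, threshold):
    """Return the history entries whose navigation strength is still >= threshold.

    `history` lists visited entities from most recent to oldest. An entry
    visited `age` steps ago has strength decay ** age (an assumed form).
    """
    return [entity for age, entity in enumerate(history)
            if decay ** age >= threshold]


# Navigation history from the scenario in Section 4.2, most recent first.
history = [
    "InsertCommand.executeUpdate()",
    "InsertCommand.execute()",
    "SelectCommand.executeQuery()",
    "SelectCommand.execute()",
    "AxionCommand.execute()",
    "InsertCommand.InsertCommand()",
]

# Hypothetical settings to compare in a simulation.
for decay in (0.5, 0.8):
    kept = surviving_links(history, decay, threshold=0.2)
    print(f"decay={decay}: {len(kept)} of {len(history)} entries remain")
```

A faster decay keeps the neighbourhood focused on recent navigation, while a slower decay retains older entities such as the added constructor; replaying recorded navigation data under several settings is one way to see which behaviour localizes the relevant entities better.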

6.1 First User Study

This study will investigate how well the first version of our plugin can localize the information most relevant to a task. This will act as a validation of the decisions made during the analysis study, and possibly result in further refinements to our model. While this is an important exploration for us, no statistical significance of these results will be claimed. We believe that for this validation a small number of subjects is sufficient.

Four subjects will be given a short training session and then be asked to perform, using our enhanced version of Eclipse, one or more modest-sized change tasks on a large software system. The change task will require exploring some non-modularized information in the system. Before running the study, some independent experts will be asked to specify which source code entities are most relevant to the task. We will then measure how well localized those entities become during the subjects’ exploration. As in the analysis study, this will be based on the notion of locality described in Section 2.2.

A second goal for this study will be to prepare for the second user study. Feedback from the subjects will be considered, and modifications will be made to the interface before the second study. Also, various performance measurements will be taken, and the subjects will be given an interview and asked questions that we expect to use in the second study.

The information localized by the visualization will change as the task progresses. Our hypothesis is that the information localized will cover a significant portion of the relevant subset identified by the experts. We further believe that we will find some qualitative evidence that suggests the locality can help developers in identifying and working with that subset of the system.
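The coverage hypothesis could be checked with a simple set-overlap measure such as the Python sketch below. This is only an approximation made for illustration: it does not reproduce the notion of locality from Section 2.2, and the function names and sample data are hypothetical.

```python
def coverage(neighbourhood, relevant):
    """Fraction of expert-identified entities present in one neighbourhood."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("the relevant set must not be empty")
    return len(relevant & set(neighbourhood)) / len(relevant)


def cumulative_coverage(neighbourhoods, relevant):
    """Fraction of relevant entities localized at least once during a session."""
    seen = set()
    for entities in neighbourhoods:
        seen |= set(entities)
    return coverage(seen, relevant)


# Hypothetical data: entities judged relevant by the experts, and the
# neighbourhoods shown to one subject at three points in the task.
relevant = {"A", "B", "C", "D"}
session = [{"A", "X"}, {"A", "B", "Y"}, {"B", "C"}]

per_step = [coverage(entities, relevant) for entities in session]
overall = cumulative_coverage(session, relevant)
```

Reporting both the per-step and the cumulative value separates two questions: how much of the relevant subset is visible at any one moment, and how much of it the visualization surfaces at some point during the task.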

6.2 Second User Study

The second user study will be based on the second version of the Tableau Eclipse plugin, which is described in Section 3.3. In the study we will attempt to show that the improved locality achieved by our interface can in fact be helpful to a developer in identifying and working with entities relevant to a task, in particular when that task requires dealing with information that is not well modularized by the system’s decomposition. We will employ both quantitative and qualitative analysis to show this.

In the study we will take twelve subjects and give each of them a pair of change tasks: task A and task B. Half the participants will use plain Eclipse to perform task A and the enhanced Eclipse for task B. For the other half of the subjects it will be the other way around. Each subject will be given a training session. The subject will then be asked to perform one task and then the other, with the order decided randomly. Each participant will be given at most 45 minutes to complete each task and half an hour for the training session. Allowing for a couple of short breaks, the study should then take two and a half hours for each participant. At the end of each task, the participants will be given a short interview and a quiz to measure their understanding of the concern and to gain some qualitative insight into their experience with the tool.

We will take two performance measures: the time to identify relevant entities, and the navigational effort incurred, perhaps capturing total exploration and exploration of irrelevant entities (i.e. wasted effort). These performance measures will be analyzed for statistical significance using the between-within ANOVA technique.

Our hypothesis is that, using the enhanced version of Eclipse, the subjects will more quickly identify the relevant entities and will spend less time exploring entities that are irrelevant. We believe further that qualitative evidence will suggest that the visualization helped the subjects understand and recall the entities and the interactions between the entities.
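The counterbalanced design described above can be sketched in Python as follows. The subject identifiers, the fixed random seed and the `assign` function are hypothetical; the sketch only shows one way to split the subjects into two tool-to-task pairings and to randomize the task order for each subject.

```python
import random


def assign(subjects, seed=0):
    """Counterbalance the tool-to-task pairing and randomize task order."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random allocation of subjects to the two halves
    half = len(shuffled) // 2
    plan = {}
    for i, subject in enumerate(shuffled):
        # First half: plain Eclipse on task A; second half: the reverse.
        if i < half:
            tools = {"A": "plain", "B": "enhanced"}
        else:
            tools = {"A": "enhanced", "B": "plain"}
        order = ["A", "B"]
        rng.shuffle(order)  # which task the subject performs first
        plan[subject] = [(task, tools[task]) for task in order]
    return plan


plan = assign([f"S{i}" for i in range(1, 13)])

# Upper bound on working time per subject, excluding breaks and interviews.
working_minutes = 2 * 45 + 30
```

Every subject uses both tools and performs both tasks, so differences between subjects and between tasks are less likely to be mistaken for an effect of the tool. The two 45-minute tasks and the half-hour training session account for two of the two and a half hours planned per participant.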

7 Time Line

Completion Date    Milestone
December 2004      Complete proof of concept plugin
December 2004      Defend thesis proposal
January 2005       Complete analysis study
March 2005         Complete first user study
September 2005     Complete second user study
Early/Mid 2006     Defend thesis

8 Contributions

The contributions the proposed research is expected to produce include:

• Development of a new measurable notion of information locality, which may also be applicable to other domains.
• Design and implementation of a novel source code interaction and visualization technique. This could be useful for visualizing and interacting with other information about programs.
• Demonstration of the utility of a locality based on a developer’s navigation history and multiple relationships, in particular how well it localizes the most relevant information.
• Investigation of the impact of this locality on a developer’s ability to understand and work with entities that are not well modularized. This investigation includes both quantitative and qualitative aspects.

References

[1] M. J. Albers. Information design considerations for improving situation awareness in complex problem-solving. In Proceedings of the 17th annual international conference on Computer documentation, pages 154–158. ACM, 1999.
[2] M. J. Baker and S. G. Eick. Space-filling software visualization. Journal of Visual Languages and Computing, 6(2):119–133, 1995.
[3] B. B. Bederson, B. Shneiderman, and M. Wattenberg. Ordered and quantum treemaps: Making effective use of 2d space to display hierarchies. ACM Transactions on Graphics (TOG), 21(4):833–854, 2002.
[4] A. Black and N. Scharli. Programming with traits. In Proceedings of the international conference on Software engineering (ICSE). IEEE Computer Society Press, 2004.
[5] M. C. Chu-Carroll, J. Wright, and A. T. T. Ying. Visual separation of concerns through multidimensional program storage. In Proceedings 2nd International Conference on Aspect-Oriented Software Development (AOSD-2003), pages 188–197. ACM, 2003.
[6] A. Cockburn. Supporting tailorable program visualisation through literate programming and fisheye views. Information and Software Technology, 43(13):745–758, 2001.
[7] A. Cockburn and M. Smith. Hidden messages: Evaluating the effectiveness of code elision in program navigation. Interacting with Computers: The Interdisciplinary Journal of Human-Computer Interaction, 15(3):387–407, 2003.
[8] G. W. Furnas. Generalized fisheye views. In Proceedings of the conference on computer human interaction (CHI). ACM, 1986.
[9] T. R. G. Green and M. Petre. Usability analysis of visual programming environments: A ’cognitive dimensions’ framework. Journal of Visual Languages and Computing, 7(2):131–174, 1996.
[10] J. Huotari, K. Lyytinen, and M. Niemela. Improving graphical information system model use with elision and connecting lines. ACM Transactions on Computer–Human Interaction, 11(1):26–58, 2004.
[11] T. J. Jankun-Kelly and K.-L. Ma. Focus+context display of the visualization process. Technical report, UC-Davis Computer Science Department, 2002.
[12] D. Janzen and K. D. Volder. Navigating and querying code without getting lost. In Proceedings of the international conference on aspect-oriented software development, pages 178–187, 2003.
[13] D. Janzen and K. D. Volder. Programming with crosscutting effective views. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP). Springer-Verlag, 2004.
[14] B. Johnson and B. Shneiderman. Treemaps: a space-filling approach to the visualization of hierarchical information structures. In Proceedings of the 2nd International IEEE Visualization Conference, pages 284–291. IEEE, 1991.
[15] M. Kersten and G. C. Murphy. Mylar: a degree-of-interest model for IDEs. In Proceedings of Fourth International Conference on Aspect-Oriented Software Development. ACM, 2005.
[16] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. Lopes, J.-M. Loingtier, and J. Irwin. Aspect-oriented programming. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP), volume 1241, pages 220–242. Springer-Verlag, 1997.
[17] M. Klaus. Simplifying code comprehension for legacy code reuse. Embedded Developers Journal, April 2002.
[18] G. Klein. Sources of power: how people make decisions. MIT Press, 1998.
[19] S. Letovsky and E. Soloway. Delocalized plans and program comprehension. IEEE Software, 3(3):41–49, May 1986.
[20] R. Mays. Power programming with the lemma code viewer. Technical report, IBM TRP Networking Laboratory, 1996.
[21] H. Muller and K. Klashinsky. Rigi – a system for programming-in-the-large. In Proceedings of the 5th International Conference on Software Engineering. IEEE Computer Society Press, April 1988.
[22] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Communications of the ACM (CACM), 15(12), December 1972. www.acm.org/classics/may96.
[23] D. L. Parnas. Software aging. In Proceedings of the 16th international conference on Software engineering (ICSE), pages 279–287. IEEE Computer Society Press, 1994.
[24] P. J. Quitslund. Beyond files: programming with multiple source views. In Proceedings of the 2003 OOPSLA workshop on eclipse technology exchange, pages 6–9. ACM, 2003.
[25] G. G. Robertson and J. D. Mackinlay. The document lens. In Proceedings of the 6th annual ACM symposium on User interface software and technology, pages 101–108. ACM, 1993.
[26] S. P. Robertson, E. F. Davis, K. Okabe, and D. Fitz-Randolf. Program comprehension beyond the line. In Proceedings of the IFIP TC13 third international conference on human-computer interaction, pages 959–963. ACM, 1990.
[27] M. P. Robillard and G. C. Murphy. Automatically inferring concern code from program investigation activity in source code. In Proceedings of the 18th international conference on automated software engineering (ASE), pages 225–234, 2003.
[28] M. P. Robillard and G. C. Murphy. FEAT: a tool for locating, describing, and analyzing concerns in source code. In Proceedings of the International Conference on Software Engineering (ICSE), pages 822–823, 2003.
[29] D. Schaffer, Z. Zuo, S. Greenberg, L. Bartram, J. Dill, S. Dubs, and M. Roseman. Navigating hierarchically clustered networks through fisheye and full–zoom methods. ACM Transactions on Computer–Human Interaction, 3(2):162–188, 1996.
[30] M. Storey, H. Muller, and K. Wong. Manipulating and documenting software structures. Software Visualization, 1996.
[31] M. Storey, K. Wong, and H. A. Muller. How do program understanding tools affect how programmers understand programs? Science of Computer Programming, 36(2–3):183–207, 2000.
[32] P. Tarr, H. Ossher, W. Harrison, and S. M. Sutton Jr. N degrees of separation: Multi-dimensional separation of concerns. In Proc. 21st Int’l Conf. Software Engineering (ICSE’1999), pages 107–119. IEEE Computer Society Press, May 1999.
[33] L. A. Teodosio and M. Mills. Panoramic overviews for navigating real-world scenes. In Proceedings of the first ACM international conference on Multimedia, pages 359–364. ACM, 1993.
[34] F. Vernier and L. Nigay. Modifiable treemaps containing variable-shaped units. In Proceedings of Extended Abstracts of IEEE Information Visualization (InfoVis), pages 28–35. IEEE, 2000.
[35] D. D. Woods, E. S. Patterson, and E. M. Roth. Can we ever escape from data overload? A cognitive systems diagnosis. Cognition, Technology, and Work, 4(1):22–36, 2002.