Ordered and Unordered Treemap Algorithms and Their Applications ...

2 downloads 198 Views 1MB Size Report
and Their Applications on Handheld Devices ... Denna rapport beskriver ett sätt att visualisera hierarkiska data på PD
Ordered and Unordered Treemap Algorithms and Their Applications on Handheld Devices

BJÖRN ENGDAHL

Master’s Degree Project Stockholm, Sweden 2005

TRITA-NA-E05033

Numerisk analys och datalogi KTH 100 44 Stockholm

Department of Numerical Analysis and Computer Science Royal Institute of Technology SE-100 44 Stockholm, Sweden

Ordered and Unordered Treemap Algorithms and Their Applications on Handheld Devices

BJÖRN ENGDAHL

TRITA-NA-E05033

Master’s Thesis in Computer Science (20 credits) at the School of Computer Science and Engineering, Royal Institute of Technology year 2005 Supervisor at Nada was Olle Bälter Examiner was Yngve Sundblad

iii Abstract This thesis describes a way to visualize hierarchical structures on PDAs using Treemaps. A new ordered layout algorithm for treemaps, called Split Layout, is presented. Traditionally, treemap algorithms have exhibited a trade off between good aspect ratio and ordered layout. The new Split Layout is compared to five published treemap algorithms. Properties such as avarage aspect ratio, stability, run time and readability are measured, and it is shown that the new Split Layout performs better than any known ordered layout. The average aspect ratio is about 23% better and the stability increased between 28% and 40%. A user study was performed to measure the quality of the ordering of the new algorithm. 30 users gave a total of 360 measured response times, and it was found that the median time to locate a specific rectangle in the treemap only took 0.14 seconds (5%) longer for the Split Layout than for the Pivot by Split Size algorithm. The technique was implemented on a PDA and used for visualizing threaded discussion forums. User studies confirm that the concept of using treemaps on PDAs looks promising. The contents of the forum were easily grasped, even though the number of articles exceeded one hundred.

iv

Ordnade och oordnade treemap-algoritmer och deras användningsområden på handdatorer Sammanfattning Denna rapport beskriver ett sätt att visualisera hierarkiska data på PDA:er genom att använda Treemaps. En ny ordnad Treemap-algoritm kallad Split Layout presenteras. Den jämförs med fem publicerade algoritmer. Egenskaper som längd-breddförhållande, stabilitet, körtid och läsbarhet mäts och det visas att den nya Split Layoutalgoritmen skapar rektanglar vars längd-breddförhållande är 23% bättre än kända algoritmer. Dessutom ökar stabiliten i layouten med mellan 28% och 40%. En användarstudie utfördes för att mäta kvaliten på ordningen av rektanglarna. 30 användare gav totalt 360 mätresultat som visar att mediantiden att hitta en specifik rektangel bara tog 0.14 sekunder (5%) längre för Split Layout än för Pivot by Split Size. Tekniken implementerades på en PDA och användes för att visualisera trådade diskussionsforum. Användarstudier bekräftar att metoden att använda Treemaps på PDA:er verkar lovande. Innehållet i forumet var lätt att överblicka, trots att antalet artiklar översteg hundra stycken.

Contents Contents 1 Introduction 1.1 Treemaps . . 1.2 Project Goal 1.3 Definitions . . 1.4 Outline . . .

v . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

2 Background 2.1 Traditional Visualization Techniques 2.2 Treemaps . . . . . . . . . . . . . . . 2.3 Other Visualization Techniques . . . 2.3.1 Cone Trees . . . . . . . . . . 2.3.2 Reconfigurable Disc Trees . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

1 1 1 2 3

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

5 5 6 8 9 9

Forums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

11 11 12 13 13 15 15

. . . . . .

17 17 18 18 19 19 19

5 Algorithms 5.1 Slice And Dice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Squarified . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

21 22 23

3 Visualizing Threaded Discussion 3.1 Threaded Discussion Fourms . 3.2 Layout Style . . . . . . . . . . . 3.2.1 Overview Layout Style . 3.2.2 Leaf Node Layout Style 3.2.3 Detailed Layout Style . 3.3 Interaction . . . . . . . . . . .

. . . . .

. . . . .

4 Implementation 4.1 Hardware . . . . . . . . . . . . . . . . . . . . . 4.2 Software . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Connecting to Newsgroups . . . . . . . 4.2.2 Pocket Piccolo and Treemap Rendering 4.2.3 Layout Algorithm . . . . . . . . . . . . 4.2.4 Color assignment . . . . . . . . . . . . .

v

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

vi

CONTENTS

5.3 5.4 5.5 5.6

Strip . . . . . . . . . . . . Pivot . . . . . . . . . . . . Split Treemaps . . . . . . Comparison of Algorithms

6 Results 6.1 Algorithm evaluation . . . 6.1.1 Automatic Test . . 6.1.2 Readability . . . . 6.1.3 Treemaps on PDAs 6.2 Pocket Piccolo . . . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

26 27 30 32

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . and Discussion Forums . . . . . . . . . . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

35 35 35 39 41 42

7 Conclusion

43

Bibliography

45

Chapter 1

Introduction This report is part of the final project of the Master’s programme in Computer Science at the Royal Institute of Technology in Sweden. The project was carried out as a part of a larger project at the Collaborative Visual Computing (CVC) research group in the Department of Computer Science at the University of Cape Town in South Africa.

1.1

Treemaps

Listening to a talk by Ben Shneiderman about the research performed at the Human Computer Interaction Lab (HCIL) at the University of Maryland, the idea was first born to try the concept of treemaps on compact displays. Having been a research topic at the HCIL for several years, the treemap is a means to display hierarchies in a space-filling approach. In doing so, you attempt to overcome the inherent drawbacks of using the traditional, treelike methods for visualizing hierarchical data. Traditionally, hierarchies have been visualized using treelike structures with nodes that can be expanded and collapsed as the user navigates down the structure. One of the more obvious benefits of this traditional method is that you quickly grasp objects’ location in the hierarchy, but this comes to the cost of screen space - most pixels are used for the background rendering. One other disadvantage is that as the tree grows and the number of nodes become larger than what fits on the display without scrolling, the difficulties of grasping the hierarchical structures increase. The treemap tries to overcome the above problems by using the entire display for the hierarchical structure. Figure 2.1 shows a traditional tree of a directory structure. A treemap of the same structure is displayed in figure 2.2.

1.2

Project Goal

The properties stated in the previous section make the idea of using treemaps on compact displays appealing. The goal of this Master’s project was to explore different algorithms for laying out the nodes of the treemap. The original Slice and 1

2

CHAPTER 1. INTRODUCTION

Dice algorithm proposed by Ben Shneiderman, very simple in its approach divided the space vertically and horizontally as the tree model was descended. It turns out that this algorithm has some disadvantages, a major one being poor aspect ratios. The rectangles produced by the algorithm are often very thin and hence difficult to select and compare to other rectangles. Figure 5.2 illustrates the problem. To overcome this, several other algorithms have been published during the past years, none of which however targeted for hand held devices with compact displays as a primary focus. The topic of this project was to evaluate the more popular algorithms on compact displays, and come up with modifications that might improve the final result. In a broader perspective, a goal was to find and enhance treemap algorithms targeted for PDAs (Personal Digital Assistant), and if applicable, even for treemaps in general. The thesis was conducted as a part of a larger project, the goal of which was to evaluate different visualization techniques of threaded discussion forums on PDAs. Therefore, the target application for the treemap was a software for browsing threaded discussion forums running on a WiFi (def. sec 1.3) enabled iPAQ (PDA from HP). This was taken in account when stating the criteria for the properties of the treemap algorithm.

1.3

Definitions

Aspect ratio GUI JVM

Layout algorithm

Layout Style

nntp PDA WiFi

WLAN

The longest side of a rectangle divided by its shortest side Graphical User Interface Java Virtual Machine, Required to run Java programs. Converts the platform independent byte code to platform dependent, executable machine code. The way the rectangles of the treemap are distributed on the screen. Position and aspect ratio may vary for different layout algorithms. For treemaps, the way the original data structure is converted into a treemap data structure. For example, some nodes of the original data structure might be omitted from the treemap data structure. Ancient protocol used to communicate with newsgroup servers. Personal Digital Assistant. Handheld computer such as a Palm Pilot or an iPAQ. Wireless Fidelity, Local area network that uses high frequency radio signals to transmit and receive data over distances of a few hundred feet uses ethernet protocol. Wireless lan, see WiFi.

1.4. OUTLINE

1.4

3

Outline

The thesis is organized in seven chapters. A background to the field is given in chapter 2. Chapter 3 describes how this project set about to visualize discussion forums with treemaps. In chapter 4, the software and hardware used in the project are described. The different layout algorithms used and their appearance on the PDA screen are discussed in chapter 5. The advantages and disadvantages of each algorithm are elaborated in detail. Furthermore, the new Split Layout is described. The overall results and experiences from the project are covered in chapter 6 and the thesis is concluded with a summary in chapter 7.

Chapter 2

Background This chapter gives a background to what has been published so far in the field of hierarchical information visualization. First some traditional techniques are presented. Second, an introduction to treemaps is given, and the chapter is concluded by other suggested alternatives to traditional layouts.

2.1

Traditional Visualization Techniques

The traditional way of visualizing hierarchical data can roughly be classified into three categories: listings, outlines and tree diagrams [11]. Listings are generally good at presenting content. The structural information, however, is not reflected very well. Displaying the location in the tree structure next to the items in the list is one solution, but that however is also suboptimal, since it requires that the user parse that path information. Outlines, a trade-off between listings and tree diagrams, summarizes the first couple of lines in the listing, showing the structural location of the items by indentation. An example of this is shown in figure 2.1.

Figure 2.1. Outline (left) and Tree diagram (right) of the course folder hierarchy on one of the project computers. The outline is more detailed than the graphical tree diagram.

5

6

CHAPTER 2. BACKGROUND

The tree diagram is probably the most common method for displaying hierarchical data sets that contain more than a few nodes. The items of the data sets are shown as icons in a graphical tree. The nodes can be expanded and collapsed which shows or hides the children of the nodes. A typical tree diagram is the directory tree in a file manager. The tree diagram is inferior to the other methods at presenting content, but instead, is much more capable of visualizing the structure. Figure 2.1 shows an example of an outline and a tree diagram of a folder hierarchy. All three of the traditional methods have specific benefits and drawbacks, but they all share the property of producing defective layouts for very large data sets. The definition of large is not written in stone and of course varies with screen size. A desktop screen can display a larger structure than the screen of a handheld computer, but studies from treemap readability on desktop computers show that as the number of nodes grow larger than a hundred, it becomes increasingly difficult to grasp the structure [11]. This limit definitely decreases on smaller screens. Another problem with tree diagrams, which in particular occurs on small screens, is horizontal scrolling. In order to expand all the nodes in a hierarchy, the tree might become too wide to fit on the screen. Horizontal scrolling is very likely to make the user loose the orientation in the hierarchy, and should therefore be avoided in most situations [6, 4].

2.2

Treemaps

To overcome the shortcomings of the traditional visualization techniques described in the previous section, the treemap was developed [11]. It works by dividing the screen into nested sequences rectangles each of which representing a node in the tree. The area of the rectangles is proportional to some quantity in the data set and the color of the rectangle encodes another dimension. For example, figure 2.2 shows a treemap of the same file structure as figure 2.1. The area of the rectangles is proportional to the file size and the color indicates date. In this way you get an overview of the hierarchy as well as the content, without blurring up the display with too much text. In a traditional tree diagram, a considerable part of the screen space is used as background pixels, the tree diagram itself only taking up a proportionally small amount of the screen. This is an issue for small screens in particular, where you have to use the display area with care. The treemaps, on the contrary, are space filling in their approach, thereby using 100 percent of the available screen space. An important aspect of the treemap, which has received much attention in the recent years [1, 2, 13], is the algorithm used to layout the rectangles. Given a data set with fixed item sizes, it is possible to distribute the rectangles in many ways, yet retaining the proportional relationship between the areas of the rectangles and obviously some ways are better than others. In order to gain an understanding of which methods are preferable, some sort of metric of the algorithm has to be employed. A common metric used by many designers [1, 2] is the aspect ratio of the

2.2. TREEMAPS

7

Figure 2.2. Treemap (Slice and Dice layout) of the same file structure as shown in 2.1. The area of the rectangles is proportional to the file size and the color proportional to the date of the file. The frames represent the directories.

rectangles. Slice and Dice, the first published algorithm [11], often produces very thin rectangles which are difficult to select and whose areas are difficult to compare. Therefore, one goal for a treemap algorithm is to produce square-like rectangles whose aspect ratios are close to one. The squarified treemap algorithm [2] is good in this sense, and produces a layout with an average aspect ratio very close to one. This algorithm is described in detail in section 5.2. Another metric used by [12] is the amount of change a layout undergoes as the data set is updated (stability). Imagine that you frequently watch the same data set, but the content may have been updated since the last time you studied it. Despite the change in the data, you expect to find the rectangles in roughly the same position as they were the last time. A layout algorithm that totally repositions the rectangles as the data is updated is given a high value and a consistent layout receives a value of zero. [12] defines the layout distance function, d(r1 , r2 ), between to successive layouts of a certain rectangle. Let a rectangle r be defined by a 4-tuple (x, y, w, h) indicating the x- and y-coordinate of the upper left corner, the width and the height. If rectangles r1 and r2 are defined by (x1 , x2 , w1 , h1 ) and (x2 , y2 , w2 , h2 ), then the layout distance function is given by q

d(r1 , r2 ) =

(x1 − x2 )2 + (y1 − y2 )2 + (w1 − w2 )2 + (h1 − h2 )2

Consistency between two successive layouts will hence give a zero change (high stability), whereas the more change means the higher value of the distance function (low stability). The squarified layout described in section 5.2 is an example of an algorithm with a low stability, whereas the Slice and Dice (section 5.1) algorithm has an average change close to zero, an extremely high stability. Although aspect ratio often is a good metric of the usability and clearness of the treemap layout, there are some cases where it falls short. If you have a data

8

CHAPTER 2. BACKGROUND

set that produces a layout with plenty of small rectangles, aspect ratio might not be the best attribute to measure. What really should be avoided is a layout with a lot of rectangles that are too thin to select. It is often better to make the aspect ratio of the bigger rectangles worse in favour of the smaller rectangles. Therefore, this project used a metric of all rectangles whose width or height was smaller than a certain threshold. This gave a good indication of how readable and interactable a certain layout was. Yet another way to evaluate a layout algorithm is its readability. Being a subjective property per se, measuring it is not as the straightforward as the other metrics described earlier. One way is to perform a user study, requesting the participants to perform certain tasks. The readability could then be traced through timings and interviews. Another, more mechanical way used in [1], assigns a numerical value to a given layout, the value based on the number of times that the motion of the reader’s eye changes direction as the treemap layout is scanned in order. The value would then in some way correspond to how easy it is to find a particular item in the treemap. More precisely, they follow the center of the rectangles in order and count the number of times the vector from one center to the next changes direction more than 0.1 radians. The amount of change is not taken into consideration, as they argue that once the direction of scanning has changed at all, the eye has to stop, the actual amount of change being less important. Normalizing the result gives a value of 1.0 to the most readable layout, where all items follow a horizontal or vertical pattern, and 0.0 to a completely shuffled layout. An example of the former is the Slice and Dice layout (section 5.1), and of the latter one is the squarified layout (section 5.2). The discussion in the previous paragraph brings up the concept of ordered and unordered treemaps. Since many real world data sets have a natural order, a desirable property of the layout algorithm would be to preserve that order. In other words, adjacent items in the data set, or list in the ordered case, would be assigned to rectangles next to each other in the treemap. Since the treemap is two-dimensional, there are many ways to convert the one-dimensional list into a two-dimensional treemap and the area has received a lot of attention recently [13, 12, 1]. Specifically, although noone has come up with a mathematical proof, there seems to be a trade-off between aspect-ratio and ordering. The algorithm, currently known, that produces the best aspect ratios (section 5.2) is also the one that produces the worst ordering and vice verse).

2.3

Other Visualization Techniques

Treemaps are not the only alternative to the traditional layout techniques mentioned in section 2.1. Here are some other techniques that have received interest during the recent years.

2.3. OTHER VISUALIZATION TECHNIQUES

9

Figure 2.3. Example of a cone tree taken from [9], page 193. Note how the nodes overlap when projected onto a 2-dimensional plane.

2.3.1

Cone Trees

The Cone Tree [9] is a three dimensional visualization technique which like the treemap tries to make effective use of the available screen space while displaying the whole data structure at the same time. The node is projected onto the apex of the cone and the children are placed around the circular base. To further enhance the interaction, when a node is selected with the mouse, the cones rotate so that the selected node and all the nodes following the path to the top are brought to the front. Figure 2.3 shows a cone tree taken from [9] page 193. One disadvantage with the cone tree is that as the three dimensional structure is projected onto two dimensional plane, overlap inevitably occurs which obscures the view of the rear items. According to the author, cone trees have been used in a number of applications, ranging from file managers, organizational structure browsers to decision support systems, where cone trees visualize the companies’ operating plan. Other potential applications include IDE (Interactive Development Environment) class browsers and network management interfaces.

2.3.2

Reconfigurable Disc Trees

This method (RDT) was presented in [7] and is also 3D visualization technique of large hierarchies. According to the authors, the RDT is an improvement of the cone tree and tries to overcome the disadvantages of the cone tree, where overlapping occurs when the 3d structure is mapped onto a plane. The RDT has two properties associated with each cone: A reference point which is a node above the center point of the plane of the current node, and an apex point which is the top of the cone. By varying the distance between the parent node, the reference point and the apex point, different shapes can be created: The shapes can be flattened out and projected onto a 2d plane, which like the treemap use the available screen space more efficiently. This is achieved without overlap, this in contrast to the cone trees.

Chapter 3

Visualizing Threaded Discussion Forums One of the goals of this project was to evaluate whether treemaps are a suitable visualization method on handheld devices with small screens. The treemap was applied to an application for viewing threaded discussion forums on PDAs. Used as a browsing tool, the treemap would give a good structural overview of the forum, as well as displaying additional information of the articles, such as thread activity and article size. This chapter describes how the discussion forum data set was rendered on the screen as a treemap. Section 3.1 presents the conceptual model for threaded discussion forums and gives an insight into how a treemap can be used to visualize that model. In section 3.2, the three different methods that were used in the project to convert the model are described in detail. The last section 3.3, although not really within the scope of this project, outlines ideas how the user may interact with the treemap. However, nothing is mentioned about what algorithm is used to distribute the rectangles in the treemap. This is covered in chapter 5.

3.1

Threaded Discussion Fourms

A threaded discussion forum is hierachical data set that may consist of hundreds of threads and sub nodes. Here is a definition of the terms that will be used for describing a discussion forum. A forum is a collection of articles. The articles are grouped in discussion topics which are called threads. A thread is represented by the first article in the discussion topic, i.e. the article that initiated the thread. To convert this into a tree, let each forum be represented as a root node in a tree. The threads will be the immediate children to the forum nodes. The articles of a thread are the thread node, together with all its children. If an article is posted as a reply to a specific article in the tree, that article node will become the parent of the reply. If a new thread (discussion topic) is created, that article will be a direct child to the root node (forum node). Fitting the content on a small screen with the traditional text based tree structure, where nodes are expanded and collapsed, is not ideal. The users will have 11

12

CHAPTER 3. VISUALIZING THREADED DISCUSSION FORUMS

to scroll horizontally as well as vertically and moreover, the overview is lost as the number of posts in the forum exceeds the number of text rows that can be displayed on the screen. To overcome the problems connected with traditional text based browsing schemes, a treemap is used to display the contents of the discussion forums, the intention being to provide users with a quick and comprehensible overview of the contents and the social structure (such as most popular thread) of the forum. Rather than being a tool for people seeking information on a very specific matter, the main focus is to support browsing behavior, where users are interested in seeing the activity in various groups. Traditional text based visualization techniques give the same space to each thread with no regard to attributes that might be interesting to forum users. When browsing a forum socially, some threads are more interesting and treemaps can highlight these threads in a natural way on a small screen by varying attributes such as size and color. In this project, the threads are rendered as rectangles in a treemap, their size being proportional to the number of articles in the thread. The color can be proportional to the activity of the thread or, when searching in the forum, proportional to the relevance of the query for that particular thread. Representing the posts in the discussion forum as rectangles in the treemap, differs to some extent from what is normally the case with treemap interfaces. Often [1, 14], it is the leaf nodes of the tree that are the actual content nodes: files in a directory structure and photos in a folder hierarchy, multidimensional data in biomedicine are some examples. The interior nodes normally work as meta-content placeholders for the actual content. In discussion forums, however, there is no such distinction: the action of creating a new thread is not independent of the first posting to that thread. So, unlike a file system which may have an empty folder, discussion forums have data at both branch and leaf nodes. This means that one cannot do a naïve application of treemaps, but must consider how best to modify them for discussion groups. The only exception to this rule are the root nodes, which represent the discussion forums itself. Forum nodes are not content nodes, and can hence, unlike the interior article nodes, be rendered as meta-content place holders.

3.2

Layout Style

Bearing in mind what is mentioned in the previous section, about the differences in converting a discussion forum into a treemap, compared to traditional treemap data sets, three different solution were considered. The first one, which is called the overview version, promotes overview to detail, the emphasis of which is to provide the user with a top down view of the different on-going discussions in the forum. The second version promotes detail and displays all articles in a single thread. The third version, takes a different approach and displays all the leaf nodes of the discussion forum. To illustrate this, the three subsections below describe each of the layout styles. The example figures shown in the subsection shown all refer to the sample discussion forums shown in figure 3.1. The picture shows two forums with two

3.2. LAYOUT STYLE

13

Figure 3.1. Sample discussion forums used for when rendering different treemaps in the next secitons.

threads each. The first thread in the first forum has no reply, the second thread in the first forum has three replies. The first thread in the second forum has no reply, and the second thread has one reply. To read the actual contents of the articles is beyond the scope of the project, but typically, the user clicks on one of the rectangles. Depending on what the rectangle represents (a thread or an individual article), text appears displaying the article content or a list with articles in the thread.

3.2.1

Overview Layout Style

Indicated by its name, the purpose of this style is to give the user an overview of the contents of the entire discussion forum. Motivated by research on browsing behaviour in social networks [3], the style shows only the forum nodes and the thread nodes, the intention being to give a new user an easy way to find which are currently the hot topics. The thread nodes are drawn as rectangles in the treemap with the size proportional to the number of articles in the thread. The color of the rectangle indicates how active that particular thread is. To show which articles that belong to which forums, a frame is painted around the thread rectangles that reside in the same forum. An example is shown in figure 3.2. An example of the final application using overview style is given in figure 3.5.

3.2.2

Leaf Node Layout Style

This experimental style takes a different approach and renders only the leaf nodes of the discussion forum and the representation of the data model is therefore somewhat harder to grasp than for the other layout style. The idea is to only render the leaf nodes, i.e. the most recently posted article in every sub tree within a thread. The motivation for this is that the top level thread node is not necessarily a good indication of the discussion topic for all subtrees in the thread. In fact, the topic of discussion often evolves as more articles are posted

14

CHAPTER 3. VISUALIZING THREADED DISCUSSION FORUMS

Figure 3.2. Overview layout style of the discussion forum shown in figure 3.1. Rectangle area is proportional to the number of articles in the thread. The info rectangle appears when the user touches the different rectangle with the stylus and shows topic and date for the thread. The color is proportional to the thread activity.

to the thread, which creates different discussion topics within a thread, all of which originating from the same top level article. Possible advantages of only rendering the leaf nodes are: • Not only the topic that originated the discussion is shown. Rather, all possible branches are displayed. If the discussion has evolved into different topics, the leaf nodes will probably represent them all. • By clicking on a leaf node rectangle, a list with all the articles up to the thread root node is shown. This way, the hierarchical structure of a discussion, which is hard to render on a small screen, is converted into a linear structure, the content of which can easily be presented with a list. The drawbacks are as mentioned, that the concept of only rendering the leaf nodes, is less intuitive and harder to grasp for the user. Additionally, the articles in between, that are not leaf nodes are hidden from the user. Even though they are shown when clicking on a leaf node, some content is lost. Figure 3.3 shows a picture of the layout style. Compare the picture with the tree structure in figure 3.1. What is rendered are the leaf nodes in the structure. Each rectangle in the figure 3.3 represent a leaf node of the tree in figure 3.1. When the user taps the stylus on one of the rectangles representing a leaf node, the user will be shown the topic of that particular leaf node article, together with all the articles up to the top node article.

3.3. INTERACTION

15

Figure 3.3. Example of Leaf Node Layout Style. The articles are grouped by thread. The area is proportional to the number of articles in each thread. The color of the rectangles is proportional to the date of article.

3.2.3

Detailed Layout Style

This layout style is either meant to be used for smaller forums, or for viewing a single thread, for instance when clicking on a thread in the overview layout style. While the overview style only rendered the top level threads of the discussion forums, this style displays all the articles in the thread. This is done nesting the rectangles inside one another. The color of the rectangle indicates the date it was written, and the size indicates the number of children. Since the number of nodes that have to be rendered is much bigger, and screen space is lost due to the nesting of the articles, this style is only applicable to small forums or individual threads. A picture of the style is shown in figure 3.4. The figure shows all the articles in the second thread of the first forum in figure 3.1.

3.3

Interaction

In order to create a realistic application, some level of interactivity needs to be present. Some treemaps display a label inside the rectangles with information of the item. Due to the small screen, that is not possible in this project. At first, the thought was to provide this information in an information bar at top of the screen. Clicking on the rectangle would show the subject of that article. It was however decided that clicking would add an unnecessary step, and instead a solution similar to tool tips was implemented, where holding the stylus over a rectangle brings up a square next to the stylus, with the subject, date and author of the article. Features like bringing up a complete article with text and posting new articles were considered not to part of the project, and were thus left unimplemented.

16

CHAPTER 3. VISUALIZING THREADED DISCUSSION FORUMS

Figure 3.4. Detailed layout style showing the second thread of the first forum in figure 3.1. This type of display is shown when the user clicks on a thread in the overview layout style (figure 3.3). The color of the rectangles is proportional to the date of the article.

Figure 3.5. Example of how the application might look when running on a PDA. Rectangles represent threads. Size prop. to number of articles in thread and color prop. to thread activity.

Chapter 4

Implementation To test the different treemap algorithms and the suitability of using treemaps for visualizing discussion forums on PDAs, a prototype application was implemented on an iPAQ. This chapter will describe the various aspects and properties of the hardware platform used, the software that was implemented and the tools and libraries that were employed when developing the software.

4.1

Hardware

The PDA used in the project had to follow the requirements stated below. • To be able to display the treemap satisfactory, the screen must be good enough with a high resolution and bit depth. To low a resolution would mean less detail in the treemap which would make small rectangles difficult to select and compare. • High performance. Some of the treemap algorithms have a complexity of O(n2 ) and with hundreds of objects to render, a slow CPU would mean the user would have to wait for treemap updates. • Mobility. The PDA must be easy to carry with so it can be used naturally. Moreover, a wirless internet connection enhances the experience greatly, since the user may connect to internet usegroups without having to put the PDA in its cradle to access the Internet. The device that was used was a Compaq iPaq 4150 equipped with built-in WLAN (Wireless LAN). The PDA was small, light and easy to carry with oneself. With a built in-support for WLAN, you could connect to different newsgroups on the internet whenever you were in range of an access point. Without WLAN support, in order to access different newsgroups, you would have needed an internet connected desktop to which you would attach the cradle with the PDA. 17

18

CHAPTER 4. IMPLEMENTATION

The screen was of a good quality with a resolution of 239x291 pixels and 64000 colors. The processor was an Intel (R) XScale 400 MHz, and the device was equipped with 64 MB RAM. On the negative side were problems with the letter recognition. Apparently, the algorithms are not perfect yet, but as the application used in this project did not involve much writing, the problem was considered of less importance.

4.2

Software

The prototype application was developed in C# .NET using Microsoft Visual Studio 2003. The first issue that had to be addressed was whether to use Java or a Microsoft solution. After a little consideration, it was decided to abolish the thought of using Java, the reason being that in order to install a JVM on the PDA, either Pocket PC had to be replaced by Linux or a special Pocket PC JVM had to be installed on the device. Both of the alternatives had drawbacks and were most likely to cause an overhead in development time compared to using a Microsoft solution. Furthermore, Visual Studio offers excellent debugging support when running the applications on the PDA. Having a good debugger is very important and saves a lot of time. To make the evaluation process easier, the application was split into three parts. One part implemented the nntp protocol and was used to download articles from public news servers on the internet. The second part, which actually used a third part library called Pocket Piccolo, was responsible for rendering and interaction with the treemap elements. The last part was the layout algorithm, which was completely separated from the rest of the code, which made it particularly convenient to switch between and implement new layout algorithms, without relying on which technique was used to render the treemap.

4.2.1

Connecting to Newsgroups

In the first part, which required that the PDA be in range of a wlan access point, allowed the user to connect to a public newsgroup server on the internet. All the newsgroups on the server were downloaded and the user chose which groups to browse. When doing that, all the articles in the selected newsgroups were downloaded to the PDA. To allow for fast off-line access to the newsgroup, there was an option to save the downloaded newsgroups as an xml file locally on the PDA. In this way, the user was able to access the newsgroup later when not in range of a wlan access point. Note that the emphasis of this part was purely to provide necessary and essential functionality so that the treemap algorithms could be evaluated. Therefore, the interface was as simple as possible and no effort was made to enhance the gui or implement more functionality apart from what was considered absolutely crucial for the rest of the application to function.

4.2. SOFTWARE

4.2.2

19

Pocket Piccolo and Treemap Rendering

Pocket Piccolo is a graphics library developed by the University of Maryland. The reason for using this graphics library was that it was easy to create graphics object and add interaction to the objects. Since as little time as possible of the 20 project weeks could be spent on prototype implementation, speed was important. The rectangles of the treemap were all represented as a Pocket Piccolo PNode object which had properties such as color and text. Each PNode object had built in support for interaction such as stylus down and tap and for manipulation such as zooming and scaling. This made it easy to get something working that also looked good in little time. In chapter 6 it is argued whether it would be feasible to use the same graphics library in a real system. All the articles that the user wanted to browse were stored in a data structure that resembled a tree. When it was time to render the tree as a treemap, the nodes were traversed and for each node that would be drawn on the screen a PNode object was created and added to the canvas. To each PNode object, an input listener was attached which kept track on when the stylus was tapped inside that particular PNode. Upon such an event, an information square was displayed with data, such as date, author and subject, of the article represented by that rectangle.

4.2.3

Layout Algorithm

When all the PNode objects had been created, they were laid out on the screen by the current layout algorithm. All PNode objects implemented a common interface the purpose of which was to separate the layout algorithm from the presentation layer, thereby making the algorithm independent of Pocket Piccolo. In this way, it was easy to plug in new layout algorithms into the application. The different algorithms are described in detail in chapter 5, but they were all resulted in that each PNode object was assigned a bounds value, indicating the coordinates on the screen for each PNode rectangle.

4.2.4

Color assignment

The way in which the color was assigned to each rectangle was not obvious. In this application, the color indicated the date of the article, so that recent articles become green and older ones become darker. To make this flexible, each PNode had a function called getColorAttribute() which returned an integer, which in this case was the number obtained from the date by adding year, month, day, hour and second. For example, March 7 2005, 11.30 p.m. became 200503072330. The second issue was how to distribute the colors among the articles. One option is to use all the colors, which basically means that all the rectangles are sorted by date and the most recent rectangle is assigned the greenest hue, the second most recent is assigned the second greenest hue, and the oldest article becomes black. This has the advantage that all the available colors are used, which hopefully makes it easy for the user to discriminate between the articles and to locate the most recent

20

CHAPTER 4. IMPLEMENTATION

article. The disadvantage is that if all the articles are very recent, say from the same day, still one will become green and one black, no difference in hue from another data set with articles ranging from today and three years ago. Another option would be to sort the articles by date but let the actual date influence the color. That is, if 3 articles are from the same day and one article is three years old, the old one will be black whereas the recent articles will all be very green. This means, as opposed to the former alternative, that the hue will be proportional to the date. The disadvantage is that unless the article dates are equally distributed on the time scale, not all colors will be used when rendering the treemap. The method used in this application resembles the first alternative which uses all colors, but with some modification. In social discussion forums, it is probably the most recent articles that are important, but if an article is six or eight weeks old is of less importance. Therefore, it would be good if the resolution was higher for more recent articles than for older ones. To accomplish this, the idea was to apply some sort of logarithmic function to the date value when the color was obtained. To get more tuning though, a discreet solution was implemented which assigned 50 percent of the colors to 20 percent of the most recent rectangles. This way, it became easier to discriminate between the more recent articles, whereas the older ones were more similar in hue.

Chapter 5

Algorithms A key part of the treemap layout is the algorithm used to calculate the dimension and location of the rectangles. As mentioned in earlier chapters, there exists no unique mapping from a given data set to the way the rectangles are distributed. Neither does there to this day exist an equation to calculate the optimum layout for a given data set. Rather, the different algorithms are designed to optimize various aspects of the layout. The algorithms that have been used in this project all serve a different purpose, each of which taking a different approach aimed at optimizing one or multiple of the metrics described in section 2.2. Although optimizing different metrics, the general procedure is roughly the same for the algorithms. The overall workflow can be described in a recursive way: void layoutNodes(Node[] nodes, Rectangle totalBounds) { foreach node in nodes Rectangle nodeBounds = calculateBounds(node.size, totalBounds); node.bounds := nodeBounds; totalBounds.remove(nodeBounds); // Layout the children within the bounds of the parent node if (node.hasChildren) layoutNodes(node.children, node.childrenBounds); }

The central part of the algorithm, is of course the third line, where some part of the available area is assigned to each node. The above listing is very simplified. It follows a sequential pattern, processing each node one at a time. This might not be the case. Some algorithms look ahead in the list of nodes when caculating the bounds for a single node. But the overall flow is identical for all the algorithms a function that processes each node, depending on its size, assigns a portion of the available area. If the node has children, they are laid out within the bounds of the parent node. 21

22

CHAPTER 5. ALGORITHMS

Each of the algorithms listed below were used in the experiment. They will be described in detail and each graded on the metrics aspect ratio, thin rectangles, change and readability described in section 2.2. The algorithm can be grouped depending on how they function. Among the ordered treemap algorithms, which preserve the natural order of the data set are, are the Slice and Dice and the Strip layout. Partially ordered treemaps are the Pivot and the Split layout. Although preserving order, the layout is not as readable as for the fully ordered layouts which follow a linear pattern. Instead, adjacent elements in the data set are laid out next to each other in the treemap, but the direction may change vertically or horizontally for each element, making it somewhat more difficult to grasp. The squarified layout is totally unordered. The rectangles are sorted by size and their aspect ratios are extremely good. All algorithms except the Split layout are well known published algorithms. The new Split layout was developed in this project to get an ordered algorithm with good average aspect ratio.

5.1

Slice And Dice

Published in 1991, this was the first layout algorithm for treemaps and perhaps one of the simplest to understand [11]. It functions well for smaller layouts, though its biggest problem being poor aspect ratios, the algorithm becomes less useful for larger data sets, and naturally small screens. For each level in the tree the layout direction is conversely changed between horizontal and vertical, and for all the children at a given level, the display area is divided amongst the children, each child obtaining a slice proportional to its size attribute. For example, suppose there are three children with sizes 1, 2 and 2 at a certain level and the layout direction for that level is horizontal. If the available space is a rectangle of dimension 100x100 pixels, the first item of size 1 will be placed to 1 = 15 of the total size, the rectangle representing the the left. Its size being 1+2+2 first item will have dimensions 20x100. Analogously, the second and third item will be in the middle and to the right, both in rectangles of dimensions 40x100 pixels. Now suppose the second rectangle has two children with sizes 2 and 8. Descending one level, the rectangles will be laid out vertically inside the middle rectangle of 2 dimension 40x100. The first child will be given 10 of the total area and thus placed in the upper rectangle of dimension 40x20 pixels, whereas the second item will be placed in the lower rectangle with dimension 40x80 pixels. Figure 5.1 shows how a small tree is converted into a Slice and Dice treemap. The structure is easy to grasp for very small trees, but as the number of nodes grows larger, some rectangles become very thin. This makes it difficult, not only to compare the area of the rectangles, but also to interact with the treemap and select different items.

5.2. SQUARIFIED

23

Figure 5.1. Example of how a tree is converted into a Slice and Dice treemap. The label indicate the node sizes which correspond to the rectangle areas.

The strengths of the algorithm are found in the change and readability measures. Being an ordered treemap algorithm, it is particularly predictable in its placement of the rectangles, and hence it is easy to find a certain item among the rectangles. When updated, all the rectangles stay in the same place, the only difference being that the left siblings of the changed node are squeezed to the left, and the right siblings are squeezed to the right which gives a very good value of the change metric. This is in contrast to the other algorithms where the rectangles may change location completely, making it difficult to locate the same item as the data set is updated. Due to the drawbacks with low aspect ratio described above, this algorithm is rarely used in practice, merely provided as a comparison to other algorithms.

Performance Analyzing the performance of the Slice and Dice algorithm is straightforward. The treemap can be drawn with one traversal through the tree. Assuming the calculation of size and color takes constant time, the run time is O(n), where n is the number of nodes in the tree.

5.2

Squarified

The squarified layout was first published by Bruls, Huizing and van Wijk in 2000 [2]. Realizing the need to find treemap algorithms that produce rectangles with good aspect ratios, they developed the squarified layout, which seeks a solution where all rectangles have an aspect ratio as close to one as possible. For a set of rectangles, R to be laid out, they try to fulfil the criterion ∀r ∈ R,

max(r.width, r.height) ≈1 min(r.width, r.height)

Finding the optimal solution is not feasible as the problem is NP-hard, but by using a fairly simple rationale it is possible to get close to producing optimal aspect

24

CHAPTER 5. ALGORITHMS

ratios. Instead of looking at all levels at the same time, which would result in an explosion of computation time, the algorithm tries to produce square-like rectangles for the siblings at a certain level. Assuming that the algorithm succeeded in making good squares for the nodes at one level, this gives a good starting point when descending to the next level, and recursively assigning rectangles to the children of each parent sibling. The idea behind the algorithm is to layout the rectangles along a row in the available display area as long as the worst (highest) aspect ratio of any rectangle in the current row keeps improving. Once the highest aspect ratio among the rectangles has reached a minimum, and adding another rectangle to the current row would increase the worst aspect ratio, the current row is fixed and a new row is created. Upon creation of a new row, the layout direction for the row is determined. If the width of the remaining available area is larger than the height, the rectangles are added vertically within the row, otherwise horizontally. The order in which the rectangles are added plays a significant importance. As the larger rectangles are generally harder to fit than the smaller ones, the list of items to lay out is initially sorted so that the larger items are processed first. The algorithm is described below in pseudo code squarify(Queue nodes) { Queue currentRow; nodes.sort(); // Sort on size while {nodes.length > 0) { Item current := nodes.dequeue(); // Worst aspect ratio improves - add to current row if (worst(currentRow + current) < worst(currentRow)) currentRow.enqueue(current); // Worst aspect ratio increased - create new row else { layoutRow(currentRow); currentRow.clear(); currentRow.enqueue(current); } } foreach (Item node in nodes) squarify(node.children); }

First the nodes are sorted with the biggest item first. Then for each node in the queue, the function worst checks if adding the item to the current row would improve the worst aspect ratio of any rectangle in the current row. If not, the function layoutRow is called, which has access to a global data structure Rectangle availableArea, and sets the width and height of the nodes in the row depending on

5.2. SQUARIFIED

25

Figure 5.2. Comparison between squarified layout (left) and Slice and Dice layout (right). Note the much better aspect ratios of the squarified layout

their size attribute. The last row processes the children of the nodes by calling squarify recursively for each list of children. This algorithm produces very good layouts in terms of aspect ratios, with average aspect ratios close to one, most often only differing in the second decimal. Though possible to come up with counter examples of poor layouts, for the type of data used in this project however, the average aspect ratio was excellent. Figure 5.2 shows a comparison of the squarified and the Slice and Dice layout. Note how close to 1 the aspect ratio is in the squarified layout. Also note how difficult layouts the Slice and Dice algorithm produces as the data set grows larger. The main drawback with the algorithm is that it is unordered. Since the nodes are sorted by size, the natural ordering of many data set is lost. This makes it less readable in application where finding a specific item in the treemap is important. The change metric is also worse than most of the other algorithms. Updating the data set may completely relocate the rectangles in the treemap, making it difficult for users to orientate between successive accesses to the same data set.

Performance First, the rectangles are sorted by size, which takes O(n log n) time. The run time for the remaining part of the algorithm depends on the average number of rectangles in each row. Since the dimension of the rectangles in the current row must be recomputed for every new rectangle added to the row, which has to be done for every rectangle in the data set, the run time becomes O(n log n) + O(nβ(n)) where β(n) is the average number of rectangles in the row. The run time is thus bounded below by O(n log n) which will be the case if O(β(n)) ≤ O(n log n). The worst case will be if all the rectangles are laid out in one row, meaning that β(n) = n, the run

26

CHAPTER 5. ALGORITHMS

time becoming O(n2 ), which hence constitutes the upper bound of the run time. √ On average however, each row will contain n rectangles, giving a total average run √ time of O(n n).

5.3

Strip

The Strip treemap [12] is an ordered treemap and is a modification of the squarified algorithm described in section 5.2. The inputs to the algorithm is the available area to be subdivided and a set of items to distribute. Throughout the layout process, a current strip is maintained. For each item to layout, a check is made whether adding the item to the current strip decreases (improves) or increases the average aspect ratios of the rectangles in the current strip1 . If the average aspect ratio decreases, the rectangle is added to the current strip, otherwise the current strip is fixed and a new strip is created, to which the item is added. As opposed to the squarified algorithm, when the layout direction is re-determined for every new row, the direction for this layout is fixed from the start. If width of the total available area is bigger than the height, the strips are lined up horizontally, and vice verse vertically. The Strip algorithm is an ordered layout algorithm, and hence the list of rectangles is first sorted for a given attribute. The main advantage of this algorithm is that it is very simple to implement and produces excellent readability. The placement of the rectangles is predictable. The average aspect ratio and the stability is slightly worse than the Pivot and the Split treemap algorithm described in section 5.4 and 5.5. strip(Queue nodes) { nodes.sort(); // Sort on any attribute while (nodes.length > 0) { List currentStrip; Item currentItem = nodes.dequeue; if (avgAspectRatio(currentStrip + currentItem) < avgAspectRatio(currentStrip)) currentStrip.add(currentItem); else { layoutRow(currentStrip); currentStrip.clear(); currentStrip.add(currentItem); } } foreach (Item node in nodes) strip(node.children); } 1

As opposed to the squarifed treemap where you checked the maximum aspect ratio of any of the rectangles in the current row

5.4. PIVOT

27

Figures 5.4 and 5.5 compare the different ordered treemap layouts. As mentioned, the readability metric of the algorithm is very good. Finding a specific item in the treemap is fast, due to the predictable layout of the items. The average aspect ratios are not as good as for other ordered algorithms, but often acceptable. The number of thin rectangles is sometimes a problem though when a lot of small and large items are being laid out. Moreover, the stability is quite good compared to the other algorithms. A problem in this regard, is that a small update in the data set can cause an entire strip to be removed or added, which induces a lot of change in the layout. The algorithm as defined above works quite well in most cases, but a frequent problem occurs with the last strip that is laid out. Since the decision whether a rectangle is added to the current strip, or to a new strip, is based only on the average aspect ratio of the rectangels in the current strip, one can in the end be left with a couple of thin, poor aspect ratio rectangles in the last strips. Adding a lookahead to the layout often solves the problem. By comparing the average aspect ratio of the rectangles in the current strip and the lookahead strip to what happens if the lookahead rectangles are moved to the current strip, you can often avoid the thin rectangles in the last strips. Though at least doubling the run time of the algorithm, the complexity remains the same. This is motivated by that a maximum √ √ of one more strip is examined, which on average contains n rectangles. 2 n or √ n has no effect on the complexity of the algorithm.

Performance The run time of the algorithm can be analyzed as follows: First the items are sorted, √ which is done in O(n log n) time. Each strip contains an average of n rectangles. For each strip, the dimension of all the rectangles in the current strip must be √ recomputed ( n rectangles). The recomputation is done for every rectangle in the data set (n rectangles). Therefore, the total average run time will be O(n log n) + √ √ O(n n) ∈ O(n n). Of course the very unlikely event may occur, that the entire layout is constituted by a single strip with all the rectangles, for example if the display area is very thin. This means that the number of rectangles in the strip is n and the worst case run time becomes O(n2 ).

5.4

Pivot

The Pivot is a partially ordered layout algorithm [12]. With a procedure similar to the quick sort algorithm where the elements in the list are processed recursively to render an ordered layout. The list of items are allocated into four parts: three sub lists and one special item called the pivot. The dimension of the pivot is fixed and the contents of the three sub lists are laid out recursively by applying the same sub-division process to each one of them.

28

CHAPTER 5. ALGORITHMS

Figure 5.3. Pivot configuration. The largest element is laid out in the Pivot. The figure shows the configuration for rectangles whose width >= the height. If this is not the case, the configuration is flipped along the line y = x.

The input to the algorithm is an ordered list of items of different sizes and a rectangle in which the items will be laid out. The first step is to select the pivot according to some strategy. The Pivot-by-Size variant, for instance, assigns the pivot to the largest element in the list. Next, all items in the list with an index less than that of the pivot item are assigned to list L1 and are laid out in a large sub rectangle to the left of the pivot. The elements with an index higher than the pivot element are assigned to lists L2 and L3 which are laid out in sub rectangles below and to the right of the pivot item respectively. The distribution of items between L2 and L3 is determined so that all items in L2 have an index less than the items of L3 and so that the aspect ratio of the pivot rectangle is as close to one as possible. Here is a description of the algorithm in more detail: - Input: A list of items that sorted on some attribute, L to layout and a rectangle, R, in which the items are placed. 1. If the number of items in L are 0, return without doing anything. 2. If the number of items in L equals 1, set the bounds of the item to be R. Recursively lay out the children of the item by starting at step 1 with the item’s children as the list L and the rectangle R as their bounding rectangle. 3. |L| > 1, choose an item to be the pivot, P , according to some Pivot selection strategy described later. 4. Divide R into four sub rectangles, r1 , r2 , rp and r3 , according to figure 5.3. 5. Put the pivot P in rectangle rp , and put all items in L with an index less than that of P in the list L1 . Put the elements of L with an index greater than P into L2 . 6. Move an item from the end of L2 and insert it at the beginning of L3 as long as that action improves the aspect ratio of rp .

5.4. PIVOT

29

7. When moving yet another item would make the aspect ratio of rp worse, we have three lists which all may be empty: L1 with all items of L whose index is less than P , L2 with items greater than P , and L3 with items whose index is greater than those in L2 . Now that the contents of L1 , L2 and L3 is determined the exact dimensions of rp is calculated and fixed. 8. Now recursively lay out the items of L1 , L2 , and L3 in the rectangles r1 , r2 , and r3 starting at step 1. The average aspect ratio is often slightly better than the Strip treemap. However, since it depends on the type of data set, there are several occasions where the Strip layout actually performs better regarding the average aspect ratio. As to the number of thin rectangles, the Pivot layout almost always performs better than the Strip treemap. Moreover, the layout is considerably more stable. The problem with the Strip layout, where an update in the data set can cause an entire strip to be added or removed does not occur in the Pivot layout. Being a partially ordered algorithm, the readability is far better than the squarified treemap, although not as good as the Strip layout which follows a simple linear pattern. Figures 5.4 and 5.5 show examples of the Pivot by Split Size layout.

Selecting the pivot According to [12], the strategy used to select the pivot has implications upon the quality of the treemap. The best average aspect ratio is obtained if the Pivot-by-Size variation is followed. It assigns the pivot item to the largest item in the list. This makes sense, since the largest item often is the most difficult one to fit. Processing them first gives a lot of space to play with in order to make there aspect ratio close to one. An alternative is the Pivot-by-Middle variation, assigns the pivot to the middle item of the list. (If the list contains n items, the pivot will be the item with index bn/2c). In doing this, the layout often becomes more stable, since the pivot selection is independent of the size of the items, while at the same time, splitting the list in the middle is likely to create a balanced layout. In general, Pivot-by-Size has slightly better aspect ratio, while Pivot-by-Middle performs better on the stability.

Performance The worst case run time for the Pivot by Size algorithm is O(n2 ). For all n elements, the biggest element has to be located. If this happened to be the last element in the list, n elements would have to be traversed, thus creating a worst case run time of O(n2 ). However, on avarage, only half the list will have to be traversed, yielding an average run time of O(n log n).

30

5.5

CHAPTER 5. ALGORITHMS

Split Treemaps

None of the above algorithms is ideal. The squarified layout, which has excellent aspect ratio, mixes the natural order of the data set, while the ordered layout algorithms have suboptimal aspect ratios. In an attempt to produce an ordered layout with better average aspect ratio than any existing algorithm, the Split treemap layout was designed for this project. Especially for small screens, the importance of avoiding thin rectangles must be adhered to. Otherwise users will not be able to gain from the benefits that treemaps may provide compared to traditional layouts. The first experiments on a small screen for the newsgroup browsing application were implemented with the squarified layout. Its square like rectangles made the treemap work well on a small screen. However, realizing the benefits of ordered layouts, the Pivot and the Strip layouts were implemented. Unfortunately, the average aspect ratio was too high and it appeared as though the current ordered treemap algorithms (Pivot and Strip) were not a satisfactory substitution for the squarified version already implemented. Therefore, an attempt was made to develop a new ordered treemap algorithm with improved average aspect ratio. The result is the Split treemap which, like the Pivot, is a partially ordered algorithm. It produces a layout where the natural ordering of the data set is roughly preserved, while in most cases producing better aspect ratios than the Pivot and the Strip treemaps. The procedure is very simple. Inputs to the algorithm are an ordered list, L = {l1 , l2 , ..., ln } of items to layout and a rectangle, R, in which the items are distributed. Furthermore, define the weight of the list, w(L) to be the sum of the sizes of all the elements in the list. The algorithm follows a recursive process, where L is split into two halves, L1 and L2 , such that that w(L1 ) is as close as possible to w(L2 ). Note that the ordering of the elements must not be changed. L1 and L2 are both ordered, and all the elements of L1 have an index less than those of L2 . We now have: w(L1 ) ≈ w(L2 ) ≈

w(L) 2

and ∀li ∈ L1 , ∀lj ∈ L2 : li ≤ li+1 ≤ lj ≤ lj+1

Now, define the α(R) to be the area of a rectangle R. The rectangle R is split, either horizontally or vertically depending on whether the width is bigger than the height, into two sub rectangles, R1 and R2 such that their areas corresponds to the size of the elements of L1 and L2 . That is: α(R1 ) w(L1 ) α(R2 ) w(L2 ) = , = α(R) w(L) α(R) w(L) Finally, recursively layout the contents of L1 and L2 in R1 and R2 according to the algorithm.

5.5. SPLIT TREEMAPS

31

splitLayout(List items, Rectangle r) { if (items.length == 0) return; if (items.length == 1) { items.bounds = r; splitLayout(items.children, r); // Layout the children within } List l1, l2; Rectangle r1, r2; double halfSize = w(items) / 2; double w1 = 0, tmp = 0; // Pick out half the weight into l1, half into l2 for all items { Item front = items[front]; tmp = w1 + front.size; // Test if it got worse by picking another item if (abs(halfSize - tmp) > abs(halfSize - w1)) break; // It was good to pick another l1.enqueue(items.dequeue); w1 = tmp; } // The rest of the items go into l2 l2 = items; r1 = new Rectangle(r.x, r.y, r.width * w(l1)/w(l1 + l2), r.height); r2 = new Rectangle(r.x + r1.width, r.y, r.width - r1.width, r.height); splitLayout(l1, r1); splitLayout(l2, r2); }

What about the quality of the treemap produced by the Split treemap algorithm? More about that in chapter 6 below, but generally, the Split algorithm produces layouts with lower average aspect ratio and better stability than does either of the Strip and the Pivot algorithms. Figures 5.4 and 5.5 show examples of the Split layout.

32

CHAPTER 5. ALGORITHMS

Performance The run time of the algorithm is analyzed as follows. First the total sum of the items has to be computed, which takes O(n) time. Furthermore, the list and sublists have to be divided a total of n − 1 times. For each sublist division, the split point of the list has to be located. A naïve implementation does a linear search, which takes O(n) time. But by using dynamic programming, the cumulative sum can be saved the first time the sum is calculated on the entire list. That is, for each list element li , the sum w(l1 ) + w(l2 ) + · · · + w(li−1 ) + w(li ) is saved. Doing that, the split point can be found with a binary search, which takes O(log n) time in the worst case. On average however, the split point will be in the middle, which gives O(1) time to locate the middle. The split point must be found for each n − 1 list division. The total run time, supposing dynamic programming is used, hence becomes: O(n) + O[(n − 1) log n] ∈ O(n log n) (Worst case) O(n) + O(n − 1) ∈ O(n) (Average) That means that the average and worst case run time of the split layout is better than any of the other ordered layout algorithms, except for the Slice and Dice.

5.6

Comparison of Algorithms

To get an estimation of the quality of the different algorithms, figures 5.4 and 5.5 shows a comparison of the algorithms laid out on an area proportional to a PDA screen. The first set (figure 5.4) compare the three ordered layout algorithms for a relative small data set of 20 elements. The Split layout has the best average aspect ratio, but the Strip treemap on the contrary, lays out the element in a very predictable manner, making the layout more readable. For a data set of this size, the Strip treemap is thus probably the best choice of algorithm. The second set of figures (figure 5.5) compares the algorithm for a bigger data set of 100 elements. The order of the element is indicated by the shading. First note the outstanding aspect ratio of the squarified layout compared to the other algorithms. However, since it is unordered, locating a specific element takes a very long time. Among the ordered algorithms, the Split treemap again has the best average aspect ratio. The Strip treemap starts to produce layouts with thin rectangles for data sets of this size, and hence for sizes of 100 or more elements, the Strip treemap is probably an unwise choice.

5.6. COMPARISON OF ALGORITHMS

Figure 5.4. Comparison of layout of 20 items for three ordered algorithms. Left: Strip, avg aspect ratio = 2.19. Middle: Pivot by Split Size, avg aspect ratio = 1.92. Right: Split treemap, avg aspect ratio = 1.68.

Figure 5.5. Comparison of layout of 100 elements. Shading indicates order. UL: Squarified, avg aspect ratio = 1.15. UR: Strip, avg aspect ratio = 1.96. LL: Pivot by Split Size, avg aspect ratio = 2.7. LR: Split, avg aspect ratio = 1.88.

33

Chapter 6

Results In this chapter, the general results from the project are presented. The chapter is divided into three parts. The first section evaluates the different treemap algorithms that were tested. In the second section, the general results from using treemaps on PDAs are discussed, specifically the prototype newsgroup browser, and finally the chapter is concluded by a remark on using Pocket Piccolo as a graphics library.

6.1

Algorithm evaluation

A total of five algorithms were implemented and evaluated. They are all described in detail in chapter 5. When evaluating the algorithms, you can either do a subjective discussion of the overall impression of the result or you can make an automatic test that gives numerical data on different properties of the algorithms.

6.1.1

Automatic Test

The properties that were tested were the average aspect ratio, stability, readability, number of thin rectangles and run time. In order to get results that were comparable to published data, the same technique as [1] describes was employed. The surrounding bounds (screen) had the same proportion as an iPaq 4450. The test was run for different one level data sets where the number of items were 20, 100, 200, 300, 400, 600 or 800. One additional test was run with a two-level data set of size 10x10, i.e. a total of 100 items, with 10 items on the top level, and each of the top level items having 10 children. For each experiment (data set), 100 trials were tested and for each trial the data set was updated 100 times. Two different experiments were performed - one where the rectangle sizes were drawn from log-normal distribution created by exponentiating a normal distribution with mean 0 and variance 1. This distribution is frequent for positive natural data [10]. In the second experiment, the data was drawn from the Zipf distribution [10]. 35

36

CHAPTER 6. RESULTS

In order to test the stability of the algorithm, for each of the 100 trials, the data set was multiplied by a random variable Y = eX , where X ∈ N (0, 0.05). This was repeated 100 times for each trial to simulate natural occurring updates to the data set. For each update, the change in position, average aspect ratio and number of thin rectangles were recorded. 3.5

Avg. aspect ratio

3

2.5

2

Split Pivot by split size Pivot by size Squarified Strip

1.5

1 0

100

200

300

400 #elements

500

600

700

800

Figure 6.1. Plot of the average aspect ratios for five algorithms with various data set sizes. The new, ordered Split algorithm has good aspect ratio. The Slice and Dice algorithm is excluded, since its aspect ratio is between 200 and 3700.

The diagrams in figures 6.1 to 6.4 and table 6.1 show the results from the test. Each of the figure shows a specific attribute plotted for all of the algorithms. To get a hold of how the algorithms performed for various number of elements, the attribute was measured for 20, 100, 200, 300, 400, 600 and 800 rectangles. Start by studying the plot in figure 6.1. It shows the average aspect ratio for all algorithms for different number of items. As can be seen, the unordered squarified layout is superior to all other algrithms. If you do not need an ordered layout, this is obviously the algorithm to choose. However, if you do need an ordered version, the new Split algorithm performs very well, with an average aspect ratio of about 2.3. Of the other published algorithms, Strip treeamap seems to be good for small data sets. The Slice and Dice algorithm has been excluded from the plot, since it has far to bad aspect ratios even to be considered for large data sets. Another important measure is the stability of the algorithm. As can be seen in figure 6.2, Slice and Dice has the best stability. However, having an extremely poor aspect ratio, it is seldom worth considering at all. Of the other algorithms, the new Split algorithm shows the best stability. The other ordered algorithms seem

6.1. ALGORITHM EVALUATION

37

80

70

Split Pivot by split size Pivot by size Squarified Strip Slice and Dice

Stability (low value=stable)

60

50

40

30

20

10

0 0

100

200

300

400 #elements

500

600

700

800

Figure 6.2. The stability (amount of change to layout when data is updated) for various algorithms and data set sizes. The lower score the better stability. Again, the new Split layout performs well. The worst stability belongs to the unordered squarified layout, whereas the ordered Slice and Dice layout receives the best score.

to perform equally well. The only unordered layout, squarified, shows the worst results, but in some cases, no updates of the data set occurs, and stability is an unimportant criterion. Figure 6.3 shows the run time, which is an important attribute if treemaps are to be useful, especially on PDAs which have less powerful CPUs. If the algorithm is too slow, updates will be as well and the usability of the treemap will thus be affected. Traditionally, studies of run time focus on worst case run time which often is an important measure, but as the ordo eats up all the constant factors, two algorithms with the same worst case run time may differ. A complement to the average aspect ratio is the number of thin rectangles produced by an algorithm. Figure 6.4 shows the number of thin rectangles produced by the different algorithms for different treemap sizes. Comparing this figure with the plot in figure 6.1, it is evident that the an algorithm with low aspect ratio also produces a low number of thin rectangles. A summary of the results from all the test suites are shown in table 6.1. What is displayed is an average over both tests (Gaussian and Zipf distribution) for all the 100 trials, 100 updates and all the different treemap sizes. The run time, though, is not an average, but a summation of all the values. It can be seen that when comparing all the ordered treemap algorithms, and excluding the Slice and Dice, the new Split layout gets the best score for all the attributes in the table.

38

CHAPTER 6. RESULTS

1800 1600 1400

Split Pivot by split size Pivot by size Squarified Strip Slice and Dice

Run time

1200 1000 800 600 400 200 0 0

100

200

300

400 #elements

500

600

700

800

Figure 6.3. The run time in seconds of the entire test suite 100 trials with 100 updates each for different algorithms and data sets. The Split and Slice and Dice layouts seem to be equally fast, while the Pivot layouts appear to be most time consuming.

150 Split Pivot by split size Pivot by size Squarified Strip

#thin rectangles

100

50

0 0

100

200

300

400 #elements

500

600

700

800

Figure 6.4. The plot illustrates the number of rectangles that were considered to be thinner than a certain threshold, and hence difficult to select and compare. The number of thin rectangles clearly is connected to the average aspect ratio (figure 6.1). High average aspect ratio means plenty of thin rectangles.

6.1. ALGORITHM EVALUATION

Aspect ratio Stability # Thin Run time (s)

Split 2.15 18.8 31.5 559

PbySS 2.81 30.3 37.3 8007

39

PbySize 2.90 29.04 36.0 9090

Strip 2.79 25.36 37.3 2179

SnD 1227 1.20 290 524

Squarified 1.21 60.2 7.1 1650

Table 6.1. Mean value of the algorithms for all test suites (20, 100, 200, 300, 400, 600 and 800 items), with rectangle sizes from both gaussian and zipf distribution. All attributes are averaged, except the run time, which is summed over all experiments.

6.1.2

Readability

As mentioned in the last section, table 6.1 shows that the new ordered Split layout received the best values for all test attributes when compared to the other ordered treemap layouts, (except for Slice and Dice). What remains to be shown is whether the algorithm performs an equally readable ordered treemap as does the Pivot and Strip algorithms. The readability was measured in two different ways. First the automatic readability metric mentioned in section 2.2 was measured. However, the problem with this metric is that it gives no feeling of how good the readability of an algorithm actually is. Suppose one algorithm receives the score 0.2 and another 0.1. Does that mean the first algorithm is twice as readable as the second algorithm? The methodology used when acquiring the metric as described by [1] is very heuristic and only gives a feeling of which algorithm is more readable, not how readable it actually is. To compensate for this, a similar user study as the one described by [1] was designed. Users were shown a treemap of hundred random sized rectangles. Each rectangle was labled with a number. The test then asked the user to click on random rectangles in the treemap. The elapsed time until the user clicked the correct rectangle was measured. For each of the two algorithms Split layout and Pivot by Split Size, the user was asked to click on twelve random rectangles. Figure 6.5 shows the application used in the study. The purpose of the test was to measure how easy it was to locate specific, numbered rectangles in the treemap, and hence get an estimate of how intuitive the layout was. A total of 30 people (23 men and 7 women) participated in the study. All of them stated they were experienced computer users. The participants all were between 20 and 35 years old. The time to click was measured and hence gave a total of 30 · 12 = 360 times for each algorithm. 30 participants is enough for getting statistically accurate data. However, if just a few users are unfocused which results in a very long time, the mean value is heavily affected. Therefore, due to the relative low number of inputs, the median is probably a better indication of the readability. Figure 6.6 shows that the response times for the split layout is only slightly

40

CHAPTER 6. RESULTS

Figure 6.5. User study to measure readability of the Split layout and the Pivot by Split Size. Users were asked to click twelve random rectangles, and the time to click was measured. 4 Split layout Pivot by Split Size 3.5

Time to click [s]

3

2.5

2

1.5

1

0.5

0 Mean

Median

Figure 6.6. Mean and median of the number of seconds to click for the Split and Pivot by Split Size layouts. The median is probably a better indication of the readability than the mean.

6.1. ALGORITHM EVALUATION

41

worse than that of the Pivot by Split Size layout. The median response time is 2.87 seconds for the Split layout and 2.73 seconds for the Pivot by Split Size. As argued above, the median is probably a better indication of the readability than the mean value. The results from the test indicate a slightly worse readability for the Split layout. The reason why the Pivot layout gives faster response time is probably because of the pivot. Assigning a pivot and then splitting the list in three parts generates a more consistent layout than the Split layout, which splits the list into two parts. Since the layout direction can alter between horizontal and vertical every time the list is split, the Pivot algorithm will be somewhat more predictable, since all the three sublists will be laid out in the same directions, whereas the Split layout, with only two sublists, will change direction more frequently. However, the difference in response time being only 0.14 seconds, the Split layout looks more appealing as a layout algorithm because of its better average aspect ratio and stability. As to the automatic readability metric described in section 2.2, which measures how many times the eye will have to change direction when scanning the treemap in order, the Pivot algorithm received slightly better values than the Split layout. The Pivot by Split size received a score of 0.19 and the Split layout a score of 0.11. This reflects the property of the Split layout, which changes direction more often than the Pivot layouts, that use three sub lists instead of two.

6.1.3

Treemaps on PDAs and Discussion Forums

This section will present the experiences from using the treemap as a visualization technique for threaded discussion forums. Another project member performed user studies and interviews with participants who used the sample application [8]. The interviews showed that the overall design of the treemap with threads nested inside the forum rectangles was easy to grasp for users. They learned the structure quickly and were able to solve the tasks given them. The only time when the text based interface turned out to be faster, was when the list was sorted directly on the search criterion: e.g. finding the most recent threads is trivial if the list is sorted by date. Placing important rectangles in the upper left corner is supported both by the user test and in [3]. Therefore, using the squarified treemap layout seems reasonable, as large threads are positioned in the upper left corner. Although this could be achieved with ordered treemaps algorithms too, they produce worse aspect ratios than the squarified layout. However, in the long run when extending the features of the newsgroup application, treemap ordering and stability will become important properties of the treemap algorithm used. Recurrent users will expect articles to stay in roughly the same position between successive sessions, and hence the squarified layout will probably not work satisfactory. Therefore, the ordered split layout with its good aspect ratio and stability will probably be a good choice. During the user test, it was discovered that users often wanted information on where they were in the forum hierarchy. Therefore, a label showing the name of the

42

CHAPTER 6. RESULTS

discussion forum was added to the treemap. To get the most out of the limited screen space, the name of the forum is printed on top of the rectangles. An alternative idea would be to print the name in the frame [5], but that was considered to take too much of the valuable screen. The overall result from the user studies and interviews is that aspect ratio is indeed the most important criterion when using treemaps on PDAs. The small screens makes the small rectangles thin, and bad aspect ratio would make them reduced to a single line, just a few pixels wide.

6.2

Pocket Piccolo

The reason for using Pocket Piccolo was that a quick and easy way of creating graphical object was needed. Pocket Piccolo provides built in support for interaction, scaling and zooming of the graphical objects. It was indeed easy to use the library and the interface was very intuitive. Although some features, such as rotation, that is supported on the desktop version, is removed from the pocket pc version, the built in functionality was very handy. Overall, Pocket Piccolo is a good option for prototype applications as the one used in this project, but its major drawback is the performance. For large treemaps with hundreds of rectangles, each of which represented by a PNode object, the performance was heaviliy affected in a way that would have been infeasable in a real application. C# .NET together with Pocket Piccolo hence works good for small applications, but if graphical performance is of importance to you, such as smooth scrolling and zooming of multiple resource demanding objects, native C++ code seems to be the only option.

Chapter 7

Conclusion This thesis has described a project which tried the treemap as visualization technique on PDAs. Different published treemap algorithms were implemented and compared. Additionally, a new layout algorithm was developed, called the Split Layout. The new Split layout is an ordered treemap algorithm, and studies show that the average aspect ratio and stability are better than any other published ordered treemap algorithm so far. The studies showed that the average aspect ratio improved by 23% compared to the known Pivot by Split Size algorithm. The stability increased by between 28% and 40%. An important property of ordered treemap algorithms is the time it takes to locate specific rectangles in the treemap. A user study was performed to test the readability of the new algorithm, and it showed an increase in 5% for the time it took to locate a specific object in the treemap. The median time was 2.87 seconds compared to 2.73 seconds for the Pivot by Split Size algorithm. The response time is according to the study thus slightly worse than the known ordered algorithms, but the increase in response time is so low that it is probably irrelevant when compared to the 24% improvement in average aspect ratio. Therefore, the new Split layout algorithm is likely to be a useful complement to the already published treemap algorithms. Since aspect ratio is one of the most important properties when using treemaps on small screens, the Split layout makes a good choice of algorithm on PDAs as well. Moreover, the experiences from using the treemaps as a visualization technique for discussion forums on PDAs looks promising. The benefits of using treemaps on traditional desktop computers are well known. One goal of the project was to find out whether the advantages of using treemaps instead of traditional text based interfaces are still valid on small screens. The study confirmed that although reducing the screen size to an area of only 6% of a desktop screen, the benefits of the treemap remain. The user study showed that the concept of using treemaps to visualize discussion forums was intuitive to grasp for people who had limited experience with traditional discussion forums. When compared to a text based interface, searching the forum for largest and most active threads was quicker in 43

44

CHAPTER 7. CONCLUSION

the treemap, provided the text version was not sorted on the search criterion. Since thread activity and size are important criteria for social browsing [3], using treemaps is likely to improve the user experience. Furthermore, it was discovered that aspect ratio becomes the one of the most important property when implementing treemaps on PDAs. None of the published ordered treemap algorithms (Strip and Pivot variants) produced a satisfactory result. The rectangles became too thin as the number of elements became large. The unordered, squarified layout was very appealing and good looking, but being unordered makes it less useful in many cases. The new ordered Split layout proved to be a good alternative. Overall, using treemaps to visualize discussion forums on compact displays looks highly promising and the new Split layout, applicable both to traditional desktop computers and to PDAs, will probably contribute to the flora of ordered treemap algorithms.

Bibliography [1] Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg. Ordered and quantum treemaps: Making effective use of 2d space to display hierarchies. ACM Transactions on Graphics (TOG) archive, 21(4):833–854, 2002. ISSN 0730-0301. [2] M. Bruls, K. Huizing, and J. van Wijk. Squarified treemapss. In Proc. of Joint Eurographics and IEEE TCVG Symp. on Visualization (TCVG), pages 33–42, 2000. URL citeseer.ist.psu.edu/bruls99squarified.html. [3] Andrew T. Fiore, Scott Lee Tiernan, and Marc A. Smith. Observed behavior and perceived value of authors in usenet newsgroups: bridging the gap. In CHI, pages 323–330, 2002. [4] Murielle Florins and Jean Vanderdonckt. Graceful degradation of user interfaces as a design method for multiplatform systems. In IUI ’04: Proceedings of the 9th international conference on Intelligent user interface, pages 140–147. ACM Press, 2004. ISBN 1-58113-815-6. [5] Chintalapani G, Plaisant C, and Shneiderman B. Extending the utility of treemaps with flexible hierarchy. Proceedings of the Information Visualisation, 0:335–344, 2004. [6] Vicki L. Hanson. The user experience: designs and adaptations. In W4A: Proceedings of the international cross-disciplinary workshop on Web accessibility, pages 1–11. ACM Press, 2004. ISBN 1-58113-903-9. [7] Chang-Sung Jeong and Alex Pang. Reconfigurable disc trees for visualizing large hierarchical information space. In INFOVIS ’98: Proceedings of the 1998 IEEE Symposium on Information Visualization, pages 19–25. IEEE Computer Society, 1998. ISBN 0-8186-9093-3. [8] Malin Köksal. Visualization of threaded discussion forums on handheld devices. Master’s Thesis at NADA (In preparation), 2005. [9] George G. Robertson, Jock D. Mackinlay, and Stuart K. Card. Cone trees: animated 3d visualizations of hierarchical information. In CHI ’91: Proceedings 45

46

BIBLIOGRAPHY

of the SIGCHI conference on Human factors in computing systems, pages 189– 194. ACM Press, 1991. ISBN 0-89791-383-3. [10] Ross Sheldon. A First Course in Probability. Englewood Cliff, Prentice Hall, 1997. [11] Ben Shneiderman. Tree visualization with tree-maps: 2-d space-filling approach. ACM Trans. Graph., 11(1):92–99, 1992. ISSN 0730-0301. [12] Ben Shneiderman and Martin Wattenberg. Ordered and quantum treemaps. Information Visualization, 2001. INFOVIS 2001. IEEE Symposium on, pages 73–78, 2001. ISSN 1522-404X. [13] Martin Wattenberg. Visualizing the stock market. CHI ’99 extended abstracts on Human factors in computing systems table of contents, pages 188–189, 1999. [14] Ming Zhang, Hong Zhang, Donny Tjandra, and Stephen T. C. Wong. Dbmap: A space-conscious data visualization and knowledge discovery framework for biomedical data warehouse. IEEE Transactions on Information Technology in Biomedicine, 8(3):343–353, 2004.