Mapping Descriptive Models of Graph Comprehension into Requirements for a Computational Architecture: Need for Supporting Imagery Operations

B. Chandrasekaran and Omkar Lele

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA
[email protected], [email protected]

Abstract. Psychologists have developed many models of graph comprehension, most of them descriptive, some computational. We map the descriptive models into requirements for a cognitive architecture that can be used to build predictive computational models. General symbolic architectures such as ACT-R and Soar satisfy the requirements, except for those needed to support the mental imagery operations required for many graph comprehension tasks. We show how Soar augmented with DRS, our earlier proposal for diagrammatic representation, satisfies many of the requirements and can be used to model the comprehension and use of a graph requiring imagery operations. We identify the need for better computational models of the perception operations, and for empirical data on their timing and error rates, before predictive computational models can become a reality.

Keywords: graph comprehension, cognitive architectures, diagrammatic reasoning, mental imagery.

1 Graph Comprehension Models

In this paper, we investigate the requirements for a cognitive architecture that can support building computational models of graph comprehension. We start by reviewing research on graph comprehension models that psychologists and cognitive scientists have built over the last three decades.

High-Level Information Processing Accounts. Bertin [1] proposed a semiotics-based task decomposition that anticipated the later information-processing accounts of Kosslyn [2] and Pinker [3]. These accounts provide a framework in which to place the rest of the research. Because of their information-processing emphasis and consequent greater relevance to computational modeling, we focus on [2] and [3]. The two proposals have much in common, though they differ in emphasis and detail. They envision a process that produces a representation ("perceptual image" for Kosslyn, "visual description" for Pinker) in visual working memory (WM), respecting Gestalt laws and constraints such as distortions and discriminability.
In [3], the visual description is an organization of the image into visual objects and groups of objects – lines, points, regions, and abstract groupings corresponding to clusters. It is not clear whether this visual description is purely symbolic or whether it retains shape information, as Kosslyn's images do, but keeping the shape information is essential, as we shall see. In both models, the construction of this internal visual representation is initially a bottom-up process, but it soon becomes a sequential process in which top-down and bottom-up processes are opportunistically combined: the state of the partial visual representation at any stage, together with the agent's goals, triggers the retrieval of relevant knowledge from Long-Term Memory (LTM), which in turn directs further processing. Pinker's account of the retrieval of goal- and state-relevant information from LTM uses the idea of "schemas" ("frames," as they are often called in AI): organized collections of knowledge about the structure of graphs, both in general and for various graph types. Comprehension of a specific graph proceeds by filling in the "slots" in the schemas – the graph type, the axes, the scales, the quantities represented, etc. The schema knowledge guides the agent's perception in seeking the information needed to perform this slot-filling.

Shah et al. [4] propose that graph comprehension can be divided, at a high level, into two primary processes – a pattern recognition process followed by an integrative cognitive process that is both bottom-up and top-down – an account consistent with the Pinker/Kosslyn accounts. (Due to space limitations, we cite only a subset of the relevant papers by many of these authors; the bibliographies of the cited references contain pointers to their other relevant work.) Their research on attention during this process highlights the role of domain-specific background knowledge in guiding comprehension. Their model also includes an account of learning: novice graph users, in contrast to experts, tend to spend more time understanding the graph structure (how the graph represents information). In schema language, learning results in the details of the general schema being filled in, so that only the problem-specific instantiation needs to take place, considerably speeding up the process.

Perception. Graphs encode information using graphical properties such as position, size, and angle. The accuracy of the information obtained from a graph depends on how well humans can decode this encoded information. Cleveland and McGill (see, e.g., [5]) identify a set of "elementary graphical perception tasks," that is, tasks which they propose are performed instantaneously with no apparent mental effort. They order these tasks by accuracy: position along a common scale, position on identical but non-aligned scales, length, angle or slope (with the angle not close to 0°, 90°, or 180°), area, volume, density, and color saturation. Simkin and Hastie [6] argue that the ordering is not absolute, but changes with the graphs and conditions. They further argue that these judgment tasks are not always instantaneous, but often require a sequence of mental imagery operations such as anchoring, scanning, projection, and superimposition. Human perception is good at instantly perceiving certain standard properties, e.g., a 90° angle or the midpoint of a line segment. Anchoring is the use of such standard perceptions to estimate properties that are not standard. For example, the distance of a point p on a line segment from one end of the segment can be estimated by a series of midpoint perceptions of segments containing p.
The relative lengths of the segments from p to the two ends of the line can be estimated from the relative scanning durations. Projection is when a ray is mentally sent out from one point in the image to another, for the purpose of securing an anchor. Superimposition is when one object is mentally moved onto another, as when a shorter bar is mentally moved onto a longer one so that their bottoms align, as part of estimating the ratio of their lengths.
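To make the anchoring idea concrete, here is a minimal sketch, in Python, of position estimation by recursive midpoint perception. It is our illustration rather than code from [6] or from any of the models reviewed; the function name, the primitives assumed (an instantaneous midpoint perception and a which-side relational judgment), and the tolerance parameter are all assumptions.

```python
def anchor_estimate(p: float, left: float, right: float, tol: float = 0.05) -> float:
    """Estimate the fractional position of point p on segment [left, right]
    by repeated midpoint perceptions (anchoring in the sense of [6]).
    The primitives and the tolerance are illustrative assumptions."""
    whole = right - left

    def recurse(lo: float, hi: float) -> float:
        mid = (lo + hi) / 2.0          # 'instant' midpoint perception
        if (hi - lo) <= tol * whole:   # segment small enough: report its midpoint
            return (mid - left) / whole
        if p <= mid:                   # relational perception: which half holds p?
            return recurse(lo, mid)
        return recurse(mid, hi)

    return recurse(left, right)

# e.g., a point 70% of the way along a unit segment:
# anchor_estimate(0.7, 0.0, 1.0) -> approximately 0.70 (within tolerance)
```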

Fig. 1. Example from [10]: the user needs to imagine the line being extended to answer the question.

Gillan et al. (see, e.g., [7]) describe a model of graph comprehension in which, in addition to external perception, imagery operations are used, such as rotating and moving image objects and mentally comparing the lengths of lines. Trafton et al. (see, e.g., [8]) draw attention to how the individual steps of information extraction from a representation are integrated to answer a question. Their cognitive integration is functionally akin to the schema-based comprehension account of Pinker, while their visual integration includes spatial transformations and imagination. Fig. 1 is an example, where the user has to extend a line in their imagination to answer a question about a value in the future.

Computational Models. The models discussed so far have been qualitative and descriptive. The two models we discuss in this section, Lohse's [9] and Peebles and Cheng's [10], are both computational and share many features. Both use a goal-seeking symbolic architecture with working and long-term memories. Both assume an agent who already knows how to use the graph, so that the early general schema-instantiation steps do not need to be modeled. That still leaves many visual search, perception, and information integration tasks that the agent needs to perform. The models share other features as well: neither has an internal representation of the spatiality of the graphical objects, and neither uses visual imagination, their examples being simple enough not to need it; and neither has a computational version of external perception, whose results are instead made available to the models as needed for simulation. Lohse's model could predict the times taken to answer specific questions based on empirical data on the timings of the various operations, and he reported agreement between the predictions and the results, but Foster [11] did not find such agreement in experiments using Lohse's model. The research in [10] used ACT-R/PM, a version of ACT-R that has a visual buffer that can contain the location, but not the full spatiality, of the diagrammatic objects in the external representation. Procedural and declarative knowledge about using the graphs is represented in ACT-R's production rules and propositions. The research also modeled learning the symbol-variable associations and the location information, reducing visual search.
Predictions from the model of changes in reaction times as the questions changed matched human performance qualitatively, but not quantitatively.

Putting the Models Together. The process that takes place when a user is presented with a graph, along with some questions to answer, can be usefully divided into two stages: one of general comprehension, in which the graph type is understood and some details are instantiated, and a second stage in which the specific information acquisition goals are achieved. Initial perception organizes the image array into visual elements (Kosslyn, Shah, Trafton), at the end of which working memory contains both symbolic and diagrammatic information, the latter supporting the user's perceptual experience of seeing the graph as a collection of spatial objects in a spatial configuration (Kosslyn). After this, the two stages share certain things in common: a knowledge-guided process involving both bottom-up and top-down operations performs the visual and cognitive integrations (Pinker, Shah, Trafton). Perceptions invoke (bottom-up) schema knowledge (Pinker) about graphs and graph types, which in turn guides (top-down) perceptions in seeking information to instantiate the schemas, and the processes are deployed opportunistically. Comprehension initially focuses on identifying the type of graph and then on instantiating the type: the domain, the variables and what they represent, the scales, etc. For information extraction, in simple cases the user may apply a pre-specified sequence of operations, as in the models of Lohse and of Peebles and Cheng. In more complex cases, increasingly open-ended problem solving, driven by knowledge in LTM, may be necessary for visual and cognitive integration. While some perceptions simply require access to the external image and deliver symbolic information to cognition, others require mental imagery operations (Simkin, Gillan, Trafton), such as imagining one or more of the image objects as moved, rotated, or extended, and elements combined or abstracted. Relational perception may be applied to objects some of which are the results of imagery operations. Perception may be instantaneous, or it may involve visual problem solving.
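As an illustration of the schema idea that recurs in these accounts, the following sketch shows one way a graph schema and its slot-driven perception goals might be represented. It is a minimal sketch under our own assumptions: the class name GraphSchema and the particular slots are illustrative, not structures taken from [3] or [4].

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GraphSchema:
    """A minimal, illustrative graph schema: comprehension proceeds by
    filling the slots from perception; unfilled slots generate top-down
    perception goals. Slot names are assumptions, not taken from [3]."""
    graph_type: Optional[str] = None              # e.g., 'line graph', 'bar chart'
    x_variable: Optional[str] = None              # what the X axis represents
    y_variable: Optional[str] = None
    x_scale: Optional[Tuple[float, float]] = None  # (min, max)
    y_scale: Optional[Tuple[float, float]] = None

    def unfilled_slots(self) -> list:
        """Each unfilled slot becomes a goal for a perception operation."""
        return [name for name, value in vars(self).items() if value is None]

# Bottom-up: an initial perception (say, 'intersecting horizontal and
# vertical lines') retrieves the Cartesian-graph schema from LTM;
# top-down: its empty slots direct further perceptions.
schema = GraphSchema(graph_type='line graph')
print(schema.unfilled_slots())  # -> ['x_variable', 'y_variable', 'x_scale', 'y_scale']
```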

2 Requirements on a Cognitive Architecture to Support Modeling Graph Comprehension and Use

The family of general symbolic architectures, with Soar and ACT-R as its two best-known members, has the architectural features required for building computational models of graph comprehension, with the important exception of support for mental imagery operations.

Supporting goal-directed problem solving. The requirements for cognitive integration – combining bottom-up and top-down activities, representing schema knowledge, instantiating schemas to build comprehension models, and using schema-encoded information acquisition strategies by deploying appropriate perceptions – can be handled by the architectural features of ACT-R and Soar. Schemas are just a higher level of knowledge organization than rules and declarative sentences, and the knowledge representation formalisms in ACT-R and Soar can be used to implement them. Both architectures have control structures that can produce a combination of bottom-up (information from perception triggering retrieval of new knowledge) and top-down (newly retrieved knowledge creating new perception goals) behaviors.
Appropriate knowledge can produce the needed cognitive integration. ACT-R and Soar also support certain types of learning, with ACT-R providing more learning mechanisms than Soar. The available mechanisms can be used to capture some of the observed learning phenomena [4], as demonstrated in [10].

Supporting Imagery Operations. For imagery operations, the architectures need a working memory component with a representation that functionally organizes the external or internal representation as a spatial configuration of spatial objects, tagging which objects come from the external representation and which belong to imagery. Operations to create imagery objects and configurations should be available: imagery elements may be added afresh, or may result from operations such as moving, rotating, or modifying existing objects so that they satisfy certain spatial constraints. Diagrammatic perception operations – by which we mean relational and property perceptions after figure-ground separation – are to be applied to configurations of diagrammatic objects, whether the objects correspond to external objects, imagined objects, or a combination. There may also be benefits to having some of the perception operators always active, without the need for cognition to specifically deploy them, so that a certain amount of bottom-up perception is always available. Such bottom-up perceptions can be especially useful in the early stages.

Treating Perceptions as Primitives vs. Modeling the Computational Details. The cognitive mechanisms of ACT-R and Soar, especially the former, derive a degree of validation both from human problem-solving experiments and from neuroimaging studies. However, there is not much in the way of detailed computational models of perception and mental imagery operations, especially ones that would replicate timing and error data. Such computational models would also have a role for pre-attentive perception, e.g., to explain when certain perceptions are instantaneous and when they require extended visual problem solving. Given this lack, one approach is to treat the internal perception and imagery operators as primitives and simply program them, without concern for the fidelity of their implementation with respect to human abilities. Models built in this way will be good for certain purposes, e.g., studying the effect on the agent's behavior of the availability or absence of specific pieces of knowledge and strategies, but not for others, e.g., predicting timing data or perceptual learning. It should be a goal of the modeling community to develop computational models of perception and imagery operations that account for human performance, including pre-attentive phenomena.

Augmenting Architectures with DRS – A Diagrammatic Representation System. We propose that the DRS representation and the associated perception and action routines reported in [12] provide the basis for augmenting the architectures in the symbolic family with an imagery capability. The DRS system as it exists supports only diagrams composed of points, curves, and regions as elements, which happens to cover most graphs. DRS is a framework for representing the spatiality of the diagrammatic objects that result after the early stage of perception has performed a figure-ground separation.
DRS is a list of internal symbols for these objects, along with a specification of their spatiality, the intended representation as a point, curve, or region, and any explicit labeling of the objects in the external diagram.
DRS can also be used to represent diagrammatic objects in mental images, or a combination of objects from the external representation and internal images, while keeping track of each object's provenance. A DRS representation may have a hierarchical structure, to represent the component objects of an object. The DRS system comes with an open-ended set of imagery and perception operations. The imagery operations can move, rotate, or delete elements of a DRS representation, and can add DRS elements satisfying certain constraints, to produce new DRS structures. Relational perception operations can be defined between elements of a DRS representation, e.g., Longer(C1, C2) and Inside(R1, R2). Operators are also available to detect the emergent objects that arise when diagrammatic objects overlap or intersect. Kurup [13] has built biSoar, a bimodal architecture in which DRS is integrated with Soar. Matessa [14] has built ACT-R models that perform diagrammatic reasoning by integrating DRS with ACT-R.
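The following is a minimal sketch, in the same illustrative Python, of what a DRS-like structure and a few of its operations might look like. It is our paraphrase of the ideas in [12], not the actual DRS implementation; every name, signature, and representation choice (e.g., curves as sampled polylines) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class DRSObject:
    """One diagrammatic object: an internal symbol, its intended kind
    (point, curve, or region), its spatial extent, and its provenance
    (perceived from the external diagram vs. created by imagery)."""
    symbol: str                      # internal symbol, e.g. 'C1'
    kind: str                        # 'point' | 'curve' | 'region'
    points: List[Point]              # spatiality: sampled coordinates
    imagined: bool = False           # provenance tag
    parts: List['DRSObject'] = field(default_factory=list)  # hierarchy

def length(curve: DRSObject) -> float:
    """Crude polyline length, used by the relational perception below."""
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(curve.points, curve.points[1:]))

def longer(c1: DRSObject, c2: DRSObject) -> bool:
    """Relational perception in the style of Longer(C1, C2)."""
    return length(c1) > length(c2)

def extend_line(curve: DRSObject, new_end: Point) -> DRSObject:
    """Imagery operation: create a new, imagined DRS element that extends
    an existing curve (as needed for the graph of Fig. 1)."""
    return DRSObject(symbol=curve.symbol + '_ext', kind='curve',
                     points=[curve.points[-1], new_end], imagined=True)
```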

3 Building a Computational Model Using DRS

In this section, we show the functional adequacy of a general symbolic cognitive architecture augmented with DRS for building a computational model of graph comprehension. We use biSoar, but we could equally have used ACT-R plus DRS. We implemented the scanning, projection, anchoring, and superimposition operators [6], the last three being imagery operations (that is, they create objects in WM corresponding to imagined objects). We treat scanning as instantaneous only for obvious judgments such as 50%. If the proportion is, say, 70%, the agent we model performs the judgment by a recursive midpoint algorithm, until an estimate within a specified tolerance is reached. While we modeled several graph usage tasks, we will use the model for the graph in Fig. 1, where a line needs to be mentally extended to answer a question about the future: "What might the Y value be for x = 4?"

The model starts with a DRS representation corresponding to a figure-ground-separated version of the external representation; this is what perception would deliver to cognition. In this example, the DRS consists of the curves for the axes and for the x-y function, the scale points, and the point for the origin. For convenience in simulation, the entire DRS representation is not kept in WM; rather, it is kept separately as a proxy for the intensity-array version of the external representation. Depending on attention, parts of this DRS will be in WM, along with diagrammatic objects resulting from imagery operations.

Certain initial perceptions are deployed automatically at the beginning, that is, without any specific problem-solving goal in mind. These initial perceptions are intended to model what a person familiar with the graph domain might notice when first looking at a graph, such as intersecting horizontal and vertical lines. They serve as cues to retrieve the appropriate schemas from LTM, in this case the schema for a graph of a function in Cartesian coordinates. This schema sets up perceptions to identify the axes, the scale markers, the origin, the functional curve, and the variables. The schema also has procedures for answering classes of questions, such as finding the Y value for a given X, performing trend estimates, and using trends to infer Y values for ranges of X not covered by the existing graph. The extension of the graph is now imagined and added to the DRS. The general procedure is instantiated to call for the perception of the point on the extension that corresponds to x = 4, which in turn calls for a vertical line from x = 4 to be imagined and an anchor point to be mentally created, then a projection to be drawn to the Y axis and another anchor to be created on the Y axis, and finally the value to be read off. The representational capabilities of biSoar and the associated perception and imagery operations were adequate for all steps of the process.
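To summarize the instantiated procedure, the sketch below reuses the illustrative DRSObject and extend_line from the earlier sketch. The step decomposition follows the walkthrough above, but the linear-trend assumption and all function and variable names are ours, for illustration only.

```python
def answer_y_for_x(curve: 'DRSObject', x_query: float) -> float:
    """Illustrative decomposition of answering 'what might Y be for x = 4?'
    (Fig. 1). Assumes DRSObject and extend_line from the sketch in the
    previous section; the linear-trend estimate stands in for the model's
    trend-perception step."""
    # 1. Perceive the trend near the end of the curve (here: the slope of
    #    its last segment -- an assumption standing in for trend perception).
    (x1, y1), (x2, y2) = curve.points[-2], curve.points[-1]
    slope = (y2 - y1) / (x2 - x1)

    # 2. Imagery: extend the curve to x_query along that trend and add the
    #    imagined element to the DRS.
    y_query = y2 + slope * (x_query - x2)
    extension = extend_line(curve, (x_query, y_query))

    # 3. Imagery + perception: anchor the point on the imagined extension at
    #    x_query, project it to the Y axis, and read off the value.
    _, y_anchor = extension.points[-1]
    return y_anchor

# e.g., a curve through (1, 2), (2, 3), (3, 4) extended to x = 4 -> 5.0:
# f = DRSObject('F', 'curve', [(1, 2), (2, 3), (3, 4)])
# answer_y_for_x(f, 4.0)
```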
The above model, and others we have built, display all the phenomena identified in our review: a visual image in WM; visual descriptions built under the guidance of graph schema knowledge in LTM; bottom-up and top-down processing for cognitive integration (Shah, Trafton); goal-driven problem solving; and the use of imagery operations in cognition (Simkin, Gillan, Trafton). While the level of modeling we have described can be useful for investigating the role of different pieces of knowledge and certain types of learning, the true usefulness of such models lies in their potential to predict timing and error rates in the use of graphs, so that proposed graph designs can be evaluated. For this we need human performance data – and, even better, computational models that reproduce human performance – on the variety of perception and imagery operations required for graph use. Empirical research of this sort, and a deeper understanding of the underlying perceptual mechanisms, are needed.

Acknowledgments. This research was supported by the Advanced Decision Architectures Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory under Cooperative Agreement DAAD19-01-2-0009.

References

[1] Bertin, J.: Semiology of Graphics. University of Wisconsin Press, Madison (1983)
[2] Kosslyn, S.: Understanding charts and graphs. Applied Cognitive Psychology 3, 185–226 (1989)
[3] Pinker, S.: A theory of graph comprehension. In: Freedle, R. (ed.) Artificial Intelligence and the Future of Testing, pp. 73–126. Erlbaum, Hillsdale, New Jersey (1990)
[4] Shah, P., Freedman, E.: Toward a model of knowledge-based graph comprehension. In: Hegarty, M., Meyer, B., Narayanan, N.H. (eds.) Diagrammatic Representation and Inference, pp. 18–31. Springer, Berlin (2002)
[5] Cleveland, W.S., McGill, R.: Graphical perception and graphical methods for analyzing scientific data. Science 229(4716), 828–833 (1985)
[6] Simkin, D., Hastie, R.: An information-processing analysis of graph perception. Journal of the American Statistical Association 82(398), 454–465 (1987)
[7] Gillan, D.J.: A componential model of human interaction with graphs: VII. A review of the Mixed Arithmetic-Perceptual Model. In: Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, pp. 829–833 (2009)
[8] Trafton, J.G., Ratwani, R.M., Boehm-Davis, D.A.: Thinking graphically: extracting local and global information. Journal of Experimental Psychology: Applied 14(1), 36–49 (2008)
[9] Lohse, G.L.: A cognitive model for the perception and understanding of graphs. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Reaching Through Technology, New Orleans, Louisiana, pp. 137–144 (1991)
[10] Peebles, D., Cheng, P.C.: Modeling the effect of task and graphical representation on response latency in a graph reading task. Human Factors 45(1), 28–46 (2003)
[11] Foster, M.E.: Evaluating models of visual comprehension. In: Proceedings of EuroCogSci 2003: The European Cognitive Science Conference. Erlbaum, Mahwah (2003)
[12] Chandrasekaran, B., Kurup, U., Banerjee, B., Josephson, J.R., Winkler, R.: An architecture for problem solving with diagrams. In: Blackwell, A.F., Marriott, K., Shimojima, A. (eds.) Diagrams 2004. LNCS (LNAI), vol. 2980, pp. 151–165. Springer, Heidelberg (2004)
[13] Kurup, U.: Design and Use of a Bimodal Cognitive Architecture for Diagrammatic Reasoning and Cognitive Modeling. Ph.D. Thesis, The Ohio State University, Columbus, Ohio (2007)
[14] Matessa, M., Archer, R., Mui, R.: Dynamic spatial reasoning capability in a graphical interface evaluation tool. In: Proceedings of the 8th International Conference on Cognitive Modeling, Ann Arbor, MI, USA, pp. 55–59 (2007)