Towards a Natural Language Driven Automated Help Desk

Melanie Knapp (Institute of Computer Science III, University of Bonn, Germany)
Jens Woch (Department of Computer Science, University of Koblenz, Germany)

Abstract. In this paper, we present the linguistic components required for a natural language driven automated help desk. This work is significant for two reasons: First, the combination of neural networks and supertagging represents a novel and very robust way to classify non-trivial user utterances. Second, we show a novel way of integrating known linguistic techniques for the analysis of user input, knowledge processing, and generation of system responses, resulting in a natural language interface for both input and output. Our approach separates domain-specific, language-specific and discourse-specific knowledge.

1 Introduction

The rapid development of technologies associated with the World Wide Web offers the possibility of a new, relatively inexpensive and effective standard user interface to help desks and appears to encourage more automation in help desk service. Typically, a help desk is defined as centralized help to users within an enterprise. Independent of the actual domain, help desks have to deal with two main problems: (1) efficient use of the know-how of employees and (2) cost-efficient handling of many support requests. (This work is partly funded by the German Research Foundation.)

In this light, we present a natural language driven approach to modeling an automated help desk. This objective is motivated by an evaluation of support requests which showed that no specialized knowledge is needed for 80 percent of all requests. Hence, a solution database is sufficient for routine requests. Under this condition, our research concentrates on a computer-based, so-called first support level.

Modeling a first support level requires the definition of all processing steps in a generic help desk system. We define a system structure with three main components. Within this design we do not distinguish among various input channels (e.g. telephone call, email, chat, fax or letter) and their appropriate analysis techniques. The first step in finding solutions is to analyze the textual input (independent of the extraction method) and to reduce the support request to a specific problem class. The second step is to request missing task parameters from the user. If the user's initial input is explicit, this step may be skipped. The third step in a generic help desk system is the verification of the specified solution. If the user is not satisfied with the solution, more task parameters for finding the solution must be extracted. In cases where no more task parameters

can be asked, the user request has to be delegated to a higher support level together with the already existing query information.

Our claim is that all three steps of the aforementioned generic system can be processed automatically. The automation should be based on a linguistically motivated solution, because empirical evaluations demonstrate that adaptation to the user's dialogue preference leads to significantly higher user satisfaction and task success (cf. [Litman et al., 1998]). Wizard-of-Oz experiments by Boje et al. (cf. [Boje et al., 1999]) also point out that users of automatic dialogue systems would like to take the initiative in many dialogues instead of answering a long list of small questions. For modeling user-initiative dialogue systems, one important objective is to avoid leaving a user without a clear understanding of his/her options at a given point in the dialogue. Hence, for the design of the algorithm we define the following criteria: (1) the formulation of the user request should not be restricted, (2) there must be no unnatural breaks between the user input and the computer's response (especially for telephone calls, real-time response must be guaranteed), and (3) no further inquiries into already explicitly or implicitly mentioned facts.

A first approach to modeling user-initiative in an automatic help desk is described in [Harbusch et al., 2001]. Based on those experiences, this paper presents a further developed approach.

The paper is organized as follows. In the next section we describe the linguistic techniques used for modeling an automated help desk by delineating the components query extraction, inferencing and feedback generation, as well as their integration. Since this is work in progress, the paper closes with a discussion of open problems and future work.

2 Architecture of a natural language driven automated help desk

In this section we discuss the three tasks query extraction, inferencing and feedback generation and the difficulties which arise under the constraints of the aforementioned criteria for user-initiative dialogue systems.

[Figure showing the components: INPUT text; Linguistic KB (TAG); Domain KB (Prolog); Discourse KB (TAG); Inference Machine; OUTPUT generation]
Fig. 1. Design of the automated help desk approach

We propose a system design, illustrated in Fig. 1, which combines four main techniques. Starting with the text of the user input (analysis depends on the input medium), an artificial neural network (ANN) allows a flexible classification of the user input to a specific problem class without any restrictions. Thereafter, the classified problem together with the input text is processed by a supertagger. The result is a logical representation of the user input. After that, the inference mechanism is used to extract all missing task parameters for a finer restriction of the problem class (if necessary), as well as to find a solution insofar as it is supported by the Prolog-based domain knowledge. Finally, the integrated automated text generation component, based on discourse and syntactic knowledge as well as on the output of the inference mechanism, generates a natural language utterance which, depending on the medium, is either printed out, e-mailed, or sent through a speech synthesizer. A detailed illustration of the linguistic techniques follows in the next sections.

2.1 Artificial Neural Networks (ANN)

For our approach the use of neural networks is motivated by the demand for a user-initiative system. Neural networks allow the design of an unrestricted user interface. Supporting the user with a problem specification freely formulated in natural language increases the acceptance of such systems. On the other hand, it requires a great deal of effort to prepare the training and test corpus for a new domain. The context, and consequently the importance of words, is measured by a hierarchy of recurrent plausibility networks ([Wermter, 1995]). Such a neural net (NN), which is basically comparable to the simple recurrent networks of [Elman, 1990], consists - in addition to one input and one output layer - of n > 0 hidden layers, each of which has recursive links to its context layers. For the classification of the user input to a problem class, the following steps must be executed to design an ANN:

– The n main problem classes must be specified for a support domain and sample dialogues must be labeled in order to refer to their correct problem class.
– A reduced vocabulary for special word groups (i.e. general concepts) must be defined.
– Each word group w must be represented by a significance vector (c1, c2, c3, ..., cn) with ci corresponding to one of the n problem classes. For each ci of a word group w the significance is computed by:

  ci = (frequency of words from w within class i) / (sum over all classes j of the frequency of words from w within class j)
– Design of the net topology and training of the neural network with the labeled dialogues.

In order to build a prototypical system, we have labeled 379 dialogues with the correct problem class out of the following seven problem classes in the domain of computer hardware. Note that the three major classes were intentionally selected to be easily differentiated by the topmost neural net (NN); the subclasses were selected to lie closely together, to prevent all NNs from simply reacting to some keywords and nevertheless to enforce the learning of differentiating significance vectors:

a) Problems with the hard disk(s):
   (1) hard disk's size is not identified correctly
   (2) two hard disks interfere with each other
   (3) other hard disk problem
b) Problems with the monitor:
   (4) monitor glimmering
   (5) colour shifting
   (6) other monitor problem
c) (7) other hardware problems

Each class has its own co-set (i.e. main-rest, monitor-rest and disk-rest). See Fig. 2 for an illustration of the hierarchy of plausibility nets, which divides the classification into four problem classes (classes 1, 2, 4, 5) and three co-sets (classes 3, 6, 7) at the individual levels of the hierarchy.

Fig. 2. Hierarchy of three plausibility nets
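The routing through the hierarchy of Fig. 2 can be sketched as follows; the lambda "nets" and their keyword triggers are toy stand-ins for the trained plausibility networks:

```python
# Minimal sketch of routing through the hierarchy of classifiers (Fig. 2).
# Each node distinguishes its specific classes plus a "rest" co-set; the
# classifier functions here are hypothetical stand-ins for trained nets.

def classify(text, top_net, disk_net, monitor_net):
    branch = top_net(text)          # 'disk' | 'monitor' | 'rest'
    if branch == 'disk':
        return disk_net(text)       # class 1, 2 or 3 (disk-rest)
    if branch == 'monitor':
        return monitor_net(text)    # class 4, 5 or 6 (monitor-rest)
    return 7                        # main-rest: other hardware problems

# Toy stand-in nets keyed on hypothetical keywords:
top = lambda t: 'disk' if 'disk' in t else ('monitor' if 'monitor' in t else 'rest')
disk = lambda t: 1 if 'size' in t else 3
mon = lambda t: 4 if 'glimmer' in t else 6

print(classify('my monitor is glimmering', top, disk, mon))  # → 4
```

The "rest" branches mirror the co-sets: anything the specific subnets do not claim falls through to the respective rest class.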

For our domain, we have defined 131 word groups, i.e. general concepts in this domain (such as "cable", "monitor", "setup", ...) with a total of 616 English and German words which can be considered more or less synonymous to a concept. The table below lists some of the defined general concepts.

  word group | corresponding words
  cable      | cable, connection, ...
  monitor    | monitor, screen, TFT, ...
  setup      | setup, install, uninstall, ...

After the computation of all significance vectors of the word groups, examples for some concepts are outlined in the following table:

  word group w | significance vector (classes 1-7)
  hard disk    | .40 .26 .27 .00 .00 .00 .07
  monitor      | .01 .02 .01 .28 .33 .34 .02
  setup        | .28 .13 .19 .01 .01 .01 .36
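The significance computation defined in Sect. 2.1 can be sketched as follows; the raw counts below are hypothetical, chosen so that the result matches the "setup" row of the table (in the real system they come from the labeled corpus):

```python
# Sketch of the significance-vector computation:
# for a word group w, c_i = freq(w in class i) / sum_j freq(w in class j).

def significance_vector(freq_per_class):
    """freq_per_class: raw counts of words from group w, one per problem class."""
    total = sum(freq_per_class)
    if total == 0:
        return [0.0] * len(freq_per_class)
    return [f / total for f in freq_per_class]

# Hypothetical counts for the word group "setup" over the 7 problem classes:
counts = [28, 13, 19, 1, 1, 1, 36]
vec = significance_vector(counts)
print([round(c, 2) for c in vec])
# → [0.28, 0.13, 0.19, 0.01, 0.01, 0.01, 0.36]
```

By construction the components of each vector sum to one, so a vector directly reads as a distribution of a concept over the problem classes.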

The examples illustrate that words of individual word groups are more likely to occur in specific problem classes (e.g., 'setup' has a high probability of occurring in the context of a hard-disk problem or a general problem of the main rest class, and only a low probability of occurring in the context of a monitor problem).
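A text classified by such a network is consumed word by word as a sequence of significance vectors. A minimal Elman-style forward pass with our prototype's dimensions (7 input nodes, 5 hidden nodes, a 5-node context layer, 3 output nodes) could look like the following sketch; the weights are random stand-ins, not trained values:

```python
import math
import random

# Elman-style recurrent forward pass over a sequence of significance
# vectors. Dimensions follow the prototype (7-5-5-3); weights are random.
random.seed(0)
IN, HID, OUT = 7, 5, 3
W_ih = [[random.uniform(-1, 1) for _ in range(IN)] for _ in range(HID)]
W_ch = [[random.uniform(-1, 1) for _ in range(HID)] for _ in range(HID)]
W_ho = [[random.uniform(-1, 1) for _ in range(HID)] for _ in range(OUT)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(sequence):
    context = [0.0] * HID                      # context layer starts empty
    hidden = context
    for vec in sequence:                       # one significance vector per word
        hidden = [sigmoid(sum(W_ih[h][i] * vec[i] for i in range(IN)) +
                          sum(W_ch[h][c] * context[c] for c in range(HID)))
                  for h in range(HID)]
        context = hidden                       # feed hidden state back as context
    return [sigmoid(sum(W_ho[o][h] * hidden[h] for h in range(HID)))
            for o in range(OUT)]

out = forward([[.01, .02, .01, .28, .33, .34, .02]] * 3)
print(len(out))  # three output activations, one per branch at this level
```

The context layer is what lets the net weigh a word's significance vector in the light of the preceding words, which plain feed-forward nets cannot do.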

Hence, a text is represented by a sequence of significance vectors. Although different words could theoretically be represented by the same significance vector, the probability is small that such a vector sequence describes different phrases. After a series of tests, we dimensioned each recurrent plausibility network for our 7 problem classes into an input layer with 7 nodes, one hidden layer with 5 nodes, one context layer with 5 nodes connected to the hidden layer, and an output layer with 3 nodes. The topology of the hierarchical structure directly depends on the overall number of problem classes in the respective domain and is a free parameter of the system to increase its ability to classify more reliably. We have trained our system with 242 of the labeled and reduced turns and tested it with 137 randomly taken test dialogues. The system classifies quite reliably on all three levels of the hierarchy on the basis of context consideration. Particularly for the two sub-networks, our results so far are promising (cf. [Harbusch et al., 2001]).

2.2 Supertagging

While ANNs of reasonable complexity are capable of taking context into account, they are restricted in recognizing the grammatical structure of the context. For example, in "I have no problems with my screen" the ANN would classify that there is a problem with the user's screen – much to the contrary of what the user actually said. In "Cannot get this printer to work with this computer. I have follow all of the set up instructions from the book and on the screen and still nothing. Can you help?" the distinction between printer, computer and screen is not obvious without deeper analysis. Parsing, and even partial parsing, of the input would help but cannot be applied to solve the problem, since the user input is most often incomplete and/or grammatically incorrect.
Assigning an elementary structure (supertag) to each lexical item of the user's input, such that the sequence of supertags is the most probable combination of structures a parser would need to actually parse the input, is the job of a supertagger [Joshi and Srinivas, 1994]. In our architecture, the sequence of tokens of the user's utterance (be it spoken or written) is tagged with a supertagger which is trained on a domain-specific corpus. These supertags are elementary structures of a lexicalized Tree Adjoining Grammar (LTAG, cf. [Schabes et al., 1988]). The result of the supertagger is aligned with the result of the neural network. If no token of the input sequence matches the classification of the neural network, i.e. there is no anchor which has the same label as the classification, then either the classification or the supertagging failed, and the user is requested to paraphrase his statement. Otherwise, the anchor is analyzed in its structural context given by the supertag combination, i.e., the input is partially and only "almost" parsed [Srinivas, 1997]. This is not real parsing, since the combination is only checked as to whether the input could be derived from it. With respect to the screen example above, negation of NPs can be applied to the input, as shown in Fig. 3. Here, the selective adjoining at I3 allows negation. The complete structure now reveals that the user in fact does not have a problem with his screen. The system does not know at this point what the problem actually is and should try to re-iterate on this question, but for now it suffices to state that the classification


[Figure: an S tree anchored by "have" (NP VP, V NP); the NP tree I2 anchored by "i"; the NP tree I3 anchored by "problem" with adjoining constraint {ANEG} and a PP slot filled by "with my screen"]
Fig. 3. A combination of supertags for "I have no problems with my screen." (A '*' marks the foot node of auxiliary trees)

of having a problem with the screen, as suggested by the neural network, should be rejected.

2.3 Inference mechanism

The domain knowledge is represented as a Prolog database which can be queried about facts or things to do, and which can be extended with new facts derived from the conversation with customers. Generally, facts and clauses are applicable. How the domain knowledge itself is modeled exactly is irrelevant as long as the intersection of the set of predicates and the set of anchors is not empty and their connection is meaningful to the domain knowledge; in other words, the naming of predicates determines what is utterable and what is not. For example, the rules

  flicker(screen, occasional) :- check(cables).      (1)
  flicker(screen, permanent)  :- switch_off(fridge). (2)

state the recommendation that the cables should be checked if the screen flickers occasionally, or that the fridge should be switched off if it flickers permanently.

The mapping from the tagged word list to a logical form happens in a simple but effective way: each anchor of the supertags is interpreted as a logical predicate. Additionally, adjoining is interpreted as bracketing, as are valence positions of verbs. Conjunctions are interpreted as equivalent logical operations, and so on. Thus, the logical representation have(i, no(problem, PP)) is derived from the supertags as depicted in Fig. 3 and sent to the Prolog machine for further inferencing. This approach does not map every supertag set to a correct logical form, which is why and where the syntactic realization of rules does play a role in modeling domains, but it turns out to be mostly sufficient for the kind of requests common in first support level conversations. However, the logical form equivalence problem (cf. [Shieber, 1993]) arises here, i.e. different logical forms of the same semantics lead to different surface realizations. This is an inherent weakness of the proposed mapping to logical formulae, since it removes part of the generator's flexibility.1

1 It has to be investigated whether generic mapping or at least extended lexical choice could help to remedy this gap. For example, in rule (2), switch_off should be mapped to grammar structures with different suffix positions.
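The anchor-to-predicate mapping can be sketched with a toy nested-tuple encoding of the derivation; the encoding is our assumption for illustration, not the paper's actual data structure:

```python
# Sketch of the supertag-to-logical-form mapping: each anchor becomes a
# predicate, and adjoining/valence positions become bracketing. A node is
# a hypothetical (anchor, children) pair.

def to_logical_form(node):
    anchor, children = node
    if not children:
        return anchor
    return f"{anchor}({', '.join(to_logical_form(c) for c in children)})"

# Derivation for "I have no problems with my screen" (cf. Fig. 3):
tree = ('have', [('i', []),
                 ('no', [('problem', []), ('PP', [])])])
print(to_logical_form(tree))  # → have(i, no(problem, PP))
```

The resulting term is exactly what is handed to the Prolog machine, so the naming of anchors must line up with the naming of KB predicates.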

In this way, the domain knowledge (the so-called what-to-say knowledge) is represented as a set of Prolog clauses and a set of corresponding grammar structures, and is therefore separated from the linguistic and discourse knowledge (the so-called how-to-say knowledge). Inference mechanisms, such as backtracking on solutions or decisions about what to do next, are implicitly given by the Prolog machine. Thus, surfing through the problem space (discourse) is inherently guided by the Prolog clauses. The obvious advantage is a relatively high degree of domain independence: switching the Prolog and TAG databases and extending the linguistic database with domain-specific vocabulary are the only steps required to adapt the help desk to another domain. Additionally, the possibility of automatically checking the domain knowledge base for consistency with a theorem prover helps immensely to reduce maintenance costs. The output of the inference mechanism is then fed to the generation process without further need of processing. The generator in principle performs a reverse mapping by interpreting the predicates of the logical input as anchors of lexicalized TAGs (see below for an extended example).
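The inference step over rules (1) and (2) might look like the following sketch: supertag anchors must intersect the KB predicates (otherwise the input is not utterable in this domain), and a matching symptom yields the recommendation from the rule body. The dictionary encoding is a stand-in for the Prolog database, and a real Prolog engine would also backtrack over alternative clauses:

```python
# One-step resolution over stand-ins for rules (1) and (2).
RULES = {
    ('flicker', 'screen', 'occasional'): ('check', 'cables'),
    ('flicker', 'screen', 'permanent'): ('switch_off', 'fridge'),
}
KB_PREDICATES = {head[0] for head in RULES}          # here: {'flicker'}

def infer(anchors, goal):
    if not set(anchors) & KB_PREDICATES:
        return None            # no anchor maps to the domain knowledge
    return RULES.get(goal)     # one-step resolution; Prolog would backtrack

print(infer(['flicker', 'screen'], ('flicker', 'screen', 'occasional')))
# → ('check', 'cables')
```

The returned head-like term is then fed to the generator, which treats its predicates as anchors of lexicalized TAGs.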

2.4 Automated text generation

The generation of the system's response is based on an integrated natural language generation system (cf. [Harbusch and Woch, 2002] in these proceedings). Basically, in an integrated or uniform generation system the linguistic knowledge is represented in the same formalism as the domain-specific knowledge, i.e. the what-to-say component, and runs on the same processing unit. A main advantage of such a system is that negotiation strategies on revisions can easily be imposed. This means that any communication between generation components is modeled implicitly by the overall decision making and backtracking mechanisms according to competing rules taken from the individual knowledge bases (i.e. no explicit communication language is required).

In this approach the rules that cause the production of sentence-initial elements, i.e. rules that are left-branching, are collected on a lower level than right-branching rules. If a specific solution, which applied rules on a lower level, cannot continue with a new piece of input then, according to the general strategy, the more fine-grained rules are tried before more general decisions are backtracked. Thus, an overall solution is found with as few revisions as possible. Therefore, as in hierarchical constraint satisfaction systems, a fine-grained hierarchy of rules is assumed within any component. This means that any rule belongs to a hierarchical level of the respective component, indicating how general or specific the rule is. According to these levels the granularity of rules becomes comparable. The generation process tries to find an optimal solution by satisfying as many general rules across the components as possible. In cases of backtracking, more fine-grained rules are revised before more general rules are considered. The definition of these hierarchies is done by the provider and leads to differently behaving systems.
A strictly sequential model for conceptualization, micro-planning and sentence formulation results from defining three hierarchical levels. All conceptual rules are put on the most general level, all micro-planning rules on the second level, and all sentence formulation rules comprise the set of the most fine-grained rules. Hence, the overall system will first apply all conceptualization rules, followed by all applicable micro-planning rules, and finally the syntactic shaping is done.2
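The three-level strategy can be illustrated with toy rules: a depth-first search applies one rule per level and, on a dead end, exhausts the fine-grained formulation alternatives before revising a micro-planning or conceptual choice. The rule names and the acceptability constraint below are hypothetical:

```python
# Level-wise generation sketch: level 0 = conceptualization,
# level 1 = micro-planning, level 2 = sentence formulation.
# Depth-first search revises the most fine-grained choices first.

LEVELS = [
    [lambda s: s + ['greet-then-instruct'], lambda s: s + ['instruct-only']],
    [lambda s: s + ['one-sentence'], lambda s: s + ['two-sentences']],
    [lambda s: s + ['imperative'], lambda s: s + ['declarative']],
]

def acceptable(plan):
    # toy constraint: 'greet-then-instruct' cannot be realized in one sentence
    return not ('greet-then-instruct' in plan and 'one-sentence' in plan)

def generate(state, depth=0):
    if depth == len(LEVELS):
        return state if acceptable(state) else None
    for rule in LEVELS[depth]:              # try alternatives on this level
        result = generate(rule(state), depth + 1)
        if result is not None:
            return result                   # deeper (finer) levels were
    return None                             # exhausted before this one revises

print(generate([]))
# → ['greet-then-instruct', 'two-sentences', 'imperative']
```

Note that both formulation alternatives are tried and rejected before the micro-planning choice 'one-sentence' is revised, while the conceptual decision is never reopened, which is exactly the "as few revisions as possible" behavior described above.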

[Figure: TAG trees for rule (1) - the NP trees I2 (anchor "screen", feature R=1) and I3 (R=1), an S REQ tree, VP trees anchored by "flicker occasionally" (R=1), and the NP tree I6 (anchor "fridge", feature R=2)]
Fig. 4. A grammar fragment for the surface realization of rule (1)

Fig. 4 shows the grammar for the mapping of rule (1) to "If the screen flickers then check the cables". The features for the syntactic alignment of number, gender, case etc. are omitted in the picture. After gathering all relevant trees according to the mapping of rule (1), the generator tries to build up a structure by applying any tree until none is applicable anymore (cf. [Harbusch and Woch, 2002]). Although each tree is tried, I6 fails because of the failing unification of the attached feature R, which is responsible for the correct selection according to the logical input. Thus, "If the fridge flickers occasionally then check the screen" is prevented.3

In summary, the generation component is able to perform conceptualization and formulation tasks in an integrated approach. The advantage, besides those for the generation process itself, is the relinquishment of an explicit dialogue graph, whose poor flexibility vis-à-vis modifying the domain knowledge is a well-known problem.
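The R-feature check that blocks the wrong combination can be sketched as follows; the tree records are hypothetical simplifications of the LTAG structures in Fig. 4:

```python
# Each candidate tree carries the feature R tying it to a rule; trees only
# combine when their R values unify (toy equality check standing in for
# feature unification).

TREES = [
    {'name': 'I2', 'anchor': 'screen', 'R': 1},
    {'name': 'I6', 'anchor': 'fridge', 'R': 2},
    {'name': 'VP', 'anchor': 'flicker occasionally', 'R': 1},
]

def combinable(t1, t2):
    return t1['R'] == t2['R']          # unification of the R feature

vp = TREES[2]
for np in TREES[:2]:
    print(np['anchor'], combinable(np, vp))
# screen True  -- I2 unifies with the VP (both R=1)
# fridge False -- I6 fails, so the wrong surface form is prevented
```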


2 Another strategy is incrementality. In its simplest case, no lattice is specified at all, i.e. any rule is as general as another, and so any rule of any component is applied as soon as possible. Thus, parts of an utterance can be completely formulated out of already available constituents while some parts still undergo sentential planning and some constituents have not yet been handed to the system. As known from incremental systems, already-made local decisions about a prefix of the whole utterance can lead to dead-end situations which cannot be resolved without rule revision. Those situations are fixed by trying other rules of the same level before higher-level rules are revised.
3 The abdication of such features is possible, but one would be forced to write less decomposed and therefore more redundant grammars, which eventually culminates in highly specialized trees for each and every rule.

3 Conclusions and future work

In this paper, we have developed an architecture for a natural language driven automated help desk by addressing and integrating its specific requirements. The problem of providing a less restrictive, more user-initiative input has been tackled in two ways:

– A neural network captures the problem of classifying the user input according to significance vectors specific to the domain knowledge.
– A supertagger supports the classification by considering parts of the sentence's structure. Completeness is not required at this stage, particularly if the user input is spoken language, which is more often incomplete and/or grammatically incorrect.

In general, [Litman et al., 1998] and [Boje et al., 1999] have shown that natural language interfaces do have a positive impact on user acceptance, which in turn is profitable for the supporters. Whether or not the supertagger may eventually replace the use of the neural network completely (and thereby remedy the need for its training) is part of our current research. For our simple domain, neural networks were serviceable, but their adjustment to other and probably bigger domains essentially requires a complete rewrite (on account of their topology, labeling and training) with unpredictable success.

The problem of generating system output is strongly related to the problem of representing knowledge. We have shown how domain knowledge (what to say) has been separated from linguistic and discourse knowledge (how to say it) and provided a mechanism for generating natural language sentences on the basis of the three of them. Switching the domain has little impact on the system. However, the problem of different lexicons per domain is not solved yet, but there is hope that over time the growth of the lexicon approaches zero.
As a side effect, the dialogue graph, a common component of other help desk systems which functions as a guide through the problem space of the domain, is realized implicitly, thereby remedying the problems associated with extending the domain knowledge:

– In the case of spoken output the new utterances are provided by the same "speaker", i.e. a speech synthesizer, and
– the dialogue model is not affected if the domain knowledge is extended.

Additionally, the consistency of the domain knowledge can be automatically checked by a theorem prover. Therefore, we expect a significant reduction in maintenance costs for companies. Whether or not the realization of such a system is profitable depends on the proportion of first support level requests in relation to the overall support burden of a company. However, the tight interconnection of Prolog clauses and supertags uses uncommon formalisms and will probably require some familiarization for adopters. As already mentioned above, this paper describes work in progress, so we do not yet have any third-party experience reports on the topic.

The system's architecture in general has been developed with the goal of modularity in mind. By simply switching the input and output modules the system can be adapted to

a wide range of different media. Thus, adapting the system to email-driven information systems, web-based chat applications, or telephony services affects neither the domain knowledge nor the system's inherent processing characteristics. Whether the output is printed or spoken is not just a matter of feeding a speech synthesizer: apart from the fact that high-quality speech synthesis is not yet at hand, so that it might be necessary to enrich the string with control sequences, there is evidence ([Nass and Lee, 2001]) that user acceptance is highly influenced by prosodic parameters. However, it has yet to be studied whether the analysis of the actually spoken user input suffices to parameterize the speech synthesis in real time to gain better user acceptance.

References

[Boje et al., 1999] Boje, J., Wirén, M., Rayner, M., Lewin, I., Carter, D., and Becket, R. (1999). Language-processing strategies and mixed-initiative dialogues. In Procs. of the IJCAI-99 Workshop on Knowledge and Reasoning in Practical Dialogue Systems.
[Elman, 1990] Elman, J. (1990). Finding structure in time. Cognitive Science, 14:179–211.
[Harbusch et al., 2001] Harbusch, K., Knapp, M., and Laumann, C. (2001). Modelling user-initiative in an automatic help desk system. In Procs. of the 6th Natural Language Processing Pacific Rim Symposium, Tokyo, Japan. In press.
[Harbusch and Woch, 2002] Harbusch, K. and Woch, J. (2002). Integrated natural language generation with schema-tree adjoining grammars. In Procs. of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Mexico City, Mexico. Springer-Verlag. Forthcoming.
[Joshi and Srinivas, 1994] Joshi, A. K. and Srinivas, B. (1994). Disambiguation of super parts of speech (or supertags): Almost parsing. In Nagao, M., editor, Procs. of the 15th International Conference on Computational Linguistics (COLING), volume 2, pages 154–160, Kyoto, Japan.
[Litman et al., 1998] Litman, D. J., Pan, S., and Kearns, M. S. (1998). Evaluating response strategies in a web-based spoken dialogue agent. In Procs. of the 36th Annual Meeting of the ACL and the 17th COLING, Montreal, Canada.
[Nass and Lee, 2001] Nass, C. and Lee, K. M. (2001). Does computer-generated speech manifest personality? An experimental test of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied. In press.
[Schabes et al., 1988] Schabes, Y., Abeillé, A., and Joshi, A. K. (1988). Parsing strategies with 'lexicalized' grammars. In Hajicova, E., editor, Procs. of the 12th International Conference on Computational Linguistics (COLING), volume 1, Budapest, Hungary.
[Shieber, 1993] Shieber, S. M. (1993). The problem of logical-form equivalence. Computational Linguistics, 19(1):179–190.
[Srinivas, 1997] Srinivas, B. (1997). Performance evaluation of supertagging for partial parsing. In Procs. of the 5th International Workshop on Parsing Technologies (IWPT), Boston, USA.
[Wermter, 1995] Wermter, S. (1995). Hybrid Connectionist Natural Language Processing. Chapman and Hall, International Thomson Computer Press, London, UK.