Fundamentals of Computer Programming with C - Intro Programming ...

Fundamentals of Computer Programming with C# (The Bulgarian C# Programming Book) by Svetlin Nakov & Co. http://www.introprogramming.info ISBN: 978-954-400-773-7 ISBN-13: 978-954-400-773-7 (9789544007737) ISBN-10: 954-400-773-3 (9544007733) Pages: 1122 Language: English Published: Sofia, 2013

Tags: book; free book; ebook; e-book; programming; computer programming; programming concepts; programming principles; tutorial; C#; John Smith"; age = 25;

Initialization of Variables The word initialization in programming means specifying an initial value. When setting value to variables at the time of their declaration we actually initialize them.

Default Variable Values Each Hello string."; int y = x; In the example we assign value 6 to the variable x. On the second line we assign a text literal to the variable helloString, and on the third line we copy the value of the variable x to the variable y.

Chapter 3. Operators and Expressions

149

Cascade Assignment The assignment operator can be used in cascade (more than once in the same expression). In this case assignments are carried out consecutively from right to left. Here’s an example:

int x, y, z; x = y = z = 25; On the first line in the example we initialize three variables and on the second line we assign them the value 25. The assignment operator in C# is "=", while the comparison operator is "==". The exchange of the two operators is a common error when we are writing code. Be careful not to confuse the comparison operator and the assignment operator as they look very similar.

Compound Assignment Operators Except the assignment operator there are also compound assignment operators. They help to reduce the volume of the code by typing two operations together with an operator: operation and assignment. Compound operators have the following syntax:

operand1 operator = operand2; The upper expression is like the following:

operand1 = operand1 operator operand2; Here is an example of a compound operator for assignment:

int x = 2; int y = 4; x *= y; // Same as x = x * y; Console.WriteLine(x); // 8 The most commonly used compound assignment operators are += (adds value of operand2 to operand1), -= (subtracts the value of the right operand from the value of the left one).Other compound assignment operators are *=, /= and %=. The following example gives a good idea of how the compound assignment operators work:

int x = 6;

150

Fundamentals of Computer Programming with C#

int y = 4; Console.WriteLine(y *= 2); int z = y = 3;

// 8 // y=3 and z=3

Console.WriteLine(z); Console.WriteLine(x |= 1); Console.WriteLine(x += 3); Console.WriteLine(x /= 2);

// // // //

3 7 10 5

In the example, first we create the variables x and y and assign them values 6 and 4. On the next line we print on the console y, after we have assigned it a new value using the operator *= and the literal 2.The result of the operation is 8. Further in the example we apply the other compound assignment operators and print the result on the console.

Conditional Operator ?: The conditional operator ?: uses the Boolean value of an expression to determine which of two other expressions must be calculated and returned as a result. The operator works on three operands and that is why it is called ternary operator. The character "?" is placed between the first and second operand, and ":" is placed between the second and third operand. The first operand (or expression) must be Boolean, and the next two operands must be of the same type, such as numbers or strings. The operator ?: has the following syntax:

operand1 ? operand2 : operand3 It works like this: if operand1 is set to true, the operator returns as a result operand2. Otherwise (if operand1 is set to false), the operator returns as a result operand3. During the execution, the value of the first argument is calculated. If it has value true, then the second (middle) argument is calculated and it is returned as a result. However, if the calculated result of the first argument is false, then the third (last) argument is calculated and it is returned as a result.

Conditional Operator "?:" – Example The following example shows the usage of the operator "?:":

int a = 6; int b = 4; Console.WriteLine(a > b ? "a>b" : "bPeter 21 GamesC# Java Sample output:

Peter 21 Games C# Java 11. Write a program that deletes all words that begin with the word "test". The words will contain only the following chars: 0…9, a…z, A…Z. 12. A text file words.txt is given, containing a list of words, one per line. Write a program that deletes in the file text.txt all the words that occur in the other file. Catch and handle all possible exceptions. 13. Write a program that reads a list of words from a file called words.txt, counts how many times each of these words is found in another file text.txt, and records the results in a third file – result.txt, but before that, sorts them by the number of occurrences in descending order. Handle all possible exceptions.

638


Solutions and Guidelines 1.

Use the examples discussed in this chapter. Use the using structure to ensure proper closing of the input and the resulting stream.

2.

You will have to first read the input file line by line and save it in the resulting file in overwrite mode. Then you must open the second input file and save it line by line in the result file in append mode. To create a StreamWriter in overwrite / use mode use the appropriate constructor (find it in MSDN). An alternative way is to read both files in a string with ReadToEnd(), store them in memory and save them in the resulting file. This approach does not work for large files (the likes of several gigabytes).

3.

Follow the examples in this chapter. Think of how you would solve the task if the file were large (several gigabytes).

4.

Follow the examples in this chapter. You will have to open both files simultaneously and read them line by line in a loop. If you reach the end of the (read null) file, that does not match the other’s, that means that both files have different number of rows and an exception should be thrown.

5.

Read the first line of the file and create a matrix with the specified size. After that read the following lines, line by line and separate the numbers. Then save them in the corresponding (row, column) in the matrix. Finally, find the sub-matrix using two nested loops.

6.

Write each read name in a list (List), then sort it properly (look for information on the method Sort()) and then print it in the result file.

7.

Read the file line by line and use the methods of the class String. If you load the entire file into memory, instead of reading it line by line, problems when loading large files might occur.

8.

For every occurrence of “start”, check if that is the whole word or just a part of it.

9.

Look at the examples in this chapter.

10. Read the input file character by character. When you encounter a "", that means a closing tag. All characters you encounter outside of the tags build up the text that must be extracted. You can store it in StringBuilder and print its contents when you encounter "the type of the values in nodes /// public class TreeNode { // Contains the value of the node private T value; // Shows whether the current node has a parent or not private bool hasParent; // Contains the children of the node (zero or more) private List children; /// Constructs a tree node /// the value of the node public TreeNode(T value) { if (value == null) { throw new ArgumentNullException( "Cannot insert null value!"); } this.value = value; this.children = new List(); } /// The value of the node public T Value

Chapter 17. Trees and Graphs

685

{ get { return this.value; } set { this.value = value; } } /// The number of node's children public int ChildrenCount { get { return this.children.Count; } } /// Adds child to the node /// the child to be added public void AddChild(TreeNode child) { if (child == null) { throw new ArgumentNullException( "Cannot insert null value!"); } if (child.hasParent) { throw new ArgumentException( "The node already has a parent!"); } child.hasParent = true; this.children.Add(child); } /// /// /// ///

Gets the child of the node at given index the index of the desired child

686


/// the child on the given position public TreeNode GetChild(int index) { return this.children[index]; } } /// Represents a tree >the type of the values in the /// tree public class Tree { // The root of the tree private TreeNode root; /// Constructs the tree /// the value of the node public Tree(T value) { if (value == null) { throw new ArgumentNullException( "Cannot insert null value!"); } this.root = new TreeNode(value); } /// Constructs the tree /// the value of the root node /// the children of the root /// node public Tree(T value, params Tree[] children) : this(value) { foreach (Tree child in children) { this.root.AddChild(child.root); } } /// /// The root node or null if the tree is empty ///


public TreeNode Root { get { return this.root; } } /// Traverses and prints tree in /// Depth-First Search (DFS) manner /// the root of the tree to be /// traversed /// the spaces used for /// representation of the parent-child relation private void PrintDFS(TreeNode root, string spaces) { if (this.root == null) { return; } Console.WriteLine(spaces + root.Value); TreeNode child = null; for (int i = 0; i < root.ChildrenCount; i++) { child = root.GetChild(i); PrintDFS(child, spaces + " "); } } /// Traverses and prints the tree in /// Depth-First Search (DFS) manner public void TraverseDFS() { this.PrintDFS(this.root, string.Empty); } } /// /// Shows a sample usage of the Tree class /// public static class TreeExample {

687

688


static void Main() { // Create the tree from the sample Tree tree = new Tree(7, new Tree(19, new Tree(1), new Tree(12), new Tree(31)), new Tree(21), new Tree(14, new Tree(23), new Tree(6)) ); // Traverse and print the tree using Depth-First-Search tree.TraverseDFS(); // Console output: // 7 // 19 // 1 // 12 // 31 // 21 // 14 // 23 // 6 } } How Does Our Implementation Work? Let’s discuss the given code a little. In our example we have a class Tree, which implements the actual tree. We also have a class TreeNode, which represents a single node of the tree. The functions associated with node, like creating a node, adding a child node to this node, and getting the number of children, are implemented at the level of TreeNode. The rest of the functionality (traversing the tree for example) is implemented at the level of Tree. Logically dividing the functionality between the two classes makes our implementation more flexible. The reason we divide the implementation in two classes is that some operations are typical for each separate node (adding a child for example),


689

while others are about the whole tree (searching a node by its number). In this variant of the implementation, the tree is a class that knows its root and each node knows its children. In this implementation we can have an empty tree (when root = null). Here are some details about the TreeNode implementation. Each node of the tree consists of private field value and a list of children – children. The list of children consists of elements of the same type. That way each node contains a list of references to its direct children. There are also public properties for accessing the values of the fields of the node. The methods that can be called from code outside the class are: - AddChild(TreeNode child) – adds a child - TreeNode GetChild(int index) – returns a child by given index - ChildrenCount – returns the number of children of certain node To satisfy the condition that every node has only one parent we have defined private field hasParent, which determines whether this node has parent or not. This information is used only inside the class and we need it in the AddChild(Tree child) method. Inside this method we check whether the node to be added already has parent and if so we throw and exception, saying that this is impossible. In the class Tree we have only one get property TreeNode Root, which returns the root of the tree.

Depth-First-Search (DFS) Traversal In the class Tree is implemented the method TraverseDFS(), that calls the private method PrintDFS(TreeNode root, string spaces) , which traverses the tree in depth and prints on the standard output its elements in tree layout using right displacement (adding spaces). The Depth-First-Search algorithm aims to visit each of the tree nodes exactly one. Such a visit of all nodes is called tree traversal. There are multiple algorithms to traverse a tree but in this chapter we will discuss only two of them: DFS (depth-first search) and BFS (breadth-first search). The DFS algorithm starts from a given node and goes as deep in the tree hierarchy as it can. When it reaches a node, which has no children to visit or all have been visited, it returns to the previous node. We can describe the depth-first search algorithm by the following simple steps: 1. Traverse the current node (e.g. print it on the console or process it in some way). 2. Sequentially traverse recursively each of the current nodes’ child nodes (traverse the sub-trees of the current node). This can be done by a recursive call to the same method for each child node.

690


Creating a Tree We to make creating a tree easier we defined a special constructor, which takes for input parameters a node value and a list of its sub-trees. That allows us to give any number of arguments of type Tree (sub-trees). We used exactly the same constructor for creating the example tree.

Traverse the Hard Drive Directories Let’s start with another example of tree: the file system. Have you noticed that the directories on your hard drive are actually a hierarchical structure, which is a tree? We have folders (tree nodes) which may have child folders and files (which both are also tree nodes). You can think of many real life examples, where trees are used, right? Let’s get a more detailed view of Windows file system. As we know from our everyday experience, we create folders on the hard drive, which can contain subfolders and files. Subfolders can also contain subfolders and so on until you reach certain max depth limit. The directory tree of the file system is accessible through the build in .NET functionality: the class System.IO.DirectoryInfo. It is not present as a >the directory to be traversed /// the spaces used for representation /// of the parent-child relation private static void TraverseDir(DirectoryInfo dir, string spaces) { // Visit the current directory Console.WriteLine(spaces + dir.FullName); DirectoryInfo[] children = dir.GetDirectories(); // For each child go and visit its sub-tree foreach (DirectoryInfo child in children) { TraverseDir(child, spaces + " "); } } /// /// Traverses and prints given directory recursively /// /// the path to the directory /// which should be traversed static void TraverseDir(string directoryPath) { TraverseDir(new DirectoryInfo(directoryPath), string.Empty); } static void Main()

691

692


{ TraverseDir("C:\\"); } } As we can see the recursive traversal algorithm of the content of the directory is the same as the one we used for our tree. Here we can see part of the result of the traversal:

C:\ C:\Config.Msi C:\Documents and Settings C:\Documents and Settings\Administrator C:\Documents and Settings\Administrator\.ARIS70 C:\Documents and Settings\Administrator\.jindent C:\Documents and Settings\Administrator\.nbi C:\Documents and Settings\Administrator\.nbi\downloads C:\Documents and Settings\Administrator\.nbi\log C:\Documents and Settings\Administrator\.nbi\cache C:\Documents and Settings\Administrator\.nbi\tmp C:\Documents and Settings\Administrator\.nbi\wd C:\Documents and Settings\Administrator\.netbeans C:\Documents and Settings\Administrator\.netbeans\6.0 … Note that the above program may crash with UnauthorizedAccessException in case you do not have access permissions for some folders on the hard disk. This is typical for some Windows installations so you could start the traversal from another directory to play with it, e.g. from "C:\Windows\assembly".

Breadth-First Search (BFS) Let’s have a look at another way of traversing trees. Breadth-First Search (BFS) is an algorithm for traversing branched >the path to the directory /// which should be traversed static void TraverseDir(string directoryPath) { Queue visitedDirsQueue = new Queue(); visitedDirsQueue.Enqueue(new DirectoryInfo(directoryPath)); while (visitedDirsQueue.Count > 0) { DirectoryInfo currentDir = visitedDirsQueue.Dequeue(); Console.WriteLine(currentDir.FullName); DirectoryInfo[] children = currentDir.GetDirectories(); foreach (DirectoryInfo child in children) { visitedDirsQueue.Enqueue(child); } } } static void Main() { TraverseDir(@"C:\");

694


} } If we start the program to traverse our local hard disk, we will see that the BFS first visits the directories closest to the root (depth 1), then the folders at depth 2, then depth 3 and so on. Here is a sample output of the program:

C:\ C:\Config.Msi C:\Documents and C:\Inetpub C:\Program Files C:\RECYCLER C:\System Volume C:\WINDOWS C:\wmpub C:\Documents and C:\Documents and C:\Documents and …

Settings

Information Settings\Administrator Settings\All Users Settings\Default User

Binary Trees In the previous section we discussed the basic structure of a tree. In this section we will have a look at a specific type of tree – binary tree. This type of tree turns out to be very useful in programming. The terminology for trees is also valid about binary trees. Despite that below we will give some specific explanations about thus structure. Binary Tree – a tree, which nodes have a degree equal or less than 2 or we can say that it is a tree with branching degree of 2. Because every node’s children are at most 2, we call them left child and right child. They are the roots of the left sub-tree and the right sub-tree of their parent node. Some nodes may have only left or only right child, not both. Some nodes may have no children and are called leaves. Binary tree can be recursively defined as follows: a single node is a binary tree and can have left and right children which are also binary trees.

Binary Tree – Example Here we have an example of binary tree. The nodes are again named with some numbers. An the figure we can see the root of the tree – "14", the left sub-tree (with root 19) and the right sub-tree (with root 15) and a right and left child – "3" and "21".


695

Root node Right child

17

Left subtree 9 6

Right child

15 5

8

10 Left child

We have to note that there is one very big difference in the definition of binary tree from the definition of the classical tree – the order of the children of each node. The next example will illustrate that difference:

19

19

23

23

On this figure above two totally different binary trees are illustrated – the first one has root "19" and its left child "23" and the second root "19" and right child "23". If that was an ordinary tree they would have been the same. That’s why such tree we would illustrate the following way:

19

23 Remember! Although we take binary trees as a special case of a tree structure, we have to notice that the condition for particular order of children nodes makes them a completely different structure.

Binary Tree Traversal The traversal of binary tree is a classic problem which has classical solutions. Generally there are few ways to traverse a binary tree recursively:

696


- In-order (Left-Root-Right) – the traversal algorithm first traverses the left sub-tree, then the root and last the left sub-tree. In our example the sequence of such traversal is: "23", "19", "10", "6", "21", "14", "3", "15". - Pre-order (Root-Left-Right) – in this case the algorithm first traverses the root, then the left sub-tree and last the right sub-tree. The result of such traversal in our example is: "14", "19", "23", "6", "10", "21", "15", "3". - Post-order (Left-Right-Root) – here we first traverse the left subtree, then the right one and last the root. The result after the traversal is: "23", "10", "21", "6", "19", "3", "15", "14".

Recursive Traversal of Binary Tree – Example The next example shows an implementation of binary tree, which we will traverse using the in-order recursive scheme.

using System; using System.Collections.Generic; /// Represents a binary tree /// Type of values in the tree public class BinaryTree { /// The value stored in the curent node public T Value { get; set; } /// The left child of the current node public BinaryTree LeftChild { get; private set; } /// The right child of the current node public BinaryTree RightChild { get; private set; } /// Constructs a binary tree /// the value of the tree node /// the left child of the tree /// the right child of the tree /// public BinaryTree(T value, BinaryTree leftChild, BinaryTree rightChild) { this.Value = value; this.LeftChild = leftChild; this.RightChild = rightChild; }


697

/// Constructs a binary tree with no children /// /// the value of the tree node public BinaryTree(T value) : this(value, null, null) { } /// Traverses the binary tree in pre-order public void PrintInOrder() { // 1. Visit the left child if (this.LeftChild != null) { this.LeftChild.PrintInOrder(); } // 2. Visit the root of this sub-tree Console.Write(this.Value + " "); // 3. Visit the right child if (this.RightChild != null) { this.RightChild.PrintInOrder(); } } } /// /// Demonstrates how the BinaryTree class can be used /// public class BinaryTreeExample { static void Main() { // Create the binary tree from the sample BinaryTree binaryTree = new BinaryTree(14, new BinaryTree(19, new BinaryTree(23), new BinaryTree(6, new BinaryTree(10), new BinaryTree(21))), new BinaryTree(15,

698


new BinaryTree(3), null)); // Traverse and print the tree in in-order manner binaryTree.PrintInOrder(); Console.WriteLine(); // Console output: // 23 19 10 6 21 14 3 15 } } How Does the Example Work? This implementation of binary tree is slightly different from the one of the ordinary tree and is significantly simplified. We have a recursive class definition BinaryTree, which holds a value and left and right child nodes which are of the same type BinaryTree. We have exactly two child nodes (left and right) instead of list of children. The method PrintInOrder() works recursively using the DFS algorithm. It traverses each node in "in-order" (first the left child, then the node itself, then the right child). The DFS traversal algorithm performs the following steps: 1. Recursive call to traverse the left sub-tree of the given node. 2. Traverse the node itself (print its value). 3. Recursive call to traverse the right sub-tree. We highly recommend the reader to try and modify the algorithm and the source code of the given example to implement the other types of binary tree traversal of binary (pre-order and post-order) and see the difference.

Ordered Binary Search Trees Till this moment we have seen how we can build traditional and binary trees. These structures are very summarized in themselves and it will be difficult for us to use them for a bigger project. Practically, in computer science special and programming variants of binary and ordinary trees are used that have certain special characteristics, like order, minimal depth and others. Let's review the most important trees used in programming. As examples for a useful properties we can give the ability to quickly search of an element by given value (Red-Black tree); order of the elements in the tree (ordered search trees); balanced depth (balanced trees); possibility to store an ordered tree in a persistent storage so that searching of an element to be fast with as little as possible read operations (B-tree), etc.


699

In this chapter we will take a look at a more specific class of binary trees – ordered trees. They use one often met property of the nodes in the binary trees – unique identification key in every node. Important property of these keys is that they are comparable. Important kind of ordered trees are the so called "balanced search trees".

Comparability between Objects Before continuing, we will introduce the following definition, which we will need for the further exposure. Comparability – we call two objects A and B comparable, if exactly one of following three dependencies exists: - "A is less than B" - "A is bigger than B" - "A is equal to B" Similarly we will call two keys A and B comparable, if exactly one of the following three possibilities is true: A < B, A > B or A = B. The nodes of a tree can contain different fields but we can think about only their unique keys, which we want to be comparable. Let’s give an example. We have two specific nodes A and B:

A

B 19

7

In this case, the keys of A and B hold the integer numbers 19 and 7. From Mathematics we know that the integer numbers (unlike the complex numbers) are comparable, which according the above reasoning give us the right to use them as keys. That’s why we can say that “A is bigger than B”, because “19 is bigger than 17”. Please notice! In this case the numbers depicted on the nodes are their unique identification keys and not like before, just some numbers. And we arrive to the definition of the ordered binary search tree: Ordered Binary Tree (binary search tree) is a binary tree, in which every node has a unique key, every two of the keys are comparable and the tree is organized in a way that for every node the following is satisfied: - All keys in the left sub-tree are smaller than its key. - All keys in the right sub-tree are bigger than its key.

700


Properties of the Ordered Binary Search Trees On the figure below we have given an example of an ordered binary search tree. We will use this example, to give some important properties of the binary tree’s order:

By definition we know that the left sub-tree of every node consists only of elements, which are smaller than itself, while in the right sub-tree there are only bigger elements. This means that if we want to find a given element, starting from the root, either we have found it or should search it respectively in its left or its right sub-tree, which will save unnecessary comparisons. For example, if we search 23 in our tree, we are not going to search for it in the left sub-tree of 19, because 23 is not there for sure (23 is bigger than 19, so eventually it is in the right sub-tree). This saves us 5 unnecessary comparisons with each of the left sub-tree elements, but if we were using a linked list, we would have to make these 5 comparisons. From the elements’ order follows that the smallest element in the tree is the leftmost successor of the root, if there is such or the root itself, if it does not have a left successor. In our example this is the minimal element 7 and the maximal – 35. Next useful property from this is, that every single element from the left sub-tree of given node is smaller than every single element from the right sub-tree of the same node.

Ordered Binary Search Trees – Example The next example shows a simple implementation of a binary search tree. Our point is to suggest methods for adding, searching and removing an element in the tree. For every single operation from the above, we will give an explanation in details. Note that our binary search tree is not balanced and may have poor performance in certain circumstances.


701

Ordered Binary Search Trees: Implementation of the Nodes Just like before, now we will define an internal class, which will describe a node’s structure. Thus we will clearly distinguish and encapsulate the structure of a node, which our tree will contain within itself. This separate class BinaryTreeNode that we have defined as internal is visible only in the ordered tree’s class. Here is its definition:

BinaryTreeNode.cs … /// Represents a binary tree node /// Specifies the type for the values /// in the nodes internal class BinaryTreeNode : IComparable where T : IComparable { // Contains the value of the node internal T value; // Contains the parent of the node internal BinaryTreeNode parent; // Contains the left child of the node internal BinaryTreeNode leftChild; // Contains the right child of the node internal BinaryTreeNode rightChild; /// Constructs the tree node /// The value of the tree node public BinaryTreeNode(T value) { if (value == null) { // Null values cannot be compared -> do not allow them throw new ArgumentNullException( "Cannot insert null value!"); } this.value = value; this.parent = null; this.leftChild = null; this.rightChild = null;

702


} public override string ToString() { return this.value.ToString(); } public override int GetHashCode() { return this.value.GetHashCode(); } public override bool Equals(object obj) { BinaryTreeNode other = (BinaryTreeNode)obj; return this.CompareTo(other) == 0; } public int CompareTo(BinaryTreeNode other) { return this.value.CompareTo(other.value); } } … Let’s have a look to the proposed code. Still in the name of the structure, which we are considering – “ordered search tree”, we are talking about order and we can achieve this order only if we have comparability among the elements in the tree.

Comparability between Objects in C# What does “comparability between objects” mean for us as developers? It means that we must somehow oblige everyone who uses our >The type of the nodes internal class BinaryTreeNode : IComparable where T : IComparable { // … // … The implementation from above comes here!!! … // … } /// /// The root of the tree /// private BinaryTreeNode root; /// /// Constructs the tree /// public BinarySearchTree() { this.root = null; } // … // … The implementation of tree operations come here!!! … // … } As we mentioned above, now we will examine the following operations: - insert an element; - searching for an element; - removing an element.


705

Inserting an Element Inserting (or adding) an element in a binary search tree means to put a new element somewhere in the tree so that the tree must stay ordered. Here is the algorithm: if the tree is empty, we add the new element as a root. Otherwise: - If the element is smaller than the root, we call recursively the same method to add the element in the left sub-tree. - If the element is bigger than the root, we call recursively to the same method to add the element in the right sub-tree. - If the element is equal to the root, we don’t do anything and exit from the recursion. We can clearly see how the algorithm for inserting a node, conforms to the rule “elements in the left sub-tree are less than the root and the elements in the right sub-tree are bigger than the root”. Here is a sample implementation of this method. You should notice that in the addition there is a reference to the parent, which is supported because the parent must be changed too.

/// Inserts new value in the binary search tree /// /// the value to be inserted public void Insert(T value) { this.root = Insert(value, null, root); } /// /// Inserts node in the binary search tree by given value /// /// the new value /// the parent of the new node /// current node /// the inserted node private BinaryTreeNode Insert(T value, BinaryTreeNode parentNode, BinaryTreeNode node) { if (node == null) { node = new BinaryTreeNode(value); node.parent = parentNode; } else { int compareTo = value.CompareTo(node.value);

706


if (compareTo < 0) { node.leftChild = Insert(value, node, node.leftChild); } else if (compareTo > 0) { node.rightChild = Insert(value, node, node.rightChild); } } return node; } Searching for an Element Searching in a binary search tree is an operation which is more intuitive. In the sample code we have shown how the search of an element can be done without recursion and with iteration instead. The algorithm starts with element node, pointing to the root. After that we do the following: - If the element is equal to node, we have found the searched element and return it. - If the element is smaller than node, we assign to node its left successor, i.e. we continue the searching in the left sub-tree. - If the element is bigger than node, we assign to node its right successor, i.e. we continue the searching in the right sub-tree. At the end, the algorithm returns the found node or null if there is no such node in the tree. Additionally we define a Boolean method that checks if certain value belongs to the tree. Here is the sample code:

/// Finds a given value in the tree and /// return the node which contains it if such exsists /// /// the value to be found /// the found node or null if not found private BinaryTreeNode Find(T value) { BinaryTreeNode node = this.root; while (node != null) { int compareTo = value.CompareTo(node.value); if (compareTo < 0)


707

{ node = node.leftChild; } else if (compareTo > 0) { node = node.rightChild; } else { break; } } return node; } /// Returns whether given value exists in the tree /// /// the value to be checked /// true if the value is found in the tree public bool Contains(T value) { bool found = this.Find(value) != null; return found; } Removing an Element Removing is the most complicated operation from the basic binary search tree operations. After it the tree must keep its order. The first step before we remove an element from the tree is to find it. We already know how it happens. After that, we have 3 cases: - If the node is a leaf – we point its parent’s reference to null. If the element has no parent, it means that it is a root and we just remove it. - If the node has only one sub-tree – left or right, it is replacing with the root of this sub-tree. - The node has two sub-trees. Then we have to find the smallest node in the right sub-tree and swap with it. After this exchange the node will have one sub-tree at most and then we remove it grounded on some of the above two rules. Here we have to say that it can be done analogical swap, just that we get the left sub-tree and it is the biggest element. We leave to the reader to check the correctness of these three steps, as a little exercise.

708


Now, let’s see a sample removal in action. Again we will use our ordered tree, which we have displayed at the beginning of this point. For example, let’s remove the element with key 11.

The node 11 has two sub-trees and according to our algorithm, it must be exchanged with the smallest element from the right sub-tree, i.e. with 13. After the exchange, we can remove 11 (it is a leaf). Here is the final result:

19

35

13

7

23

16

11

17

Below is the sample code, which implements the described algorithm:


/// Removes an element from the tree if exists /// /// the value to be deleted public void Remove(T value) { BinaryTreeNode nodeToDelete = Find(value); if (nodeToDelete != null) { Remove(nodeToDelete); } } private void Remove(BinaryTreeNode node) { // Case 3: If the node has two children. // Note that if we get here at the end // the node will be with at most one child if (node.leftChild != null && node.rightChild != null) { BinaryTreeNode replacement = node.rightChild; while (replacement.leftChild != null) { replacement = replacement.leftChild; } node.value = replacement.value; node = replacement; } // Case 1 and 2: If the node has at most one child BinaryTreeNode theChild = node.leftChild != null ? node.leftChild : node.rightChild; // If the element to be deleted has one child if (theChild != null) { theChild.parent = node.parent; // Handle the case when the element is the root if (node.parent == null) { root = theChild; } else {

709

710


// Replace the element with its child sub-tree if (node.parent.leftChild == node) { node.parent.leftChild = theChild; } else { node.parent.rightChild = theChild; } } } else { // Handle the case when the element is the root if (node.parent == null) { root = null; } else { // Remove the element - it is a leaf if (node.parent.leftChild == node) { node.parent.leftChild = null; } else { node.parent.rightChild = null; } } } } We add also a DFS traversal method to enable printing the values stored in the tree in ascending order (in-order):

/// Traverses and prints the tree public void PrintTreeDFS() { PrintTreeDFS(this.root); Console.WriteLine(); } /// Traverses and prints the ordered binary search tree


711

/// tree starting from given root node. /// the starting node private void PrintTreeDFS(BinaryTreeNode node) { if (node != null) { PrintTreeDFS(node.leftChild); Console.Write(node.value + " "); PrintTreeDFS(node.rightChild); } } Finally we demonstrate our ordered binary search tree in action:

class BinarySearchTreeExample { static void Main() { BinarySearchTree tree = new BinarySearchTree(); tree.Insert("Telerik"); tree.Insert("Google"); tree.Insert("Microsoft"); tree.PrintTreeDFS(); // Google Microsoft Telerik Console.WriteLine(tree.Contains("Telerik")); // True Console.WriteLine(tree.Contains("IBM")); // False tree.Remove("Telerik"); Console.WriteLine(tree.Contains("Telerik")); // False tree.PrintTreeDFS(); // Google Microsoft } } Note that when we print our binary search tree, it is always sorted in ascending order (in our case in alphabetical order). Thus in our example the binary search tree of strings behaves like a set of strings (we will explain the "Set" >number of vertices public Graph(int size) { this.childNodes = new List[size]; for (int i = 0; i < size; i++) { // Assing an empty list of adjacents for each vertex this.childNodes[i] = new List(); } } /// Constructs a graph by given list of /// child nodes (successors) for each vertex /// children for each node public Graph(List[] childNodes) { this.childNodes = childNodes; } /// /// Returns the size of the graph (number of vertices) /// public int Size { get { return this.childNodes.Length; } } /// Adds new edge from u to v


719

/// the starting vertex /// the ending vertex public void AddEdge(int u, int v) { childNodes[u].Add(v); } /// Removes the edge from u to v if such exists /// /// the starting vertex /// the ending vertex public void RemoveEdge(int u, int v) { childNodes[u].Remove(v); } /// /// Checks whether there is an edge between vertex u and v /// /// the starting vertex /// the ending vertex /// true if there is an edge between /// vertex u and vertex v public bool HasEdge(int u, int v) { bool hasEdge = childNodes[u].Contains(v); return hasEdge; } /// Returns the successors of a given vertex /// /// the vertex /// list of all successors of vertex v public IList GetSuccessors(int v) { return childNodes[v]; } } To illustrate how our graph >the type of the keys /// the type of the values public struct KeyValuePair { /// Holds the key of the key-value pair public TKey Key { get; private set; } /// Holds the value of the key-value pair public TValue Value { get; private set; } /// Constructs a pair by given key + value public KeyValuePair(TKey key, TValue value) : this() { this.Key = key; this.Value = value; } /// Converts the key-value pair to a printable text. /// public override string ToString() {

Chapter 18. Dictionaries, Hash-Tables and Sets

749

StringBuilder builder = new StringBuilder(); builder.Append('['); if (this.Key != null) { builder.Append(this.Key.ToString()); } builder.Append(", "); if (this.Value != null) { builder.Append(this.Value.ToString()); } builder.Append(']'); return builder.ToString(); } } The class constructor has two parameters: key of type TKey and value of type TValue. There are defined two properties: one to access the key ( Key) and another to access the value (Value). Note that these properties can only access the related members. There is no public functionality to change the key or value. This makes the class non-changeable (immutable). It is a good idea, because the objects, which will be kept inside the dictionary implementation, will be the same as these we will return as a result of a method for taking all the ordered pairs in the dictionary, for instance. We have redefined the ToString() method in order to be able to easily print key-value pairs on the standard console output or in a text file. Following is an example of a generic dictionary interface, which defines the most common operations of the >Key type /// Value type public interface IDictionary : IEnumerable { ///Finds the value mapped to the given key /// the key to be searched /// value for the specified key if it presents, /// or null if there is no value with such key V Get(K key);

750


/// Assigns the specified value to the specified key /// in the dictionary. If the key already exists, its value is /// replaced with the new value and the old value is returned /// /// Key for the new value /// Value to be mapped to that key /// the old (replaced) value for the specified key /// or null if the key does not exist V Set(K key, V value); /// Gets or sets the value of the entry in the /// dictionary identified by the key specified /// A new entry will be created if the value is set /// for a key not currently in the dictionary /// the key to identify the entry /// the value of the entry in the dictionary /// identified by the provided key V this[K key] { get; set; } /// Removes an element in the dictionary identified /// by a specified key /// the key identifying the element to be /// removed /// whether the element was removed or not bool Remove(K key); /// Returns the number of entries in the dictionary /// int Count { get; } /// Removes all the elements from the dictionary /// void Clear(); } In the above defined interface as well as in the previous class, we use generics (template types), by which we define the parameters for the keys (K) and values (V). Such implementation allows us to use various /> interface /// using a hash table. Collisions are resolved by chaining. /// /// Type of the keys. Keys are required /// to correctly implement Equals() and GetHashCode() /// /// Type of the values public class HashDictionary : IDictionary, IEnumerable { private const int DEFAULT_CAPACITY = 16; private const float DEFAULT_LOAD_FACTOR = 0.75f; private List[] table; private float loadFactor; private int threshold; private int size; private int initialCapacity; /// Creates an empty hash table with the /// default capacity and load factor public HashDictionary() : this(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR) { } /// Creates an empty hash table with given /// capacity and load factor public HashDictionary(int capacity, float loadFactor) { this.initialCapacity = capacity; this.table = new List[capacity]; this.loadFactor = loadFactor; this.threshold = (int)(capacity * this.loadFactor); } /// Finds the chain of elements corresponding /// internally to given key (by its hash code) /// creates an empty list

752


/// of elements if the chain still does not exist /// a list of elements in the chain or null private List FindChain( K key, bool createIfMissing) { int index = key.GetHashCode(); index = index & 0x7FFFFFFF; // clear the negative bit index = index % this.table.Length; if (this.table[index] == null && createIfMissing) { this.table[index] = new List(); } return this.table[index] as List; } /// Finds the value assigned to given key /// (works extremely fast) /// the value found or null when not found public V Get(K key) { List chain = this.FindChain(key, false); if (chain != null) { foreach (KeyValuePair entry in chain) { if (entry.Key.Equals(key)) { return entry.Value; } } } return default(V); } /// Assigns a value to certain key. If the key /// exists, its value is replaced. If the key does not /// exist, it is first created. Works very fast /// the old (replaced) value or null public V Set(K key, V value) { if (this.size >= this.threshold) { this.Expand();


753

} List chain = this.FindChain(key, true); for (int i = 0; i < chain.Count; i++) { KeyValuePair entry = chain[i]; if (entry.Key.Equals(key)) { // Key found -> replace its value with the new value KeyValuePair newEntry = new KeyValuePair(key, value); chain[i] = newEntry; return entry.Value; } } chain.Add(new KeyValuePair(key, value)); this.size++; return default(V); } /// Gets / sets the value by given key. Get returns /// null when the key is not found. Set replaces the existing /// value or creates a new key-value pair if the key does not /// exist. Works very fast public V this[K key] { get { return this.Get(key); } set { this.Set(key, value); } } /// Removes a key-value pair specified /// by certain key from the hash table. /// true if the pair was found removed /// or false if the key was not found public bool Remove(K key) { List chain = this.FindChain(key, false);

754


if (chain != null) { for (int i = 0; i < chain.Count; i++) { KeyValuePair entry = chain[i]; if (entry.Key.Equals(key)) { // Key found -> remove it chain.RemoveAt(i); this.size--; return true; } } } return false; } /// Returns the number of key-value pairs /// in the hash table (its size) public int Count { get { return this.size; } } /// Clears all ements of the hash table public void Clear() { this.table = new List[this.initialCapacity]; this.size = 0; } /// Expands the underlying hash-table. Creates 2 /// times bigger table and transfers the old elements /// into it. This is a slow (linear) operation private void Expand() { int newCapacity = 2 * this.table.Length; List[] oldTable = this.table; this.table = new List[newCapacity];


755

this.threshold = (int)(newCapacity * this.loadFactor); foreach (List oldChain in oldTable) { if (oldChain != null) { foreach (KeyValuePair keyValuePair in oldChain) { List chain = FindChain(keyValuePair.Key, true); chain.Add(keyValuePair); } } } } /// Implements the IEnumerable /// to allow iterating over the key-value pairs in the hash /// table in foreach-loops IEnumerator IEnumerable.GetEnumerator() { foreach (List chain in this.table) { if (chain != null) { foreach (KeyValuePair entry in chain) { yield return entry; } } } } /// Implements IEnumerable (non-generic) /// as part of IEnumerable IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable)this). GetEnumerator(); } } We will pay attention to the most important points in this code. Let’s begin with the constructor. The public parameterless constructor inside itself it

756


invokes another constructor, by passing some predefined values for capacity and load factor, which are used when the hash table is created. Next thing, we pay attention to, is the actual implementation of the hash table with chaining. At the instantiation of the hash-table, inside the constructor we initialize an array of lists, which will contain any of our objects of type KeyValuePair. We have created a private method FindChain(), for internal usage only, which calculates the hash-code of the key by calling GetHashCode() method and taking the modulus of the returned hashvalue to the length of the table (capacity). Additionally the most-left bit is cleared to ensure the index is always a positive number. In that way the index of the current key in the internal table is calculated. The list of all the elements with the same hash-code is hold inside the internal table for this index. If the list is empty, it may have null as a value. Otherwise, at the specific index position there is a list of the elements for the specified key. A special parameter is passed to the FindChain() method. This parameter indicates whether to create an empty list, if for the specific key there is no list of elements. It gives a kind of convenience for the methods of adding elements and resizing the hash-table. The next thing, we pay attention to, is the Expand() method, which resizes the current internal table when the maximal allowed filling is reached. For this purpose we create a new table (array), with size twice as the current. Then we calculate a new value for the maximal allowed filling (the field threshold). Next coming is the most important part. We have extended the table and in this way we changed the value of this.table.Length. If we search for an element, which we have added already, the FindChain(K key) method will not return the correct chain at all, in which to search for it. That is why, we need to transfer all the elements of the old table, by not just copying the chains, but adding again all the KeyValuePair objects into the newly created internal table of chains. In order to implement the ability for iteration over the hash-table elements in foreach-loops, we have implemented the IEnumerable interface, which has GetEnumerator() method, returning an iterator (IEnumerator) of the elements of the hash-table. We simply iterate over the elements in the internal table and return them one at a time using the yield return C# keyword (it’s is a complex concept explained in details in MSDN). Now let’s give an example of how we can use our implementation of hash-table and its iterator. We want to test whether the hash table copes correctly with collisions and with expanding, so we intentionally change the initial capacity of 3 and load factor of 0.9 when creating the hash table to ensure it will resize soon after few elements are put inside it. We first put an element, then read it, then overwrite its value, then read it again, then add a new element that causes a collision, then read it, then read the first element, then add an element causing the hash table to expand its internal array, etc. The code is given below and it is highly recommended to trace it


757

through the Visual Studio debugger and check at each step how the internal state of the hash table changes:

class PlayWithHashDictionary { static void Main() { HashDictionary dict = new HashDictionary(3, 0.9f); dict[new Point3D(1, 2, 3)] = 1; // Put a key-value pair Console.WriteLine(dict[new Point3D(1, 2, 3)]); // Get value // Overwrite the previous value for the same key dict[new Point3D(1, 2, 3)] += 1; Console.WriteLine(dict[new Point3D(1, 2, 3)]); // Now this point will cause a collision with the // previous one and the elements will be chained dict[new Point3D(3, 2, 2)] = 42; Console.WriteLine(dict[new Point3D(3, 2, 2)]); // Test if the chaining works as expected, i.e. // elements with equal hash-codes are not overwritten Console.WriteLine(dict[new Point3D(1, 2, 3)]); // Creation of another entry in the internal table // This will cause the internal table to expand dict[new Point3D(4, 5, 6)] = 1111; Console.WriteLine(dict[new Point3D(4, 5, 6)]); // Delete an existing by its key dict.Remove(new Point3D(3, 2, 2)); // Iterate through the dictionary entries and print them foreach (KeyValuePair entry in dict) { Console.WriteLine( "Key: " + entry.Key + "; Value: " + entry.Value); } } } As we could expect, the result of the program execution is the following:

758


1 2 42 2 1111 Key: (1, 2, 3); Value: 2 Key: (4, 5, 6); Value: 1111

Open Addressing Methods for Collision Resolution Now let’s look over the methods for collision resolution, alternative to chaining in a list. In general, the idea is, in case of collision we try to put the new pair in a table position, which is free. These methods differentiate from each other in the way they choose where to look for a free position for the new pair. Moreover, the new pair must be easily located at its new place. Main drawback of this group of methods, compared to chaining in a list, is that they are inefficient at high rates of the load factor (close to 1).

Linear Probing This is one of easiest methods for implementation. Linear probing, in general, can be presented with the following sample code:

int newPosition = (oldPosition + i) % capacity; Here capacity is the internal table capacity, oldPostion is the position where collision occurs and i is a number for the next probing. If the new position is free, then we place the new pair there. Otherwise we try again (probing), incrementing i. Probing can be either forward or backwards. Backward probing is when instead of adding, we are subtracting i from the position we have collision for. The advantage of this method is the relatively quick way to find of a new position. Unfortunately, if there was a collision at a certain place, there is an extremely high probability collision to occur again at the same place. So this, in practice, leads to a high inefficiency. Using linear probing as a method for collision resolution in hash tables is inefficient and has to be avoided.

Quadratic Probing Quadratic probing is a classic method for collision resolution. The main difference between quadratic probing and linear probing is that it uses a quadratic function of i (the number of the next probing) to find new position. Possible quadratic probing function is shown below:


759

int newPosition = (oldPosition + c1*i + c2*i*i) % capacity; The given example uses two constants: c1 and c2, such that c2 must not be 0, otherwise we are going back to linear probing. By choosing c1 and c2 we define the position we are going to probe, compared to the starting position. For instance, if c1 and c2 are equal to 1, we are going to probe consequently oldPosition, oldPosition + 2, oldPosition + 6, … For a hash-table with capacity of the kind 2n, the best is to choose c1 and c2 equal to 0.5. Quadratic probing is more efficient than linear the linear probing.

Double Hashing As the name implies, the double hashing method uses two different hash functions. The main concept is that, the second hash function is used for the elements that fall into a collision. This method is better than the linear and quadratic probing, because all the next probing depends of the value of the key and not of the table position inside the hash-table. It makes sense, because the position of a given key depends on the current capacity of the hash-table.

Cuckoo Hashing Cuckoo hashing is a relatively new method for collision resolution, using an open addressing. It was firstly presented by R. Pagh and F. Rodler in 2001. Its name comes from the behavior, observed with some kinds of cuckoos. The mother cuckoos push out the eggs and/or the nests out of other birds, in order to put their own eggs there and the other birds mistakenly care for the cuckoos' eggs in that way. (Also for the nests, after the incubation) The main idea of this method is the use of two hash-functions instead of one. In this way, we have not one, but two positions to place the element inside the hash-table. If one of the positions is free, then we just put the element there. If both are taken, then we put the new element in one of them and it "kicks out" the element, which was already there. In turn, the "kicked" element is going to his alternative position and "kicks" another element out, if necessary. The new "kicked out" is repeating the procedure, and in that way until reaching a free position or we fall into a loop. In the last case, the whole hash table is built again with greater size and new hashfunctions. On the figure bellow it is shown an example scheme of a hash-table using cuckoo hashing. Every position, containing an element, has a link to the alternative position for the key inside. Now, let’s play out different situations of adding an element. If, at least one of the two hash functions result is a free cell, there is no problem. We put the element in one of them. Let both hash functions result is a taken cell and we randomly have been choosing one of them.

760


Let’s assume that this is the cell, containing element A. The new element "kicks out" A from his place, A in turn goes to its alternative position and "kicks out" B from his place. The alternative position of B is free, so the adding is successfully completed. Let’s assume, that the cell, the new element is trying to "kick out" an element, is the cell containing H. Then we have a loop, formed by H and W. In this case, a rebuild must be done using greater size, and new hash-functions. In its simplest version this method has a constant access to its elements, even in the worst case, but this is valid with the constraint that the load factor is less than 0.5. The use of three different hash-functions instead of two could result in an efficient upper limit of the load factor above 0.9. Some researches show, the cuckoo hashing and its modifications could be much more efficient than the widely spread today chaining in a list and open addressing methods. Nevertheless, this method is still not well adopted in the industry and not used internally in .NET Framework. The main stopper is the need of two hash functions, which means that the class System.Object should introduce two GetHashCode() methods.

The "Set" ; static void Main ( ){ FileStream fs= new FileStream(FILE_NAME,FileMode . CreateNew) // Create the writer for >Top of range /// End of range /// a list of all the found primes public List FindPrimes(int start, int end) { List primesList = new List(); for (int num = start; num " is an HTML tag. Here’s how we can replace the tags with a new line:

private static string RemoveAllTags(string str) { string textWithoutTags = Regex.Replace(str, "]*>", "\n"); return textWithoutTags; } After coding this step, we should test it. For this purpose again we print to the console the strings we found via Console.WriteLine(…). And test the code:

Chapter 24. Sample Programming Exam – Topic #1

HtmlTagRemover.cs using using using using

System; System.IO; System.Text; System.Text.RegularExpressions;

class HtmlTagRemover { private const string InputFileName = "Problem1.html"; private const string Charset = "windows-1251"; static void Main() { if (!File.Exists(InputFileName)) { Console.WriteLine( "File " + InputFileName + " not found."); return; } StreamReader reader = null; try { Encoding encoding = Encoding.GetEncoding(Charset); reader = new StreamReader(InputFileName, encoding); string line; while ((line = reader.ReadLine()) != null) { line = RemoveAllTags(line); Console.WriteLine(line); } } catch (IOException) { Console.WriteLine( "Can not read file " + InputFileName + "."); } finally { if (reader != null) { reader.Close(); }

993

994


} } private static string RemoveAllTags(string str) { string strWithoutTags = Regex.Replace(str, "]*>", "\n"); return strWithoutTags; } } Testing the Tag Removal Code Let’s test the program with the following input file:

Clickon this linkfor more info.
This isboldtext. The result is as follows:

(empty rows) Click on this link for more info. (empty row) This is bold text. (empty rows) Everything works perfectly, only that we have extra blank lines. Can we remove them? This will be our next step.

Step 3 – Remove the Empty Lines We can remove unnecessary blank lines, replacing a double blank line " \n\n" with a single blank line "\n". We should not have groups of more than one character for a new line "\n". Here is an example how we can perform the substitution:

private static string RemoveDoubleNewLines(string str)


995

{ return str.Replace("\n\n", "\n"); } Testing the Empty Lines Removal Code As always, before we move forward, we test whether the method works correctly. We try a text, which has no blank rows, and then add 2, 3, 4 and 5 blank lines, including at the beginning and at the end of text. We find that the above method does not work correctly, when there are 4 blank lines one after another. For example, if we submit as input "ab\n\n\n\ncd", we will get "ab\n\n\cd" instead of "ab\ncd". This defect occurs because the Replace(…) finds and replaces a single match, scanning the text from left to right. If in result of a substitution the searched string reappears, it is skipped. See how useful it is when each method is tested on time. We do not end up wondering why the program does not work when we have 200 lines of code, full of errors. Early detection of defects is very useful and we should do it whenever possible. Here is the corrected code:

private static string RemoveDoubleNewLines(string str) { string pattern = "[\n]+"; return Regex.Replace(str, pattern, "\n"); } The above code uses a regular expression to find any sequence of \n characters and replaces it with a single \n. After a series of tests, we are convinced that the method works correctly. We are ready to test the program that removes all unnecessary newlines. For this purpose we make the following changes:

while ((line = reader.ReadLine()) != null) { line = RemoveAllTags(line); line = RemoveDoubleNewLines(line); Console.WriteLine(line); } We test the code again. Still it seems there are blank lines. Where do they come from? Perhaps, if we have a line that contains only tags, it will cause a problem. Therefore we may prevent this case. We add the following checks:

if (!string.IsNullOrEmpty(line))

996


{ Console.WriteLine(line); } This removes most of the blank lines, but not all.

Remove the Empty Lines: Second Attempt If we think more, it could happen so, that a line begins or ends with a tag. Then this tag will be replaced with a single blank line and so at the beginning or at the end of the line we may get a blank line. This means that we should clean the empty rows at the beginning and at the end of each line. Here’s how we can make the cleaning:

private static string TrimNewLines(string str) { int start = 0; while (start < str.Length && str[start] == '\n') { start++; } int end = str.Length - 1; while (end >= 0 && str[end] == '\n') { end--; } if (start > end) { return string.Empty; } string trimmed = str.Substring(start, end - start + 1); return trimmed; } The method works very simply: goes from left to right and skips all newline characters. Then passes from right to left and skips again all newline characters. If the left and right positions have passed each other, this means that the string is either empty or contains only newlines. Then the method returns an empty string. Otherwise it returns back everything to the right of the start position and to the left of the end position.


997

Remove the Empty Lines: Test Again As always, we test whether the above method works correctly with several examples, including an empty string, no string breaks, string breaks left or right or both sides and a string with new lines. We make sure, that the method now works correctly. Now we have to modify the logic of processing the input file:

while ((line = reader.ReadLine()) != null) { line = RemoveAllTags(line); line = RemoveDoubleNewLines(line); line = TrimNewLines(line); if (!string.IsNullOrEmpty(line)) { Console.WriteLine(line); } }

Step 4 – Print Results in a File It remains to print the results in the output file. To print the results in the output file we will use the StreamWriter. This step is trivial. We must only consider that writing to a file can cause an exception and that’s why we need to change the logic for error handling slightly, opening and closing the flow of input and output to the file. Here is what we finally get as a complete source code of the program:

HtmlTagRemover.cs using using using using


class HtmlTagRemover { private const string InputFileName = "Problem1.html"; private const string OutputFileName = "Problem1.txt"; private const string Charset = "windows-1251"; static void Main() { if (!File.Exists(InputFileName)) {

998


Console.WriteLine( "File " + InputFileName + " not found."); return; } StreamReader reader = null; StreamWriter writer = null; try { Encoding encoding = Encoding.GetEncoding(Charset); reader = new StreamReader(InputFileName, encoding); writer = new StreamWriter(OutputFileName, false, encoding); string line; while ((line = reader.ReadLine()) != null) { line = RemoveAllTags(line); line = RemoveDoubleNewLines(line); line = TrimNewLines(line); if (!string.IsNullOrEmpty(line)) { writer.WriteLine(line); } } } catch (IOException) { Console.WriteLine( "Can not read file " + InputFileName + "."); } finally { if (reader != null) { reader.Close(); } if (writer != null) { writer.Close(); } } }


/// /// Replaces every tag with new line /// private static string RemoveAllTags(string str) { string strWithoutTags = Regex.Replace(str, "]*>", "\n"); return strWithoutTags; } /// /// Replaces sequence of new lines with only one new line /// private static string RemoveDoubleNewLines(string str) { string pattern = "[\n]+"; return Regex.Replace(str, pattern, "\n"); } /// /// Removes new lines from start and end of string /// private static string TrimNewLines(string str) { int start = 0; while (start < str.Length && str[start] == '\n') { start++; } int end = str.Length - 1; while (end >= 0 && str[end] == '\n') { end--; } if (start > end) { return string.Empty; } string trimmed = str.Substring(start, end - start + 1); return trimmed;

999

1000


} }

Testing the Solution Until now, we were testing the individual steps for the solution of the task. Through the tests of individual steps we reduced the possibility of errors, but that does not mean that we should not test the whole solution. We may have missed something, right? Now let’s thoroughly test the code. - Test with the sample input file from the problem statement. Everything works correctly. - Test our "complex" example. Everything works fine. - Test the border cases and run an output test. - We test with a blank file. Output is correct – an empty file. - Test with a file that contains only one word "Hello" and does not contain tags. The result is correct – the output contains only the word "Hello". - Test with a file that contains only tags and no text. The result is again correct – an empty file. - Try to put blank lines of at the most amazing places in the input file. These empty lines should all be removed. For example we can run the following test:

Hello

I am here I am not here The result is as follows:

Hello I am here I am not Here It seems we found a small defect. There is a space at the beginning of some of the lines.


1001

Fixing the Leading Spaces Defect Under the problem description it is not clear whether this is a defect but let’s try to fix it. We could add the following code when processing the next line of the input file:

line = line.Trim(); The defect is fixed, but only from the first line. We run the debugger and we notice why it is so. The reason is that we print into the output file a string of characters with value "I\n am here" and so we get a space after a blank line. We can correct the defect, by replacing all blank lines, followed by white space (blank lines, spaces, tabs, etc.) with a single blank line. Here is the correction:

private static string RemoveDoubleNewLines(string str) { string pattern = "\n\\s+"; return Regex.Replace(str, pattern, "\n"); } We fixed that error too. Now we have only to change this name to a more appropriate one, for example RemoveNewLinesWithWhiteSpace(…). Now we need to test again after the “fixes” in the code (regression test). We put new lines and spaces scattered randomly and make sure that everything works correctly now.

Performance Test One last test remains: performance. We can create easily create a large input file. We open a site, for example http://www.microsoft.com, grab the source code and copy it 1000 times. We get a large enough input file. In our case, we get a 44 MB file with 947,000 lines. Processing it takes under 10 seconds, which is a perfectly acceptable speed. When we test the solution we should not forget that the processing of the file depends on our hardware (our test was performed in 2009 on an average fast laptop). Taking a look at the result, however, we notice a very troublesome problem. There are parts of a tag. More precisely, we see the following:

It quickly becomes clear that we missed a very interesting case. In an HTML tag can be closed few lines after its opening, e.g. a single tag may span several consecutive lines. That was exactly our case: we have a

1002


comment tag that contains JavaScript code. If the program worked correctly, it would have cut the entire tag rather than keep it in the source file. Did you see how testing is useful and how testing is important? In some big companies (like Microsoft) having a solution without tests is considered as only 50% of the work. This means that if you write code for 2 hours, you should spend on testing (manual or automated) at least 2 more hours! This is the only way to create high-quality software. What a pity that we discovered the problem just now, instead of at the beginning, when we were checking whether our idea for the task is correct, before we wrote the program. Sometimes it happens, unfortunately.

How to Fix the Problem with the Tag at Two Lines? The first idea that occurs to us is to load in memory the entire input file and process it as one big string rather than row by row. This is an idea that seems to work but will run slow and consume large amounts of memory. Let’s look for another idea.

A New Idea: Processing the Text Char by Char Obviously we cannot read the file line by line. Can we read it character by character? If yes, how we will treat tags? It occurs to us that if we read the file character by character, we can know at any moment, whether we are in or outside of a tag, and if we are outside the tag, we can print everything that we read (followed by a new line). We need to avoid adding new lines, as well as and trailing whitespace. We will get something like this:

bool inTag = false; while (! ) { char ch = (read the next character); if (ch == '') { inTag = false; } else { if (!inTag) { PrintBuffer(ch); } }


1003

}

Implementing the New Idea The idea is very simple and easy to implement. If we implement it directly, we will have a problem with empty lines and the problem of merging text from adjacent tags. To solve this problem, we can accumulate the text in the StringBuilder and print it at the end of file or when switching from text to a tag. We will get something like this:

bool inTag = false; StringBuilder buffer = new StringBuilder(); while (! ) { char ch = (read the next character); if (ch == '') { inTag = false; } else { if (!inTag) { buffer.Append(ch); } } } PrintBuffer(buffer); The missing PrintBuffer(…) method should clean the whitespace from the text in the buffer and print it in the output followed by a new line. Exception is when we have whitespace only in the buffer (it should not be printed). We already have most of the code, so step-by-step implementation mat not be necessary. We can just replace the pieces of wrong old code with the new code implementing the new idea. If we add the logic for avoiding empty

1004


lines as well as reading input and writing the result we obtain is a complete solution to the task with the new algorithm:

SimpleHtmlTagRemover.cs using using using using


public class SimpleHtmlTagRemover { private const string InputFileName = "Problem1.html"; private const string OutputFileName = "Problem1.txt"; private const string Charset = "windows-1251"; private static Regex regexWhitespace = new Regex("\n\\s+"); static void Main() { if (!File.Exists(InputFileName)) { Console.WriteLine( "File " + InputFileName + " not found."); return; } StreamReader reader = null; StreamWriter writer = null; try { Encoding encoding = Encoding.GetEncoding(Charset); reader = new StreamReader(InputFileName, encoding); writer = new StreamWriter(OutputFileName, false, encoding); RemoveHtmlTags(reader, writer); } catch (IOException) { Console.WriteLine( "Cannot read file " + InputFileName + "."); } finally { if (reader != null) {


reader.Close(); } if (writer != null) { writer.Close(); } } } /// Removes the tags from a HTML text /// Input text /// Output text (result) private static void RemoveHtmlTags( StreamReader reader, StreamWriter writer) { StringBuilder buffer = new StringBuilder(); bool inTag = false; while (true) { int nextChar = reader.Read(); if (nextChar == -1) { // End of file reached PrintBuffer(writer, buffer); break; } char ch = (char)nextChar; if (ch == '') { inTag = false; } else { // We have other character (not "") if (!inTag)

1005

1006


{ buffer.Append(ch); } } } } /// Removes the whitespace and prints the buffer /// in a file /// the result file /// the input for processing private static void PrintBuffer( StreamWriter writer, StringBuilder buffer) { string str = buffer.ToString(); string trimmed = str.Trim(); string textOnly = regexWhitespace.Replace(trimmed, "\n"); if (!string.IsNullOrEmpty(textOnly)) { writer.WriteLine(textOnly); } } } The input file is read character by character with the class StreamReader. Originally the buffer for accumulating of text is empty. In the main loop we analyze each read character. We have the following cases: - If we get to the end of file, we print whatever is in the buffer and the algorithm ends. - When we encounter the character "" (end tag) we set inTag = false. This will allow the next characters after the tag to accumulate in the buffer. - When we encounter another character (text or blank space), it is added to the buffer, if we are outside tags. If we are in a tag the character is ignored. Printing of the buffer takes care of removing empty lines in text and clearing the empty space at the beginning and end of text (trimming the leading and trailing whitespace). How exactly we do this, we already discussed in the previous solution of the problem.


1007

In the second solution the processing of the buffer is much lighter and shorter, so the buffer is processed immediately before printing. In the previous solution of the task we used regular expressions for replacing with the static methods of the class Regex. For improved performance now we create the regular expression object just once (as a static field). Thus the regular expression pattern is compiled just once to a state machine.

Testing the New Solution It remains to test thoroughly the new solution. We have to perform all tests conducted on the previous solution. Add test with tags, which are spread over several lines. Again, test performance with the Microsoft website copied 1000 times. Assure that the program works correctly and is even faster. Let’s try with another site, such as the official website of this book – http://www.introprogramming.info (as of April 2011). Again, take the source code of the site and run the solution of our task with it. After carefully reviewing the input >Input text /// Output text (result) private static void RemoveHtmlTags( StreamReader reader, StreamWriter writer) { int openedTags = 0; StringBuilder buffer = new StringBuilder(); while (true) { int nextChar = reader.Read(); if (nextChar == -1) { // End of file reached PrintBuffer(writer, buffer); break; } char ch = (char)nextChar; if (ch == '') { openedTags--; } else { // We aren't in tags (not "") if (openedTags == 0) { buffer.Append(ch); } } } } /// Removes the whitespace and prints the buffer /// in a file /// the result file /// the input for processing private static void PrintBuffer( StreamWriter writer, StringBuilder buffer) { string str = buffer.ToString(); string trimmed = str.Trim(); string textOnly = regexWhitespace.Replace(trimmed, "\n"); if (!string.IsNullOrEmpty(textOnly)) { writer.WriteLine(textOnly); } } }

Testing the New Solution Again we test the solution of the problem. We perform all tests made on the previous solution (see section "Testing the Solution"). We also try the site of MSDN (http://msdn.microsoft.com). Let’s carefully check the output file. We can see that at its end the file contains wrong characters (in April 2011). After carefully reviewing the source code of the MSDN site, we notice that there is an incorrect representation of the character ">" (to visualize this character in the HTML document ">" should be used, not ">"). However, this is an error in the MSDN site, not in our program.

1012


Now it remains to test the performance of our program with the site of this book (http://www.introprogramming.info) copied 1000 times. We assure that the program works fast enough for it too. Finally we are ready for the next task.

Problem 2: Escape from Labyrinth We are given a labyrinth, which consists of N x N squares and each of it can be passable (0) or not (x). Our hero Jack is in one of the squares (*): x

x

x

x

x

x

0

x

0

0

0

x

x

*

0

x

0

x

x

x

x

x

0

x

0

0

0

0

0

x

0

x

x

x

0

x

Two of the squares are neighboring, if they have a common wall. In one step Jack can pass from one passable square to its neighboring passable square. If Jack steps in a cell, which is on the edge of the labyrinth, he can go out from the labyrinth with one step. Write a program, which by a given labyrinth prints the minimal number of steps, which Jack needs, to go out from the labyrinth or -1 if there is no way out. The input data is read from a text file named Problem2.in. On the first line of the file is the number N (2 < N < 100). On the each of next N lines there are N characters, each of them is either "0" or "x" or "*". The output is one number and must be in the file Problem2.out. Sample input – Problem2.in:

6 xxxxxx 0x000x x*0x0x xxxx0x 00000x 0xxx0x Sample output – Problem2.out:

9


1013

Figure Out an Idea for a Solution We have a labyrinth and we should find the shortest path in it. This is not an easy task and we should think a lot or we should read somewhere how to solve such kinds of tasks. Our algorithm will begin its movement from the initial point we are given. We know we can move to a neighboring cell horizontally or vertically, but not diagonally. Our algorithm must traverse the labyrinth in some way, to find the shortest path in it. How to traverse the cells in the labyrinth? One possible decision is the following: we start from the initial cell. Move to one of its neighboring cells, after this in a neighboring cell of the current (which is passable and still unvisited), after this in a neighboring cell of the last visited (which is passable and still unvisited) and we go on forward recursively until we reach an exit of the labyrinth, or we reach a place where we can’t continue (there is no neighboring cell which is free or unvisited). In this moment we go back from the recursion (to the previous cell) and visit another neighboring cell for the previous cell. If we can’t continue, we go back again. The described recursive process is the process of traversing the labyrinth in depth (remember the chapter "Recursion" and DFS traversal). The question “Is it needed to walk through one cell more than once” occurs to us? If we walk through one cell at most once, we can walk through the whole labyrinth faster and if there is an exit, we will find it. But will this be the minimal path? If we draw the whole process on a paper, we will find out quickly the path will not be the minimal. If we mark the cell we leave on the way back of the recursion as free, this will allow us to reach each cell repeatedly, coming from a different path. The full recursive walk of the labyrinth will find all possible paths from the initial cell to any other cell. From all the found paths we can choose the shortest path to a cell on the bound of the labyrinth (exit) and that’s how we will find a solution for the problem.

Verification of the Idea It seems we have an idea for solving the problem: with recursive walk we find all the possible paths in the labyrinth from the initial cell to a cell on the bounds of the labyrinth and from all these paths we choose the shortest one. Let’s check the idea. We take a sheet of paper and make one example of the labyrinth. We try the algorithm. It’s obvious it finds all the paths from the initial cell to the one of the exits and it travels a lot forwards-backwards. As a result it finds all exits and among all paths it can be chosen the shortest one. Does the idea work if there is no exit? We create a second labyrinth, which is without exit. We try out the algorithm on it, again on a sheet of paper. We see after long circulation forwards-backwards that the algorithm does not find an exit and finishes.

1014


It looks we have a correct idea for solving the problem. Let’s move forward and think for the data structures.

What Data Structures to Use? First, we have to decide how to store the labyrinth. It’s natural to use a matrix of characters, just as the one on the figure. We will consider that one cell is passable and we can enter it, if it has a character, different from the character 'x'. We can store the labyrinth in a matrix of numbers and Boolean values, but the difference is not significant. The matrix of characters is comfortable for printing, and this will help us while debugging. There are not many options. We will store the labyrinth in a matrix of characters. After this, we have to decide in what structure to keep the visited through the recursion (current path) cells. We always need the last visited cell. This leads us to a structure, which is “last in, first out”, i.e. stack. We can use Stack, where Cell is a class, containing the coordinates of one cell (number of row and number of column). It remains to think where to keep the found paths, to find the shortest of them. If we think of it, it is not necessary to keep all the paths. It is enough to keep the current path and the shortest till this moment. It’s not even necessary to keep the shortest path till this moment but only its length. Every time we find a path to an exit of the labyrinth we can take its length and if it is shorter than the shortest path to this moment to keep it. It seems we found efficient data structures. According to our recommendations for problem solving, it is early to write the code of the program, because we have to think of the efficiency of the algorithm.

Think About the Efficiency Let’s check our idea against efficiency. What are we doing? We find all the possible paths and we take the shortest. There’s no argument the algorithm will not work, but if the labyrinth is way bigger, will it work fast? To answer this question, we should think how much paths there are. If we take an empty labyrinth, on the each step of the recursion we will have an average number of 3 free cells to go (without the cell we are coming from). If we have for example a labyrinth 10x10, the path could be 100 cells and while we travel on each step we will have 3 neighboring cells. It seems the numbers of paths are sort of 3 to the power of 100. It’s obvious the algorithm will slow down the computer very much and very fast. We found a serious problem with the algorithm. It will work very slowly, even with small labyrinths, and with bigger ones it will not work at all! The good news is that we haven’t written a single line of code and the general change of our approach to the problem will not cost us much time.


1015

Think of Another Idea We found that walking through all the paths in the labyrinth is wrong approach, so we have to think of another. Let’s start with the initial cell and walk through all its neighboring cells and mark them as visited. For each visited cell we can keep a number equal to the number of cells, which we have travelled to reach it (the length of the minimal path from the initial cell to the current cell). For the initial cell the length of the path is 0. For its neighboring cells it should be 1, because we can reach them from the initial cell with one move. For the neighboring cells for the neighbors of the initial cell the length of the path is 2. We can continue this way and we will get to the following algorithm: 1. Write the length of the path 0 for the initial cell. Mark it as visited. 2. For each neighboring cell to the initial we mark the length of the path is 1. Mark these cells as visited. 3. For each cell, which is, neighboring to a cell with length of the path 1 and it is not visited, write the length of the path is 2. Mark the cells as visited. 4. Continuing analogous, on the N step we find all the still unvisited cells, which are on a distance of N moves from the initial cell and mark them as visited.

Check the New Idea To check whether the new idea for solving the “Escape from the Labyrinth” problem is correct we can visualize the process. We take another labyrinth to test our idea in a better way. At each step k our goal is to fill with the number k all cells that can be reached in k steps. If at step 0 we fill the initial cell with 0, at step 1 we fill all cells reachable in 1 step from the initial cell, at step 2 we fill all cells reachable in 2 steps, etc. we will be sure that when we fill a cell with a number, this number reflects the minimal number of steps to reach this cell starting from the initial cell, right? Step 0 – mark the distance from the initial cell to itself with 0 (mark the free cells with "-"): x

x

x

x

x

x

-

x

-

-

-

x

x

0

-

x

-

x

x

-

-

x

-

x

x

-

-

-

-

x

-

x

x

x

-

x

Step 1– mark with 1 all the neighbors to cells with a value of 0:

1016


x

x

x

x

x

x

-

x

-

-

-

x

x

0

1

x

-

x

x

1

-

x

-

x

x

-

-

-

-

x

-

x

x

x

-

x

Step 2 – mark with 2 all the passable neighbors to cells with value 1: x

x

x

x

x

x

-

x

2

-

-

x

x

0

1

x

-

x

x

1

2

x

-

x

x

2

-

-

-

x

-

x

x

x

-

x

Step 3 – mark with 3 all passable neighbors to cells with value of 2: x

x

x

x

x

x

-

x

2

3

-

x

x

0

1

x

-

x

x

1

2

x

-

x

x

2

3

-

-

x

-

x

x

x

-

x

Continuing this way, in a moment either we will reach a cell at the edge of the labyrinth (an exit) or we will find such a cell is unreachable. It seems like our algorithm works correctly. It will either find an exit or will find that there is no reachable exit. If at some step an exit is found, the path to it will be guaranteed to be the shortest possible (otherwise the exit should already be found at some of the earlier steps).

Breaking the Problem into Subproblems Having invented the idea for solving the labyrinth escaping problem, it will be easy to break it into subproblems. The main subproblems could be: reading the input labyrinth, finding the shortest path to some of its exits and printing the results. The path finding subproblem could be further divided into subproblems (steps) which we discussed in the previous section.


1017

Checking the Performance of the New Algorithm Because we never visit a cell more than once, the number of steps, which this algorithm does, should not be big. For example, if we have a labyrinth with size 100 x 100, it will have 10,000 cells, we will visit each of the cells at most once and for each of them we will check every neighbor if it is free, i.e. we will check 4 times each cell. At the end we will do at most 40,000 checks and we will visit at most 10,000 cells. We will do a total amount of 50,000 operations. This means the algorithm will work instantly.

Check If the New Algorithm Is Correct It seems this time we don’t have a problem with the performance. We have a fast algorithm. Let’s check if it is correct. For this purpose we draw a bigger and more complex example on a sheet of paper, which has many exits and a lot of paths, and we begin to perform the algorithm. After this we try with a labyrinth with no exit. It seems the algorithm ends, but does not find an exit so it’s working. We try another 2-3 examples and convince ourselves this algorithm always finds the shortest path to an exit and always works fast, because it visits each of the cells of the labyrinth at most once.

What Data Structures to Use? With the new algorithm we walk consequently through all neighboring cells to the initial cell. We can put them into a data structure, for example in an array or better a list (or list of lists), because we can’t add in the array. Then we take the list of the reached cells on the last step and we add their neighbors in another list. That’s how if we index the lists we have list0, which contains the initial cell, list1, which contains passable neighboring cells to the initial, after this list2, which contains passable neighbors to list1 and so on. At the N step we have the listn, which contains all the cells, which we can reach in exactly N steps, i.e. which are at a distance of n from the initial cell. It seems we can use a list of lists, to keep the cells on each step. If we think about it, to get the n list, we need the ( n-1)-list. So it seems we don’t need list of lists but only the list from the last step. We can make general conclusion: cells are processed in the order of entry: when the cells of step k are finished, then we process the cells from step k+1, and just after them – the cells from step k+2, and so on. The process seems like a queue: earlier accessed cells are processed earlier. If we dig a bit inside, we will conclude, that we have just re-invented the Breadth-FirstSearch algorithm (read about BFS in Wikipedia). To implement the BFS algorithm we can use a queue of cells. For this purpose we have to define class Cell, which contains the coordinates of

1018


given cell (row and column). We can keep the distance from each cell to the initial cell in a matrix. If the distance is not calculated yet, we store -1. If we think a little more, the distance from the initial cell can be kept in the cell itself (in the class Cell) instead of creating a special matrix for the distances. That way we will save memory. Now we are clear about the data structures. Now we have to implement the algorithm step by step.

Step 1 – The Class Cell We can begin with the definition of the Cell class. We need it to save the initial cell, from which begins the searching of the path. We will use autoimplemented properties to make the code shorter and more readable. Here is the Cell class:

public class { public int public int public int }

Cell Row { get; set; } Column { get; set; } Distance { get; set; }

We can add a constructor to simplify the way we use this class:

public Cell(int row, int column, int distance) { this.Row = row; this.Column = column; this.Distance = distance; } Generally it is a good idea to test the code after each step, but the above code is too simple to be tested. We will test is later as part of some more complex piece of code.

Step 2 – Reading the Input File We will read the input file line by line using the well-known class StreamReader. On the each of the lines we will analyze the characters and we will write them in a matrix of characters. When we reach the character "*" we will keep its coordinates in an instance of class Cell to know where to start the searching of the shortest path for getting out of the labyrinth. We can define a class Maze and keep the matrix of the labyrinth and the initial cell in it:


1019

Maze.cs public class Maze { private char[,] maze; private int size; private Cell startCell = null; public void ReadFromFile(string fileName) { using (StreamReader reader = new StreamReader(fileName)) { // Read the maze size and create the maze this.size = int.Parse(reader.ReadLine()); this.maze = new char[this.size, this.size]; // Read the maze cells from the file for (int row = 0; row < this.size; row++) { string line = reader.ReadLine(); for (int col = 0; col < this.size; col++) { this.maze[row, col] = line[col]; if (line[col] == '*') { this.startCell = new Cell(row, col, 0); } } } } } } For simplicity we will skip processing the errors while reading and writing in a file. When an exception occurs we will skip to catch it in the main method and thus we will leave the CLR to print it on the console.

Testing the Input File Reading Code We already have the class Maze and appropriate representation of data of the input file. To be sure the written so far is correct we should test. We can check if the matrix is truly filled as we print it on the console. The other possibility is to view the values of the fields in the class Maze through the debugger of Visual Studio. We add a Main() method which invokes the maze reading method and we test it:

1020


static void Main() { Maze maze = new Maze(); maze.ReadFromFile("Problem2.in"); } Through the Visual Studio debugger we get convinced that the input file is correctly read from the input file:

Step 3 – Finding the Shortest Path We can implement the algorithm directly from what we already discussed. We must define a queue and put in its beginning the initial cell. Afterwards we must take the cell in turn from the queue and add all of its passable unvisited neighbors in a loop. At each step there is a chance to enter in a cell, which is at the border of the labyrinth, and we see we have found an exit and the searching ends. We repeat the loop until the queue is empty. At each visitation of a given cell we check if it is free and if it is, we mark it as impassable. This way we avoid repeatedly visiting the same cell. Here is how the implementation of the algorithm looks like:

public int FindShortestPath() { // Queue for traversing the cells in the maze Queue visitedCells = new Queue(); VisitCell(visitedCells, this.startCell.Row, this.startCell.Column, 0); // Perform Breadth-First Search (BFS) while (visitedCells.Count > 0) { Cell currentCell = visitedCells.Dequeue(); int row = currentCell.Row;


int column = currentCell.Column; int distance = currentCell.Distance; if ((row == 0) || (row == size - 1) || (column == 0) || (column == size - 1)) { // We are at the maze border return distance + 1; } VisitCell(visitedCells, row, column + 1, distance VisitCell(visitedCells, row, column - 1, distance VisitCell(visitedCells, row + 1, column, distance VisitCell(visitedCells, row - 1, column, distance

+ + + +

1021

1); 1); 1); 1);

} // We didn't reach any cell at the maze border -> no path return -1; } private void VisitCell(Queue visitedCells, int row, int column, int distance) { if (this.maze[row, column] != 'x') { // The cell is free --> visit it maze[row, column] = 'x'; Cell cell = new Cell(row, column, distance); visitedCells.Enqueue(cell); } } Checking after Step 3 Before the next step, we must test, to check our algorithm. We must try the normal case and the border cases, when there is no exit, when we step on an exit, when the input file doesn’t exist or the square matrix is with size of 0. Only then can we start doing the next step. Let’s start with testing the normal (typical) case. We create the following code to quickly test it:

static void Main() { Maze maze = new Maze(); maze.ReadFromFile("Problem2.in"); Console.WriteLine(maze.FindShortestPath()); }

1022


We run the above code over the sample input file from the problem description and it works. The code correctly returns the length of the shortest path to the nearest exit:

9 Now let’s test the border cases, e.g. a labyrinth of size 0. Unfortunately we get the following result:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object. at Maze.FindShortestPath() We’ve made a mistake. The problem is when the variable, in which we keep the initial cell, is initialized with null. This can happen in many scenarios. If the labyrinth has no cells (e.g. size of 0) or the initial cell is missing, the result that the program should return is -1, but not an exception. To fix the bug we just found we can add a check in the beginning of the FindShortestPath() method:

public int FindShortestPath() { if (this.startCell == null) { // Start cell is missing -> no path return -1; } … We retest the code with the typical and the border cases. After the fix it seems the algorithm works correctly now.

Step 4 – Writing the Result to a File It remains to write the result of the FindShortestPath() to the output file. This is a trivial problem:

public void SaveResult(String fileName, int result) { using (StreamWriter writer = new StreamWriter(fileName)) { writer.WriteLine("The shortest way is: " + result); } } Here is how the complete source code of the solution looks:


1023

Maze.cs using System; using System.IO; using System.Collections.Generic; public class Maze { private const string InputFileName = "Problem2.in"; private const string OutputFileName = "Problem2.out"; public class { public int public int public int

Cell Row { get; set; } Column { get; set; } Distance { get; set; }

public Cell(int row, int column, int distance) { this.Row = row; this.Column = column; this.Distance = distance; } } private char[,] maze; private int size; private Cell startCell = null; public void ReadFromFile(string fileName) { using (StreamReader reader = new StreamReader(fileName)) { // Read maze size and create maze this.size = int.Parse(reader.ReadLine()); this.maze = new char[this.size, this.size]; // Read the maze cells from the file for (int row = 0; row < this.size; row++) { string line = reader.ReadLine(); for (int col = 0; col < this.size; col++) { this.maze[row, col] = line[col];

1024


if (line[col] == '*') { this.startCell = new Cell(row, col, 0); } } } } } public int FindShortestPath() { if (this.startCell == null) { // Start cell is missing -> no path return -1; } // Queue for traversing the cells in the maze Queue visitedCells = new Queue(); VisitCell(visitedCells, this.startCell.Row, this.startCell.Column, 0); // Perform Breadth-First Search (BFS) while (visitedCells.Count > 0) { Cell currentCell = visitedCells.Dequeue(); int row = currentCell.Row; int column = currentCell.Column; int distance = currentCell.Distance; if ((row == 0) || (row == size - 1) || (column == 0) || (column == size - 1)) { // We are at the maze border return distance + 1; } VisitCell(visitedCells, VisitCell(visitedCells, VisitCell(visitedCells, VisitCell(visitedCells,

row, column + 1, row, column - 1, row + 1, column, row - 1, column,

distance distance distance distance

+ + + +

1); 1); 1); 1);

} // We didn't reach any cell at the maze border -> no path return -1;


1025

} private void VisitCell(Queue visitedCells, int row, int column, int distance) { if (this.maze[row, column] != 'x') { // The cell is free --> visit it maze[row, column] = 'x'; Cell cell = new Cell(row, column, distance); visitedCells.Enqueue(cell); } } public void SaveResult(string fileName, int result) { using (StreamWriter writer = new StreamWriter(fileName)) { writer.WriteLine(result); } } static void Main() { Maze maze = new Maze(); maze.ReadFromFile(InputFileName); int pathLength = maze.FindShortestPath(); maze.SaveResult(OutputFileName, pathLength); } }

Testing the Complete Solution of the Problem After we have a solution of the problem we must test it. We have already tested the typical case and the border cases (like missing exit or when the initial position stays at the labyrinth edge). We will execute these tests again to get convinced that the algorithm behaves correctly:

Input

Output

Input

Output

Input

Output

0

-1

2 00 xx

-1

3 0x0 x*x 0x0

-1

Input 3 000 000 00*

Output 1

1026


The algorithm works correctly. The output for each of the test is correct. It remains to test with a large labyrinth (performance test), for example 1000 x 1000. We can make such a labyrinth very easy – with copy / paste. We perform the test and we convince ourselves the program is working correctly for the big test and works extremely fast – there is no delay. While testing we should try every way to break our solution. We run a few more difficult examples (for example a labyrinth with passable cells in the form of spiral). We can put large labyrinth with a lot of paths, but without exit. We can try whatever else we wish. At the end we make sure, that we have a correct solution and we pass to the next problem from the exam.

Problem 3: Store for Car Parts A company is planning to create a system for managing a store for auto parts. A single part can be used for different car models and it has following characteristics: code, name, category (e.g. suspension, tires and wheels, engine, accessories and etc.), purchase price, sale price, list of car models, with which it is compatible (each car is described with brand, model and year of manufacture, e. g. Mercedes C320, 2008) and manufacturing company. Manufacturing companies are described with name, country, address, phone and fax. Design a set of classes with relationships between them, which model the data for the store. Write a demonstration program, which demonstrates the classes and their all functionality work correctly with some sample data.

Inventing an Idea for Solution We have a non-algorithmic problem which is intended to check whether the students at the exam know how to use object-oriented programming (OOP), how to design classes and relationships between them to model realworld objects (object-oriented analysis and design) and how to use appropriate data structures to hold collections of objects. We are required to create an aggregation of classes and relationships between them, which have to describe the data of the store. We have to find which nouns are important for solving the problem. They are objects from the real world, which correspond to classes. Which are these nouns that interest us? We have a store, car parts, cars and manufacturing companies. We have to create a class defining a store. It could be named Shop. Other classes are Part, Car and Manufacturer. In the requirements of the problem there are other nouns too, like code for one part or year of manufacturing of given car. For these nouns we are not creating individual classes, but instead these will be fields in the already created classes. For example in the Part class there will be let’s say a field code of string type.


1027

We already know which will be our classes, and fields to describe them. We have to identify the relationships between the objects.

Checking the Idea We will not check the idea because there is nothing to be proven with examples and counterexamples or checked whether it will work. We need to write few classes to model a real-world situation: a store for car parts.

What Data Structures to Use to Describe the Relationship between Two Classes? The data structures, needed for this problem, are of two main groups: classes and relationships between the classes. The interesting part is how to describe relationships. To describe a relationship (link) between two classes we can use an array. With an array we have access of its elements by index, but once it is created we can’t change its length. This makes it uncomfortable for our problem, because we don’t know how many parts we will have in the store and more parts can be delivered or somebody can buy parts so we have to delete or change the data. List is more comfortable. It has the advantages of an array and also is with variable length and it is easy to add or delete elements. So far it seems List is the most appropriate for holding aggregations of objects inside another object. To be convinced we will analyze a few more data structures. For example hash-table – it is not appropriate in this case, because the structure “parts” is not of the key-value type. It would be appropriate if each of the parts in the store has unique number (e.g. barcode) and we needed to search them by this unique number. Structures like stack and queue are inappropriate. The structure “set” and its implementation HashSet is used when we have uniqueness for given key. It would be good sometimes to use this structure to avoid duplicates. We must recall that HashSet requires the methods GetHashCode() and Equals(…) to be correctly defined by the T type. Our final decision is to use List for the aggregations and HashSet for the aggregations which require uniqueness.

Dividing the Task into Subtasks Now we have to think from where to start writing the code. If we start to write the Shop class, we will need the Part class. This reminds us we will have to start with a class, which does not depend on others. We will divide the writing of each class to а subtask, and we will start from the independent classes: - Class describing a car – Car - Class describing manufacturer of parts – Manufacturer - Class or enumeration for the categories of the parts – PartCategory

1028


- Class describing part for a car – Part - Class for the store – Shop - Class for testing rest of the classes with sample data – TestShop

Implementation: Step by Step We start writing classes, which we described in our idea. We will create them in the same sequence as in the list above.

Step 1: The Class Car We start solving the problem by defining the class Car. In the definition we have three fields, which keep the manufacturer, the model and the year of manufacturing of the car and the standard method ToString(), which returns a human-readable string holding the information about the car. We define the class Car in the following way:

Car.cs public class Car { private string brand; private string model; private int productionYear; public Car(string brand, string model, int productionYear) { this.brand = brand; this.model = model; this.productionYear = productionYear; } public override string ToString() { return ""; } } Note that the class Car is designed to be immutable. This means that once created, the car’s properties cannot be later modified. This design is not always the best choice. Sometimes we want the class properties to be freely modifiable; sometimes. For our case the immutable design will work well.

Testing the Class Car Once we have the class Car, we could test it by the following code:


1029

Car bmw316i = new Car("BMW", "316i", 1994); Console.WriteLine(bmw316i); The result is as expected:

We are convinced the class Car is correct so far and we can continue with the other classes.

Step 2: The Class Manufacturer We have to implement the definition of the class Manufacturer, which describes the manufacturer for given part. It will have five fields – name, country, address, phone number and fax. The class will be immutable, because we will not need to change its members after creation. We also define the standard method ToString() for representing the object as human-readable string.

Manufacturer.cs public class Manufacturer { private string name; private string country; private string address; private string phoneNumber; private string fax; public Manufacturer(string name, string country, string address, string phoneNumber, string fax) { this.name = name; this.country = country; this.address = address; this.phoneNumber = phoneNumber; this.fax = fax; } public override string ToString() { return this.name + " "; } }

1030


Testing the Class Manufacturer We test the class Manufacturer just like we tested the class Car. It works.

Step 3: The Part Category Enumeration Part categories are fixes set of values and do not have additional details (like name, code and description). This makes them perfect to be modeled as enumeration:

PartCategory.cs public enum PartCategory { Engine, Tires, Exhaust, Suspention, Brakes } Step 4: The Class Part Now we have to define the class Part. Its definition will include the following fields: name, code, category, list with cars, where we can use the given part, starting and closing price and manufacturer. Here we will use the data structure HashSet to hold all compatible cars. The field that keeps the manufacturer of the part will be of Manufacturer class, because the task requires us to keep additional information about the manufacturer. If it was required to keep only the name of the manufacturer (as in the case with class Car) this class should not be necessary. We would have a field of string type. We need a method for adding a car (object of type Car) to the list of cars (in HashSet). It will be named AddSupportedCar(Car car). Below is the code of the class Part which is also designed as set of immutable fields (except that it accepts adding cars):

Part.cs public class Part { private string name; private string code; private PartCategory category; private HashSet supportedCars; private decimal buyPrice;


1031

private decimal sellPrice; private Manufacturer manufacturer; public Part(string name, decimal buyPrice, decimal sellPrice, Manufacturer manufacturer, string code, PartCategory category) { this.name = name; this.buyPrice = buyPrice; this.sellPrice = sellPrice; this.manufacturer = manufacturer; this.code = code; this.category = category; this.supportedCars = new HashSet(); } public void AddSupportedCar(Car car) { this.supportedCars.Add(car); } public override string ToString() { StringBuilder result = new StringBuilder(); result.Append("Part: " + this.name + "\n"); result.Append("-code: " + this.code + "\n"); result.Append("-category: " + this.category + "\n"); result.Append("-buyPrice: " + this.buyPrice + "\n"); result.Append("-sellPrice: " + this.sellPrice + "\n"); result.Append("-manufacturer: " + this.manufacturer +"\n"); result.Append("---Supported cars---" + "\n"); foreach (Car car in this.supportedCars) { result.Append(car); result.Append("\n"); } result.Append("----------------------\n"); return result.ToString(); } } In the class Part we use HashSet so it is necessary to redefine the methods Equals(…) and GetHashCode() for the class Car:

1032


// The Equals(…) and GetHashCode() methods for the class Car public override bool Equals(object obj) { Car otherCar = obj as Car; if (otherCar == null) { return false; } bool equals = object.Equals(this.brand, otherCar.brand) && object.Equals(this.model, otherCar.model) && object.Equals(this.productionYear,otherCar.productionYear); return equals; } public override int GetHashCode() { const int prime = 31; int result = 1; result = prime * result + ((this.brand == null) ? 0 : this.brand.GetHashCode()); result = prime * result + ((this.model == null) ? 0 : this.model.GetHashCode()); result = prime * result + this.productionYear; return result; } Testing the Class Part We test the class Part. It is a bit more complicated than when testing the classes Car and Manufacturer, because Part it is more complex class. We can create a part, assign all its properties and print it:

Manufacturer bmw = new Manufacturer("BWM", "Germany", "Bavaria", "665544", "876666"); Part partEngineOil = new Part("BMW Engine Oil", 633.17m, 670.0m, bmw, "Oil431", PartCategory.Engine); Car bmw316i = new Car("BMW", "316i", 1994); partEngineOil.AddSupportedCar(bmw316i); Car mazdaMX5 = new Car("Mazda", "MX5", 1999); partEngineOil.AddSupportedCar(mazdaMX5); Console.WriteLine(partEngineOil); Seems like the result is correct:


1033

Part: BMW Engine Oil -code: Oil431 -category: Engine -buyPrice: 633.17 -sellPrice: 670.0 -manufacturer: BWM ---Supported cars-- ---------------------Before we can continue with the next class, we could test for duplicated cars in the set of supported cars for certain part. Duplicates are not allowed by design and we should check whether this is enforced:

Manufacturer bmw = new Manufacturer("BWM", "Germany", "Bavaria", "665544", "876666"); Part partEngineOil = new Part("BMW Engine Oil", 633.17m, 670.0m, bmw, "Oil431", PartCategory.Engine); partEngineOil.AddSupportedCar(new Car("BMW", "316i", 1994)); partEngineOil.AddSupportedCar(new Car("BMW", "X5", 2006)); partEngineOil.AddSupportedCar(new Car("BMW", "X5", 2007)); partEngineOil.AddSupportedCar(new Car("BMW", "X5", 2006)); partEngineOil.AddSupportedCar(new Car("BMW", "316i", 1994)); Console.WriteLine(partEngineOil); The result is correct. The duplicated cars are taken into account only once:

Part: BMW Engine Oil -code: Oil431 -category: Engine -buyPrice: 633.17 -sellPrice: 670.0 -manufacturer: BWM ---Supported cars-- ---------------------Step 5: The Class Shop We already have all needed classes for creating the class Shop. It will have two fields: name and list of parts, which are for sale. The list will be List. We will add the method AddPart(Part part), with which we

1034


will add new parts. With a redefined ToString() we will print the name of the shop and the parts in it. Here is an example of implementation of our class Shop holding the catalog of auto parts (its name is immutable but it can add parts):

Shop.cs public class Shop { private string name; private List parts; public Shop(string name) { this.name = name; this.parts = new List(); } public void AddPart(Part part) { this.parts.Add(part); } public override string ToString() { StringBuilder result = new StringBuilder(); result.Append("Shop: " + this.name + "\n\n"); foreach (Part part in this.parts) { result.Append(part); result.Append("\n"); } return result.ToString(); } } It might be a subject of discussion whether we should use List or Set for the parts in the car shop. The set data structure has an advantage that it avoids any duplicates. Thus if we have for example few tires of certain model, they will be found only once in the set. To use set we need to be sure the parts are uniquely identified by their code or by some other unique identifier. In our case we assume we could have parts with exactly the same code, name, etc. which come at different buy and sell prices (e.g. if the prices change over the time). So we need to allow duplicated parts


1035

and thus using a set will not be appropriate. Parts in the shop will be kept in List. We will test the class Shop though the especially written class TestShop.

Step 6: The Class TestShop We created all classes we need. We have to create one more, with which we will have to demonstrate the usage of the rest of the classes. It will be named TestShop. In the Main() method we will create two manufacturers and a few cars. We will add them to two parts. We will add the parts to the Shop. At the end we will print everything on the console.

TestShop.cs public class TestShop { static void Main() { Manufacturer bmw = new Manufacturer("BWM", "Germany", "Bavaria", "665544", "876666"); Manufacturer lada = new Manufacturer("Lada", "Russia", "Moscow", "653443", "893321"); Car Car Car Car Car Car

bmw316i = new Car("BMW", "316i", 1994); ladaSamara = new Car("Lada", "Samara", 1987); mazdaMX5 = new Car("Mazda", "MX5", 1999); mercedesC500 = new Car("Mercedes", "C500", 2008); trabant = new Car("Trabant", "super", 1966); opelAstra = new Car("Opel", "Astra", 1997);

Part cheapPart = new Part("Tires 165/50/R13", 302.36m, 345.58m, lada, "T332", PartCategory.Tires); cheapPart.AddSupportedCar(ladaSamara); cheapPart.AddSupportedCar(trabant); Part expensivePart = new Part("Universal Car Engine", 6733.17m, 6800.0m, bmw, "EU33", PartCategory.Engine); expensivePart.AddSupportedCar(bmw316i); expensivePart.AddSupportedCar(mazdaMX5); expensivePart.AddSupportedCar(mercedesC500); expensivePart.AddSupportedCar(opelAstra); Shop newShop = new Shop("Tuning Pro Shop"); newShop.AddPart(cheapPart); newShop.AddPart(expensivePart);

1036


Console.WriteLine(newShop); } } This is the result of the execution of the above code:

Shop: Tuning Pro Shop Part: Tires 165/50/R13 -code: T332 -category: Tires -buyPrice: 302.36 -sellPrice: 345.58 -manufacturer: Lada ---Supported cars-- ---------------------Part: Universal Car Engine -code: EU33 -category: Engine -buyPrice: 6733.17 -sellPrice: 6800.0 -manufacturer: BWM ---Supported cars-- ----------------------

Testing the Solution At the end we need to test our code. In fact we have done this in the class TestShop. This doesn’t mean that we have tested entirely our problem. We have to check the border cases, for example when some of the lists are empty. Let’s make a little change of the code in Main() method, to start the program with an empty list:

static void Main() { Shop emptyShop = new Shop("Empty Shop"); Console.WriteLine(emptyShop);


Manufacturer lada = new Manufacturer("Lada", "Russia", "Moscow", "653443", "893321"); Part tires = new Part("Tires 165/50/R13", 302.36m, 345.58m, lada, "T332", PartCategory.Tires); Manufacturer bmw = new Manufacturer("BWM", "Germany", "Bavaria", "665544", "876666"); Part engineOil = new Part("BMW Engine Oil", 633.17m, 670.0m, bmw, "Oil431", PartCategory.Engine); engineOil.AddSupportedCar(new Car("BMW", "316i", 1994)); Shop ultraTuningShop = new Shop("Ultra Tuning Shop"); ultraTuningShop.AddPart(tires); ultraTuningShop.AddPart(engineOil); Console.WriteLine(ultraTuningShop); } The result of this test is:

Shop: Empty Shop Shop: Ultra Tuning Shop Part: Tires 165/50/R13 -code: T332 -category: Tires -buyPrice: 302.36 -sellPrice: 345.58 -manufacturer: Lada ---Supported cars-----------------------Part: BMW Engine Oil -code: Oil431 -category: Engine -buyPrice: 633.17 -sellPrice: 670.0 -manufacturer: BWM ---Supported cars-- ----------------------

1037

1038


From the result it seems the first shop is empty and in the second shop the list of cars for the first part is empty. This is the correct output. Therefore our program works correctly with the border case of empty lists. We can continue testing with other border cases (e.g. missing part name, missing price, missing manufacturer, etc.), as well as with some kind of performance test (e.g. shop with 300,000 parts for 5,000 cars and 200 manufacturers). We will leave this for the readers.

Exercises 1.

You are given an input file mails.txt, which contains names of users and their email addresses. Each line of the file looks like this:

@. There is a requirement for email addresses – can be a sequence of Latin letters (a-z, A-Z) and underscore (_), is a sequence of lower Latin letters (a-z), and has a limit of 2 to 4 lower Latin letters (a-z). Following the guidelines for problem solving write a program, which finds the valid email addresses and writes them together with the names of the users (in the same format as in the input) to an output file valid-mails.txt. Sample input file (mails.txt):

Steve Smith [email protected] Peter Miller pm?!>?#@? Result: > ? ! > ? # @ ? We might think of the above output as partially correct. In fact it does extract correctly the separators between the words but most of them are duplicated several times. We need all the separators without duplications, right?

Correcting the ExtractSeparators(…) Method To correct the method for extracting the separators between the words in the text, we can use a different data structure to keep them. We know that sets keep elements without duplications. So we could use HashSet instead of List to hold the separator characters we find in the text:

private static char[] ExtractSeparators(string text) { HashSet separators = new HashSet(); foreach (char character in text) { // If the character is not a letter, // then by definition it is a separator if (!char.IsLetter(character)) { separators.Add(character); } } return separators.ToArray(); } The code is almost the same, but we use a set instead of list to avoid duplicated separators. We might need to include the System.Linq namespace in the start of the program to use the ToArray() extension method for converting a hash set to an array.

Testing Again after the Fix We test the above method with the same testing code and we find it now works correctly. The separators are extracted correctly with no duplicates:

Test Case: This is wonderful!!! All separators like these ,.(? and these /* are recognized. It works.


1047

Result: ! , . ( ? / * Test Case: SingleWord Result: Test Case: Result: Test Case: >?!>?#@? Result: > ? ! # @ We test also with some borderline cases – text consisting of a single word without separators; text consisting of separators only; an empty string. We’ve already included such tests in our GetTestData() method. It seems that the method works fine and we can proceed to the next step.

Step 2: Splitting Up the Text in Separate Words We will use string’s Split(…) method with the specified separators for splitting up the text by the separators and extracting the words from it. This is how our method looks like:

private static string[] ExtractWords(string text) { char[] separators = ExtractSeparators(text); string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries); return words; } Testing the Word Extracting Method Before we carry on to the next step, we have to see if the method works correctly. To do this, we will reuse the GetTestData() for the input test data and we will test the new ExtractWords(…) method:

private static void TestExtractWords() { List testData = GetTestData(); foreach (string testCase in testData) {

1048


Console.WriteLine("\nTest Case: {0}", testCase); string[] words = ExtractWords(testCase); Console.WriteLine("Result: {0}", string.Join(" ", words)); } } static void Main() { TestExtractWords(); } The result from the above test looks correct:

Test Case: This is wonderful!!! All separators like these ,.(? and these /* are recognized. It works. Result: This is wonderful All separators like these and these are recognized It works Test Case: SingleWord Result: SingleWord Test Case: Result: Test Case: >?!>?#@? Result: We check the results from the other test cases. We verify that they are correct and that our algorithm is accurate (till this stop).

Step 3: Determining Whether a Word Is in Uppercase or Lowercase We already have an idea how to implement the uppercase / lowercase checks, and we can write the corresponding methods directly:

private static bool IsUpperCase(string word) { bool result = word.Equals(word.ToUpper()); return result; } private static bool IsLowerCase(string word)


1049

{ bool result = word.Equals(word.ToLower()); return result; } We test the above methods by passing words in uppercase, lowercase and mixed case. The results are correct.

Step 4: Counting the Words Now we can proceed to solving the problem itself – counting the words. All we have to do is iterate through the list of words and depending on the word’s type to increment the corresponding counters. Then we print the result:

private static void CountWords(string[] words) { int allUpperCaseWordsCount = 0; int allLowerCaseWordsCount = 0; foreach (string word in words) { if (IsUpperCase(word)) { allUpperCaseWordsCount++; } else if (IsLowerCase(word)) { allLowerCaseWordsCount++; } } Console.WriteLine("Total words count: {0}", words.Length); Console.WriteLine("Upper case words count: {0}", allUpperCaseWordsCount); Console.WriteLine("Lower case words count: {0}", allLowerCaseWordsCount); } Testing the Word Counting Method Let’s check if we count the words correctly. We will write another test method using the data from the GetTestData() method and the previously written and tested ExtractWords(…) method:

private static void TestCountWords() {

1050


List testData = GetTestData(); foreach (string testCase in testData) { Console.WriteLine("Test Case: {0}", testCase); Console.WriteLine("Result: "); CountWords(ExtractWords(testCase)); Console.WriteLine(); } } static void Main() { TestCountWords(); } Executing the application, we obtain the correct result: Test Case: This is wonderful!!! All separators like these ,.(? and these /* are recognized. It works. Result: Total words count: 13 Upper case words count: 0 Lower case words count: 10 Test Case: SingleWord Result: Total words count: 1 Upper case words count: 0 Lower case words count: 0 Test Case: Result: Total words count: 0 Upper case words count: 0 Lower case words count: 0 Test Case: >?!>?#@? Result: Total words count: 0 Upper case words count: 0 Lower case words count: 0 The above results are correct (the typical case and a few borderline cases). We perform few other borderline tests, e.g. when the list contains words in uppercase or lowercase only, or when the list is empty. All of them work correctly.


1051

Note that it is a good idea to use unit testing instead of these semiautomated tests. Recall how we write unit tests in Visual Studio (in the chapter “High-Quality Code”) and try to convert our test methods to unit tests for the Visual Studio Team Test (VSTT) framework.

Step 5: Console Input All that’s left to implement is the final step – allowing the user to input text:

private static string ReadText() { Console.WriteLine("Enter text:"); return Console.ReadLine(); } Note that as a rule unless the input comes from a text file or is very short (e.g. just one number or few characters) it should be read as a final step. Otherwise we will need to enter the input data each time when we start the program and this will waste a lot of time and can lead to errors.

Step 6: Putting All Together Now after all subproblems have been solved, we can proceed to the complete solution to the problem. We need to add a Main(…) method, which will combine together the different parts of the solution:

static void Main() { string text = ReadText(); string[] words = ExtractWords(text); CountWords(words); }

Testing the Solution While implementing the solution, we wrote test methods for every method, integrating them with each other gradually. For the moment, we are certain they interact correctly; there’s nothing we have overlooked and there is no method that does unnecessary work or that returns incorrect results. If we would like to test the solution with more data, we would only need to add it to the GetTestData(…) method. If we want, we may even rewrite the GetTestData(…) method so that it reads the test data from an external source, e.g. from a text file. Here’s how the final solution looks like at the end:

WordsCounter.cs

1052


using System; using System.Collections.Generic; using System.Linq; public class WordsCounter { static void Main() { string text = ReadText(); string[] words = ExtractWords(text); CountWords(words); } private static string ReadText() { Console.WriteLine("Enter text:"); return Console.ReadLine(); } private static char[] ExtractSeparators(string text) { HashSet separators = new HashSet(); foreach (char character in text) { // If the character is not a letter, // then by definition it is a separator if (!char.IsLetter(character)) { separators.Add(character); } } return separators.ToArray(); } private static string[] ExtractWords(string text) { char[] separators = ExtractSeparators(text); string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries); return words; } private static bool IsUpperCase(string word) {


1053

bool result = word.Equals(word.ToUpper()); return result; } private static bool IsLowerCase(string word) { bool result = word.Equals(word.ToLower()); return result; } private static void CountWords(string[] words) { int allUpperCaseWordsCount = 0; int allLowerCaseWordsCount = 0; foreach (string word in words) { if (IsUpperCase(word)) { allUpperCaseWordsCount++; } else if (IsLowerCase(word)) { allLowerCaseWordsCount++; } } Console.WriteLine("Total words count: {0}", words.Length); Console.WriteLine("Upper case words count: {0}", allUpperCaseWordsCount); Console.WriteLine("Lower case words count: {0}", allLowerCaseWordsCount); } } We removed the testing methods from our code to simplify it. The best practice is instead of removing the tests to create a separate testing project and put all the tests in a testing class. This is best achieved though the Visual Studio’s unit testing framework, as it was shown in the chapter “High-Quality Code”.

A Word on Performance Since there are no explicit performance requirements, we will only make a suggestion for dealing with the situation when the algorithm turns out to be slow. Splitting the text with separators assumes that the entire text will be

1054


loaded into memory. The list of words, after partitioning the text, will also be written to memory. Therefore, if the input text is large, the program will also consume a large amount of memory. For example, if the input text is 200MB long, then the program will consume at least 800MB of memory, because each word is stored as 2 bytes for every character (.NET uses UTF-16 character encoding for the strings in memory). If we want to avoid high memory consumption then the words must not be stored in memory all at once. We can come up with another algorithm: scanning the text char by char and storing the letters into a buffer (such as StringBuilder). If at a certain moment a separator is encountered, then the buffer contains the most recent word. We can analyze its casing and then empty the buffer. We can repeat this until the end of the file is reached. This appears to be more efficient, doesn’t it? A more efficient lower / upper case checker would be to iterate through all letters using a loop and to examine them char by char. That way we can skip a lower / upper case conversion, which allocates extra memory for every word. After the word has been processed, the memory will be freed, which would eventually lead to extra CPU utilization (for the .NET garbage collector). Obviously, the latter solution is more efficient. The question is if we should scrap the original solution and write a completely different one. It all depends on the performance requirements. The problem description doesn’t hint at an input text measuring in the hundreds of megabytes. Therefore the current solution, although not optimal, is still correct and will suffice. We suggest the reader to implement the proposed fast solution and to compare how faster it is, e.g. by processing an input of 100 MB.

Problem 2: A Matrix of Prime Numbers Write a program that reads a positive integer N from the standard input and prints the first N2 prime numbers as a square matrix of size N x N. The matrix must be filled with numbers starting from the first row and ending at the last one. Each row must be filled with prime numbers from left to right. Note: A prime number is a number that has no divisors other than 1 and itself. The number 1 is not a prime number. Sample input:

2

3

4

2 3 5 7 11 13 17 19 23

2 3 5 11 13 23 29 41 43

Sample output:

2 3 5 7

7 17 19 31 37 47 53


1055

Coming Up with an Appropriate Idea for a Solution We can solve the problem by printing the rows and columns of the resulting matrix using two nested loops. For each of its elements we will extract and print the corresponding prime number.

Breaking Down the Problem into Subproblems We must solve at least two subproblems – finding each successive prime number and printing the prime numbers into a matrix. We can print the matrix right away, but the process of finding each successive prime number will require additional thinking. Perhaps the most intuitive way to accomplish this is to start testing the primality of each number starting from the last prime number that we found. When a new prime is encountered, it is returned as a result. Thus, a new subproblem has come up – checking whether or not a number is a prime.

Verifying the Idea Our idea for a solution leads directly to the required result. We write down a couple of examples on a piece of paper and make sure that it works.

Consider the Data Structures The problem makes use of one data structure only – a matrix. It’s only natural to use a two-dimensional array (matrix).

Consider the Efficiency Displaying at the console large matrices (for example of size 1000 x 1000) cannot be properly handled. This means our solution should work for reasonably large matrices, e.g. on the order of N ≤ 200. We don’t need to consider cases where the matrix is too large. When N = 200, our algorithm will find the first 40,000 prime numbers and should not run slowly. Now we are ready for the implementation of the algorithm we invented.

Step 1: Check to Find If a Number Is a Prime To test a number for primality, we can define a method called IsPrime(…). The test will verify that dividing the number by any of its predecessors always yields a division remainder. To be more precise, it is sufficient to check the integers between 2 and the square root of the number. This holds true, because if the number p has a divisor x, then p = x.y, and at least one or both of the numbers x and y will be less than or equal to the square root of p. What follows is an implementation of the method:

private static bool IsPrime(int number) {

1056


int maxDivider = (int)Math.Sqrt(number); for (int divider = 2; divider … Sample input file words.txt:

for academy student Java develop


1079

CAD Sample input file text.txt:

The Telerik Academy for software development engineers is a famous center for free professional training of .NET experts. Telerik Academy offers courses designed to develop practical computer programming skills. Students graduated the Academy are guaranteed to have a job as a software developers in Telerik. Sample result file result.txt:

for --> 2 academy --> 3 student --> 1 Java --> 0 develop --> 3 CAD --> 3 Below are the locations of the matched words from the above example:

The Telerik Academy for software development engineers is a famous center for free professional training of .NET experts. Telerik Academy offers courses designed to develop practical computer programming skills. Students graduated the Academy are guaranteed to have a job as a software developers in Telerik.

Start Thinking on the Problem The emphasis of the given problem seems not so much on the algorithm, but on its technical implementation. In order to write the solution we must be familiar with working with files in C# and with the basic data structures, as well as string processing in .NET Framework.

Inventing an Idea for a Solution We get a piece of paper, write few examples and we come up with the following idea: we read the words file, scan through the text and check each word from the text for matches with the preliminary given list of words and increase the counter for each matched word.

Checking the Idea The above idea for solving the task is trivial but we can still check it by writing down on a piece of paper the sample input (words and text) and the expected result. We just scan through the text word by word in our paper

1080


example and when we find a match with some of the preliminary given words (as a substring) we increment the counter for the matched word. The idea works in our example. Now let’s think of counterexamples. In the same time we might also come with few questions regarding the implementation: - How do we scan the text and search for matches? We can scan the text character by character or line by line or we can read the entire text in the memory and then scan it in the memory (by string matching or by a regular expression). All of these approaches might work correctly but the performance could vary, right? We will think about the performance a bit later. - How do we extract the words from the text? Maybe we can read the text and split it by all any non-letter characters? Where shall we take these non-letter characters from? Or we can read the text char by char and once we find a non-letter character we will have the next word from the text? The second idea seems faster and will require less memory because we don’t need to read all the text at once. We should think about this, right? - How do we match two words? This is a good question. Very good question. Suppose we have a word from the text and we want to match it with the words from the file words.txt. For example, we have “Academy” in the text and we should find whether it matches as substring the “CAD” word from the list of words. This will require searching each word from the list as a substring in each word from the text. Also can we have some word appearing several times inside another? This is possible, right? From all the above questions we can conclude that we don’t need to read the text word by word. We need to match substrings, not words! The title of the problem is misleading. It says “Counting Words in a Text File” but it should be “Counting Substrings in a Text File”. It is really good that we found we have to match substrings (instead of words), before we have implemented the code for the above idea, right?

Inventing a Better Idea Now, considering the requirement for substring matching, we come with few new and probably better ideas about solving the problem: - Scan the text line by line and for each line from the text and each word check how many times the word appears as substring in the line. The last can be counted with String.IndexOf(…) method in a loop. We already have solved this subproblem in the chapter “Strings and Text Processing” (see the section “Finding All Occurrences of a Substring”).


1081

- Read the entire text and count the occurrences of each word in it (as a substring). This idea is very similar to the previous idea but it will require much memory to read the entire text. Maybe this will not be efficient. We gain nothing, but potentially we will run “out of memory”. - Scan the text char by char and store the read characters in a buffer. After each character read we check if the text in the buffer ends with some of the words from the list. We will not need to search the words in the buffer because we check for each word after each character is read. We could also clear the buffer when we read any non-letter character (because the list of words for matching should contain letters only). Thus the memory consumption will be very low. The first and the last idea seem to be good. Which of them to implement? Maybe we could implement both of them and choose the faster one. Having two solutions will also improve the testing because we should get identical results with both of the solutions on all test cases.

Checking the New Ideas We have two good ideas and we need to check them for correctness before thinking about implementation. How to check the ideas? We can invent a good test case on a piece of paper and try the ideas on it. Let’s have the following list of words:

Word S MissingWord DS aa We might be interested to find the number of occurrences of the above words in the following text:

Word? We have few words: first word, second word, third word. Some passwords: PASSWORD123, @PaSsWoRd!456, AAaA, !PASSWORD The expected result is as follows:

Word --> 9 S --> 13 MissingWord --> 0 DS --> 2 aa --> 3 In the above example we have many different special cases: whole-word matching, matching as a substring, matching in different casing, matches in the start / end of the text, several matches inside the same word, overlapping

1082


matches, etc. This example is a very good representative of the common case for this problem. It is important to have such short but comprehensive test case when solving programming problems. It is important to have it early, when checking the ideas, before any code is written. This avoids mistakes, catches incorrect algorithms and saves time!

Checking the Line by Line Algorithm Now let’s check the first algorithm: read the two lines of text and check how many times each of the words from the given list occurs in each line ignoring the character casing. At the first line we find as substrings (ignoring the case) “word” 5 times, “s” 3 times, “MissingWord” 0 times, “aa” 0 times and “ds” – 1 time. At the second line we find as substrings (ignoring the case) “word” 4 times, “s” 10 times, “MissingWord” 0 times, “aa” 3 times and “ds” – 1 time. We sum the occurrences and we find that the result is correct. We try to find counterexamples, but we can’t. The algorithm may not work with words spanning multiple lines. This is not possible by definition. It may also have issues with the overlapping matches like finding “aa” in “AAaA”. This will be definitely checked after the algorithm is implemented.

Checking the Char by Char Algorithm Let’s check the other algorithm: scan through the text char by char, holding the characters in a buffer. After each character if the buffer ends with some of the words (ignoring the character casing), the occurrences of the matched word are increased. If a non-letter is occurred, the buffer is cleaned. We start from empty buffer and append the first char from the text “W” to the buffer. None of the words match the end of the buffer. We append “o” and the buffer holds “Wo”. No matches. Then we append “r”. The buffer holds “Wor”. Again no matches are found with any of the words. We append the next char “d” and the buffer holds “Word”. We have found a match with the word form a list: “word”. We increase the number of occurrences of the matched word from zero to one. The next char is “?” and we clean the buffer, because it is not a letter. The next char is “ ” (space). We again clean the buffer. The next char is “W”. We append it to the buffer. No matches with any of the words. We continue further and further… After the last character is processed, the algorithm finishes and the results are correct. We try to find counterexamples, but we can’t. The algorithm may not work with words spanning multiple lines, but this is not possible by definition.

Decompose the Problem into Subproblems Now let’s try to divide the problem into subproblems. This should be done separately for the both algorithms we want to try because they differ significantly.

Line by Line Algorithm Decomposed into Subproblems Let’s decompose the line by line algorithm into subproblems (sub-steps):


1083

1. Read the input words. We can read the file words.txt by using File.ReadAllLines(…). It reads a text file in a string[] array of lines. 2. Process the lines of the text one by one to count the occurrences of each word in it. Initially assign zero occurrences for each word. Read the input file text.txt line by line. For each line from the text and for each word check the number of its occurrences (this is a separate subproblem) and increase the counters for each match. The occurrences counting should be case-insensitive. 3. Count the number of occurrences of certain substring in certain text. This is a separate subproblem. We find the leftmost occurrence of the substring in the text though string.IndexOf(…). If the returned index > -1 (the substring exists), we increase the counter and find the next occurrence of the substring on the right from the last found index. We perform this in a loop until we find -1 as a result which means that there are no more matches. To perform case-insensitive searching we can pass a special parameter StringComparison.OrdinalIgnoreCase to the IndexOf() method. 4. Print the results. Process all words and for each word print it along with its counter holding its occurrences in the output file result.txt.

Char by Char Algorithm Decomposed into Subproblems Let’s decompose the char by char algorithm into subproblems (sub-steps): 1. Read the input words. We can read the file words.txt by using File.ReadAllLines(…). It reads a text file in a string[] array of lines. The original words can be saved and a copy of them in lowercase can be made to simplify the matching with ignoring the character casing. 2. Process the text char by char. Read the input file text.txt and append the letters into a buffer (StringBuilder). After each letter appended check whether the text in the buffer ends with some of the words in the input list of words (this check is a separate subproblem). If so, increase the number occurrences of the matched word. If a nonletter character is found, clean the buffer. Letters are converted to lowercase before added in the buffer. 3. Check whether a certain text (StringBuilder) ends by a certain string. In case the string has length n lower than the length of the text, the result is false. Otherwise the n letters of the string should be compared one by one with the last n letters of the text. If a mismatch is found, the result is false. If all checks pass, the result is true. 4. Print the results. Process all words and for each word print it along with its counter holding its occurrences in the output file result.txt.

1084


Think about the Data Structures In the line by line algorithm we don’t have any need of special data structures. We can keep the words in an array or list of strings. We can keep the number of occurrences for each word in array of integer values. The text lines we can keep in strings. In the char by char algorithm the situation is similar. We don’t need any special data structures. We can keep the words in an array or list of strings. We can keep the number of occurrences for each word in array of integer values. The buffer for the characters we can implement by StringBuilder (because we need to append chars many times).

Think about the Performance Following the guidelines for problem solving from the chapter “Methodology of Problem Solving” we should think about the efficiency and performance before writing any code. The line by line algorithm will process the entire text line by line and for each text line it will search for all of the words. Thus if the text has a total size of t characters and the number of words are w, the algorithm will totally perform w string searches in t characters. Each search for a word in the text will pass through the entire text (at least once, but maybe not always). If we assume that searching for a word in a text is a linear time operation, we will have w scans through the entire text, so the excepted running time in quadratic: O(w*t). If we search in MSDN or in Internet, we will be unable to find any information about how exactly String.IndexOf(…) works internally and whether it runs in linear time or it is slower. This method calls a Win32 API function so it cannot be decompiled. Thus the best way to check its performance is by measuring. The char by char algorithm will process the entire text char by char and for each character it will perform a string matching for each of the words. Suppose the text has t characters and the number of the words is w. In the average case the string matching will run in constant time (it will require just one check if the first letter is not matching, two checks if the first letter matches, etc.). In the worst case the string matching will require n comparisons where n is the length of the word being matched. Thus in the average case the expected running time of the algorithm will be quadratic: O(w*t). In the worst case it will be significantly slower. It seems like the line by line algorithm is expected to run faster but we are uncertain about how fast is string.IndexOf(…), so this cannot be definitely stated. If we are at an exam, we will probably choose to implement the line by line algorithm. Just for the experiment, let’s implement both of them and compare their performance.


1085

Implementation: Step by Step If we directly follow the steps, which we have already identified we can write the code with ease. Of course it is better to implement the algorithms step-by-step, to find and fix the bugs early.

Line by Line Algorithm: Step by Step Implementation We can start implementing the line by line algorithm for word counting in a text file from the method that counts how many times a substring appears in a text. It should look like the following:

static int CountOccurrences( string substring, string text) { int count = 0; int index = 0; while (true) { index = text.IndexOf(substring, index); if (index == -1) { // No more matches break; } count++; } return count; } Let’s test it before going further:

Console.WriteLine( CountOccurrences("hello", "Hello World Hello")); The result is 0 – wrong! It seems like we have forgotten to ignore the character casing. Let’s fix this. We need to change the name of the method as well and add the StringComparison.OrdinalIgnoreCase option when searching for the given substring:

static int CountOccurrencesIgnoreCase( string substring, string text) { int count = 0; int index = 0; while (true) {

1086


index = text.IndexOf(substring, index, StringComparison.OrdinalIgnoreCase); if (index == -1) { // No more matches break; } count++; } return count; } Let’s test again with the same example. The program hangs! What happens? We step through the code using the debugger and we find that the variable index takes the first occurrence at position 0 and at the next iteration it takes the same occurrence again at position 0 and the program enters into an endless loop. This is easy to fix. Just start searching from position index+1 (the next position on the right), not from index:

static int CountOccurrencesIgnoreCase( string substring, string text) { int count = 0; int index = 0; while (true) { index = text.IndexOf(substring, index + 1, StringComparison.OrdinalIgnoreCase); if (index == -1) { // No more matches break; } count++; } return count; } We run the fixed code with the same test. Now the result is incorrect (1 occurrence instead of 2). We again trace the program with the debugger and we find that the first match is at position 12. Immediately we find out why this happens: initially we start from position 1 (index + 1 when index is 0) and we skip the start of the text (position 0). This is easy to fix:


1087

static int CountOccurrencesIgnoreCase( string substring, string text) { int count = 0; int index = -1; while (true) { index = text.IndexOf(substring, index + 1, StringComparison.OrdinalIgnoreCase); if (index == -1) { // No more matches break; } count++; } return count; } We test again with the same example and finally the result is correct. We take another, more complex test:

Console.WriteLine(CountOccurrencesIgnoreCase( "Word", "Word? We have few words: first word, second word," + "third word. Passwords: PASSWORD123, @PaSsWoRd, !PASSWORD")); The result is again correct (9 matches). We test with missing word and the result is again correct (0 matches). This is enough. We assume the method works correctly. Now let’s continue with the next step: read the words.

string[] words = File.ReadAllLines("words.txt"); There is no need to test this code. It is too simple to have bugs. We will test it when we test the entire solution. Let’s not write the main logic of the program which reads the text line by line and counts the occurrences of each of the input words in each of the lines:

int[] occurrences = new int[words.Length]; using (StreamReader text = File.OpenText("text.txt")) { string line; while ((line = text.ReadLine()) != null) { for (int i = 0; i < words.Length; i++) {

1088


string word = words[i]; int wordOccurrences = CountOccurrencesIgnoreCase(word, line); occurrences[i] += wordOccurrences; } } } This code definitely should be tested but it will be easier to write the code which prints the results to simplify testing. Let’s do this:

using (StreamWriter result = File.CreateText("result.txt")) { for (int i = 0; i < words.Length; i++) { result.WriteLine("{0} --> {1}", words[i], occurrences[i]); } } The complete implementation of the line by line string occurrences counting algorithms looks as follows:

CountSubstringsLineByLine.cs using System; using System.IO; public class CountSubstringsLineByLine { static void Main() { // Read the input list of words string[] words = File.ReadAllLines("words.txt"); // Process the file line by line int[] occurrences = new int[words.Length]; using (StreamReader text = File.OpenText("text.txt")) { string line; while ((line = text.ReadLine()) != null) { for (int i = 0; i < words.Length; i++) { string word = words[i]; int wordOccurrences =


1089

CountOccurrencesIgnoreCase(word, line); occurrences[i] += wordOccurrences; } } } // Print the result using (StreamWriter result = File.CreateText("result.txt")) { for (int i = 0; i < words.Length; i++) { result.WriteLine("{0} --> {1}", words[i], occurrences[i]); } } } static int CountOccurrencesIgnoreCase( string substring, string text) { int count = 0; int index = -1; while (true) { index = text.IndexOf(substring, index + 1, StringComparison.OrdinalIgnoreCase); if (index == -1) { // No more matches break; } count++; } return count; } }

Testing the Line by Line Algorithm Now let’s test the entire code of the program. We try our test and it works as expected!

text.txt

1090


Word? We have few words: first word, second word, third word. Some passwords: PASSWORD123, @PaSsWoRd!456, AAaA, !PASSWORD words.txt Word S MissingWord DS aa result.txt Word --> 9 S --> 13 MissingWord --> 0 DS --> 2 aa --> 3 We also try the sample test from the problem description and it also works correctly. We try few other tests and all they work correctly. We try also few border cases like empty text and empty list of words. All these cases are handled correctly. It seems like our line by line word counting algorithm and its implementation correctly solve the problem. We need to conduct only a performance test but let’s first implement the other algorithm to be able to compare which is faster.

Char by Char Algorithm: Step by Step Implementation Let’s now implement the char by char string occurrences counting algorithm. We will need a StringBuilder to hold the characters we read and a method to check for a match at the end of the StringBuilder. Let’s define this method first. For more flexibility it can be implemented as extension method to the StringBuilder class (recall how extension methods work from the chapter “Lambda Expressions and LINQ”):

static bool EndsWith(this StringBuilder buffer, string str) { for (int bufIndex = buffer.Length-str.Length, strIndex = 0; strIndex < str.Length; bufIndex++, strIndex++) { if (buffer[bufIndex] != str[strIndex]) { return false;


1091

} } return true; } Let’s test the method with a sample text and its ending:

Console.WriteLine( new StringBuilder("say hello").EndsWith("hello")); This test produces a correct result: True. Let’s test the negative case:

Console.WriteLine(new StringBuilder("abc").EndsWith("xx")); This test produces a correct result: False. Let’s test what will happen if the ending is longer than the test:

Console.WriteLine(new StringBuilder("a").EndsWith("abcdef")); We get IndexOutOfRangeException. We found a bug! It is easy to fix – we can return false if the ending string is longer than the text where it should be found:

static bool EndsWith(this StringBuilder buffer, string str) { if (buffer.Length < str.Length) { return false; } for (int bufIndex = buffer.Length - str.Length, strIndex = 0; strIndex < str.Length; bufIndex++, strIndex++) { if (buffer[bufIndex] != str[strIndex]) { return false; } } return true; } We run all the tests again and all of them pass. We assume the above method is correctly implemented. Now let’s continue with the step-by-step implementation. Let’s implement the reading of the words:

1092


string[] wordsOriginal = File.ReadAllLines("words.txt"); This is the same code from the line by line algorithm and it should work. Let’s now implement the main program logic which reads the text char by char in a buffer of characters and after each letter checks all input words for matches at the ending of the buffer:

int[] occurrences = new int[words.Length]; using (StreamReader text = File.OpenText("text.txt")) { StringBuilder buffer = new StringBuilder(); int nextChar; while ((nextChar = text.Read()) != -1) { char ch = (char)nextChar; if (char.IsLetter(ch)) { // A letter is found --> check all words for matches buffer.Append(ch); for (int i = 0; i < words.Length; i++) { string word = words[i]; if (buffer.EndsWith(word)) { occurrences[i]++; } } } else { // A non-letter character is found --> clean the buffer buffer.Clear(); } } } To test the code we will need few lines of code to print the output:

using (StreamWriter result = File.CreateText("result.txt")) { for (int i = 0; i < words.Length; i++) { result.WriteLine("{0} --> {1}", words[i], occurrences[i]);


1093

} } Now the program is completed and we should test it.

Testing the Char by Char Algorithm Let’s test the entire code of the program. We try our test and it fails. The produced result is incorrect:

Word --> 1 S --> 6 MissingWord --> 0 DS --> 0 aa --> 0 What’s wrong? Maybe the character casing? Do we compare the characters in case-insensitive fashion? No. We found the problem. How to fix the character casing? Maybe we need to fix the EndsWith(…) method. We search in MSDN and in Internet and we cannot find a method to compare case-insensitively characters. We can do something like this:

if (char.ToLower(ch1) != char.ToLower(ch2)) … The above code will work but it will convert the characters to lowercase many times, at each character comparison. This may be slow so it is better to lowercase the words and the text preliminary before comparing. If we lowercase the words, they will be printed in lowercase at the output and this will be incorrect. So we need to remember the original words and to make a copy of them in lowercase. Let’s try it. We can use the built-in extension methods from System.Linq to perform the lowercase conversion:

string[] wordsOriginal = File.ReadAllLines("words.txt"); string[] wordsLowercase = wordsOriginal.Select(w => w.ToLower()).ToArray(); We need to apply few other fixes and finally we get the following full source code of the char by char algorithm for counting the occurrences of a list of substrings in given text:

CountSubstringsCharByChar.cs using System.IO; using System.Linq; using System.Text;

1094


public static class CountSubstringsCharByChar { static void Main() { // Read the input list of words string[] wordsOriginal = File.ReadAllLines("words.txt"); string[] wordsLowercase = wordsOriginal.Select(w => w.ToLower()).ToArray(); // Process the file char by char int[] occurrences = new int[wordsLowercase.Length]; StringBuilder buffer = new StringBuilder(); using (StreamReader text = File.OpenText("text.txt")) { int nextChar; while ((nextChar = text.Read()) != -1) { char ch = (char)nextChar; if (char.IsLetter(ch)) { // A letter is found --> check all words for matches ch = char.ToLower(ch); buffer.Append(ch); for (int i = 0; i < wordsLowercase.Length; i++) { string word = wordsLowercase[i]; if (buffer.EndsWith(word)) { occurrences[i]++; } } } else { // A non-letter is found --> clean the buffer buffer.Clear(); } } } // Print the result using (StreamWriter result = File.CreateText("result.txt")) { for (int i = 0; i < wordsOriginal.Length; i++)


1095

{ result.WriteLine("{0} --> {1}", wordsOriginal[i], occurrences[i]); } } } static bool EndsWith(this StringBuilder buffer, string str) { if (buffer.Length < str.Length) { return false; } for (int bufIndex = buffer.Length-str.Length, strIndex = 0; strIndex < str.Length; bufIndex++, strIndex++) { if (buffer[bufIndex] != str[strIndex]) { return false; } } return true; } } We need to test again with our example. Now the program works. The result is correct:

Word --> 9 S --> 13 MissingWord --> 0 DS --> 2 aa --> 3 We test with all other tests we have (the test from the problem statement, the border cases, etc.) and all of them pass correctly.

Testing for Performance Now it is time to test for performance both our solutions. We need a big test. We can do it with copy-paste. It is easy to copy-paste the text from our text example 10,000 times and its words 100 times. The repeating words might cause inaccuracies in performance measuring so we manually replace the last 26 words with the letters from “a” to “z”. We also play a bit with the rectangular selection in Visual Studio ([Alt] + mouse selection)

1096


and we insert the alphabet as a vertical column in few other places. All this will result in 20,000 lines of text (1.2 MB) and 500 words (3 KB). To measure the execution time we add two lines of code – before the first line of the Main() method and after the last line of the Main() method:

static void Main() { DateTime startTime = DateTime.Now; // The original code goes here Console.WriteLine(DateTime.Now - startTime); } Now we execute first the line by line algorithm and it seems not very fast. On average computer from 2008 it prints the following result:

00:01:33.6393559 After that we execute the char by char algorithm. It produces the following output:

00:00:18.1080357 Unbelievable! Our char by char processing algorithm is more than 5 times faster than the line by line processing algorithm! But … it still is slow! 18 seconds for 1 MB file is not fast. How about processing 500 MB input and search for 10,000 words?

Invent a Better Idea (Again) If we are at exam, we could decide whether to take the risk to submit the char by char solution or spend more time to think of faster algorithm. This depends on how much time we have to the end of the exam and how much problems we have already solved, how hard are the unsolved problems, etc. Suppose we have enough time and we want to think more. What makes our solution slow? If we have 500 words, we check for each of them at each character. We do 500 * length(text) string comparisons. The text is scanned only once (char by char). This cannot be improved, right? If we do not scan the entire text, we will be unable to find all occurrences. So if we want to improve the performance, we should look how to check the words faster after each character is read, right? For 500 words we perform 500 checks after each character is read. This is slow! Can’t we do it faster? In fact we perform a kind of searching for a matching word in a list of words? From the data structures we know that this takes linear time. Also, from the data structures we know that the fastest data structure for searching is the hash-table. OK, can’t we use a hash table? Instead of


1097

searching the words by trying each of them one by one, can’t we directly find the word we need through a hast-table lookup? We take a sheet of paper and the pencil and we start making sketches and thinking. Suppose we have the text “passwords” and the word “s”. We can check the word that we obtain when we append the letters one after another:

p, pa, pas, pass, passw, passwo, passwor, password, passwords In this case we will not match the word “s”, right. In fact, when we find a word in the text, we should check all its substrings in the hash table. For example if the text is “password”, all its substrings are:

p, pa, a, pas, as, s, pass, ass, ss, s, passw, assw, ssw, sw, w, passwo, asswo, sswo, swo, wo, o, passwor, asswor, sswor, swor, wor, or, r, password, assword, ssword, sword, word, ord, rd, d, passwords, asswords, sswords, swords, words, ords, rds, ds, s There are 45 substrings of the word “password”. In a word of n characters we have n*(n+1)/2 substrings. This will work well with short words (e.g. 3-4 characters) and will be slow for the long words (e.g. 15-20 characters). We get into another idea? This multi-pattern matching problem should have a standard solution. Why don’t we search for it in Internet? We try to search for “multi-pattern matching algorithm” in Google and after exploring the first few results we learn about the “Aho-Corasick string matching algorithm”. Once we know the algorithm name we search for “Aho Corasick C#” and we find a nice C# implementation: https://github.com/tupunco/Tup.AhoCorasick. The theory says that after we have a new idea, we should check it for correctness. The best way to check this idea is by putting the code we found in action. In fact we do not implement the algorithm. We just try to adopt it to solve the problem we have.

Counting Substrings with the Aho-Corasick Algorithm From the open-source implementation of the Aho-Corasick multi-pattern string matching algorithm mentioned above we can take the class AhoCorasickSearch and put it in action. We write a new solution of the substring counting problem based on what we have learned from the previous solutions. We find all matches of all words by the SearchAll(…) method of the AhoCorasickSearch class. Then we use a hash-table to count the number of occurrences for each of the words. To ensure we ignore the character casing we convert the text and the words into lowercase. This is the code:

CountSubstringsAhoCorasick.cs using System; using System.Collections.Generic;

1098


using System.Linq; using System.IO; class CountSubstringsAhoCorasick { static void Main() { DateTime startTime = DateTime.Now; // Read the input list of words string[] wordsOriginal = File.ReadAllLines("words.txt"); string[] wordsLowercase = wordsOriginal.Select(w => w.ToLower()).ToArray(); // Read the text string text = File.ReadAllText("text.txt").ToLower(); // Find all word matches and count them var search = new AhoCorasickSearch(); var matches = search.SearchAll(text, wordsLowercase); Dictionary occurrences = new Dictionary(); foreach (string word in wordsLowercase) { occurrences[word] = 0; } foreach (var match in matches) { string word = match.Match; occurrences[word]++; } // Print the result using (StreamWriter result = File.CreateText("result.txt")) { foreach (string word in wordsOriginal) { result.WriteLine("{0} --> {1}", word, occurrences[word.ToLower()]); } } Console.WriteLine(DateTime.Now - startTime); }


1099

} We test the above code with all tests we already have and it seems to work correctly. We try the performance test and this time we can be amazed by its speed:

00:00:00.6540374 It runs really fast. This is the solution we needed and if we are allowed to use Internet at the exam, the best way to start when we have a standard well-known problem is to look for a well-known solution.

Problem 3: School Students, which are studying in a school, are separated into groups. Each of the groups has a teacher. The following information is kept for the students: first name and last name. The following information is kept for the groups: name, a list of students and teacher. The following information is kept for the teachers: first name, last name and a list of groups he is teaching. Each teacher can teach more than one group. The following information is kept for the school: name, list of the teachers, list of the groups and list of the students. Your task is to: 1. Design a set of classes and relationships between them to model the school, its students, teachers and groups. 2. Implement functionality for add / edit / delete teachers, students, groups and their properties. 3. Implement functionality for printing in human-readable form the school, the teachers, the students, the groups and their properties. 4. Write a sample test program, which demonstrates the work of the implemented classes and methods. Example of school with teachers, students and groups:

School "Freedom". Teachers: Tom Johnson, Elizabeth Hall. Group "English": David Russell, Nicholas Grant, Emma Fletcher, John Brown, Emily Cooper, teacher Elizabeth Hall. Group "French": Kevin Simmons, Ian Hayes, teacher Elizabeth Hall. Group "Informatics": Jessica Carter, Andrew Cooper, Ashley Moore, Olivia Adams, Jonathan Smith, teacher Tom Johnson.

Start Thinking on the Problem This is a good example of an exam assignment the purpose of which is to test your abilities to use object-oriented programming (OOP) for modeling

1100


problems from the real life, design classes and relationships between them as well as working with collections. All we need to solve this problem is to use our object-oriented modeling skills that we have gained from chapter “Object-Oriented Programming Principles”, especially from the section “Object-Oriented Modeling (OOM)”.

Inventing an Idea for Solution In this task there is nothing complex to invent. It is not algorithmic and there is not anything to be thought up. We must define a class for each of the described in the problem description objects (students, teachers, school students, groups, school, etc.) and after that we should define in each class properties to describe its characteristics and methods to implements the actions the class can do, e.g. printing in human-readable form. That’s all. Following the directions from the section “Object-Oriented Modeling (OOM)” we could identify the nouns in the problem description. Some of them should be modeled as classes; some of them as properties; and some of them may not be important and could be disregarded. Reading the text from the problem description and analyzing the nouns, we could come to the idea to model the school through defining few interrelated classes: Student, Group, Teacher and School. For testing the classes we could create a class SchoolTest, which will create few objects of each class and will demonstrate their work in action.

Checking the Idea We will not check the idea because there is nothing to be proven or checked. We need to write few classes to model a real-world situation: a school with students, teachers and groups.

Dividing the Problem into Subproblems The implementation of each of the classes we already identified can be considered a subproblem of the given school modeling problem. Thus we have the following subproblems: - Class for the students – Student. Students will have first name, last name and a method for printing in human-readable form – ToString(). - Class for the groups – Group. Groups will have a name, a teacher and a list of students. It will also have а method for printing in humanreadable form. - Class for the teachers – Teacher. Teachers will have first name, last name and a list of groups, as well as а method for printing in humanreadable form. - Class for the school – School. It will have a name and will hold all students, all teachers and all groups.


1101

- Class for testing the other classes – SchoolTest. It will create a school with a few students, a few groups holding subsets of the students and a few teachers. It will assign one teacher per group and a few groups per teacher accordingly. Finally the class will print the school and all its teachers, groups and students.

Think about the Data Structures The data structures, needed for this problem, are of two main groups: classes and relationships between the classes. Classes will be classes. We have nothing to decide here. The interesting part is how to describe the relationships between the classes, e.g. when a group has a collection of students. To describe a relationship (link) between two classes we can use an array. With an array we have access to its elements by index, but once it is created we will not be able to add or delete items (arrays have a fixed size). This makes it uncomfortable for our problem, because we don’t know how many students we will have in the school and more students can be added or removed after the school is once created.

List seems more comfortable. It has the advantages of an array and also has a variable length – it is easy to add or delete elements. List can hold lists of students (inside the school and inside a group), lists of teachers (inside a school) and lists of groups (inside a school and inside a teacher).

So far it seems List is the most appropriate for holding aggregations of objects inside another object. To be convinced we will analyze a few more data structures. For example hash-table – it is not appropriate in this case, because the school, teachers, students and groups are not of a key-value type. A hash-table would be appropriate if we need to search a student by its unique student ID, but this is not the case. Structures like stack and queue are inappropriate – we do not have LIFO or FIFO behavior. The structure “set” and its implementation HashSet may be used when we need to have uniqueness for given key. It would be good sometimes to use this structure to avoid duplicates. We must recall that HashSet requires the methods GetHashCode() and Equals(…) to be correctly defined by the T type. Shall we use sets and where? To answer this question we need to recall the problem description. What is says? We need to design a set of classes to model the school, its students, teachers and groups and functionality for add / edit / delete teachers, students, groups and their properties. The easiest way to implement it is to hold a list of students in the school, a list of groups for each teacher, etc. Lists are easier to implement. Sets give uniqueness, but require Equals() and GetHashCode(). Sets need more effort to be used. So we may use lists to simplify our work. According to the requirements the school should allow add / edit / delete of students, teachers and groups. The easiest way to implement this is to expose the lists of students, teachers and groups as public properties. List

1102


already have methods for add and delete of its elements and its elements are accessible by index and editable. It does the job. Finally we choose to use List for all aggregations in our classes and we will expose all the class members as properties with read and write access. We do not have a good reason to restrict the access to the members or implement immutable behavior.

Implementation: Step by Step It’s appropriate to start the implementation with the class Student because it does not depend on any of the other classes.

Step 1: Class Student In the problem definition we have only two fields representing the first name and the last name of a student. We may add a property Name, which returns a string with the full name of the student and a ToString() implementation to print the student in human-readable form. We might define the class Student as follows:

Student.cs public class Student { public string FirstName { get; set; } public string LastName { get; set; } public Student(string firstName, string lastName) { this.FirstName = firstName; this.LastName = lastName; } public string Name { get { return this.FirstName + " " + this.LastName; } } public override string ToString() { return "Student: " + this.Name; } }


1103

We want to allow the class members to be editable so we define the FirstName and LastName as public read-write properties.

Testing the Class Student Before continuing forward we want to test the class Student to be sure it is correct. Let’s create a testing class with a Main() method and create a student in it and print the student:

class TestSchool { static void Main() { Student studentPeter = new Student("Peter", "Lee"); Console.WriteLine(studentPeter); } } We run the testing program and we get a correct result:

Student: Peter Lee Now we can continue with the implementation of the other classes.

Step 2: Class Group The next class we can define is Group. We choose it because the only one required for its definition is the class Student. The properties, which we will define, are the name of the group, a list of the students, which belong to the group, and a teacher who teaches the group. To implement the list with of the students we will use List. We will add a ToString() method to enable printing the group in a human-readable text form. Let’s see the implementation of the class Group:

Group.cs using System.Collections.Generic; public class Group { public string Name { get; set; } public List Students { get; set; } public Group(string name) { this.Name = name; this.Students = new List();

1104


} public override string ToString() { StringBuilder groupAsString = new StringBuilder(); groupAsString.AppendLine("Group name: " + this.Name); groupAsString.Append("Students in the group: " + this.Students); return groupAsString.ToString(); } } It is important when we create a group to assign an empty list of students to it. If we leave the list of students unassigned, it will be null and when we try to add a student, we will get an exception.

Testing the Class Group Let’s now test the Group class. Let’s create a sample group, add few students to it and print the group at the console:

static void Main() { Student studentPeter = new Student("Peter", "Lee"); Student studentMaria = new Student("Maria", "Steward"); Group groupEnglish = new Group("English language course"); groupEnglish.Students.Add(studentPeter); groupEnglish.Students.Add(studentMaria); Console.WriteLine(groupEnglish); } We run the above testing code and we find a bug:

Group name: English language course Students in the group: System.Collections.Generic.List`1[Student] It seems like the list of students is printed incorrectly. It is easy to find why. The List class does not correctly implement ToString() and we need to use another way to print a list of students. We can do this with a forloop but let’s try something shorter and more elegant:

using System.Linq; … groupAsString.Append("Students in the group: " + string.Join(", ", this.Students.Select(s => s.Name)));


1105

The above code uses an extension method and a lambda expression to select all students’ names as IEnumerable and then combines them into a string using a comma as separator. Let’s test the Group class again after the fix:

Group name: English language course Students in the group: Peter Lee, Maria Steward The group class now works correctly. Let’s think a bit: who is teaching the students in the group? We should have a teacher, right. Let’s try to add the simplest possible class Teacher and define a property of it in the Group class:

public class Teacher { public string FirstName { get; set; } public string LastName { get; set; } public string Name

{ get { return this.FirstName + ' ' + this.LastName; } } } public class Group { public string Name { get; set; } public List Students { get; set; } public Teacher Teacher { get; set; } public Group(string name) { this.Name = name; this.Students = new List(); } public override string ToString() { StringBuilder groupAsString = new StringBuilder(); groupAsString.AppendLine("Group name: " + this.Name); groupAsString.Append("Students in the group: " + string.Join(", ", this.Students.Select(s => s.Name)));

1106


groupAsString.Append("\nGroup teacher: " + this.Teacher.Name); return groupAsString.ToString(); } } Let’s test again with our sample groups of two students studying English:

Student studentPeter = new Student("Peter", "Lee"); Student studentMaria = new Student("Maria", "Steward"); Group groupEnglish = new Group("English language course"); groupEnglish.Students.Add(studentPeter); groupEnglish.Students.Add(studentMaria); Console.WriteLine(groupEnglish); We find another bug:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object. at Group.ToString() … We step through the debugger and we see that we try to print the teacher’s name but there is no teacher (it is null). This is easy to fix. We could check whether the teacher exists prior to printing it in the ToString() method:

if (this.Teacher != null) { groupAsString.Append("\nGroup teacher: " + this.Teacher.Name); } Let’s test again after the fix. Now we get the following correct result:

Group name: English language course Students in the group: Peter Lee, Maria Steward Let’s now add a teacher to the testing group and check what happens:

Student studentPeter = new Student("Peter", "Lee"); Student studentMaria = new Student("Maria", "Steward"); Group groupEnglish = new Group("English language course"); groupEnglish.Students.Add(studentPeter); groupEnglish.Students.Add(studentMaria); Teacher teacherNatasha = new Teacher() { FirstName = "Natasha", LastName = "Walters" }; groupEnglish.Teacher = teacherNatasha;


1107

Console.WriteLine(groupEnglish); The result is correct:

Group name: English language course Students in the group: Peter Lee, Maria Steward Group teacher: Natasha Walters Now the Group class works correctly. We can continue with the next class.

Step 3: Class Teacher Let’s define the class Teacher. We already have some piece of it, but let’s define it in a better way. The teacher should have first name, last name and a list of group he teaches and should be printable in human-readable form. We can define it directly repeating the logic in the Group class:

Teacher.cs public class Teacher { public string FirstName { get; set; } public string LastName { get; set; } public List Groups { get; set; } public Teacher(string firstName, string lastName) { this.FirstName = firstName; this.LastName = lastName; this.Groups = new List(); } public string Name { get { return this.FirstName + " " + this.LastName; } } public override string ToString() { StringBuilder teacherAsString = new StringBuilder(); teacherAsString.AppendLine("Teacher name: " + this.Name); teacherAsString.Append("Groups of this teacher: " +

1108


string.Join(", ", this.Groups.Select(s => s.Name))); return teacherAsString.ToString(); } } Like in the class Group, it is important to create and empty list of groups instead of leaving the Groups property uninitialized.

Testing the Class Teacher Before going further, let’s test the class Teacher. We can create a teacher with a few groups and print it at the console:

static void Main() { Teacher teacherNatasha = new Teacher("Natasha", "Walters"); Group groupEnglish = new Group("English language"); Group groupFrench= new Group("French language"); teacherNatasha.Groups.Add(groupEnglish); teacherNatasha.Groups.Add(groupFrench); Console.WriteLine(teacherNatasha); } The result is correct:

Teacher name: Natasha Walters Groups of this teacher: English language, French language This was expected. We just repeated the same logic like in the Group class which was already tested and all bugs in it was fixed. We found once again how important is to write the code step by step with testing and bugfixing after each step, right? The bug with incorrectly printing the list of students would have been repeated when printing the list of groups, right?

Step 4: Class School We finish our object model with the definition of the class School, which uses all of the classes we already defined. It should have a name and should hold a list of students, a list of teachers and a list of groups:

public class School { public string Name { get; set; } public List Teachers { get; set; } public List Groups { get; set; } public List Students { get; set; }


1109

public School(string name) { this.Name = name; this.Teachers = new List(); this.Groups = new List(); this.Students = new List(); } } Before testing the class, let’s think what the class School is expected to do. It should hold the students, teachers and groups and should be printable at the console, right? If we print the school, what should be printed? Maybe we should print its name, all its students (with their inner details), all its teachers (with their inner details) and all its groups (with their inner details). Let’s try to define the ToString() method for the class School:

public override string ToString() { StringBuilder schoolAsString = new StringBuilder(); schoolAsString.AppendLine("School name: " + this.Name); schoolAsString.AppendLine("Teachers: " + string.Join(", ", this.Teachers.Select(s => s.Name))); schoolAsString.AppendLine("Students: " + string.Join(", ", this.Students.Select(s => s.Name))); schoolAsString.Append("Groups: " + string.Join(", ", this.Groups.Select(s => s.Name))); foreach (var teacher in this.Teachers) { schoolAsString.Append("\n---\n"); schoolAsString.Append(teacher); } foreach (var group in this.Groups) { schoolAsString.Append("\n---\n"); schoolAsString.Append(group); } foreach (var student in this.Students) { schoolAsString.Append("\n---\n"); schoolAsString.Append(student); } return schoolAsString.ToString(); }

1110


We shall not test the class School, because this will be the main purpose of our last class: SchoolTest.

Step 5: Class SchoolTest The final thing is the implementation of the class SchoolTest the purpose of which is to demonstrate all the classes we have defined (Student, Group, Teacher and School) and their methods and properties. This is our last subproblem. For the demonstration we create a sample school with a few students, a few teachers and a few groups and we print it:

SchoolTest.cs class TestSchool { static void Main() { // Create a few students Student studentPeter = new Student("Peter", "Lee"); Student studentGeorge = new Student("George", "Redwood"); Student studentMaria = new Student("Maria", "Steward"); Student studentMike = new Student("Michael", "Robinson"); // Create a group and add a few students to it Group groupEnglish = new Group("English language course"); groupEnglish.Students.Add(studentPeter); groupEnglish.Students.Add(studentMike); groupEnglish.Students.Add(studentMaria); groupEnglish.Students.Add(studentGeorge); // Create a group and add a few students to it Group groupJava = new Group("Java Programming course"); groupJava.Students.Add(studentMaria); groupJava.Students.Add(studentPeter); // Create a teacher and assign it to few groups Teacher teacherNatasha = new Teacher("Natasha", "Walters"); teacherNatasha.Groups.Add(groupEnglish); teacherNatasha.Groups.Add(groupJava); groupEnglish.Teacher = teacherNatasha; groupJava.Teacher = teacherNatasha; // Create another teacher and a group he teaches Teacher teacherSteve = new Teacher("Steve", "Porter"); Group groupHTML = new Group("HTML course"); groupHTML.Students.Add(studentMike);


1111

groupHTML.Students.Add(studentMaria); groupHTML.Teacher = teacherSteve; teacherSteve.Groups.Add(groupHTML); // Create a school with few students, groups and teachers School school = new School("Saint George High School"); school.Students.Add(studentPeter); school.Students.Add(studentGeorge); school.Students.Add(studentMaria); school.Students.Add(studentMike); school.Groups.Add(groupEnglish); school.Groups.Add(groupJava); school.Groups.Add(groupHTML); school.Teachers.Add(teacherNatasha); school.Teachers.Add(teacherSteve); // Modify some of the groups, student and teachers groupEnglish.Name = "Advanced English"; groupEnglish.Students.RemoveAt(0); studentPeter.LastName = "White"; teacherNatasha.LastName = "Hudson"; // Print the school Console.WriteLine(school); } } We run the program and we get the expected result:

School name: Saint George High School Teachers: Natasha Hudson, Steve Porter Students: Peter White, George Redwood, Maria Steward, Michael Robinson Groups: Advanced English, Java Programming course, HTML course --Teacher name: Natasha Hudson Groups of this teacher: Advanced English, Java Programming course --Teacher name: Steve Porter Groups of this teacher: HTML course --Group name: Advanced English Students in the group: Michael Robinson, Maria Steward, George

1112


Redwood Group teacher: Natasha Hudson --Group name: Java Programming course Students in the group: Maria Steward, Peter White Group teacher: Natasha Hudson --Group name: HTML course Students in the group: Michael Robinson, Maria Steward Group teacher: Steve Porter --Student: Peter White --Student: George Redwood --Student: Maria Steward --Student: Michael Robinson Of course in real life programs do not start from the first time, but in this task the mistakes you could make are trivial so there’s no point in discussing them. All classes are implemented and tested. We are almost finished with this problem.

Testing the Solution As usually, it remains to test if the entire solution is working correctly. We’ve already done this. We tested all the classes in their nominal case. We can do some tests with the border cases, for instance a group without students, empty school, etc. It seems like these cases work correctly. We might test a student without a name, but it is unclear whether the class should keep itself of incorrect names and what is a correct name. We can leave these classes without checks for the names. It will be a responsibility of their caller to put correct names though their constructors and properties. The problem description says nothing about this. It is interesting how we delete a student. In our current implementation, if we delete a student, we will need to remove it from the school and to remove it from all groups he belongs to. The removal itself will require the student to have the Equals() method defined correctly or we should compare students by hand (property by property). It is unclear from the problem description how exactly the “delete student” operation should work. We assume we don’t have time and we submit the solution in its current state without efficient delete operation. Sometimes it takes too much time to fix something and it is better to leave it in not perfect form. Below is the full source code of the solution of the school modeling problem:


School.cs using using using using

System; System.Collections.Generic; System.Linq; System.Text;

public class Student { public string FirstName { get; set; } public string LastName { get; set; } public Student(string firstName, string lastName) { this.FirstName = firstName; this.LastName = lastName; } public string Name { get { return this.FirstName + " " + this.LastName; } } public override string ToString() { return "Student: " + this.Name; } } public class Group { public string Name { get; set; } public List Students { get; set; } public Teacher Teacher { get; set; } public Group(string name) { this.Name = name; this.Students = new List(); }

1113

1114


public override string ToString() { StringBuilder groupAsString = new StringBuilder(); groupAsString.AppendLine("Group name: " + this.Name); groupAsString.Append("Students in the group: " + string.Join(", ", this.Students.Select(s => s.Name))); if (this.Teacher != null) { groupAsString.Append("\nGroup teacher: " + this.Teacher.Name); } return groupAsString.ToString(); } } public class Teacher { public string FirstName { get; set; } public string LastName { get; set; } public List Groups { get; set; } public Teacher(string firstName, string lastName) { this.FirstName = firstName; this.LastName = lastName; this.Groups = new List(); } public string Name { get { return this.FirstName + " " + this.LastName; } } public override string ToString() { StringBuilder teacherAsString = new StringBuilder(); teacherAsString.AppendLine("Teacher name: " + this.Name); teacherAsString.Append("Groups of this teacher: " + string.Join(", ", this.Groups.Select(s => s.Name))); return teacherAsString.ToString();


1115

} } public class School { public string Name { get; set; } public List Teachers { get; set; } public List Groups { get; set; } public List Students { get; set; } public School(string name) { this.Name = name; this.Teachers = new List(); this.Groups = new List(); this.Students = new List(); } public override string ToString() { StringBuilder schoolAsString = new StringBuilder(); schoolAsString.AppendLine("School name: " + this.Name); schoolAsString.AppendLine("Teachers: " + string.Join(", ", this.Teachers.Select(s => s.Name))); schoolAsString.AppendLine("Students: " + string.Join(", ", this.Students.Select(s => s.Name))); schoolAsString.Append("Groups: " + string.Join(", ", this.Groups.Select(s => s.Name))); foreach (var teacher in this.Teachers) { schoolAsString.Append("\n---\n"); schoolAsString.Append(teacher); } foreach (var group in this.Groups) { schoolAsString.Append("\n---\n"); schoolAsString.Append(group); } foreach (var student in this.Students) { schoolAsString.Append("\n---\n"); schoolAsString.Append(student); } return schoolAsString.ToString();

1116


} } class TestSchool { static void Main() { // Create a few students Student studentPeter = new Student("Peter", "Lee"); Student studentGeorge = new Student("George", "Redwood"); Student studentMaria = new Student("Maria", "Steward"); Student studentMike = new Student("Michael", "Robinson"); // Create a group and add a few students to it Group groupEnglish = new Group("English language course"); groupEnglish.Students.Add(studentPeter); groupEnglish.Students.Add(studentMike); groupEnglish.Students.Add(studentMaria); groupEnglish.Students.Add(studentGeorge); // Create a group and add a few students to it Group groupJava = new Group("Java Programming course"); groupJava.Students.Add(studentMaria); groupJava.Students.Add(studentPeter); // Create a teacher and assign it to few groups Teacher teacherNatasha = new Teacher("Natasha", "Walters"); teacherNatasha.Groups.Add(groupEnglish); teacherNatasha.Groups.Add(groupJava); groupEnglish.Teacher = teacherNatasha; groupJava.Teacher = teacherNatasha; // Create another teacher and a group he teaches Teacher teacherSteve = new Teacher("Steve", "Porter"); Group groupHTML = new Group("HTML course"); groupHTML.Students.Add(studentMike); groupHTML.Students.Add(studentMaria); groupHTML.Teacher = teacherSteve; teacherSteve.Groups.Add(groupHTML); // Create a school with few students, groups and teachers School school = new School("Saint George High School"); school.Students.Add(studentPeter); school.Students.Add(studentGeorge);


1117

school.Students.Add(studentMaria); school.Students.Add(studentMike); school.Groups.Add(groupEnglish); school.Groups.Add(groupJava); school.Groups.Add(groupHTML); school.Teachers.Add(teacherNatasha); school.Teachers.Add(teacherSteve); // Modify some of the groups, student and teachers groupEnglish.Name = "Advanced English"; groupEnglish.Students.RemoveAt(0); studentPeter.LastName = "White"; teacherNatasha.LastName = "Hudson"; // Print the school Console.WriteLine(school); } } We will not run performance tests because the task is not of a computational nature which requires a fast algorithm. Operations that could be slow are deleting of elements from a collection. Creating objects, assigning their properties and adding elements to their collections of child elements are all fast operations. Only the deletion could be slow. We could improve its performance by using HashSet instead of List in all aggregations. We leave this to the reader. Let’s make just one more note. Why we did not notice the performance problem with deleting elements earlier? Let’s recall how we proceeded with solving this problem. After thinking about the data structures we had to thing about the performance right? Did we do this step? We omitted this step and we found the problem too late. The conclusion is: follow the guidelines for problem solving. They are very wise.

Exercises 1.

Write a program, which prints a square spiral matrix beginning from the number 1 in the upper right corner and moving clockwise. Examples for N=3 and N=4:

7 6 5

8 9 4

1 2 3

10 11 12 9 16 13 8 15 14 7 6 5

1 2 3 4

1118


2.

Write a program, which counts the phrases in a text file. Any sequence of characters could be given as phrase for counting, even sequences containing separators. For instance in the text "I am a student in Sofia" the phrases "s", "stu", "a" and "I am" are found respectively 2, 1, 3 and 1 times.

3.

Model with OOP the file system of a computer running Windows. We have devices, directories and files. The devices are for instance floppy disk, HDD, CD-ROM, etc. They have a name and a tree of directories and files. Each directory has a name, date of last change and list of files and directories, which it holds. Each file has a name, date of creation, date of last change and content. Each file is placed in one of the directories. Each file can be text or binary. Text files contain text (string), and the binary ones – sequence of bytes (byte[]). Create a class, which tests the other classes and demonstrates how we can build a model for devices, directories and files in the computer.

4.

Using the classes from the previous task write a program which takes the real file system from your computer and loads it in your classes (just the names of the devices, directories and files, without the content of the files because you will run out of memory).

Solutions and Guidelines 1.

The task is analogical to the first task of the sample exam. You can modify the sample solution given above.

2.

You may read the text char by char and after each char to append it to the current buffer buf and check each of the searched word for a match with EndsWith() in the buffer’s end. Of course you cannot use efficiently hash-table and you will have a loop for each letter from the text, which is not the fastest solution. This is a modification of the “char by char algorithm for word counting”. Implementing a faster solution needs to adapt the Aho-Corasick algorithm. Try to play with it and modify the code from the section “Counting Substrings with the Aho-Corasick Algorithm”.

3.

The problem is analogical with the “School” problem from the sample exam and it can be solved by using the same approach. Define classes Device, Directory, File, ComputerStorage and ComputerStorageTest. Think of what properties each of these classes has and what are the relationships between the classes. Create a base abstract class File and inherit it from TextFile and BinaryFile. Test your code with sample hierarchy of devices, files and folders. Note: a file can be listed in more than one directory at the same time (unlike in the file system).

4.

Use the class System.IO.Directory and its static methods GetFiles(), GetDirectories() and GetLogicalDrives(). Traverse the files system using the BFS or DFS graph traversal algorithm. Load partially the content of long files (e.g. the first 128 bytes / chars) to save memory.

Conclusion If you are reading this conclusion and if you have read carefully the entire book, then please accept our well-deserved congratulations! We are certain that you have earned valuable knowledge in the principles of programming that will stick for life. Even if the years pass, even if technology evolves and computers are far from their current state, the fundamental knowledge of data structures in programming and the algorithmic way of thinking as well as the experience gained in solving programming problems will always aid you, if you work in the field of information technology.

Did You Solve All Problems? If you have solved all problems from all chapters, in addition to reading carefully the entire book, then you can proudly declare yourself a programmer. Whatever technology you pick up from now on will be child’s play. Now that you have grasped the basics and fundamental principles of programming, you’ll easily learn to use databases and SQL, develop Web applications and server-side software (e.g. with ASP.NET and WCF), write HTML5 applications, develop for mobile devices and whatever else you’d like. You have a great advantage over the majority of programmers who do not know what a hash-table is, how searching in a tree works and what algorithm complexity is. If you have really made the tremendous effort to solve all problems from the book, then you have most certainly reached a level of fundamental understanding of the concepts of programming and a programmer’s way of thinking, which will aid you for many years.

Have You Encountered Difficulties with the Exercises? If you haven’t solved all exercise problems or at least the vast majority of them, turn back and solve them! Yes, it does take a lot of time, but that’s the way to learn programming – with a lot of work and effort. You won’t learn programming without practicing it diligently! If you have encountered difficulties, use the discussion forum of the courses on fundamentals of programming at the Software Academy, which follow this book: http://forums.academy.telerik.com. Several hundred people have taken these courses and the majority of them have solved all problems and shared their solutions. So, examine them, try solving the problems and then try again without using any guides.

1120


Many lectures and video tutorials have been uploaded on the book’s Web site (http://www.introprogramming.info). We have free PowerPoint slides and videos in English and Bulgarian for each chapter of the book. They will be of great use to you, especially if this is the first time you are getting involved in programming. If you decide to teach C#, programming or data structures and algorithms, the slides and exercises will help you focus on the training and save time preparing the content. It’s worth checking them out. Also, check out the free courses available from Telerik Software Academy (http://academy.telerik.com). All of their lectures' study materials and video recordings have been made available for free download on each course’s respective Web site. These courses are an excellent follow-up to your progress as software engineers and professionals in software development. All materials (lecture slides, exercises, demos) and some video recordings, both at this book’s and at Telerik Academy’s Web site, are available in English.

How Do You Proceed After Reading the Book? Maybe you are wondering how you should continue your development as a software engineer. You’ve laid solid foundations with this book, so it won’t be difficult. We can give you the following instructions: 1. Choose a language and a programming platform, e. g. C# + .NET Framework, Java + Java EE, Ruby + Rails or PHP + CakePHP. There’s nothing wrong with giving up C#. Focus on the technologies your platform supports; you’ll learn the corresponding language quickly. For example, if you choose Objective-C and iPhone / iPad / iOS / Xcode programming, the algorithmic way of thinking you have acquired with this book will help you make progress. 2. Read a book on databases and learn how to model your application’s data using tables and relations between them. Learn how to build queries for selecting and updating data in SQL. Learn how to work with a database server, like Oracle, SQL Server or MySQL. The next natural course of action is to acquire some ORM technology, like ADO.NET Entity Framework, Hibernate or JPA. You might also try the NoSQL database systems available in the public clouds. 3. Acquire a technology for building dynamic Web sites. Start with a book on HTML, CSS, JavaScript and jQuery, or with our free course on HTML5, CSS3 and JavaScript (http://html5course.telerik.com). Then explore the web development tools your platform supports, such as ASP.NET Web Forms / ASP.NET MVC using the .NET Platform and C#, Servlets / JSP / JSF using the Java platform, CakePHP / Symfony / Zend Framework with PHP, Ruby on Rails using Ruby or Django using Python. Learn how to make simple Web sites with dynamic content. Try creating a Web application for mobile devices using some mobile UI toolkit. 4. Learn to write mobile applications. Start for example with HTML5 and Cordova, try to deploy your apps in the large marketplaces maintained by Google, Apple, Microsoft and Amazon. Try to learn native mobile

Conclusion

1121

development (e.g. Java and Android development or Objective C and iOS development). Create a mobile app (e.g. some game) and deploy it in some major marketplace. Thus you will pass through the entire design / develop /publish cycle and this will give you real-world mobile development experience. 5. Take up working on a more serious project, like a Web market or a program for managing warehouse or accounting software. This will give you the opportunity to encounter the practical problems of practical software development. You’ll gain the more valuable practical experience and you’ll see for yourself that coding advanced software is much more difficult than coding simple programs. 6. Get a job at a software company! This is very important. If you have really solved all problems from this book, you’ll easily get a job offer. By working on practical software projects you’ll learn a great deal of new software technologies, unlike your colleagues, and you’ll come to realize that, even though you know a lot about programming, you are only at the very beginning of your career as a software engineer. You’ll only get to tackle the challenges of team work in practice, and acquire the tools for dealing with them by working on actual software projects at an actual work environment. You’ll have to work at least for a few years until you establish yourself as a software development professional. Then, perhaps, you’ll remember about this book and you’ll realize that you haven’t gone wrong by starting with data structures and algorithms rather than directly with Web technologies, databases and mobile development.

Free Courses at Telerik Software Academy You can save yourself a lot of trouble and nerves, if you decide to go through all of the above steps of your development as a software engineer at Telerik Software Academy. You’ll learn under the guidance of Svetlin Nakov and instructors with practical experience in the software industry. The Academy is the easiest and absolutely free-of-charge way to lay the foundations of your development career, but it is not the only way. Everything depends on you!

Good Luck to Everyone! On behalf of the entire panel of authors, we wish you endless success in your career and personal life! Svetlin Nakov, Manager of the "Technical Training" Department, Telerik Corporation, Telerik Software Academy – http://academy.telerik.com August 24th, 2013

Fundamentals of Computer Programming with C# (The Bulgarian C# Programming Book) by Svetlin Nakov and Co. http://www.introprogramming.info Book Back Cover