Summary and discussion of papers in the field of Sketch Recognition.

SUMMARY
In this paper the author presents a sketch-recognition-based tool for creating PowerPoint diagrams. The author also evaluates the prototype using several techniques, establishes design guidelines for creating sketch recognition user interfaces (SkRUIs), and assesses how well several techniques from the iterative design of traditional user interfaces carry over to the development of SkRUIs.
The prototype supports drawing naturally in a separate window. These diagrams are recognised by the SketchREAD recognizer and the recognized diagrams are then imported into the PowerPoint slides. Recognition is performed only when the user completes sketching or when focus shifts away from the sketching window; the system cannot determine automatically whether the user has finished sketching and relies on explicit feedback from the user. The system also supports a number of editing features like move and delete. It was found that providing an explicit modal switch between edit and ink gestures confused users, who often forgot to switch modes. Therefore an online edit mode was developed: the user hovered the pen over the drawn ink, and a subsequent cursor change indicated that the system was in edit mode. During formative evaluation users expressed the desire to add annotation symbols without them being recognised, so a combo box was provided to indicate whether recognition was on.
The system was evaluated with users, all graduate students, who were asked to perform three prototypical diagram-creation tasks. Feedback collected after these tasks formed the basis of several design guidelines.
(1) Recognition results should be displayed only after sketching is done. (2) Provide an explicit indication of whether the system is in free-sketching or recognition mode. (3) Multiple domains should be supported only when the system is robust enough. (4) Pen-based editing should be used, and sketching and editing should have clearly distinguishable gestures. (5) Large buttons should be used in a pen-based interface. (6) The pen should respond in real time.
SUMMARY
Structural shape descriptions, whether provided explicitly by the user or generated automatically by the computer, are often over- or under-constrained. This paper describes a method for debugging over- and under-constrained shapes in LADDER descriptions using a novel active-learning technique that generates its own near-miss example shapes.
LADDER-based systems require the domain designer to provide shape descriptions. An intuitive way to provide a description would be to draw it and have the computer understand it automatically. However, these descriptions are often imperfect because the computer cannot fully capture the intent of the user. The authors developed a visual debugger that first asks the user to draw a positive example. The system then generates near-miss examples (with one additional or one fewer constraint) to be classified by the user as positive or negative. On the basis of the user's classifications it removes unintended constraints and adds required ones. To do this, the system first needs to generate near-miss examples. An initial set of true constraints is captured, kept small and relevant using a set of heuristics. Each time a positive classification is encountered, the system removes from the list any constraint that is not true of that example. For under-constrained figures, the system determines a set of constraints which hold in the example but are not in the description and adds the negation of each one, one at a time; if the user classifies the result as negative, the constraint is added to the description. Thus the shape description is incrementally perfected.
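The repair loop described above can be sketched as follows. Constraints are modeled here as plain strings, an "example" is simply the set of constraints that hold in it, and the "user" is a callback; these are illustrative stand-ins invented for this sketch, not LADDER's actual representation or API.

```python
def negate(c):
    return c[4:] if c.startswith("not ") else "not " + c

def debug_description(description, candidates, user_classify):
    """Incrementally repair a shape description.
    description:   set of constraints currently in the description
    candidates:    constraints true of the drawn positive example but
                   absent from the description (possible omissions)
    user_classify: callback(example) -> True if the example is still
                   a valid instance of the intended shape"""
    # Over-constrained case: show a near-miss that violates exactly
    # one constraint of the description at a time.
    for c in sorted(description):
        near_miss = (description - {c}) | {negate(c)}
        if user_classify(near_miss):
            description.discard(c)   # still valid without c: unintended
    # Under-constrained case: negate each missing constraint in turn.
    for c in sorted(candidates):
        near_miss = description | {negate(c)}
        if not user_classify(near_miss):
            description.add(c)       # breaking c ruins the shape: required
    return description

# A toy "user" who knows the intended description of an arrow.
intended = {"head touches shaft", "shaft longer than head"}
classify = lambda example: not any(negate(c) in example for c in intended)

repaired = debug_description(
    {"head touches shaft", "shaft horizontal"},   # one spurious constraint
    {"shaft longer than head"},                   # one missing constraint
    classify,
)
print(sorted(repaired))  # ['head touches shaft', 'shaft longer than head']
```

Each near-miss differs from a known-positive example by exactly one constraint, which is what lets a single user judgment pin down whether that constraint was intended.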
DISCUSSION
Describing shapes by drawing them is very important from an HCI perspective, and this paper provides a method for enabling users to do this accurately. I was concerned about the size of the initial list of constraints until the authors described a way to prune this list to include only the relevant ones.
The system also omits disjunctive constraints. A complex shape could easily consist of a Boolean combination of constraints rather than being described by individual constraints. For example, two shapes which are mirror images of each other and are laterally asymmetric might need disjunctive constraints to describe them.
Purely from a UI perspective, would it be better to provide a group of shapes (say 10-15) to be classified by the user at once, rather than presenting them one by one?
SUMMARY
The paper aims to (1) present an easy-to-implement gesture recognition algorithm, especially for UI prototypers; (2) empirically compare it to more advanced algorithms; and (3) give insight into which user interface gestures are best.
The algorithm was built with several guidelines in mind: for example, it must be resilient to variations in sampling and rotation, and should require no advanced mathematics. The authors describe the algorithm in four steps.
The gesture data points are first resampled at a defined rate to make the data independent of the sampling rate of particular hardware. The gesture points are then rotated to align them with the template gesture. Brute force could be used to try all possible rotations and take the one with maximum alignment, but the authors claim that rotating the gesture so that its indicative angle (the angle between the centroid of the gesture and the gesture's first point) is at zero gives the best alignment. The gesture is then scaled to a reference square and translated to a reference point.
Step four does the actual recognition. The candidate gesture is compared to each stored template to find the average distance between corresponding points. The template with the least path distance is the result of recognition, and this minimum path distance is then converted into a score. One limitation of the algorithm is that it cannot distinguish gestures whose identities depend on specific orientations, aspect ratios or locations.
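The four steps can be sketched compactly in Python. This is a minimal illustration of the pipeline described above, not the authors' implementation; in particular, it uses only the single indicative-angle rotation and skips any further rotation fine-tuning, and the two templates are invented for the example.

```python
import math

N = 64  # number of resampled points per gesture

def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=N):
    # Step 1: resample the stroke into n equidistantly spaced points.
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            if len(out) == n:
                break
            pts.insert(i, q)   # q starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:        # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def rotate_to_zero(pts):
    # Step 2: rotate so the indicative angle (centroid -> first point) is 0.
    cx, cy = centroid(pts)
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def scale_and_translate(pts, size=250.0):
    # Step 3: scale to a reference square, then centroid to the origin.
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def normalize(pts):
    return scale_and_translate(rotate_to_zero(resample(pts)))

def recognize(candidate, templates):
    # Step 4: template with the smallest average point-to-point distance.
    cand = normalize(candidate)
    def avg_dist(t):
        return sum(math.dist(a, b) for a, b in zip(cand, t)) / N
    return min(templates, key=lambda name: avg_dist(templates[name]))

# Hypothetical two-template set: a "vee" and an "arc".
templates = {
    "vee": normalize([(i, abs(i - 16)) for i in range(33)]),
    "arc": normalize([(16 * math.cos(math.pi * t / 32),
                       16 * math.sin(math.pi * t / 32)) for t in range(33)]),
}
# A vee drawn rotated 90 degrees still matches, thanks to step 2.
print(recognize([(-abs(i - 16), i) for i in range(33)], templates))  # vee
```

The final example also demonstrates the limitation noted above: because step 2 discards orientation, a rotated drawing maps onto the same template.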
The authors conducted an evaluation using 4800 gestures collected from 10 subjects, comparing against two popular recognizers, the Rubine classifier and DTW. DTW and $1 were found to be very accurate; Rubine was comparatively less successful.
In future, the authors plan to conduct studies on the programming ease of their algorithm. Further empirical analysis may help in making better algorithmic choices.
DISCUSSION
The obvious advantage of this algorithm is how simply it can be implemented, without any advanced mathematics: it uses a simple classifier based on the average distance between spatial coordinates. This simplicity might be a disadvantage too. As the authors admit, it fails to differentiate on the basis of features like aspect ratio. Rotational invariance, discussed as an advantage, could also prove to be a disadvantage; the system might not be able to differentiate between UP and DOWN arrows. One remedy would be a threshold on how much rotation may be applied during alignment, or a penalty that grows with the amount of rotation required to align.
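The penalty idea could be realized as a small score adjustment. This is a hypothetical sketch, not part of the published recognizer; `penalized_score` and its weight are invented for illustration.

```python
import math

def penalized_score(path_distance, rotation_radians, weight=0.5):
    """Smaller is better; large alignment rotations cost extra."""
    return path_distance + weight * abs(rotation_radians)

# An UP arrow needing ~pi radians of rotation to match a DOWN template
# now loses to the unrotated UP template even when the raw point-to-point
# distances are identical.
up = penalized_score(12.0, 0.0)
down = penalized_score(12.0, math.pi)
print(up < down)  # True
```

The weight would need tuning per gesture set: too high and genuinely rotated input is rejected, too low and UP/DOWN remain indistinguishable.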
-A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels
COMMENTS
1. Comment on Manoj's blog
SUMMARY
In this paper the authors' goal is to study through experiments why users find gestures similar, and in the process derive a predictive model of perceived gesture similarity that correlates highly with actual observation. This model may be used as an advising tool by gesture designers. The authors survey previous work in the field of pen input devices; most relevant to their work is Attneave's extensive study of perceptual similarity, which found that the logarithm of a quantitative metric correlates with similarity. The authors conducted two experiments using the technique of Multidimensional Scaling (MDS), a technique for reducing the number of dimensions of a data set so that patterns can easily be seen by plotting the data in 2 or 3 dimensions. In the first experiment a previously designed gesture set, which varied widely in how people would perceive it, was used. Participants were presented with all possible sets of three gestures (triads) and were asked to mark the one in each set that seemed most different from the others. By plotting the gestures generated by MDS, the authors were able to determine the features that contribute to similarity, and by running regression analysis they derived a model of gesture similarity that correlated 0.74 with the reported gesture similarities. The MDS indicated that the optimal number of dimensions is 5. Some of the features correlated with perceived similarity were curviness, total absolute angle and density. Another surprising outcome of the experiment was that the participants seemed to be divided into two groups with different perceptions of gesture similarity. Experiment 2 was conducted to test the predictive power of the model derived in the first. Three new gesture sets of nine gestures each were created, each achieving variance in one of the features found to be correlated in the first experiment.
Two gestures from each of the three gesture sets were chosen and added to a fourth set to allow the three sets to be compared against each other. Each participant was shown all possible triads. The analysis showed that 3 was the optimal number of dimensions, though the meaning of these dimensions was not as obvious as in the first experiment. The features that correlated with the dimensions were log(aspect), total absolute angle and density, and the derived model had a correlation of 0.71 with observation. Based on a correlation calculation (which experiment's model agrees more with observation), the model derived from experiment 2 was found to be a better predictor. Future work could consist of extending these experiments; participants could be asked to draw gestures and then mark dissimilarity.
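For readers unfamiliar with MDS, the reduction step can be illustrated with classical MDS: given a matrix of pairwise gesture dissimilarities (here derived from triad judgments), it recovers low-dimensional coordinates whose distances approximate them. This is an assumed textbook variant for illustration; the paper's exact MDS procedure and software are not specified here.

```python
import numpy as np

def classical_mds(D, k=2):
    """D: symmetric (n, n) dissimilarity matrix; returns (n, k) coords."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # keep the k largest
    L = np.sqrt(np.clip(vals[idx], 0, None))
    return vecs[:, idx] * L               # coordinates in k dimensions

# Toy check: three "gestures" whose mutual dissimilarities form a
# 3-4-5 right triangle embed exactly in 2 dimensions.
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
X = classical_mds(D, k=2)
print(round(float(np.linalg.norm(X[0] - X[1])), 3))  # 3.0
```

Plotting such coordinates in 2 or 3 dimensions is what lets the authors eyeball which features (curviness, total absolute angle, density, ...) separate the gestures.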
DISCUSSION
This paper has impressed me because the emphasis is more on the human part of human-computer interaction. I could not identify any faults with the paper except for those pointed out by the authors, such as that users were not made to draw the gestures before picking out the dissimilar one, even though eventually users will have to draw the gestures. Consider this: I have two line gestures, for scroll up and scroll down. For scroll up the line gesture is drawn upwards, and for scroll down it goes downwards. Users will feel this difference only when they actually draw these gestures, not by just looking at pictures of these two seemingly similar gestures (even when the starting point is specified for each gesture).
-Dean Rubine
COMMENTS
SUMMARY
-Tracy Hammond and Kenneth Mock
COMMENTS ON OTHER PEOPLE'S BLOGS
1. Comment on Nabeel's blog
SUMMARY
The paper presents an overview of the existing technology and ongoing research in the field of sketch recognition (SR). It also enumerates the ways in which SR could make certain tasks simpler and more efficient. It begins with a brief introduction to Ivan Sutherland's Sketchpad and a probable reason why it couldn't take off: raster graphic displays, despite their inability to produce smooth lines, overshadowed the vector graphics used in Sketchpad due to their flicker-free display and lower cost.
Next, types of digitizers (the technology used to determine the location of a pen while writing or navigating) are discussed. Passive digitizers use only touch data and don't require a special pen, but they suffer from various disadvantages like vectoring (unintended clicks), a jumpy mouse cursor, difficult secondary inputs like right-click, and lower resolution. Active digitizers, on the other hand, use electromagnetic signals reflected off a special pen to obtain position data and are free of the disadvantages associated with passive digitizers.
After that, various hardware and software technologies that are used in sketch recognition systems are described. Convertible tablet PCs, slates, Wacom pen tablets are some of the hardware technologies to enable pen based input. Microsoft Vista and XP to some extent have handwriting recognition capabilities. Camtasia screen capture allows users to record their pen interactions.
The next part is related to applications in education. Instructors can deliver their lectures with the help of tablet PCs and large displays; in addition to the previously prepared content of their slides, they can show on-the-fly data by simply sketching on the slides. User studies show an increase in student performance when such methods were used by instructors. There are a few disadvantages associated with this method, one of which is the initial learning curve.
The paper then presents several pointers as to how lectures could be prepared and delivered using the above-mentioned technologies. These technologies have found some nice applications in describing molecular structures to students and in high school physics and mathematics.
After that the FLUID framework is described. FLUID enables end users to describe their own shapes and domains, which makes the framework sustainable and, in that sense, self-learning. Users can describe shapes either by entering a textual description or simply by drawing an example shape.
In the end, two user studies illustrate how sketch-based systems have actually shown positive results in a classroom setting. Statistics show an increase in tablet-PC usage, especially in the field of education, so it might not be long before we see tablet PCs installed with sketch recognition systems as a ubiquitous piece of technology in classrooms and elsewhere.
DISCUSSION
The paper gives a cursory overview of sketch recognition technology to the uninitiated. I liked the idea of the FLUID framework, where the end-user does not have to wait for a new version every time he wants to work with a new domain. The system can be taught to recognize new shapes. In essence, the intelligence of the system will evolve with usage, very much in line with what humans go through. I was very impressed with the idea of teaching the system a new shape by drawing an example. The text input method, however, would seem a little intimidating to the user.
One more thing that impressed me was the pressure-sensitive capability of active digitizers. This technology could go a long way in giving digital pens the natural feel of a writing pen; I can see it being used by professional painters and artists in the future.
E-mail address : akb2810 at tamu dot edu
Graduate standing : 1st year Masters
Why am I taking this class? I tried to search for a book related to Sketch Recognition. I could not find even one. Less explored field -> huge probability that research will lead to new findings.
What experience do I bring to this class? I built a simple interface in the final year of my bachelor's where I could navigate the mouse cursor and perform other functions by waving my hand from a distance. That's all I have.
What do I expect to be doing in 10 years? Working in a research laboratory in some field of Computer Science, or maybe sociology or economics. (I have no idea what these two latter fields are all about, but I keep developing theories in my head, rejecting or accepting them. I am my own audience.)
What do I think will be the next biggest technological advancement in computer science? It could be virtual reality, like virtual office spaces etc.
What was my favorite course in undergrad (CS or otherwise)? Design and analysis of algorithms.
If I could be another animal, what would it be and why? I could never be any other animal. I love all the confusion in the human head; it would feel too tied up being driven by instinct alone.
What is my favorite motto or slogan? Imagination is more important than knowledge. -- Albert Einstein
What is my favorite movie? Right now ... Hotel Rwanda, Gandhi, Top Gun.
Some interesting fact about myself? I used to paint a lot and I was good at it. But midway I lost interest because I could not find a defining factor which makes one piece of art better than the other. I find this reason funny and interesting.... and sometimes sad(!). I like photorealistic paintings though because these can be evaluated by a deterministic method of how close to reality they are. So there is a defining factor....