Summary and discussion of papers in the field of Sketch Recognition.

SUMMARY
In this paper the author presents a sketch-recognition-based tool for creating PowerPoint diagrams. The author also evaluates the prototype using several techniques, establishes design guidelines for creating sketch recognition user interfaces (SkRUIs), and assesses how well several techniques from the iterative design of traditional user interfaces carry over to the development of SkRUIs.
The prototype supports drawing naturally in a separate window. These diagrams are recognised by the SketchREAD recognizer and the recognized diagrams are then imported into the PowerPoint slides. Recognition is performed only when the user completes sketching or when focus shifts away from the sketching window; the system cannot determine automatically whether the user has finished sketching and relies on explicit feedback from the user. The system also supports a number of editing features like move and delete. It was found that providing an explicit modal switch between edit and ink gestures confused users, who often forgot to switch modes. Therefore an online edit mode was developed: the user hovered the pen over the drawn ink, and a subsequent cursor change indicated that the system was in edit mode. During formative evaluation users expressed the desire to add annotation symbols without them being recognised, so a combo box was provided to indicate whether recognition was on.
The system was evaluated with users, all graduate students, who were asked to perform three prototypical diagram-creation tasks. Feedback collected after these tasks formed the basis of several design guidelines.
(1) Recognition results should be displayed only after sketching is done. (2) Provide an explicit indication of whether the system is in free-sketching or recognition mode. (3) Multiple domains should be supported only when the system is robust enough. (4) Pen-based editing should be used, and sketching and editing should have clearly distinguishable gestures. (5) Large buttons should be used in a pen-based interface. (6) The pen should respond in real time.
SUMMARY
Structural shape descriptions, whether provided explicitly by the user or generated automatically by the computer, are often over- or under-constrained. This paper describes a method for debugging over- and under-constrained shapes in LADDER descriptions using a novel active-learning technique that generates its own near-miss example shapes.
LADDER-based systems require the domain designer to provide shape descriptions. An intuitive way to provide a description would be to draw it and have the computer understand it automatically. However, these descriptions are often imperfect because the computer cannot fully capture the intent of the user. The authors developed a visual debugger that first asks the user to draw a positive example. The system then generates near-miss examples (with one additional or one fewer constraint) to be classified by the user as positive or negative. On the basis of the user's classifications it removes unintended constraints and adds required ones. To do this, the system first needs to generate near-miss examples. An initial set of true constraints is captured, kept small and relevant using a set of heuristics. Each time a positive classification is encountered, the system removes from the list any constraint that is not true of that example. For under-constrained figures, the system determines a set of constraints which hold in the example but are not in the description and adds the negation of each one, one at a time; if the user classifies the result as negative, the constraint is added to the description. Thus the shape description is incrementally perfected.
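The repair loop described above can be sketched as follows. Constraints are modeled here as plain strings, an "example" is simply the set of constraints that hold in it, and the "user" is a callback; these are illustrative stand-ins invented for this sketch, not LADDER's actual representation or API.

```python
def negate(c):
    return c[4:] if c.startswith("not ") else "not " + c

def debug_description(description, candidates, user_classify):
    """Incrementally repair a shape description.
    description:   set of constraints currently in the description
    candidates:    constraints true of the drawn positive example but
                   absent from the description (possible omissions)
    user_classify: callback(example) -> True if the example is still
                   a valid instance of the intended shape"""
    # Over-constrained case: show a near-miss that violates exactly
    # one constraint of the description at a time.
    for c in sorted(description):
        near_miss = (description - {c}) | {negate(c)}
        if user_classify(near_miss):
            description.discard(c)   # still valid without c: unintended
    # Under-constrained case: negate each missing constraint in turn.
    for c in sorted(candidates):
        near_miss = description | {negate(c)}
        if not user_classify(near_miss):
            description.add(c)       # breaking c ruins the shape: required
    return description

# A toy "user" who knows the intended description of an arrow.
intended = {"head touches shaft", "shaft longer than head"}
classify = lambda example: not any(negate(c) in example for c in intended)

repaired = debug_description(
    {"head touches shaft", "shaft horizontal"},   # one spurious constraint
    {"shaft longer than head"},                   # one missing constraint
    classify,
)
print(sorted(repaired))  # ['head touches shaft', 'shaft longer than head']
```

Each near-miss differs from a known-positive example by exactly one constraint, which is what lets a single user judgment pin down whether that constraint was intended.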
DISCUSSION
Describing shapes by drawing them is very important from an HCI perspective, and this paper provides a method for enabling users to do this accurately. I was concerned about the size of the initial list of constraints until the authors described a way to prune this list to include only the relevant ones.
The system also omits disjunctive constraints. A complex shape could easily consist of a Boolean combination of constraints rather than being described by individual constraints. For example, two shapes which are mirror images of each other and are laterally asymmetric might need disjunctive constraints to describe them.
Purely from a UI perspective, would it be better to provide a group of shapes (say 10-15) to be classified by the user at once, rather than presenting them one by one?
SUMMARY
The paper aims to (1) present an easy-to-implement gesture recognition algorithm, especially for UI prototypers; (2) empirically compare it to more advanced algorithms; and (3) give insight into which user interface gestures are best.
The algorithm was built with several guidelines in mind: for example, it must be resilient to variations in sampling and rotation, and should require no advanced mathematics. The authors describe the algorithm in four steps.
The gesture data points are first resampled at a defined rate to make the data independent of the sampling rate of particular hardware. The gesture points are then rotated to align them with the template gesture. Brute force could be used to try all possible rotations and take the one with maximum alignment, but the authors claim that rotating the gesture so that its indicative angle (the angle between the centroid of the gesture and the gesture's first point) is at zero gives the best alignment. The gesture is then scaled to a reference square and translated to a reference point.
Step four does the actual recognition. The candidate gesture is compared to each stored template to find the average distance between corresponding points. The template with the least path distance is the result of recognition, and this minimum path distance is then converted into a score. One limitation of the algorithm is that it cannot distinguish gestures whose identities depend on specific orientations, aspect ratios or locations.
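The four steps can be sketched compactly in Python. This is a minimal illustration of the pipeline described above, not the authors' implementation; in particular, it uses only the single indicative-angle rotation and skips any further rotation fine-tuning, and the two templates are invented for the example.

```python
import math

N = 64  # number of resampled points per gesture

def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=N):
    # Step 1: resample the stroke into n equidistantly spaced points.
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            if len(out) == n:
                break
            pts.insert(i, q)   # q starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:        # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def rotate_to_zero(pts):
    # Step 2: rotate so the indicative angle (centroid -> first point) is 0.
    cx, cy = centroid(pts)
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def scale_and_translate(pts, size=250.0):
    # Step 3: scale to a reference square, then centroid to the origin.
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def normalize(pts):
    return scale_and_translate(rotate_to_zero(resample(pts)))

def recognize(candidate, templates):
    # Step 4: template with the smallest average point-to-point distance.
    cand = normalize(candidate)
    def avg_dist(t):
        return sum(math.dist(a, b) for a, b in zip(cand, t)) / N
    return min(templates, key=lambda name: avg_dist(templates[name]))

# Hypothetical two-template set: a "vee" and an "arc".
templates = {
    "vee": normalize([(i, abs(i - 16)) for i in range(33)]),
    "arc": normalize([(16 * math.cos(math.pi * t / 32),
                       16 * math.sin(math.pi * t / 32)) for t in range(33)]),
}
# A vee drawn rotated 90 degrees still matches, thanks to step 2.
print(recognize([(-abs(i - 16), i) for i in range(33)], templates))  # vee
```

The final example also demonstrates the limitation noted above: because step 2 discards orientation, a rotated drawing maps onto the same template.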
The authors conducted an evaluation using 4800 gestures collected from 10 subjects, comparing against two popular recognizers, the Rubine classifier and DTW. DTW and $1 were found to be very accurate; Rubine was comparatively less successful.
In future, the authors plan to conduct studies on the programming ease of their algorithm. Further empirical analysis may help in making better algorithmic choices.
DISCUSSION
The obvious advantage of this algorithm is how simply it can be implemented, without any advanced mathematics: it uses a simple classifier based on the average distance between spatial coordinates. This simplicity might be a disadvantage too. As the authors admit, it fails to differentiate on the basis of features like aspect ratio. Rotational invariance, discussed as an advantage, could also prove to be a disadvantage; the system might not be able to differentiate between UP and DOWN arrows. One remedy would be a threshold on how much rotation may be applied during alignment, or a penalty that grows with the amount of rotation required to align.
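The penalty idea could be realized as a small score adjustment. This is a hypothetical sketch, not part of the published recognizer; `penalized_score` and its weight are invented for illustration.

```python
import math

def penalized_score(path_distance, rotation_radians, weight=0.5):
    """Smaller is better; large alignment rotations cost extra."""
    return path_distance + weight * abs(rotation_radians)

# An UP arrow needing ~pi radians of rotation to match a DOWN template
# now loses to the unrotated UP template even when the raw point-to-point
# distances are identical.
up = penalized_score(12.0, 0.0)
down = penalized_score(12.0, math.pi)
print(up < down)  # True
```

The weight would need tuning per gesture set: too high and genuinely rotated input is rejected, too low and UP/DOWN remain indistinguishable.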
-A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels
COMMENTS
1. Comment on Manoj's blog
SUMMARY
In this paper the authors' goal is to study through experiments why users find gestures similar, and in the process derive a predictive model of perceived gesture similarity that correlates highly with actual observation. This model may be used as an advising tool by gesture designers. The authors survey previous work in the field of pen input devices; most relevant to their work is Attneave's extensive study of perceptual similarity, which found that the logarithm of a quantitative metric correlates with similarity. The authors conducted two experiments using the technique of Multidimensional Scaling (MDS), a technique for reducing the number of dimensions of a data set so that patterns can easily be seen by plotting the data in 2 or 3 dimensions. In the first experiment a previously designed gesture set, which varied widely in how people would perceive it, was used. Participants were presented with all possible sets of three gestures (triads) and were asked to mark the one in each set that seemed most different from the others. By plotting the gestures generated by MDS, the authors were able to determine the features that contribute to similarity, and by running regression analysis they derived a model of gesture similarity that correlated 0.74 with the reported gesture similarities. The MDS indicated that the optimal number of dimensions is 5. Some of the features correlated with perceived similarity were curviness, total absolute angle and density. Another surprising outcome of the experiment was that the participants seemed to be divided into two groups with different perceptions of gesture similarity. Experiment 2 was conducted to test the predictive power of the model derived in the first. Three new gesture sets of nine gestures each were created, each achieving variance in one of the features found to be correlated in the first experiment.
Two gestures from each of the three gesture sets were chosen and added to a fourth set to allow the three sets to be compared against each other. Each participant was shown all possible triads. The analysis showed that 3 was the optimal number of dimensions, though the meaning of these dimensions was not as obvious as in the first experiment. The features that correlated with the dimensions were log(aspect), total absolute angle and density, and the derived model had a correlation of 0.71 with observation. Based on a correlation calculation (which experiment's model agrees more with observation), the model derived from experiment 2 was found to be a better predictor. Future work could consist of extending these experiments; participants could be asked to draw gestures and then mark dissimilarity.
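For readers unfamiliar with MDS, the reduction step can be illustrated with classical MDS: given a matrix of pairwise gesture dissimilarities (here derived from triad judgments), it recovers low-dimensional coordinates whose distances approximate them. This is an assumed textbook variant for illustration; the paper's exact MDS procedure and software are not specified here.

```python
import numpy as np

def classical_mds(D, k=2):
    """D: symmetric (n, n) dissimilarity matrix; returns (n, k) coords."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # keep the k largest
    L = np.sqrt(np.clip(vals[idx], 0, None))
    return vecs[:, idx] * L               # coordinates in k dimensions

# Toy check: three "gestures" whose mutual dissimilarities form a
# 3-4-5 right triangle embed exactly in 2 dimensions.
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
X = classical_mds(D, k=2)
print(round(float(np.linalg.norm(X[0] - X[1])), 3))  # 3.0
```

Plotting such coordinates in 2 or 3 dimensions is what lets the authors eyeball which features (curviness, total absolute angle, density, ...) separate the gestures.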
DISCUSSION
This paper has impressed me because the emphasis is more on the human part of human-computer interaction. I could not identify any faults with the paper except for those pointed out by the authors, such as that users were not made to draw the gestures before picking out the dissimilar one, even though eventually users will have to draw the gestures. Consider this: I have two line gestures, for scroll up and scroll down. For scroll up the line gesture is drawn upwards, and for scroll down it goes downwards. Users will feel this difference only when they actually draw these gestures, not by just looking at pictures of these two seemingly similar gestures (even when the starting point is specified for each gesture).
-Dean Rubine
COMMENTS
SUMMARY
-Tracy Hammond and Kenneth Mock
COMMENTS ON OTHER PEOPLE'S BLOGS
1. Comment on Nabeel's blog
SUMMARY
The paper presents an overview of the existing technology and ongoing research in the field of sketch recognition (SR). It also enumerates the ways in which SR could make certain tasks simpler and more efficient. It begins with a brief introduction to Ivan Sutherland's Sketchpad and a probable reason why it couldn't take off: raster graphic displays, despite their inability to produce smooth lines, overshadowed the vector graphics used in Sketchpad due to their flicker-free display and lower cost.
Next, types of digitizers (the technology used to determine the location of a pen while writing or navigating) are discussed. Passive digitizers use only touch data and don't require a special pen, but they suffer from various disadvantages like vectoring (unintended clicks), a jumpy mouse cursor, difficult secondary inputs like right-click, and lower resolution. Active digitizers, on the other hand, use electromagnetic signals reflected off a special pen to obtain position data and are free of the disadvantages associated with passive digitizers.
After that, various hardware and software technologies that are used in sketch recognition systems are described. Convertible tablet PCs, slates, Wacom pen tablets are some of the hardware technologies to enable pen based input. Microsoft Vista and XP to some extent have handwriting recognition capabilities. Camtasia screen capture allows users to record their pen interactions.
The next part is related to applications in education. Instructors can deliver their lectures with the help of tablet PCs and large displays; in addition to the previously prepared content of their slides, they can show on-the-fly data by simply sketching on the slides. User studies show an increase in student performance when such methods were used by instructors. There are a few disadvantages associated with this method, one of which is the initial learning curve.
The paper then presents several pointers as to how lectures could be prepared and delivered using the above-mentioned technologies. These technologies have found some nice applications in describing molecular structures to students and in high school physics and mathematics.
After that the FLUID framework is described. FLUID enables end users to describe their own shapes and domains, which makes the framework sustainable and, in that sense, self-learning. Users can describe shapes either by entering a textual description or simply by drawing an example shape.
In the end, two user studies illustrate how sketch-based systems have actually shown positive results in a classroom setting. Statistics show an increase in tablet-PC usage, especially in the field of education, so it might not be long before we see tablet PCs installed with sketch recognition systems as a ubiquitous piece of technology in classrooms and elsewhere.
DISCUSSION
The paper gives a cursory overview of sketch recognition technology to the uninitiated. I liked the idea of the FLUID framework, where the end-user does not have to wait for a new version every time he wants to work with a new domain. The system can be taught to recognize new shapes. In essence, the intelligence of the system will evolve with usage, very much in line with what humans go through. I was very impressed with the idea of teaching the system a new shape by drawing an example. The text input method, however, would seem a little intimidating to the user.
One more thing that impressed me was the pressure-sensitive capability of active digitizers. This technology could go a long way in giving digital pens the natural feel of a writing pen; I can see it being used by professional painters and artists in the future.
E-mail address : akb2810 at tamu dot edu
Graduate standing : 1st year Masters
Why am I taking this class? I tried to search for a book related to Sketch Recognition. I could not find even one. Less explored field -> huge probability that research will lead to new findings.
What experience do I bring to this class? I built a simple interface in the final year of my bachelor's where I could navigate the mouse cursor and perform other functions by waving my hand from a distance. That's all I have.
What do I expect to be doing in 10 years? Working in a research laboratory in some field of Computer Science, or maybe sociology or economics. (I have no idea what these two latter fields are all about, but I keep developing theories in my head, rejecting or accepting them. I am my own audience.)
What do I think will be the next biggest technological advancement in computer science? It could be virtual reality, like virtual office spaces etc.
What was my favorite course in undergrad (CS or otherwise)? Design and analysis of algorithms.
If I could be another animal, what would it be and why? I could never be any other animal. I love all the confusion in the human head; it would feel too tied up being driven by instinct alone.
What is my favorite motto or slogan? Imagination is more important than knowledge. -- Albert Einstein
What is my favorite movie? Right now ... Hotel Rwanda, Gandhi, Top Gun.
Some interesting fact about myself? I used to paint a lot and I was good at it. But midway I lost interest because I could not find a defining factor which makes one piece of art better than the other. I find this reason funny and interesting.... and sometimes sad(!). I like photorealistic paintings though because these can be evaluated by a deterministic method of how close to reality they are. So there is a defining factor....