Chapter 12: Introducing Evaluation
This question is designed to encourage you to analyze the evaluation case studies in detail and to compare them. By doing so you will discover more about the underlying reasons why the designers and evaluators did what they did. You will also discover that the descriptions of what was done, how it was done, and when it was done are incomplete for some case studies and for published papers that you read. In such cases you will have to speculate about the details and make suggestions about what you think happened or what could happen.
The evaluation case studies cover the design cycle. Crowdsourcing can be done early or late in design, depending on the task to be evaluated. In this example the evaluation was done to compare the reliability of crowdsourcing using Mechanical Turk with that of laboratory-based approaches. Some studies describe early evaluations, while others are done late in design and often involve usability testing in controlled environments. The case studies described in this chapter demonstrate how different evaluation methods are used together to complement each other. The advantage of doing this is that they provide different types of data which, when analyzed, offer different perspectives on what is happening. For example, quantitative data from usability tests is often supplemented with observational data and data from user satisfaction questionnaires. You will also see that some methods are used only for examining particular parts of systems. This is because evaluating the whole system may take too long, cost too much, or be unnecessary. If a previous evaluation has shown a particular problem, then the next evaluation may focus on that issue. This approach is often adopted when using analytical methods like those described in Chapter 15.
The description of the activity suggests that you may find it useful to complete a table; alternatively, you may wish to write longer descriptions. Whichever approach you adopt, be sure to focus on when during the design the evaluations were performed, which methods were used, and what was learned from the evaluations.