Usability evaluation methods are focused on finding problems, not on finding successes (with the exception of Cognitive Walkthrough). Still, experienced usability practitioners know that an evaluation report should begin by commending the strong points of a design, but these are not what usability methods are optimised to detect. Realistic, relevant evaluations must assess incurred costs relative to achieved benefits. When transferring my contacts between phones, I experienced the following problems and associated costs:
By forming the list above, I have taken a position on what, in part, would count as poor usability. To form a judgement as to whether these costs were worthwhile, I also need to take a position on positive outcomes and experiences. A more helpful question is whether the interaction was worthwhile, i.e., whether the achieved benefits were worth the incurred costs. Worth is a very useful English word that captures this relationship between costs and benefits.
Worth relates positive value to negative value, considering the balance of both, rather than, as in the case of poor usability, mostly or wholly focusing on negative factors. So, did my resulting benefits justify my expended costs? My answer is yes, which is why I was satisfied at the time, and am more satisfied now as frustrations fade and potential future value has been steadily realised. Given the two or three usability problems encountered, and their associated costs, it is quite clear that the interaction could have been more worthwhile (increased value at lower costs), but this position is more clear cut than having to decide on the extent and severity of usability problems in isolation.
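The balance of costs and benefits described above can be sketched as a toy calculation. All the item names and weights below are hypothetical, purely to illustrate how worth weighs achieved benefits against incurred costs:

```python
# Toy illustration of worth as a balance of achieved benefits against
# incurred costs. The items and weights are invented for illustration.
costs = {
    "initial connection problems": 2,
    "confusing utility suite": 3,
    "manual re-entry of some contacts": 1,
}
benefits = {
    "contacts transferred to new phone": 5,
    "future value: contacts backed up on laptop": 4,
}

def worthwhile(benefits, costs):
    """An interaction is worthwhile when achieved benefits outweigh incurred costs."""
    return sum(benefits.values()) > sum(costs.values())

print(worthwhile(benefits, costs))  # benefits (9) outweigh costs (6), so True
```

The point of the sketch is only that a judgement of worth needs both columns: a usability evaluation that fills in only the costs dictionary cannot answer the question.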
The interaction would have been more worthwhile in the absence of usability problems but I would not have this example. Still better, the utility suite that came with my phone could have had. Perhaps the best solution would be for phones to enable Windows to import contacts from them. Also, if I had used my previous laptop, the required phone utility suite was already installed and there should have been no initial connection problems.
There were thus ways of reducing costs and increasing value that would not involve modifications to the software that I used, but would instead have replaced it all with one simple purpose-built tool. None of the experienced usability problems would have been fixed. Once the complexity of the required data path is understood, it is clear that the best thing to do is to completely re-engineer it. Obliteration beats iteration here. By considering usability within the broader context of experience and outcomes, many dilemmas associated with usability in isolation disappear. This generalises to human-centred design as a whole.
Even so, he acknowledges the lack of truly compelling stories that fully establish the importance of human-centred design to innovation, since these are undermined by examples of people regularly surmounting inconveniences (Brown, pp.). Through examples such as chaining bicycles to park benches, Brown illustrates worth in action: the benefit (security of the bike) warrants the cost (finding a nearby suitable fixed structure to chain it to).
The problem with usability evaluations is that they typically focus on incurred costs without a balancing focus on achieved benefits. Brown returns to the issue of balance in his closing chapter, where design thinking is argued to achieve balance through its integrative nature (p.). Human-centred contributions to designs are just one set of inputs.
Design success depends on effective integration of all its inputs. Usability needs to fit into the big picture here. Usability evaluation finds usage problems. These need to be understood holistically in the full design context before possible solutions can be proposed. Usability evaluation cannot function in isolation, at least, not without isolating the usability function. Since the early 90s, usability specialists have had a range of approaches to draw on, which, once properly adapted, configured and combined can provide highly valuable inputs to the iterative development of interaction designs.
Yet we continue to experience interaction design flaws, such as a lack of instructive, actionable feedback on errors and problems, which can and should be eliminated. However, appropriate use of usability expertise is only one part of the answer. A complete solution requires better integration of usage evaluation into other design activities. Without such integration, usability practices will often continue to be met with disappointment, distrust, scepticism and a lack of appreciation in some technology development settings (Iivari). This sets us up for a third alternative definition of usability that steers a middle course between essentialism and relationalism:
A usable system does not degrade or destroy achievable worth through excessive or unbearable usage costs. Usability can thus be understood as a major facet of user experience that can reduce achieved worth through adverse usage costs, but can only add to achieved worth through the iterative removal of usability problems. Usability improvements reduce usage costs, but cannot increase the value of usage experiences or outcomes.
Herzberg studied motivation at work, and distinguished positive motivators from negative hygiene factors in the workplace. Overt and sustained recognition at work is an example of a motivator factor, whereas inadequate salary is an example of a hygiene factor. Motivator factors can cause job satisfaction, whereas hygiene factors can cause dissatisfaction.
The absence of motivators does not result in dissatisfaction, but in the neutral absence of satisfaction. Similarly, the absence of negative hygiene factors does not result in satisfaction, but in the neutral absence of dissatisfaction. Loss of a positive motivator thus results in being unsatisfied, whereas loss of an adverse hygiene factor results in being undissatisfied!
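Herzberg's two-factor logic above can be summarised as a small decision table. The sketch below is only an illustration of the four outcomes, not a formal model, and the example factors (recognition, salary) are taken from the text:

```python
# Herzberg's two-factor theory as a lookup: motivators and hygiene factors
# affect different outcome scales, and absence on either scale is neutral.
def job_attitude(motivators_present, hygiene_adequate):
    satisfaction = "satisfied" if motivators_present else "not satisfied (neutral)"
    dissatisfaction = "not dissatisfied (neutral)" if hygiene_adequate else "dissatisfied"
    return satisfaction, dissatisfaction

# Recognition at work (a motivator) present, but salary (hygiene) inadequate:
print(job_attitude(True, False))   # ('satisfied', 'dissatisfied')
# No motivators, adequate hygiene: neutral on both scales.
print(job_attitude(False, True))   # ('not satisfied (neutral)', 'not dissatisfied (neutral)')
```

The key structural point the table makes is that the two scales are independent: removing a hygiene problem moves the second value to neutral but never changes the first.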
Usability can thus be thought of as an overarching term for hygiene factors in user experience. Attending to poor usability can remove adverse demotivating hygiene factors, but it cannot introduce positive motivators. Positive motivators can be thought of as the opposite pole of user experience to poor usability. Poor usability demotivates, but good usability does not motivate, only positive experiences and outcomes do. The problem with usability as originally conceived in isolation from other design concerns is that it only supports the identification and correction of defects, and not the identification and creation of positive qualities.
Commercially, poor usability can make a product or service uncompetitive, but usability can only make it competitive relative to products or services with equal value but worse usability. Usage costs will always influence whether an interactive system is worthwhile or not. These costs will continue to be so high in some usage contexts that the achieved worth of an interactive system is degraded or even destroyed. For the most part, such situations are avoidable, and will only persist when design teams lack critical human-centred competences.
Usability experts will continue to be needed to fix their design disasters. In well-directed design teams, there will not be enough work for a pure usability specialist. This is evidenced by a trend within the last decade of a broadening from usability to user experience expertise. User experience work focuses on both positive and negative value, both during usage and after it. A sole focus on negative aspects of interactive experiences is becoming rarer. Useful measures of usage are extending beyond the mostly cognitive problem measures of earlier usability work to include positive and negative affect, attitudes and values.
The coupling between evaluation and design is being improved by user experience specialists with design competences. We might also include interaction designers with user experience competences, but no interaction designer worthy of the name should lack these! Competences in high-fidelity prototyping, scripting and even programming are allowing user experience specialists firstly to communicate human context insights through working prototypes (Rosenbaum), and secondly to communicate possible design responses to user experience issues revealed in evaluations.
We can see two trends here. The first involves complementing human-centred expertise with strong understandings of specific technologies such as search and security. The second involves a broadening of human-centred expertise to include business competences. At the frontiers of user experience research, the potentials for exploiting insights from the humanities are being increasingly demonstrated.
The extension of narrow usability expertise to broader user experience competences reduces the risk of inappropriate evaluation measures (Cockton). However, each new user experience attribute introduces new measurement challenges, as do longer term measures associated with achieved value and persistent adverse consequences. A preference for psychometrically robust metrics must often be subordinated to the need to measure specific value in the world, however and wherever it occurs. User experience work will thus increasingly require the development of custom evaluation instruments for experience attributes and worthwhile outcomes.
Standard validated measures will continue to add value, but only if they are the right measures. There is however a strong trend towards custom instrumentation of digital technologies, above the level of server logs and low-level system events (Rosenbaum). Such custom instrumentation can extend beyond a single technology component to all critical user touch points in its embracing product-service ecosystem. For example, where problems arise with selecting, collecting, using and returning hired vans, it is essential to instrument the van hire depots, not just the web site.
Where measures relate directly to designed benefits and anticipated adverse interactions, this approach is known as direct worth instrumentation (Cockton b). Risks of inappropriate standard metrics arise when web site evaluations use the ready-to-hand measures of web server logs. What is easy to measure via a web server is rarely what is needed for meaningful, relevant user experience evaluation. Thus researchers at Google (Rodden et al.) have developed their own user-centred metrics for web applications. Earnings (sales) can of course be a simple and very effective measure for e-commerce, as a measure of not one, but every, user interaction. Improved usability has been only one re-design input here, albeit a vital one.
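As a sketch of what such instrumentation might look like in practice, the fragment below logs user-facing events against the worth elements they bear on, rather than relying on ready-to-hand server-log measures. The event names and worth elements are hypothetical, invented to echo the van-hire example, and are not taken from Cockton's published method:

```python
# Hypothetical sketch of direct worth instrumentation: each logged event is
# tagged with the designed benefit or anticipated adverse cost it affects.
from collections import defaultdict

worth_log = defaultdict(list)

def log_worth_event(worth_element, event, impact):
    """Record an event against a worth element, with +1/-1 impact on it."""
    worth_log[worth_element].append((event, impact))

# Van-hire ecosystem: instrument the depot touch points, not just the web site.
log_worth_event("stress-free collection", "van ready at booked time", +1)
log_worth_event("stress-free collection", "queue at depot desk over 20 min", -1)
log_worth_event("predictable cost", "unexpected fuel surcharge at return", -1)

# Report the net impact on each worth element.
for element, events in worth_log.items():
    net = sum(impact for _, impact in events)
    print(element, net)
```

A report organised this way answers "where is worth being degraded?" directly, whereas a raw server log would have missed both depot events entirely.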
Similar improvements have been recorded by collaborations involving user experience agencies and consultancies worldwide. However, the relative contributions of usability, positive user experience, business strategy and marketing expertise are not clear, and in some ways irrelevant. The key point is that successful e-commerce sites require all such inputs to be co-ordinated throughout projects. There are no successful digital technologies without what might be regarded as usability flaws.
Some appear to have severe flaws, and are yet highly successful for many users. What matters is the resulting balance of worth as judged by all relevant stakeholders. Evaluation needs to focus on both positives and negatives. The latter need to be identified and assessed for their impact on achieved worth. Probe studies have proved to be highly effective here, identifying positive appropriative use that was completely unanticipated by design teams.
It is refreshing to encounter evaluation approaches that identify unexpected successes as well as unwanted failures. For example, the evaluation of the Whereabouts Clock (Brown et al.) did both. Designers and developers are more likely to view evaluation positively if it is not overwhelmingly negative. There should always be genuine, significant positive experiences and outcomes to report.
Evaluation becomes more complicated once positive and negative phenomena need to be balanced against each other across multiple stakeholders. Worth has been explored as an umbrella concept to cover all interactions between positive and negative phenomena (Cockton). As well as requiring novel custom evaluation measures, this also requires ways to understand the achievement and loss of worth.
There have been some promising results here with novel approaches such as worth maps (Cockton et al.). Worth maps can give greater prominence to system attributes while simultaneously relating them to contextual factors of human experiences and outcomes. Evaluation can focus on worth map elements (system attributes, user experience attributes, usage outcomes) or on the connections between them, offering a practical resource for moving beyond tensions between essentialist and relational positions on software quality.
Worth-focused evaluation remains underdeveloped, but will focus predominantly on outcomes unless experiential values dominate design purpose as in many games. Where experiential values are not to the fore, detailed evaluation of user interactions may not be worthwhile if products and services have been shown to deliver or generously donate value. Evaluation of usage could increasingly become a relatively infrequent diagnostic tool to pinpoint where and why worth is being degraded or destroyed. Such a strategic focus is essential now that we have new data collection instruments such as web logs and eye tracking that gather massive amounts of data.
Such new weapons in the evaluation arsenal must be carefully aimed. A shotgun scattershot approach cannot be worthwhile for any system of realistic complexity. This is particularly the case when, as in my personal example of phone contacts transfer, whole product ecologies (Forlizzi) must be evaluated, and not component parts in isolation. When usage within such product ecologies is mobile, intermittent and moves through diverse social contexts, it becomes even more unrealistic to evaluate every second of user interaction. In the future, usability evaluation will be put in its place.
User advocates will not be given free rein to berate and scold. Moaning on the margins about being ignored and undervalued is no longer an option. Usability must find its proper place within interaction design, as an essential part of the team, but rarely King of the Hill.
The reward is that usability work could become much more rewarding and less fraught. That has got to be worthwhile for all concerned. It contains a short essay (Cockton a) on the Whiteside et al. chapter. A 3rd edition is forthcoming. There are chapters on user testing, inspection methods, model-based methods and other usability evaluation topics. Darryn Lavery prepared a set of tutorial materials on inspection methods that are still available. The TwinTide web site (www.) is a further resource; practically minded readers may prefer BOK content over more academically oriented research publications.
Jakob Nielsen has developed and championed discount evaluation methods for over two decades. He co-developed Heuristic Evaluation with Rolf Molich. For example, in the final version of his heuristics, some known issues with Heuristic Evaluation are not covered. Even so, the critical reader will find many valuable resources on www. UPA has a specific practitioner focus on usability evaluation. Most HCI publications are indexed on www.
In November , a search for usability evaluation found almost publications. I have been immensely fortunate to have collaborated with some of the most innovative researchers and practitioners in usability evaluation, despite having no serious interest in usability in my first decade of work in Interaction Design and HCI!
One of my first PhD students at Glasgow University, Darryn Lavery, changed this through his struggle with what I had thought was going to be a straightforward PhD on innovative inspection methods. Darryn exposed a series of serious fundamental problems with initial HCI thinking on usability evaluation. He laid the foundations for over a decade of rigorous critical research through his development of conceptual critiques (Lavery et al.). Alan Woolrych, Darryn Lavery, myself and colleagues at Sunderland University built on these foundations in a series of studies that exposed the impact of specific resources on the quality of usability work.
Research tactics from our studies were also used to good effect by members of WG2 of COST Action MAUSE (see Where to learn more above), resulting in a new understanding of evaluation methods as usability work that adapts, configures and combines methods (Cockton and Woolrych). Nigel is one of many distinguished practitioners who have generously shared their expertise and given me feedback on my research. My apologies to anyone who I have left out!
Bardzell, Jeffrey: Interaction criticism: An introduction to the practice.
In Interacting with Computers, 23 (6), pp.
Bardzell, Jeffrey: Interaction criticism and aesthetics.
Bellamy, Rachel and John, Bonnie E.: In: Proceedings of the 33rd International Conference on Software Engineering.
Brown, Barry A.
Card, Stuart K.: In Communications of the ACM, 23, pp.
Cockton, Gilbert: Designing worth is worth designing.
Cockton, Gilbert: Some Experience! Some Evolution. The MIT Press, pp.
Cockton, Gilbert: Load while Aiming; Hit? In: Brau, Henning (ed.)
Electronic Journal of Academic and Special Librarianship
In: Sasse, M. Angela (ed.): Proceedings of Interact 99, Edinburgh.
Cockton, Gilbert and Woolrych, Alan: A.
Cockton, Gilbert, Woolrych, Alan and Hindmarch, Mark: Reconditioned merchandise: extended structured report formats in usability inspection.
Dumas, Joseph S. CRC Press, pp.
Norwood, NJ: Intellect.
Forlizzi, Jodi: The product ecology: Understanding social product use and supporting design culture. In International Journal of Design, 2 (1), pp.
Gaver, William W.
Gray, Wayne D.: In Human-Computer Interaction, 13 (3), pp.
Microsoft Research Ltd.
Herzberg, Frederick: Work and the Nature of Man.
In Behaviour and Information Technology, 29 (1), pp.
Iivari, N.
International Standards Association
John, Bonnie E.
Landauer, Thomas K. The MIT Press.
In Behaviour and Information Technology, 16 (4), pp.
Lecture Notes in Computer Science.
Nielsen, Jakob: Enhancing the explanatory power of usability heuristics. In: Plaisant, Catherine (ed.)
Pew, Richard W. Lawrence Erlbaum, pp.
Rodden, Kerry, Hutchinson, Hilary and Fu, Xin: Measuring the user experience on a large scale: user-centered metrics for web applications.
Salvucci, Dario D.
Sears, Andrew and Jacko, Julie A. Lawrence Erlbaum Associates.
Sengers, Phoebe and Gaver, William: Staying open to interpretation: engaging multiple meanings in design and evaluation.
Shneiderman, Ben: No members, no officers, no dues: A ten year history of the software psychology society.
Smith, Sidney L.
Wasserman, Anthony I.: In: Proceedings of the June , , national computer conference and exposition.
Wilson, Chauncey: Severity Scale for Classifying Usability Problems.
My comments come from the perspective of someone who has practiced user experience (UX) research of many types as a consultant. Although I have done my share of usability evaluations, almost all of my work currently consists of in vivo contextual research, with a focus on finding ways to increase value to the user.
The product teams I work with often include internal usability specialists, and I well understand their roles within their teams and the challenges they face. Finally, my prior career as a psychologist has given me a very healthy respect for the difficulties of measuring and understanding human behavior in a meaningful way, and impatience with people who gloss over these challenges. I also agree with Gilbert's critique of the methodological limitations of laboratory usability evaluation.
I could not agree more that contextual research is usually much more powerful than laboratory usability evaluation as an approach to understanding the user experience holistically and to gaining insights that will drive UX design towards greater overall value. With this said, however, I have a number of concerns about the chapter's portrayal and critique of usability as an inherently limited, marginal contributor to the development of great products. In regard to practice, there are many gradations of skill and wisdom, and some unknown proportion of usability practitioners may deserve to be confronted with the criticisms Gilbert raises.
However, I question the idea that these criticisms are true of usability practice in principle. I believe that most mature usability practitioners are aware of the issues he raises, would agree with many of his points, and work hard to address them in various ways. This requires considering usability at two levels: as an abstract concept and as a field of practice. The chapter treats usability as though it is distinct from both value and user experience. Even though usability is generally acknowledged to be important, it is portrayed as quite subordinate.
In my view, this greatly understates the contribution of usability to value. The two concepts are far more intertwined than this. Attempts to abstract value from usability are just as flawed as the reverse. The notion that ease of use is a separate issue from value, although one that affects it, has much face validity. It seems to make sense to think of value as a function of benefit, somehow related inversely with costs, with usability problems counted in the costs column. In my view, usability divorced from value is as undefined as the sound of one hand clapping.
Usability can only be defined in the context of benefit. By this I do not mean benefit in principle, but rather the benefit anticipated by or experienced by the user. At one level, this is because usability and experienced benefit interact in complex ways. But beyond this, there are many products where usability is itself the primary value proposition.
In fact, the central value proposition of most technological tools is that they make something of value easier to achieve than it used to be. A mobile phone has value because its portability enables communication while mobile, and its portability matters because it makes the phone more usable when mobile. In another example, a large medical organization I am familiar with recently adopted a new, integrated digital medical record system.
Initially, there was a great deal of grumbling about how complex and confusing it was. I saw the classic evidence of problems in the form of notes stuck on computer monitors warning people not to do seemingly intuitive things and reminding them of the convoluted workarounds. However, more recently, I have heard nurses make comments about the benefits of the system. As a result, patients now can come to the clinic for a follow-up laboratory test without having to remember to bring a written copy of the lab order.
The benefit has to do with its overall success in reducing the usability problems of an earlier process that used to be difficult to coordinate and error prone, and this increase in usability only matters because it is delivering a real benefit. Sometimes, usability seems detached from value when the goal is fulfilled at the end of a sequence of steps, but the steps along the way are confusing. However, it can be the separation from the experience of value that creates the usability problem.
For example, if people trying to book an online hotel reservation get lost in preliminary screens where they first have to create an account, we might see usability as only relevant to the cognitive aspects of the sign-up process, and as a mere hygiene factor. But when users become disoriented because they do not understand what a preliminary process has to do with their goal, it can be precisely because they cannot see the value of the preliminary steps.
If they did, the subparts of the process would both be more understandable and would acquire value of their own, just as a well-designed hammer gains value not simply in its own right, but because it is understood as a more effective tool for driving nails which are valued because of the value of the carpentry tasks they enable, and so on. For example, in one product that I worked on, users were offered the opportunity to enroll for health insurance benefits that claimed to be highly personalized.
In addition to setting different benefit levels for different members of their families, users could compose their own preferred networks of medical specialists, for which they would receive the highest reimbursement levels. Unfortunately, the actual user experience did not appear to live up to this.
As soon as the user entered identifying information, the system applied defaults to all the decisions that the user was supposedly able to personalize. Along the way, the user experienced the sense that decisions were being imposed. There was not even an indication to the user that the opportunity to make personal choices was coming eventually. Unfortunately, the system did not start by asking the user which choices mattered to them and what their preferences were, so it could factor these things in before presenting a result to the user.
How should we construe this? As a usability problem? As a problem in delivery of value? As a failure in the design of a user experience? It is all of these at the same time. The discrepancy from the expected perception of value is a primary cause of the confusion users felt.
None of these constructs (usability, value, experience) can be defined without incorporating the others. If we parse out and remove the meaning that we can attribute to any of them, we drain the meaning from the others. Disputes about which is the legitimate language to describe them are at best just ways to emphasize different faces of the same phenomenon, and at worst semantic quibbling.
This means that usability is something more than just another item to add into the costs column when we weigh them against benefits to arrive at value. While Gilbert and I may agree on the need for a more holistic focus on user experience, we may disagree about whether usability in practice actually takes this holistic view.
Reducing the profession to a particular type of laboratory evaluation makes it seem limited and can raise questions about its relevance. Furthermore, even despite its limitations, traditional usability evaluation often contributes significant value in the product development context, at least when practiced by reflective professionals. Below, I comment on some of the major issues Gilbert raises with regard to usability practice.
Although some interaction design patterns have become established, and an increasing number of users have gained generalizable skills in learning a variety of new interaction patterns, this does not mean that ease of use as an issue has gone away or even declined in importance. For several reasons, it makes more sense to see the spectrum of usability issues to be addressed as having evolved.
First, the spectrum of users remains very large and is constantly expanding, and there are always some at an entry level. Second, although with experience users may gain knowledge that is transferrable from one family of products to another, this can be both an asset and a source of confusion, because the analogies among product designs are never perfect. Third, as innovation continues to create new products with new capabilities, the leading edge of UX keeps moving forward.
On that leading edge, there are always new sets of design challenges, approaches, and tradeoffs to consider. Finally, the world does not consist only of products intended to create experiences for their own sake, as opposed to those that support tasks (a distinction that is not necessarily so clear). Products that are designed to facilitate and manage goal-oriented tasks and to support productivity continue to have a tremendous impact on human life, and we have certainly not learned to optimize ease of interaction with them.
Usability is also continually driven forward by competition within a product domain. Another claim in the chapter that suggests limited relevance for usability is that good product teams do not need a dedicated usability person. This is too simplistic. Of course, a designated usability person does not create usability single-handedly.
That is the cumulative result of everything that goes into the product. However, how much specialized work there is for a usability person depends on many factors. We need to take into account the variability among ways that product teams can be structured, the magnitude of the UX design challenges they face in their product space, the complexity of the product or family of inter-related products that the usability person supports, how incremental versus innovative the products are, what the risk tolerance is for usability problems, how heterogeneous the user population and user contexts are, how much user persistence is needed for usage to be reinforced and sustained by experiences of value, etc.
The simplistic statement certainly does not address the fact that some usability work takes more effort to carry out than other work. Doing realistic research with consumers is generally much easier than doing realistic research inside enterprises. As a matter of fact, in actual practice teams often do not have usability professionals assigned to them full time, because these people often support multiple product teams in a matrix organizational structure.
There are benefits to this in terms of distributing a limited resource around the company. But there are also drawbacks. This structure often contributes to the usability person being inundated with requests to evaluate superficial aspects of design. It can also exclude the usability person from integrative discussions that lead to fundamental aspects of product definition and design and determine the core intended value of the product.
It is true that laboratory usability evaluation typically does try to isolate cognitive factors by treating the user's goals and motivation as givens, rather than attempting to discover them. Often, it is the fit of the assumed goal that is in question, and that makes the biggest difference in user experience. But many usability professionals spend a great deal of time doing things other than laboratory tests, including, increasingly, fundamental in-context user research. For many years, usability evaluation has served as a platform to promote systematic attention to deeper issues of value to the user.
Many usability professionals deeply understand the complex, entangled relationship between ease of use and value, and work to focus on broad questions of how technology can deliver experienced value. Some usability people have succeeded in getting involved earlier in the design process when they can contribute to deeper levels of decision-making. Gilbert is correct that UX skills are increasingly distributed across roles. He lists a number of such skills, but missing from the list is the skill of doing disciplined research to evaluate evidence for the assumptions, claims, beliefs, or proposed designs of the product team, whether these are claims about what people need and will value, or whether a particular interface design will enable efficient performance.
Gilbert points out that there is no cookbook of infallible usability approaches. This is not a surprise, and indeed, we should never have expected such a thing. Such cookbooks do not exist for any complex field, and there is no way to guarantee that a practical measurement approach captures the core meaning of a complex construct. I do agree wholeheartedly with Gilbert when he points out the many factors that can complicate the process of interpreting usability findings due to this lack of a cookbook of infallible methods and the presence of many confounds.
These issues argue for the need for greater professionalism among usability practitioners, not for the downgrading of the profession or marginalizing it on the periphery of the product development team. Professionalism requires that practitioners have expert understanding of the limitations of methods, expertise in modifying them to address different challenges, the dedication to continually advance their own processes, and the skill to help drive the evolution of practice over time.
At a basic level, mature usability professionals recognize that results from a single evaluation do not give an absolute measure of overall usability. They are careful about overgeneralizing. They at least attempt to construct tasks that they expect users will care about, and attempt to recruit users who they feel will engage realistically with the tasks. They wrestle with how best to achieve these things given the constraints they work under. Those who do not recognize the challenges of validity, or who apply techniques uncritically, are certainly open to criticism, or should be considered mere technicians, but, again, they do not represent the best of usability practice.
In the absence of scientific certainty, where is the value of usability practice? In the product development context, this should not be judged by how well usability meets criteria of scientific rigor. It is more relevant to ask how it compares to and complements other types of evidence that are used as a basis for product definition, audience targeting, functional specification, and design decisions.
Membership in product teams often requires allegiance to the product concept and design approach. Sometimes, demonstrations of enthusiasm are a pre-requisite for hiring. Often, it is risky for team members to challenge the particular compromises that have been made previously to adapt the product to various constraints or a design direction that has become established, since these all have vested interests behind them.
In this context, the fact that usability methods (or approaches, as Gilbert rightfully calls them) are scientifically flawed does not mean they are without value. It is not as though all the other streams of influence that affect product development are based on solid science while usability is voodoo. When you consider the forces that drive product development, it is clear that subjective factors dominate many of them. Product decisions are also deeply influenced by legitimate considerations that are difficult to evaluate objectively, much less to weigh against each other.
In this context, a discipline that offers structured and transparent processes for introducing evidence-based critical thinking into the mix adds value, even though its methods are imperfect and its evidence open to interpretation. Sometimes, usability evaluation is a persuasive tool to get product teams to prioritize addressing serious problems that everyone knew existed, but that could not receive focus earlier.
Sometimes this is needed to counterbalance the persuasive techniques of other disciplines, which may have less scientific basis than usability. Sometimes usability results provide a basis to resolve disputes that have no perfect answer and that have previously paralyzed teams.
And sometimes they have the effect of triggering discussions about controversial things that would otherwise have been suppressed. Sometimes, usability in practice is portrayed as a mere quality assurance process, or as Gilbert says, a hygiene factor. It is often equated with evaluation as distinct from discovery and idea generation.
In many ways, this is a false distinction. Careful evaluation of what exists now can inspire invention and direct creativity towards things that will make the most difference. Practices like rapid iterative design reflect efforts to integrate evaluation and invention. Practices that are considered to be both discovery and invention processes, like contextual design, fall on a continuum with formative usability evaluation and naturalistic evaluation in the usage context.
Of course, usability professionals differ in their skills for imagining new ways of meeting human needs, envisioning new forms of interactive experience, or even generating multiple alternative solutions to an information architecture problem or interface design problem. Some may lack these skills. However, the practice of usability is clearly enhanced by them.
Certainly one can find examples of bad usability practice, and I cannot judge what other people may have encountered. Of course, there is also a lot of bad market research, bad design, bad business decision-making, bad engineering, and bad manufacturing. Let us not define the field based on its worst practice, or even on its lowest-common denominator practice. Failure to take into account the kinds of confounds Gilbert identifies is indeed bad practice because it will lead to misleading information. Handing over to a team narrow findings, minimally processed, excludes the usability practitioner from the integrative dialogue in which various inputs and courses of action are weighed against each other, and from the creative endeavor of proposing solutions.
This will indeed limit usability practitioners to a tactical contributor role and will also result in products that are less likely to provide value for the users. Finally, to any usability practitioners who think that usability is some kind of essence that resides in a product or design, and that can be objectively and accurately measured in the lab: Stop it. If you think that there is a simple definition of ease of use that can be assessed in an error-free way via a snapshot with an imperfect sample of representative users and simulated tasks: Stop it.
If you think usability does not evolve over time or interact with user motivation and expectations and experience of benefit: Stop it. If you think that ease of use abstracted from everything else is the sole criterion for product success or experienced value: Stop it!
If you think you are entitled to unilaterally impose your recommendations on team decision-making: Stop it. You are embarrassing the profession!

However, the chapter's presentation and focus could be more helpful to the Interaction-Design.org audience. If some of this audience consists of practitioners—especially less-experienced practitioners—then Cockton is not speaking to their needs. Who are you, the readers of this chapter? It seems likely that many of you are practitioners in business, technology, healthcare, finance, government, and other applied fields.
As founder and CEO of a user experience consultancy, I find that most people—in both industry and academia—want to learn about usability evaluation as part of their goal to design better products, websites, applications, and services. Especially in industry, philosophical debates about points of definition take second place to the need to compete in the marketplace with usable, useful, and appealing products. This is not a new observation: years ago, Dumas and Redish pointed out that we don't do usability testing as a theoretical exercise; we do it to improve products.
Unfortunately, Cockton loses sight of this key objective and instead forces his readers to follow him as he presents, and then demolishes, an increasingly complex series of hypotheses about the meaning of usability. The danger of this approach is that a casual reader—especially one with a limited command of English—may learn from the chapter precisely the ideas Cockton eventually disproves.
Yet as he states later in the chapter, the contextual nature of design—and thus usability—has long been known, not only in the work of Whiteside et al. Throughout his chapter, Cockton continues to build and revise his definitions of usability. The evolution of these definitions is interesting to me personally because of my academic degrees in the philosophy of language. But reading this chapter gives my colleagues in industry only limited help in their role as user experience practitioners conducting usability evaluations of products under development.
The concept of usability has never applied only to software; ease of use is important to all aspects of our daily life. Don Norman famously wrote about the affordances of door handles. Interactive systems are meaningless without users, and usage must be of something. As for the chapter's discussion of "damaged merchandise" (invalid usability methods): there are—and will always be—evaluator effects in any method which has not been described in enough detail to replicate it.
The fact that evaluator effects exist underlines the importance of training skilled evaluators. There need not be a dichotomy between essentialist and relational ontologies of usability, as described in the chapter. Rather, if enough people in enough different contexts have similar user experiences, then guidelines about how to improve those experiences can be created and applied effectively, without using empirical methods for every evaluation. Also, from a practical standpoint, it is simply not realistic to usability test every element of every product in all of its contexts.
See Figures 1 and 2. Planning the activities in a usability evaluation program—and the schedule and budget appropriate to each—is central to the responsibilities of an experienced and skilled usability practitioner. An encyclopedia chapter on usability evaluation should help readers understand this decision-making process. I wish this chapter had provided more references about how to learn usability evaluation skills; adding such a focus would make it more valuable for readers. I have included a selection of these at the end of my commentary.
Although Cockton correctly points out that such resources are not sufficiently complete to follow slavishly, they are still helpful learning tools. From my own experience at TecEd, the selection and combination of methods in a usability initiative are the most challenging—and interesting—parts of our consulting practice. For example, our engagements have included the following sequences. Ethnographic interviews at the homes of 19 vehicle owners throughout the United States: we examined vehicle records and photographed and analyzed artifacts (see Figure 3) to learn how Web technology could support the information needs of vehicle owners.
Next we conducted interviews at the homes of 10 vehicle buyers to learn what information they need to make a purchase decision, where they find it, and what they do with it. We subsequently conducted another cycle of interviews at the homes of 13 truck buyers to learn similar information, as well as how truck buyers compare to other vehicle buyers. Multi-phase qualitative research project with physicians and allied health personnel during the alpha test of a clinical information system at a major U.S. institution. Unmoderated card sorting, followed by an information architecture (IA) exploration to help define the user interface for a new product.
We began with a two-hour workshop to brainstorm terms for the card sorting, then created and iterated lists of terms, and launched the sorting exercise. In these environments, we learned some contextual information despite the lab setting. Early field research for the Cisco Unified Communications System, observing how people use a variety of communication methods and tools in large enterprise environments.
We began each site visit with a focus group, then conducted contextual inquiry with other participants in their own work settings. Two teams of two researchers one from TecEd, one from Cisco met in parallel with participants, to complete each site visit in a day. After all the visits, Cisco conducted a full-day data compilation workshop with the research teams and stakeholders.
Then TecEd prepared a project report (see Figure 5) with an executive summary that all participating companies received, which was their incentive to join the study.

However, there remains an implicit assumption that evaluation is summative rather than formative. Used effectively, summative measures can give a measure of the quality or even the worth of a system, alone or in the product ecologies of which it is a part.
However, they do not provide information for design improvement. A preoccupation with the quantifiable, and with properties of evaluation methods such as reliability, does not address this. Wixon argues that the most important feature of any method is its downstream utility: does the evaluation method yield insights that will improve the design? To deliver downstream utility, the method has to deliver insights not just about whether a product improves, for example, user happiness, but also why it improves happiness, and how the design could be changed to improve happiness even further (or reduce frustration, or whatever).
This demands evaluation methods that can inform the design of next-generation products. Of course, no method stands alone: a method is simply a tool to be used by practitioners for a purpose, and it must be selected and adapted to fit that purpose. To focus this selection and adaptation process, we have developed the Pret A Rapporter framework (Blandford et al.) for planning a study. The first important element of the framework is making explicit the obvious point that every study is conducted for a purpose, and that that purpose needs to be clear: whether it is formative or summative, focused or exploratory.
The second important element is that every study has to work with the available resources and constraints: every evaluation study is an exercise in the art of the possible. Every evaluation approach has a potential scope — purposes for which it is and is not well suited. Cockton draws a distinction between analytical and empirical methods, where analytical methods involve inspection of a system and empirical methods are based on usage.
This is a good first approximation, but it hides some important differences between methods. Some analytical methods, such as Heuristic Evaluation or Expert Walkthrough, have no direct grounding in theory, but provide more or less support for the analyst. In a study of several different analytical methods (Blandford et al.), we found that methods with a clear theoretical underpinning yielded rich insights about a narrow range of issues concerning system design (likely user misconceptions, how well the system fits the way users think about their activities, the quality of physical fit between user and system, or how well the system fits its context of use); methods such as Heuristic Evaluation, which do not have theoretical underpinnings, tend to yield insights across a broader range of issues, but also tend to focus more on the negative (what is wrong with a system) than the positive (what already works well, or how a system might be improved).
Cockton rightly emphasises the importance of context for assessing usability (or user experience); surprisingly little attention has been paid to developing methods that really assess how systems fit their users in their various contexts of use. In the context of e-commerce, such as his van hire example, it is widely recognised that the Total Customer Experience matters more than the UX of the website interface (e.g. Minocha et al.): the website is one component of a broader system, and what matters is that the whole system works well for the customers, and also for the staff who have to work within it.
The same is true in most contexts: the system has to perform well, it has to be usable and provide a positive user experience, but it also has to fit well into the context of use. In different contexts, different criteria become prominent. For example, for a banking system, security is at least as important as usability, and having confidence in the security of the system is an important aspect of user experience. A few days ago, I was trying to set up a new standing order (i.e. an instruction to the bank to make a regular fixed payment).
This was irritating, and a waste of time (as I tried to work out whether there was a way to force the system to accept a later date for the first payment), but it did not undermine my confidence in the system, so I will continue to use it, because in many other situations it provides a level of convenience that old-fashioned banking did not. Cockton points out that there are many values that a system may offer other than usability. We have recently been conducting a study of home haemodialysis. We had expected basic usability to feature significantly in the study, but it does not: not because the systems are easy to use (they are not), but because the users have to be very well trained before they are able to dialyse at home, their lives depend on dialysis (so they are grateful to have access to such machines), and being able to dialyse at home improves their quality of life compared to having to travel to a dialysis centre several times a week.
The value to users of usability is much lower than the values of quality of life and safety. In our experience, combining empirical studies (involving interviews and observations) with some form of theory-based analysis provides a way of generalising findings beyond the particular context that is being studied, while also grounding the evaluation in user data. If you do a situated study of, for example, a digital library in a hospital setting (Adams et al.), it is difficult to assess how, or whether, the findings generalise to even a different hospital setting, never mind other contexts of use.
In this case, the theory did not contribute to an understanding of usability per se, but to an understanding of how the deployment of the technology influenced its acceptance and take-up in practice. Similarly, in a study of an ambulance dispatch system (Blandford and Wong), a theory of situation awareness enabled us to reason about which aspects of the system design, and the way it was used in context, supported or hindered the situation awareness of control room staff.
It was possible to apply an alternative theoretical perspective (Distributed Cognition) to the same context of use, ambulance dispatch (Furniss and Blandford), to get a better understanding of how the technology design and workspace design contribute to the work of control room staff, including the ways that they coordinate their activity.
By providing a semi-structured method (DiCoT) for conducting Distributed Cognition analyses of systems (Blandford and Furniss), we are encoding key aspects of the theory to make it easier for others to apply it (e.g. McKnight and Doherty), and we are also applying it ourselves to new contexts, such as an intensive care unit (Rajkomar and Blandford, in press). Even though particular devices are typically at the centre of these studies, they do not focus on the classical usability of the device, or even on user experience as defined by Cockton, but on how the design of the device supports work in its context of use.
Another important aspect of use in context is how people think about their activities and how a device requires them to think about those activities. We went on to develop CASSM (Blandford et al.) as a method for systematically evaluating the quality of the conceptual fit between a system and its users. Where there are different classes of users of the same system, which you might regard as different personas, you are likely to find different qualities of fit (Blandford et al.). CASSM contrasts with most established evaluation methods in being formative rather than summative; in focusing on concepts rather than procedures; in being a hybrid empirical-analytical approach; and in focusing on use in context rather than on either usability or user experience as Cockton describes them.
It is a method for evaluating how existing systems support their users in context, which is a basis for identifying future design opportunities to either improve those systems or deliver novel systems that address currently unmet needs. Evaluation should not be the end of the story: as Carroll and Rosson argue, systems and uses evolve over time, and evaluation of the current generation of products can be a basis for designing the next generation.
Whalen18 carried out a brief qualitative study of academic and public library site features. Stover and Zink19 developed an evaluation tool for the design of library home pages based on contemporary guidelines, and evaluated the home pages of 40 American and Canadian higher education libraries. Clyde20 carried out a content analysis of school and public library web sites in 13 countries. Quintana21 developed a set of content and design principles for health web sites and used them to evaluate ten well-known services. Cottrell and Eisenberg22 developed a six-factor framework for web site evaluation and organisation.
Cohen and Still23 carried out a comparison of research university and two-year college library web sites, evaluating them in terms of their provision of information about their services and their support for reference, research and instruction, and of particular features of their design and functionality. Misic and Johnson24 carried out a comprehensive benchmarking exercise comparing web sites of business schools.
Dewey25 studied the findability of links on the library web pages of members of an American university consortium. Sowards26, in a highly detailed study, established a theoretical typology for library guides to web resources in terms of their depth, organisation and design features. Osorio27 provides a useful literature review and an account of a content and design evaluation carried out on 45 science and engineering libraries. An important study by McGillis and Toms of the Memorial University of Newfoundland library web site has recently been published.
McCready31 describes a library web site redesign at Marquette University that depended exclusively on focus groups. There is little of specific relevance to health libraries other than the recent work of Fuller and Hinegardner, which appeared too late to inform the project.
Methods

It is widely held32 that the best results in usability evaluations come from carrying out as many small tests as possible. This work involved several phases and combined several different methodological approaches. In the preliminary phase, a succinct content and design checklist was developed and used on a selection of web sites of NHS libraries similar to the SLAM Multidisciplinary Library, as a benchmarking and evaluation tool and as a source of new ideas. The main phase of the project consisted of formal observation testing, card sorting, and a combined label intuitiveness and category membership test.
In the final phase of the project, proposals were put forward, based on the findings, for revisions of the site. Despite the provision of an incentive (the offer of a free lunch in the staff canteen), it proved difficult to recruit participants for activities which required extended time to be spent in the library, such as the focus groups and usability tests. In practice, most of those who took part were based on site or in premises nearby. Demographic information was recorded for each tester, as recommended by Davis. Among the volunteers there was a preponderance of medical staff and of professional non-clinical staff; there were also considerably more women than men.
It proved impossible to recruit participants from certain groups (social workers, health care assistants). Only three participants reported making extensive use of the site before taking part in the usability testing activity. The libraries were chosen deliberately for the range of approaches to navigation and design they represented, and the range of their online content. Each group was facilitated by the author and lasted about 45 minutes. At the start of each test the participants were given a script and a list of tasks.
The fifteen tasks, some of which had a number of different components, were designed to address anticipated usability problems. The usability metrics derived were: percentage of tasks completed, number of false starts for each task, longest time taken for each task, number of prompts required per task per user, and user satisfaction ratings. Sets of paper slips were created, one slip for each item on each of the menus. Menu category headings were also included among the slips.
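The usability metrics listed above can be derived mechanically from per-task observation logs. As a sketch only (the study does not describe its analysis tooling; the record fields and example values below are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """One tester's attempt at one task (hypothetical log format)."""
    task_id: int
    completed: bool
    false_starts: int
    time_seconds: float
    prompts: int

def summarise(results: list[TaskResult]) -> dict:
    """Derive the metrics named in the study from raw task logs:
    completion rate, false starts, longest time, prompts per task."""
    n = len(results)
    return {
        "pct_tasks_completed": 100.0 * sum(r.completed for r in results) / n,
        "total_false_starts": sum(r.false_starts for r in results),
        "longest_time_seconds": max(r.time_seconds for r in results),
        "mean_prompts_per_task": sum(r.prompts for r in results) / n,
    }

# Illustrative data: three attempts across two tasks
logs = [
    TaskResult(1, True, 0, 75.0, 0),
    TaskResult(1, False, 2, 300.0, 1),
    TaskResult(2, True, 1, 120.0, 0),
]
print(summarise(logs))
```

User satisfaction ratings, the remaining metric, would simply be averaged from post-test questionnaire scores and kept alongside these figures.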
Subjects were asked to sort the slips into categories, using either one of the menu headings as a label for the category, or devising their own heading if they preferred. The combined label intuitiveness and category membership test provided screen shots illustrating the main menu and sub-menus; respondents were asked what they would expect to be included in each main category, and what sort of information they thought each of the links would indicate.
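A common way to analyse open card sorts of this kind (one standard approach, not necessarily the one used in this study) is to count, across participants, how often each pair of items was placed in the same category; the item labels below are purely illustrative:

```python
from itertools import combinations
from collections import Counter

def co_occurrence(sorts: list[list[set[str]]]) -> Counter:
    """Count, over all participants, how often each pair of items
    was sorted into the same category. Keys are alphabetised pairs."""
    counts: Counter = Counter()
    for participant in sorts:          # one list of categories per person
        for category in participant:   # each category is a set of slip labels
            for pair in combinations(sorted(category), 2):
                counts[pair] += 1
    return counts

# Two hypothetical participants sorting four menu items
sorts = [
    [{"Journals", "Databases"}, {"Opening hours", "Contact us"}],
    [{"Journals", "Databases", "Opening hours"}, {"Contact us"}],
]
pairs = co_occurrence(sorts)
print(pairs[("Databases", "Journals")])  # → 2: both participants grouped these
```

Pairs with high counts are candidates for grouping under a single menu heading; feeding the resulting similarity matrix into hierarchical clustering is a typical next step when restructuring a menu system.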
Results and discussion

There is considerable variation in the primary and additional navigation systems used, in the categories of information given on the home page, in scope and content, and in the services provided. Sites tend to group information about library services on a single long page. Most of the sites make little use of interactive features, typically providing only one or two; these, however, varied considerably. Glenfield has a message board, a chat room, and online polls.
Exeter has an online membership registration form, while Brighton has online book and journal article request forms. One site Chichester did not list any journals. Provision of book catalogue access varied: one site provided a link to a university Web OPAC, two others to a consortium union catalogue, another provided access only within the trust network to the catalogue, and the others had no access at all.
Direct login facilities were provided where applicable. Sites did not generally have any significant uniquely developed content. One exception was the Knowledgeshare clinical knowledge management web site at Brighton. All the sites except Chichester provided some selected links to external web sites. The Exeter and Glenfield lists are comprehensive and highly developed.
According to the canons of focus group methodology37, the groups were (a) too small, and (b) should not have been conducted by the site designer. In this instance, however, combining the two roles in the same person meant that the facilitator knew the participants and had a close knowledge of the site and of the library service, and hence could set individual comments in context within the discussion. A learning effect was apparent as the test was conducted; as they worked through the tasks, the testers learnt their way around the site, and by the end their comments often indicated that they were able to find things relatively easily.
Twelve failures to complete tasks and 15 prompts were recorded across the full set of test events.

[Table 1. Usability test data analysis worksheet. Columns: task number, task, maximum time taken (min), total number of prompts, and a problem summary with examples. Recoverable entries include a site search task (asking whether resources are available to staff who are not currently affiliated to or studying at an institution of higher education; CHSL access should be mentioned on the list); task 7, "What academic and professional qualifications does the part-time library assistant hold?" (maximum time 1m 15s, no prompts; one tester thought an email hyperlink would lead to a CV); a task on using the Vancouver system for referencing citations; and a task to find the form for submitting a literature search request to the library.]

The test results were difficult to record with total accuracy; it was difficult to take notes while conducting the tests, and the tapes that had been recorded could not always be deciphered.
Again it was not good practice that the author of the site should have been conducting and recording the tests. These tests were successful, however, in identifying a number of significant usability issues. It had evident advantages in terms of time and convenience, but precluded any informative contact on my part with the subjects.
It became apparent that the results were being affected by user uncertainty caused by lack of intuitiveness of the item labels; with hindsight, fuller descriptions should have been given of the item contents. The following results were evident across all three charts. Card sorting is considered to be more effective and accurate with 20 users or more. The results did, however, provide some clear pointers for restructuring the menu system. Both headings evidently tend to be interpreted as referring comprehensively to all aspects of the library service.
The test also highlighted some expectations for content which are not currently available.
One participant expected there to be a facility to search their own inter-library loan records online, while another mentioned the desirability of an integrated union catalogue and inter-library loan request facility. As a result of usability testing, detailed lists of the proposed changes to the site were drawn up and a new structure for the web site proposed.
Conclusion

Relatively few people were involved in each of these tests. It appeared that the main usability issues had been correctly anticipated. The major problems encountered by the testers appeared to involve two main areas: (a) the specialised terminology used in referring to information sources and services, and (b) the organisation and structure of some of the information about library services.
Many researchers have highlighted the classification of information systems, and the labelling of resulting categories, as a problem of information design generally and of web information services in particular; a review of the literature is provided by McGillis and Toms. According to Spivey, experienced library users become familiar with library jargon, but can be confused by new systems and terminology, or by the availability of multiple platforms and interfaces for a single resource. Library jargon can include short descriptions and nouns for library resources and services.
Unsurprisingly, given the rapid changes taking place in the information market, most readers do not have in their minds a clear taxonomy of electronic information sources. Naismith and Stein suggest a continuum of strategies to address such confusion, such as the use of explanatory phrasing and the provision of glossaries.
These have obvious application to library web site design. Participants in the focus group discussions appeared to think that, for them, the key role of a library web site in relation to external web-based resources is not only to act as a form of quality filter, but also to provide readers with jumping-off points for their information seeking.
The focus group participants emphasised, as well, the value of local content or content of immediate local relevance; this appears to be an appropriate niche for the SLAM Multidisciplinary Library site. Usability testing, being limited to what can be readily observed and measured, is necessarily somewhat artificial. The questionnaire and demographic data obtained in the study gave indications of existing habits of professional information seeking on the web.

References

Barak A. Psychological applications on the Internet: a discipline on the threshold of a new millennium.
Computers in Psychiatry Today. Academic Psychiatry 24(3).
Useful websites for psychiatrists. Academic Psychiatry 22(2).
A guide to the Internet for psychotherapists.
Information for social work practice: observations regarding the role of the World Wide Web.
Internet support for nurses and midwives. Professional Nurse 14(4).
Anthony D.
Yeoman A, Cooper CJ, Urquhart C, Tyler A, editors. Signposts to information for mental health workers: a research project funded by South and West Health Care Libraries Unit.
Diaz K. The role of the library web site: a step beyond deli sandwiches.
The mission and role of the library web site [WWW document].
Appendix B: usability tests. In: GUI design handbook [online monograph]. New York: McGraw-Hill.
Building user-centered library web sites on a shoestring.
Library web page design: are we doing it right?
The impact of information architecture on academic web site usability. Electronic Library 17(5).
Matylonek J.
Halub LP. The impact of access to the World Wide Web on evidence based practice of nurses and professions allied to medicine [WWW document].
A study of Internet use by physicians treating HIV patients [online serial].
These have subsequently been revised.