Ultra-lean user testing with Amazon Turk and Google Forms

I love Kevin Cornell’s illustrations for A List Apart; his work inspired this sketch.

Weaving user testing into design projects has always been a challenge for me. There’s often a perception that user testing is a costly friction that extends timelines and fattens backlogs.

User testing IS a friction, particularly if testing requires an entourage of people behind one-way glass. This type of testing has its place, but there is another way that’s more like sticking a thermometer in your roasting chicken to see if it’s reached the right temperature.

If you seek ways to test quickly with lots of users, Amazon Turk is a way to approach lean UX that reduces the friction and cost of user testing. This process isn’t perfect, and may only work for specific types of audiences, but if your project is a match for Turk, you’ll use it every day. Let me repeat: user testing. Every. Day. Now that’s lean.

Amazon Turk: The Mighty Oddball

I’ll begin by saying this isn’t going to be a step-by-step guide to setting up Amazon Turk. There are plenty of resources out there that will walk you through it, and I have nothing to add to those articles. This article explains the who, what, why, where, and when, along with my experiences using this method.

With that said, I do want to provide an overview of this weird little world. Amazon Turk is a crowdsourcing service that enables individuals (“Workers”) to perform tasks that typically sit outside the realm of what computers can accomplish. On the flip side, Turk enables businesses (“Requesters”) to pay that crowd to perform what Amazon has dubbed Human Intelligence Tasks (HITs).

Designers will identify as Requesters on Turk, setting up tasks or scenarios for Workers to complete. The interface for doing this in Turk feels…old, like web-2007 old, but the web application works well despite being a little hard to understand at times.

The tabs-on-tabs design leaves much to be desired aesthetically, but the crowdsourcing capabilities of Turk sweep away concerns about the interface.

Audience

Workers on Turk are primarily United States citizens seeking to pick up a little extra cash on the side by answering surveys and performing tasks for Requesters. Workers can pick tasks they have time for, at rates they feel are worth that time.

Verticals

Over the last couple of years I’ve been using Turk to test verticals that are U.S.-centric: taxes, health insurance, auto insurance, and the like. You can choose to exclude countries, which is helpful since anyone outside the U.S. would skew our results. Unfortunately, Turk doesn’t provide granular control over segmenting the American audience, so a blunt “United States” filter will have to be good enough.

It’s important to point out that verticals like taxes and insurance are experiences all Americans share, which makes Turk a great place to investigate them. However, if you’re seeking to understand how people navigate the New York City transit system, you may not get substantive results due to Turk’s lack of segmentation granularity. The service https://www.turkprime.com/ provides this missing link, but I haven’t had occasion to use it yet.

Payment

Turk surveys can be completed really quickly, like 50 people in 15 minutes quickly, but speed is a function of payment. I always adjust the payment lever when I set up a Turk survey, and one thing is for sure: the more you compete with minimum wage, the quicker your surveys complete. Take care of your Workers by paying them just north of minimum wage, and they’ll take care of you by picking up your task and completing it accurately and quickly.
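To make “just north of minimum wage” concrete, here’s a tiny sketch of how I think about pricing a task. The hourly rate and task length are placeholder assumptions for illustration, not fixed rules; plug in whatever sits a bit above the minimum wage that applies to your Workers.

```python
import math

def reward_for_task(est_minutes: float, hourly_rate: float = 9.00) -> float:
    """Price a task so its effective hourly rate sits a bit above minimum wage.

    hourly_rate is an assumed illustration value; set it just above
    the minimum wage that applies to your Workers.
    """
    # Work in whole cents and round UP, so Workers never come out behind.
    cents = math.ceil(est_minutes * hourly_rate * 100 / 60)
    return cents / 100

# A survey you expect to take about 6 minutes:
print(reward_for_task(6))  # 0.9
```

If your surveys trickle in slowly, nudging `hourly_rate` up for the next run is the payment lever described above.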

A bonus note about speed of completion: I’ve found that kicking off my tests between 10 a.m. and 2 p.m. Central leads to the quickest results. The later in the day, the more the results tend to trickle in. They do complete, though, so have patience if your results are slow to come in. Adjusting payment in your subsequent tests can’t hurt either.

Types of tests appropriate for Turk

Split tests, or A/B tests, work great with the Turk crowd and are known to settle debates over everything from visual design to relevancy of content. This one is simple enough: devise two designs you wish to pit against one another, and set up your test. The benefit is that whether you get a tie or a clear champion, the result will change the conversation about where to head next with your design.
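One detail worth minding in a split test is that each Worker should see only one of the two designs. A common trick (my own sketch, not a Turk feature) is to assign the variant deterministically from the Worker ID, so a Worker who revisits the task always lands in the same bucket:

```python
import hashlib

def assign_variant(worker_id: str, variants=("A", "B")) -> str:
    """Deterministically bucket a Worker into one split-test variant."""
    # Hash the ID so the assignment is stable, but spread evenly
    # across variants rather than depending on ID formatting.
    digest = hashlib.sha256(worker_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same Worker always gets the same design:
assert assign_variant("A1EXAMPLEWORKERID") == assign_variant("A1EXAMPLEWORKERID")
```

In practice this can live in whatever page your task links out to, redirecting each Worker to the screenshot for their assigned variant.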

“What’s the most important thing on this page?” I’ve performed a number of exercises with Workers where I show them a screenshot of an app design and ask questions about the most important parts of the page. Here, the aggregate responses about page content reveal patterns that help shape our direction.

Straight questionnaires are another way to gather responses and look for patterns in user feedback. This type of task hinges on setting the context of your questions really well, and designing the right questions to follow that introduction. We spend lots of time designing questions that don’t lead users (a whole article in itself), but that help us understand where they are today with how they do their taxes, or how they do things like maintain their vehicles.

How do we collect responses?

I’ll be the first to admit I’ve never used Turk to collect the actual responses to our tests. I use Turk to put a link to our survey in front of the Workers, but that link goes to a Google Form. I’ve found Google Forms easy to put together quickly, and great for capturing and displaying user responses.
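For the curious, putting an outside survey link in front of Workers maps to MTurk’s ExternalQuestion payload, which is what you hand to `create_hit` in boto3 (Amazon’s Python SDK). The sketch below just builds the payload and parameters without calling AWS; the form URL, title, reward, and counts are placeholder assumptions, not values from a real test:

```python
# Sketch: wrapping a Google Form link in MTurk's ExternalQuestion XML,
# the format a HIT uses to show Workers an external survey in a frame.

def external_question(survey_url: str, frame_height: int = 600) -> str:
    """Build the ExternalQuestion XML payload for an MTurk HIT."""
    schema = (
        "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/"
        "2006-07-14/ExternalQuestion.xsd"
    )
    return (
        f'<ExternalQuestion xmlns="{schema}">'
        f"<ExternalURL>{survey_url}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        f"</ExternalQuestion>"
    )

# The kwargs you would pass to boto3's mturk client create_hit();
# shown as a plain dict so the sketch runs without AWS credentials.
hit_params = {
    "Title": "Short survey about how you handle your taxes",
    "Description": "Answer a 5-minute questionnaire.",
    "Reward": "1.25",                    # dollars, as a string
    "MaxAssignments": 50,                # number of Workers
    "AssignmentDurationInSeconds": 900,
    "LifetimeInSeconds": 86400,
    "Question": external_question("https://forms.gle/EXAMPLE"),
}

print(hit_params["Question"])
```

Country exclusion, mentioned under Verticals above, is handled separately through `QualificationRequirements` on the same call; the Requester web interface does all of this too, so the API is optional.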

When it comes to reviewing user responses, Google Forms displays qualitative and quantitative data quite nicely. For quant data, it makes nice charts and graphs to help you see the breakdown of responses. For qual, there’s more work involved but the listing of the Workers’ short- and long-form answers is easy to read. Beyond this, it’s simple to print the responses page as a PDF for sharing or archiving elsewhere.

Conclusion

The spirit of using Turk and Google Forms in this way is to test fast, and free of friction. Not sure about a call-to-action? Split test it really quick and move forward. Don’t understand how users currently do something like file an insurance claim, or how they feel about how many times a year they’re reminded to do their taxes? Take a quick temperature read with a questionnaire, and see where things land.

When you use Turk as a way to take the temperature, it brushes away some of the obscurity around user behavior and feedback. While not as hi-fi as a formal session that aligns to detailed personas, for the day-to-day it’s a fast and low-friction way to work research into the conversation.
