Note to the reader:
This part of the field guide comes from our 2019 version of the UX Research Field Guide. Updated content for this chapter is coming soon!
Tree testing means testing the architecture of your website, or anything else that has a branching (tree-like) menu. The method is relatively simple, can be done early in the development process, and can save you lots of time and effort later on.
Tree testing is one of several methods for getting the feedback you need to design a functional website, or anything else with menu options nested inside each other, like an automated messaging system ("for Employee Services, press one. For Customer Services, press two") or the options on a DVD (are subtitle options under "Play Movie"?). It is sometimes described as "backwards card sorting," and bears a certain resemblance to first click testing, so it's difficult to describe tree testing without also discussing these other methods.
The first thing to understand about tree testing is that it isn't card sorting, nor is it first click testing (though first clicks can be important here, too). All three methods are closely related and superficially quite similar, but serve very different functions. In fact, you might find yourself doing all three at different points in the same project. To recap:
First click testing means giving the tester a task—like looking up lobby hours on a bank's website—and recording whether the first place the user clicks is actually on the correct path to accomplish the task. Finding the lobby hours, or whatever else, would usually involve several more clicks, so the object is just to see whether the user can tell where to start. Failure to click in the right place first, or taking a long time to click there, suggests that something is wrong with the site's layout. Either the structure of the site is counterintuitive, or the appearance of the page is in some way confusing or distracting. Perhaps the correct place to click has been accidentally hidden or obscured.
First click testing can be done to check a new design, or to investigate possible problems in an existing design. The test can be done on a wireframe, or even a sketch, provided that all the potentially helpful--or potentially confusing--elements of the layout are present.
Card sorting involves giving a participant a group of cards (in real life or on a computer screen) each labeled with a concept, and asking the participant to organize the cards in a way that makes sense. While the method has several possible applications, it is often used as an initial step in designing the structure of a website. If most participants place the card labeled "hours of operations" under "services," rather than under "locations," then it makes sense to design the site with hours as a subheading under services. There are open card sorts where users define the categories the cards go into, or closed ones where the researcher defines the categories.
Tree testing involves showing the tester the architecture of your site (or your answering service, or whatever else) and asking them where they would click in order to accomplish a goal. However, unlike first click testing, the test doesn't end at the first click. The tester actually has to follow the entire pathway, from the first page to the final, triumphal click. Although the first click is disproportionately important, it is possible to get the first click right and still get lost before accomplishing the task. You need to know if your structure is getting your users lost.
The other big difference between tree testing and first click testing is that tree testing involves only the structure, not the content or the layout. With click testing, wrong clicks could be the result of buttons being too large or too small or in the wrong place or the wrong color, even if the basic architecture of the site is correct. But with tree testing, none of those other variables are at issue. The tester does not see your layout, only a diagram of which headings contain which subheadings.
Tree testing is often called backwards card sorting, because both focus very closely on the architecture of a site, its branching "tree" of options. In card sorting, you ask your participants to create trees for you. In tree testing, you...you guessed it, test those trees.
Tree testing can be done very early in the design—or redesign—process, because the site doesn't need to exist yet, even as a concept sketch. All you need is the tree. If your tree fails the test, the problem is relatively simple and cheap to fix because, even if you have to go back to the beginning, you don't have to go back very far. You won't have lost much.
There is no rule that says you have to use all possible testing types on the same project. Especially if your resources are limited, you might choose to simply use your judgment for some aspects of the design process and only test places where you suspect there might be a problem.
But if you have lots of time, questions, and resources, you might begin with a task assessment (to figure out what your product or website needs to be able to do), then do an open card sort to find out what kind of structure your users might find most intuitive. Create your structure, then double-check it with a closed card sort. Then refine that structure into the architecture you plan to use, your tree. Test your tree. Make any necessary changes, then test again. Then start designing your site around that tree and use several rounds of first click testing to make sure your layout and content add to, not detract from, the usability of the site.
Tree testing focuses on site architecture and nothing else, which is great because if the test reveals a problem, you know exactly where the problem is in the tree. But since there are many other things that could go wrong to impact overall usability, you can't use tree testing alone. You need sources of information about the other aspects of your project, too.
The other potential drawback of tree testing is that since it is nearly always an automated, remote process, you don't get the qualitative data that could show you why your testers are having the problems they are. Moderated tree testing would fix that problem, but generally is not practical.
You can get some of that qualitative material by polling your testers after the fact, or by doing a moderated trial run. More on these options below.
As with many types of testing, the difficult part for you as the researcher comes before and after, not during.
Tree testing is logistically fairly simple, since you generally don't need to travel, gather supplies, or coordinate any more people than you normally have on your team. You do need to design your test, though.
While you could conduct tree testing with a handwritten site map (and a note-taker standing by), most people use specialized tools. These vary, but generally you load your categories and subcategories and sub-subcategories (etc.) onto a spreadsheet, and then the tool creates a clickable tree suitable for testing. The tool will track where testers click, how long they take to click, in what order they click on things, and how many of them click in the right places. Your job is to choose the tool you want to use and familiarize yourself with it.
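To make the spreadsheet-to-tree step concrete, here is a minimal sketch of how rows of labels could be nested into a tree. The layout (one path per row, one level per column) and the labels are assumptions for illustration only; each real tool defines its own import format.

```python
import csv
import io

def build_tree(rows):
    """Nest rows of labels (one level per column) into a dict of dicts."""
    tree = {}
    for row in rows:
        node = tree
        for label in row:
            label = label.strip()
            if not label:  # a blank cell means the path ends at this level
                break
            node = node.setdefault(label, {})
    return tree

def print_tree(node, depth=0):
    """Print the tree as an indented outline, one label per line."""
    for label, children in node.items():
        print("  " * depth + label)
        print_tree(children, depth + 1)

# Hypothetical spreadsheet export: one path per row, one level per column.
SHEET = """Locations,Branches,Lobby hours
Locations,ATMs,
Services,Accounts,Checking
Services,Loans,
"""

rows = list(csv.reader(io.StringIO(SHEET)))
tree = build_tree(rows)
print_tree(tree)
```

The indented outline this prints is roughly what a tester sees: headings and subheadings only, with no layout or content.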
Most tree tests are conducted online. The tester will get a link to your test and complete the assignment. They can do this in an unmoderated setting, from the comfort of their own computer, or in a moderated setting, where you can watch the action. The easiest and cheapest way to do tree testing is in an unmoderated, remote setting.
You will have to design your tree before you test it. You will also need to decide how much of your tree you want to test. If your site is extremely complex, you might not want to test all of it at once. If you are adding new material to an existing site, you don't necessarily need to retest the older parts. However, you do need to include all the options a user might consider taking. For example, if you have six menu items on your landing page, you need to include all six of them in the test, even if you are only testing pathways involving two of them. Don't make the test easier by eliminating wrong answers, as this will skew your data. If your site has seven levels, but you only need to know if users can navigate the first five, then you don't need to include the extra two.
You may want to test two different trees to see which one works better. In this case, you are essentially doing an A/B test. Don't show both versions to the same tester. Do recruit twice as many testers, so that each version will still get thoroughly tested. In some cases, you can do a comparison within a single tree. For example, if you want to know whether you should put cucumbers under "fruit" or "vegetables," simply include both options in your tree and see which one testers click on.
Don't simply ask your testers to "find the lobby hours listing," or something equivalent. First, it is important to avoid biasing the test by using the same wording as the menu label of the right answer. Second, it's important to get your tester in the same frame of mind as a real user—the mind works differently, depending on whether a person is taking a test or trying to solve a real-world problem (such as depositing a check before the lobby closes). Give the tester a realistic scenario, rather than a simple test question, to set the right tone.
Don't make your scenarios too complex either. You're not writing flash fiction here, and too many irrelevant details would just be confusing. Remember that many testers will skim the questions, rather than reading them carefully, and could mistake a supporting detail for the central point. You only need one or two sentences.
The same test can and should include multiple tasks, so you get a larger and more nuanced picture of how well your tree works. But don't include more than ten tasks. After ten scenarios, your tester is going to start getting tired or bored—and after ten trips through your tree, they are going to start learning their way around, thus biasing your test.
If you need to test more than ten tasks, run more than one test. For example, if you want to ask about twenty scenarios, you can double your number of testers and then randomly assign ten to half your testers and the other ten to the other half.
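The random split described above can be sketched as follows. The task and tester names are placeholders, and the function name `assign_task_sets` is hypothetical; many testing tools or recruiting platforms may handle this assignment for you.

```python
import random

def assign_task_sets(task_sets, testers, seed=None):
    """Randomly deal testers across task sets in near-equal groups."""
    rng = random.Random(seed)  # a seed only makes the run repeatable
    shuffled = list(testers)
    rng.shuffle(shuffled)
    # Deal shuffled testers round-robin so group sizes differ by at most one.
    return {tester: i % len(task_sets) for i, tester in enumerate(shuffled)}

tasks = [f"task-{n}" for n in range(1, 21)]        # twenty scenarios
task_sets = [tasks[:10], tasks[10:]]               # ten per test
testers = [f"tester-{n}" for n in range(1, 101)]   # doubled recruitment

assignment = assign_task_sets(task_sets, testers, seed=7)
group_sizes = [list(assignment.values()).count(i) for i in range(len(task_sets))]
print(group_sizes)  # prints [50, 50]
```

Shuffling before dealing keeps the assignment from tracking sign-up order, so neither half of the test ends up with, say, all the earliest recruits.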
You are not going to be able to ask about every task that can possibly be accomplished on your site. Focus on the ones that are most important and any that you suspect might present a special problem.
Finally, be clear about what the right answer actually is. There is no reason to tell your testers whether they got it right, but your testing tool needs to know so it can give you the results you need.
The two most important things in recruiting testers for a tree test are that you get enough people for statistical significance and that you get representative users, that is, people who are similar to the people who will actually use your site. If your target usership is retirement-age women, then that's who you need in your study. If your target usership is single fathers of pre-school-aged children, then that's who you need to recruit. You won't get good results otherwise. You can hire a company to provide you with paid test-takers, or you can find your own. Remember that some of the people who agree to help you out will not follow through, or will not complete the test, so you may need to do a second wave of recruiting if you don't reach your target number the first time.
Get a minimum of 50 people. With fewer, you run the risk of having your results biased by the small number of testers who don't put any effort into the exercise. Also, you want to clear the threshold for statistically meaningful results by a healthy margin, not an anemic squeak.
Be sure that your testers are properly compensated, either through fair pay or through a nice thank-you gift, and be sure to explain why their contribution is valuable and that you are grateful.
In your communications with your testers, be very clear that they are testing your tree. You are not testing the testers. Most people slip back into "school mode" very easily and become very anxious about whether they have gotten the question right and earned the teacher's approval (even if they know better intellectually). As much as possible, avoid language that could exacerbate the problem and bias your test.
You can minimize some of the drawbacks of tree testing by what you do both immediately before and immediately after the test itself.
Do a small-scale moderated pilot test first. It serves two purposes: checking the mechanics (that your instructions make sense, that your testing tool works, that the whole test does not take longer than about 20 minutes, and so forth) and collecting some qualitative data you otherwise wouldn't have. Besides a moderator, you will need a note-taker for this phase, as the same person cannot perform both roles.
For the main, unmoderated test (and in the moderated pilot), after the test itself is complete, show each tester a list of the category and subcategory (etc.) labels used in the test and ask if any were confusing. Then ask for any further feedback, questions, comments, or concerns (if anyone asks questions, answer them in a reasonable timeframe). The object here is to capture some of the qualitative data you otherwise wouldn't get, as well as to solicit information you didn't think to ask about.
Surprises are important.
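Your testing tool will likely give you results as a set of numbers for each task, typically including the success rate (how many testers reached the right destination), directness (how many got there without false starts and backtracking), the time taken, and where testers clicked first.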
The relative importance of each of these measures will vary from one test to another. You'll also get a lot of information from the relationships between these numbers. More on that shortly.
You might not be able to compare results from different tasks, let alone merge the numbers to get, say, an average success rate of all ten tasks in the test. For example, if one task requires a minimum of five clicks and the other requires only two, their times will not be comparable. Compare only the results of comparable tasks or, better yet, different versions of the same task. For example, if you have cucumbers under both fruit and vegetables, and both paths require a minimum of three clicks, is time lower and directness higher for one path than the other?
The success rate of a tree test will always be lower than the success rate for the finished website, assuming you make improvements based on what you learn from tree tests and other research. Additionally, a finished website offers contextual cues, drop-down menus, a search feature, and other such details that make navigation easier. The difference can be huge: a success rate in the sixties on a tree test can be comparable to a success rate in the nineties on a finished site.
Also, the success rate for a task that requires many clicks is going to be lower than for one requiring few clicks (in tree tests or otherwise).
The point is that you can't grade a tree test the same way you'd grade a spelling test. The numbers are going to be lower and they are going to be variable. Experience will teach you what results can reasonably be considered "good" results in different circumstances. You can also look at relative results--does this tree have higher success rates than the previous version? Is one task consistently taking longer than other tasks that should be similar?
Finally, bear in mind that not all tasks are equally important. Depending on what your site is for, there will be some tasks that absolutely have to be clear and straightforward or the user will get frustrated and leave. Other tasks might be perceived as worth the extra effort, or users might decide that even if that task is a headache, other parts of the site are worth sticking around for. Ideally, the entire tree would work perfectly, but a passing grade for one pathway might not be a passing grade for another.
It is important to consider the relationship between the various numbers. For example, if the directness is low for a given task, that's a problem, even if the success rate is high; users who accomplish their goal only after a lot of false starts and backtracking are going to be frustrated, even angry. Likewise, if directness and success are both good, but time is very long, that means the users are getting confused by something.
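If your tool lets you export raw results, the relationships described above are easy to check yourself. The sketch below assumes a hypothetical export with one record per tester, holding the final answer chosen, the number of times the tester backtracked, and the time in seconds. The field names and the definition of directness (share of testers who never backtracked) are assumptions; tools define these measures in slightly different ways.

```python
from statistics import median

def summarize(results, correct):
    """Summarize one task: success rate, directness, and median time."""
    n = len(results)
    successes = sum(1 for r in results if r["chosen"] == correct)
    direct = sum(1 for r in results if r["backtracks"] == 0)
    return {
        "success_rate": successes / n,
        "directness": direct / n,  # assumed: share who never backtracked
        "median_seconds": median(r["seconds"] for r in results),
    }

# Made-up results for a single task, for illustration only.
results = [
    {"chosen": "Lobby hours", "backtracks": 0, "seconds": 12},
    {"chosen": "Lobby hours", "backtracks": 2, "seconds": 40},
    {"chosen": "Lobby hours", "backtracks": 0, "seconds": 9},
    {"chosen": "Lobby hours", "backtracks": 1, "seconds": 31},
    {"chosen": "ATMs", "backtracks": 0, "seconds": 15},
]

summary = summarize(results, correct="Lobby hours")
print(summary)
```

In this made-up data, most testers succeed but a noticeable share backtrack to get there, which is the "high success, low directness" pattern worth investigating. The median is used for time because a few very slow testers can distort an average.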
First clicks are important; that's why first click testing is a thing. Users who get the first click wrong are much less likely to accomplish their goal (even with backtracking) than those who get that first click right (even if they get off track for a while later). In tree testing, that means wrong first clicks should have a greater bearing on a task's "grade" than wrong turns later. Conversely, if the first click is right but the destination is wrong, that helps you localize the problem so you can fix it.
The location of wrong first clicks can also suggest how you can change the tree to make it more intuitive for users. If the wrong clicks cluster, change the tree so that the cluster is right. If the wrong clicks are scattered, then the destination for that task might better belong to multiple headings--unless scattered wrong first clicks turn up for many tasks, in which case your options may be poorly differentiated. Consider doing a card sort to develop more clearly defined categories.
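One rough way to see whether wrong first clicks cluster or scatter is to tally them per heading. The sketch below uses made-up data and a hypothetical helper, `wrong_first_clicks`. It reports how much of the wrong-click total lands on the single most common wrong heading; what share counts as a "cluster" is a judgment call, not a fixed threshold.

```python
from collections import Counter

def wrong_first_clicks(first_clicks, correct_first):
    """Tally wrong first clicks and report the most common wrong heading."""
    wrong = Counter(c for c in first_clicks if c != correct_first)
    total_wrong = sum(wrong.values())
    if total_wrong == 0:
        return wrong, None, 0.0
    label, count = wrong.most_common(1)[0]
    # Share of all wrong first clicks that went to the top wrong heading.
    return wrong, label, count / total_wrong

# Made-up first clicks for one task whose correct first click is "Locations".
first_clicks = ["Locations"] * 6 + ["Services"] * 3 + ["About us"]

tally, top_wrong, top_share = wrong_first_clicks(first_clicks, "Locations")
print(tally, top_wrong, top_share)
```

A high share on one heading suggests a cluster (testers expect the destination to live there), while a low share spread over many headings suggests scattering.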
It's worth noting that although the data from a tree test are quantitative, and the results are numbers, interpreting the results requires the type of judgment calls more typical of qualitative tests. You can also dramatically improve the value of the test by including the (properly analyzed) qualitative data from the pilot study. For example, if a given task took a very long time, the qualitative results might tell you what testers were spending all that time doing.
Tree testing has the advantage of being easier to set up and run than most of the other tests, thanks to modern tree testing tools. To create a new test, you just edit a spreadsheet. Tree testing is no panacea, but no type of testing is. For best results, use tree tests in conjunction with other testing types, to get a full picture of the progress of your development process.