Friday, May 6, 2011

Implementation and initial rollout

Zoogle, written in Python, opens by prompting the user to enter a search query, emulating the Google homepage.  The experiment is introduced to the user as "seeing how modifications to the current Google search affect the quality of the search experience."  For full disclosure, users will be told that their behavior on the site will be logged.

When the user first opens the homepage, their session ID and the current time are logged in a file.  Let's say that the user enters "Stanford University" as their search.
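A minimal sketch of what this logging might look like (the file name, event names, and tab-separated format here are placeholders, not the actual Zoogle code):

```python
import time
import uuid

LOG_PATH = "zoogle.log"  # hypothetical log file name

def log_event(session_id, event, detail=""):
    """Append one tab-separated line per event: timestamp, session, event, detail."""
    with open(LOG_PATH, "a") as f:
        f.write(f"{time.time()}\t{session_id}\t{event}\t{detail}\n")

session_id = uuid.uuid4().hex           # assigned when the homepage is first opened
log_event(session_id, "homepage_open")  # session ID and current time
log_event(session_id, "query", "Stanford University")
```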

The user is then redirected to the results page, which, in the current implementation, is a randomized ordering of the first ten Google search results for the query.
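The shuffling step itself is simple; a sketch, with placeholder titles standing in for real Google results:

```python
import random

def randomize_results(results, k=10):
    """Return the top-k results in a random order for the results page."""
    top = list(results[:k])   # take the first k organic results
    random.shuffle(top)       # shuffle in place before rendering
    return top

# placeholder result titles standing in for real search results
results = [f"result {i}" for i in range(1, 16)]
page = randomize_results(results)  # same ten results, random order
```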

The user's query is noted in the logs.

After the user completes their search task, they will be asked to complete a survey.  One question I am particularly interested in is how much time users think they personally spend on relevant websites vs. irrelevant websites.  I will then compare their perceived visit times with the actual logged visit times, to see how accurate (or inaccurate) users are at judging their own search behavior.
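The perceived-vs-actual comparison could be as simple as a per-site signed difference; a minimal sketch, with made-up numbers:

```python
def perception_error(perceived, actual):
    """Per-site signed error (seconds) between a user's estimated and
    logged visit times; positive means the user overestimated."""
    return {site: perceived[site] - actual[site]
            for site in perceived if site in actual}

# hypothetical survey answers vs. logged dwell times, in seconds
err = perception_error({"stanford.edu": 60, "example.com": 20},
                       {"stanford.edu": 45, "example.com": 35})
```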

Since I cannot track the user's behavior on external websites directly, I developed a workaround: logging the time at which the user clicks a link on the results page and the time at which they return to the Zoogle results.  This way, I cannot capture detailed navigation within the website (such as how many internal pages the user viewed), but I can estimate the total amount of time spent on the website.  I am thus modifying my hypothesis to use time spent on websites as the measure of implicit feedback, instead of the original plan of looking at the bounce rate.  I think this behavior can also be revealing, and possibly even correlated with the bounce rate, even though the bounce rate is no longer the goal of the experiment.
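A minimal sketch of this estimate, assuming the log is reduced to chronological click/return events (the event format is illustrative, not the actual log format):

```python
def dwell_times(events):
    """Estimate time spent on each external site from the log.

    events: chronological (timestamp, kind, url) tuples, where kind is
    "click" (user left the results page) or "return" (user came back).
    Returns total estimated seconds per site.
    """
    times = {}
    pending = None                      # (url, click_time) of the open visit
    for ts, kind, url in events:
        if kind == "click":
            pending = (url, ts)
        elif kind == "return" and pending is not None:
            site, t0 = pending
            times[site] = times.get(site, 0.0) + (ts - t0)
            pending = None
    return times

# hypothetical log excerpt: timestamps in seconds since the session began
log = [(12.0, "click", "stanford.edu"), (47.0, "return", None),
       (60.0, "click", "example.com"), (75.0, "return", None)]
est = dwell_times(log)   # {'stanford.edu': 35.0, 'example.com': 15.0}
```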

I will be running experiments on Amazon's Mechanical Turk, and I hope to collect at least 105 datapoints, enough to detect an effect size of Cohen's d = 0.5 at the 95% level.  Collecting enough data by running the experiments manually would otherwise have been a near-impossible task.
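A quick way to sanity-check a target like this (assuming a two-sample comparison at α = 0.05, two-sided, with 95% power) is the normal-approximation formula n = 2(z_{α/2} + z_β)²/d² per group:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.95):
    """Normal-approximation sample size per group for a two-sample test
    detecting a standardized effect size d (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

n = n_per_group(0.5)   # 104; a t-based correction nudges this to about 105
```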

  1. How should I ask users whether or not they judged a website to be relevant?  If I ask them right after they finish viewing the site, it could be too distracting and possibly affect their search behavior.  However, if I ask them after they have completed the experiment, it is very likely that they will have forgotten which websites they looked at, much less whether or not they considered each website to be relevant.
  2. Alternatively, if I ask simple enough questions, would it be a good idea to make the simplifying assumption that the last site visited is the most relevant, and the previously visited sites were less relevant or not relevant?
  3. Other feedback is also welcome!


  1. For 1), perhaps when you give them the survey after the experiment, you might provide a list of the sites they visited to remind them? It might also be helpful to have little thumbnails of what the site looked like, to jog their memory visually.

    For 2), I'd ask - are you timing them? On AMT, it seems their time is rather limited. In other words, unless they're really curious, they probably won't be searching around after they find their answer. So I think the last site visited really would be the most relevant (most of the time - and especially for the correct answers.)

  2. You can look at the HTTP header "referer" which will tell you what website the user was at prior to this one. This *may* be useful for 1-step bounces where the user clicks on a link and then somehow goes directly back to your web site, or even in building up some context of how the user got to your site.

    1/2) I believe the more recent experiences will be heavily biased no matter what you do if you measure at the very end. So I would suggest a very simple questionnaire after every result. Yes, it is annoying, but at least your data will be correct!

  3. 1) I actually think that asking them right after they visit should not be too distracting. It could be as simple as "was this relevant" and then yes/no buttons to push, or just relevant/irrelevant buttons. Also, you could still show the site name and a background image of that site to remind users where they just left.

    2) Yeah, I think you could run into issues of Turkers not spending very much time actually using your system (and just finding the answer the fastest possible way to get their reward). Hopefully most use your system as intended, so with a large sample size you still get good results. It seems like you are just defining the sites as 'relevant' or not, not assigning them a relative order of relevance, so knowing which is the most relevant doesn't seem important to you. I think it is safer to just ask about that page's relevance the same way you ask about the other pages they view.

    3) So are you using the randomized ordering to determine if the site they click on is actually 'relevant' or not depending on if it was in the top 5 of the actual search results or not? Or are you manually viewing each page at a later date and determining its relevance yourself? I'm not really clear about how you are extracting the actual 'relevant' vs. 'irrelevant' pages that the user is going to.