Friday, May 20, 2011

Initial Turk Data

Questions used (including some from the Joachims paper)
Open-ended (multiple correct answers possible)
Find a website that gives a tutorial of how to type Chinese characters on a Windows computer.
About how many calories are in an apple?
List a movie that is coming out in the US on 07/08/11.
List a movie that is coming out in the US on 06/03/11.

Factual (only one correct answer)
What is the name of the tallest mountain in New York?
What is the elevation of the tallest mountain in New York?
Find the homepage of Michal Jordon, the statistician.
What is the name of the researcher who discovered the first modern antibiotic?


Disclaimer
The initial run on Turk revealed that my logging was not quite detailed enough, since I only noted the user's unique ID when they started and ended the task.  This created a problem when more than one user was using the website at the same time, since I would be unable to differentiate which user was issuing what search query.   I have since fixed this and will continue to collect more data that I could actually create more fine-grained graphs, like graphing the distribution of links clicked.
Because of this, I was unable to graph very much data besides super high-level stuff that may not be that interesting.

Data




Average time per assignment: 2 minutes
Average number of search queries per task: 1.86

Comments
  • My worries about people using Google instead of Zoogle turned out to be unfounded, because it would actually be more effort to use Google, instead of just clicking on the Zoogle link that I provide. 
  • One user commented that "the first few results were not helpful for the answer I was looking for," basically trying to provide helpful criticism.  However, this is actually desired behavior, since it forces the user to search more (giving me more data) before they complete the task!  :)
  • More than one user would put something like "good HIT" or "good task, thank you!" in the optional comments section.  I was amused and surprised by this behavior because I feel that people tend to think of Turk as a bunch of random workers who mechanically do what they are told.  However, here we have examples of users subsequently trying to influence the experimenters, perhaps either to encourage the experimenter to accept the HIT or to even give the worker a Worker Bonus.

    Do note that I do not regard this as a bad thing, as it appears that it is a way for good workers to attempt to distinguish themselves. This could ultimately results in a mutually beneficial relationship (the worker gets more tasks that they like, and the experimenter gets higher quality data). Obviously, this would not be the case if bad workers put nice comments, but it would be interesting to see if there is a correlation.

To Do
  • Graph the distribution of number of links clicked
  • Graph the distribution of number of queries made
  • Graph the distribution of time spent on each link
  • Graph the number of HITs per user
    • Compare the results of repeat users to those who only completed a single task (answering one question).
  • Think of more ambiguous questions that force users to search through multiple links
  • Run a randomized dummy user

1 comment:

  1. Hi Nina,

    It sounds like you've made a lot of progress and I'm really excited to see the data you get this week (now that you can differentiate users! :)

    I was wondering if you are seeing any turkers who answer the factual questions without making any search queries at all. I don't know if your instructions specify that the turker needs to search or just answer the questions, but it seems plausible that someone could just know name of the researcher who discovered the first modern antibiotic.

    I was also wondering if you've run through those specific searches yourself on google and/or zoogle. I was reading some of your earlier posts and I agree with your reasoning both in favor of and against bounce rate as an indicator of a 'good' search result and I think this could really depend on the questions you ask/searches you perform.

    ReplyDelete