Thursday, May 12, 2011

Weekly Update

Progress made
I have more or less completed the coding portion of my website. This week, I worked on tracking how much time users spend on the websites returned in the search results. I can't measure this directly because the result links go to external webpages. Instead, my site logs a timestamp each time a user clicks a result link and treats the difference between consecutive clicks as an approximation of the time spent on each external site. This approximation is not exact because it also includes the time the user spends looking at the search results page. However, I am making the simplifying assumption that this time stays roughly constant across page views. If so, it shifts every estimate by about the same amount and can be treated as negligible when comparing sites.
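The timing approximation above can be sketched as follows. This is a minimal Python sketch, not the site's actual code. It assumes a hypothetical log in which each click is stored as a `Click` with a timestamp in seconds and the external URL, plus an optional `session_end` time (for example, when the user finishes the task). All names here are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Click:
    # Hypothetical log record: when a result link was clicked and where it led.
    timestamp: float  # seconds
    url: str


def estimate_dwell_times(
    clicks: List[Click], session_end: Optional[float] = None
) -> List[Tuple[str, Optional[float]]]:
    """Approximate time on each clicked site as the gap until the next click.

    Each gap also includes time spent back on the results page, so it
    overestimates the true dwell time. The last click has no following
    click, so its estimate is None unless session_end is given.
    """
    ordered = sorted(clicks, key=lambda c: c.timestamp)
    estimates: List[Tuple[str, Optional[float]]] = []
    for i, click in enumerate(ordered):
        # Use the next click's time, or the session end for the final click.
        next_time = ordered[i + 1].timestamp if i + 1 < len(ordered) else session_end
        dwell = None if next_time is None else next_time - click.timestamp
        estimates.append((click.url, dwell))
    return estimates


if __name__ == "__main__":
    log = [Click(0, "a.example"), Click(45, "b.example"), Click(120, "c.example")]
    print(estimate_dwell_times(log, session_end=200))
```

Under the assumption that time on the results page is roughly constant, that extra time adds about the same offset to every gap, so comparisons between sites remain meaningful even though the absolute values are inflated.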

I also spent a fair amount of time making sure that the logging worked in multiple browsers, since I discovered that some calls were handled differently. For example, my initial implementation would not log any data if the user was using Chrome, due to the way Chrome handles JavaScript. I also (re)discovered that Internet Explorer is a giant pain in the butt and does silly things like reloading pages for no reason.

5/15/11 – Go over the final implementation in person with a few people, so they can point out any errors or unclear points that I should fix before running it live on Amazon Mechanical Turk (AMT)
5/18/11 – Start preliminary tests on AMT to see what kind of data is collected, fixing bugs in my script or making AMT adjustments if necessary
5/23/11 – Run a randomized “dummy user” who clicks on links at random and views each page for a random period of time. Compare the dummy user's data with the actual collected data to see whether real users show genuine patterns or whether their behavior is essentially random.
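The dummy-user comparison planned above could look roughly like this. It is a hypothetical Python sketch: `dummy_session`, `rank_counts`, `chi_square`, the 1 to 3 clicks per session, and the 0 to 120 second dwell range are all illustrative choices, not part of the actual plan. It compares how often each rank is clicked by real users against the random baseline using a Pearson chi-square statistic.

```python
import random
from collections import Counter
from typing import Iterable, List, Tuple

NUM_RESULTS = 10  # top ten results shown per query

# One session = list of (clicked rank, seconds spent) pairs.
Session = List[Tuple[int, float]]


def dummy_session(rng: random.Random, max_clicks: int = 3, max_dwell: float = 120.0) -> Session:
    """Simulate a 'dummy user' who clicks random ranks and stays a random time."""
    num_clicks = rng.randint(1, max_clicks)
    return [
        (rng.randint(1, NUM_RESULTS), rng.uniform(0.0, max_dwell))
        for _ in range(num_clicks)
    ]


def rank_counts(sessions: Iterable[Session]) -> Counter:
    """Count how often each rank was clicked across all sessions."""
    return Counter(rank for session in sessions for rank, _ in session)


def chi_square(observed: Counter, baseline: Counter) -> float:
    """Pearson chi-square of observed rank counts against a baseline distribution.

    Baseline counts are rescaled to the observed total; ranks the baseline
    never produced are skipped to avoid dividing by zero.
    """
    obs_total = sum(observed.values())
    base_total = sum(baseline.values())
    if base_total == 0:
        raise ValueError("baseline has no clicks")
    stat = 0.0
    for rank in range(1, NUM_RESULTS + 1):
        expected = baseline[rank] / base_total * obs_total
        if expected > 0:
            stat += (observed[rank] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    rng = random.Random(2011)
    dummy = [dummy_session(rng) for _ in range(1000)]
    # Placeholder for real sessions parsed from the AMT logs.
    real: List[Session] = [[(1, 60.0), (2, 15.0)], [(1, 90.0)], [(3, 30.0)]]
    stat = chi_square(rank_counts(real), rank_counts(dummy))
    # With 10 ranks there are 9 degrees of freedom; 16.92 is the 5% critical value.
    verdict = "non-random" if stat > 16.92 else "consistent with random"
    print(f"chi-square = {stat:.2f} ({verdict})")
```

The chi-square test is only reliable when every rank has a reasonable expected count (a common rule of thumb is at least 5), so it would need a decent number of real clicks. A similar comparison could be run on dwell times per rank.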
5/27/11 – Final presentation

1. Right now, my website randomizes the order of the top ten Google results, but I am concerned that this might introduce unnecessary noise into the data. The Joachims paper used only three orderings: the original Google order, the reversed order, and the original order with the first and second results swapped. I originally chose to randomize the order in case the time spent on each website was affected by its position in the list, but complete randomization may be too much. Should I randomize all of the results, leave them in the order Google provides, or do half of each?
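The ordering options in the question can be expressed with a small helper. This Python sketch is hypothetical, not the site's actual code: the condition names, `reorder`, and `assign_condition` are illustrative. It implements the three Joachims orderings plus full randomization and returns the permutation used, so the exact order shown to each user can be logged.

```python
import random
from typing import List, Sequence, Tuple


def reorder(results: Sequence[str], condition: str, rng: random.Random) -> Tuple[List[str], List[int]]:
    """Return the results in display order plus the permutation that was used.

    perm[i] is the index in Google's original order of the result shown at
    position i, so storing perm lets the exact ordering be reconstructed later.
    """
    n = len(results)
    perm = list(range(n))
    if condition == "original":
        pass
    elif condition == "reversed":
        perm.reverse()
    elif condition == "swapped":  # first and second results exchanged
        if n >= 2:
            perm[0], perm[1] = perm[1], perm[0]
    elif condition == "random":
        rng.shuffle(perm)
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    return [results[i] for i in perm], perm


def assign_condition(rng: random.Random, conditions: Sequence[str] = ("original", "random")) -> str:
    """Pick a condition uniformly; the default gives each user a 50% chance of either."""
    return rng.choice(conditions)


if __name__ == "__main__":
    rng = random.Random(42)
    top_ten = [f"result{i}" for i in range(1, 11)]
    condition = assign_condition(rng)
    shown, perm = reorder(top_ten, condition, rng)
    # Log (condition, perm) with each session so the order can be analyzed later.
    print(condition, perm)
```

Passing `("original", "reversed", "swapped")` as `conditions` would reproduce the Joachims setup, while the default approximates a half-and-half split between Google's order and full randomization. Random assignment only gives roughly equal group sizes; alternating conditions would make the split exact.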

1 comment:

  1. I personally think that including complete randomization in your experiments would be interesting. Having said that, it makes sense to keep a record of which random order you show to each Turker.

    Also, just so that it doesn't become too overwhelming, I think doing half of both is a good way to go.