I have more or less completed the coding portion of my website. This week, I worked on tracking the amount of time users spend on the websites returned in the search results, which I can't measure directly because the links go to external webpages. Instead, my site logs the timestamp of each click on a result link and treats the difference between consecutive clicks as the approximate time spent on each external site. This approximation is inexact because it also includes the time the user spends back on the search results page, but I am making the simplifying assumption that this time stays relatively constant across page views and is therefore negligible.
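The approximation above can be sketched in a few lines. This is a minimal illustration, not my actual logging code; the log format (a time-ordered list of `(url, timestamp)` click events) is an assumption:

```python
from datetime import datetime

# Hypothetical click log for one user session, ordered by time.
clicks = [
    ("http://example.com/a", datetime(2011, 5, 15, 10, 0, 0)),
    ("http://example.com/b", datetime(2011, 5, 15, 10, 2, 30)),
    ("http://example.com/c", datetime(2011, 5, 15, 10, 3, 10)),
]

def approximate_dwell_times(clicks):
    """Approximate time on each external site as the gap between
    consecutive clicks. Each gap also includes the time spent back on
    the results page, assumed roughly constant and thus negligible.
    The final click has no successor, so its dwell time is unknown."""
    dwell = []
    for (url, t), (_, t_next) in zip(clicks, clicks[1:]):
        dwell.append((url, (t_next - t).total_seconds()))
    return dwell

print(approximate_dwell_times(clicks))
```

Note that the last clicked result never gets a dwell-time estimate under this scheme, which slightly biases the data toward earlier clicks in a session.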
5/15/11 – Go over the final implementation with a few people in person, so they can point out any errors or clarifications that I should make before running it live on AMT
5/18/11 – Start preliminary tests on AMT to see what kind of data is collected, fixing bugs in my script or making AMT adjustments if necessary
5/23/11 – Run a randomized “dummy user” that clicks on links at random and views each page for a random period of time. Compare the dummy user's data with the actual collected data to check whether the real data shows genuine patterns or is essentially random.
5/27/11 – Final presentation
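The “dummy user” baseline planned for 5/23 could be sketched as below. The dwell-time range (5–120 seconds) and the number of clicks per session are illustrative assumptions, not values from my plan:

```python
import random

def simulate_dummy_user(num_results=10, seed=None):
    """Simulate a dummy user who clicks some of the ten result links
    in a random order and dwells on each for a random duration.
    Returns a list of (rank, dwell_seconds) pairs."""
    rng = random.Random(seed)
    ranks = list(range(1, num_results + 1))
    rng.shuffle(ranks)                      # random click order
    num_clicks = rng.randint(1, num_results)  # random session length
    return [(rank, rng.uniform(5, 120)) for rank in ranks[:num_clicks]]
```

Generating many such sessions gives a null distribution of per-rank dwell times; if the real AMT data looks indistinguishable from it, the collected behavior is effectively random.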
1. Right now, my website randomizes the order of the top ten Google results, but I am concerned that this might introduce unnecessary noise into the data. The Joachims paper used only three orderings: the original Google order, the reversed order, and the original order with the first and second results swapped. I originally decided to randomize the ordering in case the time spent on each website depends on where it appears in the list, but complete randomization may be too much. Should I randomize all of the results, leave them in the order Google provides, or split users between the two conditions?
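Whichever option I choose, the conditions can be expressed uniformly. A sketch of the three orderings from the Joachims paper plus full randomization as a fourth condition (the condition names and the uniform assignment are my own, for illustration):

```python
import random

def order_results(results, condition, rng=random):
    """Reorder a list of search results according to an experimental
    condition: Joachims's three orderings, or a full shuffle."""
    if condition == "original":
        return list(results)
    if condition == "reversed":
        return list(reversed(results))
    if condition == "swap_first_two":
        out = list(results)
        if len(out) >= 2:
            out[0], out[1] = out[1], out[0]
        return out
    if condition == "shuffled":
        out = list(results)
        rng.shuffle(out)
        return out
    raise ValueError("unknown condition: %s" % condition)

def assign_condition(rng=random):
    """Assign each user uniformly at random to one condition."""
    return rng.choice(["original", "reversed", "swap_first_two", "shuffled"])
```

Randomly assigning each user to a small fixed set of conditions like this would keep the position effect measurable while limiting the noise that per-user full randomization introduces.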