Archive for January 14th, 2007

Top 1001 Duplicate Digg Comments

January 14th, 2007

I was not really satisfied with my previous post on duplicate digg comments, so I decided to fix my code and generate a better list. I also updated the user comment database with my new data. As a reminder, this data comes from front-page stories from the last 365 days. This list was generated […]


Digg Comment Data

January 14th, 2007

Some people were interested in downloading a copy of the digg.com comment data used in these two posts, so I fixed a few bugs in my spider code. The data is now over a gigabyte uncompressed and contains over 4 million comments. The compressed file is about 340MB. I know a lot of […]