Digg Data Trace

Dr. Yingwu Zhu

Project -- Measurement and Analysis of An Online Content Voting Network: A Case Study of Digg

In online content voting networks, aggregate user activities (e.g., submitting and rating content) make high-quality content thrive through the unprecedented scale, high dynamics and divergent quality of user generated content (UGC). To better understand the nature and impact of online content voting networks, we have analyzed Digg, a popular online social news aggregator and rating website. Based on a large amount of collected data, we provide an in-depth study of Digg. We study structural properties of the Digg social network, revealing some strikingly distinct properties such as low link symmetry and the power-law distribution of node outdegree with truncated tails. We explore the impact of the social network on user digging activities, and investigate the issues of content promotion, content filtering, vote spam and content censorship, which are inherent to content rating networks. We also provide insight into the design of content promotion algorithms and recommendation-assisted content discovery. Overall, we believe that the results presented in this paper are crucial to understanding online content rating networks.
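Two of the structural properties named above, link symmetry and the outdegree distribution, can be computed directly from the social graph trace. The sketch below is only an illustration, not the method used in the paper. It assumes that each user's friend list gives that user's outgoing links, and it defines link symmetry as the fraction of directed links u -> v for which v -> u also exists. That definition is common, but the paper's exact definition may differ. The names `Graph`, `link_symmetry` and `outdegree_distribution` are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical in-memory form of the social graph: user_id -> set of friend ids,
# assuming each friend list records the user's outgoing links.
Graph = Dict[int, Set[int]]


def link_symmetry(graph: Graph) -> float:
    """Fraction of directed links u -> v for which v -> u also exists."""
    total = reciprocated = 0
    for u, friends in graph.items():
        for v in friends:
            total += 1
            # A user that never appears as a key is treated as having no outgoing links.
            if u in graph.get(v, ()):
                reciprocated += 1
    return reciprocated / total if total else 0.0


def outdegree_distribution(graph: Graph) -> Counter:
    """Map each outdegree k to the number of users with exactly k friends."""
    return Counter(len(friends) for friends in graph.values())


if __name__ == "__main__":
    toy = {1: {2, 3}, 2: {1}, 3: set()}
    print(link_symmetry(toy))  # 2 of the 3 links are reciprocated
    print(sorted(outdegree_distribution(toy).items()))
```

A common first check for a power-law shape with a truncated tail is to plot the outdegree counts on log-log axes. A proper fit needs more care than this sketch provides.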

Relevant Publications:

  • Yingwu Zhu
    A Measurement and Analysis of an Online Content Voting Network: A Case Study of Digg,
    Accepted by the 19th International World Wide Web Conference (WWW2010). [paper (pdf)]

Digg Data Trace

Due to resource constraints at Seattle University, I am not able to provide all the data traces used in my paper, especially the PT trace, which spans over 4 years. Here I provide three data files. The two larger ones have been split into subfiles for easier downloading. If you find the data trace useful, please cite our work.
(1) topo_sub_xxx.txt.gz: the Digg social graph, containing a total of 580,228 users, split into 6 subfiles. The data format for each text line:
      user_id (generated by me), number of friends, a list of friends' user ids
(2) users.map.gz: maps real user names to my generated IDs. The data format for each text line:
      user_name, user_id
(3) digg_trace_ST_xxx.txt.gz: one month's worth of the ST trace, split into 48 subfiles. The data format for each text line:
      user_name, digg_submission_time, story_id, story_status (1 = popular, 0 = upcoming)
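The three line formats above can be parsed with a few small functions. This is a sketch under stated assumptions, not code that ships with the trace. The page lists the fields but not the exact delimiter, so the sketch accepts commas and/or whitespace. It assumes generated user IDs are integers. Because the timestamp format is not documented, it keeps `digg_submission_time` and `story_id` as text. All function and type names here are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumption: fields are separated by commas and/or whitespace.
_SEP = re.compile(r"[,\s]+")


def _fields(line: str) -> List[str]:
    return [f for f in _SEP.split(line.strip()) if f]


def parse_topo_line(line: str) -> Tuple[int, List[int]]:
    """topo_sub line: user_id, number of friends, friends' user ids."""
    fields = _fields(line)
    # Assumption: the generated user IDs are integers.
    user_id, n_friends = int(fields[0]), int(fields[1])
    friends = [int(f) for f in fields[2:]]
    if len(friends) != n_friends:  # sanity check against the stated count
        raise ValueError(f"user {user_id}: expected {n_friends} friends, got {len(friends)}")
    return user_id, friends


def parse_user_map_line(line: str) -> Tuple[str, int]:
    """users.map line: user_name, user_id."""
    name, user_id = _fields(line)
    return name, int(user_id)


class Digg(NamedTuple):
    user_name: str
    submission_time: str  # kept as text: the timestamp format is not documented
    story_id: str
    popular: bool  # story_status: 1 = popular, 0 = upcoming


def parse_st_line(line: str) -> Digg:
    """digg_trace_ST line: user_name, digg_submission_time, story_id, story_status."""
    fields = _fields(line)
    if len(fields) < 4:
        raise ValueError(f"malformed ST line: {line!r}")
    # Read the fixed fields from both ends so a timestamp containing spaces survives.
    name, story_id, status = fields[0], fields[-2], fields[-1]
    if status not in ("0", "1"):
        raise ValueError(f"unexpected story_status: {status!r}")
    return Digg(name, " ".join(fields[1:-2]), story_id, status == "1")
```

The count check in `parse_topo_line` flags lines whose friend list does not match the stated number of friends. Reading the ST fields from both ends lets the parser tolerate a timestamp that contains spaces.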

Files to download

Social graph (topo_sub_xxx.txt.gz): parts 1 to 6
ST trace (digg_trace_ST_xxx.txt.gz): parts 1 to 48

This is a personal WEB site developed and maintained by an individual and not by Seattle University. The content and link(s) provided on this site do not represent or reflect the view(s) of Seattle University. The individual who authored this site is solely responsible for the site's content. This site and its author are subject to applicable University policies including the Computer Acceptable Use Policy (www.seattleu.edu/policies).