Saturday, August 05, 2006

Importance of Archived Web Sites

After my exploration of Archive.org, I took a ride on the Wayback Machine to look at a few sites. These are the sites I looked at, or tried to:
  1. IsoNews
    • For anyone who doesn't know what IsoNews was: it was essentially a news, discussion-forum, and ranking site for releases of pirated software, games, and console CDs and DVDs. The site was intended to be a one-stop source for seeing what groups had released for distribution via P2P, etc., and for discussing those releases.
    • Unlike many of the warez sites that provided links or direct access to pirated files, this site was solely informative and kept a pulse on the underground scene.
    • It was interesting to compare the first archived page, from shortly after the site launched in Dec. '98, with its demise on March 19, 2003, when the Department of Justice took over the site and domain to push cybercrime laws and information following the prosecution of its founder, Krazy8, for selling console "mod" chips.
  2. Wall Street Journal
    • Interestingly, the newspaper had used a robots.txt file to ask archive.org not to collect its web pages.
  3. Washington Post
    • Interestingly, the newspaper had used a robots.txt file to ask archive.org not to collect its web pages.
  4. Seattle PI
    • I looked at their first archived site. At the top of the headlines was the Enron debacle, and how politicians who had received money before the scandal and bankruptcy were trying to figure out what to do with it.
  5. Google
    • I started with their first archived page, from Dec. 2, 1998, and saw that it actually had columns with links to other searches: Stanford Search and Linux Search.
    • Then I looked at their Dec. 4, 2001 page, which had already progressed to a rough form of their current format. Essentially, the only real changes since then are that the tabs across the top, which made the page look like a tabbed browser, became plain links, and "Directory" was changed to "more".
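For the curious, here is roughly how that robots.txt opt-out works. A site lists rules for named crawlers in a plain-text robots.txt file at its root, and a well-behaved crawler checks those rules before fetching a page. The sketch below uses Python's standard urllib.robotparser module to evaluate a made-up robots.txt. The file contents and URLs are hypothetical (these are not the newspapers' actual files), and it assumes the archive's crawler identifies itself as `ia_archiver`, the user-agent name archive.org pointed site owners to for opting out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the (assumed) archive crawler "ia_archiver"
# from the whole site, while other crawlers are only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_archive(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/news/story.html"  # hypothetical page
    print("ia_archiver allowed:", can_archive(ROBOTS_TXT, url))
    print("generic crawler allowed:", can_archive(ROBOTS_TXT, url, "SomeBot"))
```

Running it reports that ia_archiver is not allowed while a generic crawler is, because the blanket `Disallow: /` applies only to the ia_archiver entry. Note that robots.txt is a request, not an enforcement mechanism: it only works because the archive's crawler chooses to honor it.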
After reviewing these sites, I have identified several benefits to having access to archived sites. I will use IsoNews, Seattle PI, and Google as my talking points.
  1. Access to archived versions of sites allows individuals, agencies, etc. to research the past, review trends, and data-mine.
    • In the case of IsoNews, the Department of Justice cybercrimes division was able to review the activities of the website's active members. Essentially, this allowed them to target specific individuals and dig up as much damaging information as they could.
    • In doing so, the DOJ was able to target Krazy8 and find out what else he was up to. Although the chips were not sold on the website itself, Krazy8 was involved in selling console mod chips, and through his affiliation with IsoNews he made himself a target.
    • On the flip side, one could argue that with access to archived websites, the DOJ could target individuals without justification. Many of these individuals are just enthusiasts who like to see what is coming out soon. Most of the time, they are more interested in what has gone gold and when they can pick it up at EB Games, CompUSA, etc.
  2. Access to archived web sites lets you see what happened on a particular day or during a particular period without extensive research in the library stacks and the like.
    • For example, by looking at the Seattle PI, I was able to see what was going on in Washington at a specific point in time. Moreover, it lets me compare what they were reporting with other local and national publications.
    • For a publication such as the Seattle PI, archived sites also let readers find past columnists who may have moved on, along with their archived writings.
  3. Access to archived web sites allows a company to benchmark its own performance against its competitors'.
    • In the case of Google and their competition, they can study how the site designs changed over time and draw inferences about the overall impact of those design changes.
    • Furthermore, access to these websites allows newcomers to understand this progression and revisit the benefits and drawbacks of each design change.
    • Evidently, Google settled on a format in 2001 that appears to have worked well for its business model. With access to the archived sites, other companies can view Google's traffic history and study the impact of such design and layout changes.
As a side note, I found a few interesting sites on the article about archive.org being sued and on how the DMCA needs to define the legality of archive sites:
