                                  Web Sniffer

                      by Erik van der Poel <erik@netscape.com>

                           originally created in 1998



Introduction

This is a set of tools to work with the protocols underlying the Web.


Description of Tools

  view.cgi

    This is an HTML form that allows the user to enter a URL. The CGI then
    fetches the object at that URL and presents it to the user in a
    colorful way. For example, HTTP headers are shown, HTML documents are
    parsed and colored, and non-ASCII characters are shown in hex. Links
    are turned into live links that can be clicked to see the source of
    the linked URL, allowing the user to "browse" source.
    
  robot

    Originally written to see how many documents actually declare their
    charsets in the HTTP headers and HTML, this tool has developed into a
    more general robot that collects various statistics, including HTML
    tag statistics, DNS lookup timing, etc. This robot does not honor the
    robots.txt exclusion standard, so please exercise caution if you use
    it.
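
    The DNS-lookup-timing idea can be sketched as follows (a stand-alone
    sketch using the modern getaddrinfo rather than the gethostbyname of
    the era; time_lookup is an invented name, not the robot's actual
    code):

```c
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netdb.h>

/* Time one DNS lookup and return the elapsed milliseconds, or -1.0
 * on failure. A sketch of the kind of measurement the robot makes. */
static double time_lookup(const char *host)
{
    struct timeval t0, t1;
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    gettimeofday(&t0, NULL);
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1.0;
    gettimeofday(&t1, NULL);
    freeaddrinfo(res);

    return (t1.tv_sec - t0.tv_sec) * 1000.0
         + (t1.tv_usec - t0.tv_usec) / 1000.0;
}

int main(void)
{
    double ms = time_lookup("localhost");
    if (ms < 0.0)
        printf("lookup failed\n");
    else
        printf("localhost resolved in %.3f ms\n", ms);
    return 0;
}
```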
    
  proxy

    This is an HTTP proxy that sits between the user's browser and another
    HTTP proxy. It captures all of the HTTP traffic between the browser and
    the Internet, and presents it to the user in the same colorful way as
    the above-mentioned view.cgi.
    
  grab

    Allows the user to "grab" a whole Web site, or everything under a
    particular directory. This is useful if you want to grab a bunch of
    related HTML files, e.g. the whole CSS2 spec.
    
  link

    Allows the user to recursively check for bad links in a Web site or
    under a particular directory.


Description of Files

  addurl.c, addurl.h: adds URLs to a list
  cgiview.c, cgiview.html: the view.cgi tool
  dns.c: experimental DNS toy
  doRun: used with robot
  file.c, file.h: the file: URL
  ftp.c: experimental FTP toy
  grab.c: the "grab" tool
  hash.c, hash.h: incomplete hash table routines
  html.c, html.h: HTML parser
  http.c, http.h: simple HTTP implementation
  io.c, io.h: I/O routines
  link.c: the "link" tool
  main.h: very simple callbacks, could be more object-oriented
  Makefile: the Solaris Makefile
  mime.c, mime.h: MIME Content-Type parser
  mutex.h: for threading in the robot
  net.c, net.h: low-level Internet APIs
  pop.c: experimental POP toy
  proxy.c: the "proxy" tool
  robot.c: the "robot" tool
  run: used with robot
  TODO: notes to myself
  url.c, url.h: implementation of absolute and relative URLs
  utils.c, utils.h: some little utility routines
  view.c, view.h: presents stuff to the user


Description of Code

  The code is extremely quick-and-dirty. It could be a lot more elegant,
  e.g. rewritten in C++ in an object-oriented, extensible style.

  The point of this exercise was not to design and write a program well,
  but to create some useful tools and to learn about Internet protocols.