Original check-in of the "Web Sniffer", a set of tools to work with the
protocols underlying the Web.


git-svn-id: svn://10.0.0.236/trunk@59401 18797224-902f-48f8-a5cc-f745e15eee43
erik%netscape.com 2000-02-01 18:24:20 +00:00
parent 6ed5ab9f0a
commit f9eb58e3fe
38 changed files with 8010 additions and 0 deletions

@ -0,0 +1,95 @@
#
# The contents of this file are subject to the Mozilla Public
# License Version 1.1 (the "License"); you may not use this file
# except in compliance with the License. You may obtain a copy of
# the License at http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS
# IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
# implied. See the License for the specific language governing
# rights and limitations under the License.
#
# The Original Code is Web Sniffer.
#
# The Initial Developer of the Original Code is Erik van der Poel.
# Portions created by Erik van der Poel are
# Copyright (C) 1998,1999,2000 Erik van der Poel.
# All Rights Reserved.
#
# Contributor(s):
#
CC = gcc
#O_or_g = -g
O_or_g = -O
CFLAGS = -Wall -pedantic -D_REENTRANT $(O_or_g)
PURIFY =
#PURIFY = purify
#PURIFY = purify -windows=no
OBJS = \
addurl.o \
file.o \
hash.o \
html.o \
http.o \
io.o \
mime.o \
net.o \
url.o \
utils.o \
view.o
EXES = \
dnstest \
ftp \
grab \
link \
pop \
proxy \
robot \
urltest \
view.cgi
#all: dnstest
#all: ftp
#all: grab
#all: link
#all: pop
#all: proxy
#all: robot
all: view.cgi
#all: $(EXES)
dnstest: dns.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) dns.c $(OBJS) -lsocket -lnsl -o $@
ftp: ftp.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) ftp.c $(OBJS) -lsocket -lnsl -o $@
grab: grab.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) grab.c $(OBJS) -lsocket -lnsl -o $@
link: link.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) link.c $(OBJS) -lsocket -lnsl -o $@
pop: pop.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) pop.c $(OBJS) -lsocket -lnsl -o $@
proxy: proxy.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) proxy.c $(OBJS) -lsocket -lnsl -o $@
robot: robot.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) robot.c $(OBJS) -lthread -lsocket -lnsl -o $@
urltest: url.c utils.c
	$(PURIFY) $(CC) $(CFLAGS) -DURL_TEST url.c utils.c -o $@
view.cgi: cgiview.c $(OBJS)
	$(PURIFY) $(CC) $(CFLAGS) cgiview.c $(OBJS) -lsocket -lnsl -o $@
clean:
	rm -f *.o $(EXES)

@ -0,0 +1,90 @@
Web Sniffer
by Erik van der Poel <erik@netscape.com>
originally created in 1998
Introduction
This is a set of tools to work with the protocols underlying the Web.
Description of Tools
view.cgi
This is an HTML form that allows the user to enter a URL. The CGI then
fetches the object associated with that URL, and presents it to the
user in a colorful way. For example, HTTP headers are shown, HTML
documents are parsed and colored, and non-ASCII characters are shown in
hex. Links are turned into live links that can be clicked to view the
source of the linked URL, allowing the user to "browse" source.
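As a rough illustration of the "non-ASCII characters are shown in hex"
idea, here is a minimal sketch. The function name escapeForDisplay and
the \xNN notation are assumptions made for this example; the actual
rendering lives in view.c and may look quite different.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Web Sniffer): copy len bytes from in
 * to out so that they can be embedded in an HTML page. HTML
 * metacharacters are escaped, and bytes outside printable ASCII
 * (including control characters) are written as \xNN.
 * Returns the number of characters written, or -1 if out is too small.
 */
int
escapeForDisplay(const unsigned char *in, size_t len, char *out,
	size_t outSize)
{
	char piece[8];
	size_t i;
	size_t n;
	size_t used = 0;

	if (outSize == 0)
	{
		return -1;
	}
	out[0] = 0;
	for (i = 0; i < len; i++)
	{
		unsigned char c = in[i];

		if (c == '<')
		{
			strcpy(piece, "&lt;");
		}
		else if (c == '>')
		{
			strcpy(piece, "&gt;");
		}
		else if (c == '&')
		{
			strcpy(piece, "&amp;");
		}
		else if ((c < 0x20) || (c > 0x7e))
		{
			sprintf(piece, "\\x%02X", (unsigned int) c);
		}
		else
		{
			piece[0] = (char) c;
			piece[1] = 0;
		}
		n = strlen(piece);
		if (used + n + 1 > outSize)
		{
			return -1;
		}
		memcpy(out + used, piece, n + 1);
		used += n;
	}
	return (int) used;
}
```

A caller would pass the raw bytes of a response together with a buffer
large enough for the worst case of five output characters per input
byte, plus one for the terminator.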
robot
Originally written to see how many documents actually include the HTTP
and HTML charsets, this tool has developed into a more general robot
that collects various statistics, including HTML tag statistics, DNS
lookup timing, etc. This robot does not adhere to the standard robot
rules (such as robots.txt), so please exercise caution if you use it.
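To make the "HTTP charset" statistic concrete, the sketch below pulls
the charset parameter out of a Content-Type value. It is an
illustration only: extractCharset and startsWithNoCase are hypothetical
names, the real parser is in mime.c, and this version simply takes the
first charset parameter it finds.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if s begins with prefix, ignoring ASCII case. */
static int
startsWithNoCase(const char *s, const char *prefix)
{
	while (*prefix)
	{
		if (tolower((unsigned char) *s) !=
			tolower((unsigned char) *prefix))
		{
			return 0;
		}
		s++;
		prefix++;
	}
	return 1;
}

/*
 * Hypothetical helper (not part of Web Sniffer): copy the value of the
 * first charset parameter of a Content-Type value into out.
 * Returns 1 if a non-empty charset was found, 0 otherwise.
 */
int
extractCharset(const char *contentType, char *out, size_t outSize)
{
	size_t n = 0;
	const char *p = contentType;

	if (outSize == 0)
	{
		return 0;
	}
	out[0] = 0;
	while ((p = strchr(p, ';')) != NULL)
	{
		p++;
		while ((*p == ' ') || (*p == '\t'))
		{
			p++;
		}
		if (!startsWithNoCase(p, "charset="))
		{
			continue;
		}
		p += 8;
		if (*p == '"')
		{
			p++;
		}
		while (*p && (*p != '"') && (*p != ';') && (*p != ' ') &&
			(*p != '\t') && (n + 1 < outSize))
		{
			out[n++] = *p++;
		}
		out[n] = 0;
		return n > 0;
	}
	return 0;
}
```

A robot could call this once per response and count how often it
returns 0 to estimate how many documents lack an HTTP charset.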
proxy
This is an HTTP proxy that sits between the user's browser and another
HTTP proxy. It captures all of the HTTP traffic between the browser and
the Internet, and presents it to the user in the same colorful way as
the above-mentioned view.cgi.
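A browser that talks to a proxy typically sends the full URL on the
request line, so the first thing such a tool has to do is split that
line apart. The following is a minimal sketch under that assumption;
RequestLine and parseRequestLine are hypothetical names and do not
come from proxy.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three parts of a request line. */
typedef struct RequestLine
{
	char method[16];
	char target[1024];
	char version[16];
} RequestLine;

/*
 * Hypothetical helper (not part of Web Sniffer): split a line such as
 * "GET http://host/path HTTP/1.0" into its three fields.
 * Returns 0 on success, -1 if the line does not look like a request.
 */
int
parseRequestLine(const char *line, RequestLine *req)
{
	if (sscanf(line, "%15s %1023s %15s", req->method, req->target,
		req->version) != 3)
	{
		return -1;
	}
	if (strncmp(req->version, "HTTP/", 5) != 0)
	{
		return -1;
	}
	return 0;
}
```

The target field can then be handed to a URL parser to decide where
the request should be forwarded.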
grab
Allows the user to "grab" a whole Web site, or everything under a
particular directory. This is useful if you want to grab a bunch of
related HTML files, e.g. the whole CSS2 spec.
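The core of grabbing a directory is mapping each URL to a local file
name. The sketch below shows one way to do that, loosely following
what grab.c does: strip the base URL, drop any #fragment, and use
index.html for directory URLs. The name urlToLocalPath is hypothetical,
and unlike grab.c this version removes the fragment before checking for
a trailing slash.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Web Sniffer): compute the local
 * relative path for url, which must start with base.
 * Returns 0 on success, -1 if url is outside base or out is too small.
 */
int
urlToLocalPath(const char *base, const char *url, char *out,
	size_t outSize)
{
	const char *add;
	size_t baseLen = strlen(base);
	const char *rest;
	size_t restLen;
	const char *sharp;

	if (strncmp(url, base, baseLen) != 0)
	{
		return -1;
	}
	rest = url + baseLen;
	sharp = strchr(rest, '#');
	restLen = sharp ? (size_t) (sharp - rest) : strlen(rest);
	if ((restLen == 0) || (rest[restLen - 1] == '/'))
	{
		add = "index.html";
	}
	else
	{
		add = "";
	}
	if (restLen + strlen(add) + 1 > outSize)
	{
		return -1;
	}
	memcpy(out, rest, restLen);
	strcpy(out + restLen, add);
	return 0;
}
```

Any intermediate directories in the resulting path still have to be
created (for example with mkdir) before the file can be opened for
writing.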
link
Allows the user to recursively check for bad links in a Web site or
under a particular directory.
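Whether a link is "bad" can be decided from the first line of the HTTP
response. The sketch below is one possible rule, not necessarily the
one link.c uses: parseStatusLine and linkIsBad are hypothetical names,
and treating every 4xx and 5xx status as bad is an assumption.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of Web Sniffer): extract the status
 * code from a line such as "HTTP/1.0 404 Not Found".
 * Returns the code (100..599), or -1 if the line is malformed.
 */
int
parseStatusLine(const char *line)
{
	int major;
	int minor;
	int status;

	if (sscanf(line, "HTTP/%d.%d %d", &major, &minor, &status) != 3)
	{
		return -1;
	}
	if ((status < 100) || (status > 599))
	{
		return -1;
	}
	return status;
}

/* Assumed rule: malformed replies and 4xx/5xx codes count as bad. */
int
linkIsBad(int status)
{
	return (status < 0) || (status >= 400);
}
```

Redirects (3xx) are not reported as bad here; a checker would normally
follow them and judge the final response instead.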
Description of Files
addurl.c, addurl.h: adds URLs to a list
cgiview.c, cgiview.html: the view.cgi tool
dns.c: experimental DNS toy
doRun: used with robot
file.c, file.h: the file: URL
ftp.c: experimental FTP toy
grab.c: the "grab" tool
hash.c, hash.h: incomplete hash table routines
html.c, html.h: HTML parser
http.c, http.h: simple HTTP implementation
io.c, io.h: I/O routines
link.c: the "link" tool
main.h: very simple callbacks, could be more object-oriented
Makefile: the Solaris Makefile
mime.c, mime.h: MIME Content-Type parser
mutex.h: for threading in the robot
net.c, net.h: low-level Internet APIs
pop.c: experimental POP toy
proxy.c: the "proxy" tool
robot.c: the "robot" tool
run: used with robot
TODO: notes to myself
url.c, url.h: implementation of absolute and relative URLs
utils.c, utils.h: some little utility routines
view.c, view.h: presents stuff to the user
Description of Code
The code is extremely quick-and-dirty. It could be a lot more elegant,
e.g. C++, object-oriented, extensible, etc.
The point of this exercise was not to design and write a program well,
but to create some useful tools and to learn about Internet protocols.

@ -0,0 +1,27 @@
check HTTP error codes on 1st line
deal with content type "text/html "
take stats on domain names e.g. foo.co.kr, www.bar.com
URL char stats e.g. 8-bit, escaped 8-bit, etc
hierarchical tag and attribute stats, not flat attr space
more checking in ISO 2022 code
detect UCS-2, UCS-4
deal with multiple charset parameters in one content-type
FRAME SRC URLs
IMG SRC URLs
other URLs?
NNTP robot
FTP robot
DNS robot
IP robot
parse URLs properly a la RFC
improve hashing (grow tables, prime numbers)
parse <!doctype ...> where "..." appears as attribute-name-like thing
run purify to find memory leaks
use less memory in URL hash table (value not needed, only key needed)
use less memory in URL list (use array, remove processed URLs, randomize?)
get http://www.olelo.hawaii.edu/UTF8/index.html to work
(problem in io.c's read whole stream routine)
---
2/17/99
use nm to find all system calls, and do proper error checking on all of them
e.g. write() to catch SIGPIPE-like stuff(?)
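For the "detect UCS-2, UCS-4" item, one common starting point is to
look for a byte order mark at the beginning of the document. This is
only a sketch: the names BomEncoding and detectBOM are hypothetical,
and documents without a byte order mark would need other heuristics.

```c
#include <stddef.h>

typedef enum BomEncoding
{
	ENC_UNKNOWN,
	ENC_UCS2_BE,
	ENC_UCS2_LE,
	ENC_UCS4_BE,
	ENC_UCS4_LE
} BomEncoding;

/*
 * Hypothetical helper (not part of Web Sniffer): guess the encoding
 * from a leading byte order mark. The 4-byte marks are tested first
 * because FF FE 00 00 also begins with the 2-byte little-endian mark;
 * that sequence is ambiguous, and this sketch assumes UCS-4.
 */
BomEncoding
detectBOM(const unsigned char *buf, size_t len)
{
	if (len >= 4)
	{
		if ((buf[0] == 0x00) && (buf[1] == 0x00) &&
			(buf[2] == 0xfe) && (buf[3] == 0xff))
		{
			return ENC_UCS4_BE;
		}
		if ((buf[0] == 0xff) && (buf[1] == 0xfe) &&
			(buf[2] == 0x00) && (buf[3] == 0x00))
		{
			return ENC_UCS4_LE;
		}
	}
	if (len >= 2)
	{
		if ((buf[0] == 0xfe) && (buf[1] == 0xff))
		{
			return ENC_UCS2_BE;
		}
		if ((buf[0] == 0xff) && (buf[1] == 0xfe))
		{
			return ENC_UCS2_LE;
		}
	}
	return ENC_UNKNOWN;
}
```

The check would run on the first few bytes of the body, before any
charset declared in the HTTP headers or the HTML itself is considered.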

@ -0,0 +1,252 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "addurl.h"
#include "hash.h"
#include "html.h"
#include "url.h"
#include "utils.h"
static AddURLFunc addURLFunc = NULL;
static char **limitDomains = NULL;
static char **limitURLs = NULL;
static HashTable *rejectedURLTable = NULL;
static HashTable *urlTable = NULL;
static void
addThisURL(void *a, unsigned char *str)
{
int addIt;
/*
HashEntry *anchorEntry;
*/
unsigned char *fragless;
int i;
char **limit;
HashEntry *urlEntry;
unsigned char *sharp;
URL *url;
if (!urlTable)
{
return;
}
url = urlParse(str);
if (!url)
{
return;
}
addIt = 0;
if (limitURLs)
{
if (limitURLs[0])
{
limit = limitURLs;
while (*limit)
{
if (!strncmp(*limit, (char *) url->url,
strlen(*limit)))
{
addIt = 1;
break;
}
limit++;
}
}
}
else
{
if (url->host)
{
/* limitDomains may be NULL if the caller passed no domain list */
if (limitDomains && limitDomains[0])
{
limit = limitDomains;
while (*limit)
{
i = strlen((char *) url->host) -
strlen(*limit);
if (i >= 0)
{
if (!strcmp(*limit,
(char *) &url->host[i]))
{
addIt = 1;
break;
}
}
limit++;
}
}
else
{
addIt = 1;
}
}
}
if (addIt)
{
fragless = copyString(url->url);
sharp = (unsigned char *) strchr((char *) fragless, '#');
if (sharp)
{
*sharp = 0;
}
urlEntry = hashLookup(urlTable, fragless);
if (urlEntry)
{
/*
if (url->fragment)
{
anchorEntry = hashLookup(urlEntry->value,
url->fragment + 1);
}
*/
urlFree(url);
free(fragless);
}
else
{
/*
printf("%s\n", fragless);
*/
hashAdd(urlTable, fragless, NULL);
(*addURLFunc)(a, url);
}
}
else
{
urlEntry = hashLookup(rejectedURLTable, url->url);
if (!urlEntry)
{
hashAdd(rejectedURLTable, copyString(url->url), NULL);
/* XXX
printf("rejected %s\n", url->url);
*/
}
urlFree(url);
}
}
void
addURL(void *a, unsigned char *str)
{
int len;
unsigned char *s;
unsigned char *slash;
unsigned char *u;
URL *url;
addThisURL(a, str);
url = urlParse(str);
if (!url)
{
return;
}
if ((!url->net_loc) || (!url->path))
{
urlFree(url);
return;
}
s = copyString(url->path);
len = strlen((char *) s);
if
(
(len > 0) &&
(
(s[len - 1] != '/') ||
(len > 1)
)
)
{
if (s[len - 1] == '/')
{
s[len - 1] = 0;
}
len = strlen((char *) url->scheme) + 3 +
strlen((char *) url->net_loc);
u = calloc(len + strlen((char *) url->path) + 1, 1);
if (!u)
{
fprintf(stderr, "cannot calloc url\n");
exit(1);
}
strcpy((char *) u, (char *) url->scheme);
strcat((char *) u, "://");
strcat((char *) u, (char *) url->net_loc);
while (1)
{
slash = (unsigned char *) strrchr((char *) s, '/');
if (slash)
{
slash[1] = 0;
u[len] = 0;
strcat((char *) u, (char *) s);
addThisURL(a, u);
slash[0] = 0;
}
else
{
break;
}
}
free(u);
}
free(s);
urlFree(url);
}
static void
urlHandler(void *a, HTML *html)
{
URL *url;
url = urlRelative(html->base, html->currentAttribute->value);
if (url)
{
/*
printf("--------------------------------\n");
printf("%s +\n", html->base);
printf("%s =\n", html->currentAttribute->value);
printf("%s\n", url->url);
printf("--------------------------------\n");
*/
addURL(a, url->url);
urlFree(url);
}
}
void
addURLInit(AddURLFunc func, char **URLs, char **domains)
{
addURLFunc = func;
limitURLs = URLs;
limitDomains = domains;
rejectedURLTable = hashAlloc(NULL);
urlTable = hashAlloc(NULL);
htmlRegisterURLHandler(urlHandler);
}

@ -0,0 +1,32 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _ADDURL_H_
#define _ADDURL_H_
#include "url.h"
typedef void (*AddURLFunc)(void *a, URL *url);
void addURL(void *a, unsigned char *str);
void addURLInit(AddURLFunc addURLFunc, char **limitURLs, char **limitDomains);
#endif /* _ADDURL_H_ */

@ -0,0 +1,458 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "html.h"
#include "http.h"
#include "io.h"
#include "main.h"
#include "mutex.h"
#include "url.h"
#include "utils.h"
#include "view.h"
mutex_t mainMutex;
static char *me = NULL;
static char *passThese[] =
{
"HTTP_USER_AGENT=",
"HTTP_ACCEPT=",
"HTTP_ACCEPT_CHARSET=",
"HTTP_ACCEPT_LANGUAGE=",
NULL
};
void
reportHTTPCharSet(void *a, unsigned char *charset)
{
}
void
reportContentType(void *a, unsigned char *contentType)
{
}
void
reportHTML(void *a, Input *input)
{
View *view;
view = a;
viewHTML(view, input);
}
void
reportHTMLAttributeName(void *a, HTML *html, Input *input)
{
View *view;
view = a;
viewHTMLAttributeName(view, input);
}
void
reportHTMLAttributeValue(void *a, HTML *html, Input *input)
{
URL *url;
View *view;
view = a;
if (html->currentAttributeIsURL)
{
url = urlRelative(html->base, html->currentAttribute->value);
fprintf(view->out, "<a href=%s%s>", me,
url ? (char*) url->url : "");
if (url)
{
urlFree(url);
}
}
viewHTMLAttributeValue(view, input);
if (html->currentAttributeIsURL)
{
fprintf(view->out, "</a>");
}
}
void
reportHTMLTag(void *a, HTML *html, Input *input)
{
View *view;
view = a;
viewHTMLTag(view, input);
}
void
reportHTMLText(void *a, Input *input)
{
View *view;
view = a;
viewHTMLText(view, input);
}
void
reportHTTP(void *a, Input *input)
{
View *view;
view = a;
viewHTTP(view, input);
}
void
reportHTTPBody(void *a, Input *input)
{
View *view;
view = a;
viewHTTP(view, input);
}
void
reportHTTPHeaderName(void *a, Input *input)
{
View *view;
view = a;
viewHTTPHeaderName(view, input);
}
void
reportHTTPHeaderValue(void *a, Input *input, unsigned char *url)
{
View *view;
view = a;
if (url)
{
fprintf(view->out, "<a href=%s%s>", me, url);
}
viewHTTPHeaderValue(view, input);
if (url)
{
fprintf(view->out, "</a>");
}
}
void
reportStatus(void *a, char *message, char *file, int line)
{
}
void
reportTime(int task, struct timeval *theTime)
{
}
unsigned char **
getHTTPRequestHeaders(View *view, char *host, char *verbose)
{
char **e;
extern char **environ;
int firstLetter;
char **h;
char *p;
char *q;
char **r;
char **ret;
char *scriptName;
char *serverURL;
char *str;
serverURL = "SERVER_URL=";
scriptName = "SCRIPT_NAME=";
e = environ;
while (*e)
{
if (!strncmp(*e, serverURL, strlen(serverURL)))
{
if (strchr(*e, '='))
{
serverURL = strchr(*e, '=') + 1;
}
}
else if (!strncmp(*e, scriptName, strlen(scriptName)))
{
if (strchr(*e, '='))
{
scriptName = strchr(*e, '=') + 1;
}
}
e++;
}
/* room for every environment entry, the Host header and the final NULL */
ret = malloc((e - environ + 2) * sizeof(*e));
if (!ret)
{
return NULL;
}
me = malloc(strlen(serverURL) + strlen(scriptName) +
strlen(verbose) + 1);
if (!me)
{
free(ret);
return NULL;
}
strcpy(me, serverURL);
strcat(me, scriptName);
strcat(me, verbose);
e = environ;
r = ret;
viewReport(view, "will send these HTTP Request headers:");
while (*e)
{
h = passThese;
while (*h)
{
if (!strncmp(*e, *h, strlen(*h)))
{
break;
}
h++;
}
if (*h)
{
str = malloc(strlen(*e) - 5 + 1 + 1);
if (!str)
{
/* advance first, otherwise this loop would never terminate */
e++;
continue;
}
p = *e + 5;
q = str;
while (*p && (*p != '='))
{
firstLetter = 1;
while (*p && (*p != '=') && (*p != '_'))
{
if (firstLetter)
{
*q++ = *p++;
firstLetter = 0;
}
else
{
*q++ = tolower((unsigned char) *p);
p++;
}
}
if (*p == '_')
{
*q++ = '-';
p++;
}
}
if (*p == '=')
{
p++;
*q++ = ':';
*q++ = ' ';
while (*p)
{
*q++ = *p++;
}
*q = 0;
*r++ = str;
viewReport(view, str);
}
}
e++;
}
str = malloc(6 + strlen(host) + 1);
if (str)
{
strcpy(str, "Host: ");
strcat(str, host);
*r++ = str;
viewReport(view, str);
}
viewReport(view, "<hr>");
*r = NULL;
return (unsigned char **) ret;
}
int
main(int argc, char *argv[])
{
char *ampersand;
unsigned char *equals;
char *name;
unsigned char *newURL;
char *p;
char *query;
URL *u;
unsigned char *url;
char *verbose;
View *view;
MUTEX_INIT();
url = NULL;
verbose = "?url=";
query = getenv("QUERY_STRING");
view = viewAlloc();
view->out = stdout;
freopen("/dev/null", "w", stderr);
fprintf(view->out, "Content-Type: text/html\n");
fprintf(view->out, "\n");
if (query)
{
p = query;
do
{
name = p;
ampersand = strchr(p, '&');
if (ampersand)
{
*ampersand = 0;
p = ampersand + 1;
}
equals = (unsigned char *) strchr(name, '=');
if (equals)
{
*equals = 0;
if (!strcmp(name, "url"))
{
url = equals + 1;
urlDecode(url);
}
else if (!strcmp(name, "verbose"))
{
verbose = "?verbose=on&url=";
viewVerbose();
}
}
} while (ampersand);
}
else if (argc > 1)
{
url = (unsigned char *) argv[1];
}
else
{
fprintf(view->out, "no environment variable QUERY_STRING<br>\n");
fprintf(view->out, "and no arg passed<br>\n");
return 1;
}
if (url && (*url))
{
fprintf
(
view->out,
"<html><head><title>View %s</title></head>"
"<body><tt><b>\n",
url
);
viewReport(view, "input url:");
viewReport(view, (char *) url);
viewReport(view, "<hr>");
u = urlParse(url);
if
(
((!u->scheme)||(!strcmp((char *) u->scheme, "http"))) &&
(!u->host) &&
(*url != '/')
)
{
newURL = calloc(strlen((char *) url) + 3, 1);
if (!newURL)
{
viewReport(view, "calloc failed");
return 1;
}
strcpy((char *) newURL, "//");
strcat((char *) newURL, (char *) url);
}
else
{
newURL = copyString(url);
}
urlFree(u);
u = urlParse(newURL);
if
(
(
(!u->scheme) ||
(!strcmp((char *) u->scheme, "http"))
) &&
(!*u->path)
)
{
url = newURL;
newURL = calloc(strlen((char *) url) + 2, 1);
if (!newURL)
{
viewReport(view, "calloc failed");
return 1;
}
strcpy((char *) newURL, (char *) url);
free(url);
strcat((char *) newURL, "/");
}
urlFree(u);
u = urlRelative(
(unsigned char *) "http://www.mozilla.org/index.html",
newURL);
free(newURL);
viewReport(view, "fully qualified url:");
viewReport(view, (char *) u->url);
viewReport(view, "<hr>");
fflush(view->out);
if (!strcmp((char *) u->scheme, "http"))
{
httpProcess(view, u,
getHTTPRequestHeaders(view, (char *) u->host,
verbose));
}
else
{
fprintf
(
view->out,
"Sorry, %s URLs are not supported yet. "
"Only http URLs are supported.",
u->scheme
);
}
fprintf(view->out, "</b></tt></body></html>\n");
}
else
{
viewReport(view, "no URL or empty URL specified");
}
return 0;
}

@ -0,0 +1,25 @@
<html>
<head>
<title>View HTTP and HTML Source</title>
</head>
<body>
<h2>View HTTP and HTML Source</h2>
<form method="get" action="view.cgi">
Enter the URL of the document you'd like to examine:<br>
<input type=text name=url size=60>
<p>
<input type=submit value=Submit>&nbsp;
<input type=checkbox name=verbose>Verbose Mode (to watch connection)
</form>
Examples:
<pre>
somehost
http://www.mozilla.org
http://www.mozilla.org/index.html
</pre>
</body>
</HTML>

@ -0,0 +1,210 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdio.h>
#include <unistd.h>
#include "main.h"
#include "net.h"
#define server "host.domain.com"
#define QR 0 /* query */
#define OPCODE 0 /* standard query */
#define AA 0 /* authoritative answer */
#define TC 0 /* truncation */
#define RD 1 /* recursion desired */
#define RA 0 /* recursion available */
#define Z 0 /* reserved */
#define RCODE 0 /* response code */
static unsigned short ID = 0xbeef;
mutex_t mainMutex;
void
reportContentType(void *a, char *contentType)
{
}
void
reportHTML(void *a, Input *input)
{
}
void
reportHTMLAttributeName(void *a, HTML *html, Input *input)
{
}
void
reportHTMLAttributeValue(void *a, HTML *html, Input *input)
{
}
void
reportHTMLTag(void *a, HTML *html, Input *input)
{
}
void
reportHTMLText(void *a, Input *input)
{
}
void
reportHTTP(void *a, Input *input)
{
}
void
reportHTTPBody(void *a, Input *input)
{
}
void
reportHTTPCharSet(void *a, char *charset)
{
}
void
reportHTTPHeaderName(void *a, Input *input)
{
}
void
reportHTTPHeaderValue(void *a, Input *input)
{
}
static char *
putDomainName(char *p, char *name)
{
char *begin;
char *q;
q = name;
while (*q)
{
begin = q;
while (*q && (*q != '.'))
{
q++;
}
if (q - begin > 0)
{
*p++ = q - begin;
q = begin;
while (*q && (*q != '.'))
{
*p++ = *q++;
}
if (*q == '.')
{
q++;
}
}
}
*p++ = 0;
return p;
}
int
main(int argc, char *argv[])
{
unsigned char buf[1024];
int bytesTransferred;
int c;
int fd;
int i;
int len;
unsigned char *p;
fd = netConnect(NULL, server, 53);
if (fd < 0)
{
fprintf(stderr, "netConnect failed\n");
return 1;
}
p = buf + 2;
p[0] = (ID >> 8);
p[1] = (ID & 0xff);
p[2] = (QR | OPCODE | AA | TC | RD);
p[3] = (RA | Z | RCODE);
p[4] = 0; /* QDCOUNT */
p[5] = 1; /* QDCOUNT */
p[6] = 0; /* ANCOUNT */
p[7] = 0; /* ANCOUNT */
p[8] = 0; /* NSCOUNT */
p[9] = 0; /* NSCOUNT */
p[10] = 0; /* ARCOUNT */
p[11] = 0; /* ARCOUNT */
p = putDomainName(&p[12], "w3.org");
/* QTYPE: 15 = MX (mail exchange); use 1 for A (host address) */
p[0] = 0;
p[1] = 15;
/* QCLASS: 1 = IN (Internet) */
p[2] = 0;
p[3] = 1;
len = (p + 4) - buf;
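/* over TCP, a DNS message is preceded by a two-byte length field */
buf[0] = ((len - 2) >> 8) & 0xff;
buf[1] = (len - 2) & 0xff;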
bytesTransferred = write(fd, buf, len);
if (bytesTransferred != len)
{
fprintf(stderr, "wrong number of bytes written\n");
return 1;
}
bytesTransferred = read(fd, buf, sizeof(buf));
if (bytesTransferred < 0)
{
perror("read");
return 1;
}
for (i = 0; i < bytesTransferred; i++)
{
c = buf[i];
if ((0x20 <= c) && (c <= 0x7e))
{
printf("%02d: 0x%02x = '%c'\n", i, c, c);
}
else
{
printf("%02d: 0x%02x\n", i, c);
}
}
printf("%d bytes read\n", bytesTransferred);
printf("ID 0x%04x\n", (buf[2] << 8) | buf[3]);
printf("QR %d\n", buf[4] >> 7);
printf("RCODE %d\n", buf[5] & 0xf);
printf("QDCOUNT %d\n", (buf[ 6] << 8) | buf[ 7]);
printf("ANCOUNT %d\n", (buf[ 8] << 8) | buf[ 9]);
printf("NSCOUNT %d\n", (buf[10] << 8) | buf[11]);
printf("ARCOUNT %d\n", (buf[12] << 8) | buf[13]);
return 0;
}

@ -0,0 +1,16 @@
:
stdout=zzz.out.$$
stderr=zzz.err.$$
start=$1
out=$2
shift
shift
echo $start > $stdout
echo $start > $stderr
echo about to invoke ./robot -s $start -o $out $*
./robot -s $start -o $out $* >> $stdout 2>> $stderr
echo robot returned $?
if test -f core
then
mv core core.$$
fi

@ -0,0 +1,65 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include "file.h"
#include "html.h"
#include "io.h"
#include "url.h"
void
fileProcess(void *a, URL *url)
{
char *dot;
FILE *file;
Input *input;
/* XXX temporary? */
if (!url->file)
{
return;
}
dot = strrchr((char *) url->file, '.');
if (dot)
{
if (strcasecmp(dot, ".html") && strcasecmp(dot, ".htm"))
{
return;
}
}
else
{
return;
}
file = fopen((char *) url->path, "r");
if (!file)
{
fprintf(stderr, "cannot open file %s\n", url->path);
return;
}
input = readStream(fileno(file), url->url);
htmlRead(a, input, url->url);
inputFree(input);
fclose(file);
}

@ -0,0 +1,31 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _FILE_H_
#define _FILE_H_
#include <stdio.h>
#include "url.h"
void fileProcess(void *a, URL *url);
#endif /* _FILE_H_ */

@ -0,0 +1,227 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "main.h"
#include "net.h"
#include "url.h"
mutex_t mainMutex;
static int
readReply(int fd, char *buf, int size)
{
int bytesRead;
bytesRead = read(fd, buf, size - 1);
if (bytesRead < 0)
{
buf[0] = 0;
}
else
{
buf[bytesRead] = 0;
}
if (bytesRead < 3)
{
fprintf(stderr, "bytesRead %d at line %d\n", bytesRead,
__LINE__);
return -1;
}
return ((buf[0] - '0') * 100) + ((buf[1] - '0') * 10) + (buf[2] - '0');
}
static int
writeRequest(int fd, char *command, char *argument)
{
char buf[1024];
int bytesWritten;
int len;
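/* build the whole request in one bounded step to avoid overflowing buf */
len = snprintf(buf, sizeof(buf), "%s%s\r\n", command,
argument ? argument : "");
if ((len < 0) || (len >= (int) sizeof(buf)))
{
fprintf(stderr, "request too long at line %d\n", __LINE__);
return 1;
}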
bytesWritten = write(fd, buf, len);
if (bytesWritten != len)
{
fprintf(stderr, "bytesWritten at line %d\n", __LINE__);
return 1;
}
return 0;
}
void
ftpProcess(void *a, URL *url)
{
char buf[4096];
int fd;
int port;
int reply;
int ret;
if (url->port == -1)
{
port = 21;
}
else
{
port = url->port;
}
fd = netConnect(a, url->host, port);
if (fd < 0)
{
fprintf(stderr, "netConnect failed\n");
return;
}
reply = readReply(fd, buf, sizeof(buf));
if (reply != 220)
{
fprintf(stderr, "reply %d at line %d\n", reply, __LINE__);
return;
}
ret = writeRequest(fd, "USER ", "anonymous");
if (ret)
{
return;
}
reply = readReply(fd, buf, sizeof(buf));
if (reply != 331)
{
fprintf(stderr, "reply %d buf %s", reply, buf);
return;
}
ret = writeRequest(fd, "PASS ", "foo@bar.com");
if (ret)
{
return;
}
reply = readReply(fd, buf, sizeof(buf));
if (reply != 230)
{
fprintf(stderr, "reply %d buf %s", reply, buf);
return;
}
ret = writeRequest(fd, "TYPE ", "I");
if (ret)
{
return;
}
reply = readReply(fd, buf, sizeof(buf));
if (reply == 230)
{
reply = readReply(fd, buf, sizeof(buf));
}
if (reply != 200)
{
fprintf(stderr, "reply %d buf %s", reply, buf);
return;
}
ret = writeRequest(fd, "PASV", NULL);
if (ret)
{
return;
}
reply = readReply(fd, buf, sizeof(buf));
printf("buf %s", buf);
}
void
reportContentType(void *a, char *contentType)
{
}
void
reportHTML(void *a, Input *input)
{
}
void
reportHTMLAttributeName(void *a, HTML *html, Input *input)
{
}
void
reportHTMLAttributeValue(void *a, HTML *html, Input *input)
{
}
void
reportHTMLTag(void *a, HTML *html, Input *input)
{
}
void
reportHTMLText(void *a, Input *input)
{
}
void
reportHTTP(void *a, Input *input)
{
}
void
reportHTTPBody(void *a, Input *input)
{
}
void
reportHTTPCharSet(void *a, char *charset)
{
}
void
reportHTTPHeaderName(void *a, Input *input)
{
}
void
reportHTTPHeaderValue(void *a, Input *input)
{
}
int
main(int argc, char *argv[])
{
char *str;
URL *url;
str = "ftp://ftp.somedomain.com/somedir/somefile";
url = urlParse((unsigned char *) str);
if (!url)
{
fprintf(stderr, "urlParse failed\n");
return 1;
}
ftpProcess(NULL, url);
return 0;
}

@ -0,0 +1,267 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "addurl.h"
#include "hash.h"
#include "html.h"
#include "http.h"
#include "main.h"
#include "mutex.h"
#include "url.h"
#include "utils.h"
typedef struct Arg
{
URL *url;
} Arg;
mutex_t mainMutex;
static char *limitURLs[] =
{
"http://www.w3.org/TR/REC-CSS2/",
NULL
};
static URL *lastURL = NULL;
static URL *urls = NULL;
void
reportContentType(void *a, unsigned char *contentType)
{
}
void
reportHTML(void *a, Input *input)
{
}
void
reportHTMLAttributeName(void *a, HTML *html, Input *input)
{
}
void
reportHTMLAttributeValue(void *a, HTML *html, Input *input)
{
}
void
reportHTMLTag(void *a, HTML *html, Input *input)
{
}
void
reportHTMLText(void *a, Input *input)
{
}
void
reportHTTP(void *a, Input *input)
{
}
void
reportHTTPBody(void *a, Input *input)
{
}
void
reportHTTPCharSet(void *a, unsigned char *charset)
{
}
void
reportHTTPHeaderName(void *a, Input *input)
{
}
void
reportHTTPHeaderValue(void *a, Input *input)
{
}
void
reportStatus(void *a, char *message, char *file, int line)
{
}
void
reportTime(int task, struct timeval *theTime)
{
}
static void
addURLFunc(void *a, URL *url)
{
lastURL->next = url;
lastURL = url;
}
static void
grab(unsigned char *url, HTTP *http)
{
char *add;
int baseLen;
FILE *file;
char *p;
char *slash;
char *str;
baseLen = strlen(limitURLs[0]);
if (strncmp((char *) url, limitURLs[0], baseLen))
{
fprintf(stderr, "no match: %s vs %s\n", url, limitURLs[0]);
return;
}
if (url[strlen((char *) url) - 1] == '/')
{
add = "index.html";
}
else
{
add = "";
}
str = calloc(strlen((char *) url + baseLen) + strlen(add) + 1, 1);
if (!str)
{
fprintf(stderr, "cannot calloc string\n");
exit(1);
}
strcpy(str, (char *) url + baseLen);
p = strchr(str, '#');
if (p)
{
*p = 0;
}
strcat(str, add);
p = str;
while (1)
{
slash = strchr(p, '/');
if (!slash)
{
break;
}
*slash = 0;
if (mkdir(str, 0777))
{
if (errno != EEXIST)
{
perror("mkdir");
}
}
*slash = '/';
p = slash + 1;
}
file = fopen(str, "w");
if (!file)
{
fprintf(stderr, "cannot open file %s for writing\n", str);
exit(1);
}
if (fwrite(http->body, 1, http->bodyLen, file) != http->bodyLen)
{
fprintf(stderr, "did not write %ld bytes\n", http->bodyLen);
exit(1);
}
fclose(file);
free(str);
}
int
main(int argc, char *argv[])
{
Arg arg;
HTTP *http;
char *prog;
URL *url;
MUTEX_INIT();
prog = strrchr(argv[0], '/');
if (prog)
{
prog++;
}
else
{
prog = argv[0];
}
switch (argc)
{
case 1:
break;
case 2:
limitURLs[0] = argv[1];
break;
default:
fprintf(stderr, "usage: %s [ http://www.foo.com/bar/ ]\n",
prog);
return 1;
}
addURLInit(addURLFunc, limitURLs, NULL);
url = urlParse((unsigned char *) limitURLs[0]);
urls = url;
lastURL = url;
while (url)
{
arg.url = url;
http = httpProcess(&arg, url, NULL);
if (http)
{
switch (http->status)
{
case 200: /* OK */
grab(url->url, http);
break;
case 302: /* Moved Temporarily */
break;
case 403: /* Forbidden */
break;
case 404: /* Not Found */
break;
default:
printf("status %d\n", http->status);
break;
}
httpFree(http);
}
else
{
printf("httpProcess failed: %s\n", url->url);
}
url = url->next;
}
return 0;
}

@ -0,0 +1,186 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hash.h"
#include "mutex.h"
HashTable *
hashAlloc(void (*func)(unsigned char *key, void *value))
{
HashTable *result;
result = calloc(sizeof(HashTable), 1);
if (!result)
{
fprintf(stderr, "cannot calloc hash table\n");
exit(1);
}
result->size = 130363;
result->buckets = calloc(result->size, sizeof(HashEntry *));
if (!result->buckets)
{
fprintf(stderr, "cannot calloc buckets\n");
free(result);
exit(1);
}
result->free = func;
return result;
}
void
hashFree(HashTable *table)
{
HashEntry *entry;
int i;
HashEntry *next;
for (i = 0; i < table->size; i++)
{
entry = table->buckets[i];
while (entry)
{
if (table->free)
{
(*table->free)(entry->key, entry->value);
}
next = entry->next;
free(entry);
entry = next;
}
}
free(table->buckets);
free(table);
}
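/*
 * PJW-style string hash (the same scheme as the ELF symbol hash):
 * shift each byte in, then fold bits 28..31 back into the low bits.
 * The result is reduced modulo the bucket count.
 */
static unsigned long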
hashValue(HashTable *table, unsigned char *key)
{
unsigned long g;
unsigned long h;
unsigned char *x;
x = (unsigned char *) key;
h = 0;
while (*x)
{
h = (h << 4) + *x++;
if ((g = h & 0xf0000000) != 0)
h = (h ^ (g >> 24)) ^ g;
}
return h % table->size;
}
HashEntry *
hashLookup(HashTable *table, unsigned char *key)
{
HashEntry *entry;
MUTEX_LOCK();
entry = table->buckets[hashValue(table, key)];
while (entry)
{
if (!strcmp((char *) entry->key, (char *) key))
{
break;
}
entry = entry->next;
}
MUTEX_UNLOCK();
return entry;
}
HashEntry *
hashAdd(HashTable *table, unsigned char *key, void *value)
{
HashEntry *entry;
unsigned long i;
MUTEX_LOCK();
entry = calloc(sizeof(HashEntry), 1);
if (!entry)
{
fprintf(stderr, "cannot calloc hash entry\n");
exit(1);
}
entry->key = key;
entry->value = value;
i = hashValue(table, key);
entry->next = table->buckets[i];
table->buckets[i] = entry;
table->count++;
MUTEX_UNLOCK();
return entry;
}
static int
compareEntries(const void *entry1, const void *entry2)
{
return strcmp
(
(char *) (*((HashEntry **) entry1))->key,
(char *) (*((HashEntry **) entry2))->key
);
}
void
hashEnumerate(HashTable *table, void (*func)(HashEntry *))
{
HashEntry **array;
HashEntry *entry;
int i;
int j;
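/* calloc(0, ...) may legally return NULL, so handle the empty table first */
if (table->count == 0)
{
return;
}
array = calloc(table->count, sizeof(HashEntry *));
if (!array)
{
fprintf(stderr, "cannot calloc sorting array\n");
exit(1);
}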
j = 0;
for (i = 0; i < table->size; i++)
{
entry = table->buckets[i];
while (entry)
{
array[j++] = entry;
entry = entry->next;
}
}
qsort(array, table->count, sizeof(HashEntry *), compareEntries);
for (j = 0; j < table->count; j++)
{
(*func)(array[j]);
}
free(array);
}
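
The string hash used by `hashValue()` can be tried in isolation. The following is a minimal sketch under stated assumptions: `pjwHash` is a hypothetical name, and the bucket count is passed as a parameter instead of being read from a `HashTable`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical standalone version of the PJW-style hash: returns a
 * bucket index in the range 0 .. buckets-1 for a NUL-terminated key.
 */
unsigned long
pjwHash(const unsigned char *key, unsigned long buckets)
{
    unsigned long g;
    unsigned long h = 0;

    assert(key != NULL && buckets > 0);
    while (*key)
    {
        h = (h << 4) + *key++;      /* shift in the next byte */
        g = h & 0xf0000000UL;       /* bits 28..31, if any are set */
        if (g != 0)
        {
            h ^= g >> 24;           /* fold them into the low bits */
            h ^= g;                 /* and clear them */
        }
    }
    return h % buckets;
}
```

For short keys the top bits never fill up, so the value is simply the bytes shifted together: `"ab"` hashes to `97 * 16 + 98 = 1650` before the modulo.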

@ -0,0 +1,46 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _HASH_H_
#define _HASH_H_
typedef struct HashEntry
{
unsigned char *key;
void *value;
struct HashEntry *next;
} HashEntry;
typedef struct HashTable
{
HashEntry **buckets;
int count;
void (*free)(unsigned char *key, void *value);
int size;
} HashTable;
HashEntry *hashAdd(HashTable *table, unsigned char *key, void *value);
HashTable *hashAlloc(void (*func)(unsigned char *key, void *value));
void hashEnumerate(HashTable *table, void (*func)(HashEntry *));
void hashFree(HashTable *table);
HashEntry *hashLookup(HashTable *table, unsigned char *key);
#endif /* _HASH_H_ */
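
The structures declared in this header describe a chained hash table: each bucket is a singly linked list, and new entries are pushed onto the head of their bucket. Below is a minimal self-contained sketch of that layout. All `Demo*` and `demo*` names, the bucket count of 17, and the multiply-by-31 hash are assumptions made for illustration; they are not part of the header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUCKETS 17 /* arbitrary small bucket count for the sketch */

typedef struct DemoEntry
{
    const char *key;
    void *value;
    struct DemoEntry *next;
} DemoEntry;

typedef struct DemoTable
{
    DemoEntry *buckets[DEMO_BUCKETS];
    int count;
} DemoTable;

/* Simple illustrative hash; the real code uses a PJW-style hash. */
static unsigned long
demoHash(const char *key)
{
    unsigned long h = 0;

    while (*key)
    {
        h = h * 31 + (unsigned char) *key++;
    }
    return h % DEMO_BUCKETS;
}

/* Walk the chain of the key's bucket; NULL if the key is absent. */
DemoEntry *
demoLookup(DemoTable *table, const char *key)
{
    DemoEntry *entry;

    assert(table != NULL && key != NULL);
    for (entry = table->buckets[demoHash(key)]; entry; entry = entry->next)
    {
        if (!strcmp(entry->key, key))
        {
            return entry;
        }
    }
    return NULL;
}

/* Push a new entry onto the head of its bucket; the key is not copied. */
DemoEntry *
demoAdd(DemoTable *table, const char *key, void *value)
{
    DemoEntry *entry;
    unsigned long i;

    assert(table != NULL && key != NULL);
    entry = calloc(1, sizeof(DemoEntry));
    if (!entry)
    {
        return NULL;
    }
    entry->key = key;
    entry->value = value;
    i = demoHash(key);
    entry->next = table->buckets[i];
    table->buckets[i] = entry;
    table->count++;
    return entry;
}
```

As with head insertion in general, adding a key twice does not replace the old entry; the newer entry simply shadows it on lookup, which is why callers usually look up before adding.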

@ -0,0 +1,907 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include "hash.h"
#include "html.h"
#include "http.h"
#include "io.h"
#include "main.h"
#include "url.h"
#include "utils.h"
#define IS_WHITE_SPACE(c) \
( \
((c) == ' ' ) || \
((c) == '\t') || \
((c) == '\r') || \
((c) == '\n') \
)
typedef struct HTMLState
{
unsigned short mask;
unsigned short saved;
unsigned short unGotten;
HTML *html;
} HTMLState;
static HashTable *tagTable = NULL;
static HTMLHandler tagHandler = NULL;
static char *urlAttributes[] =
{
"a", "href",
"applet", "codebase",
"area", "href",
"base", "href",
"blockquote", "cite",
"body", "background",
"del", "cite",
"form", "action",
"frame", "longdesc",
"frame", "src",
"head", "profile",
"iframe", "longdesc",
"iframe", "src",
"img", "longdesc",
"img", "src",
"img", "usemap",
"input", "src",
"input", "usemap",
"ins", "cite",
"link", "href",
"object", "archive",
"object", "classid",
"object", "codebase",
"object", "data",
"object", "usemap",
"q", "cite",
"script", "for",
"script", "src",
NULL
};
static int htmlInitialized = 0;
static HashTable *knownTagTable = NULL;
static char *knownTags[] =
{
"!doctype",
"a",
"address",
"applet",
"area",
"b",
"base",
"basefont",
"big",
"blink",
"blockquote",
"body",
"br",
"caption",
"cell",
"center",
"certificate",
"charles",
"cite",
"code",
"colormap",
"dd",
"dir",
"div",
"dl",
"dt",
"em",
"embed",
"font",
"form",
"frame",
"frameset",
"h1",
"h2",
"h3",
"h4",
"h5",
"h6",
"head",
"hr",
"html",
"hype",
"i",
"ilayer",
"image",
"img",
"inlineinput",
"input",
"isindex",
"jean",
"kbd",
"keygen",
"layer",
"li",
"link",
"listing",
"map",
"media",
"menu",
"meta",
"mquote",
"multicol",
"nobr",
"noembed",
"noframes",
"nolayer",
"noscript",
"nscp_close",
"nscp_open",
"nscp_reblock",
"nsdt",
"object",
"ol",
"option",
"p",
"param",
"plaintext",
"pre",
"s",
"samp",
"script",
"select",
"server",
"small",
"spacer",
"span",
"spell",
"strike",
"strong",
"style",
"sub",
"subdoc",
"sup",
"table",
"td",
"textarea",
"th",
"title",
"tr",
"tt",
"u",
"ul",
"var",
"wbr",
"xmp",
NULL
};
static void
diag(int line, HTMLState *state, unsigned short c)
{
fprintf(stderr, "%s(%d): 0x%02x(%c) tag %s attr %s\n", __FILE__, line,
c, c, state->html->tag ? (char *) state->html->tag : "NULL",
state->html->currentAttribute ?
(char *) state->html->currentAttribute->name : "NULL");
fprintf(stderr, "(%s)\n", state->html->url);
}
static void
htmlInit(void)
{
char **p;
knownTagTable = hashAlloc(NULL);
p = knownTags;
while (*p)
{
hashAdd(knownTagTable, copyString((unsigned char *) *p), NULL);
p++;
}
htmlInitialized = 1;
}
static void
htmlCheckForBaseURL(HTML* html)
{
if
(
(!strcmp((char *) html->tag, "base")) &&
(!strcmp((char *) html->currentAttribute->name, "href"))
)
{
FREE(html->base);
html->base = copyString(html->currentAttribute->value);
}
}
static void
htmlCheckForURLAttribute(HTML *html)
{
char **p;
html->currentAttributeIsURL = 0;
p = urlAttributes;
while (*p)
{
if
(
(!strcmp((char *) html->tag, p[0])) &&
(!strcmp((char *) html->currentAttribute->name, p[1]))
)
{
html->currentAttributeIsURL = 1;
break;
}
p += 2;
}
}
static void
htmlCheckAttribute(HTML *html)
{
htmlCheckForBaseURL(html);
htmlCheckForURLAttribute(html);
}
void
htmlRegister(char *tag, char *attributeName, HTMLHandler handler)
{
HashEntry *attrEntry;
HashEntry *tagEntry;
if (!tagTable)
{
tagTable = hashAlloc(NULL);
}
tagEntry = hashLookup(tagTable, (unsigned char *) tag);
if (!tagEntry)
{
tagEntry = hashAdd(tagTable, (unsigned char *) tag,
hashAlloc(NULL));
}
attrEntry = hashLookup(tagEntry->value,
(unsigned char *) attributeName);
if (attrEntry)
{
attrEntry->value = (void *) handler;
}
else
{
hashAdd(tagEntry->value, (unsigned char *) attributeName,
(void *) handler);
}
}
void
htmlRegisterURLHandler(HTMLHandler handler)
{
char **p;
p = urlAttributes;
while (*p)
{
htmlRegister(p[0], p[1], handler);
p += 2;
}
}
static void
callHandler(void *a, HTML *html)
{
HashEntry *attrEntry;
HashEntry *tagEntry;
if (!tagTable)
{
return;
}
tagEntry = hashLookup(tagTable, html->tag);
if (tagEntry)
{
attrEntry = hashLookup(tagEntry->value,
html->currentAttribute->name);
if (attrEntry)
{
(*((HTMLHandler) attrEntry->value))(a, html);
}
}
}
void
htmlRegisterTagHandler(HTMLHandler handler)
{
tagHandler = handler;
}
static unsigned short
htmlGetByte(Input *input, HTMLState *state)
{
unsigned short c;
unsigned short ret;
unsigned short tmp;
if (state->unGotten != 256)
{
tmp = state->unGotten;
state->unGotten = 256;
return tmp;
}
c = getByte(input);
if (c == 256)
{
ret = c;
}
else if (c == 0x1b)
{
c = getByte(input);
if (c == 256)
{
ret = c;
}
else if (c == '$')
{
c = getByte(input);
if (c == 256)
{
ret = c;
}
else if (c == '(')
{
/* throw away 4th byte in ESC sequence */
getByte(input);
state->mask = 0x80;
c = getByte(input);
if (c == 256)
{
ret = c;
}
else
{
ret = c | state->mask;
}
}
else
{
state->mask = 0x80;
c = getByte(input);
if (c == 256)
{
ret = c;
}
else
{
ret = c | state->mask;
}
}
}
else if (c == '(')
{
state->mask = 0;
/* throw away 3rd byte in ESC sequence */
getByte(input);
ret = getByte(input);
}
else
{
unGetByte(input);
ret = 0x1b;
}
}
else
{
ret = c | state->mask;
}
state->saved = ret;
return ret;
}
static void
htmlUnGetByte(HTMLState *state)
{
state->unGotten = state->saved;
}
static unsigned short
eatWhiteSpace(Input *input, HTMLState *state, unsigned short c)
{
while
(
(c == ' ') ||
(c == '\t') ||
(c == '\r') ||
(c == '\n')
)
{
c = htmlGetByte(input, state);
}
return c;
}
static void
htmlFreeAttributes(HTMLState *state)
{
HTMLAttribute *attr;
HTMLAttribute *tmp;
attr = state->html->attributes;
state->html->attributes = NULL;
while (attr)
{
free(attr->name);
free(attr->value);
tmp = attr;
attr = attr->next;
free(tmp);
}
}
static unsigned short
readAttribute(void *a, Input *input, HTMLState *state, unsigned short c)
{
HTMLAttribute *attr;
unsigned short quote;
mark(input, -1);
reportHTML(a, input);
while
(
(c != 256) &&
(c != '>') &&
(c != '=') &&
(c != ' ') &&
(c != '\t') &&
(c != '\r') &&
(c != '\n')
)
{
c = htmlGetByte(input, state);
}
mark(input, -1);
attr = calloc(sizeof(HTMLAttribute), 1);
if (!attr)
{
fprintf(stderr, "cannot calloc HTMLAttribute\n");
exit(1);
}
if (state->html->currentAttribute)
{
state->html->currentAttribute->next = attr;
}
else
{
if (state->html->attributes)
{
htmlFreeAttributes(state);
}
state->html->attributes = attr;
}
state->html->currentAttribute = attr;
attr->name = copyLower(input);
reportHTMLAttributeName(a, state->html, input);
if ((c == 256) || (c == '>'))
{
return c;
}
if (c != '=')
{
c = eatWhiteSpace(input, state, c);
}
if ((c == 256) || (c == '>'))
{
return c;
}
if (c == '=')
{
c = eatWhiteSpace(input, state, htmlGetByte(input, state));
if ((c == '"') || (c == '\''))
{
quote = c;
mark(input, 0);
reportHTML(a, input);
do
{
c = htmlGetByte(input, state);
} while ((c != 256) && (c != quote));
if (c == 256)
{
diag(__LINE__, state, c);
}
mark(input, -1);
attr->value = copy(input);
htmlCheckAttribute(state->html);
reportHTMLAttributeValue(a, state->html, input);
c = htmlGetByte(input, state);
}
else
{
mark(input, -1);
reportHTML(a, input);
while
(
(c != 256) &&
(c != '>') &&
(c != ' ') &&
(c != '\t') &&
(c != '\r') &&
(c != '\n')
)
{
if ((c == '"') || (c == '\''))
{
diag(__LINE__, state, c);
}
c = htmlGetByte(input, state);
}
mark(input, -1);
attr->value = copy(input);
htmlCheckAttribute(state->html);
reportHTMLAttributeValue(a, state->html, input);
}
callHandler(a, state->html);
if (c == '>')
{
return c;
}
}
return eatWhiteSpace(input, state, c);
}
static int
caseCompare(char *str, Input *input, HTMLState *state, unsigned short *ret)
{
unsigned short c;
int i;
for (i = 0; str[i]; i++)
{
c = htmlGetByte(input, state);
/* 256 is the end-of-input marker; do not pass it to tolower() */
if ((c == 256) || (tolower(c) != tolower((unsigned char) str[i])))
{
*ret = c;
return 0;
}
}
c = htmlGetByte(input, state);
*ret = c;
return 1;
}
static unsigned short
readTag(void *a, Input *input, HTMLState *state)
{
unsigned short c;
mark(input, -1);
reportHTML(a, input);
c = htmlGetByte(input, state);
if (c == '!')
{
c = htmlGetByte(input, state);
if (c == '-')
{
c = htmlGetByte(input, state);
if (c == '-')
{
const unsigned char *beginningOfComment =
current(input);
while (1)
{
c = htmlGetByte(input, state);
if (c == '-')
{
c = htmlGetByte(input, state);
if (c == '-')
{
c = htmlGetByte(input,
state);
if (c == '>')
{
return htmlGetByte(input, state);
}
else if (c == '-')
{
do
{
c = htmlGetByte(input, state);
} while (c == '-');
if (c == '>')
{
return htmlGetByte(input, state);
}
}
}
}
if (c == 256)
{
set(input, beginningOfComment);
while (1)
{
c = htmlGetByte(input, state);
if (c == '>')
{
return htmlGetByte(input, state);
}
else if (c == 256)
{
fprintf(stderr,
"bad comment\n");
mark(input, -1);
FREE(state->html->tag);
state->html->tag =
copyString((unsigned
char *) "!--");
state->html->tagIsKnown = 1;
reportHTMLTag(a,
state->html, input);
return c;
}
}
}
}
}
else
{
htmlUnGetByte(state);
}
}
else
{
htmlUnGetByte(state);
}
}
else
{
htmlUnGetByte(state);
}
do
{
c = htmlGetByte(input, state);
}
while
(
(c != 256) &&
(c != '>') &&
(c != ' ') &&
(c != '\t') &&
(c != '\r') &&
(c != '\n')
);
mark(input, -1);
FREE(state->html->tag);
state->html->tag = copyLower(input);
if (hashLookup(knownTagTable, (*state->html->tag == '/') ?
state->html->tag + 1 : state->html->tag))
{
state->html->tagIsKnown = 1;
}
else
{
state->html->tagIsKnown = 0;
}
reportHTMLTag(a, state->html, input);
if (c == 256)
{
return c;
}
else if (c == '>')
{
return htmlGetByte(input, state);
}
c = eatWhiteSpace(input, state, c);
if (c == 256)
{
return c;
}
else if (c == '>')
{
return htmlGetByte(input, state);
}
do
{
c = readAttribute(a, input, state, c);
} while ((c != 256) && (c != '>'));
state->html->currentAttribute = NULL;
if (tagHandler)
{
(*tagHandler)(a, state->html);
}
if (c == '>')
{
return htmlGetByte(input, state);
}
return c;
}
static unsigned short
readText(void *a, Input *input, HTMLState *state)
{
unsigned short c;
mark(input, -1);
reportHTML(a, input);
do
{
c = htmlGetByte(input, state);
} while ((c != 256) && (c != '<'));
mark(input, -1);
reportHTMLText(a, input);
return c;
}
static unsigned short
dealWithScript(Input *input, HTMLState *state, unsigned short c)
{
if (state->html->tag &&
(!strcasecmp((char *) state->html->tag, "script")))
{
while (1)
{
if (c == 256)
{
break;
}
if (c == '<')
{
if (caseCompare("/script>", input, state, &c))
{
FREE(state->html->tag);
break;
}
}
c = htmlGetByte(input, state);
}
}
return c;
}
void
htmlRead(void *a, Input *input, unsigned char *base)
{
unsigned short c;
HTML html;
HTMLState state;
if (!htmlInitialized)
{
htmlInit();
}
html.base = copyString(base);
html.url = copyString(base);
html.tag = NULL;
html.attributes = NULL;
html.currentAttribute = NULL;
state.mask = 0;
state.saved = 0;
state.unGotten = 256;
state.html = &html;
c = htmlGetByte(input, &state);
while (c != 256)
{
if (c == '<')
{
c = htmlGetByte(input, &state);
htmlUnGetByte(&state);
if
(
(('a' <= c) && (c <= 'z')) ||
(('A' <= c) && (c <= 'Z')) ||
(c == '/') ||
(c == '!')
)
{
c = readTag(a, input, &state);
c = dealWithScript(input, &state, c);
}
else
{
diag(__LINE__, &state, c);
c = readText(a, input, &state);
}
}
else
{
c = readText(a, input, &state);
}
}
FREE(html.base);
FREE(html.url);
FREE(html.tag);
htmlFreeAttributes(&state);
}
unsigned char *
toHTML(unsigned char *str)
{
char buf[2];
int i;
int j;
int len;
char *replacement;
unsigned char *result;
buf[1] = 0;
len = 0;
result = NULL;
for (i = 0; i < 2; i++)
{
for (j = 0; str[j]; j++)
{
switch (str[j])
{
case '<':
replacement = "&lt;";
break;
case '>':
replacement = "&gt;";
break;
case '&':
replacement = "&amp;";
break;
default:
replacement = buf;
buf[0] = str[j];
break;
}
if (result)
{
strcat((char *) result, replacement);
}
else
{
len += strlen(replacement);
}
}
if (!result)
{
result = calloc(len + 3, 1);
if (!result)
{
fprintf(stderr,
"cannot calloc toHTML string\n");
exit(1);
}
result[0] = '"';
}
}
strcat((char *) result, "\"");
return result;
}
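
`toHTML()` above makes two passes over the input: one to measure the escaped length and one to fill the buffer. The same idea can be written without repeated `strcat()` calls, which rescan the output on every append. This is a minimal sketch; `quoteForHTML` and `entityFor` are hypothetical names, and like the original it escapes only `<`, `>` and `&` and wraps the result in double quotes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Entity for a character that needs escaping, or NULL if none. */
static const char *
entityFor(char c)
{
    switch (c)
    {
    case '<':
        return "&lt;";
    case '>':
        return "&gt;";
    case '&':
        return "&amp;";
    default:
        return NULL;
    }
}

/* Returns a malloc'ed, double-quoted, escaped copy of str (caller frees). */
char *
quoteForHTML(const char *str)
{
    const char *entity;
    size_t len = 2; /* opening and closing quote */
    size_t i;
    char *out;
    char *p;

    assert(str != NULL);
    for (i = 0; str[i]; i++) /* pass 1: measure */
    {
        entity = entityFor(str[i]);
        len += entity ? strlen(entity) : 1;
    }
    out = malloc(len + 1);
    if (!out)
    {
        return NULL;
    }
    p = out;
    *p++ = '"';
    for (i = 0; str[i]; i++) /* pass 2: fill */
    {
        entity = entityFor(str[i]);
        if (entity)
        {
            strcpy(p, entity);
            p += strlen(entity);
        }
        else
        {
            *p++ = str[i];
        }
    }
    *p++ = '"';
    *p = 0;
    return out;
}
```

Note that a double quote inside the input is copied through unchanged, as in the original, so the result is not safe as an attribute value for arbitrary input.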

@ -0,0 +1,55 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _HTML_H_
#define _HTML_H_
#include <stdio.h>
#include "io.h"
typedef struct HTMLAttribute
{
unsigned char *name;
unsigned char *value;
struct HTMLAttribute *next;
} HTMLAttribute;
typedef struct HTML
{
unsigned char *base;
unsigned char *url;
unsigned char *tag;
int tagIsKnown;
HTMLAttribute *attributes;
HTMLAttribute *currentAttribute;
int currentAttributeIsURL;
} HTML;
typedef void (*HTMLHandler)(void *a, HTML *html);
void htmlRead(void *a, Input *input, unsigned char *base);
void htmlRegister(char *tag, char *attributeName, HTMLHandler handler);
void htmlRegisterTagHandler(HTMLHandler handler);
void htmlRegisterURLHandler(HTMLHandler handler);
unsigned char *toHTML(unsigned char *str);
#endif /* _HTML_H_ */
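
A tag handler registered with `htmlRegisterTagHandler()` receives the parsed tag together with its attribute list, which is a singly linked list. The sketch below shows how such a list might be searched by name. `DemoAttribute` and `findAttributeValue` are hypothetical stand-ins so that the example is self-contained; the real `HTMLAttribute` uses `unsigned char *` members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for HTMLAttribute with plain char pointers. */
typedef struct DemoAttribute
{
    const char *name;
    const char *value;
    struct DemoAttribute *next;
} DemoAttribute;

/* Return the value of the first attribute called name, or NULL. */
const char *
findAttributeValue(const DemoAttribute *attrs, const char *name)
{
    assert(name != NULL);
    for (; attrs; attrs = attrs->next)
    {
        if (!strcmp(attrs->name, name))
        {
            return attrs->value;
        }
    }
    return NULL;
}
```

The parser stores attribute names in lower case (it builds them with `copyLower()`), so the name passed to a lookup like this should be lower case as well.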

@ -0,0 +1,404 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <sys/time.h>
#include "addurl.h"
#include "html.h"
#include "http.h"
#include "io.h"
#include "main.h"
#include "mime.h"
#include "net.h"
#include "url.h"
#include "utils.h"
static unsigned char *emptyHTTPResponse = (unsigned char *) "";
static unsigned char *http09Response = (unsigned char *) "";
static unsigned char *locationURLWasAdded = (unsigned char *) "";
static int nonEmptyHTTPResponseCount = 0;
static int http10OrGreaterCount = 0;
static unsigned short
readLine(Input *input, unsigned short c)
{
while ((c != 256) && (c != '\r') && (c != '\n'))
{
c = getByte(input);
}
if (c == '\r')
{
c = getByte(input);
if (c == '\n')
{
c = getByte(input);
}
}
else if (c == '\n')
{
c = getByte(input);
}
return c;
}
static unsigned short
readSpaceTab(Input *input, unsigned short c)
{
while ((c == ' ') || (c == '\t'))
{
c = getByte(input);
}
return c;
}
static unsigned short
readNonWhiteSpace(Input *input, unsigned short c)
{
while
(
(c != 256) &&
(c != ' ') &&
(c != '\t') &&
(c != '\r') &&
(c != '\n')
)
{
c = getByte(input);
}
return c;
}
static unsigned char *
httpReadHeaders(HTTP *http, void *a, Input *input, unsigned char *url)
{
unsigned short c;
unsigned char *charset;
unsigned char *contentType;
int locationFound;
unsigned char *name;
URL *rel;
ContentType *type;
unsigned char *value;
contentType = NULL;
locationFound = 0;
if (!*current(input))
{
return emptyHTTPResponse;
}
nonEmptyHTTPResponseCount++;
if (strncmp((char *) current(input), "HTTP/", 5))
{
/* XXX deal with HTTP/0.9? */
return http09Response;
}
http10OrGreaterCount++;
mark(input, 0);
c = readNonWhiteSpace(input, getByte(input));
c = readSpaceTab(input, c);
sscanf((char *) current(input) - 1, "%d", &http->status);
c = readLine(input, c);
while (1)
{
if (c == 256)
{
mark(input, 0);
reportHTTP(a, input);
break;
}
mark(input, -1);
reportHTTP(a, input);
if ((c == '\r') || (c == '\n'))
{
readLine(input, c);
unGetByte(input);
mark(input, 0);
reportHTTP(a, input);
break;
}
while
(
(c != 256) &&
(c != '\r') &&
(c != '\n') &&
(c != ':')
)
{
c = getByte(input);
}
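if (c != ':')
{
mark(input, -1);
/* copy() allocates, so free the text after reporting it */
value = copy(input);
fprintf(stderr, "no colon in HTTP header \"%s\": %s\n",
(char *) value, (char *) url);
free(value);
return NULL;
}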
mark(input, -1);
reportHTTPHeaderName(a, input);
name = copyLower(input);
c = readSpaceTab(input, getByte(input));
mark(input, -1);
reportHTTP(a, input);
c = readLine(input, c);
if ((c == ' ') || (c == '\t'))
{
do
{
c = readLine(input, c);
} while ((c == ' ') || (c == '\t'));
}
c = trimTrailingWhiteSpace(input);
mark(input, -1);
value = copy(input);
if (!strcasecmp((char *) name, "content-type"))
{
reportHTTPHeaderValue(a, input, NULL);
type = mimeParseContentType(value);
contentType = mimeGetContentType(type);
charset = mimeGetContentTypeParameter(type, "charset");
if (charset)
{
reportHTTPCharSet(a, charset);
}
mimeFreeContentType(type);
}
else if (!strcasecmp((char *) name, "location"))
{
reportHTTPHeaderValue(a, input, value);
/* XXX supposed to be absolute URL */
rel = urlRelative(url, value);
addURL(a, rel->url);
urlFree(rel);
locationFound = 1;
}
else
{
reportHTTPHeaderValue(a, input, NULL);
}
free(name);
free(value);
c = readLine(input, c);
mark(input, -1);
reportHTTP(a, input);
}
if (!contentType)
{
if (locationFound)
{
return locationURLWasAdded;
}
}
return contentType;
}
void
httpParseRequest(HTTP *http, void *a, unsigned char *url)
{
unsigned short c;
mark(http->input, 0);
do
{
c = getByte(http->input);
} while (c != 256);
mark(http->input, -1);
reportHTTP(a, http->input);
}
void
httpParseStream(HTTP *http, void *a, unsigned char *url)
{
const unsigned char *begin;
unsigned short c;
unsigned char *contentType;
begin = current(http->input);
contentType = httpReadHeaders(http, a, http->input, url);
http->body = current(http->input);
http->bodyLen = inputLength(http->input) - (http->body - begin);
if (contentType)
{
if
(
(contentType != emptyHTTPResponse) &&
(contentType != http09Response) &&
(contentType != locationURLWasAdded)
)
{
reportContentType(a, contentType);
if (!strcasecmp((char *) contentType, "text/html"))
{
htmlRead(a, http->input, url);
}
else
{
do
{
c = getByte(http->input);
}
while (c != 256);
mark(http->input, -1);
reportHTTPBody(a, http->input);
}
free(contentType);
}
}
else
{
fprintf(stderr, "no Content-Type: %s\n", url);
}
}
void
httpRead(HTTP *http, void *a, int sock, unsigned char *url)
{
struct timeval theTime;
reportStatus(a, "readStream", __FILE__, __LINE__);
gettimeofday(&theTime, NULL);
http->input = readStream(sock, url);
reportTime(REPORT_TIME_READSTREAM, &theTime);
reportStatus(a, "readStream done", __FILE__, __LINE__);
httpParseStream(http, a, url);
}
static void
httpGetObject(HTTP *http, void *a, int sock, URL *url, unsigned char **headers)
{
char *get;
unsigned char **h;
char *httpStr;
get = "GET ";
/* HTTP/1.0 (RFC 1945) ends the request line and each header with CRLF */
httpStr = " HTTP/1.0\r\n";
write(sock, get, strlen(get));
if (url->path)
{
write(sock, url->path, strlen((char *) url->path));
}
if (url->params)
{
write(sock, url->params, strlen((char *) url->params));
}
if (url->query)
{
write(sock, url->query, strlen((char *) url->query));
}
write(sock, httpStr, strlen(httpStr));
h = headers;
if (h)
{
while (*h)
{
write(sock, *h, strlen((char *) *h));
write(sock, "\r\n", 2);
h++;
}
}
write(sock, "\r\n", 2);
httpRead(http, a, sock, url->url);
}
HTTP *
httpAlloc(void)
{
HTTP *http;
http = calloc(sizeof(HTTP), 1);
if (!http)
{
fprintf(stderr, "cannot calloc HTTP\n");
exit(1);
}
return http;
}
void
httpFree(HTTP *http)
{
if (http)
{
inputFree(http->input);
free(http);
}
}
HTTP *
httpProcess(void *a, URL *url, unsigned char **headers)
{
HTTP *http;
int port;
int sock;
port = -1;
if (url->port == -1)
{
port = 80;
}
else
{
port = url->port;
}
if (!url->host)
{
fprintf(stderr, "url->host is NULL for %s\n",
url->url ? (char *) url->url : "<NULL>");
return NULL;
}
sock = netConnect(a, url->host, port);
if (sock == -1)
{
return NULL;
}
http = httpAlloc();
httpGetObject(http, a, sock, url, headers);
close(sock);
return http;
}
int
httpGetHTTP10OrGreaterCount(void)
{
return http10OrGreaterCount;
}
int
httpGetNonEmptyHTTPResponseCount(void)
{
return nonEmptyHTTPResponseCount;
}
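
`httpReadHeaders()` treats any response that does not start with `HTTP/` as HTTP/0.9 and otherwise reads the status code from the first line. That check can be sketched on a plain string as follows. `parseStatusLine` is a hypothetical helper that works on a NUL-terminated line, not on the `Input` cursor used above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the numeric status from a line such as
 * "HTTP/1.0 200 OK", or -1 if the line is not an HTTP/1.x status line.
 */
int
parseStatusLine(const char *line)
{
    int major;
    int minor;
    int status;

    assert(line != NULL);
    if (strncmp(line, "HTTP/", 5) != 0)
    {
        return -1; /* no status line: an HTTP/0.9 style response */
    }
    if (sscanf(line + 5, "%d.%d %d", &major, &minor, &status) != 3)
    {
        return -1; /* version or status code missing */
    }
    return status;
}
```

Checking the `sscanf()` return value matters here: without it, a truncated status line would leave `status` uninitialized.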

@ -0,0 +1,47 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _HTTP_H_
#define _HTTP_H_
#include <stdio.h>
#include "io.h"
#include "url.h"
typedef struct HTTP
{
const unsigned char *body;
unsigned long bodyLen;
Input *input;
int status;
} HTTP;
HTTP *httpAlloc(void);
void httpFree(HTTP *http);
int httpGetHTTP10OrGreaterCount(void);
int httpGetNonEmptyHTTPResponseCount(void);
void httpParseRequest(HTTP *http, void *a, unsigned char *url);
void httpParseStream(HTTP *http, void *a, unsigned char *url);
HTTP *httpProcess(void *a, URL *url, unsigned char **headers);
void httpRead(HTTP *http, void *a, int fd, unsigned char *url);
#endif /* _HTTP_H_ */
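
`httpProcess()` takes an optional NULL-terminated array of header strings that are sent after the request line. The request text could also be assembled in a buffer before writing it to the socket, which makes it easy to inspect. This is a minimal sketch: `buildGetRequest` is a hypothetical helper, it assumes the header strings carry no line terminator of their own, and it uses `snprintf()`, which requires C99.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "GET <path> HTTP/1.0", the given headers
 * and the terminating empty line into buf, each line ending in CRLF.
 * Returns the request length, or -1 if it does not fit in size bytes.
 */
int
buildGetRequest(char *buf, size_t size, const char *path,
    const char *const *headers)
{
    size_t used;
    int n;

    assert(buf != NULL && path != NULL);
    n = snprintf(buf, size, "GET %s HTTP/1.0\r\n", path);
    if (n < 0 || (size_t) n >= size)
    {
        return -1;
    }
    used = (size_t) n;
    for (; headers && *headers; headers++)
    {
        n = snprintf(buf + used, size - used, "%s\r\n", *headers);
        if (n < 0 || (size_t) n >= size - used)
        {
            return -1;
        }
        used += (size_t) n;
    }
    n = snprintf(buf + used, size - used, "\r\n"); /* end of headers */
    if (n < 0 || (size_t) n >= size - used)
    {
        return -1;
    }
    return (int) (used + (size_t) n);
}
```

Building the request first also allows a single `write()` call whose return value can be checked, instead of many small unchecked writes.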

@ -0,0 +1,323 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <errno.h>
#include <malloc.h>
#include <memory.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/stropts.h>
#include <sys/time.h>
#include "io.h"
#include "utils.h"
struct Input
{
unsigned long readAlloc;
const unsigned char *readBuf;
const unsigned char *readBufPtr;
const unsigned char *readBufEnd;
const unsigned char *readBufMarkBegin;
const unsigned char *readBufMarkEnd;
unsigned long streamSize;
};
static Input *
inputAlloc(void)
{
Input *input;
input = calloc(sizeof(Input), 1);
if (!input)
{
fprintf(stderr, "cannot calloc Input\n");
exit(1);
}
return input;
}
static Input *
readInit(void)
{
Input *input;
input = inputAlloc();
input->readAlloc = 1;
input->readBuf = calloc(input->readAlloc + 1, 1);
if (!input->readBuf)
{
fprintf(stderr, "cannot calloc readBuf\n");
exit(1);
}
return input;
}
Input *
readStream(int fd, unsigned char *url)
{
/* I_NREAD stores its result through a pointer to int */
int bytesAvailable;
int bytesRead;
fd_set fdset;
Input *input;
int offset;
int ret;
struct stat statBuf;
unsigned long streamSize;
struct timeval timeout;
input = readInit();
FD_ZERO(&fdset);
FD_SET(fd, &fdset);
timeout.tv_sec = 5 * 60;
timeout.tv_usec = 0;
offset = 0;
streamSize = 0;
while (1)
{
ret = select(fd + 1, &fdset, NULL, NULL, &timeout);
if (!ret)
{
fprintf(stderr, "readStream: select timed out: %s\n",
url);
streamSize = 0;
break;
}
else if (ret == -1)
{
perror("select");
streamSize = 0;
break;
}
if (ioctl(fd, I_NREAD, &bytesAvailable) == -1)
{
/* if fd is file, we get this error */
if (errno == ENOTTY)
{
if (fstat(fd, &statBuf))
{
perror("fstat");
streamSize = 0;
break;
}
else
{
bytesAvailable = statBuf.st_size;
}
}
else
{
if (errno != ECONNRESET)
{
perror("ioctl");
}
streamSize = 0;
break;
}
}
if (offset + bytesAvailable > input->readAlloc)
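{
/* keep the old buffer if realloc fails; it is still used below */
void *newBuf;
newBuf = realloc((void *) input->readBuf,
offset + bytesAvailable + 1);
if (!newBuf)
{
fprintf(stderr, "cannot realloc readBuf %lu\n",
(unsigned long) (offset + bytesAvailable + 1));
streamSize = 0;
break;
}
input->readBuf = newBuf;
input->readAlloc = offset + bytesAvailable;
}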
bytesRead = read(fd, (void *) (input->readBuf + offset),
bytesAvailable);
if (bytesRead <= 0)
{
break;
}
else if (bytesRead > bytesAvailable)
{
/* should not happen */
streamSize = 0;
break;
}
else
{
offset += bytesRead;
streamSize += bytesRead;
}
}
((unsigned char *) input->readBuf)[streamSize] = 0;
input->readBufPtr = input->readBuf;
input->readBufEnd = input->readBuf + streamSize;
input->streamSize = streamSize;
input->readBufMarkEnd = input->readBuf;
return input;
}
Input *
readAvailableBytes(int fd)
{
int bytesRead;
Input *input;
input = inputAlloc();
input->readAlloc = 10240;
input->readBuf = calloc(input->readAlloc + 1, 1);
if (!input->readBuf)
{
fprintf(stderr, "cannot calloc readBuf\n");
exit(1);
}
input->readBufPtr = input->readBuf;
input->readBufEnd = input->readBuf;
input->readBufMarkEnd = input->readBuf;
bytesRead = read(fd, (void *) input->readBuf, input->readAlloc);
if (bytesRead < 0)
{
perror("read");
return input;
}
else if (bytesRead == input->readAlloc)
{
fprintf(stderr, "readBuf too small\n");
}
((unsigned char *) input->readBuf)[bytesRead] = 0;
input->readBufEnd = input->readBuf + bytesRead;
input->streamSize = bytesRead;
return input;
}
void
inputFree(Input *input)
{
free((char *) input->readBuf);
free(input);
}
unsigned short
getByte(Input *input)
{
if (input->readBufPtr >= input->readBufEnd)
{
input->readBufPtr++;
return 256;
}
return *input->readBufPtr++;
}
void
unGetByte(Input *input)
{
if (input->readBufPtr > input->readBuf)
{
input->readBufPtr--;
}
}
const unsigned char *
current(Input *input)
{
return input->readBufPtr;
}
void
set(Input *input, const unsigned char *pointer)
{
input->readBufPtr = pointer;
}
unsigned long
inputLength(Input *input)
{
return input->streamSize;
}
unsigned char *
copyMemory(Input *input, unsigned long *len)
{
unsigned char *ret;
*len = input->readBufMarkEnd - input->readBufMarkBegin;
ret = malloc(*len);
if (!ret)
{
fprintf(stderr, "cannot malloc block\n");
exit(1);
}
memcpy(ret, input->readBufMarkBegin, *len);
return ret;
}
unsigned char *
copy(Input *input)
{
return copySizedString(input->readBufMarkBegin,
input->readBufMarkEnd - input->readBufMarkBegin);
}
unsigned char *
copyLower(Input *input)
{
return lowerCase(copySizedString(input->readBufMarkBegin,
input->readBufMarkEnd - input->readBufMarkBegin));
}
unsigned short
trimTrailingWhiteSpace(Input *input)
{
unsigned char c;
input->readBufPtr -= 2;
do
{
c = *input->readBufPtr--;
} while
(
(c == ' ') ||
(c == '\t') ||
(c == '\r') ||
(c == '\n')
);
input->readBufPtr += 2;
return *input->readBufPtr++;
}
void
mark(Input *input, int offset)
{
input->readBufMarkBegin = input->readBufMarkEnd;
input->readBufMarkEnd = input->readBufPtr + offset;
}
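
`getByte()` returns an `unsigned short` so that it can signal end of input with 256, a value no real byte can take. The sketch below isolates that convention on an in-memory buffer. The `Demo*` and `demo*` names are hypothetical, and unlike the original `getByte()` this version does not advance the pointer once the end has been reached.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_END 256 /* out-of-band marker; cannot collide with 0..255 */

typedef struct DemoInput
{
    const unsigned char *begin;
    const unsigned char *ptr;
    const unsigned char *end;
} DemoInput;

/* Point the cursor at the start of a buffer of len bytes. */
void
demoInit(DemoInput *in, const unsigned char *buf, size_t len)
{
    assert(in != NULL && buf != NULL);
    in->begin = buf;
    in->ptr = buf;
    in->end = buf + len;
}

/* Return the next byte (0..255), or DEMO_END when none is left. */
unsigned short
demoGetByte(DemoInput *in)
{
    assert(in != NULL);
    if (in->ptr >= in->end)
    {
        return DEMO_END;
    }
    return *in->ptr++;
}

/* Step back one byte, but never before the start of the buffer. */
void
demoUnGetByte(DemoInput *in)
{
    assert(in != NULL);
    if (in->ptr > in->begin)
    {
        in->ptr--;
    }
}
```

Because the marker lies outside the byte range, a data byte of `0xff` is never mistaken for end of input, which is the usual pitfall when a plain `char` is compared with `EOF`.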

@ -0,0 +1,41 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _IO_H_
#define _IO_H_
typedef struct Input Input;
unsigned char *copy(Input *input);
unsigned char *copyLower(Input *input);
unsigned char *copyMemory(Input *input, unsigned long *len);
const unsigned char *current(Input *input);
unsigned short getByte(Input *input);
void inputFree(Input *input);
unsigned long inputLength(Input *input);
void mark(Input *input, int offset);
Input *readAvailableBytes(int fd);
Input *readStream(int fd, unsigned char *url);
void set(Input *input, const unsigned char *pointer);
unsigned short trimTrailingWhiteSpace(Input *input);
void unGetByte(Input *input);
#endif /* _IO_H_ */

View File

@ -0,0 +1,166 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <malloc.h>
#include <stdio.h>
#include <string.h>
#include "addurl.h"
#include "hash.h"
#include "html.h"
#include "http.h"
#include "main.h"
#include "mutex.h"
#include "url.h"
#include "utils.h"
typedef struct Arg
{
URL *url;
} Arg;
mutex_t mainMutex;
static unsigned char *limitURLs[] =
{
"http://lemming/people/erik/",
NULL
};
static URL *lastURL = NULL;
static URL *urls = NULL;
void
reportContentType(void *a, unsigned char *contentType)
{
}
void
reportHTML(void *a, Input *input)
{
}
void
reportHTMLAttributeName(void *a, HTML *html, Input *input)
{
}
void
reportHTMLAttributeValue(void *a, HTML *html, Input *input)
{
}
void
reportHTMLTag(void *a, HTML *html, Input *input)
{
}
void
reportHTMLText(void *a, Input *input)
{
}
void
reportHTTP(void *a, Input *input)
{
}
void
reportHTTPBody(void *a, Input *input)
{
}
void
reportHTTPCharSet(void *a, unsigned char *charset)
{
}
void
reportHTTPHeaderName(void *a, Input *input)
{
}
void
reportHTTPHeaderValue(void *a, Input *input, unsigned char *url)
{
}
static void
addURLFunc(void *a, URL *url)
{
lastURL->next = url;
lastURL = url;
}
int
main(int argc, char *argv[])
{
Arg arg;
HTTP *http;
URL *url;
MUTEX_INIT();
if (argc > 1)
{
limitURLs[0] = argv[1];
}
addURLInit(addURLFunc, limitURLs, NULL);
url = urlParse(limitURLs[0]);
urls = url;
lastURL = url;
while (url)
{
arg.url = url;
http = httpProcess(&arg, url, NULL);
if (http)
{
switch (http->status)
{
case 200:
printf("%s\n", url->url);
break;
case 302:
break;
case 403: /* forbidden */
break;
case 404:
/*
printf("bad link %s\n", url->url);
*/
break;
default:
printf("status %d for %s\n", http->status,
url->url);
break;
}
httpFree(http);
}
else
{
printf("httpProcess failed: %s\n", url->url);
}
url = url->next;
}
return 0;
}

View File

@ -0,0 +1,55 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _MAIN_H_
#define _MAIN_H_
#include <synch.h>
#include <sys/time.h>
#include "html.h"
#include "io.h"
extern mutex_t mainMutex;
#define REPORT_TIME_CONNECT_SUCCESS 0
#define REPORT_TIME_CONNECT_FAILURE 1
#define REPORT_TIME_GETHOSTBYNAME_SUCCESS 2
#define REPORT_TIME_GETHOSTBYNAME_FAILURE 3
#define REPORT_TIME_READSTREAM 4
#define REPORT_TIME_TOTAL 5
#define REPORT_TIME_MAX 6 /* highest + 1 */
void reportContentType(void *a, unsigned char *contentType);
void reportHTML(void *a, Input *input);
void reportHTMLAttributeName(void *a, HTML *html, Input *input);
void reportHTMLAttributeValue(void *a, HTML *html, Input *input);
void reportHTMLTag(void *a, HTML *html, Input *input);
void reportHTMLText(void *a, Input *input);
void reportHTTP(void *a, Input *input);
void reportHTTPBody(void *a, Input *input);
void reportHTTPCharSet(void *a, unsigned char *charset);
void reportHTTPHeaderName(void *a, Input *input);
void reportHTTPHeaderValue(void *a, Input *input, unsigned char *url);
void reportStatus(void *a, char *message, char *file, int line);
void reportTime(int task, struct timeval *theTime);
#endif /* _MAIN_H_ */

View File

@ -0,0 +1,285 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include "mime.h"
#include "utils.h"
#define IS_WHITE_SPACE(c) \
( \
((c) == ' ' ) || \
((c) == '\t') || \
((c) == '\r') || \
((c) == '\n') \
)
#define IS_NON_WHITE_SPACE(c) \
( \
((c) != '\0') && \
((c) != ' ' ) && \
((c) != '\t') && \
((c) != '\r') && \
((c) != '\n') \
)
typedef struct ContentTypeParameter
{
unsigned char *name;
unsigned char *value;
struct ContentTypeParameter *next;
} ContentTypeParameter;
struct ContentType
{
unsigned char *type;
ContentTypeParameter *parameters;
ContentTypeParameter *currentParameter;
};
ContentType *
mimeParseContentType(unsigned char *str)
{
ContentType *contentType;
unsigned char *name;
unsigned char *p;
ContentTypeParameter *parameter;
unsigned char *slash;
unsigned char *subtype;
unsigned char *type;
unsigned char *value;
if ((!str) || (!*str))
{
return NULL;
}
while (IS_WHITE_SPACE(*str))
{
str++;
}
if (!*str)
{
return NULL;
}
slash = (unsigned char *) strchr((char *) str, '/');
if ((!slash) || (slash == str))
{
return NULL;
}
p = slash - 1;
while (IS_WHITE_SPACE(*p))
{
p--;
}
p++;
type = lowerCase(copySizedString(str, p - str));
p = slash + 1;
while (IS_WHITE_SPACE(*p))
{
p++;
}
if (!*p)
{
free(type);
return NULL;
}
subtype = p;
while (IS_NON_WHITE_SPACE(*p) && (*p != ';'))
{
p++;
}
subtype = lowerCase(copySizedString(subtype, p - subtype));
contentType = calloc(1, sizeof(ContentType));
if (!contentType)
{
fprintf(stderr, "cannot calloc ContentType\n");
exit(1);
}
contentType->type = calloc(strlen((char *) type) + 1 +
strlen((char *) subtype) + 1, 1);
if (!contentType->type)
{
fprintf(stderr, "cannot calloc type\n");
exit(1);
}
strcpy((char *) contentType->type, (char *) type);
strcat((char *) contentType->type, "/");
strcat((char *) contentType->type, (char *) subtype);
free(type);
free(subtype);
while (IS_WHITE_SPACE(*p))
{
p++;
}
if (!*p)
{
return contentType;
}
if (*p != ';')
{
fprintf(stderr, "expected semicolon; got '%c'\n", *p);
fprintf(stderr, "str: %s\n", str);
return contentType;
}
while (1)
{
p++;
while (IS_WHITE_SPACE(*p))
{
p++;
}
if (!*p)
{
fprintf(stderr, "expected parameter: %s\n", str);
break;
}
name = p;
while (IS_NON_WHITE_SPACE(*p) && (*p != '='))
{
p++;
}
parameter = calloc(1, sizeof(ContentTypeParameter));
if (!parameter)
{
fprintf(stderr, "cannot calloc parameter\n");
exit(1);
}
parameter->name = lowerCase(copySizedString(name, p - name));
while (IS_WHITE_SPACE(*p))
{
p++;
}
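if (*p != '=')
{
fprintf(stderr, "expected '=': %s\n", str);
/* the parameter is not linked into the list yet, so free it here */
free(parameter->name);
free(parameter);
return contentType;
}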
p++;
while (IS_WHITE_SPACE(*p))
{
p++;
}
if (*p == '"')
{
p++;
value = p;
while ((*p) && (*p != '"'))
{
p++;
}
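if (!*p)
{
fprintf(stderr, "expected '\"': %s\n", str);
/* the parameter is not linked into the list yet, so free it here */
free(parameter->name);
free(parameter);
return contentType;
}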
parameter->value = copySizedString(value, p - value);
p++;
}
else
{
value = p;
while (IS_NON_WHITE_SPACE(*p) && (*p != ';'))
{
p++;
}
parameter->value = copySizedString(value, p - value);
}
if (contentType->currentParameter)
{
contentType->currentParameter->next = parameter;
}
else
{
contentType->parameters = parameter;
}
contentType->currentParameter = parameter;
while (IS_WHITE_SPACE(*p))
{
p++;
}
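if (!*p)
{
break;
}
if (*p != ';')
{
/* the loop head skips one character, which must be the separator */
fprintf(stderr, "expected semicolon; got '%c'\n", *p);
break;
}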
}
return contentType;
}
unsigned char *
mimeGetContentTypeParameter(ContentType *contentType, char *name)
{
ContentTypeParameter *parameter;
if (!contentType)
{
return NULL;
}
parameter = contentType->parameters;
while (parameter)
{
if (!strcasecmp((char *) parameter->name, name))
{
return copyString(parameter->value);
}
parameter = parameter->next;
}
return NULL;
}
void
mimeFreeContentType(ContentType *contentType)
{
ContentTypeParameter *param;
ContentTypeParameter *tmp;
if (contentType)
{
FREE(contentType->type);
param = contentType->parameters;
while (param)
{
tmp = param;
FREE(param->name);
FREE(param->value);
param = param->next;
free(tmp);
}
free(contentType);
}
}
unsigned char *
mimeGetContentType(ContentType *contentType)
{
if (contentType)
{
return copyString(contentType->type);
}
else
{
return NULL;
}
}
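The first step of mimeParseContentType, isolating and lowercasing the media type that precedes any parameters, can be sketched on its own. This is a simplified illustration and not the parser above: mediaTypeOf is a hypothetical helper, it writes into a caller-supplied buffer instead of allocating, and unlike mimeParseContentType it does not remove white space around the slash.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mime.c: copy the lowercased media type
 * (the text before the first ';', without surrounding white space) into
 * out. Returns 0 on success, or -1 if there is no "type/subtype" or the
 * result does not fit in outSize bytes.
 */
static int
mediaTypeOf(const char *value, char *out, size_t outSize)
{
	const char *end;
	size_t i;
	size_t len;

	while (isspace((unsigned char) *value))
	{
		value++;
	}
	end = strchr(value, ';');
	if (!end)
	{
		end = value + strlen(value);
	}
	while ((end > value) && isspace((unsigned char) end[-1]))
	{
		end--;
	}
	len = (size_t) (end - value);
	if ((len == 0) || (len >= outSize) || (value[0] == '/') ||
		!memchr(value, '/', len))
	{
		return -1;
	}
	for (i = 0; i < len; i++)
	{
		out[i] = (char) tolower((unsigned char) value[i]);
	}
	out[len] = '\0';
	return 0;
}
```

Lowercasing only the media type mirrors the parser above, which lowercases the type, subtype and parameter names but leaves parameter values untouched.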

View File

@ -0,0 +1,33 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _MIME_H_
#define _MIME_H_
typedef struct ContentType ContentType;
void mimeFreeContentType(ContentType *contentType);
unsigned char *mimeGetContentType(ContentType *contentType);
unsigned char *mimeGetContentTypeParameter(ContentType *contentType,
char *parameter);
ContentType *mimeParseContentType(unsigned char *str);
#endif /* _MIME_H_ */

View File

@ -0,0 +1,45 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdio.h>
#include <stdlib.h>
#include <thread.h>
#include <synch.h>
#include "main.h"
/* Each macro expands to a single statement; the caller supplies the semicolon. */
#define MUTEX_INIT() \
do \
{ \
if (mutex_init(&mainMutex, USYNC_THREAD, NULL)) \
{ \
fprintf(stderr, "mutex_init failed\n"); \
exit(1); \
} \
} while (0)
#define MUTEX_LOCK() \
do \
{ \
mutex_lock(&mainMutex); \
} while (0)
#define MUTEX_UNLOCK() \
do \
{ \
mutex_unlock(&mainMutex); \
} while (0)
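A macro wrapped in do/while (0) behaves like one statement only when the caller, not the macro, supplies the final semicolon. The sketch below illustrates this with a plain counter in place of the Solaris mutex calls; LOCK_SKETCH, lockCalls and lockIfNeeded are hypothetical names used only for this illustration.

```c
/* Hypothetical counter standing in for mutex_lock(). */
static int lockCalls = 0;

/* No semicolon after while (0): the call site provides it. */
#define LOCK_SKETCH() \
	do \
	{ \
		lockCalls++; \
	} while (0)

/*
 * The macro is used as the unbraced body of an if that has an else.
 * If the macro ended in its own semicolon, "LOCK_SKETCH();" would expand
 * to two statements and the else would no longer attach to the if.
 */
static int
lockIfNeeded(int needed)
{
	if (needed)
		LOCK_SKETCH();
	else
		return 0;
	return 1;
}
```

With a trailing semicolon inside the macro, the if/else in lockIfNeeded would fail to compile, which is the main reason the idiom omits it.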

View File

@ -0,0 +1,278 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <errno.h>
#include <memory.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <thread.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/systeminfo.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include "main.h"
#include "mutex.h"
#include "net.h"
#include "view.h"
static int connectCount = 0;
static int dnsCount = 0;
static char *
getHostName(void)
{
size_t alloc;
char *hostName;
long size;
alloc = 512;
hostName = calloc(alloc, 1);
if (!hostName)
{
return NULL;
}
while (1)
{
size = sysinfo(SI_HOSTNAME, hostName, alloc);
if (size < 0)
{
fprintf(stderr, "sysinfo failed\n");
free(hostName);
return NULL;
}
else if ((size_t) size > alloc)
{
char *grown;
alloc = size + 1;
grown = realloc(hostName, alloc);
if (!grown)
{
/* realloc leaves the old block allocated on failure */
free(hostName);
return NULL;
}
hostName = grown;
}
else
{
break;
}
}
return hostName;
}
static int
getSocketAndIPAddress(void *a, unsigned char *hostName, int port,
struct sockaddr_in *addr)
{
char buf[512];
int err;
struct hostent host;
struct hostent *ret;
unsigned short shortPort;
int sock;
struct timeval theTime;
sock = socket(PF_INET, SOCK_STREAM, 0);
if (sock < 0)
{
perror("socket failed");
return -1;
}
viewReport(a, "calling gethostbyname_r() on");
viewReport(a, (char *) hostName);
reportStatus(a, "gethostbyname_r", __FILE__, __LINE__);
gettimeofday(&theTime, NULL);
/* XXX implement my own DNS lookup to do timeouts? */
/* XXX implement my own DNS lookup to try again? */
ret = gethostbyname_r((char *) hostName, &host, buf, sizeof(buf), &err);
if (!ret)
{
reportTime(REPORT_TIME_GETHOSTBYNAME_FAILURE, &theTime);
reportStatus(a, "gethostbyname_r failed", __FILE__, __LINE__);
viewReport(a, "failed<br><hr>");
close(sock);
return -1;
}
reportTime(REPORT_TIME_GETHOSTBYNAME_SUCCESS, &theTime);
reportStatus(a, "gethostbyname_r succeeded", __FILE__, __LINE__);
viewReport(a, "succeeded<br><hr>");
MUTEX_LOCK();
dnsCount++;
MUTEX_UNLOCK();
memset(addr, 0, sizeof(*addr));
addr->sin_family = host.h_addrtype /* PF_INET */;
shortPort = port;
addr->sin_port = htons(shortPort);
memcpy(&addr->sin_addr, host.h_addr, host.h_length /* 4 */);
return sock;
}
int
netListen(void *a, unsigned char **host, u_short *port)
{
unsigned char *hostName;
struct sockaddr_in name;
int namelen = sizeof(name);
int fd;
hostName = (unsigned char *) getHostName();
if (!hostName)
{
return -1;
}
fd = getSocketAndIPAddress(a, hostName, *port, &name);
if (fd < 0)
{
free(hostName);
return -1;
}
if (bind(fd, (struct sockaddr *) &name, sizeof(name)))
{
perror("bind");
close(fd);
free(hostName);
return -1;
}
if (listen(fd, 5))
{
perror("listen");
close(fd);
free(hostName);
return -1;
}
if (!*port)
{
/* port 0 lets the system choose; read back the port it picked */
if (getsockname(fd, (struct sockaddr *) &name, &namelen))
{
perror("getsockname");
close(fd);
free(hostName);
return -1;
}
*port = ntohs(name.sin_port);
}
/* hand the host name to the caller only on success */
if (host)
{
*host = hostName;
}
else
{
free(hostName);
}
return fd;
}
int
netAccept(int fd)
{
int newFD;
int addrlen = sizeof(struct sockaddr);
struct sockaddr addr;
while ((newFD = accept(fd, &addr, &addrlen)) < 0)
{
if (errno != EINTR)
{
return -1;
}
}
return newFD;
}
int
netConnect(void *a, unsigned char *hostName, int port)
{
struct sockaddr_in addr;
int sock;
struct timeval theTime;
sock = getSocketAndIPAddress(a, hostName, port, &addr);
if (sock < 0)
{
return -1;
}
viewReport(a, "calling connect()");
reportStatus(a, "connect", __FILE__, __LINE__);
gettimeofday(&theTime, NULL);
if (connect(sock, (struct sockaddr *) &addr, sizeof(addr)) == -1)
{
/* save errno before any later call can overwrite it */
int connectErrno = errno;
char *message = strerror(connectErrno);
reportTime(REPORT_TIME_CONNECT_FAILURE, &theTime);
/* XXX try again if Connection timed out? */
/* XXX try again if Connection refused? */
if
(
(connectErrno != ETIMEDOUT) &&
(connectErrno != ECONNREFUSED)
)
{
fprintf(stderr, "cannot connect to %s at %d: %s\n",
hostName, port, message ? message : "NULL");
}
close(sock);
reportStatus(a, "connect failed", __FILE__, __LINE__);
viewReport(a, "failed:");
viewReport(a, message ? message : "NULL");
viewReport(a, "<hr>");
return -1;
}
reportTime(REPORT_TIME_CONNECT_SUCCESS, &theTime);
reportStatus(a, "connect succeeded", __FILE__, __LINE__);
viewReport(a, "succeeded<br><hr>");
MUTEX_LOCK();
connectCount++;
MUTEX_UNLOCK();
return sock;
}
int
netGetConnectCount(void)
{
return connectCount;
}
int
netGetDNSCount(void)
{
return dnsCount;
}
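netListen lets the system choose a port when *port is 0 and then reads the chosen port back with getsockname. A minimal sketch of just that step follows, under assumptions that differ from net.c: it binds to the loopback address instead of looking up the local host name, it uses the POSIX socklen_t type, and listenOnEphemeralPort is a hypothetical name.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: listen on the loopback interface and let the
 * kernel pick the port. Returns the listening descriptor, or -1 on
 * failure. On success *port holds the chosen port in host byte order.
 */
static int
listenOnEphemeralPort(unsigned short *port)
{
	struct sockaddr_in name;
	socklen_t nameLen = sizeof(name);
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
	{
		return -1;
	}
	memset(&name, 0, sizeof(name));
	name.sin_family = AF_INET;
	name.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	name.sin_port = htons(0);	/* 0 asks the kernel to choose */
	if
	(
		bind(fd, (struct sockaddr *) &name, sizeof(name)) ||
		listen(fd, 5) ||
		getsockname(fd, (struct sockaddr *) &name, &nameLen)
	)
	{
		/* close the descriptor on every failure path */
		close(fd);
		return -1;
	}
	*port = ntohs(name.sin_port);
	return fd;
}
```

The port comes back in network byte order, so it has to pass through ntohs before it can be printed or compared, as netListen also does.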

View File

@ -0,0 +1,31 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _NET_H_
#define _NET_H_
int netAccept(int fd);
int netConnect(void *a, unsigned char *hostName, int port);
int netGetConnectCount(void);
int netGetDNSCount(void);
int netListen(void *a, unsigned char **host, unsigned short *port);
#endif /* _NET_H_ */

View File

@ -0,0 +1,176 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "main.h"
#include "net.h"
mutex_t mainMutex;
static int
readReply(int fd, unsigned char *buf, int size)
{
int bytesRead;
bytesRead = read(fd, buf, size - 1);
if (bytesRead < 0)
{
buf[0] = 0;
}
else
{
buf[bytesRead] = 0;
}
if (bytesRead < 3)
{
fprintf(stderr, "bytesRead %d at line %d\n", bytesRead,
__LINE__);
}
if (strncmp((char *) buf, "+OK", 3))
{
return 1;
}
return 0;
}
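static int
writeRequest(int fd, char *command, char *argument)
{
char buf[1024];
int bytesWritten;
size_t len;
/* command + optional argument + CRLF must fit, with room for the NUL */
len = strlen(command) + (argument ? strlen(argument) : 0) + 2;
if (len >= sizeof(buf))
{
fprintf(stderr, "request too long at line %d\n", __LINE__);
return 1;
}
strcpy(buf, command);
if (argument)
{
strcat(buf, argument);
}
strcat(buf, "\r\n");
bytesWritten = write(fd, buf, len);
if ((bytesWritten < 0) || ((size_t) bytesWritten != len))
{
fprintf(stderr, "bytesWritten at line %d\n", __LINE__);
return 1;
}
return 0;
}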
static void
pop(char *host, char *user, char *password)
{
unsigned char buf[4096];
int fd;
fd = netConnect(NULL, (unsigned char *) host, 110);
if (fd < 0)
{
fprintf(stderr, "netConnect failed\n");
return;
}
if (readReply(fd, buf, sizeof(buf)))
{
fprintf(stderr, "not OK: \"%s\"\n", buf);
close(fd);
return;
}
printf("\"%s\"\n", buf);
if (writeRequest(fd, "USER ", user))
{
close(fd);
return;
}
if (readReply(fd, buf, sizeof(buf)))
{
fprintf(stderr, "not OK: \"%s\"\n", buf);
close(fd);
return;
}
printf("\"%s\"\n", buf);
/* the password is not sent yet; the session ends after USER */
close(fd);
}
int
main(int argc, char *argv[])
{
char *password;
password = NULL;
pop("nsmail-2", "erik", password);
return 0;
}
void
reportContentType(void *a, unsigned char *contentType)
{
}
void
reportHTML(void *a, Input *input)
{
}
void
reportHTMLAttributeName(void *a, HTML *html, Input *input)
{
}
void
reportHTMLAttributeValue(void *a, HTML *html, Input *input)
{
}
void
reportHTMLTag(void *a, HTML *html, Input *input)
{
}
void
reportHTMLText(void *a, Input *input)
{
}
void
reportHTTP(void *a, Input *input)
{
}
void
reportHTTPBody(void *a, Input *input)
{
}
void
reportHTTPCharSet(void *a, unsigned char *charset)
{
}
void
reportHTTPHeaderName(void *a, Input *input)
{
}
void
reportHTTPHeaderValue(void *a, Input *input, unsigned char *url)
{
}
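readReply accepts any reply whose first three bytes are "+OK". A slightly stricter check is sketched below, assuming the reply is already NUL-terminated. It additionally requires the status indicator to be followed by a space, a line ending or the end of the string, which is an added assumption and not something readReply does. The name pop3ReplyIsOK is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if reply starts with the POP3 positive
 * status indicator "+OK" as a whole token, otherwise 0.
 */
static int
pop3ReplyIsOK(const char *reply)
{
	if (strncmp(reply, "+OK", 3) != 0)
	{
		return 0;
	}
	/* reject longer tokens such as "+OKAY" */
	return
	(
		(reply[3] == '\0') ||
		(reply[3] == ' ') ||
		(reply[3] == '\r') ||
		(reply[3] == '\n')
	);
}
```

Note that the return convention is the opposite of readReply, which returns 0 for a positive reply and 1 otherwise.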

View File

@ -0,0 +1,673 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include "html.h"
#include "http.h"
#include "mutex.h"
#include "net.h"
#include "utils.h"
#include "view.h"
typedef int (*Handler)(int fd);
typedef struct FD
{
Handler handler;
int id;
FILE *logFile;
int port;
int suspend;
int writeFD;
} FD;
typedef struct Arg
{
View *view;
} Arg;
mutex_t mainMutex;
static fd_set fdSet;
static int id = 0;
static u_short mainPort = 40404;
static int maxFD = -1;
static FD **table = NULL;
static unsigned char suspendStr[1024];
static char *welcome =
"HTTP/1.0 200 OK\n"
"Content-Type: text/html\n"
"\n"
"<title>Interceptor</title>\n"
"<h3>HTTP Interceptor Persistent Window</h3>\n"
"<p>\n"
"Keep this window alive as long as you want to continue the session.\n"
"It is recommended that you Minimize (Iconify) this window.\n"
"Do not click the Back button in this window.\n"
"Do not load another document in this window.\n"
"</p>\n"
"<script>\n"
"interceptorSuspendResumeWindow =\n"
"window.open\n"
"(\n"
"\"\",\n"
"\"interceptorSuspendResumeWindow\",\n"
"\"menubar,toolbar,location,directories,scrollbars,status\"\n"
");\n"
"interceptorSuspendResumeWindow.document.write(\n"
"\"<title>Welcome to the HTTP Interceptor</title>\" +\n"
"\"<h3>Welcome to the HTTP Interceptor</h3>\" +\n"
"\"<p>\" +\n"
"\"A new HTTP Interceptor session has been started for you. \" +\n"
"\"To start using this session, set your HTTP Proxy preference to the \" +\n"
"\"following. (Edit | Preferences | Advanced | Proxies | \" +\n"
"\"Manual proxy configuration | View | HTTP)\" +\n"
"\"</p>\" +\n"
"\"<pre>\" +\n"
"\"\\n\" +\n"
"\"\\tHTTP Proxy Server Address %s; Port %d\" +\n"
"\"\\n\" +\n"
"\"</pre>\" +\n"
"\"<h3>How to Suspend and Resume Logging</h3>\" +\n"
"\"<p>\" +\n"
"\"You can temporarily suspend and resume the HTTP Interceptor logging \" +\n"
"\"feature by clicking the links below.\" +\n"
"\"</p>\" +\n"
"\"<a href=http://%s:%d/suspend/%d>Suspend Logging</a><br>\" +\n"
"\"<a href=http://%s:%d/resume/%d>Resume Logging</a>\" +\n"
"\"<p>\" +\n"
"\"You may find it useful to drag these links to your Personal Toolbar.\" +\n"
"\"</p>\"\n"
");\n"
"</script>\n"
;
static FD *
addFD(int fd, Handler func)
{
FD *f;
if (fd > maxFD)
{
if (table)
{
table = utilRealloc(table,
(maxFD + 1) * sizeof(*table),
(fd + 1) * sizeof(*table));
}
else
{
table = calloc(fd + 1, sizeof(*table));
}
if (!table)
{
return NULL;
}
maxFD = fd;
}
f = malloc(sizeof(FD));
if (!f)
{
return NULL;
}
f->handler = func;
f->id = -1;
f->logFile = NULL;
f->port = 0;
f->suspend = 0;
f->writeFD = -1;
table[fd] = f;
FD_SET(fd, &fdSet);
return f;
}
static void
removeFD(int fd)
{
FD *f;
f = table[fd];
if (f)
{
FD_CLR(fd, &fdSet);
if (f->logFile && (fileno(f->logFile) == fd))
{
fclose(f->logFile);
}
else
{
close(fd);
}
free(f);
table[fd] = NULL;
}
}
static int
logRequest(FD *f, Input *input)
{
Arg arg;
HTTP *http;
if
(
(table[fileno(f->logFile)]->suspend) ||
(strstr(current(input), suspendStr))
)
{
table[f->writeFD]->suspend = 1;
return 0;
}
table[f->writeFD]->suspend = 0;
fprintf
(
f->logFile,
"<script>\n"
"w%d = window.open(\"\", \"%d\", \"scrollbars\");\n"
"w%d.document.write(\""
"<h3>Client Request</h3>"
"<pre><b>",
f->id,
f->id,
f->id
);
http = httpAlloc();
http->input = input;
arg.view = viewAlloc();
arg.view->backslash = 1;
arg.view->out = f->logFile;
httpParseRequest(http, &arg, "logRequest");
free(arg.view);
httpFree(http);
fprintf(f->logFile, "</b></pre>\");\n</script>\n");
fflush(f->logFile);
return 1;
}
static int
readClientRequest(int fd)
{
FD *f;
Input *input;
f = table[fd];
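input = readAvailableBytes(fd);
if (!inputLength(input))
{
/* end of file or read error: drop both ends of the connection */
inputFree(input);
if (f->writeFD >= 0)
{
removeFD(f->writeFD);
}
removeFD(fd);
return 0;
}
write(f->writeFD, current(input), inputLength(input));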
if (!logRequest(f, input))
{
inputFree(input);
}
return 0;
}
static int
logResponse(FD *f, Input *input)
{
Arg arg;
HTTP *http;
if ((table[fileno(f->logFile)]->suspend) || (f->suspend))
{
return 0;
}
fprintf
(
f->logFile,
"<script>\n"
"w%d.document.write(\""
"<h3>Server Response</h3>"
"<pre><b>",
f->id
);
http = httpAlloc();
http->input = input;
arg.view = viewAlloc();
arg.view->backslash = 1;
arg.view->out = f->logFile;
httpParseStream(http, &arg, "readProxyResponse");
free(arg.view);
httpFree(http);
fprintf(f->logFile, "</b></pre>\");\n</script>\n");
fflush(f->logFile);
return 1;
}
static int
readProxyResponse(int fd)
{
FD *f;
Input *input;
f = table[fd];
input = readStream(fd, "readProxyResponse");
write(f->writeFD, current(input), inputLength(input));
if (!logResponse(f, input))
{
inputFree(input);
}
removeFD(f->writeFD);
removeFD(fd);
return 0;
}
static int
acceptNewClient(int fd)
{
FD *client;
int clientFD;
FD *f;
FD *proxy;
int proxyFD;
clientFD = netAccept(fd);
if (clientFD < 0)
{
fprintf(stderr, "netAccept failed\n");
return 0;
}
client = addFD(clientFD, readClientRequest);
if (!client)
{
fprintf(stderr, "addFD failed\n");
close(clientFD);
return 0;
}
proxyFD = netConnect(NULL, "w3proxy.netscape.com", 8080);
if (proxyFD < 0)
{
fprintf(stderr, "netConnect failed\n");
/* do not leave a client entry without a log file or a peer */
removeFD(clientFD);
return 0;
}
proxy = addFD(proxyFD, readProxyResponse);
if (!proxy)
{
fprintf(stderr, "addFD failed\n");
close(proxyFD);
removeFD(clientFD);
return 0;
}
client->writeFD = proxyFD;
proxy->writeFD = clientFD;
f = table[fd];
client->logFile = f->logFile;
proxy->logFile = f->logFile;
client->id = id;
proxy->id = id;
id++;
return 0;
}
static int
readLoggerRequest(int fd)
{
unsigned char buf[10240];
int bytesRead;
int doSuspend;
FD *f;
FILE *file;
unsigned char *host;
int i;
unsigned char *p;
u_short port;
int proxyListenFD;
unsigned char *resume;
unsigned char *str;
unsigned char *suspend;
int suspendPort;
bytesRead = read(fd, buf, sizeof(buf) - 1);
if (bytesRead < 0)
{
if (errno != ECONNRESET)
{
perror("read");
}
removeFD(fd);
return 0;
}
else if (!bytesRead)
{
removeFD(fd);
return 0;
}
buf[bytesRead] = 0;
resume = "/resume";
suspend = "/suspend";
if (strstr(buf, "/exit"))
{
char *goodbye =
"HTTP/1.0 200 OK\n"
"Content-Type: text/html\n"
"\n"
"Bye!"
;
write(fd, goodbye, strlen(goodbye));
removeFD(fd);
return 1;
}
else if ((strstr(buf, resume)) || (strstr(buf, suspend)))
{
if (strstr(buf, resume))
{
str = resume;
doSuspend = 0;
}
else
{
str = suspend;
doSuspend = 1;
}
p = strstr(buf, str);
p += strlen(str);
if (*p != '/')
{
char *notOK =
"HTTP/1.0 200 OK\n"
"Content-Type: text/html\n"
"\n"
"No slash after command!"
;
write(fd, notOK, strlen(notOK));
removeFD(fd);
return 0;
}
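if (sscanf((char *) p + 1, "%d", &suspendPort) != 1)
{
/* no number after the slash: use a value that matches no entry */
suspendPort = -1;
}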
for (i = 0; i <= maxFD; i++)
{
if (table[i] && (table[i]->port == suspendPort))
{
table[i]->suspend = doSuspend;
break;
}
}
if (i <= maxFD)
{
char *ok =
"HTTP/1.0 200 OK\n"
"Content-Type: text/html\n"
"\n"
"OK!"
;
write(fd, ok, strlen(ok));
}
else
{
char *notOK =
"HTTP/1.0 200 OK\n"
"Content-Type: text/html\n"
"\n"
"Cannot find port number in table!"
;
write(fd, notOK, strlen(notOK));
}
removeFD(fd);
return 0;
}
/* XXX write(1, buf, bytesRead); */
file = fdopen(fd, "w");
if (!file)
{
char *err = "fdopen failed\n";
write(fd, err, strlen(err));
removeFD(fd);
return 0;
}
table[fd]->logFile = file;
port = 0;
proxyListenFD = netListen(NULL, &host, &port);
if (proxyListenFD < 0)
{
fprintf(file, "listen failed\n");
/* removeFD() closes file, because it is this descriptor's logFile */
removeFD(fd);
return 0;
}
f = addFD(proxyListenFD, acceptNewClient);
if (!f)
{
fprintf(file, "addFD failed\n");
close(proxyListenFD);
free(host);
removeFD(fd);
return 0;
}
fprintf
(
file,
welcome,
host,
port,
host,
mainPort,
port,
host,
mainPort,
port
);
sprintf(suspendStr, "http://%s:%d/suspend/%d", host, mainPort, port);
free(host);
fflush(file);
f->logFile = file;
table[fd]->port = port;
return 0;
}
static int
acceptNewLogger(int fd)
{
FD *f;
int newFD;
newFD = netAccept(fd);
if (newFD < 0)
{
fprintf(stderr, "netAccept failed\n");
return 0;
}
f = addFD(newFD, readLoggerRequest);
if (!f)
{
fprintf(stderr, "addFD failed\n");
return 0;
}
return 0;
}
void
reportContentType(void *a, unsigned char *contentType)
{
}
void
reportHTML(void *a, Input *input)
{
Arg *arg;
arg = a;
viewHTML(arg->view, input);
}
void
reportHTMLAttributeName(void *a, HTML *html, Input *input)
{
Arg *arg;
arg = a;
viewHTMLAttributeName(arg->view, input);
}
void
reportHTMLAttributeValue(void *a, HTML *html, Input *input)
{
Arg *arg;
arg = a;
viewHTMLAttributeValue(arg->view, input);
}
void
reportHTMLTag(void *a, HTML *html, Input *input)
{
Arg *arg;
arg = a;
viewHTMLTag(arg->view, input);
}
void
reportHTMLText(void *a, Input *input)
{
Arg *arg;
arg = a;
viewHTMLText(arg->view, input);
}
void
reportHTTP(void *a, Input *input)
{
Arg *arg;
arg = a;
viewHTTP(arg->view, input);
}
void
reportHTTPBody(void *a, Input *input)
{
Arg *arg;
arg = a;
viewHTTP(arg->view, input);
}
void
reportHTTPCharSet(void *a, unsigned char *charset)
{
}
void
reportHTTPHeaderName(void *a, Input *input)
{
Arg *arg;
arg = a;
viewHTTPHeaderName(arg->view, input);
}
void
reportHTTPHeaderValue(void *a, Input *input, unsigned char *url)
{
Arg *arg;
arg = a;
viewHTTPHeaderValue(arg->view, input);
}
void
reportStatus(void *a, char *message, char *file, int line)
{
}
void
reportTime(int task, struct timeval *theTime)
{
}
int
main(int argc, char *argv[])
{
FD *f;
int fd;
fd_set localFDSet;
int ret;
fd = netListen(NULL, NULL, &mainPort);
if (fd < 0)
{
fprintf(stderr, "netListen failed\n");
return 1;
}
f = addFD(fd, acceptNewLogger);
if (!f)
{
fprintf(stderr, "addFD failed\n");
return 1;
}
while (1)
{
localFDSet = fdSet;
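ret = select(maxFD + 1, &localFDSet, NULL, NULL, NULL);
if (ret == -1)
{
perror("select");
/* localFDSet is not meaningful after an error, so start over */
continue;
}
for (fd = 0; fd <= maxFD; fd++)
{
/* an earlier handler in this pass may have removed this entry */
if (table[fd] && FD_ISSET(fd, &localFDSet))
{
if ((*table[fd]->handler)(fd))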
{
for (fd = 0; fd <= maxFD; fd++)
{
removeFD(fd);
}
return 0;
}
}
}
}
return 1;
}
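The main loop above waits in select and calls the handler registered for each readable descriptor. A reduced sketch of one such dispatch step for a single descriptor follows. The names HandlerSketch, bytesSeen, drainOneByte and dispatchOnce are hypothetical, and the one-second timeout is an assumption added so that the sketch cannot block forever; proxy.c passes NULL and waits indefinitely.

```c
#include <stddef.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>

typedef int (*HandlerSketch)(int fd);

/* Hypothetical counter so that the effect of the handler is visible. */
static int bytesSeen = 0;

/* Example handler: consume one byte from the descriptor. */
static int
drainOneByte(int fd)
{
	char c;

	if (read(fd, &c, 1) == 1)
	{
		bytesSeen++;
	}
	return 0;
}

/*
 * Wait up to one second for fd to become readable and call handler if
 * it does. Returns the value of select(): the number of ready
 * descriptors, 0 on timeout, or -1 on error.
 */
static int
dispatchOnce(int fd, HandlerSketch handler)
{
	fd_set readSet;
	struct timeval timeout;
	int ready;

	/* the set is rebuilt on every call, because select() modifies it */
	FD_ZERO(&readSet);
	FD_SET(fd, &readSet);
	timeout.tv_sec = 1;
	timeout.tv_usec = 0;
	ready = select(fd + 1, &readSet, NULL, NULL, &timeout);
	if ((ready > 0) && FD_ISSET(fd, &readSet))
	{
		(*handler)(fd);
	}
	return ready;
}
```

proxy.c gets the same effect by copying the master fdSet into localFDSet before each select call, since select overwrites the set it is given.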

File diff suppressed because it is too large

View File

@ -0,0 +1,4 @@
:
dir=/some/directory
./doRun http://www.somedomain.com/ $dir/somedomain.html
./doRun http://www.somedomain.co.jp/ $dir/somedomain-jp.html -d .jp

View File

@ -0,0 +1,728 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "url.h"
#include "utils.h"
typedef struct StackEntry
{
unsigned char *str;
struct StackEntry *next;
struct StackEntry *previous;
} StackEntry;
typedef struct Stack
{
StackEntry *bottom;
StackEntry *top;
} Stack;
static URL *
urlAlloc(void)
{
URL *result;
result = calloc(sizeof(URL), 1);
if (!result)
{
fprintf(stderr, "cannot calloc URL\n");
exit(1);
}
result->port = -1;
return result;
}
void
urlFree(URL *url)
{
FREE(url->file);
FREE(url->fragment);
FREE(url->host);
FREE(url->login);
FREE(url->net_loc);
FREE(url->params);
FREE(url->password);
FREE(url->path);
FREE(url->pathWithoutFile);
FREE(url->query);
FREE(url->scheme);
FREE(url->url);
FREE(url);
}
static void
urlEmbellish(URL *url)
{
unsigned char *at;
unsigned char *colon;
unsigned char *host;
unsigned char *login;
unsigned char *p;
p = (unsigned char *) strrchr((char *) url->path, '/');
if (p)
{
FREE(url->pathWithoutFile);
url->pathWithoutFile = copySizedString(url->path,
p + 1 - url->path);
p++;
}
else
{
p = url->path;
}
/* drop any stale file name first: the path may now end in '/' */
FREE(url->file);
if (p[0])
{
url->file = copyString(p);
}
if (url->net_loc)
{
at = (unsigned char *) strchr((char *) url->net_loc, '@');
if (at)
{
login = url->net_loc;
colon = (unsigned char *) strchr((char *) login, ':');
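/* urlEmbellish can run twice on one URL (see urlRelative) */
FREE(url->login);
FREE(url->password);
if (colon && (colon < at))
{
url->password = copySizedString(colon + 1,
at - colon - 1);
url->login = copySizedString(login,
colon - login);
}
else
{
url->login = copySizedString(login,
at - login);
}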
host = at + 1;
}
else
{
host = url->net_loc;
}
colon = (unsigned char *) strchr((char *) host, ':');
if (colon)
{
FREE(url->host);
url->host = lowerCase(copySizedString(host,
colon - host));
sscanf((char *) colon + 1, "%d", &url->port);
}
else
{
FREE(url->host);
url->host = lowerCase(copyString(host));
}
}
}
URL *
urlParse(const unsigned char *urlStr)
{
unsigned char c;
unsigned char *net_loc;
unsigned char *p;
unsigned char *path;
unsigned char *str;
URL *url;
if ((!urlStr) || (!*urlStr))
{
return NULL;
}
url = urlAlloc();
url->url = copyString(urlStr);
str = copyString(urlStr);
p = (unsigned char *) strchr((char *) str, '#');
if (p)
{
url->fragment = copyString(p);
*p = 0;
}
p = str;
c = *p;
while
(
(('a' <= c) && (c <= 'z')) ||
(('A' <= c) && (c <= 'Z')) ||
(('0' <= c) && (c <= '9')) ||
(c == '+') ||
(c == '.') ||
(c == '-')
)
{
p++;
c = *p;
}
if ((c == ':') && (p > str))
{
url->scheme = lowerCase(copySizedString(str, p - str));
p++;
}
else
{
p = str;
}
if ((p[0] == '/') && (p[1] == '/'))
{
net_loc = p + 2;
p = (unsigned char *) strchr((char *) net_loc, '/');
if (p)
{
if (p > net_loc)
{
url->net_loc = copySizedString(net_loc,
p - net_loc);
}
}
else
{
if (*net_loc)
{
url->net_loc = copyString(net_loc);
}
p = (unsigned char *) strchr((char *) net_loc, 0);
}
}
path = p;
p = (unsigned char *) strchr((char *) p, '?');
if (p)
{
url->query = copyString(p);
*p = 0;
}
p = path;
p = (unsigned char *) strchr((char *) p, ';');
if (p)
{
url->params = copyString(p);
*p = 0;
}
url->path = copyString(path);
urlEmbellish(url);
free(str);
return url;
}
static unsigned char *
pop(Stack *stack)
{
unsigned char *result;
StackEntry *top;
if (stack->top)
{
top = stack->top;
result = top->str;
stack->top = top->previous;
if (stack->top)
{
stack->top->next = NULL;
}
else
{
stack->bottom = NULL;
}
free(top);
}
else
{
result = NULL;
}
return result;
}
static void
push(Stack *stack, unsigned char *str)
{
StackEntry *entry;
entry = calloc(sizeof(StackEntry), 1);
if (!entry)
{
fprintf(stderr, "cannot calloc StackEntry\n");
exit(1);
}
entry->str = str;
entry->next = NULL;
entry->previous = stack->top;
if (stack->top)
{
stack->top->next = entry;
}
stack->top = entry;
if (!stack->bottom)
{
stack->bottom = entry;
}
}
static unsigned char *
bottom(Stack *stack)
{
StackEntry *bottom;
unsigned char *result;
bottom = stack->bottom;
if (bottom)
{
result = bottom->str;
stack->bottom = bottom->next;
if (stack->bottom)
{
stack->bottom->previous = NULL;
}
free(bottom);
}
else
{
result = NULL;
}
return result;
}
static Stack *
stackAlloc(void)
{
Stack *stack;
stack = calloc(sizeof(Stack), 1);
if (!stack)
{
fprintf(stderr, "cannot calloc Stack\n");
exit(1);
}
return stack;
}
static void
stackFree(Stack *stack)
{
free(stack);
}
static void
urlCanonicalizePath(URL *url)
{
int absolute;
unsigned char *begin;
unsigned char *p;
unsigned char *slash;
Stack *stack;
unsigned char *str;
p = url->path;
if ((!p) || (!*p))
{
return;
}
if (p[0] == '/')
{
absolute = 1;
p++;
}
else
{
absolute = 0;
}
stack = stackAlloc();
while (*p)
{
begin = p;
p = (unsigned char *) strchr((char *) begin, '/');
if (!p)
{
p = (unsigned char *) strchr((char *) begin, 0);
}
if (p == begin)
{
}
else if ((p == begin + 1) && (begin[0] == '.'))
{
}
else if
(
(p == begin + 2) &&
(begin[0] == '.') &&
(begin[1] == '.')
)
{
slash = pop(stack);
str = pop(stack);
if (!str)
{
push(stack, copyString((unsigned char *) ".."));
if (*p)
{
push(stack, copyString(
(unsigned char *) "/"));
}
}
else if (!strcmp((char *) str, ".."))
{
push(stack, str);
push(stack, slash);
push(stack, copyString((unsigned char *) ".."));
if (*p)
{
push(stack, copyString(
(unsigned char *) "/"));
}
}
else
{
free(slash);
free(str);
}
}
else
{
push(stack, copySizedString(begin, p - begin));
if (*p)
{
push(stack, copyString((unsigned char *) "/"));
}
}
if (*p)
{
p++;
}
}
if (absolute)
{
url->path[0] = '/';
url->path[1] = 0;
}
else
{
url->path[0] = 0;
}
while (1)
{
p = bottom(stack);
if (p)
{
strcat((char *) url->path, (char *) p);
free(p);
}
else
{
break;
}
}
stackFree(stack);
}
URL *
urlRelative(const unsigned char *baseURL, const unsigned char *relativeURL)
{
URL *base;
int len;
URL *rel;
unsigned char *tmp;
if ((!baseURL) || (!*baseURL))
{
return urlParse(relativeURL);
}
if ((!relativeURL) || (!*relativeURL))
{
return urlParse(baseURL);
}
rel = urlParse(relativeURL);
if (rel->scheme)
{
return rel;
}
else
{
base = urlParse(baseURL);
if (base->scheme)
{
rel->scheme = copyString(base->scheme);
}
else
{
/* XXX Base is supposed to have scheme. Oh well. */
urlFree(base);
return rel;
}
}
if (rel->net_loc)
{
goto step7;
}
else
{
rel->net_loc = copyString(base->net_loc);
}
if (rel->path && rel->path[0] == '/')
{
goto step7;
}
if ((!rel->path) || (!*rel->path))
{
FREE(rel->path);
rel->path = copyString(base->path);
if (rel->params)
{
goto step7;
}
rel->params = copyString(base->params);
if (rel->query)
{
goto step7;
}
rel->query = copyString(base->query);
goto step7;
}
if (base->pathWithoutFile)
{
tmp = rel->path;
rel->path = appendString(base->pathWithoutFile, rel->path);
FREE(tmp);
}
urlCanonicalizePath(rel);
step7:
len = strlen((char *) rel->scheme);
len += 1; /* ":" */
if (rel->net_loc)
{
len += 2 + strlen((char *) rel->net_loc); /* "//net_loc" */
}
if (rel->path)
{
len += strlen((char *) rel->path);
}
if (rel->params)
{
len += strlen((char *) rel->params);
}
if (rel->query)
{
len += strlen((char *) rel->query);
}
if (rel->fragment)
{
len += strlen((char *) rel->fragment);
}
FREE(rel->url);
rel->url = calloc(len + 1, 1);
if (!rel->url)
{
fprintf(stderr, "cannot calloc url\n");
exit(1);
}
strcpy((char *) rel->url, (char *) rel->scheme);
strcat((char *) rel->url, ":");
if (rel->net_loc)
{
strcat((char *) rel->url, "//");
strcat((char *) rel->url, (char *) rel->net_loc);
}
if (rel->path)
{
strcat((char *) rel->url, (char *) rel->path);
}
if (rel->params)
{
strcat((char *) rel->url, (char *) rel->params);
}
if (rel->query)
{
strcat((char *) rel->url, (char *) rel->query);
}
if (rel->fragment)
{
strcat((char *) rel->url, (char *) rel->fragment);
}
urlEmbellish(rel);
urlFree(base);
return rel;
}
void
urlDecode(unsigned char *url)
{
unsigned char c;
unsigned char *in;
unsigned char *out;
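unsigned int tmp;
in = url;
out = url;
while (1)
{
c = *in++;
if (!c)
{
break;
}
else if (c == '%')
{
/* %x stores into an unsigned int; keep the '%' if no hex digits follow */
tmp = 0;
if (sscanf((char *) in, "%2x", &tmp) != 1)
{
*out++ = c;
continue;
}
if (*in)
{
in++;
if (*in)
{
in++;
}
}
*out++ = (unsigned char) tmp;
}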
else
{
*out++ = c;
}
}
*out++ = 0;
}
#ifdef URL_TEST
static unsigned char *baseURLTest = (unsigned char *) "http://a/b/c/d;p?q#f";
static char *relativeURLTests[] =
{
"g:h", "g:h",
"g", "http://a/b/c/g",
"./g", "http://a/b/c/g",
"g/", "http://a/b/c/g/",
"/g", "http://a/g",
"//g", "http://g",
"?y", "http://a/b/c/d;p?y",
"g?y", "http://a/b/c/g?y",
"g?y/./x", "http://a/b/c/g?y/./x",
"#s", "http://a/b/c/d;p?q#s",
"g#s", "http://a/b/c/g#s",
"g#s/./x", "http://a/b/c/g#s/./x",
"g?y#s", "http://a/b/c/g?y#s",
";x", "http://a/b/c/d;x",
"g;x", "http://a/b/c/g;x",
"g;x?y#s", "http://a/b/c/g;x?y#s",
".", "http://a/b/c/",
"./", "http://a/b/c/",
"..", "http://a/b/",
"../", "http://a/b/",
"../g", "http://a/b/g",
"../..", "http://a/",
"../../", "http://a/",
"../../g", "http://a/g",
"", "http://a/b/c/d;p?q#f",
"../../../g", "http://a/../g",
"../../../../g", "http://a/../../g",
"/./g", "http://a/./g",
"/../g", "http://a/../g",
"g.", "http://a/b/c/g.",
".g", "http://a/b/c/.g",
"g..", "http://a/b/c/g..",
"..g", "http://a/b/c/..g",
"./../g", "http://a/b/g",
"./g/.", "http://a/b/c/g/",
"g/./h", "http://a/b/c/g/h",
"g/../h", "http://a/b/c/h",
"http:g", "http:g",
"http:", "http:",
NULL
};
static unsigned char *loginTest = (unsigned char *)
"ftp://user:password@ftp.domain.com:64000/path1/path2/file#fragment";
static void
printURL(URL *url)
{
printf("url %s\n", url->url);
printf("scheme %s, ", url->scheme ? url->scheme : "NULL");
printf("login %s, ", url->login ? url->login : "NULL");
printf("password %s, ", url->password ? url->password : "NULL");
printf("host %s, ", url->host ? url->host : "NULL");
printf("port %d, ", url->port);
printf("path %s, ", url->path ? url->path : "NULL");
printf("file %s, ", url->file ? url->file : "NULL");
printf("fragment %s\n", url->fragment ? url->fragment : "NULL");
printf("======================================\n");
}
int
main(int argc, char *argv[])
{
int failures;
char **p;
int total;
URL *url;
printURL(urlParse(loginTest));
failures = 0;
total = 0;
p = relativeURLTests;
while (p[0])
{
total++;
url = urlRelative(baseURLTest, (unsigned char *) p[0]);
if (url)
{
if (strcmp((char *) url->url, p[1]))
{
failures++;
printf("urlRelative failed:\n");
printf("\"%s\" +\n", baseURLTest);
printf("\"%s\" =\n", p[0]);
printf("\"%s\"\n", url->url);
printf("should be:\n");
printf("\"%s\"\n", p[1]);
printf("-------------------\n");
}
urlFree(url);
}
else
{
failures++;
printf("urlRelative returned NULL for \"%s\"\n", p[0]);
printf("----------------------------------\n");
}
p += 2;
}
printf("%d failures out of %d\n", failures, total);
return 0;
}
#endif /* URL_TEST */
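
The urlDecode routine above relies on sscanf to read the two hex digits after a '%'. As a point of comparison, here is a minimal sketch of a stricter in-place decoder that checks each digit itself and copies a malformed escape through unchanged. The names hexValue and strictURLDecode are hypothetical and are not part of Web Sniffer.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int
hexValue(unsigned char c)
{
    if (('0' <= c) && (c <= '9'))
    {
        return c - '0';
    }
    if (('a' <= c) && (c <= 'f'))
    {
        return c - 'a' + 10;
    }
    if (('A' <= c) && (c <= 'F'))
    {
        return c - 'A' + 10;
    }
    return -1;
}

/*
 * Decode %XX escapes in place. A '%' that is not followed by exactly
 * two hex digits is copied through as-is. Returns the decoded length.
 */
size_t
strictURLDecode(unsigned char *url)
{
    unsigned char *in = url;
    unsigned char *out = url;
    int hi;
    int lo;

    while (*in)
    {
        /* short-circuit evaluation stops at the terminating NUL */
        if ((in[0] == '%') &&
            ((hi = hexValue(in[1])) >= 0) &&
            ((lo = hexValue(in[2])) >= 0))
        {
            *out++ = (unsigned char) ((hi << 4) | lo);
            in += 3;
        }
        else
        {
            *out++ = *in++;
        }
    }
    *out = 0;
    return (size_t) (out - url);
}
```

Because the output never outruns the input, decoding in place is safe. A decoded %00 still truncates the C string, which is why this sketch also returns the length.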

@ -0,0 +1,60 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _URL_H_
#define _URL_H_
/* <scheme>://<net_loc>/<path>;<params>?<query>#<fragment> */
/* <net_loc> = <login>:<password>@<host>:<port> */
typedef struct URL
{
/* standard components */
unsigned char *scheme;
unsigned char *net_loc;
unsigned char *path;
unsigned char *params;
unsigned char *query;
unsigned char *fragment;
/* the whole url */
unsigned char *url;
/* for convenience */
unsigned char *file;
unsigned char *host;
unsigned char *login;
unsigned char *password;
unsigned char *pathWithoutFile;
int port;
/* for linked list */
struct URL *next;
} URL;
void urlDecode(unsigned char *url);
void urlFree(URL *url);
URL *urlParse(const unsigned char *url);
URL *urlRelative(const unsigned char *baseURL,
const unsigned char *relativeURL);
#endif /* _URL_H_ */
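
The header comment describes net_loc as login:password@host:port. The following is a rough, self-contained sketch of that split, loosely modeled on what urlEmbellish does. The NetLoc type, the fixed buffer sizes, and the names copyPart and splitNetLoc are assumptions made for illustration. Unlike urlEmbellish, the sketch does not lowercase the host, and it does not handle bracketed IPv6 literals.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size result type, used only for this sketch. */
typedef struct NetLoc
{
    char login[64];
    char password[64];
    char host[256];
    int port; /* -1 when no port is given */
} NetLoc;

/* Copy at most dstSize - 1 bytes of src and always terminate dst. */
static void
copyPart(char *dst, size_t dstSize, const char *src, size_t len)
{
    if (len >= dstSize)
    {
        len = dstSize - 1;
    }
    memcpy(dst, src, len);
    dst[len] = 0;
}

void
splitNetLoc(const char *netLoc, NetLoc *out)
{
    const char *at;
    const char *colon;
    const char *host;

    memset(out, 0, sizeof(*out));
    out->port = -1;
    at = strchr(netLoc, '@');
    if (at)
    {
        /* only a ':' before the '@' separates login and password */
        colon = memchr(netLoc, ':', (size_t) (at - netLoc));
        if (colon)
        {
            copyPart(out->login, sizeof(out->login), netLoc,
                (size_t) (colon - netLoc));
            copyPart(out->password, sizeof(out->password), colon + 1,
                (size_t) (at - colon - 1));
        }
        else
        {
            copyPart(out->login, sizeof(out->login), netLoc,
                (size_t) (at - netLoc));
        }
        host = at + 1;
    }
    else
    {
        host = netLoc;
    }
    colon = strchr(host, ':');
    if (colon)
    {
        copyPart(out->host, sizeof(out->host), host,
            (size_t) (colon - host));
        /* a non-numeric port leaves the -1 default in place */
        sscanf(colon + 1, "%d", &out->port);
    }
    else
    {
        copyPart(out->host, sizeof(out->host), host, strlen(host));
    }
}
```

Called on "user:password@ftp.domain.com:64000", the sketch fills in login "user", password "password", host "ftp.domain.com", and port 64000.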

@ -0,0 +1,129 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "utils.h"
unsigned char *
appendString(const unsigned char *str1, const unsigned char *str2)
{
unsigned char *result;
if (!str1)
{
return copyString(str2);
}
if (!str2)
{
return copyString(str1);
}
result = calloc(strlen((char *) str1) + strlen((char *) str2) + 1, 1);
if (!result)
{
fprintf(stderr, "cannot calloc string\n");
exit(1);
}
strcpy((char *) result, (char *) str1);
strcat((char *) result, (char *) str2);
return result;
}
unsigned char *
copyString(const unsigned char *str)
{
unsigned char *result;
if (!str)
{
return NULL;
}
result = (unsigned char *) strdup((char *) str);
if (!result)
{
fprintf(stderr, "cannot strdup string\n");
exit(1);
}
return result;
}
unsigned char *
copySizedString(const unsigned char *str, int size)
{
unsigned char *result;
result = calloc(size + 1, 1);
if (!result)
{
fprintf(stderr, "cannot calloc string\n");
exit(1);
}
strncpy((char *) result, (char *) str, size);
result[size] = 0;
return result;
}
unsigned char *
lowerCase(unsigned char *buf)
{
unsigned char c;
unsigned char *p;
p = buf;
do
{
c = *p;
if (('A' <= c) && (c <= 'Z'))
{
*p = c + 32;
}
p++;
} while (c);
return buf;
}
void *
utilRealloc(void *ptr, size_t oldSize, size_t newSize)
{
unsigned char *end;
unsigned char *p;
unsigned char *ret;
ret = realloc(ptr, newSize);
if (ret && (newSize > oldSize))
{
end = &ret[newSize];
for (p = &ret[oldSize]; p < end; p++)
{
*p = 0;
}
}
return (void *) ret;
}
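
utilRealloc zero-fills whatever a reallocation adds, which suits tables indexed by small integers such as file descriptors: slots that were never written read as zero. Below is a small sketch of that usage pattern under stated assumptions. zeroRealloc is a local stand-in for utilRealloc so the example compiles on its own, and IntTable and intTableSet are hypothetical names. The sketch does not guard against overflow of the doubled capacity.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for utilRealloc: grow a block and zero the new tail. */
static void *
zeroRealloc(void *ptr, size_t oldSize, size_t newSize)
{
    unsigned char *ret = realloc(ptr, newSize);

    if (ret && (newSize > oldSize))
    {
        memset(ret + oldSize, 0, newSize - oldSize);
    }
    return ret;
}

/* Hypothetical growable table of ints, indexed from zero. */
typedef struct IntTable
{
    int *items;
    size_t capacity;
} IntTable;

/* Store value at index, growing as needed. Returns 0, or -1 on failure. */
int
intTableSet(IntTable *t, size_t index, int value)
{
    if (index >= t->capacity)
    {
        size_t newCap = t->capacity ? t->capacity : 8;
        int *grown;

        while (newCap <= index)
        {
            newCap *= 2;
        }
        grown = zeroRealloc(t->items, t->capacity * sizeof(int),
            newCap * sizeof(int));
        if (!grown)
        {
            /* realloc failed: the old block is still valid */
            return -1;
        }
        t->items = grown;
        t->capacity = newCap;
    }
    t->items[index] = value;
    return 0;
}
```

Starting from a zero-initialized IntTable, setting index 20 grows the table to 32 slots, and every slot other than 20 reads as 0.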

@ -0,0 +1,37 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _UTILS_H_
#define _UTILS_H_
#include <stdlib.h> /* size_t, and free() for the FREE macro */
#ifdef FREE
#undef FREE
#endif
#define FREE(p) do { if (p) { free(p); (p) = NULL; } } while (0)
unsigned char *appendString(const unsigned char *str1,
const unsigned char *str2);
unsigned char *copySizedString(const unsigned char *str, int size);
unsigned char *copyString(const unsigned char *str);
unsigned char *lowerCase(unsigned char *buf);
void *utilRealloc(void *ptr, size_t oldSize, size_t newSize);
#endif /* _UTILS_H_ */

@ -0,0 +1,249 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "http.h"
#include "io.h"
#include "view.h"
#define CONTROL_START "<font color=#FF0000>"
#define CONTROL(str) CONTROL_START str CONTROL_END
#define CONTROL_END "</font>"
#define NL "<br>"
static int verbose = 0;
static void
print(View *view, Input *input)
{
char buf[1024];
char *hex;
char hexBuf[4];
int i;
unsigned long inLen;
unsigned long j;
int len;
char *p;
char *replacement;
char *result;
unsigned char *str;
hex = "0123456789ABCDEF";
str = copyMemory(input, &inLen);
buf[1] = 0;
len = 0;
p = NULL;
result = NULL;
for (i = 0; i < 2; i++)
{
for (j = 0; j < inLen; j++)
{
switch (str[j])
{
case '\r':
if ((j + 1 < inLen) && (str[j + 1] == '\n'))
{
j++;
replacement = CONTROL("CRLF") NL;
}
else
{
replacement = CONTROL("CR") NL;
}
break;
case '\n':
replacement = CONTROL("LF") NL;
break;
case '\t':
replacement = CONTROL("TAB");
break;
case 0x1b:
replacement = CONTROL("ESC");
break;
case '<':
replacement = "&lt;";
break;
case '>':
replacement = "&gt;";
break;
case '&':
replacement = "&amp;";
break;
case '\\':
case '"':
if (view->backslash)
{
buf[0] = '\\';
buf[1] = str[j];
buf[2] = 0;
}
else
{
buf[0] = str[j];
buf[1] = 0;
}
replacement = buf;
break;
default:
if ((str[j] <= 0x1f) || (str[j] >= 0x7f))
{
replacement = buf;
strcpy(buf, CONTROL_START);
hexBuf[0] = 'x';
hexBuf[1] = hex[str[j] >> 4];
hexBuf[2] = hex[str[j] & 0x0f];
hexBuf[3] = 0;
strcat(buf, hexBuf);
strcat(buf, CONTROL_END);
}
else
{
replacement = buf;
buf[0] = str[j];
buf[1] = 0;
}
break;
}
if (result)
{
strcpy(p, replacement);
p += strlen(replacement);
}
else
{
len += strlen(replacement);
}
}
if (!result)
{
result = calloc(len + 1, 1);
if (!result)
{
fprintf(stderr,
"cannot calloc toHTML string\n");
exit(1);
}
p = result;
}
}
fprintf(view->out, "%s", result);
free(result);
free(str);
}
void
viewHTML(View *view, Input *input)
{
fprintf(view->out, "<font color=#009900>");
print(view, input);
fprintf(view->out, "</font>");
}
void
viewHTMLAttributeName(View *view, Input *input)
{
fprintf(view->out, "<font color=#FF6600>");
print(view, input);
fprintf(view->out, "</font>");
}
void
viewHTMLAttributeValue(View *view, Input *input)
{
fprintf(view->out, "<font color=#3333FF>");
print(view, input);
fprintf(view->out, "</font>");
}
void
viewHTMLTag(View *view, Input *input)
{
fprintf(view->out, "<font color=#CC33CC>");
print(view, input);
fprintf(view->out, "</font>");
}
void
viewHTMLText(View *view, Input *input)
{
print(view, input);
}
void
viewHTTP(View *view, Input *input)
{
print(view, input);
}
void
viewHTTPHeaderName(View *view, Input *input)
{
fprintf(view->out, "<font color=#FF6600>");
print(view, input);
fprintf(view->out, "</font>");
}
void
viewHTTPHeaderValue(View *view, Input *input)
{
fprintf(view->out, "<font color=#3333FF>");
print(view, input);
fprintf(view->out, "</font>");
}
void
viewVerbose(void)
{
verbose = 1;
}
void
viewReport(View *view, char *str)
{
if (verbose)
{
fprintf(view->out, "%s", str);
fprintf(view->out, "<br>");
fflush(view->out);
}
}
View *
viewAlloc(void)
{
View *view;
view = calloc(sizeof(View), 1);
if (!view)
{
fprintf(stderr, "cannot calloc View\n");
exit(1);
}
return view;
}
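
The print routine above walks its input twice: a first pass adds up the length of every replacement, and a second pass writes into a buffer of exactly that size. Here is a stripped-down sketch of the same two-pass idea, limited to the three HTML metacharacters. The names escapeFor and escapeHTML are hypothetical, and unlike print the sketch takes a NUL-terminated string and returns NULL on allocation failure instead of exiting.

```c
#include <stdlib.h>
#include <string.h>

/* Replacement text for an HTML metacharacter, or NULL for any other byte. */
static const char *
escapeFor(char c)
{
    switch (c)
    {
    case '<':
        return "&lt;";
    case '>':
        return "&gt;";
    case '&':
        return "&amp;";
    default:
        return NULL;
    }
}

/* Returns a newly allocated escaped copy of str; the caller frees it. */
char *
escapeHTML(const char *str)
{
    size_t len = 0;
    const char *s;
    char *result;
    char *p;

    /* pass 1: measure the output */
    for (s = str; *s; s++)
    {
        const char *rep = escapeFor(*s);

        len += rep ? strlen(rep) : 1;
    }
    result = malloc(len + 1);
    if (!result)
    {
        return NULL;
    }
    /* pass 2: write the output */
    p = result;
    for (s = str; *s; s++)
    {
        const char *rep = escapeFor(*s);

        if (rep)
        {
            size_t n = strlen(rep);

            memcpy(p, rep, n);
            p += n;
        }
        else
        {
            *p++ = *s;
        }
    }
    *p = 0;
    return result;
}
```

Measuring first avoids both a fixed-size output buffer and repeated reallocation, at the cost of classifying each byte twice.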

@ -0,0 +1,45 @@
/*
* The contents of this file are subject to the Mozilla Public
* License Version 1.1 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of
* the License at http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS
* IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
* implied. See the License for the specific language governing
* rights and limitations under the License.
*
* The Original Code is Web Sniffer.
*
* The Initial Developer of the Original Code is Erik van der Poel.
* Portions created by Erik van der Poel are
* Copyright (C) 1998,1999,2000 Erik van der Poel.
* All Rights Reserved.
*
* Contributor(s):
*/
#ifndef _VIEW_H_
#define _VIEW_H_
#include <stdio.h>
typedef struct View
{
int backslash;
FILE *out;
} View;
View *viewAlloc(void);
void viewHTML(View *view, Input *input);
void viewHTMLAttributeName(View *view, Input *input);
void viewHTMLAttributeValue(View *view, Input *input);
void viewHTMLTag(View *view, Input *input);
void viewHTMLText(View *view, Input *input);
void viewHTTP(View *view, Input *input);
void viewHTTPHeaderName(View *view, Input *input);
void viewHTTPHeaderValue(View *view, Input *input);
void viewReport(View *view, char *str);
void viewVerbose(void);
#endif /* _VIEW_H_ */