While many development tools include report generators that can test the validity of your XHTML site-wide, they can only act on the code they are aware of. Where pages are dynamically generated using server-side scripting such as ASP or PHP, the only real way of testing the resulting XHTML is to test the content at run-time. The W3C validator is a handy tool for doing this, but it becomes tedious to use interactively across a large number of files or test cases. Additionally, if the site files are part of an extranet or carry confidential information, you won’t want to push the content out to a public service for validation.
An approach I use from time to time is to batch-test site files on an internal network. To do this I use the following tools…
- W3C validator
- wget
- awk (already on your system)
There is a small amount of work to do in setting up your system for this approach.
1. Install the W3C validator locally
Apple have produced and maintain an excellent article on how to install the validator on your Mac OS X system. However, at the time of writing, a few steps were not accurately described…
- The Base code and DTD Library downloads are gzipped tar files (.tar.gz), not plain tar files, so the 2nd and 3rd lines of the instructions will not work as written. If, however, you double-click the downloaded files on the Desktop, each will first expand to a tar file and then to a folder. At that point you will have the following files and folders on your Desktop:
- sgml-lib.tar
- sgml-lib.tar.gz
- validator.tar
- validator.tar.gz
- validator-0.7.2 (folder)
- validator-0.7.2 2 (folder)
- validator-0.7.2 2 actually holds the expanded contents of sgml-lib.tar.gz: an htdocs folder which itself contains an sgml-lib folder.
- The instructions also fail to say where to place the sgml-lib folder. The sample validator.conf file makes it apparent that sgml-lib should sit inside htdocs. So, before going further, I used the Finder to copy the sgml-lib folder from validator-0.7.2 2 to validator-0.7.2/htdocs/
- We can now pick up the instructions on the 4th line which reads “cd /Library/WebServer/Documents”. The remainder of the instructions worked fine for me on Mac OS X 10.4.7
- To test your completed install, visit: http://localhost/validator/htdocs/
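If you would rather stay in the Terminal than double-click the downloads, tar can unpack them directly. This is a sketch, not part of Apple's instructions: it assumes the downloads are named validator.tar.gz and sgml-lib.tar.gz (as listed above) and, as the Finder's “validator-0.7.2 2” folder suggests, that both archives unpack into a folder named validator-0.7.2. If that holds, tar merges the two and sgml-lib lands inside htdocs without a manual copy. The function name unpack_validator is my own.

```shell
# Sketch: unpack the validator downloads in the Terminal instead of the Finder.
# Assumptions: validator.tar.gz and sgml-lib.tar.gz sit in the folder passed as
# $1 (default ~/Desktop), and both archives unpack into validator-0.7.2/.
# unpack_validator is a hypothetical helper name, not part of the validator.
unpack_validator() {
  local dir="${1:-$HOME/Desktop}"
  (
    cd "$dir" || exit 1
    # -x extract, -z decompress with gzip first, -f read from the named file
    tar -xzf validator.tar.gz || exit 1
    # Assumed to unpack into the same validator-0.7.2 folder, adding htdocs/sgml-lib
    tar -xzf sgml-lib.tar.gz || exit 1
    # Check the DTD library is where validator.conf expects it
    if [ -d validator-0.7.2/htdocs/sgml-lib ]; then
      echo "sgml-lib is in place"
    else
      echo "sgml-lib not found in validator-0.7.2/htdocs" >&2
      exit 1
    fi
  )
}
```

For example: unpack_validator ~/Desktop, then pick up the instructions at the 4th line as described above.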
2. Install Fink Commander
As part of installing the validator, you will have installed Fink. Fink Commander provides a graphical front end to Fink; I suggest installing it next.
3. Install wget
Using Fink Commander, download and install wget. This is a command-line utility that we will use to automate the downloading of files.
4. Create a ‘batch’ file
OK, strictly speaking this is not a batch file but a text file of URLs for wget to request. We must first take our list of files and process the text to form suitable URLs. For this I suggest the following steps…
In a Terminal window, navigate to your local webroot folder. To generate a single-column list of all the files within webroot, use the command…
ls -1
We can send the output of one command directly to the next using the pipe symbol “|”. Ultimately we want a file containing the full validator URL for each page we want to check, so the next stage is to restrict ls to the .asp files and prefix each filename with the start of the URL. We do this by ‘piping’ the results from ls into awk, then redirecting the output to the file sourceURLs.txt, which is created one level up so as not to clutter the webroot. So, on one line…
ls -1 *.asp | awk '{printf "http://localhost/validator/htdocs/check?uri=http://www.yourdomain.com/%s\n", $1}' > ../sourceURLs.txt
Note: this is a single command; type it into your terminal as one line even if it wraps on screen. Replace www.yourdomain.com with the address your pages are served from, and *.asp with the extension your pages use.
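The ls/awk command above only sees .asp files at the top level of webroot. If your site uses subfolders, a variation built on find may help. This is a sketch under stated assumptions: make_url_list is a hypothetical helper, and its two defaults are the same placeholder validator address and domain used in the command above. Like the original, it does not URL-encode filenames, so names containing spaces, & or # would need escaping first.

```shell
# Sketch: print one validator "check" URL per .asp file under the current
# folder, including subfolders (ls *.asp only sees the top level).
# make_url_list is a hypothetical helper; both defaults are placeholders.
make_url_list() {
  local validator="${1:-http://localhost/validator/htdocs/check}"
  local site="${2:-http://www.yourdomain.com}"
  # find prints ./path/name.asp; sed drops the leading ./;
  # sort gives a predictable order; awk builds each URL from the whole
  # line ($0) so names are not split at spaces.
  find . -type f -name '*.asp' |
    sed 's|^\./||' |
    sort |
    awk -v v="$validator" -v s="$site" '{ printf "%s?uri=%s/%s\n", v, s, $0 }'
}
```

Run it from the webroot folder as: make_url_list > ../sourceURLs.txt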
5. Process the urls
- move up one level so your working directory contains both sourceURLs.txt and the webroot folder…
cd ..
- create a new folder to contain the validation reports…
mkdir validationReports
- change to that directory
cd validationReports
- run wget with sourceURLs.txt as its input to create a validation report for each page (-i reads the URLs from the file; -E appends .html to each saved report)
wget -Ei ../sourceURLs.txt
- The current folder now contains one HTML report per validated page. Open them in your browser to check the results.
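With many pages, opening every report soon gets tedious. A possible shortcut is to search the saved reports for a phrase that only appears when a check fails. The exact wording depends on your validator version, so rather than guess it, this sketch makes the phrase a required argument: open a report for a page you know is invalid and copy a distinctive phrase from it. list_failures is a hypothetical helper name.

```shell
# Sketch: list saved reports that contain a phrase marking a failed check.
# The phrase is an assumption you supply (copied from a known-failing report),
# since the wording varies between validator versions.
list_failures() {
  if [ -z "$1" ]; then
    echo "usage: list_failures 'phrase from a failing report'" >&2
    return 2
  fi
  # -l prints only matching file names; -F treats the phrase as literal text
  grep -l -F -- "$1" ./*.html 2>/dev/null
}
```

Run it from inside validationReports, for example: list_failures 'phrase copied from a failing report'. It prints nothing (and returns a non-zero status) when no report matches.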