Hello World!

I’m hopping on the bandwagon and getting myself a blog. Don’t expect too much, though. I tend not to update these things…

I intend to have this blog cover code, projects I’m working on, or something related and interesting. However, I will probably post some random stuff, too. I have a couple bad jokes in my head. Keep on the lookout for them. 😀

So, first up, I created a little thing that takes in an XML sitemap, gets all the URLs in that sitemap, and checks each one against the W3C validator. You can take a peek here: michael.lavaveshkul.com/projects/validation/validator.php
I made it check one URL at a time, waiting for each result before starting the next, so that neither my server nor W3C gets flooded with HTTP requests all at once.

It grabs the URLs from the loc tags (inside the url tags) and puts them in the table. If you don’t have JavaScript turned on, it will determine the validity of each page right then, on the server (so the page may take a while to load). If you do, it turns the “Is Valid” text in the upper-right corner into a link and automatically clicks it when the page loads.
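For illustration, here is a minimal sketch of the loc-grabbing step in JavaScript. It is not the code the validator page actually runs: the function name extractSitemapUrls is made up, and it assumes the sitemap is already available as a string. A regular expression is enough for a sketch, though a real XML parser would be more robust, and this version picks up every loc tag without checking that it sits inside a url tag.

```javascript
// Hypothetical helper: collect the text of every <loc> element in a sitemap.
// Assumes `xml` is the sitemap's contents as a string.
function extractSitemapUrls(xml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<]*?)\s*<\/loc>/gi;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;", so undo that for the common case.
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  return urls;
}

// Example usage with a tiny made-up sitemap.
const sampleSitemap =
  "<urlset>" +
  "<url><loc>http://example.com/</loc></url>" +
  "<url><loc> http://example.com/page?a=1&amp;b=2 </loc></url>" +
  "</urlset>";
console.log(extractSitemapUrls(sampleSitemap));
```

Each returned URL would then become one row in the table.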

Clicking that text starts a chain of calls to a little JSON-generating PHP script that determines if the page for the given URL is valid. One call is made for each URL. If you’re wondering how I determine if each page is valid: initially I loaded the whole HTML document of the W3C results page and got the title using DOM. But then I figured it was a waste to grab the whole page when I only wanted the title, so I used stream_get_contents to read only the text around the title tag and check whether “[Invalid]” or “[Valid]” shows up. After cruising around W3C’s validation site, I found out I could actually get the status from the response headers using their API.
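As a rough sketch of the header-based approach: as far as I understand the W3C Markup Validator API documentation, the response carries headers named X-W3C-Validator-Status (with values such as Valid, Invalid, or Abort), X-W3C-Validator-Errors, and X-W3C-Validator-Warnings. The function name readValidatorStatus and the shape of its headers argument (a plain object of header names and values) are assumptions made for this example, not part of the API or of the script described above.

```javascript
// Hypothetical helper: interpret the headers of a validator response.
// `headers` is assumed to be a plain object mapping header names to values.
function readValidatorStatus(headers) {
  // Header names are case-insensitive, so normalize them first.
  const normalized = {};
  for (const name of Object.keys(headers)) {
    normalized[name.toLowerCase()] = String(headers[name]);
  }
  // Fall back to "Unknown" when the status header is missing.
  const status = normalized["x-w3c-validator-status"] || "Unknown";
  return {
    status: status,
    isValid: status === "Valid",
    errors: Number.parseInt(normalized["x-w3c-validator-errors"] || "0", 10),
    warnings: Number.parseInt(normalized["x-w3c-validator-warnings"] || "0", 10),
  };
}

// Example usage with made-up header values.
console.log(readValidatorStatus({
  "X-W3C-Validator-Status": "Invalid",
  "X-W3C-Validator-Errors": "3",
  "X-W3C-Validator-Warnings": "1",
}));
```

Since only the headers are needed, a HEAD request should be enough, which avoids downloading the results page at all.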

I also found out that they want developers to make their programs sleep for at least 1 second between requests when checking a bunch of pages at once. This wasn’t a problem on the PHP side for the non-JS version. I can’t say the same for JavaScript. If you look at the source, I actually wanted to use validationCheck2, but I couldn’t figure out a way to make it wait for a second between requests (besides something like getting the time and running a while loop until it’s 1 second later).
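One possible way to get that pause without a busy loop is to have each request's callback schedule the next request with setTimeout. The sketch below is only an illustration under stated assumptions: checkSequentially, checkUrl, and onResult are hypothetical names (this is not validationCheck2), and checkUrl stands in for whatever makes the actual request and calls back with the result. The schedule parameter defaults to setTimeout and exists only so the timer can be swapped out.

```javascript
// Hypothetical sketch: check URLs one at a time with a pause in between.
// checkUrl(url, done) is an assumed async checker that calls done(result).
function checkSequentially(urls, checkUrl, onResult, delayMs = 1000, schedule = setTimeout) {
  let index = 0;

  function next() {
    if (index >= urls.length) return;
    const url = urls[index++];
    checkUrl(url, function (result) {
      onResult(url, result);
      // Queue the next request instead of blocking in a while loop.
      if (index < urls.length) schedule(next, delayMs);
    });
  }

  next();
}

// Example usage with a fake checker that "validates" instantly.
checkSequentially(
  ["http://example.com/a", "http://example.com/b"],
  function (url, done) { done({ isValid: true }); },
  function (url, result) { console.log(url, result.isValid); },
  50 // short delay for the demo; the W3C guideline mentioned above is 1000 ms
);
```

Because the delay starts only after a response arrives, consecutive requests end up at least delayMs apart, and the browser stays responsive while waiting.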

I’d also like to get the SOAP output from the API for the JavaScript version. However, I don’t think it’s possible to get it directly from w3.org, because the XMLHttpRequest object (usually) isn’t able to call services on another domain. Is there any way for me to get around that without having to hit my own server?

This thing is still a work in progress, but you can try it out now. If you don’t have a sitemap to test with, you can use blah.xml. (Note: If you don’t have javascript enabled, click “Go”.)
