I’m hopping on the bandwagon and getting myself a blog. Don’t expect too much, though. I tend not to update these things…
I intend to have this blog cover code, projects I’m working on, or something related and interesting. However, I will probably post some random stuff, too. I have a couple of bad jokes in my head. Keep on the lookout for them. 😀
So, first up, I created a little thing that takes in an XML sitemap, pulls out all the URLs in that sitemap, and checks each one against the W3C validator. You can take a peek here: michael.lavaveshkul.com/projects/validation/validator.php
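To make the sitemap step concrete, here is a minimal sketch of pulling the URLs out of a sitemap. It is written in Python rather than the PHP the tool actually uses, and the function name `sitemap_urls` is made up for illustration. It assumes the sitemap follows the standard sitemaps.org format, where each page address sits in a `<loc>` element.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        # Skip empty <loc/> elements and trim stray whitespace.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/</loc></url>"
        "<url><loc>http://example.com/about</loc></url>"
        "</urlset>"
    )
    for url in sitemap_urls(sample):
        print(url)
```

The sample sitemap and its example.com addresses are placeholders, not the real sitemap.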
I made it check one URL at a time, with each request waiting for the previous one to finish, so that neither I nor W3C get flooded with HTTP requests all at once.
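The one-at-a-time idea can be sketched roughly like this, again in Python rather than PHP. The names `check_one_at_a_time` and `check` are hypothetical, and the pause between requests is an extra assumption for politeness. The original only says the checks run one after another.

```python
import time


def check_one_at_a_time(urls, check, delay_seconds=1.0, sleep=time.sleep):
    """Run check(url) for each URL in order, never two at once.

    `check` is any function that takes a URL and returns a result.
    `delay_seconds` is an assumed pause between requests, not something
    the original tool is known to do.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results[url] = check(url)  # blocks until this URL is done
    return results


if __name__ == "__main__":
    # Stand-in check that just reports the URL length instead of hitting the network.
    print(check_one_at_a_time(["http://example.com/"], len, delay_seconds=0))
```

Passing `sleep` in as a parameter is only there so the loop can be tried out without actually waiting.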
Clicking that text starts a chain of calls to a little JSON-generating PHP script that determines if the page for the given URL is valid. A call is made for each URL. If you’re wondering how I determine if each page is valid: initially, I loaded the whole HTML document of the validation result and got the title using DOM. But then I figured it was a waste to grab the whole page when I only wanted the title, so I used stream_get_contents to read only the text around the title tag and check whether “[Invalid]” or “[Valid]” shows up. After cruising around W3C’s validation site, I found out I could actually get the status from the headers using their API…
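Here is a rough sketch of what reading the status from the headers might look like, in Python rather than PHP. It assumes the validator is reachable at `https://validator.w3.org/check` with a `uri` query parameter, that it reports the result in an `X-W3C-Validator-Status` response header with values such as `Valid` or `Invalid`, and that it answers `HEAD` requests. Those details come from memory of the W3C Markup Validator's API and are not confirmed by this post. The function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint and header name for the W3C Markup Validator API.
VALIDATOR_ENDPOINT = "https://validator.w3.org/check"
STATUS_HEADER = "X-W3C-Validator-Status"


def validator_url(page_url):
    """Build the validator request URL for one page (hypothetical helper)."""
    return VALIDATOR_ENDPOINT + "?" + urlencode({"uri": page_url})


def status_from_headers(headers):
    """Return the validator status from a dict of headers, or None if absent."""
    for name, value in headers.items():
        if name.lower() == STATUS_HEADER.lower():  # header names are case-insensitive
            return value.strip()
    return None


def is_valid(page_url, timeout=30):
    """Ask the validator about one page using only the response headers."""
    # HEAD avoids downloading the full result page; assumes the service allows it.
    request = Request(validator_url(page_url), method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return status_from_headers(dict(response.headers.items())) == "Valid"
```

Only `is_valid` touches the network. The other two functions are plain string and dictionary handling, so they can be tried out offline.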