I believe that the best way to define what’s in a web site is to create a machine-readable file listing all the assets you think are important enough to be documented. And a machine-readable file, in this case, means that we’re going to talk XML.
Should we use RDF? For anyone who already uses RDF, that would be a no-brainer. If someone were to take the model I set up and create an ontology optimized for describing web sites, then great! Please share it with me when you finish it!
In fact, I wanted to use RDF to describe this. I have no problem figuring out the model RDF uses. However, I have yet to find a good tutorial that makes learning RDF syntax as easy as grasping the model. The code examples I’ve seen are inscrutable.
So I don’t want to use RDF. I want the solution to be as simple as marking up HTML. Tim Bray has argued on many occasions that a successful markup language (or programming language) is one you can view-source and hack around in, with a high degree of confidence that what you do will probably work. That is the sweet spot to aim for in the implementation of the syntax.
Another approach is to build yet another markup language that captures the ontology precisely, and is easy to pick up. I didn’t want to do that either. In a world where everyone and their brother has their own XML-based tag set, yet another one isn’t going to do much good. I’d much rather try to leverage something already popular and easy to use.
So, the solution I want is an XML markup language that is ontology-free (or as free as possible), simple to use, and already deployed.
I can only think of three candidates: RSS, Atom, and XHTML. Before I discuss the implementation, I want to sketch out what I think should be represented and how.