I saw the news about ol’ Bill announcing that WinFS isn’t going to make the Longhorn ship date, and ended up learning a few other things in the process. Here are a few links you might want to check out:
Obviously not a complete list. If you have other links related to powerful, database-driven or database-like search on the OS level, feel free to send me an email.
noted on Wed, 06 Aug 2003
Over at O’Reilly Network, there’s an article in the MacDevCenter covering a small selection of what Giles Turnbull calls electronic brains.
It’s an interesting genre of software: there’s a need for it, but no one seems to want to pay for it. I think the reason is that, generally speaking, we’re not sure what the payoff is unless we’re already highly organized people. So what do you do with software that many people should learn to use, but no one wants to pay for?
This is the kind of software that really ought to be integrated at the OS level, and, arguably, we actually have an ‘electronic brain’ there already. You can store data and pictures on the hard drive; you can organize information into categories (i.e., directories); you can search, either by file name or by the contents of a file (if the contents are text).
Still, many of today’s shipping OSes don’t make great electronic brains. Since I’m a Mac OS X user, I’ll focus my points on that system. So what doesn’t the OS do well in the eBrain department? Well, content-level searches are dog slow, for one thing. My home folder is just under 10.5 GB in size, and performance is great for browsing. However, a search for the word “foo”, which I know I use as often as any geek, took several minutes to complete. This is unacceptable performance.
We need a better way to index our drive contents to make searches faster. Mac OS 9 had this down pat. I’m not sure why Mac OS X is so much slower; perhaps it’s because the file count is so much higher. Perhaps no index is being built. I’d also like to be able to modify search results using an RDF file that maps out some of the relationships between various words. It should be completely user-editable and completely optional. A really simple version of this file, the one most users would get, would be a list of words and their synonyms or antonyms. So, if I had this file, and in it mapped “Blue Jays” to “birds”, then any search for “birds” would turn up articles on “Blue Jays”, even if the “Blue Jay” articles didn’t have the word “birds” in them.
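To make the two ideas above concrete, here is a minimal Python sketch, not a description of how any shipping OS does it. It builds a word index once so that searches don't have to re-read every file, then widens a query using a user-editable mapping. The mapping is a plain dictionary standing in for the RDF file, and every name in it (`build_index`, `expand`, `RELATED`, the sample file names) is made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def expand(term, related):
    """Return the query plus any phrases the user has mapped to it."""
    key = term.lower()
    return [key] + related.get(key, [])

def search(term, index, related):
    """Find documents matching the term or any of its related phrases.

    A phrase matches when a document contains all of its words; this
    simple version does not check that the words are adjacent.
    """
    hits = set()
    for phrase in expand(term, related):
        words = tokenize(phrase)
        if not words:
            continue
        hits |= set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical sample data: file names and contents are invented.
DOCS = {
    "jays.txt": "Blue Jays are loud and clever.",
    "finches.txt": "Finches are small birds.",
    "notes.txt": "foo bar baz",
}

# Stand-in for the user-editable relationship file (assumed format).
RELATED = {"birds": ["blue jays"]}

INDEX = build_index(DOCS)
print(search("birds", INDEX, RELATED))
print(search("foo", INDEX, RELATED))
```

The index is the part that buys speed: a lookup touches one dictionary entry per word instead of scanning 10 GB of files. The mapping is consulted only at query time, so editing or deleting it never requires rebuilding the index.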
I have one other major nit with using OS X as an eBrain: it’s really hard to add metadata to resources like images and music files (in other words, binary data). Well, that’s not really true. Mac OS X comes with iTunes, iPhoto, and iMovie. All these apps let you add metadata such as a title, a description, and the creation date of the file, and it goes into, surprise surprise, XML files. I would love to hit Command-F in the Finder and have it search all the metadata from those apps, returning the results as peers of the text searches it can already do. Throw in email and my bookmarks, too, while you’re at it. Email is stored in text files as well. What’s really needed is a smart indexer capable of working with content from these disparate sources.
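As a rough sketch of what such an indexer might do, the Python below pulls records from an XML property list and from plain text into one common shape, then searches them side by side. The plist layout and its keys (`Images`, `Path`, `Title`, `Comment`) are assumptions made up for this example, not the actual format any iLife app writes, and the function names are hypothetical too.

```python
import plistlib

# Hypothetical stand-in for an app's XML metadata file (assumed layout).
PHOTO_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Images</key>
  <array>
    <dict>
      <key>Path</key><string>Pictures/jay.jpg</string>
      <key>Title</key><string>Blue jay at the feeder</string>
      <key>Comment</key><string>Taken in the back yard</string>
    </dict>
  </array>
</dict>
</plist>"""

def records_from_photos(plist_bytes):
    """Turn each image entry into a searchable record."""
    data = plistlib.loads(plist_bytes)
    for image in data.get("Images", []):
        text = " ".join([image.get("Title", ""), image.get("Comment", "")])
        yield {"source": "photos", "path": image["Path"], "text": text}

def records_from_text(files):
    """Turn plain text files (path -> contents) into the same record shape."""
    for path, body in files.items():
        yield {"source": "files", "path": path, "text": body}

def search(records, term):
    """Case-insensitive substring search across every source at once."""
    needle = term.lower()
    return [(r["source"], r["path"]) for r in records
            if needle in r["text"].lower()]

# Invented sample text file, standing in for documents or email.
TEXT_FILES = {"Documents/birds.txt": "Notes on the blue jay and other birds."}

RECORDS = list(records_from_photos(PHOTO_PLIST)) + list(records_from_text(TEXT_FILES))
print(search(RECORDS, "blue jay"))
```

The point of the common record shape is that the search code never needs to know where a hit came from. Adding email or bookmarks would mean writing one more `records_from_...` function, not touching the search.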
What I’d love to see ain’t Sherlock or Watson. I’m not searching the web. I’m only searching for stuff on my hard drive. Google can’t find this dream app for me... but maybe someone will write this program soon.
Tim Bray’s On Search, the Series is an excellent series of articles that outline the scope of the problem of searching. My wish list isn’t trivial to implement from scratch, but if you’re new to building search applications, then this should be a good source to tap into.
Want to write up this dream app for me? Got time? Don’t know how to build Cocoa Apps? Maybe this book will help.