So, dajobe, being a nice sort, encourages me to say more here; alas, one of the downsides (upsides?) of being a writer with a paying gig is that I often say what I would have otherwise said in a weblog entry as an article for XML.com instead. And so it is this week with the ISWC 2003 event in Florida.
I'm here and spending time writing articles for XML.com (2 so far, maybe a third before I leave), which doesn't leave much time for weblogging the event here. I will say that it's far better attended than I was expecting (470 people! Last time I went to a USENIX event, there were only 200 there), considerably more interesting, and relatively optimistic.
I say some more interesting things (well, *I* think they're more interesting -- let me know if not) about all of this over on XML.com (no link yet, but it'll be up by Thursday...).
In addition, the danger of being a part-time hack, part-time pseudo-journo is saying more than I do, which is something I dislike in others. And since I'm such a painfully slow programmer (or is it that I'm easily distracted? Hmm...), I fear that I regularly say more than I do. Bother.
Of course the real solution to this problem isn't to say nothing, but to just get more stuff done, and then not worry too much about the ratio.
I started this weblog a long time ago because Bijan Parsia, Paul Ford, and I were trolling around with a proposal for a book called Semantic Web For Everyone, our attempt to popularize the semantic web effort, without faking it. We had a very strong proposal, sent it out to a smallish West Coast tech publisher (not O’Reilly) and never even got a reply back from them. The fuckers.
Anyway, we should have kept trolling it around, but we just gave up — stupidly and too soon, it seems now.
12 months later, Bijan has been working for Jim Hendler at the University of Maryland, doing semantic web research, for a year. And now I do, too. I just started this week, a 50% appointment as a faculty researcher (or somesuch silly title); I think this is going to be very good for me. (I don’t know how good it’s going to be for anyone else, yet, but I’ll try hard.)
Bij decided to throw me into the deep water by having me write a rule engine (sorta specialized to RDF and triplestores) using the famous Rete algorithm. I worked on it on and off this week, on Thursday and Friday, and I’ll likely work on it some this weekend — I’m probably about 33% done with an initial spike. (So, think “RDF-friendly Jess-clone” in Python, where “clone” is meant very loosely.)
Probably the best thing I’ve done so far is to coin a (very clever) name for this rule engine; it’s clever because Rete is a kind of directed acyclic graph which functions like an inverted index for figuring out which rules to fire when new facts arrive. So you compile the ruleset into this structure, which makes figuring out whether a new fact matches a rule rather speedy. The facts filter down through this structure in a way reminiscent of the way a Pachinko machine works.
Once I realized this metaphor, it became obvious that the project should be called Pychinko — clever animated logo coming soon…
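To make the Pachinko picture a little more concrete, here is a toy sketch of the indexing idea. It is not Pychinko's code (there isn't any to show yet), and it only handles rules with a single triple pattern, which is roughly the "alpha" half of Rete; the join network that handles multi-pattern rules is left out entirely. Every name in it (ToyNetwork, add_rule, add_fact, the "?x" variable convention) is made up for illustration.

```python
from collections import defaultdict


def is_var(term):
    # Assumed convention: variables are strings like "?x"; all else is constant.
    return isinstance(term, str) and term.startswith("?")


class ToyNetwork:
    """Index of single-pattern rules, loosely in the spirit of Rete's alpha network."""

    def __init__(self):
        # Rules are filed under the predicate of their pattern; the key None
        # holds patterns whose predicate is itself a variable.
        self.by_predicate = defaultdict(list)

    def add_rule(self, name, pattern):
        # "Compiling" here just means filing the rule under its predicate.
        predicate = pattern[1]
        key = None if is_var(predicate) else predicate
        self.by_predicate[key].append((name, pattern))

    @staticmethod
    def match(pattern, fact):
        # Return the variable bindings if the fact fits the pattern, else None.
        bindings = {}
        for pat, val in zip(pattern, fact):
            if is_var(pat):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(pat, val) != val:
                    return None
            elif pat != val:
                return None
        return bindings

    def add_fact(self, fact):
        # The fact drops only into the slots filed under its predicate
        # (plus the wildcard slot), not past every rule in the set.
        candidates = (self.by_predicate.get(fact[1], [])
                      + self.by_predicate.get(None, []))
        fired = []
        for name, pattern in candidates:
            bindings = self.match(pattern, fact)
            if bindings is not None:
                fired.append((name, bindings))
        return fired
```

With a rule filed as ("?a", "foaf:knows", "?b"), adding the fact ("ex:alice", "foaf:knows", "ex:bob") only ever looks at rules under foaf:knows; rules about rdf:type are never touched. That skipping is the whole point of compiling the ruleset first.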
The beauty here is that rule engines are very well understood technology, have a kind of natural elegance for solving an interesting range of problems, mesh nicely with RDF and triplestores generally, and, near as I can tell, not a lot of work has been done in Python on a rule engine… At least, there’s not much free software in this space (which, for our purposes, is what counts — the other desideratum here is to beef up the general utility of Python as a semantic web and web services language, as well as add more value to rdflib).
It’s challenging, rewarding, useful, and fun work. And that’s pretty much why I wanted to learn to program in the first place.
Code — in the MIND lab Subversion repository — coming as soon as I have a working spike. I’m aiming for some time this next week, but we’ll see how it goes.
One thing I'm trying to decide is how best to expose SemChimp's rendering interface (though with only one method so far, it's very minimalist) to other processes, whether local or remote. For local processes, I think I'm going to try a pair of named (FIFO) pipes; that way, just about any local process -- as long as it can read() and write() to and from a file -- can write() a query and read() the results. That'll work nicely, whether in a cron job or a web servlet.
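Roughly what I have in mind for the FIFO pair, as a sketch in Python rather than anything SemChimp actually does today. The render function below is a hypothetical stand-in for the one rendering method, and the pipe names and the one-query-per-open protocol are assumptions of mine. It needs a POSIX system, since os.mkfifo isn't available on Windows.

```python
import os
import tempfile
import threading


def render(query):
    # Hypothetical stand-in for the bot's single rendering method.
    return "rendered: " + query


def serve_once(query_fifo, result_fifo, handler=render):
    # Opening a FIFO blocks until the other end is opened as well, so the
    # server simply waits here until some client shows up.
    with open(query_fifo) as q:
        query = q.read()  # returns once the client closes its write end
    with open(result_fifo, "w") as r:
        r.write(handler(query))


def ask(query_fifo, result_fifo, query):
    # Client side: plain file writes and reads, nothing pipe-specific.
    with open(query_fifo, "w") as q:
        q.write(query)
    with open(result_fifo) as r:
        return r.read()


def demo():
    # Run server and client in one process, just to show the round trip.
    with tempfile.TemporaryDirectory() as workdir:
        query_fifo = os.path.join(workdir, "query")
        result_fifo = os.path.join(workdir, "result")
        os.mkfifo(query_fifo)
        os.mkfifo(result_fifo)
        server = threading.Thread(target=serve_once,
                                  args=(query_fifo, result_fifo))
        server.start()
        answer = ask(query_fifo, result_fifo, "who knows bijan?")
        server.join()
        return answer
```

One caveat with this shape: with a single pair of pipes, two clients writing at once would interleave their queries, so a real version would need one pair per client or some kind of locking.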
As for remote processes, I'll probably start with XML-RPC, if only because it's so godawfully simple. But I'm really aiming for a REST-RDF interface, and Mark Baker's An RDF view of REST will be helpful in sorting that out. This will be especially important when it comes to the rent-a-bot stuff we have in mind, about which more later.
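For the XML-RPC starting point, a minimal sketch using the standard library's xmlrpc.server and xmlrpc.client modules. Again, render is a hypothetical placeholder for the bot's one method, and the host, port, and method name are my assumptions, not a settled interface.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def render(query):
    # Hypothetical stand-in for the bot's single rendering method.
    return "rendered: " + query


def make_server(host="127.0.0.1", port=0):
    # Port 0 asks the OS for any free port; a real deployment would pin one.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(render, "render")
    return server


def demo():
    server = make_server()
    port = server.server_address[1]
    # Serve exactly one request in the background, then stop.
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    with ServerProxy("http://127.0.0.1:%d/" % port) as proxy:
        answer = proxy.render("who knows bijan?")
    worker.join()
    server.server_close()
    return answer
```

A long-running bot would call serve_forever() instead of handle_request(), and this sketch does nothing about authentication, which matters a good deal more for remote callers than for local pipes.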
Hmm, haven't heard back from the publisher we submitted our SemWeb proposal to. That's probably not a great sign; I suspect it's something that struck them as not in their bailiwick. Which is fine.
We'll just send it to the next pub on the list.
But not before modifying the proposal a bit; I think the main problem is that we assume too much knowledge on the part of the reader, who's not likely to know anything about the SemWeb, a defect easily remediable. The other problem is that the outline hints at a book that's likely too long. So we need to trim some stuff. One thing we've talked about is making SemantiChimp the primary application, rather than doing three different ones. That makes good sense as SemantiChimp is shaping up very nicely.
One thing I like about SemChimp is that, as an IRC bot, it very nicely encapsulates some of the things we want to say about the SemWeb, particularly the collaborative-social aspect.
Onion's CAW might be a way to distribute SemantiChimp annotations.
xpath2rss -- Mark Nottingham's scraper. This, plus one of Kip's old columns about using XPath in Perl, will make good bits for the scraping chapter.