- March 25th, 2008
- 10:04 pm
At Trifork, we have had a number of different consulting jobs in the Danish public sector, especially for the National IT and Telecom Agency (which is also sponsoring the development of OpenUDDI). Among other things, we’ve been deeply involved in OIOWSDL, a WSDL profile for the Danish government. This profile specifies how to use WSDL and XML Schema to do contract-first development. All in all quite reasonable – you can always argue about whether the correct choices have been made, but at least you don’t have to make them yourself.
I had the somewhat doubtful honor of ensuring that OIOWSDL is compatible with all the popular platforms out there – which means that I’ve tested OIOWSDL-conforming WSDL and XSD with IBM RAD, BEA WebLogic, .NET 2 and 3, Ruby, Axis 1 and 2, and others. I already knew that web services were complicated, but that exercise really made the complexity clear. As an added bonus, I’ve been haunted by web services ever since because I’m now the guy who knows everything about them. This is also why the term WS-Deathstar is not accurate – it should be WS-Blackhole, because once you’ve come too close, you’ll never get away.
So, it was quite a change when we were asked to help with a REST profile (OIOREST). For me, it made perfect sense – REST provides much easier access to data, and interoperability is also much easier to accomplish. We’ve taken the first steps and written a draft of the profile, which is unfortunately only available in Danish. The draft can be downloaded at oiorest.dk, where you can also find an invitation to a workshop. The workshop is open to everybody who is interested, and its purpose is to gather experiences with and attitudes towards REST.
oiorest.dk also sports two examples of what types of REST services we see. Feel free to play with them, but don’t use them for production.
It will be interesting to see where OIOREST will go from here. My hope is that more data will become publicly accessible – for example the Central Business Register, pollution information, and all the other stuff that’s hidden away right now.
The OpenUDDI server poses an interesting challenge when loaded with large amounts of data. The problem is actually pretty generic:
Given a number of objects where each object has a set of (key, value) tuples associated with it, find the objects that have a specific set of tuples. Preferably in SQL, where the tables could look like this:
CODE:
OBJECTS
------
| Id |
------
| 1  |
| 2  |
| 3  |

VALUES
-----------------------
| ObjId | Key | Value |
-----------------------
| 1     | a   | val1  |
| 1     | b   | val2  |
| 2     | a   | val1  |
| 3     | c   | val1  |
This is just a simple example. In the OpenUDDI server, we're working with about 100,000 entities in the Objects table (Business Entities) and 1.8 million key/value pairs, distributed pretty evenly across the 100k objects. One of the interesting features is that some of the pairs appear in almost all objects while other pairs are close to unique.
When searching for pairs (a, val1) and (b, val2), SQL queries look something like this:
CODE:
SELECT o.*
FROM objects o
JOIN values v1 ON v1.objid = o.id
JOIN values v2 ON v2.objid = o.id
WHERE v1.key = 'a' AND v1.value = 'val1'
  AND v2.key = 'b' AND v2.value = 'val2'
In other words, a dynamic SQL statement whose shape depends on the number of pairs in the query. Most databases cannot process this type of query efficiently, for two reasons:
- Statements cannot be reused, and complexity goes up as the number of pairs increases
- Index utilization is hard because some pairs occur many times while other pairs are almost unique. Only a few database engines always use the most selective pairs as the primary filter key, regardless of the order in the query
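To make the "dynamic statement" point concrete, here is a minimal sketch of how such a query could be generated, one join per pair. It uses Python's built-in sqlite3 module and the toy tables from above purely for illustration; the function names are hypothetical and this is not the OpenUDDI code. The names `values` and `key` are quoted because they are keywords in several SQL dialects.

```python
import sqlite3


def make_example_db():
    """Create the toy OBJECTS/VALUES tables from the example (in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY)")
    conn.execute('CREATE TABLE "values" (objid INTEGER, "key" TEXT, value TEXT)')
    conn.executemany("INSERT INTO objects VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        'INSERT INTO "values" VALUES (?, ?, ?)',
        [(1, "a", "val1"), (1, "b", "val2"), (2, "a", "val1"), (3, "c", "val1")],
    )
    return conn


def build_pair_query(pairs):
    """Return (sql, params) with one self-join of the pair table per pair.

    The SQL text changes with len(pairs), which is why a prepared
    statement for one query size cannot be reused for another.
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    joins, params = [], []
    for i, (key, value) in enumerate(pairs, start=1):
        alias = f"v{i}"
        joins.append(
            f'JOIN "values" {alias} ON {alias}.objid = o.id '
            f'AND {alias}."key" = ? AND {alias}.value = ?'
        )
        params.extend([key, value])
    return "SELECT o.id FROM objects o " + " ".join(joins), params


if __name__ == "__main__":
    conn = make_example_db()
    sql, params = build_pair_query([("a", "val1"), ("b", "val2")])
    print(sql)
    print([row[0] for row in conn.execute(sql, params)])
```

With the toy data, only object 1 carries both (a, val1) and (b, val2), so that is the only id the query returns.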
The question then is how to make this run fast. With OpenUDDI, this type of query can run in just under 1 second on PostgreSQL 8.3. All access is index based, but it's still pretty slow, given that we would like to handle many requests per second.
It would be nice, of course, to be able to say that we've solved the problem, but unfortunately this is not so. For now, we've settled on a heuristic-based optimization: we count all pairs and store the counts in memory. When a new query comes in, we find the pair with the lowest count and use only that pair in the SQL statement. This speeds up the select statement considerably. The only catch is that we now have to post-process the result in memory to filter out any entities that do not match the rest of the pairs.
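The heuristic can be sketched roughly as follows. This is an illustration under assumptions, not the OpenUDDI implementation: it uses Python's sqlite3 with the toy tables from above, the function names are hypothetical, the in-memory counts are assumed to be up to date, and the candidates' pairs are fetched with a fixed-shape join so the remaining pairs can be checked in memory.

```python
import sqlite3
from collections import Counter


def make_example_db():
    """Create the toy OBJECTS/VALUES tables from the example (in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY)")
    conn.execute('CREATE TABLE "values" (objid INTEGER, "key" TEXT, value TEXT)')
    conn.executemany("INSERT INTO objects VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        'INSERT INTO "values" VALUES (?, ?, ?)',
        [(1, "a", "val1"), (1, "b", "val2"), (2, "a", "val1"), (3, "c", "val1")],
    )
    return conn


def load_pair_counts(conn):
    """Count how often each (key, value) pair occurs and keep it in memory."""
    rows = conn.execute(
        'SELECT "key", value, COUNT(*) FROM "values" GROUP BY "key", value'
    )
    return Counter({(key, value): n for key, value, n in rows})


def find_objects(conn, pair_counts, pairs):
    """Query by the rarest pair only, then filter the candidates in memory."""
    wanted = set(pairs)
    rarest = min(wanted, key=lambda pair: pair_counts.get(pair, 0))
    if pair_counts.get(rarest, 0) == 0:
        return []  # assumption: counts are current, so nothing can match
    # Fixed-shape statement: the SQL text is the same for any number of pairs.
    rows = conn.execute(
        'SELECT c.objid, v."key", v.value '
        'FROM "values" c JOIN "values" v ON v.objid = c.objid '
        'WHERE c."key" = ? AND c.value = ?',
        rarest,
    )
    pairs_by_object = {}
    for objid, key, value in rows:
        pairs_by_object.setdefault(objid, set()).add((key, value))
    # Post-processing step: keep only objects that carry every wanted pair.
    return sorted(o for o, have in pairs_by_object.items() if wanted <= have)


if __name__ == "__main__":
    conn = make_example_db()
    counts = load_pair_counts(conn)
    print(find_objects(conn, counts, [("a", "val1"), ("b", "val2")]))
```

In the toy data, (b, val2) occurs once and (a, val1) twice, so the database is asked only about (b, val2) and the (a, val1) check happens in memory.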
The remaining pairs could probably be pushed into SQL too by using a subselect, but for now this solution works.
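One way the subselect idea might look is sketched below: the rarest pair drives the outer query and every remaining pair becomes a correlated EXISTS subselect. This is an untested assumption as far as PostgreSQL performance goes; the sketch only demonstrates the query shape, using Python's sqlite3 and the toy tables from above, with hypothetical function names. Note that the statement is dynamic again, since one subselect is added per remaining pair.

```python
import sqlite3


def make_example_db():
    """Create the toy OBJECTS/VALUES tables from the example (in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY)")
    conn.execute('CREATE TABLE "values" (objid INTEGER, "key" TEXT, value TEXT)')
    conn.executemany("INSERT INTO objects VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        'INSERT INTO "values" VALUES (?, ?, ?)',
        [(1, "a", "val1"), (1, "b", "val2"), (2, "a", "val1"), (3, "c", "val1")],
    )
    return conn


def build_subselect_query(rarest, rest):
    """Return (sql, params): filter on the rarest pair, check the rest via EXISTS."""
    sql = 'SELECT d.objid FROM "values" d WHERE d."key" = ? AND d.value = ?'
    params = list(rarest)
    for key, value in rest:
        # Each remaining pair is verified per candidate row of the outer query.
        sql += (
            ' AND EXISTS (SELECT 1 FROM "values" r'
            ' WHERE r.objid = d.objid AND r."key" = ? AND r.value = ?)'
        )
        params.extend([key, value])
    return sql, params


if __name__ == "__main__":
    conn = make_example_db()
    sql, params = build_subselect_query(("b", "val2"), [("a", "val1")])
    print([row[0] for row in conn.execute(sql, params)])
```

Whether a given planner actually evaluates the rare pair first and the subselects only for the surviving candidates would have to be checked with the database's EXPLAIN output.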
My only problem with this solution is that it seems as if there ought to be a better way. However, I can't think of any, and neither can anybody I've talked to about it.
As I wrote earlier, I spent the first three days of this week on a Kent Beck course.
Kent Beck hurried on to QCon London to give a keynote with more or less the same overall message as in the course: accountability and responsibility are just about everything. When you take responsibility, you earn trust, which in turn enables you to have a better relationship with other people, including developers, customers, managers, and so on.
One interesting bit came up in regard to discipline. I've always said that XP and agile processes take discipline to implement and use. Kent Beck's take on this was just the opposite: not doing XP was hard for him. Instead, it's more or less a question of habit, which is where the problem often lies: changing part of yourself requires an investment, but it's not completely clear when the investment will yield a profit. Ironically, this economic argument is also used to promote XP: push the cost into the future and pull the profit closer, for example by releasing often, not gold plating, and so on.
Adopting an agile process then becomes a question of how you change habits and keep from falling back into the old ones. Leadership is one way, double- and triple-loop learning is another, and there are probably many more. Incidentally, this is exactly the subject Michael and I worked on at university, with just about the same results.
We've just released the new OpenUDDI project site at SourceForge. This will be the new primary project site for OpenUDDI instead of the old site at Softwarebørsen, which was in Danish only. In a short while, we will make our SVN repository public too, but until then, we've uploaded the newest release together with the sources.
If you're interested in OpenUDDI in any way, I encourage you to join the mailing list. We would very much like to hear from anybody using OpenUDDI.
Enticing title? Unfortunately, I don't have much to say about the subject, or at least nothing to say here, but hopefully that will get much better next week. For 3 days (Monday-Wednesday), I will be attending a Trifork special, a course held by Kent Beck, the father of XP and much of the agile movement. I've never experienced Kent Beck live before, but I expect the best. As always, the main conclusion will probably be that the secret consists of two things: experience and discipline. Still, it doesn't hurt to learn some new techniques, which I hope will be the case with this course.
It's been some time since the last update, but I finally got around to updating my Hudson plugin for Eclipse. A couple of new features have been added:
- Name filtering
- Support for Hudson views
Other than that, some smaller bugs have been fixed, and the quite annoying error dialog when Hudson can't be reached doesn't appear anymore unless configured to do so.
The new version, 1.0.4, can be downloaded at http://code.google.com/p/hudson-eclipse/. Comments and suggestions are welcome. Patches are even more welcome.