I’ve spent the last two days giving a course on basic programming and modeling techniques such as UML, design patterns, refactoring, and related subjects. I think it’s the sixth time I’ve done it, so it’s getting a little routine. One of the more fun parts, however, is that I also spend a couple of hours on Color Modeling, most of that time on an exercise where the participants use colored Post-it notes to create a model of a business. They’ve more or less tried the same thing earlier using plain UML without any particular methodology, and the difference is always striking: with Color Modeling, the results are quite consistent and contain more or less the same terms and classes. Without it, the models tend to be much more random, and they don’t really capture the essence of the business domain.
While pretty much everybody at the course can see that it can be a good idea to use Color Modeling, almost nobody actually does it afterwards. Personally, I feel that designs based on Color Modeling are much more consistent with the actual problem domain, so it’s somewhat hard to understand why it’s not more widely known (it’s extremely rare that I run into developers who know about the technique).
I know that Color Modeling is not the answer to everything, and that it has its downsides too. Still, it strikes me as very strange that hardly any developers I know are familiar with any reusable modeling technique that produces consistent models. Why is this? Modeling is hard, so why is there no focus on it? It might be an educational issue – when I was at university, I was taught the only other technique I know: find all the things in the system – these become classes. Find all the actions these things perform – these become the methods on the classes. And that’s more or less it. Isn’t that just a little sad?
- December 13th, 2007
- 11:32 pm
The project I’m working on now uses Oracle for data storage. Not only that, it’s good old 9.2. Try installing that on your brand new Ubuntu. Oh well, it’s doable, but recently we’ve begun discussing whether we could run on MySQL or PostgreSQL instead (I’m voting for PostgreSQL, but that’s mostly because I used MySQL back when foreign keys and other advanced features weren’t exactly implemented, and in the meantime I’ve come to like PostgreSQL. At the very least, hitting Ctrl-C in psql does not exit the prompt like it does in mysql).
The way it works now is that if something goes wrong, we can call either Oracle or our strategic Oracle partner, and they’ll fix it. This is just about everything I’m against: paying large amounts of money just so we have somebody else to blame when things screw up.
There are basically two issues with switching to a more lightweight database, and they’re probably much the same on many projects. First, performance: Oracle probably has a real advantage there, especially in handling large and complex queries. Second, maintenance, especially backup and failover.
The only one of these points I think is valid is backup – it doesn’t help much if it takes a week to back up or restore the data. That said, we’re still left with two things: performance and reliability. The Oracle way (I’m using the term as broadly as possible; replace Oracle with MSSQL or DB2 if you like) is to add more memory, more disk, clustering, high availability. Expensive, but you get to live in your little world where you can just write code against one large database.
The other way, which I prefer (and which is probably in the Web 2.0 spirit) is to distribute. Distribute both data and processing to a number of autonomous nodes which can operate independently of each other. This is no news, and has been done many times, but it’s not something that’s normally considered when building good old business applications.
The result of distributing is essentially that you have to think about how you access your data. Instead of just delegating all of the work to the database and hoping for the best, you’re forced to analyze the data relationships to discover separate components. If this succeeds, the choice of database should no longer be about whether it can optimize a query over 50 tables with subselects, type conversions, views, functions, and other stuff, but whether it is efficient at looking up simple data relations. My guess is that all the popular databases can do this, so you’re free to choose the cheapest, the one that is easiest to work with, or whatever suits your environment.
Returning to the project I’m working on, we have some reservations about some of the queries we’re executing. They’re in Oracle syntax right now, but can probably be converted to regular SQL in a finite amount of time. That doesn’t make them any smaller, but we have a good amount of pretty static data (addresses, classifications, and so on) which is retrieved together with the more dynamic data. If the static data is removed from the SQL, we end up with some much simpler queries, which shouldn’t be a problem for any database engine. The problem that remains is how to retrieve the static data efficiently. At the moment, I’m leaning towards a solution where we implement a service which can take a list of data keys and return the data. Depending on the amount of data, the service can then be implemented as an in-memory map, a memcached cache, or maybe even something like Hadoop. No matter what, basing the model on a basic principle of isolating static data from the dynamic, and only querying the dynamic data, seems like the way to go as a first step – and as a nice side effect, the database’s ability to perform doesn’t matter that much anymore.
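To make the idea concrete, here is a minimal sketch of what such a service could look like – a lookup interface with an in-memory implementation that could later be swapped for a memcached-backed one without touching the callers. All of the names here are hypothetical, not from the actual project:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: callers hand over a list of keys for static
// reference data (addresses, classifications, ...) and get the data back.
interface StaticDataService {
    Map<String, String> lookup(List<String> keys);
}

// Simplest possible implementation: an in-memory map. A memcached- or
// Hadoop-backed version would implement the same interface.
class InMemoryStaticDataService implements StaticDataService {
    private final Map<String, String> data = new HashMap<String, String>();

    public void put(String key, String value) {
        data.put(key, value);
    }

    // Returns only the entries whose keys are known; unknown keys are
    // simply absent from the result.
    public Map<String, String> lookup(List<String> keys) {
        Map<String, String> result = new HashMap<String, String>();
        for (String key : keys) {
            String value = data.get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }
}
```

The point is that the dynamic queries only carry keys for the static data, and the expensive joins disappear from the SQL.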
This probably just sounds like drunken ramblings to those who have actually implemented distributed business systems, but bear with me – it’s a first for me, and I need to get things out of my head before the space runs out.
- December 4th, 2007
- 11:03 pm
I’ve been scheduled to do one of Trifork’s Software Pilot JAOO meetups. It’s a simple concept: we choose a topic, invite some speakers (or use one from Trifork), and people show up to listen and discuss. It’s free for all, so do join us. More information is available at trifork.com. The topic of the last meetup was application security, which Kresten presented, and next time it will be software modeling with a focus on color modeling.
I find color modeling interesting because it’s fun to work with, and it’s very simple to learn and teach. Of course, it helps that it puts those colored post-its to use, but my experience also tells me that it’s a good technique in a number of cases.
More on color modeling at the meetup. Right now, I’d like to reflect a little on the project I’m on at the moment. We’re currently developing a new system for The Danish Medicines Agency which will be used for all medication orders in Denmark. Right now, if you’re admitted to a hospital, you had better be awake, because you’re basically the only one who knows what kind of medication you’re on. If you can’t tell the doctors yourself, there’s no way to know what medication you’ve received recently. This is what the new system is trying to solve.
Now, the system is pretty simple, as it’s basically a data repository accessible via web services. We receive medication data, store it, and send it out again when requested. The web service interfaces were defined long before we started development, so they weren’t a concern for us (beyond using them). The usual (or enterprisey) way to proceed would be to
- Install Axis or something similar
- Generate code based on the WSDL and XSD files
- Receive these types at the topmost layer and convert them to some kind of value objects or data transfer objects
- Map the value objects to a database using Hibernate, EJB3 or another O/R mapper
Inspired by Steve Loughran and Edmund Smith’s Rethinking the Java SOAP Stack, I suggested that we skip the code generation and value object steps and just use XML and JDBC directly. This didn’t exactly receive a warm welcome, but after discussing it further, everybody more or less agreed that the value of the generated code was not exactly clear. Also, there has been some trouble with Hibernate, especially with very large databases and when writing queries spanning a large number of tables with complex joins.
This means that the system is now based on basic JDBC (through Spring JDBC) and XML. The Java DOM API isn’t exactly a pleasure to work with, so I wrote a pretty simple wrapper class for it, which can do something like this:
Namespaces ns = new Namespaces();
XMLObject xml = new XMLObject();
xml.setValue("mc:Card/mc:Patient/mc:Identifier", "identifier", ns);
// num will be null
Long num = xml.getLong("Card/Element/Does/Not/Exist");
String id = xml.getString("Card/Patient/Identifier");
In other words, a very simple API for accessing XML structures. In setValue, elements are created automatically if they don’t already exist, and the get methods will never throw a NullPointerException. Compare this with a similar DOM traversal, where any number of NPEs can occur.
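For comparison, here is a rough sketch of what the equivalent lookup might look like with the plain DOM API – the element names follow the example above, but the class and method are made up for illustration. Every step can come back null, so every step needs a check; this is the boilerplate the wrapper hides:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example, not the project's actual code.
class DomLookup {

    // Parse helper so the example stays short; wraps checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Card/Patient/Identifier the hard way: item(0) returns null for a
    // missing element, so each level must be checked before descending.
    static String patientIdentifier(Document doc) {
        Element card = (Element) doc.getElementsByTagName("Card").item(0);
        if (card == null) return null;
        Element patient = (Element) card.getElementsByTagName("Patient").item(0);
        if (patient == null) return null;
        Element identifier = (Element) patient.getElementsByTagName("Identifier").item(0);
        if (identifier == null) return null;
        return identifier.getTextContent();
    }
}
```

Forget a single one of those null checks and a missing element becomes a NullPointerException instead of a null result.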
How do we then ensure that we actually generate valid XML? The answer is heavy unit testing and schema validation. Just like you should do in any case.
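As a sketch of the schema-validation half, the standard javax.xml.validation API (available since Java 5) is enough to assert in a unit test that generated XML conforms to the XSD. The helper class and the toy schema below are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical test helper: true if the document validates against the schema.
class SchemaCheck {
    static boolean isValid(String schemaXsd, String documentXml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(
                    new StreamSource(new StringReader(schemaXsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(documentXml)));
            return true;
        } catch (Exception e) { // a SAXException means the document is invalid
            return false;
        }
    }
}
```

A unit test can then assert isValid(...) on the XML each web service produces, which catches the mistakes a generated value class would otherwise have caught at compile time.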
We’ve now reached the point where we’ve implemented most of our web services, and for once, I don’t have the feeling that I’ve spent my time writing stupid value classes with no functionality. The jury is still out on whether the design is actually good and scalable, but I think it’s a good approach. Should everybody do it? Probably not. But it definitely shows that you shouldn’t just accept common orthodoxy and do the usual enterprise system without reflecting upon how your model will be influenced.
By the way, this system will be released under an open source license (probably Apache 2.0 or Mozilla Public License 1.1) at Softwarebørsen, the Danish government’s open source repository.