Thursday, April 30, 2009

Data Transparency: Why You Should Care

Cross-posting from my RTI blog:
Most other messaging systems do one of the following instead:
  1. Use opaque data only, and make the application handle marshalling/serialization itself, including data encapsulation, endianness conversions, and the like.
  2. Include complete data structure information with every message, including the names and types of any fields it contains.
Multi-vendor interoperability is fiction without some kind of type system, which disqualifies (1). How can I so much as send you a single string of text if we can’t agree whether that string should be length-prefixed or NUL-terminated, how wide the characters should be, or what encoding they should use? But self-describing messages are awfully heavy on the network, and they fail to capture generalities that already exist in your system.
Read the whole post here.
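
To make the string example concrete, here is a minimal C++ sketch of two wire conventions for the same text. The function names and the one-byte length prefix are assumptions made up for illustration; they are not taken from RTI's products or from any messaging standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

// Convention A (assumed): one length byte, then the characters, no terminator.
Bytes encode_length_prefixed(const std::string& s) {
    assert(s.size() <= 255);  // a one-byte prefix cannot describe more
    Bytes out;
    out.push_back(static_cast<std::uint8_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
    return out;
}

// Convention B (assumed): the characters, then a terminating NUL byte.
Bytes encode_nul_terminated(const std::string& s) {
    Bytes out(s.begin(), s.end());
    out.push_back(0);
    return out;
}

// A receiver that assumes convention A, whatever the sender actually used.
std::string decode_length_prefixed(const Bytes& in) {
    if (in.empty()) return std::string();
    std::size_t n = in[0];
    if (n > in.size() - 1) n = in.size() - 1;  // clamp rather than overrun
    return std::string(in.begin() + 1, in.begin() + 1 + n);
}
```

If the sender uses convention B and the receiver assumes convention A, the receiver treats the first character as a length and returns the wrong text. Nothing in the bytes themselves reveals the mistake, which is why the two sides need a shared type system.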

Taking Responsibility

I'm listening to a segment on Morning Edition contrasting two ways of interrogating terror suspects: one based on torture, carried out by the CIA, and another based on building rapport, carried out by the U.S. military. As you read or listen, pay attention to the use of language.

Matthew Alexander, a leader in the military program, speaks in active voice and even in the first person: "One of my best techniques...."

Michael Hayden, the CIA Director, speaks in passive voice: "The use of these techniques...."

If you can't take personal responsibility for what you do, and do so with dignity, you're probably doing something you shouldn't.

Tuesday, April 28, 2009

C++ Considered Harmful, Part II: Acceptance

In my earlier post, I passed from Anger (having done away with Denial before writing) all the way through to Depression. I've been spoiled by the control and support Java, .Net (C# and C++/CLI), and even C offer me in controlling my software dependencies. C++ has other things going for it; robust dependency management is not one of them. I Accept it.
I might wish for tool chain vendors to change their name mangling schemes in ways that break link compatibility every time they change the inline implementation in their header files in ways that break binary compatibility, but wishing might be tantamount to backsliding. I'm in Accepting mode.
With respect to the specific problem I talked about last time, I've decided to change my approach. I've discovered that I can get rid of the Visual C++ warnings if I never refer to template instantiations by value in DLL-exported classes, only by pointer or reference. (A combination of pure virtual classes and the PImpl pattern -- depending on whether I need to create automatic objects of a particular type -- helps here.) I'm also getting rid of the explicit template instantiations altogether. Let the compiler instantiate those templates whenever, wherever, and as many times as it wants. If the language and its runtime can't save me or my users from ourselves or each other, so be it.
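
Here is a rough sketch of the PImpl half of that approach. The class name Twaddler, its members, and the R_API macro are hypothetical names invented for this example; a real header would also switch R_API to __declspec(dllimport) when compiled by a client.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical export macro; expands to nothing outside Visual C++.
#if defined(_MSC_VER)
#  define R_API __declspec(dllexport)
#else
#  define R_API
#endif

// ---- what would go in the public header ----
class R_API Twaddler {
public:
    Twaddler();
    ~Twaddler();
    void add(int value);
    std::size_t count() const;
private:
    Twaddler(const Twaddler&);             // non-copyable: declared, never defined
    Twaddler& operator=(const Twaddler&);
    struct Impl;   // defined only inside the library
    Impl* impl_;   // a pointer, so no template instantiation is held by value
};

// ---- what would go in the library's .cpp file ----
struct Twaddler::Impl {
    std::vector<int> values;  // the instantiation stays on the library's side
};

Twaddler::Twaddler() : impl_(new Impl) {}
Twaddler::~Twaddler() { delete impl_; }
void Twaddler::add(int value) { impl_->values.push_back(value); }
std::size_t Twaddler::count() const { return impl_->values.size(); }
```

Because the exported class holds only an opaque pointer, its layout does not depend on anyone's definition of std::vector<int>, and client code never has to agree with the library about that definition.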

City People, Country People

Two of the important tenets of the environmental movement are public transit and local food. Has it struck anyone else that these might be conflicting goals?

Practical and pervasive public transit demands density: the maximum number of travelers and destinations in the minimum number of square miles.

Local, beyond-organic agriculture requires more farmers per capita: farmers with substantial land in between them, and farmers in close proximity with non-farmers. Low density.

And it's not just a matter of high or low population density, shared transportation or personal transportation. Urban living is human-centered living on a large, industrial scale. Low-intensity agrarian living is nature-centered and non-industrial -- anti-industrial. Could a hybrid society comprised of modern cities, ringed by farmland, and separated by open space develop organically? Could it support a consensus culture? Are those things among our goals in the first place?

Wednesday, April 22, 2009

RTI Message Service: Less is More

Cross-posting from my RTI blog:
If you’ve been reading up on RTI Message Service, you’ve probably noticed that its message throughput is about an order of magnitude greater, give or take, than that of other JMS implementations. That’s pretty cool. (I led the RTI Message Service team, though, so maybe I’m biased.) It means that you can take JMS-standard technology places that you never could before. If you were thinking about buying more servers, or were wondering whether you’d have to build that new component in C to get better performance, maybe those are things you don’t have to worry about anymore.

But that’s not the subject of this post.
Read the whole post here.

Monday, April 20, 2009

Hugo's there?

The airwaves are abuzz this morning with the news that American President Barack Obama greeted Venezuelan President Hugo Chavez with apparent courtesy and a warm handshake.

The forces of righteous indignation immediately sprang into action. Former Speaker of the House Newt Gingrich declared that he didn't mind if President Obama spoke with President Chavez, but that smiling was inappropriate. President Obama should be sure to speak in a "cold and distant way."

Give me a break.

Gingrich continued: "...And if President Obama needs to address President Chavez by name, he should be sure to say 'president' in a sarcastic and mocking tone and should preferably make little quotes in the air with his fingers. And if President Obama needs to scratch his nose while he's speaking with President Chavez, he should scratch with his middle finger, which he should extend to a distance of at least 1.5 inches beyond his other fingers."

But I shouldn't pick on Former Speaker Gingrich. On a more personal note, I'd like to add that the president made a joke the other day that I did not find funny and that the shine of his shoes has consistently failed to meet my expectations.

Sunday, April 19, 2009

C++ Considered Harmful

I've been working recently on a C++ API for a cross-platform library, an API that will be implemented by several software vendors for a very wide variety of operating systems and compilers. The language has been driving me crazy. The problem is the template—or more broadly, the incredible amount of inline implementation modern C++ makes it so difficult to avoid.

Being a responsible software engineer (and not merely a software craftsperson), I strive to develop in a component-oriented fashion.

(A component, for those unfamiliar with this term of art, is the unit of software deployment. A statically linked executable or a dynamically linked library is a component; a statically linked library, a header file, or a class definition are not.)

That is, I do not develop with the expectation that I am building a monolithic system; I develop in a modular fashion and carefully manage the dependencies between modules.

(A module is the unit of software development; it consists of software that is built together for the purpose of delivering a certain functionality. Software elements within the same module often have privileged access to one another's state and behavior that is not granted to elements in other modules. A component typically consists of one or more modules.)

I expect my users to do the same.


The Problem

When I send my component to my user, he (or she) will develop his software to the interface provided by my software. Then he will deploy a system consisting of at least two components: component U (developed by the user) and component R (developed by yours truly). A given version N of U depends on a particular version P of R—and on versions Q1-Qm of all components on which R itself depends. And therein lies the problem.

The inline implementation in the C++ standard libraries does not comprise a component: it is not versioned and deployed alongside R and U in any way that their developers can control or manage. Instead, it is built into R and U separately. Imagine two puzzle pieces; these represent R and U. But these puzzle pieces are made of wax, and in the white heat of C++ their edges run together.

Let's take a concrete example: std::vector<int>. A vector is not a type that can be parameterized to hold integers; it is a template for a type. A vector-of-int is a type; this type is defined implicitly by the compiler as needed.

Suppose R exports a method twaddle_integers(std::vector<int> integers). Some code in U calls this method. Now both R and U depend on the definition of std::vector<int>; where does this definition live? Although std::vector is defined as part of the C++ standard library, std::vector<int> is not defined in that component. (Actually, it may be; that's not specified. But in any case, that's not the whole story.) What in fact happens is that std::vector<int> is defined in both R and U. In fact, depending on how smart the compiler and linker are, std::vector<int> may be defined multiple times in each of R and U.
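
A small sketch of the two sides of that call, squeezed into one file for illustration. Only the name twaddle_integers and its parameter come from the text above; the int return type, the function body, and call_from_u are assumptions made up for the example.

```cpp
#include <cstddef>
#include <vector>

// std::vector is a template; std::vector<int> is a type. The compiler
// generates the definition of that type in every translation unit that
// mentions it, using whichever <vector> header that unit was compiled with.

// Lives in R. Compiling it instantiates std::vector<int> from R's headers.
int twaddle_integers(std::vector<int> integers) {
    int sum = 0;
    for (std::size_t i = 0; i < integers.size(); ++i) {
        sum += integers[i];
    }
    return sum;
}

// Lives in U. Building the argument instantiates std::vector<int> again,
// this time from U's headers.
int call_from_u() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    return twaddle_integers(v);
}
```

In a single file there is only one set of headers, so the two instantiations trivially agree. Split the functions across two separately built components and nothing in the language checks that they still do.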

Multiple identical template instantiations are one of the reasons C++ has a terrible reputation for generating code bloat. But code bloat from identical template instantiations isn't a big problem for a lot of modern applications. The real problems come about when the instantiations are not identical.

Because each of the components R and U is also a module, and is therefore built as a unit, multiple definitions within them are almost certain to be the same, so passing an object of one instantiation of std::vector<int> to code that assumes it belongs to a different instantiation of std::vector<int> isn't a problem. But R and U are not built at the same time or by the same people. There are multiple implementations of the C++ standard library on each platform, and multiple versions of each one, and there is no way to indicate, except in documentation, which version R expects U to use, and no way to detect, other than by a program crash or data corruption, if the authors of U have made a mistake. And of course, if U depends on both R and another component S, and these two have different dependencies from each other, U is SOL.

Most platforms and compiler tool chains don't provide developers with any support in detecting this type of problem. If I build a dynamic library on Linux, for instance, that provides the twaddle_integers API described above, GCC will happily and silently instantiate std::vector<int> in each of R and U. If these instantiations happen to be compatible, everyone else remains blissfully happy too. But if they aren't, software go boom. Best case: the developers of U spend hours or perhaps days looking at memory layouts in their debuggers and scratching their heads, trying to figure out what's going wrong. Worst case: this step is preceded by a call to the tech support line from an angry end user complaining about buggy software.
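
To see what "software go boom" can look like, here is a deliberately simplified simulation. The two structs are hypothetical layouts invented for this sketch, not the layout of any real standard library; they stand in for the vector-of-int that R and U each compiled in.

```cpp
#include <cstddef>
#include <cstring>

// Layout assumed by R's copy of the library (hypothetical): three pointers.
struct VecLayoutR {
    int* begin;
    int* end;
    int* end_of_storage;
};

// Layout assumed by U's copy of the library (hypothetical): pointer plus counts.
struct VecLayoutU {
    int* data;
    std::size_t size;
    std::size_t capacity;
};

// The sketch relies on both layouts occupying the same number of bytes.
static_assert(sizeof(VecLayoutR) == sizeof(VecLayoutU),
              "sketch assumes pointers and size_t have the same width");

// What R believes the element count is.
std::size_t true_size(const VecLayoutR& v) {
    return static_cast<std::size_t>(v.end - v.begin);
}

// What U reads if it is handed R's bytes and interprets them its own way.
std::size_t size_as_seen_by_u(const VecLayoutR& from_r) {
    VecLayoutU as_u;
    std::memcpy(&as_u, &from_r, sizeof as_u);
    return as_u.size;  // really the bit pattern of R's "end" pointer
}
```

U ends up treating a pointer value as an element count. Neither compiler nor linker complains, because each side is internally consistent; the disagreement only shows up at runtime.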

On Windows, dynamic libraries are treated more rigorously. The situation for developers is therefore more transparent, if simultaneously more difficult. When building a Windows DLL, a software developer has to actually declare which symbols will be exported to users of that DLL. Visual Studio will not automatically export every symbol it sees; it makes you choose.

(From a software engineering standpoint, I would argue that this is the right and proper thing to do. Unfortunately, the C++ language does not define standard syntax for making this sort of declaration, putting rigor and portability at odds with one another. But that's a topic for another day.)

If I try to export my method twaddle_integers(std::vector<int> integers) from DLL R, Visual Studio will issue me a curious warning: it will tell me that std::vector<int> does not have a DLL interface. That is, I am instantiating std::vector<int> in my DLL, and using it in an exported API, but I have not actually declared my intention to export std::vector<int> itself. Of course, I can ignore this warning. I can even put declarations in my code to silence it altogether. If I do one of those things, I will be in the same situation as I am on Linux or anywhere else: multiple instantiations that may or may not work together.

The better thing for me to do is to explicitly export std::vector<int> from R. Visual Studio offers me syntax to do this (nonstandard, of course, since as far as C++ is concerned, all software is always statically linked) and also to indicate to U that it should import this definition from R. All is well with the world: every component that uses a given type uses a common definition of that type, and that definition lives in exactly one component.

Or does it? You see, the C++ standard defines std::vector and it defines int, but its implementation on Windows does not export a definition of std::vector<int>. (...Or, for that matter, std::set<float> or std::map<std::pair<char, std::string>, unsigned short>. A moment's thought and you will understand why this is not a solvable problem.) If, as above, U depends on both R and S, both of which require std::vector<int> and both of which have followed the best practice of explicitly instantiating and exporting that type, the authors of U will encounter multiply defined symbols and be unable to link.

(The situation is even more depressing than this. Many of the templates defined by the C++ standard depend on other templates. In order to export a template instantiation from a DLL, you have to also export instantiations of every one of those templates. Because of recursive definitions in some of these templates, this is not possible to do. There's a reason I've used std::vector<int> in this example: std::vector is the only one of the myriad standard containers whose instantiations can be exported from a DLL.)


The Seven-Per-Cent Solution

None of this is to say that C++ does not have many fine qualities. Anyone developing an application who requires the level of performance and determinism that is only available from unmanaged code, and who would also like a level of expressiveness greater than that of C, is pretty much stuck with it—and by and large they're happy. But I would argue that, while C++ may be very well suited to the development of highly performant, highly flexible monolithic software, it is very poorly suited to the development of componentized software. This is a sobering thought, because I believe that the future for the development of software, as it was for the development of physical goods during our last industrial revolution, lies in highly componentized, piece-wise development. 

But let's return from our academic discussion to the real world: I really am developing "R," and I have to put something in my header files. Here's what I plan to do:

  1. Define typedefs for all of the template instantiations I use in order to keep track of them. (Note that a typedef is just a name alias; it does not trigger a template instantiation.)
  2. On Windows only, explicitly instantiate everything I need and export those instantiations. There's no point in subjecting non-Windows users to the pain of link errors when their tool chains can't do anything useful with those explicit instantiations anyway.
  3. Wrap all of those template instantiations and exports in a preprocessor switch so that anyone afflicted with multiply defined symbols can switch my definitions off. Anyone doing this, of course, is explicitly taking responsibility for building their software using template definitions compatible with mine and for any problems they might have if they do not.
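
The three steps above might look roughly like this. Every macro name here (R_BUILDING_DLL, R_API, R_EXTERN_TEMPLATE, R_NO_TEMPLATE_EXPORTS) and the IntVector alias are hypothetical, and the body of twaddle_integers is invented. The export line follows the general pattern Microsoft has documented for exporting a standard container instantiation, but treat the details as a sketch rather than a tested recipe.

```cpp
// Normally set by R's own build system; defined here only so that the
// sketch compiles as a single file.
#define R_BUILDING_DLL

#include <numeric>
#include <vector>

#if defined(_MSC_VER)
#  if defined(R_BUILDING_DLL)
#    define R_API __declspec(dllexport)
#    define R_EXTERN_TEMPLATE
#  else
#    define R_API __declspec(dllimport)
#    define R_EXTERN_TEMPLATE extern
#  endif
#else
#  define R_API
#endif

// Step 1: one alias per instantiation used in the API. A typedef only
// names the type; it does not instantiate anything.
typedef std::vector<int> IntVector;

// Steps 2 and 3: explicit instantiation and export, on Windows only, and
// only if the user has not switched it off with R_NO_TEMPLATE_EXPORTS.
#if defined(_MSC_VER) && !defined(R_NO_TEMPLATE_EXPORTS)
R_EXTERN_TEMPLATE template class R_API std::vector<int>;
#endif

// The exported API refers to the alias.
R_API int twaddle_integers(IntVector integers);

// ---- library source: an invented body so the sketch can run ----
int twaddle_integers(IntVector integers) {
    return std::accumulate(integers.begin(), integers.end(), 0);
}
```

On non-Windows tool chains the whole instantiation block compiles away, leaving only the typedef and the function declaration, which matches the intent of step 2.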

My "solution" is imperfect, and it's downright ugly. Got a better idea?


Afterword: Alternative Approaches

There are solutions to all of this madness. The problem is not with generic programming itself; it is with the generative compile-time approach to generic programming taken by C++. Other languages take different approaches, and therefore don't have the same problems.

Java provides generic types using a compile-time mechanism called erasure: generic types are real types; their "instantiations" are not. In fact, they have no instantiations at all: generic parameters constitute syntactic sugar only and do not exist (except as type casts) at runtime. This approach has been criticized as a kludge, but it enables certain unique and useful capabilities such as the use of wildcards in generic parameters.

C#, like C++, does instantiate all generic types, but it does so at runtime, thus ensuring that only one definition of any instantiation ever exists. And since the generic types that define these parameters are types themselves, and not just fancy macros, they exist, properly versioned, in some particular component. This approach offers superior performance when compared with the Java approach at the cost of the loss of wildcard parameter syntax and of a heavier runtime system capable of maintaining each generic instantiation.

These languages, and the extensive libraries that come with them, have a lot to offer software developers. They are both managed languages, and management itself offers many benefits as well—but at a substantial runtime cost, putting language platforms that offer it out of bounds for many applications.

Monday, April 6, 2009

Saturday, April 4, 2009

Words for the Wise

Ancient Japanese proverb:

Bushi wa kuwanedo takayoji.
A samurai, even when he has not eaten, uses his toothpick (like a lord).

I found two interpretations of this proverb.
First, that a samurai is expected to behave properly, regardless of his current circumstances.
Second, that a samurai will graciously refuse to impose on others, even when he has nothing.

It's sort of funnier if you don't think about what it might mean.

Friday, April 3, 2009

Party of No

Size of President Bush's last budget: $3.1 trillion (not including Iraq or Afghanistan wars)
Annual cost of Iraq war: more than $230 billion
Annual cost of Afghanistan war: $55 billion
Total Bush "budget": $3.4 trillion

Size of President Obama's budget: $3.5 trillion (including Iraq and Afghanistan wars)

Obama budget / Bush budget: 103%


Translation: President Obama's budget for 2009 is almost exactly the same size as President Bush's effective budget was in 2008.

House Republican votes for President Obama's budget: zero