Cathedral of the Bizarre: Yes, a Federated Twitter Is Possible

Dan Wineman asks whether a federated Twitter is possible (via John Gruber). By this he means a network that is not governed by any single authority but still provides an experience qualitatively similar to Twitter's. In his words:

Immediacy: if a post has been made by someone I follow, I can see it in my timeline right away (or close enough that I don’t notice the difference).

Chronology: posts always appear in order by time posted.

Monotonicity: timelines grow only from the top; older posts are never retroactively inserted.

The short answer is Yes.

The slightly longer answer is It Depends.

The much longer answer follows.

Dan answers his question with a speculative No, and his argument rests essentially on the assumption that a centrally controlled system can provide traditional ACID (atomicity, consistency, isolation, durability) guarantees while a decentralized system cannot. That assumption has two aspects: (1) federation with respect to the flow of data and (2) federation with respect to organizational boundaries.

1. Data Consistency in Distributed Systems

If we follow Dan's line of argument, we must conclude that every tweet has to be stored in a single, strongly consistent database in a single physical location. As of early 2008, that was in fact mostly how Twitter was implemented, albeit with multiple database shards fronted by a distributed in-memory cache. But the Twitter team didn't think they could scale that design (and no doubt many would agree with them). By early 2010, they were moving to a more explicitly distributed architecture based on Apache Cassandra. In other words, as of 2012, tweet storage is already federated at the physical level, and in one form or another it has been for a long time.

Latency is the key. We can rephrase the original question like this: we want the federates in a globally distributed system to synchronize their data well enough that any isolated client only rarely observes posts out of order. Can the system reach that level of synchronization quickly enough that the client still considers it responsive? Yes.
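One way to see how latency buys ordering: a client (or the user's home server) can hold each incoming post back for a fixed window before showing it, then release posts in timestamp order. If every post arrives within the window, the displayed timeline is chronological and grows only at the top, which are Dan's second and third criteria; the price is that each post appears roughly one window after it was posted. The sketch below is a minimal illustration under that assumption, not a description of how Twitter or any federation works; `HoldBackTimeline`, `Post`, and the window value are hypothetical, and it also assumes timestamps from different servers are comparable. Stragglers that miss the window are simply set aside here.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Post:
    # Posts compare by (posted_at, post_id); post_id breaks timestamp ties.
    posted_at: float
    post_id: str
    author: str = field(compare=False)
    text: str = field(compare=False)


class HoldBackTimeline:
    """Shows posts in timestamp order once they are `window` seconds old.

    Assumes every post reaches this timeline within `window` seconds of its
    timestamp, and that timestamps from different servers are comparable.
    """

    def __init__(self, window: float):
        self.window = window
        self._pending: list[Post] = []  # min-heap of posts not yet shown
        self.shown: list[Post] = []     # displayed posts, oldest first
        self.late: list[Post] = []      # posts that missed the window

    def receive(self, post: Post) -> None:
        if self.shown and post < self.shown[-1]:
            # Showing it now would put an older post beneath the top of the
            # timeline, breaking monotonicity, so set it aside instead.
            self.late.append(post)
        else:
            heapq.heappush(self._pending, post)

    def tick(self, now: float) -> None:
        # Release every pending post stamped at or before now - window.
        cutoff = now - self.window
        while self._pending and self._pending[0].posted_at <= cutoff:
            self.shown.append(heapq.heappop(self._pending))
```

With a 10-second window, a post stamped 95 that arrives after one stamped 100 is still shown first; a post stamped 99 that turns up after 100 has been displayed is set aside rather than inserted into the visible timeline. Choosing the window is exactly the latency trade-off discussed here.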

Consider that the participants in a Twitter conversation are almost never also communicating over a faster back channel at the same time, so they don't necessarily know how long a tweet took to propagate. And unlike in a spoken conversation, they are almost never waiting for a reply before continuing. Therefore, latencies that would be totally unacceptable for speech or even IM may be perfectly acceptable for Twitter: at least 10 seconds, and perhaps a minute or more. Consider also that the only data that needs to be synchronized among a given group of participants is the set of tweets making up their conversation; other tweets are not observed. And because users declare ahead of time whom they wish to follow, the system has substantial foreknowledge of what those groupings of people and tweets are likely to be, and can plan for them.
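Because follow relationships are declared in advance, each server can work out ahead of time which other servers need a given author's posts, and send each post once per destination server rather than once per follower. A minimal sketch, assuming a hypothetical mapping from each user to a home server and an in-memory follow graph; the user names and domains are illustrative only:

```python
from collections import defaultdict


def plan_fanout(follows, home_server):
    """Return author -> set of remote servers that must receive their posts.

    follows:     follower -> set of authors they follow
    home_server: user -> server hosting that user's account (hypothetical)
    """
    routes = defaultdict(set)
    for follower, authors in follows.items():
        dest = home_server[follower]
        for author in authors:
            # Followers on the author's own server are served locally.
            if dest != home_server[author]:
                routes[author].add(dest)
    return dict(routes)


home = {"alice": "a.example", "bob": "b.example",
        "carol": "b.example", "dave": "c.example"}
follows = {"bob": {"alice"}, "carol": {"alice", "dave"}, "dave": {"bob"}}
plan = plan_fanout(follows, home)
# alice's posts go to b.example once, even though two of her followers live there.
```

The same table also tells each server which remote authors it depends on, so it knows which incoming streams it must keep up with to present a complete conversation to its own users.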

2. Distributing Data in Open Systems

A single organization can make a lot of compromises in order to achieve its performance goals: it can use highly optimized proprietary protocols in ad hoc ways, and it can tune each component of the system based on total knowledge of the configurations of every other component. These things are not feasible when there is no single authority over the entire network. Federated systems require formal interfaces, including well-known data formats (the structure of a tweet is already well known today) and protocols. Enforcing those interfaces will have associated direct and indirect costs.
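A formal interface largely means that every server accepts and produces exactly the same record shape and rejects anything else. As a sketch only, here is what a strict wire format for a post might look like; the field names, the `user@server` addressing scheme, and the choice of JSON are assumptions for illustration, not an existing federation standard. The 140-character limit is Twitter's own as of this writing.

```python
import json
from dataclasses import asdict, dataclass, fields

MAX_TEXT = 140  # Twitter's limit in 2012; federates would have to agree on one


@dataclass(frozen=True)
class WirePost:
    id: str          # globally unique, e.g. "a.example/123" (hypothetical scheme)
    author: str      # "user@server" (hypothetical addressing scheme)
    created_at: int  # milliseconds since the Unix epoch, UTC
    text: str

    def __post_init__(self):
        # Validate on construction so a malformed post never enters the system.
        if not isinstance(self.created_at, int):
            raise ValueError("created_at must be an integer")
        if "@" not in self.author:
            raise ValueError("author must look like user@server")
        if len(self.text) > MAX_TEXT:
            raise ValueError(f"text longer than {MAX_TEXT} characters")


FIELD_NAMES = {f.name for f in fields(WirePost)}


def encode(post: WirePost) -> str:
    return json.dumps(asdict(post), sort_keys=True)


def decode(raw: str) -> WirePost:
    data = json.loads(raw)
    # Reject records with missing or unknown fields rather than guessing.
    if not isinstance(data, dict) or set(data) != FIELD_NAMES:
        raise ValueError("record does not match the agreed post format")
    return WirePost(**data)
```

Rejecting unknown fields is one design choice among several; a federation that expects its format to evolve might instead ignore unknown fields, trading strictness for easier upgrades. Either way, the rule itself has to be part of the agreed interface.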

There is plenty of room for improvements that could recoup those costs. For example, today Twitter clients and servers often communicate by sending tweet data as text over HTTP. That is not a terribly efficient way to communicate, especially when latency is the overriding concern. One could fairly ask whether a federated Twitter-like service, which would add server-to-server communication, would be feasible with a text-over-HTTP backbone. But there are many alternative data encodings and protocols to choose from, so this need not be a significant barrier. (Thrift, Protocol Buffers, and DDS all come to mind.)
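To give a rough sense of what a binary encoding saves, the sketch below packs a post into a fixed binary layout with Python's standard `struct` module and compares it with a JSON text encoding of the same fields. This hand-rolled layout is only a stand-in: it is not Thrift, Protocol Buffers, or DDS, which add schemas, versioning, and more compact integer encodings. The numeric author ID is also an assumption. Payload size is only part of the latency story; per-request HTTP overhead and connection setup matter too.

```python
import json
import struct

# Layout (big-endian): post id, author id, created_at in ms, each an unsigned
# 64-bit integer, then a 2-byte length prefix and the UTF-8 text.
HEADER = struct.Struct(">QQQH")


def pack_post(post_id: int, author_id: int, created_ms: int, text: str) -> bytes:
    body = text.encode("utf-8")
    # struct.error is raised if the text exceeds 65535 bytes.
    return HEADER.pack(post_id, author_id, created_ms, len(body)) + body


def unpack_post(buf: bytes):
    post_id, author_id, created_ms, n = HEADER.unpack_from(buf)
    text = buf[HEADER.size:HEADER.size + n].decode("utf-8")
    return post_id, author_id, created_ms, text


def as_json(post_id: int, author_id: int, created_ms: int, text: str) -> bytes:
    return json.dumps({"id": post_id, "author_id": author_id,
                       "created_at": created_ms, "text": text}).encode("utf-8")


binary = pack_post(1, 2, 1346544000000, "hello")
json_bytes = as_json(1, 2, 1346544000000, "hello")
print(len(binary), len(json_bytes))  # binary is 26 header bytes + 5 text bytes
```

For short posts the fixed field names and decimal digits dominate the JSON form, which is where a schema-based binary encoding gains the most.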

For Another Time: Brewer's Theorem

Dan's three criteria read to me like an alternative way of slicing the Consistency, Availability, and Partition-tolerance (CAP) axes of Brewer's Theorem. Those classical axes, however, do not explicitly address time, and Dan's criteria do, to his credit. Time is a critical element in the design and implementation of any distributed system, and it is too often treated rather lazily. For instance, it is meaningless to call a distributed system "immediately responsive" when "immediate" has no single definition across the entire system. Data takes non-zero time to propagate and to be processed, and the acceptable threshold for latency (not to mention jitter) depends on whether the participants are humans, automated processes, or both, and on what their jobs are.
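Dan's chronology criterion runs straight into this problem, because "time posted" is read from each server's own clock. One technique from the distributed-systems literature, the hybrid logical clock (HLC) of Kulkarni et al., is a good illustration: its timestamps track physical time closely but never run backwards, and a post stamped after a server has seen another post always sorts after it, even when the local clock is behind. A minimal Python sketch follows; the class name and the injectable `physical_ms` parameter are choices made here for testability, not part of any particular library.

```python
import time


class HybridLogicalClock:
    """Timestamps are (l, c): l is the largest physical time seen (ms),
    c is a counter that orders events sharing the same l."""

    def __init__(self, physical_ms=lambda: int(time.time() * 1000)):
        self._pt = physical_ms  # injectable physical clock, for testing
        self.l = 0
        self.c = 0

    def now(self):
        """Timestamp a local event, such as posting."""
        prev = self.l
        self.l = max(prev, self._pt())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def observe(self, remote):
        """Merge the timestamp carried on a post received from another server."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self._pt())
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)


a = HybridLogicalClock(physical_ms=lambda: 1000)  # server A, clock ahead
b = HybridLogicalClock(physical_ms=lambda: 900)   # server B, clock behind
t1 = a.now()        # a post on A: (1000, 0)
t2 = b.observe(t1)  # B receives it: (1000, 1)
t3 = b.now()        # a reply on B: (1000, 2), still sorts after the post
```

Comparing the tuples lexicographically gives an order that respects "seen before" even though B's clock reads 100 ms earlier than A's; it does not, of course, remove the propagation delay itself.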

Sadly, a reformulation of the CAP Theorem will have to wait for another day.

Sunday, September 2, 2012