
Why Technology Hype is a Good Thing

Hype for technologies is usually seen as a bad thing™.

While I don’t want to deny that there are downsides to this, I also see an upside.

tl;dr: Don’t consider hype an annoying fashion, but rather a massively distributed mechanism for figuring out value more quickly.

Hype gets a technology and the ideas behind it exposure with a lot of people. Not all ideas are going to be good, and the more people take a good look at one, the earlier we will find out what is good and what isn’t.

History is full of this: XML started out as a document structuring format. But soon after it became popular, people started using it for everything, including computer to computer communication (SOAP, WS-*).

And that was very very very popular. Until we collectively realised that we’ve taken things a little too far. XML is a fine format for the things it was designed to do, but it is not a good generic data protocol format.

Some of the good ideas of XML survive, though: validation is still useful, but it is more optional in the JSON world we live in today. There are enough cases where validation is not needed, or would be a prohibitive barrier to entry. We know that now. So we made it opt-in.

Another thing I see hyped these days (for a while now, actually) is Docker. I don’t have too much experience with it, but it appears that people are starting to do everything with Docker. And I’m excited about this. Not because I believe Docker is the be-all and end-all, but because we will collectively find out more quickly which of the ideas behind Docker are good and which are not.

The more popular something gets, the more critics it attracts. And from the standpoint of community-wide learning and understanding, that’s a good thing. Criticism that has merit will spread, and we will all learn what the good ideas are.

In the JavaScript world, React is all the rage. At JSConf EU, we got a lot of proposals for React-ish talks, which is just another way of saying that people are doing everything with React.

We don’t know yet what the good ideas behind React are going to be, so we have to have a little patience before things shake out. But they will.

To hype!

Engaging With Hateful People in Your Community Lends Legitimacy to Their Presence

A friend of mine, a woman who wishes to remain anonymous, privately tweeted this discussion earlier. I believe anyone involved with Open Source communities should read and understand it. This is published with her consent.

People, and men in particular, let’s talk about how we deal with people from hate movements showing up in our repos.

OK so for everyone’s context, screencaps of a Gamergater commenting on a Node repo. We need to talk about this.

And to be clear, I’m not shaming @mikeal or @izs (you’re awesome). IMO there’s something broken in our response when male supremacists …

… show up, and men who clearly care about inclusiveness are getting this wrong. So I simply hope my tweets will add useful perspective.

For you men the screencapped exchange might read as
(1) horrible male supremacist argues
(2) rebuttal by @mikeal & @izs
Nothing wrong with (2)

So you might be surprised that to me it reads very differently:

(1) male supremacist person shows up - clearly a safety issue
(2) people in the community are actually engaging with this person :(

Note the difference: As a man you might have read the arguments. I literally stopped at the usernames.

(Aside: If the “gr.amergr.ater” username somehow doesn’t spell “misogynistic hate mob” to you, please go read up on this.

This isn’t random internet drama; other people’s safety depends on you understanding this stuff.)

So why do you men get to care about the bigoted arguments and even engage & rebut? Because you’re unlikely to be targeted.

They read as “abhorrent” to you, but not as “threat to your safety”. Good for you!

But for me, the presence of this person is a problem.

When I see a male supremacist show up in an online space, the likelihood that I will participate drops to zero.

Because as much as I care about participating in you all’s projects, no open source project is worth compromising my safety over.

You can have a male supremacist in your online space. You can have me. But you can’t have both.

It’s probably not unreasonable to extrapolate that other women are similarly put off by the presence of male supremacists.

So when you decide to engage & rebut hateful arguments, this comes at the expense of excluding women and minorities from the online space.

Okay, part 2: What’s the right way to deal with male supremacists and similar hate groups showing up?

I don’t have a clear answer. What I care most about is that community members are protected.

Here’s my suggestion #1: Don’t engage. It’s better to instantly block that person from the repo and delete their comments.

GitHub’s combination of “everyone can read”, “everyone can comment”, and “weak moderation features” can make it hard to ban effectively.

Hence suggestion #2: Take discussions (sensitive ones especially) off GitHub completely, to more private, better-moderated spaces.

GitHub’s weaknesses make it not very safe for women and minorities, so if you want those voices heard, avoid the GitHub issue tracker.

By the way: Similar things apply when male supremacists send you reasonable-looking pull requests.

I noticed that this gr.amergr.ate person had sent a small PR to a [my-project] plugin, and the plugin maintainer merged it.

This made me super uncomfortable, and I hope I don’t have to interact with that maintainer, because I really don’t trust their judgment.

When you get a PR from an author whose very name spells hate, then even if the diff looks reasonable, don’t merge it.

Reject it, or ignore it until it becomes obsolete. Or even reimplement the fix yourself. Just don’t merge. Let’s not have hate in the git log.

The Innovator in Hindsight

In which Clayton Christensen predicts that Linux on the Desktop will never come and that it will instead fuel the mobile revolution. In 2004.

Including an introduction to the Innovator’s Dilemma and the Innovator’s Solution, and how they square with Open Source.


The State of CouchDB 2013

This is a rough transcript of the CouchDB Conf Vancouver keynote.


Good morning everyone. I thank you all for coming on this fine day in Vancouver. I’m very happy to be here. My name is Jan Lehnardt and I am the Vice President of Apache CouchDB at the Apache Software Foundation, but that’s just a fancy title that means I have to do a bunch of extra work behind the scenes. I’m also a core contributor to Apache CouchDB and I am the longest active committer to the project at this point.

I started helping out with CouchDB in 2006 and that feels like a lifetime ago. We’ve come a long way, we’ve shaped the database industry in a big way, we went through a phoenix-from-the-ashes period and came out still inspiring future generations of developers to do great things.

So it is with great honour that I get to be here on stage before you to take a look at the state of CouchDB.


I’d like to start with some numbers:

Commit History

We made a lot of changes in 2012 to set up 2013 as a great year for CouchDB, and it sure looks like we succeeded; 2014 is only going to top that.

I’d like to thank everyone on the team for their hard work.


We shipped CouchDB 1.5.0 just last week and it comes with a few exciting new things as previews, for you to try out and play with, and to report any issues back to us. And that is on top of all the regular bug fixes and other improvements.


  1. A completely new admin UI, nicknamed Fauxton, that is poised to replace the much-loved, but increasingly dated Futon. I’d like to personally thank the Fauxton team: Sue “Deathbear” Lockwood, Russell “Chewbranca” Branca, Garren Smith and many more volunteers for their work, as well as the company Cloudant for sponsoring a good chunk of that work. Great job everyone! Fauxton is going to replace Futon in one of the next few releases and will give us the foundation for the next stage of CouchDB’s life.

  2. Plugins. While it was always possible to write plugins for CouchDB, you kind of had to be an expert in CouchDB to get started. We believe that writing plugins is a great gateway drug to getting more people to hack on CouchDB proper, so we made it simpler to build plugins and to install plugins into a running instance of CouchDB. It is still very early days, we don’t even have a plugin registry yet, but we are surely excited about the prospects of installing GeoCouch with a single click of a button in Futon or Fauxton. We also included a template plugin that you can easily extend and make your own, along with a guide to get you started.

The plugins effort also supports a larger trend we are starting to follow with the CouchDB core codebase: decide on a well-defined core set of functionality and delegate more esoteric things to a rich plugin system. That means we no longer have to decline the inclusion of useful code, as we have done in the past when a feature wasn’t applicable to the majority of CouchDB users. Now we can support fringe features and plugins that are only useful to a few of our users, but who really need them.

  3. A Node.js query server. CouchDB relies on JavaScript for a number of core features and we want to continue to do so. In order to keep up with the rapid improvements made to the JavaScript ecosystem, we have tentative plans to switch from a SpiderMonkey-driven query server to a V8-driven one. In addition, the Node.js project has a really good installation story, something that we had trouble with in the past, and includes a few utilities that make it very easy for us to switch the query server over.

All this, however, is not to blindly follow the latest trends, but to encourage the community to take on the query server and introduce much needed improvements. The current view server is a tricky mix of JS, Erlang and C and we are not seeing many people daring to jump into that. In a second step we expect these improvements to trickle down to the other query server implementations like Python or PHP and make things better for everyone. For now this is also a developer preview and we are inviting all Node.js developers to join us and build a better query server.


  4. Docs landed in 1.4.0, but 1.5.0 sees a major update to the now built-in documentation system. With major thanks to Alexander Shorin, Dirkjan Ochtman and Dave Cottlehuber, who were instrumental in that effort, CouchDB now has “really good docs” instead of a “really crappy wiki”; they ship with every release and are integrated with Futon and Fauxton.


The immediate next area of focus for the CouchDB project is the merging of two forks: BigCouch and rcouch.

BigCouch is a Dynamo implementation on top of CouchDB that manages a cluster of machines and makes them look like a single one, adding performance improvements and fault tolerance to a CouchDB installation. This is a major step in CouchDB’s evolution: it was designed for such a system from the start, but the core project never included a way to use and manage a cluster. Cloudant has already donated the BigCouch codebase to the Apache project and we are working on the integration.

rcouch is what I would call a “future port” of CouchDB by longtime committer and contributor Benoit Chesneau. rcouch looks like CouchDB would look if we were starting fresh today with a modern architecture. Together with BigCouch’s improvements, this will thoroughly modernise CouchDB’s codebase to the current state of the art of Erlang projects. rcouch also includes a good number of nifty features that make a great addition to CouchDB’s core feature set, and some great plugins.

Finally, we’ve just started an effort to set up infrastructure and called for volunteers to translate the CouchDB documentation and admin interface into all major languages. Driven by Andy Wenk from Hamburg, we already have a handful of people signed up to help with translations for a number of different languages.

This is going to keep us busy for a bit and we are looking forward to shipping some great releases with these features.


2013 was a phenomenal year for Apache CouchDB. 2014 is poised to be even greater: there are more people than ever pushing CouchDB forward, there is plenty of stuff to do, and hopefully we get to shape some more of the future of computing.

Thank you!

Understanding CouchDB Conflicts

As part of a summary of what Nodejitsu is planning to do with the scalenpm.org campaign money, they said:

All of these problems stem from the same symptom: conflicts in CouchDB. If you’re new to CouchDB you can read up on conflicts here. Conflicts are caused in the npm registry because (depending on several factors) a given publish can involve multiple writes to the same document. When these writes do not hit the same CouchDB server conflicts are generated. There are other medium-term solutions to scaling writes (such as sticky HTTP sessions), but conflicts will inevitably arise so we must address the symptom as well as the cause.

Of course CouchDB has a concept of conflicts; they are core to what makes CouchDB great: master-less peer-to-peer replication of your data. But I feel they are misrepresented here, so I’ll try and clarify things a little.

We will find out that the root cause isn’t CouchDB’s conflicts feature, but the way the npm client performs CouchDB document updates, which is not recommended (note that I’m not trying to point any fingers here, I just hope people can learn from this :).

How to store data in CouchDB

The standard way to store data in CouchDB is to HTTP PUT a JSON object into a CouchDB database:

PUT /database/document
{"a":1, "b":2}

When retrieving that document, it will look like this:

GET /database/document
{"_id":"document","_rev":"1-23202479633c2b380f79507a776743d5","a":1, "b":2}

CouchDB automatically adds two properties to our JSON object: an _id and a _rev. The _id represents whatever we named the document in the initial request (we can also let CouchDB assign a random _id) and the _rev, or “revision”, represents an opaque hash value over the contents of the document.

To change the value of a document, we need to prove to CouchDB that we know what its latest revision is:

PUT /database/document
{"_id":"document","_rev":"1-23202479633c2b380f79507a776743d5","a":1, "b":2}

The next time we get the document it looks like this:

GET /database/document
{"_id":"document","_rev":"2-c5242a69558bf0c24dda59b585d1a52b","a":1, "b":2}

You see the revision updated. Now let’s try to update the document again, but provide the old _rev:

PUT /database/document
{"_id":"document","_rev":"1-23202479633c2b380f79507a776743d5","a":1, "b":2, "c":3}

We get:

{"error":"conflict","reason":"Document update conflict."}

Understanding revisions

This way, CouchDB ensures that a client never accidentally overwrites any data it didn’t know about. Think of this like a wiki editing system: one person edits a wiki page and adds a few paragraphs of new information, while another person fixes a typo halfway through the first person writing their contribution. Without any cleverness, the first person will overwrite the second person’s typo fix when they save their version (or revision) of the wiki page. To ensure they don’t, each revision can be tagged with a _rev that the client then needs to provide when writing back to the server. If the revisions don’t match, the client needs to re-read the document, merge any other changes that might have happened in the meantime (the typo fix), and then try to save the wiki page again. In more technical terms, this is called “optimistic locking”. It avoids the scenario of “pessimistic locking”, where the second person has to wait for the first person to finish their changes before they can edit the wiki page.

CouchDB works the same way, and for good reasons, but it can be counter-intuitive to people who are used to other databases. Some users try to work around this without really understanding why CouchDB works this way. When they encounter a document update conflict, they make a GET or HEAD request to CouchDB to learn the latest _rev of a document and then use that for a second write request, without first looking at the new data that has appeared on the server. In some cases this is a viable strategy, especially if only a single database server is involved and changes to documents are restricted to one or very few users (as in npm).
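The recommended read-merge-retry behaviour can be sketched with a toy in-memory store. Everything here (`Store`, `update_with_merge`, the md5-based rev scheme) is hypothetical and simplified; real CouchDB computes revisions differently and speaks HTTP:

```python
import hashlib
import json


class ConflictError(Exception):
    """Stands in for CouchDB's {"error":"conflict"} response."""


class Store:
    """A toy model of one CouchDB node: writes must carry the current _rev."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])

    def put(self, doc_id, doc):
        current = self.docs.get(doc_id)
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError("Document update conflict.")
        gen = int(current["_rev"].split("-")[0]) + 1 if current else 1
        body = {k: v for k, v in doc.items() if not k.startswith("_")}
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        rev = "%d-%s" % (gen, digest)
        self.docs[doc_id] = dict(body, _id=doc_id, _rev=rev)
        return rev


def update_with_merge(store, doc_id, changes, retries=5):
    """On conflict: re-read the doc, re-apply our changes on top of the
    fresh data (so the 'typo fix' survives), and try the write again."""
    for _ in range(retries):
        doc = store.get(doc_id)       # fresh copy, fresh _rev
        doc.update(changes)           # merge our changes on top
        try:
            return store.put(doc_id, doc)
        except ConflictError:
            continue                  # someone else won the race; retry
    raise ConflictError("gave up after %d attempts" % retries)
```

The important part is that the loop merges on top of freshly read data instead of merely stealing the newest `_rev`, so no concurrent edit is silently lost.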

Distributed systems and all that

Now the fun part is when we add more database servers. One way to set up CouchDB is to run multiple instances behind an HTTP load balancer (because that’s really easy to do) and set up bi-directional replication between the two databases. This helps with reliability and load distribution: two servers can share the read load and, if specced correctly, a single server can survive the outage of its peer, while the load balancer ensures that users never see a difference.

Two couches and a load balancer

A load balancer usually distributes reads and writes randomly between the two CouchDBs. This is where the fun begins. Let us update our document once more:

PUT /database/document
{"_id":"document","_rev":"2-c5242a69558bf0c24dda59b585d1a52b","a":1, "b":2, "d":4}

Now this gets written to CouchDB A because the load balancer decides so. CouchDB A now has:

GET /database/document
{"_id":"document","_rev":"3-2235fd4815b81b2da1b84159aba4006e", "a":1, "b":2, "d":4}

But CouchDB B still has:

GET /database/document
{"_id":"document","_rev":"2-c5242a69558bf0c24dda59b585d1a52b","a":1, "b":2}

Usually replication propagates this update quickly, but it might take a while under write load, and if the client sends multiple requests in quick succession, there is a fair chance that yet another update of the document will hit CouchDB B, which will reject the write because the _rev no longer matches:

PUT /database/document
{"_id":"document","_rev":"3-2235fd4815b81b2da1b84159aba4006e", "a":1, "b":2, "e":5}


{"error":"conflict","reason":"Document update conflict."}

If we use the strategy of quickly getting the _rev from the doc and trying again, we might GET from CouchDB B again, to get "_rev": "2-c5242a69558bf0c24dda59b585d1a52b" and attempt the write again:

PUT /database/document
{"_id":"document","_rev":"2-c5242a69558bf0c24dda59b585d1a52b", "a":1, "b":2, "e":5}

If this PUT also goes to CouchDB B (you see, this scenario is getting less and less likely, but it is still possible and certainly expected in a system like npm’s), this write will succeed and we now have two conflicting revisions, one on CouchDB A and one on CouchDB B:

CouchDB A: 3-2235fd4815b81b2da1b84159aba4006e
CouchDB B: 3-8b6ea819bf3384b2c215fd05fc5a1e5a

When CouchDB replication now runs, it will introduce a conflict on both CouchDBs, as it is expected to. But since this is an undesirable situation, we generally recommend against using this strategy to deal with document update conflicts.
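The race above can be replayed with a toy model: two replica dicts and two writes that both descend from the same generation-2 revision. The names and the gen-hash rev scheme are illustrative, not CouchDB internals:

```python
import hashlib
import json


def put(replica, doc_id, doc):
    """Toy single-node write: rejects the update unless _rev matches."""
    current = replica.get(doc_id)
    if current is not None and doc.get("_rev") != current["_rev"]:
        raise ValueError("Document update conflict.")
    gen = int(current["_rev"].split("-")[0]) + 1 if current else 1
    body = {k: v for k, v in doc.items() if not k.startswith("_")}
    digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
    replica[doc_id] = dict(body, _id=doc_id, _rev="%d-%s" % (gen, digest))


couch_a, couch_b = {}, {}

# Both replicas start out in sync at generation 2.
for replica in (couch_a, couch_b):
    put(replica, "document", {"a": 1, "b": 2})            # generation 1
    put(replica, "document", dict(replica["document"]))   # generation 2

rev2 = couch_a["document"]["_rev"]

# The load balancer routes one update to A and the retried update to B;
# both are based on the same generation-2 revision, so both succeed locally.
put(couch_a, "document", {"_rev": rev2, "a": 1, "b": 2, "d": 4})
put(couch_b, "document", {"_rev": rev2, "a": 1, "b": 2, "e": 5})

# Replication now finds two different generation-3 revisions of the same
# document: a conflict it has to surface on both nodes.
assert couch_a["document"]["_rev"] != couch_b["document"]["_rev"]
```

Each node was locally correct at every step; the conflict only becomes visible once the replicas compare notes, which is exactly why replication has to record it rather than pick a winner silently.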

Solving the riddle

There are multiple ways to fix this:

  1. When making a change, don’t require multiple GETs and POSTs. It is my understanding that the npm developers are working on that (Run npm install -g npm to make use of this without waiting for the next node release, thanks @izs).
  2. Don’t update the _rev locally in the client without also merging any new data from the server. I hope the npm developers are also taking this into account.
  3. Sticky sessions: most HTTP load balancers can be configured to send subsequent requests from the same client to the same backend server. This is generally not desirable because it limits scalability and fault tolerance, but it is a worthwhile stop-gap, if applicable to the setup. I can’t comment on whether this is applicable to npm or not.
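For suggestion 3, here is an illustrative HAProxy snippet; the backend name, addresses and ports are made up, and a real deployment would tune health checks and timeouts. `balance source` hashes the client IP, so consecutive requests from one client keep hitting the same CouchDB:

```
backend couchdb_cluster
    mode http
    balance source              # one client IP, one backend server
    server couch_a 10.0.0.11:5984 check
    server couch_b 10.0.0.12:5984 check
```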

I hope I could shed a bit of light on something that we, the CouchDB developers, have thought about a lot in the design of CouchDB, but have obviously failed to communicate sufficiently in CouchDB’s earlier days.

Let me close by saying that Team CouchDB is proud to support npm and the Node community! <3
