jan@apache.org
twitter.com/janl
github.com/janl
XMPP: janlehnardt@jabber.ccc.de
GPG Fingerprint: D2B1 7F9D A23C 0A10 991A F2E3 D9EE 01E4 7852 AEE4
OTR Key: DD971A73 2C16B1AB FDEB5FB4 571DECA1 55F54AD1

Engaging With Hateful People in Your Community Lends Legitimacy to Their Presence

A woman friend, who wishes to remain anonymous, privately tweeted this discussion earlier. I believe anyone involved with Open Source communities should read and understand it. It is published here with her consent.


People, and men in particular, let’s talk about how we deal with people from hate movements showing up in our repos.

OK so for everyone’s context, screencaps of a Gamergater commenting on a Node repo. We need to talk about this.

And to be clear, I’m not shaming @mikeal or @izs (you’re awesome). IMO there’s something broken in our response when male supremacists …

… show up, and men who clearly care about inclusiveness are getting this wrong. So I simply hope my tweets will add useful perspective.

For you men the screencapped exchange might read as
(1) horrible male supremacist argues
(2) rebuttal by @mikeal & @izs
Nothing wrong with (2)

So you might be surprised that to me it reads very differently:

(1) male supremacist person shows up - clearly a safety issue
(2) people in the community are actually engaging with this person :(

Note the difference: As a man you might have read the arguments. I literally stopped at the usernames.

(Aside: If the “gr.amergr.ater” username somehow doesn’t spell “misogynistic hate mob” to you, please go read up on this.

This isn’t random internet drama; other people’s safety depends on you understanding this stuff.)

So why do you men get to care about the bigoted arguments and even engage & rebut? Because you’re unlikely to be targeted.

They read as “abhorrent” to you, but not as “threat to your safety”. Good for you!

But for me, the presence of this person is a problem.

When I see a male supremacist show up in an online space, the likelihood that I will participate drops to zero.

Because as much as I care about participating in you all’s projects, no open source project is worth compromising my safety over.

You can have a male supremacist in your online space. You can have me. But you can’t have both.

It’s probably not unreasonable to extrapolate that other women are similarly put off by the presence of male supremacists.

So when you decide to engage & rebut hateful arguments, this comes at the expense of excluding women and minorities from the online space.

Okay, part 2: What’s the right way to deal with male supremacists and similar hate groups showing up?

I don’t have a clear answer. What I care most about is that community members are protected.

Here’s my suggestion #1: Don’t engage. It’s better to instantly block that person from the repo and delete their comments.

GitHub’s combination of “everyone can read”, “everyone can comment”, and “weak moderation features” can make it hard to ban effectively.

Hence suggestion #2: Take discussions (sensitive ones especially) off GitHub completely, to more private, better-moderated spaces.

GitHub’s weaknesses make it not very safe for women and minorities, so if you want those voices heard, avoid the GitHub issue tracker.

By the way: Similar things apply when male supremacists send you reasonable-looking pull requests.

I noticed that this gr.amergr.ate person had sent a small PR to a [my-project] plugin, and the plugin maintainer merged it.

This made me super uncomfortable, and I hope I don’t have to interact with that maintainer, because I really don’t trust their judgment.

When you get a PR from an author whose very name spells hate, then even if the diff looks reasonable, don’t merge it.

Reject it, or ignore it until it becomes obsolete. Or even reimplement the fix yourself. Just don’t merge. Let’s not have hate in the git log.

The Innovator in Hindsight

In which Clayton Christensen predicts, back in 2004, that Linux on the desktop will never come and that Linux will instead fuel the mobile revolution.

Including an introduction to the Innovator’s Dilemma and Solution, and how this squares with Open Source.

[http://web.archive.org/web/20130729212828id/http://itc.conversationsnetwork.org/shows/detail135.html](http://web.archive.org/web/20130729212828id/http://itc.conversationsnetwork.org/shows/detail135.html)

The State of CouchDB 2013

This is a rough transcript of the CouchDB Conf, Vancouver Keynote.

Welcome

Good morning everyone. I thank you all for coming on this fine day in Vancouver. I’m very happy to be here. My name is Jan Lehnardt and I am the Vice President of Apache CouchDB at the Apache Software Foundation, but that’s just a fancy title that means I have to do a bunch of extra work behind the scenes. I’m also a core contributor to Apache CouchDB and I am the longest active committer to the project at this point.

I started helping out with CouchDB in 2006 and that feels like a lifetime ago. We’ve come a long way, we’ve shaped the database industry in a big way, we went through a phoenix-from-the-ashes time and came out still inspiring future generations of developers to do great things.

So it is with great honour that I get to be here on stage before you to take a look at the state of CouchDB.

Numbers

I’d like to start with some numbers:

Commit History

We made a lot of changes in 2012 to set up 2013 as a great year for CouchDB, and it sure looks like we succeeded; 2014 is only going to trump that.

I’d like to thank everyone on the team for their hard work.

Currently

We shipped CouchDB 1.5.0 just last week and it comes with a few exciting new things as previews, for you to try out, play with and report any issues back to us. And that is on top of all the regular bug fixing and other improvements.

Fauxton

  1. A completely newly developed admin UI, nicknamed Fauxton, that is poised to replace the much-loved but increasingly dated Futon. I’d like to personally thank the Fauxton team: Sue “Deathbear” Lockwood, Russell “Chewbranca” Branca, Garren Smith and many more volunteers for their work, as well as the company Cloudant for sponsoring a good chunk of that work. Great job everyone! Fauxton is going to replace Futon in one of the next few releases and will give us the foundation for the next stage of CouchDB’s life.

  2. Plugins. While it was always possible to write plugins for CouchDB, you kind of had to be an expert in CouchDB to get started. We believe that writing plugins is a great gateway drug to getting more people to hack on CouchDB proper, so we made it simpler to build plugins and to install plugins into a running instance of CouchDB. It is still very early days (we don’t even have a plugin registry yet), but we are surely excited about the prospects of installing GeoCouch with a single click of a button in Futon or Fauxton. We also included a template plugin that you can easily extend and make your own, along with a guide to get you started.

The plugins effort also supports a larger trend we are starting to follow with the CouchDB core codebase: decide on a well-defined core set of functionality and delegate more esoteric things to a rich plugin system. That means we no longer have to decline the inclusion of useful code, as we’ve had to in the past when something wasn’t applicable to the majority of CouchDB users. Now we can support fringe features and plugins that are only useful to a few of our users, but who really need them.

  3. A Node.js query server. CouchDB relies on JavaScript for a number of core features and we want to continue to do so. In order to keep up with the rapid improvements made to the JavaScript ecosystem, we have tentative plans to switch from a SpiderMonkey-driven query server to a V8-driven one. In addition, the Node.js project has a really good installation story, something that we had trouble with in the past, and includes a few utilities that make it very easy for us to switch the query server over.

All this, however, is not to blindly follow the latest trends, but to encourage the community to take on the query server and introduce much-needed improvements. The current view server is a tricky mix of JavaScript, Erlang and C, and we are not seeing many people daring to jump into that. In a second step we expect these improvements to trickle down to the other query server implementations, like Python or PHP, and make things better for everyone. For now this is also a developer preview, and we are inviting all Node.js developers to join us and build a better query server.
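To give a rough feel for what a query server does, here is a deliberately stripped-down sketch of the line-based protocol CouchDB speaks with it, written in TypeScript for Node.js. This is an illustration only, not the actual preview implementation: the real protocol has many more commands (reduce, rereduce, design document functions and so on), and the CouchDB documentation is the reference.

// Minimal sketch of a CouchDB-style query server: JSON commands arrive on
// stdin, one per line, and responses go to stdout, one per line.
import * as readline from "node:readline";

type MapFun = (doc: Record<string, unknown>) => void;

const mapFuns: MapFun[] = [];
let results: [unknown, unknown][] = [];

// emit() is handed to the user-defined map functions below.
function emit(key: unknown, value: unknown): void {
  results.push([key, value]);
}

const rl = readline.createInterface({ input: process.stdin });

rl.on("line", (line) => {
  const [cmd, arg] = JSON.parse(line);
  switch (cmd) {
    case "reset": // drop all registered functions
      mapFuns.length = 0;
      process.stdout.write("true\n");
      break;
    case "add_fun": // compile a map function source string and register it
      mapFuns.push(new Function("emit", `return (${arg});`)(emit) as MapFun);
      process.stdout.write("true\n");
      break;
    case "map_doc": { // run every registered map function over one document
      const out = mapFuns.map((fun) => {
        results = [];
        fun(arg);
        return results;
      });
      process.stdout.write(JSON.stringify(out) + "\n");
      break;
    }
    default:
      process.stdout.write(JSON.stringify({ error: "unknown_command" }) + "\n");
  }
});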

Docs

  4. Docs landed in 1.4.0, but 1.5.0 is seeing a major update to the now built-in documentation system. With major thanks to Alexander Shorin, Dirkjan Ochtman and Dave Cottlehuber, who were instrumental in that effort, CouchDB now has “really good docs” instead of a “really crappy wiki”: docs that are shipped with every release and are integrated with Futon and Fauxton.

Beyond

The immediate next area of focus for the CouchDB project is the merging of two forks: BigCouch and rcouch.

BigCouch is a Dynamo implementation on top of CouchDB that manages a cluster of machines and makes them look like a single one, adding performance improvements and fault tolerance to a CouchDB installation. This is a major step in CouchDB’s evolution: it was designed with such a system in mind from the start, but the core project never included a way to use and manage a cluster. Cloudant have donated their BigCouch codebase to the Apache project already and we are working on the integration.

rcouch is what I would call a “future port” of CouchDB by longtime committer and contributor Benoit Chesneau. rcouch looks like CouchDB would look if we started fresh today with a modern architecture. Together with BigCouch’s improvements, this will thoroughly modernise CouchDB’s codebase to the latest state of the art of Erlang projects. rcouch also includes a good number of nifty features that make a great addition to CouchDB’s core feature set, and some great plugins.

Finally, we’ve just started an effort to set up infrastructure and called for volunteers to translate the CouchDB documentation and admin interface into all major languages. Driven by Andy Wenk from Hamburg, we already have a handful of people signed up to help with translations for a number of different languages.

This is going to keep us busy for a bit and we are looking forward to shipping some great releases with these features.

tl;dr

2013 was a phenomenal year for Apache CouchDB, and 2014 is poised to be even greater. There are more people than ever pushing CouchDB forward, there is plenty of stuff to do and, hopefully, we get to shape some more of the future of computing.

Thank you!

Understanding CouchDB Conflicts

As part of a summary of what Nodejitsu is planning to do with the scalenpm.org campaign money, they said:

All of these problems stem from the same symptom: conflicts in CouchDB. If you’re new to CouchDB you can read up on conflicts here. Conflicts are caused in the npm registry because (depending on several factors) a given publish can involve multiple writes to the same document. When these writes do not hit the same CouchDB server conflicts are generated. There are other medium-term solutions to scaling writes (such as sticky HTTP sessions), but conflicts will inevitably arise so we must address the symptom as well as the cause.

Of course CouchDB has a concept of conflicts; they are core to what makes CouchDB great: master-less, peer-to-peer replication of your data. But I feel they are misrepresented here, so I’ll try and clarify things a little.

We will find out that the real problem isn’t CouchDB’s conflicts feature, but the way the npm client treats CouchDB document updates, which is not a recommended pattern (note that I’m not trying to point any fingers here; I just hope people can learn from this :).

How to store data in CouchDB

The standard way to store data in CouchDB is to HTTP PUT a JSON object into a CouchDB database:

PUT /database/document
{"a":1}

When retrieving that document, it will look like this:

GET /database/document
{"_id":"document","_rev":"1-23202479633c2b380f79507a776743d5","a":1}

CouchDB will automatically add two properties to our JSON object: an _id and a _rev. The _id represents whatever we named the document in the initial request (we can also let CouchDB assign a random _id), and the _rev, or “revision”, represents an opaque hash value over the contents of the document.

To change the value of a document, we need to prove to CouchDB that we know what its latest revision is:

PUT /database/document
{"_id":"document","_rev":"1-23202479633c2b380f79507a776743d5","a":1, "b":2}

The next time we get the document it looks like this:

GET /database/document
{"_id":"document","_rev":"2-c5242a69558bf0c24dda59b585d1a52b","a":1, "b":2}

You see the revision updated. Now let’s try to update the document again, but provide the old _rev:

PUT /database/document
{"_id":"document","_rev":"1-23202479633c2b380f79507a776743d5","a":1, "b":2, "c":3}

We get:

{"error":"conflict","reason":"Document update conflict."}

Understanding revisions

This way, CouchDB ensures that a client never accidentally overwrites any data it didn’t know about. Think of it like a wiki editing system: one person edits a wiki page and adds a few paragraphs of new information, while another person fixes a typo halfway through the first person writing their contribution. Without any cleverness, the first person will overwrite the second person’s typo fix when they save their version (or revision) of the wiki page. To ensure they don’t, each revision can be tagged with a _rev that the client then needs to provide when writing back to the server. If the revisions don’t match, the client needs to re-read the document, merge in any other changes that might have happened in the meantime (the typo fix) and then try to save the wiki page again. In more technical terms, this is called “optimistic locking”. It avoids the scenario of “pessimistic locking”, where the second person has to wait for the first person to finish their changes before they can edit the wiki page.

CouchDB works the same way, and for good reasons, but it can be counter-intuitive to how people are used to working with databases. Some users work around it without really understanding why CouchDB behaves this way. When they encounter a document update conflict, they make a GET or HEAD request to CouchDB to learn the latest _rev of a document and then use that for a second write request, without first looking at the new data that has appeared on the server. In some cases this is a viable strategy, especially if only a single database server is involved and changes to documents are restricted to one or very few users (like in npm).
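To make this concrete, here is a minimal sketch of the recommended shape of an update loop: re-read, re-apply your change on top of the latest data, and retry on conflict. It is written in TypeScript against Node’s built-in fetch; the database URL and the applyChange callback are illustrative assumptions, not npm’s actual code.

// Optimistic-locking update: merge-and-retry instead of blindly
// refreshing _rev on a stale local copy.
const BASE = "http://localhost:5984/database"; // hypothetical server

type Doc = { _id: string; _rev?: string; [key: string]: unknown };

async function updateWithRetry(
  id: string,
  applyChange: (latest: Doc) => Doc,
  maxAttempts = 5
): Promise<Doc> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // 1. Read the latest revision of the document.
    const latest: Doc = await (await fetch(`${BASE}/${id}`)).json();
    // 2. Apply our change on top of whatever the server has right now.
    const updated = applyChange(latest);
    // 3. Write it back, proving that we saw the latest _rev.
    const res = await fetch(`${BASE}/${id}`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(updated),
    });
    if (res.ok) return await res.json();
    if (res.status !== 409) throw new Error(`unexpected status: ${res.status}`);
    // 409 means someone else wrote in between: loop, re-read and re-merge.
  }
  throw new Error("gave up after repeated conflicts");
}

// Usage: add a field without clobbering concurrent edits.
// await updateWithRetry("document", (doc) => ({ ...doc, b: 2 }));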

Distributed systems and all that

Now the fun part is when we add more database servers. One way to set up CouchDB is to run multiple instances behind an HTTP load balancer (because that’s really easy to do) and set up bi-directional replication between the two databases. This helps with reliability and load distribution: the two servers can share the read load and, if specced correctly, either server can survive the outage of its peer, while the load balancer ensures that users never see a difference.
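For reference, this bi-directional pairing can be set up with CouchDB’s _replicate endpoint, one continuous replication per direction; the couch-b hostname is made up for illustration:

POST /_replicate
{"source": "http://couch-b:5984/database", "target": "database", "continuous": true}

POST /_replicate
{"source": "database", "target": "http://couch-b:5984/database", "continuous": true}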

Two couches and a load balancer

A load balancer usually distributes reads and writes randomly between the two CouchDBs. This is where the fun begins. Let us update our document once more:

PUT /database/document
{"_id":"document","_rev":"2-c5242a69558bf0c24dda59b585d1a52b","a":1, "b":2, "d":4}

Now this gets written to CouchDB A because the load balancer decides so. CouchDB A now has:

GET /database/document
{"_id":"document","_rev":"3-2235fd4815b81b2da1b84159aba4006e", "a":1, "b":2, "d":4}

But CouchDB B still has:

GET /database/document
{"_id":"document","_rev":"2-c5242a69558bf0c24dda59b585d1a52b","a":1, "b":2}

Usually replication updates this quickly, but it might take a while due to write load, and if the client sends multiple requests in quick succession, there is a fair chance that updating the document yet another time will hit CouchDB B which will reject the write, because the _rev doesn’t match any more:

PUT /database/document
{"_id":"document","_rev":"3-2235fd4815b81b2da1b84159aba4006e", "a":1, "b":2, "e":5}

Returns:

{"error":"conflict","reason":"Document update conflict."}

If we use the strategy of quickly getting the _rev from the doc and trying again, we might GET from CouchDB B again, to get "_rev": "2-c5242a69558bf0c24dda59b585d1a52b" and attempt the write again:

PUT /database/document
{"_id":"document","_rev":"3-c5242a69558bf0c24dda59b585d1a52b", "a":1, "b":2, "e":5}

If this PUT also goes to CouchDB B (you see, this scenario is getting less and less likely, but it is still possible and certainly expected in a system like npm’s), the write will succeed, and now we have two conflicting revisions on CouchDB A and CouchDB B:

CouchDB A: 3-2235fd4815b81b2da1b84159aba4006e
CouchDB B: 3-8b6ea819bf3384b2c215fd05fc5a1e5a

When CouchDB replication now runs, it will introduce a conflict on both CouchDBs, as it is expected to. But since this is an undesirable situation, we generally recommend against using this strategy to deal with document update conflicts.
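For completeness: once a conflict has been introduced, CouchDB keeps the losing revisions around and lists them when asked, so they can be inspected and resolved through the regular document API. Here is a sketch in the same vein as above; the URL and the trivial “discard the losers” policy are assumptions for illustration, since a real application would merge the conflicting data first.

// List a document's conflicting revisions and delete the losers.
const DB = "http://localhost:5984/database"; // hypothetical server

async function resolveConflicts(id: string): Promise<void> {
  // ?conflicts=true includes the losing revisions alongside the winner.
  const doc = await (await fetch(`${DB}/${id}?conflicts=true`)).json();
  for (const losingRev of doc._conflicts ?? []) {
    // A real resolver would read this revision (GET ?rev=...) and merge
    // its data into the winner before deleting it.
    await fetch(`${DB}/${id}?rev=${losingRev}`, { method: "DELETE" });
  }
}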

Solving the riddle

There are multiple ways to fix this:

  1. When making a change, don’t require multiple GETs and POSTs. It is my understanding that the npm developers are working on that (Run npm install -g npm to make use of this without waiting for the next node release, thanks @izs).
  2. Don’t update the _rev locally in the client without also merging any new data from the server. I hope the npm developers are also taking this into account.
  3. Sticky sessions: most HTTP load balancers can be configured to send subsequent requests from the same client to the same backend server. This is generally not desirable because it limits scalability and fault tolerance, but it is a worthwhile stop-gap, if not default setup, where applicable (see the sketch below). I can’t comment on whether this is applicable to npm or not.
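As an illustration of option 3, here is a minimal HAProxy sketch that pins each client to one backend via a cookie; the hostnames, ports and server names are made up, and other load balancers have equivalent features:

# Sticky sessions: each client gets a cookie naming its backend, so
# subsequent requests keep hitting the same CouchDB.
frontend couch_front
    bind *:80
    default_backend couch_back

backend couch_back
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server couch_a 10.0.0.1:5984 check cookie couch_a
    server couch_b 10.0.0.2:5984 check cookie couch_b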

I hope I was able to shed a bit of light on a thing that we, the CouchDB developers, have thought about a lot in the design of CouchDB, but have obviously failed to communicate sufficiently in CouchDB’s earlier days.

Let me close by saying that Team CouchDB is proud to support npm and the Node community! <3

The Parable of Mustache.js

In October 2009 Defunkt released mustache.rb. It looked useful to me at the time, but I needed it in JavaScript. The code was just under 200 lines and simple enough that I thought I could port it over on a lazy Sunday, or Saturday, I forget.

So I did, and to this day I believe my biggest contribution to mustache.js was not all the fancy stuff that happened since, but the initial transliteration of Ruby to JavaScript. Yes, transliteration: Ruby and JavaScript are close enough that I did a more or less line-by-line port of the original code. I can’t claim I did any programming myself there.

If you look at the early versions of both projects, they look remarkably similar: mustache.rb vs mustache.js.

mustache.rb was a huge hit in Ruby land right away, and soon after I did my port, a number of very smart people came around and rewrote the existing RegEx-based “parser” (eugh) into a proper parser and compiler combination, to make sure all proper computer science theory was applied and things ran most efficiently.

I looked back at the code then and couldn’t find my way around: things were split over dozens of files, and I am sure it all made sense to a very organised brain, but I couldn’t even begin to understand what the whole thing did. And I remember distinctly that I was happy to have found mustache.rb in its before state, so that I had a chance to port it. I would never have begun to try if I had found the “more optimal” solution.

That realisation also made me a bit sad, though. I am all for doing the right thing and executing things faster and everything, but it bugged me that the clarity of the code was all gone now.

Then other smart people came and wanted to do the same for mustache.js, and I was hesitant. But instead of trying to argue with clarity (because fuck that if we can make it 10x faster), my argument was that these parsers & compilers are way too long for a JavaScript implementation that is targeted at the browser. My tagline was “by the time your compiler is finally done downloading, my RegEx parser has already produced a result”. It was of course a bit of a cop-out, but I got quite far with it.

Until Nate showed up and wrote a parser/compiler in ~350 lines of code (roughly the acceptable limit for me then) and I finally lost that argument too. But it took a while until mustache.js finally got its much-needed superpowers, with huge help from Michael.

Meanwhile, I met Yehuda Katz at a conference, and someone introduced me as “Jan, he does CouchDB and mustache.js”. Yehuda just said “Oh boy, I’m going to upset you soon.” and then avoided me for the rest of the conference (or it was just coincidence, I don’t know :). A week later he released handlebars.js, a mustache.js-inspired templating engine with some extra features that were really useful. It wasn’t according to spec, but it was close enough that a lot of people started using it. I thought “good for them”, but I secretly (or not so secretly) wanted to steal the best ideas and get people to use mustache.js instead. That didn’t really go anywhere, because people stopped working on the spec in a sensible manner and we couldn’t agree on more features. And eventually I spent more time with CouchDB again.

In the meantime, many more mustache implementations in other languages appeared, and one implicit design goal between them was to allow sharing of templates and to produce compatible output. There is even a spec.

Around the same time Twitter started using mustache.js in the frontend, and it was a great honour. But they too preferred a proper parser/compiler implementation, so @fat & team went ahead and wrote Hogan, a mustache.js-compatible implementation with a faster load and runtime. When @fat talked about this at his 2012 JSConf US talk, he was asked why he didn’t submit a pull request to mustache.js instead of making a new project. He said that a pull request that consists of “Hey, I threw away all your code and replaced it with mine” isn’t such a good idea. I agree, it wouldn’t have left the best impression.

He did praise, though, that they were able to build an implementation that could compete on its own merits while still being compatible with a spec, and he said that all library development should be done that way. Too often we conflate a great idea with its implementation, when we would be better off allowing different implementations and letting users pick and choose their poison.

While watching @fat speak, it dawned on me that if I look at things this way, then the RegEx implementation of mustache.js is just another implementation of the same spec, and its merits are tight and accessible code. I could have hugged him for it. In fact, I ran up on stage during the talk and nearly ran him over (22m:15s). My beloved implementation still had a reason to exist, and that made me very happy. To this day, the 0.4 branch of mustache.js is still around.

Hugs

Lessons Learned:

Accessible code has merit over optimised code, even if the optimised code should be used most of the time.

Build libraries with a spec, encourage competing implementations. (We plan to do this for Hoodie)

Hugs win.


The state of mustache today is alright: the implementations are all reasonably mature projects and people just use them. But there are a few people who want to move things forward and are hindered by the complacency of the early adopters who have moved on to other things (including me). I hope we can get that resolved eventually.

On that note, mustache.js has a number of interesting pull requests and issues open, and if anyone is interested in taking over maintainership, please make yourself heard on the GitHub project.
