24 June 2013

The problem

One of my Zookeeper clusters has to be reachable by everyone, but I want to make sure only specific, known hosts are allowed to connect. The problem was that those known hosts are dynamic, and I didn't want to maintain any configuration for this component. The servers run in the cloud; they come and go.

At Technicolor, we use Chef to manage all our boxes. Every box is registered on the Chef server, so we know the private and public IP address of every server within our setup, whether it runs on EC2, IBM SCE, Rackspace or any other cloud provider. What I came up with was a Zookeeper authentication provider backed by the Chef server: it only allows the connection if the client's IP is known at the time of connection.
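The core decision of such a provider can be sketched without Chef or Zookeeper at all. This is a minimal sketch, not the author's code: the class KnownHosts and its isKnown method are hypothetical, and the Supplier stands in for a Chef server query that would return the IPs of all registered nodes. The lookup runs on every check, so hosts that come and go are picked up without any configuration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper, not the author's code: the allow/deny decision of an
// IP-based provider. The Supplier stands in for a Chef server query that
// returns the private and public IPs of all registered nodes.
public class KnownHosts {
    private final Supplier<Set<String>> lookup;

    public KnownHosts(Supplier<Set<String>> lookup) {
        this.lookup = lookup;
    }

    // Allow the client only if its IP is known at the time of connection.
    // The lookup runs on every call, so nodes that come and go are picked up.
    public boolean isKnown(String clientIp) {
        if (clientIp == null) {
            return false;
        }
        return lookup.get().contains(clientIp);
    }

    public static void main(String[] args) {
        // Made-up addresses standing in for the Chef node list.
        Set<String> registered = new HashSet<>(Set.of("10.0.0.12", "203.0.113.10"));
        KnownHosts hosts = new KnownHosts(() -> registered);

        System.out.println(hosts.isKnown("10.0.0.12"));    // true
        System.out.println(hosts.isKnown("198.51.100.7")); // false

        registered.remove("10.0.0.12"); // the node was terminated and deregistered
        System.out.println(hosts.isKnown("10.0.0.12"));    // false
    }
}
```

In a real provider this check would presumably live in handleAuthentication(ServerCnxn, byte[]), taking the client's address from the connection; that wiring is left out of the sketch.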

It all sounds pretty straightforward; however, Zookeeper authentication is not very well documented, and it took quite a while to connect all the dots and make it work.

The code

The full code is available on GitHub.

The solution

First of all, a word on the term authentication, which is a bit confusing at first: Zookeeper talks about ACLs. That's because permissions are applied per znode, not per server. When talking about authentication, one is really thinking about access to a particular znode.

My authentication implementation can be found here. All the Ruby bits are in the same project.

As can be seen, all that has to be done is to create a class implementing the org.apache.zookeeper.server.auth.AuthenticationProvider interface. For Chef-based authentication in my environment, I wanted every node to have full access to Zookeeper once authenticated. Hence this:

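// Any id of this scheme matches any ACL expression of this scheme,
// so every authenticated Chef node gets full access.
public boolean matches(String id, String aclExpr) {
  return true;
}

// Ids from this provider may be used to identify the creator of a znode.
public boolean isAuthenticated() {
  return true;
}

// Every id string is considered syntactically valid, including the empty id.
public boolean isValid(String id) {
  return true;
}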

If the purpose of the authentication provider were to lock down individual znodes, the implementation of matches(String id, String aclExpr) would differ. There are good examples in the Zookeeper sources, available here.
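As an illustration of a non-trivial matches(), here is a standalone sketch for a hypothetical scheme whose ACL expressions are IPv4 addresses or ranges such as 10.0.0.0/24, with the client's IP as the id. It is loosely modelled on the idea behind Zookeeper's built-in ip scheme, not copied from it; IpRangeMatcher and parseIpv4 are made-up names, and matches is static here only so the sketch runs on its own (in a provider it would be the interface's instance method).

```java
// Hypothetical sketch: matches() for a scheme whose ACL expressions are
// "a.b.c.d" (single address) or "a.b.c.d/bits" (address range).
public class IpRangeMatcher {

    // Parse a dotted-quad IPv4 address into a 32-bit int; null if malformed.
    static Integer parseIpv4(String s) {
        String[] parts = s.split("\\.", -1);
        if (parts.length != 4) return null;
        int value = 0;
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3
                    || !p.chars().allMatch(c -> c >= '0' && c <= '9')) {
                return null;
            }
            int octet = Integer.parseInt(p);
            if (octet > 255) return null;
            value = (value << 8) | octet;
        }
        return value;
    }

    // id: the client's IP; aclExpr: the expression stored in the znode's ACL.
    public static boolean matches(String id, String aclExpr) {
        String[] expr = aclExpr.split("/", -1);
        if (expr.length > 2) return false;
        Integer client = parseIpv4(id);
        Integer network = parseIpv4(expr[0]);
        if (client == null || network == null) return false;
        int bits = 32;
        if (expr.length == 2) {
            try {
                bits = Integer.parseInt(expr[1]);
            } catch (NumberFormatException e) {
                return false;
            }
            if (bits < 0 || bits > 32) return false;
        }
        // Mask with the top `bits` bits set; Java shifts by (n mod 32), so /0 is special-cased.
        int mask = bits == 0 ? 0 : -1 << (32 - bits);
        return (client & mask) == (network & mask);
    }

    public static void main(String[] args) {
        System.out.println(matches("10.0.0.12", "10.0.0.0/24")); // true
        System.out.println(matches("10.0.1.12", "10.0.0.0/24")); // false
        System.out.println(matches("10.0.0.12", "10.0.0.12"));   // true
    }
}
```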

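Writing the provider was the easy part. Once the jar is compiled and placed in the lib folder, and the class is registered through a zookeeper.authProvider.<n> system property (e.g. -Dzookeeper.authProvider.1=<provider class name>), the provider is available after a Zookeeper restart. The poorly documented bit is the following: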

Every znode that should not be available to the world must have the ACL applied to it. ACLs are not recursive.

That basically means:

Even though authentication is enabled and working, its effect can't be observed, because every znode is still public anyway.
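A toy model makes the non-recursive behaviour concrete. This is not Zookeeper code: AclModel is a hypothetical in-memory stand-in in which each znode stores its own set of allowed schemes (permission bits are ignored), "world" means open to everyone, and an access check consults only the target znode, never its parents.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model (not ZooKeeper code) of per-znode ACLs: each znode keeps its own
// set of allowed schemes and a check looks only at the target znode.
public class AclModel {
    static final String WORLD = "world"; // stands for world:anyone
    private final Map<String, Set<String>> acls = new HashMap<>();

    public AclModel() {
        // A fresh installation: every znode is open to the world.
        for (String path : new String[] {"/", "/zookeeper", "/zookeeper/quota"}) {
            acls.put(path, Set.of(WORLD));
        }
    }

    public void setAcl(String path, Set<String> schemes) {
        if (!acls.containsKey(path)) throw new IllegalArgumentException("no node " + path);
        acls.put(path, Set.copyOf(schemes));
    }

    // True if a client authenticated with the given schemes may access the path.
    public boolean canAccess(String path, Set<String> clientSchemes) {
        Set<String> acl = acls.get(path);
        if (acl == null) throw new IllegalArgumentException("no node " + path);
        if (acl.contains(WORLD)) return true;
        for (String scheme : clientSchemes) {
            if (acl.contains(scheme)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AclModel zk = new AclModel();
        Set<String> anonymous = Set.of();
        System.out.println(zk.canAccess("/", anonymous));          // true: everything is public
        zk.setAcl("/", Set.of("chef"));
        System.out.println(zk.canAccess("/", anonymous));          // false
        System.out.println(zk.canAccess("/", Set.of("chef")));     // true
        System.out.println(zk.canAccess("/zookeeper", anonymous)); // true: not inherited from "/"
    }
}
```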

Example

Let's consider a fresh Zookeeper installation. It comes with three znodes: /, /zookeeper and /zookeeper/quota. Even if the authentication provider is loaded and working, everyone in the world will be able to see and modify these unless an ACL is applied. The following code may be used to lock down the / znode:

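require "rubygems"
require "zookeeper"
# perms 31 = READ | WRITE | CREATE | DELETE | ADMIN, i.e. all permissions.
# The id is empty: the chef provider's matches() accepts any chef-authenticated client.
acl = [ Zookeeper::ACLs::ACL.new(
  :perms => 31,
  :id => Zookeeper::ACLs::Id.new( :scheme => "chef", :id => "" )
)]
z = Zookeeper.new("127.0.0.1:2181")
z.set_acl(:path => "/", :acl => acl )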
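The :perms => 31 value is a bit mask. Zookeeper defines READ = 1, WRITE = 2, CREATE = 4, DELETE = 8 and ADMIN = 16, so 31 grants all five permissions (it equals ZooDefs.Perms.ALL). A small standalone sketch that decodes such a mask; the constants are copied into the hypothetical Perms class so it runs without the Zookeeper jar.

```java
import java.util.ArrayList;
import java.util.List;

// Decodes a ZooKeeper permission mask. The bit values mirror ZooDefs.Perms;
// they are duplicated here so the sketch has no external dependencies.
public class Perms {
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN; // 31

    static List<String> decode(int perms) {
        List<String> names = new ArrayList<>();
        if ((perms & READ) != 0) names.add("READ");
        if ((perms & WRITE) != 0) names.add("WRITE");
        if ((perms & CREATE) != 0) names.add("CREATE");
        if ((perms & DELETE) != 0) names.add("DELETE");
        if ((perms & ADMIN) != 0) names.add("ADMIN");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(ALL);                  // 31
        System.out.println(decode(31));           // [READ, WRITE, CREATE, DELETE, ADMIN]
        System.out.println(decode(READ | WRITE)); // [READ, WRITE]
    }
}
```

A read-only ACL for the same scheme would therefore use :perms => 1 instead of 31.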

From now on the world can no longer read / (including listing its children) or modify it. However, /zookeeper and /zookeeper/quota are still wide open. Fully locking down a fresh Zookeeper installation is simple:

require "rubygems"
require "zookeeper"
acl = [ Zookeeper::ACLs::ACL.new(
  :perms => 31,
  :id => Zookeeper::ACLs::Id.new( :scheme => "chef", :id => "" )
)]
z = Zookeeper.new("127.0.0.1:2181")
# ACLs are not recursive, so every znode needs its own set_acl call.
z.set_acl(:path => "/", :acl => acl )
z.set_acl(:path => "/zookeeper", :acl => acl )
z.set_acl(:path => "/zookeeper/quota", :acl => acl )
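Listing the three paths by hand works for a fresh installation. For a tree that already holds data, one option is to walk it and collect every znode path, parents first, then apply the ACL to each. A sketch, assuming a children lookup that a real client would back with getChildren; the SubtreeWalk class and the hard-coded tree are hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Collects a znode and all of its descendants, parents before children,
// so an ACL can then be applied to every path in the returned order.
public class SubtreeWalk {

    // Join a parent path and a child name the way znode paths are formed.
    static String join(String parent, String child) {
        return parent.equals("/") ? "/" + child : parent + "/" + child;
    }

    public static List<String> subtree(String root, Function<String, List<String>> children) {
        List<String> out = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String path = stack.pop();
            out.add(path);
            List<String> kids = children.apply(path);
            // Push in reverse so children are visited in their listed order.
            for (int i = kids.size() - 1; i >= 0; i--) {
                stack.push(join(path, kids.get(i)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a fresh installation; a real client would call getChildren.
        Map<String, List<String>> tree = Map.of(
            "/", List.of("zookeeper"),
            "/zookeeper", List.of("quota"));
        List<String> paths = subtree("/", p -> tree.getOrDefault(p, List.of()));
        System.out.println(paths); // [/, /zookeeper, /zookeeper/quota]
    }
}
```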

From now on the world can still connect to my Zookeeper, but no operations are permitted. Obviously, every newly created znode should have the ACL applied as well, like this:

...
zk_acl = [ Zookeeper::ACLs::ACL.new(
  :perms => 31,
  :id => Zookeeper::ACLs::Id.new( :scheme => "chef", :id => "" )
)]
z.create(:path => "/nodes/#{@@ipv4}", :acl => zk_acl, :ephemeral => true)
...

Unless the znode is meant to be wide open, obviously.
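Note that a create only succeeds if the parent znode already exists, so in the example above /nodes has to be created first, ideally with the same ACL so that the world cannot list the registered nodes either. A small sketch of working out which ancestors must exist, top-down, before the leaf znode is created; ParentPaths and ancestors are hypothetical names, and the actual create calls are left to the client library.

```java
import java.util.ArrayList;
import java.util.List;

// Lists the ancestors (below "/") of a znode path, top-down. Each of them
// must exist, and should carry the ACL, before the znode itself is created.
public class ParentPaths {

    public static List<String> ancestors(String path) {
        if (!path.startsWith("/") || path.endsWith("/")) {
            throw new IllegalArgumentException("expected an absolute, non-root path: " + path);
        }
        List<String> result = new ArrayList<>();
        int idx = path.indexOf('/', 1);
        while (idx != -1) {
            result.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ancestors("/nodes/10.0.0.12")); // [/nodes]
        System.out.println(ancestors("/a/b/c"));           // [/a, /a/b]
        System.out.println(ancestors("/nodes"));           // []
    }
}
```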