How to install and configure CKAN 2.9.0 using Docker
In 2014, I was invited to do a couple of mini courses on CKAN, one of them at the pleasant island of Florianóplis and the other in the freezing winter of Moscow. I had some experience with it already when creating collaboratively the dados.gov.br open data portal in 2012, but I had to study it again in 2014 in order to catch up with then recent developments.
The slides from those mini courses, one in English and the other in Portuguese, are available on SlideShare:
- CKAN Overview (presented at the IV Moscow Urban Forum, in Moscow)
- Minicurso de CKAN (presented at the Linked Open Data Brazil conference in Florianóplis, in Portuguese)
They are based on version 2.2 of CKAN and cover topics ranging from the technical stuff, like installing, configuring and maintaining CKAN to the daily operation by editors and people who will write the dataset descriptions, fill in the metadata about resources (e.g. the format of a file), etc.
Now, the local IT department is interested in using CKAN and has asked me about it, so I took the opportunity to once more get some hands-on experience with installing and configuring the current version. Which is, at the time of this writing, 2.9.0.
What has changed?
Back then, there was already a way to install CKAN by using a Docker image, but it was a bit experimental. I don’t think there was even Docker Compose recipe for doing it, as Docker Compose was very new at the time. The most recommended way to install it was still to do it from source. Nowadays, Docker containers are the rule of the day to keep your IT infrastructure neatly organized, and you’d have to come up with a pretty good excuse for not using them.
The CKAN documentation still offers three ways to install, so that hasn’t changed. Thankfully, using Docker is still a possible and supported path forward. The options are:
- operating system packages, available for Ubuntu 18.04 and 20.04;
- from source, if you have another operating system or want to develop CKAN; or
- using Docker Compose, for a more manageable container based infrastructure.
As may be obvious from the title, we’re going to choose the Docker Compose based install.
We’ll just follow the instructions from the CKAN documentation and use the default configuration choices wherever possible.
Preparations and dependencies
First, make sure you have a lot of free disk space, as all those images may take a whole lot of it. As they are based on Ubuntu 16.04, which is quite old by now, chances are that you are using a more recent operating system than that. So Docker will have to download a lot of stuff to make up for the difference. In my case, I’m using Ubuntu 18.04, so the full install took just about 5 GB of space.
The Docker volumes, by default, are stored on /var/lib/docker
. At any
moment after installation, you may check out what Docker volumes you have
with the command
$ docker volume ls
You can also use
$ docker volume inspect [volume]
to see where the files inside [volume]
are actually stored (under the
variable Mountpoint
). For more information, see the
Docker documentation on volumes.
You also need to install, if you haven’t already, Docker and Docker Compose. I’m taking as a base Ubuntu 18.04, but it should be similar for other Ubuntu based GNU/Linux systems. Just do
$ sudo apt install docker.io docker-compose
or follow the instructions from the documentation of Docker and Docker Compose.
Building the Docker images
First we need to download the CKAN repository from Github. Go to a directory of your choice and clone the repository. Depending on your download speed, this might take a while.
$ cd /path/to/my/projects
$ git clone https://github.com/ckan/ckan.git
The repository has all current and previous versions in version control, so we choose the (supposedly) stable version 2.9.0 by checking out its corresponding tag:
$ cd ckan
ckan$ git checkout tags/ckan-2.9.0
Next, copy contrib/docker/.env.template
to contrib/docker/.env
. That
contains some environment variables you might want to change, such as the site
URL, port and passwords. If you’re just checking out how CKAN works, i.e., in
a non-production environment, it’s ok to leave it as it is (with the
defaults).
Now we build the Docker images. This step takes a long time, a lot of disk space and downloads many images and packages, so you might want to go do something else while it runs.
Spoiler: we will come across a problem in the next step after building the image. To save some time, you might not want to execute the next step and move on right through to the fix.
ckan$ cd contrib/docker
ckan/contrib/docker$ docker-compose up -d --build
Next, we restart the container cluster with:
ckan/contrib/docker$ docker-compose restart ckan
WARNING: The CKAN_MAX_UPLOAD_SIZE_MB variable is not set. Defaulting to a blank string.
Restarting ckan ... done
Check to see if the container is running:
ckan/contrib/docker$ docker ps | grep ckan
67693ebf5e92 docker_ckan "/ckan-entrypoint.sh…" 49 seconds ago Up 6 seconds 0.0.0.0:5000->5000/tcp ckan
And the system logs
ckan/contrib/docker$ docker-compose logs -f ckan
WARNING: The CKAN_MAX_UPLOAD_SIZE_MB variable is not set. Defaulting to a blank string.
Attaching to ckan
ckan | db:5432 - accepting connections
ckan | Command 'db' not known (you may need to run setup.py egg_info)
ckan | Known commands:
ckan | create Create the file layout for a Python distribution
ckan | exe Run #! executable files
ckan | help Display help
ckan | make-config Install a package and create a fresh config file/directory
ckan | points Show information about entry points
ckan | post Run a request for the described application
ckan | request Run a request for the described application
ckan | serve Serve the described application
ckan | setup-app Setup an application, given a config file
ckan | db:5432 - accepting connections
ckan | Command 'db' not known (you may need to run setup.py egg_info)
ckan | Known commands:
ckan | create Create the file layout for a Python distribution
ckan | exe Run #! executable files
ckan | help Display help
ckan | make-config Install a package and create a fresh config file/directory
ckan | points Show information about entry points
ckan | post Run a request for the described application
ckan | request Run a request for the described application
ckan | serve Serve the described application
ckan | setup-app Setup an application, given a config file
ckan exited with code 2
Oops. We have a problem. Apparently the command db
is not being found
somewhere.
The fix
Searching around, I found on the CKAN issue tracker in Github that this error happens due to a bug on this version of CKAN. Apparently it has been already fixed by this pull request by Mark Stuart that has already been accepted and merged into the master branch (soon to be renamed to main, like everything on Github). However, that fix is not yet incorporated into a stable release.
So we are left with a few alternatives to proceed:
- Use an older version of CKAN, like 2.8.5, and hope it doesn’t have the bug;
- use another method of installation instead of Docker;
- use the master branch, which might contain experimental features and is not yet stable code; or
- use version 2.9.0, but apply a patch just for this fix.
We decided to use the latter, as suggested by mabah-mst on that issue, by running the following commands:
ckan/contrib/docker$ cd ../..
ckan$ git checkout tags/ckan-2.9.0
#Apply fix to get CKAN to start at all in the docker-compose: https://github.com/ckan/ckan/pull/5381
ckan$ git diff 9abeaa1b7d2f6539ade946cc3f407878f49950eb^ 9abeaa1b7d2f6539ade946cc3f407878f49950eb | git apply
That way, we get the stable 2.9.0 release and just this fix, and not any other possible experimental features that may have been incorporated into the master branch.
Now let’s build the thing again. And await once more…
ckan$ cd contrib/docker
ckan/contrib/docker$ docker-compose up -d --build
Now check again to see if the container is running properly:
ckan/contrib/docker$ docker ps | grep ckan
582466833e55 docker_ckan "/ckan-entrypoint.sh…" 4 hours ago Up 4 hours 0.0.0.0:5000->5000/tcp ckan
and open a browser at the default port of 5000 (localhost:5000
) to check it
out.
Voilà! CKAN is running!
Basic setup
If the site language you’re going to use isn’t English, now is a good time
to change the default language. For that, you need to find and edit the
production.ini
file. To find out where it is, type
ckan/contrib/docker$ docker volume inspect docker_ckan_config
[
{
"CreatedAt": "2020-09-29T10:16:51-03:00",
"Driver": "local",
"Labels": {
"com.docker.compose.project": "docker",
"com.docker.compose.volume": "ckan_config"
},
"Mountpoint": "/var/lib/docker/volumes/docker_ckan_config/_data",
"Name": "docker_ckan_config",
"Options": null,
"Scope": "local"
}
]
and you can see it is under Mountpoint
, at
/var/lib/docker/volumes/docker_ckan_config/_data
. The directory permissions
belong to the root user, so you’ll have to edit it using sudo
(e.g.
sudo vim /var/lib/docker/volumes/docker_ckan_config/_data/production.ini
or
sudo gedit /var/lib/docker/volumes/docker_ckan_config/_data/production.ini
).
Look for the section named “Internationalisation Settings”. Change the
ckan.locale_default
and ckan.locale_order
variables accordingly.
## Internationalisation Settings
ckan.locale_default = pt_BR
ckan.locale_order = en pt_BR ja it cs_CZ ca es fr el sv sr sr@latin no sk fi ru de pl nl bg ko_KR hu sa sl lv
ckan.locales_offered =
ckan.locales_filtered_out = en_GB
Fun fact: the first ever translation of CKAN to Brazilian Portuguese was made in 2009 by yours truly. I also kept it updated, some years on, as new versions of CKAN came out.
Restart CKAN and you should see the new default language come into effect.
ckan/contrib/docker$ docker-compose restart ckan
Enabling the first user
Now we need to create the administrator user to begin working with CKAN.
We take a look at the documentation to find out the command to create an admin, for example, with the username johndoe.
ckan/contrib/docker$ docker exec -it ckan /usr/local/bin/ckan-paster --plugin=ckan sysadmin -c /etc/ckan/production.ini add johndoe
Command 'sysadmin' not known (you may need to run setup.py egg_info)
Known commands:
create Create the file layout for a Python distribution
exe Run #! executable files
help Display help
make-config Install a package and create a fresh config file/directory
points Show information about entry points
post Run a request for the described application
request Run a request for the described application
serve Serve the described application
setup-app Setup an application, given a config file
But alas, that doesn’t work! The command sysadmin
is not known. As it turns
out, it’s another bug. This time,
I had to open a new issue myself. A couple of days later,
Konstantin Sivakov
pointed out that that particular command has
changed in CKAN 2.9:
The old paster CLI has been removed in favour of the new ckan command. In most cases the commands and subcommands syntax is the same, but the
-c
or--config
parameter to point to the ini file needs to provided immediately after the ckan command, eg:
ckan -c /etc/ckan/default/ckan.ini sysadmin
A sign of a healthy free and open source software community is if and how long it takes until you get a response a question when you encounter a problem like this. In our case, it took just a couple of days, which means that the CKAN community is very much alive and helpful. Thanks to Brett, here is the solution:
ckan/contrib/docker$ docker exec -it ckan bash
$ source /usr/lib/ckan/venv/bin/activate
(venv) $ ckan -c /etc/ckan/production.ini sysadmin add admin email=admin@localhost name=admin
Of course you can (and should) change the username and email here. The email address can be used later, for example, in case you forget your password and need to reset it, without having to access the command like on the server again. If everything goes correctly, the system should prompt you to insert a password and to confirm it by typing a second time.
Site administrator configuration through the web
Now we can log in to CKAN and start configuring the settings that can be
adjusted through the web interface. Open localhost:5000
in a web browser.
After logging in, you should see a new bar at the top with your username and a few interface elements you can click.
Click the hammer icon 🔨 there and at the “config” tab you can adjust more settings, such as the site’s name, logo, description and colors. Choose a layout for the home page. You can also specify a custom CSS file to further customize the look and feel of your data catalog.
Setting up organizations and groups
As a site administrator, you are also capable of inserting new datasets on your catalog. But more usually, you would want to distribute that workload to other people with a more focused role. There are many different ways you can organize this. If you have a medium to large institution, you may want to create “organizations” for each of your departments, so that each one manages their own datasets. Organizations in CKAN are just a way to manage a large quantity of datasets, delegating responsibilities to each department over their respective datasets.
This might be a good time to set up the email service configuration, so that
CKAN can send emails to people when you create users and also for the password
reset system to function properly. For that, you need to edit the .env
file,
at the following section:
# Email settings
CKAN_SMTP_SERVER=smtp.corporateict.domain:25
CKAN_SMTP_STARTTLS=True
CKAN_SMTP_USER=user
CKAN_SMTP_PASSWORD=pass
CKAN_SMTP_MAIL_FROM=ckan@localhost
Use your own SMTP server address and credentials there. Then restart CKAN with
ckan/contrib/docker$ docker-compose restart ckan
for the changes to take effect. Now we can add the first organization:
Click “Organizations”, then the “Add organization” button.
This is pretty simple. Add a name, a description and optionally an image to represent it. Click “Create Organization”. Then immediately click “🔧 Manage”, then the “👥 Members” tab, then “Add Member”. Choose the role “Admin”, as this will be the administrator of this organization. Under “new user”, fill in their email address. They should receive a message with a link to create their new user, which will automatically receive the permissions you set here (in this case, the role of the administrator of this organization, which is different from the site-wide administrator we’re using).
The organization administrator will then have to follow the link on their email, activate their own accounts and log in. After that, they can follow the same procedure above to add other users with the role of “Editor”. After activating their own accounts, editors can create and edit datasets in CKAN inside their respective organizations.
Another way to organize datasets in the data catalog is by creating thematic groups (e.g. health, education, transportation, etc.). This is completely optional, but if you want to use it you can go at any moment to “Groups” at the menu at the top, click “Add Group”, and fill in the group information: name, description and an image.
After that, the group should be available for editors to choose from when creating or editing datasets.
For more information on managing organizations and datasets, see the corresponding section of the CKAN Sysadmin Guide.
You should also instruct editors on how to create dataset and operate CKAN. People with the role of editor should be someone who understands well the data they’re adding, because they will need to describe them adequately in plain language to the end users of the data portal in a way that is easy to understand. For instructions on how to create and edit datasets, the CKAN documentation has a User Guide.
Going into production mode
That covers the basic setup. Before going into production mode, it is
especially important that you edit and review all of the settings in both the
.env
and the production.ini
files. Please read the
“steps toward production”
section of the Maintainer’s Guide in the CKAN Documentation for more
information.
Final thoughts
Overall, we had some problems with version 2.9.0, as the instructions from the documentation did not work right away. However, with help from the CKAN community, we were able to overcome those hurdles and to end up with a working installation of the latest version of CKAN in a Docker environment.
Edit: for another possible way to install CKAN 2.9.0 with Docker, please see this blog post by Luiz Felipe Costa. Luiz developed the CKAN customization for the dados.gov.br revision from 2017, which I was the product owner of, and I can only recommend his services.