Augusto Herrmann

 Blog posts

My first ever Python code, 14 years later


While browsing some old backups of mine I serendipitously came across the first ever pieces of code I wrote in Python. It was back in 2007, while I was an undergrad in Computational Mathematics, when I studied the fundamentals of cryptography in college with professor Dr. Jeroen van de Graaf at the Computer Science Department of the Federal University of Minas Gerais. For the learning exercises, we needed to make calculations with large integers, which was not something simple to do in many programming languages at the time. Professor Jeroen suggested that we use Python for that, and it turned out to be very easy to translate the abstract algorithms from the books into running code. We used many books…
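As a small illustration of why Python suits this kind of exercise, here is a minimal sketch (not the code from the backups) of modular arithmetic on large integers. The modulus and the base below are illustrative values chosen for this example, not taken from the post.

```python
# Python's built-in int has arbitrary precision, so no big-number library is needed.
p = 2**127 - 1   # a known Mersenne prime, used here only as an illustrative modulus
a = 123456789    # arbitrary illustrative value (assumption, not from the post)

# Fermat's little theorem: a**(p - 1) is congruent to 1 (mod p)
# when p is prime and does not divide a.
fermat_check = pow(a, p - 1, p)

# Modular inverse of a, again via Fermat: a**(p - 2) mod p.
a_inverse = pow(a, p - 2, p)

print(fermat_check)
print((a * a_inverse) % p)
```

Both lines print 1. The three-argument form of the built-in `pow` performs modular exponentiation without ever building the huge intermediate power.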


My discord with Discord, or choosing alternatives with better terms of service


For over a year we have been living in a pandemic. Those of us who can avoid leaving home do so as much as possible. The ensuing feeling of isolation and the need to connect with others have driven us to use online services more and more, which has led applications like Zoom and Google Meet to experience staggering growth. The same has happened to group chat applications with audio and video, like Discord. Everyone and their neighbor has been either creating a new online community on Discord (oddly called a “server”, even though these are not servers in a strictly technical sense; internally they’re called “guilds”, according to this Reddit thread), or moving to Discord their online community that existed…


Open Data in perspectives: an account of the Open Data Day 2020 Rio at the National Archives


A year ago I was at one of the two Open Data Day Rio de Janeiro events, which was organized by the Arquivo Nacional, the National Archives of the Brazilian federal government. Although Open Data Day is always on a Saturday, the event was held a day early, on Friday, because hosting it on a work day suited the institution better. I was invited to give a talk there about Open Data Day itself: what it is, why it is important, and what some previous ODD events have been like. The other presentations at the event showed other perspectives on open data. Otávio Neves, Director of Transparency at the Office of the…


How to build a custom environment for Jupyter in Docker


If you have been doing software development in recent years, you’ve probably come across the use of containers not only for deployment, but also during development, to ensure that your build is completely reproducible across different systems. Also very popular in data science, data visualization and related circles is the use of Python with Jupyter Notebooks and JupyterLab to explore and experiment with data. Some people criticize Jupyter for often resulting in unreproducible work, even though reproducibility is really important for the scientific method, because the environment in which a notebook is re-run may differ from the one in which it was developed and the cells could be executed out of order. For instance, in an experiment made by the JetBrains Datalore team last year which downloaded almost 10 million…


How to deal with international data formats in Python


A frequent hassle when working with data from various international sources is handling the differences in how languages and cultures represent decimal and thousands separators, the order of year, month and day in dates, etc. Many countries go from the smallest (day) to the largest (year) unit of time, while some, like the U.S., do the weird thing that is starting from the middle (month), then going small (day), then completely reversing direction and going to the large unit (year). If you look at decimal separators, it seems that just about half the world uses dots and the other half uses commas. The thousands separator is the other mark. That is, in countries that use the dot as…
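To make the separator and date-order problem concrete, here is a minimal sketch using only the standard library. It is not necessarily the approach the full post takes: `parse_number` is a hypothetical helper written for this example, and the sample values are made up. The standard `locale` module is another option, but it depends on which locales are installed on the system.

```python
from datetime import datetime
from decimal import Decimal


def parse_number(text: str, decimal_sep: str = ".", thousands_sep: str = ",") -> Decimal:
    """Hypothetical helper: strip the thousands separator, normalize the decimal one."""
    normalized = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


# The same quantity written with dot-decimal and comma-decimal conventions.
us_value = parse_number("1,234,567.89")
br_value = parse_number("1.234.567,89", decimal_sep=",", thousands_sep=".")

# The same date written day-first and month-first; the format string
# states the order explicitly instead of guessing it.
day_first = datetime.strptime("31/12/2020", "%d/%m/%Y")
month_first = datetime.strptime("12/31/2020", "%m/%d/%Y")

print(us_value == br_value)
print(day_first == month_first)
```

Passing the convention explicitly avoids guessing: a string such as "1.234" is ambiguous on its own, so the caller has to know which format the source uses.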


Why do we still call Facebook a platform? What is a platform, really?


With the tech giants under more scrutiny than ever, we keep hearing the platform vs. publisher discussion brought up repeatedly by the international press and also by US politicians. As the Electronic Frontier Foundation (EFF) correctly puts it, for the purposes of CDA Section 230, it doesn’t matter. As EFF and other digital society thinkers have argued over the years, CDA 230 makes no such distinction.

A question of semantics

But what is a “platform”, really? A common use among people outside the field of technology is to use the word to mean just a place where people can express themselves. If you have any website that accepts user-generated content, then you are a platform. One problem with this…


Open data: a committee in retrospect


The prospect of recreating an open data committee in the Brazilian federal government prompted me to remember and tell the story of the open data committee that we created eight years ago. Please note, however, that this is not the whole story of the National Infrastructure for Open Data (INDA), or even the most important parts of it, but rather just the part that involves its committee and the issues that were discussed in it over the years.

Inspiration and motivation

Back in 2011, when we were designing the open data policy of the Brazilian federal government, one of the challenges we faced was how to ensure that citizens had a place and a say in how the policy would…


How to install and configure CKAN 2.9.0 using Docker


In 2014, I was invited to teach a couple of mini courses on CKAN, one of them on the pleasant island of Florianópolis and the other in the freezing winter of Moscow. I already had some experience with it from collaboratively creating the dados.gov.br open data portal in 2012, but I had to study it again in 2014 in order to catch up with then-recent developments.

Augusto presents his CKAN course at the IV Moscow Urban Forum in 2014 (photo credits: Moscow Urban Forum).

The slides from those mini courses, one in English and the other in Portuguese, are available on SlideShare: CKAN Overview (presented at the IV Moscow Urban Forum, in Moscow); Minicurso de CKAN (presented at the…


Cadence and aesthetics: weird things that change articles in unexpected ways in Romance languages


As an aspiring polyglot and amateur linguist, I sometimes find curious similarities between grammar rules in different languages. One in particular often surprises students when they first come across it, especially if their native language has no such thing (for instance, Portuguese) or if it uses no articles at all (as is the case in many Slavic languages, such as Russian). When you learn other languages, one of the first things you learn is that nouns sometimes have a different gender than the corresponding one in your native language. So you have to memorize the gender of nouns and practice a lot. You also learn that you have to use articles, pronouns, and often adjectives in accordance with the noun’s…


A simple Python code refactoring pattern: replacing special handling in lists


Whenever we find ourselves repeating the same or similar code in several places around our files, we know it’s time to refactor it. Otherwise it becomes difficult to maintain in the long run and accumulates technical debt. If you spot a bunch of ifs around the code for handling special cases, depending on the values of items in a list, then one possible simple refactoring could go like this. To take a simple example, suppose you have a list of items, for instance a list of cities around the world. These could be the possible destinations you would ship some products to.

In [1]: cities = [ 'Manaus', 'Belém', 'Recife', 'Maceió', 'Salvador', 'Belo Horizonte', 'Brasília', 'Rio de…
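As a rough sketch of the general idea, and not necessarily the pattern the full post goes on to describe, special cases scattered across ifs can be moved into a single lookup table. The shipping times and the function names below are hypothetical, invented for this example. Only three of the city names from the list above are reused.

```python
cities = ["Manaus", "Belém", "Recife"]


# Before: each special case is a separate if, repeated wherever it is needed.
def shipping_days_before(city: str) -> int:
    if city == "Manaus":
        return 10
    if city == "Belém":
        return 8
    return 5


# After: the special cases live in one dictionary (values are hypothetical).
SPECIAL_SHIPPING_DAYS = {"Manaus": 10, "Belém": 8}
DEFAULT_SHIPPING_DAYS = 5


def shipping_days(city: str) -> int:
    # Fall back to the default for any city without special handling.
    return SPECIAL_SHIPPING_DAYS.get(city, DEFAULT_SHIPPING_DAYS)


estimates = {city: shipping_days(city) for city in cities}
print(estimates)
```

With this shape, adding a new special case means adding one dictionary entry rather than editing every function that branches on the city name.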


It’s 2020. Why are you not opening up data yet? Bingo!


When I started advocating for and building open data eleven years ago, the world was a very different place. Brazil had neither an open data portal nor an open data policy, and even the countries that pioneered the open data agenda were just beginning. Now we can see a very different landscape. Most nation states have joined the open data agenda and feature a one-stop-shop portal where people can download a plethora of data about almost any subject, including the most important ones. Many local governments do so, too. It may seem that public sector managers have since then been mostly convinced of the reasons for opening up data, be they for economic growth, job creation, public sector cost savings…


On the State of Open Data: does it face an identity crisis?


What is the current state of open data around the world? Is open data facing an identity crisis? These are some of the questions that a recent book and its launch event tried to answer. Six months ago, a book contemplating the state of open data around the world was released by the Open Data for Development (OD4D) initiative.

“The OD4D is a global partnership that supports southern leadership and locally-led data ecosystems around the world as a way to spur positive social change and sustainable development” – OD4D’s website

The program is hosted by the International Development Research Centre (IDRC) of Canada. The IDRC has also published the book, in partnership with African Minds, a non-profit, open access book…


Counting tabular and map datasets in CKAN


It has come to my attention that some international open data ranking systems, specifically the Open-Useful-Reusable Government Data (OURdata) Index compiled by the Organization for Economic Co-operation and Development (OECD), measure not only how many datasets a given national open government data portal has, but also how many of those are tabular and how many are maps. I don’t think the number of datasets in an open government data portal is a very useful metric, considering that governments may just as well split large datasets into smaller ones in order to achieve a larger “amount of datasets”, without adding any benefit or value for the data user. On the contrary, that practice might make relevant data more difficult…


Tokens and tribulations


After fifteen years looking from afar at the evolution of the Brazilian Public Key Infrastructure – ICP Brasil, I have, at last, acquired my own certificate. And, with it, a hardware token to store the private key. I have decided, then, to try to install and use it on an Ubuntu 18.04.2 LTS operating system while documenting the steps, in order to help other people who may want to use it on the same system and might face difficulties.

Installing the USB token for digital certificates

In order to install the GD Starsign token drivers from Giesecke & Devrient GmbH on Ubuntu 18.04.2 LTS, download the drivers from the drivers page of GD South America and unpack the files. Even…


Notes on the course: new advances in digital and open government


This week, once again, we participated in the course New Advances in Open and Digital Government. The course is taught by Prof. Dr. Marijn Janssen of Delft University of Technology in the Netherlands, and is promoted in Brasília by the Secretariat for Digital Government (SGD) and the National School of Public Administration (Enap). This is the second time the course has been offered, after its debut in 2018. Most of the participants are public officials from many different organizations in the Brazilian federal government. In this series of posts I will share my main observations and comments on the contents of the course. Note these are my own views and observations and they do not represent in…
