After many years of service, I decided to retire my old blog, made with [Blogger do Google]. My experience with static site generators has always been very good, going back to the first ones I wrote myself in the 90s, some in Object Pascal (Delphi), others even in Perl! In recent years, I had the pleasure of creating the website for the first [PyCon Amazônia] using [Pelican], and before that I was already working on a new version of the generator for my [book site]. But I wanted a new blog, and something quick. After some research, I decided to try [Hugo], written in [Go] and ultra fast. I found it very simple, easy to install, and multi-platform. The new site, hosted at [https://blog.nilo.pro.br], was created by converting the Blogger posts. I also took the opportunity to move the infrastructure to AWS, since the cost is very low.

But something was missing: the compilation and publication process was not automated, and I realized it had been a long time since I had posted anything, since January 2017 to be exact. Although the new blog went live in February 2019, no new post had appeared. It was time to solve this problem.

A static site is simply a site where all pages are created beforehand, i.e., they do not depend on a server to generate their content. For example, a website made with WordPress is a dynamic site, because the content of each post comes from a database and each page is built as it is requested. Of course, today there are many ways to improve the performance of these sites, such as regenerating pages only when they are changed or created. A dynamic site cannot function without its specially configured server, usually with a database and whatever else its operation requires.

Dynamic websites were created to make online content management easier. The idea was to give content creators tools as easy to use as those of a forum. And it works very well, but at a relatively high cost. For authors who also program, solutions based on static sites, built much as we build programs, have become very interesting. Static sites can be generated on the author's/developer's machine; they need neither a database nor a special server. A static site can even run from a local directory on the computer, making it easier to develop and test content.

A static site generator like Hugo processes a series of directories containing the blog posts or website articles and compiles them. This article, for example, was written in Word, converted to Markdown with pandoc, edited in Vim, and translated to HTML by Hugo. Hugo also creates all the navigation links between pages, including tags and internal pages. Total time on my virtual machine: 242 ms. Yes, the entire blog takes less than a quarter of a second to be completely generated. There are more than 300 files, and it is fast enough to recompile the whole site as I save a file.
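A post source, for instance, is just a Markdown file with a short front-matter header that Hugo uses to build the tag and navigation links. A hypothetical example (the title, date, and tags below are illustrative, not from this blog):

```markdown
---
title: "Automating the blog"
date: 2019-02-10
tags: ["hugo", "aws"]
draft: false
---

The body is plain Markdown; Hugo converts it to HTML and
wires up the tag and navigation links automatically.
```

During writing, `hugo server` watches the source tree and rebuilds on every save, which is where the sub-second build time pays off.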

The compilation process itself is easy to automate, because Hugo reads a configuration file and generates the site in a public folder. Hosting on AWS (Amazon Web Services) is done with an S3 bucket. S3 works like a cloud disk to which we can copy files. The hosting process for a static site is quite simple: once the site is generated, copy the files to the S3 bucket and that's it. The process would end there if it weren't for two problems. Although S3 can be configured as a web server, it does not cache, and you are charged for each access; on large sites this can get expensive. Today it is also desirable to serve the site over IPv6 and with SSL/TLS certificates. To solve these problems, we need another AWS service called Cloudfront. Cloudfront distributes content across multiple locations around the world, reducing the access time between the site and the client, which interests me since my servers are in Europe and the blog is usually read from Brazil.
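Hugo reads its settings from a single file at the project root (`config.toml` in versions of that era). A minimal sketch; only the `baseURL` value comes from this article, the other values are illustrative:

```toml
baseURL = "https://blog.nilo.pro.br/"
languageCode = "pt-br"   # illustrative
title = "My blog"        # hypothetical title
publishDir = "public"    # output folder, the one synced to S3
```

With this in place, running `hugo` with no arguments builds the whole site into `public/`.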

Using Cloudfront, we configure a distribution (its unit of configuration) indicating what to distribute and where, mainly by pointing at the content's origin. An S3 bucket is an origin compatible with Cloudfront. You can also choose to use an SSL/TLS certificate created by AWS for free, as long as it is used with their services. In this way, we have the content served close to whoever accesses it (cached), over SSL, with IPv6, and even with HTTP/2 support! All with just a few clicks and without having to keep a server running. The monthly cost of this whole infrastructure, for fewer than 5000 accesses per month, is below €1, much lower than even simple hosting services.
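For reference, a Cloudfront distribution is described by a JSON `DistributionConfig` document. A heavily trimmed sketch of the fields relevant here (the origin ID and bucket placeholder are illustrative, and a real config requires several more fields, such as `CallerReference`, `DefaultCacheBehavior`, `Comment`, and `Enabled`):

```json
{
  "Origins": {
    "Quantity": 1,
    "Items": [
      {
        "Id": "blog-s3-origin",
        "DomainName": "BLOG_BUCKET.s3.amazonaws.com"
      }
    ]
  },
  "Aliases": { "Quantity": 1, "Items": ["blog.nilo.pro.br"] },
  "IsIPV6Enabled": true,
  "HttpVersion": "http2"
}
```

The same settings can be made through the AWS console with a few clicks, which is usually easier for a one-time setup.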

The remaining step is to upload the locally generated files and also create a cache invalidation in Cloudfront. The invalidation empties the cache, forcing Cloudfront to fetch a fresh copy of the files from S3. If this step is skipped, the update may take days to appear, until the cached copy expires.
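If you want the update script to block until the invalidation has actually finished, the AWS CLI can capture the invalidation ID and wait on it. A sketch, assuming `DIST_ID` holds your distribution ID (untested here, since it needs live AWS credentials):

```shell
# Create the invalidation and capture its ID
INV_ID=$(aws cloudfront create-invalidation \
    --distribution-id "$DIST_ID" --paths "/*" \
    --query 'Invalidation.Id' --output text)
# Block until Cloudfront reports the invalidation as completed
aws cloudfront wait invalidation-completed \
    --distribution-id "$DIST_ID" --id "$INV_ID"
```

Waiting is optional; the invalidation completes on its own within minutes either way.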

Let’s see how to automate it on Linux, using the AWS command-line client (written in Python).

```shell
#!/bin/sh
# Build the static site into the public folder
hugo
# Copy the public directory to the bucket. Replace BLOG_BUCKET with the name of your bucket.
aws s3 sync public/. s3://<<BLOG_BUCKET>>
# Create an invalidation in Cloudfront. Replace DIST_ID with your distribution ID on CloudFront.
aws cloudfront create-invalidation --distribution-id <<DIST_ID>> --paths "/*"
```

And the update is done. This way, once a new article is written, the site can be updated with a single command. I’m using this script to update the new blog; hopefully it will help increase the frequency of posts :-D!