<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Aymeric on Software</title>
  <link href="https://blog.barthe.ph/index.xml" rel="self"/>
  <link href="https://blog.barthe.ph/"/>
  <updated>2026-03-17T00:00:00+00:00</updated>
  <id>https://blog.barthe.ph/</id>
  <author>
    <name>Aymeric Barthe</name>
    <email>aymeric@barthe.ph</email>
  </author>
  <generator>Hugo -- gohugo.io</generator>  
  <entry>
    <title type="html"><![CDATA[The End of an Era]]></title>
    <link href="https://blog.barthe.ph/2026/03/17/the-end-of-an-era/"/>
    <id>https://blog.barthe.ph/2026/03/17/the-end-of-an-era/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2026-03-17T00:00:00+00:00</published>
    <updated>2026-03-17T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>Today marks the first time <strong>everything</strong> I host on the Internet has moved away from a dedicated Linux VPS.</p>
<p>This blog moved to <a href="https://pages.cloudflare.com/">Cloudflare Pages</a> in 2022; before that, it had been hosted on dedicated servers provided by <a href="/2014/09/17/kimsufi-migration/">Kimsufi</a> and, earlier still, <a href="/2014/09/17/kimsufi-migration/">Dedibox</a>.</p>
<p>However, half a dozen sites and applications remained on that server. It ultimately took about four years to complete the transition. 😱</p>
<h2 id="why-not-a-vps">Why Not a VPS?</h2>
<p>Renting a dedicated server simply stopped making any economic sense. The underpowered machine that previously hosted this site cost €15 a month, a price point that had long ceased to be competitive.</p>
<p>You can host a collection of static HTML/CSS files at zero or near-zero cost on either <strong>Cloudflare Pages</strong> or <strong>AWS CloudFront/S3</strong>.</p>
<p>While SQL databases, PHP, and Docker images have their uses, they always felt like overkill for a simple blog or the kind of simple sites I was hosting. This blog, for instance, is just a collection of static files generated by the <a href="https://gohugo.io">Hugo</a> static site generator.</p>
<h2 id="where-are-things-hosted-now">Where Are Things Hosted Now?</h2>
<p>It depends:</p>
<ul>
<li>
<p>Simple sites like this one are hosted on either Cloudflare Pages or AWS CloudFront.</p>
</li>
<li>
<p>For public-facing sites, I converted some simple PHP or Ruby scripts to run on Cloudflare Workers/Pages and store their data in a <a href="https://developers.cloudflare.com/d1">D1 database</a> instead of self-hosting PostgreSQL. My usage is small enough to fit comfortably within the free tier.</p>
</li>
<li>
<p>Finally, I have run a few servers at home for over a decade. Some sites that were previously hosted on the VPS are now served from home, behind a <a href="https://developers.cloudflare.com/tunnel/">Cloudflare Tunnel</a>. A few are additionally gated behind <a href="https://developers.cloudflare.com/cloudflare-one/access-controls/policies/">Cloudflare Access</a>. Faster internet connections with gigabit upload now make this a viable option, at a fraction of the cost of a VPS.</p>
</li>
<li>
<p>The home sites all run in Docker containers, so moving or migrating them in the future should be easier. This also helps with security, as an attacker would need to break out of the container to compromise the host.</p>
</li>
</ul>
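<p>For a flavor of what the Workers + D1 conversions look like, here is a minimal sketch. Everything in it is illustrative: the <code>DB</code> binding, the <code>posts</code> table, and the routing are invented for this example, not taken from my actual sites.</p>

```typescript
// Illustrative sketch of a Cloudflare Worker backed by D1.
// The "DB" binding name and the "posts" table are hypothetical.

type D1Rows = { results: Array<Record<string, unknown>> };

interface Env {
  DB: {
    prepare(sql: string): {
      bind(...values: unknown[]): { all(): Promise<D1Rows> };
    };
  };
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Map the URL path to a page slug, defaulting to "index".
    const slug =
      new URL(request.url).pathname.replace(/^\/+|\/+$/g, "") || "index";

    // D1 exposes a SQLite-style prepared-statement API instead of a
    // long-lived PostgreSQL connection.
    const { results } = await env.DB
      .prepare("SELECT title, body FROM posts WHERE slug = ?")
      .bind(slug)
      .all();

    if (results.length === 0) {
      return new Response("Not found", { status: 404 });
    }
    return Response.json(results[0]);
  },
};

// In a real Worker module this would be: export default worker;
```

<p>In a real deployment, the <code>DB</code> binding would be declared in <code>wrangler.toml</code>, and the Worker exported as the module default.</p>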
<h2 id="why-did-it-take-so-long">Why Did It Take So Long?</h2>
<p>I made a historical mistake when I set up my first Linux servers: I built them with little to no documentation, which made migrating and rebuilding them a challenge.</p>
<p>Having learned from my mistakes, everything I host at home now follows the tenets of <a href="https://en.wikipedia.org/wiki/Infrastructure_as_code">infrastructure as code</a> religiously. The base system is installed automatically via Ansible, and every web application or site runs on top in low-privilege Docker containers. Installation, deployment, and updates are largely automated, and all changes are tracked in Git, which serves as both documentation and historical record.</p>
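<p>To give a flavor of what this looks like in practice, here is a minimal, hypothetical playbook fragment. The host group, image name, and parameter choices are illustrative, not lifted from my actual repository:</p>

```yaml
# Illustrative fragment: converge the base system with Ansible, then run
# one site as a low-privilege Docker container. All names are invented.
- hosts: homeservers
  become: true
  tasks:
    - name: Install Docker from the distribution packages
      ansible.builtin.apt:
        name: docker.io
        state: present

    - name: Run a site as an unprivileged, locked-down container
      community.docker.docker_container:
        name: example-site
        image: registry.example.com/example-site:latest
        user: "1000:1000"          # never run as root inside the container
        cap_drop: [ALL]            # drop all Linux capabilities
        read_only: true            # immutable root filesystem
        restart_policy: unless-stopped
        published_ports:
          - "127.0.0.1:8080:8080"  # only reachable via the tunnel or reverse proxy
```

<p>Because the playbook lives in Git, the same file doubles as documentation of how each container is locked down.</p>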
<p>In addition, recent advances in AI have made this setup even more useful: agentic LLMs do a reasonable job of maintaining the repository. They can also explain how things work when you have invariably forgotten a decade later.</p>
<p>In fact, with Claude&rsquo;s help I was able to move the remaining sites that had been hosted on the Kimsufi VPS. For some of these, Claude also assisted with the conversion from Ruby/PHP-FPM/PostgreSQL to Cloudflare Workers and the D1 Database. All in all, four years of procrastinated drudge work and technical debt disappeared over the course of a couple of long evenings.</p>
<p>I may use a VPS again someday if the free or near-free services I rely on disappear, but I will make sure to have a clear exit plan in place.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Things I Learned in 2020]]></title>
    <link href="https://blog.barthe.ph/2021/01/09/year-2020-retrospective/"/>
    <id>https://blog.barthe.ph/2021/01/09/year-2020-retrospective/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2021-01-09T00:00:00+00:00</published>
    <updated>2021-01-09T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>Let&rsquo;s get the elephant out of the room: 2020 has been a very different kind of year because of the COVID-19 crisis. I am very fortunate that myself, my wife, our family and close friends have not been – so far – impacted in any meaningful way.</p>
<p>However, this blog post is not about COVID-19. In fact, I have been thinking of doing some kind of yearly retrospective for the last couple of years. So here I am, starting one now and trying to leave the COVID topic aside.</p>
<p>I read a lot of development blogs. My interests in programming languages and technologies are varied and somewhat eclectic. The blogs I read range from serious, in-depth analyses to more lightweight, off-the-cuff testimonies. Oftentimes, this is how I stumble on an interesting tool or piece of technology that I end up using months or years later. I am writing this blog post in the hope that it may be useful to someone else in the same way.</p>
<h2 id="tldr">TL;DR</h2>
<p>So here are three things I learned in 2020: <em><a href="https://www.terraform.io/">Terraform</a></em>, <em><a href="https://www.notion.so/">Notion</a></em>, and <em>somewhat improved writing skills</em>.</p>
<h2 id="terraform">Terraform</h2>
<p>I had been aware of <a href="https://www.terraform.io/">Terraform</a> for some time. It is a tool born of the <em>infrastructure as code</em> methodology from DevOps. Unlike older solutions such as <em>Chef</em> or <em>Puppet</em>, which were historically rooted in <em>imperative</em> scripts, Terraform supports only the <em>declarative</em> approach. In other words, you describe the end state of the cloud infrastructure you want, and Terraform figures out <strong>which</strong> steps are necessary, if any, to bring the system in line with expectations.</p>
<p>Terraform is a cloud-agnostic tool. I did not need a multi-cloud tool, so I had reservations about adopting Terraform over a first-party solution like CloudFormation for AWS. Both tools play the same role and are declarative, but CloudFormation, being a first-party product, has the advantage of integrating better with the AWS stack. That reasoning is correct, but what I failed to appreciate is how poor and painful CloudFormation is to use in practice. Terraform provides a much better experience:</p>
<ul>
<li>The description language of Terraform is less verbose and better designed than CloudFormation&rsquo;s. It feels like Terraform was designed to be read and written by humans, whereas CloudFormation looks like a <em>memory dump</em> transcribed into YAML or JSON.</li>
<li>Terraform&rsquo;s documentation is excellent, despite it being a third-party product. Not that the AWS documentation is bad either.</li>
<li>The killer feature of Terraform is the ability to <em>refactor</em> re-usable code into modules. This is a very straightforward and <a href="https://www.terraform.io/docs/modules/index.html">well-documented</a> process, which compares infinitely more favorably with the kind of hacks one would have to resort to with CloudFormation to achieve similar results.</li>
</ul>
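<p>As an illustration of the module mechanism, here is a hedged sketch. The resource names, zone ID, and target are invented, not taken from my actual configuration:</p>

```hcl
# modules/dns_record/main.tf -- a tiny reusable module (illustrative names)
variable "zone_id" { type = string }
variable "name"    { type = string }
variable "target"  { type = string }

resource "aws_route53_record" "this" {
  zone_id = var.zone_id
  name    = var.name
  type    = "CNAME"
  ttl     = 300
  records = [var.target]
}

# main.tf -- the declarative call site; `terraform plan` works out which
# changes, if any, are needed to reach this state.
module "blog_dns" {
  source  = "./modules/dns_record"
  zone_id = "Z0000000EXAMPLE"
  name    = "blog.example.com"
  target  = "example.pages.dev"
}
```

<p>A module like this can then be instantiated once per record, which is exactly the kind of reuse CloudFormation makes painful.</p>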
<p>One of the drawbacks of Terraform is that the current state of the infrastructure is stored in a local file, whereas a first-party solution like CloudFormation stores the state in the cloud. Personally, I like having the state in a Git repository and find it somewhat reassuring.</p>
<p>I currently use Terraform to manage all my AWS resources: Route 53 DNS entries, ECR Docker registries and a handful of Amazon S3 buckets.</p>
<h2 id="notion">Notion</h2>
<p>I have been looking for a tool to take and organize notes. One of the many requirements was that the tool needed to be truly cross-platform, with support for macOS, Windows and iOS, and feature cloud syncing. In addition, I wanted a tool that integrates well with Markdown.</p>
<p>I essentially use Notion to collect links and information when I research a topic and to write design documents. I also adopted the practice of scientists who keep logs about their work. The draft of this blog post was also written in Notion first, before being exported as Markdown and published via Hugo.</p>
<p>I have been looking for a good note-taking tool since 2019. I have looked at many and tried a handful. Even though I had been aware of Notion for a long time, I was somewhat put off by it. I understood it could be used to publish things on the Internet and that it had some kind of database features. Those features are not something I cared for, so they acted as a repulsive force: I was worried about feature creep and unnecessary complexity. These thoughts were echoed by <em>CGP Grey</em> on <a href="https://www.relay.fm/cortex">Cortex</a>, so Notion was put at the back of the pile. How wrong I was!</p>
<p>The truth is that <a href="https://www.notion.so/">Notion</a> is still a pretty good way to take and organize textual notes. All the usual features of a Markdown editor are well supported, including syntax highlighting for code snippets, even though it uses a WYSIWYG editor. There are some useful additional features, like the ability to use callouts and emojis or to insert LaTeX formulas. The import and export of Markdown is not perfect, but it is usable. I have used some of the advanced features like tables, databases and calendars, and they are fine. The ability to publish a note as a public webpage in one click has turned out to be surprisingly useful, especially since that page automatically updates when the source document in Notion is updated. And to top it all off, it looks like my usage of the tool will fit in the <em>free tier</em> forever.</p>
<p>Although I am satisfied overall, Notion has some usability annoyances. Non-standard key bindings and mouse controls for text selection drive me mad. Occasionally, the WYSIWYG editor requires some non-obvious combination of keystrokes. But the main downside, by far, is that I am not in control of the data. The way information is stored on the file system or on Notion&rsquo;s servers is completely undocumented. The Notion API was <a href="https://github.com/kjk/notionapi">successfully reverse engineered</a>, but an <a href="https://www.notion.so/Does-Notion-have-an-API-I-can-use-4541b07a5caa46dba0026624646118c0">official Content API should be launching in 2021</a>. Until then, the official way to back up data is to <a href="https://www.notion.so/Back-up-your-data-1a8eb5bdfce34d19a6360fd015c0075f">Export the Workspace</a> as HTML or Markdown, but that export is lossy. I currently rely on exports, but I hope to start using the Content API when it becomes available.</p>
<h3 id="notions-alternatives">Notion&rsquo;s Alternatives</h3>
<p>While researching a note-taking tool, I stumbled on <a href="https://bear.app/">Bear</a>. It provides a native editing experience that I like more, though it offers fewer features than Notion overall. The downside is that it is only supported on iOS and macOS (no Windows, no Web). But if that&rsquo;s not a deal breaker, it is an excellent tool. I also stumbled on <a href="https://typora.io/">Typora</a>, a very good Markdown editor that I use regularly with the <a href="https://theme.typora.io/theme/Ursine/">Ursine theme</a>. I don&rsquo;t use Notion&rsquo;s project management features very much. For very simple to-do lists, I tend to rely on the much simpler <a href="https://to-do.microsoft.com/tasks/">Microsoft To Do</a>.</p>
<h2 id="improved-written-communication">Improved Written Communication</h2>
<p>I have worked from home for most of the year. Even though there are some obvious advantages, like avoiding the daily commute grind, the lack of face-to-face communication has many drawbacks that cannot be fixed by video conferencing.</p>
<p>One of the obvious consequences is that all communication beyond the trivial (which can be handled via instant messaging tools like Slack) must be formalized into a meeting or an exchange of e-mails. Consequently, I have found myself reading and writing more prose than ever before.</p>
<p>The trick with written communication is that you need to be as clear and unambiguous as possible while remaining concise. Clarity is very important. If one of your colleagues misinterprets one of your ideas, you can easily end up exchanging a few e-mails just to clear things up. This translates into wasted time, and some frustration, for everybody involved. Had you been in the same room, you would have been able to intervene sooner and save that time. Making things clear can involve some amount of repetition or rephrasing, or showing examples. Whenever possible, I try to show a diagram or a table instead of prose.</p>
<p>The goal of clarity needs to be counterbalanced by the necessity to be short and concise. Long, winding e-mails are boring to read, so people will just skim them or put them off until the last minute. I know; I am guilty of the same thing. For communication to have an impact, it needs to be read, and therefore it needs to be short.</p>
<p>Through repetition, I would like to believe that I have somewhat achieved a basic level of competency at written communication. I am better at it than last year, and definitely better than ten years ago, as plainly demonstrated by the earliest – cringeworthy – posts on this very blog.</p>
<p>As an aside, I would like to recommend &ldquo;<a href="https://www.amazon.co.uk/Have-Eaten-Grandma-Gyles-Brandreth/dp/0241352630">Have You Eaten Grandma?</a>&rdquo;, a very humorous book that reviews the basic rules of grammar and punctuation in British English. (Yes, this blog is written in American English, except for the rule about <a href="https://proofreadmyessay.co.uk/writing-tips/differences-british-american-punctuation/">punctuation within quotes or brackets</a>.)</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[My Firefox Setup]]></title>
    <link href="https://blog.barthe.ph/2020/11/06/firefox-setup/"/>
    <id>https://blog.barthe.ph/2020/11/06/firefox-setup/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2020-11-06T00:00:00+00:00</published>
    <updated>2020-11-06T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<h2 id="why-i-use-firefox">Why I Use Firefox</h2>
<p>I have used Firefox since the very beginning on Windows. I stuck with it even in the early 2010s, when it clearly lagged behind Chrome, because of a profound distaste for Chrome&rsquo;s GUI and Google&rsquo;s business practices. In my opinion, Firefox Quantum, released in 2017, has addressed most of the performance gap between Chrome and Firefox. In many cases, Firefox feels even snappier than Chrome. I still use Chrome every day, but mostly as my &ldquo;debugging&rdquo; browser for web development, my &ldquo;translator&rdquo; and, very rarely, my &ldquo;Flash player&rdquo;.</p>
<p>On macOS, I have used Safari since the beginning. However, as of macOS 10.15, I was forced to switch to Firefox when Apple decided to disable the <a href="https://developer.apple.com/documentation/safariextensions">legacy JavaScript API</a> for extensions. The new extension model involves installing an app from the App Store. The &ldquo;container&rdquo; app provides an <a href="https://developer.apple.com/documentation/safariservices/safari_app_extensions">XPC extension</a> which runs inside the Safari browser. This is arguably more efficient than JavaScript extensions, but this choice makes it very difficult for developers to port existing extensions to Safari. The &ldquo;container&rdquo; app also presents a much greater attack surface than a mere JavaScript extension.</p>
<h2 id="content-blocking-extensions">Content Blocking Extensions</h2>
<p>The move to the new extension API in Safari <a href="https://github.com/el1t/uBlock-Safari/issues/158">killed support</a> for my favorite extension: <a href="https://en.wikipedia.org/wiki/UBlock_Origin">uBlock Origin</a>.</p>
<p>For performance and security reasons, Apple decided to no longer let extensions intercept HTTP requests in Safari. Instead, extensions had to migrate to the newer <a href="https://developer.apple.com/documentation/safariservices/creating_a_content_blocker">content blocking API</a> introduced in 2015, as part of <a href="https://www.macworld.com/article/2986298/how-to-enable-safari-ad-blockers-in-ios-9.html">iOS 9</a>. This API requires extensions to provide a fixed list of blocking rules, identical for all sites. This is nowhere near as effective as uBlock Origin&rsquo;s dynamic rules. However, the privacy advantage is undeniable, since extensions no longer have access to the sensitive information that can be inferred from intercepting all HTTP requests.</p>
<p>This debate about content blocking extensions was recently revived by the upcoming transition to <a href="https://www.theregister.com/2020/10/15/microsoft_adopting_google_chromes_controversial/">Manifest v3 for Google Chrome extensions</a>. Google is pushing for a similar approach to Apple&rsquo;s with their new <a href="https://developer.chrome.com/extensions/declarativeWebRequest">declarativeWebRequest API</a>. They also argue that their new API would be <a href="https://blog.chromium.org/2019/06/web-request-and-declarative-net-request.html">more efficient and safer</a>. However, I am somewhat skeptical, since the old <a href="https://developer.chrome.com/extensions/webRequest">webRequest API</a> will be kept as part of Manifest v3 <em>minus</em> the ability to block requests. In other words, Google plans to cripple content blocking extensions <a href="https://github.com/uBlockOrigin/uBlock-issues/issues/338">like uBlock Origin</a>, but other extensions would still be able to spy on users by sniffing requests. The old blocking webRequest API will also be kept for their &ldquo;Enterprise customers&rdquo;, which I interpret as Google&rsquo;s partial acknowledgement that their new API does… indeed… suck.</p>
<h2 id="my-firefox-extensions">My Firefox Extensions</h2>
<p>This is the list of my Firefox Extensions:</p>
<ul>
<li>
<p><strong><a href="https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/">uBlock Origin</a></strong>. As previously mentioned, I ditched Safari in favor of Firefox because of this extension. It is the best and fastest ad blocker. It can also be configured to block any kind of content on the spot. Content blocking extensions not only make web pages render better and faster, but also make browsing more <a href="https://www.malwarebytes.com/malvertising/">secure</a>, since legitimate ad networks have been tricked many times in the past into distributing malware.</p>
</li>
<li>
<p><strong><a href="https://www.eff.org/https-everywhere">HTTPS Everywhere</a></strong>. This extension ensures that you always use <code>HTTPS</code> on sites that support it, even when you follow an <code>HTTP</code> link. It is becoming less of an issue, but still worth having.</p>
</li>
<li>
<p><strong><a href="https://addons.mozilla.org/en-US/firefox/addon/cookie-autodelete/#:~:text=%20Cookie%20AutoDelete%20by%20CAD%20Team%20%201,Watch%20those%20unused%20cookies%20disappear%20%3A%29%20More%20">Cookie AutoDelete</a></strong>. This extension can be configured to delete cookies and local storage data for all visited sites, except for a short list of exceptions. This is more aggressive than the built-in <em>delete cookies when Firefox is closed</em> option, since the extension will actively attempt to delete cookies while Firefox is running. I would recommend turning off notifications, since this extension is a bit too proud of itself for deleting cookies.</p>
</li>
<li>
<p><strong><a href="https://www.i-dont-care-about-cookies.eu/">I don&rsquo;t care about cookies</a></strong>. This extension automatically discards the <a href="https://techcrunch.com/2019/08/10/most-eu-cookie-consent-notices-are-meaningless-or-manipulative-study-finds/">annoying EU cookie banners</a> that seem to spread around the Internet like the plague. I really do not need to know about your cookies, nor care, since they will soon be deleted (see above). The only downside of this extension is that it occasionally breaks sites. When this occurs, disable the extension for the site, click on the banner, and move on.</p>
</li>
<li>
<p><strong><a href="https://addons.mozilla.org/en-US/firefox/addon/darkreader/">Dark Reader</a></strong>. I use the built-in macOS feature to automatically switch to <a href="https://support.apple.com/en-gb/HT208976">dark mode</a> at night. The problem is that many web pages do not implement dark mode and carry on using very light backgrounds. Dark Reader attempts to automatically switch pages to <em>light text on a dark background</em>. It usually does a decent job, and it can be tweaked for more challenging sites. The only downside is that it seems unable to detect sites that natively support dark mode via the <code>prefers-color-scheme</code> media query.</p>
</li>
<li>
<p><strong><a href="https://addons.mozilla.org/en-US/firefox/addon/multi-account-containers/">Firefox Multi-Account Containers</a></strong>. This extension creates boundaries for cookies and local storage data. It is conceptually similar to running different tabs in the browser under different user profiles. I use this extension to segment sites where I need to remember cookies. For personal use, the obvious example is social media. I also use the extension to keep multiple Google Classroom accounts open simultaneously, to track the progress of my children&rsquo;s online learning. Finally, I use dedicated containers for work.</p>
</li>
<li>
<p><strong><a href="https://addons.mozilla.org/en-US/firefox/addon/tampermonkey/">Tampermonkey</a></strong>. This extension allows you to run custom scripts to modify the appearance and functionality of any site. I use it to improve some internal web applications we use at work. I also use it to make Google search results look <a href="https://greasyfork.org/en/scripts/395257-better-google">the same as before</a>. It is also trivial to <a href="https://greasyfork.org/en/scripts/2686-remove-ads-from-google-search/code">remove ads from Google search results</a>.</p>
</li>
<li>
<p><strong><a href="https://www.instapaper.com/">Instapaper</a></strong>. I use Instapaper to quickly save interesting web pages I do not have the time to read <em>right away</em>. I can go back to them <em>later</em> in the evening and read them on my phone.</p>
</li>
<li>
<p><strong><a href="https://www.lastpass.com/">LastPass</a></strong> is my password manager. It works and it is cross-platform. I am convinced that everyone should use a password manager, but there are many things in LastPass that annoy me. I may switch in the future.</p>
</li>
</ul>
<h2 id="configuration-tweaks">Configuration Tweaks</h2>
<p>I use the following configuration tweaks to improve privacy without resorting to a VPN:</p>
<ul>
<li>
<p><strong><a href="https://support.mozilla.org/en-US/kb/firefox-dns-over-https">Enable DNS over HTTPS</a></strong> (aka DoH). This feature is normally turned on by default, unless you <a href="https://www.theregister.com/2019/09/24/mozilla_backtracks_doh_for_uk_users/">live in the UK</a>, as it frustrates government surveillance efforts (see the <a href="https://en.wikipedia.org/wiki/Draft_Communications_Data_Bill">Snoopers&rsquo; Charter</a>) and the &ldquo;voluntary&rdquo; porn filters run by ISPs. The UK Internet Service Providers Association even named Firefox an <a href="https://www.ispreview.co.uk/index.php/2019/07/ispa-pulls-uk-internet-villain-category-over-mozilla-doh-fallout.html">Internet Villain</a> over the issue.</p>
</li>
<li>
<p><strong><a href="https://blog.cloudflare.com/encrypt-that-sni-firefox-edition/">Turn on Encrypted SNI support</a></strong> (aka eSNI). This ensures that Firefox does not leak the name of the visited sites in the clear, when doing a TLS handshake. By default, over <em>plain old SNI</em>, the domain name is sent in plain text <em>before</em> the encrypted connection is established.</p>
</li>
</ul>
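<p>If you prefer to set these by hand, both features map to <code>about:config</code> preferences. The names below were correct on the Firefox versions current at the time of writing; do verify them on your build:</p>

```text
network.trr.mode = 2                    // DoH enabled, falling back to system DNS (3 = DoH only)
network.trr.uri  = https://mozilla.cloudflare-dns.com/dns-query
network.security.esni.enabled = true    // Encrypted SNI; requires DoH to be active
```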
<p>You can verify the state of DoH and eSNI using this convenient <a href="https://www.cloudflare.com/ssl/encrypted-sni/">Test Page from Cloudflare</a>.</p>
<p>Alternatively, if you live in the UK, you could also try visiting the <a href="https://en.wikipedia.org/wiki/List_of_websites_blocked_in_the_United_Kingdom#Court_ordered_implementations_targeting_copyright_and_trademark_infringement">top banned site</a>, which cannot effectively be blocked by ISPs when using DoH and eSNI. Since the site is hosted by Cloudflare, IP blocking would result in massive overblocking of thousands of legitimate sites.</p>
<h2 id="the-future-is-uncertain">The Future is Uncertain</h2>
<p>Unfortunately, it looks like my current setup is living on borrowed time. Mozilla laid off the majority of the Servo team <a href="https://www.zdnet.com/article/mozilla-lays-off-250-employees-while-it-refocuses-on-commercial-products/">last August</a>, which raises questions about the future of Firefox, since Servo backported a lot of its <a href="https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/">technologies</a> to make Firefox Quantum possible. The <a href="https://github.com/servo/servo">Servo project on GitHub</a> is now essentially <a href="https://github.com/servo/servo/pulse/monthly">dead</a>. I am somewhat less worried about the <a href="https://www.rust-lang.org/">Rust language</a> itself, since it is now large and useful enough to survive on its own merits.</p>
<p>Instead, Mozilla seems to be focusing its energy on its <a href="https://vpn.mozilla.org">VPN offering</a> and its <a href="https://monitor.firefox.com/">clone</a> of <a href="https://haveibeenpwned.com/">Have I Been Pwned?</a>. These services obviously have a clearer monetization strategy than a free web browser, but it is hard to imagine how Mozilla intends to fulfill its self-professed goal of providing a <em>better Internet for people</em> with such niche and unambitious projects. If the Firefox browser is unable to keep pace with its commercial competitors, it will fail to attract new users.</p>
<p>In any case, I may transition back to Safari on the Mac. The upcoming version of Safari will <a href="https://developer.apple.com/documentation/safariservices/safari_web_extensions">support manifest v2 JavaScript extensions</a> as announced at <a href="https://developer.apple.com/videos/play/wwdc2020/10665/">WWDC</a>. This appears to support webRequest, so a port of uBlock Origin may be theoretically possible.</p>
<p>Another alternative would be to consider using <a href="https://brave.com/">Brave</a>, which is yet another Chromium based browser with a focus on privacy. However, the browser has made some <a href="https://www.theverge.com/2020/6/8/21283769/brave-browser-affiliate-links-crypto-privacy-ceo-apology">controversial moves</a> in the past.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Powered by Hugo]]></title>
    <link href="https://blog.barthe.ph/2020/10/10/powered-by-hugo/"/>
    <id>https://blog.barthe.ph/2020/10/10/powered-by-hugo/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2020-10-10T00:00:00+00:00</published>
    <updated>2020-10-10T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>This blog is now powered by <a href="https://gohugo.io/">Hugo</a>, a static page generator, and features a refreshed look. This is hopefully the first of many posts after a years long hiatus.</p>
<h2 id="old-setup">Old Setup</h2>
<p>When I started the blog in 2009, it was powered by a self-hosted <a href="https://wordpress.com/">WordPress</a> instance. Since I was getting frustrated with keeping PHP, MySQL and WordPress up to date, I <a href="/2013/04/10/octopress-migration/">decided to switch to Octopress</a>, a statically generated site, in 2013.</p>
<p>Overall, I have been satisfied with <a href="http://octopress.org/">Octopress</a>, a somewhat abandoned framework built on top of <a href="https://jekyllrb.com/">Jekyll</a>, the static site generator written in Ruby that powers <a href="https://pages.github.com/">GitHub Pages</a>.</p>
<p>The setup enabled me to write blog posts in <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a> rather than plain HTML. The beginning of each Markdown file contained a small <a href="https://jekyllrb.com/docs/front-matter/">front matter</a> header describing the post&rsquo;s metadata, such as its title and publication date. These files were then fed into the Jekyll machinery that generated the complete web site, including an <a href="/atom.xml">Atom feed</a> and a <a href="/sitemap.xml">sitemap</a>.</p>
<p>This setup served me well for many years, but I had two growing sources of annoyance. First, the <a href="https://github.com/amelandri/darkstripes">theme</a> I was using looked incredibly dated and was overdue for a refresh. Second, the Ruby stack that powers Jekyll can be very finicky: I had to install a very specific version of Ruby (via <em>rvm</em> or <em>rbenv</em>) to make things work.</p>
<h2 id="new-setup">New Setup</h2>
<p>For the new setup, I have decided to use <a href="https://gohugo.io/">Hugo</a>, another static site generator, written in Go. Hugo is very similar to Jekyll, as it also uses <a href="https://gohugo.io/getting-started/configuration-markup/">Markdown</a> and <a href="https://gohugo.io/content-management/front-matter#readout">Front Matter</a>. This made the transition easier. For the most part, I was able to re-use my existing markdown source files unchanged.</p>
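<p>To illustrate, a post source file looks roughly like the fragment below. The values are invented for this example, and Hugo also accepts TOML front matter delimited by <code>+++</code>:</p>

```markdown
---
title: "Powered by Hugo"
date: 2020-10-10
tags: ["hugo", "blog"]
---

This blog is now powered by [Hugo](https://gohugo.io/), a static page generator…
```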
<p>The motivation for choosing Hugo over Jekyll is that Hugo is a batteries-included solution that covers all the needs of this blog. I did not have to worry about Ruby and Gem dependencies. My Hugo setup is as simple as <code>sudo port install hugo</code>. Moreover, I have found Hugo&rsquo;s <a href="https://gohugo.io/documentation/">documentation</a> to be really excellent: every time I wondered <em>how</em> to do something, I was able to figure it out rapidly.</p>
<p>This blog now features a modern theme with a refined, minimalist layout that lets readers concentrate on the content. The garish gradients and drop shadows of the previous theme are gone. I have also re-introduced color on this site by using slightly different shades of blue as accent and background colors. A <em>dark mode</em> color scheme is also supported and displayed when <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/@media/prefers-color-scheme">requested by the browser</a>.</p>
<p>For the first time, I made my own theme rather than use one readily available on the Internet, as I now know enough CSS and SASS to be dangerous. However, my work was loosely based on the <a href="https://github.com/alanorth/hugo-theme-bootstrap4-blog">Hugo Bootstrap v4 Blog theme</a>. It uses <a href="https://getbootstrap.com/">Bootstrap 4</a> for the layout, <a href="https://fortawesome.com/">Font Awesome</a> for the icons, and <a href="https://fonts.google.com/specimen/Lato">Lato</a> as the main font.</p>
<p>Another change is the removal of Google Analytics, as I have grown incredibly frustrated with Google’s approach to privacy and their lack of innovation. In addition, many laws require showing cookie banners to end users, which is hardly conducive to a good user experience. I am pondering relying solely on the NGINX access logs, or running a self-hosted <a href="https://matomo.org/">Matomo</a> or <a href="https://github.com/plausible/analytics/">Plausible</a>, but for now I have no analytics.</p>
<h2 id="whats-next">What’s Next?</h2>
<p>Obviously, I intend to blog a lot more in 2020 than I did in the past few years.</p>
<p>Aside from inevitable tweaks and fixes, I don&rsquo;t have any major plans for this blog, with the exception of adding search.</p>
<p>Another change in the pipeline is a complete review of the hosting, which has been on <a href="/2014/09/17/kimsufi-migration/">Kimsufi since 2014</a>.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[The .NET Core Zoo]]></title>
    <link href="https://blog.barthe.ph/2017/11/18/dotnetcore-zoo/"/>
    <id>https://blog.barthe.ph/2017/11/18/dotnetcore-zoo/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2017-11-18T07:55:00+00:00</published>
    <updated>2017-11-18T07:55:00+00:00</updated>
<content type="html"><![CDATA[<p>Unless you have been living under a rock, you have probably noticed that Microsoft has been iterating at a furious pace on their <a href="https://www.microsoft.com/net/learn/get-started/macos">.NET Core initiative</a> for the past two years.</p>
<p>It is a clean reboot of the .NET Framework that is both open-source and cross-platform, targeting Windows, macOS and Linux.</p>
<p>For a company that has historically been an enemy of open-source software, it is difficult not to be amazed at the amount of openness that Microsoft is demonstrating with .NET Core. Not only have they released the entire stack as open source software but the development is also being done in the open on <a href="https://github.com/dotnet/core">Github</a>.</p>
<p>The issue with .NET Core is that it can easily be confused with the older closed-source products that Microsoft has released over the past 15 years. The problem is compounded by Microsoft&rsquo;s willingness to re-use similar names to designate radically different underlying technologies.</p>
<p>In this post, I will provide you with an overview of .NET Core, ASP.NET Core and Visual Studio, and I will explain how the open-source flavors differ from their closed source relatives.</p>
<h2 id="the-different-flavors-of-net-framework">The Different Flavors of .NET Framework</h2>
<p>Different flavors of the .NET framework have coexisted for some time. Typically each flavor combines a different runtime and a different set of standard libraries.</p>
<ul>
<li>
<p>The <strong>.NET Framework</strong> denotes the classic Windows-only .NET framework. This is the oldest flavor of .NET, dating back to the Windows XP era.</p>
</li>
<li>
<p>The <strong>.NET Core</strong> framework is the new open-source framework released in 2016. It works on Linux, macOS and Windows. Generally speaking, it can be considered a subset of the older .NET Framework.</p>
</li>
<li>
<p><strong>Xamarin Mono</strong> is a third-party framework developed by open-source Linux developers. It predates .NET Core by more than 10 years. It is also considered a subset of the .NET Framework, because the Xamarin developers have never quite managed to catch up with Microsoft. However, Mono also features special support for iOS, Android, game consoles, and Linux UI frameworks, which is entirely absent from Microsoft&rsquo;s flavors.</p>
</li>
<li>
<p><strong>.NET Standard</strong> is a recent initiative from Microsoft which formalizes the APIs shared by the different flavors of .NET. If your code targets .NET Standard, it should be able to run without recompiling on all flavors of .NET runtimes: .NET, .NET Core and Xamarin Mono.</p>
</li>
</ul>
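<p>Targeting .NET Standard is expressed directly in the project file. A minimal SDK-style <code>csproj</code> sketch (the standard version shown is illustrative):</p>
<pre tabindex="0"><code>&lt;Project Sdk=&#34;Microsoft.NET.Sdk&#34;&gt;
  &lt;PropertyGroup&gt;
    &lt;TargetFramework&gt;netstandard2.0&lt;/TargetFramework&gt;
  &lt;/PropertyGroup&gt;
&lt;/Project&gt;
</code></pre>
<p>A library built this way can be referenced from .NET Framework, .NET Core, and Mono projects alike, provided each runtime implements that version of the standard.</p>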
<h2 id="the-different-versions-of-net-core">The Different Versions of .NET Core</h2>
<p>Microsoft&rsquo;s download page for <a href="https://www.microsoft.com/net/download/">.NET Core</a> shows many different options:</p>
<ul>
<li>
<p>Microsoft offers both a <em>Long Term Support</em> (LTS) version and a <em>regular</em> version. The LTS version is older, but Microsoft promises to support it for about <a href="https://docs.microsoft.com/en-us/dotnet/core/versions/lts-current#what-causes-updates-in-lts-and-current-trains">3 years</a>. The downside is that it lags behind in terms of features.</p>
</li>
<li>
<p>Separate <em>runtime</em> and <em>SDK</em> downloads. The <em>SDK</em> includes the <em>runtime</em> necessary for running .NET Core executables, but also all the development tools (i.e. the compiler and the <code>dotnet</code> command line tool). Thus you only need to install one of them.</p>
</li>
<li>
<p>The <em>Windows Server Hosting</em> bundle includes the <em>runtime</em>, but also a special module necessary to run ASP.NET Core sites within IIS. Since IIS is only available on Windows, the <em>Windows Server Hosting</em> bundle is not available on Linux or macOS.</p>
</li>
</ul>
<h2 id="the-different-versions-of-visual-studio">The Different Versions of Visual Studio</h2>
<p>You can develop for .NET Core by using the text editor of your choice and the command line <code>dotnet</code> tool. However, I would highly recommend that you have a look at the different versions of <a href="https://www.visualstudio.com/downloads/">Visual Studio</a> that Microsoft provides.</p>
<ul>
<li>
<p><strong>Visual Studio 2017 for Windows</strong>. This is the traditional version of Visual Studio. The <em>Community</em> edition is free and fully featured when it comes to .NET and .NET Core development. The only downside is that you will probably have to use the paid-for <em>Professional</em> or <em>Enterprise</em> editions in a commercial setting.</p>
</li>
<li>
<p><strong>Visual Studio Code</strong> is an open-source text editor built on Electron, the runtime that originated with <a href="https://atom.io/">Atom</a>. It is a very capable text editor that also provides advanced IDE features, such as refactoring tools, and compiler and debugger support for common languages including C#. It competes against <a href="https://atom.io/">Atom</a> and <a href="https://www.sublimetext.com">Sublime Text</a>.</p>
</li>
<li>
<p><strong>Visual Studio 2017 for Mac</strong>. Although it shares its name with the Windows version, it is a radically different piece of software, based on <a href="http://www.monodevelop.com">MonoDevelop</a>. It is a bit rough and has definitely not reached the maturity of its Windows counterpart. That being said, <em>Visual Studio Code</em> was in a similar state when it was introduced in 2015, and I am confident that Microsoft will iterate and make Visual Studio for Mac a lot better soon.</p>
</li>
</ul>
<h2 id="the-different-versions-of-aspnet">The Different Versions of ASP.NET</h2>
<p>This is the area where Microsoft&rsquo;s marketing has gone wrong. The name <em>ASP.NET Core</em> evokes old technologies, which, in my opinion, gives this very <em>modern</em> and <em>compelling</em> framework a bad press.</p>
<ul>
<li>
<p><strong><a href="https://en.wikipedia.org/wiki/Active_Server_Pages">ASP</a></strong>, <em>Active Server Pages</em>, is a very old technology released in the late 1990s. Think of it as a clone of PHP using Visual Basic instead. It is similar to what <a href="https://en.wikipedia.org/wiki/JavaServer_Pages">JSP</a> was for Java. It was abandoned long ago and is not suitable for modern web application development.</p>
</li>
<li>
<p><strong>ASP.NET</strong> superseded ASP in the early 2000s, and development shifted from Visual Basic to C#. It is a framework based on <em>Web Forms</em>, with a page template system powered by XML that uses the <code>*.aspx</code> file extension. The framework was invented before AJAX became mainstream, and a lot of effort seems to be devoted to making sure developers never write any JavaScript. It also integrates with the <code>System.Web</code> pipeline of IIS which, even though it is a step up from CGI, makes use of <em>global variables</em> to represent the state of <em>requests</em> and <em>responses</em>. Even though it is still maintained, it is a technology that is clearly showing its age.</p>
</li>
<li>
<p><strong>ASP.NET MVC</strong> is a modern framework that Microsoft introduced in 2009. It is inspired by the successful <em>Ruby on Rails</em> framework and, as the name indicates, features a clear separation between <em>Models</em>, <em>Views</em> and <em>Controllers</em>. It also introduced a better view template system named <em>Razor</em> and improved routing. Even though it is still integrated with the same IIS-centric <code>System.Web</code> pipeline, it goes out of its way to hide that fact from developers, and wraps HTTP requests and responses in non-global variables, which can be <em>mocked</em> for the purpose of unit testing.</p>
</li>
<li>
<p><strong>ASP.NET Web API</strong> is another framework Microsoft introduced in 2012. It is <em>very</em> similar to ASP.NET MVC, to the point that entire class hierarchies between the two frameworks share identical names and functions. But unlike ASP.NET MVC, which focuses on generating dynamic HTML pages, ASP.NET Web API, as the name indicates, targets the creation of web APIs. It is a pretty good framework for writing JSON RESTful APIs, for instance.</p>
</li>
<li>
<p>Finally <strong>ASP.NET Core</strong> is a complete rewrite and a merge of both <strong>ASP.NET MVC</strong> and <strong>ASP.NET Web API</strong>. It was released in 2016 and is developed in the open on <a href="https://github.com/aspnet/Home">Github</a>. It makes great use of dependency injection and provides support for writing middleware. ASP.NET Core web applications can be deployed to Linux servers or Docker containers.</p>
</li>
</ul>
<h2 id="the-different-versions-of-aspnet-core">The Different Versions of ASP.NET Core</h2>
<p>As I indicated in the previous section, <em>ASP.NET Core</em> is a complete reboot of the Microsoft framework for writing web application software.</p>
<p>But the moniker <em>ASP.NET Core</em> denotes some radically different realities:</p>
<ul>
<li>
<p><strong>The version of ASP.NET Core which targets Kestrel</strong>. This is the main version that Microsoft is promoting. <a href="https://docs.microsoft.com/en-us/aspnet/core/fundamentals/servers/aspnet-core-module?tabs=aspnetcore2x">Kestrel</a> is an open-source asynchronous HTTP server with very good performance characteristics. If you target <em>.NET Core</em> as your runtime, this is the only available option. Kestrel and .NET Core can be deployed to Linux (possibly using nginx or Apache as a reverse proxy). You can also target .NET Standard or .NET 4.7 on Windows. In this flavor of ASP.NET Core, the entire HTTP pipeline has been rewritten from scratch, and you can inject your own middleware, as in any modern web application framework.</p>
</li>
<li>
<p><strong>The version of ASP.NET Core which integrates with the IIS pipeline</strong>. This flavor of ASP.NET Core integrates with the classic IIS pipeline introduced for <code>ASP.NET</code>, and it relies on the <code>System.Web</code> classes and their global variables. If you adopt this flavor of ASP.NET Core, you have to target .NET 4.7 and run the web application on IIS on a Windows Server. In addition, some features, such as writing your own custom middleware, are not available. The advantage of this flavor of ASP.NET Core is that it can be viewed as an incremental update from the old <em>ASP.NET MVC</em> and <em>ASP.NET Web API</em> frameworks. Unless you have some existing ASP.NET code, you should probably stay away from this configuration for new projects.</p>
</li>
<li>
<p><strong>Finally, you can use ASP.NET Core with Kestrel inside IIS</strong>. This is the option where you are essentially using IIS as a reverse proxy for the Kestrel server. In this configuration you get the best of both worlds: your Windows-centric IT team can continue managing IIS sites as before, and you get all the features and performance improvements available to the Kestrel stack. This is achieved by installing a custom IIS module provided by Microsoft, called the <a href="https://docs.microsoft.com/en-us/aspnet/core/fundamentals/servers/aspnet-core-module?tabs=aspnetcore2x">ASP.NET Core Module</a>. This is the recommended way to deploy ASP.NET Core on Windows servers. On Linux, Microsoft recommends using nginx or Apache in similar reverse proxy configurations.</p>
</li>
</ul>
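<p>For the Kestrel-behind-a-reverse-proxy deployments mentioned above, the nginx side is a plain <code>proxy_pass</code> block. A sketch, assuming Kestrel listens on its default port 5000 and using a hypothetical domain:</p>
<pre tabindex="0"><code># Forward all requests to the local Kestrel server
server {
        server_name example.com;
        listen 80;
        location / {
                proxy_pass http://127.0.0.1:5000;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
        }
}
</code></pre>
<p>The <code>X-Forwarded-*</code> headers let the ASP.NET Core application reconstruct the original client address and scheme behind the proxy.</p>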
<h2 id="conclusion">Conclusion</h2>
<p>.NET Core and ASP.NET Core are very compelling open-source technologies promoted by Microsoft. The frameworks, libraries, compilers and runtime are published on Github. You can use Visual Studio Code, another open-source piece of software, to write C# code and deploy it to Linux, possibly via Docker containers.</p>
<p>But make no mistake, Microsoft still has ways to make money:</p>
<ul>
<li>
<p>Microsoft sells access to <a href="https://azure.microsoft.com/en-us/">Azure</a>, their own cloud computing platform which competes with the likes of AWS and Google Cloud. Naturally Azure integrates very well with ASP.NET Core.</p>
</li>
<li>
<p>Microsoft also sells Visual Studio 2017, which unlike Visual Studio Code is closed source software. Although it may be used for free under some circumstances, for most commercial settings you will have to pay a subscription to use it.</p>
</li>
<li>
<p>Microsoft is also strongly pushing <a href="https://www.microsoft.com/en-us/sql-server/default.aspx">SQL Server</a>, their closed-source relational database server. It has better integration with the .NET Core stack than any other SQL database. Traditionally, SQL Server ran on Windows Server only, but since the release of <em>SQL Server 2017</em> last October, it can be deployed to Linux servers as well.</p>
</li>
</ul>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Let&#39;s Encrypt with Nginx]]></title>
    <link href="https://blog.barthe.ph/2015/12/09/lets-encrypt-with-nginx/"/>
    <id>https://blog.barthe.ph/2015/12/09/lets-encrypt-with-nginx/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2015-12-09T22:30:32+00:00</published>
    <updated>2015-12-09T22:30:32+00:00</updated>
    <content type="html"><![CDATA[<h2 id="the-objectives">The Objectives</h2>
<p>One of the major hurdles hampering the deployment of HTTPS on smaller websites like this one has always been the price of certificates. As much as I would have liked to get one, I could hardly justify the cost. That&rsquo;s why a year ago, when the <a href="https://letsencrypt.org">Let&rsquo;s Encrypt project</a> was announced with the promise of free domain certificates, I was particularly excited. I decided to migrate my websites as soon as the project reached the public beta phase, last week.</p>
<p>Even if your site does not require HTTPS for security reasons, it is worth considering:</p>
<ul>
<li>
<p><strong>to provide additional privacy</strong> to your users. Unscrupulous ISPs have started to inject <a href="http://arstechnica.com/tech-policy/2014/09/why-comcasts-javascript-ad-injections-threaten-security-net-neutrality">javascript</a> or <a href="https://www.eff.org/en-gb/deeplinks/2014/11/verizon-x-uidh">cookies</a> into third-party HTTP pages. They cannot do this with HTTPS pages.</p>
</li>
<li>
<p><strong>to benefit from the performance improvements of HTTP/2</strong>. Even though in theory HTTP/2 does not require TLS, all major browsers have decided to boycott the &ldquo;plaintext&rdquo; version of the protocol.</p>
</li>
</ul>
<p>In this blog post, I intend to explain how I migrated this Octopress blog hosted with <a href="https://www.nginx.com">nginx</a> from HTTP to HTTPS and obtained an <a href="https://www.ssllabs.com/ssltest/analyze.html?d=blog.barthe.ph">A+ Grade</a> from <a href="https://www.ssllabs.com">SSL Labs</a>.</p>
<h2 id="intall-lets-encrypt">Install Let&rsquo;s Encrypt</h2>
<p>This part of the process is very well covered by the <a href="http://letsencrypt.readthedocs.org/en/latest/using.html">Let&rsquo;s Encrypt documentation</a>. Since there are no packages for Ubuntu Server LTS at the moment, I used the source code approach. In this case, we end up using the <code>letsencrypt-auto</code> command instead of <code>letsencrypt</code> directly. <code>letsencrypt-auto</code> is a wrapper script that ensures that the tool and its dependencies are up to date, prior to running any <code>letsencrypt</code> commands.</p>
<pre tabindex="0"><code># Optionally install git
sudo apt-get install git-core

# Checkout the let&#39;s encrypt git repository into /srv/letsencrypt
git clone https://github.com/letsencrypt/letsencrypt /srv/letsencrypt

# Run the tool at least once
cd /srv/letsencrypt
./letsencrypt-auto --help
</code></pre><p>When I did this, I got an <em>InsecurePlatform</em> Python warning.</p>
<pre tabindex="0"><code>Creating virtual environment...
Updating letsencrypt and virtual environment dependencies...../home/aymericb/.local/share/letsencrypt/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
./home/aymericb/.local/share/letsencrypt/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
</code></pre><p>This warning comes from <a href="https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning">urllib3</a>. I ignored it and everything was fine!</p>
<h2 id="getting-the-certificate">Getting the Certificate</h2>
<p>At this stage you are ready to request a certificate for the website. Let&rsquo;s Encrypt uses the ACME protocol, which requires serving a specially generated file from the web server to confirm ownership of a domain. The <code>letsencrypt</code> tool aims to automate the whole setup, so that running <code>./letsencrypt-auto --apache -d blog.barthe.ph</code> should be sufficient to get the entire web server up and running, if you happen to use <em>Apache</em>.</p>
<p>Unfortunately, the plugin for <em>nginx</em> is not stable yet and cannot be used. The setup will be slightly more complicated. We will use the <code>--webroot</code> switch, telling <code>letsencrypt</code> where to find our web server files, and afterwards, we will have to modify the nginx configuration files ourselves.</p>
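<p>To see roughly what <code>--webroot</code> does under the hood: the ACME server hands the client a token, the client writes a small key-authorization file under <code>/.well-known/acme-challenge/</code> inside the webroot, and the server then fetches that file over plain HTTP to prove you control the domain. A sketch (the token and thumbprint below are made up; the real values come from the ACME server and your account key):</p>
<pre tabindex="0"><code># Use a scratch directory standing in for /srv/blog.barthe.ph/www
WEBROOT=${WEBROOT:-/tmp/acme-demo}
TOKEN=example-token
mkdir -p &#34;$WEBROOT/.well-known/acme-challenge&#34;
# The file content is the token joined with the account key thumbprint
echo &#34;$TOKEN.account-thumbprint&#34; &gt; &#34;$WEBROOT/.well-known/acme-challenge/$TOKEN&#34;
</code></pre>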
<p>Getting the Let&rsquo;s Encrypt certificate is the easy part:</p>
<pre tabindex="0"><code>./letsencrypt-auto certonly --webroot -w /srv/blog.barthe.ph/www -d blog.barthe.ph
</code></pre><p>You are asked to provide an email address (used to warn you when your certificate is about to expire), and agree to the terms and conditions.</p>
<p>Afterwards, you need to manually edit the nginx configuration files using your favorite editor (e.g. <code>nano -w /etc/nginx/sites-available/blog.barthe.ph</code>). The TLS certificate and the private keys are located in <code>/etc/letsencrypt/live/blog.barthe.ph</code>.</p>
<pre tabindex="0"><code># Add the following to redirect HTTP requests to HTTPS
server {
        server_name blog.barthe.ph;
        listen 80;
        return 301 https://$server_name$request_uri;
}
# Modify the existing HTTP setup to use HTTPS
server {
        server_name blog.barthe.ph;
        listen 443 ssl;

        # Basic TLS setup
        ssl_certificate /etc/letsencrypt/live/blog.barthe.ph/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/blog.barthe.ph/privkey.pem;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

        # keep the rest of the site configuration unchanged
        # [...]
 }
</code></pre><p>You&rsquo;ll notice that in the configuration, I chose to support TLS 1.0 and greater. I wanted to be more aggressive, but TLS 1.1 is not that well supported: it does not work on OS X 10.8, for instance. However, I did drop support for the older SSLv3 protocol (which preceded TLS 1.0), even though it is still supported by most websites. The consequence is that my site is no longer compatible with older versions of Internet Explorer (IE6 to IE8). Supporting SSLv3 without weakening the security of modern TLS 1.2 browsers is becoming increasingly difficult because of downgrade attacks such as <a href="https://blog.mozilla.org/security/2014/10/14/the-poodle-attack-and-the-end-of-ssl-3-0/">POODLE</a>.</p>
<p>In order for the changes to take effect you should invoke <code>sudo service nginx reload</code>. If you botched the configuration file, the site will continue running with the existing configuration and you should get an error in <code>/var/log/nginx/error.log</code>.</p>
<p>Once that&rsquo;s done, you should have a web server running with HTTPS, and that redirects older HTTP URLs to their HTTPS equivalents. <strong>W00t!</strong></p>
<h2 id="a-grade-security">A+ Grade Security</h2>
<p>When testing the Octopress server in different browsers, I got some warnings about mixed content. This means that although my website was configured to serve its content through HTTPS, it still referenced URLs that used HTTP, either internal or external (such as Google Fonts, etc&hellip;). Chrome was the strictest browser of all in this regard. After spending some time searching for and replacing all references to HTTP URLs, I managed to fix all the warnings.</p>
<p>The next step was to edit <code>/etc/nginx/sites-available/blog.barthe.ph</code> to improve performance, by enabling TLS session caching:</p>
<pre tabindex="0"><code>ssl_session_cache shared:SSL:10m;
ssl_session_timeout 20m;
</code></pre><p>Let&rsquo;s now focus on improving security and get an <em>A+ grade</em> on SSL Labs.</p>
<p>Let&rsquo;s start by overriding the default prime used for Diffie-Hellman. The default prime is usually too small, and since it is a shared prime, pre-computed attacks like <a href="https://en.wikipedia.org/wiki/Logjam_%28computer_security%29">Logjam</a> are possible. Beware: the <code>openssl</code> command will take <strong>a long time</strong>, from 20 minutes to a few hours.</p>
<pre tabindex="0"><code># Be patient…
openssl dhparam -out /etc/ssl/private/dhparams_4096.pem 4096

# Add &#34;ssl_dhparam /etc/ssl/private/dhparams_4096.pem;&#34; to nginx config
</code></pre><p>Now, let&rsquo;s add support for OCSP stapling. <a href="https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol">OCSP</a> is a mechanism that can be used by the browser to confirm that a certificate has not been revoked. This normally involves an extra connection to the certificate authority, <em>unless</em> the website uses stapling.</p>
<pre tabindex="0"><code>ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/blog.barthe.ph/fullchain.pem;
</code></pre><p>Let&rsquo;s enable <a href="https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security">HSTS</a>. This prevents a browser that connected to the website at least once before, from ever accepting an HTTP connection from the same domain.</p>
<pre tabindex="0"><code>add_header Strict-Transport-Security max-age=31536000;  # Valid for 1 year
</code></pre><p>Finally let&rsquo;s massage the cipher suite to only use safe and secure ciphers.</p>
<pre tabindex="0"><code>ssl_prefer_server_ciphers on;
ssl_ciphers &#39;ECDHE-ECDSA-AES256-GCM-SHA384:... SEE BELOW&#39;;
</code></pre><p>In order to obtain the list of ciphers separated by <code>:</code>, use <code>openssl ciphers -V</code> to dump all available ciphers. Then proceed as follows:</p>
<ul>
<li>
<p>Delete all references to <code>DES</code>, <code>3DES</code> and <code>RC4</code>, and just keep <code>AES</code> ciphers.</p>
</li>
<li>
<p>Delete all references to <code>MD5</code>.</p>
</li>
<li>
<p>Put all ciphers that use SHA1 (their names end with <code>-SHA</code> and they are listed as <code>SSLv3</code>) at the bottom.</p>
</li>
<li>
<p>Put ciphers with larger key lengths on top.</p>
</li>
<li>
<p>Put ciphers with <a href="https://en.wikipedia.org/wiki/Forward_secrecy">forward secrecy</a> on top. That would be <code>ECDHE</code> or <code>DHE</code>; the <em>DH</em> stands for <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange">Diffie-Hellman</a>, and the <em>E</em> for <em>Ephemeral</em>, which means a new key is negotiated for each connection. Stealing the private key will not allow an attacker to decrypt <strong>past</strong> TLS sessions.</p>
</li>
<li>
<p>Favor <code>GCM</code> over <code>CBC</code> as a <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation">mode of operation</a>. This is faster, and CBC brought security issues like <a href="https://en.wikipedia.org/wiki/POODLE">POODLE</a> in the past.</p>
</li>
<li>
<p>Favor Elliptic Curves (anything with <code>EC</code>) over plain <code>RSA</code>, <code>DSA</code>, or <code>DHE</code>. Elliptic curves are faster because they require smaller keys than <em>traditional</em> crypto, and as far as we know the security behind the maths is solid.</p>
</li>
</ul>
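<p>Rather than hand-sorting the full dump, the same rules can be roughly expressed with OpenSSL&rsquo;s cipher-string mini-language. This is only a sketch, not the exact list below: the aliases favor ephemeral ECDH and DH suites with AES-GCM first, then the remaining AES suites, while excluding NULL, MD5, RC4 and 3DES.</p>
<pre tabindex="0"><code># Print one matching cipher name per line, strongest preferences first
openssl ciphers &#39;EECDH+AESGCM:EDH+AESGCM:EECDH+AES:EDH+AES:RSA+AES:!aNULL:!eNULL:!MD5:!RC4:!3DES&#39; | tr &#39;:&#39; &#39;\n&#39;
</code></pre>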
<p>Here&rsquo;s my list:</p>
<pre tabindex="0"><code>0xC0,0x2C - ECDHE-ECDSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH     Au=ECDSA Enc=AESGCM(256) Mac=AEAD
0xC0,0x30 - ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH     Au=RSA  Enc=AESGCM(256) Mac=AEAD
0xC0,0x24 - ECDHE-ECDSA-AES256-SHA384 TLSv1.2 Kx=ECDH     Au=ECDSA Enc=AES(256)  Mac=SHA384
0xC0,0x28 - ECDHE-RSA-AES256-SHA384 TLSv1.2 Kx=ECDH     Au=RSA  Enc=AES(256)  Mac=SHA384
0x00,0xA3 - DHE-DSS-AES256-GCM-SHA384 TLSv1.2 Kx=DH       Au=DSS  Enc=AESGCM(256) Mac=AEAD
0x00,0x9F - DHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=DH       Au=RSA  Enc=AESGCM(256) Mac=AEAD
0x00,0x6B - DHE-RSA-AES256-SHA256   TLSv1.2 Kx=DH       Au=RSA  Enc=AES(256)  Mac=SHA256
0x00,0x6A - DHE-DSS-AES256-SHA256   TLSv1.2 Kx=DH       Au=DSS  Enc=AES(256)  Mac=SHA256
0xC0,0x2B - ECDHE-ECDSA-AES128-GCM-SHA256 TLSv1.2 Kx=ECDH     Au=ECDSA Enc=AESGCM(128) Mac=AEAD
0xC0,0x2F - ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 Kx=ECDH     Au=RSA  Enc=AESGCM(128) Mac=AEAD
0xC0,0x23 - ECDHE-ECDSA-AES128-SHA256 TLSv1.2 Kx=ECDH     Au=ECDSA Enc=AES(128)  Mac=SHA256
0xC0,0x27 - ECDHE-RSA-AES128-SHA256 TLSv1.2 Kx=ECDH     Au=RSA  Enc=AES(128)  Mac=SHA256
0x00,0xA2 - DHE-DSS-AES128-GCM-SHA256 TLSv1.2 Kx=DH       Au=DSS  Enc=AESGCM(128) Mac=AEAD
0x00,0x9E - DHE-RSA-AES128-GCM-SHA256 TLSv1.2 Kx=DH       Au=RSA  Enc=AESGCM(128) Mac=AEAD
0x00,0x67 - DHE-RSA-AES128-SHA256   TLSv1.2 Kx=DH       Au=RSA  Enc=AES(128)  Mac=SHA256
0x00,0x40 - DHE-DSS-AES128-SHA256   TLSv1.2 Kx=DH       Au=DSS  Enc=AES(128)  Mac=SHA256          
0xC0,0x2E - ECDH-ECDSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH/ECDSA Au=ECDH Enc=AESGCM(256) Mac=AEAD
0xC0,0x32 - ECDH-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH/RSA Au=ECDH Enc=AESGCM(256) Mac=AEAD
0xC0,0x26 - ECDH-ECDSA-AES256-SHA384 TLSv1.2 Kx=ECDH/ECDSA Au=ECDH Enc=AES(256)  Mac=SHA384
0xC0,0x2A - ECDH-RSA-AES256-SHA384  TLSv1.2 Kx=ECDH/RSA Au=ECDH Enc=AES(256)  Mac=SHA384
0xC0,0x31 - ECDH-RSA-AES128-GCM-SHA256 TLSv1.2 Kx=ECDH/RSA Au=ECDH Enc=AESGCM(128) Mac=AEAD
0xC0,0x2D - ECDH-ECDSA-AES128-GCM-SHA256 TLSv1.2 Kx=ECDH/ECDSA Au=ECDH Enc=AESGCM(128) Mac=AEAD
0xC0,0x29 - ECDH-RSA-AES128-SHA256  TLSv1.2 Kx=ECDH/RSA Au=ECDH Enc=AES(128)  Mac=SHA256
0xC0,0x25 - ECDH-ECDSA-AES128-SHA256 TLSv1.2 Kx=ECDH/ECDSA Au=ECDH Enc=AES(128)  Mac=SHA256
0x00,0x3D - AES256-SHA256           TLSv1.2 Kx=RSA      Au=RSA  Enc=AES(256)  Mac=SHA256
0x00,0x9C - AES128-GCM-SHA256       TLSv1.2 Kx=RSA      Au=RSA  Enc=AESGCM(128) Mac=AEAD
0x00,0x3C - AES128-SHA256           TLSv1.2 Kx=RSA      Au=RSA  Enc=AES(128)  Mac=SHA256
0xC0,0x0A - ECDHE-ECDSA-AES256-SHA  SSLv3 Kx=ECDH     Au=ECDSA Enc=AES(256)  Mac=SHA1
0xC0,0x14 - ECDHE-RSA-AES256-SHA    SSLv3 Kx=ECDH     Au=RSA  Enc=AES(256)  Mac=SHA1
0xC0,0x09 - ECDHE-ECDSA-AES128-SHA  SSLv3 Kx=ECDH     Au=ECDSA Enc=AES(128)  Mac=SHA
0xC0,0x13 - ECDHE-RSA-AES128-SHA    SSLv3 Kx=ECDH     Au=RSA  Enc=AES(128)  Mac=SHA1
0x00,0x39 - DHE-RSA-AES256-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(256)  Mac=SHA1
0x00,0x38 - DHE-DSS-AES256-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(256)  Mac=SHA1          
0x00,0x33 - DHE-RSA-AES128-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(128)  Mac=SHA1
0x00,0x32 - DHE-DSS-AES128-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(128)  Mac=SHA1          
0xC0,0x22 - SRP-DSS-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=AES(256)  Mac=SHA1
0xC0,0x21 - SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(256)  Mac=SHA1
0xC0,0x20 - SRP-AES-256-CBC-SHA     SSLv3 Kx=SRP      Au=SRP  Enc=AES(256)  Mac=SHA1
0xC0,0x1F - SRP-DSS-AES-128-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=AES(128)  Mac=SHA1
0xC0,0x1E - SRP-RSA-AES-128-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(128)  Mac=SHA1          
0xC0,0x1D - SRP-AES-128-CBC-SHA     SSLv3 Kx=SRP      Au=SRP  Enc=AES(128)  Mac=SHA1
0xC0,0x05 - ECDH-ECDSA-AES256-SHA   SSLv3 Kx=ECDH/ECDSA Au=ECDH Enc=AES(256)  Mac=SHA1
0xC0,0x0F - ECDH-RSA-AES256-SHA     SSLv3 Kx=ECDH/RSA Au=ECDH Enc=AES(256)  Mac=SHA1
0xC0,0x0E - ECDH-RSA-AES128-SHA     SSLv3 Kx=ECDH/RSA Au=ECDH Enc=AES(128)  Mac=SHA1
0xC0,0x04 - ECDH-ECDSA-AES128-SHA   SSLv3 Kx=ECDH/ECDSA Au=ECDH Enc=AES(128)  Mac=SHA1
0x00,0x35 - AES256-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(256)  Mac=SHA1
0x00,0x2F - AES128-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(128)  Mac=SHA1	
</code></pre><h2 id="automatic-updates-of-the-certificate">Automatic Updates of the Certificate</h2>
<p>As a matter of policy, the certificates issued by Let&rsquo;s Encrypt are only valid for a short period of 90 days. This explains why the project focuses so much on automation.</p>
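<p>Because of that short lifetime, it is useful to know how to check when a certificate expires. Assuming <code>openssl</code> is installed, the following prints the expiry date; point <code>CERT</code> at any certificate file, here the live one from this setup:</p>
<pre tabindex="0"><code># Print the notAfter date of a certificate
CERT=/etc/letsencrypt/live/blog.barthe.ph/cert.pem
openssl x509 -noout -enddate -in &#34;$CERT&#34;
</code></pre>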
<p>It is possible to renew the certificate by executing the following commands, which I added to a <code>/srv/letsencrypt/cron-update.sh</code> script:</p>
<pre tabindex="0"><code>#!/bin/sh
/srv/letsencrypt/letsencrypt-auto certonly --webroot -w /srv/blog.barthe.ph/www -d blog.barthe.ph --renew-by-default  --agree-tos
service nginx reload
</code></pre><p>Unfortunately, you&rsquo;ll find out that the <code>letsencrypt-auto</code> command fails with the following error:</p>
<pre tabindex="0"><code>Failed authorization procedure. blog.barthe.ph (http-01): urn:acme:error:unauthorized :: The client lacks sufficient authorization :: Error parsing key authorization file: Invalid key authorization: 231 parts
</code></pre><p>I found out that the ACME protocol attempts to validate the ownership using HTTP and is unable to follow the HTTP to HTTPS 301 redirection we set up at the beginning. So I had to slightly change my setup in <code>/etc/nginx/sites-available/blog.barthe.ph</code>.</p>
<pre tabindex="0"><code># The following lines prevent Let&#39;s Encrypt ACME protocol from working 
#server {
#        server_name blog.barthe.ph;
#        listen 80;
#        return 301 https://$server_name$request_uri;
#}
#
# Replace by these lines, to continue serving /.well-known/ files on port 80
server {
        server_name blog.barthe.ph;
        listen 80;
        location /.well-known/ {
                root /srv/blog.barthe.ph/www;
                try_files $uri $uri/ =404;
        }
        location / {
                return 301 https://$server_name$request_uri;
        }
}
</code></pre><p>The new version lets all ACME requests to <code>/.well-known/</code> be served normally over HTTP by nginx, but redirects all the others to HTTPS. Once that change is made and nginx reloaded, executing <code>letsencrypt-auto</code> successfully renews the certificate.</p>
<p>The next step is to add a cron job by typing <code>crontab -e</code> (as <code>root</code>). I set up mine to renew the certificate once a month, as follows:</p>
<pre tabindex="0"><code># crontab -e
# m h  dom mon dow   command
0 11 7 * * /srv/letsencrypt/cron-update.sh
</code></pre><h2 id="conclusion">Conclusion</h2>
<p>So far I&rsquo;m relatively happy with Let&rsquo;s Encrypt. I have chosen to run <code>letsencrypt-auto</code> with <code>root</code> privileges, but if that bothers you, there is a project named <a href="https://github.com/diafygi/letsencrypt-nosudo">Let&rsquo;s Encrypt no sudo</a> which aims to avoid that.</p>
<p>For websites with more traffic, you could delay the HTTP to HTTPS redirection until you have fully tested the HTTPS version of the site. This is a good opportunity to fix all mixed-content issues.</p>
<p>Finally, I published my final <code>/etc/nginx/sites-available/blog.barthe.ph</code> config file as a <a href="https://gist.github.com/aymericb/b1a6e889fcca07058bc7">Gist</a>.</p>
<p><strong>Edit</strong>. I initially made a mistake and wrote <code>return 301 https://blog.barthe.ph</code> instead of <code>return 301 https://$server_name$request_uri</code> in the nginx configuration. The consequence is that URLs starting with <code>http</code> would be redirected to the home page of the blog, instead of the <code>https</code> counterpart page. This defect was somehow masked by HSTS, because after browsing once to the site, the browser would redirect to the correct pages.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Migrated to Kimsufi]]></title>
    <link href="https://blog.barthe.ph/2014/09/17/kimsufi-migration/"/>
    <id>https://blog.barthe.ph/2014/09/17/kimsufi-migration/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2014-09-17T08:12:55+01:00</published>
    <updated>2014-09-17T08:12:55+01:00</updated>
    <content type="html"><![CDATA[<p>Originally, this website was powered by a shared hosting plan on <a href="http://www.dreamhost.com">Dreamhost</a>, but I migrated to a fully hosted solution <a href="/2013/04/10/octopress-migration/">over a year ago</a>.</p>
<p>After considering many options, I chose a <a href="http://www.online.net/en/dedicated-server/dedibox-scg2">Dedibox SC gen 2</a>, a dedicated server solution from <a href="http://www.online.net/en">Online.net</a>, a French managed-hosting provider. The machine features a fairly low-spec CPU, a 500GB hybrid SSD-HDD disk, 2GB of RAM and unlimited bandwidth with guaranteed throughput.</p>
<p>All that for €12 a month! It is much cheaper than all the other dedicated-hardware or VPS solutions I could find, and in fact about the same price as the Dreamhost hosting plan it replaced.</p>
<p>France seems to have two companies (<a href="http://www.online.net/en">Online</a> with Dedibox and <a href="https://www.kimsufi.com/">OVH</a> with Kimsufi) dedicated to offering very low-cost hosting solutions. The prices can be kept low thanks to custom dedicated hardware that is both small and power efficient. As you can see below, a Dedibox SC is barely larger than a 3.5 inch HDD. My gut feeling is that the emergence of these offers in France is linked to the <a href="http://en.wikipedia.org/wiki/HADOPI_law">three strike piracy law</a> and the disappearance of <a href="http://en.wikipedia.org/wiki/Megaupload">Megaupload</a>. A lot of these servers were used as <a href="http://en.wikipedia.org/wiki/Seedbox">seed boxes</a>, at least in the beginning…</p>
<p><img src="/assets/2014/dedibox_sc.jpg" alt="Comparison of a Dedibox SC and a 3.5 inch Hard Disk"></p>
<p>The hosting on Dedibox turned out to be very good and sufficient for my needs. A properly configured nginx server, that serves mostly static pages, has no problem serving the 20,000 hits per month my sites require. I initially hesitated using Amazon S3 for some of the most popular pages, but dropped the plan entirely. I also have a few dynamic pages, but these are mostly for personal use. The PHP code of <a href="https://github.com/aymericb/exposition">Exposition</a>, my unfinished photo gallery software, works a lot faster than on Dreamhost.</p>
<p>The support on Dedibox is not great though. I had some billing issue at the beginning and it took forever for it to be resolved. I&rsquo;m not sure I would trust them for something critical, like a business.</p>
<p>In the past few days, I made the switch to their competitor Kimsufi. I was running out of space on the 500GB disk of the Dedibox, because of my large photo collection. I opted for a KS-2, which is the exact same price as the Dedibox SC Gen 2. It is normally fitted with a 1TB disk, but I got a 2TB disk instead (surprise!). The KS-2 also comes with 4GB of RAM instead of the 2GB of the Dedibox.</p>
<p>The rest is worse though. On Dedibox, the bandwidth is guaranteed, whereas on Kimsufi it&rsquo;s best effort. In practice, I have not noticed any difference. The disk on the Dedibox is a hybrid SSD; Kimsufi uses a traditional, slower spinning disk. Dedibox offers free monitoring services, like email notifications when the machine is turned on or off. Kimsufi offers nothing. OVH has a full-featured iOS app that used to support Kimsufi users, but it is now restricted to the pro tier of OVH hosting. Shame on you OVH! So I ended up configuring a free <a href="https://www.pingdom.com">pingdom.com</a> account for basic monitoring.</p>
<p>In the end, it&rsquo;s a pretty good deal if you need the storage or the RAM. 2TB of &ldquo;Cloud Storage&rdquo; (it should have been 1TB) for €12 a month, with no limits on bandwidth, is pretty good. It&rsquo;s more than what you would get from Amazon S3 or Dropbox for the same price. But if you&rsquo;re fine with 500GB of disk or 2GB of RAM, I would recommend the Dedibox offer instead.</p>
<p>Let&rsquo;s hope the new hosting is as reliable as the previous one…</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Compute the Balance of a Bitcoin Wallet with node.js and blockchain.info]]></title>
    <link href="https://blog.barthe.ph/2014/07/23/compute-bitcoin-balance-nodejs/"/>
    <id>https://blog.barthe.ph/2014/07/23/compute-bitcoin-balance-nodejs/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2014-07-23T21:23:24+01:00</published>
    <updated>2014-07-23T21:23:24+01:00</updated>
    <content type="html"><![CDATA[<h2 id="the-problem">The Problem</h2>
<p>As I have explained in great detail <a href="/2014/04/03/bitcoin-balance-differs">in a previous post</a>, the balance of a Bitcoin wallet differs from the balance that can be computed by looking at the blockchain. That&rsquo;s because every time you make a transaction, the Bitcoin software sends the fraction of the funds that was not spent back to you. These funds are sent to a new address, invisible to end users. These addresses are pre-allocated and constitute the <strong>key pool</strong>.</p>
<p>In this article, I explain <em>how</em> you can compute the balance of a BitcoinQt <code>wallet.dat</code> file using <a href="http://nodejs.org">node.js</a> and the <a href="https://www.blockchain.com/api/blockchain_api">JSON API of blockchain.info</a>. I also show how to extract the public addresses contained in the wallet, whether encrypted or not, and demonstrate how to use the blockchain.info API efficiently.</p>
<h2 id="extracting-the-keys">Extracting the Keys</h2>
<p>Bitcoin relies on public/private key cryptography. The public key is what is turned into a public address through hashing (detailed below) and the private key is the one used to sign the transactions. In order to compute the balance of a wallet, we need to extract all the public keys from the <em>key pool</em>.</p>
<p>The <code>wallet.dat</code> file is a <a href="http://en.wikipedia.org/wiki/Berkeley_DB">Berkeley DB</a> database. This is an old NoSQL embedded database from long before the term was coined. Unlike relational databases, such as SQLite, it does not let you make complicated queries in SQL. The wallet uses a single key/value dictionary to store all public and private keys, the last transactions, and all other account information.</p>
<p>The keys from the <em>key pool</em> are stored directly in the database as individual records. The database keys are split into two parts. The first part of the DB key is an identifier that describes the type of record: non-encrypted Bitcoin keys are identified with <code>key</code> while encrypted keys are identified with <code>ckey</code>. The second part of the DB key is the Bitcoin public key itself, preceded by a single byte that indicates the length in bytes of the public key. The database record associated with the DB key is the private Bitcoin key, which may or may not be encrypted. This splitting of the DB key into two parts may sound odd, but remember that DB keys must be unique to identify a record.</p>
<p>Here&rsquo;s a drawing, in case you&rsquo;re getting confused between the DB keys and the Bitcoin public and private keys.</p>
<img src="/assets/2014/btc_keypool_wallet.png" alt="Bitcoin wallet key pool storage" class="dark-inverted-img"/>
<p>You can see the details of the key loading algorithm in <a href="https://github.com/bitcoin/bitcoin/tree/v0.8.6/src/walletdb.cpp">walletdb.cpp</a> from the source code of Bitcoin. Look at the <code>CWalletDB::ReadKeyValue()</code> method.</p>
<p>I had no luck using <a href="https://github.com/mcavage/node-bdb">node-bdb</a> for reading the wallet with node.js, because <em>cursors</em> are not supported. Instead, I used a crude algorithm looking for the <code>ckey</code> and <code>key</code> strings within the wallet. This works because BerkeleyDB does not use any compression, but that&rsquo;s not ideal.</p>
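<p>The crude scan can be sketched in a few lines of shell. This is a hedged illustration against a tiny stand-in file (not a real <code>wallet.dat</code>): it locates the literal <code>ckey</code> marker with <code>grep -abo</code> and reads the length byte that follows it.</p>

```shell
#!/bin/sh
# Stand-in "wallet": two filler bytes, the literal marker 'ckey', then a
# length byte (octal 041 = 33, the size of a compressed public key).
printf 'XXckey\041PUBKEYBYTES' > fake_wallet.dat

# grep -abo prints the byte offset of every literal match in a binary file.
grep -abo 'ckey' fake_wallet.dat | cut -d: -f1 | while read -r off; do
  # The byte right after the 4-byte marker encodes the public key length.
  len=$(dd if=fake_wallet.dat bs=1 skip=$(( off + 4 )) count=1 2>/dev/null \
    | od -An -tu1 | tr -d ' ')
  echo "offset=$off pubkey_len=$len"
done
```

<p>On a real wallet you would then read <code>len</code> bytes after the length byte to recover the public key itself.</p>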
<h2 id="computing-bitcoin-public-addresses">Computing Bitcoin Public Addresses</h2>
<p>Once you have obtained the Bitcoin public keys from the pool, you can turn them into Bitcoin addresses through a complicated process of hashing, <a href="https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_addresses">which is well documented</a>.</p>
<p>However, I found a discrepancy with the documentation. The keys I read from my wallet were all 33 bytes long instead of the expected 65 bytes. Bitcoin uses ECDSA, a variant of elliptic curve cryptography. A public key is a point on a specific elliptic curve (<a href="https://en.bitcoin.it/wiki/Secp256k1">Secp256k1</a>) whose equation is \( y^2 = x^3 + 7 \). The uncompressed format stores the \(x\) and \(y\) coordinates as 32-byte integers, preceded by one extra header byte that OpenSSL uses to store format information. Since \(x\) and \(y\) satisfy the known equation of the curve, you can store a single coordinate and a sign, and compute the other coordinate with \(y = \pm\sqrt{x^3 + 7}\). This is what is called a <strong>compressed key</strong>, and it is 33 bytes long.</p>
<p>I initially thought that I would have to decompress the key, and convert it to the 65-byte format in order to compute the address. But it turns out that Bitcoin hashes the OpenSSL ECDSA public key as it is. I suspect that means you could have multiple public addresses associated with the same public key: one public address for the uncompressed 65-byte key, and one for the compressed 33-byte key&hellip;</p>
<p>You can simply hash the public keys with the <code>sha256</code> and <code>ripemd160</code> algorithms as documented. The details of the hashing are strangely complicated, but easy to follow, and you end up with a 25-byte binary address. This binary address is then converted into an ASCII string using the <a href="https://en.bitcoin.it/wiki/Base58Check_encoding">Base58Check</a> encoding. As explained in the source code and the docs, this encoding is used instead of the easier-to-compute and more common base64 encoding to avoid confusion between similar-looking characters like <code>0OIl</code>.</p>
<p>Implementation with node.js is relatively straightforward using the <a href="https://www.npmjs.org/package/crypto">crypto</a> and <a href="https://www.npmjs.org/package/bs58">bs58</a> modules.</p>
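<p>To make the checksum part of Base58Check concrete, here is a small shell sketch using <code>openssl</code> and <code>xxd</code>. The 20-byte hash below is a made-up placeholder, not a real key hash: the 25-byte binary address is the version byte, the 20-byte hash, and the first 4 bytes of a double SHA256 of the two.</p>

```shell
#!/bin/sh
# Placeholder 20-byte hash160 (NOT derived from a real public key).
hash160=000102030405060708090a0b0c0d0e0f10111213
payload=00$hash160   # 0x00 is the version byte for a main-net address

# checksum = first 4 bytes of SHA256(SHA256(payload))
checksum=$(printf '%s' "$payload" | xxd -r -p \
  | openssl dgst -sha256 -binary \
  | openssl dgst -sha256 -binary \
  | xxd -p | tr -d '\n' | cut -c1-8)

# 21 payload bytes + 4 checksum bytes = the 25-byte binary address,
# which is what then gets Base58-encoded into the familiar ASCII form.
echo "binary address: $payload$checksum"
```
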
<h2 id="computing-the-balance-with-blockchaininfo">Computing the Balance with Blockchain.info</h2>
<p>You can use the <a href="https://www.blockchain.com/api/blockchain_api">JSON API from blockchain.info</a> to compute the balance of each address in the key pool, and sum them up to obtain the final balance. These APIs are public and can be used without authentication, but they are rate limited. This can be a problem if you naively use the single-address API. Instead you should use the multi-address API or, alternatively, the unspent outputs API.</p>
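<p>As an illustration of summing the per-address balances, here is a shell sketch that extracts the <code>final_balance</code> fields (in satoshis) from a multi-address-style JSON reply and adds them up. The JSON below is a made-up sample; the real reply would come from something like <code>curl -s "https://blockchain.info/multiaddr?active=ADDR1|ADDR2"</code>.</p>

```shell
#!/bin/sh
# Made-up reply in the shape of the multi-address API (balances in satoshis).
json='{"addresses":[{"address":"ADDR1","final_balance":150000000},{"address":"ADDR2","final_balance":30000000}]}'

# Pull out every final_balance value, sum them, and convert to BTC.
total=$(printf '%s' "$json" \
  | tr ',' '\n' \
  | sed -n 's/.*"final_balance":\([0-9]*\).*/\1/p' \
  | awk '{ s += $1 } END { printf "%.8f", s / 100000000 }')
echo "wallet balance: $total BTC"   # prints "wallet balance: 1.80000000 BTC"
```
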
<h2 id="source-code">Source Code</h2>
<p>I wrote a quick and dirty sample using <a href="http://nodejs.org">node.js</a> and published it on <a href="https://github.com/aymericb/Sample-BitcoinWalletBalance/blob/master/wallet.js">Github</a>.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[HFS&#43; Bit Rot]]></title>
    <link href="https://blog.barthe.ph/2014/06/10/hfs-plus-bit-rot/"/>
    <id>https://blog.barthe.ph/2014/06/10/hfs-plus-bit-rot/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2014-06-10T09:13:12+01:00</published>
    <updated>2014-06-10T09:13:12+01:00</updated>
<content type="html"><![CDATA[<p>HFS+ is a terribly old filesystem with serious flaws. I sincerely hope that Apple comes up with a replacement filesystem for WWDC 2015. After all, they worked on <a href="https://developer.apple.com/swift/">Swift</a> for the past four years and we only learned about it last week.</p>
<h2 id="hfs-is-seriously-old">HFS+ is seriously old</h2>
<p><a href="http://en.wikipedia.org/wiki/HFS%2B">HFS+</a> was released in 1998, in the era of Mac OS Classic. It predates the current Unix-based version of OS X by at least three years. It was created in a period where Apple&rsquo;s business was in such a dire situation that Michael Dell uttered his <a href="http://news.cnet.com/2100-1001-203937.html">now infamous quote</a>: &ldquo;What would I do? I&rsquo;d shut it down and give the money back to the shareholders.&rdquo;</p>
<p>Technically, HFS+ is a small evolution of its predecessor <a href="http://en.wikipedia.org/wiki/Hierarchical_File_System">HFS</a>, which dates back to 1985. The major change from HFS to HFS+ is the transition of block addresses from 16 bits to 32 bits. This change was badly needed, as hard drive capacities exploded in the late 90s. With a 16 bit addressing scheme, a file containing a single byte would use 16KB on a 1GB hard drive, and around 16MB on a 1TB hard drive. The other changes included the transition to longer filenames (from 31 to 255 characters) and a switch to <a href="https://developer.apple.com/library/mac/documentation/macosx/conceptual/bpinternational/Articles/FileEncodings.html">Unicode encoding</a>.</p>
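<p>The 16KB and 16MB figures follow directly from the 16-bit block count, as this bit of shell arithmetic shows: the volume is carved into at most 2^16 = 65,536 allocation blocks, so even a one-byte file consumes a whole block.</p>

```shell
#!/bin/sh
vol_1gb=$(( 1024 * 1024 * 1024 ))   # 1GB volume size in bytes
vol_1tb=$(( vol_1gb * 1024 ))       # 1TB volume size in bytes

# Smallest allocatable unit = volume size / 2^16 blocks
echo "1GB volume: $(( vol_1gb / 65536 / 1024 )) KB per block"
echo "1TB volume: $(( vol_1tb / 65536 / 1024 / 1024 )) MB per block"
```
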
<p>By and large the rest of the design of HFS+ has remained unchanged since 1985.</p>
<h2 id="hfs-has-serious-limitations-and-flaws">HFS+ has serious limitations and flaws</h2>
<p>When it was first released, HFS+ did not support <a href="http://en.wikipedia.org/wiki/Hard_links">hard links</a>, <a href="http://en.wikipedia.org/wiki/Journaling_file_system">journaling</a>, <a href="http://en.wikipedia.org/wiki/Extended_attributes">extended attributes</a>, <a href="http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.html#HotFile">hot files</a>, and online <a href="http://en.wikipedia.org/wiki/Defragmentation">defragmentation</a>. These features were gradually added with subsequent releases of Mac OS X. But they are basically hacked to death, which leads to a complicated, slow and not so reliable implementation.</p>
<p>In the early days, the system had a hard limit on the number of files that could be written and deleted over the lifetime of the volume: 2,147,483,648 (i.e. 2^31). After that, the volume could no longer add any files or directories. On HFS+, every entry in the filesystem is associated with a CNID (Catalog Node ID). The early implementations used a simple global counter, <code>nextCatalogID</code>, stored in the volume header, that could only be incremented by one until the maximum value was reached. More recent versions of Mac OS X can now recycle old unused CNIDs, but this gives you an idea of the kind of considerations that went into the design of HFS+.</p>
<p>More recently, Apple added support for <a href="http://en.wikipedia.org/wiki/Disk_encryption">full disk encryption</a> with FileVault 2 and <a href="http://en.wikipedia.org/wiki/Fusion_Drive">Fusion Drive</a>. But these features are implemented in a layer underneath the file system, by Core Storage, a logical volume manager. Additional features like <a href="http://en.wikipedia.org/wiki/Snapshot_%28computer_storage%29">snapshotting</a> and <a href="http://en.wikipedia.org/wiki/Versioning_file_system">versioning</a> would probably require much tighter integration with the file system&hellip; and they would also make <a href="http://en.wikipedia.org/wiki/Time_Machine_%28OS_X%29">Time Machine</a> extremely efficient and reliable. Currently Time Machine is built on top of the file system and relies on capturing I/O events, which adds overhead and complexity. Another hack.</p>
<p>Finally there is <a href="http://en.wikipedia.org/wiki/Data_degradation">Bit Rot</a>. Over time data stored on spinning hard disks or SSDs degrade and become incorrect. Modern file systems like <a href="http://en.wikipedia.org/wiki/ZFS">ZFS</a>, which Apple considered but abandoned as a replacement, include checksums of all <del>meta data structures</del> content <a href="#erratum">[1]</a>. That means that when the file is accessed, the filesystem detects the corruption and throws an error. This prevents incorrect data from propagating to backups. With ZFS, you can also scrub your disk on a regular basis and verify if existing files have been corrupted preemptively.</p>
<h2 id="a-concrete-example-of-bit-rot">A concrete example of Bit Rot</h2>
<p>I have a large collection of photos, which starts around 2006. Most of these files have been kept on HFS+ volumes since their existence.</p>
<p>In addition to Time Machine backups, I also use two other backup solutions that I described in a <a href="http://blog.barthe.ph/2013/10/16/how-store-bytes/">previous blog post</a>. I keep a copy of the photos on a Linux microserver using ext3, which I checksum and verify regularly using <a href="http://snapraid.sourceforge.net">snapraid</a>. I also keep off-site backups using <a href="http://www.haystacksoftware.com">ARQ</a> and <a href="https://aws.amazon.com/glacier/">Amazon Glacier</a>.</p>
<p>Before I acquired the Linux Microserver, I used to keep a copy of all my photos on a Dreamhost account. I recently compared these photos against their current versions on the iMac and was a bit shocked by the results.</p>
<p>The photos were taken between 2006 and 2011, most of them after 2008. There are 15,264 files, representing a total of 105 GiB. 70% of these photos are CR2 raw files from my old <a href="http://en.wikipedia.org/wiki/Canon_EOS_350D">EOS 350D</a> camera. The rest are regular JPEGs which come from the cameras of friends and relatives.</p>
<p><strong>HFS+ lost a total of 28 files over the course of 6 years</strong>.</p>
<p>Most of the corrupted files are completely unreadable. The JPEGs typically decode partially, up to the point of failure. So if you&rsquo;re lucky, you may get most of the image except the bottom part. The raw .CR2 files usually turn out to be totally unreadable: either completely black or covered by a large color overlay on significant portions of the photo. Most of these shots are not so important, but a handful of them are. One of the CR2 files in particular is a very good picture of my son when he was a baby. I printed and framed that photo, so I am glad that I did not lose the original.</p>
<p><strong>If you&rsquo;re keeping all your files and backups on HFS+ volumes, you&rsquo;re doing it wrong.</strong></p>
<h2 id="how-to-check-for-file-corruptions">How to check for file corruptions</h2>
<p>I used the following technique to compare the photos from the Dreamhost backup against my main HFS+ volume.</p>
<p>I ran the <code>shasum</code> command line tool to compute SHA1 hashes of every single file in the backup folder, except <code>.DS_Store</code> files. Then, I ran <code>shasum</code> in <em>verify mode</em> to check the files on my main volume against the hashes. Differences indicate either voluntary modifications (which did not apply in my case), or corruptions courtesy of HFS+ (which was my case).</p>
<pre tabindex="0"><code># Compute checksums
find . -type f -a ! -name &#34;.DS_Store&#34; -exec  shasum &#39;{}&#39; \; &gt; shasums.txt
# Check against checksums
shasum -c &lt; shasums.txt  &gt; check.txt
# Filter out differences
cat check.txt | fgrep -v OK
</code></pre><p>You can use the same technique to check for corruptions on a single volume. You need to compute checksums and verify against them from time to time. If you use clone backups, it is probably a good idea to check for corruptions before doing the clone.</p>
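<p>To see the whole routine end to end, here is a self-contained demo on a scratch directory, using <code>sha1sum</code> (the coreutils equivalent of <code>shasum</code>): compute checksums, simulate a flipped byte, and watch verification flag the damaged file. Point it at your real photo folder instead of the scratch files.</p>

```shell
#!/bin/sh
mkdir -p scratch && cd scratch
printf 'good photo bytes' > a.jpg
printf 'other photo bytes' > b.jpg

# Compute checksums for every file except the checksum list itself.
find . -type f ! -name shasums.txt -exec sha1sum '{}' \; > shasums.txt

# Simulate bit rot: silently alter one character inside a.jpg.
printf 'gXod photo bytes' > a.jpg

# Verification reports the corrupted file ("./a.jpg: FAILED").
sha1sum -c shasums.txt | grep -v 'OK$'
```
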
<h2 id="addendum---june-11th-2014">Addendum - June 11th, 2014</h2>
<p>Thanks to everyone who spent the time to send feedback. There are a few things I would like to add:</p>
<ul>
<li><a name="erratum">[1]</a> Erratum. ZFS uses checksums for everything, not just the meta-data.</li>
<li>I understand the corruptions were caused by hardware issues. My complaint is that the lack of checksums in HFS+ means the error is silent when a corrupted file is accessed.</li>
<li>This is not an issue specific to HFS+. Most filesystems do not include checksums either. Sadly&hellip;</li>
<li>Other people have written articles on similar topics. <a href="http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/#image-3">Jim Salter</a> and <a href="http://arstechnica.com/apple/2011/07/mac-os-x-10-7/12/">John Siracusa</a> for Ars Technica in particular.</li>
</ul>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Why the Blockchain and the Bitcoin Wallet Balances Differ]]></title>
    <link href="https://blog.barthe.ph/2014/04/03/bitcoin-balance-differs/"/>
    <id>https://blog.barthe.ph/2014/04/03/bitcoin-balance-differs/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2014-04-03T21:28:18+01:00</published>
    <updated>2014-04-03T21:28:18+01:00</updated>
<content type="html"><![CDATA[<p>If you look at a website like <a href="https://blockchain.info">blockchain.info</a> or <a href="http://blockexplorer.com">blockexplorer.com</a>, you may notice it is possible to look up the details of a particular Bitcoin address, such as its last transactions and, of course, its balance.</p>
<p>If you try this on a Bitcoin address that belongs to you, and fire up the Bitcoin Qt client (aka <a href="https://bitcoin.org/bin/0.9.0/README.txt">Bitcoin Core</a>), you may notice a discrepancy. It is very likely that the balance displayed on the website is less than the one displayed by the software wallet.</p>
<p>The discrepancy is caused by the nature of bitcoin. Instead of storing actual coins, the bitcoin protocol should be seen as a distributed public database of <em>transactions</em> which together form the <em>blockchain</em>. You &ldquo;receive&rdquo; bitcoins when another party decides to use their private key to sign a transaction and send some amount of bitcoins to your public address. Bitcoins only <em>exist</em> in the sense that you can trace the chain of valid transactions until you reach a special <em>coinbase transaction</em>, i.e. some <em>mined</em> bitcoins. You can almost think of all the transactions forming a <a href="http://en.wikipedia.org/wiki/Linked_list">singly linked list</a>, that stops at one end with mined bitcoins, and on the other end with unspent bitcoins&hellip; except for the fact that each transaction can have multiple inputs or outputs. (Please keep in mind this is voluntarily simplified; if you wish to know more, check <a href="https://en.bitcoin.it/wiki/Protocol_specification#tx">the protocol documentation</a>.)</p>
<p>One of the quirks of the protocol is that the amounts of the inputs and outputs of a transaction must match (in reality, the outputs can total less than the inputs, and the remainder then constitutes the optional <em>transaction fee</em>). That rule greatly simplifies the validation of transactions, since there is no need to walk the entire history of transactions to figure out how much of a given transaction is spent or unspent: it&rsquo;s either all or nothing.</p>
<p>The drawback of this solution arises when you need to spend only a fraction of the amount received in a previous transaction. In that case, the wallet software automatically creates two outputs to the transaction: one output is used to send money to the intended recipient, one output is used to send the remainder to the sender.</p>
<p>At this stage, it&rsquo;s probably simpler to reason with an example. Let&rsquo;s imagine <em>Alice</em> wants to send 1.2 BTC to <em>Bob</em>. Alice previously received 1 BTC from <em>Chip</em> and 0.5 BTC from <em>Dale</em>. The new transaction she makes has to reference both previous unspent transactions as inputs, since neither of these transactions taken individually has enough funds. One of the outputs of the transaction must be the 1.2 BTC sent to Bob. But Alice also needs to add a 0.3 BTC output that is sent back to herself. In the future, she can spend these 0.3 BTC that remain in her wallet by referencing this 0.3 BTC output as an input of a new transaction.</p>
<img src="/assets/2014/btc_transaction.png" alt="Example of transaction" class="dark-inverted-img"/>
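<p>Alice&rsquo;s numbers work out as simple arithmetic: the two inputs are consumed in full, so the change output is whatever remains after paying Bob (anything not sent back would become a transaction fee, zero in this example).</p>

```shell
#!/bin/sh
from_chip=1.0   # first unspent output referenced as an input
from_dale=0.5   # second unspent output referenced as an input
to_bob=1.2      # output paying Bob

# change output sent back to Alice = inputs - payment (fee is zero here)
change=$(awk -v c="$from_chip" -v d="$from_dale" -v b="$to_bob" \
  'BEGIN { printf "%.1f", c + d - b }')
echo "inputs: 1.5 BTC, to Bob: $to_bob BTC, change to Alice: $change BTC"
```
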
<p>It would be possible to use the same public address to send the money back to the sender, but the Bitcoin Qt software sends it to a new address instead, for privacy reasons. A bitcoin wallet contains at least a hundred such addresses, which constitute the <em>key pool</em>. The key pool is pre-allocated (therefore many addresses will have a balance of zero) so that slightly out-of-date backups of wallet files result in no loss of bitcoins. Every time a transaction that requires return funds is made, these returned funds seem to &ldquo;disappear&rdquo; from the balance of the wallet&rsquo;s public address. It&rsquo;s possible to reach a balance of <em>zero</em> on your public address in that way.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[How I Store my Bytes]]></title>
    <link href="https://blog.barthe.ph/2013/10/16/how-store-bytes/"/>
    <id>https://blog.barthe.ph/2013/10/16/how-store-bytes/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2013-10-16T09:33:00+00:00</published>
    <updated>2013-10-16T09:33:00+00:00</updated>
<content type="html"><![CDATA[<p>Over a year ago, I read <a href="https://web.archive.org/web/20130615160313/https://mocko.org.uk/b/2012/06/17/how-i-store-my-1s-and-0s-zfs-bargain-hp-microserver-joy/">an article on Mockyblog</a> about storing your personal data on an HP ProLiant MicroServer. After juggling with no less than 4 external hard drives to compensate for the lack of space on my iMac, I ended up buying this machine and turning it into a Linux-powered NAS. After listening to a <a href="http://atp.fm/episodes/29-computerized-garden-gnome">recent episode of Accidental Tech Podcast</a>, where the host chose a more expensive and convenient approach, I decided to share my experience.</p>
<ul>
<li>
<p><strong>The HP microserver is cheaper than a real NAS</strong>. HP regularly operates a cashback offer on this hardware (which I used). You could acquire an <a href="https://web.archive.org/web/20130827232408/http://www.ebuyer.com/430446-proliant-microserver-turion-2-2-2gb-250gb-nhpl-sata-lff-in-704941-421">HP ProLiant G7 N54L 2.2GHz</a> for roughly £150 with a cashback offer last month. The cashback and the price vary, but the offer regularly comes back. The hardware is comparable to a NAS: it is small, fitted with 4 hard-drive bays, it has a CD-ROM bay that can be used for an additional disk, and it has E-SATA ports for adding external disks. In comparison, Synology hardware is usually <a href="http://www.amazon.co.uk/s/ref=nb_sb_noss_1?url=search-alias%3Daps&amp;field-keywords=synology+4+bay">north of £400</a> and DROBO is <a href="http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&amp;field-keywords=drobo+4+bay&amp;rh=i%3Aaps%2Ck%3Adrobo+4+bay">north of £300</a>. The cheapest 4-bay NAS I found is the <a href="">Synology DS413j</a> which currently <a href="http://www.amazon.co.uk/Synology-DS413j-DiskStation-Diskless-Desktop/dp/B0095RYB36">retails at £265</a>. But its hardware pales in comparison to my HP microserver: single-core CPU, no E-SATA and only 512MB of RAM.</p>
</li>
<li>
<p><strong>Booting from the internal USB connector</strong>. There is a USB connector fitted on the motherboard, and this is where I plug my USB thumbdrive (a SanDisk Cruzer Blade). Since the microserver is not connected to any screen or keyboard, having it boot from a thumbdrive is an advantage. In case of trouble, I can take the drive, plug it into the iMac and boot using VMWare. I also regularly clone it to another thumbdrive for backup purposes. Using the USB drive as a boot disk also means that all my hard drives are allowed to spin down to save power and prolong their lifetime.</p>
</li>
<li>
<p><strong>It&rsquo;s relatively quiet and low power</strong>. I measured my power consumption over the course of 1 month and it comes to about 40 Watts on average, for a monthly cost of £3.50. The machine is fitted with 4 hard drives, which are probably powered down about 90% of the time. I use <a href="http://en.wikipedia.org/wiki/Hdparm">hdparm</a> to power down the drives when not in use. The CPU of the machine is constantly solicited though, by the various services I installed, and this probably prevents the machine from dropping to the 30-35 Watts that can be observed by just booting it and doing nothing. The large fan is relatively quiet, but it got noisier after 1 year of absorbing dust. Currently the noise is about 40dB from 1 meter away, which is comparable to my late 2008 iMac.</p>
</li>
<li>
<p><strong>Use encryption on all disks</strong>. As I explained in <a href="http://blog.barthe.ph/2013/04/22/backup-strategy/">my previous post</a>, the goal is not to stop the NSA or GCHQ from stealing my data. There is no 4th Amendment in the UK, and the government <a href="http://wiki.openrightsgroup.org/wiki/Regulation_of_Investigatory_Powers_Act_2000/Part_III">can force you</a> to reveal your key or put you in jail for 2 or 5 years if you refuse. However, encryption is a good way to prevent identity theft if someone breaks into your house and steals your toys. I use full disk encryption with <a href="http://en.wikipedia.org/wiki/Linux_Unified_Key_Setup">LUKS</a> on all hard drives, but not on the thumb drive. The encrypted disks appear as regular block devices, and you can use any filesystem or utility. Since I have no keyboard to enter a password at boot time, I SSH into the box and enter it manually after each reboot. It&rsquo;s slightly inconvenient, but I only power down the machine to dust it off, about once a month.</p>
</li>
<li>
<p><strong>Use snapraid for redundancy</strong>. I fitted the server with the hard disks I initially used as external drives for the Mac. Consequently, all four disks are of different sizes. My goal is to achieve 1-disk redundancy, i.e. be immune to the failure of a single disk. Traditional RAID would have a hard time coping with this setup, and it would be hard to expand it dynamically. Mockyblog mentioned using ZFS with RAID-Z. This fulfills the redundancy goal, but unfortunately it&rsquo;s not possible to add a disk to an existing zpool. There are <a href="https://web.archive.org/web/20140701142429/http://gentoovps.net/add-drive-to-raidz/">complicated techniques</a> to work around it though&hellip; <a href="https://btrfs.wiki.kernel.org/index.php/Main_Page">BTRFS</a> is a good alternative to ZFS but it is not yet ready. So in the end I adopted <a href="http://snapraid.sourceforge.net">snapraid</a>. Despite its name, it has nothing to do with RAID. It&rsquo;s a file utility that runs on Linux, MacOS X and Windows and works with any file system. It can easily be configured with 3 data disks and 1 parity disk. The parity is stored in a series of checksum files, somewhat similar to <a href="http://en.wikipedia.org/wiki/Parchive">PAR2</a>. It reads chunks of files from each data disk and stores the checksums on the parity disk. You need to run snapraid regularly, or use a cron job, to update the checksums. It&rsquo;s very well suited to NAS storage, where data seldom changes and is mostly added (rather than modified or deleted). Compared to RAID, you also do not need to wake up all 4 drives when you read or write data.</p>
</li>
<li>
<p><strong><a href="http://www.plexapp.com">Plex Media Server</a></strong>. This probably the service I use the most. This is a huge improvement over manually sorting files and firing up EyeTV or VLC, because Plex automatically retrieves thumbnails and meta data, and the UI is very good. I can also access my collection from all my small collection of iOS and Android devices, or via a web page. Streaming works even outside of the home network, and even though my upload link is not good enough for 1080p films, it does stream music very well. As a consequence, I have deleted almost all of my songs from the 32GB iPhone and for the first time ever, I have lot of space to spare. It&rsquo;s like having your own Spotify, iCloud, and other radio things, for much cheaper and without ads and privacy concerns.</p>
</li>
<li>
<p><strong><a href="https://web.archive.org/web/20130628204823/http://labs.bittorrent.com/experiments/sync.html">Bittorrent Sync</a></strong>. As an alternative to <a href="https://www.dropbox.com">Dropbox</a>, not the Pirate Bay style bittorrent. Again it&rsquo;s free, and I can share a lot more than with the dropbox free tier. You could add a dedicated server with 100MBits bandwidth in the mix (which you can get for about <a href="http://www.ovh.co.uk/dedicated_servers/kimsufi_2g.xml">£3.50 a month for 500GB these days</a>, but I use <a href="http://www.online.net/fr/serveur-dedie/dedibox-scg2">this instead</a>). My phone, my office computers, my laptop all use it. I recently synchronized 20GB of photos and videos of my brother&rsquo;s wedding.</p>
</li>
<li>
<p><strong>Various download services</strong>. You can queue large downloads (or uploads) over HTTP, SSH, rsync, BitTorrent, NZB, or anything really&hellip; Because the server draws so little power, you can schedule data transfers to occur in the middle of the night, so that bandwidth is not affected during the day. There is plenty of software you can run on the server and control through a web interface: <a href="http://labs.bittorrent.com/experiments/sync.html">Bittorrent Sync</a>, <a href="http://sabnzbd.org">Sabnzbd</a>, <a href="http://sickbeard.com">Sickbeard</a>, <a href="https://couchpota.to">Couchpotato</a> and <a href="http://www.transmissionbt.com">Transmission</a>, to name a few.</p>
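<p>As a sketch of that scheduling idea, assuming the Transmission daemon is running, a pair of crontab entries can toggle its alternate (&ldquo;turtle&rdquo;) speed limits so that torrents only run at full speed overnight. The hours are arbitrary:</p>
<pre tabindex="0"><code># Lift the speed limits at 1am, restore them at 8am
0 1 * * * transmission-remote --no-alt-speed
0 8 * * * transmission-remote --alt-speed
</code></pre>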
</li>
<li>
<p><strong>Private Minecraft server</strong>. Or any game really. This is exactly the kind of thing that would be hard to achieve on a NAS. I actually had to add extra RAM to the server, and in all honesty the CPU is a bit weak for this kind of workload, but it can be done.</p>
</li>
</ul>
<p>In the end, using commodity PC hardware and Linux, you can build pretty much any solution for less money than a dedicated NAS.</p>
<p>There is no denying that researching and implementing features that come out of the box on a commercial NAS is time consuming. But once it&rsquo;s done, maintenance takes hardly any time. I rarely touch the server anymore, unless I need to dust it off or install updates.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Fixing the Twitter Timeline in Octopress]]></title>
    <link href="https://blog.barthe.ph/2013/08/10/fixing-twitter-timeline-in-octopress/"/>
    <id>https://blog.barthe.ph/2013/08/10/fixing-twitter-timeline-in-octopress/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2013-08-10T10:30:00+00:00</published>
    <updated>2013-08-10T10:30:00+00:00</updated>
    <content type="html"><![CDATA[<p>On June 11th 2013, Twitter shut down the old 1.0 API and the Twitter sidebar widget of <a href="http://octopress.org/">Octopress</a>, the software that powers this blog, suddenly stopped working.</p>
<p>In the 1.1 version of the API, every single call must be authenticated with <a href="http://en.wikipedia.org/wiki/OAuth">OAuth</a>. Twitter&rsquo;s intended goal was to shut down third-party Twitter clients, in order to monetize content by shoving ads in the face of their users. This controversial move was <a href="https://web.archive.org/web/20131019202024/https://dev.twitter.com/blog/changes-coming-to-twitter-api">announced about a year ago</a> and <a href="http://www.marco.org/2012/11/16/twitter-being-a-dick-again">has already claimed some victims</a>.</p>
<p><a href="http://blog.jmac.org/blog/2013/03/30/putting-twitter-back-into-octopress/">Jason McIntosh published on his blog</a> a technique to replace the built-in Octopress widget by an official Twitter timeline widget. It works, but I think the widget looks very gross, and since it hurts my feelings to use any kind of <em>official</em> twitter &ldquo;client&rdquo;, I decided to fix the original Octopress widget instead.</p>
<p>The most obvious way to fix the problem is to update the <code>twitter.js</code> file to use the new API. However, it is not a good idea to put your OAuth tokens in JavaScript, where anybody could grab and abuse them. These tokens need to be kept on the server side. The solution I describe below does this in a very straightforward manner.</p>
<h2 id="1-register-as-a-twitter-developer-and-get-oauth-credentials">1. Register as a Twitter developer and get OAuth credentials</h2>
<ul>
<li>Log into <a href="https://dev.twitter.com/apps">https://dev.twitter.com/apps</a> with your Twitter credentials (eg: <code>bartheph</code>).</li>
<li>Create a new application. (eg: <code>blog.barthe.ph</code>).</li>
<li>Accept the controversial rules of the road.</li>
<li>Click on &ldquo;Create my access token&rdquo;.</li>
<li>Write down the following values:
<ul>
<li>Consumer key</li>
<li>Consumer secret</li>
<li>Access token</li>
<li>Access token secret</li>
</ul>
</li>
</ul>
<h2 id="2-set-up-a-cron-job-to-generate-a-static-timelinejson-file">2. Set up a cron job to generate a static timeline.json file</h2>
<ul>
<li>
<p>This file will contain the timeline and will be loaded by a slightly modified version of Octopress&rsquo;s JavaScript code</p>
</li>
<li>
<p>Install <code>ruby</code> with the <code>oauth</code> gem. On Ubuntu, it looks something like this:</p>
<pre tabindex="0"><code>sudo apt-get install ruby rubygems
sudo gem install oauth
</code></pre></li>
<li>
<p>Create a ruby file on the server. Fill in your Twitter OAuth tokens and choose an output path.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ruby" data-lang="ruby"><span class="line"><span class="cl"><span class="ch">#!/usr/bin/env ruby</span>
</span></span><span class="line"><span class="cl"><span class="nb">require</span> <span class="s1">&#39;rubygems&#39;</span>
</span></span><span class="line"><span class="cl"><span class="nb">require</span> <span class="s1">&#39;oauth&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Edit config to suit your needs</span>
</span></span><span class="line"><span class="cl"><span class="n">config</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="ss">:consumer_key</span> <span class="o">=&gt;</span> <span class="s1">&#39;xxxxx&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="ss">:consumer_secret</span> <span class="o">=&gt;</span> <span class="s1">&#39;xxxxxx&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="ss">:oauth_token</span> <span class="o">=&gt;</span> <span class="s1">&#39;xxxx-xxxxxx&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="ss">:oauth_token_secret</span> <span class="o">=&gt;</span> <span class="s1">&#39;xxxxx&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="ss">:output_path</span> <span class="o">=&gt;</span> <span class="s1">&#39;/srv/blog.barthe.ph/www/timeline.json&#39;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Create OAuth context</span>
</span></span><span class="line"><span class="cl"><span class="n">oauth</span> <span class="o">=</span> <span class="no">OAuth</span><span class="o">::</span><span class="no">Consumer</span><span class="o">.</span><span class="n">new</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">config</span><span class="o">[</span><span class="ss">:consumer_key</span><span class="o">]</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="n">config</span><span class="o">[</span><span class="ss">:consumer_secret</span><span class="o">]</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">{</span> <span class="ss">:site</span> <span class="o">=&gt;</span> <span class="s2">&#34;https://twitter.com&#34;</span><span class="p">,</span> <span class="ss">:scheme</span> <span class="o">=&gt;</span> <span class="ss">:header</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">access_token</span> <span class="o">=</span> <span class="no">OAuth</span><span class="o">::</span><span class="no">AccessToken</span><span class="o">.</span><span class="n">from_hash</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">oauth</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">    <span class="p">{</span> <span class="ss">:oauth_token</span> <span class="o">=&gt;</span> <span class="n">config</span><span class="o">[</span><span class="ss">:oauth_token</span><span class="o">]</span><span class="p">,</span> <span class="ss">:oauth_token_secret</span> <span class="o">=&gt;</span> <span class="n">config</span><span class="o">[</span><span class="ss">:oauth_token_secret</span><span class="o">]</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Get timeline</span>
</span></span><span class="line"><span class="cl"><span class="n">url</span> <span class="o">=</span> <span class="s1">&#39;https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=bartheph&amp;trim_user=true&amp;count=22&amp;include_entities=1&amp;exclude_replies=1&#39;</span>
</span></span><span class="line"><span class="cl"><span class="n">response</span> <span class="o">=</span> <span class="n">access_token</span><span class="o">.</span><span class="n">request</span><span class="p">(</span><span class="ss">:get</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Write response into output file with JSONP wrapper</span>
</span></span><span class="line"><span class="cl"><span class="no">File</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">config</span><span class="o">[</span><span class="ss">:output_path</span><span class="o">]</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">)</span> <span class="p">{</span> <span class="o">|</span><span class="n">file</span><span class="o">|</span> 
</span></span><span class="line"><span class="cl">    <span class="n">file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">&#39;processTweeter(&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">&#39;);&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div></li>
<li>
<p>Make sure the file is executable, and readable only by the user that runs the cron job, since it contains your OAuth tokens.</p>
<pre tabindex="0"><code>chmod 700 /srv/blog.barthe.ph/cron/update_twitter.rb
</code></pre></li>
<li>
<p>Add the ruby script to <strong>crontab</strong> to run it periodically. Type <code>crontab -e</code> and add a line similar to what follows.</p>
<pre tabindex="0"><code># Update every 10 minutes
*/10 * * * * /srv/blog.barthe.ph/cron/update_twitter.rb
</code></pre></li>
</ul>
<h2 id="3-modify-octopress-to-use-timelinejson-instead-of-the-twitter-api">3. Modify Octopress to use <code>timeline.json</code> instead of the Twitter API</h2>
<ul>
<li>Edit your local version of <a href="https://github.com/imathis/octopress/blob/master/.themes/classic/source/javascripts/twitter.js">source/javascripts/twitter.js</a>.</li>
<li>Edit the <code>getTwitterFeed()</code> function. Add a <code>jsonp</code> attribute and change the value of <code>url</code>.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">function</span> <span class="nx">getTwitterFeed</span><span class="p">(</span><span class="nx">user</span><span class="p">,</span> <span class="nx">count</span><span class="p">,</span> <span class="nx">replies</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nx">count</span> <span class="o">=</span> <span class="nb">parseInt</span><span class="p">(</span><span class="nx">count</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">  <span class="nx">$</span><span class="p">.</span><span class="nx">ajax</span><span class="p">({</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">//  REMOVED     url: &#34;http://api.twitter.com/1/statuses/user_timeline/&#34; + user + &#34;.json?trim_user=true&amp;count=&#34; + (count + 20) + &#34;&amp;include_entities=1&amp;exclude_replies=&#34; + (replies ? &#34;0&#34; : &#34;1&#34;) + &#34;&amp;callback=?&#34;
</span></span></span><span class="line"><span class="cl"><span class="cm">/* ADDED */</span> <span class="nx">url</span><span class="o">:</span> <span class="s1">&#39;http://blog.barthe.ph/timeline.json?callback=processTweeter&#39;</span>
</span></span><span class="line"><span class="cl">    <span class="p">,</span> <span class="nx">type</span><span class="o">:</span> <span class="s1">&#39;jsonp&#39;</span>
</span></span><span class="line"><span class="cl"><span class="cm">/* ADDED */</span><span class="p">,</span> <span class="nx">jsonp</span><span class="o">:</span> <span class="s1">&#39;processTweeter&#39;</span>
</span></span><span class="line"><span class="cl">    <span class="p">,</span> <span class="nx">error</span><span class="o">:</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span> <span class="nx">$</span><span class="p">(</span><span class="s1">&#39;#tweets li.loading&#39;</span><span class="p">).</span><span class="nx">addClass</span><span class="p">(</span><span class="s1">&#39;error&#39;</span><span class="p">).</span><span class="nx">text</span><span class="p">(</span><span class="s2">&#34;Twitter&#39;s busted&#34;</span><span class="p">);</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">,</span> <span class="nx">success</span><span class="o">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="p">{</span> <span class="nx">showTwitterFeed</span><span class="p">(</span><span class="nx">data</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">count</span><span class="p">),</span> <span class="nx">user</span><span class="p">);</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">})</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>That&rsquo;s all. As you can see on my blog, it looks just like before the old API was shut down. Using dynamic server-side code goes a bit against the spirit of Octopress, but it&rsquo;s a tiny, self-contained change.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[A JavaScript Object Oriented Programming Technique]]></title>
    <link href="https://blog.barthe.ph/2013/05/21/js-oo-programming/"/>
    <id>https://blog.barthe.ph/2013/05/21/js-oo-programming/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2013-05-21T18:32:00+00:00</published>
    <updated>2013-05-21T18:32:00+00:00</updated>
    <content type="html"><![CDATA[<p>There are many conflicting approaches to object oriented programming in JavaScript. On the one hand, the language was designed with <a href="http://en.wikipedia.org/wiki/Prototype-based_programming">prototypal object oriented programming</a> in mind, with the only built-in mechanism for inheritance being object inheritance. However, the language is also fitted with a <code>new</code> keyword of <a href="http://tech.groups.yahoo.com/group/jslint_com/message/314">dubious quality</a> which shows aspects of a more traditional class based OO programming.</p>
<p>In his book <a href="http://www.amazon.com/JavaScript-Good-Parts-Douglas-Crockford/dp/0596517742">JavaScript: The Good Parts</a>, Douglas Crockford presented one of the earliest consistent OO programming techniques. However, I found it quite outdated, and after some research and experimentation, I came up with the following technique, which I now use in <a href="http://exposition.barthe.ph/">Exposition</a>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Namespace
</span></span></span><span class="line"><span class="cl"><span class="kd">var</span> <span class="nx">ph</span> <span class="o">=</span> <span class="nx">ph</span> <span class="o">||</span> <span class="p">{};</span>
</span></span><span class="line"><span class="cl"><span class="nx">ph</span><span class="p">.</span><span class="nx">barthe</span> <span class="o">=</span> <span class="nx">ph</span><span class="p">.</span><span class="nx">barthe</span> <span class="o">||</span> <span class="p">{};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Use strict header
</span></span></span><span class="line"><span class="cl"><span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"><span class="s2">&#34;use strict&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Class declaration
</span></span></span><span class="line"><span class="cl"><span class="nx">ph</span><span class="p">.</span><span class="nx">barthe</span><span class="p">.</span><span class="nx">MyClass</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">ctr_param</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">	
</span></span><span class="line"><span class="cl">	<span class="c1">// Remap this
</span></span></span><span class="line"><span class="cl">	<span class="kd">var</span> <span class="nx">self</span> <span class="o">=</span> <span class="k">this</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">	<span class="c1">// Private variable
</span></span></span><span class="line"><span class="cl">	<span class="kd">var</span> <span class="nx">m_private_var</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">	<span class="c1">// Private method
</span></span></span><span class="line"><span class="cl">	<span class="kd">var</span> <span class="nx">privateMethod</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">param</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">		<span class="c1">// ...
</span></span></span><span class="line"><span class="cl">	<span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">	<span class="c1">// Public method
</span></span></span><span class="line"><span class="cl">	<span class="nx">self</span><span class="p">.</span><span class="nx">publicMethod</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">param</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">		<span class="c1">// ...
</span></span></span><span class="line"><span class="cl">	<span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">	<span class="c1">// Constructor body
</span></span></span><span class="line"><span class="cl">	<span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">		<span class="c1">// Temporary variable i, does not leak
</span></span></span><span class="line"><span class="cl">		<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="nx">i</span><span class="o">&lt;</span><span class="nx">ctr_param</span><span class="p">;</span> <span class="o">++</span><span class="nx">i</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">		   <span class="nx">m_private_var</span> <span class="o">+=</span> <span class="nx">privateMethod</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">		<span class="nx">self</span><span class="p">.</span><span class="nx">publicMethod</span><span class="p">(</span><span class="s1">&#39;stuff&#39;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">	<span class="p">})();</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Object usage
</span></span></span><span class="line"><span class="cl"><span class="kd">var</span> <span class="nx">ph</span><span class="p">.</span><span class="nx">barthe</span><span class="p">.</span><span class="nx">exampleUsage</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">	<span class="kd">var</span> <span class="nx">my_instance</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ph</span><span class="p">.</span><span class="nx">barthe</span><span class="p">.</span><span class="nx">MyClass</span><span class="p">(</span><span class="mi">15</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">	<span class="k">return</span> <span class="nx">my_instance</span><span class="p">.</span><span class="nx">publicMethod</span><span class="p">(</span><span class="s1">&#39;other stuff&#39;</span><span class="p">);</span>	
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1">// Use strict footer
</span></span></span><span class="line"><span class="cl"><span class="p">})();</span>
</span></span></code></pre></div><p>In the previous code sample, I employed the following techniques:</p>
<ul>
<li>
<p><strong>Use pseudo namespaces</strong> to avoid cluttering the global namespace and to avoid potential naming conflicts with third party scripts. The namespace is in reality a single global object to which properties are appended. The OR operator in <code>var ph = ph || {};</code> makes sure the object is only created if it does not already exist. This makes the code tolerant to changes in the order of <code>&lt;script&gt;</code> statements.</p>
</li>
<li>
<p><strong>Use <a href="https://developer.mozilla.org/en/docs/JavaScript/Reference/Functions_and_function_scope/Strict_mode">ECMAScript 5 strict mode</a></strong>. This is what the seemingly useless <code>&quot;use strict&quot;</code> statement is about. Instead of adding the statement to every function, I create an anonymous function whose scope is the entire file, and execute it immediately in the static context.</p>
</li>
<li>
<p><strong>Use &ldquo;classical&rdquo; OO programming with constructor functions</strong> instead of <em>prototypes</em>. I am more familiar with classical OO programming, and the upcoming <a href="https://plus.google.com/+BorisSmus/posts/bxNnAzr19pz">ECMAScript 6 is going to add support for classes</a>. But a more pragmatic reason for that choice is that relying on closures within constructor functions <a href="http://javascript.crockford.com/private.html">appears to be the only known technique to get private members</a> in JavaScript.</p>
</li>
<li>
<p><strong>Use the <code>new</code> keyword</strong> instead of <code>Object.create</code> or some kind of custom make function. I think it signals my intent to use classical OO programming more clearly. The <em>ECMAScript 5 strict mode</em> fixes the <a href="http://stackoverflow.com/questions/383402/is-javascript-s-new-keyword-considered-harmful?answertab=votes#tab-top">main argument Crockford uses against the &ldquo;new&rdquo; keyword</a>. In non-strict mode, if you forget to use <code>new</code> when you invoke a constructor function, <code>this</code> ends up pointing to the global context instead of the object itself. This can lead to silent failures with disastrous consequences. In strict mode, an exception is raised instead, making the mistake obvious. In addition, naming conventions enforced by <a href="http://jshint.com">JSHint</a> and <a href="http://www.jslint.com">JSLint</a> help mitigate this risk.</p>
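<p>A small example of the failure mode, with a hypothetical <code>MyClass</code>. In strict mode the forgotten <code>new</code> raises a <code>TypeError</code> instead of silently writing to the global object:</p>
<pre tabindex="0"><code>"use strict";
function MyClass() {
    this.value = 42; // `this` is undefined when called without `new`
}

var instance = new MyClass(); // works: `this` is the new object
console.log(instance.value);  // 42

try {
    MyClass(); // forgot `new`
} catch (e) {
    console.log(e instanceof TypeError); // true
}
</code></pre>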
</li>
<li>
<p><strong>Use <code>self = this</code> to avoid the confusing remapping of <code>this</code></strong>. This <a href="http://alistapart.com/article/getoutbindingsituations">article</a> illustrates how <code>this</code> is dynamically remapped and why we need to keep a reference to the original value. This is a variant of the <code>that = this</code> technique <a href="http://www.crockford.com/javascript/private.html">also mentioned by Crockford</a>. I prefer <code>self</code> because I am familiar with Objective-C, and <code>this</code> and <code>that</code> look too similar to my eyes.</p>
</li>
<li>
<p><strong>Use an anonymous function for the constructor body</strong>. Otherwise, temporary variables can leak and clutter the constructor closure forever. I believe this pattern also increases the readability of the code, by restricting statements outside the constructor body to mere assignments. This reduces the amount of code to scan.</p>
</li>
</ul>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Future Proof Backup Strategy]]></title>
    <link href="https://blog.barthe.ph/2013/04/22/backup-strategy/"/>
    <id>https://blog.barthe.ph/2013/04/22/backup-strategy/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2013-04-22T22:17:00+00:00</published>
    <updated>2013-04-22T22:17:00+00:00</updated>
    <content type="html"><![CDATA[<p>Mechanical spinning disks and modern SSDs are fallible components. Over the course of the past 10 years, I have crashed 3 different hard disks among the dozen that I have owned. It is a far cry from the <a href="https://web.archive.org/web/20130516230252/http://www.dailytech.com/article.aspx?newsid=6404">failure rates announced by manufacturers</a>.</p>
<p><strong>So obviously you have backups</strong>. But are they up to date? Are you sure your backups work? What if your house burns down or someone steals your equipment? What if your primary disk gets corrupted?</p>
<p>It does not take much for things to go wrong. After each hard drive failure, I either barely avoided catastrophe or lost a little bit of data. Here are a few things I have learned in the process:</p>
<ol>
<li>
<p><strong>One copy is not enough</strong>. You may think you would have time to duplicate the data after one of the disks crashes, but think again. Maybe the backup has been corrupted, and a secondary backup would be useful. Maybe your backup drive will die. As unlikely as it sounds, it happened to me on my old PowerMac G4. I had two disks in software RAID-1; the second disk died a few days later, while mirroring the data onto the brand new disk I had inserted to replace the first.</p>
</li>
<li>
<p><strong>You need off-site backups</strong>. In case your house burns down. In case someone breaks in and steals your computer and your backup drive. In case a power surge fries all the equipment plugged into the same socket.</p>
</li>
<li>
<p><strong>Encrypt your data</strong>. Not because the government wants to snoop on you. They can <a href="http://xkcd.com/538/">use a $5 wrench</a> or <a href="http://en.wikipedia.org/wiki/Key_disclosure_law">the law</a>. But because someone might steal your gear or, if you use online storage, their security could be breached. And even if you have nothing to hide, the data contained on your computer can be used for a lot of nefarious purposes, such as identity theft.</p>
</li>
<li>
<p><strong>Use incremental backups</strong> instead of mirroring. Not just to save time, but also to avoid propagating errors from your primary disk. Most file systems do not include checksums of the data, so when you copy a file whose data has been corrupted, you copy the same errors to your backup disk. On a checksummed filesystem, you would normally get an I/O error instead. I have over 100GB of pictures, and one day I discovered a handful (10 files or so) had been corrupted. One of my backups (mirrored) was also corrupted, but fortunately, the incremental backups were not.</p>
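<p>One low-tech way to detect this kind of silent corruption, independently of the filesystem, is to record checksums of the files you care about and verify them later. A sketch using <code>shasum</code>, which ships with MacOS X (the paths are examples):</p>
<pre tabindex="0"><code># Record a checksum for every file in the photo library
find ~/Pictures -type f -print0 | xargs -0 shasum -a 256 &gt; ~/photos.sha256

# Later: verify; any line marked FAILED is a corrupted or modified file
shasum -a 256 -c ~/photos.sha256 | grep -v ': OK$'
</code></pre>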
</li>
</ol>
<p>Here is what I currently use. This is Mac OS X specific.</p>
<ol>
<li>
<p><strong>I use <a href="http://en.wikipedia.org/wiki/FileVault">Filevault 2</a></strong> on all my drives. Filevault 2 is real full disk encryption, and has nothing to do with its predecessor of the same name, which was an abomination. It&rsquo;s <a href="http://www.lightbluetouchpaper.org/2012/08/06/analysis-of-filevault-2-apples-full-disk-encryption/">very secure</a> even though Apple has not published formal specifications.</p>
</li>
<li>
<p><strong>I use <a href="http://en.wikipedia.org/wiki/Time_Machine_%28Mac_OS%29">TimeMachine</a></strong>. This is a no brainer. It just works and there is no better way to have up to date backups.</p>
</li>
<li>
<p><strong>I use <a href="http://www.shirt-pocket.com/SuperDuper/SuperDuperDescription.html">SuperDuper!</a></strong> with the <em>smart update</em> option instead of regular cloning, to avoid copying errors from the primary. I have a <a href="http://www.newertech.com/products/voyagerq.php">Newer Tech external Voyager</a> and a set of 2 internal hard drives. I always keep one at home and one off-site, at the office. I typically do a backup once a month, or before every Mac OS X update. If things go wrong, I can boot directly from the backup.</p>
</li>
<li>
<p><strong>I use <a href="https://www.arqbackup.com/">Arq</a> and <a href="http://aws.amazon.com/glacier/">Amazon Glacier</a></strong>. This is a bit like insurance, a solution of last resort in case everything else fails. I have about 500GB of data backed up there. To limit the number of requests and pay less, I make only one backup every month. I end up paying about $5 per month.</p>
</li>
<li>
<p><strong>I have a local <a href="https://web.archive.org/web/20120601165930/http://h18004.www1.hp.com/products/quickspecs/13716_div/13716_div.HTML">Linux Miniserver</a></strong>. This is my NAS. I use it to back up super-critical data like photos and family videos, as I do not have sufficient storage to keep more than a couple of years&rsquo; worth on the iMac. I use <a href="http://snapraid.sourceforge.net">snapraid</a> to provide redundancy on my 4-disk array. Snapraid would allow me to reconstruct the data if a single disk in the array were to die. I monitor the SMART status of all the disks with <a href="http://sourceforge.net/apps/trac/smartmontools/wiki">smartmontools</a>, and I have a spare disk I can swap in at any moment. A lot of the critical data is also backed up to Amazon Glacier before being uploaded to the NAS.</p>
</li>
</ol>
<p>With that kind of crazy setup, even if North Korea were to nuke Oxfordshire tomorrow, my data would be safe&hellip; but nobody would be left around to decipher it.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Migrated to Octopress]]></title>
    <link href="https://blog.barthe.ph/2013/04/10/octopress-migration/"/>
    <id>https://blog.barthe.ph/2013/04/10/octopress-migration/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2013-04-10T23:42:00+00:00</published>
    <updated>2013-04-10T23:42:00+00:00</updated>
<content type="html"><![CDATA[<p>This site came up in February 2009, and had been running the same version of WordPress (unpatched, obviously) with an old theme from <a href="http://www.neoease.com/">Neoease</a>. A slightly hacked-up version of that theme is still powering the <a href="http://diskwave.barthe.ph">Diskwave</a> website.</p>
<p>To say that this blog needed a refresh is an understatement. The blog looked so ugly, and I was so ashamed of it, that I pretty much stopped writing for it.</p>
<p>My motivations for switching to Octopress are the following:</p>
<ul>
<li>
<p><strong>Responsive design</strong>. Most people read blogs on their phones or tablets. Having a website with hardcoded sizes optimized for the desktop is no longer acceptable in 2013.</p>
</li>
<li>
<p><strong>Static deployment</strong>. I never really understood why I needed to run dynamic code to serve a blog. I chose WordPress back in 2009 because of its widespread use. I was not aware of the existence of <a href="http://www.movabletype.org">Movable Type</a> at the time.</p>
</li>
<li>
<p><strong>Use Git and Markdown files</strong>, instead of some clunky online editor storing data in a complicated MySQL schema. Hopefully this will make my next blog migration much easier.</p>
</li>
<li>
<p><strong>No comments</strong>. This was probably the only dynamic feature I used. It was a hassle to maintain and keep spam-free. Direct contact via Twitter, and optionally linking to your own blog post, is a better way to provide feedback.</p>
</li>
<li>
<p><strong>Migrate away from <a href="http://dreamhost.com">Dreamhost</a></strong>. I have been happy with them, but I need dedicated hosting for some private projects. I can also get much better performance for my photo gallery powered by <a href="http://exposition.barthe.ph">Exposition</a>.</p>
</li>
</ul>
<p>OK. There is <em>one</em> dynamic feature I had in WordPress that I am now lacking: <em>Search</em>.</p>
<p>At the moment, the migration is not completely finished. Some pages are missing from the site and I use temporary DNS redirections. Things will probably change a lot in the next few weeks.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Design and Coding Guidelines]]></title>
    <link href="https://blog.barthe.ph/2013/04/09/design-and-coding-guidelines/"/>
    <id>https://blog.barthe.ph/2013/04/09/design-and-coding-guidelines/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2013-04-09T00:00:00+00:00</published>
    <updated>2013-04-09T00:00:00+00:00</updated>
<content type="html"><![CDATA[<p>Today I was asked to come up with a set of guidelines to help a junior developer. The goal was to create a short bullet-point list of high-level design ideas that can be applied to any programming language and technology.</p>
<p>Here is the list I came up with.</p>
<p>First, I would like to emphasize the fact that these rules are not set in stone. They constitute good guidelines, but every developer should exercise their own judgment and discretion in applying them. There are times when these points will conflict with each other, or when they cannot be achieved in a reasonable amount of time. If these rules were absolute, they would likely be enforced by the compiler or code analysis tools. But that is not the case. It means that for every single rule, there must exist at least one legitimate reason for a developer to depart from it.</p>
<h2 id="design-rules">Design Rules</h2>
<ul>
<li>Use <strong>object oriented principles</strong>, such as <strong>encapsulation</strong> and the <strong>principle of least knowledge</strong>, instead of relying on global states, functions, and sharing too much information.</li>
<li>Aim for the <strong>simplest solution</strong> that can possibly work and yet remain maintainable.</li>
<li>Aim for <strong>loose coupling</strong> between objects. Try to minimize the surface of contact and interactions between objects, as much as possible.</li>
<li>Use <strong>design patterns</strong> when they make sense, and if you do, stick to well-known naming conventions.</li>
<li><strong>Do not plan for future features in your design</strong>. These features may never ship. Stick to what is needed to complete your goal.</li>
<li>Try re-using patterns that already exist in the code, rather than introducing new, slightly different patterns, even if they are equivalent.</li>
</ul>
<h2 id="coding-rules">Coding Rules</h2>
<ul>
<li><strong>Less code is always better</strong>. But less code does not mean cryptic code without any comments.</li>
<li><strong>Make the code as predictable and as obvious as possible</strong>, almost to the point of boredom. You are not the intended audience of your own code. You write code for other reasonably competent programmers to read, and they should almost be able to “guess” what comes after each line.</li>
<li><strong>Make the relationships between different parts of the code as explicit as possible</strong>. Use signal/slots, the observer design pattern, event notifications, or any such communication techniques to make relationships between objects more explicit. Avoid hidden flag variables. Avoid tight coupling that disseminates the relationships between objects across the code.</li>
<li><strong>Avoid side effects</strong>. Your code should do what it says on the tin. Nothing more. It should be obvious whether a method modifies the internal state of an object, or not.</li>
<li><strong>Avoid code duplication</strong>. There should not be any copy/pasted code anywhere. If your code needs to do similar and yet slightly different tasks, then it should be refactored in such a way that the similar code is written once but used multiple times. You should refactor existing code when you identify or create new duplicates.</li>
<li><strong>Handle errors properly</strong>, and make it explicit what happens when errors occur.
<ul>
<li>Some errors should be thrown and handled by the caller, because you cannot decide what to do.</li>
<li>Some errors are non-critical and should be logged, and remain invisible to the end-user.</li>
<li>Some errors are critical and will have an impact on the user. In the worst case the program should exit.</li>
</ul>
</li>
<li><strong>Use design by contract</strong>. Not religiously so. But use assertions for <strong>pre-conditions</strong> and <strong>post-conditions</strong> to detect problems. If you have a tricky algorithm, you should also use <strong>invariants</strong>. This is useful for detecting programming errors, i.e. errors that should never happen in production code. It also makes it obvious when you break existing code or use it incorrectly.</li>
<li><strong>Provide just enough information in the logs</strong>. Too much information will pollute the logs. Not enough will make certain problems impossible to debug without extra logging. Try to think about what you would like to know if the feature you wrote broke.</li>
<li><strong>Use comments judiciously</strong>. In particular, you should document everything that would not be obvious for the intended audience. Focus on explaining workarounds, hacks, side effects or particularly tricky algorithms. Comments should also be used to provide good documentation about the role of top-level classes and the effect of the most complicated methods, and what happens when they fail.</li>
<li>Follow the relevant <strong>coding guidelines</strong>. Even if no formal guidelines exist, try to stick to the same style as the file you edit. Particular emphasis should be given to naming things properly.</li>
</ul>
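<p>The design-by-contract point above can be sketched with plain assertions. This is a hypothetical example, not code from any real project: the pre-condition catches callers who pass unsorted input, the invariant documents the tricky part of the algorithm, and the post-conditions catch bugs in the function itself.</p>

```python
def binary_search(items, target):
    """Return the index of `target` in sorted `items`, or -1 if absent."""
    # Pre-condition: the algorithm is only correct on sorted input.
    assert all(a <= b for a, b in zip(items, items[1:])), "input must be sorted"

    lo, hi = 0, len(items) - 1
    while lo <= hi:
        # Invariant: if target is present, its index lies within [lo, hi].
        mid = (lo + hi) // 2
        if items[mid] == target:
            # Post-condition: the returned index really holds the target.
            assert items[mid] == target
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    # Post-condition: the target is genuinely absent.
    assert target not in items
    return -1
```

<p>In a release build you would typically disable these checks (Python does this with the <code>-O</code> flag); their value is in catching misuse and regressions during development.</p>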
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[How to Store Passwords Securely]]></title>
    <link href="https://blog.barthe.ph/2012/06/15/howto-store-passwords/"/>
    <id>https://blog.barthe.ph/2012/06/15/howto-store-passwords/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2012-06-15T00:00:00+00:00</published>
    <updated>2012-06-15T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>I am not a security specialist, but I would have thought that by now, one year after the <a href="http://www.engadget.com/2011/06/02/sony-pictures-hacked-by-lulz-security-1-000-000-passwords-claim/">humiliating hack of Sony Pictures</a> by LulzSec, people would have learned how to handle passwords properly. Apparently <strong>NOT</strong>.</p>
<p>In the past week, <a href="http://leakedin.org/">LinkedIn</a>, <a href="http://arstechnica.com/security/2012/06/eharmony-confirms-member-passwords-compromise/">eHarmony</a> and <a href="http://gigaom.com/europe/last-fm-suspected-password-breach-weeks-ago/">Last.fm</a> all had their password databases hacked and stolen. It’s hard to know how the databases were compromised in the first place, but the sheer level of <strike>amateurism</strike> incompetence that these large companies have demonstrated is shocking.</p>
<p>So, here is how it should be done:</p>
<ol>
<li>
<p><strong>Do not store passwords in plaintext in the database</strong></p>
<ul>
<li>
<p>Duh! Well Sony did store a million passwords like that&hellip;</p>
</li>
<li>
<p>So did Savannah, a GNU website hosting free software, <a href="http://news.slashdot.org/story/10/11/30/2134203/gnu-savannah-site-compromised">until they got hacked in December 2010</a>.</p>
</li>
<li>
<p>Users tend to re-use passwords everywhere. So even if your website does not have sensitive data, if it gets hacked, your users might get compromised on other websites such as gmail, Facebook, etc&hellip;</p>
</li>
</ul>
</li>
<li>
<p><strong>Do not use MD5</strong></p>
<ul>
<li>
<p>Seriously. MD5 is dead and broken. It should be removed from every library and operating system, as it provides no more security than a CRC.
MD5 is too easy to parallelize and too cheap to compute. Relatively inexpensive hardware, such as high-end <a href="http://www.golubev.com/hashgpu.htm">GPUs</a> or <a href="http://infinityexists.com/videos/fpga-md5-cracker/">FPGAs</a>, makes it possible to compute large numbers of hashes without spending much money or time.</p>
</li>
<li>
<p>In fact, MD5 is so broken that we have seen <a href="http://en.wikipedia.org/wiki/Collision_attack#Chosen-prefix_collision_attack">a chosen prefix collision attack</a> used in the wild to <a href="http://trailofbits.files.wordpress.com/2012/06/flame-md5.pdf">spoof Windows Update certificates and distribute the Flame malware</a>. (And yeah you guessed it, Microsoft subsequently <a href="http://technet.microsoft.com/en-us/security/advisory/2718704">revoked all its remaining MD5 certificates</a>!)</p>
</li>
<li>
<p><a href="http://arstechnica.com/security/2012/06/eharmony-confirms-member-passwords-compromise/">eHarmony</a> and <a href="http://gigaom.com/europe/last-fm-suspected-password-breach-weeks-ago/">Last.fm</a> both used MD5. But so did the so-called &ldquo;experts&rdquo; of HBGary, that security firm with government contracts that threatened Anonymous last year and got hacked by them soon afterwards.</p>
</li>
</ul>
</li>
<li>
<p><strong>Use a Salt</strong></p>
<ul>
<li>
<p>Do not just hash the password and store the result into the database. Even if your hash function is better than MD5, it is possible to compute all hashes in advance for common passwords and create a <a href="http://en.wikipedia.org/wiki/Rainbow_table">rainbow table</a>. It is trivial to <a href="http://www.freerainbowtables.com/en/tables2/">download</a> existing rainbow tables for standard hash functions, and get <a href="http://project-rainbowcrack.com/index.htm#download">off-the-shelf tools</a> to use them.</p>
</li>
<li>
<p>Instead, you should <strong>salt</strong> your hashes. A salt is a random piece of data that is somehow combined with the password before it is hashed. Using a salt can be as simple as performing a XOR between the password and the salt <strong>before</strong> applying the hashing function.</p>
</li>
<li>
<p>Be sure that you use a cryptographically secure random number generator to create the salt, such as <a href="http://www.schneier.com/yarrow.html">Yarrow</a> or <a href="http://en.wikipedia.org/wiki/Fortuna_%28PRNG%29">Fortuna</a> with a good source of <a href="http://en.wikipedia.org/wiki/Entropy_%28computing%29">entropy</a>, rather than the built-in <code>rand()</code> function, which is only a pseudo-random number generator (hence predictable and insecure). Many operating systems provide this facility via <code>/dev/random</code>.</p>
</li>
<li>
<p>None of the sites hacked last week (<em>LinkedIn</em>, <em>eHarmony</em> and <em>Last.fm</em>) used a salt, nor did the so-called &ldquo;experts&rdquo; from <em>HBGary</em>. Some of these password hashes can be downloaded from the PirateBay <a href="http://thepiratebay.se/torrent/7334168/Linkedin_SHA1_passwords">here</a> and <a href="http://thepiratebay.se/torrent/7341755/eHarmony_Unsalted_MD5_Hash_Database">here</a>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Use more than one salt</strong></p>
<ul>
<li>
<p>To make it even harder, it’s best <strong>not</strong> to re-use the same salt for every password. A good strategy is to create a unique salt for each password and to store it alongside the hash in the database.</p>
</li>
<li>
<p>You can also combine this “per-password salt” with a &ldquo;global salt&rdquo; that you do <strong>not</strong> store in the database. This makes it virtually impossible to attack your passwords if only the database is compromised (via a SQL injection, for example…)</p>
</li>
</ul>
</li>
<li>
<p><strong>Use the hash function multiple times (PBKDF2)</strong></p>
<ul>
<li>
<p>Cryptographic hashing functions are usually optimized for speed. But in the case of passwords, the amount of data to hash is very small, and therefore the hash function runs very quickly. That speed plays to the advantage of the attacker, not you. A good way to slow down the computation is to apply the same hashing function multiple times.</p>
</li>
<li>
<p>In fact, rather than re-inventing the wheel and taking risks, one can use an off-the-shelf key derivation algorithm such as <a href="http://en.wikipedia.org/wiki/PBKDF2">PBKDF2</a> or <a href="http://www.tarsnap.com/scrypt.html">scrypt</a>.</p>
</li>
<li>
<p>For example, <a href="http://images.apple.com/ipad/business/docs/iOS_Security_May12.pdf">Apple uses PBKDF2 with 10,000 iterations</a> on iOS devices.</p>
</li>
</ul>
</li>
<li>
<p><strong>Probably stay away from SHA-1 too</strong></p>
<ul>
<li>
<p><a href="http://en.wikipedia.org/wiki/SHA-1">SHA-1</a> is still considered secure, but it is definitely on the way out. It’s 17 years old after all.</p>
</li>
<li>
<p>You should use the more recent <a href="http://en.wikipedia.org/wiki/SHA-2">SHA-2</a> variants (SHA-256, SHA-512, SHA-384) or, if you do not trust the NSA, something like <a href="http://en.wikipedia.org/wiki/Whirlpool_%28cryptography%29">Whirlpool</a>.</p>
</li>
</ul>
</li>
</ol>
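<p>Points 3 to 5 above can be combined in a few lines. Here is a minimal sketch using Python&rsquo;s standard library (the function names are my own, and the iteration count is only illustrative; it should be tuned as high as your servers can tolerate):</p>

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # illustrative only; pick the highest count you can afford

def hash_password(password):
    """Return (salt, digest): a fresh random per-password salt plus the PBKDF2 hash."""
    salt = os.urandom(16)  # cryptographically secure random bytes
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, digest):
    """Re-derive the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

<p>Because each password gets its own random salt, two users with the same password end up with different digests, which defeats rainbow tables; the iteration count makes each guess expensive for the attacker.</p>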
<p>Obviously, having a proper password &ldquo;storage&rdquo; system in place will not save your website from being hacked. But it will make it more difficult to exploit the password data if that were to happen. You should also test and pay attention to the implementation of the cryptographic functions. Try to re-use well-established, peer-reviewed implementations that have withstood the test of time.</p>
<p>And as a final note, do not forget to <strong>test</strong> your authentication system <strong>regularly</strong>. Recently, a vulnerability was discovered in MySQL that completely broke authentication for some combinations of compiler and operating system. The bug was caused by a simple &ldquo;loss of precision&rdquo; cast from 4 bytes to 1 byte, which probably triggered a warning in GCC but was ignored for a while. The smallest details matter.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[That Strange Flash Resizing Bug when using 32 bits WebViews on Mac OS X]]></title>
    <link href="https://blog.barthe.ph/2012/05/20/flash_resize_bug/"/>
    <id>https://blog.barthe.ph/2012/05/20/flash_resize_bug/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2012-05-20T00:00:00+00:00</published>
    <updated>2012-05-20T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<h2 id="the-problem">The Problem</h2>
<p>Have you ever noticed some strange behavior when you resize a WebView hosting resizable Flash content? Does the Flash content seem to play catch-up with the resizing of the WebView?</p>
<p>If so, you are particularly unlucky. The problem only arises under the following set of conditions:</p>
<ul>
<li>The hosted flash content is resized when the WebView itself is resized, a bit like in this website.</li>
<li>Your process is running in 32-bit mode. The 64-bit mode seems to be immune, probably because Flash is hosted in an external process.</li>
<li>Your app is running on 10.5 or 10.6. Other versions are immune, including 10.7.</li>
<li>You are running Flash 10.1 or later which includes the <a href="http://www.kaourantin.net/2010/02/core-animation.html">Core Animation</a> rendering model. At the time I write this, the latest version of Flash is 11.2.202.235 and it still has the bug.</li>
</ul>
<h2 id="caused-by-core-animation">Caused by Core Animation?</h2>
<p>The resizing behavior made me think of Core Animation <a href="http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CoreAnimation_guide/Articles/AnimatingLayers.html">implicit animations</a>. Core Animation introduced the notion of &ldquo;layers&rdquo; to the already existing &ldquo;view&rdquo; hierarchy in Cocoa. Views backed by layers are rendered using a more efficient hardware accelerated rendering pipeline than &ldquo;vanilla&rdquo; views. Each layer is associated to a set of properties such as position, opacity, etc&hellip; and whenever a property is changed, by default, the system generates an implicit animation. For instance, if you make the layer visible, you will have a fade-in effect. And you guessed it, if you resize it, some kind of resize animation. This approach makes it possible to make eye candy animations without introducing much complexity to the code. And of course, Core Animation makes it possible to make explicit animations for more complex cases&hellip;</p>
<p>My suspicion was reinforced when I discovered that older versions of the Flash plugin (10.0), which do not use the Core Animation rendering model, were exempt from the problem.
According to <a href="http://www.kaourantin.net/2010/02/core-animation.html">the blog of an Adobe employee</a>, three rendering models are offered to plugins on the Mac platform: QuickDraw, Quartz and Core Animation. I quickly corroborated this by looking at Mozilla’s Plugin API documentation, where I discovered the notes on <a href="https://wiki.mozilla.org/NPAPI:CoreAnimationDrawingModel">NPDrawingModelCoreAnimation</a>. The drawing model of a plugin is chosen after a simple negotiation. First, the plugin checks what the browser advertises and chooses the best model. Second, the browser checks what rendering model the plugin has chosen. In the case of Core Animation, it appears that the plugin <a href="https://wiki.mozilla.org/NPAPI:CoreAnimationDrawingModel#Vending_a_layer">creates its own CALayer</a> and provides it to the browser via <code>NPP_GetValue</code>.</p>
<p>To confirm my suspicions, I needed to access the layer returned by the plugin. But this turned out to be more complicated than I anticipated. I could not find any public API in WebKit that gives access to the “plugin instance”, which is the first parameter of <code>NPP_GetValue</code>. And since there must be one plugin instance per Flash object in the DOM, loading the Flash plugin bundle directly from “Internet Plugin” would do no good. So, after reviewing the WebKit source code, I opted to use the <code>pluginLayer</code> method from the <a href="http://www.opensource.apple.com/source/WebKit/WebKit-6531.9/mac/Plugins/WebNetscapePluginView.mm">WebNetscapePluginView</a> class. Both the method and the class are private APIs, but they looked like a fairly stable choice given the circumstances: the <code>WebNetscapePluginView</code> class name is hardcoded to <code>WebNetscapePluginDocumentView</code> via a macro because of an old bug in Acrobat’s plugin, and the <code>pluginLayer</code> method looks indispensable.</p>
<p>The next step consisted of disabling the implicit animations. If you modify a property of a layer yourself, you can use <code>CATransaction</code>. However, in my case, the <code>bounds</code> property of the layer is modified internally by the WebView as it reacts to the resize event. So the only options available are:</p>
<ul>
<li>Use <code>setActions:</code> on <code>CALayer</code> to set a new dictionary of actions, all set to <code>NSNull</code>.</li>
<li>Associate a delegate via <code>setDelegate:</code> to the <code>CALayer</code> and implement <code>actionForLayer:forKey:</code> to return <code>NSNull</code>.</li>
<li>Or override the <code>actionForKey:</code> method for the target layer, via some kind of dark voodoo method swizzling.</li>
</ul>
<p>I opted for the least intrusive method, which is to use <code>setActions:</code>. I soon discovered that it did not work: in fact, the layer given by the plugin already had null actions. However, by digging into the layer hierarchy, I found out that two levels below the plugin root layer, the plugin had created a layer of a class named <code>FP_FPCAOpenGLLayer</code>, which did not override the default actions. There was my problem, and as suspected, it was a lame Flash bug.</p>
<h2 id="fix--workaround">Fix / Workaround</h2>
<p>In my Cocoa application I intercept the <code>webView:didFinishLoadForFrame:</code> from the <code>WebFrameLoadDelegate</code>. There I run my own <code>FixFlashResizingBug()</code> function, which does the following:</p>
<ul>
<li>Browse the view hierarchy and look for <code>WebNetscapePluginDocumentView</code>.</li>
<li>For each <code>WebNetscapePluginDocumentView</code> it finds, try to call the <code>pluginLayer</code> method</li>
<li>Then drill two levels down the layer hierarchy, and look for a <code>FP_FPCAOpenGLLayer</code></li>
<li>Set the actions of <code>FP_FPCAOpenGLLayer</code> to <code>NSNull</code>.</li>
</ul>
<p>Obviously this is fragile, since it both relies on private WebKit APIs and manipulates internal structures of the Flash plugin directly, but I could not think of a better idea.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[A History of the DMG File Format]]></title>
    <link href="https://blog.barthe.ph/2011/04/05/dmg_history/"/>
    <id>https://blog.barthe.ph/2011/04/05/dmg_history/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2011-04-05T00:00:00+00:00</published>
    <updated>2011-04-05T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<h2 id="what-are-disk-images">What are Disk Images</h2>
<p>Disk Images have always been popular on the Mac. They play a similar role to “ISO” images on other platforms, in the sense that they can be used to clone media or can be mounted and unmounted as virtual disks.</p>
<p>However, DMG images offer a <strong>LOT</strong> more functionality than traditional ISO images:</p>
<ul>
<li>They can hold a partition table, and therefore contain more than one volume. (Typically a partition table with a single partition).</li>
<li>They may contain any kind of filesystem. (Typically <a href="http://en.wikipedia.org/wiki/HFS%2B">HFS+</a>).</li>
<li>Images can be compressed (via bzip2, zlib, or Apple’s own format for the old ones).</li>
<li>Images can be read-only, but also read-write. (Installer images are typically read-only).</li>
<li>Read-write images can be sparse. That means the image file only grows as more content is put inside, rather than be pre-allocated to the full capacity of the virtual disk.</li>
<li>Images can have checksums to verify their integrity (options: several CRCs, MD5, several SHAs).</li>
<li>Images can embed Mac specific behaviors: contain a <a href="http://en.wikipedia.org/wiki/EULA">EULA</a>, have a custom icon, automatically mount when downloaded by Safari, etc…</li>
<li>And I probably forget a lot of other features (<a href="http://en.wikipedia.org/wiki/RTFM">RTFM</a>: <code>man hdiutil</code>).</li>
</ul>
<h2 id="the-mac-os-classic-era">The Mac OS &ldquo;Classic&rdquo; Era</h2>
<p>One of the initial motivations for using disk images, during the Mac OS classic era, is that application files could not be reliably transmitted over non-Mac specific networks like the Internet. This limitation stems from the fact that files were divided into two logical parts, called <em>forks</em>: the &ldquo;<strong>data fork</strong>&rdquo; and the &ldquo;<strong>resource fork</strong>&rdquo;. The <em>data fork</em> is what is traditionally considered to be a file. The structure of the data is entirely determined by the developer. The <em>resource fork</em> on the other hand, contains a collection of data organized in a standard way and accessible through a dedicated API. Although it is possible to define custom resources, the OS would also provide lots of default types to hold classical information: user interface elements, rich text, pictures, etc…</p>
<p>On Mac OS X, <em>resource forks</em> are still supported for backward compatibility reasons, but all applications now use <a href="http://developer.apple.com/library/mac/#documentation/CoreFoundation/Conceptual/CFBundles/Introduction/Introduction.html">bundles</a> instead. A bundle appears as a single icon to the end user, but it is in reality a directory containing several files. The role that the resource fork used to play in organizing data is now played by the file system: each resource appears as an individual file within a hierarchy of directories. Hence, a typical Mac OS X application is composed of several files, and disk images remain useful for distributing them across networks.</p>
<p>In order to safely transport data and installers, Apple chose to favor disk images rather than self-extracting installers. Since the early 90s, all Apple software installation disks have been distributed as either disk images or physical media. Early Mac users will probably remember using either Apple&rsquo;s <em>DiskCopy</em> or <a href="http://www.atpm.com/5.10/shrinkwrap.shtml">Aladdin&rsquo;s ShrinkWrap</a> for creating or mounting disk images. The current DMG file format is a direct descendant of the <strong>New Disk Image Format</strong> (aka <em>NDIF</em>) that Apple introduced <a href="http://www.tidbits.com/static/html/TidBITS-339.html">with Disk Copy 6 in 1996</a>. This file format used the <em>resource fork</em> to store all the metadata: file version, partition table, type of compression, checksums, etc… and, more importantly, where to find the data for the virtual disk sectors. The <em>data fork</em> only contained the raw data (compressed or not) of the virtual disk.</p>
<h2 id="the-mac-os-x-era">The Mac OS X Era</h2>
<p>The astute reader may have noticed that, since NDIF images used forks, they were not suitable for safe redistribution across the Internet. Disk images typically had to be encoded with either <a href="http://en.wikipedia.org/wiki/BinHex">BinHex</a> or <a href="http://en.wikipedia.org/wiki/UUencode">UUencode</a>.</p>
<p>When Apple introduced Mac OS X in 2001, they decided to solve the problem once and for all. They designed the DMG file format, which they called the <strong>Universal Disk Image Format</strong> (aka <em>UDIF</em>). DMG is a <strong>flat</strong> file format in the sense that it does not contain any forks, but it is basically a wrapper around the traditional NDIF format: a typical DMG file is the concatenation of the <em>data fork</em> followed by the <em>resource fork</em> of an NDIF disk image.</p>
<p>Apple has never published the specifications of either the NDIF or the UDIF file format. The private DiskImage framework plays a central role in the implementation of DMG parsing and the mounting of disk images in general. The framework apparently supports plugins (see <code>hdiutil plugins</code>) but there is no developer documentation available. The only known non-Apple plugin was made by Connectix for VirtualPC images, during the PowerPC era (see <a href="http://crypto.nsa.org/vilefault/23C3-VileFault.pdf">Unlocking FileVault</a>). However, it does not appear that either VMWare or Parallels have been able to use the private framework in recent times.</p>
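<p>Although Apple never published a spec, the reverse-engineered descriptions of UDIF agree that a DMG file ends with a fixed 512-byte big-endian trailer whose magic is <code>koly</code>. Here is a minimal sketch of detecting it; the field offsets come from those unofficial descriptions, so treat them as assumptions rather than an authoritative layout:</p>

```python
import struct

KOLY_SIZE = 512  # the UDIF trailer is a fixed 512-byte block at the end of the file

def read_udif_trailer(path):
    """Return (version, header_size) if `path` ends with a 'koly' trailer, else None."""
    with open(path, "rb") as f:
        f.seek(0, 2)                  # seek to end of file
        if f.tell() < KOLY_SIZE:
            return None               # file too small to hold a trailer
        f.seek(-KOLY_SIZE, 2)         # back up over the trailer
        trailer = f.read(KOLY_SIZE)
    # First three fields (per the unofficial layout): 4-byte magic,
    # then two big-endian 32-bit integers: version and header size.
    magic, version, header_size = struct.unpack(">4sII", trailer[:12])
    if magic != b"koly":
        return None                   # not a UDIF image
    return version, header_size
```

<p>This is how tools like dmg2img locate the image metadata without any help from Apple: everything else in the file is found by following offsets stored in this trailer.</p>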
<h2 id="useful-tools">Useful Tools</h2>
<p>Although it is sad that the DiskImage framework is closed, the DMG file format itself is a bit of an open secret. It has been successfully reverse engineered many times, and there are a few open source implementations:</p>
<ul>
<li>Several commercial software support it: <a href="http://www.mediafour.com/products/macdrive">MacDrive</a>, <a href="http://www.poweriso.com/">PowerISO</a>, <a href="http://www.magiciso.com/FAQ/FAQ0011.htm">MagicISO</a>.</li>
<li><strong>Vultur’s</strong> <strong><a href="http://vu1tur.eu.org/tools/">dmg2iso</a></strong> is probably the oldest open source implementation. It is a quick-and-dirty Perl script to convert a DMG file into an ISO. It is deprecated; its replacement is a cleaner and more complete C implementation called <em>dmg2img</em>.</li>
<li><strong>Erik Larsson</strong> provides a series of Java tools that can parse DMG files. There is the simple <strong><a href="http://www.catacombae.org/dmgx.html">DMGExtractor</a></strong>, but also the more complete <strong><a href="http://www.catacombae.org/hfsx.html">HFSExplorer</a></strong> which provides a graphical interface to browse HFS+ volumes and DMG files.</li>
<li>Another interesting source is <strong><a href="https://github.com/planetbeing/libdmg-hfsplus">libdmg</a></strong> from <strong>planetbeing</strong>. It is a C library with UDIF and HFS+ support. It can be used to convert DMGs from/to ISOs, extract files, or create DMG images from scratch. The project was initially started as a way to manipulate Apple’s software restore packages for iPhone devices.</li>
<li>Finally, <a href="http://www.7-zip.org/">7-zip</a> on Windows is able to open and extract DMG files.</li>
</ul>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Sharing a VPN connection with VMWare fusion]]></title>
    <link href="https://blog.barthe.ph/2010/11/30/vmware-fusion-vpn-share/"/>
    <id>https://blog.barthe.ph/2010/11/30/vmware-fusion-vpn-share/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2010-11-30T00:00:00+00:00</published>
    <updated>2010-11-30T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<h2 id="rationale">Rationale</h2>
<p>My company has set up a VPN access that can be used from home. This proved to be extremely useful last winter when I got snowed in, and I have used it occasionally ever since.</p>
<p>The VPN system we use (<em>Sonicwall</em>) is not supported on MacOS X, and Sonicwall only provides connection software for Windows. I have tried to use <a href="http://www.lobotomo.com/products/IPSecuritas">IPSecuritas</a> but I could not find a way to configure it properly. The commercial software <a href="http://www.equinux.com/us/products/vpntracker/index.html">VPN Tracker</a> worked like a charm, but when I saw the price (€99 excl VAT) I stopped considering it. For about half of that price ($60 excl VAT) you can get a license to VMWare Fusion.</p>
<p>So I decided to use VMWare Fusion for my VPN access. The obvious disadvantage is that VMWare uses a lot of resources, but this is not really a problem for me. In addition, the setup is tricky, but hopefully this post will help you with that. The obvious advantage is that, of course, VMWare can do a lot more than provide access to a VPN!</p>
<h2 id="summary-of-the-setup">Summary of the Setup</h2>
<p>My goal is to access the company network (<code>10.100.10.x/24</code>) from home on my iMac running Mac OS X 10.6. Since the VPN software is only supported on Windows, I connect to the VPN via Windows XP SP3 running inside VMWare Fusion. The virtual machine Internet connection is provided via a NAT network interface connected to the iMac.</p>
<p>My setup consists of creating another network interface between the iMac and the VM. Then, I configure both Windows and MacOS X so that my packets get properly routed.</p>
<p>A picture is worth a thousand words.</p>
<p><img src="/assets/2010/vpn_sharing_setup.png" alt="Overview of the VPN sharing setup"></p>
<h2 id="setting-up-vmware-fusion">Setting up VMWare Fusion</h2>
<p>I assume that you have correctly installed the virtual machine, Windows XP, and that the VPN software works as expected. My network configuration uses NAT to provide internet access to the VM, but it is certainly possible to use other configurations.</p>
<p>The first thing to do is to create a new network interface between the Mac and the VM, which will provide the Mac with access to the VPN.</p>
<h3 id="steps">Steps</h3>
<ul>
<li>Switch off the VM</li>
<li>Go to “Settings”, “Network”</li>
<li>Add a network interface with the “+” button at the bottom.</li>
<li>Choose the “Host only” option at the bottom.</li>
</ul>
<p><img src="/assets/2010/vpn_vmware_screenshot.png" alt="Create a Network Interface in VMWare"></p>
<h2 id="setting-up-windows-xp">Setting up Windows XP</h2>
<p>The next part of the setup consists of configuring the newly created “Local Area Network 2” interface (LAN2) so that it shares the VPN connection. We need to share the Internet connection from the “Virtual Private Network” interface to the LAN2 interface. We also need to enable IP packet routing.</p>
<h3 id="steps-1">Steps</h3>
<ul>
<li><strong>Sharing the connection</strong>
<ul>
<li>Go to “Control Panel”, “Network Connections”.</li>
<li>Right click on the “Virtual Private Network” interface, select “Properties”, go to the “Advanced” tab and select the option “<strong>Allow other network users to connect through this computer’s Internet connection</strong>”.</li>
<li>In the combo box below, select “Local Area Network 2”.</li>
<li>Deselect the option “Allow other network users to control or disable the shared Internet connection”.</li>
<li>Click on “Ok” to validate the settings.</li>
</ul>
</li>
<li><strong>Enabling IP routing</strong>
<ul>
<li>Open &ldquo;regedit.exe&rdquo;.</li>
<li>Go to <code>HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters</code>.</li>
<li>Locate the key <code>IPEnableRouter</code> and set it to <code>1</code>.</li>
<li>The location of this setting differs on older and newer versions of Windows.</li>
</ul>
</li>
<li><strong>Set up Local Area Network 2 interface</strong>
<ul>
<li>Open the “Terminal.app” application on the Mac and type “ifconfig”.</li>
<li>Note the IP address (inet) associated with the “vmnet1” interface and the network mask. In my case it’s 192.168.245.1 and 0xffffff00.</li>
<li>Go to “Control Panel”, “Network Connections” on the VM.</li>
<li>Open the “Local Area Network 2” interface properties window.</li>
<li>In the “general” tab, double click on the “TCP/IP” item.</li>
<li><strong>Choose an IP address on the same network as the Mac’s</strong>, and set the network mask. In my case, I chose 192.168.245.2 and 255.255.255.0.</li>
<li>Click “Ok” to validate the settings</li>
</ul>
</li>
</ul>
<p><img src="/assets/2010/vpn_winxp_screenshot.png" alt="Setup Network Interfaces on Windows XP"></p>
<p>At this stage, although it is not strictly necessary, you may want to enable ping replies on Windows XP to verify that your connection works.</p>
<ul>
<li>Go to “Control Panel”, “Windows Firewall”.</li>
<li>Go to the “advanced” tab and select “settings” in the “ICMP” section.</li>
<li>Enable the “allow incoming echo request” option.</li>
<li>On the Mac, open “Terminal.app”</li>
<li>Type <code>ping 192.168.245.2</code>; the virtual machine should reply to the ping. If it does not, you may have made a configuration mistake. Use <code>ifconfig</code> on the Mac and <code>ipconfig</code> on Windows to debug the issue.</li>
</ul>
<h2 id="setting-up-mac-os-x">Setting up Mac OS X</h2>
<p>At this stage we need to set up Mac OS X routing so that VPN addresses are routed through the vmnet1 interface. We also optionally set up the company DNS. If you have not done it yet, you should connect to the VPN from the virtual machine.</p>
<h3 id="steps-2">Steps</h3>
<ul>
<li>Once you are connected to the VPN on Windows, launch “cmd.exe” and type “ipconfig /all”.</li>
<li>Look for the following details: your IP address on the VPN network, the mask, and the addresses of the VPN DNS servers.</li>
<li>In my example I have an IP of <code>10.100.10.212</code>, a mask of <code>255.255.255.0</code> and two DNS servers at addresses <code>10.100.10.12</code> and <code>10.100.10.14</code>.</li>
<li>Go back to Mac OS X.</li>
<li>In the terminal type a command line similar to <code>sudo route -n add 10.100.10.0/24 192.168.245.2</code>. (Or alternatively, if you do not wish to worry about the CIDR notation for the mask, <code>sudo route -n add 10.100.10.0 192.168.245.2 255.255.255.0</code>.)</li>
<li>The first partial IP corresponds to the IP of your network (normally the leading digits, with zeros wherever the mask is zero).</li>
<li>The second IP is the IP of the VM on the LAN2 interface.</li>
<li>This line instructs MacOS X to route all IPs from the <code>10.100.10.x</code> network to the <code>192.168.245.2</code> gateway, i.e. the virtual machine.</li>
<li>You must then enable IP routing. Type <code>sudo sysctl -w net.inet.ip.forwarding=1</code>.</li>
</ul>
<p>At this stage you should be able to ping yourself on the VPN from the Mac. In my example <code>ping 10.100.10.212</code> should reply. Now, what if it does not work?</p>
<ul>
<li>Try <code>traceroute 10.100.10.212</code>. If you see your message routed through your normal internet connection, then something is wrong on the Mac. Otherwise it’s on Windows.</li>
<li>On the Mac <code>netstat -rn</code> will dump the route table.</li>
<li>Verify that you still have an Internet connection on both the Mac and the VM. If you trash your route table you may lose connectivity entirely. (Try removing the incorrect route with <code>route delete …</code> and/or unplugging and replugging your modem connection.)</li>
</ul>
<h2 id="optional-steps-to-set-up-the-dns">Optional Steps to Set Up the DNS</h2>
<ul>
<li>Go to “System Preferences” on the Mac.</li>
<li>Select “Network”.</li>
<li>Click on “Advanced” at the bottom of the interface that provides your internet connection.</li>
<li>Go to the “DNS” tab. In the “DNS server” sections add the DNS servers of your VPN, on top of the list. In my case it’s 10.100.10.12 and 10.100.10.14.</li>
<li>Close, validate and do not forget to “apply” the settings.</li>
</ul>
<h2 id="going-further">Going Further&hellip;</h2>
<ul>
<li>You can snapshot the Windows virtual machine to retrieve your settings next time.</li>
<li>On the Mac you can put the combination of the <code>route</code> and <code>sysctl</code> commands in a script to further automate the process. It may even be possible to write a launchd script and do it at startup.</li>
<li>I’m not sure how to configure the DNS from the command line, nor how to make the company DNS servers be used only for domains unknown to your regular DNS server.</li>
</ul>
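<p>The <code>route</code> and <code>sysctl</code> combination mentioned above can indeed be put in a small shell script. This is only a sketch: the network and gateway values are the ones from my example setup and must be adapted to yours.</p>

```shell
#!/bin/sh
# Route the company network through the Windows VM and enable IP forwarding.
# Values below match the example setup in this post; adjust them to yours.
VPN_NET="10.100.10.0/24"   # company network, in CIDR notation
VM_IP="192.168.245.2"      # IP of the VM on the host-only (vmnet1) interface

sudo route -n add "$VPN_NET" "$VM_IP"
sudo sysctl -w net.inet.ip.forwarding=1
```

<p>Run it after connecting to the VPN inside the VM. To undo the route later, use <code>sudo route -n delete 10.100.10.0/24</code>.</p>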
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[First Public Release of DiskWave]]></title>
    <link href="https://blog.barthe.ph/2010/03/12/first-release-whichsize/"/>
    <id>https://blog.barthe.ph/2010/03/12/first-release-whichsize/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2010-03-12T00:00:00+00:00</published>
    <updated>2010-03-12T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>I have not been blogging a lot lately. Truth be told, I was busy coding…</p>
<p><del>WhichSize</del> DiskWave is a freeware for MacOS X 10.6 (hence Intel Macs only for now) that can recursively size the directories on your drive. This lets you quickly identify and reclaim wasted space.</p>
<p>It is freely available here: <a href="http://diskwave.barthe.ph/">http://diskwave.barthe.ph/</a></p>
<p>DiskWave was basically hacked together in a weekend, but this is not a new idea. I already worked on a similar project, back in 2004, after discovering OmniDiskSweeper. I was appalled by the terrible performance of OmniDiskSweeper and quickly realized I could do a better job. So I learned Cocoa, Objective-C, the Carbon APIs, and painfully came up with a prototype, only to discover that another piece of software named WhatSize was free and worked very well. I also had a tricky memory leak due to a bug in a Cocoa API. So I quit.</p>
<p>Fast forward to 2010. OmniDiskSweeper is no longer shareware, but its performance is still bad. WhatSize is no longer freeware. The old freeware version of WhatSize I have is a PowerPC version, and since I moved to Snow Leopard, I did not feel like installing Rosetta. So here is DiskWave.</p>
<p>So this time, I did my homework, and it does not look like there is a suitable alternative. Here is the list of similar tools (on MacOS X) I came up with:</p>
<ul>
<li><a href="http://www.omnigroup.com/products/omnidisksweeper/">OmniDiskSweeper</a>
<ul>
<li>Freeware (formerly shareware).</li>
<li>Slower because it relies on Cocoa APIs. It also consumes a lot of memory.</li>
<li>Does not have many more features than DiskWave. I hope to catch up pretty soon&hellip;</li>
</ul>
</li>
<li><a href="http://www.id-design.com/software/whatsize/index.php">WhatSize</a>
<ul>
<li>Shareware (formerly freeware).</li>
<li>Same speed. Consumes a little bit more memory, because it supports another feature that requires it.</li>
<li>Does have many more features than DiskWave. Some of them I never intend to add: fat binary removal, locale removal, etc&hellip;</li>
<li>The thing that annoys me is that you can size/browse only one drive at a time.</li>
</ul>
</li>
<li><a href="http://www.daisydiskapp.com/">DaisyDisk</a>
<ul>
<li>Shareware.</li>
<li>New kid on the block. The visualization is really awesome but… I do not find it that practical.</li>
<li>If I ever wish to play with CoreAnimation, I may add an animated sunburst view.</li>
</ul>
</li>
<li><a href="http://grandperspectiv.sourceforge.net/">GrandPerspective</a>
<ul>
<li>Opensource</li>
<li>Seems to work well and is properly maintained. But I do not like this kind of visualization.</li>
</ul>
</li>
<li><a href="http://www.derlien.com/">DiskInventory X</a>
<ul>
<li>Opensource</li>
<li>It looks like it is no longer maintained. I do not fancy the GUI, same as GrandPerspective.</li>
</ul>
</li>
</ul>
<p>DiskWave, being in its early infancy, is somewhat limited in terms of features. I hope to gradually improve it. If you have ideas for features you would like to see added, just let me know.</p>
<p><strong>Reference</strong>:</p>
<ul>
<li>DiskWave homepage: <a href="http://diskwave.barthe.ph/">http://diskwave.barthe.ph/</a></li>
</ul>
<p><strong>Update</strong> (April 12th, 2010):</p>
<ul>
<li>Renamed WhichSize to DiskWave after receiving complaints from id design, inc.</li>
</ul>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Force Windows to Refresh the Desktop (and Start Menu)]]></title>
    <link href="https://blog.barthe.ph/2010/02/23/force-desktop-refresh/"/>
    <id>https://blog.barthe.ph/2010/02/23/force-desktop-refresh/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2010-02-23T00:00:00+00:00</published>
    <updated>2010-02-23T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>Today I learned a new trick.</p>
<p>When you programmatically delete a shortcut from the Desktop folder, Windows is usually smart enough to update the <em>Desktop</em> and the <em>recently used programs</em> section of the <em>Start Menu</em>. But sometimes, it does not work, and you are left with a &ldquo;ghost&rdquo; icon.</p>
<p>As a regular end user, you can hit the F5 key to force a refresh of the <em>Desktop</em>, but for the <em>Start Menu</em> you are left with no alternative than killing/restarting <code>explorer.exe</code>.</p>
<p>I was looking for a better solution for a custom installer application. And after a lot of scouring MSDN and googling around, Eureka! I finally found a magic incantation to programmatically force a refresh of the Desktop.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="n">SHChangeNotify</span> <span class="p">(</span><span class="n">SHCNE_ASSOCCHANGED</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</span></span></code></pre></div>]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Make LoadLibrary() Failures Silent on Win2k]]></title>
    <link href="https://blog.barthe.ph/2009/12/03/silent_loadlibrary/"/>
    <id>https://blog.barthe.ph/2009/12/03/silent_loadlibrary/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-12-03T00:00:00+00:00</published>
    <updated>2009-12-03T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>For a custom installer application, I have been working on a crude but efficient way to determine whether a particular version of the C runtime (CRT) needs to be installed or not. The technique I use is simply to check if a dummy DLL linked to the CRT libraries would load up properly.</p>
<p><img src="/assets/2009/win2k_messagebox.png" alt="Win2k LoadLibrary failure"></p>
<p>This works well, except for one small detail. When the CRT is not installed, on Windows 2000 <del>only</del>, the LoadLibrary error would also cause a <code>MessageBox</code> to be displayed. This stops the flow of the installer and, of course, is not very elegant&hellip;</p>
<p>But there is a simple trick to get rid of the <code>MessageBox</code>, using the very obscure <code>SetErrorMode</code> system call.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="o">::</span><span class="n">SetErrorMode</span><span class="p">(</span><span class="n">SEM_FAILCRITICALERRORS</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="n">HMODULE</span> <span class="n">hDll</span> <span class="o">=</span> <span class="o">::</span><span class="n">LoadLibrary</span><span class="p">(</span><span class="s">&#34;CrtCheck.Dll&#34;</span><span class="p">);</span> <span class="c1">// Won&#39;t bark
</span></span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="n">hDll</span><span class="p">)</span>   <span class="p">;</span> <span class="c1">// CRT is installed
</span></span></span><span class="line"><span class="cl"><span class="k">else</span>        <span class="p">;</span> <span class="c1">// CRT is NOT installed
</span></span></span></code></pre></div><p>Do you guys know a better (yet simple) way to determine if the CRT is installed?</p>
<p><strong>Update</strong>: this is not a Windows 2000 specific issue (thanks Ferruccio). It’s also true for older versions of Windows. As far as I know WinXP, Vista and 7 do NOT display a <code>MessageBox</code> under the same circumstances.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[XCode Has Dropped Support for Java!]]></title>
    <link href="https://blog.barthe.ph/2009/11/26/xcode-nojava/"/>
    <id>https://blog.barthe.ph/2009/11/26/xcode-nojava/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-11-26T00:00:00+00:00</published>
    <updated>2009-11-26T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>I guess this is hardly news to anyone but me, since I have not written a single line of Java for the past 4 years, but apparently Xcode has dropped the support for Java projects.</p>
<p>As of Xcode 3.2.1, there is no way to create a command line or Swing application project which is fully managed by Xcode. There is still support for Java syntax in the text editor, though, and the Java compilation tools are still installed.</p>
<p>One of the default project templates, JNI library, does actually let you create a project containing Java files. However, these files are compiled with Ant. This demonstrates that Xcode can still be used for Java development through the use of “external build systems”, but that’s it… Java is no longer a first-class citizen in Xcode.</p>
<p>I guess this decision makes sense for Apple. The Cocoa bindings for Java have not been supported since MacOS X 10.4 after all.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[C&#43;&#43; Object Initialization and Error Handling]]></title>
    <link href="https://blog.barthe.ph/2009/08/28/cpp_object_init_and_errors/"/>
    <id>https://blog.barthe.ph/2009/08/28/cpp_object_init_and_errors/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-08-28T00:00:00+00:00</published>
    <updated>2009-08-28T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>This column is in no way revolutionary. But looking back in the past, when I was learning C++, this is the kind of summary that I would have liked to read. So here it is, for the happy few!</p>
<h2 id="object-initialization">Object Initialization</h2>
<p>In C++, there are two common idioms with regards to object construction and initialization:</p>
<ul>
<li>
<p><strong>the constructor fully initializes the object</strong>: this idiom makes it impossible to construct an object which is not correctly initialized and ready to use.</p>
</li>
<li>
<p><strong>the constructor does not initialize (or only partially initializes) the object</strong>: the initialization is performed at a later stage by calling some <code>init()</code> method. This, in effect, splits the object allocation and initialization. It is common to encounter this pattern whenever the class mimics some transactional behavior like opening/closing a file, or connecting/disconnecting from a server&hellip;</p>
</li>
</ul>
<p>One may have legitimate reasons to use one idiom over the other, but as a rule of thumb, the first approach (full initialization in the constructor) should be favored. This is because it makes the interface of the object hard to use incorrectly. In the second case it is possible to construct the object and call a method on it before the initialization has been performed.</p>
<p>There are two ways to handle this improper usage:</p>
<ul>
<li>
<p><strong>do nothing and state that this is an “undefined behavior”</strong>. It will probably make the class a bit faster to write, but harder to debug. Not a good idea&hellip;</p>
</li>
<li>
<p><strong>carry an internal state to remember whether or not the object was initialized</strong>. This makes it possible to report errors in a much more graceful way. On the other hand, it puts on you the burden of maintaining an internal state and checking against it. Maintaining an internal state looks very easy, but my experience is that it is error-prone in the long run&hellip;</p>
</li>
</ul>
<p>So in summary, unless you have a good reason not to, you should construct fully initialized objects. It will make your code harder to use incorrectly and much simpler. And you will probably have to write less of it.</p>
<p>There’s one issue though: error handling.</p>
<h2 id="error-handling-at-initialization">Error Handling at Initialization</h2>
<p>There are three general ways to handle errors:</p>
<ul>
<li>
<p><strong>exceptions</strong>: this is the recommended and <strong>the best possible way</strong>.</p>
</li>
<li>
<p><strong>return an error code from each function</strong>: this is the classical C idiom. In practice it means every function you call returns an error code, and that <strong>you MUST</strong> check it.</p>
</li>
<li>
<p><strong>global error state</strong>: there is a function or method that you need to call after each method of the public API. This is similar to the role of the <code>GetLastError()</code> function in Win32. It is quite impractical, and very seldom used, so I will not discuss it any further&hellip;</p>
</li>
</ul>
<p>In the case of a fully initializing constructor, it is not possible to use the “return error code” idiom, because the constructor does not return anything. That makes it mandatory to use an exception based system.</p>
<p>I worked in a large organization that essentially banned the use of exceptions in their code, for rather dubious reasons. The consequence of this move was that people were forced to use the “non initializing constructor” idiom almost everywhere. This made the code more verbose (init/terminate functions everywhere), the interface easy to use incorrectly (many bugs were related to calling methods on a non initialized object) and put the burden on maintaining an internal state on each object (many bugs were related to failing to maintain this internal state correctly). But the main source of bugs was, by far, people “forgetting” to check the error code, leading to much more subtle bugs long after the point of error.</p>
<h2 id="use-raii-to-avoid-leaks">Use RAII to avoid leaks</h2>
<p>It sounds like a good idea to favor the fully initializing constructor, doesn’t it? If you use this idiom, then you must know how to use exceptions properly. Techniques like <a href="http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization">RAII</a> and smart pointers are mandatory. Here’s one reason:</p>
<blockquote>
<p><strong>If an exception is thrown in the constructor of an object, the destructor of that object is not called.</strong></p>
</blockquote>
<p>So if you do something like the (counter)example below, you are in trouble:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="c1">// class Child1 and Child2 defined earlier...
</span></span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Parent</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"><span class="k">public</span><span class="o">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">Parent</span><span class="p">()</span> <span class="o">:</span> <span class="n">m_child1</span><span class="p">(</span><span class="k">new</span> <span class="n">Child1</span><span class="p">()),</span> <span class="n">m_child2</span><span class="p">(</span><span class="k">new</span> <span class="n">Child2</span><span class="p">())</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">Parent</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">m_child1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">             <span class="k">delete</span> <span class="n">m_child1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">m_child2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">             <span class="k">delete</span> <span class="n">m_child2</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="k">private</span><span class="o">:</span>
</span></span><span class="line"><span class="cl">     <span class="n">Parent</span><span class="p">(</span><span class="k">const</span> <span class="n">Parent</span><span class="o">&amp;</span><span class="p">);</span> <span class="c1">// non copyable
</span></span></span><span class="line"><span class="cl">     <span class="n">Parent</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="k">const</span> <span class="n">Parent</span><span class="o">&amp;</span><span class="p">);</span> <span class="c1">// non assignable
</span></span></span><span class="line"><span class="cl">     <span class="n">Child1</span><span class="o">*</span> <span class="n">m_child1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">     <span class="n">Child2</span><span class="o">*</span> <span class="n">m_child2</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span></code></pre></div><p>In this example, if the constructor of <code>Child2</code> throws an exception, then the constructor of <code>Parent</code> also throws. Since the destructor of <code>Parent</code> is not called, the newly constructed <code>Child1</code> is not destroyed. It leaks!</p>
<p>However if you rely on smart pointers, the problem disappears. While the destructor of a class is not called if an exception is thrown in its constructor, the destructor of each instance variable is called. The following code would fix the leak:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="c1">// class Child1 and Child2 defined earlier...
</span></span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Parent</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"><span class="k">public</span><span class="o">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">Parent</span><span class="p">()</span> <span class="o">:</span> <span class="n">m_child1</span><span class="p">(</span><span class="k">new</span> <span class="n">Child1</span><span class="p">()),</span> <span class="n">m_child2</span><span class="p">(</span><span class="k">new</span> <span class="n">Child2</span><span class="p">())</span> <span class="p">{</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="k">private</span><span class="o">:</span>
</span></span><span class="line"><span class="cl">     <span class="n">Parent</span><span class="p">(</span><span class="k">const</span> <span class="n">Parent</span><span class="o">&amp;</span><span class="p">);</span> <span class="c1">// non copyable
</span></span></span><span class="line"><span class="cl">     <span class="n">Parent</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="k">const</span> <span class="n">Parent</span><span class="o">&amp;</span><span class="p">);</span> <span class="c1">// non assignable
</span></span></span><span class="line"><span class="cl">     <span class="n">std</span><span class="o">::</span><span class="n">auto_ptr</span><span class="o">&lt;</span><span class="n">Child1</span><span class="o">&gt;</span> <span class="n">m_child1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">     <span class="n">std</span><span class="o">::</span><span class="n">auto_ptr</span><span class="o">&lt;</span><span class="n">Child2</span><span class="o">&gt;</span> <span class="n">m_child2</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span></code></pre></div><p>That’s all folks!</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[The $2,000 one line bug fix]]></title>
    <link href="https://blog.barthe.ph/2009/08/04/one-line-bugfix/"/>
    <id>https://blog.barthe.ph/2009/08/04/one-line-bugfix/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-08-04T00:00:00+00:00</published>
    <updated>2009-08-04T00:00:00+00:00</updated>
<content type="html"><![CDATA[<p>Today, after 4 days of work, I have finally fixed a bug. The fix? Removing a single line of code which should never have been there in the first place!</p>
<h2 id="the-bug">The Bug</h2>
<p>Sometimes, and at random, our application was crashing on exit. The crash was always some dodgy memory access violation. This happened after calling <code>SendMessage()</code> with <code>HWND_BROADCAST</code>… I did not quite understand what was wrong with it, but calling <code>SendMessage</code> like that was stupid enough in the first place. So I removed the <code>SendMessage</code>, replaced it with something much better, and promptly claimed to have fixed the bug. Voilà! Not surprisingly, on the very next day, QA found a similar crash in <code>DestroyWindow()</code>, albeit much harder to reproduce.</p>
<p>Time for a proper investigation!</p>
<h2 id="the-cause">The Cause</h2>
<p>A few days later, after unrolling tons and tons of code, I found, miles away from the code that was crashing, this little jewel:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="n">FrameWidget</span><span class="o">::~</span><span class="n">FrameWidget</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="n">m_hWnd</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">  <span class="k">if</span> <span class="p">(</span><span class="n">IsWindow</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">    <span class="n">DestroyWindow</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">  <span class="c1">// etc...
</span></span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Suddenly, the whole picture got clearer.</p>
<p>The first line resets the window handle. The next one checks whether the handle represents a window. Of course it no longer does! So no need to destroy the window object associated with the handle, right?! Except that, by the time you get out of the destructor, your message map/callback functions have been destroyed and point to garbage memory… So all it takes to crash now is one message routed to this window. And thanks to Murphy’s law, that does indeed happen, sometimes!</p>
<p>What <code>SendMessage()</code> to <code>HWND_BROADCAST</code> and <code>DestroyWindow()</code> have in common is that they both actually send a message to this phantom window. In case you did not know, <code>DestroyWindow()</code> does purge the message queue before destroying the window…</p>
<p>Why the handle was set to <code>null</code> in the first place is beyond me, but it sure was a bitch to find!</p>
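<p>To make the failure mode concrete, here is a minimal, platform-neutral simulation of the destructor. The handle registry and the <code>resetFirst</code> flag are my own inventions for illustration; real code would use <code>HWND</code> and the Win32 calls.</p>

```cpp
#include <set>

// A tiny stand-in for the Win32 window table: handles listed here are
// "live" windows that can still receive messages.
static std::set<int> g_liveWindows;

struct FrameWidget {
    int  m_hWnd;
    bool m_resetFirst;  // true reproduces the buggy destructor

    FrameWidget(int hwnd, bool resetFirst)
        : m_hWnd(hwnd), m_resetFirst(resetFirst)
    { g_liveWindows.insert(hwnd); }

    bool IsWindow() const      { return g_liveWindows.count(m_hWnd) != 0; }
    void DestroyWindow() const { g_liveWindows.erase(m_hWnd); }

    ~FrameWidget() {
        if (m_resetFirst)
            m_hWnd = 0;      // the one line that caused the bug
        if (IsWindow())      // always false once m_hWnd was reset
            DestroyWindow();
        m_hWnd = 0;          // safe only AFTER the window is gone
    }
};
```

<p>With <code>resetFirst</code> set, the widget leaves its window behind in the table: a phantom window whose callbacks now point at a destroyed object, which is exactly what the next message routed to it trips over.</p>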
<h2 id="reference">Reference</h2>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms632682%28VS.85%29.aspx">DestroyWindow (MSDN)</a></li>
</ul>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Don’t Use Standard library/CRT Functions in Static initializers/DllMain!]]></title>
    <link href="https://blog.barthe.ph/2009/07/30/no-stdlib-in-dllmai/"/>
    <id>https://blog.barthe.ph/2009/07/30/no-stdlib-in-dllmai/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-07-30T00:00:00+00:00</published>
    <updated>2009-07-30T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<h2 id="the-problem">The Problem</h2>
<p>Today, <a href="https://groups.google.com/g/osg-users/c/tYMFKd4wI3s/m/YAoAOAKkdekJ">my colleague and I</a>, have found a bug in <a href="http://www.openscenegraph.org/">OpenSceneGraph</a>. Everybody knows that singletons are <a href="https://web.archive.org/web/20100124084725/http://blogs.msdn.com/scottdensmore/archive/2004/05/25/140827.aspx">evil</a> and may <a href="http://www.prestonlee.com/2006/07/11/singletons-cause-cancer/">cause cancer</a>, but that does not prevent OpenSceneGraph from using them all over the place. One of them, in particular, is initialized in the static context, and happens to call the standard library function <code>getenv()</code>. This was the beginning of all our troubles: from time to time, this seemingly benign initialization would cause a deadlock in our software.</p>
<p>Static initialization occurs, for instance, when you construct an object outside of any function body by writing something like <code>static ObjectBlah* myGlobalVar = new ObjectBlah();</code>. Such constructors run before the entry point of your program (i.e. the <code>main()</code> function). In the case of a shared library, static initialization generally occurs when the shared library is loaded. On Windows, the <code>DllMain</code> callback function is used for that purpose.</p>
<p>The rest of this post will detail the specific behavior on Windows, and explain why it is prone to deadlocks. I think it is generally a bad practice to call standard library functions in static initializers on other platforms too, although I do not have a detailed proof to back my words (yet!).</p>
<h2 id="the-dangers-of-dllmain">The Dangers of DllMain</h2>
<p>If you read the <a href="http://msdn.microsoft.com/en-us/library/ms682583%28VS.85%29.aspx">MSDN documentation of DllMain</a>, you should realize that this function is quite dangerous to use. I extracted and highlighted some parts below:</p>
<blockquote>
<p><strong>Warning</strong> There are serious limits on what you can do in a DLL entry point. To provide more complex initialization, create an initialization routine for the DLL. You can require applications to call the initialization routine before calling any other routines in the DLL.</p>
</blockquote>
<blockquote>
<p>Because Kernel32.dll is guaranteed to be loaded in the process address space when the entry-point function is called, calling functions in Kernel32.dll does not result in the DLL being used before its initialization code has been executed. Therefore, the entry-point function can call functions in Kernel32.dll that do not load other DLLs. For example, <code>DllMain</code> can create <a href="http://msdn.microsoft.com/en-us/library/ms686364%28VS.85%29.aspx">synchronization objects</a> such as critical sections and mutexes, and use TLS. Unfortunately, there is not a comprehensive list of safe functions in Kernel32.dll.</p>
</blockquote>
<blockquote>
<p><strong>Windows 2000</strong>: Do not create a named synchronization object in <code>DllMain</code> because the system will then load an additional DLL.</p>
</blockquote>
<p>One of the reasons why it is so critical not to call <code>LoadLibrary()</code>, or any function liable to call it indirectly (such as the functions in user32.dll), is that a process-private critical section is held while <code>DllMain</code> is running. This “loader lock” is taken any time a library is loaded, but also when functions like <code>GetModuleHandle()</code> or <code>GetModuleFileName()</code> are used.</p>
<p>At this point you might think it is safe to call a standard library function, as long as it does not appear to require anything more than kernel32.dll functions, and does not use <code>GetModuleHandle()</code> or <code>GetModuleFileName()</code>. This relies on implementation details of the standard library, so it’s a bit borderline, but it still might work, right? No… wrong! Now all you need to cause a deadlock is another lock. Guess what? There are plenty of locks taken inside the standard library functions&hellip;</p>
<blockquote>
<p><strong>Principle</strong>: do not call any standard library functions that acquire locks in <code>DllMain()</code>.</p>
</blockquote>
<p>It is one thing to know whether a function uses advanced kernel functions (unlikely to change); it is quite another to know whether it acquires internal locks (subject to change every time Microsoft releases a new CRT). Therefore the following corollary can be deduced:</p>
<blockquote>
<p><strong>Corollary</strong>: do not use any standard library functions at all in <code>DllMain()</code> (because you really have no idea whether it will change and acquire a lock in the future).</p>
</blockquote>
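<p>One way to honor the corollary, as the MSDN warning quoted above also suggests, is to defer any CRT work to an initialization that runs after loading, outside <code>DllMain</code>. A sketch of the idea below; the names <code>Environment</code>, <code>osgRoot()</code> and <code>OSG_ROOT</code> are mine, not OpenSceneGraph’s.</p>

```cpp
#include <cstdlib>
#include <string>

// Instead of calling getenv() from a static initializer (which, in a
// DLL, runs under the loader lock), look the variable up lazily, on
// first use, long after DllMain has returned.
class Environment {
public:
    static const std::string& osgRoot() {
        // Function-local static: initialized on the first call only.
        static const std::string value = read("OSG_ROOT");
        return value;
    }
private:
    static std::string read(const char* name) {
        const char* v = std::getenv(name);
        return v ? std::string(v) : std::string();
    }
};
```

<p>Note that the thread-safe initialization of function-local statics is only guaranteed since C++11; before that (including the CRTs of this era), an explicit <code>Init()</code> routine called by the application after all DLLs are loaded is the safer equivalent.</p>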
<h2 id="the-deadlock">The Deadlock</h2>
<p>If you use CRT functions in your <code>DllMain()</code>, the following events might occur in the wrong preemption order and, consequently, cause a deadlock.</p>
<p>Imagine that you have one thread that calls <code>LoadLibrary()</code> on your DLL. It acquires the loader lock, then executes <code>DllMain()</code>, which finally executes your static initializer. At this stage, if you use a CRT function, you will try to acquire an internal CRT lock…</p>
<p>Meanwhile, in the rest of your program, another thread might call a CRT function that happens to do the following: acquire the same CRT lock, then call <code>LoadLibrary</code> (because it uses an advanced kernel function that requires a new DLL to be loaded). <code>LoadLibrary</code> will attempt to acquire the loader lock as well.</p>
<p>Boom! You now have two threads trying to acquire two locks in opposite orders.</p>
<p>In practice, the bug we discovered involved the following race condition:</p>
<ul>
<li>Thread A
<ol>
<li><code>LoadLibrary()</code> called. Acquire loader lock.</li>
<li><code>DllMain()</code> executes <code>getenv()</code>. Acquire <code>_ENV_LOCK</code>.</li>
</ol>
</li>
<li>Thread B
<ol start="3">
<li><code>stat()</code> is called. Acquire <code>_ENV_LOCK</code>.</li>
<li><code>stat()</code> calls <code>GetTimeZoneInformation()</code>, which requires ntdll.dll to be loaded.</li>
<li>Then calls <code>LoadLibrary()</code> to get ntdll. Acquire loader lock.</li>
</ol>
</li>
</ul>
<p>If your threads are preempted such that steps 1, 3, 4, 2, 5 execute in sequence, then you have a deadlock!</p>
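<p>The fatal interleaving can be replayed deterministically with a toy model of the two locks; no real threads here, <code>acquire</code> and the owner strings are just an illustration of who would block where.</p>

```cpp
#include <string>

// Each lock simply records its owner ("" means free).
struct Lock { std::string owner; };

// Returns false when the requesting thread would have to wait.
static bool acquire(Lock& l, const std::string& thread) {
    if (!l.owner.empty() && l.owner != thread)
        return false;                          // held by someone else: blocked
    l.owner = thread;
    return true;
}

// Replay steps 1, 3, 4, 2, 5 from the lists above.
static bool replayDeadlock() {
    Lock loaderLock, envLock;                  // loader lock and _ENV_LOCK
    (void)acquire(loaderLock, "A");            // 1: A enters LoadLibrary()
    (void)acquire(envLock, "B");               // 3: B enters stat()
    //                                            4: B now needs ntdll.dll
    bool aBlocked = !acquire(envLock, "A");    // 2: A's DllMain calls getenv()
    bool bBlocked = !acquire(loaderLock, "B"); // 5: B calls LoadLibrary()
    return aBlocked && bBlocked;               // both waiting forever: deadlock
}
```

<p>Each thread holds the lock the other one wants, and neither can ever make progress: the classic lock-order inversion.</p>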
<p>I discovered a posteriori that Raymond Chen and Larry Osterman had documented the problem. Although they do not explicitly mention the C runtime, their remarks are obviously relevant once you know that the CRT uses internal locks.</p>
<h2 id="references">References</h2>
<ul>
<li><a href="https://devblogs.microsoft.com/oldnewthing/20040128-00/?p=40853">Another reason not to do anything scary in your DllMain: Inadvertent deadlock</a>, Raymond Chen, The Old New thing blog</li>
<li><a href="https://web.archive.org/web/20091211223634/http://www.microsoft.com/whdc/driver/kernel/DLL_bestprac.mspx">Best practices for DllMain</a>, Larry Osterman’s blog</li>
</ul>
<h2 id="correction">Correction</h2>
<p>I previously wrote &ldquo;<em>the rest of this post will detail the behavior on Windows, but the same general principles are true for other platforms as well.</em>&rdquo;. This was a gross exaggeration. I’m still convinced that using libc in static initializers is prone to trouble on other platforms as well. Having it all work correctly depends on many implementation details: the kernel (how shared objects are loaded, how system calls work), the libc (how it interacts with the kernel), and the linker (does it load things in the right order w.r.t. static initializers).</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[How to use WM_SETICON Properly]]></title>
    <link href="https://blog.barthe.ph/2009/07/17/wmseticon/"/>
    <id>https://blog.barthe.ph/2009/07/17/wmseticon/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-07-17T00:00:00+00:00</published>
    <updated>2009-07-17T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<h2 id="how-not-to-use-wm_seticon">How NOT to use <code>WM_SETICON</code></h2>
<p>It’s not rocket science, but you’d be surprised to see how many people get this wrong. Here’s a code snippet of what people usually do:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">const</span> <span class="n">HICON</span> <span class="n">hicon</span> <span class="o">=</span> <span class="o">::</span><span class="n">LoadIcon</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">GetModuleHandle</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">MAKEINTRESOURCE</span><span class="p">(</span><span class="n">IDR_MAINFRAME</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="n">hicon</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">SendMessage</span><span class="p">(</span><span class="n">hwnd</span><span class="p">,</span> <span class="n">WM_SETICON</span><span class="p">,</span> <span class="n">ICON_BIG</span><span class="p">,</span> <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="n">LPARAM</span><span class="o">&gt;</span><span class="p">(</span><span class="n">hicon</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">SendMessage</span><span class="p">(</span><span class="n">hwnd</span><span class="p">,</span> <span class="n">WM_SETICON</span><span class="p">,</span> <span class="n">ICON_SMALL</span><span class="p">,</span> <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="n">LPARAM</span><span class="o">&gt;</span><span class="p">(</span><span class="n">hicon</span><span class="p">));</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>The code above would set the large icon (as displayed in the ALT-TAB switch window dialog) and the small icon (as displayed in the window title bar or the task bar) of the window (whose handle is <code>hwnd</code>).</p>
<p>Except for the error checking, which is not particularly thorough, most people would consider this code correct. You could also have used <code>LoadImage</code> with <code>LR_DEFAULTSIZE</code>.</p>
<h2 id="why-this-is-wrong">Why this is wrong</h2>
<p>The problem reveals itself with an icon whose design differs slightly between the small 16×16 size and the large 32×32 size: the 16×16 icon will show up as a scaled-down version of the 32×32 one. The reason the smaller icon has a different design in the first place is usually that the large icon’s design is a bit overloaded and cannot be downscaled without becoming unrecognizable; so you will notice!</p>
<p>What’s wrong? If you dig into the documentation, you will find the following information:</p>
<ul>
<li><code>LoadIcon</code> loads the most appropriate icon, at one fixed size, which is <code>SM_CXICON</code> by <code>SM_CYICON</code> pixels, i.e. the size of a large icon.</li>
<li>The small icon should be <code>SM_CXSMICON</code> by <code>SM_CYSMICON</code> pixels.</li>
<li><code>WM_SETICON</code> for <code>ICON_SMALL</code> would downscale a larger icon to <code>SM_CXSMICON</code> by <code>SM_CYSMICON</code> pixels automatically.</li>
<li>This odd behavior is due to backward compatibility with legacy versions of Windows, which only had one size of icons.</li>
</ul>
<h2 id="how-to-use-wm_seticon">How to use <code>WM_SETICON</code></h2>
<p>Now that we understand the problem (the wrong size of the icon is loaded), we know how to fix it. The solution is to call <code>LoadImage</code> twice and explicitly provide the size of the icon to avoid the downscaling.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">const</span> <span class="n">HANDLE</span> <span class="n">hbicon</span> <span class="o">=</span> <span class="o">::</span><span class="n">LoadImage</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">GetModuleHandle</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">MAKEINTRESOURCE</span><span class="p">(</span><span class="n">IDR_MAINFRAME</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">IMAGE_ICON</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">GetSystemMetrics</span><span class="p">(</span><span class="n">SM_CXICON</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">GetSystemMetrics</span><span class="p">(</span><span class="n">SM_CYICON</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="mi">0</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="n">hbicon</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">SendMessage</span><span class="p">(</span><span class="n">hwnd</span><span class="p">,</span> <span class="n">WM_SETICON</span><span class="p">,</span> <span class="n">ICON_BIG</span><span class="p">,</span> <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="n">LPARAM</span><span class="o">&gt;</span><span class="p">(</span><span class="n">hbicon</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">const</span> <span class="n">HANDLE</span> <span class="n">hsicon</span> <span class="o">=</span> <span class="o">::</span><span class="n">LoadImage</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">GetModuleHandle</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">MAKEINTRESOURCE</span><span class="p">(</span><span class="n">IDR_MAINFRAME</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">IMAGE_ICON</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">GetSystemMetrics</span><span class="p">(</span><span class="n">SM_CXSMICON</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">GetSystemMetrics</span><span class="p">(</span><span class="n">SM_CYSMICON</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="mi">0</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="n">hsicon</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">  <span class="o">::</span><span class="n">SendMessage</span><span class="p">(</span><span class="n">hwnd</span><span class="p">,</span> <span class="n">WM_SETICON</span><span class="p">,</span> <span class="n">ICON_SMALL</span><span class="p">,</span> <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="n">LPARAM</span><span class="o">&gt;</span><span class="p">(</span><span class="n">hsicon</span><span class="p">));</span>
</span></span></code></pre></div><h2 id="references">References</h2>
<p>Links to MSDN:</p>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms648045%28VS.85%29.aspx">LoadImage</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ms648072%28VS.85%29.aspx">LoadIcon</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ms633574%28VS.85%29.aspx#class_icons">Window classes, icon section</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ms632643%28VS.85%29.aspx">WM_SETICON</a></li>
</ul>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Fast Subversion Update on a Rarely Updated Large Third Party Repository]]></title>
    <link href="https://blog.barthe.ph/2009/06/01/fast-svn-update/"/>
    <id>https://blog.barthe.ph/2009/06/01/fast-svn-update/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-06-01T00:00:00+00:00</published>
    <updated>2009-06-01T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>I have a project with a large repository containing a lot of third-party sources and binaries that are seldom updated. While Subversion might not be the best-suited tool for the job, the size of that repository is not insane enough to consider using something else… yet!</p>
<p>The problem with that rarely updated large repository is that performing an <code>svn update</code> on it can take quite a while, even when there is nothing to update. The reason is that Subversion checks that no files are missing from the local checkout directory, because you can perform non-recursive partial checkouts or updates, or even alter the whole structure after a checkout.</p>
<p>Nevertheless, if all you do is mirror the repository content on your drive, you may have a faster option than <code>svn update</code>. You can use <code>svn info</code> to compare the local revision of your checkout with the latest revision available on the server. If the revisions match, your checkout is up to date and you can skip the costly <code>svn update</code>.</p>
<p>I regularly use the PowerShell script below to automate that process.</p>
<h2 id="powershell-script">PowerShell Script</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-powershell" data-lang="powershell"><span class="line"><span class="cl"><span class="c"># Set your svn path</span>
</span></span><span class="line"><span class="cl"><span class="nv">$Svn</span> <span class="p">=</span> <span class="s2">&#34;svn.exe&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">trap</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Error</span> <span class="p">(</span><span class="nv">$_</span><span class="p">.</span><span class="py">Exception</span><span class="p">.</span><span class="py">GetType</span><span class="p">().</span><span class="py">FullName</span> <span class="o">+</span> <span class="s2">&#34;: &#34;</span> <span class="o">+</span> <span class="nv">$_</span><span class="p">.</span><span class="py">Exception</span><span class="p">.</span><span class="py">Message</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="nv">$args</span><span class="p">.</span><span class="py">length</span> <span class="o">-ne</span> <span class="mf">1</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="k">throw</span> <span class="s2">&#34;Invalid arguments&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">$source_path</span> <span class="p">=</span> <span class="nv">$args</span><span class="p">[</span><span class="mf">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="nv">$local_info</span> <span class="p">=</span> <span class="p">&amp;</span> <span class="nv">$Svn</span> <span class="n">info</span> <span class="nv">$source_path</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="nv">$LastExitCode</span> <span class="o">-ne</span> <span class="mf">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="k">throw</span> <span class="s2">&#34;Local svn info failed&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="nv">$local_revision</span> <span class="p">=</span> <span class="nv">$local_info</span> <span class="p">|</span> <span class="p">%{</span> <span class="k">if</span> <span class="p">(</span><span class="nv">$_</span><span class="p">.</span><span class="py">startsWith</span><span class="p">(</span><span class="s2">&#34;Revision&#34;</span><span class="p">))</span> <span class="p">{</span> <span class="nb">Write-Output</span> <span class="nv">$_</span> <span class="p">}</span> <span class="p">}</span> <span class="p">|</span> <span class="p">%{</span> <span class="nb">Write-Output</span> <span class="nv">$_</span><span class="p">.</span><span class="py">substring</span><span class="p">(</span><span class="nv">$_</span><span class="p">.</span><span class="py">indexOf</span><span class="p">(</span><span class="s2">&#34;: &#34;</span><span class="p">)</span><span class="mf">+2</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="nv">$url</span> <span class="p">=</span> <span class="nv">$local_info</span> <span class="p">|</span> <span class="p">%{</span> <span class="k">if</span> <span class="p">(</span><span class="nv">$_</span><span class="p">.</span><span class="py">startsWith</span><span class="p">(</span><span class="s2">&#34;Repository Root&#34;</span><span class="p">))</span> <span class="p">{</span> <span class="nb">Write-Output</span> <span class="nv">$_</span> <span class="p">}</span> <span class="p">}</span> <span class="p">|</span> <span class="p">%{</span> <span class="nb">Write-Output</span> <span class="nv">$_</span><span class="p">.</span><span class="py">substring</span><span class="p">(</span><span class="nv">$_</span><span class="p">.</span><span class="py">indexOf</span><span class="p">(</span><span class="s2">&#34;: &#34;</span><span class="p">)</span><span class="mf">+2</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="nv">$remote_info</span> <span class="p">=</span> <span class="p">&amp;</span> <span class="nv">$Svn</span> <span class="n">info</span> <span class="nv">$url</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="nv">$LastExitCode</span> <span class="o">-ne</span> <span class="mf">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="k">throw</span> <span class="s2">&#34;Remote svn info failed&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="nv">$remote_revision</span> <span class="p">=</span> <span class="nv">$remote_info</span> <span class="p">|</span> <span class="p">%{</span> <span class="k">if</span> <span class="p">(</span><span class="nv">$_</span><span class="p">.</span><span class="py">startsWith</span><span class="p">(</span><span class="s2">&#34;Revision&#34;</span><span class="p">))</span> <span class="p">{</span> <span class="nb">Write-Output</span> <span class="nv">$_</span> <span class="p">}</span> <span class="p">}</span> <span class="p">|</span> <span class="p">%{</span> <span class="nb">Write-Output</span> <span class="nv">$_</span><span class="p">.</span><span class="py">substring</span><span class="p">(</span><span class="nv">$_</span><span class="p">.</span><span class="py">indexOf</span><span class="p">(</span><span class="s2">&#34;: &#34;</span><span class="p">)</span><span class="mf">+2</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="nv">$local_revision</span> <span class="o">-ne</span> <span class="nv">$remote_revision</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="p">&amp;</span> <span class="nv">$Svn</span> <span class="n">update</span> <span class="nv">$source_path</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="k">else</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nb">Write-Host</span> <span class="s2">&#34;Up to date&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div>]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[10 Tips to Avoid Spam]]></title>
    <link href="https://blog.barthe.ph/2009/03/01/no-spam-email/"/>
    <id>https://blog.barthe.ph/2009/03/01/no-spam-email/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-03-01T00:00:00+00:00</published>
    <updated>2009-03-01T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>I set up email on my phone and, even though I did not receive that much spam, it was more annoying there than on a computer. Over the past few months, I have changed one of my main email addresses and taken some radical steps to fight spam.</p>
<p>Here are my few tips for a spamless life:</p>
<ol>
<li>
<p><strong>Do not publish your email address anywhere.</strong></p>
</li>
<li>
<p><strong>Do not register your email address anywhere.</strong></p>
</li>
<li>
<p><strong>If you need to register on a website, create a new account.</strong> Use this account just for registering, and enable mail forwarding to your main account. As soon as you start receiving spam at that address, or legitimate but annoying commercial content, disable email forwarding but leave the account alive (so that spammers waste resources).</p>
</li>
<li>
<p><strong>Never click on “un-register” links from emails.</strong></p>
</li>
<li>
<p><strong>Disable HTML viewing, especially the display of images.</strong> You may set up a whitelist for known senders.</p>
</li>
<li>
<p><strong>Use state-of-the-art anti-spam filtering.</strong> I really like Gmail’s spam filters; my mail goes through Gmail before being forwarded to my final email provider, which also uses good SpamAssassin rules.</p>
</li>
<li>
<p><strong>Try to keep a core of 2 or 3 different email addresses shared among the people you know (no websites!).</strong> When you eventually need to change an address, you will upset fewer people. One idea is to set up groups such as family, friends from university, other friends, colleagues or former colleagues… groups of people you are unlikely to email simultaneously.</p>
</li>
<li>
<p><strong>Update your core email addresses when they are compromised.</strong> Do it gradually over an entire year, keeping email forwarding on and adding an auto-reply rule informing your contacts that they should stop using the old address. Do not write your new address in cleartext inside the automated message, because it could be harvested by bots.</p>
</li>
<li>
<p>If your address is publicly available and spammed in huge amounts, make it mandatory for people to include some specific text in the subject line in order to contact you. Filter out everything not complying with the rule.</p>
</li>
<li>
<p>If your address is publicly available and the previous tip does not work (someone is deliberately targeting you), remove the contact email address from public space and replace it with a contact form protected by a CAPTCHA.</p>
</li>
</ol>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Use the iPhone SDK on old Mac PPC machines]]></title>
    <link href="https://blog.barthe.ph/2009/03/01/iphone-sdk-ppc/"/>
    <id>https://blog.barthe.ph/2009/03/01/iphone-sdk-ppc/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-03-01T00:00:00+00:00</published>
    <updated>2009-03-01T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>Apple states that the iPhone SDK should only be used on Intel Macs.</p>
<p>However, the compiler, simulator and code-signing tools work perfectly on older PPC Macs, provided you work around the arbitrary limitations of the installer.</p>
<p>My iMac broke down, so I had to revert to my old and venerable PowerBook running Mac OS X 10.5.6 (9G55). I used the first public SDK (<code>md5=96849e4a17674d55d5af75b2d1d6579f</code>). The following tricks might not work with later versions.</p>
<h2 id="installation-steps">Installation Steps</h2>
<p>The installation steps are simple:</p>
<ol>
<li><strong>Install the iPhone SDK</strong>. The installer will install Xcode, but it will not let you select the iPhone components.</li>
<li><strong>Install the missing iPhone packages manually</strong>.
<ul>
<li>Those packages are located in the <code>Packages</code> subdirectory of the SDK installation disk image.</li>
<li>You can simply double-click them to start the installation.</li>
<li>You need to select the <code>/Developer</code> directory as the install location for each of them.</li>
<li>The packages to install manually are <code>DeveloperDiskImage.pkg</code> and all the packages whose names start with <code>iPhone</code>.</li>
</ul>
</li>
<li><strong>Change <code>iPhone Simulator Architectures.xcspec</code>.</strong>
<ul>
<li>This allows you to compile iPhone apps and use the simulator on PPC machines.</li>
<li>The file is located in <code>/Developer/Platforms/iPhoneSimulator.platform/Developer/Library/Xcode/Specifications/</code>.</li>
</ul>
</li>
<li><strong>Change <code>/usr/bin/codesign</code>.</strong>
<ul>
<li>This allows you to upload signed executables to the iPhone.</li>
<li>For some mysterious reason, <code>codesign</code> requires a universal binary, so the trick is to make it believe it got one.</li>
<li>Rename the original binary: <code>sudo mv /usr/bin/codesign /usr/bin/codesign.orig</code>.</li>
<li>Then replace it with the Perl script attached below. Don’t forget to make it executable: <code>sudo chmod +x /usr/bin/codesign</code>.</li>
</ul>
</li>
</ol>
<p>That’s it. It should now work.</p>
<h2 id="files-to-replace">Files to Replace</h2>
<ul>
<li><a href="/download/2009/iPhone_Simulator_Architectures.xcspec">iPhone_Simulator_Architectures.xcspec</a></li>
<li><a href="/download/2009/codesign">codesign</a></li>
</ul>
<h2 id="links-to-original-authors">Links to Original Authors</h2>
<ul>
<li><a href="https://web.archive.org/web/20090103144337/http://discussions.apple.com/thread.jspa?threadID=1455699&amp;tstart=0">http://discussions.apple.com/thread.jspa?threadID=1455699&amp;tstart=0</a></li>
<li><a href="https://web.archive.org/web/20100306180848/http://www.tbradford.org/2008/03/iphone-sdk-beta-2-possible-ppc-fix.html">http://www.tbradford.org/2008/03/iphone-sdk-beta-2-possible-ppc-fix.html</a></li>
</ul>
<h2 id="updates">Updates</h2>
<p><strong>Update</strong>: it works perfectly with the latest version of the SDK, i.e.</p>
<pre tabindex="0"><code>iphone_sdk_for_iphone_os_2.2.19m2621afinal.dmg, md5=9d0a818f41be507537495920cd0ef9bc
</code></pre><p><strong>Update</strong>: Obviously this trick no longer works for iOS 3 and more recent versions.</p>
]]></content>
  </entry>  
  <entry>
    <title type="html"><![CDATA[Changing a Signed Executable without Altering Windows Digital Signatures]]></title>
    <link href="https://blog.barthe.ph/2009/02/22/change-signed-executable/"/>
    <id>https://blog.barthe.ph/2009/02/22/change-signed-executable/</id>
    <author>
      <name>Aymeric Barthe</name>
    </author>
    <published>2009-02-22T00:00:00+00:00</published>
    <updated>2009-02-22T00:00:00+00:00</updated>
    <content type="html"><![CDATA[<p>It should not be possible, but it is… sort of.</p>
<h2 id="how-it-works">How it Works</h2>
<p>Microsoft Authenticode works as follows: it computes a cryptographic hash of the executable file. The hash is then signed to produce a digital certificate, which is authenticated by some trusted authority.</p>
<p>The certificate is attached to the end of the PE executable, in a dedicated section called the Certificate Table. When the executable is loaded, Windows recomputes the hash value and compares it to the one attached to the Certificate Table. It should “normally” be impossible to change anything in the file without breaking the digital signature.</p>
<p>However three areas of a PE executable are excluded from the hash computation:</p>
<ul>
<li>the Checksum in the optional Windows specific header: 4 bytes.</li>
<li>the Certificate Table entry in the optional Windows specific header: 8 bytes.</li>
<li>the Digital Certificate section at the end of the file: variable length.</li>
</ul>
<p>You should be able to change those areas without breaking the signature. I discovered by accident that it is possible to append an arbitrary amount of data at the end of the Digital Certificate section. The data is ignored by both the signature-parsing and hash-computation algorithms. This works on all versions of Windows I tested (2000, XP, Vista), as long as the length of the Certificate Table is correctly increased. That length is stored in two different locations: the PE header and the beginning of the Certificate Table.</p>
<h2 id="how-to-add-a-payload-to-a-signed-executable">How to Add a Payload to a Signed Executable</h2>
<ol>
<li>Locate the beginning of the PE header (the <code>PE</code> signature).</li>
<li>Skip the signature and the COFF header (+=24 bytes).</li>
<li>Go to the Certificate Table entry in the Windows-specific optional PE header (it is data directory number 4, at offset 128 into the optional header; total +=152 bytes from the signature).</li>
<li>Increase the size of the Certificate Table, as defined in <code>IMAGE_DATA_DIRECTORY.Size</code>, by the size of the payload.</li>
<li>Go to the location defined by <code>IMAGE_DATA_DIRECTORY.VirtualAddress</code>. Despite its name, this is the absolute location of the Certificate Table within the file.</li>
<li>Increase the size again, this time inside <code>PKCS1_MODULE_SIGN.dwLength</code> at the beginning of the Certificate Table.</li>
<li>The Certificate Table should normally be the last section in the executable, so go to the end of the file and append the payload.</li>
<li>Optionally recompute the checksum of the file.</li>
</ol>
<p><strong>Caution</strong>: the previous constants hold for the 32-bit x86 versions of Windows (PE32 executables). The payload needs to be 64-bit aligned, and all the 32-bit constants are of course little-endian.</p>
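<p>As an illustration only, the steps above can be sketched in Python. This is not the attached C implementation, and the function name is made up; it reads the offsets from the headers instead of hard-coding constants, so it also handles PE32+ images:</p>
<pre tabindex="0"><code>import struct

def append_payload(image: bytes, payload: bytes) -&gt; bytes:
    """Append `payload` after the Certificate Table of a signed PE
    image, growing both stored lengths so the signature still parses.
    Sketch only: assumes the Certificate Table is last in the file."""
    data = bytearray(image)
    # Certificate Table entries must stay 8-byte aligned, so pad.
    payload += b"\0" * (-len(payload) % 8)

    # Step 1: e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    pe_off = struct.unpack_from("&lt;I", data, 0x3C)[0]
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE executable")

    # Step 2: skip the 4-byte signature and the 20-byte COFF header.
    opt_off = pe_off + 4 + 20

    # Step 3: the Certificate Table is data directory number 4; the
    # directories start at offset 96 (PE32) or 112 (PE32+) of the
    # optional header.
    magic = struct.unpack_from("&lt;H", data, opt_off)[0]
    dirs = 96 if magic == 0x10B else 112   # 0x10B = PE32, 0x20B = PE32+
    entry = opt_off + dirs + 4 * 8
    cert_off, cert_size = struct.unpack_from("&lt;II", data, entry)
    # For this entry, "VirtualAddress" is really a file offset.

    # Steps 4 and 6: grow both copies of the length.
    struct.pack_into("&lt;I", data, entry + 4, cert_size + len(payload))
    dw_length = struct.unpack_from("&lt;I", data, cert_off)[0]
    struct.pack_into("&lt;I", data, cert_off, dw_length + len(payload))

    # Step 7: the Certificate Table should end the file; append there.
    if cert_off + cert_size != len(data):
        raise ValueError("Certificate Table is not at the end of the file")
    # Step 8 (recomputing the PE checksum) is skipped in this sketch.
    return bytes(data + payload)
</code></pre>
<p>Read the executable with <code>open(path, &quot;rb&quot;).read()</code>, pass it through this function, and write the result back to a new file.</p>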
<h2 id="documentation-links">Documentation Links</h2>
<ul>
<li><a href="https://web.archive.org/web/20090219224404/http://www.thehackerslibrary.com/?p=377">http://www.thehackerslibrary.com/?p=377</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ms904424.aspx">http://msdn.microsoft.com/en-us/library/ms904424.aspx</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/aa448396.aspx">http://msdn.microsoft.com/en-us/library/aa448396.aspx</a></li>
</ul>
<h2 id="sources">Sources</h2>
<ul>
<li><a href="/download/2009/AppendPayLoad.tar.bz2">AppendPayLoad.tar.bz2</a></li>
</ul>
]]></content>
  </entry>
</feed> 
