How to rely on third-party CDNs in a secure way

Instead of using dependency managers and packaging tools like Webpack or Bower (or even NPM) for web resources, it is tempting to just link directly to the resources you need.

But aside from avoiding complexity and having to learn something new, like dependency management or semantic versioning schemes, this also opens up some attack vectors.

For example, if the third party you are loading your resources from is compromised, an attacker might replace your Bootstrap plugin with a key listener to steal your users' passwords. If the domain name is hijacked, the JavaScript library you depend on can suddenly be used to mine Bitcoin using your visitors' computers.

Luckily this is easily preventable with modern web standards. If we want to include Bootstrap using a third-party CDN like Bootstrap CDN, it provides two options to choose from. One is a direct link to the resource, e.g. https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css, to be embedded as we see fit, and the other is an HTML snippet which looks something like the following.

<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">

There are two additional attributes here which are normally not taught when learning HTML and CSS: integrity and crossorigin.

The integrity attribute, part of a standard called Subresource Integrity (SRI), provides a hash of the content (ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T) and names the hash algorithm used (sha384). This allows the browser to check that the content loaded from stackpath.bootstrapcdn.com has not been modified in any way, which protects your visitors from malicious scripts injected by a compromised third-party provider, a form of cross-site scripting (XSS).
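
Where does that hash value come from? Here is a minimal sketch (Python, standard library only, reusing the Bootstrap URL from the snippet above) that computes the same kind of integrity value for any resource:

#!/usr/bin/env python3
# Minimal sketch: compute a Subresource Integrity value (sha384-<base64 digest>)
# for a resource, the same format used in the integrity attribute above.
import base64
import hashlib
import urllib.request

url = "https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css"
content = urllib.request.urlopen(url).read()

digest = hashlib.sha384(content).digest()
print("sha384-" + base64.b64encode(digest).decode())

If the CDN still serves the exact same file, the output matches the value in the integrity attribute above; any difference means the content has changed.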

The second attribute is related to cross-origin resource sharing (CORS), which is a complicated (but important!) security subject; Mozilla has an excellent article on how it works. Setting crossorigin to anonymous tells the browser to fetch the resource in CORS mode without sending credentials such as cookies, which is also what allows the integrity check above to be applied to a cross-origin resource.

The great thing about this is that there are almost no downsides. The performance cost in the browser is negligible. Pinning an exact hash does mean that semver-style automatic version updates no longer work, but this is, according to some, a feature.

Consolidating images with perceptual hashing

I have a lot of photos from over the years spread out across multiple devices and platforms. These are pictures from Facebook, old camera photos, Google Photos and so on. Most of them are duplicates, copied from one device to several different platforms and backups, all with very subtle differences depending on which platform they have been processed by.

Most of these platforms have some form of export function for photos. But it is still a manual and tedious process to sort and combine them into a single set of photos that includes everything without duplicates.

As a programmer, there are several ways of automating the process. The easiest one is to compare images, pixel by pixel, to find copies. But there are also smarter ways to do this.

The most common method of quickly comparing files is through the use of hashes. Hashes are a way of reducing large quantities of data, such as files, to smaller and more manageable chunks.
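
As a rough sketch of the idea (Python here, with a hypothetical photos directory), exact duplicates can be found by hashing every file and grouping files that share the same digest:

#!/usr/bin/env python3
# Rough sketch: group files by their cryptographic hash to find exact duplicates.
import hashlib
from collections import defaultdict
from pathlib import Path

def file_digest(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()

groups = defaultdict(list)
for path in Path("photos").rglob("*.jpg"):  # hypothetical photo directory
    groups[file_digest(path)].append(path)

for digest, paths in groups.items():
    if len(paths) > 1:
        print(digest, "->", [str(p) for p in paths])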

The only problem with this approach is that while cryptographic hashes are useful for quickly finding exact duplicates, they are very volatile: a small change in a large file results in a completely different hash. Compare the two images of a tiny radiator below, the original and one with an artistic filter.

I have included the hash output of MD5, a common cryptographic hash function, below each image. While the images are almost identical the resulting hashes are very different.

A tiny radiator I found many years ago

MD5: 90de6b60e0dcdb92114166f1a9c7e821

A tiny radiator I found many years ago

MD5: c4ef0980be39404ee2e81344aa185cc3

To make hashes feasible for fingerprinting images in this manner, an alternative hashing method is required. Perceptual hashing is a family of methods designed so that differences in the source are reflected proportionally in the hash: a small change in the image causes only a small change in the corresponding hash. Below are the same images again, this time with their perceptual hashes.

A tiny radiator I found many years ago

pHash: 07070f471b033007

A tiny radiator I found many years ago

pHash: 07070f471b073007

This makes the problem of finding duplicate images a lot simpler to solve with a bit of programming. When two images produce the same, or nearly the same, perceptual hash (the two hashes above differ in only a single hex digit), we can reasonably assume they are the same image in different formats or dimensions and simply keep the better one.
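
As a sketch of how this could look in practice, assuming the third-party Pillow and imagehash packages and two hypothetical file names standing in for the radiator photos above:

#!/usr/bin/env python3
# Sketch: compare two images by perceptual hash; a small Hamming distance
# between the hashes means the images are visually near-identical.
import imagehash
from PIL import Image

# Hypothetical file names standing in for the two radiator photos above.
original = imagehash.phash(Image.open("radiator.jpg"))
filtered = imagehash.phash(Image.open("radiator_filtered.jpg"))

distance = original - filtered  # number of bits that differ
if distance <= 4:               # the threshold is a judgement call
    print("Probably the same photo, distance:", distance)
else:
    print("Different photos, distance:", distance)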

Resolutions for 2019

A couple of years ago I wrote a post with a short summary of my resolutions for the upcoming year. Things that I wanted to achieve or improve. I never followed up on it or even considered how it went or if anything should change for the next year. In terms of goal planning it was a catastrophic failure.

What I should have done was to (1) make sure that my resolutions were measurable and (2) actually follow up on the progress of my goals to make sure I was on track. I did neither of these.

This year I have decided to write new resolutions, and to make sure I can manage the above I am taking a simpler approach. Here are my two resolutions for 2019.

My reading list

These are books that I would like to read during 2019. I have had vague resolutions before, like “read more books”, which is difficult to measure, or a goal like “read 12 books”, which is measurable but not specific. The problem with the latter is that I will pick books that are easy instead of books that are useful. So instead, this is a list of specific books I would like to read during 2019, in no particular order.

A personal project

This one is difficult to define in a clear way. I used to do a lot more hobby projects on the side than I do now and I would like to get back to that. Whether that means one major project or several small projects does not matter; even a tiny weekend project is enough. The project should be:

Planned

I should have some kind of documentation on what I want to accomplish before I start. A simple project plan. Proper planning is probably my weakest area and this should help me improve.

Documented

A plan does not bring much to a project unless there is some follow-up. For this project I should document successes and failures to keep track of changes to the project plan as well as completion rate.

Completed

The main problem I struggle with when it comes to personal projects is completion. A lot of projects stagnate because of boredom or lack of time. I found this quote on Hacker News which summarizes it pretty well.

I think side projects, software at least, are a lot like the Civilization games. You can’t wait to start. The first 10% is awesome. 10-40% is complex and the difficulty ramps up. 40-100%, all you can think about is starting over on something else. At around 80%, you just quit and actually do start over.

Having a proper plan and follow-up should help keep the project on track, but also define an end goal for it. Hopefully this is enough to make sure the project actually gets completed.

My blog setup

I am trying to get back into blogging, mostly as a way of documenting my side projects, something I am otherwise awful at. Previously this has been a writing exercise for me (and still is) but I hope to shift the focus to more code and small projects.

The blog is powered by Hugo, a static site generator which basically converts text files into this site. Those text files used to be hosted on GitHub, where everything was set up according to the official Hugo guide. That setup used a continuous deployment tool (Wercker) to automatically regenerate and re-upload the site whenever there was a content change.

This new setup instead uses gogs to host the repository and trigger a webhook which runs a small script to build and upload the site. The webhook listener is available in the Raspbian repositories and can be installed with apt install webhook.

It can be configured to do what I want by creating a small file at /etc/webhook.conf with the following:

[{
    "id": "blog",
    "execute-command": "/var/hooks/blog/deploy.sh",
    "command-working-directory": "/var/hooks/blog",
    "pass-arguments-to-command": [{
            "source": "payload",
            "name": "head_commit.id"
        },
        {
            "source": "payload",
            "name": "pusher.name"
        },
        {
            "source": "payload",
            "name": "pusher.email"
        }
    ],
    "trigger-rule": {
        "match": {
            "type": "payload-hash-sha256",
            "secret": "mysecret",
            "parameter": {
                "source": "header",
                "name": "X-Gogs-Signature"
            }
        }
    }
}]

This listens for incoming notifications from gogs. There is also a shared secret (mysecret in the example above) used to verify the signature of each notification and safeguard against abuse. Whenever a commit is pushed to gogs it triggers a notification to the listener, which in turn runs a small shell script that performs all the real work.

#!/usr/bin/env bash

# Update workspace
rm -rf workspace
git clone https://git.destruktiv.se/rasmus/Blog.git workspace
cd workspace

# Generate blog
hugo --uglyUrls

# Clear old content
ssh -i ../deploy.key -l rasmus rasmuslarsson.se rm -rf /var/www/html/*

# Upload new
scp -i ../deploy.key -r public/* rasmus@rasmuslarsson.se:/var/www/html

This requires that a deployment key is generated with ssh-keygen -f deploy.key and that the public part is added to the remote host. The remote command that clears old content can be removed to avoid the brief downtime during deployment, with the downside that old files might accumulate.
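
To test the trigger rule without pushing an actual commit, a notification can be forged by hand. This is a rough sketch in Python, assuming the webhook daemon is listening on its default port 9000 and that gogs signs the payload with HMAC-SHA256 in the X-Gogs-Signature header:

#!/usr/bin/env python3
# Rough sketch: send a fake push notification to the webhook listener,
# signed the same way gogs signs its payloads (HMAC-SHA256 of the body).
import hashlib
import hmac
import json
import urllib.request

url = "http://localhost:9000/hooks/blog"  # assumed default port and hook id
secret = b"mysecret"

payload = json.dumps({
    "head_commit": {"id": "0000000000000000000000000000000000000000"},
    "pusher": {"name": "rasmus", "email": "rasmus@example.com"},
}).encode()

signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
request = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "X-Gogs-Signature": signature,
})
print(urllib.request.urlopen(request).read().decode())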

An efficient work environment in Gnome Shell

At work and at home I mainly use an Ubuntu + Gnome Shell setup for getting work done. While Gnome Shell has its limitations it can be dramatically improved with the right extensions. At least for now I don't know of a better option.

With my current setup I have support for multiple desktops, which is useful for switching contexts between e.g. communication and development. The multiple desktops are also a benefit when working on a laptop, where I am otherwise limited to a single screen of workspace.

In addition to the built-in features I have native support for Pomodoro, task management, a weather indicator, time tracking, and integrated media controls for Spotify.

This is a complete list of everything that makes Gnome Shell an efficient work environment for me.

Dash to Dock

“A dock for the Gnome Shell. This extension moves the dash out of the overview transforming it in a dock for an easier launching of applications and a faster switching between windows and desktops. Side and bottom placement options are available.”

https://extensions.gnome.org/extension/307/dash-to-dock/

Hamster Time Tracker

The time tracker is useful for billable work or work estimations. By specifying a task it will automatically start tracking time for that task until paused or switched to a new task.

https://extensions.gnome.org/extension/425/project-hamster-extension/

Media player indicator

This allows e.g. Spotify to be controlled through the volume menu. Also provides song indication with album cover and such.

https://extensions.gnome.org/extension/55/media-player-indicator/

OpenWeather

“Weather extension to display weather information from openweathermap.org or darksky.net for almost all locations in the world.”

https://extensions.gnome.org/extension/750/openweather/

Pomodoro

This extension markets itself as “A simple pomodoro timer” but is hands down the best Pomodoro timer I have ever used, regardless of platform or OS.

https://extensions.gnome.org/extension/53/pomodoro/

Todo.txt

A simple task manager and to-do list integrated into the task bar. Also integrates with Todo.txt for storing, exporting, and importing data.

https://extensions.gnome.org/extension/570/todotxt/

On password security

LastPass

Passwords are probably the biggest security risk that users face today. Most assume that a secure password is at least 8 characters long and contains upper case and lower case letters as well as at least one digit. The problem is that this leads to passwords like Password1. But the real problem is that users believe that this password is so secure, since it technically follows the requirements for a “secure” password, that it can safely be used everywhere.

This is a huge security issue.

Surprisingly to some, the weak password itself is not the actual cause of distress, but rather the fact that it is re-used everywhere. A stupidly simple password like Password1 might take five or ten attempts to guess, while a slightly more secure password might take hundreds or thousands of attempts.

But what about those super secure passwords that would take billions of years to crack even with our most advanced computers today, passwords that look like “U0bVc@#VHFwvkJK&tY”? Surely those would be safe to re-use?

The answer is actually no, absolutely not. It all comes down to website security. It doesn’t matter if your password could guard Fort Knox if the website allows anyone to view and download your password.

And yes, this is actually more common than you think. Check out ';--have i been pwned? for some famous security breaches. Even big players like Adobe, Forbes and Gawker have been vulnerable to attacks in the past. It doesn't even matter whether the site stores your password securely or not, because you would have to trust every single site you use to do so.

This is why a weak password is not that big of a deal, to an extent. Even if the password can be guessed in a few hundred attempts it's not insane if it only grants access to one particular service and that service is trifling to you. In other words, if you use a secure password for your banking and the same password for some online forum, then if the forum gets hacked the attackers automatically have the password to your bank as well. But if you use different passwords for every site they will only have access to the forum.

There are several ways to go about creating unique passwords for every site. One popular option is to use site-related passwords. Continuing from the previous example, one such password might look like Password1Facebook for Facebook, Password1Twitter for Twitter, and so on. While this would stop automated attempts, it's still quite easy to guess what the password is for a different service.

Another option is a kind of compromise: you divide the websites and services you use into two or more security zones and use one password per zone. The upside is that you only need to remember as many passwords as you have zones, and if a password is compromised it only affects the other websites in the same zone. The downside is that multiple accounts can still be affected when only one is compromised.

The third option, if you want to feel completely secure, is to use a unique, strong password for each and every site or service. While this prevents all of the attack vectors mentioned earlier, remembering that many passwords is a hassle, which is the biggest downside of this approach.

Luckily this can be solved with a password manager. With a password manager you only need to remember one ridiculously strong password, which is used to keep track of and safeguard all your other passwords. Most, if not all, password managers also have built-in functionality to generate secure passwords for you.

When selecting a password manager there are plenty to choose from. In my search I considered KeePass, Dashlane, LastPass and 1Password (there are many more, but in my opinion these are your best choices). I passed on Dashlane and 1Password because they lacked Linux support, which is a deal-breaker for me. KeePass is fantastic but doesn't integrate well with browsers and has no built-in synchronization between devices. Which is why I finally signed up for LastPass.

One of my favourite features is the Security Challenge it offers, where it makes sure that you aren't using duplicate, weak, or old passwords. It even scans known password leaks to warn you if one of your accounts has been compromised. ';--have i been pwned?, mentioned above, does this as well.

This brings us to the two final topics I wanted to discuss regarding password security: how to choose a strong password, and a brief explanation of what entropy is. They are tightly related and I'll begin with entropy.

Entropy is a measure of true (unpredictable) randomness. This is used in password security as a sort of score for how guessable your password is. The higher the score, the more random and unpredictable your password is and the more secure it is as a result.

Entropy is also the reason why you need so many weird characters in your password. If your password only uses lower case letters, that limits the number of possible combinations to 26^X where X is the length of your password. A single character can only be one of 26 letters, two characters can be combined in 26 * 26 = 676 different ways, three characters in 26^3 = 17,576 ways, and so on.

If you add capital letters the alphabet doubles to 52, which gives 52^2 = 2,704 possible combinations for only two characters and 52^3 = 140,608 for three, and so on. Adding numbers and special characters increases the possible combinations even further.

But the number of possible combinations alone is not a measure of entropy. Entropy is randomness, and the combination count only translates into entropy if the password is actually chosen at random. If you choose an English word as your password you might end up with a password that is 10 or even 12 characters long, but an attacker can just try every word in a dictionary to figure it out.

This might not seem like a big deal, but when computers can try a hundred billion password combinations per second, a whole dictionary is exhausted in a fraction of a second. It's worth mentioning that these computers are not prohibitively expensive; most people with a gaming computer already have this kind of computation power.

Explaining how entropy is properly calculated and what constitutes predictability would require its own article. I will just say this: anything you choose that helps you remember the password (dates, names, lucky numbers, etc.) reduces the security of your password.

This leads us to the final topic: how to choose a strong password. There are ways to choose passwords that are both secure, even by modern standards, and easy to remember. Diceware is my preferred option. You have a list of 7776 words (7776 = 6^5, so each word is selected by rolling a die five times, completely analog). This gives you 60,466,176 possible combinations from only two words, 470,184,984,576 combinations from three words, and so on, which should give some indication of how secure this is. And because of the way the brain works it's easier to remember words than random letters, even when the words don't make any sense together.
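
The arithmetic above is easy to reproduce. Here is a small sketch (Python, standard library only, with a hypothetical six-word list standing in for a real Diceware list) that converts alphabet sizes into entropy bits and draws a Diceware-style passphrase:

#!/usr/bin/env python3
# Small sketch: turn alphabet sizes into entropy bits and draw a Diceware-style
# passphrase using a cryptographically secure random generator.
import math
import secrets

def entropy_bits(alphabet_size: int, length: int) -> float:
    # length symbols, each drawn uniformly at random from alphabet_size choices
    return length * math.log2(alphabet_size)

print(entropy_bits(26, 8))     # 8 lower-case letters:  ~37.6 bits
print(entropy_bits(52, 8))     # upper and lower case:  ~45.6 bits
print(entropy_bits(7776, 3))   # 3 Diceware words:      ~38.8 bits
print(entropy_bits(7776, 6))   # 6 Diceware words:      ~77.5 bits

# Hypothetical word list; a real Diceware list has 7776 words (6^5).
wordlist = ["correct", "horse", "battery", "staple", "radiator", "tiny"]
print(" ".join(secrets.choice(wordlist) for _ in range(6)))

The numbers only hold if every word really is picked at random, which is what the dice are for.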

Authenticating ejabberd users with Symfony2 and FOSUserBundle

I’ve been trying to set up an XMPP server since MSN went out of style (i.e. since forever). However, managing users is a bit of a hassle and normally the two alternatives are to either create users manually or allow them to register through the client. The first is tedious and the second is not very user-friendly.

But ejabberd also supports MySQL databases, which means that I can write a simple registration service where users can manage their accounts themselves; all it takes is updating their information in the database. The downside is that ejabberd only supports plaintext password storage when using an external database, which is insanely unsafe for numerous reasons.

There is an alternative though: an old and barely documented configuration option in ejabberd which supports external authentication. Though the documentation is lacking, there are a number of example implementations to learn from.

The option lets ejabberd run a separate program and pipe authentication data into it in order to learn whether the user should be allowed to log in or not. You could even return true in all cases to allow people to log in with whatever username and password they want.
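
As a rough sketch of what such an external script looks like, here is the length-prefixed stdin/stdout exchange in Python rather than the PHP/Symfony2 implementation the actual project uses; the password check is a placeholder:

#!/usr/bin/env python3
# Rough sketch of ejabberd's external authentication protocol: ejabberd writes
# a 2-byte big-endian length followed by "operation:user:server:password" to
# stdin and expects 4 bytes back, a length of 2 plus 1 (accept) or 0 (reject).
import struct
import sys

def read_request():
    header = sys.stdin.buffer.read(2)
    if len(header) < 2:
        return None  # ejabberd closed the pipe
    (length,) = struct.unpack(">H", header)
    return sys.stdin.buffer.read(length).decode().split(":", 3)

def reply(success):
    sys.stdout.buffer.write(struct.pack(">HH", 2, 1 if success else 0))
    sys.stdout.buffer.flush()

def check_password(user, server, password):
    # Placeholder: the real project verifies the hash stored by FOSUserBundle.
    return False

while True:
    request = read_request()
    if request is None:
        break
    operation, *args = request
    if operation == "auth":
        reply(check_password(*args))
    elif operation == "isuser":
        reply(True)  # assume every user exists; adjust as needed
    else:
        reply(False)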

In practice this means that I can reuse not only the password hashing scheme from Symfony2, but also let every user of the website it powers immediately get access to their own XMPP account. The only downside is that I haven't found a way to link rosters, which means that the contact list in the chat client and the one on the website are not synchronized in any way, so users will have to maintain two lists of contacts.

All code is available on GitHub as per usual. If you want to use this on your own project make sure that the ejabberd user has write access to app/logs and app/cache. If you just want an XMPP account you can create a user on Destruktiv.

Resolutions for 2016

Inspired by “12 resolutions for programmers” I decided to write a post on my own aspirations this year. I may or may not expand on this over the upcoming months. We’ll see.

  • Eat healthier
  • Play more games
  • Learn a new language
  • Read more books
  • Start using a password manager
  • Create something in an unfamiliar language (Clojure or Go maybe?)
  • Get better at math
  • Attend more lectures
  • Try Dvorak

Advent of Code

Eric Wastl has put together a programming-challenge advent calendar: there's a new holiday-themed coding challenge every day up until Christmas. So far they only take a couple of minutes to solve, which is perfect for some daily exercise.

Link for the curious: http://adventofcode.com/

Hello world revisited

So I finally made the jump to a static blog.

I have been contemplating this move for quite some time now. Mostly it has been a choice between Pelican (Python) and Jekyll (Ruby), where Jekyll has been more tempting thanks to the huge ecosystem around Octopress.

The downside with Jekyll, and Octopress especially, is that I have to keep an entire framework of blog-generating software around. This was less of a problem with Pelican, which is smaller, but still a limitation.

Enter Hugo. Hugo is (yet) another static blog generator, this time written in Go. While I don't have to touch any of the Go code myself, being compiled has its benefits: Hugo is blazingly fast. It also doesn't enforce any particular directory structure or configuration format (the default being TOML).

Setting everything up has been a breeze. I write my posts in Markdown and commit them to a GitHub repository, which automatically notifies a build system that downloads everything and generates the blog. The build system then uploads the generated files to the web server and *bam*, we're live.

I will keep this blog on a temporary domain until I have managed to migrate over all my old posts. Or well, at least the ones I’d like to keep.

Update: I have finally begun migrating my old posts so this blog has replaced my old one. A lot of old content has disappeared in the migration. I’ve kept what little I liked as well as the posts that got a lot of traffic.