How to backup your postgres database on SpiderOak using Dokku

Now that we know how to setup a rails application using Dokku on a DigitalOcean droplet, it might be a good time to think about automating our database backups. If you haven’t read the first part, you should do it before reading any further.

Sure, you can enable weekly backups of your whole droplet on DigitalOcean (the cost is minimal), but for a database it is wiser to backup at least once a day. Let’s configure the whole thing. We are freelancers (or small development teams) and we are used to get our hands dirty and do stuff by ourselves. It’s not a question of not having enough money to pay someone else, it’s because we are smart and resourceful! See, it already feels better when we see it in this light!

We will use SpiderOak to store our backups. Their zero-knowledge architecture will make sure our data remains private.

UPDATE: Whilst SpiderOak is not free, they offer a 60-days free trial for 2GB storage (no credit card required). After that, the cost is $7 per month for 30 GB storage. Thanks to NoName in the comments for asking me to clarify this point.

Create an account on SpiderOak

We will first install the client on our local workstation and create our account.

On the SpiderOak page, click on downloads
Click download link

Then, choose the correct client for your distribution:
Choose your SpiderOak client

Run the installer. You should be presented with the following screen:
Enter your info and create your SpiderOak account

Next step is to register your local computer with SpiderOak.
Choose your SpiderOak client

Finally, you will be presented a screen to select what you want to sync from your local computer to the cloud. You can leave the default options for now:
Choose your SpiderOak client

We won’t use the SpiderOak “Hive” folder

SpiderOak creates the SpiderOak Hive folder in the installation process. All files added to the Hive folder of a device are automatically synced to the Hive folder in every other devices. It is a convenient way to have things running quickly without configuring shared folders manually. One problem of using the Hive for our backups is that it will sync everything. You put something personal in your Hive on your local computer and oops, it will be sent to your droplet! That sounds not very good to me. For this reason, we should disable the Hive Folder syncing.

Still on your local workstation, go to your SpiderOak preferences:
Where are the preferences? Here!

And disable the hive:
Disable the hive

Note that if you don’t mind syncing your personal Hive on your DigitalOcean droplet, you can leave the option enabled.

Add your droplet as a SpiderOak device

Log to your DigitalOcean droplet by typing:

Open your sources.list file

And add the following line at the end:

Save, exit and run

If you get the following error:

Look at it straight in the eye and IGNORE IT without showing mercy.

You’re now ready to install SpiderOak

We must now configure SpiderOak but we don’t have any GUI on our server. What will we do? Simple, we just run the following command:

You will have to provide your SpiderOak login info.

Don’t type any number. Simply press Enter as suggested to set up a new device. It will ask for the name of the device. Enter a descriptive name, something like myapp-droplet. Wait until the end of the syncing process. It may take several minutes, be patient!

Let’s create a folder for our DB backups

Then we include this folder in SpiderOak:

The output should look like this:

Great, SpiderOak is all configured! Time to setup our database backups.

Create a shell script

Create a new file in /home/dokku and name it backup_db.sh. Paste the following:

Give the correct permission to the file:

As you can see, we use our Dokku postgres plugin to dump our db and we gzip the result in our db_backups folder. Then we run SpiderOakONE with the –batchmode flag to make it do its thing and shutdown immediately after.

Setup a cronjob

To automate our DB backups, we’ll add a cronjob.

Add the following line, save and exit:

It will run our backup script at 5am everyday. That’s all we need for now. Hmm… perhaps you don’t want to wait at 5am just to test if the script works. In this case, run the script directly.

The call to “SpiderOakONE –batchmode” will probably make this command run slowly. I don’t know what SpiderOak is doing exactly but sometimes it can take several minutes to complete the syncing.

Once it finally completes, go back to your local workstation to see if you can find your backup.
Your backup is here!

If you want, you can make sure that you are able to restore your backup before calling it a day (have a look at the dokku psql:restore command to that end). Restoring postgres databases usually gives of warnings but it’s generally safe to ignore them. Still, you’re better to make sure everything work as expected.

That’s it! You now have automated database backups on a zero-knowledge cloud architecture. Hope you enjoyed this tutorial! As usual, your comments are much appreciated.

  • NoName

    You need to mention that SpiderOakONE is not FREE (however it has FREE 60-days Trial period).

  • Frank

    You’re right, I’ve updated the post.

  • Dave Porter

    Thanks for the article.
    As someone who already has Dropbox. Google Drive, OneDrive and iCloud, it seems unnecessary to pay for yet another cloud storage – is there an option to use one of the above instead?
    Cheers, Dave

  • Frank

    Dave, I don’t know if all these cloud providers you listed can be installed and configured easily on a server without a GUI, but if it’s not the case, you could try setting it up using X11 forwarding http://www.cyberciti.biz/tips/running-x-window-graphical-application-over-ssh-session.html

    That’s said, I guess you would still need a way to have your cloud client run as a daemon on your server so it can sync automatically without your intervention.

    There is also the ownCloud open source option that let you host your files yourself (or find a trusted and free provider). Maybe it would be a nice alternative to SpiderOak but I did not try it myself so I cannot say more.

  • http://www.elsotanillo.net Juan Sierra Pons

    Hi,

    I think it is better to do not gzip your database dump as compressing your data doesn’t help the de-duplication process as SpiderOak client will have to calculate the new block each time

    https://spideroak.com/faq/how-do-i-get-the-best-backup-deduplication-from-compressed-files
    https://spideroak.com/manual/backup-explained
    http://www.elsotanillo.net/2011/09/backing-up-a-cpanel-hosting-account/

    Without the compression your first backup will take long time but the next ones will be quicker as SpiderOak client will sent only the differences between days.

    So you will have 2 advantages:
    * Less backup time.
    * Less space occupped on your SpiderOak account.

    Hope it helps!

    Best regards

  • Frank

    Interesting, Juan! I’ll certainly try it to see how it goes. Thanks

  • Karim Tarek

    Thanks for the post Frank.
    I wonder why not use something like ‘whenever’ and ‘backup’ gems and host the DB dumps on Amazon S3? have you tried this setup before?

  • Frank

    That’s interesting, thanks for the suggestion Karim! Might be the subject for another post :) I chose SpiderOak mainly because of their zero knowledge architecture. I’m impressed by their commitment on privacy.

  • Karim Tarek

    Frank, Thanks for getting back. I appreciate it :)