Rclone is rsync for Cloud Storage

For many years now I've been using OneDrive on my Windows machines. Using it is central to how I manage all my important data:

1) Getting receipts, documents, tax records, payslips, all backed up to the cloud, in case my machine breaks or the storage drive fails. Back in my university days, I would keep all documents from my classes and assignments there as well.

2) Keeping data synced across computers, so I can start working on a document on any of them and then continue working on the same document from anywhere else.

3) Syncing all of this data to my mobile phone for convenience, which includes both: uploading receipts from my phone banking apps for later reference; and fetching some document from the cloud and sharing a copy in some messaging app.

With Linux becoming more central to my personal computer usage, I've been looking into how I could adapt this workflow. OneDrive does not have an official Linux client, so looking on GitHub I found onedriver, a volunteer open-source project by Jeff Stafford aiming to bring OneDrive to Linux.

It works by mounting a virtual file system on a directory of your choice, intercepting all VFS (Virtual File System) calls there and converting those to HTTP requests to the official OneDrive HTTP API. After a few stabs at it and some time figuring out systemd units, I got it working and auto-starting after login, and it has mostly been OK, but not great.

For reading files it has served well, but one time I was working on a LibreOffice document from Linux -- and I'm pretty sure I saved it multiple times while I worked on it -- and the file was completely lost with no warning messages. This broke my trust in using onedriver for critical work, as I had never lost data in years of using Word paired with OneDrive (even in the worst case, the file would still be on the original computer and not synced, but never actually lost).

Since then I've fallen back to using the web interface for OneDrive on Linux, which is not as seamless as just having it on the file system. Still, the fact that you can mount a virtual drive on Linux and choose how to interpret the file system calls to list, read and write files programmatically has not ceased to amaze me. My understanding is that this functionality is provided by FUSE (Filesystem in Userspace), a kernel module that lets a program attach “handlers” to those VFS operations, which in turn run whatever code -- say an HTTP request or an FTP transfer -- instead of hitting a physical block device (fusermount3 is the helper program that performs the actual mount and unmount). It's the embodiment of the Unix philosophy of “everything is a file”!
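To make the idea of “handlers” more concrete, here is a toy sketch in Python. It does not use any real FUSE binding: the class names `InMemoryRemote` and `ToyFilesystem` and their methods are made up for illustration, and a dictionary stands in for the remote service. It only shows the shape of the idea, with one method per file system operation, each free to run whatever code it wants.

```python
from typing import Dict, List


class InMemoryRemote:
    """Stand-in for a remote service (an HTTP API, an FTP server, ...)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def list_names(self) -> List[str]:
        return sorted(self._objects)

    def download(self, name: str) -> bytes:
        return self._objects[name]

    def upload(self, name: str, data: bytes) -> None:
        self._objects[name] = data


class ToyFilesystem:
    """Hypothetical handler set: one method per file system operation."""

    def __init__(self, remote: InMemoryRemote) -> None:
        self.remote = remote

    def readdir(self, path: str) -> List[str]:
        # A real handler might send an HTTP "list" request here.
        # This toy has a single flat directory, so `path` is ignored.
        return self.remote.list_names()

    def read(self, path: str) -> bytes:
        # A real handler might download the file contents here.
        return self.remote.download(path.lstrip("/"))

    def write(self, path: str, data: bytes) -> int:
        # A real handler might upload the new contents here.
        self.remote.upload(path.lstrip("/"), data)
        return len(data)


def demo() -> List[str]:
    fs = ToyFilesystem(InMemoryRemote())
    fs.write("/notes.txt", b"hello")
    fs.write("/receipt.pdf", b"%PDF")
    return fs.readdir("/")
```

In a real mount, the kernel would be the one calling these methods whenever a program such as `ls` or `cat` touches the mounted directory; here `demo()` calls them directly.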

Well, fast forward a few weeks, and I found a tool which seems to be a real gem: rclone. It positions itself as rsync for the cloud. Here are some of its capabilities:

Firstly, it lets you sync a directory tree between your machine and a cloud-backed storage, such as Amazon S3, Google Drive or Azure Blob Storage (there are many options), in a similar fashion to how rsync lets you sync the changes between two directory trees.

Secondly -- and that's the killer feature in my eyes -- it lets you mount this cloud-backed storage as a virtual drive on your file system! Great, that is just what I've been looking for.
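As a rough sketch of what those two capabilities look like in practice, the Python helpers below only build the corresponding rclone command lines and print them, without running anything. The remote name `azblob:documents` and the local paths are hypothetical placeholders; a remote has to be created first with `rclone config`. The helper names `sync_command` and `mount_command` are mine, not part of rclone.

```python
import shlex
from typing import List


def sync_command(source: str, dest: str, dry_run: bool = True) -> List[str]:
    """Build an `rclone sync` command.

    `rclone sync` makes dest match source, which can delete files in dest,
    so this sketch defaults to --dry-run (report only, change nothing).
    """
    cmd = ["rclone", "sync", source, dest]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def mount_command(remote_path: str, mountpoint: str) -> List[str]:
    """Build an `rclone mount` command with write caching enabled."""
    return ["rclone", "mount", remote_path, mountpoint,
            "--vfs-cache-mode", "writes"]


if __name__ == "__main__":
    # Hypothetical remote and paths, for illustration only.
    print(shlex.join(sync_command("/home/me/Documents", "azblob:documents")))
    print(shlex.join(mount_command("azblob:documents", "/home/me/cloud")))
```

Once the dry run output looks right, the same list could be passed to `subprocess.run` with `dry_run=False`.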

I got a Blob Container spinning on Azure and, within a few minutes, I was able to flawlessly sync files into it and out of it with no surprises. The only thing it didn't seem to handle (or that I haven't yet figured out how to enable in the configuration) is what happens when two computers attempt to edit a file concurrently. What happened was that whichever computer saved last overwrote the changes made by the other.
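The behaviour I observed is what is usually called “last writer wins”. The toy simulation below is my own illustration of it, not how rclone or Azure are implemented: `BlobStore` is a made-up in-memory stand-in for the container, and the two “computers” are just two local variables.

```python
from typing import Dict


class BlobStore:
    """Made-up in-memory stand-in for a cloud container."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, name: str) -> str:
        return self._data.get(name, "")

    def put(self, name: str, content: str) -> None:
        # No version check: whatever arrives last replaces what was there.
        self._data[name] = content


def simulate_concurrent_edit() -> str:
    store = BlobStore()
    store.put("report.txt", "draft")

    # Both computers open the same version of the file.
    copy_a = store.get("report.txt")
    copy_b = store.get("report.txt")

    # Each one edits its own copy and saves; B happens to save last.
    store.put("report.txt", copy_a + " + edit from A")
    store.put("report.txt", copy_b + " + edit from B")

    return store.get("report.txt")
```

Calling `simulate_concurrent_edit()` returns `"draft + edit from B"`: the edit from A is silently gone, which matches what I saw.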

Well, I'm excited about rclone because it is a project that is more than 10 years old at this point and is fully open-source and volunteer-driven. I like to keep my toolbox minimal and get to know each tool very well, so rclone not having a “pricing” page and not being VC-backed makes me expect it to still be there for decades to come without pulling the rug out from under its user base.