Breakage

I decided to upgrade from Movable Type version 4 to 5 today. Unfortunately, I'm suffering some breakage as a result.

For starters, the unity-tricolor theme I was using doesn't seem to work properly in MT 5, and I can't seem to find an MT 5 version to restore the old layout. Hopefully I'll find something to replace this ugly red style soon.

I also decided to migrate the blog's comments to Disqus, hoping I'll get more participation if I use a comment system that has wider adoption and more login options. But the Disqus MT plugin is for MT 4, and there is some flakiness I need to correct.

I'll try and get this back in shape ASAP.

asterisk-func_dns Update

I pushed some updates to my Asterisk func_dns module today.

These updates enable func_dns to build against (at least) Asterisk 1.8, since 1.4 has long since been deprecated. Also, I bastardized the Makefile a little bit to automatically detect a /usr/lib64 Asterisk modules directory and use it as the installation path instead of the old hardcoded /usr/lib/asterisk/modules directory.
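
For anyone curious, the detection logic boils down to "prefer the lib64 modules directory if it exists, otherwise fall back to the old path." The sketch below expresses that in Python rather than make; it's an illustration, not the actual Makefile, and the `root` parameter is a hypothetical addition so the check can be pointed at a scratch directory.

```python
from pathlib import Path

# Candidate Asterisk module directories, checked in order: the lib64 path
# used on many 64-bit distributions first, then the old hardcoded default.
CANDIDATES = ("/usr/lib64/asterisk/modules", "/usr/lib/asterisk/modules")


def asterisk_module_dir(candidates=CANDIDATES, root="/"):
    """Return the first candidate directory that exists under root.

    Falls back to the last candidate (the old /usr/lib path) if none exist.
    """
    for candidate in candidates:
        if (Path(root) / candidate.lstrip("/")).is_dir():
            return candidate
    return candidates[-1]
```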

I wish Asterisk shipped a pkg-config file, but I did not see one among the files installed by the EPEL 6 Asterisk package.

Thank You Sprint!

Yesterday, my Sprint Nexus S 4G stopped sending or receiving calls or text messages. The phone is running stock Sprint Android ICS software.

This occurred after a round of manual application updates to apps like Facebook, Skype, iMO and others. My initial suspicion (after trying to reboot the phone) was that one of these apps and its new permissions might have installed some kind of hook that disrupted my calling abilities. That was probably a stretch, but it was easy enough to try uninstalling these apps.

That didn't work, so I pursued information on the net that suggested updating the phone's profile (and its PRL) and uninstalling Google Voice (though my copy was factory preloaded).

I thought that I might be experiencing a Sprint network problem (despite the fact that data service worked fine) so I decided to sleep on it and see if it was working in the morning.

No dice, so I did a factory reset. I was annoyed that I'd need to replace all of my apps and configuration after said reset, but not as annoyed as I was when I discovered that it still wasn't working.

Finally, I got to another phone and called Sprint support. At this point I wasn't expecting a carrier outage, as an outage lasting this long in the telecommunications industry can get the carrier in some regulatory hot water. To my surprise, I was told there was an outage in my area, and that Sprint had lost communication with the tower. I suppose that explains why the phone indicates full signal strength even though I can't make calls.

I've been assured that it will be fixed in the next two hours. Time to work on restoring my phone. :(

Super Duper Update!

It's now 5 hours after being told it would be fixed in two hours, and Sprint's connection to their tower is still not working!

Super Duper Super Duper Update!

It's now almost 11 hours since Sprint's first ETR (or at least the first one I was given) and the phone has been unusable in my area since Sunday afternoon (an outage of over 24 hours at this point). The new ETR I have been given is for 3:30 AM, 17 hours and 30 minutes after the first ETR.

Oh Sprint, you continue to be so amazing!

At work, our app is hosted on a pair of Internet connections from different upstream providers. We have incoming and outgoing SIP calls, incoming web traffic, incoming and outgoing e-mail, and incoming and outgoing web service calls.

We wanted to be able to load balance all of these functions with failover, and we have a philosophy of simultaneously utilizing resources from all of our available routers and connections. That allows us to avoid situations in which the failover system or failover circuit does not function as intended when it becomes the active master.

We use a variety of load-balancing proxies and techniques to allow these various services to function reliably.

What we're not using is BGP anycast, mainly because doing so requires a /24 (a Class C) of provider-independent IP space and an AS number, and IP space is getting harder to come by. Instead, we use DNS-based load balancing and failover from DynECT for all of our inbound traffic.

Our network consists of Linux-based routers/load balancers running in parallel. In order to load balance our outbound Internet traffic, all that traffic goes through proxy servers which are looked up on our internal DNS. Here, we utilize a 5 second TTL for fast failover.

Finally, we have a load balancer probe which continually tests the uptime of our backend servers and our Internet circuits.

One time, we discovered a flaw with this system. We performed uptime monitoring of our outbound Internet circuits by pinging the default gateway. In many cases this check was sufficient, but if our Internet provider was experiencing loss of connectivity at a level beyond our next-hop, this strategy failed.

A tempting solution would be to ping something out on the Internet, but that means tying our reliability to the uptime of what we are pinging. We also weren't too sure about constantly pinging someone we hadn't already made arrangements with.

It turns out there is a better way. DynECT is constantly requesting web pages from each of our external load balancers in order to decide whether to publish that load balancer's IP address for our domains. We realized that we could monitor the frequency of these requests: if a particular load balancer stopped receiving them, that server could remove its own internal IP from the DNS A record for our internal proxy servers.

Of course, using this approach meant that if we stopped getting requests from DynECT due to a problem on their end, we could generate an outage where we were trying to prevent one. In order to build in some redundancy, we upgraded our Pingdom account and created a check for each server/Internet circuit. Now, if we're not hearing from either DynECT or Pingdom on a given circuit, we consider that circuit offline.
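
Here's a rough Python sketch of that decision rule: a circuit counts as online if either monitoring service has reached it recently, and only online circuits keep their internal proxy IP in the DNS record. The 120-second window, the circuit names, and the `CircuitMonitor` class are all hypothetical; hooking `record()` into your web server and actually pushing the DNS change are left out.

```python
import time

# Hypothetical silence window (seconds): how long a monitoring source may go
# unheard on a circuit before it counts as silent. Not a value from this post.
SILENCE_WINDOW = 120.0


class CircuitMonitor:
    """Track inbound health-check requests per circuit from external monitors."""

    def __init__(self, sources=("dynect", "pingdom"), window=SILENCE_WINDOW,
                 clock=time.monotonic):
        self.sources = tuple(sources)
        self.window = window
        self.clock = clock
        self.last_seen = {}  # (circuit, source) -> timestamp of last request

    def record(self, circuit, source):
        """Call whenever a health-check request from `source` arrives on `circuit`."""
        self.last_seen[(circuit, source)] = self.clock()

    def is_online(self, circuit):
        """Online if ANY source has reached this circuit within the window."""
        now = self.clock()
        return any(
            now - self.last_seen.get((circuit, s), float("-inf")) <= self.window
            for s in self.sources
        )

    def proxy_addresses(self, internal_ips):
        """Internal proxy IPs (circuit -> IP) that should stay in the A record."""
        return [ip for circuit, ip in internal_ips.items() if self.is_online(circuit)]
```

Requiring silence from both sources before pulling a circuit is what keeps a DynECT-side hiccup from turning into a self-inflicted outage.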

Since the implementation of this solution, we have experienced conditions in which the older "I can ping my default route" check did not trip, but our Internet circuit was nonetheless offline. But our new WAN monitoring solution reliably catches the problem and brings the affected circuit out of service correctly.

Many Linux/BSD users are now hosting their dotfiles in git repositories. This scheme allows you to quickly deploy your favorite system configuration to a new server on which you've been given an account, letting you get bash, vim, screen or whatever utilities you use most working exactly as you prefer them with a minimal amount of fuss.

I started following this approach and have been doing so successfully for months.

In order to make deployment as easy as possible, I wrote a simple "apply" script in bash which would symlink desired configuration files into place, and automatically add an include line to the local server's bashrc which would include my global bashrc, so that my settings would mix in gracefully with operating system defaults.
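
My script is bash, but the idea is simple enough to sketch in Python. This is a hypothetical reconstruction, not the script in the repository: it assumes the dotfiles sit at the top of the repo under their final names (`.vimrc`, `.screenrc`) and that the global bashrc is a file named `bashrc` in the repo root. Running it twice should change nothing.

```python
from pathlib import Path


def apply(repo, home, files=(".vimrc", ".screenrc")):
    """Symlink dotfiles from repo into home and hook the global bashrc in once."""
    repo, home = Path(repo), Path(home)
    for name in files:
        source, target = repo / name, home / name
        if target.is_symlink() and target.resolve() == source.resolve():
            continue  # already linked to our copy
        if target.is_symlink() or target.exists():
            # Keep whatever was there before as <name>.orig.
            target.rename(target.with_name(name + ".orig"))
        target.symlink_to(source)

    # Append an include line to the local ~/.bashrc exactly once, after the
    # existing contents, so the OS defaults still load first.
    global_rc = repo / "bashrc"
    include = f'[ -f "{global_rc}" ] && . "{global_rc}"\n'
    bashrc = home / ".bashrc"
    existing = bashrc.read_text() if bashrc.exists() else ""
    if include not in existing:
        with bashrc.open("a") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(include)
```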

I published a subset of my shared-env repository on GitHub to help anyone who wants to save a little bit of time getting a skeleton in place.

My shared-env contains a few handy vim plugins (including localvimrc and my own makesd) and the really cool z.sh, a script that tracks your directory history and lets you jump around it with tab completion.

How to Fuck Up a (Re)launch

My credit union recently decided to redo its online banking portal. I want to like it, but the first taste in my mouth is bitter.

On Tuesday, the old portal was canned, and they posted an announcement that logins would be disabled as they worked on the upgrade. Indeed, I was unable to access my banking portal until mid-afternoon Wednesday.

Upon attempting to log in, I fed in the same credentials I had been using for years. The credentials were immediately rejected, so I tried again, assuming I must have fat-fingered my password.

Instead of getting in, the system locked me out of my account. I started digging through their FAQ, which was helpfully linked all over the place. The FAQ implied something about a temporary password but did not specify what it might be.

When I returned to the home page, I noticed an "Alert" icon, and upon hovering over it, was advised that my temporary password was the last four digits of my social security number and my birth year.

As a developer who has put time into smoothing over relaunches before, I was a bit miffed. It's perfectly possible to check old passwords (even when you have properly implemented salting and hashing). It's also perfectly possible to force the user to change their password on first login into your new system, so that you can store it in whatever new database table you want, using whatever new encoding scheme you like.
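
To make that concrete, here's a minimal Python sketch of migrating credentials at first login. The legacy scheme (salted SHA-256) and the record layout are assumptions for illustration, and PBKDF2 stands in for "whatever new encoding scheme you like." The old hash is checked once, and on success the password is immediately re-encoded.

```python
import hashlib
import hmac
import os


def legacy_hash(password: str, salt: bytes) -> str:
    # Hypothetical legacy scheme: hex SHA-256 of salt + password.
    return hashlib.sha256(salt + password.encode()).hexdigest()


def new_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for the new system's scheme.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)


def login(user: dict, password: str) -> bool:
    """Verify against whichever scheme the record uses; upgrade legacy records."""
    if user["scheme"] == "legacy":
        ok = hmac.compare_digest(legacy_hash(password, user["salt"]), user["hash"])
        if ok:
            # We now know the plaintext is correct, so re-encode it.
            user["salt"] = os.urandom(16)
            user["hash"] = new_hash(password, user["salt"])
            user["scheme"] = "pbkdf2"
        return ok
    return hmac.compare_digest(new_hash(password, user["salt"]), user["hash"])
```

Forcing a password change would hook in at the same spot, right after the legacy check succeeds. Either way, nobody's password has to be overwritten with something derived from their SSN.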

Put aside the fact that their new system couldn't validate my old login credentials. They overwrote everyone's password with the last four digits of their social security number, combined with their birth year. As if either string is really a secret. Everyone on the planet asks you for the last four digits of your social security number, or in some cases, the full number. If a user is not actively participating in online banking, will these temporary passwords ever expire?

Of course, I couldn't access my account using the new-found information, because two password attempts was enough to lock me out of my account. And again with their helpful suggestion: either use the "Forgot Password" mechanism to unlock your account, or contact support.

When I tried the former, I was given the option to validate my identity by SMS or a phone call. In both cases the first 6 digits of the phone number were masked (e.g., XXX-XXX-1234), and in both cases, I didn't recognize the trailing 4 digits. Now I was concerned about how my credit union had two bogus phone numbers linked to my account.

So I called support, and waited on hold for half an hour. The support person was friendly enough, especially given the fact he must have been taking hundreds of similar calls. My account was unlocked, and again I was advised that a new temporary password had been issued.

I made sure to try and plug this password in before getting off the phone. If something didn't work out, I did not want to sit on hold again.

This time, the portal urged me "for my security" to close my browser and re-open it, because I had an "active banking session". Sounds like someone has a nasty cookie bug and decided to paper over it with some baloney about how secure they are being.

No matter, I elected to log in on the laptop. Seeing that I had finally made progress, I bid goodbye to the support person and moved on. Now I was being required to set a new password, which had sensible requirements for character classes but had to be between 8 and 12 characters. Why on earth would you limit my online banking password to 12 characters?

At the next screen, I was forced to change my user ID. It used to be a member number, but now I had to pick a user ID with no special characters, but with a mandatory number. "For my security."

The next screen had me entering phone numbers and other means to restore access to my account in the event I lose my password. They asked me to fill out some dreaded security questions. No one could ever Google my mother's maiden name. I'm feeling really secure!

After finally completing the new registration process, I was booted back to a login screen and asked to use my new credentials. Twice, I tried, and twice, the credentials I had just created failed to allow me in to the system. Again, my account was locked out.

Thankfully, on my second attempt at dealing with their garbage, the system decided to remember my password. And it even let me log in!

Of course, a credit union portal is only useful if my account balances, transaction history, or other banking services are available. The landing screen was full of promise -- "New mobile app!" "Track your spending!" But none of it worked. All of the informational views contained errors.

The portal is still fucked as I speak. I actually wanted to know my balance, so I decided to try the telephone access system, which I used frequently in prior years when the online banking portal wasn't close at hand. Turns out that has also been overhauled, and that doesn't work either.

I got my last laugh when I landed at a new login screen after my session timed out. Never mind the fact that they have so many different login screens (excellent for training your users to be cautious of phishing attempts). It turns out I should be using Internet Explorer, Firefox, or Safari "For my security". I suppose I'll just toss this Chrome garbage for IE. You know, "For my security".

Hey, maybe their new system is awesome. After spending an hour just trying to get in, it sure would be great to find out. In the meantime, I can't help but think "For your security" is the newest excuse of the mighty Bastard Operator From Hell.

Update 08/24/2012

I don't always specifically name the subject of my grievances, since I try to use my criticism constructively, if only as an example of what not to do. But Texans Credit Union has fucked up too badly for me to keep my mouth shut. It is now Friday, and after 4 business days of being completely unavailable, the Internet banking portal is still offline, and I still can't access the account information hotline. To make matters worse, both the main customer service number and the account information hotline are currently offline. That is to say: they aren't taking calls. The phone company plays a prompt as if their telco equipment isn't even acknowledging the call attempts. No ringback.

I know, I know.

I'm preaching to the crowd. But I suspect I may not be preaching to the converted.

Everyone has heard the mantra since the beginning of their computing lives: back up your important data! But it's incredible to me how many computer literate people still get it wrong. Even when they give the right advice to their novice friends, they fail to implement the correct strategies in practice.

When I was a teenager, I accidentally ruined a friend's semester-long research project. I was helping him install a new hard drive. He didn't need my help, but back then it was still really exciting to get a new drive, to crack open the case and to install new gear. He decided to trust my butterfingers with the task of reconnecting power to his old drive. Hard drive Molex power connectors are supposed to be impossible to insert backwards, but I couldn't see into his case to monitor what I was doing, and managed to connect it backwards anyway.

He had no backups, and after paying hundreds to a data recovery expert, he never recovered his project. Sorry David!

More recently, a friend encrypted his laptop hard drive and discovered that he could use Unicode characters in the password. The only problem was that while the UI accepted those characters in the screen where your password is set, it did not accept those characters on the screen where you decrypt and log in. Whoops.

This weekend, my RAID array finally decided to crap out. I'd been operating a RAID 10 made of 4 Seagate 1 TB drives. All of the drives had been through at least one RMA cycle. (They don't make these things as reliable as they used to. But it's definitely not just Seagate.) After dealing with rounds of RMAs, I got lazy and ran the array in degraded mode for months. Once a RAID 10 has lost a drive from each mirrored pair, it is effectively a RAID 0. A RAID 0 is more dangerous than a bare hard drive, since the failure of any single disk in the stripe will destroy the array.
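
A tiny model makes the point: treat the array as mirrored pairs striped together, and it survives only as long as every pair keeps at least one working disk. The device names below are made up.

```python
def raid10_survives(pairs, failed):
    """A RAID 10 stays readable while every mirrored pair keeps a working disk."""
    return all(any(disk not in failed for disk in pair) for pair in pairs)


def fatal_single_failures(pairs, already_failed):
    """Which one additional disk failure would destroy the array?"""
    remaining = [d for pair in pairs for d in pair if d not in already_failed]
    return [d for d in remaining
            if not raid10_survives(pairs, set(already_failed) | {d})]


# Four disks as two mirrored pairs, striped together (hypothetical names).
PAIRS = [("sda", "sdb"), ("sdc", "sdd")]
```

With a healthy array, no single failure is fatal. With one disk gone from each pair, every remaining disk is a single point of failure, exactly like a RAID 0.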

I'm not totally irresponsible, mind you. I have nightly automatic backups of my most critical data, courtesy of rsync.net and Duplicity. I even tested the backups after setting them up.

But what I did not test is that all of my expected files were present in the backup. I misconfigured Duplicity and caused it to ignore my most critical directories - those with the source code of everything I've been working on for many years.

Many copies of these projects exist at different places - work, GitHub, etc. But there were some critical projects that were either not up to date, or not mirrored anywhere at all.

You can imagine my swearing and my feelings of panic. Thankfully, after I calmed down and carefully reassembled the RAID array, I was able to recover everything of importance. Nowadays, Linux is pretty resilient when reading from failing media.

Always back up your important data. Do it regularly. Back it up in more than one way. Don't assume your RAID array will save your data. Don't assume your backups will work, or will contain everything you need. Don't run your RAID arrays in degraded mode, don't let your backup process fail and not fix it, and don't forget to test your backups!
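
Checking that a backup contains what you think it contains can be automated. The sketch below assumes you've already extracted a list of relative paths from your backup tool. For Duplicity, `duplicity list-current-files` prints the latest backup's contents, though you'd need to strip the timestamp column and should verify the output format on your version. The critical directory list is hypothetical. The check compares whole path components, so `home/me/srcold` won't be mistaken for `home/me/src`.

```python
from pathlib import PurePosixPath

# Hypothetical directories that must appear in every backup.
CRITICAL = ["home/me/src", "home/me/Documents"]


def missing_from_backup(critical, backed_up_paths):
    """Return critical directories with no backed-up path at or below them."""
    paths = [PurePosixPath(p) for p in backed_up_paths]
    missing = []
    for d in critical:
        root = PurePosixPath(d)
        if not any(p == root or root in p.parents for p in paths):
            missing.append(d)
    return missing
```

Run something like this after every backup and alert on a non-empty result, and a misconfigured exclude shows up the next morning instead of the day a disk dies.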

Implementing backups is boring. Testing backups is really boring. But some day you might be really glad you did.

Modifying Your Sports Car

I want a fast car!

When I was 15 I got a ride in a friend's WS6. In addition to looking great, it was my first experience with a fast car. I was hooked, and for a while I was convinced I'd get myself a WS6 when I turned 16.

At 16, of course, I didn't have tens of thousands of dollars to drop on a new car. I was very fortunate that my parents bought me a used Honda Prelude, and I showed my gratitude by totaling it a month later when I tried to negotiate a turn too quickly in the rain. If there is anything I learned from the Prelude (other than the fact that it really sucks to wreck your new car), it was that I could perform simple maintenance on a car, such as an oil change, in my garage.

16, and my high school years in general, occurred during the rise of the rice-burner in America. Imports outnumbered domestics at my school and every morning and afternoon featured a soundtrack of coffee can exhausts farting in and out of the parking lot. There was a giant pissing match between everybody and everybody else over whose car was faster.

Most of us were just dreamers with no means. I replaced the Prelude with a Nissan 240SX and set to work modifying it, or rather, dreaming of modifying it. I was going to convert it to a Turbo Silvia. But I didn't have any money.

Fate (and my own stupidity) eventually took hold and I smashed the 240SX too. I ended up in a cheaper 5-speed Nissan Sentra. The biggest "mod" the Sentra ever received was a transplant of a decent custom stereo that was once installed in the 240SX. But as I drove this beater over the years, I swapped two clutches, and practically the whole ignition system. I started to gain real confidence that I knew how to work on cars.

Eventually, as a young adult, I took out a loan and swapped the Sentra for a Mazda Miata. I didn't have a lot of money, but I had some, and I was soon pouring money into modifying the Miata. It seemed like the dream I always had about upgrading a car was finally coming true. I was so excited that I barely noticed when a professional mechanic advised me not to get into heavily modifying my daily driver, especially if I wasn't a professional mechanic prepared to keep it running. I ignored this advice again when a friend with a heavily modified 350Z explained his regrets at having destroyed the car's reliability.

I suppose I thought that they were part of a "club" that didn't want new members. In reality, they were just ensuring they would eventually get a big "I told you so!"

I'll build a fast car!

I started with a simple tune-up. The car ran a little smoother, but it was no faster.

I added custom seat heaters. They were a nice mod. But it was no faster.

Being a Texas driver, my Miata was often stuck in traffic on hot days. This led to a number of incidents when the car would overheat and require me to switch from comforting A/C to blistering heat. In went my first high-dollar mod, a thousand-dollar aluminum race radiator combined with an oil cooler. I dropped another $400 on a custom fan shroud with two large electric fans. I felt great about my investment, but I still had problems on hot days. And the car was no faster.

Naturally, any fast car should handle very well. The Miata is known for handling well from the factory, and for being very easy to drive, but I decided to supercharge the handling. I bought upgraded anti-roll bars, dropped 2k on a custom coil-over suspension and spent half a grand on 28 replacement suspension bushings from a performance part manufacturer in England.

I installed the coil-overs myself but was not prepared to tackle the massive task of installing bushings alone. I paid a friendly mechanic with a brand new intake for his car (taking another $400 out of my own pocket) and we spent a day doing the bushings with a real lift and hydraulic press. Towards the end of the evening we were both tired, and when he started using the impact wrench to torque down my eccentrics while the wheels were still sagging, I didn't have much energy left to argue the merits of doing the job properly (while the car was resting on its own weight). Although all the bushings were now installed with preloaded stress, the car felt great, and though it didn't accelerate any faster, it probably cornered faster.

Never mind that broken part, I'll just keep upgrading!

Before the bushings started to fail, the custom Lexan fan shroud was cracking. This gentleman's home-built radiator shroud design was not well thought out. In addition to falling apart, the foam used to seal up the shroud around the edges was melting and forming a disgusting goo on the radiator. I should have stopped there and gotten a real fan shroud, but I was too interested in spending money on more fun parts for the car.

By this time it was clear that I wanted a supercharger. After seeing a deal online for $500 off on an MP62 kit, I put it on a credit card and ended up with a pristine, brand new supercharger in a cardboard box in my closet. Installing the supercharger would be a big ordeal and there was a lot of prep work I'd have to do.

I installed an upgraded header and heat wrapping. The car sounded a little better, and it may have been faster (by a very small amount). But I still wasn't ready to install the supercharger.

In order to support the load of the supercharger, I'd need a new performance clutch. I spent the money on a Stage 3 clutch and decided to get a lightweight flywheel to make the engine rev faster while I was at it. $1000 later I was underneath the car, swapping in the new parts. I thought I would go ahead and replace the rear main seal (which wasn't leaking oil) with a new one. Preventative maintenance and all.

Eventually I decided to get smart and run a compression test on the engine. Bad news -- I needed new rings. That was going to be expensive and would further delay my supercharger installation. But no matter, I was making progress towards my dream. I was going to finish building my fast car.

Unfortunately, dreams are often just that. My car was already less reliable than when I started. I needed frequent alignments to keep the lowered car driving correctly. I was going through tires. And before long, I noticed the car was leaking oil -- from the transmission bellhousing. My brand new $1000 clutch setup started to slip, and then slip more. A year after purchase, its warranty already having expired, I still had the supercharger sitting in a box in my closet.

I must have screwed up the rear main seal. I decided that I needed to replace the clutch. This time I paid a professional Miata shop to do the work. $1000 later I had a new clutch and a resurfaced flywheel. The oil leak had destroyed the old clutch and flywheel. At least the pros did it this time. They replaced my rear main seal again. Surely they would do it correctly.

My fan shroud was falling apart even worse than before. The car started making bad noises when I went over bumps and the handling began to degrade. My suspension bushings were beginning to fail, the result of cutting corners in the install procedure. I poured in yet more money to start replacing these parts as they went.

And then I noticed more oil leaking from the bellhousing. I should have twisted the shop's arm to fix the problem. Instead, I waited a year until the damage was already done.

It's just a freaking money pit!

My priorities started changing. The Miata is a fun car but there were new things in my life. I had turntables and a small but growing collection of music software. I started pursuing a more productive hobby.

I decided to hawk the brand new supercharger in my closet. I sold it at a loss.

I still have frame rail reinforcements, and a set of new rear drive gears for a different ratio. I still have top of the line sound deadening material I never installed. If I'm lucky, I'll be able to sell some of those parts at a loss too.

By now I was just trying to keep the car running as a daily driver. The clutch continued to get worse and I had to replace the $150 dry cell battery with another one after it lasted no longer than a standard battery. (The factory Panasonic battery the car came with was in no way worn out when I installed the first dry cell battery. It was just another "upgrade.")

On the way home from work one day, a driver in a Tahoe was too busy chatting on his cell phone to notice my Miata in the lane next to him, and so he slammed my car into a guardrail. Both of our insurance companies quickly reminded me that the thousands of dollars of custom work done to the car did nothing to increase its actual value. I knew this before I started modifying the car, but it was still a punch in the gut.

You told me so!

In the end, I caused the Miata enough problems that it actually became slower.

It's true what they say. Modifying a car is an expensive, time consuming proposition. Cars are already depreciating assets; adding custom parts to them makes it even worse.

When a car leaves the manufacturer, assuming the vehicle is of reasonable quality, it will have reasonable durability. Virtually everything you do to "customize" beyond the factory specs will negatively impact its longevity. Custom parts break much more often than factory ones, and custom parts often break the factory parts you plug them into.

If I had it all to do over again, I would have upgraded the stereo and installed the seat heaters. I wouldn't have upgraded anything else. I should have just fixed problems when they came up. But like so many things in life, I learned this lesson the hard way.

Super Special Update!

The heat shield caught on fire. It might have burned my car to the ground, and maybe my garage and some adjacent apartments as well, had I not happened to pop the hood right after it happened. I've been assured that the heat shield is rated to thousands of degrees.

I thought I'd post a quick tip for anyone upgrading a set of clients in a kerberized NFSv4 network. I'm in the process of pushing out CentOS 6 to a cluster currently supported by NFSv4 on CentOS 5 and my standard "setup krb5/nfsv4 client" script didn't leave me with a working client. Instead, I got this error on the NFS server every time I attempted the NFS mount:

gss_kerberos_mech: unsupported algorithm 6

or

gss_kerberos_mech: unsupported algorithm 23

Some advice pointed out that the keytab might need to be written out without the newer key types, but attempting to limit to des-cbc-crc did not fix the problem.

Instead, I found that the following settings in the [libdefaults] section of /etc/krb5.conf fixed my environment:

[libdefaults]
 # cventers: These overrides are TEMPORARY until we have abandoned CentOS 5
 default_tgs_enctypes = des-cbc-md5 des-cbc-crc arcfour-hmac-md5 arcfour-hmac-exp
 default_tkt_enctypes = des-cbc-md5 des-cbc-crc arcfour-hmac-md5 arcfour-hmac-exp
 permitted_enctypes = des-cbc-md5 des-cbc-crc arcfour-hmac-md5 arcfour-hmac-exp
 allow_weak_crypto = true

qpsmtpd plugins

As part of deploying a new Postfix-and-qpsmtpd based mail architecture at work, I have developed some qpsmtpd plugins and extended its native queue/smtp-forward plugin.

  1. filter/dkimsign: Signs e-mail using Mail::DKIM. There are other dkimsign plugins out there, but I wanted to take a stab at writing one myself.
  2. filter/header_whitelist: Possibly controversial, could break many things if misused. I wanted a way to clean up all the extra garbage version headers, etc added by the multitude of scripts and platforms generating email in our environment. If the mere existence of this plugin doesn't violate RFC2822 or e-mail best practices, certain configurations certainly would. Use with care.
  3. queue/smtp-forward: I have extended the stock plugin to support the Postfix XCLIENT verb. This allows qpsmtpd to pass information about the client (their IP and HELO, in particular) which Postfix can then use for access control and/or logging. I'll try and submit this back upstream.
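
For illustration, here are Python sketches of the two less common ideas above; they are not the Perl plugins themselves. The header whitelist is a hypothetical example and only touches top-level headers, not MIME part headers. For XCLIENT, Postfix expects attribute values in xtext encoding (RFC 3461). `ADDR` and `HELO` are the attributes mentioned above, and the sending host must be listed in Postfix's `smtpd_authorized_xclient_hosts` for the command to be accepted.

```python
from email.message import Message

# Hypothetical whitelist; what belongs here depends entirely on your mail.
ALLOWED = {
    "received", "from", "to", "cc", "reply-to", "subject", "date",
    "message-id", "in-reply-to", "references", "mime-version",
    "content-type", "content-transfer-encoding",
}


def strip_headers(msg: Message, allowed=ALLOWED) -> Message:
    """Drop every top-level header whose name is not whitelisted."""
    for name in {key.lower() for key in msg.keys()}:
        if name not in allowed:
            del msg[name]  # removes all occurrences, case-insensitively
    return msg


def xtext(value: str) -> str:
    """RFC 3461 xtext: '+', '=' and bytes outside 33..126 become +HH."""
    out = []
    for byte in value.encode("utf-8"):
        if byte < 33 or byte > 126 or byte in b"+=":
            out.append(f"+{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def xclient_args(addr: str, helo: str) -> str:
    """Argument string for an XCLIENT command carrying ADDR and HELO."""
    return f"ADDR={xtext(addr)} HELO={xtext(helo)}"
```

With `smtplib`, you could send the result as `smtp.docmd("XCLIENT", xclient_args(ip, helo))` and check the reply code. Order matters if you combine the whitelist with signing: removing a header that DKIM has already signed would break the signature, so the whitelist belongs before the signer.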

You can find the plugins at my GitHub page.