The past week (last Saturday to this Saturday) has possibly been… no, scratch that… has definitely been the worst week I’ve ever had since starting Simpli.
As my last blog entry states, October was our best month ever. We signed an avalanche of new dedicated server customers. We got a huge order for hardware. And pretty quickly — toward the beginning of October — I realized Russ and I, as the only two full-time employees at Simpli, weren’t going to be able to handle it all ourselves. I was working an extra 2-4 hours a day doing support tickets, and Russ was pulling 12 hours a day as well. We kept adding new customers, but we were already maxed out on available time to work at Simpli.
But I dreaded interviewing again. Then I remembered that Brandon had interviewed another candidate and seemed positive about him. So Russ and I pulled him in for another interview. After the interview, it was really a toss-up whether we should hire him, but I felt that we could use the help, and I was about to go insane with work overload.
Unfortunately, our new hire didn’t really know anything about Linux, despite the claims on his resume. Russ was frustrated by having to teach him basics like how to SSH into a server and restart a service. This was the week before last.
Last Saturday (8 days ago), the shit started to hit the fan. Meowcat, one of our shared servers, started having strange issues with PHP. Squirrelmail broke. To make matters worse, in an unrelated-to-meowcat incident, we tripped a power breaker at Market Post Tower.
Normally, a breaker trip is, well, not really a pleasant experience, but at least it’s an easy one to recover from. You move some servers onto another circuit breaker, turn the breaker back on, and everything works. At least that’s how it always happened at AboveNet.
Market Post Tower apparently doesn’t have the same safeguards that AboveNet does, because when we turned the breaker back on, a power surge hit our servers. That’s actually only our best guess at what happened, since all we know is that at least 8 servers failed within the next 24 hours, and they were all on that circuit. I mean complete failures — motherboard and PSU toasted. The machines wouldn’t even boot.
Russ and I started digging out the spare servers, and finally found enough to get everyone back online. We had to give 2 people some really nice free upgrades, which I’m sure they appreciated, and we ended up sending back over $3000 in servers for warranty replacement. We worked over 40 combined hours that weekend.
People were starting to complain about meowcat, and to make a really long story short, it took 5 people, over 50 hours of work, and 5 days to figure out what the problem was with PHP and Squirrelmail. We thought that there was an OS corruption, so we decided to move all 400+ sites on meowcat to another server. The RAID array on meowcat was also showing some flakiness, which hastened the decision.
The move was intense, culminating with me at the office at about 4:15AM one night last week (I think it was Thursday morning at that point; I’d only left the office since the previous Saturday to sleep, and I wasn’t getting much of that either) and Russ staying up all night that night to fix the issues that just kept cropping up. I left the office, having just worked a 16-hour day, and the stupid Squirrelmail/PHP issue was still not fixed. By now I knew it had to be a configuration issue, but I was too exhausted to track it down.
I finally made an appearance at the office again Thursday afternoon, having gotten a scant 4 hours of sleep. I found Russ passed out on the couch since he hadn’t slept at all yet. That day, Russ, Mooneer, and I all pulled our weight (Ben was off duty, so he wasn’t in the picture), but our new employee couldn’t fix any customer issues because he simply didn’t have the knowledge. Exhausted, I spent a few more hours fixing customer issues and listening to and reading complaint after complaint about meowcat’s issues. I knew at that point I had to let our new hire go, so I called him into my office that afternoon and dismissed him. He was a really nice guy, but the technical skills were completely missing, and we didn’t have the time to train someone. We needed someone who could hit the ground running and step in to fix urgent issues like those on meowcat.
In the meantime, I finally realized that the Cisco “guru” we had hired as a contractor had completely flaked out on us, and I sent him a termination letter as well. This left us in a rush to find a new Cisco contractor, which (hopefully) we have found this week. He’s supposed to meet with us Monday afternoon, which means we may finally get our new networking gear in shape and ready to deploy.
We’re also interviewing a new person on Monday, who hopefully will be a good fit for us. His resume looks promising. Russ and I have learned from our previous experience, and we plan to give him a short written test about some of the Linux fundamentals (for instance: what does the ifconfig command do?).
On Saturday (11/5), another circuit tripped, this time at AboveNet. Russ handled it with aplomb, though, and got everything back online so quickly that most of our customers didn’t even notice. (We are enacting a policy that if anything like that happens again, we’ll quickly notify everyone who was affected so they can confirm that everything is back online and working.) Meanwhile, I finally broke down under stress on Saturday afternoon. A couple of friends helped me through it, though, and for that I am grateful. I don’t often have stress breakdowns, but this time I pushed myself too hard for an entire week, and ended up in a bad mental state for a lot of this weekend. I pushed through it today (Sunday) and again spent a whole day working so that the support tickets would be cleared up for Mooneer in the morning. There is still a ton of stuff that needs to be done, but I feel like the balance is finally starting to tip in our favor, and once we hire another full-time person, I can stop manning the support desk and go back to just being sales and CEO.
What did I learn from all this? A couple really important things. First was a lesson relearned. C told me this a long time ago. He said he lets everyone on his team vote to hire or not hire a new person, and if even one person says “no”, he doesn’t hire that person. That’s smart. Hiring is tough, but it’s not impossible. A lot of it is gut instinct. My gut instinct told me our new hire wasn’t going to work, but I figured anyone was better than no one. That was really a bad assumption, and I won’t make that mistake again. Next time, we have to be 100% sure the new hire will be competent and will be able to do the job.
Second, I push myself way too hard sometimes. In an effort to shield Russ from breaking down under pressure, I sacrificed my own well-being (first sleep, and then a healthy mental state) by working crazy hours. What I forgot, though, is that Russ is perfectly capable of drawing those boundaries and pushing back on me when he’s had too much. It’s important for me to be able to take that step back and realize when I can’t push myself any more. I felt things starting to break down for me on Friday, and by Saturday afternoon I was a mess. It lasted until sometime really late last night — somewhere between SuperHappyDevHouse and bowling — when I finally started to relax and feel like a normal human being again.
The big, big, big lesson here is one I seem to get smacked with every once in a while. That one is: I can’t do this all myself. Of course, Simpli has long since grown from being just me. But again, this week, I tried to do it all myself, and make sure Russ was protected from the insanity that was going on. But it’s not my job to decide when Russ has had too much stress — I have to trust that he will tell me when he’s too overwhelmed to continue. And I have to trust my own instincts that tell me who to hire and where to go with Simpli. And sometimes, just sometimes, it’s okay to think about my own personal needs instead of constantly putting everyone else first. 😉
Here’s to next week. May it be a big step in the right direction. I (and Simpli) could definitely use that right now.