The past week (last Saturday to this Saturday) has possibly been… no, scratch that… has definitely been the worst week I’ve ever had since starting Simpli.
As my last blog entry states, October was our best month ever. We signed an avalanche of new dedicated server customers. We got a huge order for hardware. And pretty quickly — toward the beginning of October — I realized Russ and I, as the only two full-time employees at Simpli, weren’t going to be able to handle it all ourselves. I was working an extra 2-4 hours a day doing support tickets, and Russ was pulling 12 hours a day as well. We kept adding new customers, but we were already maxed out on available time to work at Simpli.
But I dreaded interviewing again. Then I remembered that Brandon had interviewed another candidate and seemed positive about him. So Russ and I pulled him in for another interview. After the interview, it was really a tossup whether we should hire him, but I felt that we could use the help, and I was about to go insane with work overload.
Unfortunately, our new hire didn’t really know anything about Linux, despite the claims on his resume. Russ was frustrated by having to teach him basics like how to SSH into a server and restart a service. This was the week before last.
Last Saturday (8 days ago), the shit started to hit the fan. Meowcat, one of our shared servers, started having strange issues with PHP. Squirrelmail broke. To make matters worse, in an unrelated-to-meowcat incident, we tripped a power breaker at Market Post Tower.
Normally, a breaker trip is, well, not really a pleasant experience, but at least it’s an easy one to recover from. You move some servers onto another circuit breaker, turn the breaker back on, and everything works. At least that’s how it always happened at AboveNet.
Market Post Tower apparently doesn’t have the same safeguards that AboveNet does, because when we turned the breaker back on, a power surge hit our servers. That’s actually only our best guess on what happened, since all we know is that at least 8 servers failed within the next 24 hours, and they were all on that circuit. I mean complete failures — motherboard and PSU toasted. The machines wouldn’t even boot.
Russ and I started digging out the spare servers, and finally found enough to get everyone back online. We had to give 2 people some really nice free upgrades, which I’m sure they appreciated, and we ended up sending back over $3000 in servers to be warranty replaced. We worked over 40 combined hours that weekend.
People were starting to complain about meowcat, and to make a really long story short, it took 5 people, over 50 hours of work, and 5 days to figure out what the problem was with PHP and Squirrelmail. We thought that there was an OS corruption, so we decided to move all 400+ sites on meowcat to another server. meowcat’s RAID array was also showing some flakiness, which hastened the decision to move all sites to another server.
The move was intense, culminating with me at the office at about 4:15AM one night last week (I think it was Thursday morning at that point; I’d only left the office since the previous Saturday to sleep, and I wasn’t getting much of that either) and Russ staying up all night that night to fix the issues that just kept cropping up. I left the office, having just worked a 16-hour day, and the stupid Squirrelmail/PHP issue was still not fixed. By now I knew it had to be a configuration issue, but I was too exhausted to track it down.
I finally made an appearance at the office again Thursday afternoon, having gotten a scant 4 hours of sleep. I found Russ passed out on the couch since he hadn’t slept at all yet. That day, Russ, Mooneer, and I all pulled our weight, and Ben was off duty, so he wasn’t in the picture, but our new employee couldn’t fix any customer issues because he simply didn’t have the knowledge. Exhausted, I spent a few more hours fixing customer issues and listening to and reading complaint after complaint about meowcat’s issues. I knew at that point I had to let our new hire go, so I called him into my office that afternoon and dismissed him. He was a really nice guy, but the technical skills were completely missing, and we didn’t have the time to train someone. We needed someone who could hit the ground running and step in to fix urgent issues like those on meowcat.
In the meantime, I finally realized that our Cisco “guru” we hired as a contractor had completely flaked out on us, and sent him a termination letter as well. This left us in a rush to find a new Cisco contractor, which (hopefully) we have found this week. He’s supposed to meet with us Monday afternoon, which means we may finally get our new networking gear in shape and ready to deploy.
We’re also interviewing a new person on Monday, who hopefully will be a good fit for us. His resume looks promising. Russ and I have learned from our previous experience, and we plan to give him a short written test on some of the Linux fundamentals (for instance: what does the ifconfig command do?).
On Saturday (11/5), another circuit tripped, this time at AboveNet. Russ handled it with aplomb, though, and got everything back online so quickly that most of our customers didn’t even notice. (We are enacting a policy that if anything like that happens again, we’ll notify everyone who was affected quickly so they can confirm that everything is back online and working.) Meanwhile, I finally broke down under stress on Saturday afternoon. A couple friends helped me through it, though, and for that I am grateful. I don’t often have stress breakdowns, but this time I pushed myself too hard for an entire week, and ended up in a bad mental state for a lot of this weekend. I pushed through it today (Sunday) and again spent a whole day working so that the support tickets would be cleared up for Mooneer in the morning. There is still a ton of stuff that needs to be done, but I feel like the balance is finally starting to tip in our favor, and once we hire another full-time person, I can stop manning the support desk and go back to just being sales and CEO.
What did I learn from all this? A couple really important things. First was a lesson relearned. C told me this a long time ago. He said he lets everyone on his team vote to hire or not hire a new person, and if even one person says “no”, he doesn’t hire that person. That’s smart. Hiring is tough, but it’s not impossible. A lot of it is gut instinct. My gut instinct told me our new hire wasn’t going to work, but I figured anyone was better than no one. That was really a bad assumption, and I won’t make that mistake again. Next time, we have to be 100% sure the new hire will be competent and will be able to do the job.
Second, I push myself way too hard sometimes. In an effort to shield Russ from breaking down under pressure, I sacrificed my own well-being (first sleep, and then a healthy mental state) by working crazy hours. What I forgot, though, is that Russ is perfectly capable of drawing those boundaries and pushing back on me when he’s had too much. It’s important for me to be able to take that step back and realize when I can’t push myself any more. I felt things starting to break down for me on Friday, and by Saturday afternoon I was a mess. It lasted until sometime really late last night (somewhere between SuperHappyDevHouse and bowling), when I finally started to relax and feel like a normal human being again.
The big, big, big lesson here is one I seem to get smacked with every once in a while. That one is: I can’t do this all myself. Of course, Simpli has long since grown from being just me. But again, this week, I tried to do it all myself, and make sure Russ was protected from the insanity that was going on. But it’s not my job to decide when Russ has had too much stress — I have to trust that he will tell me when he’s too overwhelmed to continue. And I have to trust my own instincts that tell me who to hire and where to go with Simpli. And sometimes–just sometimes–it’s okay to think about my own personal needs instead of constantly putting everyone else first.
Here’s to next week. May it be a big step in the right direction. I (and Simpli) could definitely use that right now.





November 7th, 2005 at 1:58 am
wow, I hope this doesn’t come off as overly snarky, but even I know more about Linux than this dude you hired (ssh, ifconfig, etc is Linux 101 stuff at best), and I’m just a Cisco geek. If I had any idea it was that easy to BS your way through a Linux admin-gig interview at a startup, I would have been applying for some of them.
November 7th, 2005 at 7:22 am
Tech interview! Tech interview!
I’m not big on having written tests as part of an interview, but you should ALWAYS ALWAYS get a feel for the person’s technical prowess. Normal interviewing doesn’t have to take long… you get a feel for how the person is in a social and customer-related sense, then you hammer him/her with technical questions to ensure they not only know the basics, but advanced topics. Use your last set of problems as an example:
When a PHP server is causing these types of errors, how would you approach the situation?
A power grid just failed, what should we do?
Don’t only get a grip on his knowledge, but how to problem solve, too.
If you need any more interviewing tips, my father is a VP of human resources, so I am quite familiar with various approaches.
November 7th, 2005 at 11:00 am
Fun, fun, fun. You’d have had a lot more tickets from me (all marked ‘Urgent’) had wifey been in charge:)
This guy seriously didn’t know what ‘ifconfig’ did?
November 8th, 2005 at 6:27 pm
Remember when the tech industry was full of qualified people? … Yeah me neither. Good luck with the new new guy.
November 9th, 2005 at 8:21 pm
Wow, just reading that makes me feel a little stress haha
November 23rd, 2005 at 6:20 pm
It’s so sad that even in a tech mecca like cali you can’t find qualified people. I think it’s because of the feeling in IT that failure is acceptable. If I fuck it up, I’ll just recompile, or re-install the OS, or disconnect the rack and try again. Mechanical engineers can’t do that. You can’t unbuild a bridge and start over. But nobody in IT other than the people signing the checks has any fucking clue that mistakes cost money and cause frustration. Because the barrier to entry is so low, every Tom, Dick, and philosophy major is attempting to get an IT job. Eventually one of those poorly-trained people makes it to a managerial position and then the whole fucking thing falls apart.
I’m stuck in San Antonio where the tech market is weak and the brains filling the chairs are even weaker. I work for a global publisher and am frightened on a daily basis at how inept these people, at a multi-billion dollar company, can be. Examples: Taking down a server that supports the operations of over 40 people without notice….for 4 hours! Installing a patch to the application server software on said server without validating that our product can be successfully deployed to that version of the app server. (This one cost a whole day for the same 40 people) One time a high-level development manager and his new pet enterprise architect deleted half of the certification database at 3AM, but didn’t know until shit broke in the morning. If that wasn’t bad enough, when they asked the DBAs if they had backups, the DBAs were unsure.
I’m envious that you’ve been able to go out on your own and create a company from just your smarts and determination. I’ve had some great positions in corporate america, but this one is the one that will probably cause me to attempt to do the same. I think it’s time to become self-sufficient. I can’t take one more day working with these ass-clowns.
Sorry for the little rant, but I had to bitch to somebody who understands because my wife, a non-geek, certainly doesn’t! I guess my point is that if you can’t find people who know /dev/null from a hole in the ground in a tech mecca like SV, then there’s no hope for the IT field in any city in America. God damn I wish I were still in Chicago at G2Switchworks. 25 J2EE developers, 3 DBAs and 2 architects: 30 true professionals. I was the stupid one in that group and damn was it awesome.
November 30th, 2005 at 12:57 am
I have no resume at all, but I’ve used ifconfig, done a little more than just calling /etc/init.d scripts, and I frequently use SSH.
I feel better about my future now. (And I’m frankly terrified that people who barely know how to use a console are managing to get paid jobs root-ing around commercial servers.)