42

!rant & story_time
This happend to the startup I was working for at ~2011. I was a junior Android dev, working on a very popular app.
During experiments for a new feature, I discovered that the system AlarmManager has a serious bug - you can set a repeating alarm with interval=0ms. If your app takes more then 1 ms to handle the Intent, then the AlarmManager will start to fill up the intent Queue, with unexpected results to the OS. causing it to slow down, and reboot when it ran out of Ram. Why? my guess was that because the AlarmManager was part of the OS, then any issues caused by it caused the system process to ran out of ram, crashing it, and the whole system with it. the real kicker was that even after a reboot, the AlarmManager still had Intents queued, causing the device to bootloop for a while, untill the queue was cleared. My boss decided to report the problem to google, as this was an issue in the OS. I built an example app, that caused the crash 10-30 seconds after starting, and submitted to Google. Google responded later that day with "not an issue, no one will ever do this".
Well... At this point I decided to review the autoupdate feature in our app, to make sure this will not happen to us. We just released a new feature where a user can set an update schedule option in the app settings - where you could setup a daily, weekly, or hourly update for the app. after reviewing it, It looked good, and the issue was not triggered in the manual QA I did. So, it was all good. And we released an updated version to the store.
After we did an update-install, we discoverd that, there was a provlem reading the previous version SharedPrefs value for the update schdule settings, and the value defaulted to 0...
the result was, our app caused all our users to go into a bootloop, and because the alarm was reset when the devices booted up, the bootloop could only be solved in a factory reset, or removing our app, before the device rebooted, and then waiting a few reboot cycles.
We lost 50 places in the market, and it took us 6 months to get back to where we were.

It was not my fault, but it sucked big time!

Comments
  • 3
    QA was missing from your software team. The mistake was to pass a schedule based feature knowing the alarm manager bug and just by doing a one man manual QA. But shit happens.
  • 3
    @rayanon at that point in the startup, there was no dedicated QA. just me, and the founders, who were useless for this.
  • 4
    DAMN.
    Also it makes me sad that Google didn't take this seriously.
  • 0
    That sounds like a pretty big fuck up. How did the company resolve this issue and how they communicated this to the users?
    I bet your inbox exploded after the rollout?
Add Comment