BungeeCord task cancelling issues

My blog is getting quite stale, so I am going to post about a bug I fixed in BungeeCord recently regarding scheduled tasks, relating to shutdown issues in HubMagic and RedisBungee. But before we do that, let's take a quick look at the scheduler.

The BungeeCord scheduler

The BungeeCord scheduler is a simple task running API that runs tasks asynchronously and supports repeating tasks. To handle this, it uses a special class called BungeeTask. BungeeTask is the actual runnable used by BungeeCord.

After task execution is deemed to be complete, it cancels the task with the scheduler. This is where we find our bug.

Wrong assumptions

When a task is cancelled via the ScheduledTask's cancel() method, the BungeeTask sets its running flag to false. So far, so good. It will be noticed and the task will be cancelled as usual. But in between this time and the task cancelling, the task may have been cancelled!

That's right, we have a condition where the task is a zombie and tries to cancel itself, only to find it was already cancelled. The actual error is in BungeeScheduler, which tries to find the task by its ID but fails as the task was never found. No null check is performed on the removed task. Java sees a reference to a null object (to its cancel() method) and throws a non-specific NullPointerException.

A fix

I was able to fix this issue rather easily. First, I made ScheduledTask's cancel() methods always map to an corresponding scheduler cancel(taskId), which is only called once, then made cancel(ScheduledTask) always call cancel() on the provided task.

The code is slightly misleading. cancel() calls the scheduler's cancel(taskId) method, which removes the task and calls cancel() on it. However, since the running boolean is false, we don't NPE.

A postmortem

  • It is fairly clear that the cancellation of scheduled tasks was never tested. The relevant code had not been touched for almost a year. This is further attested to by tasksByPlugin issue that was reported by vemacs.
  • Silly bugs like this are low-hanging fruit and are harming the BungeeCord ecosystem greatly. A extensive bugcheck over the code base should be initiated.