Fixing a Memory Leak in Go: Understanding time.After

Recently, we decided to investigate why ArangoSync, our application for synchronizing two ArangoDB clusters across data centers, used a lot of memory: around 2 GB in certain cases. The environment contained ~1500 shards and 5000 goroutines. Thanks to tools like pprof (which profiles CPU and memory usage), it was easy to identify the issue. The Go profiler showed that memory allocated in the function `time.After()` accumulated to nearly 1 GB. The memory was not released, so it was clear that we had a memory leak. We will explain how memory leaks can occur with the `time.After()` function through three examples.

Valid usage of the time.After() function

select {
  case <-time.After(time.Second):
     // do something after 1 second.
}

Nothing is wrong with the above code because the `select` statement can finish in only one way: the timer fires. Once it has fired, the timer created internally by `time.After()` is done and its resources can be freed.

Invalid usage of the time.After() function

It is very tempting to write the following code:

select {
  case <-time.After(time.Second):
     // do something after 1 second.
  case <-ctx.Done():
     // do something when context is finished.
     // the timer created by time.After() is not stopped here,
     // so it stays alive until it fires
}

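In the above `select` statement, if the timer from `time.After()` fires first, everything works as in the first example. But if `ctx.Done()` is ready earlier, the timer created in `time.After()` is not stopped, and its resources are not released until it fires. When this happens often and the durations are long, the pending timers pile up and behave like a memory leak (see the `time.After` documentation). Note that this applies to Go versions before 1.23; from Go 1.23 on, the garbage collector can reclaim unreferenced timers that have not been stopped.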

Improved usage of the time.After() function

In production code, one should replace `time.After()` with `time.NewTimer()` and stop the timer explicitly:

delay := time.NewTimer(time.Second)

select {
  case <-delay.C:
     // do something after one second.
  case <-ctx.Done():
     // do something when context is finished and stop the timer.
     if !delay.Stop() {
        // Stop returned false: the timer has already fired,
        // so read the pending value from the channel.
        <-delay.C
     }
}

Here, one creates a new timer explicitly. If it fires first, all resources created by `time.NewTimer()` are released. If `ctx.Done()` is ready first, the resources are released by calling `delay.Stop()`. It may happen that `ctx.Done()` is selected and the timer expires immediately afterwards. In that case `delay.Stop()` returns false, which is why there is an additional condition: when the timer could not be stopped because it has already expired, the value it sent is read from `delay.C` so that nothing is left pending in the channel.

I hope that this finding is useful for others; it solved our problem immediately. Feel free to leave comments below or ping me on the ArangoDB Community Slack (@tomasz.arangodb).
