Softmax without Overflow

Overflow is a common problem in neural-network code. Softmax exponentiates its inputs, and the exponential of a large value exceeds the floating-point range.
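As a rough illustration of where the overflow comes from, the sketch below checks the largest exponent that float64 can represent. It assumes NumPy's default float64 dtype, and the variable names are only illustrative.

```python
import numpy as np

# Largest finite float64 value and the exponent at which exp() reaches it.
max_float = np.finfo(np.float64).max
limit = np.log(max_float)  # about 709.78

# Silence the overflow warning so the sketch runs quietly.
with np.errstate(over="ignore"):
    below = np.exp(709.0)  # still finite
    above = np.exp(710.0)  # overflows to inf

print(limit, below, above)
```

Any input above that limit turns into inf, and inf divided by inf in the normalization gives nan.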

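The result is invariant if we add or subtract a constant $c$ from every input $x_i$, because softmax uses the sum of $e^{x_j}$ to normalize the result, so the common factor cancels. We only need to choose $c$. In the example below, $c = \max(x)$ is used. Mathematically any number works, but the maximum guarantees that every exponent is at most 0, so np.exp cannot overflow.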
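The cancellation can be written out explicitly. The following is a short derivation, with $x_i$ denoting the $i$-th input and $c$ an arbitrary constant (this notation is assumed here, since the post's original symbols did not survive):

```latex
\[
\mathrm{softmax}(x - c)_i
= \frac{e^{x_i - c}}{\sum_j e^{x_j - c}}
= \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}}
= \frac{e^{x_i}}{\sum_j e^{x_j}}
= \mathrm{softmax}(x)_i
\]
```

With $c = \max_j x_j$, the largest term in the sum is $e^0 = 1$, so the denominator is at least 1 and the division is well defined.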

import numpy as np

def softmax(x):
    # Naive version: np.exp overflows to inf for large entries of x
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

will become

def softmax(x):
    # Shift by the maximum so that every exponent is <= 0
    e = np.exp(x - np.max(x))
    if e.ndim == 1:
        return e / np.sum(e, axis=0)
    else:  # ndim == 2: normalize each row
        return e / np.sum(e, axis=1, keepdims=True)

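For 2-D input, you may need to subtract the maximum of each row instead: e = np.exp(x - np.max(x, axis=1)[:, np.newaxis]). With the global maximum, a row whose entries are all far below it can underflow to all zeros, which gives 0/0 in the normalization.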
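To make the difference between the two shifts concrete, here is a minimal sketch of a per-axis version. The name stable_softmax and its axis parameter are assumptions for illustration, not part of the code above. It uses keepdims=True in place of the np.newaxis indexing, which has the same effect for 2-D input.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax along `axis`, shifted by the maximum along that same axis."""
    x = np.asarray(x, dtype=np.float64)
    # After the shift every entry is <= 0 and the largest entry is exactly 0.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# Two rows on very different scales.
x = np.array([[1000.0, 1001.0, 1002.0],
              [-1000.0, -1001.0, -1002.0]])

probs = stable_softmax(x)  # per-row shift: both rows are finite

# Global shift for comparison: the second row underflows to zeros,
# and 0/0 in the normalization produces nan.
with np.errstate(invalid="ignore"):
    e_global = np.exp(x - np.max(x))
    probs_global = e_global / np.sum(e_global, axis=1, keepdims=True)

print(probs)
print(probs_global)
```

The per-row shift keeps one entry of every row at exactly exp(0) = 1, so no row can vanish entirely. The same function also handles 1-D input, because the default axis=-1 refers to the only axis.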
