Faster Log Gamma Calculation
We sometimes encounter the calculation of
\(\newcommand{\Ga}{\Gamma} \begin{align} \log \frac{\Ga (\alpha + n)}{\Ga (\alpha)}. \end{align}\)
If \(n\) is small, there is a faster calculation. We use the property of Gamma function, \(\Gamma(x+1) = x \Gamma (x)\).
\(\newcommand{\Ga}{\Gamma} \begin{align} &\quad \log \frac{\Ga (\alpha + n)}{\Ga (\alpha)} \\ &= \log \frac{(\alpha + n - 1) \Ga (\alpha + n - 1)}{\Ga (\alpha)} \\ &= \log \frac{(\alpha + n -1) (\alpha + n -2) \Ga (\alpha + n -2) }{\Ga (\alpha)} \\ &= \log \frac{(\alpha + n -1) (\alpha + n -2) \cdots (\alpha) \Ga (\alpha)}{\Ga (\alpha)} \\ &= \log (\alpha + n -1) (\alpha + n -2) \cdots (\alpha)\\ &= \log \prod_{m=1}^{n} (\alpha + n - m) \end{align}\)
Example code in Cython:
cdef extern from "math.h":
double log (double x)
double lgamma (double x)
cdef double gammaln_sum(np.ndarray[ftype_t, ndim=1] array1,
np.ndarray[ftype_t, ndim=1] array2):
# (special.gammaln( alpha_s + n_s_temp ) - special.gammaln( alpha_s )).sum()
cdef double value = 0.0
cdef int i
cdef int n = array1.shape[0] # length of array
cdef int m
for i in range(n):
if array2[i] < 18:
# low frequency words
for m in range(int(array2[i])):
value += log(array1[i] + m)
else:
value += lgamma(array1[i]) - lgamma(array2[i])
return value
In C++:
double CstmCpp::calc_loglik_doc_second_term(int &doc_id, int &word_id, SparseVector<int> &n_k_doc){
// compute_second_term_of_log_probability_document
double alpha_k = calc_alpha_word_given_doc(doc_id, word_id);
int n_k_w = n_k_doc.coeffRef(word_id); // get_word_count_in_doc()
if(n_k_w > 18){
return lgamma(alpha_k + n_k_w) - lgamma(alpha_k);
}else{
double temp = 0.0;
for(int i=0; i<n_k_w; i++){
temp += log(alpha_k + i);
}
return temp;
}
}