Can anyone help me out in fitting a gamma distribution in python? Well, I’ve got some data : X and Y coordinates, and I want to find the gamma parameters that fit this distribution… In the Scipy doc, it turns out that a fit method actually exists but I don’t know how to use it :s.. First, in which format the argument “data” must be, and how can I provide the second argument (the parameters) since that’s what I’m looking for?

Generate some gamma data:

import scipy.stats as stats    
alpha = 5
loc = 100.5
beta = 22
data = stats.gamma.rvs(alpha, loc=loc, scale=beta, size=10000)    
# [ 202.36035683  297.23906376  249.53831795 ...,  271.85204096  180.75026301
#   364.60240242]

Here we fit the data to the gamma distribution:

fit_alpha, fit_loc,
print(fit_alpha, fit_loc, fit_beta)
# (5.0833692504230008, 100.08697963283467, 21.739518937816108)

print(alpha, loc, beta)
# (5, 100.5, 22)

I was unsatisfied with the ss.gamma.rvs-function as it can generate negative numbers, something the gamma-distribution is supposed not to have. So I fitted the sample through expected value = mean(data) and variance = var(data) (see wikipedia for details) and wrote a function that can yield random samples of a gamma distribution without scipy (which I found hard to install properly, on a sidenote):

import random
import numpy

data = [6176, 11046, 670, 6146, 7945, 6864, 767, 7623, 7212, 9040, 3213, 6302, 10044, 10195, 9386, 7230, 4602, 6282, 8619, 7903, 6318, 13294, 6990, 5515, 9157]

# Fit gamma distribution through mean and average
mean_of_distribution = numpy.mean(data)
variance_of_distribution = numpy.var(data)

def gamma_random_sample(mean, variance, size):
    """Yields a list of random numbers following a gamma distribution defined by mean and variance"""
    g_alpha = mean*mean/variance
    g_beta = mean/variance
    for i in range(size):
        yield random.gammavariate(g_alpha,1/g_beta)

# force integer values to get integer sample
grs = [int(i) for i in gamma_random_sample(mean_of_distribution,variance_of_distribution,len(data))]

print("Original data: ", sorted(data))
print("Random sample: ", sorted(grs))

# Original data: [670, 767, 3213, 4602, 5515, 6146, 6176, 6282, 6302, 6318, 6864, 6990, 7212, 7230, 7623, 7903, 7945, 8619, 9040, 9157, 9386, 10044, 10195, 11046, 13294]
# Random sample:  [1646, 2237, 3178, 3227, 3649, 4049, 4171, 5071, 5118, 5139, 5456, 6139, 6468, 6726, 6944, 7050, 7135, 7588, 7597, 7971, 10269, 10563, 12283, 12339, 13066]

If you want a long example including a discussion about estimating or fixing the support of the distribution, then you can find it in and the linked mailing list message.

Preliminary support to fix parameters, such as location, during fit has been added to the trunk version of scipy.

OpenTURNS has a simple way to do this with the GammaFactory class.

First, let’s generate a sample:

import openturns as ot
gammaDistribution = ot.Gamma()
sample = gammaDistribution.getSample(100)

Then fit a Gamma to it:

distribution = ot.GammaFactory().build(sample)

Then we can draw the PDF of the Gamma:

import openturns.viewer as otv

which produces:

A gamma distribution

More details on this topic at:

1): the “data” variable could be in the format of a python list or tuple, or a numpy.ndarray, which could be obtained by using:


where the 2nd data in the above line should be a list or a tuple, containing your data.

2: the “parameter” variable is a first guess you could optionally provide to the fitting function as a starting point for the fitting process, so it could be omitted.

3: a note on @mondano’s answer. The usage of moments (mean and variances) to work out the gamma parameters are reasonably good for large shape parameters (alpha>10), but could yield poor results for small values of alpha (See Statistical methods in the atmospheric scineces by Wilks, and THOM, H. C. S., 1958: A note on the gamma distribution. Mon. Wea. Rev., 86, 117–122.

Using Maximum Likelihood Estimators, as that implemented in the scipy module, is regarded a better choice in such cases.