I was working through a Dataquest exercise and noticed that the variance I get differs between the two packages.

E.g., for `[1,2,3,4]`:

``````
from statistics import variance
import numpy as np
print(np.var([1,2,3,4]))
print(variance([1,2,3,4]))
# 1.25
# 1.6666666666666667
``````

The expected answer of the exercise is calculated with `np.var()`.

Edit
I guess it has to do with the latter being the sample variance rather than the population variance. Could anyone explain the difference?

Use this:

``````
import numpy as np
print(np.var([1,2,3,4],ddof=1))
# 1.66666666667
``````

Delta Degrees of Freedom: the divisor used in the calculation is `N - ddof`, where N represents the number of elements. By default, `ddof` is zero.

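The variance is the mean of the squared deviations from the mean, i.e. `var = mean(x)`, where `x = abs(a - a.mean())**2`. The mean is normally calculated as `x.sum() / N`, where `N = len(x)`. If, however, `ddof` is specified, the divisor `N - ddof` is used instead.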
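To make the divisor concrete, here is a minimal sketch that computes the variance by hand for the list from the question. The helper name `variance_with_ddof` is made up for illustration and is not part of numpy; it only mirrors the `N - ddof` rule described above.

```python
import numpy as np

def variance_with_ddof(values, ddof=0):
    """Hypothetical helper: sum of squared deviations divided by N - ddof."""
    a = np.asarray(values, dtype=float)
    n = a.size
    # Squared deviations of each element from the sample mean.
    squared_deviations = (a - a.mean()) ** 2
    return squared_deviations.sum() / (n - ddof)

data = [1, 2, 3, 4]
# The mean is 2.5 and the squared deviations sum to 5.0.
population_var = variance_with_ddof(data, ddof=0)  # 5.0 / 4
sample_var = variance_with_ddof(data, ddof=1)      # 5.0 / 3

print(population_var)
print(sample_var)
```

The first value matches `np.var(data)` and the second matches `statistics.variance(data)`, which is exactly the discrepancy in the question.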

In standard statistical practice, `ddof=1` provides an unbiased estimator of the variance of a hypothetical infinite population. `ddof=0` provides a maximum likelihood estimate of the variance for normally distributed variables.
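The bias claim can be checked roughly with a simulation. The sketch below assumes normally distributed data; the true variance, sample size, trial count, and seed are arbitrary choices made for illustration. It averages both estimators over many small samples.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
true_variance = 4.0             # assumed population variance
sample_size = 5                 # small samples make the bias visible
n_trials = 100_000

# Each row is one independent sample drawn from the same population.
samples = rng.normal(loc=0.0, scale=np.sqrt(true_variance),
                     size=(n_trials, sample_size))

# Average each estimator over all trials.
mean_ddof0 = np.var(samples, axis=1, ddof=0).mean()
mean_ddof1 = np.var(samples, axis=1, ddof=1).mean()

print("average estimate with ddof=0:", mean_ddof0)
print("average estimate with ddof=1:", mean_ddof1)
```

With these settings the `ddof=1` average should land near the true variance of 4.0, while the `ddof=0` average should land near `(N - 1) / N * 4.0 = 3.2`, i.e. it underestimates on average.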

Statistical libraries like numpy use the divisor n for what they call `var` or variance, and likewise for the standard deviation (`std`).

For more information, refer to the numpy documentation for `numpy.var`.
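For comparison, here is a short sketch of how the two packages line up on the list from the question. The standard-library `statistics` module has separate population and sample functions, whereas numpy switches between them with `ddof`.

```python
import statistics
import numpy as np

data = [1, 2, 3, 4]

# Population variance: divisor N.
pop_var_np = np.var(data)                     # ddof=0 is the default
pop_var_stats = statistics.pvariance(data)

# Sample variance: divisor N - 1.
sample_var_np = np.var(data, ddof=1)
sample_var_stats = statistics.variance(data)

# The same default applies to the standard deviation.
pop_std_np = np.std(data)
pop_std_stats = statistics.pstdev(data)
sample_std_np = np.std(data, ddof=1)
sample_std_stats = statistics.stdev(data)

print(pop_var_np, pop_var_stats)
print(sample_var_np, sample_var_stats)
print(pop_std_np, pop_std_stats)
print(sample_std_np, sample_std_stats)
```

Each pair should agree up to floating-point rounding, so either package can reproduce the other's result once the divisor is chosen explicitly.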

It is correct that dividing by N-1 gives an unbiased estimate of the variance, which can give the impression that dividing by N-1 is therefore slightly more accurate, albeit a little more complex. What is too often not stated is that dividing by N gives an estimate with lower variance, which for normally distributed data has a smaller mean squared error and so is likely to be closer to the true variance than the unbiased estimate, as well as being somewhat simpler.
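One way to probe this trade-off is to compare how far each estimator typically lands from the true value. The sketch below assumes normally distributed data and uses arbitrary illustrative settings (true variance, sample size, trial count, seed); it measures the mean squared error of both divisors.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
true_variance = 1.0             # assumed population variance
sample_size = 5
n_trials = 100_000

# Each row is one independent sample from a standard normal population.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, sample_size))

est_div_n = np.var(samples, axis=1, ddof=0)          # divide by N
est_div_n_minus_1 = np.var(samples, axis=1, ddof=1)  # divide by N - 1

# Mean squared error of each estimator against the known true variance.
mse_div_n = np.mean((est_div_n - true_variance) ** 2)
mse_div_n_minus_1 = np.mean((est_div_n_minus_1 - true_variance) ** 2)

print("MSE dividing by N:    ", mse_div_n)
print("MSE dividing by N - 1:", mse_div_n_minus_1)
```

Under these assumptions the divide-by-N estimate should show the smaller mean squared error even though it is biased. This holds for normal data; other distributions may behave differently.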