I was working through a Dataquest exercise and noticed that the variance I get is different between the two packages.

e.g. for [1, 2, 3, 4]:

from statistics import variance
import numpy as np

The expected answer in the exercise is calculated with np.var().

I guess it has to do with one of them being the sample variance rather than the population variance. Could anyone explain the difference?
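For reference, here is a minimal reproduction of the discrepancy for the [1, 2, 3, 4] example above (the numbers shown are what both libraries return):

```python
from statistics import variance
import numpy as np

data = [1, 2, 3, 4]

# statistics.variance computes the *sample* variance (divisor n - 1)
print(variance(data))        # 1.6666666666666667

# np.var defaults to the *population* variance (divisor n)
print(np.var(data))          # 1.25

# passing ddof=1 makes numpy match the statistics module
print(np.var(data, ddof=1))  # 1.6666666666666667
```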

Use this:

np.var([1, 2, 3, 4], ddof=1)
Delta Degrees of Freedom: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default, ddof is zero.

The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead.
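As a sketch of what that divisor means, the value np.var returns can be reproduced by hand for the [1, 2, 3, 4] example: the sum of squared deviations from the mean is divided by N - ddof.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
N = len(x)

# sum of squared deviations from the mean: here 5.0
ss = ((x - x.mean()) ** 2).sum()

for ddof in (0, 1):
    manual = ss / (N - ddof)               # divisor is N - ddof
    print(ddof, manual, np.var(x, ddof=ddof))  # the two values agree
```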

In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
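A quick simulation illustrates the bias claim: drawing many small samples from a population with known variance, the ddof=1 estimates average out near the true variance, while the ddof=0 estimates systematically undershoot by a factor of (n - 1) / n. The population, sample size, and seed below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0  # population: normal with sigma = 2
samples = rng.normal(0.0, 2.0, size=(100_000, 5))  # many samples of size 5

mean_ddof0 = np.var(samples, axis=1, ddof=0).mean()
mean_ddof1 = np.var(samples, axis=1, ddof=1).mean()

print(mean_ddof0)  # ~3.2, biased low by (n - 1) / n = 0.8
print(mean_ddof1)  # ~4.0, unbiased on average
```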

Statistical libraries like numpy use the population variance (divisor n) for what they call var or variance, and likewise for the standard deviation.

For more information, refer to the documentation: numpy doc

It is correct that dividing by N-1 gives an unbiased estimate of the variance, which can give the impression that dividing by N-1 is therefore slightly more accurate, albeit a little more complex. What is too often left unstated is that dividing by N gives a lower-variance estimate (and, for normally distributed data, a lower mean squared error), which is likely to be closer to the true variance than the unbiased estimate, as well as being somewhat simpler.
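That trade-off can be sketched with a simulation (population, sample size, and seed are arbitrary choices here): for a normal population, the biased ddof=0 estimator sits, on average, closer to the true variance in the mean-squared-error sense than the unbiased ddof=1 estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0  # normal population with sigma = 2
samples = rng.normal(0.0, 2.0, size=(100_000, 5))

est0 = np.var(samples, axis=1, ddof=0)  # biased, divisor n
est1 = np.var(samples, axis=1, ddof=1)  # unbiased, divisor n - 1

mse0 = ((est0 - true_var) ** 2).mean()
mse1 = ((est1 - true_var) ** 2).mean()

print(mse0 < mse1)  # True: dividing by n gives lower mean squared error here
```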