Each Answer to this Q is separated by one/two green lines.
random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example
random.gauss will sample random point from a normal distribution with a given mean and sigma values.
I’m looking for a way to extract a number
N of random samples between a given interval using my own distribution as fast as possible in
python. This is what I mean:
def my_dist(x): # Some distribution, assume c1,c2,c3 and c4 are known. f = c1*exp(-((x-c2)**c3)/c4) return f # Draw N random samples from my distribution between given limits a,b. N = 1000 N_rand_samples = ran_func_sample(my_dist, a, b, N)
ran_func_sample is what I’m after and
a, b are the limits from which to draw the samples. Is there anything of that sort in
You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].
After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:
[inverted_function(random.random()) for x in range(1000)]
More on Inverse Transform Sampling:
Also, there is a good question on StackOverflow related to the topic:
This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf’s. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.
class Distribution(object): """ draws samples from a one dimensional probability distribution, by means of inversion of a discrete inverstion of a cumulative density function the pdf can be sorted first to prevent numerical error in the cumulative sum this is set as default; for big density functions with high contrast, it is absolutely necessary, and for small density functions, the overhead is minimal a call to this distibution object returns indices into density array """ def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x): self.shape = pdf.shape self.pdf = pdf.ravel() self.sort = sort self.interpolation = interpolation self.transform = transform #a pdf can not be negative assert(np.all(pdf>=0)) #sort the pdf by magnitude if self.sort: self.sortindex = np.argsort(self.pdf, axis=None) self.pdf = self.pdf[self.sortindex] #construct the cumulative distribution function self.cdf = np.cumsum(self.pdf) @property def ndim(self): return len(self.shape) @property def sum(self): """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized""" return self.cdf[-1] def __call__(self, N): """draw """ #pick numbers which are uniformly random over the cumulative distribution function choice = np.random.uniform(high = self.sum, size = N) #find the indices corresponding to this point on the CDF index = np.searchsorted(self.cdf, choice) #if necessary, map the indices back to their original ordering if self.sort: index = self.sortindex[index] #map back to multi-dimensional indexing index = np.unravel_index(index, self.shape) index = np.vstack(index) #is this a discrete or piecewise continuous distribution? if self.interpolation: index = index + np.random.uniform(size=index.shape) return self.transform(index) if __name__=='__main__': shape = 3,3 pdf = np.ones(shape) pdf=0 dist = Distribution(pdf, transform=lambda i:i-1.5) print dist(10) import matplotlib.pyplot as pp pp.scatter(*dist(1000)) pp.show()
And as a more real-world relevant example:
x = np.linspace(-100, 100, 512) p = np.exp(-x**2) pdf = p[:,None]*p[None,:] #2d gaussian dist = Distribution(pdf, transform=lambda i:i-256) print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range import matplotlib.pyplot as pp pp.scatter(*dist(1000)) pp.show()
I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).
def metropolis_hastings(target_density, size=500000): burnin_size = 10000 size += burnin_size x0 = np.array([[0, 0]]) xt = x0 samples =  for i in range(size): xt_candidate = np.array([np.random.multivariate_normal(xt, np.eye(2))]) accept_prob = (target_density(xt_candidate))/(target_density(xt)) if np.random.uniform(0, 1) < accept_prob: xt = xt_candidate samples.append(xt) samples = np.array(samples[burnin_size:]) samples = np.reshape(samples, [samples.shape, 2]) return samples
This function requires a function
target_density which takes in a data-point and computes its probability.
For details check-out this detailed answer of mine.
import numpy as np import scipy.interpolate as interpolate def inverse_transform_sampling(data, n_bins, n_samples): hist, bin_edges = np.histogram(data, bins=n_bins, density=True) cum_values = np.zeros(bin_edges.shape) cum_values[1:] = np.cumsum(hist*np.diff(bin_edges)) inv_cdf = interpolate.interp1d(cum_values, bin_edges) r = np.random.rand(n_samples) return inv_cdf(r)
So if we give our data sample that has a specific distribution, the
inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the
Here is a rather nice way of performing inverse transform sampling with a decorator.
import numpy as np from scipy.interpolate import interp1d def inverse_sample_decorator(dist): def wrapper(pnts, x_min=-100, x_max=100, n=1e5, **kwargs): x = np.linspace(x_min, x_max, int(n)) cumulative = np.cumsum(dist(x, **kwargs)) cumulative -= cumulative.min() f = interp1d(cumulative/cumulative.max(), x) return f(np.random.random(pnts)) return wrapper
Using this decorator on a Gaussian distribution, for example:
@inverse_sample_decorator def gauss(x, amp=1.0, mean=0.0, std=0.2): return amp*np.exp(-(x-mean)**2/std**2/2.0)
You can then generate sample points from the distribution by calling the function. The keyword arguments
x_max are the limits of the original distribution and can be passed as arguments to
gauss along with the other key word arguments that parameterise the distribution.
samples = gauss(5000, mean=20, std=0.8, x_min=19, x_max=21)
Alternatively, this can be done as a function that takes the distribution as an argument (as in your original question),
def inverse_sample_function(dist, pnts, x_min=-100, x_max=100, n=1e5, **kwargs): x = np.linspace(x_min, x_max, int(n)) cumulative = np.cumsum(dist(x, **kwargs)) cumulative -= cumulative.min() f = interp1d(cumulative/cumulative.max(), x) return f(np.random.random(pnts))