What does np.random.seed do in the code below, taken from a Scikit-Learn tutorial? I’m not very familiar with NumPy’s random state generator stuff, so I’d really appreciate a layman’s terms explanation of this.
np.random.seed(0)
indices = np.random.permutation(len(iris_X))
np.random.seed(0) makes the random numbers predictable
>>> numpy.random.seed(0) ; numpy.random.rand(4)
array([ 0.55,  0.72,  0.6 ,  0.54])
>>> numpy.random.seed(0) ; numpy.random.rand(4)
array([ 0.55,  0.72,  0.6 ,  0.54])
With the seed reset (every time), the same set of numbers will appear every time.
If the random seed is not reset, different numbers appear with every invocation:
>>> numpy.random.rand(4)
array([ 0.42,  0.65,  0.44,  0.89])
>>> numpy.random.rand(4)
array([ 0.96,  0.38,  0.79,  0.53])
In the simplest generators, (pseudo-)random numbers work by starting with a number (the seed), multiplying it by a large number, adding an offset, then taking that sum modulo some fixed number. The resulting number is then used as the seed to generate the next “random” number. When you set the seed (every time), it does the same thing every time, giving you the same numbers.
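The multiply, add, and modulo recipe described above is known as a linear congruential generator. Here is a minimal sketch of the idea; the constants and the function name `lcg` are assumptions chosen for illustration, and this is not the algorithm NumPy actually uses.

```python
# Toy linear congruential generator (LCG): illustrative only, not NumPy's algorithm.
MULTIPLIER = 1103515245  # assumed constant, chosen for illustration
INCREMENT = 12345        # assumed offset
MODULUS = 2 ** 31        # assumed modulus


def lcg(seed, count):
    """Return `count` pseudo-random integers derived from `seed`."""
    values = []
    state = seed
    for _ in range(count):
        # Multiply, add an offset, then take the sum modulo MODULUS.
        state = (MULTIPLIER * state + INCREMENT) % MODULUS
        # The new value doubles as the seed for the next step.
        values.append(state)
    return values


first_run = lcg(seed=0, count=3)
second_run = lcg(seed=0, count=3)
other_seed = lcg(seed=1, count=3)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # different seed, different sequence
```

Because each value depends only on the previous one, restarting from the same seed replays the whole sequence.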
If you want seemingly random numbers, do not set the seed. If you have code that uses random numbers that you want to debug, however, it can be very helpful to set the seed before each run so that the code does the same thing every time you run it.
To get the most random numbers for each run, call numpy.random.seed(). This will cause numpy to set the seed to a random number obtained from /dev/urandom or its Windows analog or, if neither of those is available, it will use the clock.
For more information on using seeds to generate pseudo-random numbers, see Wikipedia.
If you call np.random.seed(a_fixed_number) every time before you call one of NumPy’s other random functions, the result will be the same:
>>> import numpy as np
>>> np.random.seed(0)
>>> perm = np.random.permutation(10)
>>> print(perm)
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.rand(4))
[0.5488135 0.71518937 0.60276338 0.54488318]
>>> np.random.seed(0)
>>> print(np.random.rand(4))
[0.5488135 0.71518937 0.60276338 0.54488318]
However, if you call it just once and then use various random functions, the results will differ from one call to the next:
>>> import numpy as np
>>> np.random.seed(0)
>>> perm = np.random.permutation(10)
>>> print(perm)
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0)
>>> print(np.random.permutation(10))
[2 8 4 9 1 6 7 3 0 5]
>>> print(np.random.permutation(10))
[3 5 1 2 9 8 0 6 7 4]
>>> print(np.random.permutation(10))
[2 3 8 4 5 1 0 6 9 7]
>>> print(np.random.rand(4))
[0.64817187 0.36824154 0.95715516 0.14035078]
>>> print(np.random.rand(4))
[0.87008726 0.47360805 0.80091075 0.52047748]
As noted, numpy.random.seed(0) sets the random seed to 0, so the pseudo-random numbers you get from numpy.random will start from the same point. This can be good for debugging in some cases. However, after some reading, this seems to be the wrong way to go about it if you have threads, because it is not thread-safe.
For numpy.random.seed(), the main difficulty is that it is not
thread-safe – that is, it’s not safe to use if you have many different
threads of execution, because it’s not guaranteed to work if two
different threads are executing the function at the same time. If
you’re not using threads, and if you can reasonably expect that you
won’t need to rewrite your program this way in the future,
numpy.random.seed() should be fine for testing purposes. If there’s
any reason to suspect that you may need threads in the future, it’s
much safer in the long run to do as suggested, and to make a local
instance of the numpy.random.RandomState class. As far as I can tell,
Python’s own random.seed() is thread-safe (or at least, I haven’t found any
evidence to the contrary).
An example of how to go about this (the four printed permutations follow the code):
from numpy.random import RandomState
prng = RandomState()
print(prng.permutation(10))
prng = RandomState()
print(prng.permutation(10))
prng = RandomState(42)
print(prng.permutation(10))
prng = RandomState(42)
print(prng.permutation(10))
[3 0 4 6 8 2 1 9 7 5]
[1 6 9 0 2 7 8 3 5 4]
[8 1 5 0 7 2 9 4 3 6]
[8 1 5 0 7 2 9 4 3 6]
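To illustrate why a local instance is the safer choice, here is a small sketch showing that two RandomState objects with the same seed stay in step even when the global state is reseeded in between. The last two lines use np.random.default_rng, which is not mentioned in the answer above and is an assumption that NumPy 1.17 or later is installed.

```python
import numpy as np

# Each RandomState carries its own state, independent of the global np.random state.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

perm_a = rng_a.permutation(10)

# Reseeding and drawing from the global state does not disturb rng_b.
np.random.seed(0)
np.random.rand(3)

perm_b = rng_b.permutation(10)
print(np.array_equal(perm_a, perm_b))  # both local generators give the same permutation

# Assumption: NumPy >= 1.17, where default_rng creates a local Generator object.
new_rng = np.random.default_rng(42)
new_perm = new_rng.permutation(10)
print(new_perm)
```

In threaded code, giving each thread its own generator object avoids sharing one mutable global state.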
Lastly, note that there might be cases where initializing to 0 (as opposed to a seed whose bits are not all 0) may result in non-uniform distributions for the first few iterations because of the way XOR works, but this depends on the algorithm, and is beyond my current worries and the scope of this question.
I have used this very often in neural networks. It is well known that when we start training a neural network, we randomly initialise the weights. The model is then trained from these weights on a particular dataset. After a number of epochs you get a trained set of weights.
Now suppose you want to train again from scratch, or you want to pass the model to others so they can reproduce your results. The weights will again be initialised to random numbers, which will mostly be different from the earlier ones. The trained weights obtained after the same number of epochs (keeping the same data and other parameters) will therefore differ from the earlier ones. The problem is that your model is no longer reproducible: every time you train it from scratch, it gives you a different set of weights, because it is initialised with different random numbers every time.
What if, every time you start training from scratch, the model is initialised to the same set of random weights? In this case your model could become reproducible. This is achieved by numpy.random.seed(0). By setting seed() to a particular number, you hang on to the same set of random numbers every time.
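As a rough sketch of that idea, the snippet below draws an initial weight matrix from a seeded generator. The helper name `init_weights`, the layer sizes, and the 0.01 scale are all hypothetical, chosen only for illustration; a real framework may have additional sources of randomness that also need seeding.

```python
import numpy as np


def init_weights(seed, n_inputs=4, n_hidden=3):
    """Hypothetical helper: draw small random initial weights from a seeded generator."""
    rng = np.random.RandomState(seed)
    # Standard-normal draws scaled down; the 0.01 factor is an arbitrary choice.
    return rng.randn(n_inputs, n_hidden) * 0.01


run_1 = init_weights(seed=0)
run_2 = init_weights(seed=0)
run_3 = init_weights(seed=1)

print(np.array_equal(run_1, run_2))  # True: same seed, identical starting weights
print(np.array_equal(run_1, run_3))  # False: a different seed gives different weights
```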
I hope to give a really short answer:
seed makes the next series of random numbers predictable. You can think of it this way: every time you call seed, it pre-defines a series of numbers, and NumPy keeps an iterator over that series; every time you ask for a random number, it simply fetches the next one.
np.random.seed(2)
np.random.randn(2)  # array([-0.41675785, -0.05626683])
np.random.randn(1)  # array([-1.24528809])
np.random.seed(2)
np.random.randn(1)  # array([-0.41675785])
np.random.randn(2)  # array([-0.05626683, -1.24528809])
Notice that when I set the same seed, no matter how many random numbers I request from NumPy in each call, it always gives the same series of numbers, which in this case is array([-0.41675785, -0.05626683, -1.24528809]).
Imagine you are showing someone how to code something with a bunch of “random” numbers. By using numpy seed they can use the same seed number and get the same set of “random” numbers.
So it’s not exactly random because an algorithm spits out the numbers but it looks like a randomly generated bunch.
All the answers above show how to use np.random.seed() in code. I’ll try my best to explain briefly why it behaves this way. Computers are machines that follow predefined algorithms. Any output from a computer is the result of an algorithm applied to some input. So when we ask a computer to generate random numbers, they may look random, but the computer did not just come up with them randomly!
So when we write np.random.seed(any_number_here), the algorithm will output a particular sequence of numbers that is determined by the argument any_number_here. It’s almost as if a particular set of random numbers can be obtained by passing the right argument. But predicting which argument gives which numbers would require us to know how the algorithm works, which is quite tedious.
So, for example, if I write np.random.seed(10), the particular set of numbers that I obtain will remain the same even if I execute the same line after 10 years, unless the algorithm changes.
A random seed specifies the start point when a computer generates a random number sequence.
For example, let’s say you wanted to generate a random number in Excel (Note: Excel sets a limit of 9999 for the seed). If you enter a number into the Random Seed box during the process, you’ll be able to use the same set of random numbers again. If you typed “77” into the box, and typed “77” the next time you run the random number generator, Excel will display that same set of random numbers. If you type “99”, you’ll get an entirely different set of numbers. But if you revert back to a seed of 77, then you’ll get the same set of random numbers you started with.
A generator produces each number from the previous one using a fixed rule. For example, “take a number x, add 900 to it, then subtract 52.” In order for the process to start, you have to specify a starting number, x (the seed). Let’s take the starting number 77:
Add 900 + 77 = 977
Subtract 52 = 925
Following the same algorithm, the second “random” number would be:
900 + 925 = 1825
Subtract 52 = 1773
This simple example follows an obvious pattern, but the algorithms behind computer number generation are much more complicated.
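The toy rule above can be written out as a short sketch. The function name `toy_sequence` is made up for illustration, and this is deliberately not a usable random number generator.

```python
def toy_sequence(seed, count):
    """Apply the toy rule 'add 900, then subtract 52' repeatedly, starting from `seed`."""
    numbers = []
    x = seed
    for _ in range(count):
        x = 900 + x   # add 900
        x = x - 52    # subtract 52
        numbers.append(x)  # this value also seeds the next step
    return numbers


print(toy_sequence(77, 2))  # [925, 1773], matching the arithmetic above
print(toy_sequence(77, 2) == toy_sequence(77, 2))  # the same seed always replays the same numbers
```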
numpy.random.seed(0)
numpy.random.randint(10, size=5)
This produces the following output:
array([5, 0, 3, 3, 7])
Again, if we run the same code, we will get the same result.
Now if we change the seed value 0 to 1 or others:
numpy.random.seed(1)
numpy.random.randint(10, size=5)
This produces the following output:
array([5, 8, 9, 5, 0])
Now the output is not the same as above.
For a given seed value, the random numbers generated after setting it are the same across platforms and systems.
There is a nice explanation in the NumPy docs: numpy.random.seed refers to the Mersenne Twister pseudo-random number generator. More details on the algorithm here: https://en.wikipedia.org/wiki/Mersenne_Twister
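As a small sketch of how to check this yourself, the global generator’s state can be inspected with np.random.get_state() and restored with np.random.set_state(). The variable names here are only for illustration.

```python
import numpy as np

np.random.seed(0)

# The legacy global state is a tuple whose first entry names the algorithm.
state = np.random.get_state()
print(state[0])        # 'MT19937', the Mersenne Twister
print(len(state[1]))   # 624 integers of internal state

first = np.random.rand(3)

# Restoring the saved state rewinds the generator, so the same numbers come out again.
np.random.set_state(state)
again = np.random.rand(3)
print(np.array_equal(first, again))
```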