Is random.expovariate equivalent to a Poisson Process - python

I read somewhere that the Python library function random.expovariate produces intervals equivalent to Poisson process events.
Is that really the case or should I impose some other function on the results?

On a strict reading of your question, yes, that is what random.expovariate does.
expovariate gives you random floating point numbers, exponentially distributed. In a Poisson process the size of the interval between consecutive events is exponential.
However, there are two other ways I could imagine modelling Poisson processes:
Just generate random numbers, uniformly distributed and sort them.
Generate integers which have a Poisson distribution (i.e. they are distributed like the number of events within a fixed interval in a Poisson process). Use numpy.random.poisson to do this.
Of course all three things are quite different. The right choice depends on your application.
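For concreteness, here is a small hedged sketch of what the three approaches look like in code; the rate lam and window T are illustrative values, not from the question:

import random
import numpy as np

lam, T = 15.0, 1.0          # illustrative rate (events per second) and time window

# 1) Exponential inter-arrival times; the running sum gives the event times.
t, arrivals = 0.0, []
while True:
    t += random.expovariate(lam)
    if t >= T:
        break
    arrivals.append(t)

# 2) Conditioned on the number of events, the event times are i.i.d. uniform on [0, T):
n = np.random.poisson(lam * T)
arrivals_from_uniforms = sorted(random.uniform(0, T) for _ in range(n))

# 3) Only the event counts per fixed interval, with no individual event times:
counts = np.random.poisson(lam * T, size=10)    # counts for 10 independent windows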

https://stackoverflow.com/a/10250877/1587329 gives a nice explanation of why this works (not only in Python), and some code. In short, you can simulate the first 10 events in a Poisson process with an average rate of 15 arrivals per second like this:
import random

for i in range(10):
    print(random.expovariate(15))   # inter-arrival times; the cumulative sum gives the event times

Related

CMOS XOR propagation delay in Python

I have been struggling with something relatively simple but haven't yet figured out a good way to solve it.
I need to simulate, at a very high level, XOR gates. I have two streams of 0/1 and want to do piece-wise XOR and that's the easy bit. Now I wanted to add a limitation of real life CMOS XOR gates, simply the propagation delay.
This means that if the inputs change so quickly that the XOR would have to transition faster than a certain delay, the XOR output does not transition at all, therefore missing some of the transitions at the output.
Googling a bit, I think I found a MATLAB tool that does this (https://www.mathworks.com/help/sps/ref/cmosxor.html) and I would like something similar to use in my Python code.
Any help?
Thanks a lot!
For high level simulation, you could apply a time wheel. The time wheel has a fixed number of slots corresponding to multiples of a basic time unit (fractions of a nanosecond). Attached to each slot is a list of events scheduled for this time.
An event is a transition of an input or output line. The simulation algorithm works around the wheel and calculates subsequent events. These are stored in their respective slot lists. Time wraps around at the end of the wheel horizon. Events are removed from their lists after processing.
Example:
Input A of an XOR gate goes from 0 to 1 at time 8. This causes the output F of the gate to toggle its polarity 2 time slots later.
It is possible to use different delays depending on the direction of the transition (0->1 or 1->0). Typically, the delay also depends on the number of inputs driven by the gate output. The granularity or time accuracy of the simulation is determined by the number of slots in the wheel: the more slots, the smaller the timestep per slot. It is essential that the wheel horizon is big enough to prevent double wraps.
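As a rough illustration (not a full simulator), a minimal time-wheel sketch for a single two-input XOR gate with an inertial propagation delay could look like this; all names (simulate_xor, PROP_DELAY, WHEEL_SLOTS) are made up for the example:

from collections import defaultdict

WHEEL_SLOTS = 64     # wheel horizon in basic time units; must exceed the largest scheduling distance
PROP_DELAY = 2       # XOR propagation delay, in slots

def simulate_xor(input_events, horizon):
    # input_events: iterable of (time, input_name, new_value), input_name in {"A", "B"}
    # returns the list of (time, value) transitions observed at the output F
    wheel = defaultdict(list)                      # slot index -> list of (time, name, value)
    for t, name, val in input_events:
        wheel[t % WHEEL_SLOTS].append((t, name, val))

    state = {"A": 0, "B": 0, "F": 0}
    pending = None                                 # (time, value) of the scheduled output event
    transitions = []

    for now in range(horizon):
        slot = wheel[now % WHEEL_SLOTS]
        due = [e for e in slot if e[0] == now]
        wheel[now % WHEEL_SLOTS] = [e for e in slot if e[0] != now]   # drop processed events

        for _, name, val in due:                   # apply input transitions for this slot
            state[name] = val
        if due:                                    # schedule the output; overwriting a pending
            pending = (now + PROP_DELAY, state["A"] ^ state["B"])     # event is what swallows short glitches

        if pending is not None and pending[0] == now:                 # output event matures
            _, val = pending
            pending = None
            if val != state["F"]:
                state["F"] = val
                transitions.append((now, val))
    return transitions

# A 1-slot glitch on input A (shorter than PROP_DELAY) never reaches the output:
print(simulate_xor([(8, "A", 1), (9, "A", 0), (20, "B", 1)], horizon=40))   # -> [(22, 1)]

The short pulse on A is overwritten before its output event matures, so only the later transition at time 22 appears at F, which is the "missed transition" behaviour described above.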
This is still a discrete simulation; analog behaviour is a different matter. If you require the accuracy of analog simulation, you can either resort to a fully analog simulator, or you can assume exponential transitions and calculate the time when certain thresholds are reached.

Difference between Numpy's random module and Python? [duplicate]

I have a big script in Python. I took inspiration from other people's code, so I ended up using the numpy.random module for some things (for example for creating an array of random numbers taken from a binomial distribution) and the built-in random module in other places.
Can someone please tell me the major differences between the two?
Looking at the documentation for each of the two, it seems to me that numpy.random just has more methods, but I am unclear about how the generation of the random numbers differs.
The reason I am asking is that I need to seed my main program for debugging purposes, but it doesn't seem to work unless I use (and seed) the same random number generator in all the modules that I am importing. Is this correct?
Also, I read in another post a discussion about NOT using numpy.random.seed(), but I didn't really understand why this was such a bad idea. I would really appreciate it if someone could explain why this is the case.
You have made many correct observations already!
Unless you'd like to seed both of the random generators, it's probably simpler in the long run to choose one generator or the other. But if you do need to use both, then yes, you'll also need to seed them both, because they generate random numbers independently of each other.
For numpy.random.seed(), the main difficulty is that it is not thread-safe - that is, it's not safe to use if you have many different threads of execution, because it's not guaranteed to work if two different threads are executing the function at the same time. If you're not using threads, and if you can reasonably expect that you won't need to rewrite your program this way in the future, numpy.random.seed() should be fine. If there's any reason to suspect that you may need threads in the future, it's much safer in the long run to do as suggested, and to make a local instance of the numpy.random.RandomState class (newer NumPy code would use a Generator from numpy.random.default_rng()). As far as I can tell, random.seed() is thread-safe (or at least, I haven't found any evidence to the contrary).
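For example (a hedged sketch, not the only way to do it), you can either seed both global generators or, better for threaded code, keep a private NumPy generator instance:

import random
import numpy as np

random.seed(42)                      # seeds Python's global Mersenne Twister
np.random.seed(42)                   # seeds NumPy's separate global generator

# Thread-friendlier alternative: a private generator object you pass around.
rng = np.random.RandomState(42)      # legacy API; newer NumPy also offers np.random.default_rng(42)
print(rng.normal(size=3))
print(random.random())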
The numpy.random library contains a few extra probability distributions commonly used in scientific research, as well as a couple of convenience functions for generating arrays of random data. The built-in random module is a little more lightweight, and should be fine if you're not doing scientific research or other kinds of work in statistics.
Otherwise, they both use the Mersenne Twister sequence to generate their random numbers, and they're both completely deterministic - that is, if you know a few key bits of information, it's possible to predict with absolute certainty what number will come next. For this reason, neither numpy.random nor random is suitable for any serious cryptographic use. But because the sequence is so very, very long, both are fine for generating random numbers in cases where you aren't worried about people trying to reverse-engineer your data. This is also the reason for the necessity to seed the random value - if you start in the same place each time, you'll always get the same sequence of random numbers!
As a side note, if you do need cryptographic level randomness, you should use the secrets module, or something like Crypto.Random if you're using a Python version earlier than Python 3.6.
From Python for Data Analysis: the numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.
By contrast, Python's built-in random module only samples one value at a time, while numpy.random can generate very large samples much faster. Using the IPython magic function %timeit, one can see which module performs faster:
In [1]: import numpy as np

In [2]: from random import normalvariate

In [3]: N = 1000000

In [4]: %timeit samples = [normalvariate(0, 1) for _ in range(N)]
1 loop, best of 3: 963 ms per loop

In [5]: %timeit np.random.normal(size=N)
10 loops, best of 3: 38.5 ms per loop
The source of the seed and the distribution profile used are going to affect the outputs. If you are looking for cryptographic randomness, seeding from os.urandom() will get nearly true random bytes derived from device chatter (e.g. ethernet or disk activity; the source behind /dev/random on BSD).
This avoids you supplying a seed and thereby generating deterministic random numbers. However, the random calls then allow you to fit the numbers to a distribution (what I call scientific randomness - eventually all you want is a bell-curve distribution of random numbers, and numpy is best at delivering this).
So yes, stick with one generator, but decide what kind of random you want: random but definitely from a distribution curve, or as random as you can get without a quantum device.
It surprised me that the randint(a, b) method exists in both numpy.random and random, but they handle the upper bound differently.
random.randint(a, b) returns a random integer N such that a <= N <= b; it is an alias for randrange(a, b+1), so b is inclusive (random documentation).
numpy.random.randint(a, b), however, returns values from low (inclusive) to high (exclusive) (NumPy documentation).
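A quick check (purely illustrative) makes the different bounds visible:

import random
import numpy as np

print(sorted({random.randint(1, 3) for _ in range(1000)}))            # [1, 2, 3] - upper bound included
print(sorted(set(np.random.randint(1, 3, size=1000).tolist())))       # [1, 2]    - upper bound excluded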

Monte Carlo Simulation with changing distribution

I am experimenting with Monte Carlo simulations and I have come up with this interesting problem. Suppose we are generating random values using a Normal distribution with st. dev. = 2 and mean = the last value generated (a Markov process), starting at the value 5. Every time we generate a value greater than 9, we switch to a second Normal distribution with st. dev. = 3, and if we generate a value greater than 15 or less than 0 we start from 5 again. We want to find the expected value of this random process. One way would be to just generate a very large number of samples, but since this would be impractical for a more complicated process, my question is: what is the smart way to estimate the expected value (and also the probability distribution and other standard characteristics of this random process)?
I have looked into variations of Monte Carlo such as Markov Chain Monte Carlo (MCMC), yet I cannot seem to think of a good approach to solving this problem.
Any advice or sources would be helpful :)
PS I am working in Python, but any reference would be helpful, be it a code implementation in some other language, a theoretical explanation or even just the right term to search for on the Internet.
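For reference, a brute-force sketch of the process described above; it assumes the standard deviation resets to 2 whenever the process restarts at 5, a detail the description leaves open:

import random

def simulate(n_steps, seed=None):
    rng = random.Random(seed)
    x, sd = 5.0, 2.0
    total = 0.0
    for _ in range(n_steps):
        x = rng.normalvariate(x, sd)
        if x > 15 or x < 0:
            x, sd = 5.0, 2.0         # out of bounds: restart the process at 5
        elif x > 9:
            sd = 3.0                 # switch to the wider distribution
        total += x
    return total / n_steps           # crude estimate of the long-run expected value

print(simulate(1_000_000, seed=0))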

Sampling from a huge uniform distribution in Python

I need to select 3.7*10^8 unique values from the range [0, 3*10^9] and either obtain them in order or keep them in memory.
To do this, I started working on a simple algorithm where I sample smaller uniform distributions (that fit in memory) in order to indirectly sample the large distribution that really interests me.
The code is available at the following gist https://gist.github.com/legaultmarc/7290ac4bef4edb591d1e
Since I'm having trouble implementing something more robust, I was wondering if you had other ideas to sample unique values from a large discrete uniform. I'm looking for either an algorithm, a module or an idea on how to manage very large lists directly (perhaps using the hard drive instead of memory).
There is an interesting post, Generating sorted random ints without the sort? O(n) which suggests that instead of generating uniform random ints, you can do a running-sum on exponential random deltas, which gives you a uniform random result generated in sorted order.
It's not guaranteed to give exactly the number of samples you want, but should be pretty close, and much faster / lower memory requirements.
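A hedged sketch of that running-sum idea (approximate: it yields roughly n values, already sorted, and integer truncation can very occasionally produce duplicates):

import random

def sorted_uniform_stream(n, m, seed=None):
    # Yields roughly n sorted integers in [0, m] by accumulating exponential gaps.
    rng = random.Random(seed)
    rate = n / m                 # expected gap is m/n
    x = 0.0
    while True:
        x += rng.expovariate(rate)
        if x > m:
            return
        yield int(x)

# Scaled-down demo: ~370 values from [0, 3000] instead of 3.7e8 values from [0, 3e9]
samples = list(sorted_uniform_stream(370, 3000, seed=1))
print(len(samples), samples[:5])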
Edit: I found a second post, generating sorted random numbers without exponentiation involved? which suggests tweaking the distribution density as you go to generate an exact number of samples, but I am leery of just exactly what this would do to your "uniform" distribution.
Edit2: Another possibility that occurs to me would be to use an inverse cumulative binomial distribution to iteratively split your sample range (predict how many uniformly generated random samples would fall in the lower half of the range, then the remainder must be in the upper half) until the block-size reaches something you can easily hold in memory.
This is a standard sample without replacement. Note that you can't just divide the range [0, 3*10^9] into equally sized bins and sample the same amount in each bin.
Also, 3 billion is relatively large; many "ready to use" codes only handle 32-bit integers, roughly +-2 billion. Please take a close look at their implementations.

What is pseudo random?

I was reading the docs for the random module and noticed it said "pseudo random". Doesn't "pseudo" mean false? So I was wondering what it means when it says that.
For Example:
import random
print(random.randint(1, 2))
print(random.randint(1, 3))
Does this still mean that the first print statement has a 50% chance of printing 1 and a 50% chance of printing 2,
and that the second print statement has a 33% chance of printing 1, a 33% chance of printing 2, etc.?
If not, then how are the pseudo-random numbers generated?
To produce true randomness requires specialized hardware that measures random events, such as radioactive decay (random) or brownian motion (also essentially random). Most computers obviously don't have these, so instead you have to use a really complex, evenly distributed, hard to predict 'pseudorandom' algorithm that starts with a number determined by, for example, the current timestamp. Such algorithms are plenty good enough for standard use cases needing 'randomness' as long as you're careful to not seed two random number generators with the same timestamp (start them at the same time on different threads, for example), which will make them do identical things. A common example of such a random number generator is Mersenne Twister: http://en.wikipedia.org/wiki/Mersenne_twister
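A tiny illustration of that determinism: two generators seeded with the same value produce identical sequences.

import random

a = random.Random(12345)
b = random.Random(12345)
print([a.randint(1, 100) for _ in range(5)])
print([b.randint(1, 100) for _ in range(5)])   # identical output: same seed, same sequence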
A site that offers truly random values, explains a lot about randomness and pseudorandomness and has some yummy statistics about its randomness: http://www.random.org/ (see Learn More and Statistics) (It actually seems that it relies on measuring tiny fluctuations in a chaotic system, e.g. atmospheric noise, but the statistics show that it is so much like true randomness you can't tell it apart!)
