Replicate a T-SQL random function in Python

Replicate a T-SQL random function in Python - python

I have a T-SQL code and I want to run a few simulations in Python. There is a code that includes random functions and I am not sure how I can replicate it.
When I have RAND() in SQL, I just use this in Python:
import random as random
print random.random()
But, I have also this code: RAND(CHECKSUM(NEWID()))
I guess, it is used for some kind of seed in the RAND function. But, how I can replicate the same thing in Python to have as much closer results I could?

in Python you need to first call
random.seed() in you program (once only)
then
random.random()
each time you want a random number
to give a pseudo random number from 0 to 1
yes RAND(CHECKSUM(NEWID())) appears to be a good trick to get a random value to use as a seed - it could be argued that a NEWID has a greater amount of randomness than using time as a seed. It depends on your application, time is considered insufficient as a seed for cryptography without adding other entropy - the python randomize uses time as a seed

Related

Matching random seed in R for code coming from Python

I am given some python codes which depends on random numbers. This python code use a random_seed = 300.
Now I am trying to replicate this Python code in R. To make sure that the replication is perfect, I need to compare the end results between R and Python. Given that, the code depends on the random numbers, is there any way to know the equivalent random seed to be used in R?
I had a look into Creating same random number sequence in Python, NumPy and R, but it appears to be opposite way implementation i.e. from R to Python.
There is also a R library called reticulate where I could run python code in R, but could not figure out if I could fetch the R-equivalent random seed using this library
Any pointer will be very helpful.
Many thanks,

python random number why isn't it same on all computers

One friend that I don't have any communication with anymore once told me the following:
Using that library(python's random) you can select a seed If you give it at seed it
means the randomly generated number will always be the same No matter
what computer you run it on
So I tried to test this, because this is what I need so it's the same on all computers and everytime someone calls this(this is important, as I am working on a blockchain NFT and trust is important here)
So I found this: https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/
on that link, there's example:
from random import seed
from random import random
# seed random number generator
seed(1)
# generate some random numbers
print(random(), random(), random())
# reset the seed
seed(1)
# generate some random numbers
print(random(), random(), random())
running above in playground of python, I get
(0.417022004703, 0.720324493442, 0.000114374817345) (0.417022004703,
0.720324493442, 0.000114374817345)
But as you can see, on that website, the creator of that post got the following:
0.13436424411240122 0.8474337369372327 0.763774618976614
0.13436424411240122 0.8474337369372327 0.763774618976614
Why aren't they same on all computers then ? I am using the same seed. and how can I ensure that they will be the same ?

The ANSWER is that, even though you have tagged Python-3.x, you are actually using Python 2, and the random number algorithm changed between 2 and 3.
I can tell that because your print statement printed the values as a tuple, with parentheses. That wouldn't happen with Python 3's print function.
It's odd to rely on a particular implementation of a random number algorithm for financial purposes. If you really need reproducibility, then you should embed your own algorithm. There are several RNGs that are not difficult to code. But if the algorithm needs to be predictable, why not just use incrementing numbers? If you don't need randomness, then don't use randomness.

I get the post author's values both in Python 2.3.0 (over 18 years old) and in Python 3.10 (the newest, just a month old).
I get your values if I use numpy.random instead of Python's random.
So I suspect you're not telling the truth about your code or that that "playground of python" you're using is weird.

Difference between Numpy's random module and Python? [duplicate]

I have a big script in Python. I inspired myself in other people's code so I ended up using the numpy.random module for some things (for example for creating an array of random numbers taken from a binomial distribution) and in other places I use the module random.random.
Can someone please tell me the major differences between the two?
Looking at the doc webpage for each of the two it seems to me that numpy.random just has more methods, but I am unclear about how the generation of the random numbers is different.
The reason why I am asking is because I need to seed my main program for debugging purposes. But it doesn't work unless I use the same random number generator in all the modules that I am importing, is this correct?
Also, I read here, in another post, a discussion about NOT using numpy.random.seed(), but I didn't really understand why this was such a bad idea. I would really appreciate if someone explain me why this is the case.

You have made many correct observations already!
Unless you'd like to seed both of the random generators, it's probably simpler in the long run to choose one generator or the other. But if you do need to use both, then yes, you'll also need to seed them both, because they generate random numbers independently of each other.
For numpy.random.seed(), the main difficulty is that it is not thread-safe - that is, it's not safe to use if you have many different threads of execution, because it's not guaranteed to work if two different threads are executing the function at the same time. If you're not using threads, and if you can reasonably expect that you won't need to rewrite your program this way in the future, numpy.random.seed() should be fine. If there's any reason to suspect that you may need threads in the future, it's much safer in the long run to do as suggested, and to make a local instance of the numpy.random.Random class. As far as I can tell, random.random.seed() is thread-safe (or at least, I haven't found any evidence to the contrary).
The numpy.random library contains a few extra probability distributions commonly used in scientific research, as well as a couple of convenience functions for generating arrays of random data. The random.random library is a little more lightweight, and should be fine if you're not doing scientific research or other kinds of work in statistics.
Otherwise, they both use the Mersenne twister sequence to generate their random numbers, and they're both completely deterministic - that is, if you know a few key bits of information, it's possible to predict with absolute certainty what number will come next. For this reason, neither numpy.random nor random.random is suitable for any serious cryptographic uses. But because the sequence is so very very long, both are fine for generating random numbers in cases where you aren't worried about people trying to reverse-engineer your data. This is also the reason for the necessity to seed the random value - if you start in the same place each time, you'll always get the same sequence of random numbers!
As a side note, if you do need cryptographic level randomness, you should use the secrets module, or something like Crypto.Random if you're using a Python version earlier than Python 3.6.

From Python for Data Analysis, the module numpy.random supplements the Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.
By contrast, Python's built-in random module only samples one value at a time, while numpy.random can generate very large sample faster. Using IPython magic function %timeit one can see which module performs faster:
In [1]: from random import normalvariate
In [2]: N = 1000000
In [3]: %timeit samples = [normalvariate(0, 1) for _ in xrange(N)]
1 loop, best of 3: 963 ms per loop
In [4]: %timeit np.random.normal(size=N)
10 loops, best of 3: 38.5 ms per loop

The source of the seed and the distribution profile used are going to affect the outputs - if you are looking for cryptgraphic randomness, seeding from os.urandom() will get nearly real random bytes from device chatter (ie ethernet or disk) (ie /dev/random on BSD)
this will avoid you giving a seed and so generating determinisitic random numbers. However the random calls then allow you to fit the numbers to a distribution (what I call scientific random ness - eventually all you want is a bell curve distribution of random numbers, numpy is best at delviering this.
SO yes, stick with one generator, but decide what random you want - random, but defitniely from a distrubtuion curve, or as random as you can get without a quantum device.

It surprised me the randint(a, b) method exists in both numpy.random and random, but they have different behaviors for the upper bound.
random.randint(a, b) returns a random integer N such that a <= N <= b. Alias for randrange(a, b+1). It has b inclusive. random documentation
However if you call numpy.random.randint(a, b), it will return low(inclusive) to high(exclusive). Numpy documentation

Should I seed the random number generator?

From the docs:
random.seed(a=None, version=2) Initialize the random number generator.
If a is omitted or None, the current system time is used. If
randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom() function for details
on availability).
But...if it's truly random...(and I thought I read it uses Mersenne, so it's VERY random)...what's the point in seeding it? Either way the outcome is unpredictable...right?

The default is probably best if you want different random numbers with each run. If for some reason you need repeatable random numbers, in testing for instance, use a seed.

The module actually seeds the generator (with OS-provided random data from urandom if possible, otherwise with the current date and time) when you import the module, so there's no need to manually call seed().
(This is mentioned in the Python 2.7 documentation but, for some reason, not the 3.x documentation. However, I confirmed in the 3.x source that it's still done.)
If the automatic seeding weren't done, you'd get the same sequence of numbers every time you started your program, same as if you manually use the same seed every time.

But...if it's truly random
No, it's pseudo random. If it uses Mersenne Twister, that too is a PRNG.
It's basically an algorithm that generates the exact same sequence of pseudo random numbers out of a given seed. Generating truly random numbers requires special hardware support, it's not something you can do by a pure algorithm.
You might not need to seed it since it seeds itself on first use, unless you have some other or better means of providing a seed than what is time based.
If you use the random numbers for things that are not security related, a time based seed is normally fine. If you use if for security/cryptography, note what the docs say: "and is completely unsuitable for cryptographic purposes"

If you want to reproduce your results, you seed the generator with a known value so you get the same sequence every time.

A Mersenne twister, the random number generator, used by Python is seeded by the operating system served random numbers by default on those platforms where it is possible (Unixen, Windows); however on other platforms the seed defaults to the system time which means very repeatable values if the system time has a bad precision. On such systems seeding with known better random values is thus beneficial. Note that on Python 3 specifically, if version 2 is used, you can pass in any str, bytes, or bytearray to seed the generator; thus taking use of the Mersenne twister's large state better.
Another reason to use a seed value is indeed to guarantee that you get the same sequence of random numbers again and again - by reusing the known seed. Quoting the docs:
Sometimes it is useful to be able to reproduce the sequences given by
a pseudo random number generator. By re-using a seed value, the same
sequence should be reproducible from run to run as long as multiple
threads are not running.
Most of the random module’s algorithms and seeding functions are
subject to change across Python versions, but two aspects are
guaranteed not to change:
If a new seeding method is added, then a backward compatible seeder will be offered.
The generator’s random() method will continue to produce the same sequence when the compatible seeder is given the same seed.
For this however, you mostly want to use the random.Random instances instead of using module global methods (the multiple threads issue, etc).
Finally also note that the random numbers produced by Mersenne twister are unsuitable for cryptographical use; whereas they appear very random, it is possible to fully recover the internal state of the random generator by observing only some hundreds of values from the generator. For cryptographical algorithms, you want to use the SystemRandom class.

In most cases I would say there is no need to care about. But if someone is really willing to do something wired and (s)he could roughly figure out your system time when your code was running, they might be able to brute force replay your random numbers and see which series fits. But I would say this is quite unlikely in most cases.

Compute random number over certain time interval with Python

I did some research before posting but seem to be at a lost (not too experienced in coding).
I am attempting to generate or compute a random number for certain time interval with Python. I'm not looking for full code, I want help using the time library if that is the correct one to use.
Pseudo-code:
Allow python [PC] to compute a random number for 3 seconds
------> Store the computed generation in a value (i can handle this)
I would then use the random generated value to link access a python list (which would be automatically generated via a random number generation as well but i can figure that out).

I'm not sure why you want to do this, but here's how to compute many random numbers, throwing most of them away, and then using the last one after 3 seconds have elapsed.
import random
import time
start = time.clock()
while time.clock() - start < 3:
random_number = random.randint(0,100)
print random_number
This pointlessly throws away about 2 million perfectly good random numbers on my machine.
(And, as abarnert points out, this also maxes out one CPU core for the whole 3 seconds in a busy loop, which is very, very wasteful, but I thinks it's what you were asking for?)
EDIT: Updated to use time.clock instead of time.time, as suggested by abarnert again (thanks), because this seems to give better resolution across platforms and doesn't suffer from problems when the system time is altered in the middle of the program running.

First, you didn't say what kind of random number you want to generate, but given that your example is 10, I assume it's an integer in some range—let's say you're calling random.randrange(30).
Now, you want to compute a number every second for 3 seconds, then keep the last one. I don't know why you'd even want to do this, but you can do it like this:
for i in range(3):
number = random.randrange(30)
time.sleep(1.0)
At the end of 3 seconds, number will be the third random number generated.
The key here is that, to do something once per second (in a synchronous program—don't do this in a GUI or server!)—you just call time.sleep.
If the operation you were doing took a significant chunk of a second (or longer), this wouldn't be appropriate. Instead, you'd want to compute the start time, and sleep until a second after that:
t0 = time.monotonic()
for i in range(3):
number = random.randrange(30)
t0 += 1
time.sleep(t0 - time.monotonic())
Note that I've used time.monotonic here. This function is specifically designed for this kind of use case. It returns as much precision as can be gotten with reasonable efficiency (in particular, unlike time.time, it doesn't give you 1s precision on some platforms), and it guarantees that you'll never go backward even if, e.g., you change the system clock in the middle of the program. If you're using 3.2 or earlier, either look through the docs for the best alternative (possibly using time.clock()), or look into using ctypes to call the appropriate platform native function.
But in this case, random.randrange is going to take somewhere on the order of a microsecond, which is so much less time than the minimum resolution of most systems' simple timers that there's no reason to do such a thing.

If you want to take 3 seconds to get a random number, because you're concerned about the quality of the random number, you can use os.urandom() to generate the value. If all you really want to do is to select an item from your list at random, you can use random.choice()

Note: The function time.clock() has been removed, after having been deprecated since Python 3.3: use time.perf_counter() or time.process_time() instead, depending on your requirements, to have well-defined behavior. (Contributed by Matthias Bussonnier in bpo-36895.)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.