I need to run Monte Carlo simulations in parallel on different machines. The code is in C++, but the program is set up and launched with a Python script that sets a lot of things, in particular the random seed. The function setseed takes a 4-byte unsigned integer.
Using a simple
import time
setseed(int(time.time()))
is not very good because I submit the jobs to a queue on a cluster, they remain pending for some minutes and then they start, but the start time is unpredictable; it can happen that two jobs start at the same time (to the second), so I switched to:
setseed(int(time.time()*100))
but I'm not happy with it. What is the best solution? Maybe I can combine information from time, machine id, and process id. Or maybe the best solution is to read from /dev/random (Linux machines)?
How do I read 4 bytes from /dev/random?
f = open("/dev/random","rb")
f.read(4)
gives me a string, but I want an integer!
Reading from /dev/random is a good idea. Just convert the 4-byte string into an integer:
f = open("/dev/random","rb")
rnd_str = f.read(4)
Either using struct:
import struct
rand_int = struct.unpack('I', rnd_str)[0]
Update: the uppercase I is needed.
Or multiply and add:
rand_int = 0
for c in rnd_str:
    rand_int <<= 8
    rand_int += ord(c)
You could simply copy the four bytes over into an integer; that should be the least of your worries.
But parallel pseudo-random number generation is a rather complex topic and very often not done well. Usually you generate seeds on one machine and distribute them to the others.
Take a look at SPRNG, which handles exactly your problem.
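For instance, here is a minimal sketch of the generate-seeds-on-one-machine approach; the script name, the seeds.txt file, and the job count are made up for illustration, and setseed is the function from the question:

# master.py: generate one distinct-looking seed per job from the OS entropy pool
import random

sysrand = random.SystemRandom()        # backed by os.urandom
seeds = [sysrand.getrandbits(32) for _ in range(100)]   # one 4-byte seed per job

with open("seeds.txt", "w") as f:
    for s in seeds:
        f.write("%d\n" % s)

# Each submitted job then reads its own line of seeds.txt and passes it to setseed().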
If this is Linux or a similar OS, you want /dev/urandom -- it always produces data immediately.
/dev/random may stall waiting for the system to gather randomness. It does produce cryptographic-grade random numbers, but that is overkill for your problem.
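As a minimal sketch, you can also get those 4 bytes without opening the device file yourself, via os.urandom (the conversion is the same struct trick as above; setseed is the function from the question):

import os
import struct

seed = struct.unpack('I', os.urandom(4))[0]   # 4 random bytes -> unsigned 32-bit int
setseed(seed)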
You can use a random number as the seed, which has the advantage of being operating-system agnostic (no /dev/random needed), with no conversion from string to int:
Why not simply use
random.randrange(-2**31, 2**31)
as the seed of each process? Slightly different starting times give wildly different seeds, this way…
You could alternatively use the random.jumpahead method (Python 2 only; it was removed in Python 3), if you know roughly how many random numbers each process is going to use (the documentation of random.WichmannHill.jumpahead is useful).
This should be easy. After looking at what's going on, I'm not so sure. Is the writing/reading of a single binary integer atomic? That's how the underlying hardware does reads and writes of 32-bit integers. After some research, I realized Python does not store integers as a collection of bytes. It doesn't even store bytes as a collection of bytes. There's overhead involved. Does this overhead break the atomic nature of binary integers?
Here is some code I used trying to figure this out:
import time
import sys
tm=time.time()
int_tm = int(tm * 1000000)
bin_tm = bin(int_tm)
int_bin_tm = int(bin_tm, 2)
print('tm:', tm, ", Size:", sys.getsizeof(tm))
print('int_tm:', int_tm, ", Size:", sys.getsizeof(int_tm))
print('bin_tm:', bin_tm, ", Size:", sys.getsizeof(bin_tm))
print('int_bin_tm:', int_bin_tm, ", Size:", sys.getsizeof(int_bin_tm))
Output:
tm: 1581435513.076924 , Size: 24
int_tm: 1581435513076924 , Size: 32
bin_tm: 0b101100111100100111010100101111111011111110010111100 , Size: 102
int_bin_tm: 1581435513076924 , Size: 32
For a couple side questions, does Python's binary representation of integers really consume so much more memory? Am I using the wrong type for converting decimal integers to bytes?
Python doesn't guarantee any atomic operations other than specific mutex constructs like locks and semaphores. Some operations will seem to be atomic because the GIL prevents bytecode from being run on multiple Python threads at once ("This lock is necessary mainly because CPython's memory management is not thread-safe").
Basically, this means Python will ensure an entire bytecode instruction is completely evaluated before allowing another thread to continue. It does not, however, mean that an entire line of code is guaranteed to complete without interruption. This is especially true with function calls. For a deeper look at this, take a look at the dis module.
I will also point out that this talk about atomicity means nothing at a hardware level; the whole idea of an interpreted language is to abstract away the hardware. If you want to consider "actual" hardware atomicity, it will generally be a facility provided by the operating system (which is how Python likely implements things like threading.Lock).
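A minimal sketch of the dis suggestion above; the increment function is just an illustration, not code from the question:

import dis

counter = 0

def increment():
    global counter
    counter += 1   # reads, adds, and stores: several bytecode instructions

dis.dis(increment)
# The disassembly shows LOAD_GLOBAL, LOAD_CONST, an in-place add, and STORE_GLOBAL;
# the GIL can be released between any two of them, so another thread may interleave.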
Note on data sizes (this is just a quickie, because this is a whole other question):
sizeof(tm): 8 bytes for the 64-bit float, 8 bytes for a pointer to the type object, and 8 bytes for the reference count
sizeof(int_tm): ints are a bit more complicated, as some small values are "cached" for efficiency, while large values use a flexible internal format where the number of bytes used to store the int can grow to however big it needs to be.
sizeof(bin_tm): This is actually a string, which is why it takes so much more memory than just a number, there is a fairly significant overhead, plus at least one byte per character.
"Am I using the wrong type for converting..?" We need to know what you're trying to do with the result to answer this.
I'm computing the answer for some very big division problems and wonder why b=a/c (where a and c are both positive whole numbers) is so much faster to figure out than when you also ask for the answer to be printed: b=a/c is way faster than b=a/c followed by print b.
Very slow:
from datetime import datetime
startTime = datetime.now()
a=2**1000000-3
b=a/13
print b
print(datetime.now()-startTime)
but without the print b it is very fast. I later typed in c=a%13 to see if anything was actually happening (I'm still pretty new to programming) and it is very fast when I type in print c (without the print b code).
As far as I understand, IO operations are slow, and printing to the screen is like writing to a file: it is going to block the thread for some time.
As someone pointed out, converting the number to a string could be taking time too. Whenever I have to measure the time of something, I measure the time of the calculation only and print any kind of result after measuring.
To make the program even faster, but more memory hungry, you could save every result in a list, build one big string, and print only once.
Repeated calls to print take more time than one big call to print.
from datetime import datetime
startTime = datetime.now()
a=2**1000000-3
b=a/13
elapsedTime = datetime.now() - startTime
print "Elapsed time %s\n Number: %s" % (elapsedTime, b)
You're trying to output a pretty big number (about 10^300000); it takes time to convert it from the internal binary format to decimal (I guess about 300,000 divisions are needed for this, since numbers are stored in binary internally). If you really need to output the whole number in decimal format, I don't think you can speed it up a lot. But you can quickly print the number in hex or binary format:
hex(b)
bin(b)
You can use the Decimal type to store numbers in decimal format internally, but calculations with this type can be much slower.
IO is slow, as people have pointed out. When doing IO, you do not know how your process will get scheduled; it will probably be put on a waiting queue. I am not sure what you're timing, but simple engineering logic dictates that you want to make the IO a small (i.e. insignificant) portion of the overall run time. So I suggest you do 1e6 divisions (the more the better) and then do the IO, assuming you are testing just the mathematics. Note that your measurements also involve how the computer and OS behave.
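A minimal sketch of timing the arithmetic separately from the output, in the spirit of the answers above; the repeat count is arbitrary, and the Python 2 style (print statement, xrange) matches the question's code:

from datetime import datetime

a = 2 ** 1000000 - 3

startTime = datetime.now()
for _ in xrange(100):          # repeat the division so any IO is a tiny fraction
    b = a / 13
calcTime = datetime.now() - startTime

startTime = datetime.now()
s = str(b)                     # the binary-to-decimal conversion is the slow part
convTime = datetime.now() - startTime

print "calc: %s  str(): %s  digits: %d" % (calcTime, convTime, len(s))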
I am using the following method to create a random code for users as part of a booking process:
User.objects.make_random_password()
When the users turn up at the venue, they will present the password.
Is it safe to assume that two people won't end up with the same code?
Thanks
No, it's not safe to assume that two people can't have the same code. Random doesn't mean unique. It may be unlikely and rare, depending on the length you specify and number of users you are dealing with. But you can't rely on its uniqueness.
It depends on how many users you have, the password length you choose, and how you use User.objects.make_random_password(). For the defaults, the chance is essentially zero, IMO.
This method is implemented using get_random_string(). From the django github repo:
def get_random_string(length=12,
                      allowed_chars='abcdefghijklmnopqrstuvwxyz'
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'):
    """
    Returns a securely generated random string.
    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits
    """
    if not using_sysrandom:
        # This is ugly, and a hack, but it makes things better than
        # the alternative of predictability. This re-seeds the PRNG
        # using a value that is hard for an attacker to predict, every
        # time a random string is required. This may change the
        # properties of the chosen random sequence slightly, but this
        # is better than absolute predictability.
        random.seed(
            hashlib.sha256(
                "%s%s%s" % (
                    random.getstate(),
                    time.time(),
                    settings.SECRET_KEY)
            ).digest())
    return ''.join([random.choice(allowed_chars) for i in range(length)])
According to github, the current code uses a 12 character password from a string of 62 characters (lower- and uppercase letters and numbers) by default. This makes for 62**12 or 3226266762397899821056 (3.22e21) possible different passwords. This is much larger than the current world population (around 7e9).
The letters are picked from this list of characters by the random.choice() function. The question now becomes how likely it is that the repeated calling of random.choice() returns the same sequence twice?
As you can see from the implementation of get_random_string(), the code tries hard to avoid predictability. When not using the OS's pseudo-random value generator (which on Linux and *BSD gathers real randomness from e.g. the times at which ethernet packets or keypresses arrive), it re-seeds the random module's Mersenne Twister predictable PRNG at each call with a combination of the current random state, the current time and (presumably constant) secret key.
So for two identical passwords to be generated, both the state of the random generator (which is about 8 kiB in python) and the time at which they are generated (measured in seconds since the epoch, as per time.time()) have to be identical. If the system's time is well-maintained and you are running one instance of the password generation program, the chance of that happening is essentially zero. If you start two or more instances of this password generating program at exactly the same time and with exactly the same seed for the PRNG, and combine their output, one would expect some passwords to appear more than once.
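As a rough sanity check, here is a birthday-bound estimate of the collision probability for the default 12-character alphabet; the number of issued codes is a made-up example:

import math

keyspace = 62 ** 12                 # possible 12-character default passwords
n = 100000                          # hypothetical number of codes handed out
p_collision = 1 - math.exp(-n * (n - 1) / (2.0 * keyspace))
print(p_collision)                  # on the order of 1e-12 for these numbers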
I'm in the process of working on a programming project that involves some pretty extensive Monte Carlo simulation in Python, and as such the generation of a tremendous number of random numbers. Very nearly all of them, if not all, can be generated by Python's built-in random module.
I'm something of a coding newbie, and unfamiliar with efficient and inefficient ways to do things. Is it faster to generate, say, all the random numbers as a list and then iterate through that list, or to generate a new random number each time a function is called, inside a very large loop?
Or some other, undoubtedly more clever method?
Generate a random number each time. Since the inner workings of the loop only care about a single random number, generate and use it inside the loop.
Example:
# do this:
import random
for x in xrange(SOMEVERYLARGENUMBER):
    n = random.randint(1,1000) # whatever your range of random numbers is
    # Do stuff with n

# don't do this:
import random
# This list comprehension generates random numbers in a list
numbers = [random.randint(1,1000) for x in xrange(SOMEVERYLARGENUMBER)]
for n in numbers:
    # Do stuff with n
Obviously, in practical terms it really doesn't matter, unless you're dealing with billions and billions of iterations, but why bother generating all those numbers if you're only going to be using one at a time?
import random
for x in (random.randint(0,80) for x in xrange(1000*1000)):
    print x
The code between parentheses will only generate one item at a time, so it's memory safe.
Python's built-in random module, e.g. random.random(), random.randint() (some distributions are also available; you probably want Gaussian), does about 300K samples/s.
Since you are doing numerical computation, you probably use numpy anyway. It offers better performance if you generate random numbers one array at a time instead of one number at a time, and a wider choice of distributions: 60K/s * 1024 (array length), that's ~60M samples/s.
You can also read /dev/urandom on Linux and OSX; my hw/sw (OSX laptop) manages ~10MB/s.
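A sketch of the array-at-a-time numpy approach mentioned above; the array sizes and seed are arbitrary, and default_rng requires numpy >= 1.17 (older versions would use np.random.RandomState instead):

import numpy as np

rng = np.random.default_rng(12345)             # seed once per process
uniform = rng.random(1000000)                  # one array of uniform [0, 1) samples
gaussian = rng.standard_normal(1000000)        # Gaussian samples in bulk
ints = rng.integers(1, 1001, size=1000000)     # uniform integers in [1, 1000]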
Surely there must be faster ways to generate random numbers en masse, e.g.:
from Crypto.Cipher import AES
from Crypto.Util import Counter
import secrets

# AES in CTR mode as a fast keystream generator; the counter block supplies the nonce
aes = AES.new(secrets.token_bytes(16), AES.MODE_CTR, counter=Counter.new(128))
data = b"0" * 2 ** 20            # 1 MiB of plaintext to encrypt per iteration
with open("filler.bin", "wb") as f:
    while True:
        f.write(aes.encrypt(data))
This generates 200MB/s on a single core of i5-4670K
Common ciphers like AES and Blowfish manage 112MB/s and 70MB/s on my stack. Furthermore, modern processors make AES even faster, up to some 700MB/s; see this link to test runs on a few hardware combinations. (edit: link broken). You could use the weaker ECB mode, provided you feed distinct inputs into it, and achieve up to 3GB/s.
Stream ciphers are better suited for the task, e.g. RC4 tops out at 300MB/s on my hardware; you may get the best results from the most popular ciphers, as more effort has been spent optimising both the algorithms and the software.
Code to generate 10M random numbers efficiently and faster:
import random

l = 10000000
listrandom = []
for i in range(l):
    value = random.randint(0, l)
    listrandom.append(value)
print listrandom
Time taken included the I/O time lagged in printing on screen:
real 0m27.116s
user 0m24.391s
sys 0m0.819s
Using Numpy -
import numpy as np

# low=1, high=100 gives integers in [1, 100); size is the number of samples
np.random.randint(low=1, high=100, size=1000000)
I know there have been some questions regarding file reading, binary data handling and integer conversion using struct before, so I come here to ask about a piece of code I have that I think is taking too much time to run. The file being read is a multichannel data-sample recording (short integers), with interleaved intervals of data (hence the nested for statements). The code is as follows:
# channel_content is a dictionary, channel_content[channel]['nsamples'] is a string
for rec in xrange(number_of_intervals):
    for channel in channel_names:
        channel_content[channel]['recording'].extend(
            [struct.unpack("h", f.read(2))[0]
             for iteration in xrange(int(channel_content[channel]['nsamples']))])
With this code, I get 2.2 seconds per megabyte read on a dual-core with 2Mb RAM, and my files typically have 20+ MB, which gives a very annoying delay (especially considering that another benchmark shareware program I am trying to mirror loads the file WAY faster).
What I would like to know:
If there is some violation of "good practice": bad-arranged loops, repetitive operations that take longer than necessary, use of inefficient container types (dictionaries?), etc.
If this reading speed is normal, or normal for Python.
If creating a C++ compiled extension would be likely to improve performance, and if it would be a recommended approach.
(Of course) if anyone can suggest some modification to this code, preferably based on previous experience with similar operations.
Thanks for reading
(I have already posted a few questions about this job of mine; I hope they are all conceptually unrelated, and I also hope I'm not being too repetitive.)
Edit: channel_names is a list, so I made the correction suggested by @eumiro (removed the typoed brackets).
Edit: I am currently going with Sebastian's suggestion of using array with the fromfile() method, and will soon put the final code here. Besides, every contribution has been very useful to me, and I gladly thank everyone who kindly answered.
Final form, after going with array.fromfile() once and then alternately extending one array for each channel via slicing the big array:
fullsamples = array('h')
fullsamples.fromfile(f, (os.path.getsize(f.name) - f.tell()) // fullsamples.itemsize)
position = 0
for rec in xrange(int(self.header['nrecs'])):
    for channel in self.channel_labels:
        samples = int(self.channel_content[channel]['nsamples'])
        self.channel_content[channel]['recording'].extend(
            fullsamples[position:position + samples])
        position += samples
The speed improvement was very impressive over reading the file a bit at a time, or using struct in any form.
You could use array to read your data:
import array
import os
fn = 'data.bin'
a = array.array('h')
a.fromfile(open(fn, 'rb'), os.path.getsize(fn) // a.itemsize)
It is 40x faster than struct.unpack from @samplebias's answer.
If the files are only 20-30M, why not read the entire file, decode the nums in a single call to unpack and then distribute them among your channels by iterating over the array:
data = open('data.bin', 'rb').read()
values = struct.unpack('%dh' % (len(data) // 2), data)
del data
# iterate over channels, and assign from values using indices/slices
A quick test showed this resulted in a 10x speedup over struct.unpack('h', f.read(2)) on a 20M file.
A single array.fromfile call is definitely fastest, but it won't work if the data series is interleaved with other value types.
In such cases, another big speed increase, which can be combined with the previous struct answers, is to precompile a struct.Struct object with the format for each chunk instead of calling the unpack function multiple times. From the docs:
Creating a Struct object once and calling its methods is more
efficient than calling the struct functions with the same format since
the format string only needs to be compiled once.
So for instance, if you wanted to unpack 1000 interleaved shorts and floats at a time, you could write:
chunksize = 1000
structobj = struct.Struct("hf" * chunksize)
while True:
    chunkdata = structobj.unpack(fileobj.read(structobj.size))
(Note that the example is only partial and needs to account for changing the chunksize at the end of the file and breaking the while loop.)
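A hedged sketch of one way to complete that loop, including the final (possibly partial) chunk; the file name, chunk size, and the standard-size "<hf" record format are assumptions for illustration only:

import struct

chunksize = 1000
record = struct.Struct("<hf")                    # one interleaved short + float, no padding
chunkstruct = struct.Struct("<" + "hf" * chunksize)

with open("data.bin", "rb") as fileobj:          # hypothetical file name
    while True:
        raw = fileobj.read(chunkstruct.size)
        if len(raw) < chunkstruct.size:
            # last chunk: unpack only the complete records that remain, then stop
            nrec = len(raw) // record.size
            tail = struct.unpack("<" + "hf" * nrec, raw[:nrec * record.size])
            break
        chunkdata = chunkstruct.unpack(raw)
        # ... process chunkdata (and eventually tail) ...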
extend() accepts iterables; that is to say, instead of .extend([...]) you can write .extend(...). It is likely to speed up the program because extend() will then consume a generator rather than a fully built list.
There is an incoherence in your code: you first define channel_content = {}, and after that you perform channel_content[channel]['recording'].extend(...), which requires the key channel and the subkey 'recording' with a list as its value to already exist before anything can be extended.
What is the nature of self.channel_content[channel]['nsamples'], such that it can be passed to the int() function?
Where does number_of_intervals come from? What is the nature of the intervals?
In the for rec in xrange(number_of_intervals): loop, I don't see rec used anywhere. So it seems to me that you are repeating the same for channel in channel_names: loop as many times as the number expressed by number_of_intervals. Are there number_of_intervals * int(self.channel_content[channel]['nsamples']) * 2 values to read from f?
I read in the doc:
class struct.Struct(format)
Return a new Struct object which writes and reads binary data according to the format string format. Creating a Struct object once and calling its methods is more efficient than calling the struct functions with the same format since the format string only needs to be compiled once.
This expresses the same idea as samplebias's answer.
If your aim is to create a dictionary, there is also the possibility of using dict() with a generator as its argument.
EDIT
I propose
channel_content = {channel: {'recording': []} for channel in channel_names}
for rec in xrange(number_of_intervals):
    for channel in channel_names:
        N = int(self.channel_content[channel]['nsamples'])
        channel_content[channel]['recording'].extend(
            struct.unpack(str(N) + "h", f.read(2 * N)))
I don't know how to take J.F. Sebastian's suggestion to use array into account here.
Not sure if it would be faster, but I would try to decode chunks of words instead of one word at a time. For example, you could read 100 bytes of data at a time like:
s = f.read(100)
struct.unpack(str(len(s) // 2) + "h", s)
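A sketch of what that chunked loop might look like in full; the file name and chunk size are made up, and it assumes the file holds nothing but packed shorts:

import struct

values = []
with open("data.bin", "rb") as f:      # hypothetical file name
    while True:
        s = f.read(4096)               # read a few KB of packed shorts at a time
        if not s:
            break
        values.extend(struct.unpack(str(len(s) // 2) + "h", s))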