Seeding the random generator for tests - python

I made it work using factory-boy's get_random_state/set_random_state, although it wasn't easy. The biggest downside is that the state values are huge. The obvious workaround is to write the state to a file, but then if I accidentally run the tests without telling them to seed from the file, the value is lost. Now that I think about it, I could display the value as well (think tee). But I'd still like to reduce it to 4-5 digits.
My idea is as follows. Normally when you run the tests, it prints something like "seed: 4215". Then, to reproduce the same result, I do SEED=4215 ./manage.py test or something similar.
I did some experiments with factory-boy, but then I realized that I can't achieve this even with the random module itself. I tried different ideas; all of them have failed so far. The simplest is this:
import random
import os

if os.getenv('A'):
    random.seed(os.getenv('A'))
else:
    seed = random.randint(0, 1000)
    random.seed(seed)
    print('seed: {}'.format(seed))

print(random.random())
print(random.random())
/app $ A= python a.py
seed: 62
0.9279915658776743
0.17302689004804395
/app $ A=62 python a.py
0.461603098412836
0.7402019819205794
Why do the results differ? And how to make them equal?

Currently your types are different:
if os.getenv('A'):
    random.seed(os.getenv('A'))
else:
    seed = random.randint(0, 1000)
    random.seed(seed)
    print('seed: {}'.format(seed))
In the first case, you have a str and in the second an int. You can fix this by casting to an int in the first case:
random.seed(int(os.getenv("A")))
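To see why the type matters, a quick illustration: str and int seeds initialize different generator states even when they look the same.

import random

random.seed("62")   # seeding with the string "62"
a = random.random()

random.seed(62)     # seeding with the int 62
b = random.random()

print(a == b)       # False: different seed types, different states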
I'm also not entirely following your need to seed random directly; I think with Factory Boy you can use factory.random.reseed_random (source).
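Putting the pieces together, a minimal sketch of the "print the seed, reproduce it via an env var" pattern (assuming factory_boy's factory.random.reseed_random, linked above; the int cast keeps both paths seeding with the same type):

import os
import random

import factory.random

env_seed = os.getenv('SEED')
# Cast to int so a reproduced run seeds with the same type as a fresh one.
seed = int(env_seed) if env_seed else random.randint(0, 99999)
print('seed: {}'.format(seed))

# Reseed the RNG that factory_boy's declarations (and Faker) draw from.
factory.random.reseed_random(seed)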


String concatenation much faster in Python than Go

I'm looking at using Go to write a small program that's mostly handling text. I'm pretty sure, based on what I've heard about Go and Python that Go will be substantially faster. I don't actually have a specific need for insane speeds, but I'd like to get to know Go.
The "Go is going to be faster" idea was supported by a trivial test:
# test.py
print("Hello world")
$ time python test.py
Hello world
real 0m0.029s
user 0m0.019s
sys 0m0.010s
// test.go
package main

import "fmt"

func main() {
    fmt.Println("hello world")
}
$ time ./test
hello world
real 0m0.001s
user 0m0.001s
sys 0m0.000s
Looks good in terms of raw startup speed (which is entirely expected). Highly non-scientific justification:
$ strace python test.py 2>&1 | wc -l
1223
$ strace ./test 2>&1 | wc -l
174
However, my next contrived test was how fast is Go when faffing with strings, and I was expecting to be similarly blown away by Go's raw speed. So, this was surprising:
# test2.py
s = ""
for i in range(1000000):
    s += "a"
$ time python test2.py
real 0m0.179s
user 0m0.145s
sys 0m0.013s
// test2.go
package main

func main() {
    s := ""
    for i := 0; i < 1000000; i++ {
        s += "a"
    }
}
$ time ./test2
real 0m56.840s
user 1m50.836s
sys 0m17.653s
So Go is hundreds of times slower than Python.
Now, I know this is probably due to Schlemiel the Painter's algorithm, which explains why the Go implementation is quadratic in the iteration count (making the loop 10 times longer causes a 100-fold slowdown).
However, the Python implementation seems much faster: 10 times more loops only slows it down by a factor of two. The same effect persists if you concatenate str(i), so I doubt there's some kind of magical JIT optimization to s = 1000000 * 'a' going on. And it's not much slower if I print(s) at the end, so the variable isn't being optimised out.
Naivety of the concatenation methods aside (there are surely more idiomatic ways in each language), is there something here that I have misunderstood, or is it simply easier in Go than in Python to run into cases where you have to deal with C/C++-style algorithmic issues when handling strings (in which case a straight Go port might not be as uh-may-zing as I might hope without having to, ya'know, think about things and do my homework)?
Or have I run into a case where Python happens to work well, but falls apart under more complex use?
Versions used: Python 3.8.2, Go 1.14.2
TL;DR summary: basically you're testing the two implementations' allocators / garbage collectors and heavily weighting the scale on the Python side (by chance, as it were, but this is something the Python folks optimized at some point).
To expand my comments into a real answer:
Both Go and Python have counted strings, i.e., strings are implemented as a two-element header containing a length (byte count or, for Python 3 strings, Unicode character count) and a data pointer.
Both Go and Python are garbage-collected (GCed) languages. That is, in both languages, you can allocate memory without having to worry about freeing it yourself: the system takes care of that automatically.
But the underlying implementations differ, quite a bit, in one particular and important way: the version of Python you are using has a reference-counting GC. The Go system you are using does not.
With a reference count, the inner bits of the Python string handler can do this. I'll express it as Go (or at least pseudo-Go) although the actual Python implementation is in C and I have not made all the details line up properly:
// add (append) new string t to existing string s
func add_to_string(s, t string_header) string_header {
    need = s.len + t.len
    if s.refcount == 1 { // can modify string in-place
        data = s.data
        if cap(data) >= need {
            copy_into(data + s.len, t.data, t.len)
            return s
        }
    }
    // s is shared or s.cap < need
    new_s := make_new_string(roundup(need))
    // important: new_s has extra space for the next call to add_to_string
    copy_into(new_s.data, s.data, s.len)
    copy_into(new_s.data + s.len, t.data, t.len)
    s.refcount--
    if s.refcount == 0 {
        gc_release_string(s)
    }
    return new_s
}
By over-allocating—rounding up the need value so that cap(new_s) is large—we get about log2(n) calls to the allocator, where n is the number of times you do s += "a". With n being 1000000 (one million), that's about 20 times that we actually have to invoke the make_new_string function and release (for gc purposes because the collector uses refcounts as a first pass) the old string s.
[Edit: your source archaeology led to commit 2c9c7a5f33d, which suggests less than doubling but still a multiplicative increase. To other readers, see comment.]
The current Go implementation allocates strings without a separate capacity header field (see reflect.StringHeader and note the big caveat that says "don't depend on this, it might be different in future implementations"). Between the lack of a refcount—we can't tell in the runtime routine that adds two strings, that the target has only one reference—and the inability to observe the equivalent of cap(s) (or cap(s.data)), the Go runtime has to create a new string every time. That's one million memory allocations.
To show that the Python code really does use the refcount, take your original Python:
s = ""
for i in range(1000000):
s += "a"
and add a second variable t like this:
s = ""
t = s
for i in range(1000000):
s += "a"
t = s
The difference in execution time is impressive:
$ time python test2.py
0.68 real 0.65 user 0.03 sys
$ time python test3.py
34.60 real 34.08 user 0.51 sys
The modified Python program still beats Go (1.13.5) on this same system:
$ time ./test2
67.32 real 103.27 user 13.60 sys
and I have not poked any further into the details, but I suspect the Go GC is running more aggressively than the Python one. The Go GC is very different internally, requiring write barriers and occasional "stop the world" behavior (of all goroutines that are not doing the GC work). The refcounting nature of the Python GC allows it to never stop: even with a refcount of 2, the refcount on t drops to 1 and then next assignment to t drops it to zero, releasing the memory block for re-use in the next trip through the main loop. So it's probably picking up the same memory block over and over again.
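As a side note, you can watch these reference counts from Python itself; a small sketch (sys.getrefcount reports one extra reference for its own argument):

import sys

s = "".join("x" for _ in range(1000))   # built at run time: exactly one reference
print(sys.getrefcount(s))   # 2: s itself, plus getrefcount's own argument

t = s                       # create a second reference
print(sys.getrefcount(s))   # 3: the in-place append trick is now off the table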
(If my memory is correct, Python's "over-allocate strings and check the refcount to allow expand-in-place" trick was not in all versions of Python. It may have first been added around Python 2.4 or so. This memory is extremely vague and a quick Google search did not turn up any evidence one way or the other. [Edit: Python 2.7.4, apparently.])
Well. You should never, ever use string concatenation this way :-)
In Go, try strings.Builder:
package main

import (
    "strings"
)

func main() {
    var b1 strings.Builder
    for i := 0; i < 1000000; i++ {
        b1.WriteString("a")
    }
}
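The Python side has the same advice, by the way: collect the pieces and build the string once rather than concatenating in a loop. A minimal sketch of the idiomatic alternatives:

# Collect the pieces, then join them in a single pass.
parts = []
for _ in range(1000000):
    parts.append("a")
s = "".join(parts)

# For this degenerate case, repetition is simpler still.
s = "a" * 1000000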

I am trying to run a random variable multiple times

I am very new to python, and am running into an issue I don't fully understand. I am trying to get a random variable to run multiple times, but for some reason it just returns the same random value x times.
I am not entirely certain what to try aside from the code I have already done.
lowTreasureList = "50 gold", "Healing Potion", "10x Magic Arrows", "+1 Magic Weapon"

def ranLowLoot(lowLootGiven):
    # This function returns a random string from the passed list of strings.
    lootIndex = random.randint(0, len(lowLootGiven) - 1)
    return lowLootGiven[lootIndex]

lowLoot = ranLowLoot(lowTreasureList)
treasureSelection = int(input())

if treasureSelection == 1:
    numLowTreasure = int(input('How many treasures? '))
    for i in range(numLowTreasure):
        ranLowLoot(lowTreasureList)
        print(lowLoot)
When I do this I get the same random treasure (numLowTreasure) times, but I am trying to get it to select a new random treasure each time.
If you haven't already, it will help to read the documentation on the random module.
There are three alternatives to random.randint that are more suited to your purpose (all three appear in the sketch after this list):
random.randrange(start, stop, [step]): step is optional and defaults to one. This will save you the len(...) - 1 you are using to get lootIndex, since stop is an exclusive bound.
random.randrange(stop): uses a default start of zero and default step of 1, which will save you passing 0 as your start index.
random.choice(seq): you can pass your function's parameter lowLootGiven to this as seq, which will save you from using indices and writing your own function entirely.
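For instance, here is the same draw written all three ways (a sketch with a stand-in list):

import random

seq = ["a", "b", "c"]
item = seq[random.randrange(0, len(seq), 1)]  # explicit start, stop, step
item = seq[random.randrange(len(seq))]        # stop only
item = random.choice(seq)                     # no index arithmetic at all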
As for why you're getting the repeated treasure, that's because you aren't updating your variable lowLoot in your for loop. You should write:
for i in range(numLowTreasure):
    lowLoot = ranLowLoot(lowTreasureList)
    print(lowLoot)
The last thing I want to say is that Python is nice for writing simple things quickly. Even if there was some bigger context you were writing this code in, I might have written it like this:
import random

lowTreasureList = ("50 gold", "Healing Potion", "10x Magic Arrows", "+1 Magic Weapon")

if int(input()) == 1:
    for i in range(int(input('How many treasures? '))):
        print(random.choice(lowTreasureList))
Using the round parentheses around the tuple declaration like I did isn't necessary in this case, but I like to use them because a tuple declaration that spans multiple lines won't work without them.
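For example, the multi-line form needs them:

lowTreasureList = (
    "50 gold",
    "Healing Potion",
    "10x Magic Arrows",
    "+1 Magic Weapon",
)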
Reading documentation on standard libraries is something I almost always find helpful. I think Python's documentation is great, and if it's a bit too much to digest early on, I found tutorialspoint to be a good place to start.
The problem is that in the main loop you are discarding the result of the call to ranLowLoot(). As a minimal fix, in the main loop assign the result of that function call. Use:
lowLoot = ranLowLoot(lowTreasureList)
rather than simply:
ranLowLoot(lowTreasureList)
As a better fix, ditch your function completely and just use random.choice() (which does what you are trying to do, with much less fuss):
import random
lowTreasureList = ["50 gold", "Healing Potion", "10x Magic Arrows", "+1 Magic Weapon"]
treasureSelection = int(input())
if treasureSelection == 1:
numLowTreasure = int(input('How many treasures? '))
for i in range(numLowTreasure):
lowLoot = random.choice(lowTreasureList)
print(lowLoot)

How to run the python unittest N number of times

I have a Python unittest like the one below, and I want to run this whole Test N number of times:
class Test(TestCase):
    def test_0(self):
        .........
        .........
        .........

Test.Run(name=__name__)
Any Suggestions?
You can use parameterized tests. There are different modules to do that. I use nose to run my unittests (more powerful than the default unittest module) and there's a package called nose-parameterized that allows you to write a factory test and run it a number of times with different values for variables you want.
If you don't want to use nose, there are several other options for running parameterized tests.
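If you'd rather stay on plain unittest, you can also stamp out N copies of the test at class-creation time; a minimal sketch (the test body is a placeholder):

import random
import unittest

def make_test(i):
    def test(self):
        random.seed(i)  # reproducible input for this copy
        value = random.random()
        self.assertTrue(0.0 <= value < 1.0)
    return test

class Test(unittest.TestCase):
    pass

# Attach test_0 .. test_99 so the runner sees 100 independent tests.
for i in range(100):
    setattr(Test, 'test_{}'.format(i), make_test(i))

if __name__ == '__main__':
    unittest.main()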
Alternatively, you can execute any number of test conditions in a single test (as soon as one fails, the test will report an error). In your particular case, maybe this makes more sense than parameterized tests, because in reality it's only one test; it just needs a large number of runs of the function to reach some level of confidence that it's working properly. So you can do:
import random

class Test(TestCase):
    def test_myfunc(self):
        for _ in range(100):
            input = random.random()
            # (placeholder assertion -- substitute your real check)
            self.assertEquals(input, input + 2)

Test.Run(name=__name__)
Why? Because the test_0 method contains a random element, so each run selects a random configuration and tests against it. So I am not testing the same thing multiple times.
Randomness in a test makes it non-reproducible. One day you might get 1 failure out of 100, and when you run it again, it’s already gone.
Use a modern testing tool to parametrize your test with a sequential number, then use random.seed to have a random but reproducible test case for each number in a sequence.
portusato suggests nose, but pytest is a more modern and popular tool:
import random, pytest

@pytest.mark.parametrize('i', range(100))
def test_random(i):
    orig_state = random.getstate()
    try:
        random.seed(i)
        data = generate_random_data()
        assert my_algorithm(data) == works
    finally:
        random.setstate(orig_state)
@pytest.mark.parametrize “explodes” your single test_random into 100 individual tests — test_random[0] through test_random[99]:
$ pytest -q test.py
....................................................................................................
100 passed in 0.14 seconds
Each of these tests generates different, random, but reproducible input data to your algorithm. If test_random[56] fails, it will fail every time, so you will then be able to debug it.
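A failing case can then be re-run on its own by its pytest-generated id, for example:
$ pytest -q 'test.py::test_random[56]'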
If you don't want your test to stop after the first failure, you can use subTest.
class Test(TestCase):
    def test_0(self):
        for i in [1, 2, 3]:
            with self.subTest(i=i):
                self.assertEqual(squared(i), i**2)
Docs

Python: where is random.random() seeded?

Say I have some python code:
import random
r=random.random()
Where is the value of r seeded from in general?
And if my OS has no randomness source, where is it seeded from then?
Why isn't this recommended for cryptography? Is there some way to know what the random number is?
Follow da code.
To see where the random module "lives" in your system, you can just do in a terminal:
>>> import random
>>> random.__file__
'/usr/lib/python2.7/random.pyc'
That gives you the path to the .pyc ("compiled") file, which is usually located side by side with the original .py, where the readable code can be found.
Let's see what's going on in /usr/lib/python2.7/random.py:
You'll see that it creates an instance of the Random class and then (at the bottom of the file) "promotes" that instance's methods to module functions. Neat trick. When the random module is first imported, a new instance of that Random class is created, its values are initialized, and the methods are re-assigned as functions of the module, making it quite random on a per-import (erm... or per-Python-interpreter-instance) basis.
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
triangular = _inst.triangular
randint = _inst.randint
The only thing that this Random class does in its __init__ method is seeding it:
class Random(_random.Random):
    ...
    def __init__(self, x=None):
        self.seed(x)
    ...

_inst = Random()
seed = _inst.seed
So... what happens if x is None (no seed has been specified)? Well, let's check that self.seed method:
def seed(self, a=None):
    """Initialize internal state from hashable object.

    None or no argument seeds from current time or from an operating
    system specific randomness source if available.

    If a is not None or an int or long, hash(a) is used instead.
    """

    if a is None:
        try:
            a = long(_hexlify(_urandom(16)), 16)
        except NotImplementedError:
            import time
            a = long(time.time() * 256) # use fractional seconds

    super(Random, self).seed(a)
    self.gauss_next = None
The comments already tell what's going on... This method tries to use the default random generator provided by the OS, and if there's none, then it'll use the current time as the seed value.
But, wait... What the heck is that _urandom(16) thingy then?
Well, the answer lies at the beginning of this random.py file:
from os import urandom as _urandom
from binascii import hexlify as _hexlify
Tadaaa... The seed is a 16-byte number that came from os.urandom.
Let's say we're in a civilized OS, such as Linux (with a real random number generator). The seed used by the random module is the same as doing:
>>> long(binascii.hexlify(os.urandom(16)), 16)
46313715670266209791161509840588935391L
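(On Python 3, where long is gone, the equivalent expression is int(binascii.hexlify(os.urandom(16)), 16).)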
The reason why specifying a seed value is considered not so great is that the random functions are not really "random"... They're just a very weird sequence of numbers. But that sequence will be the same given the same seed. You can try this yourself:
>>> import random
>>> random.seed(1)
>>> random.randint(0,100)
13
>>> random.randint(0,100)
85
>>> random.randint(0,100)
77
No matter when, how, or even where you run that code (as long as the algorithm used to generate the random numbers remains the same), if your seed is 1, you will always get the integers 13, 85, 77... which kind of defeats the purpose (see this about pseudorandom number generation). On the other hand, there are use cases where this can actually be a desirable feature.
That's why it's considered "better" to rely on the operating system's random number generator. Those numbers are usually derived from hardware interrupts, which are very, very random (they include interrupts for hard drive reads, keystrokes typed by the human user, mouse movement...). In Linux, that OS generator is /dev/random. Or, being a tad picky, /dev/urandom (that's what Python's os.urandom actually uses internally). The difference is that (as mentioned before) /dev/random uses hardware interrupts to generate the random sequence. If there are no interrupts, /dev/random could be exhausted and you might have to wait a little bit until you can get the next random number. /dev/urandom uses /dev/random internally, but it guarantees that it will always have random numbers ready for you.
If you're using Linux, just do cat /dev/random in a terminal (and prepare to hit Ctrl+C, because it will start outputting really, really random stuff):
borrajax@borrajax:/tmp$ cat /dev/random
_+�_�?zta����K�����q�ߤk��/���qSlV��{�Gzk`���#p$�*C�F"�B9��o~,�QH���ɭ�f�޺�̬po�2o𷿟�(=��t�0�p|m�e
���-�5�߁ٵ�ED�l�Qt�/��,uD�w&m���ѩ/��;��5Ce�+�M����
~ �4D��XN��?ס�d��$7Ā�kte▒s��ȿ7_���- �d|����cY-�j>�
�b}#�W<դ���8���{�1»
. 75���c4$3z���/̾�(�(���`���k�fC_^C
Python uses the OS random generator or, failing that, the current time as a seed. This means that the only place where I could imagine a potential weakness in Python's random module is when it's used:
In an OS without an actual random number generator, and
In a device where time.time is always reporting the same time (has a broken clock, basically)
If you are concerned about the actual randomness of the random module, you can either go directly to os.urandom or use the random number generator in the pycrypto cryptographic library. Those are probably more random.
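If you want OS-backed randomness through the familiar API, a minimal sketch (random.SystemRandom reads from os.urandom under the hood):

import binascii
import os
import random

# Bytes straight from the OS generator (/dev/urandom on Linux):
print(binascii.hexlify(os.urandom(16)))

# The same source behind the familiar interface; not seedable, not reproducible:
sysrand = random.SystemRandom()
print(sysrand.randint(0, 100))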

Floating point calculations debugging

So I recently decided to learn Python, and as an exercise (plus making something useful) I decided to write an Euler's Modified Method algorithm for solving higher-than-first-order differential equations. An example input would be:
python script_name.py -y[0] [10,0]
where the first argument is the differential equation (here: y''=-y), and the second one the initial conditions (here: y(0)=10, y'(0)=0). It is then meant to output the results to two files (x-data.txt and y-data.txt).
Here's the problem:
When I run the code with the specified input, the final line (at t=1) reads -0.0, but if you solve the ODE (y=10*cos(x)), it should read 5.4. Even if you go through the program with pen and paper and execute the code by hand, your results and the computer's start to diverge by the second iteration. Any idea what could have caused this?
NB: I'm using Python 2.7 on OS X.
Here's my code:
#! /usr/bin/python
# A higher order differential equation solver using Euler's Modified Method
import math
import sys

step_size = 0.01
x = 0
x_max = 1

def derivative(x, y):
    d = eval(sys.argv[1])
    return d

y = eval(sys.argv[2])
order = len(y)
y_derivative = y

xfile = open('x-data.txt', 'w+')
yfile = open('y-data.txt', 'w+')

while (x < x_max):
    xfile.write(str(x) + "\n")
    yfile.write(str(y[0]) + "\n")
    for i in range(order - 1):
        y_derivative[i] = y[(i + 1)]
    y_derivative[(order - 1)] = derivative(x, y)
    for i in range(order):
        y[i] = y[i] + step_size * y_derivative[i]
    x = x + step_size

xfile.close()
yfile.close()
print('done')
When you say y_derivative = y, they are the SAME list under two different names, i.e., when you change y_derivative[i] = y[i+1], both names see the change. You want to use y_derivative = y[:] to put a copy of y into y_derivative.
See How to clone or copy a list? for more info
Also see http://effbot.org/zone/python-list.htm
Note: I was able to debug this in IDLE by replacing sys.argv with your provided example values. If you then turn on the debugger and step through the code, you can see both lists change.
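A tiny standalone illustration of the aliasing (the names here are hypothetical, not from the asker's code):

y = [10, 0]
alias = y        # same list object under a second name
copy = y[:]      # a real (shallow) copy

alias[0] = 99
print(y)         # [99, 0] -- changed through the alias
print(copy)      # [10, 0] -- unaffected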
