Safe to seed Python RNG using float?

Can floating point values be passed as an argument to random.seed()? Does this introduce unforeseen issues?
In other words, is....
random.seed(0.99999999)
<use a few thousand random numbers>
random.seed(1)
<use a few thousand random numbers>
.... functionally equivalent to....
random.seed(0)
<use a few thousand random numbers>
random.seed(1)
<use a few thousand random numbers>
Quick testing suggests that both sets of code run just fine and on a superficial level the outputs appear to be independent and deterministic.
I'm interested to know if this method of seeding is completely safe to use in cases where independence between seeded sets is important. Obtaining deterministic results is also important. I've checked some of the documentation: Python 2.7 documentation and Python 3.8 documentation and done some googling and only found references to integers being used as a seed (or other data types which are converted to integers). I couldn't see any reference to floats and this makes me wonder if they are "safe" in the sense that they work in a predictable way with no nasty surprises.
I'm currently working with Python 2.7 but am interested in the answer for more modern versions too.

Using a float as a seed is intended functionality:
supported seed types are: None, int, float, str, bytes, and bytearray.
see: https://github.com/python/cpython/blob/master/Lib/random.py#L156
Getting a float of exactly the same value each time is critical for getting the same seed, but this is not too difficult. The most reliable way to always get the same float value is to avoid doing any computation on it and to avoid deriving it from user input. If you want to ensure complete control, you can use struct.unpack to generate a float from raw binary data.
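For illustration, here is a minimal sketch of that idea (the '<d' format and the literal 0.99999999 are just example choices); packing and unpacking through struct guarantees you get back bit-for-bit the same float:
import random
import struct

# Serialize the float to its 8 raw IEEE 754 bytes, then rebuild exactly the same value.
raw = struct.pack('<d', 0.99999999)
seed_value, = struct.unpack('<d', raw)

random.seed(seed_value)
print(random.random())  # deterministic, because seed_value is always the identical float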

Yes, it is safe to use a float seed
According to the documentation, random.seed(a) uses a directly if it is an int or long, otherwise (if a is not None) it uses hash(a). Given that python requires that hash(x) == hash(y) if x == y, this means that the same sequence of pseudo-random numbers will be generated for equal float seeds (with the standard caveats about strict comparisons of floating-point numbers).
The python 3 documentation is less clear about how it handles inputs of types other than int, str, bytes, and bytearray, but the behavior itself is the same as python 2 for python 3.8 and earlier. As was mentioned in Aaron's answer, seeding based on hashing is deprecated in 3.9, but float continues to be a supported seed type.
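As a quick sketch of what that means in practice (the literal 0.99999999 is just an example seed):
import random

random.seed(0.99999999)
first = [random.random() for _ in range(3)]

random.seed(0.99999999)  # reseed with exactly the same float
second = [random.random() for _ in range(3)]

print(first == second)  # True: equal float seeds hash equally and give the same sequence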

Related

Is there a difference between an int of 5 and a float of 5.0?

I am confused about whether there is any difference between an int of 5 and a float of 5.0, besides the float having a decimal point.
What are some of the things I can do with an int that I can't with a float? What is the point of having two separate types, instead of just letting everything be a float in the first place?
They are different data types:
type(5) # int
type(5.0) # float
And therefore they are not, strictly speaking, the same.
However, they are equal:
5 == 5.0 # True
They are different types.
>>> type(5)
<type 'int'>
>>> type(5.0)
<type 'float'>
Internally, they are stored differently.
5 and 5.0 are different objects in Python, so 5 is 5.0 is False.
But in most cases they behave the same; for example, 5 == 5.0 is True.
As your question focuses on the difference between the two types and why both are needed, I will try to focus on that in my answer.
Need for different data types (why not make everything a float?)
Different data types have different memory usage. In C, an int typically uses 2 or 4 bytes depending on the platform, whereas a float uses 4 bytes. Using the correct data type in the correct place saves memory.
What are some of the things I can do with an int that I can't with a float?
One of the most important things to know when using these two data types is that "integer division truncates": any fractional part is discarded. To get the desired result you should use the correct type.
A nice example is given in The C Programming Language by Brian Kernighan and Dennis Ritchie, and the same pitfall exists in most languages.
This statement converts a temperature from Fahrenheit to Celsius:
float celsius = (5 / 9) * (Fahrenheit - 32);
This code will always give the answer 0. That is because 5/9 is mathematically about 0.5556, but integer division truncates it to 0.
Now look at this code:
float celsius = (5.0 / 9.0) * (Fahrenheit - 32);
This code gives the correct answer, because 5.0/9.0 evaluates to about 0.5556. Since we used float values here, the compiler does not truncate the result: the fractional part is preserved and we get the desired answer.
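For comparison, here is a rough Python sketch of the same pitfall (in Python 2, / between two ints truncates; in Python 3 the truncating operator is //; fahrenheit = 212 is just an example input):
fahrenheit = 212

celsius_truncated = (5 // 9) * (fahrenheit - 32)   # 5 // 9 == 0, so this is always 0
celsius_correct = (5.0 / 9.0) * (fahrenheit - 32)  # float division keeps the fraction

print(celsius_truncated)  # 0
print(celsius_correct)    # about 100.0 (floating-point arithmetic may carry a tiny rounding error)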
I think this shows how important the difference between 5 and 5.0 is.
This question is already answered: they have different types.
But what does that mean?
One must think in terms of objects: they are objects of different classes, and the class dictates the object's behavior.
Thus they will behave differently.
It's easier to grasp such things in a pure object-oriented language like Smalltalk, because you can clearly browse the Float and Integer classes and learn how they differ through their implementations. In Python it's more complex, because the computation model is multi-layered, with notions of types, operators, and functions, and this complexity somewhat obscures the basic object-oriented principles. But from a behavioural point of view it ends up being the same: Python : terminology 'class' VS 'type'
So what are these differences of Behavior?
They are thin, because we make our best effort to have uniform and unsurprising arithmetic behavior (including mixed arithmetic) that matches the laws of mathematics, whatever the programming language.
Floating-point numbers behave differently because they keep only a limited number of significand bits. It's a necessary trade-off for keeping computations simple and fast. Small integers require few significand bits, so they will behave mostly the same as floating point. But when they grow larger, they won't. Here is an arithmetic example:
print(5.0**3 == 5**3)
print(5.0**23 == 5**23)
The former expression will print True, the latter False, because 5**23 requires 54 bits to be represented, and the Python VM will in most cases rely on IEEE 754 double-precision floats, which provide only a 53-bit significand.
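You can cross-check both numbers on your own interpreter; this small sketch uses sys.float_info.mant_dig, which reports the significand width of the platform's float:
import sys

print(sys.float_info.mant_dig)   # 53 on IEEE 754 double-precision builds
print((5 ** 23).bit_length())    # 54, so 5**23 cannot be stored exactly in a float
print((5 ** 3).bit_length())     # 7, comfortably within 53 bits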

getting size of primitive data types in python

I am having a lot of confusion using the sys.getsizeof function in Python. All I want to find out is whether, for say a floating point value, the system is using 4 or 8 bytes (i.e. single or double precision in C terms).
I do the following:
import sys
x = 0.0
sys.getsizeof(x) # Returns 24
type(x) # returns float
sys.getsizeof(float) # Returns 400.
How can I simply find out how many bytes are actually used for the floating point representation? I know it should be 8 bytes, but how can I verify this (something like the sizeof operator in C++)?
Running
sys.getsizeof(float)
does not return the size of any individual float, it returns the size of the float class. That class contains a lot more data than just any single float, so the returned size will also be much bigger.
If you just want to know the size of a single float, the easiest way is to simply instantiate some arbitrary float. For example:
sys.getsizeof(float())
Note that
float()
simply returns 0.0, so this is actually equivalent to:
sys.getsizeof(0.0)
This returns 24 bytes in your case (and probably for most other people as well). In CPython (the most common Python implementation), every float object contains a reference counter and a pointer to the type (a pointer to the float class), which are each 8 bytes for 64-bit CPython or 4 bytes for 32-bit CPython. The remaining bytes (24 - 8 - 8 = 8 in your case, which is very likely 64-bit CPython) are used for the actual float value itself.
This is not guaranteed to work out the same way for other Python implementations though. The language reference says:
These represent machine-level double precision floating point numbers. You are at the mercy of the underlying machine architecture (and C or Java implementation) for the accepted range and handling of overflow. Python does not support single-precision floating point numbers; the savings in processor and memory usage that are usually the reason for using these are dwarfed by the overhead of using objects in Python, so there is no reason to complicate the language with two kinds of floating point numbers.
and I'm not aware of any runtime methods to accurately tell you the number of bytes used. However, note that the quote above from the language reference does say that Python only supports double precision floats, so in most cases (depending on how critical it is for you to always be 100% right) it should be comparable to double precision in C.
import ctypes
ctypes.sizeof(ctypes.c_double)  # 8: the size in bytes of the C double that backs a Python float's value
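As another cross-check (assuming CPython), you can compare the size of the raw C double with the size of the whole Python float object:
import struct
import sys

print(struct.calcsize('d'))  # 8: bytes in the underlying C double
print(sys.getsizeof(0.0))    # typically 24 on 64-bit CPython: object overhead plus the 8-byte value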
From the docs:
getsizeof() calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
sys.getsizeof is not about the byte size as in C.
For int there is bit_length().
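A small sketch of that difference:
import sys

n = 2 ** 32
print(n.bit_length())    # 33: bits needed to represent the value itself
print(sys.getsizeof(n))  # CPython object size in bytes, including per-object overhead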

Python rounding and inserting into array does not round [duplicate]

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matter. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question, then, is why the float seems to change when it is accessed: it shows one precision when printed as part of a tuple and a different precision when accessed and printed directly. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until fairly recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this is no longer true in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
On modern PCs floats aren't that precise. Even if you enter pi as a constant to 100 decimals, only some of them are stored accurately. The same is happening to you: Python floats are IEEE 754 doubles, so you only get 53 bits of significand, which limits your precision (and in unexpected ways, because it's in base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float if you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
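A small sketch of both points, using the value from the question (the exact digits that str shows depend on your Python version):
from decimal import Decimal

x = 0.0014500000067055225
print(str(x))      # Python 2.7 shortens this to 12 significant digits: 0.00145000000671
print(repr(x))     # enough digits to round-trip the stored value exactly
print(Decimal(x))  # the exact binary value the float actually stores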
It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for a float, so Python rounds it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision fixed-point module like Simple Python Fixed Point or the decimal module.
Not sure it would work in this case, because I don't know whether Python is limiting precision in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...

Is integer comparison in Python constant time?

Is integer comparison in Python constant time? Can I use it to compare a user-provided int token with a server-stored int for crypto, in the way I would compare strings with constant_time_compare from django.utils.crypto, i.e. without suffering timing attacks?
Alternatively, is it more secure to convert to a string and then use the above function?
The answer is yes for a given size of integer: by default, Python integers that get big become arbitrary-precision (long in Python 2) and can have effectively unlimited length, so the comparison time then grows with the size. If you restrict the size of the integer to a ctypes.c_uint64 or ctypes.c_uint32, this will not be the case.
Note that comparison with 0 is a special case, normally much faster, because many CPUs have a special hardware flag for 0; but if you are using or allowing seeds or tokens with a value of 0, you are asking for trouble.
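If constant-time behaviour is the real requirement, one common approach, sketched here under the assumption that both tokens are non-negative and fit in 64 bits, is to pack both integers to a fixed width and compare the bytes with hmac.compare_digest (available in Python 2.7.7+ and 3.3+):
import hmac
import struct

def constant_time_int_compare(a, b):
    # Pack both values to the same fixed width so length differences never leak timing information.
    return hmac.compare_digest(struct.pack('>Q', a), struct.pack('>Q', b))

print(constant_time_int_compare(123456789, 123456789))  # True
print(constant_time_int_compare(123456789, 987654321))  # False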

Byte precision of value in Python?

I have a hash function in Python.
It returns a value.
How do I see the byte-size of this return value? I want to know if it is 4-bytes or 8 or what.
Reason:
I want to make sure that the min value is 0 and the max value is 2**32, otherwise my calculations are incorrect.
I want to make sure that packing it to a I struct (unsigned int) is correct.
More specifically, I am calling murmur.string_hash(`x`).
I want to sanity-check that I am getting a 4-byte unsigned return value. If I have a value of a different size, then my calculations get messed up. So I want to sanity-check it.
If it's an arbitrary function that returns a number, there are only 4 standard types of numbers in Python: small integers (C long, at least 32 bits), long integers ("unlimited" precision), floats (C double), and complex numbers.
If you are referring to the builtin hash, it returns a standard integer (C long):
>>> hash(2**31)
-2147483648
If you want different hashes, check out hashlib.
Generally, thinking of a return value as a particular byte precision in Python is not the best way to go, especially with integers. For most intents and purposes, Python "short" integers are seamlessly integrated with "long" (unlimited) integers. Variables are promoted from the smaller to the larger type as necessary to hold the required value. Functions are not required to return any particular type (the same function could return different data types depending on the input, for example).
When a function is provided by a third-party package (as this one is), you can either just trust the documentation (which for Murmur indicates 4-byte ints as far as I can tell) or test the return value yourself before using it (whether by if, assert, or try, depending on your preference).
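A minimal sketch of such a check (check_u32 is a hypothetical helper; in the question the value would come from murmur.string_hash):
import struct

def check_u32(value):
    # Verify the hash fits an unsigned 32-bit int before packing it with the 'I' format.
    assert 0 <= value < 2 ** 32, "value does not fit in 4 unsigned bytes: %r" % (value,)
    return struct.pack('I', value)  # 4 bytes (unsigned int) on typical platforms, native byte order

print(len(check_u32(4294967295)))  # 4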
