Byte precision of value in Python?

I have a hash function in Python.
It returns a value.
How do I see the byte-size of this return value? I want to know if it is 4-bytes or 8 or what.
Reason:
I want to make sure that the min value is 0 and the max value is 2**32 - 1 (the largest 4-byte unsigned value); otherwise my calculations are incorrect.
I want to make sure that packing it to a I struct (unsigned int) is correct.
More specifically, I am calling murmur.string_hash(x).
I want to sanity-check that I am getting a 4-byte unsigned return value. If the value has a different size, my calculations get messed up, so I want to sanity-check it.

If it's an arbitrary function that returns a number, there are only 4 standard types of numbers in Python: small integers (C long, at least 32 bits), long integers ("unlimited" precision), floats (C double), and complex numbers.
If you are referring to the builtin hash, it returns a standard integer (C long):
>>> hash(2**31)
-2147483648
If you want different hashes, check out hashlib.
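If you need to check how many bits a particular integer actually occupies, int.bit_length() answers that directly (a small sketch; the value used here is just an illustration):

```python
# bit_length() reports how many bits are needed to represent the integer,
# which tells you whether it fits in a 4-byte unsigned field.
value = 2**31
print(value.bit_length())  # 32

fits_u32 = 0 <= value < 2**32
print(fits_u32)  # True
```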

Generally, thinking of a return value as a particular byte precision in Python is not the best way to go, especially with integers. For most intents and purposes, Python "short" integers are seamlessly integrated with "long" (unlimited) integers. Variables are promoted from the smaller to the larger type as necessary to hold the required value. Functions are not required to return any particular type (the same function could return different data types depending on the input, for example).
When a function is provided by a third-party package (as this one is), you can either just trust the documentation (which for Murmur indicates 4-byte ints as far as I can tell) or test the return value yourself before using it (whether by if, assert, or try, depending on your preference).
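One way to do that last check is to let struct itself enforce the width; this is a minimal sketch (the helper name pack_u32 is made up here), using the standard-size "&lt;I" format so the result is always exactly 4 bytes:

```python
import struct

def pack_u32(value):
    # "<I" is a standard-size unsigned 32-bit int (little-endian);
    # struct.pack would raise struct.error for an out-of-range value,
    # but an explicit check gives a clearer error message.
    if not (0 <= value < 2**32):
        raise ValueError("hash value does not fit in 4 unsigned bytes: %r" % value)
    return struct.pack("<I", value)
```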

Related

Safe to seed Python RNG using float?

Can floating-point values be passed as an argument to random.seed()? Does this introduce unforeseen issues?
In other words, is....
random.seed(0.99999999)
<use a few thousand random numbers>
random.seed(1)
<use a few thousand random numbers>
.... functionally equivalent to....
random.seed(0)
<use a few thousand random numbers>
random.seed(1)
<use a few thousand random numbers>
Quick testing suggests that both sets of code run just fine and on a superficial level the outputs appear to be independent and deterministic.
I'm interested to know if this method of seeding is completely safe to use in cases where independence between seeded sets is important. Obtaining deterministic results is also important. I've checked some of the documentation: Python 2.7 documentation and Python 3.8 documentation and done some googling and only found references to integers being used as a seed (or other data types which are converted to integers). I couldn't see any reference to floats and this makes me wonder if they are "safe" in the sense that they work in a predictable way with no nasty surprises.
I'm currently working with Python 2.7 but am interested in the answer for more modern versions too.
Using a float as a seed is intended functionality:
supported seed types are: None, int, float, str, bytes, and bytearray.
see: https://github.com/python/cpython/blob/master/Lib/random.py#L156
Getting a float of exactly the same value each time is critical for getting the same seed, but this is not too difficult. The most reliable way to always get the same float value is to avoid doing any computation on it and to avoid taking it from user input. If you want to ensure complete control, you can use struct.unpack to generate a float from raw binary data.
Yes, it is safe to use a float seed
According to the documentation, random.seed(a) uses a directly if it is an int or long, otherwise (if a is not None) it uses hash(a). Given that python requires that hash(x) == hash(y) if x == y, this means that the same sequence of pseudo-random numbers will be generated for equal float seeds (with the standard caveats about strict comparisons of floating-point numbers).
The Python 3 documentation is less clear about how it handles inputs of types other than int, str, bytes, and bytearray, but for Python 3.8 and earlier the behavior itself is the same as Python 2. As was mentioned in Aaron's answer, seeding based on hashing is deprecated in 3.9, but float continues to be a supported seed type.
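A quick way to convince yourself of the determinism is to reseed with the same float and compare the resulting sequences (a small sketch):

```python
import random

# Equal float seeds must reproduce the same pseudo-random sequence.
random.seed(0.99999999)
first = [random.random() for _ in range(1000)]

random.seed(0.99999999)
second = [random.random() for _ in range(1000)]

print(first == second)  # True
```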

Python library to generate hash value as numbers

I am searching for a library that can hash a string and produce numbers rather than alphanumeric output
eg:
Input string: hello world
Salt value: 5467865390
Output value: 9223372036854775808
I have searched many libraries, but they produce alphanumeric output, and I need plain numbers as output.
Is there any such library? Having only numbers as output gives a higher chance of collisions, but that is fine for my business use case.
EDIT 1:
Also I need to control the number of digits in output. I want to store the value in database which has Numeric datatype. So I need to control the number of digits to fit the size within the data type range
Hexadecimal hash codes can be interpreted as (rather large) numbers:
import hashlib
hex_hash = hashlib.sha1('hello world'.encode('utf-8')).hexdigest()
int_hash = int(hex_hash, 16) # convert hexadecimal to integer
print(hex_hash)
print(int_hash)
outputs
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
243667368468580896692010249115860146898325751533
EDIT: As asked in the comments, to limit the number to a certain range, you can simply use the modulus operator on the integer form. Note, of course, that this will increase the possibility of collisions. For instance, we can limit the "hash" to 0 .. 9,999,999 with modulus 10,000,000.
limited_hash = int_hash % 10_000_000
print(limited_hash)
outputs
5751533
I think there is no need for libraries. You can simply accomplish this with the built-in hash() function in Python.
InputString="Hello World!!"
HashValue=hash(InputString)
print(HashValue)
print(type(HashValue))
Output:
8831022758553168752
<class 'int'>
Solution for the problem based on the latest EDIT:
The above method is the simplest, but note that Python randomizes string hashes for each interpreter run. This helps protect applications from certain attacks, but it also means the value above is not stable across runs, which matters if you want to store it in a database.
If you would like to switch off the randomization, you can do that by setting the
PYTHONHASHSEED environment variable to zero.
For information on switching off the randomization check the official docs https://docs.python.org/3.3/using/cmdline.html#cmdoption-R
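If the value must be stable across interpreter runs and also fit a fixed-width NUMERIC column, a common alternative is to combine hashlib with a modulus (a sketch; numeric_hash and its parameters are illustrative, not from any library):

```python
import hashlib

def numeric_hash(text, salt="", digits=18):
    # sha256 is stable across runs, unlike the built-in hash().
    # The modulus caps the result at `digits` decimal digits so it
    # fits a NUMERIC(digits) column, at the cost of more collisions.
    digest = hashlib.sha256((salt + text).encode("utf-8")).hexdigest()
    return int(digest, 16) % 10**digits
```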

When does Python perform type conversion when comparing int and float?

Why does Python return True when I compare int and float objects which have the same value?
For example:
>>> 5*2 == 5.0*2.0
True
It's not as simple as a type conversion.
10 == 10.0 delegates to the arguments' __eq__ methods, trying (10).__eq__(10.0) first, and then (10.0).__eq__(10) if the first call returns NotImplemented. It makes no attempt to convert types. (Technically, the method lookup uses a special routine that bypasses instance __dict__ entries and __getattribute__/__getattr__ overrides, so it's not quite equivalent to calling the methods yourself.)
int.__eq__ has no idea how to handle a float:
>>> (10).__eq__(10.0)
NotImplemented
but float.__eq__ knows how to handle ints:
>>> (10.0).__eq__(10)
True
float.__eq__ isn't just performing a cast internally, either. It has over 100 lines of code to handle float/int comparison without the rounding error an unchecked cast could introduce. (Some of that could be simplified if the C-level comparison routine didn't also have to handle >, >=, <, and <=.)
Objects of different types, except different numeric types, never compare equal.
And:
Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the “narrower” type is widened to that of the other, where integer is narrower than floating point, which is narrower than complex. Comparisons between numbers of mixed type use the same rule.
https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
The comparison logic is implemented by each type's __eq__ method. And the standard numeric types are implemented in a way that they support comparisons (and arithmetic operations) among each other. Python as a language never does implicit type conversion (like Javascript's == operator would do implicit type juggling).
The simple answer is that the language is designed this way. Here is an excerpt from the documentation supporting this:
6.10.1 Value Comparisons
Numbers of built-in numeric types (Numeric Types — int, float, complex) and of the standard library types fractions.Fraction and decimal.Decimal can be compared within and across their types, with the restriction that complex numbers do not support order comparison.
In other words, we want different numeric types with the same value to be equal.
PEP 20
Special cases aren't special enough to break the rules.
Although practicality beats purity.
What benefit is there to making numeric types not comparable, besides making life difficult in most common cases?
You can have a look at the source code for the CPython implementation.
The function is preceded by this comment explaining how the conversion is attempted:
/* Comparison is pretty much a nightmare. When comparing float to float,
* we do it as straightforwardly (and long-windedly) as conceivable, so
* that, e.g., Python x == y delivers the same result as the platform
* C x == y when x and/or y is a NaN.
* When mixing float with an integer type, there's no good *uniform* approach.
* Converting the double to an integer obviously doesn't work, since we
* may lose info from fractional bits. Converting the integer to a double
* also has two failure modes: (1) an int may trigger overflow (too
* large to fit in the dynamic range of a C double); (2) even a C long may have
* more bits than fit in a C double (e.g., on a 64-bit box long may have
* 63 bits of precision, but a C double probably has only 53), and then
* we can falsely claim equality when low-order integer bits are lost by
* coercion to double. So this part is painful too.
*/
Other implementations are not guaranteed to follow the same logic.
From the documentation:
Python fully supports mixed arithmetic: when a binary arithmetic
operator has operands of different numeric types, the operand with the
“narrower” type is widened to that of the other, where plain integer
is narrower than long integer is narrower than floating point is
narrower than complex. Comparisons between numbers of mixed type use
the same rule.
According to this rule, the int result of 5*2 is widened to 10.0, which is equal to the float result 10.0 of 5.0*2.0.
If you are comparing mixed data types, the comparison is carried out in the data type with the larger range; in your case float has a larger range than int:
float max value --> 1.7976931348623157e+308
int (64-bit) max value --> 9223372036854775807
Thanks
The == operator compares only the values, not the types. Python has no === operator; the closest equivalent is checking the type explicitly, e.g. type(x) is type(y) and x == y. Note that the 'is' keyword tests object identity, not type-aware equality, so it is not a substitute for ===, even though
5 is 5.0
returns
False
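If you actually want ===-style behavior, a small helper that checks the type as well as the value does it (the name strict_eq is just illustrative):

```python
def strict_eq(a, b):
    # Equal only when both the type and the value match,
    # analogous to === in languages that have it.
    return type(a) is type(b) and a == b

print(strict_eq(10, 10))    # True
print(strict_eq(10, 10.0))  # False
```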
==
is a comparison operator
You are actually asking the interpreter whether both sides of your expression are equal.
In other words, you are asking it to return a Boolean value, not to convert data types. If you want to convert the data types, you will have to do so explicitly in your code.

Python rounding and inserting into array does not round [duplicate]

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matter. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question then, is why is the float changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
Floats on modern PCs aren't infinitely precise. Even if you enter pi as a constant to 100 decimals, only the first few are stored accurately. The same is happening to you. Python floats are C doubles: 64 bits, of which 53 are mantissa, which limits your precision (and in unexpected ways, because the representation is base 2).
Please note, 0.00145000000671 isn't the exact value stored by Python. print only displays a few decimals of the complete stored float. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
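To see exactly which double a literal was rounded to, or to sidestep binary floats entirely, the decimal module helps (a small sketch):

```python
from decimal import Decimal

# Decimal(float) reveals the exact binary value Python stored
# for the literal, with all its digits.
print(Decimal(0.00145000000671))

# Constructing Decimals from strings keeps full decimal precision,
# so comparisons behave the way the printed digits suggest:
low = Decimal("0.0014500000067055225")
val = Decimal("0.00145000000671")
print(val > low)  # True
```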
It isn't changing, per se. Python stores the number as a float, and that literal has more precision than a float can hold, so it is rounded to the nearest representable double at the moment it is stored, before it is ever accessed. Funny how something so small is such a big pain.
You need an arbitrary-precision module such as Simple Python Fixed Point or the decimal module.
Not sure it would work in this case, because I don't know if Python's limiting in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...

Is integer comparison in Python constant time?

is integer comparison in Python constant time? Can I use it to compare a user-provided int token with a server-stored int for crypto in the way I would compare strings with constant_time_compare from django.utils.crypto, i.e. without suffering timing attacks?
Alternatively, is it more secure to convert to a string and then use the above function?
The answer is yes for a given size of integer. By default, Python 2 integers that get big become longs (and in Python 3 all ints have arbitrary precision), so the comparison time grows with the size. If you restrict the size of the integer, e.g. to a ctypes.c_uint64 or ctypes.c_uint32, this will not be the case.
Note that comparison with 0 is a special case and is normally much faster, because many CPUs have hardware support for it (a dedicated zero flag); so if you are using or allowing seeds or tokens with a value of 0, you are asking for trouble.
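Rather than reasoning about int comparison timing, one option (a sketch assuming Python 3; the helper name is made up) is to serialize both tokens to a fixed width and use hmac.compare_digest, which is designed for timing-attack-resistant comparison:

```python
import hmac

def compare_tokens(a, b, nbytes=8):
    # Fixed-width big-endian encoding, then a timing-safe comparison.
    # int.to_bytes raises OverflowError if a token doesn't fit in nbytes.
    return hmac.compare_digest(a.to_bytes(nbytes, "big"),
                               b.to_bytes(nbytes, "big"))
```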

Categories

Resources