Python library to generate hash value as numbers - python

I am searching for a library to hash a string so that the output is a number rather than an alphanumeric digest.
e.g.:
Input string: hello world
Salt value: 5467865390
Output value: 9223372036854775808
I have looked at many libraries, but they all produce alphanumeric output, whereas I need plain numbers.
Is there any such library? I know that restricting the output to numbers alone increases the chance of collisions, but that is fine for my business use case.
EDIT 1:
I also need to control the number of digits in the output. I want to store the value in a database column with a Numeric data type, so I need to limit the number of digits to fit within that data type's range.

Hexadecimal hash codes can be interpreted as (rather large) numbers:
import hashlib
hex_hash = hashlib.sha1('hello world'.encode('utf-8')).hexdigest()
int_hash = int(hex_hash, 16) # convert hexadecimal to integer
print(hex_hash)
print(int_hash)
outputs
'2aae6c35c94fcfb415dbe95f408b9ce91ee846ed'
243667368468580896692010249115860146898325751533
EDIT: As asked in the comments, to limit the number to a certain range, you can simply use the modulus operator. Note, of course, that this will increase the possibility of collisions. For instance, we can limit the "hash" to 0 .. 9,999,999 with modulus 10,000,000.
limited_hash = int_hash % 10_000_000  # take the modulus of the integer, not the hex string
print(limited_hash)
outputs
5751533
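If you also need to incorporate the salt and cap the number of digits (per EDIT 1), a minimal sketch could look like the following. The salt handling and the 18-digit limit here are my own assumptions, not part of any particular library:
import hashlib

def numeric_hash(text, salt, digits=18):
    # Deterministically hash the salted text and keep only `digits` decimal digits.
    data = (str(salt) + text).encode('utf-8')
    digest = hashlib.sha256(data).hexdigest()
    return int(digest, 16) % (10 ** digits)

print(numeric_hash('hello world', 5467865390))  # an 18-digit (or shorter) integer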

I think there is no need for a library. You can simply accomplish this with the built-in hash() function in Python.
InputString="Hello World!!"
HashValue=hash(InputString)
print(HashValue)
print(type(HashValue))
Output:
8831022758553168752
<class 'int'>
Solution for the problem based on the latest edit:
The above method is the simplest solution. Note that the hash value changes for each interpreter invocation; this randomization helps protect our application against attackers who rely on predictable hashes.
If you would like to switch off the randomization, you can simply do that by setting the environment variable
PYTHONHASHSEED to zero.
For information on switching off the randomization, check the official docs: https://docs.python.org/3.3/using/cmdline.html#cmdoption-R
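As a quick sketch (assuming Python 3.5+ for subprocess.run), you can see the effect of the environment variable by running the hash in separate interpreter processes:
import os
import subprocess
import sys

code = "print(hash('Hello World!!'))"

# With PYTHONHASHSEED=0 both runs print the same value;
# without it, string hashes almost always differ between runs.
env = dict(os.environ, PYTHONHASHSEED='0')
for _ in range(2):
    subprocess.run([sys.executable, '-c', code], env=env)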

Related

How do I check the default decimal precision when converting float to str?

When converting a float to a str, I can specify the number of decimal points I want to display
'%.6f' % 0.1
> '0.100000'
'%.6f' % .12345678901234567890
> '0.123457'
But when simply calling str on a float in python 2.7, it seems to default to 12 decimal points max
str(0.1)
>'0.1'
str(.12345678901234567890)
>'0.123456789012'
Where is this max # of decimal points defined/documented? Can I programmatically get this number?
The number of decimals displayed is going to vary greatly, and there won't be a way to predict how many will be displayed in pure Python. Some libraries like numpy allow you to set precision of output.
This is simply because of the limitations of float representation.
The relevant parts of the floating point tutorial in the official Python docs talk about how Python chooses to display floats.
Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine
Python keeps the number of digits manageable by displaying a rounded value instead
Now, there is the possibility of overlap here:
Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction
The method for choosing which decimal values to display was changed in Python 3.1 (but the last sentence of the quote implies this might be an implementation detail).
For example, the numbers 0.1 and 0.10000000000000001 are both
approximated by 3602879701896397 / 2 ** 55. Since all of these decimal
values share the same approximation, any one of them could be
displayed while still preserving the invariant eval(repr(x)) == x
Historically, the Python prompt and built-in repr() function would
choose the one with 17 significant digits, 0.10000000000000001.
Starting with Python 3.1, Python (on most systems) is now able to
choose the shortest of these and simply display 0.1.
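A short sketch makes the quoted passage concrete; both literals below end up as the same stored double:
x = 0.1
print(repr(x))                     # '0.1' on Python >= 3.1 (shortest round-tripping repr)
print(format(x, '.17g'))           # '0.10000000000000001', closer to the stored binary value
print(0.1 == 0.10000000000000001)  # True: both literals map to the same float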
I do not believe this exists in the python language spec. However, the cpython implementation does specify it. The float_repr() function, which turns a float into a string, eventually calls a helper function with the 'r' formatter, which eventually calls a utility function that hardcodes the format to what comes down to format(float, '.16g'). That code can be seen here. Note that this is for python3.6.
>>> import math
>>> str(math.pi*4)
'12.566370614359172'
giving the maximum number of significant digits (both before and after the decimal) at 16. It appears that in the python2.7 implementation, this value was hardcoded to .12g. As for why this happened (and why it is somewhat lacking in documentation), see here.
So if you are trying to get how long a number will be formatted when printed, simply get its length with the '.12g' format.
def len_when_displayed(n):
    return len(format(n, '.12g'))
Well, if you're looking for a pure python way of accomplishing this, you could always use something like,
len(str(.12345678901234567890).split('.')[1])
>>>> 12
I couldn't find it in the documentation and will add it here if I do, but this is a work around that can at least always return the length of precision if you want to know before hand.
As you said, it always seems to be 12 even when feeding bigger floating-points.
From what I was able to find, this number can be highly variable and in these cases, finding it empirically seems to be the most reliable way of doing it. So, what I would do is define a simple method like this,
def max_floating_point():
    counter = 0
    current_length = 0
    str_rep = '.1'
    while counter <= current_length:
        str_rep += '1'
        current_length = len(str(float(str_rep)).split('.')[1])
        counter += 1
    return current_length
This will return you the maximum length representation on your current system,
print max_floating_point()
>>>> 12
By looking at the output of random numbers converted, I have been unable to understand how the length of the str() is determined, e.g. under Python 3.6.6:
>>> str(.123456789123456789123456789)
'0.12345678912345678'
>>> str(.111111111111111111111111111)
'0.1111111111111111'
You may opt for this code that actually simulates your real situation:
import random
maxdec=max(map(lambda x:len(str(x)),filter(lambda x:x>.1,[random.random() for i in range(99)])))-2
Here we are testing the length of ~90 random numbers in the (.1, 1) open interval after conversion (and subtracting 2 to drop the leading '0.').
Python 2.7.5 on a 64bit linux gives me 12, and Python 3.4.8 and 3.6.6 give me 17.
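If you only want to know which of the two regimes your interpreter uses, CPython also exposes it as sys.float_repr_style (a small check, not a way to get the digit count itself):
import sys

# 'short'  -> repr()/str() print the shortest string that round-trips (Python 3.1+ on most platforms)
# 'legacy' -> the pre-3.1 fixed-precision behaviour is used instead
print(sys.float_repr_style)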

Python rounding and inserting into array does not round [duplicate]

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats, which represent values to be fit into the ranges. All of these floats are < 1 but positive, so precision matters. One of my tests to determine whether a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems, I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question then, is why is the float changing when it is accessed. It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of the float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curVal), repr(range[0]) to see what Python decided was the closest possible representation of your float literals.
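To reproduce the old 12-significant-digit str() behaviour on a current interpreter, here is a small sketch using the '%.12g' format (the value is taken from the question):
lower = 0.0014500000067055225   # range[0] from the question

print('%.12g' % lower)   # '0.00145000000671', what Python 2's str()/print showed
print(repr(lower))       # '0.0014500000067055225', the full repr
# A curValue that prints as 0.00145000000671 may well be this very same float,
# in which case curValue > lower is False.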
On modern-day PCs floats aren't arbitrarily precise. So even if you enter pi as a constant to 100 decimals, only a few of them are stored accurately. The same is happening to you. This is because Python floats are 64-bit doubles, which give you only 53 bits of mantissa, and that limits your precision (in unexpected ways, because it's base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float if you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for float, so Python modifies it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision fixed-point module like Simple Python Fixed Point or the decimal module.
Not sure it would work in this case, because I don't know whether Python is limiting precision in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...

Python default behavior of str(x)

I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way that it interfaces with other parts of the codebase. To convert them to decimal objects, it uses things like
mydec = decimal.Decimal(str(x))
where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?
For example, take the number 2.1234512. It is stored internally as 2.12345119999999999 because of how floats are represented.
>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'
Ok, str(x) in this case is doing something like '%.6f' % x. This is a problem with the way my code converts to decimals. Take the following:
>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')
So if I have the float 2.12345119999999999 and I want to pass it to Decimal, converting it to a string using str() gets me the wrong answer. I need to know the rules for str(x) that determine the formatting, because I need to determine whether this code needs to be rewritten to avoid this error (note that it might be OK because, for example, the code might round to the 10th decimal place once we have a decimal object).
There must be some set of rules in python's docs that hopefully someone here can point me to. Thanks!
In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top, after a comment with some explanation of the choice:
/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
the rounding noise created by various operations is suppressed, while
giving plenty of precision for practical use. */
#define PyFloat_STR_PRECISION 12
You have the option of rebuilding, if you need something different. Any changes will change formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare the difference between how repr and str convert doubles in these two files.
There are a couple of issues worth discussing here, but the summary is: you cannot extract information that is not already stored on your system.
If you've taken a decimal number and stored it as a floating point, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to
return a string containing a nicely printable representation of an object
There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it - the current implementation can change anytime.
If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):
Decimal( "{0:.5f}".format(a_float) )
You can also remove 0s on the right with resulting_string.rstrip("0").
Again, this method does not recover the information that has been lost.
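A small sketch of the three conversions side by side (assuming Python 3.2+, where Decimal accepts a float directly):
from decimal import Decimal

x = 2.12345119999999999              # stored as the nearest binary double

print(Decimal(x))                    # the exact decimal value of the stored double (many digits)
print(Decimal(str(x)))               # Decimal('2.1234512'), built from the rounded str()
print(Decimal('{0:.5f}'.format(x)))  # Decimal('2.12345'), explicit and predictable rounding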

How do I force Python to keep an integer out of scientific notation

I am trying to write a method in Python 3.2 that encrypts a phrase and then decrypts it. The problem is that the numbers are so big that when Python does math with them, it immediately converts them into scientific notation. Since my code needs the full numbers to function, scientific notation is not useful.
What I have is:
coded = ((eval(input(':'))+1213633288469888484)/2)+1042
Basically, I just get a number from the user and do some math to it.
I have tried format() and a couple other things but I can't get them to work.
EDIT: I use only even integers.
In Python 3, / does true division (i.e. floating point). To get integer division, you need to use //. In other words, 100/2 yields 50.0 (a float) whereas 100//2 yields 50 (an integer).
Your code probably needs to be changed as:
coded = ((eval(input(':'))+1213633288469888484)//2)+1042
As a cautionary tale however, you may want to consider using int instead of eval:
coded = ((int(input(':'))+1213633288469888484)//2)+1042
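To see the difference with the constant from the question (the input is assumed to be an even integer, as stated in the edit):
n = 1213633288469888484

print(n / 2)    # true division: a float, shown in scientific notation, with the low digits lost
print(n // 2)   # 606816644234944242, exact integer floor division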
If you know that the floating point value is really an integer, or you don't care about dropping the fractional part, you can just convert it to an int before you print it.
>>> print(1.2e16)
1.2e+16
>>> print(int(1.2e16))
12000000000000000
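If you would rather keep it a float but still avoid scientific notation in the output, fixed-point string formatting works too (a side note, not part of the original answer):
x = 1.2e16

print(format(x, '.0f'))     # 12000000000000000
print('{:,.0f}'.format(x))  # 12,000,000,000,000,000 with thousands separators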

Byte precision of value in Python?

I have a hash function in Python.
It returns a value.
How do I see the byte-size of this return value? I want to know if it is 4-bytes or 8 or what.
Reason:
I want to make sure that the min value is 0 and the max value is 2**32, otherwise my calculations are incorrect.
I want to make sure that packing it to a I struct (unsigned int) is correct.
More specifically, I am calling murmur.string_hash(`x`).
I want to sanity-check that I am getting a 4-byte unsigned return value. If I get a value of a different size, then my calculations get messed up, so I want to verify it.
If it's an arbitrary function that returns a number, there are only 4 standard types of numbers in Python: small integers (C long, at least 32 bits), long integers ("unlimited" precision), floats (C double), and complex numbers.
If you are referring to the builtin hash, it returns a standard integer (C long):
>>> hash(2**31)
-2147483648
If you want different hashes, check out hashlib.
Generally, thinking of a return value as having a particular byte precision is not the best way to go in Python, especially with integers. For most intents and purposes, Python "short" integers are seamlessly integrated with "long" (unlimited-precision) integers. Variables are promoted from the smaller to the larger type as necessary to hold the required value. Functions are not required to return any particular type (the same function could return different data types depending on the input, for example).
When a function is provided by a third-party package (as this one is), you can either just trust the documentation (which for Murmur indicates 4-byte ints as far as I can tell) or test the return value yourself before using it (whether by if, assert, or try, depending on your preference).
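A minimal sketch of such a check before packing (murmur.string_hash returning a plain int is an assumption here):
import struct

def pack_u32(value):
    # Pack a hash value as a standard 4-byte unsigned int, failing loudly if it does not fit.
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError('hash value %r does not fit in an unsigned 32-bit int' % value)
    return struct.pack('=I', value)  # '=' forces the standard 4-byte size, native byte order

print(len(pack_u32(3735928559)))  # 4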
