Reconstructing two (string concatenated) numbers that were originally floats - python

Unfortunately, the print statement in a program was written without an end-of-line character, so one in every 26 numbers consists of two numbers joined together. The following code reproduces the behaviour; a fragment of the original data is at the end.
import numpy as np
for _ in range(2):
    A=np.random.rand()+np.random.randint(0,100)
    B=np.random.rand()+np.random.randint(0,100)
    C=np.random.rand()+np.random.randint(0,100)
    D=np.random.rand()+np.random.randint(0,100)
    with open('file.txt','a') as f:
        f.write(f'{A},{B},{C},{D}')
And thus the output example file looks very similar to what follows:
40.63358599010553,53.86722741700399,21.800795158561158,13.95828176311762557.217562728494684,2.626308403991772,4.840593988487278,32.401778122213486
With the issue being that there are two numbers 'printed together', in the example they were as follows:
13.95828176311762557.217562728494684
So you cannot know if they should be
13.958281763117625, 57.217562728494684
or
13.9582817631176255, 7.217562728494684
Please understand that in this case there are only two options, but the problem I want to address involves 'unbounded' numbers of Python's float type (where 'unbounded' means they lie in a range we don't know, e.g. ±1E4).
Can the original numbers be reconstructed based on "some" python internal behavior I'm missing?
Actual data with periodicity 27 (i.e. the 26th number consists of 2 joined together):
0.9221878978925224, 0.9331311610066017,0.8600582424784715,0.8754578588852764,0.8738648974725404, 0.8897837559800233,0.6773502027673041,0.736325377603136,0.7956454122424133, 0.8083168444596229,0.7089031184165164, 0.7475306242508357,0.9702361286847581, 0.9900689384633811,0.7453878225174624, 0.7749000030576826,0.7743879170108678, 0.8032590543649807,0.002434,0.003673,0.004194,0.327903,11.357262,13.782266,20.14374,31.828905,33.9260060.9215201173775437, 0.9349343132442707,0.8605282244327555,0.8741626682026793,0.8742163597524663, 0.8874673376386358,0.7109322043854609,0.7376362393985332,0.796158275345

To expand my comment into an actual answer:
We do have some information: a Python float is an IEEE-754 double, which is 64 bits wide and has only 53 bits of precision in its significand (mantissa), so not every decimal number can be represented by a float. The values in datasets like yours are brushing up against the edge of that precision.
We can make that work for us - we just need to test whether the number can, in fact, be represented by a float, at each possible split point. We can abuse strings for this, by testing num_str == str(float(num_str)) (i.e. a string remains the same after being converted to a float and back to a string)
If the string is exactly what Python would print for the float it parses to, then the before and after will be equal.
If the string carries more digits than a float can hold, it will be coerced into the nearest number that the float can represent. If we then convert this back to a string, the result will not be identical to the original.
Here's a snippet, for example, that you can play around with
from typing import List

def parse_number(s: str) -> List[float]:
    if s.count('.') == 2:
        first_decimal = s.index('.')
        second_decimal = s[first_decimal + 1:].index('.') + first_decimal + 1
        # try every split point between the two decimal points,
        # starting with the longest possible first number
        for split_idx in range(second_decimal - 1, first_decimal + 1, -1):
            a, b = s[:split_idx], s[split_idx:]
            if str(float(a)) == a and str(float(b)) == b:
                return [float(a), float(b)]
        # default to returning as large an a as possible
        return [float(s[:second_decimal - 1]), float(s[second_decimal - 1:])]
    else:
        return [float(s)]
parse_number('33.9260060.9215201173775437')
# [33.926006, 0.9215201173775437]
# this is the only possible combination that actually works for this particular input
Obviously this isn't foolproof, and for some numbers there may not be enough information to differentiate the first number from the second. Additionally, for this to work, the tool that generated your data needs to have worked with IEEE standards-compliant floats (which does appear to be the case in this example, but may not be if the results were generated using a class like Decimal (python) or BigDecimal (java) or something else).
Some inputs might also have multiple possibilities. In the above snippet I've biased it to take the longest possible [first number], but you could modify it to go in the opposite order and instead take the shortest possible [first number].
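To see how often more than one split is plausible, the same round-trip test can collect every candidate instead of returning the first. This is only a sketch: the names round_trips and candidate_splits are made up for illustration, and it assumes the fused token holds two non-negative floats written with Python 3's default formatting and without exponents.

```python
from typing import List, Tuple


def round_trips(text: str) -> bool:
    """True if text is exactly what Python prints for the float it parses to."""
    try:
        return repr(float(text)) == text
    except ValueError:
        return False


def candidate_splits(joined: str) -> List[Tuple[str, str]]:
    """Return every (first, second) split of two fused floats that round-trips."""
    first_dot = joined.index('.')
    second_dot = joined.index('.', first_dot + 1)
    candidates = []
    # The cut must leave at least one digit after the first decimal point
    # and at least one digit in front of the second decimal point.
    for cut in range(first_dot + 2, second_dot):
        first, second = joined[:cut], joined[cut:]
        if round_trips(first) and round_trips(second):
            candidates.append((first, second))
    return candidates


print(candidate_splits('13.557.25'))
print(candidate_splits('33.9260060.9215201173775437'))
```

A short token such as '13.557.25' yields two candidates ('13.5' with '57.25', and '13.55' with '7.25'), so it cannot be resolved from the text alone. Long tokens usually narrow to a single candidate because a float prints with at most 17 significant digits.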

Yes, you have one available weapon: you're using the default precision to display the numbers. In the example you cite, there are 15 digits after the decimal point, making it easy to reconstruct the original numbers.
Let's take a simple case, where you have only 3 digits after the decimal point. It's trivial to separate
13.95857.217
The formatting requires a maximum of 2 digits before the decimal point, and three after.
Any case that has five digits between the points, is trivial to split.
13.958 57.217
However, you run into the "trailing zero" problem in some cases. If you see, instead
13.9557.217
This could be either
13.950 57.217
or
13.955 07.217
Your data do not contain enough information to differentiate the two cases.

Related

Python decimal.Decimal producing result in scientific notation

I'm dividing a much smaller number by a very long one. Both are of type decimal.Decimal().
The result is coming out in scientific notation. How do I stop this? I need to print the number in full.
>>> decimal.getcontext().prec
50
>>> val
Decimal('1000000000000000000000000')
>>> units
Decimal('1500000000')
>>> units / val
Decimal('1.5E-15')
The precision is kept internally - you just have to explicitly call for the number of decimal places you want at the point you are exporting your decimal value to a string.
So, if you are going to print the value, or insert it into an HTML template, the first step is to use the string format method (or f-strings) with enough decimal places to cover the whole number:
In [29]: print(f"{units/val:.50f}")
0.00000000000000150000000000000000000000000000000000
Unfortunately, the string-format mini-language has no way to eliminate the redundant zeros on the right-hand side by itself. (The left side can be padded with "0", " ", or any custom character, but all the precision after the decimal separator is filled with trailing zeros.)
Finding the least significant non-zero digit is complicated; otherwise we could use a parameter extracted from the number instead of the "50" as the precision in the format expression. The simpler thing is to remove those zeros after formatting takes place, with the string .rstrip method:
In [30]: print(f"{units/val:.50f}".rstrip("0"))
0.0000000000000015
In short, this seems to be the only way to go: at every interface point where the number leaves the core for an output in which it is represented as a string, format it with an excess of precision in fixed-point notation and strip the trailing zeros with .rstrip:
return template.render(number=f"{number:.50f}".rstrip("0"), ...)
Render the decimal into a formatted string with a float type-indicator {:,f}, and it will display just the right number of digits to express the whole number, regardless of whether it is a very large integer or a very large decimal.
>>> val
Decimal('1000000000000000000000000')
>>> units
Decimal('1500000000')
>>> "{:,f}".format(units / val)
'0.0000000000000015'
# very large decimal integer, formatted as float-type string, appears without any decimal places at all when it has none! Nice!
>>> "{:,f}".format(units * val)
'1,500,000,000,000,000,000,000,000,000,000,000'
You don't need to specify the decimal places. It will display only as many as required to express the number, omitting that trail of useless zeros that appear after the final decimal digit when the decimal is shorter than a fixed format width. And you don't get any decimal places if the number has no fraction part.
Very large numbers are therefore accommodated without having to second-guess how large they will be. And you don't have to second-guess whether they will have decimal places either.
Any specified thousands separator {:,f} will likewise only have effect if it turns out that the number is a large integer instead of a long decimal.
Proviso
Decimal(), however, has this idea of significant places, by which it will add trailing zeros if it thinks you want them.
The idea is that it intelligently handles situations where you might be dealing with currency digits such as £ 10.15. To use the example from the documentation:
>>> decimal.Decimal('1.30') + decimal.Decimal('1.20')
Decimal('2.50')
It makes no difference if you format the Decimal() - you still get the trailing zero if the Decimal() deems it to be significant:
>>> "{:,f}".format( decimal.Decimal('1.30') + decimal.Decimal('1.20'))
'2.50'
The same thing happens (perhaps for some good reason?) when you treat thousands and fractions together:
>>> decimal.Decimal(2500) * decimal.Decimal('0.001')
Decimal('2.500')
Remove significant trailing zeros with the Decimal().normalize() method:
>>> (2500 * decimal.Decimal('0.001')).normalize()
Decimal('2.5')
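The two ideas above (the 'f' format and normalize()) can be combined into one small helper. This is a sketch rather than a recommended API; the name plain is made up for illustration. Note that normalize() also rounds to the current context precision, so the context must be at least as precise as the values being printed.

```python
from decimal import Decimal


def plain(d: Decimal) -> str:
    """Fixed-point text for d with no exponent and no trailing zeros."""
    # normalize() strips trailing zeros (and rounds to the context precision);
    # the 'f' format then expands any exponent that normalize() leaves behind.
    return format(d.normalize(), 'f')


print(plain(Decimal('1500000000') / Decimal('1000000000000000000000000')))
print(plain(Decimal('1.30') + Decimal('1.20')))
print(plain(Decimal('1.5E+3')))
```

Whether dropping the trailing zeros is desirable depends on the data; for currency amounts such as 2.50 they are usually meant to stay.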

int(str) of a huge number

If I have a number that is too big to be represented with 64 bits, I receive it as a string that contains it.
What happens if I use:
num = int(num_str)
I am asking because it looks like it works accurately and I don't understand how. Does it allocate more memory for that?
I was required to check if a huge number is a power of 2. Someone suggested:
def power(self, A):
    A = int(A)
    if A == 1:
        return 0
    x = bin(A)
    if x.count('1') > 1:
        return 0
    else:
        return 1
While I understand why it would work under regular circumstances, the fact that the numbers are much larger than 2^64 and it still works baffles me.
According to the Python manual's description on the representation of integers:
These represent numbers in an unlimited range, subject to available (virtual) memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of 2’s complement which gives the illusion of an infinite string of sign bits extending to the left.
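Because integers grow as needed, the usual bit trick works on values far beyond 64 bits. Below is a minimal sketch; the function name is_power_of_two is made up for illustration, and unlike the snippet in the question it counts 1 (that is, 2^0) as a power of two.

```python
def is_power_of_two(num_str: str) -> bool:
    """Check whether the decimal string num_str encodes a power of two."""
    n = int(num_str)  # Python ints are arbitrary precision, no 64-bit limit
    # A power of two has exactly one bit set, so clearing the lowest
    # set bit with n & (n - 1) must leave zero.
    return n > 0 and (n & (n - 1)) == 0


huge = str(2 ** 200)
print(len(huge), is_power_of_two(huge))
print(is_power_of_two(str(2 ** 200 + 1)))
print((2 ** 200).bit_length())
```

One caveat: since Python 3.11, int() refuses by default to parse strings longer than 4300 digits and raises ValueError; the limit can be changed with sys.set_int_max_str_digits().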

How to dynamically format string representation of float number in python?

Hi I would like to dynamically adjust the displayed decimal places of a string representation of a floating point number, but i couldn't find any information on how to do it.
E.g:
precision = 8
n = 7.12345678911
str_n = '{0:.{precision}}'.format(n)
print(str_n) should display -> 7.12345678
But instead i'm getting a "KeyError". What am i missing?
You need to specify where precision in your format string comes from:
precision = 8
n = 7.12345678911
print('{0:.{precision}}'.format(n, precision=precision))
The first time, you specified which argument you'd like to be the number using an index ({0}), so the formatting function knows where to get the argument from, but when you specify a placeholder by some key, you have to explicitly specify that key.
It's a little unusual to mix these two systems; I'd recommend staying with one:
print('{number:.{precision}}'.format(number=n, precision=precision)) # most readable
print('{0:.{1}}'.format(n, precision))
print('{:.{}}'.format(n, precision)) # automatic indexing, least obvious
It is notable that these precision values will include the numbers before the point, so
>>> f"{123.45:.3}"
'1.23e+02'
will drop the decimals and only give the first three digits of the number.
Instead, the f can be supplied to the type of the format (See the documentation) to get fixed-point formatting with precision decimal digits.
print('{number:.{precision}f}'.format(number=n, precision=precision)) # most readable
print('{0:.{1}f}'.format(n, precision))
print('{:.{}f}'.format(n, precision)) # automatic indexing, least obvious
In addition to @Talon's answer, for those interested in f-strings, this also works:
precision = 8
n = 7.12345678911
print(f'{n:.{precision}f}')
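The three behaviours discussed in this thread can be checked side by side. This is a small sketch using the values from the question; the variable names missing_key, significant and fixed are only for illustration.

```python
n = 7.12345678911
precision = 8

# A named field needs a matching keyword argument; without it,
# format() cannot look the name up and raises KeyError.
try:
    '{0:.{precision}}'.format(n)
except KeyError as exc:
    missing_key = exc.args[0]

significant = '{0:.{1}}'.format(n, precision)   # 8 significant digits
fixed = '{0:.{1}f}'.format(n, precision)        # 8 digits after the point

print(missing_key, significant, fixed)
```

Note that formatting rounds rather than truncates, so the fixed-point result is 7.12345679 and not the 7.12345678 the question expected.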

Is there a way to convert complex number to just number?

So I have this long number (i.e: 1081546747036327937), and when I cleaned up my data in pandas dataframe, I didn't realize Python converted it to complex number (i.e: 1.081546747036328e+18).
I saved this one as csv. The problem is, I accidentally deleted the original file, tried to recover it but no success this far, so...
is there a way to convert this complex number back to their original number?
I tried to convert it to str using str(data) but it stays the same (i.e: 1.081546747036328e+18).
As you were told in the comments, this is not a complex number but a floating-point number. You can certainly convert it to a (long) integer, but you cannot be sure of getting the initial number back.
In your example:
i = 1081546747036327937
f = float(i)
j = int(f)
print(i, f, j, j-i)
will display:
1081546747036327937 1.081546747036328e+18 1081546747036327936 -1
This is because floating points only have a limited accuracy and rounding errors are to be expected with large integers when the binary representation requires more than 53 bits.
As can be read here, complex numbers are a sum of a real part and an imaginary part.
3+1j is a complex number with the real part 3 and the imaginary part 1
You have a scientific notation (type is float), which is just an ordinary float multiplied by the specified power of 10.
1e10 equals to 1 times ten to the power of ten
To convert this to int, you can just convert with int(number). For more information about python data types, you can take a look here
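The 53-bit limit mentioned above is easy to probe directly. A minimal sketch, with survives_float as a made-up helper name:

```python
def survives_float(n: int) -> bool:
    """True if converting n to float and back returns n unchanged."""
    return int(float(n)) == n


print(survives_float(2 ** 53))               # fits in 53 bits
print(survives_float(2 ** 53 + 1))           # needs 54 bits, gets rounded
print(survives_float(1081546747036327937))   # the value from the question
```

Any integer for which this returns False cannot be recovered from its float form, because several neighbouring integers map to the same float.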

Python Format Strings and Floating Point Representation

I'm working with some in-place code dealing with formatting user-stored floating point numbers for human display.
The current implementation does this:
"{0:.24f}".format(some_floating_point).rstrip('0')
which makes sense and works just fine for the most part. But when faced with a value such as 0.0003, things don't go as well.
>>> "{0:.24f}".format(0.0003).rstrip('0')
'0.000299999999999999973719'
Some further investigation indicates that Python seems to change the underlying representation based on the number of digits requested?
>>> "{0:.15f}".format(0.0003)
'0.000300000000000'
>>> "{0:.20f}".format(0.0003)
'0.00029999999999999997'
My assumption is single precision vs double.
The user enters these values where they are stored in the database as a double, and when the form is rendered again later the same value is prepopulated in the field. Therefore I need a 1:1 mapping of these representations.
My question is therefore: What is an elegant, and more importantly safe way to deal with this behavior? My best efforts so far have involved log10 and are less than ideal to put it nicely.
EDIT: As Prune points out the value is not actually changing, but rather the rounding done by format will carry over causing a set of 9s to become 0s (d'oh). The behavior makes sense then, but the solution is still escaping me.
You are receiving the number as stored. 0.0003 cannot be stored exactly as a binary fraction. To illustrate:
>>> 0.00029999999999999997 == 0.0003
True
Print formatting rounds the number at the least significant digit. Double precision merely pushes the problem farther to the right. To fully "solve" the problem to base-10 eyes, you need to switch to decimal arithmetic, or perhaps build your own string handler for numbers that are sufficiently close to a simpler value (a suspicious string of 9's or 0's in the fractional part).
Here's the start of a function for you. I tested it with 0.0004, which stores as a hair more than 0.0004; the 9's case is left as an exercise :-) .
def str_round(x):
    size = 6
    nines = '9' * size
    zeros = '0' * size
    # named text rather than str, to avoid shadowing the built-in
    text = "{0:.24f}".format(x).rstrip('0')
    text_len = len(text)
    print(text, text_len)
    if nines in text:
        # replace leading digit with one more
        pos = text.index(nines)
        # ADD CODE HERE
        # Turn the leading portion into an integer;
        # increment and convert back to zero-leading string.
        # Fill out the rest with zeros.
    elif zeros in text:
        # Change all trailing digits to 0
        pos = text.index(zeros)
        text = text[:pos] + '0' * (text_len - pos)
    return text
print(str_round(0.0004))
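A different route to the 1:1 mapping the question asks for, and not the approach of the answer above, is to lean on repr(), which in Python 3 returns the shortest decimal string that parses back to the same double. The sketch below assumes Python 3 and finite inputs; display is a made-up name, and Decimal is used only to expand exponents such as 1e-05 into plain digits.

```python
from decimal import Decimal


def display(x: float) -> str:
    """Shortest decimal text that parses back to exactly x, without exponent."""
    # repr() yields the shortest round-tripping digits; Decimal's 'f'
    # format rewrites forms like 1e-05 as 0.00001.
    return format(Decimal(repr(x)), 'f')


for value in (0.0003, 0.0004, 1e-05, 0.1 + 0.2):
    text = display(value)
    print(text, float(text) == value)
```

Because the text always converts back to the stored double, the form field can be prepopulated from it without drifting on repeated saves.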
