numpy.subtract performs subtraction wrong on 1-dimensional arrays - python

I read the post is-floating-point-math-broken and understand why it happens, but I couldn't find a solution that helps me.
How can I do the subtraction correctly?
Python version 2.6.6, Numpy version 1.4.1.
I have two numpy.ndarray objects, origin and new, each containing float32 values. I'm trying to use numpy.subtract to subtract them, but I get the following (odd) result:
>>> import numpy as np
>>> with open('base_R.l_BSREM_S.9.1_001.bin', 'r+') as fid:
...     origin = np.fromfile(fid, np.float32)
>>> with open('new_R.l_BSREM_S.9.1_001.bin', 'r+') as fid:
...     new = np.fromfile(fid, np.float32)
>>> diff = np.subtract(origin, new)
>>> origin[5184939]
0.10000000149011611938
>>> new[5184939]
0.00000000023283064365
>>> diff[5184939]
0.10000000149011611938
Also when I try to subtract the arrays at 5184939 I get the same result as diff[5184939]
>>> origin[5184939] - new[5184939]
0.10000000149011611938
But when I do the following, I get this result:
>>> 0.10000000149011611938 - 0.00000000023283064365
0.10000000125728548
and that's not equal to diff[5184939].
How can the right subtraction be done? (0.10000000125728548 is the one that I need.)
Please help, and thanks in advance.

You might add your Python and numpy versions to the question.
Differences can arise from np.float32 v np.float64 dtype, the default Python float type, as well as display standards. numpy uses different display rounding than the underlying Python.
The subtraction itself does not differ.
I can reproduce the 0.10000000125728548 value, which may also display as 0.1 (to 8 decimals).
I'm not sure where the 0.10000000149011611938 comes from. That looks as though new[5184939] was identically 0, not just something small like 0.00000000023283064365.
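One possible explanation can be checked without the original files. The sketch below assumes the two stored values are float32(0.1) and 2**-32, which is what the printed digits appear to correspond to; the variable names are made up for illustration. It compares the subtraction done in float32 with the same subtraction done in float64.

```python
import numpy as np

# Assumed stand-ins for origin[5184939] and new[5184939]
a32 = np.float32(0.1)          # prints as 0.100000001490116119...
b32 = np.float32(2.0 ** -32)   # about 2.3283064365e-10

# Distance from a32 to the next representable float32 value
gap = np.spacing(a32)

diff32 = a32 - b32                          # subtraction carried out in float32
diff64 = np.float64(a32) - np.float64(b32)  # subtraction carried out in float64

print(gap > b32)      # is the subtrahend smaller than the float32 spacing?
print(diff32 == a32)  # is the float32 result unchanged?
print(diff64)         # the float64 result
```

If that assumption holds, the same idea applies to the whole arrays: converting first, as in origin.astype(np.float64) - new.astype(np.float64), keeps the small difference that float32 cannot represent.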

Related

`keepdims` changes output value [duplicate]

This question already has an answer here:
Different slices give different inequalities for same elements
(1 answer)
Closed 1 year ago.
import numpy as np
np.random.seed(2)
x = np.random.randn(1000000).astype('float32')
print(float(np.linalg.norm(x, keepdims=1)**2))
print(float(np.linalg.norm(x, keepdims=0)**2))
998428.125
998428.1084311157
Reproduced in Colab. Also, Colab outputs different values than my CPU:
998425.0625
998425.059075091
Removing **2, they match. Also reproduced with sum, haven't tried other methods.
Why this behavior? I can understand device dependence but keepdims seems buggy.
This is because with keepdims=0 the result is a float32 scalar, which becomes a float64 after **2. With keepdims=1 the result is still an array (it keeps its axis), and for some reason NumPy does not upcast it.
>>> print(
...     np.linalg.norm(x, keepdims=1).dtype,
...     np.linalg.norm(x, keepdims=0).dtype
... )
# Returns
float32 float32
>>> print(
...     (np.linalg.norm(x, keepdims=1)**2).dtype,
...     (np.linalg.norm(x, keepdims=0) ** 2).dtype
... )
# Returns
float32 float64
The version of numpy I'm using is 1.20.3.
I could not find an explanation of this in the NumPy documentation. Opening a GitHub issue in NumPy's repository might be a good idea.
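Until the promotion behaviour is clarified, one way to sidestep it is to cast to float64 explicitly before squaring, so that both variants go through the same arithmetic. This is only a sketch with a smaller made-up array; scalar promotion rules may differ between NumPy versions, so the code does not rely on them.

```python
import numpy as np

rng = np.random.RandomState(2)
x = rng.randn(1000).astype(np.float32)

n_arr = np.linalg.norm(x, keepdims=True)  # float32 array with a single element
n_scalar = np.linalg.norm(x)              # float32 scalar

# Cast explicitly so the squaring happens in float64 in both cases
sq_arr = n_arr.astype(np.float64) ** 2
sq_scalar = np.float64(n_scalar) ** 2

print(sq_arr.dtype, sq_scalar.dtype)
print(float(sq_arr[0]) == float(sq_scalar))
```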

Ensure that calculations are done in 64 bits (or at least warn of overflow)

I am using python and NumPy. I had the following basic quantity to compute:
(QL * (7**k))**2
Where
QL = 200003
k = 4
What puzzled me is that it returned a wrong (negative) number, which doesn't make sense. Then I realised after looking on the internet that the problem was because k was a 32-bit numpy integer.
A minimal working example can be the following:
import numpy as np
QL = 200000
k = np.arange(10)[4]
print((QL * 7**k)**2)
This returns 406556672 instead of the correct answer 230592040000000000. The number is not negative here, but the same problem still occurs.
My question is:
How can I make sure that all the numbers used in my code are of the biggest possible integer size?
I don't want to explicitly specify it for each number that I create.
How can I at least force python to warn me when such things happen?
When you write QL = 200003; k = 4 in Python, the numbers are interpreted as ints. By default, if you were to convert these into numpy arrays or scalars, you would end up with whatever the default integer type is on your system.
Here is an example using one-element arrays:
QL = np.array([200003])
k = np.array([4])
On my system, I find that the dtype of both arrays is int32. You can change that by selecting your preferred dtype:
QL = np.array([200003], dtype=np.int64)
k = np.array([4], dtype=np.int64)
If you don't have access to the arrays at creation time, you can always convert them:
QL = QL.astype(np.int64)
k = k.astype(np.int64)
An option that is worth considering for integer math is skipping numpy altogether and using Python's infinite precision integers. If one of the numbers is a numpy scalar or one-element array, you can retrieve the corresponding Python object using the item method:
QL = QL.item()
k = k.item()
Numpy should raise at least a warning for overflow, but apparently this fails for some operations: https://github.com/numpy/numpy/issues/8987
TL;DR
In your case, k is a numpy scalar of type int32. You can do either one of the following:
For a numpy 64-bit result:
k = np.int64(k)
For an infinite-precision Python result:
k = k.item()
If you don't want to cast each k explicitly, you can create the range using the correct type:
k = np.arange(10, dtype=np.int64)[4]
There is no reliable way to set the default integer type for all new arrays without specifying it explicitly.
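The two options from the TL;DR can be checked against the value from the question. This is a small sketch; the int32 dtype is forced explicitly here (an assumption, so that the starting point is the same on every platform).

```python
import numpy as np

QL = 200000
k = np.arange(10, dtype=np.int32)[4]  # a 32-bit NumPy integer scalar

# Option 1: promote k to a 64-bit NumPy integer
wide = (QL * 7 ** np.int64(k)) ** 2

# Option 2: fall back to Python's arbitrary-precision int
exact = (QL * 7 ** k.item()) ** 2

print(wide, wide.dtype)
print(exact, type(exact))
```

Note that the int64 route still has a limit (about 9.2e18), while the Python int route does not.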

Error with numpy array calculations using int dtype (it fails to cast dtype to 64 bit automatically when needed)

I'm encountering a problem with incorrect numpy calculations when the inputs to a calculation are a numpy array with a 32-bit integer data type, but the outputs include larger numbers that require 64-bit representation.
Here's a minimal working example:
arr = np.ones(5, dtype=int) * (2**24 + 300) # arr.dtype defaults to 'int32'
# Following a comment from @hpaulj, I changed the first line, which was originally:
# arr = np.zeros(5, dtype=int)
# arr[:] = 2**24 + 300
single_value_calc = 2**8 * (2**24 + 300)
numpy_calc = 2**8 * arr
print(single_value_calc)
print(numpy_calc[0])
# RESULTS
4295044096
76800
The desired output is that the numpy array contains the correct value of 4295044096, which requires 64 bits to represent. That is, I would have expected numpy arrays to automatically upcast from int32 to int64 when the output requires it, rather than keeping a 32-bit output and wrapping around once the value exceeds the 32-bit range.
Of course, I can fix the problem manually by forcing int64 representation:
numpy_calc2 = 2**8 * arr.astype('int64')
but this is undesirable for general code, since the output will only need 64-bit representation (i.e. to hold large numbers) in some cases and not all. In my use case, performance is critical so forcing upcasting every time would be costly.
Is this the intended behaviour of numpy arrays? And if so, is there a clean, performant solution please?
Type casting and promotion in numpy is fairly complicated and occasionally surprising. This recent unofficial write-up by Sebastian Berg explains some of the nuances of the subject (mostly concentrating on scalars and 0d arrays).
Quoting from this document:
Python Integers and Floats
Note that python integers are handled exactly like numpy ones. They are, however, special in that they do not have a dtype associated with them explicitly. Value based logic, as described here, seems useful for python integers and floats to allow:
arr = np.arange(10, dtype=np.int8)
arr += 1
# or:
res = arr + 1
res.dtype == np.int8
which ensures that no upcast (for example with higher memory usage) occurs.
(emphasis mine.)
See also Allan Haldane's gist suggesting C-style type coercion, linked from the previous document:
Currently, when two dtypes are involved in a binary operation numpy's principle is that "the output dtype's range covers the range of both input dtypes", and when a single dtype is involved there is never any cast.
(emphasis again mine.)
So my understanding is that the promotion rules for numpy scalars and arrays differ, primarily because it's not feasible to check every element inside an array to determine whether casting can be done safely. Again from the former document:
Scalar based rules
Unlike arrays, where inspection of all values is not feasable, for scalars (and 0-D arrays) the value is inspected.
This would mean that you can either use np.int64 from the start to be safe (and if you're on linux then dtype=int will actually do this on its own), or check the maximum value of your arrays before suspect operations and determine if you have to promote the dtype yourself, on a case-by-case basis. I understand that this might not be feasible if you are doing a lot of calculations, but I don't believe there is a way around this considering numpy's current type promotion rules.
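The "check the maximum value first" approach could look roughly like this. The helper name scale_safely is hypothetical (it is not part of NumPy), and the sketch only covers multiplying an integer array by a Python int.

```python
import numpy as np

def scale_safely(arr, factor):
    """Hypothetical helper: multiply an integer array by a Python int,
    upcasting to int64 only when the current dtype would overflow."""
    info = np.iinfo(arr.dtype)
    # Compute the extreme products with Python ints, which cannot overflow
    products = (int(arr.min()) * factor, int(arr.max()) * factor)
    fits = (
        info.min <= factor <= info.max
        and min(products) >= info.min
        and max(products) <= info.max
    )
    if not fits:
        arr = arr.astype(np.int64)
    return arr * factor

arr = np.full(5, 2**24 + 300, dtype=np.int32)
big = scale_safely(arr, 2**8)                             # needs more than 32 bits
small = scale_safely(np.ones(5, dtype=np.int32), 2**8)    # fits in int32

print(big[0], big.dtype)
print(small[0], small.dtype)
```

The check costs one pass for min and one for max, so whether it beats unconditional upcasting depends on how often the upcast is actually needed.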

Python numpy array operations

I'm having a hard time learning Python array handling with numpy.
I have a .csv file in which one column contains unsigned integer data representing binary values from an analog-to-digital converter.
I would like to convert these unsigned integer values into a 12-bit binary representation using Python inside a Jupyter notebook.
I tried several ways of implementing it, but I still fail...
here is my code:
import pandas as pd
df = pd.read_csv('my_adc_values.csv', delimiter ='\s+', header=None, usecols=[19])
decimalValues = df.values
print(decimalValues.shape)
so far so good... I have all my adc data column values in the decimalValues numpy array.
Now, I would like to iterate through the array and convert the integers in the array to a binary representation:
import numpy as np
# destination array of shape of source array
binaryValues = np.zeros(decimalValues.shape)
for i in range(len(decimalValues)):
    print(decimalValues[i])
    binaryValues[i]=(bin(decimalValues[i]))

print(binaryValues)
With this code I get the error message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-890444040b2e> in <module>()
6 for i in range(len(decimalValues)):
7 print(decimalValues[i])
----> 8 binaryValues[i]=(bin(decimalValues[i]))
9
10 print(binaryValues)
TypeError: only integer scalar arrays can be converted to a scalar index
I tried several different solutions, but none of them worked. It seems as if I have a massive misunderstanding of numpy arrays.
I'm looking for a tip on how to solve the problem described above. I found some threads describing this error message, and I suspected it had something to do with the shape of the source and destination arrays. Therefore, I initialized the destination array with the same shape as the source. It did not help...
Thank you,
Maik
Numpy is primarily for working with numeric data, it doesn't give you much benefit when you're working with strings. Numpy can convert integers to decimal or hexadecimal strings, using the numpy.char.mod function, which utilises the old % string interpolation operator. Unfortunately, that doesn't support binary output. We can create a Numpy vectorized function that uses the standard Python format function to do the conversion. This is better than bin, since you don't get the leading '0b', and you can specify the minimum length.
import numpy as np
# Make some fake numeric data
nums = (1 << np.arange(1, 10)) - 1
print(nums)
# Convert to 12 bit binary strings
func = np.vectorize(lambda n: format(n, '012b'))
bins = func(nums)
print(bins)
output
[ 1 3 7 15 31 63 127 255 511]
['000000000001' '000000000011' '000000000111' '000000001111' '000000011111'
'000000111111' '000001111111' '000011111111' '000111111111']
Alternatively, do the conversion using plain Python. You can convert the result back to a Numpy array, if you really need that. This code uses the str.format method, rather than the format function used by the previous version.
bins = list(map('{:012b}'.format, nums))
What is causing the error in your case is that you are trying to apply a bin function on a slice, whereas it can only be applied on a single value. You might need an extra for loop to iterate over column values. Try changing your code in this way:
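# bin() returns a string, so the destination needs an object dtype rather than float
binaryValues = np.empty(decimalValues.shape, dtype=object)
for i in range(len(decimalValues)):
    for j in range(decimalValues.shape[1]):
        print(decimalValues[i])
        binaryValues[i, j] = bin(decimalValues[i, j])
print(binaryValues)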
Let me know if it works!
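For completeness, NumPy also provides np.binary_repr, which accepts a width argument and returns a zero-padded string without the '0b' prefix. A minimal sketch, assuming the data has shape (n, 1) like df.values in the question; the sample values are made up:

```python
import numpy as np

# Made-up stand-in for df.values: one column of ADC readings, shape (3, 1)
decimal_values = np.array([[5], [4095], [0]])

# Convert every element to a 12-character binary string, keeping the shape
binary_values = np.array(
    [np.binary_repr(int(v), width=12) for v in decimal_values.ravel()]
).reshape(decimal_values.shape)

print(binary_values)
```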

Are Decimal 'dtypes' available in NumPy?

Are Decimal data type objects (dtypes) available in NumPy?
>>> import decimal, numpy
>>> d = decimal.Decimal('1.1')
>>> s = [['123.123','23'],['2323.212','123123.21312']]
>>> ss = numpy.array(s, dtype=numpy.dtype(decimal.Decimal))
>>> a = numpy.array(s, dtype=float)
>>> type(d)
<class 'decimal.Decimal'>
>>> type(ss[1,1])
<class 'str'>
>>> type(a[1,1])
<class 'numpy.float64'>
I suppose numpy.array doesn't support every dtype, but I sort of thought that it would at least let a dtype propagate as far as it could as long as the right operations were defined. Am I missing something? Is there some way for this to work?
NumPy doesn't recognize decimal.Decimal as a specific type. The closest it can get is the most general dtype, object. So when converting the elements to the desired dtype, the conversion is a no-op and the strings stay strings.
>>> ss.dtype
dtype('object')
Keep in mind that because the elements of the array are Python objects, you won't get much of a speedup using them. For example, if you try to add this to any other array, the other elements will have to be boxed back into Python objects and added via the normal Python addition code. You might gain some speed in that the iteration will be in C, but not that much.
Unfortunately, you have to cast each of your items to Decimal when you create the numpy.array. Something like
s = [['123.123','23'],['2323.212','123123.21312']]
decimal_s = [[decimal.Decimal(x) for x in y] for y in s]
ss = numpy.array(decimal_s)
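Once the elements really are Decimal objects, element-wise arithmetic goes through Decimal's own operators. A short sketch that reuses the data from the question; dtype=object is passed explicitly here as a precaution rather than because it is known to be required:

```python
import decimal
import numpy as np

s = [['123.123', '23'], ['2323.212', '123123.21312']]
decimal_s = [[decimal.Decimal(x) for x in row] for row in s]
ss = np.array(decimal_s, dtype=object)

total = ss.sum()   # summed with Decimal addition, not float addition
doubled = ss * 2   # each element is multiplied as a Decimal

print(ss.dtype, type(ss[1, 1]))
print(total)
print(doubled[0, 0])
```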
Important caveat: this is a bad answer
You would probably do best to skip to the next answer.
It seems that Decimal is available:
>>> import decimal, numpy
>>> d = decimal.Decimal('1.1')
>>> a = numpy.array([d,d,d],dtype=numpy.dtype(decimal.Decimal))
>>> type(a[1])
<class 'decimal.Decimal'>
I'm not sure exactly what you are trying to accomplish. Your example is more complicated than is necessary for simply creating a decimal NumPy array.
