This question already has answers here:
numpy.sum() giving strange results on large arrays
(4 answers)
Closed 5 years ago.
I am using numpy like this:
>>> import numpy as np
>>> a = np.arange(1, 100000001).sum()
>>> a
987459712
I expected the result to be something like
5000000050000000
I noticed that for smaller ranges (up to about five digits) the result is correct. Does someone know what is happening?
regards
NumPy is not making a mistake here. This phenomenon is known as integer overflow.
x = np.arange(1,100000001)
print(x.sum()) # 987459712
print(x.dtype) # int32
The 32 bit integer type used in arange for the given input simply cannot hold 5000000050000000. At most it can take 2147483647.
If you explicitly use a larger integer or floating point data type you get the expected result.
a = np.arange(1, 100000001, dtype='int64').sum()
print(a) # 5000000050000000
a = np.arange(1.0, 100000001.0).sum()
print(a) # 5000000050000000.0
I suspect you are using Windows, where the data type of the result is a 32 bit integer (while for those using, say, Mac OS X or Linux, the data type is 64 bit). Note that 5000000050000000 % (2**32) = 987459712
Try using
a = np.arange(1, 100000001, dtype=np.int64).sum()
or
a = np.arange(1, 100000001).sum(dtype=np.int64)
P.S. Anyone not using Windows can reproduce the result as follows:
>>> np.arange(1, 100000001).sum(dtype=np.int32)
987459712
This question already has an answer here:
Different slices give different inequalities for same elements
(1 answer)
Closed 1 year ago.
import numpy as np
np.random.seed(2)
x = np.random.randn(1000000).astype('float32')
print(float(np.linalg.norm(x, keepdims=1)**2))
print(float(np.linalg.norm(x, keepdims=0)**2))
998428.125
998428.1084311157
Reproduced in Colab. Also, Colab outputs different values than my CPU:
998425.0625
998425.059075091
Removing **2, they match. Also reproduced with sum, haven't tried other methods.
Why this behavior? I can understand device dependence but keepdims seems buggy.
This is because with keepdims=0 the result is a float32 NumPy scalar, which **2 promotes to float64. With keepdims=1 the result is still an array (of shape (1,)), and NumPy does not apply the same promotion to arrays, so it stays float32.
>>> print(
...     np.linalg.norm(x, keepdims=1).dtype,
...     np.linalg.norm(x, keepdims=0).dtype
... )
float32 float32
>>> print(
...     (np.linalg.norm(x, keepdims=1)**2).dtype,
...     (np.linalg.norm(x, keepdims=0)**2).dtype
... )
float32 float64
The version of numpy I'm using is 1.20.3.
I could not find this behavior explained anywhere in the NumPy documentation. I think opening an issue in NumPy's GitHub repository might be a good idea.
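Whatever a given NumPy version's promotion rules do after **2, the norms themselves agree before squaring, which you can check directly. A sketch (the post-**2 dtype varies across NumPy versions, so it is not asserted here):

```python
import numpy as np

np.random.seed(2)
x = np.random.randn(1000000).astype('float32')

a = np.linalg.norm(x, keepdims=True)   # shape-(1,) float32 array
b = np.linalg.norm(x, keepdims=False)  # float32 scalar

# Before squaring, both results are float32 and numerically identical;
# only the dtype after **2 differed between the two.
print(a.dtype, np.asarray(b).dtype)
print(float(a[0]) == float(b))  # True
```

Comparing (or casting) before the `**2` is a simple way to sidestep the discrepancy.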
A trainee of mine made this short python script for a problem from Project Euler.
We are using Python 3.9.4
Problem: The series, 1^1 + 2^2 + 3^3 + ... + 10^10 = 10405071317. Find the last ten digits of the series, 1^1 + 2^2 + 3^3 + ... + 1000^1000.
It isn't the best solution, but in theory it should work for small values. (I know it won't work for bigger values; it's too inefficient, and even a double is too small.)
Here is her code:
import math
import numpy as np
x = np.arange(1, 11)
a = []
for y in x:
    z = y**y
    a.append(z)
b = sum(a)
print(a)
Output:
[1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489, 1410065408]
The script isn't finished yet, obviously, but you can see that every power (1^1, 2^2, 3^3, ...) is correct except the last one: 10^10 does not return the correct value.
Do you see any reason for this problem? It seems quite odd to me; I could not figure it out.
Yes, I know this isn't the best solution, and in the end the actual solution will be different, but it would still be nice to solve this mystery.
You're probably on Windows (or another platform where numpy defaults to 32-bit integers). This causes the result to be "truncated" to 32 bits. You can verify the wrong last element with the following expression:
(10 ** 10) % (2 ** 32)
Use the Python built-in int and range unless you need the fancy stuff numpy provides. Python's int is an arbitrary-precision integer implementation and works for all kinds of integer calculation workloads.
The simple solution is to not use numpy unnecessarily. The built-in range works very well:
a = []
for y in range(1, 11):
    z = y**y
    a.append(z)
b = sum(a)
print(a)
print(b)
In your case, numpy was using 32 bit integers, which overflowed when they reached their maximum value.
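For the full problem (the last ten digits of 1^1 + ... + 1000^1000), plain Python ints are enough, and three-argument pow keeps every intermediate value small. A sketch:

```python
# Last ten digits of 1^1 + 2^2 + ... + 1000^1000.
# Python ints are arbitrary precision, and pow(n, n, MOD)
# reduces modulo MOD at every step, so nothing overflows.
MOD = 10**10
total = sum(pow(n, n, MOD) for n in range(1, 1001)) % MOD
print(str(total).zfill(10))

# sanity check against the example series given in the problem
assert sum(n**n for n in range(1, 11)) == 10405071317
```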
I am having a hard time learning Python array handling with numpy.
I have a .csv file which contains, in one column, unsigned integer data representing binary values from an analog-to-digital converter.
I would like to convert these unsigned integer values to a 12-bit binary representation using Python inside a Jupyter notebook.
I tried several ways of implementing it, but I still fail...
here is my code:
import pandas as pd
df = pd.read_csv('my_adc_values.csv', delimiter=r'\s+', header=None, usecols=[19])
decimalValues = df.values
print(decimalValues.shape)
So far so good... I have all my ADC data column values in the decimalValues numpy array.
Now, I would like to iterate through the array and convert the integers in the array to a binary representation:
import numpy as np
# destination array with the shape of the source array
binaryValues = np.zeros(decimalValues.shape)
for i in range(len(decimalValues)):
    print(decimalValues[i])
    binaryValues[i] = bin(decimalValues[i])
print(binaryValues)
With this code I get the error message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-890444040b2e> in <module>()
6 for i in range(len(decimalValues)):
7 print(decimalValues[i])
----> 8 binaryValues[i]=(bin(decimalValues[i]))
9
10 print(binaryValues)
TypeError: only integer scalar arrays can be converted to a scalar index
I tried several different solutions, but none of them worked. It seems as if I have a massive misunderstanding of numpy arrays.
I'm looking for a tip on how to solve my described problem. I found some threads describing the mentioned error message. I suspected it had something to do with the shape of the source/destination arrays; therefore, I initialized the destination array with the same shape as the source. It did not help...
Thank you,
Maik
Numpy is primarily for working with numeric data, it doesn't give you much benefit when you're working with strings. Numpy can convert integers to decimal or hexadecimal strings, using the numpy.char.mod function, which utilises the old % string interpolation operator. Unfortunately, that doesn't support binary output. We can create a Numpy vectorized function that uses the standard Python format function to do the conversion. This is better than bin, since you don't get the leading '0b', and you can specify the minimum length.
import numpy as np
# Make some fake numeric data
nums = (1 << np.arange(1, 10)) - 1
print(nums)
# Convert to 12 bit binary strings
func = np.vectorize(lambda n: format(n, '012b'))
bins = func(nums)
print(bins)
output
[ 1 3 7 15 31 63 127 255 511]
['000000000001' '000000000011' '000000000111' '000000001111' '000000011111'
'000000111111' '000001111111' '000011111111' '000111111111']
Alternatively, do the conversion using plain Python. You can convert the result back to a Numpy array, if you really need that. This code uses the str.format method, rather than the format function used by the previous version.
bins = list(map('{:012b}'.format, nums))
What is causing the error in your case is that you are applying bin to a whole slice, whereas it can only be applied to a single value. You need an extra for loop to iterate over the column values. Try changing your code this way:
# the destination must hold strings, so use dtype=object instead of np.zeros
binaryValues = np.empty(decimalValues.shape, dtype=object)
for i in range(len(decimalValues)):
    for j in range(decimalValues.shape[1]):
        print(decimalValues[i])
        binaryValues[i, j] = bin(decimalValues[i, j])
print(binaryValues)
Let me know if it works!
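As another option, numpy's own np.binary_repr takes a width argument and avoids the '0b' prefix that bin produces. A sketch on some made-up values:

```python
import numpy as np

vals = np.array([1, 3, 7, 4095])
# binary_repr pads with leading zeros up to the requested width
bins = [np.binary_repr(v, width=12) for v in vals]
print(bins)  # ['000000000001', '000000000011', '000000000111', '111111111111']
```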
I have a python code where I'm using numpy to do some calculations.
In my code I have some integer variable 'N' I use as an index.
I then use scipy.io.savemat to use this index in my Matlab (2017a) code, and I get this error when doing something like that:
% N is the int variable I have from my python code
>> N
N =
int64
1792
>> (1:2)/N
Error using /
Integers can only be combined with integers of the same class, or scalar doubles.
Apparently Matlab "native integers" are of class 'double'. By which I mean:
>> N=3
N =
3
>> class(N)
ans =
'double'
Had I assigned N in my Matlab code, I wouldn't have had problems running the above code. But I assign my variables in Python and then convert them to Matlab. Trying to use numpy.double for N in my Python code results in numpy warnings, e.g.:
>>> N = numpy.double(100)
>>> numpy.zeros(N)
VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
To sum it up, I want to use N as an integer in my Python code, then as a double (or whatever works as a native Matlab number) in my Matlab code.
I thought savemat would do this conversion for me, but (probably for other good reasons) it doesn't.
The obvious solution would be to convert the integer to double before or after the serialization. But since I have a huge dictionary with many different variables, that would require tracking the type of each, which I'd really like to avoid. I'm looking for a simpler, more "native" solution.
How would you suggest solving this? Thanks!
As a followup to my comment, I tried saving some values with scipy and loading them in Octave:
In [552]: from scipy import io
In [553]: io.savemat('test.mat',dict(N=100, fN=float(100),x=np.arange(100)))
In Octave:
>> load test.mat
>> (1:2)/N
ans =
0 0
>> (1:2)/fN
ans =
0.010000 0.020000
>> (1:2)/100
ans =
0.010000 0.020000
>> M=100
M = 100
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
M 1x1 8 double
N 1x1 4 int32
ans 1x2 16 double
fN 1x1 8 double
x 1x100 400 int32
So yes, saving Python numbers (and arrays) as floats is the closest thing to writing those values in MATLAB/Octave.
>> (1:2)/double(N)
ans =
0.010000 0.020000
also works, converting the imported value to double on the Octave side.
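If tracking individual variables is the concern, one low-effort option is to walk the dictionary once before savemat and promote scalar integers to floats. A sketch (promote_ints is just an illustrative name, not a scipy function):

```python
import numpy as np

def promote_ints(d):
    """Copy a dict, converting scalar integers to Python floats so
    they arrive in MATLAB/Octave as doubles after scipy.io.savemat."""
    return {k: float(v) if isinstance(v, (int, np.integer)) else v
            for k, v in d.items()}

data = {'N': 1792, 'x': np.arange(100)}
print(promote_ints(data)['N'])  # 1792.0
```

Arrays are left untouched here; apply the same idea per-element if integer arrays also need to arrive as doubles.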
I read the post "Is floating point math broken?" and understand why it happens, but I couldn't find a solution that could help me.
How can I do the correct subtraction?
Python version 2.6.6, Numpy version 1.4.1.
I have two numpy.ndarray each one contain float32 values, origin and new. I'm trying to use numpy.subtract to subtract them but I get the following (odd) result:
>>> import numpy as np
>>> with open('base_R.l_BSREM_S.9.1_001.bin', 'r+') as fid:
...     origin = np.fromfile(fid, np.float32)
>>> with open('new_R.l_BSREM_S.9.1_001.bin', 'r+') as fid:
...     new = np.fromfile(fid, np.float32)
>>> diff = np.subtract(origin, new)
>>> origin[5184939]
0.10000000149011611938
>>> new[5184939]
0.00000000023283064365
>>> diff[5184939]
0.10000000149011611938
Also, when I subtract the arrays at index 5184939 directly I get the same result as diff[5184939]:
>>> origin[5184939] - new[5184939]
0.10000000149011611938
But when I do the following I get this results:
>>> 0.10000000149011611938 - 0.00000000023283064365
0.10000000125728548
and that's not equal to diff[5184939]
How can the right subtraction be done? (0.10000000125728548 is the value that I need.)
Please help, and thanks in advance.
You might add your Python and numpy versions to the question.
Differences can arise from the np.float32 vs np.float64 dtype, the default Python float type, as well as display standards; numpy uses different display rounding than the underlying Python.
The subtraction itself does not differ.
I can reproduce the 0.10000000125728548 value, which may also display as 0.1 (to 8 decimals).
I'm not sure where the 0.10000000149011611938 comes from. That looks as though new[5184939] was identically 0, not just something small like 0.00000000023283064365.
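The reported effect is reproducible without the data files: the second value is smaller than half a float32 ulp of the first, so float32 subtraction returns the first operand unchanged, while float64 keeps the difference. A sketch:

```python
import numpy as np

a32 = np.float32(0.10000000149011612)
b32 = np.float32(2.3283064365e-10)

# b32 is below half an ulp of a32 in float32 (ulp of 0.1 is about 7.5e-9),
# so the subtraction rounds straight back to a32
print(a32 - b32 == a32)       # True
# in float64 the tiny term survives
print(float(a32) - float(b32))
```

Casting both arrays to float64 before subtracting (e.g. `origin.astype(np.float64) - new.astype(np.float64)`) gives the 0.10000000125728548-style result the asker wants.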