Converting integers between Python (numpy) and Matlab

I have some Python code where I'm using numpy to do some calculations.
In my code I have an integer variable 'N' that I use as an index.
I then use scipy.io.savemat to pass this index to my Matlab (2017a) code, and I get this error when doing something like this:
% N is the int variable I have from my python code
>> N
N =
int64
1792
>> (1:2)/N
Error using /
Integers can only be combined with integers of the same class, or scalar doubles.
Apparently Matlab's "native" integers are of class 'double'. By that I mean:
>> N=3
N =
3
>> class(N)
ans =
'double'
Had I assigned 'N' in my Matlab code, I wouldn't have a problem running the code above. But I assign my variables in Python and then convert them for Matlab. Trying to use numpy.double for 'N' in my Python code results in numpy warnings, e.g.:
>>> N = numpy.double(100)
>>> numpy.zeros(N)
VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
To sum it up, I want to use 'N' as an integer in my Python code, and then as a double (or whatever behaves like a natively typed Matlab number) in my Matlab code.
I thought savemat would do this conversion for me, but presumably for good reasons it doesn't.
The obvious solution would be to convert the integer to a double before or after the serialization. But since I have a huge dictionary with many different variables, that would require tracking the type of each one, which I'd really like to avoid. I'm looking for a simpler, more "native" solution.
How would you suggest solving this matter? Thanks.

As a follow-up to my comment, I tried saving some values from numpy and loading them with Octave:
In [552]: from scipy import io
In [553]: io.savemat('test.mat',dict(N=100, fN=float(100),x=np.arange(100)))
In Octave:
>> load test.mat
>> (1:2)/N
ans =
0 0
>> (1:2)/fN
ans =
0.010000 0.020000
>> (1:2)/100
ans =
0.010000 0.020000
>> M=100
M = 100
>> whos
Variables in the current scope:
Attr Name        Size                     Bytes  Class
==== ====        ====                     =====  =====
     M           1x1                          8  double
     N           1x1                          4  int32
     ans         1x2                         16  double
     fN          1x1                          8  double
     x           1x100                      400  int32
So yes, saving Python numbers (and arrays) as floats is the closest thing to writing those values in MATLAB/Octave.
>> (1:2)/double(N)
ans =
0.010000 0.020000
also works, converting the imported integer to a double after loading.
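If a one-off conversion pass is acceptable after all, a minimal sketch (the dictionary below is just a placeholder) is to walk the dict once before savemat and cast any integer scalars to float, so MATLAB receives doubles:

import numpy as np
from scipy import io

data = dict(N=1792, x=np.arange(100))   # placeholder variables

# Cast plain Python / numpy integer scalars to float; leave everything else alone.
converted = {key: float(val) if isinstance(val, (int, np.integer)) else val
             for key, val in data.items()}

io.savemat('test.mat', converted)

After load test.mat in MATLAB/Octave, N is then a 1x1 double and (1:2)/N works without an explicit cast.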

Related

Ensure that calculations are done in 64 bits (or at least warn of overflow)

I am using python and NumPy. I had the following basic quantity to compute:
(QL * (7**k))**2
Where
QL = 200003
k = 4
What puzzled me is that it returned a wrong (negative) number, which doesn't make sense. Then I realised after looking on the internet that the problem was because k was a 32-bit numpy integer.
A minimal working example can be the following:
import numpy as np
QL = 200000
k = np.arange(10)[4]
print((QL * 7**k)**2)
This returns 406556672 instead of the correct answer 230592040000000000. The number is not negative here, but the same problem still occurs.
My question is:
How can I make sure that all the numbers used in my code are of the biggest possible integer size?
I don't want to explicitly specify it for each number that I create.
How can I at least force python to warn me when such things happen?
When you write QL = 200003; k = 4 in Python, the numbers are interpreted as ints. By default, if you were to convert these into numpy arrays or scalars, you would end up with whatever the default integer type is on your system.
Here is an example using one-element arrays:
QL = np.array([200003])
k = np.array([4])
On my system, I find that the dtype of both arrays is int32. You can change that by selecting your preferred dtype:
QL = np.array([200003], dtype=np.int64)
k = np.array([4], dtype=np.int64)
If you don't have access to the arrays at creation time, you can always convert them:
QL = QL.astype(np.int64)
k = k.astype(np.int64)
An option that is worth considering for integer math is skipping numpy altogether and using Python's infinite precision integers. If one of the numbers is a numpy scalar or one-element array, you can retrieve the corresponding Python object using the item method:
QL = QL.item()
k = k.item()
Numpy should raise at least a warning for overflow, but apparently this fails for some operations: https://github.com/numpy/numpy/issues/8987
TL;DR
In your case, k is a numpy scalar of type int32. You can do either one of the following:
For a numpy 64-bit result:
k = np.int64(k)
For an infinite-precision Python result:
k = k.item()
If you don't want to cast each k explicitly, you can create the range using the correct type:
k = np.arange(10, dtype=np.int64)[4]
There is no reliable way to set the default integer type for all new arrays without specifying it explicitly.
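A minimal sketch comparing the three options above, using the values from the question (on a platform where the default integer is already 64-bit, the first print happens to be correct as well):

import numpy as np

QL = 200000
k = np.arange(10)[4]              # may be int32, e.g. on Windows builds

print((QL * 7**k)**2)             # can overflow silently with a 32-bit k
print((QL * 7**np.int64(k))**2)   # 64-bit numpy arithmetic: 230592040000000000
print((QL * 7**k.item())**2)      # plain Python ints, arbitrary precision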

Python numpy array operations

I'm having a hard time learning Python array handling with numpy.
I have a .csv file which contains, in one column, unsigned integer data representing binary values from an analog-to-digital converter.
I would like to convert these unsigned integer values to a 12-bit binary representation using Python inside a Jupyter notebook.
I tried several ways of implementing it, but I still fail...
here is my code:
import pandas as pd
df = pd.read_csv('my_adc_values.csv', delimiter=r'\s+', header=None, usecols=[19])
decimalValues = df.values
print(decimalValues.shape)
so far so good... I have all my adc data column values in the decimalValues numpy array.
Now, I would like to iterate through the array and convert the integers in the array to a binary representation:
import numpy as np
# destination array of shape of source array
binaryValues = np.zeros(decimalValues.shape)
for i in range(len(decimalValues)):
    print(decimalValues[i])
    binaryValues[i] = bin(decimalValues[i])
print(binaryValues)
With this code I get the error message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-890444040b2e> in <module>()
6 for i in range(len(decimalValues)):
7 print(decimalValues[i])
----> 8 binaryValues[i]=(bin(decimalValues[i]))
9
10 print(binaryValues)
TypeError: only integer scalar arrays can be converted to a scalar index
I tried several different solutions, but none of them worked. It seems as if I have a massive misunderstanding of numpy arrays.
I'm looking for a tip on how to solve the described problem. I found some threads describing the mentioned error message. I suspected it had something to do with the shape of the source/destination arrays; therefore, I initialized the destination array with the same shape as the source. It did not help...
Thank you,
Maik
Numpy is primarily for working with numeric data, it doesn't give you much benefit when you're working with strings. Numpy can convert integers to decimal or hexadecimal strings, using the numpy.char.mod function, which utilises the old % string interpolation operator. Unfortunately, that doesn't support binary output. We can create a Numpy vectorized function that uses the standard Python format function to do the conversion. This is better than bin, since you don't get the leading '0b', and you can specify the minimum length.
import numpy as np
# Make some fake numeric data
nums = (1 << np.arange(1, 10)) - 1
print(nums)
# Convert to 12 bit binary strings
func = np.vectorize(lambda n: format(n, '012b'))
bins = func(nums)
print(bins)
output
[ 1 3 7 15 31 63 127 255 511]
['000000000001' '000000000011' '000000000111' '000000001111' '000000011111'
'000000111111' '000001111111' '000011111111' '000111111111']
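Applied to the question's data, the same vectorized function can be used directly on the pandas result (decimalValues there has shape (n, 1); np.vectorize broadcasts over it, so the output keeps that shape):

binaryValues = func(decimalValues)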
Alternatively, do the conversion using plain Python. You can convert the result back to a Numpy array, if you really need that. This code uses the str.format method, rather than the format function used by the previous version.
bins = list(map('{:012b}'.format, nums))
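Another option worth mentioning (not part of the original answer): numpy.binary_repr takes a width argument directly, so the same padded strings can be produced without format strings:

import numpy as np

nums = (1 << np.arange(1, 10)) - 1
# binary_repr pads to the requested width, much like format(n, '012b')
bins = [np.binary_repr(n, width=12) for n in nums]
print(bins)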
What is causing the error in your case is that you are applying the bin function to a slice (a whole row), whereas it can only be applied to a single value. You need an extra for loop to iterate over the column values. Try changing your code in this way:
binaryValues = np.zeros(decimalValues.shape, dtype=object)  # object dtype so the array can hold strings
for i in range(len(decimalValues)):
    for j in range(decimalValues.shape[1]):
        print(decimalValues[i, j])
        binaryValues[i, j] = bin(decimalValues[i, j])
print(binaryValues)
Let me know if it works!

Convert double number to float number

I need to do some processing of MATLAB data in Python.
The data is stored as an array of doubles in Matlab. When I retrieve it, despite it being stated here that Matlab double data types are converted to Python floats when handled by Python, I get this error:
TypeError: unorderable types: double() < float()
What I'm trying to do is this
import matlab.engine
eng = matlab.engine.connect_matlab()
x = eng.workspace['MyData']
x = x[len(x)-1]
if x < 0.01:
    # do stuff
How can I convert the double number stored in the array to a float so that I can use it alongside my other Python variables?
Converting doubles into floats in Matlab is as simple as calling the single function:
A = rand(10);
whos A;
B = single(A);
whos B;
As per console output:
Name      Size            Bytes  Class     Attributes
A        10x10              800  double

Name      Size            Bytes  Class     Attributes
B        10x10              400  single
Be careful about the loss of precision, since you are converting 64 bit numeric values into 32 bit numeric values.
EDIT
Since you can't manipulate your Matlab data, in order to accomplish this I suggest you use either Numpy (refer to this function in that case: https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.ndarray.astype.html) or struct for a straight conversion (refer to this answer in that case: convert double to float in Python).
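On the Python side, a minimal sketch of such a cast (untested here, and it assumes the indexing from the question yields a single element that float() can convert):

import matlab.engine

eng = matlab.engine.connect_matlab()
x = eng.workspace['MyData']   # comes back as a matlab.double container
x = float(x[len(x)-1])        # cast the element to a plain Python float
if x < 0.01:
    pass  # do stuff with an ordinary float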

Sum of consecutive integers in numpy is incorrect

In summing the first 100,000,000 positive integers using the following:
import numpy as np
np.arange(1,100000001).sum()
I get 987459712, which does not match the formula N(N+1)/2 for N = 100000000. Namely, the formula returns 5000000050000000.
Before posting, I wrote the following, which returns True:
np.arange(1,65536).sum() == ((65535+1) * 65535)/2
However, the number 65536 seems to be a critical point, as
np.arange(1,65537).sum() == ((65536+1) * 65536)/2
returns False.
For integers greater than 65536 the code returns False, whereas integers below this threshold return True.
Could someone explain either what I've done wrong in calculating the sum, or what is going on with the code?
Seems like numpy sometimes has a hard time guessing the correct datatype.
On my system, Win 10 64-bit, Python 3.4.4, numpy 1.13.1:
>>> np.arange(1, 100000001).sum()
987459712
>>> np.arange(1, 100000001).dtype
dtype('int32')
But, if we "help" numpy it gets the correct result:
>>> np.arange(1, 100000001, dtype=np.int64).sum()
5000000050000000
The wrong result is obviously due to 32-bit integer overflowing.
It isn't really that numpy has a hard time guessing things, it's just that the default int type is the same as C long type:
int_: Default integer type (same as C long; normally either int64 or int32)
For Windows systems, longs are 32-bit even on 64-bit builds (see here for more), so int32 is what's used by default.
As DeepSpace suggested, setting dtype to int64 does the trick. This can be done either in arange or in the sum method.
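For example (a small sketch of both variants; either one keeps the accumulation in 64 bits):

import numpy as np

# Request the wider dtype when creating the range...
print(np.arange(1, 100000001, dtype=np.int64).sum())   # 5000000050000000

# ...or only for the summation itself.
print(np.arange(1, 100000001).sum(dtype=np.int64))     # 5000000050000000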
Additionally, you wrote:
Before posting, I wrote the following, which returns True:
np.arange(1,65536).sum() == ((65535+1) * 65535)/2
However, the number 65536 seems to be a critical point, as
np.arange(1,65537).sum() == ((65536+1) * 65536)/2
returns False.
and this is explained by the fact that the second sum exceeds int32's max value while the first doesn't:
>>> np.arange(1,65536).sum() < np.iinfo(np.int32).max
True
>>> np.arange(1,65537).sum() < np.iinfo(np.int32).max
False
Of course the pure Python calculation is correct, thanks to Python 3's arbitrary-precision ints.
This is also why many of us weren't able to reproduce the issue. On most Unixes the default int size for 64-bit machines is int64 (since the C long is 64 bits there), so the sum of those ints was equal to the expected value.

Sum of positive numbers results in a negative number

I am using numpy to do the always fun "count the triangles in an adjacency matrix" task. (Given an nxn Adjacency matrix, how can one compute the number of triangles in the graph (Matlab)?)
Given my matrix A, numpy.matmul() computes the cube of A without problem, but for a large matrix numpy.trace() returns a negative number.
I extracted the diagonal using numpy.diagonal() and summed the entries using math.sum() and also using a for loop -- both returned the same negative number as numpy.trace().
An attempt with math.fsum() finally returned the (presumably correct) number 4,088,103,618 -- a seemingly small number for both Python and my 64-bit operating system, especially since the Python docs claim integer values are unlimited.
Surely this is an overflow or undefined-behavior issue, but where does the inconsistency come from? I have performed the test in the following post to validate my system architecture as 64-bit, and therefore numpy should also be a 64-bit package.
Do I have Numpy 32 bit or 64 bit?
To visualize the summation process, print statements were added to the for loop; the output appears as follows, with an asterisk marking the interesting line.
.
.
.
adding diag val 2013124 to the running total 2140898426 = 2142911550
adding diag val 2043358 to the running total 2142911550 = 2144954908
adding diag val 2035410 to the running total 2144954908 = 2146990318
adding diag val 2000416 to the running total 2146990318 = -2145976562 *
adding diag val 2062276 to the running total -2145976562 = -2143914286
adding diag val 2092890 to the running total -2143914286 = -2141821396
adding diag val 2092854 to the running total -2141821396 = -2139728542
.
.
.
Why would adding 2000416 to 2146990318 create an overflow? The sum is only 2148990734 -- a very small number for python!
Numpy doesn't use the Python types but rather the underlying C types, which you have to specify so that they meet your needs. By default, an array of integers will be given the int_ type, which, from the docs, is:
int_ Default integer type (same as C long; normally either int64 or int32)
Hence why you're seeing the overflow. You'll have to specify some other type when you construct your array so that it doesn't overflow.
When you do the addition with scalars you probably get a Warning:
>>> import numpy as np
>>> np.int32(2146990318) + np.int32(2035410)
RuntimeWarning: overflow encountered in long_scalars
-2145941568
So yes, it is overflow related. The maximum 32-bit integer is 2,147,483,647!
To make sure your arrays support a bigger range of values you could cast the array (I assume you operate on an array) to int64 (or a floating point value):
array = array.astype('int64') # makes sure the values are 64 bit integers
or when creating the array:
import numpy as np
array = np.array(something, dtype=np.int64)
NumPy uses fixed-size integers, and these aren't arbitrary-precision integers. By default it's either a 32-bit or a 64-bit integer; which one depends on your system. For example, Windows uses int32 even when Python + numpy is compiled for 64 bits.
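As a sketch of how the astype fix plays out for the triangle-counting task (the adjacency matrix below is a made-up placeholder; the point is widening to int64 before cubing so the trace cannot wrap around):

import numpy as np

# Hypothetical symmetric 0/1 adjacency matrix with an empty diagonal.
rng = np.random.default_rng(0)
A = np.triu(rng.integers(0, 2, size=(500, 500), dtype=np.int32), k=1)
A = A + A.T

A64 = A.astype(np.int64)        # widen before the matrix power
A3 = A64 @ A64 @ A64
triangles = np.trace(A3) // 6   # each triangle appears 6 times on the diagonal
print(triangles)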
