I came across an oddity when loading a .mat file created in MATLAB into Python with scipy.io.loadmat. I found similar 'array structures' alluded to in other posts, but none that explained them. I have also found ways to work around this oddity, but I would like to understand why Python (or scipy.io.loadmat) handles these files this way.
Let's say I create a cell array in MATLAB and save it:
my_data = cell(dim1, dim2);
% Fill my_data with strings and floats...
save('my_data.mat','my_data')
Now I load it into Python:
import scipy.io as sio
data = sio.loadmat('my_data.mat')['my_data']
Now data has type numpy.ndarray and dtype object. When I look at a slice, it might look something like this:
data[0]
>>> array([array(['Some descriptive string'], dtype='<U13'),
       array([[3.141592]]), array([[2.71828]]), array([[4.66920]]), etc.],
      dtype=object)
Why is this happening? Why does Python/sio.loadmat create an array of single-element arrays rather than an array of floats (assuming I remove the first column, which contains strings)?
I'm sorry if my question is basic, but I'd really like to understand what seems like an unnecessary complication.
As said in the comments:
This behaviour arises because you are saving a cell, an "array" that can contain anything inside, and you fill it with 1x1 matrices (floats).
That is exactly what Python gives you back: an ndarray of dtype=object that contains 1x1 arrays.
Python is doing exactly what MATLAB was doing. For this example, you should simply avoid using cells in MATLAB.
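If you are stuck with the cell-saved file, here is a minimal sketch of the usual workaround (my own illustration, assuming the strings sit in the first column and every other cell holds a single number):
import numpy as np
import scipy.io as sio
data = sio.loadmat('my_data.mat')['my_data']
# Pull the string out of each first-column cell...
labels = [row[0].item() for row in data]
# ...and collapse every remaining 1x1 array into a plain float.
numeric = np.array([[cell.item() for cell in row[1:]] for row in data])
item() extracts the single scalar that a 1x1 (or length-1) array wraps, which is exactly the unwrapping the question is about.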
I have a Python program that needs to pass an array to a .dll that expects an array of C doubles. This is currently done with the following code, which is the fastest method of conversion I could find:
from array import array
from ctypes import *
import numpy as np
python_array = np.array(some_python_array)
temp = array('d', python_array.astype('float'))
c_double_array = (c_double * len(temp)).from_buffer(temp)
...where 'np.array' is just there to show that in my case python_array is a numpy array. Let's say I now have two c_double arrays: c_double_array_a and c_double_array_b. The issue I'm having is that I would like to append c_double_array_b to c_double_array_a without converting back to/from whatever Python typically uses for arrays. Is there a way to do this with the ctypes library?
I've been reading through the ctypes docs, but nothing seems to detail combining two ctypes arrays after creation. It is very important in my program that they can be combined after creation; of course it would be trivial to just append python_array_b to python_array_a and then convert, but that won't work in my case.
Thanks!
P.S. if anyone knows a way to speed up the conversion code that would also be greatly appreciated, it takes on the order of 150ms / million elements in the array and my program typically handles 1-10 million elements at a time.
Leaving aside the construction of the ctypes arrays (for which Mark's comment is surely relevant), the issue is that C arrays are not resizable: you can't append to or extend them. (There do exist wrappers that provide these features, which may be useful references here.) What you can do is make a new array the size of the two existing arrays together and then ctypes.memmove them into it. It might be possible to improve performance by using realloc, but you'd have to go even lower than normal ctypes memory management to use it.
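A minimal sketch of that approach (the function name and variable names are mine, not from the question):
import ctypes
def concat_double_arrays(a, b):
    # Allocate one array big enough for both inputs.
    combined = (ctypes.c_double * (len(a) + len(b)))()
    # Copy a's bytes to the start of the new buffer...
    ctypes.memmove(combined, a, ctypes.sizeof(a))
    # ...and b's bytes immediately after them.
    ctypes.memmove(ctypes.addressof(combined) + ctypes.sizeof(a),
                   b, ctypes.sizeof(b))
    return combined
This still pays for one full copy of each input, but it never round-trips through a Python list. As for the P.S., the conversion itself can often be avoided entirely: np.ctypeslib.as_ctypes(np.ascontiguousarray(python_array, dtype=np.float64)) wraps the numpy buffer as a ctypes double array without copying.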
I am fairly new to using tensorflow, so it is possible there is a very obvious solution to my problem that I am missing. I currently have a 3-dimensional array filled with integer values. The specific values are not important, so for the sake of this question I have put in a smaller array with filler values:
Array = tf.constant([[[0,0,1000,0],[3000,3000,3000,3000],[0,2500,0,0]],
                     [[100,200,300,400],[0,0,0,100],[300,300,400,300]]]).eval()
So the array looks like this when printed, I believe:
[[[0,0,1000,0],
  [3000,3000,3000,3000],
  [0,2500,0,0]],
 [[100,200,300,400],
  [0,0,0,100],
  [300,300,400,300]]]
In reality this array has 23 2-D arrays stacked on top of each other. What I want to do is create one array (or three separate arrays) that contains the range of values in each row at the different levels of the 3-D array.
Something like
Xrange = tf.constant([Array[0,0,:].range(), Array[1,0,:].range(), Array[2,0,:].range(), ..., Array[22,0,:].range()])
Firstly, I am having trouble finding a set of tensorflow operations that, strung together, give me the range of a row. I know how to do this easily in numpy, but I have yet to find a tensorflow way. Secondly, assuming the above is possible, is there a way to consolidate the code so I don't have to write it out 23 times on one line, once for each unique row? I know that could be done with a for loop, but I would also like to avoid a solution that requires a loop. Is there a good way to do this, or is more information needed? Also, please let me know if I'm screwing up my syntax, since I'm still fairly new to both Python and tensorflow.
So, as I expected, my question has a reasonably simple answer. All that was necessary was to use the tf.reduce_max and tf.reduce_min commands.
The code I finally ended up with looks like:
Range = tf.subtract(tf.reduce_max(tf.constant(Array), axis=2, keep_dims=True),
                    tf.reduce_min(tf.constant(Array), axis=2, keep_dims=True))
This produced:
[[[1000]
[ 0]
[2500]]
[[ 300]
[ 100]
[ 100]]]
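For comparison, here is a sketch of the same row-wise range in plain numpy, using the example array above:
import numpy as np
arr = np.array([[[0, 0, 1000, 0],
                 [3000, 3000, 3000, 3000],
                 [0, 2500, 0, 0]],
                [[100, 200, 300, 400],
                 [0, 0, 0, 100],
                 [300, 300, 400, 300]]])
# max - min along the last axis; keepdims preserves the (2, 3, 1) shape
# so the result lines up with the TensorFlow output above.
row_range = arr.max(axis=2, keepdims=True) - arr.min(axis=2, keepdims=True)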
If I construct a numpy matrix like this:
from numpy import array
A = array([[1,2,3],[4,5,6]])
and then type A.shape, I get the result:
(2L, 3L)
Why am I getting a shape made up of longs?
I can restart everything and I still have the same problem. And as far as I can see, it is only when I construct arrays that I have this problem; otherwise I get short (regular) integers.
As @CédricJulien puts it in the comments, there is no problem with long numbers in this case - this should be treated as an implementation detail.
The real answer to your question can, of course, only be found in numpy's source code, but the long dimensions should not matter for anything you do with the arrays or these indexes. (This typically shows up on 64-bit Windows under Python 2, where numpy reports dimensions as pointer-sized integers, which are wider than the C long behind Python's plain int and therefore print as longs.)
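A quick check (my own illustration, on Python 2 where the int/long distinction exists) that the long dimensions behave like ordinary integers:
from numpy import array
A = array([[1, 2, 3], [4, 5, 6]])
rows, cols = A.shape
print(rows * cols)            # 6 - arithmetic works as usual
print(A[rows - 1, cols - 1])  # 6 - indexing works as usual
print(int(rows))              # explicit int() if an API insists on a plain int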
I have data in the format of a 10000x500 matrix contained in a .txt file. In each row, data points are separated from each other by a single whitespace, and each row ends with a newline.
Normally I was able to read this kind of multidimensional array data into Python by using the following snippet of code:
with open("position.txt") as f:
data = [line.split() for line in f]
# Get the data and convert to floats
ytemp = np.array(data)
y = ytemp.astype(np.float)
This code worked until now. When I try to use the exact same code with another set of data formatted in the same way, I get the following error:
setting an array element with a sequence.
When I try to get the 'shape' of ytemp, it gives me the following:
(10001,)
So it converts the rows to an array, but not the columns.
I tried to think of other information to include, but nothing came to mind. Basically I'm trying to convert my data from a .txt file to a multidimensional array in Python. The code worked before, but now, for some reason that is unclear to me, it doesn't. I tried to compare the data; of course it's huge, but everything seems quite similar between the data that works and the data that doesn't.
I would be more than happy to provide any other information you may need. Thanks in advance.
Use numpy's built-in function:
data = numpy.loadtxt('position.txt')
Check out the documentation to explore other available options.
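As a side note, a shape of (10001,) with dtype=object usually means the rows do not all have the same number of columns (for example, a stray extra line or a broken row), so numpy falls back to a 1-D array of lists instead of a 2-D float array. Here is a small sketch of mine, assuming the file is otherwise as described, to locate the offending row:
with open("position.txt") as f:
    rows = [line.split() for line in f]
expected = len(rows[0])
for i, row in enumerate(rows):
    if len(row) != expected:
        print("line", i + 1, "has", len(row), "columns, expected", expected)
numpy.loadtxt also refuses ragged input outright rather than silently building an object array, which is another reason to prefer it here.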