Converting Python Dictionary to 3D Matlab Matrix - python

I have the following dictionary results_dict in Python 3.2 where the key field is a string value and the value field is a list of 3 arrays. Each array has 400 float values. I want to convert this dictionary into a data structure that can be used in Matlab 2017b. However, if I execute the following:
savemat('GridCellResults.mat', results_dict, oned_as='row');
The command executes successfully but Matlab is not able to understand the matrix file. For this reason, I wrote the following code to convert the previous dictionary into a 3 Dimensional Matrix (X,Y,Z) where X is the size of the array (400 Elements) and Y is the number of arrays for each dictionary key (3 Arrays) and Z is the number of elements in the dictionary. However, when I execute the code below I get the following error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Here is the code. Any clue why I am getting this error? Even if I try without the transpose function, I keep getting the same error.
import numpy as np
CARDINALITY = 400 # Number of angular domain values.
NUM_COLUMNS = 3
NUM_CELLS = 114
matlab_array = np.zeros((CARDINALITY, NUM_COLUMNS, NUM_CELLS))
for key, value in results_dict.items():
    matlab_array[:, 0, key] = np.transpose(value[0])
    matlab_array[:, 1, key] = np.transpose(value[1])
    matlab_array[:, 2, key] = np.transpose(value[2])

Trying to follow your description, I can successfully write and read such a dictionary.
In an ipython session:
In [48]: from scipy.io import savemat, loadmat
In [49]: adict = {'a':[np.arange(3),np.ones(3),np.array([4,2,1])]}
In [50]: adict['b'] = [np.arange(3),np.ones(3),np.array([4,2,1])]
In [51]: adict
Out[51]:
{'a': [array([0, 1, 2]), array([1., 1., 1.]), array([4, 2, 1])],
'b': [array([0, 1, 2]), array([1., 1., 1.]), array([4, 2, 1])]}
In [52]: pwd
Out[52]: '/home/paul/mypy'
In [53]: savemat('stack48385062.mat',adict, oned_as='row')
In [54]: data = loadmat('stack48385062.mat')
In [55]: data
Out[55]:
{'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Mon Jan 22 09:15:31 2018',
'__version__': '1.0',
'a': array([[0., 1., 2.],
[1., 1., 1.],
[4., 2., 1.]]),
'b': array([[0., 1., 2.],
[1., 1., 1.],
[4., 2., 1.]])}
The lists of arrays (of constant size) were converted to 2d arrays.
In an Octave session:
>> load stack48385062.mat
>> a
a =
0 1 2
1 1 1
4 2 1
>> b
b =
0 1 2
1 1 1
4 2 1
>>
Or creating your 3d array (using a numeric index rather than string key):
In [56]: M=np.zeros([3, 3, 2])
In [57]: for i in range(len(adict)):
    ...:     for j in range(3):
    ...:         v = adict[list(adict.keys())[i]]
    ...:         M[:, j, i] = v[j]
    ...:
In [58]: M
Out[58]:
array([[[0., 0.],
[1., 1.],
[4., 4.]],
[[1., 1.],
[1., 1.],
[2., 2.]],
[[2., 2.],
[1., 1.],
[1., 1.]]])
After saving M to stack48385062_1.mat, in an Octave session:
>> load stack48385062_1.mat
>> M
M =
ans(:,:,1) =
0 1 4
1 1 2
2 1 1
ans(:,:,2) =
0 1 4
1 1 2
2 1 1
I should have made the initial dictionary with lists of 3 arrays of 4 elements each, so it would be easier to track transpositions. MATLAB and numpy have different axis orders, which can be confusing. savemat tries to compensate.
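As for the IndexError in the question: the loop uses the dictionary key, which is a string, as the third index of matlab_array, and numpy only accepts integers (or slices and the like) there. A minimal sketch of one way around it, assuming a small made-up results_dict (the keys and values below are hypothetical stand-ins for the real data), is to fix a key order and stack the arrays:

```python
import numpy as np

# Hypothetical stand-in for results_dict: string keys, each mapping to a list of 3 arrays.
results_dict = {
    "cell_a": [np.arange(4.0), np.ones(4), np.full(4, 2.0)],
    "cell_b": [np.arange(4.0) * 10, np.zeros(4), np.full(4, 3.0)],
}

# Fix an order for the keys so that index k along the third axis maps to keys[k].
keys = sorted(results_dict)

# The inner stack turns each list of 3 arrays into a (4, 3) block;
# the outer stack puts one block per key along a new last axis.
matlab_array = np.stack(
    [np.stack(results_dict[k], axis=-1) for k in keys], axis=-1
)

print(matlab_array.shape)  # (elements, arrays per key, number of keys)
```

The resulting array could then be saved under a single variable name, for example savemat('GridCellResults.mat', {'results': matlab_array}), where the name 'results' is an arbitrary choice.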

Related

Assigning values to overwrite python array

I am trying to assign new values to an array based on whether or not the stored value is <3. Coming from an R background this is how I would do it, but this gives me a syntax error in Python. What am I doing wrong, and what is the Python approach?
eurx=[1,2,3,4,5,6,7,'a',8]
sma50=3
tw=eurx
tw[eurx<sma50]=-1
tw[eurx>=sma50]=1
tw[(tw!=1)||(tw!=-1)]=0
print(tw)
GOAL:
-1
-1
1
1
1
1
1
0
1
This is "too much R". A pythonic way would be to use functional filtering:
>>> list(map(lambda i: -2*int(i<sma50)+1 if type(i) == int else 0, eurx))
[-1, -1, 1, 1, 1, 1, 1, 0, 1]
Or just a simple for-loop with a few ifs:
>>> for i in eurx:
...     if type(i) != int:
...         print(0)
...     else:
...         print(-2*int(i<sma50)+1)
...
-1
-1
1
1
1
1
1
0
1
In general: don't try to guess the syntax. It's very simple, just read through some tutorials (e.g. https://docs.python.org/3/tutorial/introduction.html#first-steps-towards-programming)
Edit: the int conversion hack works as follows: you know you can convert bool to int, right?
>>> int(True)
1
>>> int(False)
0
If i<sma50 evaluates to True, int(i<sma50) will be 1. So your numbers are now converted to ones if i is smaller than sma50 and to zeros otherwise. But apparently you want the values (-1, 1) instead of (1, 0). Just apply the transform -2x+1 and you're done!
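If the arithmetic trick feels too clever, the same mapping can be spelled out with a list comprehension. This is only a sketch; the helper name sign_vs_threshold is made up for illustration:

```python
eurx = [1, 2, 3, 4, 5, 6, 7, 'a', 8]
sma50 = 3


def sign_vs_threshold(values, threshold):
    """Return -1 below the threshold, 1 at or above it, and 0 for non-integers."""
    return [
        (-1 if v < threshold else 1) if isinstance(v, int) else 0
        for v in values
    ]


tw = sign_vs_threshold(eurx, sma50)
print(tw)
```

Unlike tw=eurx in the question, this builds a new list and leaves eurx untouched.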
Your desired syntax is pretty close to what you'd write in numpy.
The heterogeneous list doesn't make it easy, but here's an example:
>>> import numpy as np
>>> eurx=[1,2,3,4,5,6,7,'a',8]
>>> sma50 = 3
>>> tw = np.array([i if isinstance(i, int) else np.nan for i in eurx])
>>> tw
array([ 1., 2., 3., 4., 5., 6., 7., nan, 8.])
>>> tw[tw < sma50] = -1
__main__:1: RuntimeWarning: invalid value encountered in less
>>> tw[tw >= sma50] = 1
__main__:1: RuntimeWarning: invalid value encountered in greater_equal
>>> tw
array([ -1., -1., 1., 1., 1., 1., 1., nan, 1.])
>>> tw[np.isnan(tw)] = 0
>>> tw
array([-1., -1., 1., 1., 1., 1., 1., 0., 1.])
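The three assignments can also be folded into a single expression with np.where. A sketch under the same assumption as above (non-integers become nan first); the np.errstate context is only there to silence the comparison warning on numpy versions that emit it:

```python
import numpy as np

eurx = [1, 2, 3, 4, 5, 6, 7, 'a', 8]
sma50 = 3

# Non-integer entries become nan so that the array has a numeric dtype.
vals = np.array([v if isinstance(v, int) else np.nan for v in eurx], dtype=float)

with np.errstate(invalid="ignore"):
    # nan -> 0, below the threshold -> -1, everything else -> 1
    tw = np.where(np.isnan(vals), 0, np.where(vals < sma50, -1, 1))

print(tw)
```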

Efficient way to compare the values of 3 lists in Python?

I have 3 lists with similar float values in a1, a2, a3 (whose lengths are equal).
for i in length(a1,a2,a3):
    Find the increasing / decreasing order of a1[i], a2[i], a3[i]
    Rank the values based on the order
Is there a simple/efficient way to do this? Rather than writing blocks of if-else statements?
I am trying to calculate the Friedman test ranks in Python. Though there is a scipy.stats.friedmanchisquare function for the Friedman test, it doesn't return the ranks.
EDIT
I have data like this in the Image 1.
a1 has week 1
a2 has week 2 and
a3 has week 3
I want to rank the values like in this Image 2
I tried comparing the values by using if else loops like this
for i in range(0,10):
    if(acc1[i]>acc2[i]):
        if(acc1[i]>acc3[i]):
            rank1[i] = 1
            if(acc2[i]>acc3[i]):
                rank2[i] = 2
                rank3[i] = 3
friedmanchisquare uses scipy.stats.rankdata. Here's one way you could use rankdata with your three lists. It creates a list called ranks, where ranks[i] is an array containing the ranking of [a1[i], a2[i], a3[i]].
In [41]: a1
Out[41]: [1.0, 2.4, 5.0, 6]
In [42]: a2
Out[42]: [9.0, 5.0, 4, 5.0]
In [43]: a3
Out[43]: [5.0, 6.0, 7.0, 2.0]
In [44]: from scipy.stats import rankdata
In [45]: ranks = [rankdata(row) for row in zip(a1, a2, a3)]
In [46]: ranks
Out[46]:
[array([ 1., 3., 2.]),
array([ 1., 2., 3.]),
array([ 2., 1., 3.]),
array([ 3., 2., 1.])]
If you convert that to a single numpy array, you can then easily work with either the rows or columns of ranks:
In [47]: ranks = np.array(ranks)
In [48]: ranks
Out[48]:
array([[ 1., 3., 2.],
[ 1., 2., 3.],
[ 2., 1., 3.],
[ 3., 2., 1.]])
In [49]: ranks.sum(axis=0)
Out[49]: array([ 7., 8., 9.])
You could define a simple function that returns the order of the sorts:
def sort3(a,b,c):
    if (a >= b):
        if (b >= c):
            return (1, 2, 3)
        elif (a >= c):
            return (1, 3, 2)
        else:
            return (3, 1, 2)
    elif (b >= c):
        if (c >= a):
            return (2, 3, 1)
        else:
            return (2, 1, 3)
    else:
        return (3, 2, 1)
Or consider using this https://stackoverflow.com/a/3382369/3224664
def argsort(seq):
    # http://stackoverflow.com/questions/3071415/efficient-method-to-calculate-the-rank-vector-of-a-list-in-python
    return sorted(range(len(seq)), key=seq.__getitem__)
a = [1,3,5,7]
b = [2,2,2,6]
c = [3,1,4,8]
for i in range(len(a)):
    print(argsort([a[i],b[i],c[i]]))
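Note that argsort gives the sort order rather than the ranks; applying argsort a second time converts one into the other. A sketch with numpy, reusing the a1, a2, a3 values from the rankdata answer above and assuming there are no ties within a row (unlike rankdata, this does not average tied ranks):

```python
import numpy as np

a1 = [1.0, 2.4, 5.0, 6]
a2 = [9.0, 5.0, 4, 5.0]
a3 = [5.0, 6.0, 7.0, 2.0]

# One row per position i, one column per list.
data = np.column_stack((a1, a2, a3))

# order[i] lists the column indices of row i from smallest to largest value.
order = np.argsort(data, axis=1)

# Sorting the order again gives each column's position in that ordering,
# which is its 0-based rank; add 1 for 1-based ranks.
ranks = np.argsort(order, axis=1) + 1

print(ranks)
print(ranks.sum(axis=0))  # rank sums per list
```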

Python: convert numpy array of signs to int and back

I'm trying to convert from a numpy array of signs (i.e., a numpy array whose entries are either 1. or -1.) to an integer and back through a binary representation. I have something that works, but it's not Pythonic, and I expect it'll be slow.
def sign2int(s):
    s[s==-1.] = 0.
    bstr = ''
    for i in range(len(s)):
        bstr = bstr + str(int(s[i]))
    return int(bstr, 2)
def int2sign(i, m):
    bstr = bin(i)[2:].zfill(m)
    s = []
    for d in bstr:
        s.append(float(d))
    s = np.array(s)
    s[s==0.] = -1.
    return s
Then
>>> m = 4
>>> s0 = np.array([1., -1., 1., 1.])
>>> i = sign2int(s0)
>>> print i
11
>>> s = int2sign(i, m)
>>> print s
[ 1. -1. 1. 1.]
I'm concerned about (1) the for loops in each and (2) having to build an intermediate representation as a string.
Ultimately, I will want something that works with a 2-d numpy array, too---e.g.,
>>> s = np.array([[1., -1., 1.], [1., 1., 1.]])
>>> print sign2int(s)
[5, 7]
For 1d arrays you can use this one-line Numpythonic approach, using np.packbits:
>>> np.packbits(np.pad((s0+1).astype(bool).astype(int), (8-s0.size, 0), 'constant'))
array([11], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([11], dtype=np.uint8))[-4:]).astype(float)
>>> unpack[unpack==0] = -1
>>> unpack
array([ 1., -1., 1., 1.])
And for 2d array:
>>> x, y = s.shape
>>> np.packbits(np.pad((s+1).astype(bool).astype(int), (8-y, 0), 'constant')[-2:])
array([5, 7], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([5, 7], dtype='uint8'))).astype(float).reshape(x, 8)[:,-y:]
>>> unpack[unpack==0] = -1
>>> unpack
array([[ 1., -1., 1.],
[ 1., 1., 1.]])
I'll start with sign2int. Convert from a sign representation to binary:
>>> a
array([ 1., -1., 1., -1.])
>>> (a + 1) / 2
array([ 1., 0., 1., 0.])
>>>
Then you can simply create an array of powers of two, multiply it by the binary and sum.
>>> powers = np.arange(a.shape[-1])[::-1]
>>> np.power(2, powers)
array([8, 4, 2, 1])
>>> a = (a + 1) / 2
>>> powers = np.power(2, powers)
>>> a * powers
array([ 8., 0., 2., 0.])
>>> np.sum(a * powers)
10.0
>>>
Then make it operate on rows by adding axis information and rely on broadcasting.
def sign2int(a):
    # note: the in-place operations below overwrite the caller's array
    # powers of two
    powers = np.arange(a.shape[-1])[::-1]
    np.power(2, powers, powers)
    # sign to "binary" - add one and divide by two
    np.add(a, 1, a)
    np.divide(a, 2, a)
    # scale by powers of two and sum
    np.multiply(a, powers, a)
    return np.sum(a, axis = -1)
>>> b = np.array([a, a, a, a, a])
>>> sign2int(b)
array([ 11., 11., 11., 11., 11.])
>>>
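The function above works in place, so the array passed in is overwritten. A non-mutating variant, as a sketch only (the name sign2int_dot is made up here, and the int64 weights limit it to rows of at most 63 signs):

```python
import numpy as np


def sign2int_dot(s):
    """Map rows of +1/-1 signs to integers without modifying the input."""
    s = np.asarray(s)
    m = s.shape[-1]
    # Bit weights 2**(m-1), ..., 2, 1 (most significant bit first).
    weights = 1 << np.arange(m - 1, -1, -1, dtype=np.int64)
    # s > 0 is the bit pattern; the dot product with the weights gives the integer.
    return (s > 0).astype(np.int64) @ weights


s = np.array([[1., -1., 1.], [1., 1., 1.]])
print(sign2int_dot(s))
print(sign2int_dot(np.array([1., -1., 1., 1.])))
```

Because the arithmetic stays in integers, the result is exact for up to 63 bits, whereas the float version loses precision beyond 53 bits.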
I tried it on a 5 by 400 array and it seemed fast (results this large are floats, so they are no longer exact integers):
>>> a = a.repeat(100)
>>> b = np.array([a, a, a, a, a])
>>> b
array([[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.]])
>>> sign2int(b)
array([ 2.58224988e+120, 2.58224988e+120, 2.58224988e+120,
2.58224988e+120, 2.58224988e+120])
>>>
I'll add the reverse if I can figure it out. The best I could do relies on some plain Python without any numpy vectorization magic, and I haven't figured out how to make it work with a sequence of ints other than to iterate over them and convert them one at a time, but the time still seems acceptable.
def foo(n):
    '''yields bits in increasing powers of two
    bit sequence from lsb --> msb
    '''
    while n > 0:
        n, r = divmod(n, 2)
        yield r
def int2sign(n):
    n = int(n)
    a = np.fromiter(foo(n), dtype = np.int8, count = n.bit_length())
    np.multiply(a, 2, a)
    np.subtract(a, 1, a)
    return a[::-1]
Works on 1324:
>>> bin(1324)
'0b10100101100'
>>> a = int2sign(1324)
>>> a
array([ 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1], dtype=int8)
Seems to work with 1.2e305:
>>> n = int(1.2e305)
>>> n.bit_length()
1014
>>> a = int2sign(n)
>>> a.shape
(1014,)
>>> s = bin(n)
>>> s = s[2:]
>>> all(2 * int(x) -1 == y for x, y in zip(s, a))
True
>>>
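For integers that fit in 64 bits, the reverse direction can be vectorized over a whole sequence with shifts and a mask. This is a sketch rather than a drop-in replacement: the name int2sign_vec and the explicit width argument m are assumptions made here, and it does not handle arbitrarily large Python ints the way the generator version does.

```python
import numpy as np


def int2sign_vec(values, m):
    """Convert non-negative integers to rows of m signs (+1 for bit 1, -1 for bit 0)."""
    values = np.asarray(values, dtype=np.int64)
    # Shift amounts m-1, ..., 1, 0 so the most significant bit comes first.
    shifts = np.arange(m - 1, -1, -1)
    # Broadcasting: each value is shifted by every amount, then masked to one bit.
    bits = (values[..., None] >> shifts) & 1
    return 2 * bits - 1


print(int2sign_vec([5, 7], 3))
print(int2sign_vec(11, 4))
```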
Here are some vectorized versions of your functions:
def sign2int(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)
def int2sign(i, m):
    tmp = np.array(list(bin(i)[2:].zfill(m)))
    return np.where(tmp == "0", "-1", tmp).astype(int)
s0 = np.array([1., -1., 1., 1.])
sign2int(s0)
# 11
int2sign(11, 5)
# array([-1, 1, -1, 1, 1])
To use your functions on 2-d arrays, you can use the map function (wrapped in list, since map returns an iterator on Python 3):
s = np.array([[1., -1., 1.], [1., 1., 1.]])
list(map(sign2int, s))
# [5, 7]
list(map(lambda x: int2sign(x, 4), [5, 7]))
# [array([-1, 1, -1, 1]), array([-1, 1, 1, 1])]
After a bit of testing, the Numpythonic approach of @wwii that doesn't use strings seems to fit what I need best. For int2sign, I used a for-loop over the exponents with a standard algorithm for the conversion, which will have at most 64 iterations for 64-bit integers. Numpy's broadcasting happens across each integer very efficiently.
packbits and unpackbits are restricted to 8-bit integers; otherwise, I suspect that would've been the best (though I didn't try).
Here are the specific implementations I tested that follow the suggestions in the other answers (thanks to everyone!):
def _sign2int_str(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)
def sign2int_str(s):
    # list(...) so this also works on Python 3, where map returns an iterator
    return np.array(list(map(_sign2int_str, s)))
def _int2sign_str(i, m):
    tmp = np.array(list(bin(i)[2:])).astype(int)
    return np.pad(np.where(tmp == 0, -1, tmp), (m - len(tmp), 0), "constant", constant_values = -1)
def int2sign_str(i,m):
    return np.array(list(map(lambda x: _int2sign_str(x, m), i.astype(int).tolist()))).transpose()
def sign2int_np(s):
    # bit weights 2**(n-1), ..., 2, 1
    p = np.power(2, np.arange(s.shape[-1])[::-1])
    # (s + 1) / 2 maps -1 -> 0 and 1 -> 1; np.power(s + 1, p) would count a trailing -1 as 0**0 == 1
    return np.sum((s + 1) / 2 * p, axis = -1).astype(int)
def int2sign_np(i,m):
    N = i.shape[-1]
    S = np.zeros((m, N))
    for k in range(m):
        b = np.power(2, m - 1 - k).astype(int)
        # floor_divide keeps this an integer division on Python 3 as well
        S[k,:] = np.floor_divide(i.astype(int), b).astype(float)
        i = np.mod(i, b)
    S[S==0.] = -1.
    return S
And here is my test:
import time
X = np.sign(np.random.normal(size=(5000, 20)))
N = 100
t = time.time()
for i in range(N):
    S = sign2int_np(X)
print('sign2int_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    S = sign2int_str(X)
print('sign2int_str: \t{:10.8f} sec'.format((time.time() - t)/N))
m = 20
S = np.random.randint(0, high=np.power(2,m), size=(5000,))
t = time.time()
for i in range(N):
    X = int2sign_np(S, m)
print('int2sign_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    X = int2sign_str(S, m)
print('int2sign_str: \t{:10.8f} sec'.format((time.time() - t)/N))
This produced the following results:
sign2int_np: 0.00165325 sec
sign2int_str: 0.04121902 sec
int2sign_np: 0.00318024 sec
int2sign_str: 0.24846984 sec
I think numpy.packbits is worth another look. Given a real-valued sign array a, you can use numpy.packbits(a > 0). Decompression is done by numpy.unpackbits. This implicitly flattens multi-dimensional arrays so you'll need to reshape after unpackbits if you have a multi-dimensional array.
Note that you can combine bit packing with conventional compression (e.g., zlib or lzma). If there is a pattern or bias to your data, you may get a useful compression factor, but for unbiased random data, you'll typically see a moderate size increase.
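A round-trip sketch of the packbits idea for a 2-d sign array. The helper names pack_signs and unpack_signs are made up here. Note that packbits pads each row with zero bits on the right, so the packed bytes are left-aligned bit patterns rather than the integers produced by sign2int, and the original row length has to be kept to undo the padding.

```python
import numpy as np


def pack_signs(s):
    """Pack each row of +1/-1 signs into bytes (1 bit per sign)."""
    return np.packbits(np.asarray(s) > 0, axis=-1)


def unpack_signs(packed, n):
    """Recover rows of n signs from the packed bytes."""
    # Unpack along the last axis and drop the padding bits past position n.
    bits = np.unpackbits(packed, axis=-1)[..., :n]
    return np.where(bits == 1, 1.0, -1.0)


s = np.array([[1., -1., 1.], [1., 1., 1.]])
packed = pack_signs(s)
restored = unpack_signs(packed, s.shape[-1])
print(packed)
print(restored)
```

Passing axis=-1 keeps one packed row per input row, which avoids the flattening and reshape step mentioned above.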

Save one-hot-encoded features into Pandas DataFrame the fastest way

I have a Pandas DataFrame with all my features and labels. One of my features is categorical and needs to be one-hot-encoded.
The feature is an integer and can only have values from 0 to 4
To save those arrays back in my DataFrame I use the following code
# enc is my OneHotEncoder object
df['mycol'] = df['mycol'].map(lambda x: enc.transform(x).toarray())
My DataFrame has more than 1 million rows, so the above code takes a while. Is there a faster way to assign the arrays to the DataFrame cells? Because I have just 5 categories, I don't need to call the transform() function 1 million times.
I already tried something like
num_categories = 5
i = 0
while (i<num_categories):
    df.loc[df['mycol'] == i, 'mycol'] = enc.transform(i).toarray()
    i += 1
Which yields this error
ValueError: Must have equal len keys and value when setting with an ndarray
You can use pd.get_dummies:
>>> s
0    a
1    b
2    c
3    a
dtype: object
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
Alternatively:
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> a = np.array([1, 1, 3, 2, 2]).reshape(-1, 1)
>>> a
array([[1],
       [1],
       [3],
       [2],
       [2]])
>>> one_hot = enc.fit_transform(a)
>>> one_hot.toarray()
array([[ 1., 0., 0.],
[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
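Since the feature only takes the integer values 0 to 4, the encoding can also be done as a single lookup into an identity matrix, with no per-row transform call. A sketch with a small made-up column (mycol below stands in for df['mycol'].values):

```python
import numpy as np

num_categories = 5

# Hypothetical sample of the integer-coded feature.
mycol = np.array([0, 3, 4, 1, 3])

# Row k of the identity matrix is the one-hot vector for category k,
# so indexing with the whole column encodes every row at once.
one_hot = np.eye(num_categories, dtype=int)[mycol]

print(one_hot)
```

The result is a 2-d array with one column per category, which fits a DataFrame better as five separate columns than as an array stored in each cell.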

Combine numpy arrays to form a matrix

This seems like it should be straightforward, but I can't figure it out.
Data source is a two column, comma delimited input file with these contents:
6,10
5,9
8,13
...
And my code is:
import numpy as np
data = np.loadtxt("data.txt", delimiter=",")
m = len(data)
x = np.reshape(data[:,0], (m,1))
y = np.ones((m,1))
z = np.matrix([x,y])
Which gives me this error:
Users/acpigeon/.virtualenvs/ipynb/lib/python2.7/site-packages/numpy-1.9.0.dev_297f54b-py2.7-macosx-10.9-intel.egg/numpy/matrixlib/defmatrix.pyc in __new__(subtype, data, dtype, copy)
270 shape = arr.shape
271 if (ndim > 2):
--> 272 raise ValueError("matrix must be 2-dimensional")
273 elif ndim == 0:
274 shape = (1, 1)
ValueError: matrix must be 2-dimensional
No amount of reshaping seems to get this to work, so I'm either missing something really simple or there's a better way to do this.
EDIT:
Would have been helpful to specify the output I am looking for. Here is a line of code that generates the desired result:
In [1]: np.matrix([[5,1],[6,1],[8,1]])
Out[1]:
matrix([[5, 1],
[6, 1],
[8, 1]])
The desired output can be generated this way:
In [12]: np.array((data[:, 0], np.ones(m))).transpose()
Out[12]:
array([[ 6., 1.],
[ 5., 1.],
[ 8., 1.]])
The above is copied from ipython and so has ipython style prompts.
Answer to previous version
To eliminate the error, replace:
x = np.reshape(data[:, 0], (m, 1))
with:
x = data[:, 0]
The former line produces a 2-D array of shape (m, 1), so the list [x, y] is 3-dimensional, and that is what causes the error message. The latter produces a 1-D array with the same data.
Or how about first turning the array into a matrix, and then changing the last column to 1?
In [2]: data=np.loadtxt('stack23859379.txt',delimiter=',')
In [3]: np.matrix(data)
Out[3]:
matrix([[ 6., 10.],
[ 5., 9.],
[ 8., 13.]])
In [4]: z = np.matrix(data)
In [5]: z[:,1]=1
In [6]: z
Out[6]:
matrix([[ 6., 1.],
[ 5., 1.],
[ 8., 1.]])
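Another option is np.column_stack, which places 1-D arrays side by side as columns. A minimal sketch, using an inline array as a stand-in for the contents of data.txt:

```python
import numpy as np

# Stand-in for np.loadtxt("data.txt", delimiter=",") with the three sample rows.
data = np.array([[6.0, 10.0], [5.0, 9.0], [8.0, 13.0]])
m = len(data)

# First column of the data next to a column of ones, giving shape (m, 2).
z = np.column_stack((data[:, 0], np.ones(m)))

print(z)
```

This returns a plain ndarray; wrap it in np.matrix(z) only if the matrix type is really needed.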
