I have the following dictionary results_dict in Python 3.2 where the key field is a string value and the value field is a list of 3 arrays. Each array has 400 float values. I want to convert this dictionary into a data structure that can be used in Matlab 2017b. However, if I execute the following:
savemat('GridCellResults.mat', results_dict, oned_as='row');
The command executes successfully but Matlab is not able to understand the matrix file. For this reason, I wrote the following code to convert the previous dictionary into a 3 Dimensional Matrix (X,Y,Z) where X is the size of the array (400 Elements) and Y is the number of arrays for each dictionary key (3 Arrays) and Z is the number of elements in the dictionary. However, when I execute the code below I get the following error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Here is the code. Any clue why I am getting this error? Even if I drop the transpose call, I keep getting the same error.
import numpy as np
CARDINALITY = 400 # Number of angular domain values.
matlab_array = np.zeros((CARDINALITY, NUM_COLUMNS, NUM_CELLS))
for key, value in results_dict.items():
    matlab_array[:, 0, key] = np.transpose(value[0])
    matlab_array[:, 1, key] = np.transpose(value[1])
    matlab_array[:, 2, key] = np.transpose(value[2])
Trying to follow your description, I can successfully write and read such a dictionary.
In an ipython session:
In [48]: from scipy.io import savemat, loadmat
In [49]: adict = {'a':[np.arange(3),np.ones(3),np.array([4,2,1])]}
In [50]: adict['b'] = [np.arange(3),np.ones(3),np.array([4,2,1])]
In [51]: adict
{'a': [array([0, 1, 2]), array([1., 1., 1.]), array([4, 2, 1])],
'b': [array([0, 1, 2]), array([1., 1., 1.]), array([4, 2, 1])]}
In [52]: pwd
Out[52]: '/home/paul/mypy'
In [53]: savemat('stack48385062.mat',adict, oned_as='row')
In [54]: data = loadmat('stack48385062.mat')
In [55]: data
{'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Mon Jan 22 09:15:31 2018',
'__version__': '1.0',
'a': array([[0., 1., 2.],
[1., 1., 1.],
[4., 2., 1.]]),
'b': array([[0., 1., 2.],
[1., 1., 1.],
[4., 2., 1.]])}
The lists of arrays (of constant size) were converted to 2d arrays.
In an Octave session:
>> load stack48385062.mat
>> a
a =
0 1 2
1 1 1
4 2 1
>> b
b =
0 1 2
1 1 1
4 2 1
Or creating your 3d array (using a numeric index rather than string key):
In [56]: M=np.zeros([3, 3, 2])
In [57]: for i in range(len(adict)):
    ...:     for j in range(3):
    ...:         v = adict[list(adict.keys())[i]]
    ...:         M[:, j, i] = v[j]
In [58]: M
array([[[0., 0.],
[1., 1.],
[4., 4.]],
[[1., 1.],
[1., 1.],
[2., 2.]],
[[2., 2.],
[1., 1.],
[1., 1.]]])
>> load stack48385062_1.mat
>> M
M =
ans(:,:,1) =
0 1 4
1 1 2
2 1 1
ans(:,:,2) =
0 1 4
1 1 2
2 1 1
I should have made the initial dictionary with lists of 3 four-element arrays, so it would be easier to track transpositions. MATLAB and numpy have different axis orders, which can be confusing. savemat tries to compensate.
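The IndexError in the question comes from using the string key as the third index of matlab_array; numpy only accepts integer positions there. A minimal sketch of the same loop with an integer position per key follows. The dictionary contents and the key names cell_a and cell_b are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for results_dict: string keys, each mapping to
# a list of 3 arrays with 400 floats.
CARDINALITY = 400
results_dict = {
    "cell_a": [np.random.rand(CARDINALITY) for _ in range(3)],
    "cell_b": [np.random.rand(CARDINALITY) for _ in range(3)],
}

keys = sorted(results_dict)  # fix an order so slice k corresponds to keys[k]
matlab_array = np.zeros((CARDINALITY, 3, len(keys)))

# Index the third axis with an integer position, not the string key.
for k, key in enumerate(keys):
    for j, column in enumerate(results_dict[key]):
        matlab_array[:, j, k] = column

print(matlab_array.shape)  # (400, 3, 2)
```

The resulting array could then be passed to savemat inside a dictionary, for example savemat('GridCellResults.mat', {'results': matlab_array}), where the variable name 'results' is an arbitrary choice.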
I am trying to assign new values to an array based on whether or not the stored value is <3. Coming from an R background this is how I would do it, but this gives me a syntax error in Python. What am I doing wrong, and what is the Python approach?
This is "too much R". A pythonic way would be to use functional filtering:
>>> list(map(lambda i: -2*int(i<sma50)+1 if type(i) == int else 0, eurx))
[-1, -1, 1, 1, 1, 1, 1, 0, 1]
Or just a simple for-loop with a few ifs:
>>> for i in eurx:
...     if type(i) != int:
...         print(0)
...     else:
...         print(-2*int(i<sma50)+1)
In general: don't try to guess the syntax. It's very simple, just read through some tutorials (e.g. https://docs.python.org/3/tutorial/introduction.html#first-steps-towards-programming)
Edit: the int conversion hack works as follows: you know you can convert bool to int, right?
>>> int(True)
1
>>> int(False)
0
If i<sma50 evaluates to True, int(i<sma50) will be 1. So your numbers are now converted to ones if i is smaller than sma50 and to zeros otherwise. But apparently you want the values (-1, 1) instead of (1, 0). Just apply the transform -2x+1 and you're done!
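The same transform can also be written as a list comprehension that runs unchanged on Python 3. This is only a sketch; the sample values for eurx and sma50 are assumed for illustration.

```python
eurx = [1, 2, 3, 4, 5, 6, 7, 'a', 8]  # sample values assumed for illustration
sma50 = 3

# int(i < sma50) is 1 or 0; -2*x + 1 maps 1 -> -1 and 0 -> 1.
# Non-integer entries are mapped to 0.
signals = [(-2 * int(i < sma50) + 1) if type(i) is int else 0 for i in eurx]
print(signals)  # [-1, -1, 1, 1, 1, 1, 1, 0, 1]
```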
Your desired syntax is pretty close to what you'd write in numpy.
The heterogeneous list doesn't make it easy, but here's an example:
>>> import numpy as np
>>> eurx=[1,2,3,4,5,6,7,'a',8]
>>> sma50 = 3
>>> tw = np.array([i if isinstance(i, int) else np.nan for i in eurx])
>>> tw
array([ 1., 2., 3., 4., 5., 6., 7., nan, 8.])
>>> tw[tw < sma50] = -1
__main__:1: RuntimeWarning: invalid value encountered in less
>>> tw[tw >= sma50] = 1
__main__:1: RuntimeWarning: invalid value encountered in greater_equal
>>> tw
array([ -1., -1., 1., 1., 1., 1., 1., nan, 1.])
>>> tw[np.isnan(tw)] = 0
>>> tw
array([-1., -1., 1., 1., 1., 1., 1., 0., 1.])
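As a possible alternative to the three in-place assignments, nested np.where calls can produce the result in one expression. This is a sketch under the same sample data, not the only way to do it.

```python
import numpy as np

eurx = [1, 2, 3, 4, 5, 6, 7, 'a', 8]
sma50 = 3

# Non-integer entries become NaN so the array is purely numeric.
tw = np.array([i if isinstance(i, int) else np.nan for i in eurx], dtype=float)

# NaN -> 0, values below the threshold -> -1, everything else -> 1.
signal = np.where(np.isnan(tw), 0.0, np.where(tw < sma50, -1.0, 1.0))
print(signal)
```

Because the NaN check comes first, the order of the assignments no longer matters, and the original tw array is left untouched.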
I have 3 lists with similar float values in a1, a2, a3 (whose lengths are equal).
for i in length(a1,a2,a3):
Find the increasing / decreasing order of a1[i], a2[i], a3[i]
Rank the values based on the order
Is there a simple/efficient way to do this? Rather than writing blocks of if-else statements?
I am trying to calculate the Friedman test ranks in Python. Though there is a scipy.stats.friedmanchisquare function, it doesn't return the ranks that the Friedman test uses.
I have data like this in the Image 1.
a1 has week 1
a2 has week 2 and
a3 has week 3
I want to rank the values like in this Image 2
I tried comparing the values by using if else loops like this
for i in range(0,10):
rank1[i] = 1
rank2[i] = 2
rank3[i] = 3
friedmanchisquare uses scipy.stats.rankdata. Here's one way you could use rankdata with your three lists. It creates a list called ranks, where ranks[i] is an array containing the ranking of [a1[i], a2[i], a3[i]].
In [41]: a1
Out[41]: [1.0, 2.4, 5.0, 6]
In [42]: a2
Out[42]: [9.0, 5.0, 4, 5.0]
In [43]: a3
Out[43]: [5.0, 6.0, 7.0, 2.0]
In [44]: from scipy.stats import rankdata
In [45]: ranks = [rankdata(row) for row in zip(a1, a2, a3)]
In [46]: ranks
[array([ 1., 3., 2.]),
array([ 1., 2., 3.]),
array([ 2., 1., 3.]),
array([ 3., 2., 1.])]
If you convert that to a single numpy array, you can then easily work with either the rows or columns of ranks:
In [47]: ranks = np.array(ranks)
In [48]: ranks
array([[ 1., 3., 2.],
[ 1., 2., 3.],
[ 2., 1., 3.],
[ 3., 2., 1.]])
In [49]: ranks.sum(axis=0)
Out[49]: array([ 7., 8., 9.])
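If SciPy is not available, a numpy-only sketch of the same row-wise ranking is a double argsort. Note the assumption: unlike rankdata, this does not average tied values, so it only matches when each row has distinct entries, as in the sample lists here.

```python
import numpy as np

a1 = [1.0, 2.4, 5.0, 6]
a2 = [9.0, 5.0, 4, 5.0]
a3 = [5.0, 6.0, 7.0, 2.0]

data = np.column_stack((a1, a2, a3))  # shape (4, 3): one row per index i

# argsort of argsort gives 0-based ranks within each row; add 1 for 1-based.
ranks = data.argsort(axis=1).argsort(axis=1) + 1
print(ranks)
print(ranks.sum(axis=0))  # rank sums per list (a1, a2, a3)
```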
You could define a simple function that returns the order of the sorts:
def sort3(a,b,c):
    if (a >= b):
        if (b >= c):
            return (1, 2, 3)
        elif (a >= c):
            return (1, 3, 2)
        return (3, 1, 2)
    elif (b >= c):
        if (c >= a):
            return (2, 3, 1)
        return (2, 1, 3)
    return (3, 2, 1)
Or consider using this https://stackoverflow.com/a/3382369/3224664
def argsort(seq):
    # http://stackoverflow.com/questions/3071415/efficient-method-to-calculate-the-rank-vector-of-a-list-in-python
    return sorted(range(len(seq)), key=seq.__getitem__)
a = [1,3,5,7]
b = [2,2,2,6]
c = [3,1,4,8]
for i in range(len(a)):
    # loop body assumed: print the sort order of each (a[i], b[i], c[i]) triple
    print(argsort([a[i], b[i], c[i]]))
I'm trying to convert from a numpy array of signs (i.e., a numpy array whose entries are either 1. or -1.) to an integer and back through a binary representation. I have something that works, but it's not Pythonic, and I expect it'll be slow.
def sign2int(s):
    s[s==-1.] = 0.
    bstr = ''
    for i in range(len(s)):
        bstr = bstr + str(int(s[i]))
    return int(bstr, 2)
def int2sign(i, m):
    bstr = bin(i)[2:].zfill(m)
    s = []
    for d in bstr:
        s.append(float(d))
    s = np.array(s)
    s[s==0.] = -1.
    return s
>>> m = 4
>>> s0 = np.array([1., -1., 1., 1.])
>>> i = sign2int(s0)
>>> print(i)
11
>>> s = int2sign(i, m)
>>> print(s)
[ 1. -1. 1. 1.]
I'm concerned about (1) the for loops in each and (2) having to build an intermediate representation as a string.
Ultimately, I will want something that works with a 2-d numpy array, too---e.g.,
>>> s = np.array([[1., -1., 1.], [1., 1., 1.]])
>>> print(sign2int(s))
[5, 7]
For 1d arrays you can use this one-line Numpythonic approach, using np.packbits:
>>> np.packbits(np.pad((s0+1).astype(bool).astype(int), (8-s0.size, 0), 'constant'))
array([11], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([11], dtype=np.uint8))[-4:]).astype(float)
>>> unpack[unpack==0] = -1
>>> unpack
array([ 1., -1., 1., 1.])
And for 2d array:
>>> x, y = s.shape
>>> np.packbits(np.pad((s+1).astype(bool).astype(int), (8-y, 0), 'constant')[-2:])
array([5, 7], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([5, 7], dtype='uint8'))).astype(float).reshape(x, 8)[:,-y:]
>>> unpack[unpack==0] = -1
>>> unpack
array([[ 1., -1., 1.],
[ 1., 1., 1.]])
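A variant of the 2d case that pads only the column axis and packs row by row may be easier to follow. This is a sketch that assumes each row has at most 8 entries, so every row fits in one byte; the variable names are illustrative.

```python
import numpy as np

s = np.array([[1., -1., 1.], [1., 1., 1.]])
rows, width = s.shape  # assumes width <= 8 so each row fits in one byte

bits = (s > 0).astype(np.uint8)
# Left-pad only the column axis with zeros so each row is exactly 8 bits.
padded = np.pad(bits, ((0, 0), (8 - width, 0)), mode='constant')
packed = np.packbits(padded, axis=1).ravel()
print(packed)  # [5 7]

# Reverse: unpack each byte, keep the last `width` bits, map 0 -> -1.
unpacked = np.unpackbits(packed[:, None], axis=1)[:, -width:].astype(float)
unpacked[unpacked == 0] = -1
print(unpacked)
```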
I'll start with sign2int. Convert from a sign representation to binary:
>>> a
array([ 1., -1., 1., -1.])
>>> (a + 1) / 2
array([ 1., 0., 1., 0.])
Then you can simply create an array of powers of two, multiply it by the binary and sum.
>>> powers = np.arange(a.shape[-1])[::-1]
>>> np.power(2, powers)
array([8, 4, 2, 1])
>>> a = (a + 1) / 2
>>> powers = np.power(2, powers)
>>> a * powers
array([ 8., 0., 2., 0.])
>>> np.sum(a * powers)
10.0
Then make it operate on rows by adding axis information and rely on broadcasting.
def sign2int(a):
    # powers of two
    powers = np.arange(a.shape[-1])[::-1]
    np.power(2, powers, powers)
    # sign to "binary" - add one and divide by two
    np.add(a, 1, a)
    np.divide(a, 2, a)
    # scale by powers of two and sum
    np.multiply(a, powers, a)
    return np.sum(a, axis = -1)
>>> b = np.array([a, a, a, a, a])
>>> sign2int(b)
array([ 11., 11., 11., 11., 11.])
I tried it on a 4 by 100 bit array and it seemed fast
>>> a = a.repeat(100)
>>> b = np.array([a, a, a, a, a])
>>> b
array([[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.]])
>>> sign2int(b)
array([ 2.58224988e+120, 2.58224988e+120, 2.58224988e+120,
2.58224988e+120, 2.58224988e+120])
I'll add the reverse if I can figure it out. The best I could do relies on plain Python without any numpy vectorization magic, and I haven't figured out how to make it work with a sequence of ints other than to iterate over them and convert them one at a time, but the time still seems acceptable.
def foo(n):
    '''yields bits in increasing powers of two
    bit sequence from lsb --> msb
    '''
    while n > 0:
        n, r = divmod(n, 2)
        yield r
def int2sign(n):
    n = int(n)
    a = np.fromiter(foo(n), dtype = np.int8, count = n.bit_length())
    np.multiply(a, 2, a)
    np.subtract(a, 1, a)
    return a[::-1]
Works on 1324:
>>> bin(1324)
'0b10100101100'
>>> a = int2sign(1324)
>>> a
array([ 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1], dtype=int8)
Seems to work with 1.2e305:
>>> n = int(1.2e305)
>>> n.bit_length()
>>> a = int2sign(n)
>>> a.shape
>>> s = bin(n)
>>> s = s[2:]
>>> all(2 * int(x) -1 == y for x, y in zip(s, a))
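For integers that fit in 64 bits, the reverse conversion can probably be vectorized over a whole sequence with bit shifts. The sketch below is an assumption-laden alternative, not the answer's method: the name int2sign_vec and the fixed width m are made up here, and values must be non-negative and below 2**63.

```python
import numpy as np

def int2sign_vec(n, m):
    """Convert non-negative integers to m-entry rows of +1/-1 (msb first)."""
    n = np.atleast_1d(np.asarray(n, dtype=np.int64))
    shifts = np.arange(m - 1, -1, -1)   # m-1, ..., 1, 0
    bits = (n[:, None] >> shifts) & 1   # shape (len(n), m), entries 0 or 1
    return 2 * bits - 1                 # 0 -> -1, 1 -> +1

print(int2sign_vec([5, 7], 3))
```

Broadcasting the column of integers against the row of shift amounts extracts every bit of every integer at once, so no Python-level loop over the inputs is needed.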
Here are some vectorized versions of your functions:
def sign2int(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)
def int2sign(i, m):
    tmp = np.array(list(bin(i)[2:].zfill(m)))
    return np.where(tmp == "0", "-1", tmp).astype(int)
s0 = np.array([1., -1., 1., 1.])
sign2int(s0)
# 11
int2sign(11, 5)
# array([-1, 1, -1, 1, 1])
To use your functions on 2-d arrays, you can use the map function:
s = np.array([[1., -1., 1.], [1., 1., 1.]])
list(map(sign2int, s))
# [5, 7]
list(map(lambda x: int2sign(x, 4), [5, 7]))
# [array([-1, 1, -1, 1]), array([-1, 1, 1, 1])]
After a bit of testing, the Numpythonic approach of @wwii that doesn't use strings seems to fit what I need best. For int2sign, I used a for-loop over the exponents with a standard algorithm for the conversion, which will have at most 64 iterations for 64-bit integers. Numpy's broadcasting happens across each integer very efficiently.
packbits and unpackbits are restricted to 8-bit integers; otherwise, I suspect that would've been the best (though I didn't try).
Here are the specific implementations I tested that follow the suggestions in the other answers (thanks to everyone!):
def _sign2int_str(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)
def sign2int_str(s):
    return np.array([_sign2int_str(row) for row in s])
def _int2sign_str(i, m):
    tmp = np.array(list(bin(i)[2:])).astype(int)
    return np.pad(np.where(tmp == 0, -1, tmp), (m - len(tmp), 0), "constant", constant_values = -1)
def int2sign_str(i,m):
    return np.array([_int2sign_str(x, m) for x in i.astype(int).tolist()]).transpose()
def sign2int_np(s):
    p = np.arange(s.shape[-1])[::-1]
    # map -1 -> 0 and +1 -> 1 first; np.power(s + 1, p) would count 0**0 as 1
    bits = (s + 1) / 2
    return np.sum(bits * np.power(2, p), axis = -1).astype(int)
def int2sign_np(i,m):
    N = i.shape[-1]
    S = np.zeros((m, N))
    for k in range(m):
        b = 2 ** (m - 1 - k)
        # floor division extracts the k-th bit (np.divide is true division on Python 3)
        S[k,:] = np.floor_divide(i.astype(int), b).astype(float)
        i = np.mod(i, b)
    S[S==0.] = -1.
    return S
And here is my test:
import time
X = np.sign(np.random.normal(size=(5000, 20)))
N = 100
t = time.time()
for i in range(N):
    S = sign2int_np(X)
print('sign2int_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    S = sign2int_str(X)
print('sign2int_str: \t{:10.8f} sec'.format((time.time() - t)/N))
m = 20
S = np.random.randint(0, high=np.power(2,m), size=(5000,))
t = time.time()
for i in range(N):
    X = int2sign_np(S, m)
print('int2sign_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    X = int2sign_str(S, m)
print('int2sign_str: \t{:10.8f} sec'.format((time.time() - t)/N))
This produced the following results:
sign2int_np: 0.00165325 sec
sign2int_str: 0.04121902 sec
int2sign_np: 0.00318024 sec
int2sign_str: 0.24846984 sec
I think numpy.packbits is worth another look. Given a real-valued sign array a, you can use numpy.packbits(a > 0). Decompression is done by numpy.unpackbits. This implicitly flattens multi-dimensional arrays so you'll need to reshape after unpackbits if you have a multi-dimensional array.
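A minimal round-trip sketch of that idea follows. The array shape and the zero guard are assumptions added for illustration; the padding bits that packbits appends are dropped by slicing before the reshape.

```python
import numpy as np

a = np.sign(np.random.normal(size=(5, 20)))  # hypothetical sign array
a[a == 0] = 1.0  # guard: treat an (unlikely) exact zero as +1

packed = np.packbits(a > 0)            # flattens, then packs 8 signs per byte

bits = np.unpackbits(packed)[:a.size]  # drop the zero padding at the end
restored = np.where(bits.reshape(a.shape) == 1, 1.0, -1.0)
print(packed.size, restored.shape)
```

Here 100 signs are stored in 13 bytes, and the original shape has to be kept separately in order to undo the flattening.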
Note that you can combine bit packing with conventional compression (e.g., zlib or lzma). If there is a pattern or bias to your data, you may get a useful compression factor, but for unbiased random data, you'll typically see a moderate size increase.
I have a Pandas DataFrame with all my features and labels. One of my features is categorical and needs to be one-hot-encoded.
The feature is an integer and can only have values from 0 to 4
To save those arrays back in my DataFrame I use the following code
# enc is my OneHotEncoder object
df['mycol'] = df['mycol'].map(lambda x: enc.transform(x).toarray())
My DataFrame has more than 1 million rows, so the above code takes a while. Is there a faster way to assign the arrays to the DataFrame cells? Because I have just 5 categories, I don't need to call the transform() function 1 million times.
I already tried something like
num_categories = 5
i = 0
while (i<num_categories):
    df.loc[df['mycol'] == i, 'mycol'] = enc.transform(i).toarray()
    i += 1
Which yields this error
ValueError: Must have equal len keys and value when setting with an ndarray
You can use pd.get_dummies:
>>> s
0 a
1 b
2 c
3 a
dtype: object
>>> pd.get_dummies(s)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> a = np.array([1, 1, 3, 2, 2]).reshape(-1, 1)
>>> a
array([[1],
       [1],
       [3],
       [2],
       [2]])
>>> one_hot = enc.fit_transform(a)
>>> one_hot.toarray()
array([[ 1., 0., 0.],
[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
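Since the feature only takes the integer values 0 to 4, another option that avoids calling transform() per row is to index an identity matrix with the category codes. This is a numpy-only sketch; the codes array stands in for something like df['mycol'].to_numpy() and its values are made up.

```python
import numpy as np

num_categories = 5
# Hypothetical stand-in for df['mycol'].to_numpy(): integer codes in 0..4.
codes = np.array([0, 3, 1, 4, 2, 3])

# Row i of the identity matrix is the one-hot vector for category i, so
# fancy indexing builds every encoding in a single vectorized step.
one_hot = np.eye(num_categories, dtype=int)[codes]
print(one_hot)
```

Each row of one_hot can then be stored as separate columns of the DataFrame, which is usually easier to work with than arrays stored inside cells.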
This seems like it should be straightforward, but I can't figure it out.
Data source is a two column, comma delimited input file with these contents:
6,10
5,9
8,13
And my code is:
import numpy as np
data = np.loadtxt("data.txt", delimiter=",")
m = len(data)
x = np.reshape(data[:,0], (m,1))
y = np.ones((m,1))
z = np.matrix([x,y])
Which gives me this error:
Users/acpigeon/.virtualenvs/ipynb/lib/python2.7/site-packages/numpy-1.9.0.dev_297f54b-py2.7-macosx-10.9-intel.egg/numpy/matrixlib/defmatrix.pyc in __new__(subtype, data, dtype, copy)
270 shape = arr.shape
271 if (ndim > 2):
--> 272 raise ValueError("matrix must be 2-dimensional")
273 elif ndim == 0:
274 shape = (1, 1)
ValueError: matrix must be 2-dimensional
No amount of reshaping seems to get this to work, so I'm either missing something really simple or there's a better way to do this.
Would have been helpful to specify the output I am looking for. Here is a line of code that generates the desired result:
In [1]: np.matrix([[5,1],[6,1],[8,1]])
matrix([[5, 1],
[6, 1],
[8, 1]])
The desired output can be generated this way:
In [12]: np.array((data[:, 0], np.ones(m))).transpose()
array([[ 6., 1.],
[ 5., 1.],
[ 8., 1.]])
The above is copied from ipython and so has ipython style prompts.
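Another way to get the same two-column result, sketched here with the data array typed in by hand instead of loaded from the file, is np.column_stack, which treats each 1-D input as a column.

```python
import numpy as np

# Hypothetical stand-in for np.loadtxt("data.txt", delimiter=",").
data = np.array([[6., 10.], [5., 9.], [8., 13.]])
m = len(data)

# column_stack treats each 1-D input as a column of the 2-D result.
z = np.column_stack((data[:, 0], np.ones(m)))
print(z)
```

If a matrix object is really required, np.matrix(z) accepts this array because it is already 2-dimensional.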
Answer to previous version
To eliminate the error, replace:
x = np.reshape(data[:, 0], (m, 1))
with:
x = data[:, 0]
The former line produces a 2-dimensional array and that is what causes the error message. The latter produces a 1-D array with the same data.
Or how about first turning the array into a matrix, and then changing the last column to 1?
In [2]: data=np.loadtxt('stack23859379.txt',delimiter=',')
In [3]: np.matrix(data)
matrix([[ 6., 10.],
[ 5., 9.],
[ 8., 13.]])
In [4]: z = np.matrix(data)
In [5]: z[:,1]=1
In [6]: z
matrix([[ 6., 1.],
[ 5., 1.],
[ 8., 1.]])