There is a list of lists of tuples:
[[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1,0.7)]]
I need to get an X x Y matrix, where:
x = number of sublists
y = largest first element across all pairs, plus 1
elem[x,y] = second element of the pair in sublist x whose first element == y (0 if there is no such pair)
0      1      2      3      4      5      6
0.5    0.6    0      0      0      0      0
0      0      0      0      0.01   0.005  0.002
0      0.7    0      0      0      0      0
You can figure out the array's dimensions the following way. The Y dimension is the number of sublists
>>> data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1,0.7)]]
>>> dim_y = len(data)
>>> dim_y
3
The X dimension is the largest [0] index of all of the tuples, plus 1.
>>> dim_x = max(max(i for i,j in sub) for sub in data) + 1
>>> dim_x
7
So then initialize an array of all zeros with this size
>>> import numpy as np
>>> arr = np.zeros((dim_x, dim_y))
>>> arr
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
Now to fill it, enumerate over your sublists to keep track of the y index. Then for each sublist use the [0] for the x index and the [1] for the value itself
for y, sub in enumerate(data):
    for x, value in sub:
        arr[x, y] = value
Then the resulting array should be populated (might want to transpose to look like your desired dimensions).
>>> arr.T
array([[0.5 , 0.6 , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.01 , 0.005, 0.002],
[0. , 0.7 , 0. , 0. , 0. , 0. , 0. ]])
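The steps above can be collected into one function. This is only a minimal sketch: the name pairs_to_dense is made up for illustration, and it fills the array in (sublist, index) order directly so that no transpose is needed at the end.

```python
import numpy as np

def pairs_to_dense(data):
    """Build a dense 2-D array from a list of lists of (column, value) pairs."""
    n_rows = len(data)
    # Largest column index over every pair, plus 1
    n_cols = max(col for sub in data for col, _ in sub) + 1
    arr = np.zeros((n_rows, n_cols))
    for row, sub in enumerate(data):
        for col, value in sub:
            arr[row, col] = value
    return arr

data = [[(0, 0.5), (1, 0.6)], [(4, 0.01), (5, 0.005), (6, 0.002)], [(1, 0.7)]]
dense = pairs_to_dense(data)
print(dense.shape)
```

Note that this sketch assumes at least one pair exists somewhere in data; max() raises ValueError if there are none.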
As I commented on the accepted answer, data is 'ragged' and can't be turned into a regular array directly.
Now if the data had a more regular form, a no-loop solution is possible. But conversion to such a form requires the same double looping!
In [814]: [(i,j,v) for i,row in enumerate(data) for j,v in row]
Out[814]:
[(0, 0, 0.5),
(0, 1, 0.6),
(1, 4, 0.01),
(1, 5, 0.005),
(1, 6, 0.002),
(2, 1, 0.7)]
'transpose' and separate into 3 variables:
In [815]: I,J,V=zip(*_)
In [816]: I,J,V
Out[816]: ((0, 0, 1, 1, 1, 2), (0, 1, 4, 5, 6, 1), (0.5, 0.6, 0.01, 0.005, 0.002, 0.7))
I stuck with the list transpose here so as to not convert the integer indices to floats. It may also be faster, since making an array from a list isn't a time-trivial task.
Now we can assign values via numpy magic:
In [819]: arr = np.zeros((3,7))
In [820]: arr[I,J]=V
In [821]: arr
Out[821]:
array([[0.5 , 0.6 , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.01 , 0.005, 0.002],
[0. , 0.7 , 0. , 0. , 0. , 0. , 0. ]])
I,J,V could also be used as input to a scipy.sparse.coo_matrix call, making a sparse matrix.
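As a rough sketch of that coo_matrix call, assuming scipy is installed and reusing the I, J, V tuples from above (the shape is passed explicitly here; without it, scipy infers the shape from the largest indices):

```python
import numpy as np
from scipy import sparse

I = (0, 0, 1, 1, 1, 2)
J = (0, 1, 4, 5, 6, 1)
V = (0.5, 0.6, 0.01, 0.005, 0.002, 0.7)

# coo_matrix takes (values, (row_indices, col_indices))
M_coo = sparse.coo_matrix((V, (I, J)), shape=(3, 7))
dense_from_coo = M_coo.toarray()
print(dense_from_coo)
```

One difference from arr[I,J]=V: if the same (row, col) pair appears more than once, converting the coo matrix to dense sums the duplicates, whereas the indexed assignment keeps only one of the values.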
Speaking of a sparse matrix, here's what a sparse version of arr looks like:
In list-of-lists format:
In [822]: from scipy import sparse
In [823]: M = sparse.lil_matrix(arr)
In [824]: M
Out[824]:
<3x7 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in List of Lists format>
In [825]: M.A
Out[825]:
array([[0.5 , 0.6 , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.01 , 0.005, 0.002],
[0. , 0.7 , 0. , 0. , 0. , 0. , 0. ]])
In [826]: M.rows
Out[826]: array([list([0, 1]), list([4, 5, 6]), list([1])], dtype=object)
In [827]: M.data
Out[827]:
array([list([0.5, 0.6]), list([0.01, 0.005, 0.002]), list([0.7])],
dtype=object)
and the more common coo format:
In [828]: Mc=M.tocoo()
In [829]: Mc.row
Out[829]: array([0, 0, 1, 1, 1, 2], dtype=int32)
In [830]: Mc.col
Out[830]: array([0, 1, 4, 5, 6, 1], dtype=int32)
In [831]: Mc.data
Out[831]: array([0.5 , 0.6 , 0.01 , 0.005, 0.002, 0.7 ])
and the csr used for most calculations:
In [832]: Mr=M.tocsr()
In [833]: Mr.data
Out[833]: array([0.5 , 0.6 , 0.01 , 0.005, 0.002, 0.7 ])
In [834]: Mr.indices
Out[834]: array([0, 1, 4, 5, 6, 1], dtype=int32)
In [835]: Mr.indptr
Out[835]: array([0, 2, 5, 6], dtype=int32)
I have many (x,y) coordinates on a picture stimulus and I want to:
1. Divide the entire size of the picture into several small cells
2. Give each cell a name (e.g., A, B, C, D....)
3. Map each point to the corresponding cell.
For example:
import numpy as np
n = 5
x = np.linspace(0, 10, n)
y = np.linspace(0, 10, n)
xv, yv = np.meshgrid(x, y, indexing='xy')
np.array([xv, yv])
array([[[ 0. , 2.5, 5. , 7.5, 10. ],
[ 0. , 2.5, 5. , 7.5, 10. ],
[ 0. , 2.5, 5. , 7.5, 10. ],
[ 0. , 2.5, 5. , 7.5, 10. ],
[ 0. , 2.5, 5. , 7.5, 10. ]],
[[ 0. , 0. , 0. , 0. , 0. ],
[ 2.5, 2.5, 2.5, 2.5, 2.5],
[ 5. , 5. , 5. , 5. , 5. ],
[ 7.5, 7.5, 7.5, 7.5, 7.5],
[ 10. , 10. , 10. , 10. , 10. ]]])
If I have a big list of points (x,y)
points = [(1,3), (2,4), (0.4, 0.8), (3.5, 7.9), ...]
What I want to get is a pandas data frame with one column as the coordinates and one column as the cell names. For example, for the above four points if I get these cell names:
location = ['A','K','B','F']
Then I can create a data frame:
pd.DataFrame({'points': points,'location':location})
I want to know how to get the corresponding cell names (i.e. location). Constructing a dataframe from there is easy. Because I have a lot of cells and a lot of coordinates, I was wondering what is an efficient way to do this. The order of the cell names doesn't matter as long as each one is unique. If a point happens to sit on the cell boundary we can simply return np.nan.
If you divide a (N*M) picture into n*m cells like this (n=4,m=3):
A B C D
E F G H
I J K L
You can get the cell-row and -column by using np.digitize and from them find the corresponding letter.
In [115]: N, M = (10, 10)
In [116]: n, m = (4, 3)
In [117]: x_boundries = np.linspace(0, N, n+1)
In [118]: y_boundries = np.linspace(0, M, m+1)
In [119]: letters = np.array([chr(65+i) for i in range(n*m)]).reshape((m, n))
In [120]: letters
Out[120]:
array([['A', 'B', 'C', 'D'],
['E', 'F', 'G', 'H'],
['I', 'J', 'K', 'L']], dtype='<U1')
In [121]: points = [(1,3), (2,4), (0.4, 0.8), (3.5, 7.9)]
In [122]: xs, ys = zip(*points)
In [123]: xs, ys
Out[123]: ((1, 2, 0.4, 3.5), (3, 4, 0.8, 7.9))
In [124]: cell_row = np.digitize(ys, y_boundries)-1
In [125]: cell_column = np.digitize(xs, x_boundries)-1
In [126]: cell_row, cell_column
Out[126]: (array([0, 1, 0, 2]), array([0, 0, 0, 1]))
In [127]: locations = letters[cell_row, cell_column]
In [128]: locations
Out[128]: array(['A', 'E', 'A', 'J'], dtype='<U1')
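The question also asked for np.nan when a point sits on a cell boundary, which the session above does not handle. A possible sketch follows; the function name label_points and its parameters are made up for illustration, and the boundary test relies on exact floating-point equality with the np.linspace edges, so it only catches points that coincide exactly with an edge.

```python
import numpy as np

def label_points(points, width, height, n_cols, n_rows):
    """Return a cell letter per point, or np.nan on a boundary or outside the picture."""
    xs, ys = np.asarray(points, dtype=float).T
    x_edges = np.linspace(0, width, n_cols + 1)
    y_edges = np.linspace(0, height, n_rows + 1)
    # Object dtype so the result can hold both letters and np.nan
    names = np.array([chr(65 + i) for i in range(n_cols * n_rows)],
                     dtype=object).reshape(n_rows, n_cols)
    col = np.digitize(xs, x_edges) - 1
    row = np.digitize(ys, y_edges) - 1
    on_edge = np.isin(xs, x_edges) | np.isin(ys, y_edges)
    outside = (col < 0) | (col >= n_cols) | (row < 0) | (row >= n_rows)
    ok = ~(on_edge | outside)
    out = np.full(len(xs), np.nan, dtype=object)
    out[ok] = names[row[ok], col[ok]]
    return out

points = [(1, 3), (2, 4), (0.4, 0.8), (3.5, 7.9), (2.5, 1.0)]
locations = label_points(points, 10, 10, n_cols=4, n_rows=3)
print(locations)
```

The last point has x = 2.5, which is one of the x boundaries, so it is labelled np.nan; the first four points get the same letters as in the session above.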
I have a DataFrame and I want to get both the group names and the corresponding group counts as a list or numpy array. However, when I convert the output to a matrix I only get the group counts, not the names, as in the example below:
df = pd.DataFrame({'a':[0.5, 0.4, 5 , 0.4, 0.5, 0.6 ]})
b = df['a'].value_counts()
print(b)
output:
[0.4 2
0.5 2
0.6 1
5.0 1
Name: a, dtype: int64]
What I tried is print[b.as_matrix()]. Output:
[array([2, 2, 1, 1])]
In this case I lose the corresponding group names, which I also need. Thank you.
Convert it to a dict:
bd = dict(b)
print(bd)
# {0.40000000000000002: 2, 0.5: 2, 0.59999999999999998: 1, 5.0: 1}
Don't worry about the long decimals. They're just a result of floating point representation; you still get what you expect from the dict.
bd[0.4]
# 2
The simplest way, keeping each group name together with its count:
list(df['a'].value_counts().items())
One approach with np.unique -
np.c_[np.unique(df.a, return_counts=1)]
Sample run -
In [270]: df
Out[270]:
a
0 0.5
1 0.4
2 5.0
3 0.4
4 0.5
5 0.6
In [271]: np.c_[np.unique(df.a, return_counts=1)]
Out[271]:
array([[ 0.4, 2. ],
[ 0.5, 2. ],
[ 0.6, 1. ],
[ 5. , 1. ]])
We can zip the outputs from np.unique for list output -
In [283]: zip(*np.unique(df.a, return_counts=1))
Out[283]: [(0.40000000000000002, 2), (0.5, 2), (0.59999999999999998, 1), (5.0, 1)]
Or use zip directly on the value_counts() output -
In [338]: b = df['a'].value_counts()
In [339]: zip(b.index, b.values)
Out[339]: [(0.40000000000000002, 2), (0.5, 2), (0.59999999999999998, 1), (5.0, 1)]
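Two details of the np.unique approach may be worth spelling out. np.c_ stacks names and counts into one float array, so the counts come back as 2., 1. and so on; and in Python 3 zip returns an iterator, so it has to be wrapped in list(). A small numpy-only sketch that keeps the counts as integers (the variable names are illustrative, and a plain list stands in for df.a):

```python
import numpy as np

a = [0.5, 0.4, 5, 0.4, 0.5, 0.6]  # stands in for df.a

# Two separate arrays: sorted unique values and their integer counts
values, counts = np.unique(a, return_counts=True)

# list() is needed in Python 3, where zip is lazy
pairs = list(zip(values.tolist(), counts.tolist()))
print(pairs)
```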
I have an nd array that looks as follows:
[[ 0. 1.73205081 6.40312424 7.21110255 2.44948974]
[ 1.73205081 0. 5.09901951 5.91607978 1. ]
[ 6.40312424 5.09901951 0. 1. 4.35889894]
[ 7.21110255 5.91607978 1. 0. 5.09901951]
[ 2.44948974 1. 4.35889894 5.09901951 0. ]]
Each element in this array is a distance and I need to turn this into a list with the row,col,distance as follows:
l = [(0,0,0),(0,1, 1.73205081),(0,2, 6.40312424),...,(1,0, 1.73205081),(1,1,0),...,(4,4,0)]
Additionally, it would be cool to remove the diagonal elements and also the elements (j,i) as (i,j) are already there. Essentially, is it possible to take just the top triangular matrix of this?
Is this possible to do efficiently (without a lot of loops)? I had created this array with squareform, but couldn't find any docs to do this.
squareform does all this. Read the docs and experiment. It works in both directions. If you give it a matrix it returns the upper triangle values (condensed form). If you give it those values, it returns the matrix.
In [668]: M
Out[668]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0.1, 0. , 2. , 0.3],
[ 0.5, 2. , 0. , 0.2],
[ 0.2, 0.3, 0.2, 0. ]])
In [669]: spatial.distance.squareform(M)
Out[669]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [670]: v=spatial.distance.squareform(M)
In [671]: v
Out[671]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [672]: spatial.distance.squareform(v)
Out[672]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0.1, 0. , 2. , 0.3],
[ 0.5, 2. , 0. , 0.2],
[ 0.2, 0.3, 0.2, 0. ]])
You can also specify a force and checks parameter, but without those it just goes by the shape.
The indices can come from np.triu_indices:
In [677]: np.triu_indices(4,1)
Out[677]:
(array([0, 0, 0, 1, 1, 2], dtype=int32),
array([1, 2, 3, 2, 3, 3], dtype=int32))
In [680]: np.vstack((np.triu_indices(4,1),v)).T
Out[680]:
array([[ 0. , 1. , 0.1],
[ 0. , 2. , 0.5],
[ 0. , 3. , 0.2],
[ 1. , 2. , 2. ],
[ 1. , 3. , 0.3],
[ 2. , 3. , 0.2]])
Just to check, we can fill in a 4x4 matrix with these values
In [686]: A=np.vstack((np.triu_indices(4,1),v)).T
In [687]: MM = np.zeros((4,4))
In [688]: MM[A[:,0].astype(int),A[:,1].astype(int)]=A[:,2]
In [689]: MM
Out[689]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0. , 0. , 2. , 0.3],
[ 0. , 0. , 0. , 0.2],
[ 0. , 0. , 0. , 0. ]])
Those triu indices can also fetch the values from M:
In [693]: I,J = np.triu_indices(4,1)
In [694]: M[I,J]
Out[694]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
squareform uses compiled code in spatial.distance._distance_wrap, so I expect it will be quite fast for large arrays. The only drawback is that it returns just the condensed-form values, not the indices. But given the shape, the indices can always be calculated. They don't need to be stored with the values.
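To get the list of (row, col, distance) tuples the question asked for, the triu indices and the values can be zipped together. A minimal sketch using the same M as above; the name triples is illustrative, and .tolist() is used so the tuples hold plain Python ints and floats rather than numpy scalars:

```python
import numpy as np

M = np.array([[0. , 0.1, 0.5, 0.2],
              [0.1, 0. , 2. , 0.3],
              [0.5, 2. , 0. , 0.2],
              [0.2, 0.3, 0.2, 0. ]])

# Indices of the strict upper triangle (k=1 skips the diagonal)
I, J = np.triu_indices(M.shape[0], k=1)
triples = list(zip(I.tolist(), J.tolist(), M[I, J].tolist()))
print(triples)
```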
If your input is x, first generate the indices:
i0,i1 = np.indices(x.shape)
Then:
np.concatenate((i1,i0,x)).reshape(3,5,5).T
That gives you the first result--for the entire matrix.
As for taking only the upper triangle, you might consider trying np.triu(), but I'm not sure exactly what result you're looking for. You can probably figure out how to mask the parts you don't want now, though.
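One way to do that masking, sketched here with np.stack instead of the concatenate/reshape above (a swapped-in variant, not the exact code from this answer): stack the two index grids with the values, flatten to one row per element, and keep only the rows where the row index is smaller than the column index. The small 3x3 x is made-up sample data.

```python
import numpy as np

x = np.array([[0., 1., 2.],
              [1., 0., 3.],
              [2., 3., 0.]])

i0, i1 = np.indices(x.shape)  # i0 holds row indices, i1 column indices
# One (row, col, value) row per element of x
full = np.stack((i0, i1, x), axis=-1).reshape(-1, 3)
# Strict upper triangle only: row index < column index
upper = full[(i0 < i1).ravel()]
print(upper)
```

Because the indices and values share one float array, the index columns come back as floats and need .astype(int) before they can be used for indexing again.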
you can try this,
print([(x,y, value) for (x,y), value in np.ndenumerate(numpymatrixarray)])
output [(0, 0, 0.0), (0, 1, 1.7320508100000001), (0, 2, 6.4031242400000004), (0, 3, 7.2111025499999997), (0, 4, 2.4494897400000002), (1, 0, 1.7320508100000001), (1, 1, 0.0), (1, 2, 5.0990195099999998), (1, 3, 5.9160797799999996), (1, 4, 1.0), (2, 0, 6.4031242400000004), (2, 1, 5.0990195099999998), (2, 2, 0.0), (2, 3, 1.0), (2, 4, 4.3588989400000004), (3, 0, 7.2111025499999997), (3, 1, 5.9160797799999996), (3, 2, 1.0), (3, 3, 0.0), (3, 4, 5.0990195099999998), (4, 0, 2.4494897400000002), (4, 1, 1.0), (4, 2, 4.3588989400000004), (4, 3, 5.0990195099999998), (4, 4, 0.0)]
Do you really want the top triangular matrix for an [nxm] matrix where n>m? That will give you (nxn-n)/2 elements and lose all the data where m⊖n.
What you probably want is the lower triangular matrix:
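def tri_reduce(m):
    # m is a 2-D numpy array; numpy is assumed to be imported as np
    n = m.shape
    if n[0] > n[1]:
        # more rows than columns: keep the strict lower triangle (k=-1 skips the diagonal)
        i = np.tril_indices(n[0], -1, n[1])
    else:
        # otherwise keep the strict upper triangle (k=1 skips the diagonal)
        i = np.triu_indices(n[0], 1, n[1])
    # one (row, col, value) row per kept element
    return np.vstack((i, m[i])).T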
Rebuilding it into a list of tuples would require a loop, though, I believe. list(tri_reduce(m)) would give a list of nd arrays.
Is there a simple way to get the ratios of consecutive elements of a numpy array?
Basically something similar to numpy.diff(x)?
So if x = [1, 2, 10, 100, ...]
I would like [0.5, 0.2, 0.1, ...]
i.e. [x1/x2, x2/x3, x3/x4, ...]
I know I can do this easily by shifting and dividing, but it seems clumsy compared to numpy.diff(x)
Using numpy:
In [6]: x
Out[6]: array([ 1., 2., 10., 100., 150., 75.])
In [7]: x[:-1]/x[1:]
Out[7]: array([ 0.5 , 0.2 , 0.1 , 0.66666667, 2. ])
That might be what you meant when you said "I can do this easily by shifting and dividing", but I don't see anything clumsy about it.
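If a diff-like helper is preferred, the slicing can be wrapped in a small function. This is just a sketch; the name consecutive_ratios is made up, and the input is converted to float so that plain lists of integers also work:

```python
import numpy as np

def consecutive_ratios(x):
    """Return x[i] / x[i+1] for each pair of consecutive elements."""
    x = np.asarray(x, dtype=float)
    return x[:-1] / x[1:]

ratios = consecutive_ratios([1, 2, 10, 100, 150, 75])
print(ratios)
```

As with any numpy division, a zero among the denominators gives inf or nan together with a RuntimeWarning rather than an exception.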
I hate to even post this, but for x = [6, 2, 4, 10],
np.exp(-np.diff(np.log(x)))
returns array([3. , 0.5, 0.4]).
Without numpy, using zip on the list and its shifted copy:
xs = [1, 2, 10, 100, ...]
[x1/x2 for (x1, x2) in zip(xs, xs[1:])]