I want to use this code on a very large array. It takes a long time to execute and is not efficient.
Is there any way to remove the loops and make this code more efficient?
>>> import numpy as np
>>> x=np.random.randint(10, size=(4,5,3))
>>> x
array([[[3, 2, 6],
[4, 6, 6],
[3, 7, 9],
[6, 4, 2],
[9, 0, 1]],
[[9, 0, 4],
[1, 8, 9],
[6, 8, 1],
[9, 4, 5],
[1, 5, 2]],
[[6, 1, 6],
[1, 8, 8],
[3, 8, 3],
[7, 1, 0],
[7, 7, 0]],
[[5, 6, 6],
[8, 3, 1],
[0, 5, 4],
[6, 1, 2],
[5, 6, 1]]])
>>> y=[]
>>> for i in range(x.shape[1]):
...     for j in range(x.shape[2]):
...         y.append(x[:, i, j].tolist())
...
>>> y
[[3, 9, 6, 5], [2, 0, 1, 6], [6, 4, 6, 6], [4, 1, 1, 8], [6, 8, 8, 3], [6, 9, 8, 1], [3, 6, 3, 0], [7, 8, 8, 5], [9, 1, 3, 4], [6, 9, 7, 6], [4, 4, 1, 1], [2, 5, 0, 2], [9, 1, 7, 5], [0, 5, 7, 6], [1, 2, 0, 1]]
You could permute axes with np.transpose and then reshape to 2D -
y = x.transpose(1,2,0).reshape(-1,x.shape[0])
Append .tolist() to get a list as output.
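As a quick sanity check, the sketch below compares the transpose-and-reshape result with the original double loop. The seed and the variable names y_loop and y_vec are only illustrative choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.integers(10, size=(4, 5, 3))

# Reference result: the double loop from the question.
y_loop = [x[:, i, j].tolist()
          for i in range(x.shape[1])
          for j in range(x.shape[2])]

# Vectorized: move axis 0 to the end, then merge the first two axes.
y_vec = x.transpose(1, 2, 0).reshape(-1, x.shape[0])

print(y_vec.shape)               # (15, 4)
print(y_vec.tolist() == y_loop)  # True
```

No Python-level loop runs over the elements here, which is where the speedup on large arrays should come from.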
Yes, either use np.reshape(x, shape) or try np.ndarray.flatten(x, order='F') ('F' for Fortran style, i.e. column-major order, as in your example).
Read the documentation to find out which parameters fit best. IMHO, ndarray.flatten is the more elegant option for you here. However, flatten always returns a 1-D array, so depending on the exact output you want, you may have to rearrange the axes first and reshape the result afterwards.
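Here is a rough sketch of how both suggestions could be made to reproduce the loop's output. The particular axis swap and the final reshape are my own assumptions about what "reshape first" would need to mean here, so treat them as one possible reading.

```python
import numpy as np

x = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# Target layout: one row per (i, j) pair, holding x[:, i, j].
expected = np.array([x[:, i, j]
                     for i in range(x.shape[1])
                     for j in range(x.shape[2])])

# reshape route: merge the last two axes, then transpose.
via_reshape = x.reshape(x.shape[0], -1).T

# flatten route: swap the last two axes so that Fortran order walks
# axis 0 fastest, then j, then i; regroup into rows of length x.shape[0].
via_flatten = x.swapaxes(1, 2).flatten(order="F").reshape(-1, x.shape[0])

print(np.array_equal(via_reshape, expected))  # True
print(np.array_equal(via_flatten, expected))  # True
```

Without the axis swap, the Fortran-order flatten yields the same rows but with i varying fastest, so the row order differs from the loop's.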
For example, I have a point list:
a = np.array([[0,0,0],
[1,1,1],
[2,2,2],
[3,3,3],
[4,4,4],
[5,5,5],
[6,6,6],
[7,7,7],
[8,8,8],
[9,9,9]])
and I have another array that gives the number of elements in each group:
b = np.array([2,0,3,5])
How can I split array a according to the element counts in array b so that I get this output?
[[[0,0,0],[1,1,1]],
[],
[[2,2,2],[3,3,3],[4,4,4]],
[[5,5,5],[6,6,6],[7,7,7],[8,8,8],[9,9,9]]]
You can use numpy.split using cumsum on b to get the split points:
out = np.split(a, b.cumsum()[:-1])
output:
[array([[0, 0, 0],
[1, 1, 1]]),
array([], shape=(0, 3), dtype=int64),
array([[2, 2, 2],
[3, 3, 3],
[4, 4, 4]]),
array([[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9]])]
If you want lists:
out = [x.tolist() for x in np.split(a, b.cumsum()[:-1])]
output:
[[[0, 0, 0], [1, 1, 1]],
[],
[[2, 2, 2], [3, 3, 3], [4, 4, 4]],
[[5, 5, 5], [6, 6, 6], [7, 7, 7], [8, 8, 8], [9, 9, 9]]]
intermediate:
b.cumsum()[:-1]
# array([2, 2, 5])
I have a matrix:
m = [
[5, 1, 7, 5],
[2, 4, 9, 5],
[3, 4, 5, 5],
[3, 4, 6, 7]]
When I print the matrix, the output is:
[[5, 1, 7, 5], [2, 4, 9, 5], [3, 4, 5, 5], [3, 4, 6, 7]]
How do I print this matrix so that the output has the same layout as the initial input,
like this:
[
[5, 1, 7, 5],
[2, 4, 9, 5],
[3, 4, 5, 5],
[3, 4, 6, 7]
]
Most answers I see erase the square brackets when printing. Is there a way to do this and still have the square brackets there like I did when I first defined the 2D array?
I think it will be dependent on your console/IDE. You could try to use pprint.
>>> from pprint import pprint
>>> m
[[5, 1, 7, 5], [2, 4, 9, 5], [3, 4, 5, 5], [3, 4, 6, 7]]
>>> pprint(m, width=40)
[[5, 1, 7, 5],
 [2, 4, 9, 5],
 [3, 4, 5, 5],
 [3, 4, 6, 7]]
An attempt at a more general way of determining the width (not sure how this would fare for other nested lists, but it works here):
pprint(m, width=len(str(m))-1)
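If the outer brackets really need to sit on their own lines, as in the question, pprint will not produce that layout. A minimal sketch that builds the string by hand might look like this (format_matrix is a hypothetical helper name, not a standard function):

```python
def format_matrix(matrix):
    """Return the matrix as text: one row per line, outer brackets on their own lines."""
    rows = ",\n".join(str(row) for row in matrix)
    return "[\n" + rows + "\n]"

m = [
    [5, 1, 7, 5],
    [2, 4, 9, 5],
    [3, 4, 5, 5],
    [3, 4, 6, 7]]

print(format_matrix(m))
```

This relies on str() of each inner list, so it assumes a list of flat lists. Deeper nesting would need a different approach.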
I would like to suggest items that are similar to each other. I'm using k-nearest neighbors for this. My question is: how do I get the nearest neighbors together with their probabilities?
I would like to have something like [[item 1, probability 1], ..., [item n, probability n]].
How do I get such a list?
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
# Named `data` rather than `list` to avoid shadowing the built-in
data = [[0, 3, 8, 0], [0, 8, 7, 0], [0, 2, 9, 0], [1, 10, 10, 1], [2, 3, 8, 2], [2, 10, 10, 2], [3, 4, 12, 3], [3, 12, 4, 3], [3, 3, 8, 3], [4, 12, 4, 4], [4, 3, 8, 4], [4, 4, 12, 4], [5, 8, 7, 5], [5, 6, 13, 5], [5, 3, 8, 5], [6, 0, 3, 6], [6, 5, 11, 6], [6, 12, 4, 6], [7, 9, 6, 7], [7, 9, 6, 7], [8, 13, 5, 8], [9, 1, 0, 9], [9, 7, 2, 9], [9, 11, 1, 9], [9, 11, 1, 9]]
# Note: location isn't relevant
df = pd.DataFrame(data, columns=['buyerid', 'itemid', 'group', 'location'])
sparse_item_user = csr_matrix((df['group'].astype(float), (df['itemid'], df['buyerid'])))
sparse_user_item = csr_matrix((df['group'].astype(float), (df['buyerid'], df['itemid'])))
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20)
model_knn.fit(sparse_item_user)
desired_item = 8
model_knn.kneighbors(....)  # Now get the nearest items
From the sklearn docs, the following is the signature of kneighbors:
kneighbors(X=None, n_neighbors=None, return_distance=True)
Thus, to get the nearest neighbor of some point x, you do kneighbors(x, return_distance=True). In this case, n_neighbors was already specified in your constructor to be 20, so we need not give it here.
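To make this concrete, here is a small sketch on a made-up item-by-user matrix (the numbers are hypothetical, not the question's data). kneighbors returns distances and indices, not probabilities. As an assumption on my part, the sketch uses 1 - cosine distance (the cosine similarity) as the score, and picks n_neighbors=3 so that it does not exceed the number of fitted items.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical item-by-user matrix: 4 items (rows), 3 users (columns).
item_user = csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [2.0, 0.0, 3.0],
    [0.0, 3.0, 0.0],
    [1.0, 1.0, 1.0],
]))

model = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=3)
model.fit(item_user)

desired_item = 0
# Both results have shape (n_queries, n_neighbors); there is one query row here.
distances, indices = model.kneighbors(item_user[desired_item],
                                      return_distance=True)

# Pair each neighbor index with a similarity score (1 - cosine distance).
neighbors = [[int(i), float(1.0 - d)]
             for i, d in zip(indices[0], distances[0])]
print(neighbors)
```

Because the query row was part of the fitted data, the first entry is the item itself with a score of about 1. Drop it if only other items should be suggested.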
I need to write code that returns the subsets of a given size of a set, as a list.
So first, let's say I want subsets of size 3 from the set (0,1,2,3,4,5,6,7,8),
and I want to write the subsets out in a list:
[[0,1,2],[0,2,3],[0,3,4]....]
Then I would like to use recursion on it and compare all the elements except the first with my dictionary (graph), to check whether they are in the value, which is a list. The dictionary key is the first element of the subset.
For example:
in [0,1,2]:
are 1 and 2 in graph[0]?
The dictionary graph is something like: {0:[1,2,3,6,7], 1:[0,2,4,6,7]....}
If I am done and everything is there, I want to check the next subset.
So my problem is: how can I put this in a list? I know I have a problem with k too, but I'm not sure how to change it.
def indep(graph,a,b):
    l=list( itertools.combinations(range(a), b))
    for k in l:
        k=list(k)
        while j<=len(k):
            for j in range(len(k)):
                if k[j] in graph[k[j]]:
                    j+=1
                else:
                    return "no"
This will give you the expected result:
from itertools import combinations
original_set = (0,1,2,3,4,5,6,7,8)
# Each combination is a tuple of 3 elements; convert it to a list
final_set = [list(subset) for subset in combinations(original_set, 3)]
Out[6]:
[[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 1, 5],
[0, 1, 6],
[0, 1, 7],
[0, 1, 8],
[0, 2, 3],
[0, 2, 4],
[0, 2, 5],
[0, 2, 6],
[0, 2, 7],
[0, 2, 8],
[0, 3, 4],
[0, 3, 5],
[0, 3, 6],
[0, 3, 7],
[0, 3, 8],
[0, 4, 5],
[0, 4, 6],
[0, 4, 7],
[0, 4, 8],
[0, 5, 6],
[0, 5, 7],
[0, 5, 8],
[0, 6, 7],
[0, 6, 8],
[0, 7, 8],
[1, 2, 3],
[1, 2, 4],
[1, 2, 5],
[1, 2, 6],
[1, 2, 7],
[1, 2, 8],
[1, 3, 4],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7],
[1, 3, 8],
[1, 4, 5],
[1, 4, 6],
[1, 4, 7],
[1, 4, 8],
[1, 5, 6],
[1, 5, 7],
[1, 5, 8],
[1, 6, 7],
[1, 6, 8],
[1, 7, 8],
[2, 3, 4],
[2, 3, 5],
[2, 3, 6],
[2, 3, 7],
[2, 3, 8],
[2, 4, 5],
[2, 4, 6],
[2, 4, 7],
[2, 4, 8],
[2, 5, 6],
[2, 5, 7],
[2, 5, 8],
[2, 6, 7],
[2, 6, 8],
[2, 7, 8],
[3, 4, 5],
[3, 4, 6],
[3, 4, 7],
[3, 4, 8],
[3, 5, 6],
[3, 5, 7],
[3, 5, 8],
[3, 6, 7],
[3, 6, 8],
[3, 7, 8],
[4, 5, 6],
[4, 5, 7],
[4, 5, 8],
[4, 6, 7],
[4, 6, 8],
[4, 7, 8],
[5, 6, 7],
[5, 6, 8],
[5, 7, 8],
[6, 7, 8]]
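The membership check described in the question could then be layered on top of combinations. The following is only a sketch under my reading of the question: keep a subset when every element after the first appears in graph[first]. The function name and the small graph are made up for illustration.

```python
from itertools import combinations

def subsets_in_graph(graph, n, size):
    """Return subsets of range(n) whose non-first members all appear in graph[first]."""
    result = []
    for combo in combinations(range(n), size):
        first, rest = combo[0], combo[1:]
        neighbours = graph.get(first, [])
        if all(v in neighbours for v in rest):
            result.append(list(combo))
    return result

# Hypothetical adjacency lists, in the same shape as the question's graph.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

print(subsets_in_graph(graph, 4, 3))
# [[0, 1, 2], [0, 1, 3], [0, 2, 3]]
```

A plain loop with all() is enough here, so no recursion or manual index counter is needed. Storing the adjacency values as sets would make the membership tests faster on large graphs.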
import itertools
a = [0,1,2,3,4,5,6]
# all sets here.
sets = [list(x) for x in itertools.permutations(a, 3) if x[1]==x[2]-1]
#here are all the sets
#[[0, 1, 2], [0, 2, 3], [0, 3, 4], [0, 4, 5], [0, 5, 6], [1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6], [2, 0, 1], [2, 3, 4],
#[2, 4, 5], [2, 5, 6], [3, 0, 1], [3, 1, 2], [3, 4, 5], [3, 5, 6], [4, 0, 1], [4, 1, 2], [4, 2, 3], [4, 5, 6], [5, 0, 1],
#[5, 1, 2], [5, 2, 3], [5, 3, 4], [6, 0, 1], [6, 1, 2], [6, 2, 3], [6, 3, 4], [6, 4, 5]]
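d = dict()
# build the dictionary: first element -> all elements seen alongside it
for i in sets:
    try:
        d[i[0]] = d[i[0]]+i[1:]
    except KeyError:
        # first time this key is seen
        d[i[0]] = i[1:]
    # remove duplicates
    d[i[0]] = list(set(d[i[0]]))
# output d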
{0: [1, 2, 3, 4, 5, 6],
1: [2, 3, 4, 5, 6],
2: [0, 1, 3, 4, 5, 6],
3: [0, 1, 2, 4, 5, 6],
4: [0, 1, 2, 3, 5, 6],
5: [0, 1, 2, 3, 4],
6: [0, 1, 2, 3, 4, 5]}
is this what you wanted? :D
I'd like to obtain a 1D array of indexes from a 3D matrix.
For instance given x = np.random.randint(10, size=(10,3,3)), I'd like to do something like np.argmax(x, axis=(1,2)) just like you can do with np.max, that is, obtain a 1D array of length 10 containing the indexes (0 to 8) of the maximums of each submatrix of size (3,3).
I have not found anything helpful so far and I want to avoid looping on the first dimension (and use np.argmax(x)) as it is quite big.
Cheers!
Reshape to merge those last two axes and then use np.argmax -
idx = x.reshape(x.shape[0],-1).argmax(-1)
out = np.unravel_index(idx, x.shape[-2:])
Sample run -
In [263]: x = np.random.randint(10, size=(4,3,3))
In [264]: x
Out[264]:
array([[[0, 9, 2],
[7, 7, 8],
[2, 5, 9]],
[[1, 7, 2],
[8, 9, 0],
[2, 8, 3]],
[[7, 5, 0],
[7, 1, 6],
[5, 1, 1]],
[[0, 7, 3],
[5, 4, 1],
[9, 8, 9]]])
In [265]: idx = x.reshape(x.shape[0],-1).argmax(-1)
In [266]: np.unravel_index(idx, x.shape[-2:])
Out[266]: (array([0, 1, 0, 2]), array([1, 1, 0, 0]))
If you meant getting the merged (flattened) index, then it's simpler -
x.reshape(x.shape[0],-1).argmax(1)
Sample run -
In [283]: x
Out[283]:
array([[[2, 3, 7],
[8, 1, 0],
[3, 6, 9]],
[[8, 0, 5],
[2, 2, 9],
[9, 0, 9]],
[[1, 9, 2],
[5, 0, 3],
[7, 2, 1]],
[[1, 6, 5],
[2, 3, 7],
[7, 4, 6]]])
In [284]: x.reshape(x.shape[0],-1).argmax(1)
Out[284]: array([8, 5, 1, 5])
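A short sketch to cross-check both variants against the per-submatrix loop the question wants to avoid (the seed and variable names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.integers(10, size=(10, 3, 3))

# Merged index (0 to 8) of the maximum in each 3x3 submatrix.
flat_idx = x.reshape(x.shape[0], -1).argmax(axis=1)

# Same thing with an explicit loop, for comparison only.
loop_idx = np.array([np.argmax(sub) for sub in x])

# Convert the merged index back to (row, col) and look the maxima up.
rows, cols = np.unravel_index(flat_idx, x.shape[-2:])
maxima = x[np.arange(x.shape[0]), rows, cols]

print(np.array_equal(flat_idx, loop_idx))             # True
print(np.array_equal(maxima, x.max(axis=(1, 2))))     # True
```

Both np.argmax calls return the first occurrence when the maximum is repeated, which is why the two index arrays agree even with ties.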