I have a numpy array with the following shape (11617, 37). The data is multi class data, and to establish a baseline, I need to find which class (or classes) are the most common.
I have tried this formula and also this
import numpy as np
A = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
axis = 0
u, indices = np.unique(A, return_inverse=True)
answer = u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(A.shape),
None, np.max(indices) + 1), axis=axis)]
I need to find the most frequent combination of the 37 classes in my array
Expected output:
[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]
To find the most frequent combination (rows, which means axis=0), you can try this!
import numpy as np
A = np.array([[1,0,0,0],
[1,0,0,1],
[1,0,0,0]])
# count how often each distinct row occurs
unique_rows, counts = np.unique(A, return_counts=True, axis=0)
print(unique_rows[np.argmax(counts)])  # [1 0 0 0]
FYI, if the array in your question is your target variable, it is an example of multi-label data rather than multi-class data: in multi-class data each sample belongs to exactly one class, while in multi-label data a sample can carry several labels at once.
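Since the question asks for the most common class "or classes", here is a minimal sketch that returns every row tied for the highest count. The helper name most_common_rows and the small 4-column array are made up for illustration; only np.unique with axis=0 and return_counts=True comes from the answer above.

```python
import numpy as np

def most_common_rows(arr):
    """Return every row that attains the highest count, plus that count."""
    unique_rows, counts = np.unique(arr, axis=0, return_counts=True)
    top = counts.max()
    # A boolean mask keeps all rows whose count equals the maximum (ties included).
    return unique_rows[counts == top], int(top)

# Hypothetical toy data: two label combinations occur twice each.
A = np.array([[1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1]])
rows, count = most_common_rows(A)
print(rows)
print(count)
```

With np.argmax alone only the first of the tied rows would be reported, which may or may not be what you want for a baseline.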
You could try np.unique with the return_counts parameter:
from operator import itemgetter
import numpy as np
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
uniques, counts = np.unique(A, axis=0, return_counts=True)
idxmax, _ = max(zip(range(len(counts)), counts), key=itemgetter(1))
print(uniques[idxmax])
Output
[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]
You can use collections.Counter.most_common if you first convert each inner list to a tuple (lists are not hashable, so they cannot be counted directly):
from collections import Counter
A = [[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]]
c = Counter(tuple(x) for x in A)
print(c.most_common()[0]) # ((0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0), 2)
This returns a tuple containing the most common list and the number of occurrences.
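If the data is a NumPy array rather than a list of lists, the same idea might look like the sketch below. The 4-column array is a made-up stand-in for the real (11617, 37) data, and converting through tolist() is just one way to get hashable rows.

```python
from collections import Counter
import numpy as np

# Hypothetical toy data standing in for the real label matrix.
A = np.array([[0, 1, 0, 1],
              [0, 1, 0, 1],
              [0, 1, 0, 0]])

# Rows of an ndarray are not hashable, so convert each one to a tuple first.
row_counts = Counter(map(tuple, A.tolist()))
row, n = row_counts.most_common(1)[0]
most_common_row = np.array(row)  # back to an array for further NumPy work
print(most_common_row, n)
```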
A really quick and easy solution:
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
print(max(A, key=A.count))
Which prints:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
Note that A.count is called once for every row, so the runtime grows quadratically with the number of rows. If you need to pay attention to runtime or want to optimize your code, this is not the way to go. However, if you just need a quick solution, it might help to keep this one-liner in mind.
(A.tolist() gets you a list from a np.ndarray if you need that first.)
from collections import Counter
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
most_common = [Counter(i).most_common(1).pop()[0] for i in A]
most_common
[0, 0, 0]
How can I create a dotted line in the NumPy array below?
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
x=np.array( [ [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]])
def make_figure(inp_arr: np.ndarray, outputname):
    # create graphical output for visual check
    cmap = ListedColormap(['r', 'b', 'g'])
    plt.imshow(inp_arr, cmap=cmap)
    plt.grid(color='b', linestyle=':', linewidth=0.55)
    # input_folder must be defined beforehand (prefix of the output path)
    plt.savefig(input_folder + 'pics_' + str(outputname) + '.png', format='png', dpi=350)
    # plt.show()
    # plt.clf()
make_figure(x, 'gh')
Requirement: along each line of 1s, keep one element and set the next two to 0 (a step of two).
I tried a brute-force algorithm, but I was not able to find a solution.
The expected output array looks like this:
y=np.array( [ [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]])
The goal is a visual representation that looks like a dotted line.
Here's one way: find the minimum-weight path that visits every point, then take the first point, skip two points, and repeat until the end of the path.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph
from scipy import sparse
import networkx as nx
x = np.array( [ [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1] ] )
x_nonzeros = x.nonzero()
num_points = len(x_nonzeros[0])
x_coords = [[x_nonzeros[0][k], x_nonzeros[1][k]] for k in range(num_points)]
neighbors = radius_neighbors_graph(x_coords, radius=1.5, mode="distance")
G = nx.Graph(neighbors)
full_paths = [
{"path": path, "weight": nx.classes.path_weight(G, path, weight="weight")}
for path in nx.all_simple_paths(G, 0, 40) if len(path)==num_points
]
full_paths.sort(key=lambda rec: rec["weight"])
the_path = full_paths[0]["path"]
y_coords = [x_coords[coord] for coord in the_path[0::3]]
y = sparse.coo_array(([1]*len(y_coords),np.array(y_coords).T)).toarray()
print(y)
# [[1 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 1 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 1 0 0 1 0 0 1]
# [0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0]
# [1 0 0 1 0 0 1 0 0 1 0 0 1]
# [0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 1 0 0 1 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 1 0 0 1 0]]
I have a dataframe that looks like this:
feature target
0 2 0
1 0 0
2 0 0
3 0 0
4 1 0
... ... ...
33208 1 0
33209 0 0
33210 2 0
33211 2 0
33212 1 0
In the feature column there are 3 classes (0, 1, 2) and in the target column there are two classes (0, 1). If I group the dataframe by these two columns, I get:
df.groupby(['feature', 'target']).size()
feature target
0 0 4282
1 81
1 0 8537
1 37
2 0 20161
1 115
dtype: int64
Each feature class has 0s and 1s as target values. I need a way of sampling these values; my intention is to end up with something like this:
new_df.groupby(['feature', 'target']).size()
feature target
0 0 4282
1 81
1 0 4282
1 37
2 0 4282
1 115
dtype: int64
I need to sample the target values for each feature class. Any suggestions?
You have a different distribution for each value of feature.
You need to sample n values from the distribution that belongs to a given value of feature. Since there are two possible outcomes, this is a binomial distribution problem.
As far as I can see, the approach shown below also handles the case where target is not necessarily (0, 1); it could be anything (win vs. lose, team A vs. team B, and so forth):
import numpy as np
import pandas as pd
# this just reproduces your grouped end state
df = pd.DataFrame({"feature":[0, 0, 1, 1, 2, 2], "target":[0, 1, 0, 1, 0, 1], "number":[4282, 81, 4282, 37, 4282, 115]})
df = df.set_index(["feature", "target"])
def sample_values(feature, sample_size):
    # select one of the distributions by feature
    df_sub = df.loc[feature]
    (event1, number1), (event2, number2) = zip(df_sub.index, df_sub["number"].tolist())
    # draw event2 with probability number2 / (number1 + number2), otherwise event1
    return [event2 if np.random.binomial(1, number2/(number1+number2))==1 else event1 for _ in range(sample_size)]
print(sample_values(2, 100))
OUTPUT
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
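If the goal is instead to downsample the existing rows so that every (feature, target) group has at most a fixed size, as in the grouped output the question asks for, a sketch along these lines may help. It is not the binomial approach above: the helper cap_groups, the synthetic dataframe, and the choice of cap (the smallest target == 0 group) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the real data (hypothetical values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.integers(0, 3, size=3000),
    "target": rng.choice([0, 1], size=3000, p=[0.95, 0.05]),
})

def cap_groups(frame, cap, seed=0):
    """Keep at most `cap` rows per (feature, target) group, sampled at random."""
    parts = [
        group.sample(n=min(len(group), cap), random_state=seed)
        for _, group in frame.groupby(["feature", "target"])
    ]
    return pd.concat(parts).sort_index()

# Assumed cap: the size of the smallest target == 0 group.
cap = int(df[df["target"] == 0].groupby("feature").size().min())
new_df = cap_groups(df, cap)
print(new_df.groupby(["feature", "target"]).size())
```

Groups that are already smaller than the cap (here the target == 1 groups) are kept whole, which mirrors the 81 / 37 / 115 counts staying unchanged in the desired output.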
I have a series whose elements are lists of integers stored as strings, and I am attempting to turn it into an array. This is a small snippet of the list I am trying to convert:
['[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]',
'[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]',
'[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]',
'[0, 0, 0, 0, 0, 0, 0, 1, 0, 1]',
'[0, 0, 0, 0, 0, 0, 0, 1, 1, 1]']
I've tried to replace the quotes with .replace, but that hasn't worked out.
sequence = [i.replace(" '' ", ' ') for i in sequence]
You can use ast.literal_eval to convert each string into a list of ints:
from ast import literal_eval
sequence = [literal_eval(i) for i in sequence]
# [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]]
You can change it to numpy array
import numpy as np
array = np.asarray(sequence)
print(array)
output
[[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 0 1]
[0 0 0 0 0 0 0 1 1 1]]
Or flatten it into a 1-D pandas array
import pandas as pd
array = pd.array([item for items in sequence for item in items])
print(array)
Output
<IntegerArray>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
Length: 50, dtype: Int64
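Since the question mentions a series, the same conversion might be applied directly to a pandas Series, roughly as sketched below. The variable name series and its three sample strings are assumptions; the real data would take their place.

```python
from ast import literal_eval
import numpy as np
import pandas as pd

# Hypothetical Series holding stringified lists like those in the question.
series = pd.Series(['[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]',
                    '[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]',
                    '[0, 0, 0, 0, 0, 0, 0, 1, 1, 1]'])

parsed = series.apply(literal_eval)  # each element becomes a real list of ints
array = np.array(parsed.tolist())    # stack the equal-length lists into a 2-D array
print(array.shape)
```

This only yields a regular 2-D array when all the lists have the same length.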
I am trying to build a big matrix that contains copies of a smaller row matrix (of variable size) spread along the "diagonal" of the matrix. All the other values are 0. How do I create such a matrix?
I've tried np.put, np.append. Here's what I have so far:
t = [1,2,3]
n=3
m=4
A = np.zeros((2*m,m*n+m),dtype=int)
for i in range(m):
    A[i-1:i-1+t.shape[0], n*(i-1):n*(i-1)+t.shape[1]] += t
print("A= \n",np.matrix(A))
I want the following matrix (I'm sorry I don't know how to show matrix but if someone can help me with this too I would appreciate it a lot) :
A=
[[1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 2 3 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 2 3 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1 2 3 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
It causes the following error:
ValueError: operands could not be broadcast together with shapes (0,0) (1,3) (0,0)
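You can use careful reshaping like so:
import numpy as np
t = [1,2,3]
n=3
m=4
A = np.zeros((2*m,m*n+m),dtype=int)
# View the flat array as m rows that are n elements longer than a row of A,
# so row i of the view starts at A[i, n*i]; then write t at the start of each.
A.ravel()[:m*(m*n+m+n)].reshape(m,-1)[:,:len(t)] = t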
A
# array([[1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Make a boolean mask for the 12 positions and use it for assignment:
idx = np.zeros(A.shape, dtype=bool)
for i in range(m):
    idx[i, i*n:i*n+len(t)] = True
# t*m repeats the list m times, giving one value per True position
A[idx] = t*m
array([[1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
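For comparison, the loop from the question can also be repaired directly. Assuming t was a (1, 3) array as the error message suggests, the first iteration (i = 0) slices rows -1:0 and columns -3:0, which are both empty, hence the (0,0) shape in the ValueError. A minimal sketch that indexes from i instead of i-1 and uses a 1-D t:

```python
import numpy as np

t = np.array([1, 2, 3])
n = 3  # column offset between consecutive copies of t
m = 4  # number of copies
A = np.zeros((2 * m, m * n + m), dtype=int)

for i in range(m):
    # Row i receives t starting at column n*i.
    A[i, n * i:n * i + len(t)] = t

print(A)
```

This is slower than the reshape trick for large m, but it is probably the easiest version to read and adapt.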
I am a relative beginner to Python and, in order to strengthen my skills, I am attempting to write an interpreter for the Brainfu** language. Everything works except the bracket [] loops. The program I am using to test my code is >++[>++<-]>+, which should set cell 2 to 5. When I run it, however, I get this:
0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0 >
1 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1 +
2 [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 +
3 [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 [
4 [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0 >
5 [0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1 +
6 [0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 +
7 [0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 <
8 [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1 -
3 [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1 [
10 [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 >
11 [0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 3 +
[0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
(Each line shows the iteration index, then the list at that point, then the value under the pointer, and then the character being run.)
My current code is
def generateArray(code):
    array = []
    for i in range(0,20):
        array.append(0);
    return array
def run(code):
    print code
    data = generateArray(code)
    chars = list(code)
    pointer = 0
    for i in range(0, len(chars)):
        current = chars[i]
        if(current == "+"):
            data[pointer] += 1
        if(current == ">"):
            pointer += 1
        if(current == "-"):
            data[pointer] -= 1
        if(current == "<"):
            pointer -= 1
        if(current == "."):
            print str(chr(data[pointer]))
        if(current == ","):
            given = raw_input()
            data[pointer] = ord( given )
        if(current == "["):
            posOfEnd = chars[i:len(chars)].index("]")
            if(data[pointer] == 0):
                i += posOfEnd+1
        if(current == "]"):
            posOfBegin = len(chars) - 1 - chars[::-1].index('[')
            i = posOfBegin
        print i, data, data[pointer], chars[i]
    return data
print run(">++[>++<-]>+")
posOfEnd is trying to find out where the next bracket is, and posOfBegin is trying to find out where the previous bracket is.
I suppose the problem is your loop variable i which you modify during the loop:
i += posOfEnd+1
and
i = posOfBegin
However, Python for loops are different from their C/C++ counterparts. In Python, the variable i is set to each element of the iterable you provide, in this case range. In Python 2, range(n) evaluates to a list containing all numbers from 0 up to n-1. If you modify the loop variable during an iteration, the modification lasts only for that iteration; on the next iteration the loop variable is assigned the next element of the iterable, discarding your modification.
You might want to use a while loop instead.
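As a rough sketch of the while-loop idea (written for Python 3, unlike the Python 2 code in the question), the index i is advanced by hand so that jumps actually take effect. The bracket-matching table built with a stack is an addition of mine, not part of the original code; it also handles nested loops, which searching for the nearest bracket would not.

```python
def run(code, tape_len=20):
    """Interpret `code` on a zero-filled tape and return the tape."""
    # Precompute the position of each bracket's partner using a stack.
    match, stack = {}, []
    for pos, ch in enumerate(code):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            match[start], match[pos] = pos, start

    data = [0] * tape_len
    pointer = 0
    i = 0
    while i < len(code):
        ch = code[i]
        if ch == "+":
            data[pointer] += 1
        elif ch == "-":
            data[pointer] -= 1
        elif ch == ">":
            pointer += 1
        elif ch == "<":
            pointer -= 1
        elif ch == ".":
            print(chr(data[pointer]), end="")
        elif ch == ",":
            data[pointer] = ord(input()[:1] or "\0")
        elif ch == "[" and data[pointer] == 0:
            i = match[i]  # skip the loop body
        elif ch == "]" and data[pointer] != 0:
            i = match[i]  # jump back to the matching "["
        i += 1  # move past the current (or jumped-to) character
    return data

print(run(">++[>++<-]>+"))
```

With the test program from the question, cell 2 of the returned tape ends up at 5, as intended.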