My input is a list of y_true labels, where the element at position i is a value in the range 0..len(classes)-1 giving the true class of element i of the data set. i ranges from 0 to len(data)-1. Example below:
# 5 elements in data, 3 classes, all of which had representation in the data:
y_true = [0,2,1,0,1]
I want my output to be a list of lists that is len(data) by len(classes), where inner list i has a 1 in position y_true[i] and 0 in the other len(classes)-1 slots. Example:
#same configuration as the previous example
y_true = [0,2,1,0,1]
result = [[1,0,0],[0,0,1],[0,1,0],[1,0,0],[0,1,0]]
Here's how I'm initializing result:
result = np.zeros((len(y_true), max(y_true)+1))
However, I haven't been able to make any further progress. I tried using np.add.at(result, y_true, 1), and the same with y_true's shape flipped, but neither produced the result I wanted. What function(s) can achieve what I'm trying to do here?
Edit: For better clarity on what I want to achieve, I made it using a for loop:
result = np.zeros((len(y_true), max(y_true)+1))
for x in range(len(y_true)):
    result[x][y_true[x]] = 1
You can use fancy indexing:
result = np.zeros((len(y_true), max(y_true)+1), dtype=int)
result[np.arange(len(y_true)), y_true] = 1
output:
array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0],
[1, 0, 0],
[0, 1, 0]])
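As a minimal sketch, the fancy-indexing approach can be wrapped in a small helper. The function name one_hot and the explicit num_classes parameter are assumptions added for illustration, not part of the original answer. Passing the class count explicitly avoids relying on max(y_true)+1, which would produce too few columns if the highest class never appears in the data.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) integer array with one 1 per row."""
    labels = np.asarray(labels)
    result = np.zeros((labels.size, num_classes), dtype=int)
    # Pair row index i with column index labels[i] and set that cell to 1.
    result[np.arange(labels.size), labels] = 1
    return result

y_true = [0, 2, 1, 0, 1]
encoded = one_hot(y_true, num_classes=3)
print(encoded)
```

The two index arrays must have the same length; numpy pairs them element by element, so row i is only touched at column labels[i].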
alternative
an interesting alternative might be to use pandas.get_dummies:
import pandas as pd
# dtype is set explicitly because newer pandas versions return booleans by default
result = pd.get_dummies(y_true, dtype="uint8").to_numpy()
output:
array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0],
[1, 0, 0],
[0, 1, 0]], dtype=uint8)
I'm working with numpy and have a problem with indexing. I have a numpy array of zeros and a 2D array of indices, and I need to use those indices to set the corresponding entries of the array of zeros to 1. Here is what I tried, which doesn't work:
import numpy as np
idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]]) #Array of index
zeros = np.zeros(6) #Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1)) #This repeats the array of zeros to match the number of rows of the index array
res = []
for i, j in zip(repeat, idx):
    res.append(i[j] = 1) #Here I try to replace the matching index by the value of 1
output = np.array(res)
but I get the syntax error
expression cannot contain assignment, perhaps you meant "=="?
my desired output should be
output = [[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]]
This is just an example; the idx array can be bigger. I think the problem is the indexing, and I believe there is a much simpler way of doing this without repeating the array of zeros and using zip, but I can't figure it out. Any help would be appreciated, thank you!
EDIT: When I change = to == I get a boolean array, which I don't need, so I don't know what's happening there either.
You can use np.put_along_axis to assign values into the array repeat based on indices in idx. This is more efficient than a loop (and easier).
import numpy as np
idx = np.array([[0, 3, 4],
[1, 3, 5],
[0, 4, 5]]) #Array of index
zeros = np.zeros(6).astype(int) #Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1))
np.put_along_axis(repeat, idx, 1, 1)
repeat will then be:
array([[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]])
FWIW, you can also make the array of zeros directly by passing in the shape:
np.zeros([idx.shape[0], 6])
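Putting the two suggestions together, a minimal sketch might look like the following. The name n_cols is an assumption introduced here for the width of the output, and the fancy-indexing variant at the end is an alternative added for comparison rather than something from the answer above.

```python
import numpy as np

idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])
n_cols = 6  # assumed width of the output; must exceed the largest index

# Allocate the 2D array of zeros directly instead of tiling a 1D one.
out = np.zeros((idx.shape[0], n_cols), dtype=int)
# For each row r, set out[r, idx[r, k]] = 1 for every k.
np.put_along_axis(out, idx, 1, axis=1)

# Alternative: broadcast a column of row numbers against idx.
alt = np.zeros((idx.shape[0], n_cols), dtype=int)
alt[np.arange(idx.shape[0])[:, None], idx] = 1

print(out)
```

Both variants modify the array in place and avoid the Python-level loop over rows.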
I'm trying to solve the following Python interview question using pandas:
Given a m x n matrix, if an element is 0, set its entire row and column to 0. Do it in-place.
The solution must not use enumerate.
Here are some examples:
Example 1
[[1, 1, 1], [1, 0, 1], [1, 1, 1]] # input
[[1, 0, 1], [0, 0, 0], [1, 0, 1]] # output
Example 2
[[0, 1, 2, 0], [3, 4, 5, 2], [1, 3, 1, 5]] # input
[[0, 0, 0, 0], [0, 4, 5, 0], [0, 3, 1, 0]] # output
You can try this:
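import pandas as pd
lst = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
df = pd.DataFrame(lst)
# Work on a copy so both masks are computed from the original values
df_result = df.copy(deep=True)
# Zero every row that contains a 0, then every column that contains a 0
df_result.loc[df.eq(0).any(axis=1)] = 0
df_result.loc[:, df.eq(0).any(axis=0)] = 0
result = df_result.values.tolist()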
output:
[[1, 0, 1], [0, 0, 0], [1, 0, 1]]
Using only built-in Python functions:
# Example data (list)
lst = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
# For each row, if any of the values in the row is 0, replace all the values with 0
# Obs: I'm using a `list comprehension` to make the code shorter
for row in lst:
    if any([value==0 for value in row]):
        row[:] = [0] * len(row)
Using numpy:
# Import and create the array from the list
import numpy as np
a = np.array(lst)
# Set zeros in-place
a[(a==0).any(axis=1), :] = 0
Using pandas:
# Import and create the dataframe from the list
import pandas as pd
df = pd.DataFrame(lst)
# Set zeros in-place
df.loc[df.eq(0).any(axis=1), :] = 0
The output is the same for all of them: a row becomes all zeros if it originally contained at least one zero. The same logic is applied in every example here. As you're still learning nested lists in Python, I would recommend continuing your studies with Python's built-in classes, methods, and functions. Afterwards you may want to look at how indexing works in numpy and pandas to get a better understanding of the code here.
Output:
print(lst)
[[1, 1, 1], [0, 0, 0], [1, 1, 1]]
print(a)
[[1 1 1]
[0 0 0]
[1 1 1]]
# ignore the first line and column,
# as they indicate the row and column names, respectively:
print(df)
   0  1  2
0  1  1  1
1  0  0  0
2  1  1  1
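The snippets in this answer zero only the rows, while the task as stated also zeroes the columns. A minimal numpy sketch that handles both is shown here; the function name zero_rows_and_cols is hypothetical. The key point is that both masks are computed before anything is overwritten, so zeros written during the update cannot spread further.

```python
import numpy as np

def zero_rows_and_cols(matrix):
    """Set every row and column that contains a 0 to 0, modifying matrix in place."""
    is_zero = (matrix == 0)
    # Compute both masks from the original values before writing anything.
    zero_rows = is_zero.any(axis=1)
    zero_cols = is_zero.any(axis=0)
    matrix[zero_rows, :] = 0
    matrix[:, zero_cols] = 0
    return matrix

a = np.array([[0, 1, 2, 0], [3, 4, 5, 2], [1, 3, 1, 5]])
zero_rows_and_cols(a)
print(a.tolist())
```

No enumerate is needed because the boolean masks select the affected rows and columns directly.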
Given 3-dimensional boolean data:
import numpy as np
np.random.seed(13)
bool_data = np.random.randint(2, size=(2,3,6))
>> bool_data
array([[[0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1]],

       [[1, 0, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0, 0]]])
I wish to count the number of consecutive 1's bounded by two 0's in each row (i.e. along the last axis) and return a single array with the tally. For bool_data, this would give array([1, 1, 2, 4]).
Due to the 3D structure of bool_data and the variable tallies for each row, I had to clumsily convert the tallies into nested lists, flatten them using itertools.chain, then back-convert the list into an array:
# count consecutive 1's bounded by two 0's
def count_consect_ones(input):
    return np.diff(np.where(input==0)[0])-1
# run tallies across all rows in bool_data
consect_ones = []
for i in range(len(bool_data)):
    for j in range(len(bool_data[i])):
        res = count_consect_ones(bool_data[i, j])
        consect_ones.append(list(res[res!=0]))
>> consect_ones
[[], [1, 1], [], [2], [4], []]
# combines nested lists
from itertools import chain
consect_ones_output = np.array(list(chain.from_iterable(consect_ones)))
>> consect_ones_output
array([1, 1, 2, 4])
Is there a more efficient or clever way for doing this?
consect_ones.append(list(res[res!=0]))
If you use .extend instead, the content of the sequence is appended directly. That saves the step to combine the nested lists afterwards:
consect_ones.extend(res[res!=0])
Furthermore, you could skip the indexing, and iterate over the dimensions directly:
consect_ones = []
for i in bool_data:
    for j in i:
        res = count_consect_ones(j)
        consect_ones.extend(res[res!=0])
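For reference, here is a self-contained sketch of the .extend version. Two things are assumptions added here rather than part of the answer: the data is written out literally instead of generated from a seed, and the nested loop is replaced by a reshape to a 2D stack of rows, which works for any number of leading dimensions.

```python
import numpy as np

bool_data = np.array([[[0, 0, 0, 0, 0, 0],
                       [0, 1, 0, 0, 1, 0],
                       [0, 0, 0, 0, 0, 1]],
                      [[1, 0, 1, 1, 0, 0],
                       [0, 1, 1, 1, 1, 0],
                       [1, 1, 1, 0, 0, 0]]])

def count_consect_ones(row):
    # Gaps between neighbouring zeros, minus one, give the number of 1s between them.
    return np.diff(np.flatnonzero(row == 0)) - 1

consect_ones = []
# Flatten all leading dimensions so each iteration sees one row of the last axis.
for row in bool_data.reshape(-1, bool_data.shape[-1]):
    res = count_consect_ones(row)
    consect_ones.extend(res[res != 0])

consect_ones_output = np.array(consect_ones)
print(consect_ones_output)
```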
We could use a trick to pad the columns with zeros and then look for ramp-up and ramp-down indices on a flattened version and finally filter out the indices corresponding to the border ones to give ourselves a vectorized solution, like so -
# Input 3D array : a
b = np.pad(a, ((0,0),(0,0),(1,1)), 'constant', constant_values=(0,0))
# Get ramp-up and ramp-down indices/ start-end indices of 1s islands
s0 = np.flatnonzero(b[...,1:]>b[...,:-1])
s1 = np.flatnonzero(b[...,1:]<b[...,:-1])
# Filter only valid ones that are not at borders
n = b.shape[2]
valid_mask = (s0%(n-1)!=0) & (s1%(n-1)!=a.shape[2])
out = (s1-s0)[valid_mask]
Explanation -
The idea behind padding each row with a zero at either end as a "sentinel" is that comparing the two offset-by-one slices of the padded array detects the ramp-up and ramp-down places, with b[...,1:]>b[...,:-1] and b[...,1:]<b[...,:-1] respectively. This gives s0 and s1 as the start and end indices of each island of 1s in the flattened comparison array. Each row of that comparison array has n-1 elements, so s0%(n-1) and s1%(n-1) recover the column position within a row. We don't want the border islands, so we remove every case where an island starts at the left border or ends at the right border, i.e. where s0%(n-1) is 0 or s1%(n-1) is a.shape[2]. What remains are the valid islands. The island lengths are s1-s0, so we index that with valid_mask to get the desired output.
Sample input, output -
In [151]: a
Out[151]:
array([[[0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1]],

       [[1, 0, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0, 0]]])
In [152]: out
Out[152]: array([1, 1, 2, 4])
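The vectorized steps can be collected into one function, sketched below. The function name bounded_island_lengths and the generalization to any number of leading dimensions are assumptions added here; the padding and masking logic follows the explanation above.

```python
import numpy as np

def bounded_island_lengths(a):
    """Lengths of runs of 1s along the last axis that have a 0 on both sides."""
    a = np.asarray(a).astype(int)
    width = a.shape[-1]
    # Pad only the last axis with one zero on each side.
    pad = [(0, 0)] * (a.ndim - 1) + [(1, 1)]
    b = np.pad(a, pad, mode="constant", constant_values=0)
    step = b[..., 1:] - b[..., :-1]   # last axis has width + 1 entries
    starts = np.flatnonzero(step > 0)  # ramp-up positions
    ends = np.flatnonzero(step < 0)    # ramp-down positions
    # Drop islands that touch the left border (column 0) or right border (column width).
    valid = (starts % (width + 1) != 0) & (ends % (width + 1) != width)
    return (ends - starts)[valid]

a = np.array([[[0, 0, 0, 0, 0, 0],
               [0, 1, 0, 0, 1, 0],
               [0, 0, 0, 0, 0, 1]],
              [[1, 0, 1, 1, 0, 0],
               [0, 1, 1, 1, 1, 0],
               [1, 1, 1, 0, 0, 0]]])
out = bounded_island_lengths(a)
print(out)
```

Every island contributes exactly one ramp-up and one ramp-down, so starts and ends always have the same length and line up pairwise.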
I am trying to extract the full set of indices into an N-dimensional cube, and it seems like np.mgrid is just what I need for that. For example, np.mgrid[0:4,0:4] produces a 4 by 4 matrix containing all the indices into an array of the same shape.
The problem is that I want to do this in an arbitrary number of dimensions, based on the shape of another array. I.e. if I have an array a of arbitrary dimension, I want to do something like idx = np.mgrid[0:a.shape], but that syntax is not allowed.
Is it possible to construct the slice I need for np.mgrid to work? Or is there perhaps some other, elegant way of doing this? The following expression does what I need, but it is rather complicated and probably not very efficient:
np.reshape(np.array(list(np.ndindex(a.shape))),list(a.shape)+[len(a.shape)])
I usually use np.indices:
>>> a = np.arange(2*3).reshape(2,3)
>>> np.mgrid[:2, :3]
array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])
>>> np.indices(a.shape)
array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])
>>> a = np.arange(2*3*5).reshape(2,3,5)
>>> (np.mgrid[:2, :3, :5] == np.indices(a.shape)).all()
True
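Note that np.indices puts the coordinate axis first, whereas the np.ndindex expression in the question puts it last. If the question's layout is needed, moving the first axis to the end should give the same array. A small sketch, where the use of np.moveaxis is an addition not taken from the answer:

```python
import numpy as np

a = np.zeros((2, 3, 5))

grid = np.indices(a.shape)            # shape (3, 2, 3, 5): coordinate axis first
idx_last = np.moveaxis(grid, 0, -1)   # shape (2, 3, 5, 3): coordinate axis last

# The expression from the question, used here only as a reference.
reference = np.reshape(np.array(list(np.ndindex(a.shape))),
                       list(a.shape) + [len(a.shape)])

print(idx_last.shape)
print(idx_last[1, 2, 4])  # the index triple stored at position (1, 2, 4)
```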
I believe the following does what you're asking:
>>> a = np.random.random((1, 2, 3))
>>> np.mgrid[tuple(map(slice, a.shape))]
array([[[[0, 0, 0],
         [0, 0, 0]]],


       [[[0, 0, 0],
         [1, 1, 1]]],


       [[[0, 1, 2],
         [0, 1, 2]]]])
It produces exactly the same result as np.mgrid[0:1,0:2,0:3] except that it uses a's shape instead of hard-coded dimensions. (In Python 3, map returns an iterator, so it has to be converted to a tuple before indexing.)
new = zero(rows_A,cols_B)
for i in range(rows_A):
    for j in range(cols_B):
        new[i][j] += np.sum(A[i] * B[:,j])
If I use an array of the form [[0, 0, 0], [0, 1, 0], [0, 2, 1]] as B,
I get the error
TypeError: list indices must be integers, not tuple
but if I use the same array in place of A, it works fine.
The array I get back looks like this:
[[0, 0, 0], [0, 1, 0], [0, 2, 1]]
so I want to convert it into this form
[[0 0 0]
[0 1 0]
[0 2 1]]
numpy.asarray will do that.
import numpy as np
B = np.asarray([[0, 0, 0], [0, 1, 0], [0, 2, 1]])
This produces
array([[0, 0, 0],
[0, 1, 0],
[0, 2, 1]])
which can be indexed with [:, j].
Also, it looks like you're trying to do a matrix product. You can do the same thing with just one line of code using np.dot:
new = np.dot(A, B)
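A short sketch comparing the explicit loop with np.dot. The values of A are made up for illustration, since the question does not show them.

```python
import numpy as np

A = np.asarray([[1, 2, 3], [4, 5, 6]])            # hypothetical example values
B = np.asarray([[0, 0, 0], [0, 1, 0], [0, 2, 1]])

rows_A, cols_B = A.shape[0], B.shape[1]
new = np.zeros((rows_A, cols_B), dtype=A.dtype)
for i in range(rows_A):
    for j in range(cols_B):
        # B[:, j] works because B is now an ndarray rather than a list of lists.
        new[i, j] = np.sum(A[i] * B[:, j])

product = np.dot(A, B)
print(product)
```

With B left as a plain list, the expression B[:, j] is what raises the TypeError from the question.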
It appears that B is a list. You can't index it as B[:,j], because the index is implicitly passed to __getitem__ as (slice(None,None,None), j), i.e. a tuple.
You could convert B to a numpy array first (B = np.array(B)) and then go from there ...