I wonder if there is a way to make MultiLabelBinarizer in sklearn produce output with a specific number of dimensions (columns). For example, we have the code below:
from sklearn.preprocessing import MultiLabelBinarizer
y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
MultiLabelBinarizer().fit_transform(y)
We get 5 dimensions because the labels present are 0, 1, ..., 4:
array([[0, 0, 1, 1, 1],
[0, 0, 1, 0, 0],
[1, 1, 0, 1, 0],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 0]])
My question is: how can we get a specific number of dimensions for this array, for example 6, so that the result is:
array([[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 0, 0],
[1, 1, 0, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]])
Is there a way to do this in sklearn, or with another Python method or module that handles this easily, or do we have to build this kind of array with our own algorithm?
Any ideas for this will be much appreciated. Thanks.
MultiLabelBinarizer accepts a parameter classes where you can indicate the ordering of the classes to be found. Providing a class that is not in the original array will add an extra dimension of 0 entries:
from sklearn.preprocessing import MultiLabelBinarizer
y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
MultiLabelBinarizer(classes=[0, 1, 2, 3, 4, 5]).fit_transform(y)
# output
>>>[[0 0 1 1 1 0]
[0 0 1 0 0 0]
[1 1 0 1 0 0]
[1 1 1 1 1 0]
[1 1 1 0 0 0]]
Note that since the parameter is actually meant to indicate the ordering of the classes, the sequence you provide is important. Further, when providing too few classes the unknown classes will be ignored and not appear in the transformed array.
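For integer labels in the range 0..n-1, the same fixed-width indicator matrix can also be built directly with NumPy, which addresses the "own algorithm" part of the question. This is only a minimal sketch: binarize_fixed_width is a hypothetical helper name, not part of sklearn, and it assumes every label is a valid column index.

```python
import numpy as np

def binarize_fixed_width(label_lists, n_classes):
    """Build a 0/1 indicator matrix with exactly n_classes columns.

    Hypothetical helper; assumes all labels are integers in [0, n_classes).
    """
    out = np.zeros((len(label_lists), n_classes), dtype=int)
    for row, labels in enumerate(label_lists):
        # Set the columns named by this sample's labels to 1
        out[row, np.asarray(labels, dtype=int)] = 1
    return out

y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
result = binarize_fixed_width(y, 6)
print(result)
```

Unlike the classes parameter, this sketch raises an IndexError for a label outside the range instead of ignoring it.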
Related
I want to create this from multiple arrays, best using NumPy:
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1
However, I would prefer to use a library to create this. How do I go about doing that?
Note: NumPy can be used to create the array as well.
There are a lot of answers on SO, but none of them use libraries, and I haven't been able to find anything online that produces this!
You can use np.tril:
>>> np.tril(np.ones((6, 6), dtype=int))
array([[1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1]])
Using numpy.tri
Syntax:
numpy.tri(N, M=None, k=0, dtype=<class 'float'>, *, like=None)
Basically it creates an array with 1's at and below the given diagonal and 0's elsewhere.
Example:
import numpy as np
np.tri(6, dtype=int)
>>>
array([[1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1]])
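The M and k parameters from the signature above control the number of columns and which diagonal is the boundary. The following sketch shows their effect on small arrays; the variable names are just illustrative.

```python
import numpy as np

# Ones at and below the main diagonal (k=0 is the default)
lower = np.tri(4, dtype=int)

# k=-1 moves the boundary one diagonal down: strictly lower triangle
strict = np.tri(4, k=-1, dtype=int)

# M sets the column count; k=1 includes the first superdiagonal
wide = np.tri(3, 5, k=1, dtype=int)

print(lower)
print(strict)
print(wide)
```

For a square array with k=0, this gives the same values as np.tril(np.ones((N, N), dtype=int)).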
I have an array of numbers between 0 and 3 and I want to create a 2D array of their binary digits.
In the future I may need arrays of numbers between 0 and 7, or between 0 and 15.
Currently my array is defined like this:
a = np.array([[0], [1], [2], [3]], dtype=np.uint8)
I used numpy unpackbits function:
b = np.unpackbits(a, axis=1)
and the result is this :
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1]], dtype=uint8)
As you can see, it created a 2D array with 8 columns, while I'm looking for a 2D array with 2 columns.
Here is my desired array:
array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
Is this related to the uint8 data type?
Any ideas?
One way of approaching the problem is to adapt your b to match the desired output with a simple slice, similar to what is suggested in GrzegorzSkibinski's answer:
import numpy as np
def gen_bits_by_val(values):
    # Number of bits needed to represent the largest value
    n = int(values.max()).bit_length()
    return np.unpackbits(values, axis=1)[:, -n:].copy()
print(gen_bits_by_val(a))
# [[0 0]
# [0 1]
# [1 0]
# [1 1]]
Alternatively, you could create a look-up table, similar to what is suggested in WarrenWeckesser's answer, using the following:
import numpy as np
def gen_bits_by_num(n):
    # All n-bit values (n <= 8) as a column vector of uint8
    values = np.arange(2 ** n, dtype=np.uint8).reshape(-1, 1)
    return np.unpackbits(values, axis=1)[:, -n:].copy()
bits2 = gen_bits_by_num(2)
print(bits2)
# [[0 0]
# [0 1]
# [1 0]
# [1 1]]
which allows for all the uses indicated there, e.g.:
bits4 = gen_bits_by_num(4)
print(bits4[[1, 3, 12]])
# [[0 0 0 1]
# [0 0 1 1]
# [1 1 0 0]]
EDIT
Considering PaulPanzer's answer, the line:
return np.unpackbits(values, axis=1)[:, -n:]
has been replaced with:
return np.unpackbits(values, axis=1)[:, -n:].copy()
which is more memory efficient.
It could have been replaced with:
return np.unpackbits(values << (8 - n), axis=1, count=n)
with similar effects.
You can use the count keyword. It cuts from the right so you also have to shift bits before applying unpackbits.
b = np.unpackbits(a<<6, axis=1, count=2)
b
# array([[0, 0],
# [0, 1],
# [1, 0],
# [1, 1]], dtype=uint8)
This produces a "clean" array:
b.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : True
# WRITEABLE : True
# ALIGNED : True
# WRITEBACKIFCOPY : False
# UPDATEIFCOPY : False
In contrast, slicing the full 8-column output of unpackbits is in a sense a memory leak because the discarded columns will stay in memory as long as the slice lives.
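The difference between the two approaches can be inspected through the array flags. The sketch below compares a plain slice of the full unpackbits output with the shifted count variant; the names sliced and counted are illustrative only.

```python
import numpy as np

a = np.array([[0], [1], [2], [3]], dtype=np.uint8)

# Slice of the full 8-column result: a view that keeps its parent alive
sliced = np.unpackbits(a, axis=1)[:, -2:]

# Shift the 2 relevant bits to the top, then unpack only 2 columns
counted = np.unpackbits(a << 6, axis=1, count=2)

print(np.array_equal(sliced, counted))
print(sliced.flags['OWNDATA'], sliced.base is not None)
print(counted.flags['OWNDATA'])
```

Calling .copy() on the slice, as in the other answer, also yields an array that owns its data, at the cost of one extra allocation.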
You can truncate b to keep only the columns from the first column that contains a 1 onward:
b = b[:, int(np.argwhere(b.max(axis=0) == 1)[0, 0]):]
For such a small number of bits, you can use a lookup table.
For example, here bits2 is an array with shape (4, 2) that holds the bits of the integers 0, 1, 2, and 3. Index bits2 with the values from a to get the bits:
In [43]: bits2 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
In [44]: a = np.array([[0], [1], [2], [3]], dtype=np.uint8)
In [45]: bits2[a[:, 0]]
Out[45]:
array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
This works fine for 3 or 4 bits, too:
In [46]: bits4 = np.array([[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 0], [
...: 0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 0,
...: 1, 1], [1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 1, 1]])
In [47]: bits4
Out[47]:
array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 1, 1],
[0, 1, 0, 0],
[0, 1, 0, 1],
[0, 1, 1, 0],
[0, 1, 1, 1],
[1, 0, 0, 0],
[1, 0, 0, 1],
[1, 0, 1, 0],
[1, 0, 1, 1],
[1, 1, 0, 0],
[1, 1, 0, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]])
In [48]: x = np.array([0, 1, 5, 14, 9, 8, 15])
In [49]: bits4[x]
Out[49]:
array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 1, 0, 1],
[1, 1, 1, 0],
[1, 0, 0, 1],
[1, 0, 0, 0],
[1, 1, 1, 1]])
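Typing the table by hand gets tedious as the bit count grows. One way to generate it, sketched here with shifts and masks rather than unpackbits, is shown below; make_bit_table is a hypothetical helper name.

```python
import numpy as np

def make_bit_table(n_bits):
    """Return a (2**n_bits, n_bits) table; row v holds the bits of v, MSB first.

    Hypothetical helper using shift-and-mask instead of np.unpackbits.
    """
    values = np.arange(2 ** n_bits)
    shifts = np.arange(n_bits - 1, -1, -1)  # most significant bit first
    return (values[:, None] >> shifts) & 1

bits2 = make_bit_table(2)
bits4 = make_bit_table(4)

x = np.array([0, 1, 5, 14, 9, 8, 15])
print(bits2)
print(bits4[x])
```

Because this does not go through uint8, the same function also works for more than 8 bits, although the table size doubles with every added bit.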
I have tried the code below:
counts=[[[(col.count(i)) for i in range(1,n)] for col in matrix] for matrix in lists]
print(counts)
and the code gives me this output:
[[[1, 1, 1, 1, 0], [1, 1, 1, 0, 1], [1, 1, 0, 1, 1]], [[4, 0, 0, 0, 0], [1, 2, 1, 0, 0], [0, 0, 1, 3, 0]]]
Here each matrix is a list of 3 lists with 5 elements each. I want 5 lists with 3 elements each. The output should look like:
[[[1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1], [0, 1, 1]], [[4, 0, 0], [0, 0, 1], [2, 1, 0], [0, 0, 0], [1, 3, 0]]]
What should be done to the 'counts' list to get this desired output?
You don't really transpose anything as far as I can tell; this is just a reshaping. NumPy lets you do that this way:
import numpy as np
lst = [[[1, 1, 1, 1, 0], [1, 1, 1, 0, 1], [1, 1, 0, 1, 1]],
[[4, 0, 0, 0, 0], [1, 2, 1, 0, 0], [0, 0, 1, 3, 0]]]
arr = np.array(lst)
res = arr.reshape((2, 5, 3))
which gives:
[[[1 1 1]
[1 0 1]
[1 1 0]
[1 1 1]
[0 1 1]]
[[4 0 0]
[0 0 1]
[2 1 0]
[0 0 0]
[1 3 0]]]
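Reshaping and transposing both produce shape (2, 5, 3) here but arrange the values differently, so it is worth checking which one matches the desired output. A short sketch comparing the two on the same data (tolist() converts back to nested Python lists):

```python
import numpy as np

lst = [[[1, 1, 1, 1, 0], [1, 1, 1, 0, 1], [1, 1, 0, 1, 1]],
       [[4, 0, 0, 0, 0], [1, 2, 1, 0, 0], [0, 0, 1, 3, 0]]]
arr = np.array(lst)

# reshape keeps the flat element order and only regroups it into rows of 3
reshaped = arr.reshape((2, 5, 3))

# transpose swaps the last two axes, so columns become rows
transposed = arr.transpose(0, 2, 1)

print(reshaped.tolist())
print(transposed.tolist())
print(np.array_equal(reshaped, transposed))
```

Only the reshaped version reproduces the output listed in the question; the transposed one would be the right choice if the rows were meant to correspond to the counted values instead.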
If I have an array of data like this:
[[1, 1, 0, 0, 0, 1, 1],
[1, 0, 0, 0, 0, 1, 1],
[1, 0, 1, 1, 0, 0, 1],
[0, 0, 1, 1, 0, 0, 0]]
How do I cluster each grouping of 1s and assign each grouping of 1s a count such that I get an array like this:
[[1, 1, 0, 0, 0, 2, 2],
[1, 0, 0, 0, 0, 2, 2],
[1, 0, 3, 3, 0, 0, 2],
[0, 0, 3, 3, 0, 0, 0]]
Basically trying to identify each cluster of data points and assign that cluster of data points a specific value identifying it.
The skimage.measure.label() function (as already mentioned by Aaron) should give exactly the result you're looking for:
import numpy as np
import skimage.measure
# Initialize example array
arr = np.array([
[1, 1, 0, 0, 0, 1, 1],
[1, 0, 0, 0, 0, 1, 1],
[1, 0, 1, 1, 0, 0, 1],
[0, 0, 1, 1, 0, 0, 0],
])
# Label connected regions
result = skimage.measure.label(arr)
print(result)
# Output:
# [[1 1 0 0 0 2 2]
# [1 0 0 0 0 2 2]
# [1 0 3 3 0 0 2]
# [0 0 3 3 0 0 0]]
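To illustrate what connected-region labelling does, here is a minimal breadth-first flood fill in plain Python and NumPy. It is not how skimage implements label(); label_regions is a hypothetical name, and the sketch assumes a 2D array where diagonal neighbours count as connected.

```python
from collections import deque

import numpy as np

def label_regions(grid):
    """Label groups of connected nonzero cells 1, 2, 3, ... in scan order.

    Hypothetical helper; treats the 8 surrounding cells as neighbours.
    """
    grid = np.asarray(grid)
    n_rows, n_cols = grid.shape
    labels = np.zeros(grid.shape, dtype=int)
    current = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r, c] == 0 or labels[r, c] != 0:
                continue  # background, or already part of a labelled region
            current += 1
            labels[r, c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                        if inside and grid[ny, nx] != 0 and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels

arr = np.array([
    [1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 0, 0],
])
result = label_regions(arr)
print(result)
```

For large arrays a library routine will be much faster than this Python loop.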
I have a list of 5 matrices:
import numpy as np
import pandas as pd
a = [np.random.randint(2, size=(2, 3)) for i in range(5)]
How do I create a pandas DataFrame of 5 records with a single column containing a matrix for each row?
You can create the DataFrame by running:
df= pd.DataFrame({'array':a})
Output :
array
0 [[0, 0, 0], [0, 0, 0]]
1 [[0, 1, 1], [0, 0, 0]]
2 [[1, 0, 0], [0, 1, 1]]
3 [[1, 0, 1], [1, 0, 0]]
4 [[0, 0, 0], [0, 0, 1]]
If you want to apply cumsum to each matrix in the column, you can use apply. Note that np.cumsum without an axis argument flattens each matrix first:
df['array']=df['array'].apply(np.cumsum)
output:
array
0 [0, 0, 0, 0, 0, 0]
1 [0, 1, 2, 2, 2, 2]
2 [1, 1, 1, 1, 2, 3]
3 [1, 1, 2, 3, 3, 3]
4 [0, 0, 0, 0, 0, 1]
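The flattened result comes from np.cumsum's default of axis=None. The sketch below shows the three variants on one of the matrices above, using NumPy only; a function such as lambda m: np.cumsum(m, axis=1) could presumably be passed to apply in the same way to keep the 2x3 shape.

```python
import numpy as np

m = np.array([[0, 1, 1], [0, 0, 0]])

# Default: the matrix is flattened, then summed cumulatively
flat = np.cumsum(m)

# axis=1: cumulative sum along each row, shape stays (2, 3)
by_row = np.cumsum(m, axis=1)

# axis=0: cumulative sum down each column, shape stays (2, 3)
by_col = np.cumsum(m, axis=0)

print(flat)
print(by_row)
print(by_col)
```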