Clustering histograms of different lengths in Python - python

How can I cluster a large dataset of histograms with the same # of bins (8), but of different lengths? Specifically, I'd like to cluster their density distributions. I think I can do this with kmeans or hierarchical clustering, but it seems that the lengths are an issue or my setup is causing ValueError: setting an array element with a sequence.
hist_data[:, 1]
array([
array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6,
6, 6, 6, 6, 6, 6, 5, 4, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5,
5], dtype=int64),
...,
array([6, 6, 6, 6, 6, 6, 6, 5, 5, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 4, 4, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 5, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 5, 5, 4, 4, 3, 3, 3, 3, 4, 5, 6, 6, 6, 6, 6,
6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5], dtype=int64)], dtype=object)

Does this fit the bill:
import numpy as np
from sklearn.cluster import KMeans
data = hist_data[:, 1]
data = np.array([np.bincount(datum, minlength=9) for datum in data])
km = KMeans(n_clusters = 10, init="k-means++").fit(data) # 10 clusters
print(km.cluster_centers_) # cluster centres, 10x9 array (minlength=9 gives 9 columns when all values are in 0..8)
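Since the question asks about density distributions, the raw counts can be normalised so that sequences of different lengths become comparable. The following is a minimal sketch, not part of the original answer: `sequences` is a hypothetical stand-in for `hist_data[:, 1]`, and it assumes every value lies in 0..8.

```python
import numpy as np

# Hypothetical stand-in for hist_data[:, 1]: integer sequences of different lengths.
sequences = [
    np.array([3, 3, 4, 5, 6, 6, 6, 6]),
    np.array([6, 6, 5, 3, 3]),
    np.array([5, 5, 5, 5, 5, 5, 4, 3, 3, 6, 6, 6]),
]

n_bins = 9  # assumption: all values are in 0..8, as with minlength=9
counts = np.array([np.bincount(seq, minlength=n_bins) for seq in sequences])

# Divide each row by its total so every row is a density that sums to 1.
densities = counts / counts.sum(axis=1, keepdims=True)

print(densities.shape)        # (3, 9)
print(densities.sum(axis=1))  # every entry is 1 (up to floating point)
```

The resulting `densities` array has a fixed number of columns, so it could be passed to `KMeans(...).fit(densities)` in place of the raw counts.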

Related

Confusion matrix: ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets

I'm working on a multiclass classification problem (12 classes), but I can't create the confusion matrix.
What I'm trying to do is:
from sklearn.metrics import multilabel_confusion_matrix
pred = model_BiLSTM.predict(X_val)
y_unique = np.unique(y_val)
mcm = multilabel_confusion_matrix(y_val, pred, labels=y_unique)
But as per the title, the returned error is: "ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets", and the same happens when using confusion_matrix().
Instead, this is the output of
y_val, pred, y_val.shape, pred.shape
16/16 [==============================] - 0s 6ms/step
(array([11, 8, 3, 3, 9, 9, 9, 3, 3, 1, 3, 6, 6, 6, 2, 9, 1,
3, 10, 6, 9, 2, 9, 4, 3, 9, 6, 9, 3, 3, 3, 3, 8, 1,
9, 9, 9, 9, 2, 1, 9, 3, 9, 3, 9, 9, 2, 10, 3, 2, 9,
3, 8, 6, 8, 9, 9, 6, 6, 9, 8, 7, 9, 2, 9, 9, 2, 3,
9, 9, 2, 6, 3, 7, 11, 9, 9, 2, 11, 6, 7, 11, 6, 9, 2,
6, 9, 2, 9, 9, 3, 6, 9, 1, 11, 4, 4, 2, 6, 2, 9, 3,
10, 3, 2, 9, 9, 10, 6, 3, 9, 9, 6, 8, 6, 9, 4, 6, 5,
9, 6, 6, 3, 3, 3, 9, 6, 2, 9, 11, 6, 9, 3, 9, 6, 9,
2, 9, 3, 9, 6, 1, 6, 9, 9, 8, 3, 2, 9, 2, 8, 9, 9,
3, 10, 2, 4, 9, 9, 2, 8, 3, 6, 9, 9, 9, 6, 2, 2, 9,
7, 9, 3, 6, 7, 2, 9, 2, 9, 10, 9, 2, 9, 5, 7, 6, 5,
6, 3, 9, 7, 9, 3, 11, 5, 3, 3, 0, 8, 3, 9, 5, 9, 10,
9, 3, 3, 11, 2, 1, 8, 6, 2, 9, 3, 6, 2, 8, 9, 2, 8,
3, 3, 9, 6, 2, 1, 9, 9, 2, 2, 10, 9, 1, 6, 9, 9, 2,
9, 5, 6, 3, 9, 7, 6, 9, 9, 6, 3, 3, 2, 3, 9, 6, 9,
9, 9, 9, 9, 9, 9, 2, 9, 9, 9, 9, 3, 1, 6, 6, 6, 10,
9, 9, 9, 2, 9, 2, 3, 9, 9, 10, 6, 10, 6, 1, 1, 6, 1,
6, 3, 4, 6, 1, 11, 3, 9, 9, 4, 9, 9, 5, 8, 3, 5, 9,
6, 9, 9, 9, 3, 2, 5, 1, 9, 6, 1, 3, 9, 5, 3, 9, 6,
7, 2, 9, 9, 3, 5, 9, 6, 6, 3, 1, 9, 3, 9, 7, 9, 9,
7, 9, 10, 2, 9, 4, 9, 9, 9, 3, 9, 5, 3, 2, 9, 3, 8,
2, 9, 3, 11, 3, 3, 9, 2, 10, 9, 3, 9, 1, 9, 10, 5, 1,
2, 5, 9, 2, 3, 5, 9, 4, 8, 9, 5, 10, 3, 2, 6, 3, 6,
10, 11, 3, 6, 9, 3, 3, 0, 6, 7, 8, 9, 6, 5, 3, 1, 9,
2, 9, 9, 5, 2, 5, 3, 6, 11, 2, 9, 3, 6, 2, 9, 9, 3,
3, 3, 5, 6, 4, 4, 9, 2, 2, 5, 1, 0, 9, 4, 3, 3, 9,
9, 5, 2, 2, 2, 5, 7, 9, 3, 9, 9, 1, 2, 9, 6, 8, 2,
3, 4, 2, 3, 3, 2, 6, 5, 9, 5, 2, 2, 2, 9, 9, 6, 2,
9, 4, 9, 9, 2, 3, 8, 11, 9, 9], dtype=int32),
array([[5.7949889e-03, 2.5301890e-03, 5.9659913e-05, ..., 2.7534673e-03,
1.8798949e-03, 4.0977496e-01],
[2.1629781e-04, 1.0219574e-02, 1.2285617e-03, ..., 4.0498661e-04,
3.6948815e-04, 8.3618681e-04],
[8.1547890e-03, 1.1354284e-04, 1.3678521e-04, ..., 3.6535120e-01,
1.1546685e-03, 3.5349184e-03],
...,
[1.1976730e-03, 6.8558909e-02, 8.7605380e-03, ..., 1.7384565e-01,
5.4570078e-04, 2.0005915e-02],
[2.1097453e-02, 7.7744485e-03, 2.5690982e-01, ..., 5.4854238e-01,
3.9467164e-03, 1.6034273e-02],
[2.0812787e-03, 1.6885218e-05, 4.7070305e-05, ..., 4.3611538e-01,
3.6522493e-04, 1.4385413e-02]], dtype=float32),
(486,),
(486, 12))
If it helps, this is my model:
def build_BiLSTM_classifier(input_shape, classes):
    input_layer = tfkl.Input(shape=input_shape, name='Input')
    bilstm = tfkl.Bidirectional(tfkl.LSTM(128, return_sequences=True))(input_layer)
    bilstm = tfkl.Bidirectional(tfkl.LSTM(128))(bilstm)
    dropout = tfkl.Dropout(.5, seed=seed)(bilstm)
    classifier = tfkl.Dense(128, activation='tanh')(dropout)
    output_layer = tfkl.Dense(classes, activation='softmax')(classifier)
    model = tfk.Model(inputs=input_layer, outputs=output_layer, name='model')
    model.compile(loss=tfk.losses.SparseCategoricalCrossentropy(), optimizer=tfk.optimizers.Adam(), metrics='accuracy')
    return model
What could I try to do?
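Your pred array appears to contain per-class scores (the softmax output, shape (486, 12)) rather than predicted labels. Try pred.argmax(axis=1) instead, which gives one class index per sample with the same shape as y_val.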
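A small sketch of what the argmax step does, using NumPy only. The arrays below are hypothetical stand-ins for `y_val` and the output of `model.predict(X_val)`, with 4 samples and 3 classes instead of 486 and 12; the confusion matrix is counted by hand here purely for illustration.

```python
import numpy as np

# Hypothetical stand-ins for y_val and model.predict(X_val).
y_val = np.array([0, 2, 1, 2])
pred = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
], dtype=np.float32)

# Pick the highest-scoring class per row: shape (4, 3) becomes shape (4,).
pred_labels = pred.argmax(axis=1)
print(pred_labels)  # [0 2 1 0]

# Count (true, predicted) pairs; rows are true classes, columns are predictions.
n_classes = pred.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_val, pred_labels), 1)
print(cm)
```

With labels in this form, both arrays are plain class indices, which is the input scikit-learn's classification metrics expect.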

Automatically reformat Python code in VS Code

Let's assume I copied some comma-separated data from somewhere and want to create a Python list. The data might look like this:
x = [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9]
In PyCharm, I select this line and run "Code - Reformat Code", which produces the desired result (wrapped at column 79, PEP8):
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7,
8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5,
6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3,
4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1,
2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8,
9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6,
7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4,
5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2,
3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9,
1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]
How can I achieve this in VS Code? I tried the Rewrap extension (which only works for comments) as well as "Format Document" using autopep8 and black (via the Python extension), but none of these methods work.
Use yapf as the formatter and enable format on save. The relevant VS Code settings are:
editor.formatOnSave
python.formatting.provider (set to yapf)
Result:
x = [
1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7,
8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5,
6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3,
4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1,
2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8,
9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6,
7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4,
5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2,
3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9,
1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9
]
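Independently of the editor setup, the wrapped layout from the question can be approximated in plain Python with the standard-library `textwrap` module. This is only a sketch of the wrapping idea, not what yapf or PyCharm do internally; `values` is a hypothetical stand-in for the pasted data.

```python
import textwrap

# Hypothetical stand-in for the pasted comma-separated data.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9] * 25

# Build the one-line assignment, then wrap it at 79 columns.
# Continuation lines are indented to line up under the first element.
one_line = "x = [" + ", ".join(str(v) for v in values) + "]"
wrapped = textwrap.fill(one_line, width=79, subsequent_indent=" " * 5)

print(wrapped)
```

Because `textwrap.fill` only breaks at whitespace, each line ends after a comma and the result remains valid Python.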

shuffle a list and then append it to another list

I use a for loop to shuffle a list and append it to another, initially empty list (List A).
I can see that each shuffled list is different, but List A ends up containing multiple copies of the last shuffled list only.
from random import shuffle

print('---------shuffle list-------------------------------------')
matr=[ ]
entry=[1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in range(9):
    shuffle(entry )
    print(entry )
    matr.append(entry)
print(matr)
'''
results below:
---------shuffle list-------------------------------------
[3, 1, 7, 5, 8, 9, 2, 6, 4]
[5, 4, 6, 8, 1, 9, 7, 2, 3]
[6, 4, 7, 5, 1, 3, 2, 9, 8]
[4, 9, 8, 1, 7, 3, 6, 5, 2]
[5, 1, 9, 2, 8, 6, 4, 7, 3]
[3, 5, 1, 4, 2, 6, 8, 9, 7]
[1, 2, 4, 6, 7, 8, 3, 9, 5]
[4, 8, 1, 6, 7, 3, 5, 9, 2]
[6, 4, 2, 1, 9, 8, 3, 5, 7]
[[6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7], [6, 4, 2, 1, 9, 8, 3, 5, 7]]
'''
It should have appended each of the shuffled lists rather than only the last one.
Every element of matr is the same list object: shuffle works in place, and append stores a reference to entry rather than a copy. Make a copy before shuffling:
from random import shuffle

matr=[]
entry=[1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in range(9):
    entry = entry.copy()  # new list object on every iteration
    shuffle(entry)
    print(entry)
    matr.append(entry)
print(matr)
Output:
[9, 7, 6, 4, 3, 2, 8, 5, 1]
[4, 3, 2, 8, 7, 1, 5, 9, 6]
[3, 9, 7, 4, 5, 8, 2, 1, 6]
[6, 1, 9, 5, 2, 3, 4, 7, 8]
[5, 4, 7, 9, 8, 2, 6, 3, 1]
[2, 3, 5, 8, 6, 7, 9, 4, 1]
[5, 4, 9, 8, 3, 6, 1, 7, 2]
[5, 1, 2, 3, 7, 8, 6, 9, 4]
[6, 8, 9, 2, 1, 5, 3, 7, 4]
[[9, 7, 6, 4, 3, 2, 8, 5, 1], [4, 3, 2, 8, 7, 1, 5, 9, 6], [3, 9, 7, 4, 5, 8, 2, 1, 6], [6, 1, 9, 5, 2, 3, 4, 7, 8], [5, 4, 7, 9, 8, 2, 6, 3, 1], [2, 3, 5, 8, 6, 7, 9, 4, 1], [5, 4, 9, 8, 3, 6, 1, 7, 2], [5, 1, 2, 3, 7, 8, 6, 9, 4], [6, 8, 9, 2, 1, 5, 3, 7, 4]]
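The aliasing can be checked directly with `is`. The sketch below also shows `random.sample` as one possible alternative that returns a new list each time; this variant is a suggestion, not part of the original answer.

```python
import random

entry = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Appending the same object: every slot in `aliased` refers to `entry`.
aliased = []
for _ in range(3):
    random.shuffle(entry)
    aliased.append(entry)
print(all(item is entry for item in aliased))  # True

# random.sample returns a new shuffled list and leaves `entry` untouched.
independent = [random.sample(entry, len(entry)) for _ in range(3)]
print(len({id(item) for item in independent}))  # 3
```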

combinations of a list [duplicate]

This question already has answers here:
How do I clone a list so that it doesn't change unexpectedly after assignment?
I'm trying to get all the combinations of my list, but every time two elements need to be removed. How do I remove those elements?
I tried nesting two for loops so that each iteration removes two elements, but at the end of the iteration the list is not restored.
indexes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for een in indexes:
    for twee in indexes:
        temp = indexes
        if een == twee:
            pass
        else:
            if een in temp:
                temp.remove(een)
            temp.remove(twee)
            print(temp)
        temp = indexes
I expected the output to be a list of length 9 every time, but the list keeps getting shorter.
The output I got was:
[2, 3, 4, 5, 6, 7, 8, 9, 10]
[2, 3, 5, 6, 7, 8, 9, 10]
[2, 3, 5, 7, 8, 9, 10]
[2, 3, 5, 7, 9, 10]
[2, 3, 5, 7, 9]
[5, 7, 9]
[5, 9]
The first list is correct, but in the next one the 1 does not return to the list. What am I doing wrong? Also, after this pass is done, een should equal 1 and the process should repeat, after that een should equal 2, and so on.
This should be the output:
[2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 3, 4, 5, 6, 7, 8, 9, 10]
[0, 3, 4, 5, 6, 7, 8, 9, 10]
[0, 2, 4, 5, 6, 7, 8, 9, 10]
[0, 2, 3, 5, 6, 7, 8, 9, 10]
[0, 2, 3, 4, 6, 7, 8, 9, 10]
[0, 2, 3, 4, 5, 7, 8, 9, 10]
[0, 2, 3, 4, 5, 6, 8, 9, 10]
[0, 2, 3, 4, 5, 6, 7, 9, 10]
[0, 2, 3, 4, 5, 6, 7, 8, 10]
[0, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 3, 4, 5, 6, 7, 8, 9, 10]
[0, 3, 4, 5, 6, 7, 8, 9, 10]
[0, 1, 4, 5, 6, 7, 8, 9, 10]
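and it should go on until every combination is reached.
Replace temp = indexes by temp = indexes[:]. The plain assignment only binds a second name to the same list, so every remove also changes indexes; the slice makes a shallow copy.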
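A minimal sketch of the loop with the copy in place. The `results` list is a name introduced here for illustration so the lists can be inspected afterwards; the original code printed each list directly.

```python
indexes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

results = []
for een in indexes:
    for twee in indexes:
        if een == twee:
            continue
        temp = indexes[:]  # shallow copy, so `indexes` is never modified
        temp.remove(een)
        temp.remove(twee)
        results.append(temp)

print(results[0])    # [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(results[1])    # [1, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(results))  # 110
```

Because the copy is taken inside the inner loop, each pass starts from the full list of 11 elements and always yields a list of length 9.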

Making a 10x10 grid from a list of arrays

I'm struggling to arrange my array as a 10x10 grid; the output I keep getting isn't what I'm looking for. I was hoping someone could help me out.
import numpy as np
x = 1
y = 1
scale = 10
nn = []
for x in range(1,scale+1):
    mm = []
    for y in range(1,scale+1):
        xy = [x,y]
        mm.append(xy)
        #print(xy)
        y=+1
    nn.append(mm)
    x=+1
nn
grid_array = np.array(nn)
grid=np.meshgrid(grid_array)
But the output I get isn't displaying 10x10
[array([ 1, 1, 1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1, 8, 1,
9, 1, 10, 2, 1, 2, 2, 2, 3, 2, 4, 2, 5, 2, 6, 2, 7,
2, 8, 2, 9, 2, 10, 3, 1, 3, 2, 3, 3, 3, 4, 3, 5, 3,
6, 3, 7, 3, 8, 3, 9, 3, 10, 4, 1, 4, 2, 4, 3, 4, 4,
4, 5, 4, 6, 4, 7, 4, 8, 4, 9, 4, 10, 5, 1, 5, 2, 5,
3, 5, 4, 5, 5, 5, 6, 5, 7, 5, 8, 5, 9, 5, 10, 6, 1,
6, 2, 6, 3, 6, 4, 6, 5, 6, 6, 6, 7, 6, 8, 6, 9, 6,
10, 7, 1, 7, 2, 7, 3, 7, 4, 7, 5, 7, 6, 7, 7, 7, 8,
7, 9, 7, 10, 8, 1, 8, 2, 8, 3, 8, 4, 8, 5, 8, 6, 8,
7, 8, 8, 8, 9, 8, 10, 9, 1, 9, 2, 9, 3, 9, 4, 9, 5,
9, 6, 9, 7, 9, 8, 9, 9, 9, 10, 10, 1, 10, 2, 10, 3, 10,
4, 10, 5, 10, 6, 10, 7, 10, 8, 10, 9, 10, 10])]
Edited.
This is what I have thus far, thanks for the help guys.
import numpy as np
scale = 10
array = np.empty(shape=(scale, scale, 2)).astype(int)
for x in range(1,scale+1):
    for y in range(1,scale+1):
        #print([x,y])
        array[x-1,y-1] = [x,y]
print(array)
You can use NumPy's reshape for that, like this:
np.reshape(arr, (-1,10))
See:
Convert a 1D array to a 2D array in numpy
It's not entirely clear what you want to achieve, but if you simply want to know how to generate a 10x10 NumPy array using two for loops, here is what you can do (not the most Pythonic way to do it, though):
import numpy as np
scale = 10
array = np.empty(shape=(scale, scale))
for x in range(scale):
    for y in range(scale):
        array[x,y] = 42 # replace with whatever dynamically assigned value you want there
print(array)
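For comparison, the same 10x10 grid of [x, y] pairs from the asker's edited code can be built without explicit loops. This is a sketch of one possible vectorised approach, assuming the goal is an array of shape (10, 10, 2) where element [x-1, y-1] holds [x, y].

```python
import numpy as np

scale = 10
coords = np.arange(1, scale + 1)

# indexing="ij" makes xs vary along the first axis and ys along the second.
xs, ys = np.meshgrid(coords, coords, indexing="ij")

# Stack the two coordinate planes into pairs along a new last axis.
grid = np.stack([xs, ys], axis=-1)

print(grid.shape)  # (10, 10, 2)
print(grid[2, 4])  # [3 5]
```

Note that `np.meshgrid` expects one 1D coordinate array per axis, which is why passing the already nested array in the question produced a single flattened result.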
