I have clustered the same data points twice and obtained four clusters (A=1, B=2, C=3, D=4) each time. I want to assess the overall stability of the clustering, but also assess each cluster individually: cluster A from the first result (A1) vs cluster A from the second result (A2), B1 vs B2, C1 vs C2, and D1 vs D2.
For the overall stability, I am using the adjusted Rand index (ARI) and have no problem. However, when I want to assess, e.g., A1 vs A2, I don't know how to proceed.
The clustering results are the following:
c1 <- c(1, 2, 3, 2, 1, 3, 4, 3, 2, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 2, 3, 2, 3, 2, 1, 3, 4, 4, 4, 4, 3, 2, 3, 2, 3, 1, 3, 2, 1, 2, 3, 4, 3, 2, 1, 4, 3, 2, 2, 2, 3, 4, 3, 3, 3, 2, 1, 1, 1, 2)
c2 <- c(1, 2, 4, 4, 1, 3, 4, 2, 2, 2, 3, 4, 1, 2, 1, 2, 3, 4, 3, 2, 1, 2, 2, 4, 2, 3, 2, 3, 2, 1, 3, 3, 4, 3, 4, 3, 2, 3, 2, 3, 1, 1, 1, 1, 2, 3, 4, 3, 2, 1, 4, 3, 2, 2, 2, 3, 4, 3, 3, 3, 2, 1, 1, 1, 2)
Is there a good strategy for comparing corresponding clusters (e.g., A1 vs A2)?
Suggestions in R or Python are welcome.
Thanks in advance!
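One way to compare A1 with A2 is the Jaccard index of the two sets of points: the number of points both clusters share divided by the number of points in either of them. A value of 1 means the cluster was reproduced exactly; values near 0 mean it largely fell apart. The sketch below assumes, as the question implies, that label 1 in the first run means the same cluster as label 1 in the second run; `cluster_jaccard` is a hypothetical helper name, and `c1`/`c2` are copied from the question.

```python
def cluster_jaccard(labels1, labels2):
    """Jaccard index |X1 & X2| / |X1 | X2| for each cluster label.

    Assumes label k denotes the same cluster in both runs; if the algorithm
    assigns labels arbitrarily, relabel one run to match the other first.
    """
    if len(labels1) != len(labels2):
        raise ValueError("both clusterings must label the same points")
    scores = {}
    for label in sorted(set(labels1) | set(labels2)):
        # Sets of point indices assigned to this label in each run.
        members1 = {i for i, lab in enumerate(labels1) if lab == label}
        members2 = {i for i, lab in enumerate(labels2) if lab == label}
        # The union is never empty: the label occurs in at least one run.
        scores[label] = len(members1 & members2) / len(members1 | members2)
    return scores


c1 = [1, 2, 3, 2, 1, 3, 4, 3, 2, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 2, 3,
      2, 3, 2, 1, 3, 4, 4, 4, 4, 3, 2, 3, 2, 3, 1, 3, 2, 1, 2, 3, 4, 3, 2, 1, 4, 3,
      2, 2, 2, 3, 4, 3, 3, 3, 2, 1, 1, 1, 2]
c2 = [1, 2, 4, 4, 1, 3, 4, 2, 2, 2, 3, 4, 1, 2, 1, 2, 3, 4, 3, 2, 1, 2, 2, 4, 2, 3,
      2, 3, 2, 1, 3, 3, 4, 3, 4, 3, 2, 3, 2, 3, 1, 1, 1, 1, 2, 3, 4, 3, 2, 1, 4, 3,
      2, 2, 2, 3, 4, 3, 3, 3, 2, 1, 1, 1, 2]

for label, score in cluster_jaccard(c1, c2).items():
    print(f"cluster {label}: Jaccard = {score:.3f}")
```

The same per-label sets can also give asymmetric measures, such as the fraction of A1's points that ended up in A2 (`len(members1 & members2) / len(members1)`), if you care about which run a cluster "lost" points from.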
I'm making a script that turns pixel art into voxel art. I have an image and have created a list of each pixel's RGBA values, grouped as tuples:
RGBA_list = list(img.pixels)  # flat list: r, g, b, a, r, g, b, a, ...
gl = [tuple(RGBA_list[ipx:ipx + 4]) for ipx in range(0, len(RGBA_list), 4)]  # one (r, g, b, a) tuple per pixel
This is pixel art, so many pixels share exactly the same colour and there are only a few unique colours. Is there a way to get a separate list of indices for each colour?
Or, better worded: find each distinct colour in the image, then for each colour get the list of indices in the grouped list that have that colour.
You have a list with repeated values and want to convert it into a map from each value to the list of indices where it occurs. You can use collections.defaultdict for this (dict.setdefault also works, but defaultdict makes it easier and cleaner).
See the example below:
>>> import random
>>> info_list = [random.randint(1,5) for x in range(100)]
>>> info_list
[4, 5, 2, 2, 4, 3, 3, 1, 2, 4, 4, 2, 3, 1, 2, 3, 1, 4, 4, 2, 1, 1, 3, 3, 2, 1, 4, 4, 1, 5, 2, 2, 3, 5, 1, 4, 1, 4, 1, 3, 3, 2, 3, 2, 5, 4, 5, 3, 4, 4, 3, 2, 3, 2, 1, 2, 2, 4, 4, 1, 5, 2, 1, 1, 2, 3, 4, 5, 3, 4, 4, 3, 4, 1, 3, 4, 2, 1, 5, 3, 4, 3, 3, 5, 2, 2, 4, 5, 2, 2, 1, 5, 4, 5, 5, 1, 5, 3, 2, 2]
>>> from collections import defaultdict
>>> info_dict = defaultdict(list)
>>> for i, x in enumerate(info_list):
...     info_dict[x].append(i)
...
>>> info_dict.keys()
dict_keys([4, 5, 2, 3, 1])
>>> info_dict[1]
[7, 13, 16, 20, 21, 25, 28, 34, 36, 38, 54, 59, 62, 63, 73, 77, 90, 95]
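Applied to the pixel data from the question, the same defaultdict pattern works directly with the RGBA tuples as keys, because tuples are hashable. This is a sketch under two assumptions: that `img.pixels` is a flat sequence of floats in r, g, b, a order (as with Blender's `Image.pixels`, which the question appears to use), and that identical pixel-art colours produce identical float values, so exact tuple equality is safe. The function name `indices_by_colour` and the sample pixels are hypothetical.

```python
from collections import defaultdict


def indices_by_colour(flat_pixels):
    """Map each (r, g, b, a) colour to the list of pixel indices using it.

    flat_pixels is assumed to be a flat sequence [r, g, b, a, r, g, b, a, ...].
    """
    if len(flat_pixels) % 4:
        raise ValueError("expected a multiple of 4 values (RGBA)")
    # Slice the flat list into one hashable tuple per pixel.
    pixels = [tuple(flat_pixels[i:i + 4]) for i in range(0, len(flat_pixels), 4)]
    groups = defaultdict(list)
    for index, colour in enumerate(pixels):
        groups[colour].append(index)
    return dict(groups)


# Hypothetical 4-pixel image: red, blue, red, fully transparent.
flat = [1.0, 0.0, 0.0, 1.0,
        0.0, 0.0, 1.0, 1.0,
        1.0, 0.0, 0.0, 1.0,
        0.0, 0.0, 0.0, 0.0]

for colour, idx in indices_by_colour(flat).items():
    print(colour, idx)
```

The keys of the returned dict are the distinct colours, so `len(result)` also tells you how many unique colours the image has.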
This question already has answers here: Sort list by frequency (8 answers). Closed 3 years ago.
I need to sort an array by how often each of its elements occurs, most frequent first.
I tried using key=arr.count (arr is the name of the list I want to sort). It works for some inputs, but not for this one. I also tried a collections.Counter object, and it behaved the same way as arr.count.
>>> arr = [6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2, 7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]
>>> sorted(arr, key=arr.count)
[6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2, 7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]
>>> from collections import Counter
>>> counts = Counter(arr)
>>> sorted(arr, key=counts.get)
[6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2, 7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]
Expected output is:
1 1 1 1 1 1 1 2 2 2 2 2 2 2 7 7 7 7 7 7 7 3 3 3 3 3 3 5 5 5 5 4 4 4 6 6 6
Not sure what I am doing wrong here.
Use a tuple key to sort first by frequency and then by value. Since you want the biggest count first, use reverse=True; negating the value (-x) then keeps the smallest numbers first among elements with the same count:
sorted(arr, key=lambda x: (arr.count(x), -x), reverse=True)
Output:
[1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 7, 7, 7, 7, 7, 7, 7, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 4, 4, 4, 6, 6, 6]
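Because arr.count rescans the whole list for every element, the key above takes quadratic time in the list length. A sketch of an alternative that counts once with collections.Counter and negates the count instead of the value, so reverse=True is not needed (`sort_by_frequency` is a hypothetical helper name):

```python
from collections import Counter


def sort_by_frequency(values):
    """Most frequent values first; ties broken by ascending value."""
    counts = Counter(values)  # a single pass counts every value
    # -count puts high counts first; the value itself breaks ties ascending.
    return sorted(values, key=lambda x: (-counts[x], x))


arr = [6, 4, 6, 4, 4, 6, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 1, 7, 7, 7, 2, 2, 2,
       7, 1, 7, 1, 2, 1, 2, 7, 1, 1, 7, 2, 1, 2]
print(sort_by_frequency(arr))
```

Negating the count rather than the value also means the tie-break works for values that cannot be negated, such as strings.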
I think the problem is that some entries have the same frequency, e.g.:
arr.count(1) == arr.count(2) == arr.count(7)
To keep these entries grouped, you have to sort not only by count but also by value (with reverse=True, equally frequent values then come out largest first):
import collections
counts = collections.Counter(arr)
sorted(arr, key=lambda x: (counts[x], x), reverse=True)
Output:
[7, 7, 7, 7, 7, 7, 7, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 6, 6, 6, 4, 4, 4]
I have created a list:
a=[1,2,3,4,5]*100
I now need to create another list containing the elements of a at the first 8 prime-number positions.
I have tried these two lines of code, and they didn't work:
b=a[2:3:5:7:11:13:17:19]
a[2:3:5:7:11:13:17:19]=b
The output for list a is [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...] (the five values repeated 100 times), so I want the elements at positions 2, 3, 5, 7, 11, 13, 17 and 19 of that output.
a=[1,2,3,4,5]*100
indices = [2,3,5,7,11,13,17,19]
b = []
for i in indices:
    b.append(a[i])
print(b)
You have to access each element individually. b=a[2:3:5:7:11:13:17:19] is not syntactically valid Python: a slice takes at most three parts (start:stop:step) and always selects a regular range, not a set of arbitrary indices.
A more Pythonic (and shorter) way to do the same thing is a list comprehension:
indices = [2,3,5,7,11,13,17,19]
b = [a[i] for i in indices]
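If you would rather not hard-code the prime positions, a sketch that generates them and then picks the elements with operator.itemgetter is shown below. `first_primes` is a hypothetical helper using simple trial division. Note that itemgetter called with several indices returns a tuple, but with a single index it returns the bare element, so this form is only convenient when there are at least two indices.

```python
from operator import itemgetter


def first_primes(count):
    """Return the first `count` primes using simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no known prime p with p*p <= candidate divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


a = [1, 2, 3, 4, 5] * 100
indices = first_primes(8)          # the positions 2, 3, 5, 7, 11, 13, 17, 19
b = list(itemgetter(*indices)(a))  # itemgetter with several keys returns a tuple
print(b)
```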
I would try it like this, using a list comprehension to collect the indices of the elements whose value is prime (beware: the test_prime function is not optimized at all):
def test_prime(n):
    # 0, 1 and negative numbers are not prime
    if n < 2:
        return False
    elif n == 2:
        return True
    else:
        # trial division by every smaller number
        for x in range(2, n):
            if n % x == 0:
                return False
        return True
a = [1, 2, 3, 4, 5] * 100
b = [item for item in range(len(a)) if test_prime(a[item])]
b = b[0:8]
print(b)
which outputs (note that Python indexes from 0, so the first element of a list has index 0, not 1):
[1, 2, 4, 6, 7, 9, 11, 12]
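The comprehension above tests all 500 elements and then keeps only the first 8 results. A sketch that stops as soon as 8 matches are found, using a generator with itertools.islice (`is_prime` is a hypothetical helper that checks divisors only up to the square root):

```python
from itertools import islice


def is_prime(n):
    """Trial-division primality test; 0, 1 and negatives are not prime."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


a = [1, 2, 3, 4, 5] * 100
# Lazily yield indices whose value is prime; islice stops after the first 8.
prime_positions = (i for i, value in enumerate(a) if is_prime(value))
b = list(islice(prime_positions, 8))
print(b)
```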