Pairwise similarity - python

I have a pandas DataFrame that looks like this:
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
name cards
0 ['A', 'B', 'C', 'D']
1 ['B', 'C', 'D', 'E']
2 ['E', 'F', 'G', 'H']
3 ['A', 'A', 'E', 'F']
And I'd like to create a matrix that looks like this:
name 0 1 2 3
name
0 4 3 0 1
1 3 4 1 1
2 0 1 4 2
3 1 1 2 4
Where the values are the number of items in common.
Any ideas?

Using the .apply method with a lambda, we can get a DataFrame directly:
def func(df, j):
    return pd.Series([len(set(i) & set(j)) for i in df.cards])
newdf = df.cards.apply(lambda x: func(df, x))
newdf
0 1 2 3
0 4 3 0 1
1 3 4 1 1
2 0 1 4 2
3 1 1 2 3

With a list comprehension that iterates over all pairs of rows, we can build the result:
import pandas as pd
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
result=[[len(list(set(x) & set(y))) for x in df['cards']] for y in df['cards']]
print(result)
Output:
[[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 3]]
The & operator computes the intersection of two sets.
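Because set() drops duplicates, the row ['A', 'A', 'E', 'F'] shares only 3 items with itself in the output above, while the matrix in the question shows 4. If duplicates should count, one possible alternative (not part of the original answer) is a multiset intersection with collections.Counter. A minimal sketch, with the card lists copied from the question:

```python
from collections import Counter

cards = [['A', 'B', 'C', 'D'],
         ['B', 'C', 'D', 'E'],
         ['E', 'F', 'G', 'H'],
         ['A', 'A', 'E', 'F']]

# One Counter per row, so repeated cards keep their multiplicity.
counters = [Counter(c) for c in cards]

# Counter & Counter keeps, for each shared item, the smaller of the two counts.
result = [[sum((a & b).values()) for b in counters] for a in counters]
print(result)
# [[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 4]]
```

Whether duplicates should count is a modelling choice; with plain sets the diagonal entry is the number of distinct cards instead.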
This produces exactly the matrix you want, including the 4 in the last diagonal entry:
import pandas as pd
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
result=[[len(x)-max(len(set(y) - set(x)),len(set(x) - set(y))) for x in df['cards']] for y in df['cards']]
print(result)
Output:
[[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 4]]

import pandas as pd
import numpy as np
df = pd.DataFrame([['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']])
nrows = df.shape[0]
# Initialization
matrix = np.zeros((nrows, nrows), dtype=np.int64)
for i in range(nrows):
    for j in range(nrows):
        matrix[i, j] = sum(df.iloc[:, i] == df.iloc[:, j])
print(matrix)
Output:
[[4 1 0 0]
[1 4 0 0]
[0 0 4 0]
[0 0 0 4]]

Related

Need to take indices from a list in a specific order

I have a list A = ['a', 'b', 'c', 'd', 'e', 'f'].
I need to take the indices of its elements in this order: 0 1 1 2 2 3 3 4 4 5
But my code produces this order: 0 1 2 3 4 5
A = ['a', 'b', 'c', 'd', 'e', 'f']
for i in A:
    print(A.index(i), end=' ')
If you have the desired indices why not try this:
X = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
A = ['a', 'b', 'c', 'd', 'e', 'f']
for i in X:
    print(A[i], end=' ')
Or use a list comprehension to extract the values at those indices:
X = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
A = ['a', 'b', 'c', 'd', 'e', 'f']
new_list = [A[x] for x in X]
Update
The technique from "How to make a flat list out of a list of lists" can be used to flatten a nested list and build the index order itself:
list_of_list = [[x, x] for x in range(1, len(A) - 1)]
new_list = [0] + [item for sublist in list_of_list for item in sublist] + [len(A) - 1]
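If the target order is always each pair of neighbouring indices (0 1, then 1 2, then 2 3, and so on), which is an assumption based on the example in the question, it could also be generated directly with zip. A minimal sketch, where index_pairs, order and values are illustrative names:

```python
A = ['a', 'b', 'c', 'd', 'e', 'f']
n = len(A)

# Pair every index with its successor: (0, 1), (1, 2), ..., (4, 5)
index_pairs = zip(range(n - 1), range(1, n))

# Flatten the pairs into a single list of indices.
order = [i for pair in index_pairs for i in pair]
print(*order)  # 0 1 1 2 2 3 3 4 4 5

# Look up the elements in that order.
values = [A[i] for i in order]
```

This avoids hard-coding the index list, so it keeps working if A changes length.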

Combination of pair elements within list in a list

I'm trying to obtain the combinations of each element in a list within a list. Given this case:
my_list
[['A', 'B'], ['C', 'D', 'E'], ['F', 'G', 'H', 'I']]
The output would be:
   0  1
0  A  B
1  C  D
2  C  E
3  D  E
4  F  G
5  F  H
6  F  I
7  G  H
8  G  I
9  H  I
Or it could also be a new list instead of a DataFrame:
my_new_list
[['A','B'], ['C','D'], ['C','E'],['D','E'], ['F','G'],['F','H'],['F','I'],['G','H'],['G','I'],['H','I']]
This should do it. You have to flatten the result of combinations.
from itertools import combinations
x = [['A', 'B'], ['C', 'D', 'E'], ['F', 'G', 'H', 'I']]
y = [list(combinations(xx, 2)) for xx in x]
z = [list(item) for subl in y for item in subl]
z
[['A', 'B'],
['C', 'D'],
['C', 'E'],
['D', 'E'],
['F', 'G'],
['F', 'H'],
['F', 'I'],
['G', 'H'],
['G', 'I'],
['H', 'I']]
Create the combinations with itertools.combinations and flatten them in a list comprehension:
from itertools import combinations
L = [['A', 'B'], ['C', 'D', 'E'], ['F', 'G', 'H', 'I']]
data = [list(j) for i in L for j in combinations(i, 2)]
print (data)
[['A', 'B'], ['C', 'D'], ['C', 'E'],
['D', 'E'], ['F', 'G'], ['F', 'H'],
['F', 'I'], ['G', 'H'], ['G', 'I'],
['H', 'I']]
Then pass the result to the DataFrame constructor:
import pandas as pd
df = pd.DataFrame(data)
print (df)
0 1
0 A B
1 C D
2 C E
3 D E
4 F G
5 F H
6 F I
7 G H
8 G I
9 H I
def get_pair(arrs):
    result = []
    for arr in arrs:
        for i in range(len(arr) - 1):
            for j in range(i + 1, len(arr)):
                result.append([arr[i], arr[j]])
    return result
arrs = [['A', 'B'], ['C', 'D', 'E'], ['F', 'G', 'H', 'I']]
print(get_pair(arrs))
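The flattening step used in the answers above can also be written with itertools.chain.from_iterable, which joins the per-group iterators without building intermediate lists. This is only a sketch of an equivalent formulation; the names groups and pairs are illustrative:

```python
from itertools import chain, combinations

groups = [['A', 'B'], ['C', 'D', 'E'], ['F', 'G', 'H', 'I']]

# combinations(g, 2) yields every 2-item tuple from one sub-list;
# chain.from_iterable strings those per-group iterators together lazily.
pairs = [list(p) for p in chain.from_iterable(combinations(g, 2) for g in groups)]
print(pairs)
```

Pairs are only formed within a sub-list, never across sub-lists, so the three groups above give 1 + 3 + 6 = 10 pairs.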

Get right label using indices?

Really stupid question, as I am new to Python:
If I have labels = ['a', 'b', 'c', 'd'],
and indices = [2, 3, 0, 1]
How should I get the corresponding label for each index so I can get: ['c', 'd', 'a', 'b']?
There are a few alternatives. One is to use a list comprehension:
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = [labels[i] for i in indices]
print(result)
Output
['c', 'd', 'a', 'b']
Basically, iterate over each index and fetch the item at that position. The above is equivalent to the following for loop:
result = []
for i in indices:
    result.append(labels[i])
A third option is to use operator.itemgetter:
from operator import itemgetter
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = list(itemgetter(*indices)(labels))
print(result)
Output
['c', 'd', 'a', 'b']
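One caveat with the itemgetter variant: when it is given exactly one index, itemgetter returns the item itself rather than a 1-tuple, so wrapping the call in list() can give a surprising result. A small sketch with hypothetical multi-character labels to make the difference visible:

```python
from operator import itemgetter

labels = ['ab', 'cd', 'ef']  # illustrative labels, longer than one character

# Two or more indices: itemgetter returns a tuple, so list() works as expected.
many = list(itemgetter(0, 2)(labels))   # ['ab', 'ef']

# Exactly one index: itemgetter returns the item itself ('ef'),
# so list() splits the string into characters.
single = list(itemgetter(2)(labels))    # ['e', 'f']

# The list comprehension behaves the same for any number of indices.
safe = [labels[i] for i in [2]]         # ['ef']
```

For index lists whose length is not known in advance, the list comprehension is therefore the safer default.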

Python: how to replicate the same row of a matrix?

How can I copy each row of an array n times?
So if I have a 2x3 array, and I copy each row 3 times, I will have a 6x3 array. For example, I need to convert A to B below:
A = np.array([[1, 2, 3],
[4, 5, 6]])
B = np.array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[4, 5, 6],
[4, 5, 6],
[4, 5, 6]])
If possible, I would like to avoid a for loop.
If I read correctly, this is probably what you want assuming you started with mat:
transformed = np.concatenate([np.vstack([mat[i, :]] * 3).T for i in range(mat.shape[0])], axis=1).T
Here's a verifiable example:
# mocking a starting array
import string
import numpy as np
mat = np.random.choice(list(string.ascii_lowercase), size=(5,3))
>>> mat
array([['s', 'r', 'e'],
['g', 'v', 'c'],
['i', 'b', 'd'],
['f', 'g', 's'],
['o', 'm', 'w']], dtype='<U1')
Transform it:
# this repeats it 3 times for sake of displaying
transformed = np.concatenate([np.vstack([mat[i, :]] * 3).T for i in range(mat.shape[0])], axis=1).T
>>> transformed
array([['s', 'r', 'e'],
['s', 'r', 'e'],
['s', 'r', 'e'],
['g', 'v', 'c'],
['g', 'v', 'c'],
['g', 'v', 'c'],
['i', 'b', 'd'],
['i', 'b', 'd'],
['i', 'b', 'd'],
['f', 'g', 's'],
['f', 'g', 's'],
['f', 'g', 's'],
['o', 'm', 'w'],
['o', 'm', 'w'],
['o', 'm', 'w']], dtype='<U1')
The idea is to use vstack to stack each row on itself several times, and then concatenate the results to get the final array.
You can use np.repeat with integer positional indexing:
B = A[np.repeat(np.arange(A.shape[0]), 3)]
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[4, 5, 6],
[4, 5, 6],
[4, 5, 6]])
v1=[3,2]
v3=v1[:]*10
print(v3)
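Note that multiplying a list, as in the snippet above, repeats the whole sequence rather than each element. For plain Python lists (not NumPy arrays), a small sketch contrasting the two behaviours, with rows, tiled and repeated as illustrative names:

```python
rows = [[1, 2, 3], [4, 5, 6]]
n = 3

# List multiplication repeats the whole list: row0, row1, row0, row1, ...
tiled = rows * n

# A nested comprehension repeats each row in place: row0 x3, then row1 x3.
repeated = [row for row in rows for _ in range(n)]

print(tiled[:3])     # [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
print(repeated[:3])  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
```

Both results hold references to the same inner lists, so copy the rows first if they will be modified independently.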
np.repeat is exactly what you are looking for. You can use the axis option to specify that you want to duplicate rows.
B = np.repeat(A, 3, axis=0)

python: output data from a list

I'm trying to figure out how to output list items. The code below takes the answers and checks them against a key to see which answers are correct. For each student, the number of correct answers is stored in correct_count. Then I sort in ascending order based on the correct count.
def main():
    answers = [
        ['A', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['D', 'B', 'A', 'B', 'C', 'A', 'E', 'E', 'A', 'D'],
        ['E', 'D', 'D', 'A', 'C', 'B', 'E', 'E', 'A', 'D'],
        ['C', 'B', 'A', 'E', 'D', 'C', 'E', 'E', 'A', 'D'],
        ['A', 'B', 'D', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['E', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D']]
    keys = ['D', 'B', 'D', 'C', 'C', 'D', 'A', 'E', 'A', 'D']
    grades = []
    # Grade all answers
    for i in range(len(answers)):
        # Grade one student
        correct_count = 0
        for j in range(len(answers[i])):
            if answers[i][j] == keys[j]:
                correct_count += 1
        grades.append([i, correct_count])
        grades.sort(key=lambda x: x[1])
        # print("Student", i, "'s correct count is", correct_count)
if __name__ == '__main__':
    main()
If I print out grades, the output looks like this:
[[0, 7]]
[[1, 6], [0, 7]]
[[2, 5], [1, 6], [0, 7]]
[[3, 4], [2, 5], [1, 6], [0, 7]]
[[3, 4], [2, 5], [1, 6], [0, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]
What I'm interested in is the last row. The first number of each pair is a student id, and the pairs are sorted in ascending order by the second number, which is the grade (4, 5, 6, 7, 7, 7, 7, 8).
I'm not sure how to grab that last row and iterate through it so that I get output like
student 3 has a grade of 4 and student 2 has a grade of 5
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]
def main():
    answers = [
        ['A', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['D', 'B', 'A', 'B', 'C', 'A', 'E', 'E', 'A', 'D'],
        ['E', 'D', 'D', 'A', 'C', 'B', 'E', 'E', 'A', 'D'],
        ['C', 'B', 'A', 'E', 'D', 'C', 'E', 'E', 'A', 'D'],
        ['A', 'B', 'D', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['B', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
        ['E', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D']]
    keys = ['D', 'B', 'D', 'C', 'C', 'D', 'A', 'E', 'A', 'D']
    grades = []
    # Grade all answers
    for i in range(len(answers)):
        # Grade one student
        correct_count = 0
        for j in range(len(answers[i])):
            if answers[i][j] == keys[j]:
                correct_count += 1
        grades.append([i, correct_count])
    grades.sort(key=lambda x: x[1])
    for student, correct in grades:
        print("Student", student, "'s correct count is", correct)
if __name__ == '__main__':
    main()
What you were doing was printing grades while you were still inside the loop. If you had printed grades after both loops, you would have seen only the last line: [[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]. Then just loop through grades, and Python will "unpack" each inner list into the student and the grade, respectively, as shown above.
Here is the output:
Student 3 's correct count is 4
Student 2 's correct count is 5
Student 1 's correct count is 6
Student 0 's correct count is 7
Student 5 's correct count is 7
Student 6 's correct count is 7
Student 7 's correct count is 7
Student 4 's correct count is 8
Don't forget to click the check mark if you like this answer.
What about something like the following:
students_grade = {}
for student_id, ans in enumerate(answers):
    students_grade[student_id] = sum([x == y for x, y in zip(ans, keys)])
Now you have a dictionary mapping each student's id to their score ;)
Of course, you can replace enumerate with the real list of ids instead!
While MMelvin0581 already addressed the problem in your code, you can also use a nested list comprehension to achieve the same result:
>>> out = [(a, sum([1 if k == i else 0 for k, i in zip(keys, j)])) for a, j in enumerate(answers)]
This will produce output like:
[(0, 7), (1, 6), (2, 5), (3, 4), (4, 8), (5, 7), (6, 7), (7, 7)]
Then you can sort the results by score:
>>> from operator import itemgetter
>>> sorted_list = sorted(out, key=itemgetter(1))
Note: itemgetter may have a slight performance benefit over a lambda. The above operation will produce output like:
[(3, 4), (2, 5), (1, 6), (0, 7), (5, 7), (6, 7), (7, 7), (4, 8)]
Then finally print your list like:
for item in sorted_list:
    print("Student: {} Scored: {}".format(item[0], item[1]))
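Putting the pieces together, here is one possible compact version that grades, sorts and prints in the format the question asks for. The answer sheets are written as strings purely for brevity (an assumption; lists of letters behave the same way under zip), and the names sheets and report are illustrative:

```python
# Each string is one student's answers; the position in the list is the student id.
sheets = ["ABACCDEEAD", "DBABCAEEAD", "EDDACBEEAD", "CBAEDCEEAD",
          "ABDCCDEEAD", "BBECCDEEAD", "BBACCDEEAD", "EBECCDEEAD"]
keys = "DBDCCDAEAD"

# Count matching positions per student (True counts as 1 when summed).
grades = [(sid, sum(a == k for a, k in zip(sheet, keys)))
          for sid, sheet in enumerate(sheets)]

# list.sort is stable, so students with equal grades keep their id order.
grades.sort(key=lambda g: g[1])

report = [f"student {sid} has a grade of {score}" for sid, score in grades]
print("\n".join(report))
```

Sorting once after all students are graded avoids re-sorting the list on every iteration of the loop.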
