Pythonic nested for loops in Python

I am working on code that has nested for loops. a_list and b_list are lists of tuples, where each tuple is made up of two tensors: [(tens1, tens2), ...]. I am trying to compute the similarity of every tens1 in a_list to every tens1 in b_list. Below is the code I have, and the nested loop appears to be a bottleneck. Is there a better (more Pythonic) way to rewrite the loops?
a2b = defaultdict(dict)
b2a = defaultdict(dict)
ab_sim = []
for a, vec_a in a_list:
    for b, vec_b in b_list:
        # Ignore combination if the first element in both a and b are same
        if a[0] == b[0]:
            continue
        # Calculate cosine similarity of combination
        sim = self.calculate_similarity(vec_a, vec_b)
        a2b[a][b] = sim
        b2a[b][a] = sim
        ab_sim.append(sim)
calculate_similarity is just a method that computes cosine similarity. a_list and b_list can be of any size. I keep both b2a and a2b because I need them for other computations.

You could use a dictionary comprehension:
a2b = {a: {b: self.calculate_similarity(vec_a, vec_b)
           for (b, vec_b) in b_list if a[0] != b[0]}
       for (a, vec_a) in a_list}
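The comprehension above only builds a2b. If b2a is also needed, one option is to invert the nested dictionary afterwards. This is a minimal sketch; the sample keys and similarity values are made up for illustration:

```python
from collections import defaultdict

# Hypothetical nested dictionary, shaped like a2b: a2b[a][b] = similarity
a2b = {
    "a1": {"b1": 0.9, "b2": 0.1},
    "a2": {"b2": 0.5},
}

# Invert it so that b2a[b][a] holds the same similarity value
b2a = defaultdict(dict)
for a, inner in a2b.items():
    for b, sim in inner.items():
        b2a[b][a] = sim

print(dict(b2a))
```

The inversion visits each stored similarity once, so it adds no extra calls to the similarity function.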

I think the most natural way to store this information is in a matrix:
import numpy as np
n=3
a=[ (n*np.random.rand(n)//1, n*np.random.rand(n)//1) for _ in range(3) ]
b=[ (n*np.random.rand(n)//1, n*np.random.rand(n)//1) for _ in range(3) ]
similarity = lambda x,y: np.dot(x, y)/(np.linalg.norm(x)*np.linalg.norm(y))
sim_matrix = [[ similarity(x,y) if x[0]!=y[0] else np.inf for y,_ in b ] for x,_ in a]
print(*sim_matrix, sep="\n")

You can also use product from itertools:
from itertools import product
result = {(a,b): self.calculate_similarity(vec_a, vec_b)
if a[0] != b[0] else 1
for ((a, vec_a) ,(b, vec_b)) in product(a_list, b_list)}
Note that you get a single dictionary keyed by (a, b) tuples, instead of two redundant dictionaries.
EDITED: to get those two dictionaries back, you can use dict comprehensions:
a2b = {a: {_b:v for (_a,_b), v in result.items() if _a==a} for (a,b) in result.keys()}
b2a = {b: {_a:v for (_a,_b), v in result.items() if _b==b} for (a,b) in result.keys()}
To get the list of similarity values, you can use ab_sim = list(result.values())
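The two comprehensions above rescan result once per key. If that becomes costly, a single pass over product can fill both dictionaries directly. The following is only a sketch: cosine is a plain-Python stand-in for calculate_similarity, and the keys and vectors are hypothetical sample data.

```python
from collections import defaultdict
from itertools import product
from math import sqrt

def cosine(u, v):
    """Plain-Python cosine similarity (stand-in for calculate_similarity)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

# Hypothetical data: hashable keys paired with vectors
a_list = [(("x", 1), [1.0, 0.0]), (("y", 2), [1.0, 1.0])]
b_list = [(("x", 3), [0.0, 1.0]), (("z", 4), [2.0, 2.0])]

a2b = defaultdict(dict)
b2a = defaultdict(dict)
ab_sim = []
for (a, vec_a), (b, vec_b) in product(a_list, b_list):
    if a[0] == b[0]:  # same first element: skip this pair
        continue
    sim = cosine(vec_a, vec_b)
    a2b[a][b] = sim
    b2a[b][a] = sim
    ab_sim.append(sim)
```

This does the same work as the original nested loops, so it mainly tidies the code; any real speed-up would have to come from the similarity computation itself.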

Related

Random selection from a sub-list to fill an empty sub-list

I have two lists of lists:
A = [['a'],[],['l'],[]]
B = [['m','n'],['p'],[],['q','r','s']]
I need the output to be
c = [['a'],['p'],['l'],['s']]
Whenever there is an empty sublist in A, I want to fill it with a random selection from the corresponding sublist in B.
My approach for this is not working:
import random
c = [x+random.sample(y,1) for x,y in zip(A,B) if len(x)==0 and len(y)>=1]
You can use a conditional expression to decide which element to store in the list:
from random import choice
[a if a else [choice(b)] for a, b in zip(A, B)]
You can use the or operator to fall back to a random selection from B using random.choices:
from random import choices
[a or choices(b) for a, b in zip(A, B)]
You need to use a conditional expression like this:
import random
A = [['a'],[],['l'],[]]
B = [['m','n'],['p'],[],['q','r','s']]
C = [
    a_sub_list if a_sub_list else [random.choice(b_sub_list)]
    for a_sub_list, b_sub_list in zip(A, B)
]
print(C)
# e.g. [['a'], ['p'], ['l'], ['q']] (the last element is chosen at random)
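The answers above call random.choice or random.choices on the sublist from B, which raises an error if that sublist is also empty. A possible sketch that leaves such positions empty is shown here; fill_empty and its rng parameter are hypothetical names introduced only for this example.

```python
import random

def fill_empty(A, B, rng=random):
    """Return a copy of A in which each empty sublist is replaced by one
    random element from the matching sublist of B (if that one has any)."""
    result = []
    for a, b in zip(A, B):
        if a:
            result.append(list(a))          # keep non-empty sublists as-is
        elif b:
            result.append([rng.choice(b)])  # pick one random element from B
        else:
            result.append([])               # both empty: nothing to pick
    return result

A = [['a'], [], ['l'], []]
B = [['m', 'n'], ['p'], [], ['q', 'r', 's']]
C = fill_empty(A, B, rng=random.Random(0))  # seeded generator for repeatable runs
print(C)
```

Passing a seeded random.Random instance is optional; it only makes the random picks reproducible while testing.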

How can I copy the order of one array into another? [Python]

I want to match the order of the values of one array to another.
For example, I have A and B:
A = [239 1678 2678 4430 199]
Order A:
A5 < A1 < A2 < A3 < A4
Now my B list is:
B = [ 4126.77552299 984.39685939 237.92397237 497.72447701 3377.17916825]
Order B:
B3 < B4 < B2 < B5 < B1
I want B to be in the same order as A, like this:
B = [ 497.72447701 984.39685939 3377.17916825 4126.77552299 237.92397237]
Order B:
B5 < B1 < B2 < B3 < B4
A and B are examples; in my case I have many tuples of vectors of many different sizes. I need a general expression to order one vector as a function of another.
How can I do it?
We sort B in regular sorted order:
sorted_B = sorted(B)
Find where each element of sorted_B should go:
locations = sorted(range(len(A)), key=A.__getitem__)
And place those elements where they need to go in the result:
result = [None]*len(B)
for i, elem in zip(locations, sorted_B):
    result[i] = elem
Here's a link to a demo showing that this produces the correct output for your example.
If you want to fill the results back into B directly instead of a copy, you can do that:
sorted_B = sorted(B)
locations = sorted(range(len(A)), key=A.__getitem__)
for i, elem in zip(locations, sorted_B):
    B[i] = elem
If you're working in NumPy and these are NumPy arrays rather than lists, you should do the job with NumPy operations:
import numpy
sorted_B = numpy.sort(B)
locations = numpy.argsort(A)
result = numpy.empty_like(B)
result[locations] = sorted_B
or in fewer lines:
result = numpy.empty_like(B)
result[numpy.argsort(A)] = numpy.sort(B)
Or if you want to fill the results into B instead of a copy:
B[numpy.argsort(A)] = numpy.sort(B)
You can create a temporary dict that maps each index in A to its rank in sorted order, then use that dict to build the result. Note that this assumes the values in A are unique, because A.index returns the first match:
>>> A = [ 239, 1678, 2678, 4430, 199]
>>> B = [4126.77552299, 984.39685939, 237.92397237, 497.72447701, 3377.17916825]
# Temporary dict for mapping the `index` key with order as `value`
>>> order = {A.index(j): i for i, j in enumerate(sorted(A))}
>>> order
{0: 1, 1: 2, 2: 3, 3: 4, 4: 0}
>>> sorted_B = sorted(B) # sorted `B` list
>>> ordered_B = [sorted_B[order[i]] for i in range(len(B))]
>>> ordered_B # desired output
[497.72447701, 984.39685939, 3377.17916825, 4126.77552299, 237.92397237]
Here's another way to do it, using sorted with the key argument, which accepts any key function you want.
from operator import itemgetter
A = [239, 1678, 2678, 4430, 199]
B = [4126.77552299, 984.39685939, 237.92397237, 497.72447701, 3377.17916825]
# first get the order of list A
order, _ = zip(*sorted(enumerate(A), key=itemgetter(1)))
# then use that order to sort the other list
[n[0] for n in sorted(zip(sorted(B), order), key=itemgetter(1))]
[497.72447701, 984.39685939, 3377.17916825, 4126.77552299, 237.92397237]
Instead of itemgetter(1), you could use key=lambda x: x[1]
Edit: As @mkrieger pointed out, you don't actually need a key function when you sort a list of tuples, as long as you put the key value first in each tuple.
zip(A, range(len(A))) is similar to enumerate, but with the indices and items swapped.
So this would also get the same result, with a bit simpler code, since you don't need the key functions, and no imports are needed either.
order = [i for _, i in sorted(zip(A, range(len(A))))]
B = [b for _, b in sorted(zip(order, sorted(B)))]
demo: http://ideone.com/zCXOHu
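For reference, the list-based approach from the first answer can be wrapped in a small helper. This is a sketch only; reorder_like is a hypothetical name, and the length check is an added assumption about how mismatched inputs should be handled.

```python
def reorder_like(A, B):
    """Return the values of B rearranged so that their ranks match the ranks of A."""
    if len(A) != len(B):
        raise ValueError("A and B must have the same length")
    # locations[k] is the index in A of its k-th smallest value
    locations = sorted(range(len(A)), key=A.__getitem__)
    result = [None] * len(B)
    # the k-th smallest value of B goes where the k-th smallest value of A sits
    for i, elem in zip(locations, sorted(B)):
        result[i] = elem
    return result

A = [239, 1678, 2678, 4430, 199]
B = [4126.77552299, 984.39685939, 237.92397237, 497.72447701, 3377.17916825]
print(reorder_like(A, B))
```

Because sorted is stable, equal values in A keep their original relative order, so duplicates in A do not break the helper.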

Check if items in list a are found in list b and return list c with matching indexes of list b in Python

I have list a = ["string2", "string4"] and list b = ["string1", "string2", "string3", "string4", "string5"]. I want to check whether "string2" and "string4" from list a match items in list b and, if they do, append each item's corresponding index in list b to list c, so list c should be [1, 3].
My code so far:
for x in a:
    for y in b:
        if x == y:
            print(x)
So I managed to print them out, but I don't know how to get the index.
This is the simplified version of my problem, and I could just solve it like this, but just for fun I will tell you the whole thing.
I have a list of tuples generated with nltk.word_tokenize in the following format: [('string1', 'DT'), ('string2', 'NNP'), ('string3', 'NNP'), ('string4', 'NNP'), ('string5', 'VBZ'), ('string6', 'RB')]. I want to check which of the words (string1, string2, string3, etc.) are found in another list of words (the stopwords list, e.g. stopwords = ["string312", "string552", "string631"]). If any are found, I would like to know their indexes in my list of tuples by creating another list that stores those indexes, or remains empty if none are found.
You can use index from your second list, while iterating over elements of the first list in a list comprehension.
>>> a = ["string2" , "string4"]
>>> b = ["string1" , "string2" , "string3" , "string4" , "string5"]
>>> c = [b.index(i) for i in a]
>>> c
[1, 3]
If there is a possibility that an element may be in a but not in b, then you can modify this slightly:
>>> [b.index(i) for i in a if i in b]
[1, 3]
A continuation of your posted code:
c = []
for x in a:
    for y in b:
        if x == y:
            print(x)
            c.append(b.index(x))
Use enumerate combined with list comprehension to get the indexes directly in a list.
>>> [i for i,j in enumerate(b) if j in a]
[1, 3]
You can make a dictionary of element->index by using enumerate on b. This has linear time complexity, but after you complete this step, all of your index lookups will be in constant time O(1), and you'll also have an easy way to see if the value from a could not be found in b, because dict.get will return None. You will also be able to do an O(1) filter operation on a by checking the existence of its elements in the dictionary first, which also makes your second loop have linear time complexity.
>>> a = [50, 150, 250]
>>> b = list(range(200))
>>> bindex = {x: i for i, x in enumerate(b)}
>>> [bindex.get(x) for x in a]
[50, 150, None]
>>> [bindex[x] for x in a if x in bindex]
[50, 150]
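One detail to keep in mind with the dictionary approach: if b contains duplicates, the comprehension keeps the last index of each value, whereas b.index returns the first. A small sketch of both variants, using made-up sample data:

```python
b = ["x", "y", "x"]

# Dict comprehension: later duplicates overwrite earlier ones (last index wins)
last_index = {value: i for i, value in enumerate(b)}

# setdefault only stores a key the first time it is seen (first index wins)
first_index = {}
for i, value in enumerate(b):
    first_index.setdefault(value, i)

print(last_index)
print(first_index)
```

Which variant fits depends on whether the first or the last occurrence should be reported.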
If you are comfortable with sets, you can use set intersection:
set1 = set(a)
set2 = set(b)
set3 = set1 & set2  # intersection
You can then convert set3 back to a list and look up each element's index in b with a list comprehension (sorted, because sets are unordered):
c = list(set3)
sorted(b.index(i) for i in c)
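For the fuller version of the question (a list of (word, tag) tuples checked against a stopword list), a sketch along the same lines could look like this. The sample words and tags are invented, and lowercasing before the lookup is an assumption about how matching should work.

```python
# Hypothetical tagged tokens in the (word, tag) format from the question
tagged = [("The", "DT"), ("Eiffel", "NNP"), ("Tower", "NNP"), ("is", "VBZ"), ("not", "RB")]

# A set gives constant-time membership checks on average
stopwords = {"the", "is", "not"}

# Collect the positions of tuples whose word is a stopword
stopword_indexes = [i for i, (word, _tag) in enumerate(tagged)
                    if word.lower() in stopwords]
print(stopword_indexes)
```

If no word matches, the comprehension simply produces an empty list, which is the behaviour the question asks for.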

Compare two lists with zeros and larger numbers

I have to compare two lists of elements with the same length (for example, [0,562,256,0,0,856] and [265,0,265,0,874,958]). Both lists contain some zeros and some numbers above 249. If both lists have a nonzero number at the same index, those numbers should be saved. The result should be two lists of the same length containing only numbers above 249 (in the example, [256,856] and [265,958]). Thanks for your help!
Use zip() to pair up the elements of each list:
listA = [0,562,256,0,0,856]
listB = [265,0,265,0,874,958]
combined = list(zip(listA, listB))  # materialize: in Python 3, zip() is a one-shot iterator
resultA = [a for a, b in combined if a and b]
resultB = [b for a, b in combined if a and b]
gives:
>>> resultA
[256, 856]
>>> resultB
[265, 958]
You could also first use filter() to remove all pairs where one or the other element is 0:
combined = list(filter(lambda pair: pair[0] and pair[1], zip(listA, listB)))
resultA = [a for a, b in combined]
resultB = [b for a, b in combined]
Maybe there is a better way, but:
list1 = [0,562,256,0,0,856]
list2 = [265,0,265,0,874,958]
rest1 = []
rest2 = []
result1 = []
result2 = []
for i in range(len(list1)):
    # keep only positions where both values are nonzero
    if list1[i] and list2[i]:
        rest1.append(list1[i])
        rest2.append(list2[i])
for i in range(len(rest1)):
    # keep only pairs where both values are above 249
    if rest1[i] > 249 and rest2[i] > 249:
        result1.append(rest1[i])
        result2.append(rest2[i])
print(result1, result2)
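Another way to split the surviving pairs back into two lists is to "unzip" them with zip(*pairs). This is a sketch that reuses the sample lists from the question; the guard for an empty result is an added assumption.

```python
listA = [0, 562, 256, 0, 0, 856]
listB = [265, 0, 265, 0, 874, 958]

# Keep only the positions where both values are nonzero
pairs = [(a, b) for a, b in zip(listA, listB) if a and b]

# zip(*pairs) transposes the list of pairs into two columns
if pairs:
    resultA, resultB = map(list, zip(*pairs))
else:
    resultA, resultB = [], []  # no surviving pairs: nothing to unpack

print(resultA, resultB)
```

Building the list of pairs once avoids filtering the inputs twice.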

How to check if elements of one list/dict present in another list/dict in python

Here is the scenario: I am checking whether elements of A are present in B. While this code works, it takes a lot of time when I read through millions of lines. A more efficient way would be to turn A and B into dictionaries and look each entry up in the other, but I can't think of a simple way to do the dictionary lookup. That is, for each key-value pair in dict A, I want to check whether that key-value pair is present in dict B.
A = [['A',[1,2,3]],['D',[3,4]],['E',[6,7]]]
B = [['A',[1,2,3]],['E',[6,7]],['F',[8,9]]]
count = 0
for line in A:
    if len(line[1]) > 1:
        if line in B:
            count = count + 1
print(count)
1. Convert the lists to tuples.
2. Convert the list of tuples to a set.
3. Intersect the two sets.
4. Print the length of the intersection.
Example:
A = [['A',[1,2,3]],['D',[3,4]],['E',[6,7]]]
B = [['A',[1,2,3]],['E',[6,7]],['F',[8,9]]]
A_set = set((a, tuple(b)) for a, b in A)
B_set = set((a, tuple(b)) for a, b in B)
print(len(A_set & B_set))
You could always try with a list comprehension and work your way up using this as a basis:
a = [[1], [5], [7]]
b = [[5], [7], [0]]
r = [x for x in a if x in b]
Make A and B into dictionaries:
dA = dict(A)
dB = dict(B)
Then, just check that the keys and values match:
count = 0
for k, v in dA.items():  # use iteritems() on Python 2
    if dB.get(k) == v:
        count += 1
You will need to do some pre-processing on B so that its elements are immutable (and therefore hashable).
def deep_tuple(x):
    # recursively convert nested lists/tuples into tuples
    if isinstance(x, (list, tuple)):
        return tuple(map(deep_tuple, x))
    return x
A = [['A',[1,2,3]],['D',[3,4]],['E',[6,7]]]
B = [['A',[1,2,3]],['E',[6,7]],['F',[8,9]]]
B = set(deep_tuple(B))
count = 0
for line in A:
    if len(line[1]) > 1:
        # line is a list, so convert it the same way before the lookup
        if deep_tuple(line) in B:
            count = count + 1
print(count)
Make B into a set. Lookup in a set is O(1) on average, compared to O(|B|) for a list. The overall complexity of this operation will be O(|A|).
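Putting the set-based idea together with the length check from the question, a compact Python 3 sketch might look like this. It assumes every entry is a [key, list_of_values] pair with hashable values, as in the sample data.

```python
A = [['A', [1, 2, 3]], ['D', [3, 4]], ['E', [6, 7]]]
B = [['A', [1, 2, 3]], ['E', [6, 7]], ['F', [8, 9]]]

# Convert each [key, values] entry of B to a hashable (key, tuple) pair
b_set = {(key, tuple(values)) for key, values in B}

# Count entries of A that have more than one value and also appear in B
count = sum(
    1
    for key, values in A
    if len(values) > 1 and (key, tuple(values)) in b_set
)
print(count)
```

Unlike a plain set intersection, this keeps the len(...) > 1 filter and counts repeated entries in A separately.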
