Compare two lists for same entries at same place - python

I am trying to compare two lists for the same element at the same index. The idea is to verify whether both lists contain the same element at the same index. If yes, I want to count such occurrences. Here is my code:
count = 0
a = ['.ps2\n >|<4 *|*.ps2xml', '.c\n >|<2 *|*.wsc', '.h\n >|<2 *|*.wsh', '.c\n >|<2 *|*.chm', '.h\n >|<2 *|*.hta' ]
b = ['.ps2xml', '.chm', '.hta']
for x in a:
    for y in b:
        if y == x[x.index(" *|*") + 4:]:
            print("match")
            count += 1
print(count)
This gives me a count of 3. What I expect is 1, because only the first element of b matches a's first element. The second elements of the two lists differ, and so do the third. The remaining elements of a should not count, as there is no such index in b.
Hope it makes sense. Thanks

In that case, you should not use nested loops (since this means you repeat the search over b for each element of a); use zip(..) instead:
for x, y in zip(a, b):
    if y == x[x.index(" *|*") + 4:]:
        print("match")
        count += 1
print(count)
zip takes some iterables and generates tuples. In this case the i-th tuple is (a[i], b[i]), so to speak. It stops at the end of the shorter input, so the extra elements of a are ignored.
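For reference, here is the same idea as a self-contained sketch. The function name count_position_matches and the marker parameter are made up for illustration, and str.partition is swapped in for the index arithmetic so that a string without the marker does not raise an error:

```python
def count_position_matches(a, b, marker=" *|*"):
    """Count the indexes i where the text after `marker` in a[i] equals b[i]."""
    count = 0
    for x, y in zip(a, b):  # zip stops at the end of the shorter list
        # partition returns (before, marker, after); `after` is "" if the marker is missing
        _, _, suffix = x.partition(marker)
        if suffix == y:
            count += 1
    return count


a = ['.ps2\n >|<4 *|*.ps2xml', '.c\n >|<2 *|*.wsc', '.h\n >|<2 *|*.wsh',
     '.c\n >|<2 *|*.chm', '.h\n >|<2 *|*.hta']
b = ['.ps2xml', '.chm', '.hta']
print(count_position_matches(a, b))  # 1
```

Only the first pair matches, and the last two elements of a are never looked at because b has run out.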

A short solution that uses min() to limit the loop to the length of the shorter list:
for i in range(min(len(a), len(b))):
    if a[i][a[i].index('*|*') + 3:] == b[i]:
        count += 1
print(count)
The output:
1

Does the match have to be qualified as following '*|*'?
If not, then a really simple version is:
sum([1 for e, f in zip(a, b) if f in e])
or, with a generator expression passed straight to sum(), which avoids building the list:
sum(f in e for e, f in zip(a, b)) # relies on bools True, False = ints 1, 0
If the match is just the last bit, you could split:
'.ps2\n >|<4 *|*.ps2xml'.split(" *|*")
Out[13]: ['.ps2\n >|<4', '.ps2xml']
'.ps2\n >|<4 *|*.ps2xml'.split(" *|*")[1]
Out[14]: '.ps2xml'
sum([1 for e, f in zip(a, b) if f in e.split(" *|*")[1]])
While sum() is more "intentional", len() of the list comprehension could be used for a speed advantage, since it doesn't have to iterate over the list.

Related

Counting how many elements of list A appear earlier than in the similar but shuffled list B

A = [2, 3, 4, 1]
B = [1, 2, 3, 4]
I need to find how many elements of list A appear earlier than the same element does in list B. In this case those are the values 2, 3 and 4, so the expected return value is 3.
def count(a, b):
    muuttuja = 0
    for i in range(0, len(a)-1):
        if a[i] != b[i] and a[i] not in b[:i]:
            muuttuja += 1
    return muuttuja
I have tried this kind of solution, but it is very slow on lists that have a great number of values. I would appreciate some suggestions for alternative methods of doing the same thing more efficiently. Thank you!
If both lists have unique elements, you can build a map from each element (as key) to its index (as value). This can be done with a dictionary in Python. Since a dictionary lookup takes O(1) time on average, this code has a time complexity of O(n):
A=[2,3,4,1]
B=[1,2,3,4]
d = {}
count = 0
for i, ele in enumerate(A):
    d[ele] = i
for i, ele in enumerate(B):
    if i > d[ele]:
        count += 1
Use a set of already seen B-values.
def count(A, B):
    result = 0
    seen = set()
    for a, b in zip(A, B):
        seen.add(b)
        if a not in seen:
            result += 1
    return result
This only works if the values in your lists are hashable (for example numbers, strings or tuples).
Your method is slow because it has a time complexity of O(N²): checking if an element exists in a list of length N is O(N), and you do this N times. We can do better by using up some more memory instead of time.
First, iterate over b and create a dictionary mapping the values to the first index that value occurs at:
b_map = {}
for index, value in enumerate(b):
    if value not in b_map:
        b_map[value] = index
b_map is now {1: 0, 2: 1, 3: 2, 4: 3}
Next, iterate over a, counting how many elements have an index less than that element's value in the dictionary we just created:
result = 0
for index, value in enumerate(a):
    if index < b_map.get(value, -1):
        result += 1
Which gives the expected result of 3.
b_map.get(value, -1) is used to protect against the situation when a value in a doesn't occur in b, and you don't want to count it towards the total: .get returns the default value of -1, which is guaranteed to be less than any index. If you do want to count it, you can replace the -1 with len(a).
The second snippet can be replaced by a single call to sum:
result = sum(index < b_map.get(value, -1)
             for index, value in enumerate(a))
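For what it's worth, the two steps and the choice of default can be folded into one function. This is only a sketch: count_earlier and the count_missing flag are names invented here, not part of the answer above.

```python
def count_earlier(a, b, count_missing=False):
    """Count the elements of a whose index is smaller than their first index in b."""
    # Map each value to the first index at which it occurs in b.
    b_map = {}
    for index, value in enumerate(b):
        b_map.setdefault(value, index)
    # -1 is below every index, so a value missing from b never counts;
    # len(a) is above every index of a, so a missing value always counts.
    default = len(a) if count_missing else -1
    return sum(index < b_map.get(value, default) for index, value in enumerate(a))


print(count_earlier([2, 3, 4, 1], [1, 2, 3, 4]))           # 3
print(count_earlier([2, 9], [1, 2]))                       # 1
print(count_earlier([2, 9], [1, 2], count_missing=True))   # 2
```

In the second and third calls, 9 does not occur in b, so it is counted only when count_missing is set.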
You can make a prefix-count of A, which is an array where for each index you keep track of the number of occurrences of each element before the index.
You can use this to efficiently look-up the prefix-counts when looping over B:
import collections
A=[2,3,4,1]
B=[1,2,3,4]
prefix_count = [collections.defaultdict(int) for _ in range(len(A))]
prefix_count[0][A[0]] += 1
for i, n in enumerate(A[1:], start=1):
    prefix_count[i] = collections.defaultdict(int, prefix_count[i-1])
    prefix_count[i][n] += 1
prefix_count_b = sum(prefix_count[i][n] for i, n in enumerate(B))
print(prefix_count_b)
This outputs 3.
This could still be O(N*N) because of the copy from the previous index when building the prefix_count array. If someone knows a better way to do this, please let me know.
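One possible way around the copies, offered as a sketch rather than as the answerer's method: the final sum only ever reads prefix_count[i] at position i, so a single running counter that is updated and queried in the same pass should be enough. The function name prefix_count_sum is made up here.

```python
from collections import Counter


def prefix_count_sum(A, B):
    """Sum, over each index i, of how often B[i] occurs in A[:i + 1]."""
    running = Counter()  # counts of the elements of A seen so far
    total = 0
    for a, b in zip(A, B):
        running[a] += 1
        total += running[b]  # a Counter returns 0 for keys it has not seen
    return total


print(prefix_count_sum([2, 3, 4, 1], [1, 2, 3, 4]))  # 3
```

Each step does a constant amount of dictionary work on average, so the whole pass is O(N) in time and in extra memory.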

How to find out if a number is in a list of ranges?

Okay so say I have a list of ranges like
a = [[167772352, 167772415], [167772160, 167772223], [167772288, 167772351], [167772224, 167772255]]
and then I have a number like
b = 167772241
Now I know that b is within the 4th item of the list, but how would I check that b is within it in an optimal way? I've thought of using a for loop going through each item of the list and then inserting when the loop breaks, but I feel like there has to be some Python library function that could handle this. Any suggestion would be welcome!
Simply iterate over the list, unpack both values, build a range from them and check whether b in range(...). Also use enumerate starting from 1, and you will get the position in the list of the range that contains the number.
a = [[167772352, 167772415], [167772160, 167772223], [167772288, 167772351], [167772224, 167772255]]
b = 167772241
for index, (start, end) in enumerate(a, start=1):
    if b in range(start, end + 1):
        print(index)
        break
You can also use a list comprehension:
a = [[167772352, 167772415], [167772160, 167772223], [167772288, 167772351], [167772224, 167772255]]
b = 167772241
index = [b in range(start, end + 1) for start, end in a].index(True) + 1
print(index)
Also note the end + 1 used in both ranges: range does not include its end value, so adding 1 makes the range inclusive on both sides. Both methods give an index that starts from one, which is how you would usually count (1, 2, 3, 4, ...); you stated in your question that b should be in the fourth range, which means that you started counting from 1.
You could use map in the following way:
a = [[167772352, 167772415], [167772160, 167772223], [167772288, 167772351], [167772224, 167772255]]
b = 167772241
c = list(map(lambda a_: b in range(a_[0], a_[1] + 1), a))
The output will be a list of booleans that will indicate whether b is contained in each of a's ranges:
out: [False, False, False, True]
Here map takes two arguments. The first is a function (or a lambda) that it applies to each element of the list passed as the second argument. In Python 3, map returns a lazy map object, but you can easily convert it into a list by using list().
You could write a regular function that will check if b is in range, but using a lambda allows you to write the expression in one line. It takes one argument, a_, which will be populated with each element of the list.
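The membership tests above can also be written as plain comparisons, which avoids building range objects and works for non-integer bounds too. A minimal sketch, assuming the pairs are inclusive [start, end] bounds as in the question; find_range_index is a name chosen here for illustration:

```python
def find_range_index(ranges, number):
    """Return the 1-based position of the first inclusive [start, end] pair
    that contains number, or None if no pair does."""
    return next(
        (position
         for position, (start, end) in enumerate(ranges, start=1)
         if start <= number <= end),
        None,
    )


a = [[167772352, 167772415], [167772160, 167772223],
     [167772288, 167772351], [167772224, 167772255]]
print(find_range_index(a, 167772241))  # 4
print(find_range_index(a, 0))          # None
```

next() stops at the first hit, so the rest of the list is not scanned once a match is found.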

How to go through all combinations of 3 lists?

I have three lists and want to call a function which takes 3 arguments with all possible combinations of values from those 3 lists.
And if a condition is met, print the 3 values of the combination.
What is the fastest and best way to do that?
Here are my three lists:
a = np.linspace(0.01,constants.pi/2,50)
b = np.arange(20,62,2)
c = np.arange(0.3,1.5,0.1)
And I want to call a function, let's say testAllCombination(a[i],b[j],c[k]), in each iteration, and if the value returned is > 0, print the 3 values a[i], b[j] and c[k]. Is it possible to do this in a simple way?
It seems you need the Cartesian product of your lists.
import itertools
list(itertools.product(a,b,c))
Note that this operation results in 50*21*12=12600 triples of items from a,b,c.
If the order of the arguments is fixed as (a, b, c), you may consider simple nested loops. Otherwise, if you need other orderings such as (b, c, a) or (c, b, a), use itertools.
import numpy as np
a = np.linspace(0.01, 3.14/2, 50)
b = np.arange(20, 62, 2)
c = np.arange(0.3, 1.5, 0.1)
myCombination = []
for i in a:
    for j in b:
        for k in c:
            myCombination.append((i, j, k))
print(myCombination)
for item in myCombination:
    testCondition(item)  # testCondition is your own check, not defined here
https://docs.python.org/2/library/itertools.html

How to check if sum of 3 integers in list matches another integer? (python)

Here's my issue:
I have a large integer (anywhere between 0 and 2^32-1). Let's call this number X.
I also have a list of integers, unsorted currently. They are all unique numbers, greater than 0 and less than X. Assume that there is a large amount of items in this list, let's say over 100,000 items.
I need to find up to 3 numbers in this list (let's call them A, B and C) that add up to X.
A, B and C all need to be inside of the list, and they can be repeated (for example, if X is 4, I can have A=1, B=1 and C=2 even though 1 would only appear once in the list).
There can be multiple solutions for A, B and C but I just need to find one possible solution for each the quickest way possible.
I've tried creating a for loop structure like this:
for A in itemlist:
    for B in itemlist:
        for C in itemlist:
            if A + B + C == X:
                exit("Done")
But since my list of integers contains over 100,000 items, this uses too much memory and would take far too long.
Is there any way to find a solution for A, B and C without using an insane amount of memory or taking an insane amount of time? Thanks in advance.
You can reduce the running time from O(n^3) to O(n^2) by using a set, something like this:
s = set(itemlist)
for A in itemlist:
    for B in itemlist:
        if X - (A + B) in s:
            print(A, B, X - (A + B))
            break  # note: this only leaves the inner loop
you can also sort the list and use binary search if you want to save memory
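The set-based search above can be wrapped in a function so that it stops at the first hit. This is a sketch under the question's assumption that a value may be reused for A, B and C; the name find_triple is made up for illustration.

```python
def find_triple(items, target):
    """Return one (A, B, C) taken from items with A + B + C == target,
    or None if there is none. The same value may be used more than once."""
    lookup = set(items)  # membership tests are O(1) on average
    for a in items:
        for b in items:
            c = target - a - b
            if c in lookup:
                return a, b, c  # returning leaves both loops at once
    return None


print(find_triple([1, 2, 7], 4))  # (1, 1, 2)
print(find_triple([5, 6], 4))     # None
```

The worst case is still quadratic in the length of the list, so with 100,000 items a search that finds nothing will take a while.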
import collections
nums = collections.Counter(itemlist)
target = t # the target sum
for i in range(len(itemlist)):
    if itemlist[i] > target: continue
    for j in range(i+1, len(itemlist)):
        if itemlist[i]+itemlist[j] > target: continue
        if target - (itemlist[i]+itemlist[j]) in nums - collections.Counter([itemlist[i], itemlist[j]]):
            print("Found", itemlist[i], itemlist[j], target - (itemlist[i]+itemlist[j]))
Borrowing from @inspectorG4dget's code, this has two modifications:
If C < B then we can short-circuit the loop.
Use bisect_left() instead of collections.Counter().
This seems to run more quickly.
from random import randint
from bisect import bisect_left
X = randint(0, 2**32 - 1)
itemset = set(randint(0,X) for _ in range(100000))
itemlist = sorted(list(itemset)) # sort the list for binary search
l = len(itemlist)
for i, A in enumerate(itemlist):
    for j in range(i+1, l): # use numbers above A
        B = itemlist[j]
        C = X - A - B # calculate C
        if C <= B: continue
        # see https://docs.python.org/2/library/bisect.html#searching-sorted-lists
        k = bisect_left(itemlist, C) # k rather than i, so the outer loop index is not overwritten
        if k != l and itemlist[k] == C:
            print("Found", A, B, C)
To reduce the number of comparisons, we enforce A < B < C.

Removing common elements in two lists

I have two sorted lists of positive integers which can have repeated elements and I must remove matching pairs of numbers, one from each list:
a=[1,2,2,2,3]
b=[2,3,4,5,5]
should become:
a=[1,2,2]
b=[4,5,5]
That is, the 2's and the 3's have been removed because they appear in both lists.
Set intersection can't be used here because of the repeated elements.
How do I go about this?
To remove elements appearing in both lists, use the following:
for i in a[:]:
    if i in b:
        a.remove(i)
        b.remove(i)
To create a function which does it for you, simply do:
def removeCommonElements(a, b):
    for e in a[:]:
        if e in b:
            a.remove(e)
            b.remove(e)
Or, to return new lists without editing the old ones:
def getWithoutCommonElements(a, b): # Name subject to change
    a2 = a.copy()
    b2 = b.copy()
    for e in a:
        if e in b2:
            a2.remove(e)
            b2.remove(e)
    return a2, b2
However, the latter could be replaced with removeCommonElements like so:
a2, b2 = a.copy(), b.copy()
removeCommonElements(a2, b2)
This keeps a and b intact and creates duplicates without the common elements.
The Counter object from collections can do this quite concisely:
from collections import Counter
a=Counter([1,2,2,2,3])
b=Counter([2,3,4,5,5])
print(list((a-b).elements()))
print(list((b-a).elements()))
The idea is:
Count up how often each element appears (e.g. 2 appears 3 times in a, and 1 time in b)
Subtract the counts to work out how many extra times the element appears (e.g. 2 appears 3-1=2 times more in a than b)
Output each element the extra number of times it appears (the collections elements method automatically drops any elements with counts less than 1)
(Warning: the output lists won't necessarily be sorted)
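If the order matters, one option is to sort the leftovers again. A minimal sketch of the Counter approach with that step added; remove_common_sorted is a name chosen here for illustration:

```python
from collections import Counter


def remove_common_sorted(a, b):
    """Cancel matching pairs between a and b and return the leftovers
    of each list, sorted."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts;
    # sorted() restores the ordering that elements() does not guarantee.
    left_a = sorted((count_a - count_b).elements())
    left_b = sorted((count_b - count_a).elements())
    return left_a, left_b


print(remove_common_sorted([1, 2, 2, 2, 3], [2, 3, 4, 5, 5]))
# ([1, 2, 2], [4, 5, 5])
```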
Given that the lists are sorted, you can merge/distribute element-wise, like for example:
x, y = [], []
while a and b:
    if a[0] < b[0]:
        x.append(a.pop(0))
    elif a[0] > b[0]:
        y.append(b.pop(0))
    else: # a[0]==b[0]
        a.pop(0)
        b.pop(0)
x += a
y += b
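Since pop(0) shifts every remaining element, the loop above can become slow on long lists. The same merge can be sketched with two indexes, which also leaves the input lists untouched. The function name split_unmatched is made up here, and both inputs are assumed to be sorted as in the question.

```python
def split_unmatched(a, b):
    """Walk two sorted lists in step, cancel equal pairs, and return
    what is left of each list."""
    x, y = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            x.append(a[i])  # a[i] has no partner in b
            i += 1
        elif a[i] > b[j]:
            y.append(b[j])  # b[j] has no partner in a
            j += 1
        else:
            i += 1          # matching pair: drop one from each list
            j += 1
    x.extend(a[i:])         # whatever remains has nothing left to match
    y.extend(b[j:])
    return x, y


print(split_unmatched([1, 2, 2, 2, 3], [2, 3, 4, 5, 5]))
# ([1, 2, 2], [4, 5, 5])
```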
The solution given by @Mahi is nearly correct. The simplest way to achieve what you want is this:
def remove_common_elements(a, b):
    for i in a[:]:
        if i in b:
            a.remove(i)
            b.remove(i)
    return a, b
The important thing here is to make a copy of a by writing a[:]. If you iterate through a list while removing elements from it, you won't get correct results.
If you don't want to modify the lists in place, make a copy of both lists beforehand and return the copied lists.
def remove_common_elements(a, b):
    a_new = a[:]
    b_new = b[:]
    for i in a:
        if i in b_new:
            a_new.remove(i)
            b_new.remove(i)
    return a_new, b_new
One solution would be to build a new copy of a while removing the common elements from b.
a = [1,2,2,2,3]
b = [2,2,3,4,5]
a_new = []
for ai in a:
    if ai in b:
        b.remove(ai)
    else:
        a_new.append(ai)
print(a_new)
print(b)
