Comparing two Lists using Zip function in Python - python

I am comparing two lists together and if they match I want to increment a counter.
Right now the counter is saying 0 each time I print it out even though there should be some matches. Both lists have data within them as well because I can print them out. Below is the code that I am using to find a match in the lists and increment if they do match. What could be going wrong?
numCorrect = sum(1 for a, b in zip(trueLabels, predLabels) if a == b)
Any advice helps, Thanks

Your code works well:
trueLabels = [1, 2, 3, 4, 5]
predLabels = [1, 2, 4, 4, 5]
numCorrect = sum(1 for a, b in zip(trueLabels, predLabels) if a == b)
print(numCorrect)
# 4
You may have shifted indices in your list(s).
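Another thing worth checking, offered only as a guess since the question does not show the data: if one list holds numbers and the other holds strings (for example labels read from a file), a == b is never True and the count stays at 0. A minimal sketch with made-up data; the name numCorrectFixed is hypothetical:

```python
trueLabels = [1, 2, 4, 4, 5]
predLabels = ["1", "2", "3", "4", "5"]  # hypothetical: labels read from a file as strings

numCorrect = sum(1 for a, b in zip(trueLabels, predLabels) if a == b)
print(numCorrect)  # 0, because 1 == "1" is False

# Inspect which type pairs are actually being compared
type_pairs = {(type(a).__name__, type(b).__name__) for a, b in zip(trueLabels, predLabels)}
print(type_pairs)  # {('int', 'str')}

# Convert to a common type before comparing
numCorrectFixed = sum(1 for a, b in zip(trueLabels, predLabels) if a == int(b))
print(numCorrectFixed)  # 4
```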

Create combinations from the list without considering adjacent elements

I want to generate combinations from the list without considering the adjacent elements.
I have tried a code which provides combinations without considering adjacent elements, and it works with unique elements in the list.
But it does not work with repeated elements in the list, e.g. [4,5,4,3].
Code:
import itertools
b = []
stuff = [4,5,4,3]
for L in range(2, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        a = list(subset)
        for i in range(1,len(a)):
            if stuff.index(a[i-1]) == stuff.index(a[i])-1:
                a.clear()
                break
        else:
            b.append(a)
print('b = ',b)
Expected result = [[4,4],[4,3],[5,3]]
Actual result = [[4, 4], [4, 3], [5, 4], [5, 3], [4, 3], [4, 4, 3], [4, 4, 3], [5, 4, 3], [5, 4, 3]]
To explain with an example: suppose the list is [1,2,3,4,5]; then the possible non-adjacent combinations are [[1,3],[1,4],[1,5],[2,4],[2,5],[3,5],[1,3,5]]. I want these combinations. My code works well with unique elements, but when numbers repeat in the given list, such as [1,3,2,3,2,5], index always returns the position of the first 3 and never the other one. So how can I get the combinations from such a list?
Instead of generating all the itertools.combinations and then filtering out the valid ones with index, which (a) is very inefficient and (b) does not work with duplicate elements, you should implement your own combinations algorithm, which is not too hard at all, and might look somewhat like this:
def comb(lst, num):
    if num == 0:
        yield []
    if 0 < num <= len(lst):
        first, *rest = lst
        for c in comb(rest, num-1):
            yield [first] + c
        for c in comb(rest, num):
            yield c
To add the "no adjacent elements" constraint, simply keep track of whether you took the last element, and only add the next element if this is not the case:
def comb_no_adj(lst, num, last=False):
    if num == 0:
        yield []
    if 0 < num <= len(lst):
        first, *rest = lst
        if not last:
            for c in comb_no_adj(rest, num-1, True):
                yield [first] + c
        for c in comb_no_adj(rest, num, False):
            yield c
Example combinations for comb_no_adj([1,2,3,4,5,6], 3) are [1, 3, 5], [1, 3, 6], [1, 4, 6], [2, 4, 6] (This example does not contain duplicates, simply for the sake of being easier to understand; since this algo does not use index, duplicate elements are not an issue.)
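To get the result the question asks for (all sizes from 2 upward), the generator can be called once per size. A small sketch; comb_no_adj is repeated so the snippet runs on its own, and the helper name all_no_adj is made up for illustration:

```python
def comb_no_adj(lst, num, last=False):
    if num == 0:
        yield []
    if 0 < num <= len(lst):
        first, *rest = lst
        if not last:
            # take `first`, so the next element must be skipped
            for c in comb_no_adj(rest, num - 1, True):
                yield [first] + c
        # skip `first`
        for c in comb_no_adj(rest, num, False):
            yield c

def all_no_adj(lst, min_size=2):
    """Collect non-adjacent combinations of every size >= min_size."""
    return [c for n in range(min_size, len(lst) + 1) for c in comb_no_adj(lst, n)]

print(all_no_adj([4, 5, 4, 3]))     # [[4, 4], [4, 3], [5, 3]]
print(all_no_adj([1, 2, 3, 4, 5]))  # [[1, 3], [1, 4], [1, 5], [2, 4], [2, 5], [3, 5], [1, 3, 5]]
```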
Update: In fact, first generating all the combinations and then filtering invalid ones can not work. Consider this example: [1,1,1]. All combinations with two elements would be [1,1], [1,1], [1,1] (the first and second, first and third, and second and third 1). How would you decide which of those to keep and which to discard? And it gets worse for [1,1,1,1]. (You could generate all combinations of element-index-pairs and then filter those, though, but due to the large number of combinations that will be filtered out anyway this would still be less efficient.)
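For comparison, the index-based filtering mentioned in the parenthesis could look roughly like this. It is only a sketch (the function name comb_no_adj_by_index is made up), and it still enumerates every index combination before discarding most of them:

```python
from itertools import combinations

def comb_no_adj_by_index(lst, num):
    # Combine positions rather than values, so duplicates stay distinguishable
    for idxs in combinations(range(len(lst)), num):
        # keep only selections where no two chosen positions are neighbours
        if all(b - a > 1 for a, b in zip(idxs, idxs[1:])):
            yield [lst[i] for i in idxs]

print(list(comb_no_adj_by_index([1, 1, 1], 2)))     # [[1, 1]]
print(list(comb_no_adj_by_index([4, 5, 4, 3], 2)))  # [[4, 4], [4, 3], [5, 3]]
```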

Python arranging a list to include duplicates

I have a list in Python that is similar to:
x = [1,2,2,3,3,3,4,4]
Is there a way using pandas or some other list comprehension to make the list appear like this, similar to a queue system:
x = [1,2,3,4,2,3,4,3]
It is possible, by using cumcount:
import pandas as pd
s = pd.Series(x)
s.index = s.groupby(s).cumcount()
s.sort_index()
Out[11]:
0 1
0 2
0 3
0 4
1 2
1 3
1 4
2 3
dtype: int64
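If you split your list into one separate list for each value (groupby, which assumes equal values are already adjacent, as in your sorted list), you can then use the itertools recipe roundrobin to get this behavior. Each group has to be copied into a list, because groupby invalidates a group's iterator as soon as it advances to the next group:
x = [1, 2, 2, 3, 3, 3, 4, 4]
list(roundrobin(*(list(g) for _, g in groupby(x))))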
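Since roundrobin is a recipe from the itertools documentation rather than an importable function, here is a self-contained sketch. The roundrobin below is a simplified stand-in written for this example, not the exact recipe text:

```python
from itertools import groupby

def roundrobin(*iterables):
    """Yield one item from each iterable in turn until all are exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                continue  # this iterable is finished, drop it
            still_active.append(it)
        iterators = still_active

x = [1, 2, 2, 3, 3, 3, 4, 4]
# groupby only groups consecutive equal items; x is already sorted here
groups = [list(g) for _, g in groupby(x)]
print(list(roundrobin(*groups)))  # [1, 2, 3, 4, 2, 3, 4, 3]
```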
If I'm understanding you correctly, you want to retain all duplicates, but then have the list arranged in an order where you create what are in essence separate lists of unique values, but they're all concatenated into a single list, in order.
I don't think this is possible in a listcomp, and nothing's occurring to me for getting it done easily/quickly in pandas.
But the straightforward algorithm is:
Create a different list for each set of unique values: for i in x: if i is not in list1, add it to list1; else if not in list2, add it to list2; else if not in list3, add it to list3; and so on. There's certainly a way to do this with recursion, if it's an unpredictable number of lists.
Evaluate the lists based on their values, to determine the order in which you want to have them listed in the final list. It's unclear from your post exactly what order you want them to be in. Querying by the value in the 0th position could be one way. Evaluating the entire lists as >= each other is another way.
Once you have that set of lists and their orders, it's straightforward to concatenate them in order, in the final list.
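The steps above can be sketched with a loop over a growing list of lists instead of recursion. This is only one possible reading: it keeps the lists in the order they were created and skips the reordering step, and the name spread_duplicates is made up:

```python
def spread_duplicates(values):
    rounds = []  # rounds[k] collects the (k+1)-th occurrence of each value
    for v in values:
        for r in rounds:
            if v not in r:
                r.append(v)  # first list that does not yet contain v
                break
        else:
            rounds.append([v])  # v is already in every list, start a new one
    # concatenate the lists in order
    return [v for r in rounds for v in r]

print(spread_duplicates([1, 2, 2, 3, 3, 3, 4, 4]))  # [1, 2, 3, 4, 2, 3, 4, 3]
```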
Essentially, what you want is a pattern: the order in which unique numbers are first encountered while traversing the list x. For example, if x = [4,3,1,3,5], then the pattern is 4 3 1 5, and this helps us refill x so that the output is [4,3,1,5,3].
from collections import defaultdict
x = [1,2,2,3,3,3,4,4]
counts_dict = defaultdict(int)
for p in x:
    counts_dict[p] += 1
i = 0
while i < len(x):
    for p, cnt in counts_dict.items():
        if i < len(x):
            if cnt > 0:
                x[i] = p
                counts_dict[p] -= 1
                i += 1
            else:
                continue
        else:
            # we have placed all the 'p'
            break
print(x) # [1, 2, 3, 4, 2, 3, 4, 3]
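Note: this relies on dict preserving insertion order, which is an implementation detail of CPython 3.6 and guaranteed by the language from Python 3.7 on; I am assuming that you are using such a version.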
This is what I thought of doing at first, but it fails in some cases:
'''
x = [3,7,7,7,4]
i = 1
while i < len(x):
    if x[i] == x[i-1]:
        x.append(x.pop(i))
        i = max(1,i-1)
    else:
        i += 1
print(x) # [1, 2, 3, 4, 2, 3, 4, 3]
# x = [2,2,3,3,3,4,4]
# output [2, 3, 4, 2, 3, 4, 3]
# x = [3,7,1,7,4]
# output [3, 7, 1, 7, 4]
# x = [3,7,7,7,4]
# output time_out
'''

Iterating portions of lists - PYTHON

Bit of a generic noob question. I am heavily dealing with long lists of integer/float values and wasting a lot of time.
my_list = [1,2,3,4,5,6,7,8,9,10.....] etc.
Say I want to pass a portion of that list to a function. It could be the first 3 elements, then the following 3, etc. It could also be in groups of 4, 5, 6... It might even be required that I take a different number of elements each time.
def myfunc(x,y,z):
    do something
    return something
What is the most efficient way to iterate by a specified number of values, as efficiency is always appreciated and these simple iterations are the places where I can gain something.
len_ml = len(my_list)
for i in range(0, len_ml, 3):
    chunk = my_list[i:min(len_ml, i+3)]
This is a way. I am not sure that it is the best, though.
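The same idea can be wrapped in generators, including the case where the group size changes each time. A minimal sketch; the names chunks and variable_chunks are made up, and slicing already clamps at the end of the list, so no min() is needed:

```python
from itertools import islice

def chunks(seq, size):
    """Yield consecutive slices of `seq`, each with at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def variable_chunks(iterable, sizes):
    """Yield consecutive groups whose lengths follow `sizes`."""
    it = iter(iterable)
    for size in sizes:
        chunk = list(islice(it, size))
        if not chunk:
            return  # input exhausted
        yield chunk

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(list(chunks(my_list, 3)))                   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
print(list(variable_chunks(my_list, [3, 4, 2])))  # [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

If the function takes separate arguments, as myfunc(x, y, z) does, a full chunk can be unpacked with myfunc(*chunk).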
With a list you can get just the items you want with my_list[start:end].
So to skip the first item use my_list[1:], to skip the last use my_list[:-1], and to skip the first 3 and last 3 use my_list[3:-3].
If you don't know how many items are in the list to start with, you can call len(my_list) to get the number. So if you had X items and wanted groups of 3:
number_of_groups = len(my_list) // 3
To loop over only certain items:
start_index = 1
end_index = -1
for item in my_list[start_index:end_index]:
    print(item)
>>> my_list = [1,2,3,4,5,6,7,8,9,10]
>>> group_len = 3  # you can change the length as per your requirement, i.e. 2,3,4,5,6...
>>> for i in range(0, len(my_list)):
...     if i*group_len < len(my_list):
...         my_list[i*group_len:(i+1)*group_len]
...     else:
...         break
...
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
Result for group_len = 5
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]

getting a first even from a nested list of number from python

I need help figuring out this code. This is my first programming class, we have an exam next week, and I am working through the old exams.
There is one exercise with nested lists that I am having trouble understanding. It basically says to convert (list of [list of ints]) -> int.
Given a list of lists, return the index of the first sublist that contains an even number (0 counts as even); if there are no even numbers, return -1.
We are also given three examples:
>>> first_even([[9, 1, 3], [2, 5, 7], [9, 9, 7, 2]])
1
>>> first_even([[1, 3, 5], [7, 9], [1, 0]])
2
>>> first_even([[1, 3, 5]])
-1
We are using Python 3 in our class, and I kind of have an idea of where to begin, but I know it's wrong. I'll give it a try anyway:
def first_even(L1):
    count = 0
    for i in range(L1):
        if L1[i] % 2 = 0:
            count += L1
    return count
I thought this was it but it didn't work out.
If you guys could please help me out with hints or solution to this it would be helpful to me.
If I understand correctly and you want to return the index of the first list that contains at least one even number:
In [1]: def first_even(nl):
   ...:     for i, l in enumerate(nl):
   ...:         if not all(x%2 for x in l):
   ...:             return i
   ...:     return -1
   ...:
In [2]: first_even([[9, 1, 3], [2, 5, 7], [9, 9, 7, 2]])
Out[2]: 1
In [3]: first_even([[1, 3, 5], [7, 9], [1, 0]])
Out[3]: 2
In [4]: first_even([[1, 3, 5]])
Out[4]: -1
enumerate is a convenient built-in function that gives you both the index and the item of an iterable, so you don't need to mess with the ugly range(len(L1)) and indexing.
all is another built-in. If all remainders are non-zero (and thus evaluate to True) then the list doesn't contain any even numbers.
There are some minor problems with your code:
L1[i] % 2 = 0 is using the wrong operator. = is for assigning variables a value, while == is used for equality.
You probably meant range(len(L1)), as range expects an integer.
Lastly, you're adding the whole list to the count, when you only wanted to add the index. This could be achieved with .index(), but this doesn't work for duplicates in the list. You can use enumerate, as I'm about to show below.
If you're ever working with indexes, enumerate() is your function:
def first_even(L):
    for x, y in enumerate(L):
        if any(z % 2 == 0 for z in y):  # If any of the numbers in the sublist are even
            return x  # Return the index; the function exits here
    return -1  # No even numbers found. Return -1
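The same enumerate/any idea can also be written as a single expression with next() and a default value. This is just an alternative sketch, checked against the three examples from the question:

```python
def first_even(nested):
    # next() returns the first index produced by the generator,
    # or the default -1 if no sublist contains an even number
    return next(
        (i for i, sub in enumerate(nested) if any(n % 2 == 0 for n in sub)),
        -1,
    )

print(first_even([[9, 1, 3], [2, 5, 7], [9, 9, 7, 2]]))  # 1
print(first_even([[1, 3, 5], [7, 9], [1, 0]]))           # 2
print(first_even([[1, 3, 5]]))                           # -1
```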
So here's what I came up with.
def first_even(L1):
    for aList in range(len(L1)):
        for anItem in range(len(L1[aList])):
            if L1[aList][anItem] % 2 == 0:
                return aList
    return -1
First a fix. You need to use == for "equal to", '=' is for assigning variables.
L1[i] % 2 == 0
And for the code, here's the idea in some more pseudocodey style:
Iterate through the list of lists (L1):
    Iterate through the list's (aList) items (anItem):
        if List[current list][current item] is even:
            Return the current list's index
Return -1 at this point, because if the code gets this far, an even number isn't here.
Hope it helps, if you need any further explanation then I'll be happy to.
def first_even(L1):
    return ''.join('o' if all(n%2 for n in sl) else 'e' for sl in L1).find('e')

How can I verify if one list is a subset of another?

I need to verify if a list is a subset of another - a boolean return is all I seek.
Is testing equality on the smaller list after an intersection the fastest way to do this? Performance is of utmost importance given the number of datasets that need to be compared.
Adding further facts based on discussions:
Will either of the lists be the same for many tests? It does as one of them is a static lookup table.
Does it need to be a list? It does not - the static lookup table can be anything that performs best. The dynamic one is a dict from which we extract the keys to perform a static lookup on.
What would be the optimal solution given the scenario?
>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>>
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
Use set.issubset
Example:
a = {1,2}
b = {1,2,3}
a.issubset(b) # True
a = {1,2,4}
b = {1,2,3}
a.issubset(b) # False
The performant function Python provides for this is set.issubset. It does have a few restrictions that make it unclear if it's the answer to your question, however.
A list may contain items multiple times and has a specific order. A set does not. Additionally, sets only work on hashable objects.
Are you asking about subset or subsequence (which means you'll want a string search algorithm)? Will either of the lists be the same for many tests? What are the datatypes contained in the list? And for that matter, does it need to be a list?
Your other post intersect a dict and list made the types clearer and did get a recommendation to use dictionary key views for their set-like functionality. In that case it was known to work because dictionary keys behave like a set (so much so that before we had sets in Python we used dictionaries). One wonders how the issue got less specific in three hours.
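For the scenario described in the question (a static lookup table and a dict whose keys are checked against it), the key-view approach could look like the sketch below. The names LOOKUP and keys_in_lookup and the sample data are hypothetical:

```python
# Build the static lookup once; a frozenset gives constant-time membership tests
LOOKUP = frozenset({"yes", "no", "hmm", "well"})

def keys_in_lookup(d, lookup=LOOKUP):
    # dict key views are set-like, so they support subset comparison directly
    return d.keys() <= lookup

print(keys_in_lookup({"yes": 1, "no": 2}))     # True
print(keys_in_lookup({"sorry": 1, "no": 2}))   # False
```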
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
all(x in two for x in one)
Explanation: a generator expression produces booleans by looping through list one and checking whether each item is in list two. all() returns True if every item is truthy, else False.
There is also the advantage that all returns False at the first missing element rather than having to process every item.
Assuming the items are hashable
>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False
If you don't care about duplicate items eg. [1, 2, 2] and [1, 2] then just use:
>>> set([1, 2, 2]).issubset([1, 2])
True
Is testing equality on the smaller list after an intersection the fastest way to do this?
.issubset will be the fastest way to do it. Checking the length before testing issubset will not improve speed because you still have O(N + M) items to iterate through and check.
One more solution would be to use an intersection.
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one).intersection(set(two)) == set(one)
If one is a subset of two, the intersection of the two sets equals set(one).
(OR)
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one) & (set(two)) == set(one)
Set theory is inappropriate for lists, since duplicates will produce wrong answers.
For example:
a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)
has no meaning. It returns True, but that answer is not correct for the lists, since the set comparison only looks at 1,3,5 versus 1,3,4,5 (a has three 3s, b only two). You must include all duplicates.
Instead, you must count each occurrence of each item and do a greater-than-or-equal check. This is not very expensive, because it does not use O(N^2) operations and does not require sorting.
#!/usr/bin/env python
from collections import Counter
def containedInFirst(a, b):
    a_count = Counter(a)
    b_count = Counter(b)
    for key in b_count:
        # Counter has no has_key() in Python 3; use a membership test
        if key not in a_count:
            return False
        if b_count[key] > a_count[key]:
            return False
    return True
a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print("b in a:", containedInFirst(a, b))
a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print("b in a:", containedInFirst(a, b))
Then running this you get:
$ python contained.py
b in a: False
b in a: True
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(x in two for x in one) == set([True])
If every element of one is in two:
(x in two for x in one) generates only True values,
so set(x in two for x in one) has exactly one element (True).
Pardon me if I am late to the party. ;)
To check if one set A is subset of set B, Python has A.issubset(B) and A <= B. It works on set only and works great BUT the complexity of internal implementation is unknown. Reference: https://docs.python.org/2/library/sets.html#set-objects
I came up with an algorithm to check if list A is a subset of list B with following remarks.
To reduce the complexity of finding a subset, I find it appropriate to sort both lists first before comparing elements.
It helped me to break the loop when the value of the element of the second list, B[j], is greater than the value of the element of the first list, A[i].
last_index_j is used to start the loop over list B where it last left off. It avoids restarting comparisons from the start of list B (which, as you might guess, is unnecessary in subsequent iterations).
Complexity will be O(n ln n) each for sorting both lists and O(n) for checking for subset.
O(n ln n) + O(n ln n) + O(n) = O(n ln n).
The code has lots of print statements to show what's going on at each iteration of the loop. These are meant for understanding only.
Check if one list is subset of another list
is_subset = True
A = [9, 3, 11, 1, 7, 2]
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
print(A, B)
# skip checking if list A has elements more than list B
if len(A) > len(B):
    is_subset = False
else:
    # complexity of sorting using quicksort or merge sort: O(n ln n)
    # use best sorting algorithm available to minimize complexity
    A.sort()
    B.sort()
    print(A, B)
    # complexity: O(n^2)
    # for a in A:
    #     if a not in B:
    #         is_subset = False
    #         break
    # complexity: O(n)
    is_found = False
    last_index_j = 0
    for i in range(len(A)):
        for j in range(last_index_j, len(B)):
            is_found = False
            print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?")
            if B[j] <= A[i]:
                if A[i] == B[j]:
                    is_found = True
                last_index_j = j
            else:
                is_found = False
                break
            if is_found:
                print("Found: " + str(A[i]))
                last_index_j = last_index_j + 1
                break
            else:
                print("Not found: " + str(A[i]))
        if is_found == False:
            is_subset = False
            break
print("subset") if is_subset else print("not subset")
Output
[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==1?
Not found: 2
i=1, j=2, 2==2?
Found: 2
i=2, j=3, 3==3?
Found: 3
i=3, j=4, 7==4?
Not found: 7
i=3, j=5, 7==5?
Not found: 7
i=3, j=6, 7==6?
Not found: 7
i=3, j=7, 7==8?
not subset
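A more compact sketch of the same sort-then-scan idea, without the tracing prints. The function name is_sorted_subset is made up, and note one design choice: because each match consumes an element of B, duplicates in A need matching duplicates in B:

```python
def is_sorted_subset(a, b):
    a, b = sorted(a), sorted(b)  # O(n log n) each; the inputs are left untouched
    j = 0
    for item in a:
        # advance through b past everything smaller than the current item
        while j < len(b) and b[j] < item:
            j += 1
        if j == len(b) or b[j] != item:
            return False
        j += 1  # consume the matched element of b
    return True

A = [9, 3, 11, 1, 7, 2]
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
print(is_sorted_subset(A, B))          # False, 7 is not in B
print(is_sorted_subset([9, 3, 1], B))  # True
```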
Below code checks whether a given set is a "proper subset" of another set
def is_proper_subset(subset, superset):
    # parameter renamed from `set` to avoid shadowing the built-in
    return all(x in superset for x in subset) and len(subset) < len(superset)
In python 3.5 you can do a [*set()][index] to get the element. It is much slower solution than other methods.
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
result = set(x in two for x in one)
[*result][0] == True
or just with len and set (this is True when every element of b is also in a):
len(set(a+b)) == len(set(a))
Here is how I know if one list is a subset of another one, the sequence matters to me in my case.
def is_subset(list_long,list_short):
    short_length = len(list_short)
    subset_list = []
    for i in range(len(list_long)-short_length+1):
        subset_list.append(list_long[i:i+short_length])
    if list_short in subset_list:
        return True
    else: return False
Most of the solutions consider that the lists do not have duplicates. In case your lists do have duplicates you can try this:
def isSubList(subList,mlist):
    uniqueElements=set(subList)
    for e in uniqueElements:
        if subList.count(e) > mlist.count(e):
            return False
    # It is sublist
    return True
It ensures the sublist never has different elements than list or a greater amount of a common element.
lst=[1,2,2,3,4]
sl1=[2,2,3]
sl2=[2,2,2]
sl3=[2,5]
print(isSubList(sl1,lst)) # True
print(isSubList(sl2,lst)) # False
print(isSubList(sl3,lst)) # False
Since no one has considered comparing two strings, here's my proposal.
You may of course want to check if the pipe ("|") is not part of either lists and maybe chose automatically another char, but you got the idea.
Using an empty string as separator is not a solution since the numbers can have several digits ([12,3] != [1,23])
def issublist(l1,l2):
    return '|'.join([str(i) for i in l1]) in '|'.join([str(i) for i in l2])
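One caveat with the joined-string comparison: a match can start or end in the middle of a number, so [2, 3] is reported as contained in [12, 34]. A possible workaround, sketched here with a made-up function name, is to wrap both strings in the separator so that only whole items can match:

```python
def issublist_delimited(l1, l2, sep="|"):
    if not l1:
        return True  # treat the empty list as contained in any list

    def wrap(items):
        # leading and trailing separators force matches onto item boundaries
        return sep + sep.join(str(i) for i in items) + sep

    return wrap(l1) in wrap(l2)

print('|'.join(str(i) for i in [2, 3]) in '|'.join(str(i) for i in [12, 34]))  # True (false positive)
print(issublist_delimited([2, 3], [12, 34]))      # False
print(issublist_delimited([2, 3], [1, 2, 3, 4]))  # True
```

As in the original, this still assumes the separator never appears inside an item's string form.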
If you are asking if one list is "contained" in another list as an element, then:
>>> listA in listB
If you are asking whether each element in listA occurs at least as many times in listB, try:
all(listA.count(item) <= listB.count(item) for item in listA)
