list and elements comparison python - python

list1 = ['A','B']
list2 = ['a','c']
list3 = ['x','y','z']
list4 = [['A','b','c'],['a','x'],['Y','Z'],['d','g']]
I want to check whether all elements of a list (list1, list2, list3) are contained in any one of the lists inside another, bigger list (list4).
I want the comparison to be case-insensitive.
To be clear, here list1 and list2 are in list4 but list3 is not. How can I do it?
On another note, how would I know whether a list is a collection of lists?
In other words, how can I distinguish a list of lists from a plain list of elements if I am not the one defining the lists?

First item: you want case-insensitive matching. The simplest way to do that is to convert everything to one case (upper or lower). So for each list, run
list1 = [x.lower() for x in list1]
That will convert your lists to lowercase. Let's assume you've done that.
Second, to compare two "simple" (non-nested) lists, you can simply say
if set(list1) <= set(list2):
to check whether list1 is a subset of list2 (a bare < tests for a proper subset, which is False when the two sets are equal). In your example, it would be False.
Finally, if you want to check whether a list is nested:
if isinstance(list4[0], list):
which in this case would be True. Then just iterate over the elements of list4 and do the set comparison above against each one.
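Putting the three steps together, a minimal sketch might look like the following. The helper name contained_in_any is made up for illustration, and the sketch assumes every element is a string.

```python
def contained_in_any(small, nested):
    """Return True if every item of `small` appears, ignoring case,
    in at least one of the sub-lists of `nested`."""
    wanted = {s.lower() for s in small}
    # `<=` is the subset test; any() stops at the first sub-list that matches
    return any(wanted <= {s.lower() for s in sub} for sub in nested)

list1 = ['A', 'B']
list2 = ['a', 'c']
list3 = ['x', 'y', 'z']
list4 = [['A', 'b', 'c'], ['a', 'x'], ['Y', 'Z'], ['d', 'g']]

print(contained_in_any(list1, list4))  # True
print(contained_in_any(list2, list4))  # True
print(contained_in_any(list3, list4))  # False
```

Lowercasing inside the helper leaves the original lists untouched, which may matter if you still need the original spelling afterwards.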

You can use lower() to convert all elements of every list to lowercase, which makes the comparison case-insensitive.
def make_case_insensitive(lst):
    return [i.lower() for i in lst]
For example,
list1 = make_case_insensitive(list1)
Because the bigger list is slightly different (its elements are lists), the function has to change slightly.
def make_bigger_list_caseinsensitive(bigger_list):
    return [[i.lower() for i in element] for element in bigger_list]
list4 = make_bigger_list_caseinsensitive(list4)
Then check whether any element of the bigger list is a superset of the smaller list, converting both to sets first. Print "Is in bigger list" if the condition is satisfied and "not in bigger list" otherwise.
print("Is in bigger list" if any(set(element).issuperset(set(list1)) for element in list4) else "not in bigger list")
To make it more readable, write:
if any(set(element).issuperset(set(list1)) for element in list4):
    print("Is in bigger list")
else:
    print("not in bigger list")
Finally, to check whether the bigger list contains a nested list:
print(any(isinstance(element, list) for element in list4))
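When you did not define the list yourself, it may contain a mix of lists and plain elements, so checking only the first element can mislead. The sketch below classifies a list by looking at every element; the function name describe and its three labels are hypothetical, chosen only for this example.

```python
def describe(lst):
    """Classify a list as 'nested', 'flat' or 'mixed' (illustrative labels)."""
    flags = [isinstance(item, list) for item in lst]
    if flags and all(flags):
        return 'nested'   # every element is itself a list
    if not any(flags):
        return 'flat'     # no element is a list (this includes the empty list)
    return 'mixed'        # some elements are lists, some are not

print(describe([['A', 'b'], ['a', 'x']]))  # nested
print(describe(['A', 'B']))                # flat
print(describe(['A', ['b', 'c']]))         # mixed
```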

Using sets is a good way to do this.
list1 = ['A','B']
list2 = ['a','c']
list3 = ['x','y','z']
list4 = [['A','b','c'],['a','x'],['Y','Z'],['d','g']]
from functools import reduce  # needed in Python 3, where reduce is no longer a builtin
set1 = set(map(lambda s: s.lower(), list1))
set2 = set(map(lambda s: s.lower(), list2))
set3 = set(map(lambda s: s.lower(), list3))
# Build a list rather than a lazy map object so set4 can be iterated more than once
set4 = list(map(lambda l: set(map(lambda s: s.lower(), l)), list4))
print(set1) # {'a', 'b'} (element order within a set may vary)
print(set2) # {'a', 'c'}
print(set3) # {'x', 'y', 'z'}
print(set4) # [{'a', 'b', 'c'}, {'a', 'x'}, {'y', 'z'}, {'d', 'g'}]
lor = lambda x, y: x or y
reduce(lor, map(lambda s: set1.issubset(s), set4)) # True
reduce(lor, map(lambda s: set2.issubset(s), set4)) # True
reduce(lor, map(lambda s: set3.issubset(s), set4)) # False
To do a case-insensitive string comparison, convert both strings to lowercase or uppercase.
To test whether all elements of list1 are contained in one of the lists in list4, use set.issubset.
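If the strings may contain non-ASCII letters, lowercasing is not always enough for a caseless comparison. Python 3 provides str.casefold() for this purpose; the short example below uses made-up sample words to show the difference.

```python
words = ['Straße', 'STRASSE']

lowered = [w.lower() for w in words]
folded = [w.casefold() for w in words]

print(lowered)  # ['straße', 'strasse']
print(folded)   # ['strasse', 'strasse']

# lower() leaves the two spellings different; casefold() makes them equal
print(lowered[0] == lowered[1])  # False
print(folded[0] == folded[1])    # True
```

For plain ASCII data such as the lists in the question, lower() and casefold() give the same result.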

Related

Manipulate a list based on another list

I have a list of English words (list2) and I want to remove from it every word that contains any of the letters in list1.
For this example:
list1 = ['A','B']
list2 = ['AARON', 'ABAFT', 'ABASE', 'ABASK', 'ABAVE', 'ABBAS', 'ABBIE', 'ABDAL', 'ABEAM', 'ABELE', 'ABIDE', 'ABIES', 'ABKAR', 'ABLOW', 'ABNER', 'ABODE', 'ABOHM']
I want to write a loop that removes all the elements, since each of them contains either A or B.
The result should be an empty list.
list1 = ['A','B']
list2 = ['AARON', 'ABAFT', 'ABASE', 'ABASK', 'ABAVE', 'ABBAS', 'ABBIE', 'ABDAL', 'ABEAM', 'ABELE', 'ABIDE', 'ABIES', 'ABKAR', 'ABLOW', 'ABNER', 'ABODE', 'ABOHM']
for x in list1:
    for y in list2:
        if x in y:
            list2.remove(y)
print(list2)
I was expecting an empty list but the result was:
['ABASK', 'ABDAL', 'ABIES', 'ABODE']
As commented by tripleee, changing a list while iterating over it can cause trouble. I'd suggest using a list comprehension and set to check for intersecting characters:
# Or list1 = {'A','B'}
list1 = set(list1)
# returns empty list
[w for w in list2 if not list1.intersection(w)]
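To see why the original loop leaves words behind, here is a small sketch with made-up data. Each removal shifts the remaining items one position to the left while the loop's internal index still advances, so the item right after a removed one is never examined.

```python
letters = {'A', 'B'}
words = ['AB', 'BA', 'AC', 'CA']

# Buggy: removing from the list being iterated makes the loop skip items.
buggy = list(words)
for w in buggy:
    if letters.intersection(w):
        buggy.remove(w)
print(buggy)  # ['BA', 'CA']

# Safe: iterate over a copy, so removals do not disturb the iteration.
safe = list(words)
for w in list(safe):
    if letters.intersection(w):
        safe.remove(w)
print(safe)  # []
```

Building a new list with a comprehension, as in the answer above, avoids the problem altogether and is usually easier to read.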
You can also use a regex match if the entries in list1 are plain characters with no special meaning in a regular expression. One possible approach:
import re
matcher = re.compile("|".join(list1))
list2 = [s for s in list2 if not matcher.search(s)]
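If list1 might contain characters that are special in regular expressions, one option is to pass each entry through re.escape before joining. The data below is invented to show the difference between the two patterns.

```python
import re

letters = ['.', 'B']            # '.' is a regex metacharacter
words = ['ABC', 'XYZ', 'X.Z']

# Unescaped: the pattern '.|B' matches any character, so every word is dropped.
naive = re.compile("|".join(letters))
naive_result = [w for w in words if not naive.search(w)]
print(naive_result)  # []

# Escaped: each entry is matched literally.
escaped = re.compile("|".join(map(re.escape, letters)))
escaped_result = [w for w in words if not escaped.search(w)]
print(escaped_result)  # ['XYZ']
```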
Using a composition of built-in functions.
filter returns an iterator, so it should be converted to a list.
list1 = ['A','B']
list2 = ['ARON', 'ABAFT', 'ABASE', 'ABASK', 'ABAVE', 'ABBAS', 'ABBIE', 'ABDAL', 'ABEAM', 'ABELE', 'ABIDE', 'ABIES', 'ABKAR', 'ABLOW', 'ABNER', 'ABODE', 'ABOHM']
a = filter(lambda s: not any(map(s.__contains__, list1)), list2)
print(list(a))

Python - filter a list against another list with a condition

list1 = ['/mnt/1m/a_pre.geojson','/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
I have multiple lists and I want to find all the elements of list1 that have no corresponding entry in list2, subject to a matching condition.
The condition is that both the 'm' directory (1m, 2m, ...) and the name of the geojson file, excluding the 'pre' or 'post' substring, must match.
For example, '/mnt/1m/a_pre.geojson' from list1 has been processed but '/mnt/2m/b_pre.geojson' has not, so the output should be the list ['/mnt/2m/b_pre.geojson'].
I am using two for loops and then splitting the strings; I am sure this is not the only way and there might be an easier one.
for i in list1:
    for j in list2:
        pre_tile = i.split("/")[-1].split('_pre', 1)[0]
        post_tile = j.split("/")[-1].split('_post', 1)[0]
        if pre_tile == post_tile:
            ...
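I assume the first part of each file path has the same shape. If so, you can try this, which compares only the first seven characters (for example '/mnt/1m') and therefore ignores the file name: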
list1 = ['/mnt/1m/a_pre.geojson','/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
res = [x for x in list1 if x[:7] not in [y[:7] for y in list2]]
res:
['/mnt/2m/b_pre.geojson']
If I understand you correctly, using a regular expression to do this kind of string manipulation can be fast and easy.
Additionally, to do multiple member-tests in list2, it's more efficient to convert the list to a set.
import re
list1 = ['/mnt/1m/a_pre.geojson', '/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
pattern = re.compile(r'(.*?/[0-9]m/.*?)_pre\.geojson')
set2 = set(list2)
result = [
    m.string
    for m in map(pattern.fullmatch, list1)
    if m and f"{m[1]}_post.geojson" not in set2
]
print(result)
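An alternative that avoids regular expressions is to reduce every path to a comparison key and compare the keys. The sketch below assumes that the scale directory is always the file's immediate parent and that the file name ends in a single '_pre' or '_post' marker before the extension; tile_key is a hypothetical helper name.

```python
from pathlib import PurePosixPath

def tile_key(path):
    """Return (scale directory, tile name) for a path, e.g. ('1m', 'a')."""
    p = PurePosixPath(path)
    # p.stem drops '.geojson'; rsplit drops the trailing '_pre' / '_post'
    tile = p.stem.rsplit('_', 1)[0]
    return (p.parent.name, tile)

list1 = ['/mnt/1m/a_pre.geojson', '/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']

# Keys of everything already processed, stored in a set for fast lookups
processed = {tile_key(p) for p in list2}
result = [p for p in list1 if tile_key(p) not in processed]
print(result)  # ['/mnt/2m/b_pre.geojson']
```

Because the key contains both the directory and the tile name, a file 'a' under 2m is not confused with a file 'a' under 1m.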

Comparing 2 lists and printing the differences

I am trying to compare two lists and find the differences between them. For example, say list 1 consists of cat, dog, whale, hamster and list 2 consists of dog, whale, hamster. How would I compare the two and assign the difference, which in this case is cat, to a variable? Order does not matter; however, if there is more than one difference, each difference should be assigned to its own variable.
In my actual code I'm comparing HTML consisting of thousands of lines, so I would prefer something as fast as possible, but any solution is appreciated :)
str1 = 'cat,dog,whale,hamster'
str2 = 'dog,whale,hamster'
Turn the strings into Python sets:
set1 = set(str1.split(','))
set2 = set(str2.split(','))
Get the difference:
result = set1 - set2
Printing result gives:
{'cat'}
You can convert it to a list or a string:
result_as_list = list(result)
result_as_string = ','.join(result)
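Two variations on the set approach may be useful; both are sketches using sample data rather than the asker's real input. The first keeps the original order of list1, which a plain set difference does not guarantee. The second collects items that appear in exactly one of the two lists.

```python
list1 = ['cat', 'dog', 'whale', 'hamster']
list2 = ['dog', 'whale', 'hamster', 'cow']

# Order-preserving difference: the set gives fast lookups, while the
# list comprehension keeps list1's original order.
seen = set(list2)
only_in_1 = [x for x in list1 if x not in seen]
print(only_in_1)  # ['cat']

# Symmetric difference: items that appear in exactly one of the two lists.
either = sorted(set(list1) ^ set(list2))
print(either)  # ['cat', 'cow']
```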
If your lists can contain duplicates or if you need to know the elements that are only in one of the two lists, you can use Counter (from the collections module):
list1 = ['cat','dog','whale','hamster','dog']
list2 = ['dog','whale','hamster','cow','horse']
from collections import Counter
c1,c2 = Counter(list1),Counter(list2)
differences = [*((c1-c2)+(c2-c1)).elements()]
print(differences) # ['cat', 'dog', 'cow', 'horse']
Here is one way to do it. The function defined here returns the differences between the two lists:
def Diff(list1, list2):
    li_dif = [i for i in list1 + list2 if i not in list1 or i not in list2]
    return li_dif
# Driver Code
list1 = ['cat','dog','whale','hamster']
list2 = ['dog','whale','hamster']
diff = Diff(list1, list2)
print(diff)
Output:
['cat']
Here the variable diff holds the list containing cat.
Now suppose there is more than one difference. With the same Diff function as above:
# Driver Code
list1 = ['cat','dog','whale','hamster','ostrich','yak','sheep','lion','tiger']
list2 = ['dog','whale','hamster']
diff = Diff(list1, list2)
print(diff)
The output will be:
['cat', 'ostrich', 'yak', 'sheep', 'lion', 'tiger']
You asked that, if there is more than one difference, each difference be assigned to its own variable.
For that, give the returned list a name, say list3:
list3 = diff
Here, list3 = ['cat', 'ostrich', 'yak', 'sheep', 'lion', 'tiger'].
There are only six items, so we can assign a variable to each of them as follows:
v1 = list3[0]
v2 = list3[1]
v3 = list3[2]
v4 = list3[3]
v5 = list3[4]
v6 = list3[5]
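Assigning by index works only when you know the number of differences in advance. As a small sketch with made-up data, unpacking can do the same in one line, and starred unpacking copes with an unknown count:

```python
diff = ['cat', 'ostrich', 'yak']

# Tuple unpacking: the number of names must match the number of items.
v1, v2, v3 = diff

# Starred unpacking copes with an unknown number of differences.
first, *rest = diff
print(first)  # cat
print(rest)   # ['ostrich', 'yak']
```

When the number of differences varies a lot, it is often simpler to keep them in the list and loop over it.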

Given a list of string, determine if one string is a prefix of another string

I want to write a Python function which checks if one string is a prefix string of another; not an arbitrary sub string of another; must be prefix. If it is, return True. For instance,
list = ['abc', 'abcd', 'xyx', 'mno']
Return True because 'abc' is a prefix of 'abcd'.
list = ['abc', 'xyzabc', 'mno']
Return False
I tried startswith() and a list comprehension, but it didn't quite work.
I'd appreciate any help or pointers.
Let us first sort the given list by string length. A prefix is never longer than the string it prefixes, so after sorting, the shorter strings come first. We then iterate over the sorted list, comparing each element only with the elements after it. This small optimization means we no longer have to compare every element with every other element in both directions.
lst1 = ['abc', 'abcd', 'xyx', 'mno']
lst2 = ['abc', 'xyzabc', 'mno']
lst3 = ["abc", "abc"]
def check_list(lst):
    lst = list(set(lst))  # drop duplicates so identical strings don't count as prefixes
    lst.sort(key=len)
    n = len(lst)
    for i in range(n):
        for j in range(i + 1, n):
            if lst[j].startswith(lst[i]):
                return True
    return False
print(check_list(lst1))  # True
print(check_list(lst2))  # False
print(check_list(lst3))  # False, because of lst = list(set(lst))
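As a variation on the sorting idea, sorting alphabetically instead of by length lets you compare only neighbouring strings: if a is a prefix of b, every string that sorts between them also starts with a, so some adjacent pair must show the prefix relation. The sketch below is an alternative to the length-based version, not a replacement the answer proposed, and has_prefix_pair is a made-up name.

```python
def has_prefix_pair(strings):
    """Return True if some string is a prefix of a different string."""
    ordered = sorted(set(strings))  # set() drops exact duplicates
    # After sorting, a prefix always sits directly before a string it prefixes
    return any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))

print(has_prefix_pair(['abc', 'abcd', 'xyx', 'mno']))  # True
print(has_prefix_pair(['abc', 'xyzabc', 'mno']))       # False
print(has_prefix_pair(['abc', 'abc']))                 # False
```

This needs one sort plus a single pass over the list, which can matter for long lists.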
Using itertools
import itertools
list1 = ["abc", "xyz", "abc123"]
products = itertools.product(list1, list1)
is_substringy = any(x.startswith(y) for x, y in products if x != y)
This isn't very optimised, but depending on the amount of data you've got to deal with, the code is fairly elegant (and short); that might trump speed in your use case.
This assumes that you don't have pure repeats in the list however (but you don't have that in your example).
import itertools
mlist = ['abc', 'abcd', 'xyx', 'mno']
# combinations of list elements, taken two at a time, without repetition
In [638]: for i, j in itertools.combinations(mlist, 2):
   .....:     print((i, j))
   .....:
('abc', 'abcd')
('abc', 'xyx')
('abc', 'mno')
('abcd', 'xyx')
('abcd', 'mno')
('xyx', 'mno')
# r holds the final result: True if there is any pair where one string is a prefix of the other
r = False
In [639]: for i, j in itertools.combinations(mlist, 2):
   .....:     r = r or i.startswith(j)  # True if j is a prefix of i
   .....:     r = r or j.startswith(i)  # True if i is a prefix of j
   .....:
In [640]: r
Out[640]: True

How do I add two lists' elements into one list?

For example, I have a list like this:
list1 = ['good', 'bad', 'tall', 'big']
list2 = ['boy', 'girl', 'guy', 'man']
and I want to make a list like this:
list3 = ['goodboy', 'badgirl', 'tallguy', 'bigman']
I tried something like this:
list3 = []
list3 = list1 + list2
but this just concatenates the two lists into one list of eight separate elements.
So I used a for loop:
list3 = []
for a in list1:
    for b in list2:
        c = a + b
        list3.append(c)
but it results in too many elements (in this case, 4*4 = 16 of them).
You can use list comprehensions with zip:
list3 = [a + b for a, b in zip(list1, list2)]
zip pairs up elements from the iterables you give it and yields them as tuples (in Python 3 it returns a lazy iterator rather than a list). So in your case, it will produce pairs of elements from list1 and list2, stopping when the shorter one is exhausted.
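The stopping behaviour matters when the lists have different lengths. The sketch below uses shortened sample lists to show it, together with itertools.zip_longest, which pads the shorter list with a fill value instead of stopping.

```python
from itertools import zip_longest

list1 = ['good', 'bad', 'tall']
list2 = ['boy', 'girl']

# zip stops at the shorter list, so 'tall' is silently dropped
short = [a + b for a, b in zip(list1, list2)]
print(short)  # ['goodboy', 'badgirl']

# zip_longest pads the shorter list with the fill value instead
padded = [a + b for a, b in zip_longest(list1, list2, fillvalue='')]
print(padded)  # ['goodboy', 'badgirl', 'tall']
```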
The loop you tried is one way to go; this version is more beginner-friendly than Xion's solution.
list3 = []
for index, item in enumerate(list1):
    list3.append(item + list2[index])
This shorter solution also works. It uses map() and lambda; I prefer this over zip, but that's up to everyone. In Python 3, wrap the call in list() to get a list:
list3 = list(map(lambda x, y: str(x) + str(y), list1, list2))
For this, or for any two lists of the same size, you can also use an index loop; note that list3 must already have the right length before you assign by index:
list3 = [None] * len(list1)
for i in range(len(list1)):
    list3[i] = list1[i] + list2[i]
Using zip:
list3 = []
for l1, l2 in zip(list1, list2):
    list3.append(l1 + l2)
# list3 is now ['goodboy', 'badgirl', 'tallguy', 'bigman']
