How to check if list1 contains some elements of list2? - python

I need to remove a few strings from list1, so I put them into list2, but I can't figure out how to make it work.
list1 = ['abc', 'def', '123']
list2 = ['def', 'xyz', 'abc'] # stuff to delete from list1
I would like to remove 'abc' and 'def' from list1 so that it only contains the items I need.

You can do this by using a list comprehension as a filter, like this:
set2, list1 = set(['def', 'xyz', 'abc']), ['abc', 'def', '123']
print([item for item in list1 if item not in set2])
# ['123']
We convert the elements of list2 to a set because sets offer faster lookups.
The logic is equivalent to writing:
result = []
for item in list1:
    if item not in set2:
        result.append(item)

If you don't have any duplicate in list1 (or if you want to remove duplicates), you can use this:
list1 = set(['abc', 'def', '123'])
list2 = set(['def', 'xyz', 'abc'])
print(list(list1 - list2))
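The comprehension and the set difference are not interchangeable when list1 has duplicates or when order matters. A minimal sketch, assuming a hypothetical list1 with a repeated item (the second '123' and 'zzz' are not from the question):

```python
list1 = ['abc', 'def', '123', '123', 'zzz']  # hypothetical data with a duplicate
list2 = ['def', 'xyz', 'abc']

to_remove = set(list2)

# Comprehension: keeps the original order and any duplicates.
filtered = [item for item in list1 if item not in to_remove]
print(filtered)  # ['123', '123', 'zzz']

# Set difference: drops duplicates, and set iteration order is arbitrary,
# so sort the result if a stable order matters.
unique_sorted = sorted(set(list1) - to_remove)
print(unique_sorted)  # ['123', 'zzz']
```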

list1 = ['abc', 'def', '123']
list2 = ['def', 'xyz', 'abc']
# result holds only the elements common to both lists,
# so it is usually small.
result = set(filter(set(list1).__contains__, list2))
newlist = []
for elm in list1:
    if elm not in result:
        newlist.append(elm)
print(newlist)
Output:
['123']

An even shorter answer using built-in set methods:
list1 = ['abc', 'def', '123']
list2 = ['def', 'xyz', 'abc']
set1 = set(list1)
set2 = set(list2)
print(set1.difference(set2))
# {'123'}
Quote from the Python documentation for set.difference:
"Return a new set with elements in the set that are not in the others."

Related

Add list items of one list to another list of list at same index

How can I achieve the result below? Both lists have the same length.
list_1 = [ 'arn1', 'arn2' ]
list_2 =[
['abc', '123'],
['pqr' , '789']
]
expected_output = [
['abc', '123', 'arn1'],
['pqr' , '789', 'arn2']
]
When trying to combine two lists item by item, zip is something you should always start with.
zip(list_1, list_2)
In this case, what you want is:
[ys + [x] for x, ys in zip(list_1, list_2)]
Which gives:
[['abc', '123', 'arn1'], ['pqr', '789', 'arn2']]
You can use enumerate to loop over the first list, get each index, and append the item to the matching sublist of the second list.
list_1 = [ 'arn1', 'arn2' ]
list_2 =[
['abc', '123'],
['pqr' , '789']
]
for i, item in enumerate(list_1):
    list_2[i].append(item)
print(list_2)
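One difference between the answers above is worth spelling out: the zip comprehension builds new inner lists, while the append loop changes list_2 in place. A small sketch of that difference, using the question's data:

```python
list_1 = ['arn1', 'arn2']
list_2 = [['abc', '123'], ['pqr', '789']]

# Comprehension: builds new inner lists, list_2 is left untouched.
combined = [ys + [x] for x, ys in zip(list_1, list_2)]
print(combined)  # [['abc', '123', 'arn1'], ['pqr', '789', 'arn2']]
print(list_2)    # [['abc', '123'], ['pqr', '789']]

# Append loop: modifies the inner lists of list_2 in place.
for x, ys in zip(list_1, list_2):
    ys.append(x)
print(list_2)    # [['abc', '123', 'arn1'], ['pqr', '789', 'arn2']]
```

Pick the in-place version only if nothing else still needs the original list_2.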
A slightly longer solution, but simpler:
list_1 = ['arn1', 'arn2']
list_2 = [['abc', '123'], ['pqr', '789']]
expected_output = [['abc', '123', 'arn1'], ['pqr', '789', 'arn2']]
output = []
for i in range(0, len(list_1)):  # iterate over the indices
    added_list = list_2[i] + [list_1[i]]
    output.append(added_list)
print(output == expected_output)
# True
Or a list comprehension, if you want one:
output_list_comprehension = [list_2[i] + [list_1[i]] for i in range(0, len(list_1))]
#returns same answer

How to make a dictionary from two nested list?

I have two nested lists:
list1 = [['s0'], ['s1'], ['s2']]
list2 = [['hello','world','the'],['as','per','the'],['assets','order']]
and I want to make a dictionary from these lists with keys from list1 and values from list2:
d = {s0:['hello','world','the'],s1:['as','per','the'],s2:['assets','order']}
The following code works if list1 is a normal (non-nested) list. But it doesn't work when list1 is a nested list.
dict(zip(list1, list2))
The problem here is that lists are not hashable, so they cannot be used as dictionary keys. One thing you can do is flatten list1 with itertools.chain and then build the dictionary with strings (which are immutable and hashable) as keys, following your current approach:
from itertools import chain
dict(zip(chain.from_iterable(list1),list2))
{'s0': ['hello', 'world', 'the'],
's1': ['as', 'per', 'the'],
's2': ['assets', 'order']}
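To see the hashability problem directly, here is a minimal sketch with the question's data. It assumes every inner list of list1 holds exactly one key; otherwise flattening would misalign keys and values.

```python
from itertools import chain

list1 = [['s0'], ['s1'], ['s2']]
list2 = [['hello', 'world', 'the'], ['as', 'per', 'the'], ['assets', 'order']]

# Using the inner lists themselves as keys fails, because lists are unhashable.
try:
    dict(zip(list1, list2))
except TypeError as exc:
    print(type(exc).__name__)  # TypeError

# Flattening turns each one-element list into its string, which is hashable.
flat_keys = list(chain.from_iterable(list1))
print(flat_keys)  # ['s0', 's1', 's2']

d = dict(zip(flat_keys, list2))
print(d['s1'])  # ['as', 'per', 'the']
```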
If you want to do it manually (to understand the algorithm, for example), here is one way to do so:
list1 = [['s0'], ['s1'], ['s2']]
list2 = [['hello','world','the'],['as','per','the'],['assets','order']]
if len(list1) != len(list2):
    exit(-1)
res = {}
for index, content in enumerate(list1):
    res[content[0]] = list2[index]
print(res)
Another answer could be :
list1 = [['s0'], ['s1'], ['s2']]
list2 = [['hello','world','the'],['as','per','the'],['assets','order']]
output_dict = {element1[0]: element2 for element1, element2 in zip(list1, list2)}
A similar way to write this dict comprehension:
output_dict = {element1: element2 for [element1], element2 in zip(list1, list2)}
Output :
{'s0': ['hello', 'world', 'the'],
's1': ['as', 'per', 'the'],
's2': ['assets', 'order']}
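The two comprehensions above behave the same on the question's data, but they differ if an inner list of list1 ever holds more than one item. A small sketch (bad_keys is a hypothetical malformed input, not from the question):

```python
list1 = [['s0'], ['s1'], ['s2']]
list2 = [['hello', 'world', 'the'], ['as', 'per', 'the'], ['assets', 'order']]

# Indexing takes the first item and silently ignores any extras.
by_index = {keys[0]: value for keys, value in zip(list1, list2)}

# Unpacking requires each inner list to hold exactly one item.
by_unpacking = {key: value for [key], value in zip(list1, list2)}
print(by_index == by_unpacking)  # True

bad_keys = [['s0', 'extra'], ['s1'], ['s2']]  # hypothetical malformed input
try:
    {key: value for [key], value in zip(bad_keys, list2)}
except ValueError:
    print('unpacking rejected the two-item key list')
```

The unpacking form therefore doubles as a cheap sanity check on the shape of list1.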
It's a strange way to store matching information in the first place, but I would combine them like this:
list1 = [['s0'], ['s1'], ['s2']]
list2 = [['hello','world','the'],['as','per','the'],['assets','order']]
assert len(list1) == len(list2)
output_dict = dict()
for index in range(len(list1)):
    output_dict[list1[index][0]] = list2[index]
Result:
{'s0': ['hello', 'world', 'the'], 's1': ['as', 'per', 'the'], 's2': ['assets', 'order']}
I am assuming that the variables s0, s1 and s2 are meant to be strings like in the first list.

delete the second item that starts with the same substring

I have a list l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
I want to delete the elements that start with the same sub-string if they exist (in this case 'abcd' and 'ghi').
N.B: in my situation, I know that the 'repeated' elements, if they exist, can be only 'abcd' or 'ghi'.
To delete them, I used this:
>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
>>> l.remove('ghi') if ('ghijklm' in l and 'ghi' in l) else l
>>> l
['abcdef', 'ghijklm', 'xyz', 'pqrs']
Is there a more efficient (or more automated) way to do this?
You can do it in time linear in the number of words and O(n*m²) memory (where n is the number of words and m is the length of your elements):
prefixes = {}
for word in l:
    for x in range(1, len(word)):
        prefixes[word[:x]] = True
result = [word for word in l if word not in prefixes]
Iterate over each word and add to a dictionary the first character of the word, then the first two characters, then three, all the way up to all the characters of the word except the last one. Then iterate over the list again: if a word appears in that dictionary, it is a proper prefix of some other word in the list.
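The prefix-collection idea described above can be sketched as a reusable function. The function name is hypothetical, and a set stands in for the dictionary because only membership matters here:

```python
def drop_proper_prefixes(words):
    """Return the words that are not a proper prefix of another word."""
    prefixes = set()
    for word in words:
        # word[:1], word[:2], ..., word[:-1]; the full word itself is excluded
        for end in range(1, len(word)):
            prefixes.add(word[:end])
    return [word for word in words if word not in prefixes]


l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print(drop_proper_prefixes(l))  # ['abcdef', 'ghijklm', 'xyz', 'pqrs']

# A word that is only one character shorter is still caught.
print(drop_proper_prefixes(['abcde', 'abcdef']))  # ['abcdef']
```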
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
for a in l[:]:
    for b in l[:]:
        if a.startswith(b) and a != b:
            l.remove(b)
print(l)
Output
['abcdef', 'ghijklm', 'xyz', 'pqrs']
The following code does what you described. It iterates over copies of the list so that removing elements does not disturb the loops, and it removes the shorter element of each pair.
your_list = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print("Original list: %s" % your_list)
for element in your_list[:]:
    for element2 in your_list[:]:
        if element.startswith(element2) and element != element2:
            print("%s starts with %s" % (element, element2))
            print("Remove: %s" % element2)
            your_list.remove(element2)
print("Removed list: %s" % your_list)
Output:
Original list: ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
abcdef starts with abcd
Remove: abcd
ghijklm starts with ghi
Remove: ghi
Removed list: ['abcdef', 'ghijklm', 'xyz', 'pqrs']
On the other hand, I think there is a simpler solution: you can solve it with a list comprehension if you want.
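One possible list comprehension version, as a rough sketch. It compares every pair of words, so it is quadratic in the length of the list, which should be fine for short lists like the one in the question:

```python
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']

# Keep a word only if no *other* word in the list starts with it.
result = [
    word for word in l
    if not any(other != word and other.startswith(word) for other in l)
]
print(result)  # ['abcdef', 'ghijklm', 'xyz', 'pqrs']
```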
#Andrew Allen's way
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
i = 0
l = sorted(l)
while True:
    try:
        if l[i] in l[i+1]:
            l.remove(l[i])
            continue
        i += 1
    except IndexError:  # ran past the end of the list
        break
print(l)
# ['abcdef', 'ghijklm', 'pqrs', 'xyz']
Try this; it will work:
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
for i in l[:]:  # iterate over copies so that removing items is safe
    for j in l[:]:
        if len(i) > len(j) and j in i:
            l.remove(j)
You can use:
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
if "abcdef" in l:  # only 1 check for containment instead of 2
    l = [x for x in l if x != "abcd"]  # to remove _all_ abcd
    # or, if you know there is only one abcd in it:
    # l.remove("abcd")  # remove() works in place and returns None, so do not assign its result
This might be slightly faster (if you have far more elements than you show) because you only need to check once for "abcdef", and then scan the list once more until the first "abcd" (or all of them) is removed.
>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
checks l twice for its full size to check containment (if unlucky) and then still needs to remove something from it
DISCLAIMER:
Unless this is a proven, measured bottleneck or is security-critical, I would not bother with it unless measurements suggest it is the biggest time saver in the code overall. With lists of up to some dozens or hundreds of elements (a gut feeling; your data does not support any analysis), the estimated gain is negligible.
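If you do want numbers before deciding, timeit from the standard library is one way to get them. This is only a sketch: the padded input and the two helper functions are hypothetical, and the timings depend on your machine and data.

```python
import timeit

# Hypothetical larger input; the original question only shows six items.
data = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs'] + ['w%d' % i for i in range(1000)]


def two_checks(items):
    items = list(items)  # work on a copy so every run starts from the same data
    if 'abcdef' in items and 'abcd' in items:
        items.remove('abcd')
    return items


def one_check(items):
    items = list(items)
    if 'abcdef' in items:
        items = [x for x in items if x != 'abcd']
    return items


for func in (two_checks, one_check):
    seconds = timeit.timeit(lambda: func(data), number=1000)
    print('%s: %.4f s' % (func.__name__, seconds))
```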

How to return all items from a list that don't start with a digit?

I have a list that consists of both words and digits. Let's say:
list1 = ['1','100', 'Stack', 'over','flow']
From this list I would like to filter out all the digit strings and keep the words. I have imported re and found the pattern for it, namely:
[^0-9]
However, I am not sure how to implement this so that I get a list like below.
result = ['Stack', 'over', 'flow']
No need for regex; use isdigit():
list1 = ['1','100', 'Stack', 'over','flow']
print([i for i in list1 if not i.isdigit()])
returns :
['Stack', 'over', 'flow']
Use a list comprehension and the string method isdigit:
[elem for elem in list1 if not elem.isdigit()]
You can do this quite nicely with list comprehension:
list1 = ['1','100', 'Stack', 'over','flow']
list2 = [i for i in list1 if not i.isdigit()]
If, for whatever reason, you did want to use regex to do this (maybe you have more complex filtering criteria), you could do it using something like this:
import re
list1 = ['1','100', 'Stack', 'over','flow']
list2 = [i for i in list1 if re.fullmatch('[^0-9]+', i)]
Using filter + lambda:
list(filter(lambda x: not x.isdigit(), list1))
# ['Stack', 'over', 'flow']
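Note that isdigit() is only True when every character in the string is a digit, so mixed or signed strings pass through a "not isdigit()" filter. A stricter alternative is str.isalpha(), which keeps only purely alphabetic strings. A small sketch of the difference ('v2' and '-5' are added for illustration and are not in the question):

```python
items = ['1', '100', 'Stack', 'over', 'flow', 'v2', '-5']  # 'v2' and '-5' are hypothetical extras

# Drops only the strings made up entirely of digits.
not_digits = [s for s in items if not s.isdigit()]
print(not_digits)  # ['Stack', 'over', 'flow', 'v2', '-5']

# Keeps only the strings made up entirely of letters.
only_letters = [s for s in items if s.isalpha()]
print(only_letters)  # ['Stack', 'over', 'flow']
```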
As other answers suggested, you don't really need regexes, but they can be more flexible if your requirements change in the future. For example:
from re import match
list1 = ['1','100', 'Stack', 'over','flow']
result = list(filter(lambda el: match(r'^[^0-9]*$', el), list1))
^: start of the string
[...]: character group
^: negates the character group
0-9: digits 0-9 (you could use \d as well)
*: zero or more times
$: end of the string
If you want all elements that don't start with a number, use ^[^0-9].* where . is any character.
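The two patterns above give different results as soon as a string mixes letters and digits. A short sketch of the difference ('a1' and '1a' are added for illustration and are not in the question):

```python
import re

items = ['1', '100', 'Stack', 'over', 'flow', 'a1', '1a']  # 'a1' and '1a' are hypothetical extras

# No digit anywhere in the string (an empty string would also match).
no_digits = [s for s in items if re.match(r'^[^0-9]*$', s)]
print(no_digits)  # ['Stack', 'over', 'flow']

# Does not start with a digit; digits later in the string are allowed.
not_digit_first = [s for s in items if re.match(r'^[^0-9].*', s)]
print(not_digit_first)  # ['Stack', 'over', 'flow', 'a1']
```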
I don't know the exact pattern of your list elements, but this code should work for the given example:
import re
pattern = re.compile("([A-Za-z])")
list1 = ['1','100', 'Stack', 'over','flow']
result = []
for x in list1:
    check = pattern.match(x)
    if check is not None:
        result.append(x)
print(result)
# Python 3
olist = list(filter(lambda s: s.isalpha(), list1))
print(olist)  # ['Stack', 'over', 'flow']
# Python 2
olist = filter(lambda s: s.isalpha(), list1)
print olist  # ['Stack', 'over', 'flow']

List comprehension: Multiply each string to a single list

I have a list of strings and want to get a new list in which each element is repeated a number of times.
lst = ['abc', '123']
n = 3
I can do that with a for loop:
res = []
for i in lst:
res = res + [i]*n
print( res )
['abc', 'abc', 'abc', '123', '123', '123']
How do I do it with list comprehension?
My best try so far:
[ [i]*n for i in ['abc', '123'] ]
[['abc', 'abc', 'abc'], ['123', '123', '123']]
Use a nested list comprehension
>>> lst = ['abc', '123']
>>> n = 3
>>> [i for i in lst for j in range(n)]
['abc', 'abc', 'abc', '123', '123', '123']
The idea is that the outer loop goes through the list once and the inner loop runs n times for each element, so each element is emitted n times (three times here).
See What does "list comprehension" mean? How does it work and how can I use it?
It can also be done as:
>>> lst = ['abc', '123']
>>> n=3
>>> [j for i in lst for j in (i,)*n]
['abc', 'abc', 'abc', '123', '123', '123']
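To make the loop order explicit, the nested comprehension can be unrolled into plain for loops. The itertools variant at the end is an alternative not taken from the answers above; it is a sketch, not a recommendation.

```python
from itertools import chain, repeat

lst = ['abc', '123']
n = 3

# The nested comprehension [i for i in lst for j in range(n)] unrolls to:
res = []
for i in lst:            # outer loop: each string once
    for _ in range(n):   # inner loop: n times per string
        res.append(i)
print(res)  # ['abc', 'abc', 'abc', '123', '123', '123']

# An itertools alternative: repeat each string n times, then flatten.
res2 = list(chain.from_iterable(repeat(s, n) for s in lst))
print(res2)  # ['abc', 'abc', 'abc', '123', '123', '123']
```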
