Python regular expression through each member of a list

From these two lists:
list_A = ["eyes", "clothes", "body" "etc"]
list_B = ["xxxx_eyes", "xxx_zzz", "xxxxx_bbbb_zzzz_clothes" ]
I want to populate a third list wit those objects from 2nd list, only if some part of his names matchs one of the names from the first list.
In the previous example, the third list has to be:
["xxxx_eyes", "xxxxx_bbbb_zzzz_clothes"]

If you want to use a list comprehension, this will work:
list_C = [word for word in list_B if any(test in word for test in list_A)]
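For the example lists above, this should produce the expected third list:
print(list_C)
# ['xxxx_eyes', 'xxxxx_bbbb_zzzz_clothes']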

If you want to use regexes for this:
import re

search = re.compile("|".join(map(re.escape, list_A))).search
result = list(filter(search, list_B))
Although Blender's answer might be enough in most cases.
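For the example lists above, the regex version should give the same result:
print(result)
# ['xxxx_eyes', 'xxxxx_bbbb_zzzz_clothes']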

In [1]: list_A = ["eyes", "clothes", "body", "etc"]
In [2]: list_B = ["xxxx_eyes", "xxx_zzz", "xxxxx_bbbb_zzzz_clothes"]
In [7]: [x for x in list_B if any(y in list_A for y in x.split('_'))]
Out[7]: ['xxxx_eyes', 'xxxxx_bbbb_zzzz_clothes']

Slowest but simplest would be:
list_A = ["eyes", "clothes", "body", "etc"]
list_B = ["xxxx_eyes", "xxx_zzz", "xxxxx_bbbb_zzzz_clothes"]
list_C = []
for name in list_A:
    for item in list_B:
        if name in item:
            list_C.append(item)
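With the example data this gives the same result; note, though, that an item containing more than one of the names from list_A would be appended once per match:
print(list_C)
# ['xxxx_eyes', 'xxxxx_bbbb_zzzz_clothes']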

Related

Python - filter list from another list with a condition

list1 = ['/mnt/1m/a_pre.geojson','/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
I have multiple lists and I want to find all the elements of list1 which do not have an entry in list2, using a filtering condition.
The condition is that the 'm' directory (1m, 2m, ...) and the name of the geojson file, excluding the 'pre' or 'post' substring, should match.
For example, in list1 '/mnt/1m/a_pre.geojson' has been processed but '/mnt/2m/b_pre.geojson' has not, so the output should be the list ['/mnt/2m/b_pre.geojson'].
I am using two for loops and then splitting the string, which I am sure is not the only way; there might be an easier way to do this.
for i in list1:
    for j in list2:
        pre_tile = i.split("/")[-1].split('_pre', 1)[0]
        post_tile = j.split("/")[-1].split('_post', 1)[0]
        if pre_tile == post_tile:
            ...
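In other words, the condition pairs the 'Nm' directory with the file's base name. A minimal sketch of that pairing without nested loops, assuming the path layout shown above:
list1 = ['/mnt/1m/a_pre.geojson', '/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
# (directory, base name) pairs that already have a _post file
post_keys = {
    (path.split('/')[-2], path.split('/')[-1].split('_post', 1)[0])
    for path in list2
}
# keep the _pre paths that have no matching _post entry
unprocessed = [
    path for path in list1
    if (path.split('/')[-2], path.split('/')[-1].split('_pre', 1)[0]) not in post_keys
]
print(unprocessed)
# ['/mnt/2m/b_pre.geojson']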
I believe matching entries share the same first part of the file path. If so, you can try this:
list1 = ['/mnt/1m/a_pre.geojson','/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
res = [x for x in list1 if x[:7] not in [y[:7] for y in list2]]
res:
['/mnt/2m/b_pre.geojson']
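The slice works here because the distinguishing part of each path is its first seven characters ('/mnt/1m' vs '/mnt/2m'); a quick check:
print('/mnt/1m/a_pre.geojson'[:7])   # /mnt/1m
print('/mnt/1m/a_post.geojson'[:7])  # /mnt/1m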
If I understand you correctly, using a regular expression to do this kind of string manipulation can be fast and easy.
Additionally, since you do repeated membership tests against list2, it's more efficient to convert that list to a set.
import re
list1 = ['/mnt/1m/a_pre.geojson', '/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
pattern = re.compile(r'(.*?/[0-9]m/.*?)_pre\.geojson')
set2 = set(list2)
result = [
    m.string
    for m in map(pattern.fullmatch, list1)
    if m and f"{m[1]}_post.geojson" not in set2
]
print(result)

remove element from python list based on match from another list

I have a list of S3 objects like this:
list1 = ['uid=123/2020/06/01/625e2ghvh.parquet','uid=876/2020/04/01/hgdshct7.parquet','uid=0987/2019/03/01/323dc.parquet']
list2 = ['123','876']
I want to remove every element of list1 whose uid appears in list2, so the expected result is:
result_list = ['uid=0987/2019/03/01/323dc.parquet']
Without using any loop, is there an efficient way to achieve this, considering the large number of elements in list1?
You could build a set from list2 for a faster lookup and use a list comprehension to check for membership using the substring of interest:
list1 = ['uid=123/2020/06/01/625e2ghvh.parquet','uid=876/2020/04/01/hgdshct7.parquet',
'uid=0987/2019/03/01/323dc.parquet']
list2 = ['123','876']
set2 = set(list2)
[i for i in list1 if i.lstrip('uid=').split('/',1)[0] not in set2]
# ['uid=0987/2019/03/01/323dc.parquet']
The substring is obtained through:
s = 'uid=123/2020/06/01/625e2ghvh.parquet'
s.lstrip('uid=').split('/',1)[0]
# '123'
This does the job. For different patterns though, or to also cover slight variations, you could go for a regex. For this example you'd need something like:
import re
[i for i in list1 if re.search(r'^uid=(\d+).*?', i).group(1) not in set2]
# ['uid=0987/2019/03/01/323dc.parquet']
This is one way to do it without loops:
def filter_function(item):
    uid = int(item[4:].split('/')[0])
    if uid not in list2:
        return True
    return False
list1 = ['uid=123/2020/06/01/625e2ghvh.parquet','uid=876/2020/04/01/hgdshct7.parquet','uid=0987/2019/03/01/323dc.parquet']
list2 = [123, 876]
result_list = list(filter(filter_function, list1))
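Running this on the sample data should reproduce the expected result from the question:
print(result_list)
# ['uid=0987/2019/03/01/323dc.parquet']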
How about this one:
_list2 = [f'uid={number}' for number in list2]
result = [item for item in list1 if not any([item.startswith(i) for i in _list2])] # ['uid=0987/2019/03/01/323dc.parquet']

How to filter a list by using another list?

I have these two lists:
my_targets = ["aa1","bb2"]
my_list = ["aa1_rtc","aa1fp","aar1","bb","bb2_11"]
How can I select only those entries from my_list that contain any of my_targets? Notice that aa1_rtc and aa1fp contain aa1, while aar1 should be filtered out.
My attempt:
final_list = [i for i in my_list if i in my_targets]
len(final_list)
Expected result:
final_list = ["aa1_rtc","aa1fp","bb2_11"]
You can use a list comprehension with any for this:
[i for i in my_list if any(j in i for j in my_targets)]
# ['aa1_rtc', 'aa1fp', 'bb2_11']

How to remove items in a list of lists with a list comprehension

I have a large list like this:
mylist = [['pears','apples','40'],['grapes','trees','90','bears']]
I'm trying to remove all numbers within the lists of this list. So I made a list of numbers as strings from 1 to 100:
def integers(a, b):
    return list(range(a, b+1))

numb = integers(1, 100)
numbs = []
for i in range(len(numb)):
    numbs.append(str(numb[i]))  # strings
# numbs is now ['1', '2', ..., '100']
How can I iterate through lists in mylist and remove the numbers in numbs? Can I use list comprehension in this case?
If the number is always at the end of the sublist:
mylist = [x[:-1] for x in mylist]
mylist = [[item for item in sublist if item not in numbs] for sublist in mylist] should do the trick.
However, this isn't quite what you've asked. Nothing was actually removed from mylist, we've just built an entirely new list and reassigned it to mylist. Same logical result, though.
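If you really do need to change mylist in place (for example because other variables reference the same list object), a slice assignment would do that, assuming numbs is defined as in the question:
mylist[:] = [[item for item in sublist if item not in numbs] for sublist in mylist]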
If the numbers are always at the end and appear only once, you can remove the last item like this:
my_new_list = [x[:-1] for x in mylist]
If there are more (or if they are not at the end), you have to loop through each element; in that case you can use:
my_new_list = [[elem for elem in x if elem not in integer_list] for x in mylist]
I would also recommend generating the list of integers as follows:
integer_list = list(map(str, range(1, 101)))
I hope it helps :)
Instead of enumerating all the integers you want to filter out, you can use str.isdigit to test each string and see if it contains only digits:
mylist = [['pears','apples','40'],['grapes','trees','90','bears']]
mylist2 = [[x for x in aList if not x.isdigit()] for aList in mylist]
print(mylist2)
[['pears', 'apples'], ['grapes', 'trees', 'bears']]
If you have the following list:
mylist = [['pears','apples','40'],['grapes','trees','90','bears']]
numbs = [str(i) for i in range(1, 101)]
Using a list comprehension to remove the elements that appear in numbs:
[[l for l in ls if l not in numbs] for ls in mylist]
This is a more general way to remove digit elements in a list
[[l for l in ls if not l.isdigit()] for ls in mylist]
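For the sample mylist, either comprehension should give the same nested result:
print([[l for l in ls if not l.isdigit()] for ls in mylist])
# [['pears', 'apples'], ['grapes', 'trees', 'bears']]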

List Comprehensions and Conditions?

I am trying to see if I can make this code better using list comprehensions.
Let's say that I have the following lists:
import re

a_list = [
    'HELLO',
    'FOO',
    'FO1BAR',
    'ROOBAR',
    'SHOEBAR'
]
regex_list = [lambda x: re.search(r'FOO', x, re.IGNORECASE),
              lambda x: re.search(r'RO', x, re.IGNORECASE)]
I basically want to add all the elements that do not match any of the regexes in regex_list to another list.
E.g. ==>
newlist = []
for each in a_list:
    for regex in regex_list:
        if regex(each) is None:
            newlist.append(each)
How can I do this using list comprehensions? Is it even possible?
Sure, I think this should do it
newlist = [s for s in a_list if not any(r(s) for r in regex_list)]
EDIT: on closer inspection, I notice that your example code actually adds to the new list each string in a_list that doesn't match all the regexes - and what's more, it adds each string once for each regex that it doesn't match. My list comprehension does what I think you meant, which is add only one copy of each string that doesn't match any of the regexes.
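To make the difference concrete, assuming the a_list and regex_list definitions from the question: the nested loop builds ['HELLO', 'HELLO', 'FOO', 'FO1BAR', 'FO1BAR', 'ROOBAR', 'SHOEBAR', 'SHOEBAR'], because 'FOO' and 'ROOBAR' each fail only one of the two regexes while the other strings fail both, whereas the comprehension keeps a single copy of each string that matches no regex:
print([s for s in a_list if not any(r(s) for r in regex_list)])
# ['HELLO', 'FO1BAR', 'SHOEBAR']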
I'd work your code down to this:
a_list = [
    'HELLO',
    'FOO',
    'FO1BAR',
    'ROOBAR',
    'SHOEBAR'
]
regex_func = lambda x: not re.search(r'(FOO|RO)', x, re.IGNORECASE)
Then you have two options:
Filter
newlist = list(filter(regex_func, a_list))
List comprehensions
newlist = [x for x in a_list if regex_func(x)]
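With the a_list from the question, either option should give the same filtered list:
print(newlist)
# ['HELLO', 'FO1BAR', 'SHOEBAR']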
