String matching in lists - Python

I have a list like the one below. From it, I need to filter the tables whose names begin with 'test:SF.AcuraUsage_' (string matching).
test:SF.AcuraUsage_20150311
test:SF.AcuraUsage_20150312
test:SF.AcuraUsage_20150313
test:SF.AcuraUsage_20150314
test:SF.AcuraUsage_20150315
test:SF.AcuraUsage_20150316
test:SF.AcuraUsage_20150317
test:SF.ClientUsage_20150318
test:SF.ClientUsage_20150319
test:SF.ClientUsage_20150320
test:SF.ClientUsage_20150321
I am using this for loop, but I am not sure why it does not work:
for x in list:
    if(x 'test:SF.AcuraUsage_'):
        print x
I tried this out:
for x in list:
alllist = x
vehiclelist = [x for x in alllist if x.startswith('geotab-bigdata-test:StoreForward.VehicleInfo')]
Still, I get the error 'dictionary object has no attribute startswith'.

You shouldn't name your list list, since that shadows the built-in type list.
That said, if you'd like to filter the list in Python, consider this list comprehension:
acura = [x for x in list if x.startswith('test:SF.AcuraUsage')]
Then, if you'd like to output it:
for x in acura:
    print(x)

List comprehensions are good for that.
Get a list with all the items that begin with 'test:SF.AcuraUsage_':
new_list = [x for x in list if x.startswith('test:SF.AcuraUsage_')]
Or the items that do not begin with 'test:SF.AcuraUsage_':
new_list = [x for x in list if not x.startswith('test:SF.AcuraUsage_')]
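A minimal runnable sketch of both filters, using a shortened copy of the table names from the question. The variable names (tables, prefix, acura_tables, other_tables) are assumptions, chosen so that the built-in list is not shadowed.

```python
# Sample data copied from the question (shortened).
tables = [
    "test:SF.AcuraUsage_20150311",
    "test:SF.AcuraUsage_20150312",
    "test:SF.ClientUsage_20150318",
    "test:SF.ClientUsage_20150319",
]

prefix = "test:SF.AcuraUsage_"

# Keep the names that start with the prefix.
acura_tables = [name for name in tables if name.startswith(prefix)]

# Keep the names that do not start with the prefix.
other_tables = [name for name in tables if not name.startswith(prefix)]

print(acura_tables)
print(other_tables)
```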

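Using the re module:
import re
for x in list:
    # re.match anchors at the start of the string; the dot is escaped so it matches a literal '.'
    ret = re.match(r'test:SF\.AcuraUsage_(.*)', x)
    if ret:
        print(ret.group())  # group() belongs to the match object, not to the re module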
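If only the part after the prefix is needed, a capture group can pull it out. This is a sketch under assumptions: the names tables, prefix, pattern and suffixes are illustrative, and the third table name is made up to show why escaping the dot matters.

```python
import re

tables = [
    "test:SF.AcuraUsage_20150311",
    "test:SF.ClientUsage_20150318",
    "test:SFxAcuraUsage_20150399",  # made-up name: would match if '.' were left unescaped
]

prefix = "test:SF.AcuraUsage_"
# re.escape turns the '.' in the prefix into a literal dot.
pattern = re.compile(re.escape(prefix) + r"(.*)")

suffixes = []
for name in tables:
    match = pattern.match(name)  # match() anchors at the start of the string
    if match:
        suffixes.append(match.group(1))  # the text after the prefix

print(suffixes)
```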

Related

Remove file name duplicates in a list

I have a list l:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
In this list, I need to remove duplicates without considering the extension. The expected output is below.
l = ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
I tried:
l = list(set(x.split('.')[0] for x in l))
But this gives me only the unique file names, without their extensions.
How can I achieve this?
You can use a dictionary comprehension that uses the name part as key and the full file name as the value, exploiting the fact that dict keys must be unique:
>>> list({x.split(".")[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the file names can be in more sophisticated formats (such as with directory names, or in the foo.bar.xls format) you should use os.path.splitext:
>>> import os
>>> list({os.path.splitext(x)[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
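To see why os.path.splitext is the safer key for such names, here is a small sketch comparing the two approaches. The sample names are made up for illustration (foo.bar.xls is the format mentioned above).

```python
import os

names = ["foo.bar.xls", "data/v1.2/report.csv"]

# split(".")[0] cuts at the first period, wherever it occurs.
keys_naive = [n.split(".")[0] for n in names]

# os.path.splitext removes only the final extension.
keys_splitext = [os.path.splitext(n)[0] for n in names]

print(keys_naive)
print(keys_splitext)
```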
If the order of the end result doesn't matter, we could split each item on the period. We'll treat the first part of the split result as the key and keep an item only if its key has not been seen yet. Note that this keeps the first file seen for each name (Abc.xlsx here).
oldList = l
setKeys = set()
l = []
for item in oldList:
    itemKey = item.split(".")[0]
    if itemKey not in setKeys:
        setKeys.add(itemKey)
        l.append(item)
Try this:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
for x in l:
    name = x.split('.')[0]
    find = 0
    for index,d in enumerate(l, start=0):
        txt = d.split('.')[0]
        if name == txt:
            find += 1
            if find > 1:
                l.pop(index)
print(l)
@Selcuk Definitely the best solution; unfortunately, I don't have enough reputation to upvote your answer.
However, I would rather use x[:x.rfind('.')] as my dictionary key than os.path.splitext(x)[0], in order to handle more sophisticated formats in the name. That gives something like this:
list({x[:x.rfind('.')]: x for x in l}.values())
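One caveat worth checking before relying on rfind: when a name contains no period, rfind returns -1 and the slice drops the last character. The sketch below compares both keys; the sample names are made up for illustration.

```python
import os

names = ["Abc.tar.gz", "README"]  # "README" has no period

# rfind returns -1 for "README", so the slice becomes n[:-1].
rfind_keys = [n[:n.rfind(".")] for n in names]

# splitext leaves a name without an extension untouched.
splitext_keys = [os.path.splitext(n)[0] for n in names]

print(rfind_keys)
print(splitext_keys)
```

For names that do contain a period, both expressions cut at the last one, so they produce the same key.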

Python - filter list from another other list with condition

list1 = ['/mnt/1m/a_pre.geojson','/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
I have multiple lists, and I want to find all the elements of list1 that have no corresponding entry in list2, subject to a filtering condition.
The condition is that the 'm' directory (1m, 2m, ...) must match, and so must the name of the geojson file, ignoring the 'pre' or 'post' substring.
For example, '/mnt/1m/a_pre.geojson' in list1 has been processed but '/mnt/2m/b_pre.geojson' has not, so the output should be the list ['/mnt/2m/b_pre.geojson'].
I am using two for loops and then splitting the strings, but I am sure this is not the only approach and there may be an easier way to do it.
for i in list1:
    for j in list2:
        pre_tile = i.split("/")[-1].split('_pre', 1)[0]
        post_tile = j.split("/")[-1].split('_post', 1)[0]
        if pre_tile == post_tile:
            ...
I believe the first parts of your file paths are similar. If so, you can try this:
list1 = ['/mnt/1m/a_pre.geojson','/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
res = [x for x in list1 if x[:7] not in [y[:7] for y in list2]]
res:
['/mnt/2m/b_pre.geojson']
If I understand you correctly, using a regular expression to do this kind of string manipulation can be fast and easy.
Additionally, to do multiple member-tests in list2, it's more efficient to convert the list to a set.
import re
list1 = ['/mnt/1m/a_pre.geojson', '/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']
pattern = re.compile(r'(.*?/[0-9]m/.*?)_pre\.geojson')
set2 = set(list2)
result = [
    m.string
    for m in map(pattern.fullmatch, list1)
    if m and f"{m[1]}_post.geojson" not in set2
]
print(result)
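The same set-based lookup can also be written without a regular expression, by reducing every path to a comparable key. This is only a sketch: tile_key is a hypothetical helper, and it assumes every file name ends in _pre or _post before the extension.

```python
from pathlib import PurePosixPath

list1 = ['/mnt/1m/a_pre.geojson', '/mnt/2m/b_pre.geojson']
list2 = ['/mnt/1m/a_post.geojson']

def tile_key(path):
    """Return (resolution directory, tile name without the _pre/_post suffix)."""
    p = PurePosixPath(path)
    tile = p.stem.rsplit('_', 1)[0]   # 'a_pre' -> 'a'
    return (p.parent.name, tile)      # e.g. ('1m', 'a')

# Build the set of processed keys once, then test membership per element.
processed = {tile_key(p) for p in list2}
unprocessed = [p for p in list1 if tile_key(p) not in processed]
print(unprocessed)
```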

how to filter a value from list in python?

I have a list of values and need to filter out the values that don't follow a naming convention.
For example, given the list: list = ['a1-23','b1-24','c1-25','c1-x-25']
I need to filter out all values that start with 'c1-', except those that start with 'c1-x-'.
Expected output: ['a1-23','b1-24','c1-x-25']
list = ['a1-23','b1-24','c1-25','c1-x-25']
[x for x in list if not x.startswith('c1-')]
['a1-23', 'b1-24']
You have the right idea, but you're missing the handling of values that start with c1-x-:
[x for x in list if not x.startswith('c1-') or x.startswith('c1-x-')]
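A runnable check of that condition on the list from the question. The names values and kept are assumptions, used to avoid shadowing the built-in list. Since not binds more tightly than or, the condition reads as (not starts with 'c1-') or (starts with 'c1-x-').

```python
values = ['a1-23', 'b1-24', 'c1-25', 'c1-x-25']

# Keep a value if it does not start with 'c1-', or if it starts with 'c1-x-'.
kept = [v for v in values if not v.startswith('c1-') or v.startswith('c1-x-')]
print(kept)
```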
import re
list1 = ['a1-23','b1-24','c1-25','c1-x-25',"c1-22"]
r = re.compile(r"\bc1-\b\d{2}$") # this regex matches anything with `c1-{2 digits}` exactly
[x for x in list1 if x not in list(filter(r.match,list1))]
# output
['a1-23', 'b1-24', 'c1-x-25']
My pattern matches EXACTLY a string that starts with c1- and ends with exactly two digits.
Therefore, list(filter(r.match,list1)) gives us all the c1-## values, and the list comprehension then keeps only the items of list1 that are not in that list of matches. The same idea with numbers:
[x for x in [1,2,3] if x not in [1,2]]
#output
[3]
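A different technique, swapped in here for comparison, is a negative lookahead that tests each string directly instead of building a list of matches first. This is a sketch, and it is not equivalent to the pattern above: it rejects every value that starts with c1- unless x- follows, not only those with exactly two digits. The names unwanted and result are assumptions.

```python
import re

list1 = ['a1-23', 'b1-24', 'c1-25', 'c1-x-25', 'c1-22']

# Matches strings that start with 'c1-' unless the next characters are 'x-'.
unwanted = re.compile(r'c1-(?!x-)')

# re.match anchors at the start of each string.
result = [x for x in list1 if not unwanted.match(x)]
print(result)
```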

combine two strings in python

I have a list1 like this:
list1 = [('my', '1.2.3', 2),('name', '9.8.7', 3)]
I want to get a new list2 like this (joining the first element with the second entry of the second element):
list2 = [('my2', 2),('name8', 3)]
As a first step, I am trying to join the first two elements of each tuple as follows:
for i,j,k in list1:
    #print(i,j,k)
    x = j.split('.')[1]
    y = str(i).join(x)
    print(y)
But I get this:
2
8
I was expecting this:
my2
name8
What am I doing wrong? Is there a good, simple way to do this?
Try:
y = str(i) + str(x)
It should work.
str(i).join(x) treats x as an iterable of strings (a string is an iterable of its characters) and builds a new string by inserting i between the elements of x. Since x is a single character here, nothing is inserted.
You probably want print('{}{}'.format(i, x)) instead:
for i,j,k in list1:
    x = j.split('.')[1]
    print('{}{}'.format(i, x))
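The behaviour of str.join described above is easy to check in isolation. A short sketch (the variable names are illustrative):

```python
sep = 'my'

# One element: there is nothing to put the separator between.
joined_single = sep.join('2')

# Two characters: the separator goes between '2' and '3'.
joined_pair = sep.join('23')

# The same rule applies to a list of strings.
joined_list = sep.join(['a', 'b', 'c'])

print(joined_single, joined_pair, joined_list)
```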
Try this:
for x in list1:
    print(x[0] + x[1][2])
or
for x in list1:
    print(x[0] + x[1].split('.')[1])
Output:
# my2
# name8
You should be able to achieve this with f-strings and a list comprehension, though it'll be pretty rigid.
list_1 = [('my', '1.2.3', 2),('name', '9.8.7', 3)]
# for each item in list_1,
# create a tuple of (item[0] joined with item[1].split('.')[1], item[2])
# and collect the tuples in a new list; item[2] is kept as an int, as in the expected list2
list_2 = [(f"{item[0]}{item[1].split('.')[1]}", item[2]) for item in list_1]
print(list_2)
List comprehensions (and dict comprehensions) are some of my favorite things about Python 3:
https://www.pythonforbeginners.com/basics/list-comprehensions-in-python
https://www.digitalocean.com/community/tutorials/understanding-list-comprehensions-in-python-3
Going with the author's theme,
list1 = [('my', '1.2.3', 2),('name', '9.8.7', 3)]
for i,j,k in list1:
    extracted = j.split(".")
    y = i+extracted[1] # specified the index here instead
    print(y)
my2
name8

python evaluate bool within a list comprehension

Can I evaluate a bool within a list comprehension?
I would like to create a list that does not contain items that end with '.zip':
outlist = [x for x in os.listdir(path) if x *DOES NOT* end with '.zip']
I've used list comprehension for the exact opposite:
outlist2 = [x for x in os.listdir(path) if x.endswith('.zip')]
Here is the output of my list:
os.listdir(path):
[
'sample1.zip', 'sample2.zip', 'sample3.zip', 'sample4.zip',
'sample1.txt', 'sample2.pdf', 'sample3.csv', 'sample4.xlsx'
]
Just add not:
outlist2 = [x for x in os.listdir(path) if not x.endswith('.zip')]
The if expression can be any valid Python expression, including one that uses boolean operators.
Demo:
>>> sample = ['sample1.zip', 'sample2.zip', 'sample3.zip', 'sample4.zip',
... 'sample1.txt', 'sample2.pdf', 'sample3.csv', 'sample4.xlsx']
>>> [x for x in sample if not x.endswith('.zip')]
['sample1.txt', 'sample2.pdf', 'sample3.csv', 'sample4.xlsx']
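As a sketch of what "any valid Python expression" allows, the condition can also combine several tests. The sample list is the one from the demo; the result names are assumptions.

```python
sample = ['sample1.zip', 'sample2.zip', 'sample3.zip', 'sample4.zip',
          'sample1.txt', 'sample2.pdf', 'sample3.csv', 'sample4.xlsx']

# str.endswith accepts a tuple of suffixes, so one call can exclude several.
no_zip_or_pdf = [x for x in sample if not x.endswith(('.zip', '.pdf'))]

# Tests can be combined with and / or.
sample1_non_zip = [x for x in sample
                   if x.startswith('sample1') and not x.endswith('.zip')]

print(no_zip_or_pdf)
print(sample1_non_zip)
```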
Sure you can - just slap a not operator in there:
outlist2 = [x for x in os.listdir(path) if not x.endswith('.zip')]
