def cut(path):
    test = str(foundfiles)
    newList = [s for s in test if test.endswith('.UnitTests.vbproj')]
    for m in newList:
        print m
    return newList
This function parses through foundfiles, which is a list of 20+ files in a folder that I have already collected. I need to filter that list down to every file that ends in ".UnitTests.vbproj". However, I can't get it working. Any advice would be greatly appreciated!
Edit 1: This is what my code looks like now, and I get an AttributeError message box saying that 'tuple' object has no attribute 'endswith':
def cut(path):
    test = foundfiles
    newList = [s for s in foundfiles if s.endswith('.UnitTests.vbproj')]
    for m in newList:
        print m
    return newList
You turned the list into a string. Looping over test gives you individual characters instead:
>>> foundfiles = ['foo', 'bar']
>>> for c in str(foundfiles):
... print c
...
[
'
f
o
o
'
,
'
b
a
r
'
]
There is no need to turn foundfiles into a string. You also need to test the elements of the list, not test:
newList = [s for s in foundfiles if s.endswith('.UnitTests.vbproj')]
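For example, with a small stand-in list (file names made up for illustration):
>>> foundfiles = ['A.UnitTests.vbproj', 'B.vbproj', 'C.UnitTests.vbproj']
>>> [s for s in foundfiles if s.endswith('.UnitTests.vbproj')]
['A.UnitTests.vbproj', 'C.UnitTests.vbproj']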
I really don't know what the type of your 'foundfiles' is.
Maybe this way will help you:
def cut(path):
    import os
    newlist = []
    for parent, dirnames, filenames in os.walk(path):
        for FileName in filenames:
            fileName = os.path.join(parent, FileName)
            if fileName.endswith('.UnitTests.vbproj'):
                newlist.append(fileName)
    return newlist
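A quick usage sketch, with a placeholder folder path (point it at wherever your projects live):
files = cut('C:/path/to/solution')  # placeholder path
for m in files:
    print m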
I have a list l:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
In this list, I need to remove duplicates without considering the extension. The expected output is below.
l = ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
I tried:
l = list(set(x.split('.')[0] for x in l))
But I'm only getting the unique filenames without their extensions.
How could I achieve it?
You can use a dictionary comprehension that uses the name part as key and the full file name as the value, exploiting the fact that dict keys must be unique:
>>> list({x.split(".")[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the file names can be in more sophisticated formats (such as with directory names, or in the foo.bar.xls format) you should use os.path.splitext:
>>> import os
>>> list({os.path.splitext(x)[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
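For reference, os.path.splitext only strips the final extension, so a dotted base name stays intact:
>>> os.path.splitext('foo.bar.xlsx')
('foo.bar', '.xlsx')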
If the order of the end result doesn't matter, we could split each item on the period. We'll regard the first item in the list as the key and then keep the item if the key is unique.
oldList = l
setKeys = set()
l = []
for item in oldList:
    itemKey = item.split(".")[0]
    if itemKey in setKeys:
        pass
    else:
        setKeys.add(itemKey)
        l.append(item)
Try this
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
for x in l:
    name = x.split('.')[0]
    find = 0
    for index, d in enumerate(l, start=0):
        txt = d.split('.')[0]
        if name == txt:
            find += 1
            if find > 1:
                l.pop(index)
print(l)
@Selcuk Definitely the best solution; unfortunately I don't have enough reputation to upvote your answer.
But I would rather use x[:x.rfind('.')] as the dictionary key than os.path.splitext(x)[0], in order to handle the case where we have more sophisticated formats in the name. That will give something like this:
list({x[:x.rfind('.')]: x for x in l}.values())
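For example, on a name with extra dots (illustrative value only), the rfind-based key keeps everything up to the last dot:
>>> x = 'foo.bar.xlsx'
>>> x[:x.rfind('.')]
'foo.bar'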
I've got a list of strings. Those strings all contain the same two markers. I would love to extract the string between those two markers for each string in that list.
example:
markers 'XXX' and 'YYY' --> therefore i want to extract 78665786 and 6866
['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
You can just loop over your list and grab the substring. You can do something like:
import re
my_list = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
output = []
for item in my_list:
    output.append(re.search('XXX(.*)YYY', item).group(1))
print(output)
Output:
['78665786', '6866']
import re
l = ['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
l = [re.search(r'XXX(.*)YYY', i).group(1) for i in l]
This should work
Another solution would be:
import re
test_string=['XXX78665786YYYjajk','XXX78665783336YYYjajk']
int_val=[int(re.search(r'\d+', x).group()) for x in test_string]
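For the two example strings above, int_val comes out as:
[78665786, 78665783336]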
The split() method splits a string into different parts.
list1 = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
list2 = []
for i in list1:
    d = i.split("XXX")[1]    # part after the 'XXX' marker
    d = d.split("YYY")[0]    # part before the 'YYY' marker
    list2.append(d)
print(list2)
The extracted values are saved into a list.
I need to remove the end of the filename below:
Testfile_20190226114536.CSV.986466.1551204043175
So anything after CSV needs to be removed so i have a file named:
Testfile_20190226114536.CSV
Suppose file_name = "Testfile_20190226114536.CSV.986466.1551204043175"
file_name = file_name.split('.CSV')[0] + '.CSV'
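For example:
>>> file_name = 'Testfile_20190226114536.CSV.986466.1551204043175'
>>> file_name.split('.CSV')[0] + '.CSV'
'Testfile_20190226114536.CSV'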
As simple as this:
s = 'Testfile_20190226114536.CSV.986466.1551204043175'
suffix = '.CSV'
s[:s.rindex(suffix) + len(suffix)]
=> 'Testfile_20190226114536.CSV'
You can use re.sub:
import re
result = re.sub(r'(?<=\.CSV)[\w\W]+', '', 'Testfile_20190226114536.CSV.986466.1551204043175')
Output:
'Testfile_20190226114536.CSV'
The easy way is this
Do all of your files have this "CSV" in the middle?
You can try split and join your name like this:
name = "Testfile_20190226114536.CSV.986466.1551204043175"
print ".".join(name.split(".")[0:2])
Here are the steps to see what's going on:
>>> filename = 'Testfile_20190226114536.CSV.986466.1551204043175'
# split the string into a list at '.'
>>> l = filename.split('.')
>>> print(l)
['Testfile_20190226114536', 'CSV', '986466', '1551204043175']
# index the list to get all the elements before and including 'CSV'
>>> filtered_list = l[0:l.index('CSV')+1]
>>> print(filtered_list)
['Testfile_20190226114536', 'CSV']
# join together the elements of the list with '.'
>>> out_string = '.'.join(filtered_list)
>>> print(out_string)
Testfile_20190226114536.CSV
Here's a full function:
def filter_filename(filename):
    l = filename.split('.')
    filtered_list = l[0:l.index('CSV')+1]
    out_string = '.'.join(filtered_list)
    return out_string
>>> filter_filename('Testfile_20190226114536.CSV.986466.1551204043175')
'Testfile_20190226114536.CSV'
I have to read all the text files in a directory via a Python script, but first I have to remove the first 3 letters from every file name to build an index list.
The file names that contain the data in the directory are as follows:
zzz143
zzz146
zzz150
.
.
.
zzz250
I have to remove zzz from all the file names and build an index list of all those files in the directory so I can read data from them.
I know how to deal with files like:
zzz.160.dat
For these kinds of files I use the following code to remove the prefix and suffix:
def get_list(path, path_of_module_files):
    prefix, suffix = path_of_module_files.split("<index>")
    d = {}
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    for item in onlyfiles:
        if item.endswith(suffix) and item.startswith(prefix):
            text = item
            text = text[(find_str(text, prefix)+len(prefix)):]
            text = text[:find_str(text, suffix)]
            d[int(text)] = "/".join([path, item])
    index_list = collections.OrderedDict(sorted(d.items(), key=lambda t: t[0]))
    return index_list
This code deals with a suffix and a prefix, but in my case now there is only a prefix.
The names are not split by . or -; each one is just zzz143. I have to get the file names by removing zzz, and the list should be like this:
143
146
150
.
.
.
250
instead of
zzz143
zzz144
zzz145
.
.
.
.
zzz250
If someone could give me an idea or example of how to loop through all the files in that directory and extract the names, I would be really thankful.
To remove the first 3 characters of each item you can use list slicing like below:
my_list = ['zzz143', 'zzz146', 'zzz150']
new_list = [item[3:] for item in my_list]
Output:
>>> new_list
['143', '146', '150']
If you need to extract numbers from filenames for indexing, then no matter what the filename looks like, you can do it with:
>>> import re
>>> s = '250.zzz'
>>> s1 = 'zzz123'
>>> s2 = 'abc.444.zzz'
>>>
>>> re.search(r'\d+', s).group(0)
'250'
>>>
>>> re.search(r'\d+', s1).group(0)
'123'
>>>
>>> re.search(r'\d+', s2).group(0)
'444'
EDIT, this will work for all cases of filenames you mentioned:
def get_list(path, path_of_module_files):
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    d = {}
    for fil in onlyfiles:
        seq = re.search(r'\d+', fil)
        if seq:
            d[seq.group(0)] = os.path.abspath(fil)
    return d
EDIT2: You can also do it with map function:
>>> onlyfiles
['250.zzz', 'zzz123', 'abc.444.zzz']
>>>
>>> list(map(lambda s: re.search(r'\d+', s).group(0), onlyfiles))
['250', '123', '444']
But again, if all you have is filenames in the format 'zzz123.ext', then you don't need to overload your process with re.search; it is better to use a built-in method for faster processing, like so:
>>> onlyfiles = ['zzz123', 'zzz456', 'zzz789']
>>>
>>> list(map(lambda s: s[3:], onlyfiles))
['123', '456', '789']
>>>
>>> list(map(lambda s: s.strip('zzz'), onlyfiles))
['123', '456', '789']
This method will automatically loop through all elements of your list without the need to explicitly write a for loop.
EDIT3: using OrderedDict:
Either a simple for loop:
>>> from collections import OrderedDict
>>>
>>> index_dict = OrderedDict()
>>>
>>> for fil in onlyfiles:
...     k = int(fil.strip('zzz'))
...     index_dict[k] = fil
...
>>> index_dict
OrderedDict([(123, 'zzz123'), (456, 'zzz456'), (789, 'zzz789')])
Or with zip and map as one liner expression:
>>> OrderedDict(zip(map(lambda s: int(s.strip('zzz')), onlyfiles), onlyfiles))
OrderedDict([(123, 'zzz123'), (456, 'zzz456'), (789, 'zzz789')])
If you are sure that the prefix is 'zzz' you could just replace it by '', like so :
def get_list(path, path_of_module_files):
    filepath = os.path.join(path, path_of_module_files)
    d = {}
    if os.path.isfile(filepath):
        suffix = path_of_module_files.split(".")[0].replace('zzz', '')
        d[suffix] = os.path.abspath(filepath)
    index_list = collections.OrderedDict(sorted(d.items(), key=lambda t: t[0]))
    return index_list
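A quick illustration of the replace idea on a bare file name (example value only):
>>> 'zzz143'.replace('zzz', '')
'143'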
You can either use slice notation, if the three letters are different each time:
your_string = "ABC123"
your_string[3:]
>>> '123'
Or string.lstrip if the prefix is the same every time.
your_string = "zzz123"
your_string.lstrip("zzz")
>>>> '123'
I have tried several methods to translate this to Python, but none worked, especially because I keep getting this error:
'str' object does not support item assignment
R can do the same with the following code:
f <- 0
text <- c("foo", "btextr", "cool", "monsttex")
for (i in 1:length(text)){
  f[i] <- paste(text[i], text[i+1], sep = "_")
}
f
The output is:
"foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
I would appreciate so much if you can help me to do the same for Python. Thanks.
In R your output would have been (next time please put this in the question):
> f
[1] "foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
In Python strings are immutable. So you'll need to create new strings, e.g.:
new_strings = []
text = ['foo', 'btextr', 'cool', 'monsttex']
for i, t in enumerate(text):
    try:
        new_strings.append(text[i] + '_' + text[i+1])
    except IndexError:
        new_strings.append(text[i] + '_NA')
Which results in:
>>> new_strings
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']
this works:
>>> from itertools import zip_longest
>>>
>>> f = ['foo', 'btextr', 'cool', 'monsttex']
>>>
>>> ['_'.join(i) for i in zip_longest(f, f[1:], fillvalue='NA')]
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']