Removing the end of a filename in Python

I need to remove the end of the filename below:
Testfile_20190226114536.CSV.986466.1551204043175
So anything after CSV needs to be removed, so I have a file named:
Testfile_20190226114536.CSV

Suppose file_name = "Testfile_20190226114536.CSV.986466.1551204043175"
file_name = file_name.split('.CSV')[0] + '.CSV'
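A closely related variant (a sketch, not from the answer above): `str.partition` splits at the first occurrence of the marker and hands the separator back too, so there is no need to re-append `'.CSV'` by hand, and a name without the marker passes through unchanged:

```python
file_name = "Testfile_20190226114536.CSV.986466.1551204043175"

# partition returns (head, separator, tail); head + sep keeps everything
# up to and including '.CSV'. If '.CSV' is absent, sep is '' and the
# name comes back unchanged (unlike split('.CSV')[0] + '.CSV').
head, sep, _tail = file_name.partition('.CSV')
trimmed = head + sep
print(trimmed)  # Testfile_20190226114536.CSV
```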

As simple as this:
s = 'Testfile_20190226114536.CSV.986466.1551204043175'
suffix = '.CSV'
s[:s.rindex(suffix) + len(suffix)]
=> 'Testfile_20190226114536.CSV'

You can use re.sub:
import re
result = re.sub(r'(?<=\.CSV)[\w\W]+', '', 'Testfile_20190226114536.CSV.986466.1551204043175')
Output:
'Testfile_20190226114536.CSV'

The easy way is this. Do all of your files have "CSV" in the middle?
You can try to split and join the name like this:
name = "Testfile_20190226114536.CSV.986466.1551204043175"
print(".".join(name.split(".")[0:2]))

Here's the steps to see what's going on
>>> filename = 'Testfile_20190226114536.CSV.986466.1551204043175'
# split the string into a list at '.'
>>> l = filename.split('.')
>>> print(l)
['Testfile_20190226114536', 'CSV', '986466', '1551204043175']
# index the list to get all the elements before and including 'CSV'
>>> filtered_list = l[0:l.index('CSV')+1]
>>> print(filtered_list)
['Testfile_20190226114536', 'CSV']
# join together the elements of the list with '.'
>>> out_string = '.'.join(filtered_list)
>>> print(out_string)
Testfile_20190226114536.CSV
Here's a full function:
def filter_filename(filename):
    l = filename.split('.')
    filtered_list = l[0:l.index('CSV')+1]
    out_string = '.'.join(filtered_list)
    return out_string
>>> filter_filename('Testfile_20190226114536.CSV.986466.1551204043175')
'Testfile_20190226114536.CSV'
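As a sketch of another angle (assuming Python 3's pathlib, which none of the answers above use): treat the trailing pieces as file suffixes and peel them off one at a time until `.CSV` is the last one:

```python
from pathlib import Path

p = Path('Testfile_20190226114536.CSV.986466.1551204043175')
# p.suffix is the last dot-separated part; with_suffix('') removes it.
# Keep stripping until '.CSV' (case-insensitive) is the final suffix.
while p.suffix and p.suffix.upper() != '.CSV':
    p = p.with_suffix('')
print(p.name)  # Testfile_20190226114536.CSV
```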

How to convert a list to a dict?

I am using subprocess to print the output of ls.
output = subprocess.getoutput("ssh -i key.pem ubuntu@10.127.6.83 ls -l --time-style=long-iso /opt/databases | awk -F' ' '{print $6 $8}'")
lines = output.splitlines()
print(lines)
format = '%Y-%m-%d'
for line in lines:
    if line != '':
        date = datetime.strptime(line, format)
And when I print lines am getting a large list in the following format:
['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
I am trying to convert the above output to a dict with the dates in '%Y-%m-%d' format. So output would be something like:
{ '2019-04-25' : 'friendship_graph_43458',
'2019-07-18': 'friendship_graph_45359',
'2019-09-03': 'friendship_graph_46553' }
and so on, but not quite sure how to do so.
Technically, if you don't want to use re: since all dates are formatted the same, they will all be 10 characters long, so just slice the strings to make the dict in a comprehension:
data = ['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
output = {s[:10]: s[10:] for s in data if len(s) > 10}
{'2019-04-25': 'friendship_graph_43458', '2019-07-18': 'friendship_graph_45359', '2019-09-03': 'friendship_graph_46553', '2019-10-02': 'friendship_graph_46878'}
You could use a regular expression for each item in the list. For example:
(\d{4}-\d{2}-\d{2})(.*)
Then, you can just iterate through each item in the list and use the regular expression to the get the string in its two parts.
>>> import re
>>> regex = re.compile(r"(\d{4}-\d{2}-\d{2})(.*)")
>>> items = ['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
>>> items_dict = {}
>>> for i in items:
...     match = regex.search(i)
...     if match is None:
...         continue
...     items_dict[match.group(1)] = match.group(2)
...
>>> items_dict
{'2019-04-25': 'friendship_graph_43458', '2019-07-18': 'friendship_graph_45359', '2019-09-03': 'friendship_graph_46553', '2019-10-02': 'friendship_graph_46878'}
For lines that start with the date, use slices to separate the key from the value.
>>> s = '2019-04-25friendship_graph_43458'
>>> d = {}
>>> d[s[:10]] = s[10:]
>>> d
{'2019-04-25': 'friendship_graph_43458'}
>>>
Use re.findall and dictionary comprehension:
import re
lst = ['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
dct = {k: v for s in lst for k, v in re.findall(r'(\d\d\d\d-\d\d-\d\d)(.*)', s) }
print(dct)
# {'2019-04-25': 'friendship_graph_43458', '2019-07-18': 'friendship_graph_45359', '2019-09-03': 'friendship_graph_46553', '2019-10-02': 'friendship_graph_46878'}
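Since the original question parsed the dates with `datetime.strptime`, the slicing approach can be combined with real date keys; a small sketch (using `date` objects as keys is my choice here, not something the answers above do):

```python
from datetime import datetime

lines = ['', '2019-04-25friendship_graph_43458',
         '2019-07-18friendship_graph_45359']
fmt = '%Y-%m-%d'

# parse the first 10 characters as a date; skip blank/short lines
result = {datetime.strptime(s[:10], fmt).date(): s[10:]
          for s in lines if len(s) > 10}
print(result)
```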

Get file names in a directory by ignoring the first 3 letters of each file name

I had to read all the text files in a directory via a Python script, but first I had to remove the first 3 letters from every file name to make an index list.
The file names which contain data in the directory are as follows.
zzz143
zzz146
zzz150
.
.
.
zzz250
I had to remove zzz from all the files and make an index list of all those files in the directory to read data from them.
I know how to deal with files, e.g.
zzz.160.dat
For these kinds of files I use the following code to remove the prefix and suffix.
def get_list(path, path_of_module_files):
    prefix, suffix = path_of_module_files.split("<index>")
    d = {}
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    for item in onlyfiles:
        if item.endswith(suffix) and item.startswith(prefix):
            text = item
            text = text[(find_str(text, prefix)+len(prefix)):]
            text = text[:find_str(text, suffix)]
            d[int(text)] = "/".join([path, item])
    index_list = collections.OrderedDict(sorted(d.items(), key=lambda t: t[0]))
    return index_list
This code deals with a suffix and prefix, but in my case now there is only a kind of prefix.
In my case it is not split by . or -; it is just zzz143. I have to get the file names by removing zzz, and the list should look like this:
143
146
150
.
.
.
250
instead of
zzz143
zzz144
zzz145
.
.
.
.
zzz250
If someone could give me an idea or example of how to loop through that directory and extract all the file names, I would be really thankful.
To remove the first 3 characters of each item you can use list slicing like below:
my_list = ['zzz143', 'zzz146', 'zzz150']
new_list = [item[3:] for item in my_list]
Output:
>>> new_list
['143', '146', '150']
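On Python 3.9+, `str.removeprefix` does the same thing but only strips the exact prefix, leaving names that don't start with it untouched (a sketch, not part of the answer above):

```python
names = ['zzz143', 'zzz146', 'zzz150', 'other']

# removeprefix strips the literal prefix 'zzz' if present;
# names without it pass through unchanged (Python 3.9+)
stripped = [n.removeprefix('zzz') for n in names]
print(stripped)  # ['143', '146', '150', 'other']
```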
If you need to extract the numbers from the filenames for indexing, then no matter what the filename looks like, you can do it with:
>>> import re
>>> s = '250.zzz'
>>> s1 = 'zzz123'
>>> s2 = 'abc.444.zzz'
>>>
>>> re.search(r'\d+', s).group(0)
'250'
>>>
>>> re.search(r'\d+', s1).group(0)
'123'
>>>
>>> re.search(r'\d+', s2).group(0)
'444'
EDIT, this will work for all cases of filenames you mentioned:
def get_list(path, path_of_module_files):
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    d = {}
    for fil in onlyfiles:
        seq = re.search(r'\d+', fil)
        if seq:
            d[seq.group(0)] = os.path.abspath(fil)
    return d
EDIT2: You can also do it with map function:
>>> onlyfiles
['250.zzz', 'zzz123', 'abc.444.zzz']
>>>
>>> list(map(lambda s: re.search(r'\d+', s).group(0), onlyfiles))
['250', '123', '444']
But again, if all you have is filenames in this format: 'zzz123.ext', then you don't need to overload your process with re.search; better to use a built-in method for faster processing, like so:
>>> onlyfiles = ['zzz123', 'zzz456', 'zzz789']
>>>
>>> list(map(lambda s: s[3:], onlyfiles))
['123', '456', '789']
>>>
>>> list(map(lambda s: s.strip('zzz'), onlyfiles))
['123', '456', '789']
This method automatically loops through all elements of your list without the need to explicitly write a for loop.
EDIT3: using OrderedDict:
Either Simple for loop:
>>> from collections import OrderedDict
>>>
>>> index_dict = OrderedDict()
>>>
>>> for fil in onlyfiles:
...     k = int(fil.strip('zzz'))
...     index_dict[k] = fil
...
>>> index_dict
OrderedDict([(123, 'zzz123'), (456, 'zzz456'), (789, 'zzz789')])
Or with zip and map as a one-liner expression:
>>> OrderedDict(zip(map(lambda s: int(s.strip('zzz')), onlyfiles), onlyfiles))
OrderedDict([(123, 'zzz123'), (456, 'zzz456'), (789, 'zzz789')])
If you are sure that the prefix is 'zzz' you could just replace it with '', like so:
def get_list(path, path_of_module_files):
    filepath = os.path.join(path, path_of_module_files)
    d = {}
    if os.path.isfile(filepath):
        suffix = path_of_module_files.split(".")[0].replace('zzz', '')
        d[suffix] = os.path.abspath(filepath)
    index_list = collections.OrderedDict(sorted(d.items(), key=lambda t: t[0]))
    return index_list
You can either use slice notation, if the three letters are different each time:
your_string = "ABC123"
your_string[3:]
>>> '123'
Or str.lstrip if the prefix is the same every time.
your_string = "zzz123"
your_string.lstrip("zzz")
>>> '123'
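One caveat worth noting about the `lstrip('zzz')` and `strip('zzz')` calls used on this page: the argument is a set of characters to remove, not a literal prefix, so any run of leading 'z' characters gets stripped:

```python
# lstrip('zzz') strips *any* leading run of 'z' characters, not the
# literal prefix 'zzz' -- fine for 'zzz123', surprising otherwise:
print('zzz123'.lstrip('zzz'))   # '123'
print('zzzz42'.lstrip('zzz'))   # '42'   (slicing [3:] would give 'z42')
```

If the filenames are guaranteed to start with exactly three letters, slicing with `[3:]` avoids the surprise.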

Text Merging - How to do this in Python? (R source)

I have tried several methods but none worked to translate it to Python, especially because I have this error:
'str' object does not support item assignment
R can do the same with the following code:
f<-0
text<- c("foo", "btextr", "cool", "monsttex")
for (i in 1:length(text)){
  f[i] <- paste(text[i], text[i+1], sep = "_")
}
f
The output is:
"foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
I would appreciate so much if you can help me to do the same for Python. Thanks.
In R your output would have been (next time please put this in the question):
> f
[1] "foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
In Python strings are immutable. So you'll need to create new strings, e.g.:
new_strings = []
text = ['foo', 'btextr', 'cool', 'monsttex']
for i, t in enumerate(text):
    try:
        new_strings.append(text[i] + '_' + text[i+1])
    except IndexError:
        new_strings.append(text[i] + '_NA')
Which results in:
>>> new_strings
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']
this works:
>>> from itertools import zip_longest
>>>
>>> f = ['foo', 'btextr', 'cool', 'monsttex']
>>>
>>> ['_'.join(i) for i in zip_longest(f, f[1:], fillvalue='NA')]
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']

Why won't this loop work

def cut(path):
    test = str(foundfiles)
    newList = [s for s in test if test.endswith('.UnitTests.vbproj')]
    for m in newList:
        print m
    return newList
This function parses through foundfiles, which is a list of about 20+ files in a folder that I have already parsed through. I need to go through that list and keep every file that ends in ".UnitTests.vbproj". However, I can't get it working. Any advice would be greatly appreciated!
Edit1: This is what my code is now, and I get an attribute error message box saying that 'tuple' object has no attribute 'endswith'
def cut(path):
    test = foundfiles
    newList = [s for s in foundfiles if s.endswith('.UnitTests.vbproj')]
    for m in newList:
        print m
    return newList
You turned the list into a string. Looping over test gives you individual characters instead:
>>> foundfiles = ['foo', 'bar']
>>> for c in str(foundfiles):
... print c
...
[
'
f
o
o
'
,
'
b
a
r
'
]
There is no need to turn foundfiles into a string. You also need to test the elements of the list, not test:
newList = [s for s in foundfiles if s.endswith('.UnitTests.vbproj')]
I really don't know the type of your 'foundfiles'.
Maybe this way will help you:
def cut(path):
    import os
    newlist = []
    for parent, dirnames, filenames in os.walk(path):
        for FileName in filenames:
            fileName = os.path.join(parent, FileName)
            if fileName.endswith('.UnitTests.vbproj'):
                newlist.append(fileName)
    return newlist
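If the goal is really just "all files under a directory ending in `.UnitTests.vbproj`", pathlib's recursive glob is a compact alternative sketch (assuming Python 3; the function name simply mirrors the question's `cut`):

```python
from pathlib import Path

def cut(path):
    # rglob walks the tree recursively and matches the suffix pattern;
    # each match is converted back to a plain string path
    return [str(p) for p in Path(path).rglob('*.UnitTests.vbproj')]
```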

String of values separated by commas or semicolons into a Python list

I'm reading a list of email addresses from a config file. The addresses can be delimited by comma or semicolon - e.g.,
billg@microsoft.com,steve@apple.com, dhh@37signals.com
billg@microsoft.com;steve@apple.com; dhh@37signals.com
I'd like to get rid of any whitespace around the email addresses too.
I need to get them into a Python list like this:
['billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com']
What's the most Pythonic way to do it? Thanks.
In this case I would use the re module:
>>> import re
>>>
>>> data = "billg@microsoft.com;steve@apple.com; dhh@37signals.com"
>>> stuff = re.split(r"\s*[,;]\s*", data.strip())
Regular expressions are powerful, and probably the way to go here; but for something as simple as this, string methods are OK too. Here's a terse solution:
[s.strip() for s in s1.replace(',', ';').split(';')]
Test output:
>>> s1 = "billg@microsoft.com,steve@apple.com, dhh@37signals.com"
>>> s2 = " billg@microsoft.com;steve@apple.com; dhh@37signals.com "
>>> print [s.strip() for s in s1.replace(',', ';').split(';')]
['billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com']
>>> print [s.strip() for s in s2.replace(',', ';').split(';')]
['billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com']
If it's only ';' or only ',' and you know which, use string.split:
>>> 'adjifjdasf;jdiafjodafs;jdiajof'.split(';')
['adjifjdasf', 'jdiafjodafs', 'jdiajof']
http://docs.python.org/library/stdtypes.html#str.split
EDIT For whitespace you can also do:
>>> map(str.strip, 'adjifjdasf;jdiafjodafs ; jdiajof'.split(';'))
['adjifjdasf', 'jdiafjodafs', 'jdiajof']
You can use string.maketrans to replace multiple separators with spaces in a single pass
import string
data = "one two, three ; four "
stuff = [i for i in data.translate(string.maketrans(";,", " ")).split()]
print stuff # -> ['one', 'two', 'three', 'four']
You could do it using just Python's string manipulation facilities:
import string
s1 = "billg@microsoft.com,steve@apple.com, dhh@37signals.com"
s2 = "billg@microsoft.com;steve@apple.com; dhh@37signals.com"
print s1.translate(string.maketrans(';',','), string.whitespace).split(',')
# ['billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com']
print s2.translate(string.maketrans(';',','), string.whitespace).split(',')
# ['billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com']
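A note for Python 3 readers: `string.maketrans` was removed there; the same idea uses `str.maketrans`, whose optional third argument lists characters to delete outright (a sketch, not from the original answer):

```python
s = "billg@microsoft.com;steve@apple.com; dhh@37signals.com"

# map ';' to ',', and delete whitespace characters entirely (3rd arg)
table = str.maketrans(';', ',', ' \t\r\n')
parts = s.translate(table).split(',')
print(parts)  # ['billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com']
```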
data = ''' billg@microsoft.com,steve@apple.com, dhh@37signals.com
billg@microsoft.com;steve@apple.com;\t \rdhh@37signals.com '''
print repr(data),'\n'
import re
print re.findall('[^,\s;]+', data)
result
' billg@microsoft.com,steve@apple.com, dhh@37signals.com \n billg@microsoft.com;steve@apple.com;\t \rdhh@37signals.com '
['billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com', 'billg@microsoft.com', 'steve@apple.com', 'dhh@37signals.com']
notice the '\n' , '\t' and '\r' in this data
def gen_list(file_path):
    # read the file's contents first; a file object has no split method
    text = open(file_path, "r").read()
    split1 = text.split(";")
    new_list = []
    for i in split1:
        split2 = i.split(",")
        split_list = [item.strip() for item in split2 if "@" in item]
        new_list.extend(split_list)
    return new_list
This works for both commas and semicolons. The number of lines can be reduced further.
