How to get part of filename into a variable? - python

I have a lot of .csv files and I'd like to parse the file names.
The file names are in this format:
name.surname.csv
How can I write a function that populates two variables with the components of the file name?
A = name
B = surname

Use str.split and unpack the result into A, B and a throwaway variable (conventionally named _) that holds the extension so it can be ignored.
filename = 'name.surname.csv'
A, B, _ = filename.split('.')

Try this: the name is split on '.' and the first two parts are stored in A and B.
a="name.surname.csv"
A,B,C=a.split('.')
Of course, this assumes that your file name is in the form first.second.csv
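The assumption about the file name format can be checked explicitly instead of letting the unpacking fail with a less helpful error. The following is only a minimal sketch: the function name split_name and the exact validation rule (three dot-separated parts, the last one being "csv") are illustrative choices, not something taken from the answers above.

```python
def split_name(filename):
    """Return (name, surname) from a 'name.surname.csv' style file name."""
    parts = filename.split(".")
    # Hypothetical validation rule: exactly three parts, the last one "csv".
    if len(parts) != 3 or parts[2].lower() != "csv":
        raise ValueError(f"unexpected file name format: {filename!r}")
    return parts[0], parts[1]


A, B = split_name("name.surname.csv")
print(A, B)  # name surname
```

A name such as "name.csv" or "a.b.c.csv" raises ValueError here rather than being silently misparsed.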

If the file names always have the exact same form, with exactly two periods, then you can do:
>>> name, surname, ext = "john.doe.csv".split(".")
>>> name
'john'
>>> surname
'doe'
>>> ext
'csv'
>>>

Simply use the str.split() method inside a function:
def split_names(filename: str):
    # split on every dot and keep the first two parts
    parts = filename.split(".")
    return parts[0], parts[1]
A, B = split_names("name.surname.csv")

First find all the files in your directory with the extension '.csv', then split each name on '.':
import os
for file in os.listdir("/mydir"):
    if file.endswith(".csv"):
        # print the full path of the file
        print(os.path.join("/mydir", file))
        # split the file name by '.'
        name, surname, ext = file.split(".")
        # print or append or whatever you will do with the result here

If the file is saved at a specific location on the system, you first have to extract just the file name:
# if filename is already just "name.surname.csv", skip the first two lines
filename = "C://CSVFolder//name.surname.csv"
absfilename = filename.split('//')[-1]
# unpack the three parts into separate variables
A, B, ext = absfilename.split('.')
Otherwise you can split the name directly:
A, B, ext = "name.surname.csv".split('.')
print(A, B, ext)
Happy coding :)

Related

Python grab substring between two specific characters

I have a folder with hundreds of files named like:
"2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
Convention:
year_month_ID_zone_date_0_L2A_B01.tif ("_0_L2A_B01.tif", and "zone" never change)
What I need is to iterate through every file and build a path based on their name in order to download them.
For example:
name = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
path = "2017/5/S2B_7VEG_20170528_0_L2A/B01.tif"
The path convention needs to be: path = year/month/ID_zone_date_0_L2A/B01.tif
I thought of making a loop which would "cut" my string into several parts every time it encounters a "_" character, then stitch the different parts in the right order to create my path name.
I tried this but it didn't work:
import re
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
try:
    found = re.search('_(.+?)_', filename).group(1)
except AttributeError:
    # _ not found in the original string
    found = '' # apply your error handling
How could I achieve that in Python?
Since you only have one separator character, you may as well simply use Python's built-in str.split method:
import os
items = filename.split('_')
year, month = items[:2]
new_filename = '_'.join(items[2:])
path = os.path.join(year, month, new_filename)
Try the following code snippet:
import re
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
found = re.sub(r'(\d+)_(\d+)_(.*)_(.*)\.tif', r'\1/\2/\3/\4.tif', filename)
print(found) # prints 2017/05/S2B_7VEG_20170528_0_L2A/B01.tif
No need for a regex -- you can just use split().
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
parts = filename.split("_")
year = parts[0]
month = parts[1]
Maybe you can do it like this:
from os import listdir, makedirs
from os.path import isfile, join
my_path = 'your_source_dir'
files_name = [f for f in listdir(my_path) if isfile(join(my_path, f))]
def create_dir(files_name):
    for file in files_name:
        # the first two underscore-separated fields are the year and the month
        year, month = file.split('_')[:2]
        # create year/month under my_path if it does not exist yet
        makedirs(join(my_path, year, month), exist_ok=True)
        ### your download code
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
temp = filename.split('_')
result = "/".join(temp)
print(result)
result is
2017/05/S2B/7VEG/20170528/0/L2A/B01.tif
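For the exact layout asked for in the question (year/month/ID_zone_date_0_L2A/B01.tif, with the month written as 5 rather than 05), one possible sketch is to split off the first two fields from the left and the band file from the right. The function name build_path is hypothetical, and the sketch assumes every file name follows the stated convention.

```python
def build_path(filename):
    """Turn 'YYYY_MM_<middle>_<band>.tif' into 'YYYY/M/<middle>/<band>.tif'."""
    # Split off year and month from the left; keep the remainder intact.
    year, month, rest = filename.split("_", 2)
    # Split off the final field (e.g. 'B01.tif') from the right.
    folder, band = rest.rsplit("_", 1)
    # int() drops the leading zero of the month, as in the question's example.
    return "/".join([year, str(int(month)), folder, band])


name = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
print(build_path(name))  # 2017/5/S2B_7VEG_20170528_0_L2A/B01.tif
```

The parts are joined with "/" because the result is meant as a download path; os.path.join could be used instead if a local filesystem path is wanted.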

iterating through specific files in folder with name matching pattern in python

I have a folder with a lot of csv files with different names.
I want to work only with the files whose names consist of digits only,
though I have no information about the range of the numbers in the file names.
for example, I have
['123.csv', 'not.csv', '75839.csv', '2.csv', 'bad.csv', '23bad8.csv']
and I would like to only work with ['123.csv', '75839.csv', '2.csv']
I tried the following code:
for f in file_list:
    if f.startswith('1' or '2' or '3' ..... or '9'):
        # do something
but this does not solve the problem if the file name starts with a number but still includes letters or other symbols later.
You can use a regex to do the following:
import re
lst_of_files = ['temo1.csv', '12321.csv', '123123.csv', 'fdao123.csv', '12312asdv.csv', '123otk123.csv', '123.txt']
# escape the dot and anchor both ends so only all-digit names ending in .csv match
pattern = re.compile(r'^[0-9]+\.csv$')
newlst = [filename for filename in lst_of_files if pattern.match(filename)]
print(newlst)
You can do it this way:
file_list = ["123.csv", "not.csv", "75839.csv", "2.csv", "bad.csv", "23bad8.csv"]
for f in file_list:
    name, ext = f.rsplit(".", 1) # split at the rightmost dot
    if name.isnumeric():
        print(f)
Output is
123.csv
75839.csv
2.csv
One of the approaches:
import re
lst_of_files = ['temo1.csv', '12321.csv', '123123.csv', 'fdao123.csv', '12312asdv.csv', '123otk123.csv', '123.txt', '876.csv']
for f in lst_of_files:
    # escape the dot and anchor the end so names like '123.csvx' are rejected
    if re.search(r'^[0-9]+\.csv$', f):
        print(f)
Output:
12321.csv
123123.csv
876.csv
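The filtering can also be wrapped in a small helper built on re.fullmatch, which only succeeds when the whole file name matches the pattern. This is a sketch, and the names NUMERIC_CSV and numeric_csv_files are made up for illustration. Note that [0-9] accepts ASCII digits only, whereas str.isnumeric() also accepts other numeric characters such as '½'.

```python
import re

# One or more ASCII digits followed by a literal ".csv", and nothing else.
NUMERIC_CSV = re.compile(r"[0-9]+\.csv")


def numeric_csv_files(file_list):
    """Keep only the names that consist of digits plus the .csv extension."""
    return [f for f in file_list if NUMERIC_CSV.fullmatch(f)]


file_list = ["123.csv", "not.csv", "75839.csv", "2.csv", "bad.csv", "23bad8.csv"]
print(numeric_csv_files(file_list))  # ['123.csv', '75839.csv', '2.csv']
```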

How to create a list of images and their names in a folder

I am using the following code to create a list of image files with tif extension. The current result of the code is a list with the address of the .tif files.
raster_list=[]
def getFiles(path):
    for file in os.listdir(path):
        if file.endswith(".tif"):
            raster_list.append(os.path.join(path, file))
getFiles(fpath)
print(raster_list)
len(raster_list)
result:
['C:/dataset/test\\ras1.tif', 'C:dataset/test\\ras2.tif', 'C:/dataset/test\\ras3.tif', 'C:/dataset/test\\ras4.tif', 'C:/dataset/test\\ras5.tif']
How can I revise the code to create two lists a) name of the file with .tif b) name of the file. here is the example:
raster_list = ['ras1.tif','ras2.tif','ras3.tif','ras4.tif', 'ras5.tif' ]
raster_name = ['ras1','ras2','ras3','ras4', 'ras5']
edit to the code from solutions:
import os
raster_list = []
def getFiles(path):
    for file in os.listdir(path):
        if file.endswith(".tif"):
            raster_list.append(file)  # no os.path.join, just keep the file name
getFiles(fpath)
# strip the extension from each file name
raster_names = [os.path.splitext(x)[0] for x in raster_list]
print(raster_list)
print(raster_names)
result:
['ras1.tif', 'ras2.tif', 'ras3.tif', 'ras4.tif', 'ras5.tif']
['ras1', 'ras2', 'ras3', 'ras4', 'ras5']
Personally, I would use pathlib because it's OS independent and reliable:
>>> from pathlib import Path
>>> files = ['/tmp/ras1.tif', '/tmp/ras2.tif', '/tmp/ras3.tif']
>>> [Path(file).name for file in files]
['ras1.tif', 'ras2.tif', 'ras3.tif']
>>> [Path(file).stem for file in files]
['ras1', 'ras2', 'ras3']
For the first list, you can call split("\\") on each element of your list and take the last element of the result.
For the second list, split each entry of the first list on "." and keep the first element.
L = ['C:/dataset/test\\ras1.tif', 'C:dataset/test\\ras2.tif', 'C:/dataset/test\\ras3.tif', 'C:/dataset/test\\ras4.tif', 'C:/dataset/test\\ras5.tif']
raster_list = []
raster_name = []
for i in L:
    # the file name is the last backslash-separated component
    a = i.split("\\")[-1]
    raster_list.append(a)
    raster_name.append(a.split(".")[0])
print(raster_list, raster_name)
But you can also avoid including the path in raster_list in the first place, like this:
raster_list=[]
def getFiles(path):
    for file in os.listdir(path):
        if file.endswith(".tif"):
            raster_list.append(file) #remove the os.join, just take the file name
getFiles(fpath)
print(raster_list)
len(raster_list)
You can merge the two to have this:
raster_list=[]
raster_names=[]
def getFiles(path):
    for file in os.listdir(path):
        if file.endswith(".tif"):
            raster_list.append(file)
            raster_names.append(file.split(".")[0])
getFiles(fpath)
print(raster_list, raster_names)
split by '\', take the last element and append it to first list,
then split that by '.', take first element and append it to second list?
Alternatively, you can use these built-in functions: https://docs.python.org/3/library/os.path.html
You can use the os.path functions to achieve what you want.
import os
raster_list = []
raster_name = []
def getFiles(path):
    for file in os.listdir(path):
        if file.endswith(".tif"):
            raster_list.append(os.path.basename(file))
            raster_name.append(os.path.splitext(file)[0])
getFiles(fpath)
print(raster_list)
print(raster_name)
len(raster_list)
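If the full paths have already been collected and mix "/" and "\\" as in the question's output, one option is pathlib.PureWindowsPath, which treats both characters as separators on any operating system and never touches the filesystem. This is a sketch; the sample paths are adapted from the question and the variable name parsed is illustrative.

```python
from pathlib import PureWindowsPath

# Sample paths in the mixed-separator style shown in the question.
paths = [
    "C:/dataset/test\\ras1.tif",
    "C:/dataset/test\\ras2.tif",
    "C:/dataset/test\\ras3.tif",
]

parsed = [PureWindowsPath(p) for p in paths]
raster_list = [p.name for p in parsed]  # file name with extension
raster_name = [p.stem for p in parsed]  # file name without extension

print(raster_list)  # ['ras1.tif', 'ras2.tif', 'ras3.tif']
print(raster_name)  # ['ras1', 'ras2', 'ras3']
```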

Find specific substring while iterating through multiple file names

I need to find the identification number of a large number of files while iterating through them.
The file names are loaded onto a list and look like:
ID322198.nii
ID9828731.nii
ID23890.nii
FILEID988312.nii
So the best way to approach this would be to find the number that sits between ID and .nii
Because the number of digits varies, I can't simply select [-10:-4] of the file name. Any ideas?
You can use a regex:
import re
files = ['ID322198.nii','ID9828731.nii','ID23890.nii','FILEID988312.nii']
[re.findall(r'ID(\d+)\.nii', file)[0] for file in files]
Returns:
['322198', '9828731', '23890', '988312']
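To find the positions of ID and .nii, you can use Python's index() method:
for line in file:
    idpos = line.index("ID") + 2    # first character after "ID"
    nilpos = line.index(".nii")     # start of the extension
    data = line[idpos:nilpos]
or as a list of ints:
[int(line[line.index("ID") + 2:line.index(".nii")]) for line in file]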
Using rindex:
s = 'ID322198.nii'
s = s[s.rindex('D')+1 : s.rindex('.')]
print(s)
Returns:
322198
Then apply this syntax to a list of strings.
It seems like you could filter the digits out, like this:
digits = ''.join(d for d in filename if d.isdigit())
That will work nicely as long as there are no other digits in the filename (e.g backups with a .1 suffix or something).
for name in files:
    name = name.replace('.nii', '')
    id_num = name.replace(name.rstrip('0123456789'), '')
How this works:
# example
name = 'ID322198.nii'
# remove '.nii'. -> name1 = 'ID322198'
name1 = name.replace('.nii', '')
# strip all digits from the end. -> name2 = 'ID'
name2 = name1.rstrip('0123456789')
# remove 'ID' from 'ID322198'. -> id_num = '322198'
id_num = name1.replace(name2, '')
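When some names in the list might not follow the ID&lt;digits&gt;.nii pattern, it can help to return None instead of raising an error. The sketch below uses re.search for that; the helper name extract_id and the choice of None as the fallback value are assumptions made for illustration.

```python
import re

# Digits that sit between "ID" and a trailing ".nii".
ID_PATTERN = re.compile(r"ID(\d+)\.nii$")


def extract_id(filename):
    """Return the digits between 'ID' and '.nii', or None if absent."""
    match = ID_PATTERN.search(filename)
    return match.group(1) if match else None


files = ["ID322198.nii", "ID9828731.nii", "ID23890.nii", "FILEID988312.nii"]
print([extract_id(f) for f in files])
# ['322198', '9828731', '23890', '988312']
```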

Python simple string method

I basically want to call only the part of my string that falls before the "."
For example, if my filename is sciPHOTOf105w0.fits, I want to call "sciPHOTOf105w" as its own string so that I can use it to name a new file that's related to it. How do you do this? I can't just use numeral values "ex. if my file name is 'file', file[5:10]." I need to be able to collect everything up to the dot without having to count, because the file names can be of different lengths.
You can also use os.path like so:
>>> from os.path import splitext
>>> splitext('sciPHOTOf105w0.fits') # separates extension from file name
('sciPHOTOf105w0', '.fits')
>>> splitext('sciPHOTOf105w0.fits')[0]
'sciPHOTOf105w0'
If your file happens to have a longer path, this approach will also account for your full path.
import os.path
filename = "sciPHOTOf105w0.fits"
root, ext = os.path.splitext(filename)
print("root is: %s" % root)
print("ext is: %s" % ext)
result:
>root is: sciPHOTOf105w0
>ext is: .fits
In [33]: filename = "sciPHOTOf105w0.fits"
In [34]: filename.rpartition('.')[0]
Out[34]: 'sciPHOTOf105w0'
In [35]: filename.rsplit('.', 1)[0]
Out[35]: 'sciPHOTOf105w0'
You can use .index() on a string to find the first occurrence of a substring.
>>> filename = "sciPHOTOf105w0.fits"
>>> filename.index('.')
14
>>> filename[:filename.index('.')]
'sciPHOTOf105w0'
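Since the goal in the question is to name a new, related file, the stem can be combined with a new ending in one step using pathlib. This is only a sketch: the function name related_name and the "_processed.txt" ending are hypothetical. Note that Path.stem cuts at the last dot, while filename.index('.') cuts at the first, so the two differ for names containing several dots.

```python
from pathlib import Path


def related_name(filename, ending="_processed.txt"):
    """Replace the extension of filename with a new ending."""
    p = Path(filename)
    # p.stem is the file name without its final extension.
    return str(p.with_name(p.stem + ending))


print(related_name("sciPHOTOf105w0.fits"))  # sciPHOTOf105w0_processed.txt
print(Path("archive.v2.fits").stem)         # archive.v2
```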
