Extracting extension from filename in Python

Is there a function to extract the extension from a filename?

Use os.path.splitext:
>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'
Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:
>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')

pathlib (new in Python version 3.4) offers this directly:
import pathlib
print(pathlib.Path('yourPath.example').suffix) # '.example'
print(pathlib.Path("hello/foo.bar.tar.gz").suffixes) # ['.bar', '.tar', '.gz']
I'm surprised no one has mentioned pathlib yet, pathlib IS awesome!
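A few more pathlib attributes are useful here. The following is only a small sketch; the paths are made up for illustration, and PurePosixPath is used so that nothing touches the file system. It shows suffix, suffixes and stem side by side, including the hidden-file case:

```python
from pathlib import PurePosixPath

# Hypothetical paths, used only for illustration.
archive = PurePosixPath("backups/site.tar.gz")
hidden = PurePosixPath("/home/user/.bashrc")

print(archive.suffix)    # last extension only
print(archive.suffixes)  # every extension, in order
print(archive.stem)      # file name without the last extension
print(hidden.suffix)     # a leading dot alone does not count as an extension
```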

import os.path
extension = os.path.splitext(filename)[1]

import os.path
extension = os.path.splitext(filename)[1][1:]
To get only the text of the extension, without the dot.

For simple use cases one option may be splitting from dot:
>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'
No error when file doesn't have an extension:
>>> "filename".split(".")[-1]
'filename'
But you must be careful:
>>> "png".split(".")[-1]
'png' # But file doesn't have an extension
Also will not work with hidden files in Unix systems:
>>> ".bashrc".split(".")[-1]
'bashrc' # But this is not an extension
For general use, prefer os.path.splitext

It is worth adding lower() in there so you don't find yourself wondering why the JPGs aren't showing up in your list:
os.path.splitext(filename)[1][1:].strip().lower()

Any of the solutions above work, but on Linux I have found a newline at the end of the extension string, which prevents matches from succeeding. Add the strip() method to the end. For example:
import os.path
extension = os.path.splitext(filename)[1][1:].strip()
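The lower() and strip() suggestions can be combined into one helper. This is a minimal sketch, not a canonical recipe: normalized_extension is a made-up name, and the trailing newline in the example is assumed to come from reading file names out of a text file or command output.

```python
import os.path

def normalized_extension(filename):
    """Return the extension lowercased, without the dot or stray whitespace."""
    # strip() first, so a trailing newline does not end up inside the extension
    ext = os.path.splitext(filename.strip())[1]
    return ext[1:].lower()

print(normalized_extension("Holiday.JPG\n"))
```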

You can find some great stuff in the pathlib module (available since Python 3.4). For a Windows-style path, use PureWindowsPath so the backslashes are treated as separators:
import pathlib
x = pathlib.PureWindowsPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x)
# Output
.txt

With splitext there are problems with files with double extension (e.g. file.tar.gz, file.tar.bz2, etc..)
>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension
'.gz'
but should be: .tar.gz
The possible solutions are here
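One possible approach, sketched here under the assumption that you know in advance which double extensions matter, is to check a short list of compound extensions first and fall back to os.path.splitext otherwise. The names COMPOUND_EXTENSIONS and split_compound_ext are hypothetical:

```python
import os.path

# Assumed list of compound extensions to recognise; extend as needed.
COMPOUND_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz")

def split_compound_ext(path):
    """Like os.path.splitext, but keeps known double extensions together."""
    lowered = path.lower()
    for ext in COMPOUND_EXTENSIONS:
        if lowered.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return os.path.splitext(path)

print(split_compound_ext('/path/to/somefile.tar.gz'))
```

Unlike joining every suffix, this does not mistake a name such as report.v2.txt for a file with extension .v2.txt.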

Although this is an old topic, I wonder why no one has mentioned a very simple Python API called rpartition:
To get the extension from a file's absolute path, you can simply write:
filepath.rpartition('.')[-1]
Example:
path = '/home/jersey/remote/data/test.csv'
print(path.rpartition('.')[-1])
This prints: csv
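One caveat with rpartition: when the string contains no dot, the last element is the whole input, and a dot inside a directory name is matched as well. A hedged sketch of a guarded version follows; ext_via_rpartition is a made-up name, it assumes '/' as the path separator, and it does not special-case hidden files such as .bashrc:

```python
def ext_via_rpartition(path):
    """Return the text after the last dot, or '' if the file name has no dot."""
    head, sep, tail = path.rpartition('.')
    # sep is '' when no dot was found; tail then holds the whole input.
    # A '/' in tail means the dot belonged to a directory name, not the file.
    if not sep or '/' in tail:
        return ''
    return tail

print(ext_via_rpartition('/home/jersey/remote/data/test.csv'))
```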

Just join all pathlib suffixes.
>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'

Surprised this wasn't mentioned yet:
import os
fn = '/some/path/a.tar.gz'
basename = os.path.basename(fn) # os independent; gives 'a.tar.gz'
base = basename.split('.')[0] # gives 'a'
ext = '.'.join(basename.split('.')[1:]) # <-- main part
# if you want a leading '.', and if no result `None`:
ext = '.' + ext if ext else None # gives '.tar.gz'
Benefits:
Works as expected for anything I can think of
No modules
No regex
Cross-platform
Easily extendible (e.g. no leading dots for extension, only last part of extension)
As function:
def get_extension(filename):
    basename = os.path.basename(filename) # os independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None

You can use a split on a filename:
f_extns = filename.split(".")
print ("The extension of the file is : " + repr(f_extns[-1]))
This does not require additional library

filename='ext.tar.gz'
extension = filename[filename.rfind('.'):] # '.gz'; rfind returns -1 when there is no dot, so check for that case first

Extracting extension from filename in Python
Python os module splitext()
splitext() function splits the file path into a tuple having two values – root and extension.
import os
# unpacking the tuple
file_name, file_extension = os.path.splitext("/Users/Username/abc.txt")
print(file_name)
print(file_extension)
Get File Extension using the Pathlib Module
import pathlib
pathlib.Path("/Users/pankaj/abc.txt").suffix
#output:'.txt'

Even though this question is already answered, I'd add a regex solution.
>>> import re
>>> file_suffix = r".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'

This is a direct string-manipulation technique.
I see a lot of solutions mentioned, but I think most are looking at split.
split, however, splits at every occurrence of ".".
What you would rather be looking for is rpartition, which splits only at the last one.
string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]

Another solution with right split:
# to get extension only
s = 'test.ext'
if '.' in s: ext = s.rsplit('.', 1)[1]
# or, to get file name and extension
def split_filepath(s):
    """
    get filename and extension from filepath
    filepath -> (filename, extension)
    """
    if '.' not in s: return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])

You can use the following code to split the file name and extension.
import os.path
filenamewithext = os.path.basename(filepath)
filename, ext = os.path.splitext(filenamewithext)
#print file name
print(filename)
#print file extension
print(ext)

A true one-liner, if you like regex.
And it doesn't matter even if you have additional "." in the middle
import re
file_ext = re.search(r"\.([^.]+)$", filename).group(1)
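Note that re.search returns None when the name has no dot, so calling .group(1) directly raises AttributeError in that case. A minimal sketch of a guarded variant, using the same pattern; regex_extension and EXT_PATTERN are made-up names:

```python
import re

# Same pattern as the one-liner: a dot followed by non-dot characters at the end.
EXT_PATTERN = re.compile(r"\.([^.]+)$")

def regex_extension(filename):
    """Return the text after the last dot, or '' when there is no match."""
    match = EXT_PATTERN.search(filename)
    return match.group(1) if match else ""

print(regex_extension("archive.tar.gz"))
```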

Well, I know I'm late, but here is my simple solution:
file = '/foo/bar/whatever.ext'
extension = file.split('.')[-1]
print(extension)
#output will be ext

try this:
files = ['file.jpeg','file.tar.gz','file.png','file.foo.bar','file.etc']
pen_ext = ['foo', 'tar', 'bar', 'etc']
for file in files: #1
    if (file.split(".")[-2] in pen_ext): #2
        ext = file.split(".")[-2]+"."+file.split(".")[-1]#3
    else:
        ext = file.split(".")[-1] #4
    print (ext) #5
1. get each file name in the list
2. split the file name and check whether the penultimate extension is in the pen_ext list
3. if yes, join it with the last extension and set it as the file's extension
4. if not, just use the last extension as the file's extension
5. then print it to check

You can use endswith to identify the file extension in Python, as in the example below:
import os
import pandas as pd
frames = []
for file in os.listdir():
    if file.endswith('.csv'):
        df1 = pd.read_csv(file)
        frames.append(df1)
result = pd.concat(frames)
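str.endswith also accepts a tuple of suffixes, which helps when several extensions should match. A small sketch, with a made-up helper name (matching_files) and an assumed set of wanted extensions; lower() is applied so that .CSV matches too:

```python
# Assumed set of extensions to keep; endswith accepts a tuple of suffixes.
WANTED = (".csv", ".tsv")

def matching_files(names, suffixes=WANTED):
    """Return the names that end with one of the suffixes, ignoring case."""
    return [n for n in names if n.lower().endswith(suffixes)]

print(matching_files(["a.csv", "B.CSV", "notes.txt", "c.tsv"]))
```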

For funsies... just collect the extensions in a dict, and track all of them in a folder. Then just pull the extensions you want.
import os
search = {}
for f in os.listdir(os.getcwd()):
    fn, fe = os.path.splitext(f)
    try:
        search[fe].append(f)
    except KeyError:
        search[fe]=[f,]
extensions = ('.png','.jpg')
for ex in extensions:
    found = search.get(ex,'')
    if found:
        print(found)

This method requires a dictionary, list, or set of extensions. You can just use the built-in string method str.endswith, which checks whether the file name ends with a given extension from the collection, for example str.endswith(fileName[index]). This is more for matching and comparing extensions than for extracting them.
https://docs.python.org/3/library/stdtypes.html#string-methods
Example 1:
dictionary = {0:".tar.gz", 1:".txt", 2:".exe", 3:".js", 4:".java", 5:".python", 6:".ruby",7:".c", 8:".bash", 9:".ps1", 10:".html", 11:".html5", 12:".css", 13:".json", 14:".abc"}
for x in dictionary.values():
    name = "file" + x
    print(name.endswith(x, name.index("."), len(name)))
Example 2:
set1 = {".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"}
for x in set1:
    name = "file" + x
    print(name.endswith(x, name.index("."), len(name)))
Example 3:
fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
for x in range(0, len(fileName)):
    name = "file" + fileName[x]
    print(name.endswith(fileName[x], name.index("."), len(name)))
Example 4
fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
name = "file.txt"
print(name.endswith(fileName[1], name.index("."), len(name)))
Examples 5, 6, 7 with output
Example 8
fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
exts = []
name = "file.txt"
for ext in fileName:
    if name.endswith(ext):
        exts += [ext]

Another option is mimetypes. Note that guess_type returns the MIME type (for example 'text/plain'), not the extension itself:
import mimetypes
mt = mimetypes.guess_type("file name")
mime_type = mt[0]
print(mime_type)
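To make the direction of the lookup explicit: mimetypes.guess_type maps a file name to a MIME type, while mimetypes.guess_extension maps a MIME type back to an extension. A short sketch with a made-up file name; the value returned by guess_extension can vary with the platform's MIME database, so no output is shown for it:

```python
import mimetypes

# guess_type looks only at the name; the file does not need to exist.
mime_type, encoding = mimetypes.guess_type("report.txt")
print(mime_type)  # the MIME type, not an extension

# guess_extension goes the other way; the result may differ between systems.
print(mimetypes.guess_extension("text/plain"))
```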

Here is how to extract the last file extension if a file has several:
import os
class functions:
    def listdir(self, filepath):
        return os.listdir(filepath)
func = functions()
os.chdir("C:\\Users\\Asus-pc\\Downloads") #absolute path, change this to your directory
current_dir = os.getcwd()
for i in range(len(func.listdir(current_dir))): #i runs over the files and directories in the directory
    if os.path.isfile((func.listdir(current_dir))[i]): #check if it is a file
        fileName = func.listdir(current_dir)[i] #put the current filename into a variable
        rev_fileName = fileName[::-1] #reverse the filename
        currentFileExtension = rev_fileName[:rev_fileName.index('.')][::-1] #extract from beginning until before .
        print(currentFileExtension) #output can be mp3,pdf,ini,exe, depends on the files in your directory
Output is mp3; it also works if the file has only one extension.

I'm definitely late to the party, but in case anyone wanted to achieve this without the use of another library:
file_path = "example_tar.tar.gz"
file_name, file_ext = [file_path if "." not in file_path else file_path.split(".")[0], "" if "." not in file_path else file_path[file_path.find(".") + 1:]]
print(file_name, file_ext)
The 2nd line is basically just the following code but crammed into one line:
def name_and_ext(file_path):
    if "." not in file_path:
        file_name = file_path
    else:
        file_name = file_path.split(".")[0]
    if "." not in file_path:
        file_ext = ""
    else:
        file_ext = file_path[file_path.find(".") + 1:]
    return [file_name, file_ext]
Even though this works, it might not work with all types of files, specifically .zshrc. I would recommend using the os.path.splitext function instead; example below:
import os
file_path = "example.tar.gz"
file_name, file_ext = os.path.splitext(file_path)
print(file_name, file_ext)
Cheers :)

# try this, it works for anything, any length of extension
# e.g www.google.com/downloads/file1.gz.rs -> .gz.rs
import os.path
class LinkChecker:
    @staticmethod
    def get_link_extension(link: str)->str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""

import os
def NewFileName(fichier):
    cpt = 0
    fic , *ext = fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier

Related

Python - How to create a csv, if csv already exists [duplicate]

Does Python have any built-in functionality to add a number to a filename if it already exists?
My idea is that it would work the way certain OS's work - if a file is output to a directory where a file of that name already exists, it would append a number or increment it.
I.e: if "file.pdf" exists it will create "file2.pdf", and next time "file3.pdf".

I ended up writing my own simple function for this. Primitive, but gets the job done:
import os
def uniquify(path):
    filename, extension = os.path.splitext(path)
    counter = 1
    while os.path.exists(path):
        path = filename + " (" + str(counter) + ")" + extension
        counter += 1
    return path

In a way, Python has this functionality built into the tempfile module. Unfortunately, you have to tap into a private global variable, tempfile._name_sequence. This means that officially, tempfile makes no guarantee that in future versions _name_sequence even exists -- it is an implementation detail.
But if you are okay with using it anyway, this shows how you can create uniquely named files of the form file#.pdf in a specified directory such as /tmp:
import tempfile
import itertools as IT
import os
def uniquify(path, sep = ''):
    def name_sequence():
        count = IT.count()
        yield ''
        while True:
            yield '{s}{n:d}'.format(s = sep, n = next(count))
    orig = tempfile._name_sequence
    with tempfile._once_lock:
        tempfile._name_sequence = name_sequence()
        path = os.path.normpath(path)
        dirname, basename = os.path.split(path)
        filename, ext = os.path.splitext(basename)
        fd, filename = tempfile.mkstemp(dir = dirname, prefix = filename, suffix = ext)
        tempfile._name_sequence = orig
    return filename
print(uniquify('/tmp/file.pdf'))

I was trying to implement the same thing in my project, but @unutbu's answer seemed too 'heavy' for my needs, so I finally came up with the following code:
import os
index = ''
while True:
    try:
        os.makedirs('../hi'+index)
        break
    except FileExistsError:
        if index:
            index = '('+str(int(index[1:-1])+1)+')' # Add 1 to the number in brackets
        else:
            index = '(1)'
        pass # Go and try to create the directory again
Just in case someone stumbled upon this and requires something simpler.

If all files being numbered isn't a problem, and you know beforehand the name of the file to be written, you could simply do:
import os
counter = 0
filename = "file{}.pdf"
while os.path.isfile(filename.format(counter)):
    counter += 1
filename = filename.format(counter)

Recently I encountered the same thing and here is my approach:
import os
file_name = "file_name.txt"
if os.path.isfile(file_name):
    expand = 1
    while True:
        expand += 1
        new_file_name = file_name.split(".txt")[0] + str(expand) + ".txt"
        if os.path.isfile(new_file_name):
            continue
        else:
            file_name = new_file_name
            break

Let's say you already have those files:
This function generates the next available non-already-existing filename, by adding a _1, _2, _3, ... suffix before the extension if necessary:
import os
def nextnonexistent(f):
    fnew = f
    root, ext = os.path.splitext(f)
    i = 0
    while os.path.exists(fnew):
        i += 1
        fnew = '%s_%i%s' % (root, i, ext)
    return fnew
print(nextnonexistent('foo.txt')) # foo_3.txt
print(nextnonexistent('bar.txt')) # bar_1.txt
print(nextnonexistent('baz.txt')) # baz.txt

Since the tempfile hack A) is a hack and B) still requires a decent amount of code anyway, I went with a manual implementation. You basically need:
A way to Safely create a file if and only if it does not exist (this is what the tempfile hack affords us).
A generator for filenames.
A wrapping function to hide the mess.
I defined a safe_open that can be used just like open:
import errno
import itertools
import os
import platform
def iter_incrementing_file_names(path):
    """
    Iterate incrementing file names. Start with path and add " (n)" before the
    extension, where n starts at 1 and increases.
    :param path: Some path
    :return: An iterator.
    """
    yield path
    prefix, ext = os.path.splitext(path)
    for i in itertools.count(start=1, step=1):
        yield prefix + ' ({0})'.format(i) + ext
def safe_open(path, mode):
    """
    Open path, but if it already exists, add " (n)" before the extension,
    where n is the first number found such that the file does not already
    exist.
    Returns an open file handle. Make sure to close!
    :param path: Some file name.
    :return: Open file handle... be sure to close!
    """
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    if 'b' in mode and platform.system() == 'Windows':
        flags |= os.O_BINARY
    for filename in iter_incrementing_file_names(path):
        try:
            file_handle = os.open(filename, flags)
        except OSError as e:
            if e.errno == errno.EEXIST:
                pass
            else:
                raise
        else:
            return os.fdopen(file_handle, mode)
# Example
with safe_open("some_file.txt", "w") as fh:
    print("Hello", file=fh)
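On Python 3.3 and later, the built-in open() has an exclusive-creation mode, 'x', which fails with FileExistsError if the file already exists. That can stand in for the os.open flags. The following is only a sketch under that assumption, with a made-up function name (open_unique) and the same " (n)" naming scheme:

```python
import itertools
import os

def open_unique(path, mode="x"):
    """Open a new file, trying 'name (1).ext', 'name (2).ext', ... on clashes."""
    root, ext = os.path.splitext(path)
    candidates = itertools.chain(
        [path], (f"{root} ({i}){ext}" for i in itertools.count(1))
    )
    for candidate in candidates:
        try:
            # mode 'x' creates the file and raises if it already exists,
            # so there is no gap between the check and the creation
            return open(candidate, mode)
        except FileExistsError:
            continue
```

Pass mode="xb" for binary files; as with safe_open, the caller is responsible for closing the returned handle.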

I haven't tested this yet but it should work, iterating over possible filenames until the file in question does not exist at which point it breaks.
import itertools
import os
def increment_filename(path):
    fn, extension = os.path.splitext(path)
    yield fn + extension
    for n in itertools.count(start=1, step=1):
        # extension already includes the leading dot
        yield '%s%d%s' % (fn, n, extension)
for filename in increment_filename(original_filename):
    if not os.path.isfile(filename):
        break

This works for me.
The initial file name is 0.yml; if it exists, the number is incremented until an unused name is found.
import os
import itertools
def increment_filename(file_name):
    fid, extension = os.path.splitext(file_name)
    yield fid + extension
    for n in itertools.count(start=1, step=1):
        new_id = int(fid) + n
        yield "%s%s" % (new_id, extension)
def get_file_path():
    target_file_path = None
    for file_name in increment_filename("0.yml"):
        file_path = os.path.join('/tmp', file_name)
        if not os.path.isfile(file_path):
            target_file_path = file_path
            break
    return target_file_path

import os
class Renamer():
    def __init__(self, name):
        self.extension = name.split('.')[-1]
        self.name = name[:-len(self.extension)-1]
        self.filename = self.name
    def rename(self):
        i = 1
        if os.path.exists(self.filename+'.'+self.extension):
            while os.path.exists(self.filename+'.'+self.extension):
                self.filename = '{} ({})'.format(self.name,i)
                i += 1
        return self.filename+'.'+self.extension

I found that the os.path.exists() conditional function did what I needed. I'm using a dictionary-to-csv saving as an example, but the same logic could work for any file type:
import os
def smart_save(filename, data):
    od = filename + '_' # added underscore before number for clarity
    for i in range(500): # I set an arbitrary upper limit of 500
        d = od + str(i)
        if os.path.exists(d + '.csv'):
            pass
        else:
            with open(d + '.csv', 'w') as f: #or any saving operation you need
                for key in data.keys():
                    f.write("%s,%s\n"%(key, data[key]))
            break
Note: this appends a number (starting at 0) to the file name by default, but it's easy to shift that around.

This function checks whether the file name exists, using a regular expression and recursion:
import os
import re
def validate_outfile_name(input_path):
    filename, extension = os.path.splitext(input_path)
    if os.path.exists(input_path):
        output_path = ""
        pattern = r'\([0-9]\)'
        match = re.search(pattern, filename)
        if match:
            version = filename[match.start() + 1]
            try: new_version = int(version) + 1
            except: new_version = 1
            output_path = f"{filename[:match.start()]}({new_version}){extension}"
            output_path = validate_outfile_name(output_path)
        else:
            version = 1
            output_path = f"{filename}({version}){extension}"
        return output_path
    else:
        return input_path

I've implemented a similar solution with pathlib:
Create file-names that match the pattern path/<file-name>-\d\d.ext. Perhaps this solution can help...
import pathlib
from toolz import itertoolz as itz
def file_exists_add_number(path_file_name, digits=2):
    pfn = pathlib.Path(path_file_name)
    parent = pfn.parent # parent-dir of file
    stem = pfn.stem # file-name w/o extension
    suffix = pfn.suffix # NOTE: extension starts with '.' (dot)!
    try:
        # search for files ending with '-\d\d.ext'
        last_file = itz.last(parent.glob(f"{stem}-{digits * '?'}{suffix}"))
    except:
        curr_no = 1
    else:
        curr_no = int(last_file.stem[-digits:]) + 1
    # int to string and add leading zeros
    curr_no = str(curr_no).zfill(digits)
    path_file_name = parent / f"{stem}-{curr_no}{suffix}"
    return str(path_file_name)
Pls note: That solution starts at 01 and will only find file-pattern containing -\d\d!

import os
def create_file():
    counter = 0
    filename = "file"
    while os.path.isfile(f"dir/{filename}{counter}.txt"):
        counter += 1
    print(f"{filename}{counter}.txt")

It's a bit late, but something like this should still work properly; maybe it will be useful for someone.
You can use a built-in iterator to do this (an image downloader as an example):
import requests
def image_downloader():
    image_url = 'some_image_url'
    for count in range(10):
        image_data = requests.get(image_url).content
        with open(f'image_{count}.jpg', 'wb') as handler:
            handler.write(image_data)
Files will increment properly. Result is:
image_0.jpg
image_1.jpg
image_2.jpg
image_3.jpg
image_4.jpg
image_5.jpg
image_6.jpg
image_7.jpg
image_8.jpg
image_9.jpg

An easy way to create a new file if the name already exists in your folder:
if 'sample.xlsx' in os.listdir('testdir/'):
    i = 2
    while os.path.exists(f'testdir/sample ({i}).xlsx'):
        i += 1
    wb.save(filename=f"testdir/sample ({i}).xlsx")
else:
    wb.save(filename=f"testdir/sample.xlsx")

Python: Change file names to the names of people in a list

I have a couple slides, each slide corresponds to a person. I need to name each file (.pptx) after the individual name it references. A lot of the examples I see on mass renaming have the renaming become sequential like:
file1
file2
file3
I need:
bob.pptx
sue.pptx
jack.pptx
I was able to change names using os found on this site https://www.marsja.se/rename-files-in-python-a-guide-with-examples-using-os-rename/:
import os, fnmatch
file_path = 'C:\\Users\\Documents\\Files_To_Rename\\Many_Files\\'
files_to_rename = fnmatch.filter(os.listdir(file_path), '*.pptx')
print(files_to_rename)
new_name = 'Datafile'
for i, file_name in enumerate(files_to_rename):
    new_file_name = new_name + str(i) + '.pptx'
    os.rename(file_path + file_name,
              file_path + new_file_name)
But again, this just names it:
Datafile1
Datafile2
etc

My example:
import os
from pathlib import Path
files = os.listdir("c:\\tmp\\")
for key in range(0, len(files)):
    print (files[key])
    os.rename("c:\\tmp\\" + files[key], "c:\\tmp\\" + files[key].replace("-",""))
    Path("c:\\tmp\\" + files[key] + '.ok').touch() # if you need to add some extension

Here's how I ran your code (avoiding file paths I don't have!), getting it to print output not just rename
import os, fnmatch
file_path = '.\\'
files_to_rename = fnmatch.filter(os.listdir(file_path), '*.pptx')
print(files_to_rename)
new_name = 'Datafile'
for i, file_name in enumerate(files_to_rename):
    new_file_name = new_name + str(i) + '.pptx'
    print (file_path + new_file_name)
    os.rename(file_path + file_name,
              file_path + new_file_name)
This gave me
.\Datafile0.pptx
.\Datafile1.pptx
...
and did give me the correct sequence of pptx files in that folder.
So I suspect the problem is that you are getting the file names you want, but you can't see them in Windows. Solution: show file types in Windows. Here's one of many available links as to how: https://www.thewindowsclub.com/show-file-extensions-in-windows

Thank you everyone for your suggestions, I think I found it with a friend's help:
import os, fnmatch
import pandas as pd
file_path = 'C:\\Users\\Documents\\FolderwithFiles\\'
files_to_rename = fnmatch.filter(os.listdir(file_path), '*.pptx') #looks for any .pptx in path, can be any extension
df = pd.read_excel('Names.xlsx') #list the names in an Excel sheet; the column header should be Names
for i, file_name in zip(df['Names'], files_to_rename): #zip instead of a nested for loop
    new_file_name = i + '.pptx'
    os.rename(file_path + file_name, file_path + new_file_name)
    print(new_file_name)

Python: How to change a filename to lowercase but NOT the extension

I'm trying to change filenames like WINDOW.txt to lowercase but then I also need to change the extension .txt to uppercase. I am thinking I can just change the entire thing to lowercase as the extension is already lowercase and then using something like .endswith() to change the extension to uppercase but I can't seem to figure it out. I know this may seem simple to most so thank you for your patience.

This one handles filenames, paths across different operating systems:
import os.path
def lower_base_upper_ext(path):
    """Filename to lowercase, extension to uppercase."""
    path, ext = os.path.splitext(path)
    head, tail = os.path.split(path)
    # os.path.split drops the separator, so join the pieces back with os.path.join
    return os.path.join(head, tail.lower()) + ext.upper()
It leaves possible directory names untouched, just the filename portion is lower-cased and extension upper-cased.
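The same idea can be written with pathlib. This is a sketch rather than a drop-in replacement: lower_stem_upper_suffix is a made-up name, and PurePosixPath is used so the example behaves the same on every platform (on a real system you would likely use pathlib.Path).

```python
from pathlib import PurePosixPath

def lower_stem_upper_suffix(path):
    """Lowercase the file name, uppercase its last extension."""
    p = PurePosixPath(path)
    # with_name swaps only the final component, so directories stay untouched
    return str(p.with_name(p.stem.lower() + p.suffix.upper()))

print(lower_stem_upper_suffix("/Some/Dir/WINDOW.txt"))
```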

oldname='HeLlO.world.TxT'
if '.' in oldname:
    (basename, ext) = oldname.rsplit('.', 1)
    newname = basename.lower() + '.' + ext.upper()
else:
    newname = oldname.lower()
print(f'{oldname} => {newname}')
...properly emits:
HeLlO.world.TxT => hello.world.TXT

name = "MyFile.txt"
new_name = name.rsplit(sep= ".", maxsplit=1)
print(new_name[0].lower()+"."+new_name[1].upper())

filename = "WINDOW.txt"
filename = filename.split('.')
filename = ".".join(filename[0:-1]).lower() + '.' + filename[-1].upper()
print(filename)
>> window.TXT
filename = "foo.bar.maz.txt"
filename = filename.split('.')
filename = ".".join(filename[0:-1]).lower() + '.' + filename[-1].upper()
print(filename)
>> foo.bar.maz.TXT

If I read the question correctly, it wants the lowercase name and upper case file extension, which is weird, but here is a simple solution.
filename = "WINDOW.txt"
ext_ind = filename.rindex('.')
filename = filename[0:ext_ind].lower() + '.' + filename[ext_ind+1:len(filename)].upper()
print(filename)
>> window.TXT

Check if a string contains any file extension whatsoever

I'm sure this is a simple thing to do but I don't know how. What I want to achieve is something like this:
templateFilename = str( templateFilename )
# If no file extension is found, assume it is a .npy file
if templateFilename.endswith( '.*' ):
templateFilename += ".npy"
However, this syntax doesn't seem to work. I want the * to represent any file extension so that, if the parsed file does contain a file extension, that one will be used but, if not, a standard extension will be added.
I have read about the glob module and people seem to be using that for finding things such as *.txt, etc. but I'm not sure how it works.

I would suggest os.path.splitext. The following uses .npy as the extension if none exists:
root, ext = os.path.splitext(path)
if not ext:
    ext = '.npy'
path = root + ext
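For comparison, here is a pathlib version of the same default-extension logic. It is a minimal sketch; with_default_suffix is a made-up helper name, and ".npy" is the default taken from the question:

```python
from pathlib import PurePath

def with_default_suffix(name, default=".npy"):
    """Return name unchanged if it has an extension, else append the default."""
    p = PurePath(name)
    return str(p) if p.suffix else str(p.with_suffix(default))

print(with_default_suffix("template"))
print(with_default_suffix("template.txt"))
```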

(Speaking from experience and hair-loss)
Doing a split on . and then selecting the second element [1] will only work if you can absolutely guarantee that there are no extra . in the filename; otherwise you'll need something like this:
file_extension = ["csv", "xml", "html"] # no leading dot, because split(".") drops it
if '.' in templateFilename: # checks whether a split is possible; without a dot, indexing [1] would raise an IndexError.
    if templateFilename.split(".")[-1] in file_extension: # [-1] = the last element in the list.
        has_extension = True
        has_verified_extension = True
    else:
        has_extension = True
        has_verified_extension = False
else: # no '.' in the filename, so no extension.
    has_extension = False

Usage:
file_extension = ["pyo", "npy", "py"]
templateFilename = str( templateFilename )
# If no known file extension is found, assume it is a .npy file
if '.' not in templateFilename or templateFilename.split(".")[-1] not in file_extension:
    templateFilename += ".npy"

If you want it in one line, here it is:
templateFilename = "abcd"
non_ext_file_list = [filename + ".npy" for filename in templateFilename.split(".") if "." not in templateFilename]
#output
['abcd.npy']

python os.rename(...) won't work !

I am writing a Python function to change the extension of a list of files into another extension, like txt into rar, that's just an idle example. But I'm getting an error. The code is:
import os
def dTask():
    #Get a file name list
    file_list = os.listdir('C:\Users\B\Desktop\sil\sil2')
    #Change the extensions
    for file_name in file_list:
        entry_pos = 0;
        #Filter the file name first for '.'
        for position in range(0, len(file_name)):
            if file_name[position] == '.':
                break
        new_file_name = file_name[0:position]
        #Filtering done !
        #Using the name filtered, add extension to that name
        new_file_name = new_file_name + '.rar'
        #rename the entry in the file list, using new file name
        print 'Expected change from: ', file_list[entry_pos]
        print 'into File name: ', new_file_name
        os.rename(file_list[entry_pos], new_file_name)
        ++entry_pos
Error:
>>> dTask()
Expected change from: New Text Document (2).txt
into File name: New Text Document (2).rar
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
dTask()
File "C:\Users\B\Desktop\dTask.py", line 19, in dTask
os.rename(file_list[entry_pos], new_file_name)
WindowsError: [Error 2] The system cannot find the file specified
I can succeed in getting the file name with another extension in variable level as you can see in the print-out, but not in reality because I can not end this process in OS level. The error is coming from os.rename(...). Any idea how to fix this ?

As the others have already stated, you either need to provide the path to those files or switch the current working directory so the os can find the files.
++entry_pos doesn't do anything. There is no increment operator in Python. Prefix + is just there for symmetry with prefix -. Prefixing something with two + is just two no-ops. So you're not actually doing anything (and after you change it to entry_pos += 1, you're still resetting it to zero in each iteration).
Also, your code is very inelegant: for example, you are using a separate index into file_list and fail to keep it in sync with the iteration variable file_name, even though you could just use that one! Here is how this can be done better:
import os
def rename_by_ext(to_ext, path):
    if to_ext[0] != '.':
        to_ext = '.'+to_ext
    print("Renaming files in", path)
    for file_name in os.listdir(path):
        root, ext = os.path.splitext(file_name)
        print("Renaming", file_name, "to", root+to_ext)
        os.rename(os.path.join(path, file_name), os.path.join(path, root+to_ext))
rename_by_ext('.rar', '...')
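The point about ++entry_pos is easy to check for yourself. A tiny demonstration follows; the variable names after_prefix and after_increment are made up for the example:

```python
entry_pos = 0
++entry_pos          # parsed as +(+entry_pos); the result is discarded
after_prefix = entry_pos

entry_pos += 1       # the actual way to increment in Python
after_increment = entry_pos

print(after_prefix, after_increment)
```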

As an alternative to os.rename, you can use shutil.move; it needs the full path to each file just the same. Example taken from How to copy and move files with Shutil.
import shutil
import os
source_dir = "/tmp/"
destination = "/tmp/newfolder/"
for files in os.listdir(source_dir):
    if files.endswith(".txt"):
        shutil.move(os.path.join(source_dir, files), destination)
In your case:
import shutil
shutil.move(file_list[entry_pos], new_file_name)

You also want to double backslashes to escape them in Python strings, so instead of
file_list = os.listdir('C:\Users\B\Desktop\sil\sil2')
you want
file_list = os.listdir('C:\\Users\\B\\Desktop\\sil\\sil2')
Or use forward slashes - Python magically treats them as path separators on Windows.
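A raw string is a third way to spell the same path without doubling every backslash. The short check below uses ntpath (the Windows flavour of os.path, importable on any platform) to show that the forward-slash form normalises to the same path:

```python
import ntpath  # Windows path rules, available on every platform

escaped = 'C:\\Users\\B\\Desktop\\sil\\sil2'
raw = r'C:\Users\B\Desktop\sil\sil2'
forward = 'C:/Users/B/Desktop/sil/sil2'

print(escaped == raw)                   # same string, two spellings
print(ntpath.normpath(forward) == raw)  # forward slashes normalise to backslashes
```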

You must use the full path for the rename.
import os
def dTask():
    #Get a file name list
    dir_path = r'C:\Users\B\Desktop\sil\sil2'
    file_list = os.listdir(dir_path)
    #Change the extensions
    for file_name in file_list:
        #Filter the file name first for '.'
        for position in range(0, len(file_name)):
            if file_name[position] == '.':
                break
        new_file_name = file_name[0:position]
        #Filtering done !
        #Using the name filtered, add extension to that name
        new_file_name = new_file_name + '.rar'
        #rename the entry in the file list, using new file name
        print('Expected change from: ', file_name)
        print('into File name: ', new_file_name)
        os.rename(os.path.join(dir_path, file_name), os.path.join(dir_path, new_file_name))

If you aren't in the directory C:\Users\B\Desktop\sil\sil2, then Python certainly won't be able to find those files.

import os
def extChange(path,newExt,oldExt=""):
    # add a trailing separator only if the path does not already end with one
    if path.endswith("\\") or path.endswith("/"):
        myPath = path
    else:
        myPath = path + "\\"
    directory = os.listdir(myPath)
    for i in directory:
        x = myPath + os.path.splitext(i)[0] + "." + newExt
        y = myPath + i
        if oldExt == "":
            os.rename(y,x)
        else:
            if os.path.splitext(i)[1] == "." + oldExt:
                os.rename(y,x)
Now call it:
extChange("C:/testfolder/","txt","lua") #this will change all .lua files in C:/testfolder to .txt files
extChange("C:/testfolder/","txt") #leaving the last parameter out will change all files in C:/testfolder to .txt

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Extracting extension from filename in Python - python

Is there a function to extract the extension from a filename?

New in version 3.4. import pathlib print(pathlib.Path('yourPath.example').suffix) # '.example' print(pathlib.Path("hello/foo.bar.tar.gz").suffixes) # ['.bar', '.tar', '.gz'] I'm surprised no one has mentioned pathlib yet, pathlib IS awesome!

import os.path extension = os.path.splitext(filename)[1]

import os.path extension = os.path.splitext(filename)[1][1:] To get only the text of the extension, without the dot.

worth adding a lower in there so you don't find yourself wondering why the JPG's aren't showing up in your list. os.path.splitext(filename)[1][1:].strip().lower()

Any of the solutions above work, but on Linux I have found that there is a newline at the end of the extension string which will prevent matches from succeeding. Add the strip() method to the end. For example:
import os.path
extension = os.path.splitext(filename)[1][1:].strip()
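Putting the strip and lower suggestions together, a small helper might look like the sketch below. The name normalized_extension is made up for this example.

```python
import os.path

def normalized_extension(filename):
    """Return the extension lowercased and without the dot ('' if there is none)."""
    # Strip surrounding whitespace first so a trailing newline never reaches splitext
    ext = os.path.splitext(filename.strip())[1]
    return ext[1:].lower()
```

For example, normalized_extension("photo.JPG\n") gives 'jpg', and a name such as '.bashrc' gives an empty string because splitext does not treat it as an extension.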

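You can find some great stuff in the pathlib module (available in Python 3.4 and later). For a Windows-style path, use PureWindowsPath so the backslashes are treated as separators:
import pathlib
x = pathlib.PureWindowsPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x) # Output '.txt'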

With splitext there are problems with files that have a double extension (e.g. file.tar.gz, file.tar.bz2, etc.):
>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension
'.gz'
but it should be: .tar.gz

Just join all pathlib suffixes.
>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'
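One caveat: joining every suffix also picks up dots that are part of the name itself, so 'my.report.2020.tar.gz' would give '.report.2020.tar.gz'. A possible workaround is to check against a list of known compound extensions first. This is only a sketch; COMPOUND_SUFFIXES and full_extension are assumptions for illustration, and the list would need extending for real data.

```python
from pathlib import PurePath

# Assumed list of compound extensions; extend it for your own files.
COMPOUND_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz")

def full_extension(path):
    """Return a known compound extension if present, else the last suffix."""
    name = PurePath(path).name
    for compound in COMPOUND_SUFFIXES:
        # Require something before the compound suffix, so a file
        # named just ".tar.gz" is not treated as all extension
        if name.lower().endswith(compound) and len(name) > len(compound):
            return name[-len(compound):]
    return PurePath(path).suffix
```

With this, 'file/path/my.report.2020.tar.gz' yields '.tar.gz' and 'notes.v2.txt' yields '.txt'.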

You can use a split on a filename:
f_extns = filename.split(".")
print("The extension of the file is : " + repr(f_extns[-1]))
This does not require any additional library.

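filename = 'ext.tar.gz'
extension = filename[filename.rfind('.'):] # '.gz'
Note that rfind returns -1 when the name contains no dot, in which case this slice yields the last character of the name rather than an empty string.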

Even though this question is already answered, I'd add a solution using regex.
>>> import re
>>> file_suffix = r".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'

You can use the following code to split the file name and extension.
import os.path
filenamewithext = os.path.basename(filepath)
filename, ext = os.path.splitext(filenamewithext)
# print file name
print(filename)
# print file extension
print(ext)

A true one-liner, if you like regex. It works even if there are additional "." characters in the middle of the name:
import re
file_ext = re.search(r"\.([^.]+)$", filename).group(1)
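The one-liner raises AttributeError when the name has no dot at all, because re.search returns None. A slightly more defensive variant could look like this sketch; regex_extension is a hypothetical name, and the pattern is adjusted (my assumption, not part of the answer above) to also exclude path separators so a dot in a directory name is not picked up.

```python
import re

# A dot followed by characters that are not dots or path separators, up to the end
_EXT_RE = re.compile(r"\.([^./\\]+)$")

def regex_extension(filename):
    """Return the text after the last dot, or '' when there is no match."""
    match = _EXT_RE.search(filename)
    return match.group(1) if match else ""
```

Be aware that, unlike os.path.splitext, this still reports 'bashrc' for '.bashrc'.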

Well, I know I'm late, but here's my simple solution:
file = '/foo/bar/whatever.ext'
extension = file.split('.')[-1]
print(extension) # output will be ext

You can use endswith to identify the file extension in Python, as in the example below:
import os
import pandas as pd
frames = []
for file in os.listdir():
    if file.endswith('.csv'):
        df1 = pd.read_csv(file)
        frames.append(df1)
result = pd.concat(frames)

Another option is mimetypes. Note that guess_type returns the guessed MIME type (and encoding) for the filename, not the extension itself:
import mimetypes
mt = mimetypes.guess_type("file name")
mime_type = mt[0]
print(mime_type)
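To make the difference concrete, here is a small sketch that returns the extension from os.path.splitext next to what mimetypes.guess_type reports. The function name describe is made up for this example.

```python
import mimetypes
import os.path

def describe(filename):
    """Return (extension, MIME type, encoding) for a filename."""
    # guess_type works from the name only; it does not open the file
    mime_type, encoding = mimetypes.guess_type(filename)
    extension = os.path.splitext(filename)[1]
    return extension, mime_type, encoding
```

For 'archive.tar.gz' the extension is '.gz', while guess_type reports the type 'application/x-tar' with encoding 'gzip', so the two answer different questions.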

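import os

def NewFileName(fichier):
    # Return fichier unchanged if no such file exists yet,
    # otherwise a numbered variant such as name-(1).ext
    cpt = 0
    fic, *ext = fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier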

