How to get numbers from filenames?

I have many files in a directory, named according to this pattern:
pic001.jpg
pic002.jpg
pic012.jpg
[...]
ico001.jpg
ico002.jpg
ico012.jpg
[...]
and I want to list these files and build a structure like this:
for r, d, f in os.walk(directory):
    for file in f:
        if file.startswith("pic"):
            pic = file
            ico = ???
            images_list.append({
                'big': directory + '/' + pic,
                'thumb': directory + '/' + ico,
            })
How do I get each "pic" file together with the "ico" file that belongs to it (only if the ico exists)?

The simplest answer, given the fixed three-letter prefix, seems to be:
ico = 'ico' + file[3:]
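The question also asks to include a picture only when its icon exists. A minimal sketch of that check, assuming the pic/ico naming shown in the question; the function name pair_images is hypothetical and not part of any answer here:

```python
import os

def pair_images(directory):
    """Return {'big', 'thumb'} dicts for every pic* file that has a matching ico* file."""
    images_list = []
    for root, dirs, files in os.walk(directory):
        names = set(files)
        for name in sorted(files):
            if not name.startswith("pic"):
                continue
            ico = "ico" + name[3:]  # same number and extension, different prefix
            if ico in names:  # skip pictures that have no thumbnail
                images_list.append({
                    "big": os.path.join(root, name),
                    "thumb": os.path.join(root, ico),
                })
    return images_list
```

Using os.path.join(root, ...) instead of directory + '/' keeps the paths correct for files found in subdirectories as well.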

You can do it using a regular expression.
import re
icon = 'ico%s.jpg' % re.findall(r'^pic(\d+)\.jpg$', file)[0]
It arguably reads more clearly than slicing, because the pattern spells out exactly which filenames are accepted. Note that [0] raises IndexError for names that do not match.

Why not use a regexp?
import re
...
m = re.search(r'\d+', f.name)
print('ico' + m.group(0) + '.jpg')

Related

Python: How to change a filename to lowercase but NOT the extension

I'm trying to change filenames like WINDOW.txt to lowercase but then I also need to change the extension .txt to uppercase. I am thinking I can just change the entire thing to lowercase as the extension is already lowercase and then using something like .endswith() to change the extension to uppercase but I can't seem to figure it out. I know this may seem simple to most so thank you for your patience.

This one handles bare filenames as well as full paths, on any operating system:
import os.path
def lower_base_upper_ext(path):
    """Filename to lowercase, extension to uppercase."""
    path, ext = os.path.splitext(path)
    head, tail = os.path.split(path)
    # os.path.split drops the separator, so rejoin with os.path.join
    return os.path.join(head, tail.lower()) + ext.upper()
It leaves any directory names untouched; only the filename portion is lower-cased and the extension upper-cased.

oldname='HeLlO.world.TxT'
if '.' in oldname:
    (basename, ext) = oldname.rsplit('.', 1)
    newname = basename.lower() + '.' + ext.upper()
else:
    newname = oldname.lower()
print(f'{oldname} => {newname}')
...properly emits:
HeLlO.world.TxT => hello.world.TXT

name = "MyFile.txt"
new_name = name.rsplit(sep= ".", maxsplit=1)
print(new_name[0].lower()+"."+new_name[1].upper())

filename = "WINDOW.txt"
filename = filename.split('.')
filename = ".".join(filename[0:-1]).lower() + '.' + filename[-1].upper()
print(filename)
>> window.TXT
filename = "foo.bar.maz.txt"
filename = filename.split('.')
filename = ".".join(filename[0:-1]).lower() + '.' + filename[-1].upper()
print(filename)
>> foo.bar.maz.TXT

If I read the question correctly, it wants the lowercase name and upper case file extension, which is weird, but here is a simple solution.
filename = "WINDOW.txt"
ext_ind = filename.rindex('.')
filename = filename[0:ext_ind].lower() + '.' + filename[ext_ind+1:len(filename)].upper()
print(filename)
>> window.TXT

python file name change

I am trying to change file names like below:
the 000000 are the same number.
000000_ABC.png --->000000+1_ABC.png
000000_DEF.png --->000000+2_DEF.png
000000_GHI.png --->000000+3_GHI.png
000000_JKL.png --->000000+4_JKL.png
In order to do so, I wrote code like below.
img_files = os.listdir(PATH_TO_PNG_FILES)
for img_file, i in zip(img_files, range(len(img_files))):
    new_img_file = img_file.replace("_", "+"+str(i)+"_")
    os.rename(path + img_file, path + new_img_file)
There are more than just four files, and more lines similar to these.
The problem is that immediately after I run it in PyCharm, it produces the desired results, but after I run another page related to the result directories, the names keep changing as shown below, even after the process has finished. I do not understand why.
000000+1+1_ABC.png
000000+2+2_DEF.png
000000+3+3_GHI.png
000000+4+4_JKL.png
or
otherwise "+unexpected number"

This is because the directory may already contain files named like "000000+1_ABC.png"; your script then replaces "_" with "+1_" again, which results in "000000+1+1_ABC.png". To solve this, add an if statement that skips names which already contain the "+" symbol.
img_files = os.listdir(path)  # path: the directory in which the png files are saved
for img_file, i in zip(img_files, range(len(img_files))):
    if "+" not in img_file:
        new_img_file = img_file.replace("_", "+"+str(i)+"_")
        os.rename(path + img_file, path + new_img_file)

A simple and naive way would be to add a verification to check whether there is a '+' in the filename. If you have other files which may contain a +, you may have to check for a stricter pattern.
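A stricter pattern could look like the following sketch. It assumes the naming scheme from the question (digits, then "+counter", then an underscore); the names ALREADY_RENAMED and add_counter are hypothetical:

```python
import re

# Matches names that already carry a counter, e.g. "000000+1_ABC.png".
ALREADY_RENAMED = re.compile(r"^\d+\+\d+_")

def add_counter(filename, counter):
    """Insert '+<counter>' before the first underscore, unless it is already there."""
    if ALREADY_RENAMED.match(filename) or "_" not in filename:
        return filename  # leave renamed or unrelated files untouched
    prefix, rest = filename.split("_", 1)
    return "{0}+{1}_{2}".format(prefix, counter, rest)
```

Because the function returns the name unchanged when the counter is already present, running the rename loop a second time leaves those files alone.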

I made a YouTube video https://youtu.be/K9jhAPZLZLc on how to rename multiple files like yours, assuming all your files are in the same directory.
To answer your question, assuming all image files are in the same folder:
import os
path = 'C:\\Users\\USER\\Desktop\\rename_images\\images\\' # path to your images
files = os.listdir(path)
for count, filename in enumerate(files):
    # Get the file extension
    file, file_extension = os.path.splitext(filename)
    # check if the current file is a folder or not
    full_path = f'{path}{filename}'
    if os.path.isdir(full_path):
        print('This is a directory')
    elif os.path.isfile(full_path):
        print('This is a normal file')
        # Rename only files that have not been renamed yet
        if '+' not in file:
            try:
                file_split = file.split('_')
                zeros = file_split[0]
                alpha = file_split[-1]
                current_file_name = os.path.join(path, filename)
                new_file_name = os.path.join(path, ''.join([f'{zeros}+{count}_{alpha}', file_extension]))
                os.rename(current_file_name, new_file_name)
            except OSError:
                pass
    else:
        print('This is a special file')

I would imagine that the problem comes from modifying the existing name instead of rebuilding it from scratch.
import os
DIR_PATH = 'files'
def rename_files(dir_name):
    img_files = os.listdir(dir_name)
    for i in range(len(img_files)):
        file_name = img_files[i].split('_')[-1]
        file_name = '000000+{0}_{1}'.format(i, file_name)
        os.rename(
            os.path.join(dir_name, img_files[i]),
            os.path.join(dir_name, file_name)
        )
if __name__ == '__main__':
    rename_files(DIR_PATH)

Check if a string contains any file extension whatsoever

I'm sure this is a simple thing to do but I don't know how. What I want to achieve is something like this:
templateFilename = str( templateFilename )
# If no file extension is found, assume it is a .npy file
if templateFilename.endswith( '.*' ):
    templateFilename += ".npy"
However, this syntax doesn't seem to work. I want the * to represent any file extension so that, if the parsed file does contain a file extension, that one will be used but, if not, a standard extension will be added.
I have read about the glob module and people seem to be using that for finding things such as *.txt, etc. but I'm not sure how it works.

I would suggest os.path.splitext. The following uses .npy as the extension if none exists:
import os
root, ext = os.path.splitext(path)
if not ext:
    ext = '.npy'
path = root + ext
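The same idea wrapped in a function, as a small sketch; the name ensure_extension and its default_ext parameter are illustrative and not part of the question's code:

```python
import os

def ensure_extension(path, default_ext=".npy"):
    """Return path unchanged if it has an extension, else append default_ext."""
    root, ext = os.path.splitext(path)
    return path if ext else root + default_ext
```

Dots inside directory names are not mistaken for an extension, because os.path.splitext only looks at the last path component.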

(Speaking from experience and hair-loss)
Doing a split on . and then selecting the second element [1] will only work if you can absolutely guarantee that there are no other . in the filename; otherwise you'll need something like this:
file_extension = [".csv", ".xml", ".html"]
if '.' in templateFilename: # checks that there is something to split on
    # [-1] is the last element of the list; re-add the dot so it matches the entries above
    if "." + templateFilename.split(".")[-1] in file_extension:
        has_extension = True
        has_verified_extension = True
    else:
        has_extension = True
        has_verified_extension = False
else: # no '.' in the filename, so no extension
    has_extension = False

Usage:
file_extension = [".pyo", ".npy", ".py"]
templateFilename = str( templateFilename )
# If no known file extension is found, assume it is a .npy file
if "." + templateFilename.split(".")[-1] not in file_extension:
    templateFilename += ".npy"

If you want it in one line, here it is:
templateFilename = "abcd"
non_ext_file_list = [filename + ".npy" for filename in templateFilename.split(".") if "." not in templateFilename]
#output
['abcd.npy']

python os.rename(...) won't work !

I am writing a Python function to change the extension of a list of files into another extension, like txt into rar, that's just an idle example. But I'm getting an error. The code is:
import os
def dTask():
    #Get a file name list
    file_list = os.listdir('C:\Users\B\Desktop\sil\sil2')
    #Change the extensions
    for file_name in file_list:
        entry_pos = 0;
        #Filter the file name first for '.'
        for position in range(0, len(file_name)):
            if file_name[position] == '.':
                break
        new_file_name = file_name[0:position]
        #Filtering done !
        #Using the name filtered, add extension to that name
        new_file_name = new_file_name + '.rar'
        #rename the entry in the file list, using new file name
        print 'Expected change from: ', file_list[entry_pos]
        print 'into File name: ', new_file_name
        os.rename(file_list[entry_pos], new_file_name)
        ++entry_pos
Error:
>>> dTask()
Expected change from: New Text Document (2).txt
into File name: New Text Document (2).rar
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
dTask()
File "C:\Users\B\Desktop\dTask.py", line 19, in dTask
os.rename(file_list[entry_pos], new_file_name)
WindowsError: [Error 2] The system cannot find the file specified
I can succeed in getting the file name with another extension in variable level as you can see in the print-out, but not in reality because I can not end this process in OS level. The error is coming from os.rename(...). Any idea how to fix this ?

As the others have already stated, you either need to provide the path to those files or switch the current working directory so the OS can find the files.
++entry_pos doesn't do anything. There is no increment operator in Python. Prefix + is just there for symmetry with prefix -. Prefixing something with two + is just two no-ops. So you're not actually doing anything (and after you change it to entry_pos += 1, you're still resetting it to zero in each iteration).
Also, your code is very inelegant - for example, you are using a separate index into file_list and fail to keep that in sync with the iteration variable file_name, even though you could just use that one! Here is how this can be done better:
import os
def rename_by_ext(to_ext, path):
    if to_ext[0] != '.':
        to_ext = '.'+to_ext
    print("Renaming files in " + path)
    for file_name in os.listdir(path):
        root, ext = os.path.splitext(file_name)
        print("Renaming " + file_name + " to " + root+to_ext)
        os.rename(os.path.join(path, file_name), os.path.join(path, root+to_ext))
rename_by_ext('.rar', '...')

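shutil.move is an alternative to os.rename, but it also needs the full path of each file unless you are working inside that directory. Example adapted from How to copy and move files with Shutil.
import shutil
import os
source_dir = "/tmp/"
destination = "/tmp/newfolder/"
for files in os.listdir(source_dir):
    if files.endswith(".txt"):
        shutil.move(os.path.join(source_dir, files), destination)
In your case:
import os
import shutil
directory = r'C:\Users\B\Desktop\sil\sil2'
shutil.move(os.path.join(directory, file_name), os.path.join(directory, new_file_name))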

You also want to double backslashes to escape them in Python strings, so instead of
file_list = os.listdir('C:\Users\B\Desktop\sil\sil2')
you want
file_list = os.listdir('C:\\Users\\B\\Desktop\\sil\\sil2')
Or use forward slashes, which Windows also accepts as path separators, or a raw string such as r'C:\Users\B\Desktop\sil\sil2'.

You must use the full path for the rename.
import os
def dTask():
    #Get a file name list
    directory = r'C:\Users\B\Desktop\sil\sil2'
    file_list = os.listdir(directory)
    #Change the extensions
    for file_name in file_list:
        #Filter the file name first for '.'
        for position in range(0, len(file_name)):
            if file_name[position] == '.':
                break
        new_file_name = file_name[0:position]
        #Filtering done !
        #Using the name filtered, add extension to that name
        new_file_name = new_file_name + '.rar'
        #rename the file on disk, using the full path for both names
        print('Expected change from: ' + file_name)
        print('into File name: ' + new_file_name)
        os.rename(os.path.join(directory, file_name), os.path.join(directory, new_file_name))

If you aren't in the directory C:\Users\B\Desktop\sil\sil2, then Python certainly won't be able to find those files.
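One way to sidestep the working-directory problem is to keep full paths throughout, for example with pathlib (Python 3.4+). This is a sketch, not the asker's code; the function name change_extension is hypothetical:

```python
from pathlib import Path

def change_extension(directory, new_ext):
    """Rename every regular file in directory so it ends with new_ext (e.g. '.rar')."""
    renamed = []
    # sorted() reads the whole listing before any file is renamed
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue
        target = entry.with_suffix(new_ext)  # same directory, new suffix
        if target != entry:
            entry.rename(target)
            renamed.append(target.name)
    return renamed
```

Every path produced by iterdir() already includes the directory, so the rename does not depend on the current working directory. The sketch does not guard against two files mapping to the same target name.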

import os
def extChange(path, newExt, oldExt=""):
    """Rename files in path to newExt; if oldExt is given, only files with that extension."""
    for name in os.listdir(path):
        root, ext = os.path.splitext(name)
        # ext includes the leading dot, e.g. ".txt"
        if oldExt == "" or ext == "." + oldExt:
            os.rename(os.path.join(path, name), os.path.join(path, root + "." + newExt))
Now call it:
extChange("C:/testfolder/","txt","lua") #this will change all .lua files in C:/testfolder to .txt files
extChange("C:/testfolder/","txt") #leaving the last parameter out will change all files in C:/testfolder to .txt

Extracting extension from filename in Python

Is there a function to extract the extension from a filename?

Use os.path.splitext:
>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'
Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:
>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')

New in version 3.4.
import pathlib
print(pathlib.Path('yourPath.example').suffix) # '.example'
print(pathlib.Path("hello/foo.bar.tar.gz").suffixes) # ['.bar', '.tar', '.gz']
I'm surprised no one has mentioned pathlib yet, pathlib IS awesome!

import os.path
extension = os.path.splitext(filename)[1]

import os.path
extension = os.path.splitext(filename)[1][1:]
To get only the text of the extension, without the dot.

For simple use cases one option may be splitting from dot:
>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'
No error when file doesn't have an extension:
>>> "filename".split(".")[-1]
'filename'
But you must be careful:
>>> "png".split(".")[-1]
'png' # But file doesn't have an extension
Also will not work with hidden files in Unix systems:
>>> ".bashrc".split(".")[-1]
'bashrc' # But this is not an extension
For general use, prefer os.path.splitext

worth adding a lower in there so you don't find yourself wondering why the JPG's aren't showing up in your list.
os.path.splitext(filename)[1][1:].strip().lower()

Any of the solutions above work, but on Linux I have found that the extension string can end with a newline (presumably when the filename was read from a file or from command output), which prevents matches from succeeding. Add the strip() method to the end. For example:
import os.path
extension = os.path.splitext(filename)[1][1:].strip()

You can find some great stuff in the pathlib module (available since Python 3.4).
import pathlib
x = pathlib.PureWindowsPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x)
# Output
.txt

With splitext there are problems with files with double extension (e.g. file.tar.gz, file.tar.bz2, etc..)
>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension
'.gz'
but should be: .tar.gz
The possible solutions are here
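One possible approach is to check a list of known compound extensions before falling back to os.path.splitext. The sketch below assumes such a list; COMPOUND_EXTS and split_ext are hypothetical names, and the list must be extended for other formats:

```python
import os

# Assumed list of compound suffixes; extend it for your own data.
COMPOUND_EXTS = (".tar.gz", ".tar.bz2", ".tar.xz")

def split_ext(path):
    """Like os.path.splitext, but keeps known compound extensions together."""
    lowered = path.lower()
    for ext in COMPOUND_EXTS:
        if lowered.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return os.path.splitext(path)
```

An explicit list avoids treating names such as report.v1.2.txt as having the extension .v1.2.txt, at the cost of having to maintain the list.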

Although this is an old topic, I wonder why no one has mentioned a very simple Python string method, rpartition, for this case:
to get the extension of a given absolute file path, you can simply type:
filepath.rpartition('.')[-1]
example:
path = '/home/jersey/remote/data/test.csv'
print(path.rpartition('.')[-1])
will give you: csv
Note that if the path contains no dot at all, this returns the whole path.

Just join all pathlib suffixes.
>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'

Surprised this wasn't mentioned yet:
import os
fn = '/some/path/a.tar.gz'
basename = os.path.basename(fn) # os independent
Out[] a.tar.gz
base = basename.split('.')[0]
Out[] a
ext = '.'.join(basename.split('.')[1:]) # <-- main part
# if you want a leading '.', and if no result `None`:
ext = '.' + ext if ext else None
Out[] .tar.gz
Benefits:
Works as expected for anything I can think of
No modules
No regex
Cross-platform
Easily extendible (e.g. no leading dots for extension, only last part of extension)
As function:
def get_extension(filename):
    basename = os.path.basename(filename) # os independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None

You can use a split on a filename:
f_extns = filename.split(".")
print ("The extension of the file is : " + repr(f_extns[-1]))
This does not require additional library

filename='ext.tar.gz'
extension = filename[filename.rfind('.'):]

Extracting extension from filename in Python
Python os module splitext()
splitext() function splits the file path into a tuple having two values – root and extension.
import os
# unpacking the tuple
file_name, file_extension = os.path.splitext("/Users/Username/abc.txt")
print(file_name)
print(file_extension)
Get File Extension using Pathlib Module
Pathlib module to get the file extension
import pathlib
pathlib.Path("/Users/pankaj/abc.txt").suffix
#output:'.txt'

Even though this question is already answered, I'd add a solution using regex.
>>> import re
>>> file_suffix = r".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'

This is a direct string representation techniques :
I see a lot of solutions mentioned, but I think most are looking at split.
Split however does it at every occurrence of "." .
What you would rather be looking for is partition.
string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]

Another solution with right split:
# to get extension only
s = 'test.ext'
if '.' in s: ext = s.rsplit('.', 1)[1]
# or, to get file name and extension
def split_filepath(s):
    """
    get filename and extension from filepath
    filepath -> (filename, extension)
    """
    if not '.' in s: return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])

you can use following code to split file name and extension.
import os.path
filenamewithext = os.path.basename(filepath)
filename, ext = os.path.splitext(filenamewithext)
#print file name
print(filename)
#print file extension
print(ext)

A true one-liner, if you like regex.
And it doesn't matter even if you have additional "." in the middle
import re
file_ext = re.search(r"\.([^.]+)$", filename).group(1)

Well , i know im late
that's my simple solution
file = '/foo/bar/whatever.ext'
extension = file.split('.')[-1]
print(extension)
#output will be ext

try this:
files = ['file.jpeg','file.tar.gz','file.png','file.foo.bar','file.etc']
pen_ext = ['foo', 'tar', 'bar', 'etc']
for file in files: #1
    if (file.split(".")[-2] in pen_ext): #2
        ext = file.split(".")[-2]+"."+file.split(".")[-1] #3
    else:
        ext = file.split(".")[-1] #4
    print (ext) #5
#1: go through every file name in the list
#2: split the file name and check whether the penultimate extension is in the pen_ext list
#3: if yes, join it with the last extension and use that as the file's extension
#4: if not, use just the last extension as the file's extension
#5: then print it to check the result

You can use endswith to identify the file extension in Python,
as in the example below:
import os
import pandas as pd
frames = []
for file in os.listdir():
    if file.endswith('.csv'):
        df1 = pd.read_csv(file)
        frames.append(df1)
result = pd.concat(frames)
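str.endswith also accepts a tuple of suffixes, which helps when several extensions should match. A small sketch; the helper name has_extension is illustrative:

```python
def has_extension(filename, extensions):
    """True if filename ends with any of the given extensions, ignoring case."""
    return filename.lower().endswith(tuple(ext.lower() for ext in extensions))
```

Lower-casing both sides means that photo.JPG matches '.jpg' as well.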

For funsies... just collect the extensions in a dict, and track all of them in a folder. Then just pull the extensions you want.
import os
search = {}
for f in os.listdir(os.getcwd()):
    fn, fe = os.path.splitext(f)
    try:
        search[fe].append(f)
    except KeyError:
        search[fe]=[f,]
extensions = ('.png','.jpg')
for ex in extensions:
    found = search.get(ex,'')
    if found:
        print(found)

This method needs a dictionary, list, or set of known extensions. You can then use the built-in string method endswith to check whether a filename ends with one of them, for example name.endswith(fileName[index]). It is meant for checking and comparing extensions rather than for extracting them.
https://docs.python.org/3/library/stdtypes.html#string-methods
Example 1:
dictonary = {0:".tar.gz", 1:".txt", 2:".exe", 3:".js", 4:".java", 5:".python", 6:".ruby",7:".c", 8:".bash", 9:".ps1", 10:".html", 11:".html5", 12:".css", 13:".json", 14:".abc"}
for x in dictonary.values():
    name = "file" + x
    print(name.endswith(x, name.index("."), len(name)))
Example 2:
set1 = {".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"}
for x in set1:
    name = "file" + x
    print(name.endswith(x, name.index("."), len(name)))
Example 3:
fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
for x in range(0, len(fileName)):
    name = "file" + fileName[x]
    print(name.endswith(fileName[x], name.index("."), len(name)))
Example 4
fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
name = "file.txt"
print(name.endswith(fileName[1], name.index("."), len(name)))
Examples 5, 6, 7 with output
Example 8
fileName = [".tar.gz", ".txt", ".exe", ".js", ".java", ".python", ".ruby", ".c", ".bash", ".ps1", ".html", ".html5", ".css", ".json", ".abc"];
exts = []
name = "file.txt"
for x in range(0, len(fileName)):
    if name.endswith(fileName[x]):
        exts += [fileName[x]]

Another option is the mimetypes module. Note that it guesses the MIME type from the extension rather than returning the extension itself:
import mimetypes
mt = mimetypes.guess_type("file name")
mime_type = mt[0]
print(mime_type)

Here is a way to extract only the last file extension if a file has several:
import os
class functions:
    def listdir(self, filepath):
        return os.listdir(filepath)
func = functions()
os.chdir(r"C:\Users\Asus-pc\Downloads") #absolute path, change this to your directory
current_dir = os.getcwd()
for i in range(len(func.listdir(current_dir))): #i runs over the files and directories in current_dir
    if os.path.isfile((func.listdir(current_dir))[i]): #check if it is a file
        fileName = func.listdir(current_dir)[i] #put the current filename into a variable
        rev_fileName = fileName[::-1] #reverse the filename
        currentFileExtension = rev_fileName[:rev_fileName.index('.')][::-1] #extract from beginning until before .
        print(currentFileExtension) #output can be mp3,pdf,ini,exe, depends on the files in your directory
The output is, for example, mp3; it also works if the file has only one extension.

I'm definitely late to the party, but in case anyone wanted to achieve this without the use of another library:
file_path = "example_tar.tar.gz"
file_name, file_ext = [file_path if "." not in file_path else file_path.split(".")[0], "" if "." not in file_path else file_path[file_path.find(".") + 1:]]
print(file_name, file_ext)
The 2nd line is basically just the following code but crammed into one line:
def name_and_ext(file_path):
    if "." not in file_path:
        file_name = file_path
    else:
        file_name = file_path.split(".")[0]
    if "." not in file_path:
        file_ext = ""
    else:
        file_ext = file_path[file_path.find(".") + 1:]
    return [file_name, file_ext]
Even though this works, it might not work with all types of files, specifically .zshrc. I would recommend using the os.path.splitext function instead; example below:
import os
file_path = "example.tar.gz"
file_name, file_ext = os.path.splitext(file_path)
print(file_name, file_ext)
Cheers :)

# try this, it works for anything, any length of extension
# e.g www.google.com/downloads/file1.gz.rs -> .gz.rs
import os.path
class LinkChecker:
    @staticmethod
    def get_link_extension(link: str)->str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                # keep stripping extensions until none is left
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""

import os
def NewFileName(fichier):
    # Return fichier, or a "name-(n).ext" variant that does not exist yet
    cpt = 0
    fic , *ext = fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier
