In Python how to whitelist certain characters in a filename?

To secure uploaded image names, I'd like to strip everything from an image's filename except string.ascii_letters, string.digits, dots, and (one) whitespace.
So I'm wondering: what is the best way to check a text for any other characters?

import re
import os
s = 'asodgnasAIDID12313%*(#&(!$ 1231'
result = re.sub(r'[^a-zA-Z\d\. ]|( ){2,}', '', s)
if result == '' or os.path.splitext(result)[0].isspace():
    print "not a valid name"
else:
    print "valid name"
EDIT:
Changed it so it also whitelists only one whitespace, and added import re.

Not sure if it's what you need, but give it a try:
import os
fileName, fileExtension = os.path.splitext('image 11%%22.jpg')
fileExtension = fileExtension.encode('ascii', 'ignore')
fileName = fileName.encode('ascii', 'ignore')
if fileExtension[1:] in ['jpg', 'jpeg', 'png', 'gif', 'bmp', 'tiff', 'tga']:
    fileName = ''.join(e for e in fileName if e.isalnum())
    print fileName+fileExtension
    #image1122.jpg
else:
    print "Extension not supported"
See isalnum():
https://docs.python.org/2/library/stdtypes.html#str.isalnum

I wouldn't use regex for this. The only tricky requirement is the single space, but that can be done, too.
import string
whitelist = set(string.ascii_letters + string.digits)
good_filename = "herearesomelettersand123numbers andonespace"
bad_filename = "symbols&#! and more than one space"
def strip_filename(fname, whitelist):
    """Strips a filename.

    Removes any character from string `fname` that is not in `whitelist`
    and removes all but the first whitespace.
    """
    allowed = whitelist | {" "}  # copy, so the caller's set is not modified
    stripped = ''.join([ch for ch in fname if ch in allowed])
    split = stripped.split()
    if not split:  # nothing left after stripping
        return ""
    # keep the first space, drop the rest; rstrip handles single-word names
    result = " ".join([split[0], ''.join(split[1:])]).rstrip()
    return result
Then call it with:
good_sanitized = strip_filename(good_filename, whitelist)
bad_sanitized = strip_filename(bad_filename, whitelist)
print(good_sanitized)
# 'herearesomelettersand123numbers andonespace'
print(bad_sanitized)
# 'symbols andmorethanonespace'
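For comparison, the whitelist from the question (letters, digits, dot, space) can also be written as two small regex passes. This is only a sketch, not the approach of either answer above: the name `sanitize_filename` is hypothetical, and it assumes that "one whitespace" means each run of spaces collapses to a single space.

```python
import re

def sanitize_filename(name):
    """Keep ASCII letters, digits, dots and single spaces (hypothetical helper)."""
    # Pass 1: drop every character that is not on the whitelist.
    cleaned = re.sub(r"[^A-Za-z0-9. ]", "", name)
    # Pass 2: collapse runs of spaces into one space and trim both ends.
    return re.sub(r" {2,}", " ", cleaned).strip()

print(sanitize_filename("my  photo%%(1).jpg"))
```

An empty result means nothing usable was left, so the caller should still reject such names.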

Related

Replacing a set of characters in a string

I have the code below, which tries to remove the leading characters of a string by using an indicator for where to stop the trim.
I just want to know if there is a better way to do this.
# Get the user's input: file_name
file_name = str.casefold(input("Filename: ")).strip()
# Get the index of the last "." in the string
i = file_name.rfind(".")
# Get the file extension, starting after index i
ext = file_name[i+1:]
# Concatenate "image/" with the extracted file extension
new_fname = "image/" + ext
print(new_fname)

Looking at your code you can shorten it to:
file_name = input("Filename: ")
new_fname = f"image/{file_name.rsplit('.', maxsplit=1)[-1]}"
print(new_fname)
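One caveat with both versions above: when the name contains no dot, the whole name is treated as the extension. A minimal sketch using os.path.splitext, which returns an empty extension in that case, is shown here; the helper name `extension_of` is made up for illustration.

```python
import os.path

def extension_of(file_name):
    """Return the lower-cased extension without the dot, or '' if there is none."""
    # splitext splits at the last dot and keeps the dot in the second part.
    _, ext = os.path.splitext(file_name.strip().casefold())
    return ext[1:]

for name in ("photo.JPG", "archive.tar.gz", "README"):
    print(repr(name), "->", repr(extension_of(name)))
```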

Removing white space after looping a list

So, I've tried stripping several of the variables, and I know there is no white space before the return statement, so I tried stripping the variable in the return statement, but the white space is still there...
It's something easy, I'm sure, or maybe it would be best to rewrite the loop?
def main():
    file = input("File name:")
    extension(file)
def extension(s):
    split = (s.split("."))
    join_s = (''.join(split[1]))
    image_e = ['jpg', 'gif', 'jpeg', 'png']
    for i in image_e:
        print(image_e)
        if join_s in image_e:
            return print("Image/", join_s)
        else:
            return print("Application/", join_s)
main()
Output looks something like this:
Image/ jpg
Edit: One of the comments asked why I used return. It was because if I just used print, it would display the output 3-4 different times. Is there a reason why I shouldn't use return in this situation, and why exactly does it display it 4 times in a row? (I assume it's because of the loop.)

It looks like you want to generate a content type string. This will do it:
def extension(s):
    ext = s.rsplit('.', 1)[-1]  # split on the *last* period
    if ext in ('jpg', 'gif', 'jpeg', 'png'):
        return f'Image/{ext}'
    else:
        return f'Application/{ext}'
file = input('File name: ')
content_type = extension(file)
print(content_type)
Output:
File name: test.jpg
Image/jpg

Looks like you want to determine the mimetype from a given filename.
import mimetypes
filename = "somefilename.png"
guessed_type, encoding = mimetypes.guess_type(filename)
guessed_type:
image/png
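mimetypes.guess_type returns None as the type when it does not recognise the extension, so a caller usually wants a fallback. A small sketch follows; the wrapper name `content_type_for` and the choice of "application/octet-stream" as the default are assumptions, not part of the answer above.

```python
import mimetypes

def content_type_for(filename, default="application/octet-stream"):
    """Guess a MIME type from the filename, falling back to `default`."""
    guessed_type, _encoding = mimetypes.guess_type(filename)
    # guessed_type is None when the extension is missing or unknown.
    return guessed_type or default

print(content_type_for("somefilename.png"))
print(content_type_for("noextension"))
```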

Python has many features/functions available to you via the standard libraries.
Here are some other methods:
os
import os

filename = "somefilename.png"
base, ext = os.path.splitext(filename)
('somefilename', '.png')
pathlib
from pathlib import Path
filename = "somefilename.png"
f = Path(filename)
f.suffix
'.png'
Python strings have .startswith() and .endswith() methods, which can also take a tuple of candidates, so you can test the extension without splitting the string first:
filename = "somefilename.png"
image_exts = ('.jpg', '.gif', '.jpeg', '.png')  # leading dot avoids matching names like "mypng"
if filename.endswith(image_exts):
    ext = filename.split(".")[-1]
    print(f"Image/{ext}")

How to have all possibilities of path with specific beginning?

I have to check a file at a location that depends on a variable:
variable = "40014ee0aee34570"
os.path.realpath('/dev/disk/by-id/wwn-0x{0}'.format(variable))
I also need to check whether 40014ee0aee34570-part1, 40014ee0aee34570-part2, etc. exist.
For now I can do it like this:
os.path.realpath('/dev/disk/by-id/wwn-0x{0}{1}'.format(variable, '-part1'))
But how can I check every possible number after "part" in this line programmatically?
Thank you

I would suggest using the glob module.
import glob
import os
variable = "40014ee0aee34570"
# file wildcard, use * at the end to get all suffixes
file_wildcard = "/dev/disk/by-id/wwn-0x{0}*".format(variable)
possible_file_paths = glob.glob(file_wildcard)
for file_path in possible_file_paths:
    print(os.path.realpath(file_path))
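If only the "-partN" entries are wanted, in numeric order, the wildcard can be narrowed and the number parsed out of each match. The following is a sketch under assumptions: the function name `find_partitions` and its `base_dir` parameter are hypothetical, and it relies on the naming scheme described in the question.

```python
import glob
import os
import re

def find_partitions(wwn, base_dir="/dev/disk/by-id"):
    """Return sorted (part_number, resolved_path) pairs for wwn-0x<wwn>-partN."""
    # glob.escape guards against wildcard characters inside the identifier.
    pattern = os.path.join(base_dir, "wwn-0x{0}-part*".format(glob.escape(wwn)))
    found = []
    for path in glob.glob(pattern):
        match = re.search(r"-part(\d+)$", path)
        if match:  # skip anything that does not end in a number
            found.append((int(match.group(1)), os.path.realpath(path)))
    return sorted(found)

# Prints nothing if the device (or the directory) does not exist.
for number, device in find_partitions("40014ee0aee34570"):
    print(number, device)
```

Sorting on the parsed integer keeps part10 after part2, which a plain string sort would not.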

You can simply concatenate strings with +
Example:
string1 = "I am "
string2 = "foo"
string3 = string1 + string2
print(string3)
OUT[1]:
>> I am foo
Thus, you can use it to build the path to your file programmatically from all of its components (root path, variable name, suffixes, numbers, etc.):
import os
path = r"/dev/disk/by-id/"
file_prefix = "wwn-0x"
variable = "40014ee0aee34570"
file_suffix = "-part"
for number in range(0, 10):
    file = os.path.join(path, file_prefix + variable + file_suffix + str(number))
    if os.path.exists(file):
        print(f"{file} exists")
    else:
        print(f"no file with that part number {number}")

Python: How to change a filename to lowercase but NOT the extension

I'm trying to change filenames like WINDOW.txt to lowercase but then I also need to change the extension .txt to uppercase. I am thinking I can just change the entire thing to lowercase as the extension is already lowercase and then using something like .endswith() to change the extension to uppercase but I can't seem to figure it out. I know this may seem simple to most so thank you for your patience.

This one handles filenames and paths across different operating systems:
import os.path
def lower_base_upper_ext(path):
    """Filename to lowercase, extension to uppercase."""
    path, ext = os.path.splitext(path)
    head, tail = os.path.split(path)
    # os.path.join puts back the separator that os.path.split removed
    return os.path.join(head, tail.lower()) + ext.upper()
It leaves any directory names untouched; only the filename portion is lower-cased and the extension upper-cased.

oldname='HeLlO.world.TxT'
if '.' in oldname:
    (basename, ext) = oldname.rsplit('.', 1)
    newname = basename.lower() + '.' + ext.upper()
else:
    newname = oldname.lower()
print(f'{oldname} => {newname}')
...properly emits:
HeLlO.world.TxT => hello.world.TXT

name = "MyFile.txt"
new_name = name.rsplit(sep= ".", maxsplit=1)
print(new_name[0].lower()+"."+new_name[1].upper())

filename = "WINDOW.txt"
filename = filename.split('.')
filename = ".".join(filename[0:-1]).lower() + '.' + filename[-1].upper()
print(filename)
>> window.TXT
filename = "foo.bar.maz.txt"
filename = filename.split('.')
filename = ".".join(filename[0:-1]).lower() + '.' + filename[-1].upper()
print(filename)
>> foo.bar.maz.TXT

If I read the question correctly, it wants a lowercase name and an uppercase file extension, which is unusual, but here is a simple solution.
filename = "WINDOW.txt"
ext_ind = filename.rindex('.')
filename = filename[:ext_ind].lower() + '.' + filename[ext_ind+1:].upper()
print(filename)
>> window.TXT
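Several of the snippets above raise an error or misbehave when the name has no dot at all. A pathlib-based sketch that covers that case and leaves directory names alone is shown below; the function name `lower_stem_upper_suffix` is hypothetical, and PurePosixPath is used only so the example behaves the same on every platform (pathlib.Path would follow the local OS conventions).

```python
from pathlib import PurePosixPath

def lower_stem_upper_suffix(path):
    """Lower-case the final name's stem and upper-case its extension."""
    p = PurePosixPath(path)
    # stem is the name without the last suffix; suffix includes the dot
    # (and is '' when the name has no extension).
    return str(p.with_name(p.stem.lower() + p.suffix.upper()))

for sample in ("Docs/WINDOW.txt", "HeLlO.world.TxT", "README"):
    print(sample, "=>", lower_stem_upper_suffix(sample))
```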

Python: simple batch rename files in windows folder

I'm trying to write some simple code to batch rename the files in a Windows folder.
Musts:
change every number, e.g. in "file02.txt", turn 02 into 0002
ideally work for every file format, like jpg, png, txt, docx and so on (because I'm not sure what will be in the folder; this code might be used for image sequences...)
Is this possible?
I wrote test versions by combining the little knowledge I have, but I got confused.
My code so far:
import os
import sys
folder_path = os.listdir(raw_input("Insert folder path: "))
print "Files in folder: %s" % folder_path
# a split tool
def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail
# a function to make a new name with needed 0000
def new_filename(filename):
    file_name_part, ext = os.path.splitext(filename) # file01 and .ext
    original_name, number = mysplit(file_name_part) # file and 01
    add_zero = number.rjust(4, "0") # add 0001
    new_name = original_name + add_zero + ext
    print new_name
# new_name comes like this ['file0001.txt'] but separate, not in a list? Why?
for current_file_n in folder_path:
    new = new_filename(current_file_n)
    print list([new]) # trying to make the str into a list....
    re_name = os.renames(current_file_n, new)
    print re_name
print "Renamed files: %s" % folder_path
The desired outcome is the same list as at the beginning, but padded with zeros, like this: ['file0001.txt', 'file0002.txt', 'file0003.txt'......'file0015.txt']
I've gotten errors like a Windows error saying it can't find the file, and another error saying it can't concatenate str and list.
I need an explanation, as simple as possible, of what I'm doing wrong. Or is there another method I can use that will give me the desired outcome?

As martineau said, your indentation is messed up.
Here's the working code:
import os
import sys
# a split tool
def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail
# a function to make a new name with needed 0000
def new_filename(filename):
    file_name_part, ext = os.path.splitext(filename) # file01 and .ext
    original_name, number = mysplit(file_name_part) # file and 01
    add_zero = number.rjust(4, "0") # add 0001
    new_name = original_name + add_zero + ext
    return new_name
if __name__ == '__main__':
    folder = raw_input("Insert folder path: ")
    file_names = os.listdir(folder)
    print "Files in folder: %s" % file_names
    renamed_files = []
    for current_file_n in file_names:
        new = new_filename(current_file_n)
        renamed_files.append(new) # Add renamed file's name to a list
        try:
            # Join with the folder so the rename works from any working directory.
            # os.renames doesn't return anything.
            os.renames(os.path.join(folder, current_file_n), os.path.join(folder, new))
            print new
        except OSError:
            print "Unexpected error while renaming %s:%s" % (new, sys.exc_info()[0])
    print "Renamed files: %s" % renamed_files
Hope this helps

Your code can be simplified a lot by using regular expression substitution. re.sub() can take a replacement function; in this case, it adds leading zeroes to the first number found in the filename.
import os, re
def renumber_files(directory, zeroes=4):
    for filename in os.listdir(directory):
        new_name = re.sub(r'\d+', lambda m: m.group().zfill(zeroes), filename, count=1)
        # build full paths instead of changing the working directory
        os.rename(os.path.join(directory, filename), os.path.join(directory, new_name))
renumber_files(raw_input("Insert folder path: "))
This works because re.sub() can take a callable as the replacement argument.
Signature: re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping
occurrences of the pattern in string by the replacement repl. repl
can be either a string or a callable; if a string, backslash escapes
in it are processed. If it is a callable, it's passed the match
object and must return a replacement string to be used.
In the lambda, m.group() returns the string matched by the pattern \d+, for instance "1", "564645" or "005".
The next step, str.zfill(4), turns those into "0001", "564645", or "0005".
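The substitution can be tried on plain strings before any file is touched. A minimal sketch, with `pad_first_number` as a hypothetical helper name and made-up sample filenames:

```python
import re

def pad_first_number(filename, width=4):
    """Zero-pad the first run of digits in `filename` to `width` characters."""
    # count=1 limits the substitution to the first match only.
    return re.sub(r"\d+", lambda m: m.group().zfill(width), filename, count=1)

for name in ("file02.txt", "shot7_v2.png", "notes.txt"):
    print(name, "->", pad_first_number(name))
```

Names without digits come back unchanged, and numbers already longer than the width are left as they are.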
