Python: getting filename case as stored in Windows? - python

Though Windows is case insensitive, it does preserve case in filenames. In Python, is there any way to get a filename with case as it is stored on the file system?
E.g., in a Python program I have filename = "texas.txt", but want to know that it's actually stored "TEXAS.txt" on the file system, even if this is inconsequential for various file operations.

Here's the simplest way to do it:
>>> import win32api
>>> win32api.GetLongPathName(win32api.GetShortPathName('texas.txt'))
'TEXAS.txt'

I had problems with special characters with the win32api solution above. For unicode filenames you need to use:
win32api.GetLongPathNameW(win32api.GetShortPathName(path))

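This one uses only the standard library and converts all path parts (except the drive letter):
import glob
import os
import re

def casedpath(path):
    # Wrap the last character of each path part in [] so that glob
    # looks the name up on disk and returns it as stored.
    r = glob.glob(re.sub(r'([^:/\\])(?=[/\\]|$)|\[', r'[\g<0>]', path))
    return r and r[0] or path
And this one handles UNC paths in addition (os.path.splitunc exists only in Python 2; it was removed in Python 3.7, where os.path.splitdrive also splits UNC paths):
def casedpath_unc(path):
    unc, p = os.path.splitunc(path)
    r = glob.glob(unc + re.sub(r'([^:/\\])(?=[/\\]|$)|\[', r'[\g<0>]', p))
    return r and r[0] or path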
Note: This is somewhat slower than the file-system-dependent Win API GetShortPathName method, but it is independent of platform and file system, and it also works when short filename generation is switched off on Windows volumes (check with fsutil.exe 8dot3name query C:). Switching it off is recommended at least for performance-critical file systems once no 16-bit applications rely on short names any more:
fsutil.exe behavior set disable8dot3 1

>>> import os
>>> os.listdir("./")
['FiLeNaMe.txt']
Does this answer your question?

And if you want to recurse into directories:
import os
path = os.path.join("c:\\", "path")
for r, d, f in os.walk(path):
    for file in f:
        if file.lower() == "texas.txt":
            print("Found: " + os.path.join(r, file))

You could use:
import os
a = os.listdir('mydirpath')
b = [f.lower() for f in a]
try:
    i = b.index('texas.txt')
    print(a[i])
except ValueError:
    print('File not found in this directory')
This of course assumes that your search string 'texas.txt' is in lowercase. If it isn't you'll have to convert it to lowercase first.
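The lookup above can be wrapped in a small helper. This is a minimal sketch using only the standard library; the function name stored_case is hypothetical, and it resolves only the final path component against the directory listing, not every part of a longer path.

```python
import os


def stored_case(directory, name):
    """Return `name` as spelled in `directory`'s listing, or None if absent.

    The comparison is case-insensitive, so the caller may pass the name
    in any case.
    """
    wanted = name.lower()
    for entry in os.listdir(directory):
        if entry.lower() == wanted:
            return entry
    return None
```

Because it lowercases both sides, the search string does not have to be lowercase beforehand.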

Related

Using input to change directory path

I'm kinda new to python and I feel like the answer to this is so simple but I have no idea what the answer is. I'm trying to move files from one place to another but I don't want to have to change my code every time I wanna move that file so I just want to get user input from the terminal.
import shutil
loop = True
while loop:
    a = input()
    shutil.move("/home/Path/a", "/home/Path/Pictures")
What do I have to put around the a so that it doesn't read it as part of the string?
This should do what you want. os.path.join() combines the string value in a, which you get from input, with the first part of the path you have provided. You should use os.path.join() because it forms paths in a system-independent way.
import shutil
import os
loop = True
while loop:
    a = input()
    shutil.move(os.path.join("/home/Path/", a), "/home/Path/Pictures")
Output:
>>> a = input()
test.txt
>>> path = os.path.join("/home/Path/", a)
>>> path
'/home/Path/test.txt'
You can also use "/home/Path/{0}".format(a), which replaces {0} with the value of a, or you can use "/home/Path/" + str(a), which will also do what you want.
Edited to account for Question in comment:
This will work if your directory doesn't have any sub-directories. It may still work if there are directories and files in there, but I didn't test that.
import shutil
import os
files = os.listdir("/home/Path/")
for file in files:
    shutil.move(os.path.join("/home/Path/", file), "/home/Path/Pictures")
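For the untested case with sub-directories, one option is to move only regular files. The following is a minimal sketch, not the answerer's code; the function name move_regular_files is hypothetical. It skips every directory, which also covers a destination such as /home/Path/Pictures that sits inside the source directory.

```python
import os
import shutil


def move_regular_files(src_dir, dest_dir):
    """Move every regular file in src_dir into dest_dir; return moved names."""
    moved = []
    for name in sorted(os.listdir(src_dir)):
        source = os.path.join(src_dir, name)
        # Skip sub-directories, including dest_dir itself when it is
        # nested inside src_dir.
        if os.path.isfile(source):
            shutil.move(source, os.path.join(dest_dir, name))
            moved.append(name)
    return moved
```

The returned list makes it easy to report what was moved.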
One solution:
>>> a = 'test.csv'
>>> path = '/home/Path/{}'.format(a)
>>> path
'/home/Path/test.csv'

Check tar archive before extractall

In the Python documentation, it is advised not to extract a tar archive without prior inspection. What is the best way to make sure an archive is safe using the tarfile Python module? Should I just iterate over all the filenames and check whether they contain absolute pathnames?
Would something like the following be sufficient?
import sys
import tarfile
with tarfile.open('sample.tar', 'r') as tarf:
    for n in tarf.getnames():
        if n[0] == '/' or n[0:2] == '..':
            print('sample.tar contains unsafe filenames')
            sys.exit(1)
    tarf.extractall()
Edit
This script is not compatible with Python versions prior to 2.7 (see the documentation for the with statement and the tarfile module).
I now iterate over the members:
import os
import sys
import tarfile
from contextlib import closing
target_dir = "/target/"
with closing(tarfile.open('sample.tar', mode='r:gz')) as tarf:
    for m in tarf:
        pathn = os.path.abspath(os.path.join(target_dir, m.name))
        if not pathn.startswith(target_dir):
            print('The tar file contains unsafe filenames. Aborting.')
            sys.exit(1)
        tarf.extract(m, path=target_dir)
Almost, although it would still be possible to have a path like foo/../../.
Better would be to use os.path.join and os.path.abspath, which together will correctly handle leading / and ..s anywhere in the path:
target_dir = "/target/"  # trailing slash is important
with tarfile.open(…) as tarf:
    for n in tarf.getnames():
        if not os.path.abspath(os.path.join(target_dir, n)).startswith(target_dir):
            print("unsafe filenames!")
            sys.exit(1)
    tarf.extractall(path=target_dir)
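The same check can be written so that it does not depend on the trailing slash. This is a minimal sketch, not the answerer's code: the function name unsafe_member_names is hypothetical, and it swaps the startswith test for os.path.commonpath (available in Python 3.4 and later). It inspects member names only and does not look at symbolic links, hard links or device files inside the archive.

```python
import os
import tarfile


def unsafe_member_names(tar, target_dir):
    """Return names of members that would land outside target_dir."""
    root = os.path.abspath(target_dir)
    unsafe = []
    for member in tar.getmembers():
        destination = os.path.abspath(os.path.join(root, member.name))
        # commonpath compares whole path components, so "/target_evil"
        # is not mistaken for something inside "/target".
        if os.path.commonpath([root, destination]) != root:
            unsafe.append(member.name)
    return unsafe
```

A caller would extract only when the returned list is empty. Recent Python versions also accept a filter argument to extractall (for example filter='data'), which is worth checking in the tarfile documentation.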

Working with relative paths

When I run the following script:
c:\Program Files\foo\bar\scripy.py
How can I refer to directory 'foo'?
Is there a convenient way of using relative paths?
I've done it before with the string module, but there must be a better way (I couldn't find it in os.path).
The os.path module includes various functions for working with paths like this. The convention in most operating systems is to use .. to go "up one level", so to get the outside directory you could do this:
import os
import os.path
current_dir = os.getcwd()  # find the current directory
print(current_dir)  # c:\Program Files\foo\bar (when run from the script's directory)
parent = os.path.join(current_dir, "..")  # construct a path to its parent
print(parent)  # c:\Program Files\foo\bar\..
normal_parent = os.path.normpath(parent)  # "normalize" the path
print(normal_parent)  # c:\Program Files\foo
# or on one line:
print(os.path.normpath(os.path.join(os.getcwd(), "..")))
os.path.dirname(path)
This returns the first element of the pair produced by os.path.split(path) (head, the directory; tail, the file). Put simply, it returns the directory the path is in. You'll need to call it twice, but this is probably the best way.
Python Docs on path functions:
http://docs.python.org/library/os.path#os.path.expanduser
I have recently started using the unipath library instead of os.path. Its object-oriented representations of paths are much simpler:
from unipath import Path
original = Path(__file__) # .absolute() # r'c:\Program Files\foo\bar\scripy.py'
target = original.parent.parent
print target # Path(u'c:\\Program Files\\foo')
Path is a subclass of str so you can use it with standard filesystem functions, but it also provides alternatives for many of them:
print target.isdir() # True
numbers_dir = target.child('numbers')
print numbers_dir.exists() # False
numbers_dir.mkdir()
print numbers_dir.exists() # True
for n in range(10):
    file_path = numbers_dir.child('%s.txt' % (n,))
    file_path.write_file("Hello world %s!\n" % (n,), 'wt')
This is a bit tricky. For instance, the following code:
import sys
import os
z = sys.argv[0]
p = os.path.dirname(z)
f = os.path.abspath(p)
print "argv[0]={0} , dirname={1} , abspath={2}\n".format(z,p,f)
gives this output on Windows
argv[0]=../zzz.py , dirname=.. , abspath=C:\Users\michael\Downloads
First of all, notice that argv has the slash which I typed in the command python ../zzz.py, while the absolute path has the normal Windows backslashes. If you need to be cross-platform you should probably refrain from putting regular slashes on Python command lines, and use os.sep to refer to the character that separates pathname components.
So far I have only partly answered your question. There are a couple of ways to use the value of f to get what you want. Brute force is to use something like:
targetpath = f + os.sep + ".." + os.sep + ".."
which would result in something like C:\Users\michael\Downloads\..\.. on Windows and /home/michael/../.. on Unix. Each .. goes back one step and is the equivalent of removing the pathname component.
But you could do better by breaking up the path:
target = f.split(os.sep)
targetpath = os.sep.join(target[:-2])
and rejoining all but the last two bits to get C:\Users on Windows and / on Unix. If you do that it might be a good idea to check that there are enough pathname components to remove.
Note that I ran the program above by typing python ../zzz.py. In other words, I was not in the same working directory as the script, so getcwd() would not be useful.
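Since the working directory is unreliable, the usual starting point is the script's own path. Here is a minimal sketch built on os.path.dirname; the function name ancestor_dir is hypothetical, and the caller is assumed to pass an absolute path such as os.path.abspath(__file__).

```python
import os


def ancestor_dir(path, levels=1):
    """Return the directory `levels` steps above the one containing `path`."""
    # First dirname call: the directory that holds the file itself.
    result = os.path.dirname(os.path.normpath(path))
    # Each further call removes one more trailing component.
    for _ in range(levels):
        result = os.path.dirname(result)
    return result
```

For a script at c:\Program Files\foo\bar\scripy.py, ancestor_dir(os.path.abspath(__file__)) would refer to the foo directory from the question.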

Python: How to create a unique file name?

I have a python web form with two options - File upload and textarea. I need to take the values from each and pass them to another command-line program. I can easily pass the file name with file upload options, but I am not sure how to pass the value of the textarea.
I think what I need to do is:
Generate a unique file name
Create a temporary file with that name in the working directory
Save the values passed from textarea into the temporary file
Execute the commandline program from inside my python module and pass it the name of the temporary file
I am not sure how to generate a unique file name. Can anybody give me some tips on how to generate a unique file name? Any algorithms, suggestions, and lines of code are appreciated.
Thanks for your concern
I didn't think your question was very clear, but if all you need is a unique file name...
import uuid
unique_filename = str(uuid.uuid4())
If you want to make temporary files in Python, there's a module called tempfile in Python's standard libraries. If you want to launch other programs to operate on the file, use tempfile.mkstemp() to create files, and os.fdopen() to access the file descriptors that mkstemp() gives you.
Incidentally, you say you're running commands from a Python program? You should almost certainly be using the subprocess module.
So you can quite merrily write code that looks like:
import subprocess
import tempfile
import os
(fd, filename) = tempfile.mkstemp()
try:
    tfile = os.fdopen(fd, "w")
    tfile.write("Hello, world!\n")
    tfile.close()
    subprocess.Popen(["/bin/cat", filename]).wait()
finally:
    os.remove(filename)
Running that, you should find that the cat command worked perfectly well, but the temporary file was deleted in the finally block. Be aware that you have to delete the temporary file that mkstemp() returns yourself - the library has no way of knowing when you're done with it!
(Edit: I had presumed that NamedTemporaryFile did exactly what you're after, but that might not be so convenient - the file gets deleted immediately when the temp file object is closed, and having other processes open the file before you've closed it won't work on some platforms, notably Windows. Sorry, fail on my part.)
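For the textarea case in the question, the mkstemp() steps can be collected into one helper. This is a minimal sketch under the assumption that the submitted value is already a string; the function name save_text_to_temp_file is hypothetical.

```python
import os
import tempfile


def save_text_to_temp_file(text, suffix=".txt"):
    """Write text to a new, uniquely named temporary file; return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, text=True)
    # os.fdopen takes ownership of the descriptor and closes it on exit.
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path
```

The returned path can be passed to the command-line program; the caller is still responsible for removing the file with os.remove() afterwards.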
The uuid module would be a good choice, I prefer to use uuid.uuid4().hex as random filename because it will return a hex string without dashes.
import uuid
filename = uuid.uuid4().hex
The output should look like this:
>>> import uuid
>>> uuid.uuid4()
UUID('20818854-3564-415c-9edc-9262fbb54c82')
>>> str(uuid.uuid4())
'f705a69a-8e98-442b-bd2e-9de010132dc4'
>>> uuid.uuid4().hex
'5ad02dfb08a04d889e3aa9545985e304' # <-- this one
Maybe you need a unique temporary file?
import tempfile
f = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
print(f.name)
f.close()
f is an open file object. delete=False means the file is not deleted when it is closed.
If you need control over the name of the file, there are optional prefix=... and suffix=... arguments that take strings. See https://docs.python.org/3/library/tempfile.html.
You can use the datetime module
import datetime
uniq_filename = str(datetime.datetime.now().date()) + '_' + str(datetime.datetime.now().time()).replace(':', '.')
Note that I am using replace() because colons are not allowed in filenames on many operating systems.
That's it. This gives a different filename on each call, as long as no two calls produce the same timestamp.
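A timestamp alone can repeat if two names are requested at the same instant, for example by concurrent web requests. The sketch below is one way to guard against that, not part of the answer above: it reads the clock once and appends a short random suffix from uuid4. The function name timestamped_name and its default arguments are hypothetical.

```python
import datetime
import uuid


def timestamped_name(prefix="upload", extension=".txt"):
    """Build a file name from one timestamp plus a random suffix."""
    # Read the clock once so date and time cannot straddle midnight.
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H.%M.%S.%f")
    # The random part keeps names apart even if two calls share a timestamp.
    return "{}_{}_{}{}".format(prefix, stamp, uuid.uuid4().hex[:8], extension)
```

The strftime format uses dots instead of colons, so no replace() call is needed.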
In case you need short unique IDs as your filename, try shortuuid. It uses lowercase and uppercase letters and digits, and removes similar-looking characters such as l, 1, I, O and 0.
>>> import shortuuid
>>> ui = shortuuid.uuid()
>>> ui
'Tw8VgM47kSS5iX2m8NExNa'
>>> len(ui)
22
compared to
>>> import uuid
>>> unique_filename = str(uuid.uuid4())
>>> len(unique_filename)
36
>>> unique_filename
'2d303ad1-79a1-4c1a-81f3-beea761b5fdf'
I came across this question, and I will add my solution for those who may be looking for something similar. My approach was just to make a random file name from ascii characters. It will be unique with a good probability.
from random import sample
from string import digits, ascii_uppercase, ascii_lowercase
from tempfile import gettempdir
from os import path
def rand_fname(suffix, length=8):
    chars = ascii_lowercase + ascii_uppercase + digits
    fname = path.join(gettempdir(), 'tmp-'
                      + ''.join(sample(chars, length)) + suffix)
    return fname if not path.exists(fname) \
        else rand_fname(suffix, length)
This can be done using the unique function in ufp.path module.
import ufp.path
ufp.path.unique('./test.ext')
If a 'test.ext' file already exists in the current path, the ufp.path.unique function returns './test (d1).ext'.
To create a unique file path when the file already exists, use the random module to generate a new name for the file. You may refer to the code below.
import os
import random
import string
def getUniquePath(folder, filename):
    path = os.path.join(folder, filename)
    # Split the extension off the file name only, so dots in folder names are left alone.
    root, ext = os.path.splitext(filename)
    while os.path.exists(path):
        suffix = ''.join(random.choice(string.ascii_lowercase) for i in range(10))
        path = os.path.join(folder, root + suffix + ext)
    return path
Now you can use this path to create the file.

Why doesn't os.path.join() work in this case?

The code below does not join the paths; when debugged, the result holds only the last entry rather than the whole path.
os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/')
When I test this it only stores the /new_sandbox/ part of the code.
The latter strings shouldn't start with a slash. If they start with a slash, then they're considered an "absolute path" and everything before them is discarded.
Quoting the Python docs for os.path.join:
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
Note on Windows, the behaviour in relation to drive letters, which seems to have changed compared to earlier Python versions:
On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
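The drive-letter rules quoted above can be tried on any platform by importing ntpath, the Windows implementation behind os.path. This is a small illustrative sketch; the example paths are made up.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# A rooted component without a drive keeps the current drive letter.
rooted = ntpath.join("c:\\a", "\\foo")

# A component with its own drive letter discards everything before it.
other_drive = ntpath.join("c:\\a", "d:\\b")

# A bare drive plus a name is relative to that drive's current directory.
drive_relative = ntpath.join("c:", "foo")

print(rooted, other_drive, drive_relative)
```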
The idea of os.path.join() is to make your program cross-platform (linux/windows/etc).
Even one slash ruins it.
So it only makes sense when being used with some kind of a reference point like
os.environ['HOME'] or os.path.dirname(__file__).
os.path.join() can be used in conjunction with os.path.sep to create an absolute rather than relative path.
os.path.join(os.path.sep, 'home','build','test','sandboxes',todaystr,'new_sandbox')
Do not use forward slashes at the beginning of path components, except when referring to the root directory:
os.path.join('/home/build/test/sandboxes', todaystr, 'new_sandbox')
see also: http://docs.python.org/library/os.path.html#os.path.join
To help understand why this surprising behavior isn't entirely terrible, consider an application which accepts a config file name as an argument:
config_root = "/etc/myapp.conf/"
file_name = os.path.join(config_root, sys.argv[1])
If the application is executed with:
$ myapp foo.conf
The config file /etc/myapp.conf/foo.conf will be used.
But consider what happens if the application is called with:
$ myapp /some/path/bar.conf
Then myapp should use the config file at /some/path/bar.conf (and not /etc/myapp.conf/some/path/bar.conf or similar).
It may not be great, but I believe this is the motivation for the absolute path behaviour.
It's because your '/new_sandbox/' begins with a / and thus is assumed to be relative to the root directory. Remove the leading /.
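The effect of the leading slash is easy to reproduce. The sketch below uses posixpath, the POSIX implementation behind os.path, so it behaves the same on every platform; the helper name join_below and the value of todaystr are hypothetical examples.

```python
import posixpath  # POSIX flavour of os.path, importable on any platform


def join_below(base, *parts):
    """Join parts under base, treating every part as relative."""
    cleaned = [part.strip("/") for part in parts]
    return posixpath.join(base, *cleaned)


todaystr = "042118"  # example value

# The rooted last component discards everything before it.
surprising = posixpath.join("/home/build/test/sandboxes/", todaystr, "/new_sandbox/")

# Stripping the slashes first keeps the base directory.
expected = join_below("/home/build/test/sandboxes/", todaystr, "/new_sandbox/")
```

Stripping is only appropriate when the later components are known to be meant as relative; it would hide a genuinely absolute path supplied by a user.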
Try a combination of split("/") and * for strings that already contain separators.
import os
home = '/home/build/test/sandboxes/'
todaystr = '042118'
new = '/new_sandbox/'
os.path.join(*home.split("/"), todaystr, *new.split("/"))
How it works:
split("/") turns the existing path into a list: ['', 'home', 'build', 'test', 'sandboxes', '']
* in front of the list passes each item of the list as its own parameter
Note that because the first item is an empty string, the result loses its leading slash: 'home/build/test/sandboxes/042118/new_sandbox/'.
To make your function more portable, use it as such:
os.path.join(os.sep, 'home', 'build', 'test', 'sandboxes', todaystr, 'new_sandbox')
or
os.path.join(os.environ.get("HOME"), 'test', 'sandboxes', todaystr, 'new_sandbox')
Do it like this, without the extra slashes:
root="/home"
os.path.join(root,"build","test","sandboxes",todaystr,"new_sandbox")
Try with new_sandbox only
os.path.join('/home/build/test/sandboxes/', todaystr, 'new_sandbox')
>>> os.path.join("a", *"/b".split(os.sep))
'a/b'
A fuller version:
import os
def join(p, f, sep=os.sep):
    f = os.path.normpath(f)
    if p == "":
        return f
    else:
        p = os.path.normpath(p)
        return os.path.join(p, *f.split(os.sep))
def test(p, f, sep=os.sep):
    print("os.path.join({}, {}) => {}".format(p, f, os.path.join(p, f)))
    print(" join({}, {}) => {}".format(p, f, join(p, f, sep)))
if __name__ == "__main__":
    # /a/b/c for all
    test("\\a\\b", "\\c", "\\")  # optionally pass in the sep you are using locally
    test("/a/b", "/c", "/")
    test("/a/b", "c")
    test("/a/b/", "c")
    test("", "/c")
    test("", "c")
Note that a similar issue can bite you if you use os.path.join() to include an extension that already includes a dot, which is what happens automatically when you use os.path.splitext(). In this example:
components = os.path.splitext(filename)
prefix = components[0]
extension = components[1]
return os.path.join("avatars", instance.username, prefix, extension)
Even though extension might be .jpg you end up with a folder named "foobar" rather than a file called "foobar.jpg". To prevent this you need to append the extension separately:
return os.path.join("avatars", instance.username, prefix) + extension
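The difference between the two return statements can be checked directly. This is a small sketch using posixpath so the separators are the same on every platform; the user name "alice" is a made-up stand-in for instance.username.

```python
import posixpath  # POSIX flavour of os.path, importable on any platform

prefix, extension = posixpath.splitext("foobar.jpg")

# Passing the extension as a component makes "foobar" a directory.
wrong = posixpath.join("avatars", "alice", prefix, extension)

# Concatenating it afterwards keeps "foobar.jpg" as the file name.
right = posixpath.join("avatars", "alice", prefix) + extension
```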
you can strip the '/':
>>> os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/'.strip('/'))
'/home/build/test/sandboxes/04122019/new_sandbox'
I'd recommend stripping os.path.sep from the second and following strings, which prevents them from being interpreted as absolute paths:
first_path_str = '/home/build/test/sandboxes/'
original_other_path_to_append_ls = [todaystr, '/new_sandbox/']
other_path_to_append_ls = [
    i_path.strip(os.path.sep) for i_path in original_other_path_to_append_ls
]
output_path = os.path.join(first_path_str, *other_path_to_append_ls)
The problem may be that your laptop is running Windows, and Windows annoyingly uses a backslash instead of a forward slash '/'.
To make your program cross-platform (Linux/Windows/etc.), you shouldn't provide any slashes (forward or backward) in your path if you want os.path.join to handle them properly. You should use:
os.path.join(os.environ.get("HOME"), 'test', 'sandboxes', todaystr, 'new_sandbox')
Or throw some Path(__file__).resolve().parent (path to parent of current file) or anything so that you don't use any slash inside os.path.join
Please refer following code snippet for understanding os.path.join(a, b)
a = '/home/user.name/foo/'
b = '/bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
OR
a = '/home/user.name/foo'
b = '/bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
But, when
a = '/home/user.name/foo/'
b = 'bar/file_name.extension'
print(os.path.join(a, b))
>>> /home/user.name/foo/bar/file_name.extension
OR
a = '/home/user.name/foo'
b = 'bar/file_name.extension'
print(os.path.join(a, b))
>>> /home/user.name/foo/bar/file_name.extension
