Make glob directory variable - python

I'm trying to write a Python script that searches a folder for all files with the .txt extension. In the manuals, I have only seen it hardcoded into glob.glob("hardcoded path").
How do I make the directory that glob searches for patterns a variable? Specifically, one taken from user input.
This is what I tried:
import glob
input_directory = input("Please specify input folder: ")
txt_files = glob.glob(input_directory+"*.txt")
print(txt_files)
Despite giving the right directory with the .txt files, the script prints an empty list [ ].

If you are not sure whether a path contains a separator symbol at the end (usually '/' or '\'), you can concatenate using os.path.join. This is a much more portable method than appending your local OS's path separator manually, and much shorter than writing a conditional to determine if you need to every time:
import glob
import os
input_directory = input('Please specify input folder: ')
txt_files = glob.glob(os.path.join(input_directory, '*.txt'))
print(txt_files)
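To see why the original attempt prints an empty list, it can help to compare the two patterns side by side. The following is only a sketch: the temporary directory, the file name notes.txt and the helper name find_txt are assumptions made for the demo and do not come from the question.

```python
import glob
import os
import tempfile

def find_txt(directory):
    """Return the .txt files directly inside `directory` (hypothetical helper)."""
    return sorted(glob.glob(os.path.join(directory, '*.txt')))

# Demo in a throwaway directory so nothing permanent is created.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'notes.txt'), 'w').close()

    # Plain concatenation without a trailing separator builds '<tmp>*.txt',
    # which looks for siblings of tmp, not for files inside it.
    concatenated = glob.glob(tmp + '*.txt')

    # os.path.join adds the separator only when it is missing, so both
    # spellings of the directory give the same result.
    without_sep = find_txt(tmp)
    with_sep = find_txt(tmp + os.sep)
```

Here concatenated stays empty while without_sep and with_sep both contain the one file, which matches the behaviour described in the question.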

For Python 3.4+, you can use pathlib.Path.glob() for this:
import pathlib
input_directory = pathlib.Path(input('Please specify input folder: '))
if not input_directory.is_dir():
    # Input is invalid. Bail or ask for a new input.
    raise SystemExit(f'Not a directory: {input_directory}')
for file in input_directory.glob('*.txt'):
    # Do something with file.
    print(file)
There is a time-of-check to time-of-use race between is_dir() and glob(), which unfortunately cannot easily be avoided because glob() just returns an empty iterator when the directory does not exist. On Windows, it may not even be possible to avoid because you cannot open directories to get a file descriptor. This is probably fine in most cases, but could be a problem if your application has a different set of privileges from the end user or from other applications with write access to the parent directory. This problem also applies to any solution using glob.glob(), which has the same behavior.
Finally, Path.glob() returns an iterator, and not a list. So you need to loop over it as shown, or pass it to list() to materialize it.
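As a small illustration of materializing the iterator, the sketch below wraps the call in sorted(), which consumes the iterator and also gives a stable order. The helper name list_txt_files and the demo file names are assumptions for the example.

```python
import pathlib
import tempfile

def list_txt_files(directory):
    """Return a sorted list of .txt paths in `directory` (hypothetical helper)."""
    directory = pathlib.Path(directory)
    # glob() is lazy; sorted() consumes it and returns a real list.
    return sorted(directory.glob('*.txt'))

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b.txt', 'a.txt', 'c.log'):
        (pathlib.Path(tmp) / name).touch()
    found = [p.name for p in list_txt_files(tmp)]
```

The resulting list can be indexed, counted with len(), or iterated more than once, none of which is possible with the bare iterator.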

Related

How to create a unique folder name (location path) in Windows?

I am writing a script to save some images in a folder each time it runs.
I would like to make a new folder each time it runs, with enumerated folder names. For example, the first time I run it, it saves the images in C:\images\folder1; the next time I run it, it saves them in C:\images\folder2, then C:\images\folder3, and so on.
And if I delete these folders and start running it again, it should start from "C:\images\folder1" again.
I found this solution works for file names but not for the folder names:
Create file but if name exists add number
The pathlib library is the standard Pythonic way of dealing with any kind of folders or files and is system independent. As far as creating a new folder name goes, that could be done in a number of ways. You could check for the existence of each folder (like Patrick Gorman's answer), or you could save a user config file with a counter that keeps track of where you left off, or you could call your folder-creation function again with the counter advanced if the name already exists. If you are planning on having a large number of sub-directories (millions), then you might consider performing a binary search for the next folder to create (instead of iterating through the directory).
Anyway, in Windows, creating a file/folder with the same name adds a (2), (3), (4), etc. to the name. The space and parentheses make it particularly easy to identify the number of the file/folder. If you want the number directly appended, like folder1, folder2, folder3, etc., then that becomes a little tricky to detect. We essentially need to check whether the folder name ends with an integer. Finding particular expressions within a tricky string is normally done with re (regular expressions). If we had a space and parentheses we probably wouldn't need re to detect the integer in the string.
from pathlib import Path
import re
def create_folder(string_or_path):
    path = Path(string_or_path)
    if not path.exists():
        # You can't create files and folders with the same name in Windows. Hence, check exists.
        path.mkdir()
    else:
        # Check if string ends with numbers and group the first part and the numbers.
        search = re.search('(.*?)([0-9]+$)', path.name)
        if search:
            basename, ending = search.groups()
            newname = basename + str(int(ending) + 1)
        else:
            newname = path.name + '1'
        create_folder(path.parent.joinpath(newname))
path = Path(r'C:\images\folder1')
create_folder(path)  # creates folder1
create_folder(path)  # creates folder2, since folder1 exists
create_folder(path)  # creates folder3, since folder1 and 2 exist
path = Path(r'C:\images\space')
create_folder(path)  # creates space
create_folder(path)  # creates space1, since space exists
Note: Be sure to use raw strings when dealing with Windows paths, since "\f" means something in a Python string; hence you either have to write "\\f" or tell Python it is a raw string.
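The effect of the escape can be checked directly. This is a minimal sketch using a shortened, made-up path so that "\f" is the only escape sequence involved.

```python
plain = "C:\folder1"     # "\f" is parsed as one form-feed character
escaped = "C:\\folder1"  # doubled backslash gives a literal backslash
raw = r"C:\folder1"      # raw string keeps the backslash as written

# The escaped and raw spellings are the same string; the plain one is not.
same = (escaped == raw)
has_form_feed = "\x0c" in plain
```

The plain version is one character shorter than the other two and contains no backslash at all, so it cannot name the intended folder.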
I feel like you could do something by getting a list of the directories and then looping over numbers 1 to n for the different possible directories until one can't be found.
from pathlib import Path
import os
path = Path('.')
folder = "folder"
i = 1
# Collect names as strings: iterdir() yields Path objects, which never equal a str.
dirs = {e.name for e in path.iterdir() if e.is_dir()}
while True:
    if folder + str(i) not in dirs:
        folder = folder + str(i)
        break
    i = i + 1
os.mkdir(path / folder)
I'm sorry if I made any typos, but that seems like a way that should work.
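A variation on the same loop is to skip the directory listing and let mkdir() report whether a name is taken. This is only a sketch under assumptions: the helper name make_next_folder and the default prefix are invented for the example, and the demo runs in a temporary directory instead of C:\images.

```python
import tempfile
from pathlib import Path

def make_next_folder(parent, prefix="folder"):
    """Create and return the first free `<prefix><n>` directory under `parent`."""
    parent = Path(parent)
    n = 1
    while True:
        candidate = parent / f"{prefix}{n}"
        try:
            # mkdir() raises FileExistsError when the name is already taken,
            # so checking and creating happen in a single step.
            candidate.mkdir()
            return candidate
        except FileExistsError:
            n += 1

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = [make_next_folder(tmp).name for _ in range(3)]
    # Deleting a folder frees its number for the next call.
    (Path(tmp) / "folder2").rmdir()
    reused = make_next_folder(tmp).name
```

Because the numbering always starts at 1, deleting all the folders makes the next run start from folder1 again, as the question asks. Note that gaps left by deleted folders are also refilled, which may or may not be what you want.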

Os.path gives unexpected output

I recently started working with the os module in Python and have now reached os.path, so here is my question. I ran this method in one of my Kivy projects just for testing, and it did not return the correct output. The method checks whether a directory exists and returns a list of the image files in it; otherwise it prints "Invalid Path..." and returns -1. I passed in an existing directory and it returned -1. The weird part is that when I run a similar program outside my Kivy project, using the same path with the directory in the same folder as my Python file, it returns the desired output. Here is the image showing the Python file and the directory I tested, which returns the invalid path message.
and here is my code snippet
def get_imgs(self, img_path):
    if not os.path.exists(img_path):
        print("Invalid Path...")
        return -1
    else:
        all_files = os.listdir(img_path)
        imgs = []
        for f in all_files:
            if (
                f.endswith(".png")
                or f.endswith(".PNG")
                or f.endswith(".jpg")
                or f.endswith(".JPG")
                or f.endswith(".jpeg")
                or f.endswith(".JPEG")
            ):
                imgs.append("/".join([img_path, f]))
        return imgs
It's tough to tell without seeing the code with your function call. Whatever argument you're passing must not be a valid path. I use the os module regularly and have slowly learned a lot of useful methods. I always print out the paths I'm reading from or writing to before using them, so that if anything unexpected happens I can see the value of a variable such as img_path. Copy and paste the path into File Explorer, up to the directory, and make sure that's all good.
Some other os functions you may find useful, based on your code:
os.path.join(<directory>, <file_name.ext>) is much more intuitive than imgs.append("/".join([img_path, f]))
os.getcwd() gets your working directory (which I print at the start of scripts in dev to quickly address issues before debugging). I typically use full paths to play it safe because Python pathing can cause differences/issues when running from cmd vs. PyCharm
os.path.basename(f) gives you the file name, while os.path.dirname(f) gives you the directory.
It seems like a better approach to this is to use pathlib and glob. You can iterate over directories and use wild cards.
Look at these:
iterating over directories: How can I iterate over files in a given directory?
different file types: Python glob multiple filetypes
Then you don't even need to check whether os.path.exists(img_path) because this will read the files directly from your file system. There are also more wildcards in the glob module, such as * for any run of characters, ? for any single character, and [0-9] for any single digit, documented here: https://docs.python.org/3/library/glob.html
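A pathlib version of the question's function might look roughly like the sketch below. It is not the asker's code: it is written as a plain function instead of a method, returns an empty list instead of -1 for a bad path, and compares the lower-cased suffix, so it also accepts mixed-case extensions that the original checks would skip. The name IMAGE_SUFFIXES is an assumption for the example.

```python
import tempfile
from pathlib import Path

# Assumed set of extensions, taken from the checks in the question.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def get_imgs(img_path):
    """Return image paths in `img_path` as strings, or [] if it is not a directory."""
    folder = Path(img_path)
    if not folder.is_dir():
        return []
    return sorted(
        str(p) for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.PNG", "b.jpeg", "notes.txt"):
        (Path(tmp) / name).touch()
    image_names = [Path(p).name for p in get_imgs(tmp)]
    missing = get_imgs(Path(tmp) / "does_not_exist")
```

When a relative path behaves differently inside and outside a project, printing Path(img_path).resolve() shows the absolute location that the path is actually being resolved to from the current working directory.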

How can I read files with similar names on python, rename them and then work with them?

I've already posted here with the same question, but sadly I couldn't come up with a solution (some of you gave me awesome answers, but most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them for more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files I want to rename them, as I don't want to refer to them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch
examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you try to fetch files with similar names (at least a re-occurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os
all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    # create_new_path is a function you write yourself; see below.
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library allows for searching files with * wildcards and makes it hence possible to search for files with a specific pattern. It lists all the files in a certain directory (or multiple directories if you include a * wildcard as a directory). When you iterate over the files, you could either directly work with the input of the files (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you would need to generate a new path - so you would have to write the create_new_path function that takes the old path and creates a new one.
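What create_new_path does depends entirely on the naming scheme you want. As one possible sketch, the version below assumes the goal is to turn a name like GMATd_1.txt into GMATds1.txt (mirroring the variable names in the question) and leaves names without an underscore unchanged. That mapping is an assumption, not something the question specifies.

```python
import os

def create_new_path(old_path):
    """Map '.../GMATd_1.txt' to '.../GMATds1.txt' (assumed naming scheme)."""
    directory, filename = os.path.split(old_path)
    stem, ext = os.path.splitext(filename)
    # Split at the last underscore: 'GMATd_1' -> ('GMATd', '_', '1').
    prefix, sep, number = stem.rpartition('_')
    if not sep:
        # No underscore in the name: leave the path unchanged.
        return old_path
    return os.path.join(directory, f"{prefix}s{number}{ext}")

# Pure string manipulation, so it can be tried without touching any files.
example = create_new_path(os.path.join('path', 'GMATd_1.txt'))
untouched = create_new_path('plain.txt')
```

Keeping the function free of file-system calls makes it easy to check the mapping before letting os.rename act on real files.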
Since Python 3.4 you should be using the built-in pathlib package instead of os or glob.
from pathlib import Path
import shutil
for file_src in Path("path/to/files").glob("GMAT*.txt"):
    # Note: replace() acts on the whole resolved path string, not just the file name.
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(str(file_src), file_dest)
you can use
import os
path='.....' # path where these files are located
path1='.....' ## path where you want these files to store
i=1
for file in os.listdir(path):
    # str.endswith() takes the suffix positionally; it has no 'end' keyword.
    if file.endswith('.txt'):
        os.rename(path + "/" + file, path1 + "/"+str(i) + ".txt")
        i+=1
it will rename all the txt file in the source folder to 1,2,3,....n.txt
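If the real goal is short names to use elsewhere in the script, and not new names on disk, a dictionary built once by a function and returned to the caller may be enough. The sketch below is one way to do that; the helper name index_outputs and the key scheme (file stem with the underscore removed) are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def index_outputs(directory, pattern="GMAT*.txt"):
    """Map a short key such as 'GMATd1' to the file's path (key scheme is assumed)."""
    return {
        p.stem.replace("_", ""): p
        for p in sorted(Path(directory).glob(pattern))
    }

# Demo in a throwaway directory with the file names from the question.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("GMATd_1.txt", "GMATd_2.txt", "GMATf_1.txt", "other.txt"):
        (Path(tmp) / name).touch()
    outputs = index_outputs(tmp)
    keys = sorted(outputs)
    first_name = outputs["GMATd1"].name
```

The returned dictionary is an ordinary object, so it can be passed to any other function and looked up by key long after the loop that built it has finished.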

Python - Getting file directory as user input

Having some trouble getting a list of files from a user defined directory. The following code works fine:
inputdirectory = r'C:/test/files'
inputfileextensions = 'txt'
files = glob.glob(inputdirectory+"*."+inputfileextensions)
But I want to allow the user to type in the location. I've tried the following code:
inputdirectory = input("Please type in the full path of the folder containing your files: ")
inputfileextensions = input("Please type in the file extension of your files: ")
files = glob.glob(inputdirectory+"*."+inputfileextensions)
But it doesn't work. No error message occurs, but files comes back empty. I've tried typing in the directory with quotes, and with forward and backward slashes, but can't get it to work. I've also tried converting the input to a raw string using 'r', but maybe my syntax is wrong. Any ideas?
Not quite sure how the first version works for you. The way the variables are defined, the input to glob would be something like:
inputdirectory+"*."+inputfileextensions == "C:/test/files*.txt"
Looking at that value, you can see that it is not what you are trying to achieve. Instead, you need to join the two parts with a path separator. Something like:
os.path.join(inputdirectory, "*."+inputfileextensions)  # C:/test/files\*.txt on Windows, C:/test/files/*.txt elsewhere
With this change, the code should work regardless of whether the input is taken from the user or predefined.
Try to join path with os.path.join. It will handle slash issue.
import os
...
files = glob.glob(os.path.join(inputdirectory, "*."+inputfileextensions))
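Since input() returns the text exactly as typed, any quotes the user adds become part of the path, and the r prefix has no effect because it only applies to string literals in source code. A small sketch of tidying the typed values before globbing follows; the helper name find_files and the clean-up rules are assumptions for the example.

```python
import glob
import os
import tempfile

def find_files(directory, extension):
    """Glob `directory` for `*.<extension>` after tidying typed input (hypothetical helper)."""
    # Users often paste paths wrapped in quotes or type the extension with a dot.
    directory = directory.strip().strip('"\'')
    extension = extension.strip().lstrip('.')
    return sorted(glob.glob(os.path.join(directory, '*.' + extension)))

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'a.txt'), 'w').close()
    quoted = find_files('"' + tmp + '"', '.txt')
    plain = find_files(tmp, 'txt')
```

In a script, the two arguments would come from the input() calls shown in the question.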
A working sample, with recursive search:
#!/usr/bin/python3
import glob
import os
dirname = input("What is dir name to search files? ")
path = os.path.join(dirname,"**")
for x in glob.glob(path, recursive=True):
    print(x)
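The pattern above lists everything under the directory. To combine the recursive search with an extension filter, the "**" component can be followed by a file pattern, as in this sketch (the helper name find_recursive and the demo file names are assumptions):

```python
import glob
import os
import tempfile

def find_recursive(dirname, extension="txt"):
    """Return files with the given extension in `dirname` and all its sub-folders."""
    # With recursive=True, '**' matches zero or more directory levels.
    pattern = os.path.join(dirname, "**", "*." + extension)
    return sorted(glob.glob(pattern, recursive=True))

# Demo in a throwaway directory with one nested folder.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.log")):
        open(os.path.join(tmp, rel), "w").close()
    relative = [os.path.relpath(p, tmp) for p in find_recursive(tmp)]
```

Both the top-level file and the one in the sub-folder are returned, while the .log file is skipped.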

Getting the Folder Path of the last location I right clicked in Python

I'm using glob.glob to search a folder, and the sub-folders therein, for all the invoices I have. To simplify that, I'm going to add the program to the context menu and have it take the path as the first part of the pattern:
import glob
for filename in glob.glob(path + "/**/*.pdf", recursive=True):
    print(filename)
I'll have it keep the list and send those files to a Printer, in a later version, but for now just writing the name is a good enough test.
So my question is twofold:
Is there anything fundamentally wrong with the way I'm writing this?
Can anyone point me in the direction of how to actually capture folder path and provide it as path-variable?
You should have a look at this question: Python script on selected file. It shows how to set up a "Send To" command in the context menu. This command calls a Python script and provides the file name sent via sys.argv[1]. I assume that also works for a directory.
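Assuming the context-menu entry really does pass the selected folder as the first command-line argument, the script side could look roughly like this sketch. The function names pdfs_under and main, and the fallback to the current directory, are assumptions for the example.

```python
import glob
import os
import tempfile

def pdfs_under(path):
    """Return every .pdf file in `path` and its sub-folders."""
    return sorted(glob.glob(os.path.join(path, "**", "*.pdf"), recursive=True))

def main(argv):
    """Print every PDF under argv[1]; fall back to the current directory."""
    path = argv[1] if len(argv) > 1 else os.getcwd()
    for filename in pdfs_under(path):
        print(filename)
    return 0

# Demo: call main() with a hand-built argument list instead of sys.argv.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "invoice.pdf"), "w").close()
    found = pdfs_under(tmp)
    exit_code = main(["script.py", tmp])
```

In the real script you would import sys and end the file with sys.exit(main(sys.argv)), so that whatever the context menu passes arrives as argv[1].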
I do not have Python 3.5, so I cannot set the flag recursive=True; I would rather give you a solution that you can run on any Python version (known to date).
The solution consists of calling os.walk() to explore the directories, together with the built-in set type.
It is better to use a set than a list: with a list you would need more code to check that the directory you want to add is not already listed.
So basically you can keep two sets: one for the names of the files you want to print and the other one for the directories and their subfolders.
You can adapt this solution to your class/method:
import os
path = '.' # Any path you want
exten = '.pdf'
directories_list = set()
files_list = set()
# Loop over directories
for dirpath, dirnames, files in os.walk(path):
    for name in files:
        # Check if extension matches
        if name.lower().endswith(exten):
            files_list.add(name)
    directories_list.add(dirpath)
You can then loop over directories_list and files_list to print them out.
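One thing to watch with the version above is that files_list holds bare names, so two invoices with the same name in different folders collapse into a single entry, and the location needed to open or print them is lost. A variation that stores full paths might look like the sketch below; the function name collect and the demo folder names are assumptions.

```python
import os
import tempfile

def collect(path, exten=".pdf"):
    """Return (matching file paths, visited directories) as two sets."""
    files_found = set()
    directories = set()
    for dirpath, dirnames, files in os.walk(path):
        directories.add(dirpath)
        for name in files:
            if name.lower().endswith(exten):
                # Keep the directory with the name so the file can be opened later.
                files_found.add(os.path.join(dirpath, name))
    return files_found, directories

# Demo: two files with the same name in different folders.
with tempfile.TemporaryDirectory() as tmp:
    for year in ("2023", "2024"):
        os.makedirs(os.path.join(tmp, year))
        open(os.path.join(tmp, year, "invoice.PDF"), "w").close()
    found, visited = collect(tmp)
    names_only = {os.path.basename(f) for f in found}
```

Here found keeps both files, whereas a set of bare names (names_only) would contain just one entry.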
