Python operating on files in a folder - 'for file in folder'

I know a folder's path, and I would like to perform some operations on every file in that folder. Essentially, I'm looking for a 'for file in folder' kind of loop that gives me access to each file in a variable.
What is the Python way of doing this?
Thanks
EDIT - example: my folder will contain a bunch of XML files, and I have a python routine already to parse them into variables I need.

This will allow you to access and print all the file names in your current directory:
import os
for filename in os.listdir('.'):
    print(filename)
The os module documentation describes the other functions available. os.listdir() also accepts any other path you want to list.
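Note that os.listdir() returns bare names rather than paths, so each name has to be joined with the folder before the file can be opened. A minimal sketch of that step, assuming the folder holds the XML files mentioned in the question; the function name iter_xml_paths is made up for illustration:

```python
import os

def iter_xml_paths(folder):
    """Yield the full path of every .xml file directly inside folder."""
    for filename in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, filename)
        # Skip sub-directories and anything that is not an XML file.
        if os.path.isfile(full_path) and filename.lower().endswith(".xml"):
            yield full_path
```

Each yielded path can then be handed to whatever parsing routine you already have.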

Does the glob library look helpful?
It performs shell-style pattern matching and accepts both absolute and relative paths.
>>> import glob
>>> for file in glob.glob("*.xml"):  # only loops over XML documents
...     print(file)

For people using Python 3.5 or later, there is also os.scandir(), which can be considerably faster than os.listdir() when you need file type or attribute information as well as names, because it returns that information alongside each entry instead of requiring extra system calls.
For more information about the improvements and benefits, see https://benhoyt.com/writings/scandir/
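As a rough illustration of how os.scandir() is used, here is a small sketch. The helper name list_files and the default ".xml" suffix are assumptions for the example, and using scandir in a with statement requires Python 3.6 or later:

```python
import os

def list_files(folder, suffix=".xml"):
    """Return the sorted names of regular files in folder ending with suffix."""
    names = []
    with os.scandir(folder) as entries:
        for entry in entries:
            # Each entry is an os.DirEntry; is_file() can usually be answered
            # from information gathered while listing the directory.
            if entry.is_file() and entry.name.endswith(suffix):
                names.append(entry.name)
    return sorted(names)
```

Use entry.path instead of entry.name if you need something you can pass straight to open().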

Related

How to open files in a particular folder with randomly generated names?

I have a folder named 2018 and the files within that folder are named randomly. I want to iterate through all of the files and open them up.
I will post three names of the files as an example but note that there are over a thousand files in this folder so it has to work on a large scale without any hard coding.
0a2ec2da-628d-417d-9520-b0889886e2ac_1.xml
00a6b260-951d-46b5-ab27-b2e8729e664d_1.xml
00a6b260-951d-46b5-ab27-b2e8729e664d_2.xml

You're looking for os.walk().
In general, if you want to do something with files, it's worth glancing at the os, os.path, pathlib and other built-in modules. They're all documented.
You could also use glob expansion to expand "folder/*" into a list of all the filenames, but os.walk is probably better.

Use os.listdir() or os.walk(), depending on whether you want to do it recursively or not.
See the Python docs:
https://docs.python.org/3/library/os.html#os.walk
https://docs.python.org/3/library/os.html#os.listdir
Once you have the list of files, you can read each one simply:
for file in files:
    with open(file, "r") as f:
        contents = f.read()  # perform file operations on contents
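Putting the two pieces together, a sketch of the recursive variant might look like the following. The function name read_all_xml is made up for illustration, and the files are assumed to be UTF-8 text:

```python
import os

def read_all_xml(top):
    """Map the path of each .xml file under top (recursively) to its text."""
    contents = {}
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(".xml"):
                # os.walk yields bare names, so join them with dirpath.
                path = os.path.join(dirpath, filename)
                with open(path, "r", encoding="utf-8") as f:  # assumed encoding
                    contents[path] = f.read()
    return contents
```

Nothing here depends on the file names themselves, so randomly generated names are not a problem. With over a thousand files, you may prefer to process each file inside the loop rather than keep every file's contents in memory.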

prevent getfiles from seeing .DS and other hidden files

I am currently working on a Python project on my Mac, and from time to time I get unexpected errors because .DS or other hidden files, which are not visible to a non-root user, are found in folders. I am using the following command
filenames = getfiles.getfiles(file_directory)
to retrieve the number and names of the files in a folder. So I was wondering whether it is possible to prevent the getfiles command from seeing these types of files, for example by limiting its permissions or the extensions it can see (all files are in .txt format).
Many thanks in advance!

I would recommend switching to glob from the Python standard library.
If all your files are in .txt format and are located in /sample/directory/, you can use the following script to get the list of files you want.
from glob import glob
filenames = glob("/sample/directory/*.txt")
By default, a pattern such as *.txt does not match names that begin with a dot, so hidden files like .DS_Store are left out.
Note that glob patterns are shell-style wildcards (*, ?, and [...] character ranges) rather than full regular expressions. More details can be found in the glob module documentation.
If you need more complicated pattern matching in the future, you can filter the resulting list with the re module.
Another good example of using glob to match multiple extensions can be found from Here.
If you only want the basenames of those files, you can use the standard library os module to extract them from the full paths.
import os
file_basenames = [os.path.basename(full_path) for full_path in filenames]

There isn't an option to filter within getfiles, but you could filter the list after.
Most likely you will want to skip all "dot files" (hidden or system files, those with a leading .), which you can accomplish with code like the following.
import os
filenames = [f for f in ['./.a', './b'] if not os.path.basename(f).startswith('.')]
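The same filter can be wrapped in a small helper and applied to whatever list you get back. This is only a sketch: getfiles is not a module I can verify, so the code assumes nothing more than a list of path strings, and the name drop_hidden is made up for illustration:

```python
import os

def drop_hidden(paths):
    """Return only the paths whose final component does not start with a dot."""
    return [p for p in paths if not os.path.basename(p).startswith(".")]
```

For example, drop_hidden(filenames) could be called directly on the result of the getfiles call from the question.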

Welcome to Stackoverflow.
You might find the glob module useful. The glob.glob function takes a path including wildcards and returns a list of the filenames that match.
This would allow you to select the files you want, like
import glob
import os
filenames = glob.glob(os.path.join(file_directory, "*.txt"))
Alternatively, select the files you don't want, and ignore them:
exclude_files = glob.glob(os.path.join(file_directory, ".*"))
for filename in getfiles.getfiles(file_directory):
    if filename in exclude_files:
        continue
    # process the file

Is there a way to be able to use a variable path using os

The goal is to run through a path that is partly fixed and partly variable.
I am trying to run through a path (go to the lowest folder, which is called Archive) and fill a list with files that have a certain ending. This works quite well for a fixed path such as this.
fileInPath='\\server123456789\provider\COUNTRY\CATEGORY\Archive'
My code runs through the path (recursive) and lists all files that have a certain ending. This works well. For simplicity I will just print the file name in the following code.
import csv
import os
fileInPath='\\\\server123456789\\provider\\COUNTRY\\CATEGORY\\Archive'
fileOutPath='some path'
csvSeparator=';'
fileList = []
for subdir, dirs, files in os.walk(fileInPath):
    for file in files:
        if file[-3:].upper()=='PAR':
            print(file)
The problem is that I can't manage to make COUNTRY and CATEGORY variable, e.g. by using *.

The standard library module pathlib provides a simple way to do this.
Your file list can be obtained with
from pathlib import Path
list(Path("//server123456789/provider/").glob("*/*/Archive/*.PAR"))
Note that I'm using / instead of \\; pathlib handles the conversion for you on Windows.
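The one-liner only looks directly inside each Archive folder and depends on how the file system treats the case of ".PAR". If you want to keep the recursive, case-insensitive behaviour of the os.walk loop in the question, a sketch along these lines may help. The function name find_par_files is made up for illustration:

```python
from pathlib import Path

def find_par_files(base):
    """Return files ending in .par (any case) below base/<any>/<any>/Archive."""
    matches = []
    # The two * components stand in for the variable COUNTRY and CATEGORY parts.
    for archive in Path(base).glob("*/*/Archive"):
        if not archive.is_dir():
            continue
        # rglob("*") descends into sub-folders of Archive as os.walk would.
        for path in archive.rglob("*"):
            if path.is_file() and path.suffix.lower() == ".par":
                matches.append(path)
    return sorted(matches)
```

It would be called with the fixed part of the path, for example find_par_files("//server123456789/provider").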

How can I read files with similar names on python, rename them and then work with them?

I've already posted this question here, but sadly I couldn't come up with a solution (some of you gave me great answers, but most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them for more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files, I want to give them short names rather than refer to them as './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script, so that I can use the files under their new names in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch
examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work, but I can't use the dictionary keys outside the loop.

If I understand correctly, you are trying to fetch files with similar names (or at least a recurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os
all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library searches for files using * wildcards, which makes it possible to find files matching a specific pattern. It lists all the matching files in a certain directory (or in multiple directories if you include a * wildcard as a directory). When you iterate over the files, you can either work directly with their contents (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you need to generate a new path, so you would have to write the create_new_path function, which takes the old path and returns the new one.
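The answer leaves create_new_path for the reader to write. One hypothetical version is sketched below: it assumes you want names like GMATds1.txt (as in the question) and simply replaces the first underscore in the file name with "s". The new_dir parameter is likewise an invention for this example:

```python
import os

def create_new_path(old_path, new_dir=None):
    """Build a new path such as .../GMATds1.txt from .../GMATd_1.txt."""
    directory, name = os.path.split(old_path)
    stem, ext = os.path.splitext(name)
    # Assumed naming rule: replace only the first underscore in the name.
    new_stem = stem.replace("_", "s", 1)
    target_dir = directory if new_dir is None else new_dir
    return os.path.join(target_dir, new_stem + ext)
```

Any other rule can be substituted as long as the function returns the complete destination path.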

Since Python 3.4 you can use the built-in pathlib module instead of os or glob.
from pathlib import Path
import shutil
for file_src in Path("path/to/files").glob("GMAT*.txt"):
    # build the new name from the file name only, so directory names are left untouched
    file_dest = file_src.with_name(file_src.name.replace("ds", "d_"))
    shutil.move(str(file_src), str(file_dest))
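If the aim is only to refer to the files by short names inside the script, rather than to rename them on disk, a dictionary built once and then passed around may be enough. A minimal sketch, assuming the GMAT*.txt naming from the question; the function name index_by_stem is made up for illustration:

```python
from pathlib import Path

def index_by_stem(folder, pattern="GMAT*.txt"):
    """Map each matching file's stem (name without extension) to its Path."""
    return {path.stem: path for path in sorted(Path(folder).glob(pattern))}
```

After files = index_by_stem("./path"), an entry such as files["GMATd_1"] is available anywhere the dictionary is in scope, including outside the loop that built it.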

You can use:
import os
path = '.....'   # path where these files are located
path1 = '.....'  # path where you want to store the renamed files
i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(os.path.join(path, file), os.path.join(path1, str(i) + '.txt'))
        i += 1
This renames all the .txt files in the source folder to 1.txt, 2.txt, 3.txt, ..., n.txt.

Search for file names that contain words from a list and have a certain file extension

I'm a beginner at Python. I'm trying to search users' folders for illegal content. I want to find all files whose names contain one or more words from the list below and that also have one of the listed extensions.
I can search the files using file.endswith but don't know how to add the word condition.
I've looked through the site and have only come across how to search for a single word, not a list of words.
Thank you in advance
import os
L = ['720p','aac','ac3','bdrip','brrip','demonoid','disc','hdtv','dvdrip',
'edition','sample','torrent','www','x264','xvid']
for root, dirs, files in os.walk(r"Y:\User Folders"):
    for file in files:
        if file.endswith(('.7z','.3gp','.alb','.ape','.avi','.cbr','.cbz','.cue','.divx','.epub','.flac',
                          '.flv','.idx','.iso','.m2ts','.m2v','.m3u','.m4a','.m4b','.m4p','.m4v','.md5',
                          '.mkv','.mobi','.mov','.mp3','.mp4','.mpeg','.mpg','.mta','.nfo','.ogg','.ogm',
                          '.pla','.rar','.rm','.rmvb','.sfap0','.sfk','.sfv','.sls','.smfmf','.srt','.sub',
                          '.torrent','.vob','.wav','.wma','.wmv','.wpl','.zip')):
            print(os.path.join(root, file))

Perhaps it would be better to do a reverse search and display a warning about files that DON'T match the file types you want. For instance, you could do this:
if file.endswith((".txt", ".py")):
    print("File is ok!")
else:
    print("File is not ok!")
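For the word condition the question actually asks about, any() over the word list can be combined with the endswith() check. The sketch below uses shortened versions of the question's two lists, and the names is_suspect and find_suspects are made up for illustration:

```python
import os

# Shortened versions of the lists in the question, for illustration only.
KEYWORDS = ("720p", "bdrip", "dvdrip", "x264", "xvid")
EXTENSIONS = (".avi", ".mkv", ".mp4", ".zip")

def is_suspect(filename, keywords=KEYWORDS, extensions=EXTENSIONS):
    """True if the name has a listed extension AND contains a listed word."""
    lowered = filename.lower()
    # str.endswith accepts a tuple of suffixes; any() handles the word list.
    return lowered.endswith(extensions) and any(word in lowered for word in keywords)

def find_suspects(top):
    """Return the paths of all suspect files below top, recursively."""
    hits = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if is_suspect(name):
                hits.append(os.path.join(root, name))
    return sorted(hits)
```

Lower-casing the name first makes both checks case-insensitive, which the original endswith() call is not.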

Using py.path.local from py package
The py package (install it with pip install py) offers a very nice interface for working with files.
from py.path import local
def isbadname(path):
    bad_extensions = [".pyc", ".txt"]  # path.ext includes the leading dot
    bad_names = ["code", "xml"]
    return (path.ext in bad_extensions) or (path.purebasename in bad_names)
for path in local(".").visit(isbadname):
    print(path.strpath)
Explained:
Import
from py.path import local
py.path.local creates "objectified" file names. To keep the code short, I import it this way so that only local is needed for objectifying file name strings.
Create an objectified path to the local directory:
local(".")
The created object is not a string but an object with many useful properties and methods.
Listing all files within some directory:
local(".").visit("*.txt")
This returns a generator providing all paths to files with the extension ".txt".
An alternative way to select the files to generate is to provide a function that takes the argument path (an objectified file name) and returns True if the file is to be used, False otherwise. The function isbadname serves exactly this purpose.
If you want to search for more information, use "py path local" (the name py alone does not give good hits).
For more, see https://py.readthedocs.io/en/latest/path.html
Note that pytest has historically installed the py package as a dependency (for good reason: it makes tests related to file names much more readable and shorter).
