Check if there are .format files in a directory - python

I have been trying to figure out for a while how to check if there are .pkl files in a given directory. I checked the website and I could find ways to find if there are files in the directory and list them, but I just want to check if they are there.
In my directory are a total of 7 .pkl files, as soon as I create one, the others are created so to check if the seven of them exist, it will be enough to check if one exists. Therefore, I would like to check if there is any .pkl file.
This is working if I do:
os.path.exists('folder1/folder2/filename.pkl')
But I had to write one of my file names. I would like to do so without searching for a specific file. I also tried
os.path.exists('folder1/folder2/*.pkl'),
but it is not working neither as I don't have any file named *.pkl.

You can use the python module glob (https://docs.python.org/3/library/glob.html)
Specifically, glob.glob('folder1/folder2/*.pkl') will return a list of all .pkl files in folder2.

You can use :
for dir_path, dir_names, file_names in os.walk(search_dir):
# Go over all files and folders
for file_name in file_names:
if (file_name.endswith(".pkl")):
# do something like break after the first one you find
Note : This can be used if you want to search entire directory with sub directories also
In case you want to search only one directory , you can run the "for" on os.listdir(path)

Related

Traversing a file-system directory structure

I am trying to do some work within some files in a directory. The basic structure of what I'm trying to work with is folder -> sub-folders -> files I need to access. data holds hundreds of subfolders, I am trying to access each one, find the file within them that ends in 'params', and for now just read the contents. My code is below:
import os
for sub_folder in os.scandir('data'):
os.chdir(sub_folder)
for file in os.scandir(sub_folder):
print(file.name)
if(file.name.endswith('params')):
with open(file.name, 'r') as f:
data = f.read()
I'm getting a FileNotFoundError, where it's telling me that the path 'data\\\run.0' doesn't exist. I have confirmed that 'run.0' is the first sub folder within data, so where I'm confused is how the path doesn't actually exist.
I know the error is happening when I attempt to change directories, so I'm suspecting the way that I am traversing the data folder is not a correct way of doing so. I understand that os.scandir gives a DirEntry object, which is what the variable sub_folder will be but is this not a valid input for the change directory function?
You can use os.walk, but I prefer use glob: See How to use Glob() function to find files recursively in Python?

Duplicate in list created from filenames (python)

I'm trying to create a list of excel files that are saved to a specific directory, but I'm having an issue where when the list is generated it creates a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path =r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
you'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates I was hoping to figure out why it's happening to begin with.
I'm using python 3.7 on windows 10 pro
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~ (probably an auto-backup).
Whenever you open an excel file it will create a ghost copy that works as a temporary backup copy for that specific file. In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
~$ Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
This means that the file is open by some software and it's showing you that backup inside(usually that file is hidden from the explorer as well)
Just search for the program and close it. Other actions, such as adding validation so the "~$.*.xlsx" type of file is ignored should be also implemented if this is something you want to avoid.
You can use os.path.splittext to get the file extension and loop through the directory using os.listdir . The open excel files can be skipped using the following code:
filenames = []
for file in os.listdir('D:\larvalSchooling\data'):
filename, file_extension = os.path.splitext(file)
if file_extension == '.xlsx':
if not file.startswith('~$'):
filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)

Put contents of a folder into a zip using python

I need to put the contents of a folder into a ZIP archive (In my case a JAR but they're the same thing).
Though I cant just do the normal adding every file individually, because for one... they change each time depending on what the user put in and it's something like 1,000 files anyway!
I'm using ZipFile, it's the only really viable option.
Basically my code makes a temporary folder, and puts all the files in there, but I can't find a way to add the contents of the folder and not just the folder itself.
Python allrady support zip and rar archife. Personaly I learn it form here https://docs.python.org/3/library/zipfile.html
And to get all files form that folder try somethign like
for r, d, f in os.walk(path):
for file in f:
#enter code here

Convert all pdf in a folder to text files and store them in different folders using python

Im trying to convert all the pdf stored in one file, say 60 pdfs into text documents and store them in different folders. the folder should have unique names.
i tried this code.The folders where created, but the pdftotext conversion command doesnt work in the loop:
import os
def listfiles(path):
for root, dirs, files in os.walk(path):
for f in files:
print(f)
newpath = r'/home/user/files/'
p=f.replace("pdf","")
newpath=newpath+p
if not os.path.exists(newpath): os.makedirs(newpath)
os.system("pdftotext f f.txt")
f=listfiles("/home/user/reports")
One problem here is the os.system("pdftotext f f.txt") call. I assume you want the f's here replaced with the current file in the loop. If that is the case you need to change this to os.system("pdftotext {0} {0}.txt".format(f))
Another issue may be that the working directory is not being set up so the call to system is looking for the file in the wrong place. Try using os.chdir every time you change folders.
to place the text file in a diffrent folder try:
os.system("pdftotext {0} {1}/{0}.txt".format(f, newpath))
I don't know Python, but I think I can clearly see a mistake there. It looks like you are just replacing the ".pdf" with a ".txt". Since a PDF isn't just plain text, this won't work.
For the convertion look at the top answer of this post:
Python module for converting PDF to text

Script to rename files in folder to match names of files in another folder

I need to do a batch rename given the following scenario:
I have a bunch of files in Folder A
A bunch of files in Folder B.
The files in Folder A are all ".doc",
the files in Folder B are all ".jpg".
The files in Folder A are named "A0001.doc"
The files in Folder B are named "A0001johnsmith.jpg"
I want to merge the folders, and rename the files in Folder A so that they append the name portion of the matching file in Folder B.
Example:
Before:
FOLDER A: Folder B:
A0001.doc A0001johnsmith.jpg
After:
Folder C:
A0001johnsmith.doc
A0001johnsmith.jpg
I have seen some batch renaming scripts, but the only difference is that i need to assign a variable to contain the name portion so I can append it to the end of the corresponding file in Folder A.
I figure that the best way to do it would be to do a simple python script that would do a recursive loop, working on each item in the folder as follows:
Parse filename of A0001.doc
Match string to filenames in Folder B
Take the portion following the string that matched but before the "." and assign variable
Take the original string A0001 and append the variable containing the name element and rename it
Copy both files to Folder C (non-destructive, in case of errors etc)
I was thinking of using python for this, but I could use some help with syntax and such. I only know a little bit using the base python library, and I am guessing I would be importing libraries such as "OS", and maybe "SYS". I have never used them before, any help would be appreciated. I am also open to using a windows batch script or even powershell. Any input is helpful.
This is Powershell since you said you would use that.
Please note that I HAVE NOT TESTED THIS. I don't have access to a Windows machine right now so I can't test it. I'm basing this off of memory, but I think it's mostly correct.
foreach($aFile in ls "/path/to/FolderA")
{
$matchString = $aFile.Name.Split("."}[0]
$bFile = $(ls "/path/to/FolderB" |? { $_.Name -Match $matchString })[0]
$addString = $bFile.Name.Split(".")[0].Replace($matchString, "")
cp $aFile ("/path/to/FolderC/" + $matchString + $addString + ".doc")
cp $bFile "/path/to/FolderC"
}
This makes a lot of assumptions about the name structure. For example, I assumed the string to add doesn't appear in the common filename strings.
It is very simple with a plain batch script.
#echo off
for %%A in ("folderA\*.doc") do (
for %%B in ("folderB\%%~nA*.jpg") do (
copy "%%A" "folderC\%%~nB.doc"
copy "%%B" "folderC"
)
)
I haven't added any error checking.
You could have problems if you have a file like "A1.doc" matching multiple files like "A1file1.jpg" and "A10file2.jpg".
As long as the .doc files have fixed width names, and there exists a .jpg for every .doc, then I think the code should work.
Obviously more code could be added to handle various scenarios and error conditions.

Categories

Resources