How to import every docx file in a folder into Python? - python

I'm pretty new to Python and I'm using the Python-docx module to manipulate some docx files.
I'm importing the docx files using this code:
doc = docx.Document('filename.docx')
The thing is that I need to work with many docx files, and to avoid writing the same line of code for each file I was wondering: if I create a folder in my working directory, is there a more efficient way to import all the docx files?

Something like:
import glob
import docx

def edit_document(path):
    document = docx.Document(path)
    # --- do things on document ---

for path in glob.glob("./*.docx"):
    edit_document(path)
You'll need to adjust the glob expression to suit.
There are plenty of other ways to do that part, like os.walk() if you want to recursively descend directories, but this is maybe a good place to start.
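For example, a minimal sketch of the recursive variant (just an illustration, reusing the edit_document function from above) with os.walk():
import os

# Walk the directory tree rooted at "." and call edit_document()
# on every .docx found, including files in nested subfolders.
for dirpath, dirnames, filenames in os.walk("."):
    for name in filenames:
        if name.endswith(".docx"):
            edit_document(os.path.join(dirpath, name))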

Related

Delete particular mp3 files in a directory that have no metadata (Title, artist, etc.)?

I want to delete particular mp3 files in a directory that have no metadata (Title, artist, etc.). I tried sorting the files and then deleting them manually, but it didn't work as the number of files is huge (~30K). Is there any Python script to accomplish this task?
This example script is not complete, but I hope it's a good starting point for you.
import os
import glob
from mutagen.easyid3 import EasyID3

mp3_files_list = glob.glob('path/to/your/mp3-files-folder/*.mp3')  # example path: /home/user/Downloads/mp3/*.mp3
for mp3_file in mp3_files_list:
    audio = EasyID3(mp3_file)
    if not audio.get('title'):    # if the title tag is missing or empty
        os.remove(mp3_file)
    elif not audio.get('artist'):  # if the artist tag is missing or empty
        os.remove(mp3_file)
    # etc. -- check other tags the same way
Mutagen module documentation: https://mutagen.readthedocs.io/en/latest/
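One caveat: EasyID3() raises mutagen.id3.ID3NoHeaderError for files that carry no ID3 tag at all, so a more defensive sketch (assuming you want to treat such files as having no metadata and delete them too) could look like this:
import os
import glob
from mutagen.easyid3 import EasyID3
from mutagen.id3 import ID3NoHeaderError

for mp3_file in glob.glob('/home/user/Downloads/mp3/*.mp3'):
    try:
        audio = EasyID3(mp3_file)
    except ID3NoHeaderError:
        os.remove(mp3_file)  # no ID3 tag at all -- treat as "no metadata"
        continue
    if not audio.get('title') or not audio.get('artist'):
        os.remove(mp3_file)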

How to validate format of data using glob in python?

I have a list of different files in my folder and these files have several formats, like PDF, txt, Docx and HTML. I want to validate the format of the files in python.
Here is my attempt
import os
import pdftables_api
import glob

path = r"myfolder\*"
files = glob.glob(path)
for i in files:
    if i.endswith('.pdf'):
        conversion = pdftables_api.Client('my_api')
        conversion.xlsx(i, r"destination\*")
The reason for this is that I want to iterate through each file and check whether it is a PDF; if it is, convert it into Excel using the API from the pdftables_api package and save it in the destination folder. But I don't feel like this is an efficient way to do this.
Can anyone please help me if there is an efficient manner of achieving this?
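One possible simplification (just a sketch, keeping the placeholder 'my_api' key and folder names from the question) is to let glob match only the PDFs up front, so the loop never has to test the extension, and to create the Client once instead of once per file:
import glob
import pdftables_api

conversion = pdftables_api.Client('my_api')  # placeholder API key from the question
for pdf_path in glob.glob(r"myfolder\*.pdf"):  # only PDF files are matched
    conversion.xlsx(pdf_path, pdf_path.replace('.pdf', '.xlsx'))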

How can I read files with similar names on python, rename them and then work with them?

I've already posted here with the same question but sadly I couldn't come up with a solution (even though some of you guys gave me awesome answers, most of them weren't what I was looking for), so I'll try again, and this time I'll give more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them for more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files I want to rename them as I don't want to use them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch
examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you're trying to fetch files with similar names (at least a recurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os

all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library allows searching for files with * wildcards and hence makes it possible to find files matching a specific pattern. It lists all the files in a certain directory (or in multiple directories if you include a * wildcard as a directory name). When you iterate over the files, you can either work directly with their contents (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you need to generate a new path, so you would have to write the create_new_path function that takes the old path and creates a new one; a possible sketch follows below.
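For example, a possible create_new_path (purely an assumption about the desired naming scheme, based on the GMATd_1.txt style names above; it just drops the first underscore) could be:
import os

def create_new_path(old_path):
    # e.g. 'path/GMATd_1.txt' -> 'path/GMATd1.txt'
    directory, filename = os.path.split(old_path)
    return os.path.join(directory, filename.replace('_', '', 1))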
Since Python 3.4 you should be using the built-in pathlib package instead of os or glob.
from pathlib import Path
import shutil

for file_src in Path("path/to/files").glob("GMAT*.txt"):
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(file_src, file_dest)
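As a side note, pathlib can also do the move itself via Path.rename() (same effect here, assuming source and destination stay on the same filesystem):
from pathlib import Path

for file_src in Path("path/to/files").glob("GMAT*.txt"):
    file_src.rename(file_src.with_name(file_src.name.replace("ds", "d_")))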
You can use:
import os

path = '.....'   # path where these files are located
path1 = '.....'  # path where you want these files to be stored
i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(path + "/" + file, path1 + "/" + str(i) + ".txt")
        i += 1
It will rename all the .txt files in the source folder to 1.txt, 2.txt, ..., n.txt in the destination folder.

Python operating on files in a folder - 'for file in folder'

I know a folder's path, and for every file in the folder I would like to do some operations. So essentially what I'm looking for is a for file in folder type of code that gives me access to the files in variables.
What is the Python way of doing this?
Thanks
EDIT - example: my folder will contain a bunch of XML files, and I already have a Python routine to parse them into the variables I need.
This will allow you to access and print all the file names in your current directory:
import os

for filename in os.listdir('.'):
    print(filename)
The os module's documentation contains much more information about the various functions available. The os.listdir() function can also take any other path you want to specify.
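For instance, passing a specific (hypothetical) folder and keeping only the XML files mentioned in the question:
import os

folder = '/path/to/my/folder'  # hypothetical path
xml_files = [name for name in os.listdir(folder) if name.endswith('.xml')]
print(xml_files)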
Does the glob library look helpful?
It will perform some pattern matching, and accepts both absolute and relative addresses.
>>> import glob
>>> for file in glob.glob("*.xml"):  # only loops over XML documents
...     print(file)
For people coming to this on Python 3.5 or later, we now have the superior os.scandir(), which has tremendous performance improvements over os.listdir().
For more information about the improvements/benefits, check out https://benhoyt.com/writings/scandir/
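A minimal sketch of the same kind of loop with os.scandir(); the entries it yields expose .name, .path and cheap .is_file() checks:
import os

# Iterate over directory entries without an extra stat() call per file.
with os.scandir('.') as entries:
    for entry in entries:
        if entry.is_file() and entry.name.endswith('.xml'):
            print(entry.path)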

How do I use wild cards in python as specified in a text file

Hello, I'm new to Python and I'd like to know how to process a .txt file line by line in order to copy the files specified by wildcards.
Basically, the .txt file looks like this:
bin/
bin/*.txt
bin/*.exe
obj/*.obj
document
binaries
Now, with that information, I'd like to read my .txt file, match each directory, and copy all the files matching the * patterns for that directory. I'd also like to be able to copy the folders listed in the .txt file. What's the most practical way of doing this? Your help is appreciated, thanks.
Here's something to start with...
import glob    # For specifying pathnames with wildcards
import shutil  # For doing common "shell-like" operations.
import os      # For dealing with pathnames

# Grab the pathnames of all the files matching the patterns listed in `text_file.txt`
matching_pathnames = []
for line in open('text_file.txt', 'r'):
    matching_pathnames += glob.glob(line.strip())  # strip the newline before globbing

# Copy all the matched files to the same filename with '.new' at the end
for pathname in matching_pathnames:
    shutil.copyfile(pathname, '%s.new' % (pathname,))
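Since the listing also names plain folders (document, binaries), one possible extension (just a sketch; the '_copy' suffix is an arbitrary choice) is to branch on os.path.isdir() and copy matched directories with shutil.copytree():
import glob
import os
import shutil

matching_pathnames = []
with open('text_file.txt', 'r') as listing:
    for line in listing:
        matching_pathnames += glob.glob(line.strip())

for pathname in matching_pathnames:
    if os.path.isdir(pathname):
        shutil.copytree(pathname, pathname + '_copy')  # copy whole folders like "document"
    else:
        shutil.copyfile(pathname, pathname + '.new')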
You might want to look at the glob and re modules
http://docs.python.org/library/glob.html
