How to check only folders inside the folder? - python

I'm trying to find a proper way to go through only the folders inside a folder and read only the XML files.
My code:
data = os.listdir('./home/')
for packs in data:
    file = open('./home/' + packs + '/files' + 'data.xml', 'r')  # I have hundreds of folders inside home
    all_file = file.read()
Output: It reads all the folders as required, but the home folder also contains 3 CSV files and 2 text files. My code tries to read those as well and raises an error. I don't want to read them. Is there a way to read only the XML files?
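One way to skip the stray CSV and text files is to test each entry with os.path.isdir before building the path, and to check that the XML file exists before opening it. The following is only a sketch: read_pack_xml and xml_relpath are hypothetical names, and it assumes each folder keeps its XML at files/data.xml (the concatenation '/files' + 'data.xml' in the question produces 'filesdata.xml', so pass a different xml_relpath if that name was intended).

```python
import os


def read_pack_xml(base_dir, xml_relpath=os.path.join("files", "data.xml")):
    """Return {folder_name: xml_text} for every sub-folder of base_dir that has the XML file."""
    contents = {}
    for name in sorted(os.listdir(base_dir)):
        pack_dir = os.path.join(base_dir, name)
        # Skip plain files such as the CSV and text files sitting next to the folders
        if not os.path.isdir(pack_dir):
            continue
        xml_path = os.path.join(pack_dir, xml_relpath)
        # Skip folders that do not contain the expected XML file
        if not os.path.isfile(xml_path):
            continue
        with open(xml_path, "r", encoding="utf-8") as xml_file:
            contents[name] = xml_file.read()
    return contents
```

Calling read_pack_xml('./home/') would then return one entry per folder and ignore everything in home that is not a folder.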

Related

Read Word and PDF in python

In a folder I have 10 PDF files and 5 Word files (doc, docx). I would like to know how I can create a table in python with two columns:
ID of the file
Text of the pdf or word
Thanks for your help
For reading the PDFs, you can use the library: https://pypi.org/project/PyPDF2/
For docx, the library: https://github.com/ankushshah89/python-docx2txt
By file ID, do you simply mean the filename? You can use the following code to fetch a list of files and folders in the current working folder:
import os
filelist = os.listdir()
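Assuming the file name serves as the ID, the two-column table could be built roughly as follows. This is a sketch using only the standard library: build_text_table is a hypothetical helper, and the extractors argument is where functions wrapping the PDF and Word libraries mentioned above would be plugged in.

```python
import os


def build_text_table(folder, extractors):
    """Return a list of (file_id, text) rows for the files in folder.

    extractors maps a lowercase extension such as ".pdf" to a callable
    that takes a file path and returns the extracted text.
    """
    rows = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        # Ignore sub-folders and file types without a registered extractor
        if not os.path.isfile(path) or extension not in extractors:
            continue
        rows.append((name, extractors[extension](path)))
    return rows
```

The resulting rows can be handed to a table library of your choice, for example pandas.DataFrame(rows, columns=["id", "text"]).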

Python: Iterate through directory to find specific text file, open it, write some of its contents to a new file, close it, and move on to the next dir

I have a script that takes an input text file, finds data in it, and stores that data in variables, which I later use to write to a new file. This snippet only reads the txt file and stores the data from it in variables.
import re

searchfile = open('C://Users//Me//DynamicFolder//report//summary.txt', 'r', encoding='utf-8')
slab_count = 0
slab_number = []
slab_total = 0
for line in searchfile:
    if "Slab" in line:
        # Collect every number on the line; the last one is the slab percentage
        slab_num = [float(s) for s in re.findall(r'[-+]?(?:\d*\.\d+|\d+)', line)]
        slab_percent = slab_num[-1]
        slab_number.append(slab_percent)
        slab_count = slab_count + 1
slab_total = 0
for slab_percent in slab_number:
    slab_total += slab_percent
searchfile.close()
I am using xlsxwriter to write the variables to an excel doc.
My question is: how do I iterate this so that it searches the subdirectories of a given directory for summary.txt when the folder name is dynamic?
So C://Users//Me//DynamicFolder//report//summary.txt is a path to one of the files. There are several folders, which I call DynamicFolder here, that another process puts there, and their names change all the time. I need this script to go into each of those dynamic folders and then into a subdir called report; that name is static and always the same. So each dynamic folder has a subdir called report, and the report folder contains a file called summary.txt. I am trying to go through each dynamic folder into report > summary.txt, open those txt files, and write data from them.
How do I iterate or loop this? Right now I have 18 folders with those DynamicFolder names, which will change when they are overwritten. How can I make this snippet of code iterate through them?
for path in Path('C://Users//Me//DynamicFolder//report//summary.txt').rglob('summary.txt'):
The report folder is not the only folder with a summary.txt file, but it's the only folder with the file I want. The code above pulls ALL summary.txt files from all subdirs under the DynamicFolder (not just the report folder). I am wondering if I can make this search JUST the 'report' subdir under each DynamicFolder, and somehow use it to iterate the rest of my code?
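One possible approach is to replace rglob, which descends into every level, with a glob pattern that pins the report folder to a fixed depth. The sketch below assumes the dynamic folders all sit directly under one common parent directory; find_report_summaries is a hypothetical name.

```python
from pathlib import Path


def find_report_summaries(base_dir):
    """Return summary.txt paths located exactly at <base_dir>/<any folder>/report/summary.txt."""
    # "*" matches a single directory level, so summary.txt files in other sub-folders are ignored
    return sorted(Path(base_dir).glob("*/report/summary.txt"))
```

The existing parsing code could then run inside a loop over find_report_summaries(parent_dir), opening each returned path in place of the hard-coded one, where parent_dir stands for the directory that contains the dynamic folders.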

Modify all files in specified directory (including subfolders) and save them in new directory while preserving folder structure (Python)

(I'm new to Python, so please excuse the probably trivial question. I tried my best looking for similar issues but surprisingly couldn't find anyone with the same question.)
I'm trying to build a simple static site generator in Python. The script should take all .txt files in a specific directory (including subfolders), paste the content of each into a template .html file and then save all the newly generated .html files into a new directory while recreating the folder structure of the original directory.
So far I have the code that does the conversion for a single file, but I'm unsure how to do it for multiple files in a directory.
with open('template/page.html', 'r') as template:
    templatedata = template.read()
with open('content/content.txt', 'r') as content:
    contentdata = content.read()
pagedata = templatedata.replace('!PlaceholderContent!', contentdata)
with open('www/content.html', 'w') as output:
    output.write(pagedata)
To manipulate files and directories, you will need to import some system functionality from the built-in module os.
import os
The functionality in the os module includes:
Listing the contents of a directory:
path_to_template_dir = 'template/'
template_files = os.listdir(path_to_template_dir)
print(template_files)
# Outputs : ['page.html']
Creating a directory (if it does not already exist):
path_to_output_dir = 'www/'
try:
    os.mkdir(path_to_output_dir)
except FileExistsError as e:
    print('Directory exists:', path_to_output_dir)
Since you know the names of the directories you want to use, and these two functions give you the names of the files you want to read and generate, you can concatenate each file name with its directory name to build the final file path as a string, which you can then open() for reading and/or writing.
It's hard to give a perfect code example for your question, since the logic of how you want to manipulate each of the template and content files is missing, but here is an example of writing a file inside the newly created directory:
path_to_output_file = path_to_output_dir + 'content.html'
with open(path_to_output_file, 'w') as output:
    output.write('Content')
And an example of reading all the template files inside the template/ directory and printing them to the screen:
for template_file in template_files:
    path_to_template_file = path_to_template_dir + template_file
    with open(path_to_template_file, 'r') as template:
        print(template.read())
In the end, manipulating files is all about creating the path string you want to read from or write to, and then accessing it.
Any more functionality you might need (for example, checking whether a path is a file with os.path.isfile() or a directory with os.path.isdir()) can be found in the os module.
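Putting these pieces together for the question's template/, content/ and www/ layout, a rough sketch might look like the following. It swaps in pathlib for the string concatenation described above, purely for brevity; build_site and its parameter names are hypothetical, and the placeholder text is the one used in the question.

```python
from pathlib import Path


def build_site(content_dir, template_path, output_dir, placeholder="!PlaceholderContent!"):
    """Render every .txt file under content_dir into output_dir, mirroring sub-folders."""
    content_dir = Path(content_dir)
    output_dir = Path(output_dir)
    template = Path(template_path).read_text(encoding="utf-8")
    written = []
    for source in sorted(content_dir.rglob("*.txt")):
        # Keep the path relative to content_dir and swap the extension to .html
        target = output_dir / source.relative_to(content_dir).with_suffix(".html")
        # Recreate the sub-folder in the output tree if it is missing
        target.parent.mkdir(parents=True, exist_ok=True)
        page = template.replace(placeholder, source.read_text(encoding="utf-8"))
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

A call such as build_site('content', 'template/page.html', 'www') would turn content/blog/post.txt into www/blog/post.html.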

Import list of folder names in a folder with Python

So I've started down the path again of trying to automate something. My end game is to combine the data from the tab named LOV in every Excel file that has Clean Up in its file name. So basically the script has to go into a folder of folders, which in turn contain folders that each hold 2 files; one of them is a .xlsx file with the words Clean Up in its name. I need to read only those files and pull the data from the tab called LOV into one large file. That's my end goal. I've just started and am nowhere near it, but now you know the end game.
Currently I'm stuck just getting a list of folder names in the Master folder, so I at least know it's getting there lol.
import os
import glob
import pandas as pd
# assigns directory location to PCC Folder
os.chdir('V:/PCC Clean Up Project 2017/_DCS Data SWAT Project/PCC Files Complete Ready to Submit/Brake System Parts')
FolderList = glob.glob('')
print(FolderList)
Any help is appreciated, thanks guys!
EDITED
Firstly, it's hard to understand your question. But from what I understand, you need to iterate over folders and subfolders. You can do that with:
import os
for root, dirs, files in os.walk(source):  # set source to your top-level path
    for file in files:
        if file.endswith(".xlsx"):  # you can check for any file extension
            filename = os.path.join(root, file)
            dirname = root.split(os.path.sep)[-1]  # name of the containing directory
            print(dirname)
If you only want the list of folders in your current directory, you can use os.walk. Here is how it works:
import os
directory = "V:/PCC Clean Up Project 2017/_DCS Data SWAT Project/PCC Files Complete Ready to Submit/Brake System Parts"
childDirectories = next(os.walk(directory))[1]
This will give you a list of all folders in your current directory.
Read more about os.walk here.
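You can then go into one of the child directories by using os.chdir. Join the name with directory first, because the names returned by os.walk are relative to it:
os.chdir(os.path.join(directory, childDirectories[i]))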
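For the end goal described in the question, the walk can collect the matching workbooks directly, without changing the working directory. A minimal sketch, assuming "Clean Up" appears literally in the file name; find_clean_up_workbooks and keyword are hypothetical names.

```python
import os


def find_clean_up_workbooks(top_dir, keyword="Clean Up"):
    """Return paths of .xlsx files under top_dir whose file name contains keyword."""
    matches = []
    for root, dirs, files in os.walk(top_dir):
        for name in files:
            # Match on the file name only, at any folder depth below top_dir
            if keyword in name and name.lower().endswith(".xlsx"):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Each returned path could then be loaded with pandas.read_excel(path, sheet_name="LOV") and the resulting frames combined with pandas.concat.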

Convert all pdf in a folder to text files and store them in different folders using python

I'm trying to convert all the PDFs stored in one folder, say 60 PDFs, into text documents and store them in different folders. The folders should have unique names.
I tried this code. The folders were created, but the pdftotext conversion command doesn't work in the loop:
import os
def listfiles(path):
    for root, dirs, files in os.walk(path):
        for f in files:
            print(f)
            newpath = r'/home/user/files/'
            p=f.replace("pdf","")
            newpath=newpath+p
            if not os.path.exists(newpath): os.makedirs(newpath)
            os.system("pdftotext f f.txt")
f=listfiles("/home/user/reports")
One problem here is the os.system("pdftotext f f.txt") call. I assume you want the f's here replaced with the current file in the loop. If that is the case you need to change this to os.system("pdftotext {0} {0}.txt".format(f))
Another issue may be that the working directory is not being set up so the call to system is looking for the file in the wrong place. Try using os.chdir every time you change folders.
To place the text file in a different folder, try:
os.system("pdftotext {0} {1}/{0}.txt".format(f, newpath))
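The two fixes above (substituting the real file name and not depending on the working directory) can be combined in one sketch. It swaps os.system for subprocess.run with an argument list, which sidesteps shell quoting when file names contain spaces, and it assumes the pdftotext command-line tool is installed and on the PATH. pdftotext_command and convert_all are hypothetical helper names.

```python
import os
import subprocess


def pdftotext_command(pdf_path, output_root):
    """Return (argument list, output folder) for converting one PDF into its own folder."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    out_dir = os.path.join(output_root, stem)
    txt_path = os.path.join(out_dir, stem + ".txt")
    return ["pdftotext", pdf_path, txt_path], out_dir


def convert_all(source_dir, output_root):
    """Run pdftotext on every PDF found under source_dir."""
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            if not name.lower().endswith(".pdf"):
                continue
            # Use the full path so the current working directory does not matter
            command, out_dir = pdftotext_command(os.path.join(root, name), output_root)
            os.makedirs(out_dir, exist_ok=True)
            # check=True raises an error if pdftotext exits with a failure status
            subprocess.run(command, check=True)
```

With the paths from the question, this would be called as convert_all("/home/user/reports", "/home/user/files").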
I don't know Python, but I think I can clearly see a mistake there. It looks like you are just replacing the ".pdf" with a ".txt". Since a PDF isn't just plain text, this won't work.
For the conversion, look at the top answer of this post:
Python module for converting PDF to text
