I need a loop that periodically checks whether files matching a given pattern have been added to a directory.
In pseudo-code:
template = "START_*_hello_*.pdf"
while true:
    while "file having template does not exist":
        time.sleep(1)
    found_file = get_existing_file
    file_processing(found_file)
The os.path.exists(file_path) function needs the full file name. How can I use a file name containing the * wildcard character?
Thanks
Using the glob module you could write something like this:
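import glob
import time
template = "START_*_hello_*.pdf"
processed = set()  # files already handled, so they are not processed again on the next pass
while True:
    files = [f for f in glob.glob(template) if f not in processed]
    if not files:
        # no new file matching the template exists yet. Try again later.
        time.sleep(1)
        continue
    # Process all newly found files
    for file in files:
        file_processing(file)
        processed.add(file)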
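The inner wait from the question's pseudo-code can also be wrapped in a small helper that looks inside a specific folder. This is a minimal sketch, assuming the pattern should be matched inside one directory; the name `wait_for_match` and its `poll_interval`/`timeout` parameters are illustrative, not part of any library:

```python
import glob
import os
import time


def wait_for_match(directory, pattern, poll_interval=1.0, timeout=None):
    """Poll `directory` until at least one file matches `pattern`.

    Returns the sorted list of matching paths, or an empty list if
    `timeout` seconds pass first (timeout=None waits forever).
    """
    # glob.escape keeps characters such as [ or * in the folder name literal,
    # so only `pattern` is treated as a wildcard expression.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        matches = sorted(glob.glob(full_pattern))
        if matches:
            return matches
        if deadline is not None and time.monotonic() >= deadline:
            return []
        time.sleep(poll_interval)


# Example: block until a file such as START_1_hello_x.pdf appears in ./inbox
# found = wait_for_match("inbox", "START_*_hello_*.pdf")
```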
Related
I'm trying to copy all pictures from one directory (including its subdirectories) to another target directory. Whenever the exact picture name is found in one of the XML files, the tool should grab all the information (attributes in the parent and child nodes), create subdirectories based on that information, and rename the picture file.
The part when it extracts all the information from the nodes is already done.
from bs4 import BeautifulSoup as bs
path_xml = r"path\file.xml"
with open(path_xml, "r") as file:
    content = file.read()
def get_filename(_content):
    bs_content = bs(_content, "html.parser")
    # some code
    picture_path = rf'{pm_1}{pm_2}\{pm_3}\{pm_4}\{pm_5}_{pm_6}_{pm_7}\{pm_8}\{pm_9}.jpg'
    return picture_path
get_filename(content)
So in the end I get a string value with the directory path and the file name I want.
Now I'm struggling to open all XML files in one directory instead of just one file. I tried this:
import os
dir_xml = r"path"
res = []
for path in os.listdir(dir_xml):
    if os.path.isfile(os.path.join(dir_xml, path)):
        res.append(path)
with open(res, "r") as file:
    content = file.readlines()
but it gives me this error: TypeError: expected str, bytes or os.PathLike object, not list
How can I read through all XML files instead of just one? I have hundreds of XML files, so that will take a while :D
And another question: how can I create directories based on a string?
Let's say the value of picture_path is AB\C\D\E_F_G\H\I.jpg
I would need another directory path for the destination of the created folders, and a function that creates folders based on that string. How can I do that?
To read all XML files in a directory, you can modify your code as follows:
import os
dir_xml = r"path"
for path in os.listdir(dir_xml):
    if path.endswith(".xml"):
        with open(os.path.join(dir_xml, path), "r") as file:
            content = file.readlines()
        content = "".join(content)
        get_filename(content)
This code uses the os.listdir() function to get a list of all files in the directory specified by dir_xml. It then uses a for loop to iterate over the list of files, checking if each file ends with the .xml extension. If it does, it opens the file, reads its content, and passes it to the get_filename function.
To create directories based on a string, you can use the os.makedirs function. For example:
import os
picture_path = r'AB\C\D\E_F_G\H\I.jpg'
dest_path = r'path_to_destination'
os.makedirs(os.path.join(dest_path, os.path.dirname(picture_path)), exist_ok=True)
In this code, os.path.join is used to combine the dest_path and the directory portion of picture_path into a full path. os.path.dirname is used to extract the directory portion of picture_path. The os.makedirs function is then used to create the directories specified by the path, and the exist_ok argument is set to True to allow the function to succeed even if the directories already exist.
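Note that os.path.dirname only treats backslashes as separators on Windows; on Linux or macOS the whole string AB\C\D\E_F_G\H\I.jpg is a single file name. If the script may run elsewhere, a sketch like the following parses the string with pathlib.PureWindowsPath and rebuilds it with the native separator. It assumes picture_path is always relative and backslash-separated as in the example; build_target is a hypothetical helper name:

```python
from pathlib import Path, PureWindowsPath


def build_target(dest_root, picture_path):
    """Turn a backslash-separated relative path into a native path under
    dest_root and create its parent folders. Returns the target file path."""
    parts = PureWindowsPath(picture_path).parts  # ('AB', 'C', 'D', 'E_F_G', 'H', 'I.jpg')
    target = Path(dest_root).joinpath(*parts)
    # parents=True creates every missing level; exist_ok=True ignores folders already there
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

For example, build_target("path_to_destination", r"AB\C\D\E_F_G\H\I.jpg") creates AB/C/D/E_F_G/H under the destination and returns the full path ending in I.jpg, which can be passed straight to shutil.copy as the destination.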
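Finally, you can use the shutil module to copy the picture file to the destination under its new name, like this:
import os
import shutil
# src_path and original_name are placeholders for wherever the original picture was found
src_file = os.path.join(src_path, original_name)
dst_file = os.path.join(dest_path, picture_path)
shutil.copy(src_file, dst_file)
Here, src_file is the full path to the original picture file and dst_file is the full path to the destination. The shutil.copy function copies the file from the source to the destination; because dst_file is built from the new picture_path, the copy also gets the new name. The destination folders must already exist, which is what the os.makedirs call above takes care of.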
You can use os.walk() for recursive search of files:
import os
dir_xml = r"path"
for root, dirs, files in os.walk(dir_xml):  # topdown=False
    for names in files:
        if names.endswith(".xml"):
            full_path = os.path.join(root, names)  # names alone is relative to root, not to the cwd
            print(f"file path: {root}\n XML-Files: {names}")
            with open(full_path, 'r') as file:
                content = file.readlines()
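The same recursive search can be written with pathlib, whose rglob yields full paths directly, so there is no root/name joining to get wrong. A sketch; iter_xml_contents is a hypothetical helper, and UTF-8 is an assumption about the XML files' encoding:

```python
from pathlib import Path


def iter_xml_contents(root_dir):
    """Yield (path, text) for every .xml file under root_dir, including subfolders."""
    for xml_path in sorted(Path(root_dir).rglob("*.xml")):
        if xml_path.is_file():
            yield xml_path, xml_path.read_text(encoding="utf-8")


# for path, content in iter_xml_contents(r"path"):
#     get_filename(content)
```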
So, I made this little "application" that checks whether a file ends with a specific extension (.png or .jpg). My issue is that if I turn it into a loop and download something while the loop is running, it won't move the file to the intended location. It only moves files that were already there at startup.
import os
import shutil
DownloadsDir = ""
Downloadslst = os.listdir(DownloadsDir)
ImageFolder = ''
while True:
    for files in Downloadslst:
        if files.endswith(('.png','.jpg')):
            shutil.move(DownloadsDir + files, ImageFolder)
            print("File moved successfully.")
os.listdir(...) is a one-time operation that lists all the files in the current directory at the point of calling. The collection is fixed, not dynamically updated; it's just a simple list. And lists don't update themselves based on a seemingly random condition, like when the files inside a directory change. If you want your list to stay updated, you need to call the function multiple times.
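A short demonstration of that point, using a temporary folder so it can be run anywhere: the list captured before a file is created stays the same, while a fresh call sees the new file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    snapshot = os.listdir(folder)  # taken once, while the folder is still empty
    with open(os.path.join(folder, "new.png"), "wb"):
        pass  # create an empty file after the snapshot was taken
    fresh = os.listdir(folder)  # a second call sees the current contents
    print(snapshot)  # []
    print(fresh)     # ['new.png']
```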
I would do:
import os
import shutil
processed = set()
to_process = set()
# has_smth_to_download and download_some_files() are placeholders for your own logic
while has_smth_to_download:
    download_some_files()
    for item in os.listdir(DownloadsDir):
        if item not in processed:
            to_process.add(item)
            processed.add(item)
    for files in to_process:
        if files.endswith(('.png', '.jpg')):
            shutil.move(os.path.join(DownloadsDir, files), ImageFolder)
            print("File moved successfully.")
    to_process.clear()
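Try this:
import os
import shutil
import time
DownloadsDir = ""  # set this to your Downloads folder
ImageFolder = ''   # set this to the destination folder
while True:
    Downloadslst = os.listdir(DownloadsDir)  # re-read the folder on every pass
    for files in Downloadslst:
        if files.endswith(('.png', '.jpg')):
            shutil.move(os.path.join(DownloadsDir, files), ImageFolder)
            print("File moved successfully.")
    time.sleep(1)  # pause between passes instead of spinning the CPU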
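Building on the answers above, here is a sketch that keeps the "list the folder again on every pass" idea, joins paths properly, and matches extensions case-insensitively, so that photo.JPG is moved too. The function names move_images_once and watch, the one-second interval and the extension set are assumptions. A browser's temporary download files (for example names ending in .crdownload or .part) are left alone simply because they don't end in an image extension.

```python
import shutil
import time
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def move_images_once(downloads_dir, image_dir):
    """Move every image currently in downloads_dir into image_dir.

    Returns the sorted names of the files that were moved.
    """
    moved = []
    # list(...) takes a fresh listing on every call and finishes it before any move
    for entry in list(Path(downloads_dir).iterdir()):
        if entry.is_file() and entry.suffix.lower() in IMAGE_EXTENSIONS:
            shutil.move(str(entry), str(Path(image_dir) / entry.name))
            moved.append(entry.name)
    return sorted(moved)


def watch(downloads_dir, image_dir, interval=1.0):
    """Poll forever, sleeping between passes so the loop doesn't spin the CPU."""
    while True:
        for name in move_images_once(downloads_dir, image_dir):
            print(f"Moved {name}")
        time.sleep(interval)
```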
I've managed to find out the method to convert a file from one file extension to another (.evtx to .xml) using an external script. Below is what I am using:
os.system("file_converter.py file1.evtx > file1.xml")
This successfully converts a file from .evtx to .xml using the external script I called (file_converter.py).
I am now looking for a way to use os.system, or perhaps another method, to convert more than one file at once: I would like my program to go into a folder and convert all 10 files in it to .xml format.
My questions are how this is possible, given that os.system only takes one argument, and how to make it look inside a directory: the first file I converted was in my home directory, but the folder with the 10 files is inside another folder. I would like the conversion to be done in one go, and each output file to keep the same name as its input, with only the extension changed from '.evtx' to '.xml'.
The file "file_converter.py" is downloadable from here
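import threading
import os
def file_converter(file):
    # quote the paths so names containing spaces survive the shell
    os.system('file_converter.py "{0}" > "{1}"'.format(file, file.replace(".evtx", ".xml")))
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
for file in os.listdir(base_dir):
    if file.endswith(".evtx"):
        # os.listdir returns bare names, so join each one with the folder
        threading.Thread(target=file_converter, args=(os.path.join(base_dir, file),)).start()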
Here is my sample code.
You can start multiple threads to run the operation "concurrently". The program checks all the .evtx files in the directory and converts them.
EDIT python2.7 version
Now that we have more information about what you want I can help you.
This program can handle multiple files concurrently from one folder, and it also checks the subfolders.
import subprocess
import os
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
commands_to_run = list()
#Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        #if it is a directory, search it for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            #check that the file has the right extension and queue the command to execute later
            if(entry.endswith(".evtx")):
                allFiles.append(fullPath)
                commands_to_run.append('C:\\Python27\\python.exe file_converter.py "{0}" > "{1}"'.format(fullPath, fullPath.replace(".evtx", ".xml")))
    return allFiles
print "Searching for files"
file_list(base_dir)
print "Running conversion"
processes = [subprocess.Popen(command, shell=True) for command in commands_to_run]
print "Waiting for converted files"
for process in processes:
    process.wait()
print "Conversion done"
The subprocess module can be used in two ways:
subprocess.Popen: it starts the process and continues execution without waiting for it.
subprocess.call: it starts the process and waits for it to finish, then returns the exit status. A value of zero indicates that the process terminated successfully.
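For Python 3, here is a sketch of the same Popen-then-wait pattern without shell=True: each process writes directly into its own .xml file through the stdout argument, so no > redirection is needed and paths with spaces need no quoting. It assumes file_converter.py prints the XML to standard output (as the > in the question implies) and is run with the current interpreter; run_converters is a hypothetical name.

```python
import subprocess
import sys
from pathlib import Path


def run_converters(evtx_files, script="file_converter.py"):
    """Start one converter process per .evtx file, then wait for all of them.

    Each process's standard output goes straight into a sibling .xml file.
    Returns a dict mapping each input path to its exit status (0 = success).
    """
    jobs = []
    for src in map(Path, evtx_files):
        out = open(src.with_suffix(".xml"), "wb")
        # A list of arguments (no shell) avoids quoting problems with spaces.
        proc = subprocess.Popen([sys.executable, script, str(src)], stdout=out)
        jobs.append((src, proc, out))

    results = {}
    for src, proc, out in jobs:
        results[src] = proc.wait()  # blocks until this converter finishes
        out.close()
    return results


# Example: convert every .evtx file below base_dir, subfolders included
# base_dir = Path(r"C:\Users\carlo.zanocco\Desktop\test_dir")
# print(run_converters(base_dir.rglob("*.evtx")))
```

Every file gets its own process at the same time, which is fine for ten files; for hundreds, a bounded pool (for example concurrent.futures.ThreadPoolExecutor calling subprocess.run) would be gentler on the machine.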
EDIT python3.7 version
If you want to solve all your problems, just integrate the code you shared from GitHub into your program. It is easy to wrap it in a function.
import threading
import os
import Evtx.Evtx as evtx
import Evtx.Views as e_views
base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
def convert(file_in, file_out):
    tmp_list = list()
    with evtx.Evtx(file_in) as log:
        tmp_list.append(e_views.XML_HEADER)
        tmp_list.append("<Events>")
        for record in log.records():
            try:
                tmp_list.append(record.xml())
            except Exception as e:
                print(e)
        tmp_list.append("</Events>")
    with open(file_out, 'w') as final:
        final.writelines(tmp_list)
#Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        #if it is a directory, search it for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            #check that the file has the right extension and convert it in a new thread
            if(entry.endswith(".evtx")):
                allFiles.append(fullPath)
                threading.Thread(target=convert, args=(fullPath, fullPath.replace(".evtx", ".xml"))).start()
    return allFiles
print("Searching and converting files")
file_list(base_dir)
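If you want to see the output files grow while they are being generated, write each record straight to the file instead of collecting them first, by changing convert as follows: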
def convert(file_in, file_out):
    with evtx.Evtx(file_in) as log:
        # 'w' starts a fresh file; 'a' would append a second copy on a re-run
        with open(file_out, 'w') as final:
            final.write(e_views.XML_HEADER)
            final.write("<Events>")
            for record in log.records():
                try:
                    final.write(record.xml())
                except Exception as e:
                    print(e)
            final.write("</Events>")
UPDATE
If you want to delete the '.evtx' files after the conversion, you can simply add the following lines at the end of the convert function:
try:
    os.remove(file_in)
except OSError as ex:
    print(ex)
A plain try ... except is enough here, because the thread is only started for paths that are files.
If the file doesn't exist, this function throws an exception, so it's necessary to check os.path.isfile() first.
import os, sys
# take the folder from the command line if one is given, otherwise use a default
DIR = sys.argv[1] if len(sys.argv) > 1 else "D:/Test"
for f in os.listdir(DIR):
    path = os.path.join(DIR, f)
    name, ext = os.path.splitext(f)
    if ext == ".txt":
        new_path = os.path.join(DIR, f"{name}.xml")
        os.rename(path, new_path)
This iterates over a directory and renames every .txt file to .xml. Note that only the extension changes; the file contents are not converted.
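A variant of the same idea with pathlib, sketched under the assumption that existing files must not be overwritten: os.rename raises an error on Windows but silently replaces the target on POSIX systems when it already exists, so this version (the name change_extension is illustrative) skips those files instead. As above, only the name changes, not the contents.

```python
from pathlib import Path


def change_extension(directory, old_ext=".txt", new_ext=".xml"):
    """Rename every file in `directory` ending in old_ext so it ends in new_ext.

    Returns (renamed, skipped): new names created, and original names skipped
    because the target name was already taken.
    """
    renamed, skipped = [], []
    # sorted(...) builds the full listing before any file is renamed
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix != old_ext:
            continue
        target = path.with_suffix(new_ext)
        if target.exists():
            skipped.append(path.name)
        else:
            path.rename(target)
            renamed.append(target.name)
    return renamed, skipped
```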
I have to check in a while loop. I found the following code, which lists the files, but without a while loop:
for file in glob.glob("*.txt"):
    print(file)
but it does not work in a while loop if I use the following code:
a = os.listdir(my_path)
#print(a)
for file in glob.glob("*.txt"):
    print(file)
I am trying to write a watcher-type loop in Python using the os module:
import os
import time
# path and dosomething() are placeholders for your folder and your handler
while True:
    f = os.listdir(path)
    if len(f) > 0:
        for i in f:
            if i.endswith('.txt'):
                continue
            else:
                dosomething()
    time.sleep(5)  # so the loop checks the files every 5 seconds
This way you only use the os module; glob isn't needed.
use os.listdir to get all files in a folder
len to get the total number of files
a generator expression with sum and str.endswith to count the files ending with ".txt"
Demo:
import os
a = os.listdir(my_path)
if len(a) == sum(1 for i in a if i.endswith(".txt")):
    print("All Text")
import os
dr = os.listdir(my_path)
if len(dr) == len([f for f in dr if os.path.splitext(f)[1] == '.txt']):
    # do stuff
    pass
Well, from what I understand, you want a watcher service that tells you when a new file is created.
import glob
import time
files = glob.glob("*.txt")  # check all the files once at startup
while True:  # until you exit
    time.sleep(1)  # poll once a second instead of spinning the CPU
    current = glob.glob("*.txt")
    new_files = set(current) - set(files)  # present now but not in the previous check
    if new_files:
        print(*sorted(new_files), sep="\n")  # print each new file on its own line
    files = current
It reports the new files created in that directory (the current working directory, since the pattern contains no folder).
With some tweaks, you can also get the files that were deleted.
Hope it helps.
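Detecting deleted files amounts to comparing two snapshots as sets rather than by count, which also avoids relying on glob returning files in any particular order. A minimal sketch; diff_snapshots and watch_txt are hypothetical helper names:

```python
import glob
import time


def diff_snapshots(old, new):
    """Return (created, deleted) file names between two directory listings."""
    old, new = set(old), set(new)
    return sorted(new - old), sorted(old - new)


def watch_txt(pattern="*.txt", interval=1.0):
    """Print created and deleted files matching `pattern`, forever."""
    previous = glob.glob(pattern)
    while True:
        time.sleep(interval)
        current = glob.glob(pattern)
        created, deleted = diff_snapshots(previous, current)
        for name in created:
            print("created:", name)
        for name in deleted:
            print("deleted:", name)
        previous = current
```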
To use os instead of glob, replace the glob.glob calls with the following:
import os
files = [file for file in os.listdir() if file.endswith(".txt")]
Example of PDF: "Smith#00$Consolidated_Performance.pdf"
The goal is to add a bookmark to page 1 of each PDF based on the filename.
(Bookmark name in example would be "Consolidated Performance")
import os
from openpyxl import load_workbook
from PyPDF2 import PdfFileMerger
cdir = "Directory of PDF" # Current directory
pdfcdir = [filename for filename in os.listdir(cdir) if filename.endswith(".pdf")]
def addbookmark(f):
    output = PdfFileMerger()
    name = os.path.splitext(os.path.basename(f))[0] # Split filename from .pdf extension
    dp = name.index("$") + 1 # Find position of $ sign
    bookmarkname = name[dp:].replace("_", " ") # replace underscores with spaces
    output.addBookmark(bookmarkname, 0, parent=None) # Add bookmark
    output.append(open(f, 'rb'))
    output.write(open(f, 'wb'))
for f in pdfcdir:
    addbookmark(f)
for f in pdfcdir:
addbookmark(f)
The UDF works fine when applied to individual PDFs, but it won't add the bookmarks when put into the loop at the bottom of the code. Any ideas on how to make the UDF loop through all PDFs within pdfcdir?
I'm pretty sure that the issue you're having has nothing to do with the loop. Rather, you're passing just the file names without the directory path, so the files are looked up in the script's current working directory (the directory you launched the script from, which is not necessarily where the script or the PDFs are) rather than in the directory you read the file names from.
So, join the directory name with each file name when calling your function.
for f in pdfcdir:
    addbookmark(os.path.join(cdir, f))
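To keep the two concerns apart, the name parsing and the path building can be pulled into small functions that are easy to check without touching any PDF. A sketch, assuming the bookmark text always follows the first $ in the name, as in the example; bookmark_name and pdf_paths are illustrative names, and the actual bookmarking would still be done by addbookmark from the question:

```python
import os
from pathlib import Path


def bookmark_name(filename):
    """'Smith#00$Consolidated_Performance.pdf' -> 'Consolidated Performance'."""
    stem = Path(filename).stem             # drop the directory and the .pdf extension
    head, sep, tail = stem.partition("$")  # split at the first $ sign
    if not sep:
        raise ValueError(f"no '$' in {filename!r}")
    return tail.replace("_", " ")


def pdf_paths(directory):
    """Full paths (not bare names) of the .pdf files in `directory`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".pdf")
    )


# for path in pdf_paths(cdir):
#     addbookmark(path)
```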