I am using an observer to watch a directory structure (project directory). My application uses a wx.TreeCtrl to display a tree of the watched directory. So far everything works on both Windows and Linux: newly created files in the watched directory are added to the wx.TreeCtrl, deleted files are removed, renamed files are renamed, etc.
The user has the option to select another directory to watch (read: project directory). This then calls the function DrawDocumentTree, which will populate the wx.TreeCtrl and update the observer. Following this, the observer happily watches the newly selected directory and performs its tasks accordingly. Again, this works on Windows and Linux.
My code gets stuck (program not responding) under Windows when the entire watched directory is deleted. Following the deletion, the previous project directory is selected and the DrawDocumentTree function is called. With the code as I have it now, this only works on Linux. Here is what the DrawDocumentTree function looks like.
def DrawDocumentTree(self, fullpath):
    logger = init_logger('Common')
    logger.debug('Drawing document tree')
    self.document_tree.SetFocus()
    self.document_tree.DeleteAllItems()
    self.document_tree.Refresh()
    root = fullpath
    self.ids = {root: self.document_tree.AddRoot(root, self.folderidx)}
    for dirpath, dirnames, filenames in os.walk(root):
        for dirname in dirnames:
            fullpath = os.path.join(dirpath, dirname)
            file_type = 'directory'
            self.ids[fullpath] = self.document_tree.AppendItem(self.ids[dirpath], dirname, get_icon(self, fullpath, file_type))
            self.document_tree.SetItemData(self.ids[fullpath], {'type': file_type})
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            file_type = get_filetype(fullpath)
            self.ids[fullpath] = self.document_tree.AppendItem(self.ids[dirpath], filename, get_icon(self, fullpath, file_type))
            self.document_tree.SetItemData(self.ids[fullpath], {'type': file_type})
    try:
        # ~ logger.debug('Stop observer')
        # ~ self.observer.stop()
        # ~ logger.debug('Join observer')
        # ~ self.observer.join()
        # ~ logger.debug('Start observer')
        # ~ self.observer.start()
        # ~ logger.debug('Unschedule all watchers')
        # ~ self.observer.unschedule_all()
        logger.info('Set observer to watch project directory: ' + root)
        self.observer.schedule(self.event_handler, root, recursive=True)
    except Exception:
        logger.exception('Unable to watch the project directory.')
    logger.debug('Done drawing document tree')
You can see towards the bottom of the function that I have tried various things, like stopping, joining and then starting the observer. None of this works. If anything, it makes it stop working under Linux.
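Worth noting: watchdog's Observer subclasses threading.Thread, and a thread can never be start()ed again after stop()/join(). A minimal sketch of the usual alternative, creating a fresh Observer each time (the method name is mine; self.observer and self.event_handler are from the code above):
from watchdog.observers import Observer

def RestartObserver(self, root):
    # A joined Thread can never be restarted, so discard the old observer...
    if self.observer.is_alive():
        self.observer.stop()
        self.observer.join()
    # ...and build a brand-new one for the new root.
    self.observer = Observer()
    self.observer.schedule(self.event_handler, root, recursive=True)
    self.observer.start()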
When running on Windows, the last entry in the log is Set observer to watch project directory: xyz (where xyz is the path of the directory to be watched). The error message from the except block never makes it to the log.
Another thing I have tried: the moment the user deletes the project directory, I immediately (before the current project directory is actually deleted) tell the observer to watch a temporary directory instead. This was an attempt to prevent the observer from crashing due to watching a directory that no longer exists. Once everything has been deleted, the DrawDocumentTree function is called and attempts to point the observer at the then-current project directory.
I am running this code with Python 3.9.0 and watchdog 2.1.9 (on both Windows and Linux).
Any ideas on what I can do to ensure this code works on Windows like it does on Linux?
Thank you.
Marc.
I am fairly new to programming and currently working on a Python script. It is supposed to gather all the files and directories that are given as paths inside the program and copy them to a new location that the user chooses via an input prompt.
import shutil
import os
from pathlib import Path
import argparse

src = [
    ["<name of destination sub-directory>", "<path of file/directory to copy>"],
]

x = input("Please choose a destination path\n>>>")
if not os.path.exists(x):
    os.makedirs(x)
    print("Directory was created")
else:
    print("Existing directory was chosen")
dest = Path(x.strip())

for pfad in src:
    if os.path.isdir(pfad[1]):
        shutil.copytree(pfad[1], dest / pfad[0])
    elif os.path.isfile(pfad[1]):
        pfad1 = Path(dest / pfad[0])
        if not os.path.exists(pfad1):
            os.makedirs(pfad1)
        shutil.copy(pfad[1], dest / pfad[0])
    else:
        print("An error occurred")
        print(pfad)

print("All files and directories have been copied!")
input()
The script itself is working just fine. The problem is that I want to write a test that automatically tests the code each time I push it to my GitLab repository. I have been browsing the web for quite some time now but wasn't able to find a good explanation of how to approach creating a test for a script like this.
I would be extremely thankful for any kind of feedback or hints to helpful resources.
First, you should write a test that you can run from the command line.
I suggest you use the argparse module to pass the source and destination directories, so that you can run the script as script.py source_dir dest_dir without human interaction.
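A minimal sketch of that suggestion (the argument names are mine); it replaces the input() prompt so the script can run unattended:
import argparse

parser = argparse.ArgumentParser(description='Copy the listed files/directories to a destination')
parser.add_argument('source', help='source directory')
parser.add_argument('dest', help='destination directory')
args = parser.parse_args()

x = args.dest  # use this in place of input("Please choose a destination path\n>>>")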
Then, as you have a test you can run, you need to add a .gitlab-ci.yml to the root of the project so that you can use the GitLab CI.
If you have never used the GitLab CI, you should start here: https://docs.gitlab.com/ee/ci/quick_start/
After that, you'll be able to add a job to your .gitlab-ci.yml so that a runner with Python installed will run the test. If you don't understand the terms in the previous sentence, you need to understand GitLab CI first.
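A minimal sketch of such a job, assuming the script is named script.py and that test_source/test_dest exist in the repository (both are assumptions):
# .gitlab-ci.yml -- sketch only; the image tag and paths are assumptions
test:
  image: python:3.9
  script:
    - python script.py test_source test_dest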
I can't figure out why I'm getting a PermissionError when trying to clean up some temporary pdf files that are no longer needed.
My script downloads a bunch of single page pdfs into a /temp folder, then uses PyMuPDF to merge them into a single pdf. At the end of the script, when the merged file has been created, a cleanup function is supposed to move the pdfs from the temp folder to another folder so I can delete the temp folder. It's when everything else is done, at the end, that I get the PermissionError when trying to move the temp files.
I tried two methods to generate the pdf without leaving files open at the end: one as per the fitz wiki, using open() and then close(), and the other using with to ensure that nothing is left open unintentionally. I included a simplification of what I'm trying to do, which results in exactly the same PermissionError. Both methods are in there and can be tried by commenting out either one of the method calls at the bottom. It is available, with the folders and files as used in the script, on my GitHub. The script assumes some things to be present, as defined in the __init__ of the PdfOut class:
import os, fitz, time

class PdfOut:
    def __init__(self):
        cwd = os.getcwd()
        # 3 pdf files exist in the /temp folder
        self.files = ['pdf1.pdf', 'pdf2.pdf', 'pdf3.pdf']
        self.dir_in = os.path.join(cwd, 'temp')
        # /archive directory exists - this is where the composite pdf will be saved
        self.dir_out = os.path.join(cwd, 'archive')
        # /raw directory exists - this is where the single page pdfs must be moved at the end of the script
        self.dir_store = os.path.join(cwd, 'raw')
        self.bookmarks = ['file1', 'file2', 'file3']
        self.file_out = "Combined_File.pdf"

    def writePDFusingClose(self):
        composite_pdf = fitz.open()
        for f in self.files:
            new_page = fitz.open(os.path.join(self.dir_in, f))
            composite_pdf.insert_pdf(new_page)
            new_page.close()
        new_toc = []
        page_count = 1
        for item in self.bookmarks:
            entry = [1, item, page_count]
            new_toc.append(entry)
            page_count += 1
        composite_pdf.set_toc(new_toc)
        composite_pdf.save(os.path.join(self.dir_out, self.file_out), deflate=True, garbage=3)
        composite_pdf.close()

    def writePDFusingWith(self):
        with fitz.open() as composite_pdf:
            for f in self.files:
                with fitz.open(os.path.join(self.dir_in, f)) as new_page:
                    composite_pdf.insert_pdf(new_page)
            new_toc = []
            page_count = 1
            for item in self.bookmarks:
                entry = [1, item, page_count]
                new_toc.append(entry)
                page_count += 1
            composite_pdf.set_toc(new_toc)
            composite_pdf.save(os.path.join(self.dir_out, self.file_out), deflate=True, garbage=3)

    def cleanUp(self):
        for file_name in os.listdir(self.dir_in):
            os.replace(os.path.join(self.dir_in, file_name), os.path.join(self.dir_store, file_name))
        os.rmdir(self.dir_in)

new_file = PdfOut()
new_file.writePDFusingClose()
# new_file.writePDFusingWith()
# time.sleep(10)
new_file.cleanUp()
As you can see, I even tried putting in a 10 sec delay to allow for any scanning or system background operations to finish, but it didn't make a difference. In fact, I tried manually deleting the files in Windows Explorer while the 10 sec delay was ticking, and it told me the file was locked by Python (so not by some other system process). This leads me to believe PyMuPDF/fitz somehow keeps those files open in the Python process, even though the use of with should cause it to relinquish the files once that specific operation is completed.
This is the error message it generates:
Traceback (most recent call last):
  File "d:\GitHub\TestPDFmergeandclean\main.py", line 52, in <module>
    new_file.cleanUp()
  File "d:\GitHub\TestPDFmergeandclean\main.py", line 45, in cleanUp
    os.replace(os.path.join(self.dir_in, file_name), os.path.join(self.dir_store, file_name))
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\GitHub\\TestPDFmergeandclean\\temp\\pdf1.pdf' -> 'D:\\GitHub\\TestPDFmergeandclean\\raw\\pdf1.pdf'
Everything works as expected: the combined pdf is generated, with the ToC, in the folder where it's supposed to go; it's just the cleanup of the temp folder that fails. For the life of me I can't find anything in the PyMuPDF documentation about forcibly closing docs other than the use of .close() ...
Does anybody have an idea what I'm doing wrong, or another suggestion for achieving the cleanup of the temp folder?
EDIT:
Once the main script completes I can manually move/delete the pdfs, indicating that they are indeed relinquished by Python when the script finishes. But that's kind of the point of my question: why can't I get Python to relinquish the files without having to end main.py and run another script? In my project I tried moving the cleanUp method into the main.py script and elsewhere, to separate it from output.py (where the merged pdf is created); that didn't solve the issue, unfortunately.
If you're interested you can see the full setup on my github (https://github.com/flyingbelgian/AU_AIP_crawler/tree/CombinePDF). In the project you will see that I also create temporary html files which are then moved to another folder without issue, even when cleanup was in the same .py as the creation of the temp html files. It's only the pdfs touched by PyMuPDF that appear to remain open even after calling .close() on them.
EDIT 2:
I added a more explicit
print("All methods completed, starting 20sec sleep")
time.sleep(20)
before the final cleanUp call, allowing me to check if the files are relinquished by PyMuPDF when all the pdf handling is completed.
This confirms that the files are being held open by Python and not by some other Windows process.
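One workaround worth trying (a sketch, under the assumption that fitz is what holds the handles; the method name is mine): read each source pdf into memory first and hand fitz the bytes, so the file on disk is opened and closed by plain Python and never locked by fitz.
import os, fitz

def writePDFfromMemory(self):
    composite_pdf = fitz.open()
    for f in self.files:
        path = os.path.join(self.dir_in, f)
        with open(path, 'rb') as fh:  # this handle is closed here, by us
            data = fh.read()
        # fitz only ever sees the in-memory copy of the file
        src = fitz.open(stream=data, filetype='pdf')
        composite_pdf.insert_pdf(src)
        src.close()
    composite_pdf.save(os.path.join(self.dir_out, self.file_out), deflate=True, garbage=3)
    composite_pdf.close()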
I've got this algorithm to generate an MPTT (modified preorder tree traversal) from my folder structure:
https://gist.github.com/unbracketed/946520
Found it on GitHub, and it works perfectly for my needs. Currently I have a requirement to add the functionality of skipping some folders in the tree. For example, I want to skip everything in/under /tmp/A/B1/C2, so my tree will not contain anything from C2 (including C2 itself).
I'm not so useless in Python, so I've created this check (and passed an extra ignore list to the function):
def is_subdir(path, directory):
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    relative = os.path.relpath(path, directory)
    return not relative.startswith(os.pardir + os.sep)
Now we can add, somewhere:
for single in ignorelist:
    if fsprocess.is_subdir(node, single):
But my question is: where do I stick this in the function? I've tried putting it at the top with a return inside the if, but that exits my whole application. The function invokes itself recursively, so I'm pretty lost.
Any good advice? I've tried to contact the script's creator on GitHub, but the gist has no owner. Really good job with this algorithm; it saved me a lot of time and it's perfect for our project requirements.
def generate_mptt(root_dir):
    """
    Given a root directory, generate a calculated MPTT
    representation for the file hierarchy
    """
    for root, dirs, _ in os.walk(root_dir):
        # Your check should go here:
        if any(is_subdir(root, path) for path in ignorelist):
            del dirs[:]  # don't descend
            continue
        dirs.sort()
        tree[root] = dirs
    preorder_tree(root_dir, tree[root_dir])
    mptt_list.sort(key=lambda x: x.left)
Sort of like that, assuming that is_subdir(root, path) returns True if root is a subdirectory of path.
I'm trying to create a directory using os.mkdir() or os.makedirs() as follows:
if not os.path.exists(directory):
    os.mkdir(directory)
This code runs fine, but I can see no directory created at the 'directory' path.
If I only write:
os.mkdir(directory)
it gives an error message saying that the directory already exists.
Try the following for a little more robust handling -- similar to how mkdir -p works on Linux:
def _mkdir(_dir):
    if os.path.isdir(_dir):
        pass
    elif os.path.isfile(_dir):
        raise OSError("%s exists as a regular file." % _dir)
    else:
        parent, directory = os.path.split(_dir)
        if parent and not os.path.isdir(parent):
            _mkdir(parent)
        if directory:
            os.mkdir(_dir)
If you try making a dir over an existing regular file, it complains; otherwise it just makes sure the dir exists.
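On Python 3.2 and later, the standard library covers the same case directly, which may be all that's needed here:
import os

os.makedirs(directory, exist_ok=True)  # like mkdir -p: creates parents, no error if the directory already exists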
I chrooted into a directory using the following command:
os.chroot("/mydir")
How do I return to the previous directory, from before chrooting?
Is it perhaps possible to un-chroot?
SOLUTION:
Thanks to Phihag, I found a solution. Simple example:
import os
os.mkdir('/tmp/new_dir')
dir1 = os.open('.', os.O_RDONLY)
dir2 = os.open('/tmp/new_dir', os.O_RDONLY)
os.getcwd()  # we are in '/tmp'
os.chroot('/tmp/new_dir')  # chroot into the 'new_dir' directory
os.fchdir(dir2)
os.getcwd()  # we are in the chrooted directory, but the path shows '/'; that's OK
os.fchdir(dir1)
os.getcwd()  # we are back in the non-chrooted '/tmp' directory
os.close(dir1)
os.close(dir2)
If you haven't changed your current working directory, you can simply call
os.chroot('../..') # Add '../' as needed
Of course, this requires the CAP_SYS_CHROOT capability (usually only given to root).
If you have changed your working directory, you can still escape, but it's harder:
os.mkdir('tmp')
os.chroot('tmp')
os.chdir('../../') # Add '../' as needed
os.chroot('.')
If chroot changes the current working directory, you can get around that by opening the directory, and using fchdir to go back.
Of course, if you intend to go out of a chroot in the course of a normal program (i.e. not a demonstration or security exploit), you should rethink your program. First of all, do you really need to escape the chroot? Why can't you just copy the required info into it beforehand?
Also, consider using a second process that stays outside of the chroot and answers requests from the chrooted one.
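A minimal sketch of that second-process pattern on a Unix system (the request format is made up, and a real helper would answer back over a second pipe):
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: confined to the chroot, can only ask the parent for outside data.
    os.close(r)
    os.chroot('/mydir')  # requires CAP_SYS_CHROOT
    os.chdir('/')
    os.write(w, b'need:/etc/hostname\n')
    os.close(w)
    os._exit(0)
else:
    # Parent: stays outside the chroot and services the child's requests.
    os.close(w)
    request = os.read(r, 1024)
    print('chrooted child asked for:', request.decode())
    os.close(r)
    os.waitpid(pid, 0)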