I want to run a script to check whether certain files in my Dropbox folder have changed. I am currently using os.path.getmtime() to check that the modified time falls within some window of time.time(). The problem is that if I modify a file in my Dropbox folder from a different computer than the one the script runs on, the modified time does not change on the script's computer. Is there a good way to watch shared files that doesn't run into this problem?
Thanks for any help! I am just getting into python.
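For reference, the mtime-window check described above looks roughly like this; the path and window size are hypothetical:

import os
import time

WINDOW = 60  # seconds; whatever interval the script polls at
path = '/home/user/Dropbox/somefile.txt'  # hypothetical path

if time.time() - os.path.getmtime(path) < WINDOW:
    print('modified within the window')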
*******UPDATE*******
I have been playing more with how Dropbox handles file timestamps. It only updates the mtime if the file's contents actually change: if you open a file and save it without changing anything, the mtime stays the same.
It looks like Dropbox preserves mtime when synchronizing files. Try detecting changed files by file size and/or checksum (MD5, SHA-1, or similar) instead of modification time, for example as sketched below. Or just ask Dropbox :) (I don't know if it has an API for this).
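A minimal sketch of the checksum approach using the standard-library hashlib; the path and the load_saved_checksum() helper are hypothetical:

import hashlib

def file_md5(path):
    # Hash the file in chunks so large files don't have to fit in memory.
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            md5.update(chunk)
    return md5.hexdigest()

# Compare against a checksum saved from the previous run.
previous = load_saved_checksum()  # hypothetical helper; persist it however you like
if file_md5('/home/user/Dropbox/somefile.txt') != previous:
    print('file changed')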
Related
I'm trying to write a Python script that runs on Windows. Files are copied to a folder every few seconds, and I'm polling that folder every 30 seconds for names of the new files that were copied to the folder after the last poll.
What I have tried is to use one of the os.path.getXtime(folder_path) functions and compare that with the timestamp of my previous poll. If the getXtime value is larger than the timestamp, then I work on those files.
I have tried the function os.path.getctime(folder_path), but that didn't work because the files were created before I wrote the script. I tried os.path.getmtime(folder_path) too, but the modified times are usually earlier than the poll timestamp.
Finally, I tried os.path.getatime(folder_path), which works the first time the files are copied over. The problem is that I also read the files once they are in the folder, so the access time keeps getting updated and I end up reading the same files over and over again.
I'm not sure what a better way or function to do this would be.
You've got a bit of an XY problem here. You want to know when files in a folder change, you tried a home-rolled solution, it didn't work, and now you want to fix your home-rolled solution.
Can I suggest that instead of terrible hackery, you use an existing package designed to monitor for file changes? One that is not a polling loop, but actually gets notified of changes as they happen? While inotify is Linux-only, there are other options for Windows.
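One such cross-platform option is the third-party watchdog package (pip install watchdog); a minimal sketch, with a hypothetical folder path:

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Called via OS change notifications, not polling.
        if not event.is_directory:
            print('new file:', event.src_path)

observer = Observer()
observer.schedule(NewFileHandler(), r'C:\path\to\folder', recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)  # keep the main thread alive; events arrive on a worker thread
finally:
    observer.stop()
    observer.join()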
I have to copy files from a remote host to the local machine.
My current logic: generate a list of files on the remote host and take the latest modified timestamp as my watermark. On the next run, copy the files whose modified time is greater than the watermark, then update the watermark again. A sketch of this is below.
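For concreteness, the scheme looks roughly like this; list_remote_files() and copy_to_local() are hypothetical helpers:

watermark = 0.0  # highest mtime seen so far; persisted between runs in practice

for name, mtime in list_remote_files():  # yields (name, mtime) pairs
    if mtime > watermark:
        copy_to_local(name)
        watermark = mtime
# A file synced over with an mtime below the watermark never passes the comparison.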
Problem - the logic fails when files are copied to the remote host with their modified times preserved. If a file comes in with a modified time lower than the watermark, that file gets skipped.
Any idea how to circumvent the issue?
I don't want to build file lists on both the remote and local sides and compute the difference.
I have a simple problem that I hope will have a simple solution.
I am writing Python (2.7) code using the xlwt package to write Excel files. The program takes data and writes it out to a file that is being saved constantly. The problem is that whenever I have the file open to check the data and Python tries to save it, the program crashes.
Is there any way to make Python save the file while I have it open for reading?
My experience is that sashkello is correct, Excel locks the file. Even OpenOffice/LibreOffice do this. They lock the file on disk and create a temp version as a working copy. ANY program trying to access the open file will be denied by the OS. The reason for this is because many corporations treat Excel files as databases but the users have no understanding of the issues involved in concurrency and synchronisation.
I am on Linux and I get this behaviour (at least when the file is on a Samba share). Look in the same directory as your file: if a file called .~lock.[filename]# exists, then you will be unable to read your file from another program. I'm not sure what enforces this lock, but I suspect it's an NTFS attribute. Note that even a simple cp or cat fails: cp: error reading ‘CATALOGUE.ods’: Input/output error
UPDATE: The actual locking mechanism appears to be 'oplocks', a concept connected to Windows shares: http://oreilly.com/openbook/samba/book/ch05_05.html . If the share is managed by Samba, the workaround is to disable locks on certain file types, e.g.:
veto oplock files = /*.xlsx/
If you aren't using a share or NTFS on Linux, then I guess you should be able to read/write the file as long as your script has write permissions. By default, only the user who created the file has write access.
WORKAROUND 2: The restriction only seems to apply if you have the file open in Excel/LO as writable; however, LO at least allows you to open a file as read-only (go to File -> Properties -> Security, set Read-Only, save, and re-open the file). I don't know if this will also make it RO for xlwt, though.
Hah, funny I ran across your post. I actually just implemented this tonight.
The issue is that these Excel workbook objects either read or write, not both; you cannot read and write off the same object. So if you have another method to save your data, please use it. I'm in a position where I don't have an option, and so might you.
You're going to need xlutils; it's the bread and butter for this.
Here's some example code:
import xlrd
from xlutils.copy import copy

wb_filename = 'example.xls'
wb_object = xlrd.open_workbook(wb_filename)
# And then you can read this file to your heart's content.

# Now when it comes to writing, you need to copy the object and work off the copy.
write_object = copy(wb_object)
# Write to it all you want and then save that object, for example:
write_object.get_sheet(0).write(0, 0, 'new value')  # illustrative write
write_object.save(wb_filename)
And that's it: if you read the object, write to it, and then read the original one again, it won't be updated. You either need to recreate wb_object or keep some sort of table in memory that you track while working through it.
I have a folder called raw_files. Very large files (~100GB files) from several sources will be uploaded to this folder.
I need to get file information from videos that have finished uploading to the folder. What is the best way to determine whether a file is still being uploaded to the folder (skip it) or has finished uploading (run the script)? Thank you.
The most reliable way is to modify the uploading software if you can.
A typical scheme would be to first upload each file into a temporary directory on the same filesystem, and move to the final location when the upload is finished. Such a "move" operation is cheap and atomic.
A variation on this theme is to upload each file under a temporary name (e.g. file.dat.incomplete instead of file.dat) and then rename it. Your script will simply need to skip files called *.incomplete.
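A minimal sketch of that rename scheme; the paths and the process() helper are hypothetical:

import glob
import os

# Uploader side: write under a temporary name, then rename
# (atomic as long as source and destination are on the same filesystem).
os.rename('/srv/raw_files/video.dat.incomplete', '/srv/raw_files/video.dat')

# Consumer side: simply ignore anything still named *.incomplete.
for path in glob.glob('/srv/raw_files/*'):
    if not path.endswith('.incomplete'):
        process(path)  # hypothetical: extract the file information you need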
If you check those files periodically, store their sizes somewhere. If a file's size is still the same on the next round, you can pretty much consider it finished (depending on how much time passes between the first and second check). The interval could, for example, be set to the timeout interval of your uploading service (FTP, or whatever you use).
There is no special sign or content showing that a file is complete.
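A rough sketch of that size-stability poll; the folder path and interval are hypothetical:

import os
import time

folder = '/srv/raw_files'
sizes = {}  # path -> size seen on the previous round

while True:
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        size = os.path.getsize(path)
        if sizes.get(path) == size:
            print('probably finished:', path)  # unchanged since last round
        sizes[path] = size
    time.sleep(30)  # tune to your uploader's timeout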
I have a program that creates files in a specific directory.
When those files are ready, I run LaTeX to produce a .pdf file.
So, my question is: how can I use this directory change as a trigger
to call LaTeX, using a shell script or a Python script?
Best Regards
inotify replaces dnotify.
Why?
...dnotify requires opening one file descriptor for each directory that you intend to watch for changes...
Additionally, the file descriptor pins the directory, disallowing the backing device to be unmounted, which causes problems in scenarios involving removable media. When using inotify, if you are watching a file on a file system that is unmounted, the watch is automatically removed and you receive an unmount event.
...and more.
More Why?
Unlike its ancestor dnotify, inotify doesn't complicate your work with various limitations. For example, if you watch files on removable media, those files aren't locked. By comparison, dnotify requires the files themselves to be open and thus really "locks" them (hampering unmounting of the media).
Reference
Is dnotify what you need?
Make on Unix systems is usually used to track, by date, what needs rebuilding when files have changed. I normally use a rather good makefile for this job. There seems to be another alternative around on Google Code, too.
You not only need to check for changes, you need to know that all changes are complete before running LaTeX. For example, if you start LaTeX after the first file has been modified while more changes are still pending, you'll be using partial data and will have to re-run later.
Wait for your first program to complete:
#!/bin/bash
first-program &&
run-after-changes-complete
Using && means the second command is only executed if the first completes successfully (a zero exit code). Because this simple script will always run the second command even if the first doesn't change any files, you can incorporate this into whatever build system you are already familiar with, such as make.
Python FAM is a Python interface for FAM (File Alteration Monitor).
You can also have a look at Pyinotify, which is a module for monitoring file system changes.
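A minimal Pyinotify sketch for the LaTeX-trigger use case, with a hypothetical directory and build command:

import subprocess
import pyinotify

wm = pyinotify.WatchManager()

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Fires when a file opened for writing is closed, i.e. fully written.
        print('finished writing:', event.pathname)
        subprocess.call(['pdflatex', 'main.tex'])  # hypothetical build step

wm.add_watch('/path/to/watched/dir', pyinotify.IN_CLOSE_WRITE)
pyinotify.Notifier(wm, Handler()).loop()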
Not much of a Python man myself. But in a pinch, assuming you're on Linux, you could periodically shell out and run ls -lrt /path/to/directory (get the directory contents sorted by last-modified time), then compare the results of the last two calls for a difference. If they differ, there was a change. Not very detailed, but it gets the job done.
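A quick-and-dirty version of that idea (Linux only; the path and interval are hypothetical):

import subprocess
import time

def listing(path):
    # Directory contents sorted by modification time, oldest first.
    return subprocess.check_output(['ls', '-lrt', path])

previous = listing('/path/to/directory')
while True:
    time.sleep(30)
    current = listing('/path/to/directory')
    if current != previous:
        print('something changed')
    previous = current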
You can use the built-in Python module hashlib, which implements the MD5 algorithm:
import hashlib
import os

path = '/path/to/watched/dir'  # root of the tree to fingerprint

m = hashlib.md5()
for root, dirs, files in os.walk(path):
    for file_read in files:
        full_path = os.path.join(root, file_read)
        # Open in binary mode: hashlib needs bytes, not str.
        with open(full_path, 'rb') as f:
            for line in f:
                m.update(line)

print(m.digest())
# e.g. 'pQ\x1b\xb9oC\x9bl\xea\xbf\x1d\xda\x16\xfe8\xcf'
You can save this result in a file or a variable, and compare it to the result of the next run. This will detect changes in any files, in any sub-directory.
This does not take into account file permission changes; if you need to monitor those as well, this could be addressed by appending a string representing the permissions (accessible via os.stat, for instance; the attributes depend on your system) to the m variable.