How to copy only the newer files? Python script, Unix host

I have to copy files from a remote host to a local machine.
My current logic: generate a list of files on the remote host, take the latest modified timestamp as my watermark, and on the next run copy only files whose modified time is greater than the watermark, then update the watermark again.
Problem: the logic fails when files are copied to the remote host with their modified times preserved. If a file arrives with a modified time lower than the watermark, that file gets skipped.
Any idea how to circumvent the issue?
I don't want to build file lists on both remote and local and compute the difference.
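One way to sidestep the single watermark, sketched below, is to keep a small local manifest of what has already been copied (name and mtime), so a file that arrives later with an older mtime is still recognised as new. This is only a rough sketch, not the asker's actual setup: it assumes SFTP access via paramiko, and the host, credentials and paths are placeholders.

import json, os
import paramiko

MANIFEST = "copied_manifest.json"   # local record of what has been pulled so far
REMOTE_DIR = "/remote/path"         # placeholder
LOCAL_DIR = "/local/path"           # placeholder

def load_manifest():
    if os.path.exists(MANIFEST):
        with open(MANIFEST) as f:
            return json.load(f)
    return {}

def copy_new_files(host, user, password):
    manifest = load_manifest()
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=user, password=password)
    sftp = ssh.open_sftp()
    try:
        for attr in sftp.listdir_attr(REMOTE_DIR):
            # Copy if we have never seen the file, or if its mtime changed,
            # regardless of whether that mtime is older than any "watermark".
            if manifest.get(attr.filename) != attr.st_mtime:
                sftp.get(REMOTE_DIR + "/" + attr.filename,
                         os.path.join(LOCAL_DIR, attr.filename))
                manifest[attr.filename] = attr.st_mtime
    finally:
        sftp.close()
        ssh.close()
    with open(MANIFEST, "w") as f:
        json.dump(manifest, f)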

Related

How to capture the list of changed .c files when compiling?

The version of Eclipse I am using looks at the .o files produced during compilation and works out which source files have been updated since the last build, so that compile time is shortened to only what needs an update.
For the sake of change management, is there a way I can capture this list of updated files?
I want to:
1) Capture the list.
2) Go through the files one by one and copy the changes to a "backup" folder.
3) Create a script that will allow me to "revert" changes and then tell me which files were changed.
This way, if a mistake is made, I can go back to a working version and start over as necessary, or debug the code.
I think I can do parts 2 and 3; I just need help with part 1.
Edit 1 (April 2, 2019):
Regarding Git: I will try this, but it appears I may have to wait for approval from my company before installing it on my system. Furthermore, I want to ensure that such functionality is available to others in my company (on a local basis), even if they do not have Git installed (because, for whatever reason, it is not standard).
As such, I have found 3 possible options:
1) forfiles: has an option that finds all files modified after a certain date, and can be coupled with a copy command to copy each matching file to a desired folder. Con: it only filters by date; I found no reference to using the time of day as a filter to establish changes after a certain time.
2) Get-ChildItem: can be filtered to find only files changed after a certain time. The output can be parsed into a CSV of file names and paths, which makes it easy to establish which files in which paths were updated (and to downgrade those files later on). The same list can also be used to establish which files need to be copied over.
3) XCopy: can find all files modified after a certain date using the /D:MM-DD-YYYY switch, but I do not see a time-of-day option, which would mean I would have to go through the copied files afterwards and delete any modified before the desired time, keeping only those modified after it.
Source: Find files on Windows modified after a given date using the command line
At this time, of the 3 options listed, Get-ChildItem looks like the best option. I will update once I have decided.
Check out os.stat, which gives you st_mtime, to see when a file was last changed. You have to manage storing the last-seen modification times yourself and do your backup when they change. If you are lazy, there is something relatively solid on PyPI called watchdog.
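As a rough sketch of that idea (the paths, the .c filter and the state-file name are only placeholders), one can remember each file's st_mtime between runs and copy anything that changed to a backup folder:

import json, os, shutil

SRC = "src"              # placeholder: project source tree
BACKUP = "backup"        # placeholder: where changed files are copied
STATE = "mtimes.json"    # remembers st_mtime per file between runs

def changed_files():
    old = {}
    if os.path.exists(STATE):
        with open(STATE) as f:
            old = json.load(f)
    changed, current = [], {}
    for root, _, files in os.walk(SRC):
        for name in files:
            if not name.endswith(".c"):
                continue
            path = os.path.join(root, name)
            mtime = os.stat(path).st_mtime
            current[path] = mtime
            if old.get(path) != mtime:
                changed.append(path)
    with open(STATE, "w") as f:
        json.dump(current, f)
    return changed

for path in changed_files():
    dest = os.path.join(BACKUP, os.path.relpath(path, SRC))
    if not os.path.isdir(os.path.dirname(dest)):
        os.makedirs(os.path.dirname(dest))
    shutil.copy2(path, dest)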

Getting the time when a file is copied to a folder (Python)

I'm trying to write a Python script that runs on Windows. Files are copied to a folder every few seconds, and I'm polling that folder every 30 seconds for names of the new files that were copied to the folder after the last poll.
What I have tried is to use one of the os.path.getXtime(folder_path) functions and compare that with the timestamp of my previous poll. If the getXtime value is larger than the timestamp, then I work on those files.
I have tried to use the function os.path.getctime(folder_path), but that didn't work because the files were created before I wrote the script. I tried os.path.getmtime(folder_path) too but the modified times are usually smaller than the poll timestamp.
Finally, I tried os.path.getatime(folder_path), which works for the first time the files were copied over. The problem is I also read the files once they were in the folder, so the access time keeps getting updated and I end up reading the same files over and over again.
I'm not sure what is a better way or function to do this.
You've got a bit of an XY problem here. You want to know when files in a folder change; you tried a home-rolled solution, it didn't work, and now you want to fix your home-rolled solution.
Can I suggest that instead of terrible hackery, you use an existing package designed for monitoring for file changes? One that is not a polling loop, but actually gets notified of changes as they happen? While inotify is Linux-only, there are other options for Windows.
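For illustration only, a minimal sketch along those lines using the watchdog package from PyPI (which works on Windows as well as Linux); the watched path is a placeholder:

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Called by watchdog as soon as a file appears; no polling loop needed.
        if not event.is_directory:
            print("new file: " + event.src_path)   # process the file here

observer = Observer()
observer.schedule(NewFileHandler(), r"C:\incoming", recursive=False)  # placeholder path
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()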

How to find modified files in Python

I want to monitor a folder and see if any new files are added, or existing files are modified. The problem is, it's not guaranteed that my program will be running all the time (so, inotify based solutions may not be suitable here). I need to cache the status of the last scan and then with the next scan I need to compare it with the last scan before processing the files.
What are the alternatives for achieving this in Python 2.7?
Note 1: Processing the files is expensive, so I'm trying to avoid reprocessing files that were not modified in the meantime. In particular, if a file was only renamed (as opposed to a change in its contents), I would like to detect that and skip the processing.
Note 2: I'm only interested in a Linux solution, but I wouldn't complain if answers for other platforms are added.
There are several ways to detect changes in files. Some are easier to fool than others. It doesn't sound like this is a security issue; more like good faith is assumed, and you just need to detect changes without having to outwit an adversary.
You can look at timestamps. If files are not renamed, this is a good way to detect changes. If they are renamed, timestamps alone won't suffice to reliably tell one file from another. os.stat will tell you the time a file was last modified.
You can look at inodes, e.g., ls -li. A file's inode number may change if changes involve creating a new file and removing the old one; this is how emacs typically changes files, for example. Try changing a file with the standard tool your organization uses and compare inodes before and after, but bear in mind that even if it doesn't change this time, it might change under some circumstances. os.stat will tell you inode numbers.
You can look at the content of the files. cksum computes a small CRC checksum on a file; it's easy to beat if someone wants to. Programs such as sha256sum compute a secure hash; it's infeasible to change a file without changing such a hash. This can be slow if the files are large. The hashlib module will compute several kinds of secure hashes (see the sketch after this answer).
If a file is renamed and changed, and its inode number changes, it would be potentially very difficult to match it up with the file it used to be, unless the data in the file contains some kind of immutable and unique identifier.
Think about concurrency. Is it possible that someone will be changing a file while the program runs? Beware of race conditions.
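For illustration, a minimal chunked hash with hashlib (the chunk size and function name are arbitrary); reading in chunks keeps memory use flat for large files, and a file has changed exactly when its digest differs from the one recorded on the previous scan:

import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    # Hash the file in fixed-size chunks so large files need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()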
I would probably go with some kind of SQLite solution, such as storing the last polling time.
Then on each poll, sort the files by last modified time (mtime) and take all of those whose mtime is greater than your previous poll (with that value read from SQLite, or from a plain file if you insist on not having such a DB).
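A small sketch of that approach (the database file, table name and watched path are made up for illustration):

import os, sqlite3, time

conn = sqlite3.connect("poll_state.db")
conn.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value REAL)")

row = conn.execute("SELECT value FROM state WHERE key = 'last_poll'").fetchone()
last_poll = row[0] if row else 0.0
now = time.time()

watched = "/path/to/folder"    # placeholder
# Files modified since the previous poll, ordered by mtime.
changed = sorted(
    (name for name in os.listdir(watched)
     if os.path.getmtime(os.path.join(watched, name)) > last_poll),
    key=lambda name: os.path.getmtime(os.path.join(watched, name)))

conn.execute("INSERT OR REPLACE INTO state (key, value) VALUES ('last_poll', ?)", (now,))
conn.commit()
conn.close()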
Monitoring for new files isn't hard -- just keep a list or database of inodes for all files in the directory. A new file will introduce a new inode. This will also help you avoid processing renamed files, since inode doesn't change on rename.
The harder problem is monitoring for file changes. If you also store file size per inode, then obviously a changed size indicates a changed file and you don't need to open and process the file to know that. But for a file that has (a) a previously recorded inode, and (b) is the same size as before, you will need to process the file (e.g. compute a checksum) to know if it has changed.
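A rough sketch of that bookkeeping (the cache handling and the process() stub are placeholders): key the cache on st_ino so renames don't trigger reprocessing, and fall back to a checksum only when inode and size both match a previous entry:

import os

def process(path):
    print("processing %s" % path)   # placeholder for the expensive work

def scan(directory, cache):
    # cache maps inode -> last known size
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        st = os.stat(path)
        if st.st_ino not in cache or cache[st.st_ino] != st.st_size:
            # New inode, or size changed: definitely needs processing.
            process(path)
        # else: same inode and size; only a checksum (see the hashlib sketch
        # above) can tell whether a same-size edit happened.
        cache[st.st_ino] = st.st_size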
I suggest cheating and using the system find command. For example, the following finds all Python files that have been modified or created in the last 60 minutes; from the ls-style output you can decide whether further checking is needed.
$ echo beer > zoot.py
$ find . -name '*.py' -mmin -60 -type f -ls
1973329 4 -rw-r--r-- 1 johnm johnm 5 Aug 30 15:17 ./zoot.py

Are there modules for temporarily backing up and restoring text files in Python?

I need to modify a text file at runtime but restore its original state later (even if the computer crashes).
My program runs in regular sessions. Once a session has ended, the original state of that file may be changed, but the original state won't change while the program is running.
There are several instances of this text file, with the same name, in several directories. My program runs in each directory (but not in parallel), and depending on the directory's contents it does different things. The order in which the working directories are chosen is completely arbitrary.
Since the file's name is the same in each directory, it seems a good idea to store the backed-up files in slightly different places (i.e. the parent directory name could be appended to the backup target path).
What I do now is backup and restore the file with a self-written class, and also check at startup if the previous backup for the current directory was properly restored.
But my implementation needs serious refactoring, and now I'm interested if there are libraries already implemented for this kind of task.
edit
Version control seems like a good idea, but it's actually a bit of overkill here, since it usually requires a network connection and often a server, and other VCSs need clients to be installed. I would be happier with a pure-Python solution, but at least it should be cross-platform, portable and reasonably small (< 10 MB, for example).
Why not just do what editors on Unix, Mac and Windows have done for years -- use a lockfile/working-file concept.
When a file is selected for edit:
Check to see if there is an active lock or a crashed backup.
If the file is locked or crashed, give a "recover" option
Otherwise, begin editing the file...
The editing tends to do one or more of a few things:
Copy the original file into a ".%(filename)s.backup"
Create a ".%(filename)s.lock" to prevent others from working on it
When editing is achieved, the lock goes away and the .backup is removed
Sometimes things are slightly reversed, and the original stays in place while a .backup is the active edit; on success the .backup replaces the original
If you crash vi or some other text editor on a Linux box, you'll see these files created. Note that they usually have a dot (.) prefix, so they're normally hidden on the command line. Word/PowerPoint/etc. all do similar things.
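A rough Python rendition of that dance, using the .backup/.lock naming from the description above (the recovery policy is simplified and only a sketch):

import os, shutil

def begin_edit(path):
    directory, name = os.path.dirname(path), os.path.basename(path)
    backup = os.path.join(directory, "." + name + ".backup")
    lock = os.path.join(directory, "." + name + ".lock")
    if os.path.exists(lock):
        # A previous session crashed: restore the original before continuing.
        shutil.copy2(backup, path)
    else:
        shutil.copy2(path, backup)   # keep a pristine copy of the original
        open(lock, "w").close()      # mark the file as being edited
    return backup, lock

def end_edit(backup, lock):
    # Edit finished cleanly: the lock and the backup are no longer needed.
    os.remove(lock)
    os.remove(backup)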
Implement version control, e.g. svn (see pysvn). It should be fast as long as the repo is on the same machine, and it allows rollbacks to any version of the file. Maybe overkill, but it makes everything reversible.
http://pysvn.tigris.org/docs/pysvn_prog_guide.html
You don't need a server; you can have local version control and it should be fine.
Git, Subversion or Mercurial is your friend.

os.path.getmtime of shared files in Dropbox

I want to run a script to check whether certain files in my Dropbox folder have changed. I am currently using os.path.getmtime() to check that the modified time is in some window of time.time(). The problem is that if I modify a file in my Dropbox folder from a different computer than where the script is set to run, the modified time does not change on that latter computer. Is there a good way to watch shared files that doesn't run into this problem?
Thanks for any help! I am just getting into python.
*******UPDATE*******
I have been playing more with how Dropbox handles file timestamping. It only updates the mtime if the file changes. If you open a file, modify it, but save it unchanged, the mtime stays the same.
It looks like Dropbox preserves mtime when synchronizing files. Try to detect changed files by file size and/or checksum (MD5, SHA1 or the like) instead of modification time. Or just ask Dropbox :) (I don't know whether it has an API for this).
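A small sketch of that size-plus-checksum comparison (the state-file name is a placeholder); fingerprints are compared against the previous run instead of relying on mtime:

import hashlib, json, os

STATE = "dropbox_watch.json"   # placeholder: remembered (size, md5) per file

def fingerprint(path):
    # Size plus MD5 of the content; mtime is deliberately ignored.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return [os.path.getsize(path), h.hexdigest()]

def changed(paths):
    old = json.load(open(STATE)) if os.path.exists(STATE) else {}
    new = dict((p, fingerprint(p)) for p in paths)
    with open(STATE, "w") as f:
        json.dump(new, f)
    return [p for p in paths if old.get(p) != new[p]]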
