rsync - write delta (new data) to new file - python

I have a growing web server log file that rotates once a week, and I need to feed one hour's worth of logs to a Python script. So, rsync is the solution, right?
I know that rsync would transfer (add) only the new (changed) data in the file. But how do I make it write the new lines from the remote log (since the last time it did so, an hour ago) to a separate file for inspection?
So, the difference from normal behaviour is that it does not append the new (changed) lines to the local file, but writes just the difference to a separate one.
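One way to get that effect is to keep a local mirror in sync with rsync and then copy only the newly appended bytes into a separate file. A minimal sketch, assuming the log only grows between rotations (the paths and host below are placeholders):

# Hypothetical: mirror the remote log with rsync --append, then write only the
# bytes added since the last run to a separate file for inspection.
import os
import subprocess

REMOTE = "user@webserver:/var/log/nginx/access.log"   # placeholder remote path
MIRROR = "/var/tmp/access.log.mirror"                 # local mirror kept by rsync
DELTA = "/var/tmp/access.log.delta"                   # receives only the new lines

old_size = os.path.getsize(MIRROR) if os.path.exists(MIRROR) else 0

# --append transfers only data past the current end of the local file; the
# weekly rotation (when the remote file shrinks) is not handled here.
subprocess.run(["rsync", "--append", REMOTE, MIRROR], check=True)

with open(MIRROR, "rb") as src, open(DELTA, "wb") as out:
    src.seek(old_size)
    out.write(src.read())   # just the lines appended since the previous run

Run it from cron every hour and feed the delta file to the Python script.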

Related

Routing Python Logs to Databases Efficiently

I want to route some logs from my application to a database. Now, I know that this isn't exactly the ideal way to store logs, but my use case requires it.
I have also seen how one can write their own database logger, as explained here:
python logging to database
This looks great, but given that an application generates a large number of log records, I feel that sending that many individual requests to the database could overwhelm it, and that it may not be the most efficient solution.
Assuming this concern is valid, what are some efficient methods for achieving this?
Some ideas that come to mind are:
Write the logs out to a log file during application run time and develop a script that will parse the file and make bulk inserts to a database.
Build some kind of queue architecture that the logs will be routed to, where each record is inserted into the database in sequence.
Develop a type of reactive program, that will run in the background and route logs to the database.
etc.
What are some other possibilities that can be explored? Are there any best practices?
The rule of thumb is that DB throughput will be greater if you can batch N row inserts into a single commit, rather than doing N separate commits.
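For example, a minimal sketch of batching with sqlite3 (the "logs" table and its columns are made up for illustration; the same idea applies to any DB-API driver):

import sqlite3

def insert_batch(conn, rows):
    # One executemany plus one commit, instead of N separate commits.
    conn.executemany(
        "INSERT INTO logs (ts, level, message) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect("logs.db")
conn.execute("CREATE TABLE IF NOT EXISTS logs (ts TEXT, level TEXT, message TEXT)")
insert_batch(conn, [
    ("2023-01-01T00:00:00", "INFO", "app started"),
    ("2023-01-01T00:00:01", "INFO", "listening on :8080"),
])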
Have your app append to a structured log file, such as a .CSV or another easily parsed logfile format. Be sure to .flush() before sleeping for a while, so recent output will be visible to other processes. Consider making a call to .fsync() every now and again if durability following a power failure matters to the app. Now you have timestamped structured logs that are safely stored in the filesystem. Clearly there are other ways, such as 0mq or Kafka, but the FS is simplest and plays nicely with unit tests. During interactive debugging you can tail -f the file.
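A minimal sketch of such a writer (the filename, fields, and flush/fsync cadence are arbitrary choices for illustration):

import csv
import datetime
import os
import time

with open("app_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for i in range(10):                     # stand-in for real application work
        ts = datetime.datetime.now().isoformat()
        writer.writerow([ts, "INFO", "event %d" % i])
        f.flush()                           # make the row visible to a tailing reader
        if i % 5 == 0:
            os.fsync(f.fileno())            # durable across a power failure
        time.sleep(1)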
Now write a daemon that tail -f's the file and copies new records to the database. Upon reboot it will .seek() to the end, after perhaps copying any trailing lines that are missing from the DB. Use kqueue-style events, or poll every K seconds and then sleep. You can .stat() the file to learn its current length. Beware of partial lines, where the last character in the file is not a newline. Consume all unseen lines, BEGIN a transaction, INSERT each line, COMMIT the DB transaction, and resume the loop.
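A rough sketch of such a daemon, assuming the CSV log and the sqlite3 "logs" table from the snippets above:

import os
import sqlite3
import time

conn = sqlite3.connect("logs.db")
path = "app_log.csv"
offset = os.path.getsize(path)              # on restart, start at the current end

while True:
    if os.stat(path).st_size > offset:      # new bytes have appeared
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
        complete, _, _partial = chunk.rpartition(b"\n")   # ignore a trailing partial line
        if complete:
            rows = [line.decode().split(",", 2) for line in complete.splitlines()]
            with conn:                      # BEGIN ... COMMIT around the whole batch
                conn.executemany(
                    "INSERT INTO logs (ts, level, message) VALUES (?, ?, ?)",
                    rows,
                )
            offset += len(complete) + 1     # bytes consumed, including the newline
    time.sleep(5)                           # or block on kqueue/inotify-style events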
When you do log rolling, avoid renaming logs. Prefer log filenames that contain ISO 8601 timestamps. Perhaps you settle on daily logs. The writer won't append lines past midnight, and will move on to the next filename. The daemon will notice the newly created file and will .close() the old one, with optional deletion of ancient logs more than a week old.
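For instance, a daily filename might be derived like this (the prefix is arbitrary):

import datetime

def current_log_name(prefix="app_log"):
    # e.g. app_log-2023-01-01.csv; rolling means switching names at midnight,
    # never renaming the old file.
    return "%s-%s.csv" % (prefix, datetime.date.today().isoformat())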
Log writers might choose to prepend a hashed checksum to each message, so the reader can verify it received the whole message intact.
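One way to sketch that, assuming each record is a single line of text:

import hashlib

def frame(message):
    # Prepend a short SHA-256 prefix so the reader can detect truncation.
    digest = hashlib.sha256(message.encode()).hexdigest()[:8]
    return digest + " " + message

def verify(line):
    digest, message = line.split(" ", 1)
    if hashlib.sha256(message.encode()).hexdigest()[:8] != digest:
        raise ValueError("corrupt or truncated log record")
    return message

print(verify(frame("2023-01-01T00:00:00,INFO,app started")))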
A durable queue like Kafka certainly holds some attraction, but has more moving pieces. Maybe implement FS logging, with unit tests, and then use what you've already learned about the application when you refactor to employ a more sophisticated message queueing API.

Compare storage sizes with yesterday's on a Linux server using Python

I have to check our storage sizes every morning.
The routine is that I connect to the Linux server every day (via ssh), run "df -h" to see the list of storage volumes, and write the change compared to yesterday into an Excel file. I wanted to see if there is a way for this comparison to be done automatically with code.
I have to check each of these storage volumes every day; for example, I must check the following storage and, if it increases, inform others.
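A minimal sketch of that comparison, assuming ssh key access to the server (the host name and snapshot path are placeholders):

import json
import subprocess
from pathlib import Path

HOST = "user@linux-server"                   # placeholder
SNAPSHOT = Path("df_yesterday.json")         # yesterday's figures, saved locally

# -P gives one POSIX-formatted line per filesystem; sizes are in 1K blocks.
raw = subprocess.run(["ssh", HOST, "df", "-P"],
                     capture_output=True, text=True, check=True).stdout

today = {}
for line in raw.splitlines()[1:]:            # skip the header line
    fs, blocks, used, avail, pct, mount = line.split(None, 5)
    today[mount] = int(used)

if SNAPSHOT.exists():
    yesterday = json.loads(SNAPSHOT.read_text())
    for mount, used in today.items():
        prev = yesterday.get(mount)
        if prev is not None and used > prev:
            print("%s: used space grew by %d KiB since yesterday" % (mount, used - prev))

SNAPSHOT.write_text(json.dumps(today))       # becomes "yesterday" for tomorrow's run

From there the differences could be appended to a spreadsheet (e.g. with openpyxl) and mailed to the others.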

How to solve OdbError in Abaqus Python script?

I am running a 3D solid model in an Abaqus Python script, which is supposed to be analyzed 200 times, as the model has been arranged in a for loop (for i in range(0,199):). Sometimes I receive the following error and then the analysis terminates, and I can't figure out the reason.
Odb_0=session.openOdb(name='Job-1'+'.odb')
OdbError: The .lck file for the output database D:/abaqus/Model/Job-1.odb indicates that the analysis Input File Processor is currently modifying the database. The database cannot be opened at this time.
Note that all the variables, including "Odb_0" etc., are deleted at the end of each iteration of the loop before starting the next one.
I don't believe your problem will be helped by a change in element type.
The message and the .lck file say that there's an access deadlock in the database. The output file lost out and cannot update the .odb database.
I'm not sure what database Abaqus uses. I would have guessed that the input stream would have scanned the input file and written whatever records were necessary to the database before the solution and output processing began.
From the Abaqus documentation
The lock file (job_name.lck) is written whenever an output database file is opened with write access, including when an analysis is running and writing output to an output database file. The lock file prevents you from having simultaneous write permission to the output database from multiple sources. It is deleted automatically when the output database file is closed or when the analysis that creates it ends.
When you are deleting your previous analysis you should be sure that all processes connected with that simulation have been terminated. There are several possibilities to do so:
Launching the simulation through subprocess.Popen could give you much more control over the process (e.g. waiting until it ends, writing a specific log, etc.);
Naming your simulations differently (e.g. 'Job-1', 'Job-2', etc.) and deleting old ones with a delay (e.g. deleting 'Job-1' while 'Job-3' has started);
Less preferable: using the time module to wait (see the sketch below).
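A minimal sketch of that waiting approach, run inside the Abaqus Python environment where session is defined (the directory, job name, and timeout are placeholders):

import os
import time

job_name = 'Job-1'
lck_file = 'D:/abaqus/Model/' + job_name + '.lck'

# Poll for up to ten minutes; the .lck file is removed when the analysis
# closes the output database.
for _ in range(600):
    if not os.path.exists(lck_file):
        break
    time.sleep(1)
else:
    raise RuntimeError('analysis still holds the output database: ' + lck_file)

Odb_0 = session.openOdb(name=job_name + '.odb')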

Monitor file and iterate Python loop on each new line

I have a script that runs tcpdump indefinitely, and outputs to a capture.out file. I would like to write another Python script to monitor capture.out and iterate over a loop each time a new line (or even better, a new packet) is written to the file by the other script.
I know how to loop through lines in a file, but I am not sure how to continuously monitor a file and iterate only when a new line (or packet) is written by the other script.
My ultimate goal is to publish each packet captured over MQTT (filtering out MQTT traffic of course), so if there is a more efficient solution to my end goal here, such as bypassing an output file and a simple way to make a Python function call on each packet captured by tcpdump, that would be even better.
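A minimal sketch of the file-less approach, reading tcpdump's stdout directly and publishing each line with paho-mqtt (the interface, broker address, and topic are placeholders, and the constructor call assumes paho-mqtt 1.x):

import subprocess
import paho.mqtt.client as mqtt

client = mqtt.Client()                       # paho-mqtt 2.x also wants a callback_api_version
client.connect("broker.example.com", 1883)
client.loop_start()                          # service the MQTT connection in the background

# -l line-buffers tcpdump's output; "not port 1883" filters out the MQTT
# traffic this script generates itself.
proc = subprocess.Popen(
    ["tcpdump", "-l", "-i", "eth0", "not", "port", "1883"],
    stdout=subprocess.PIPE,
    text=True,
)

for line in proc.stdout:                     # blocks until tcpdump prints the next packet
    client.publish("capture/packets", line.rstrip("\n"))

For per-packet objects rather than text lines, a capture library such as scapy or pyshark could replace tcpdump entirely.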

How to download part of a file over SFTP connection?

So I have a Python program that pulls access logs from remote servers and processes them. There are separate log files for each day. The files on the servers are in this format:
access.log
access.log-20130715
access.log-20130717
The file "access.log" is the log file for the current day, and is modified throughout the day with new data. The files with the timestamp appended are archived log files, and are not modified. If any of the files in the directory are ever modified, it is either because (1) data is being added to the "access.log" file, or (2) the "access.log" file is being archived, and an empty file takes its place. Every minute or so, my program checks for the most recent modification time of any files in the directory, and if it changes it pulls down the "access.log" file and any newly archived files
All of this currently works fine. However, if a lot of data is added to the log file throughout the day, downloading the whole thing over and over just to get some of the data at the end of the file will create a lot of traffic on the network, and I would like to avoid that. Is there any way to only download a part of the file? If I have already processed, say 1 GB of the file, and another 500 bytes suddenly get added to the log file, is there a way to only download the 500 bytes at the end?
I am using Python 3.2, my local machine is running Windows, and the remote servers all run Linux. I am using Chilkat for making SSH and SFTP connections. Any help would be greatly appreciated!
Call ResumeDownloadFileByName. Here's the description of the method in the Chilkat reference documentation:
Resumes an SFTP download. The size of the localFilePath is checked and
the download begins at the appropriate position in the remoteFilePath.
If localFilePath is empty or non-existent, then this method is
identical to DownloadFileByName. If the localFilePath is already fully
downloaded, then no additional data is downloaded and the method will
return True.
See http://www.chilkatsoft.com/refdoc/pythonCkSFtpRef.html
You could do that, or you could massively reduce your complexity by splitting the latest log file down into hours, or tens of minutes.
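If you ever switch away from Chilkat, the same resume-from-offset idea can be sketched with Paramiko (the host, credentials, and paths below are placeholders):

import os
import paramiko

transport = paramiko.Transport(("server.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)

remote_path = "/var/log/httpd/access.log"
local_path = "access.log"

# Start where the local copy ends, so only the newly appended bytes
# travel over the network.
offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0

remote = sftp.open(remote_path, "rb")
remote.seek(offset)
with open(local_path, "ab") as local:
    while True:
        chunk = remote.read(32768)
        if not chunk:
            break
        local.write(chunk)
remote.close()

sftp.close()
transport.close()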
