Could I unblock a file in Windows 7, which Windows automatically blocks because it was downloaded from the Internet, from a Python script? A WindowsError is raised when such a file is encountered. I thought of catching this exception and running a PowerShell script that goes something like:
Parameter Set: ByPath
Unblock-File [-Path] <String[]> [-Confirm] [-WhatIf] [ <CommonParameters>]
Parameter Set: ByLiteralPath
Unblock-File -LiteralPath <String[]> [-Confirm] [-WhatIf] [ <CommonParameters>]
I don't know PowerShell scripting, but if I had such a script I could call it from Python. Could you folks help?
Yes, all you have to do is call the following command line from Python:
powershell.exe -Command Unblock-File -Path "c:\path\to\blocked file.ps1"
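A minimal sketch of doing that with the subprocess module (the file path is a placeholder, and the quoting is kept simple; paths containing quotes would need more care):
import subprocess

blocked_file = r"c:\path\to\blocked file.ps1"  # placeholder path
# Run the same command line as above; check_call raises CalledProcessError if PowerShell fails.
subprocess.check_call(
    ["powershell.exe", "-Command", "Unblock-File -Path '%s'" % blocked_file]
)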
From this page about the Unblock-File command: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/unblock-file?view=powershell-7.2
Internally, the Unblock-File cmdlet removes the Zone.Identifier alternate data stream, which has a value of 3 to indicate that it was downloaded from the internet.
To remove an alternate data stream ads_name from a file path\to\file.ext, simply delete path\to\file.ext:ads_name:
import os

try:
    os.remove(your_file_path + ':Zone.Identifier')
except FileNotFoundError:
    # The ADS did not exist; the file was already unblocked or
    # was never blocked in the first place.
    pass
# No need to open up a PowerShell subprocess!
(And similarly, to check if a file is blocked you can use os.path.isfile(your_file_path + ':Zone.Identifier'))
In a PowerShell script, you can use Unblock-File for this, or simply Remove-Item -Path $your_file_path':Zone.Identifier'.
Remove-Item also has a specific flag for alternate data streams: Remove-Item -Stream Zone.Identifier (which you can pipe in multiple files to, or a single -Path)
Late to the party . . . .
I have found that the Block status is simply an extra 'file' (stream) attached in NTFS, and it can actually be accessed and somewhat manipulated by ordinary means. These are called Alternate Data Streams.
The ADS for file blocking (internet zone designation) is called ':Zone.Identifier' and contains, I think, some useful information:
[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://www.google.com/
HostUrl=https://imgs.somewhere.com/product/1969297/some-pic.jpg
All the other info I have found says to just delete this extra stream.... But, personally, I want to keep this info.... So I tried changing the ZoneId to 0, but it still shows as Blocked in Windows File Properties.
I settled on moving it to another stream name so I can still find it later.
The below script originated from a more generic script called pyADS. I only care about deleting / changing the Zone.Identifier attached stream -- which can all be done with simple Python commands. So this is a stripped-down version. It has several really nice background references listed. I am currently running the latest Windows 10 and Python 3.8+; I make no guarantees this works on older versions.
import os
'''
References:
Accessing alternative data-streams of files on an NTFS volume https://www.codeproject.com/Articles/2670/Accessing-alternative-data-streams-of-files-on-an
Original ADS class (pyADS) https://github.com/RobinDavid/pyADS
SysInternal streams applet https://learn.microsoft.com/en-us/sysinternals/downloads/streams
Windows: killing the Zone.Identifier NTFS alternate data stream https://wiert.me/2011/11/25/windows-killing-the-zone-identifier-ntfs-alternate-data-stream-from-a-file-to-prevent-security-warning-popup/
Zone.Information https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/6e3f7352-d11c-4d76-8c39-2516a9df36e8
About URL Security Zones https://learn.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/platform-apis/ms537183(v=vs.85)?redirectedfrom=MSDN
GREAT info: How Windows Determines That the File.... http://woshub.com/how-windows-determines-that-the-file-has-been-downloaded-from-the-internet/
Dixin's Blog: Understanding File Blocking and Unblocking https://weblogs.asp.net/dixin/understanding-the-internet-file-blocking-and-unblocking
'''
class ADS2():
    def __init__(self, filename):
        self.filename = filename

    def full_filename(self, stream):
        # "file.ext" + "Zone.Identifier" -> "file.ext:Zone.Identifier"
        return "%s:%s" % (self.filename, stream)

    def add_stream_from_file(self, filename):
        if os.path.exists(filename):
            with open(filename, "rb") as f:
                content = f.read()
            return self.add_stream_from_string(filename, content)
        else:
            print("Could not find file: {0}".format(filename))
            return False

    def add_stream_from_string(self, stream_name, data):
        fullname = self.full_filename(os.path.basename(stream_name))
        if os.path.exists(fullname):
            print("Stream name already exists")
            return False
        else:
            with open(fullname, "wb") as fd:
                fd.write(data)
            return True

    def delete_stream(self, stream):
        try:
            os.remove(self.full_filename(stream))
            return True
        except OSError:
            return False

    def get_stream_content(self, stream):
        try:
            with open(self.full_filename(stream), "rb") as fd:
                return fd.read()
        except FileNotFoundError:
            # No such stream: the file is not blocked.
            return None


def UnBlockFile(file, retainInfo=True):
    ads = ADS2(file)
    if zi := ads.get_stream_content("Zone.Identifier"):
        ads.delete_stream("Zone.Identifier")
        if retainInfo:
            ads.add_stream_from_string("Download.Info", zi)
### Usage:
from unblock_files import UnBlockFile
UnBlockFile(r"D:\Downloads\some-pic.jpg")
Before:
D:\downloads>dir /r
Volume in drive D is foo
Directory of D:\downloads
11/09/2021 10:05 AM 8 some-pic.jpg
126 some-pic.jpg:Zone.Identifier:$DATA
1 File(s) 8 bytes
D:\downloads>more <some-pic.jpg:Zone.Identifier:$DATA
[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://www.google.com/
HostUrl=https://imgs.somewhere.com/product/1969297/some-pic.jpg
After:
D:\downloads>dir /r
Volume in drive D is foo
Directory of D:\downloads
11/09/2021 10:08 AM 8 some-pic.jpg
126 some-pic.jpg:Download.Info:$DATA
1 File(s) 8 bytes
I'm using the ansible_runner Python module as below:
import ansible_runner
r = ansible_runner.run(private_data_dir='/tmp/demo', playbook='test.yml')
When I execute the above code, it prints the playbook output to the console instead of returning it in Python. What I want is to save the stdout content into a Python variable for further text processing.
Did you read the manual at https://ansible-runner.readthedocs.io/en/stable/python_interface/ ? There is an example where you add another parameter, called output_fd, which could be a file handle instead of sys.stdout.
Sadly, this is a parameter of the run_command function and the documentation is not very good. A look into the source code at https://github.com/ansible/ansible-runner/blob/devel/ansible_runner/interface.py could help you.
According to the implementation details in https://github.com/ansible/ansible-runner/blob/devel/ansible_runner/runner.py it looks like, the run() function always prints to stdout.
According to the interface, there is a boolean flag, run(json_mode=True), that stores the response as JSON (I expect in r instead of stdout), and there is another boolean flag, quiet.
I played around a little bit. The relevant option to avoid output to stdout is passing quiet=True to run().
Ansible_Runner catches the output and writes it to a file in the artifacts directory. Every run() command produces that directory, as described in https://ansible-runner.readthedocs.io/en/stable/intro/#runner-artifacts-directory-hierarchy. So there is a file called stdout in the artifact directory; it contains the details, and you can read it as JSON.
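For example, a rough sketch of pulling that captured output back into a variable, assuming the default artifacts layout under the private data directory from the question:
import glob
import os

# r = ansible_runner.run(private_data_dir='/tmp/demo', playbook='test.yml', quiet=True)
# Each run writes its console output to <private_data_dir>/artifacts/<ident>/stdout.
stdout_files = sorted(glob.glob(os.path.join('/tmp/demo', 'artifacts', '*', 'stdout')),
                      key=os.path.getmtime)
with open(stdout_files[-1]) as f:   # most recent run
    captured_output = f.read()      # a plain string, ready for text processing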
But the returned object also already contains some relevant data. Here is my example:
import logging

import ansible_runner

playbook = 'playbook.yml'
private_data_dir = 'data/'   # existing folder with inventory etc.
work_dir = 'playbooks/'      # contains playbook and roles

try:
    logging.debug('Running ansible playbook {} with private data dir {} in project dir {}'.format(
        playbook, private_data_dir, work_dir))
    runner = ansible_runner.run(
        private_data_dir=private_data_dir,
        project_dir=work_dir,
        playbook=playbook,
        quiet=True,
        json_mode=True
    )

    processed = runner.stats.get('processed')
    failed = runner.stats.get('failures')

    # TODO inform backend
    for host in processed:
        if host in failed:
            logging.error('Host {} failed'.format(host))
        else:
            logging.debug('Host {} backed up'.format(host))

    # 'inventory' is defined elsewhere in the surrounding application
    logging.error('Playbook runs into status {} on inventory {}'.format(runner.status, inventory.get('name')))
    if runner.rc != 0:
        # we have an overall failure
        pass
    else:
        # success message
        pass
except BaseException as err:
    logging.error('Could not process ansible playbook {}\n{}'.format(inventory.get('name'), err))
So this outputs all processed hosts and reports failures per host. More detailed output can be found in the stdout file in the artifacts directory.
import sys
import subprocess
command = 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64 -y ' + sys.argv[1] + ' -i ' + sys.argv[2] + ' -z ' + sys.argv[3] + ' -c "!analyze" '
process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
I tried this code. I am trying to take the crash dump name and exe location as input, and then I have to display user-understandable crash analysis output. How do I do that using Python scripting? Is it easier with C++ scripting?
take input of crash dump name and exe location and then I have to display user understandable crash analysis output.
It seems you want to parse the text output of the !analyze command. You can do that, but you should be aware that this command can have a lot of different output.
Let me assume you're analyzing a user mode crash dump. In such a case, I would first run a few simpler commands to check whether you got a legit dump. You may try the following commands:
|| to check the dump type (should be "user")
| to get the name of the executable (should match your application)
lmvm <app> to check the version number of your executable
If everything is fine, you can go on:
.exr -1: distinguish between a crash and a hang. An 80000003 breakpoint is more likely a hang, or nothing at all.
This may help you decide if you should run !analyze or !analyze -hang.
How to do that using Python scripting?
[...] \Windows Kits\10\Debuggers\x64 -y ' + [...]
This path contains backslashes, so you want to escape them or use an r-string like r"C:\Program Files (x86)\Windows Kits\10\...".
You should also point it at an actual executable to make it work; cdb.exe is the command-line version of WinDbg.
command.split()
This will not only split the arguments, but also the path to the executable. Thus subprocess.Popen() will try to run an application called C:\Program, which does not exist.
It could fail in even more ways, depending on whether the arguments in sys.argv[] contain spaces.
I suggest that you pass the options as they are:
command = r'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\cdb.exe'
arguments = [command]
arguments.extend(['-y', sys.argv[1]]) # Symbol path
arguments.extend(['-i', sys.argv[2]]) # Image path
arguments.extend(['-z', sys.argv[3]]) # Dump file
arguments.extend(['-c', '!analyze']) # Command(s) for analysis
process = subprocess.Popen(arguments, stdout=subprocess.PIPE)
Note that there's no split() involved, which could split in wrong position.
Side note: -i may not work as expected. If you receive the crash dump from clients, they may have a different version than the one you have on disk. Set up a proper symbol server to mitigate this.
Is it easier with CPP scripting?
It will be different, not easier.
Working example
This is Python code that takes the above into account. It's still a bit hacky because of the delays etc., but there's no real indicator other than time and output for deciding when a command has finished. This succeeds with Python 3.8 on a crash dump of Windows Explorer.
import subprocess
import sys
import threading
import time
import re
class ReaderThread(threading.Thread):
    def __init__(self, stream):
        super().__init__()
        self.buffer_lock = threading.Lock()
        self.stream = stream  # underlying stream for reading
        self.output = ""      # holds console output which can be retrieved by getoutput()

    def run(self):
        """
        Reads from the stream line by line and caches the result.
        :return: when the underlying stream was closed.
        """
        while True:
            line = self.stream.readline()  # readline() will block and wait for \r\n
            if len(line) == 0:  # this only applies if the stream was closed, otherwise there is always \r\n
                break
            with self.buffer_lock:
                self.output += line

    def getoutput(self, timeout=0.1):
        """
        Get the console output that has been cached until now.
        If there's still output incoming, it will continue waiting in 1/10 of a second until no new
        output has been detected.
        :return:
        """
        temp = ""
        while True:
            time.sleep(timeout)
            if self.output == temp:
                break  # no new output for 100 ms, assume it's complete
            else:
                temp = self.output
        with self.buffer_lock:
            temp = self.output
            self.output = ""
        return temp
command = r'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\cdb.exe'
arguments = [command]
arguments.extend(['-y', r"srv*D:\debug\symbols*https://msdl.microsoft.com/download/symbols"])  # Symbol path, may use sys.argv[1]
# arguments.extend(['-i', sys.argv[2]])  # Image path
arguments.extend(['-z', sys.argv[3]])    # Dump file
arguments.extend(['-c', ".echo LOADING DONE"])
process = subprocess.Popen(arguments, stdout=subprocess.PIPE, stdin=subprocess.PIPE, universal_newlines=True)

reader = ReaderThread(process.stdout)
reader.start()

result = ""
while not re.search("LOADING DONE", result):
    result = reader.getoutput()  # ignore initial output


def dbg(command):
    process.stdin.write(command + "\r\n")
    process.stdin.flush()
    return reader.getoutput()


result = dbg("||")
if "User mini" not in result:
    raise Exception("Not a user mode dump")
else:
    print("Yay, it's a user mode dump")

result = dbg("|")
if "explorer" not in result:
    raise Exception("Not an explorer crash")
else:
    print("Yay, it's an Explorer crash")

result = dbg("lm vm explorer")
if re.search(r"^\s*File version:\s*10\.0\..*$", result, re.M):
    print("That's a recent version for which we should analyze crashes")
else:
    raise Exception("That user should update to a newer version before we spend effort on old bugs")

dbg("q")
If you don't want to use WinDbg (which is a GUI), use cdb.exe: it is the console-mode WinDbg and will print all the results to the terminal.
Here is a demo:
F:\>cdb -c "!analyze -v;qq" -z testdmp.dmp | grep -iE "bucket|owner"
DEFAULT_BUCKET_ID: BREAKPOINT
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
FOLLOWUP_NAME: MachineOwner
BUCKET_ID: BREAKPOINT_ntdll!LdrpDoDebuggerBreak+30
BUCKET_ID_IMAGE_STR: ntdll.dll
BUCKET_ID_MODULE_STR: ntdll
BUCKET_ID_FUNCTION_STR: LdrpDoDebuggerBreak
BUCKET_ID_OFFSET: 30
BUCKET_ID_MODTIMEDATESTAMP: c1bb301
BUCKET_ID_MODCHECKSUM: 1f647b
BUCKET_ID_MODVER_STR: 10.0.18362.778
BUCKET_ID_PREFIX_STR: BREAKPOINT_
FAILURE_BUCKET_ID: BREAKPOINT_80000003_ntdll.dll!LdrpDoDebuggerBreak
Followup: MachineOwner
grep is a general-purpose string parser.
It is built into Linux, and it is available for Windows in several places.
On 32-bit Windows you can use it from the GnuWin32 package / Cygwin.
On 64-bit Windows you can find it in Git.
You can also use the native findstr.exe.
:\>dir /b f:\git\usr\bin\gr*
grep.exe
groups.exe
or in msys / mingw / Cygwin / wsl / third party clones /
:\>dir /b /s *grep*.exe
F:\git\mingw64\bin\x86_64-w64-mingw32-agrep.exe
F:\git\mingw64\libexec\git-core\git-grep.exe
F:\git\usr\bin\grep.exe
F:\git\usr\bin\msggrep.exe
F:\msys64\mingw64\bin\msggrep.exe
F:\msys64\mingw64\bin\pcregrep.exe
F:\msys64\mingw64\bin\x86_64-w64-mingw32-agrep.exe
F:\msys64\usr\bin\grep.exe
F:\msys64\usr\bin\grepdiff.exe
F:\msys64\usr\bin\msggrep.exe
F:\msys64\usr\bin\pcregrep.exe
Or you can write your own simple string parser in Python / JavaScript / TypeScript / C / C++ / Ruby / Rust / whatever.
Here is a sample Python word lookup and repeat script:
import sys
for line in sys.stdin:
    if "BUCKET" in line:
        print(line)
Let's check this out:
:\>dir /b *.py
pyfi.py
:\>cat pyfi.py
import sys
for line in sys.stdin:
    if "BUCKET" in line:
        print(line)
:\>cdb -c "!analyze -v ;qq" -z f:\testdmp.dmp | python pyfi.py
DEFAULT_BUCKET_ID: BREAKPOINT
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
BUCKET_ID: BREAKPOINT_ntdll!LdrpDoDebuggerBreak+30
BUCKET_ID_IMAGE_STR: ntdll.dll
BUCKET_ID_MODULE_STR: ntdll
BUCKET_ID_FUNCTION_STR: LdrpDoDebuggerBreak
BUCKET_ID_OFFSET: 30
BUCKET_ID_MODTIMEDATESTAMP: c1bb301
BUCKET_ID_MODCHECKSUM: 1f647b
BUCKET_ID_MODVER_STR: 10.0.18362.778
BUCKET_ID_PREFIX_STR: BREAKPOINT_
FAILURE_BUCKET_ID: BREAKPOINT_80000003_ntdll.dll!LdrpDoDebuggerBreak
I have the following code, written in Python 2.7 on Windows. I want to check for updates to the current Python script and, if there is an update, replace the script with the new version from an FTP server (preserving the filename), and then execute the new script after terminating the current one via os.kill with SIGTERM.
I went with the exit-function approach, but I read that on Windows this only works with the atexit library and the default Python exit methods. So I used a combination of atexit.register() and a signal handler.
***necessary libraries***
filematch = 'test.py'
version = '0.0'
checkdir = os.path.abspath(".")
dircontent = os.listdir(checkdir)
r = StringIO()
def exithandler():
    try:
        try:
            if filematch in dircontent:
                os.remove(checkdir + '\\' + filematch)
        except Exception as e:
            print e

        ftp = FTP(ip address)
        ftp.login(username, password)
        ftp.cwd('/Test')
        for filename in ftp.nlst(filematch):
            fhandle = open(filename, 'wb')
            ftp.retrbinary('RETR ' + filename, fhandle.write)
            fhandle.close()

        subprocess.Popen([sys.executable, "test.py"])
        print 'Test file successfully updated.'
    except Exception as e:
        print e

ftp = FTP(ip address)
ftp.login(username, password)
ftp.cwd('/Test')
ftp.retrbinary('RETR version.txt', r.write)

if (r.getvalue() != version):
    atexit.register(exithandler)
    somepid = os.getpid()
    signal.signal(SIGTERM, lambda signum, stack_frame: exit(1))
    os.kill(somepid, signal.SIGTERM)

print 'Successfully replaced and started the file'
Using the:
signal.signal(SIGTERM, lambda signum, stack_frame: exit(1))
I get:
Traceback (most recent call last):
File "C:\Users\STiX\Desktop\Python Keylogger\test.py", line 50, in <module>
signal.signal(SIGTERM, lambda signum, stack_frame: exit(1))
NameError: name 'SIGTERM' is not defined
But I get the job done without a problem, except when I use the same code in a more complex script, where it gives me the same error but terminates right away for some reason.
On the other hand, if I use it the correct way, signal.SIGTERM, the process goes straight to termination and the exit function is never executed. Why is that?
How can I make this work on Windows and get the outcome that I described above successfully?
What you are trying to do seems a bit complicated (and dangerous from an infosec perspective ;-). I would suggest handling the reload-file-when-updated part of the functionality by adding a controller class that imports the Python script you have now as a module, starts it, and then reloads it when it is updated (based on a function return or another technique); look this way for inspiration: https://stackoverflow.com/a/1517072/1010991
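A minimal sketch of that controller idea; the module name worker and its main() return convention are assumptions, not part of your code:
# Hypothetical controller: "worker" is your current script imported as a module.
# Assumes worker.main() returns True when it has downloaded a newer version of
# worker.py over itself and wants to be restarted.
import worker

while True:
    wants_reload = worker.main()
    if not wants_reload:
        break
    worker = reload(worker)  # builtin in Python 2.7; use importlib.reload() on Python 3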
Edit - what about exe?
Another hacky technique for manipulating the file of the currently running program would be the shell ping trick. It can be used from all programming languages. The trick is to send a shell command that is not executed until after the calling process has terminated. Use ping to cause the delay and chain the other commands with &. For your use case it could be something like this:
import subprocess
subprocess.Popen("ping -n 2 -w 2000 1.1.1.1 > Nul & del hack.py & rename hack_temp.py hack.py & hack.py ", shell=True)
Edit 2 - Alternative solution to original question
Since Python does not block write access to the currently running script, an alternative concept to solve the original question would be:
import subprocess
print "hello"
a = open(__file__,"r")
running_script_as_string = a.read()
b = open(__file__,"w")
b.write(running_script_as_string)
b.write("\nprint 'updated version of hack'")
b.close()
subprocess.Popen("python hack.py")
I am trying to archive a remote git repo using Python code. I did it successfully using Git command line with following command.
> git archive --format=zip --remote=ssh://path/to/my/repo -o archived_file.zip
HEAD:path/to/directory filename
This command fetches the required file from the repo and stores the zip in my current working directory. Note that there is no cloning of remote repo happening.
Now I have to do it using Python code. I am using GitPython 1.0.1. I guess if it is doable using the command line then it should be doable using the GitPython library. According to the docs,
repo.archive(open(join(rw_dir, 'archived_file.zip'), 'wb'))
Above line of code will archive the repo. Here repo is the instance of Repo class. It can be initialized using
repo = Repo('path to the directory which has .git folder')
If I give the path to my remote repo (e.g. ssh://path/to/my/repo) in the above line, it tries to find it under the directory where the .py file containing this code resides (like Path\to\python\file\ssh:\path\to\my\repo), which is not what I want. So to sum up, I can archive a local repo but not a remote one using GitPython. I might be able to archive a remote repo if I could create a Repo instance pointing to the remote repo. I am very new to Git and Python.
Is there any way to archive a remote repo using Python code without cloning it locally?
This is, by the way, a terrible idea, since you have already begun using GitPython and I have never tried working with it, but I just really want to let you know that you can do it without cloning locally and without using GitPython.
Simply run the git command in a shell, using subprocess:
running bash commands in python
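For example, a rough sketch of running the same git archive invocation from the question through subprocess (the remote URL, paths and filename are the placeholders from the question):
import subprocess

# Same command as on the git command line; raises CalledProcessError on failure.
subprocess.check_call([
    'git', 'archive',
    '--format=zip',
    '--remote=ssh://path/to/my/repo',
    '-o', 'archived_file.zip',
    'HEAD:path/to/directory', 'filename',
])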
Edit: added some demonstration code for reading stdout and writing stdin.
Some of this is stolen from here:
http://eyalarubas.com/python-subproc-nonblock.html
The rest is a small demo.
First, the two prerequisites:
shell.py
import sys

while True:
    s = raw_input("Enter command: ")
    print "You entered: {}".format(s)
    sys.stdout.flush()
nbstreamreader.py:
from threading import Thread
from Queue import Queue, Empty

class NonBlockingStreamReader:

    def __init__(self, stream):
        '''
        stream: the stream to read from.
                Usually a process' stdout or stderr.
        '''
        self._s = stream
        self._q = Queue()

        def _populateQueue(stream, queue):
            '''
            Collect lines from 'stream' and put them in 'queue'.
            '''
            while True:
                line = stream.readline()
                if line:
                    queue.put(line)
                else:
                    raise UnexpectedEndOfStream

        self._t = Thread(target=_populateQueue,
                         args=(self._s, self._q))
        self._t.daemon = True
        self._t.start()  # start collecting lines from the stream

    def readline(self, timeout=None):
        try:
            return self._q.get(block=timeout is not None,
                               timeout=timeout)
        except Empty:
            return None

class UnexpectedEndOfStream(Exception): pass
then the actual code:
from subprocess import Popen, PIPE
from time import sleep
from nbstreamreader import NonBlockingStreamReader as NBSR

# run the shell as a subprocess:
p = Popen(['python', 'shell.py'],
          stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=False)
# wrap p.stdout with a NonBlockingStreamReader object:
nbsr = NBSR(p.stdout)
# issue command:
p.stdin.write('command\n')
# get the output
i = 0
while True:
    output = nbsr.readline(0.1)
    # 0.1 secs to let the shell output the result
    if not output:
        print "time out: the response took too long..."
        # do nothing, retry reading..
        continue
    if "Enter command:" in output:
        p.stdin.write('try it again' + str(i) + '\n')
        i += 1
    print output
This is probably a bit of a silly exercise for me, but it raises a bunch of interesting questions. I have a directory of logfiles from my chat client, and I want to be notified using notify-osd every time one of them changes.
The script that I wrote basically uses os.popen to run the Linux tail command on every one of the files to get the last line, and then checks each line against a dictionary of what the lines were the last time it ran. If the line changed, it uses pynotify to send me a notification.
This script actually worked perfectly, except for the fact that it used a huge amount of cpu (probably because it was running tail about 16 times every time the loop ran, on files that were mounted over sshfs.)
It seems like something like this would be a great solution, but I don't see how to implement that for more than one file.
Here is the script that I wrote. Pardon my lack of comments and poor style.
Edit: To clarify, this is all Linux on a desktop.
Not even looking at your source code, there are two ways you could easily do this more efficiently and handle multiple files.
Don't bother running tail unless you have to. Simply os.stat all of the files and record the last modified time. If the last modified time is different, then raise a notification (a rough sketch follows after the next point).
Use pyinotify to call out to Linux's inotify facility; this will have the kernel do option 1 for you and call back to you when any files in your directory change. Then translate the callback into your osd notification.
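As promised, a rough sketch of the stat-polling approach; the paths are placeholders and a plain print stands in for the pynotify call:
import os
import time

paths = ['/path/to/log1', '/path/to/log2']   # placeholder logfile paths
last_mtime = {p: os.stat(p).st_mtime for p in paths}

while True:
    for p in paths:
        mtime = os.stat(p).st_mtime
        if mtime != last_mtime[p]:
            last_mtime[p] = mtime
            print('%s changed' % p)          # raise your notification here
    time.sleep(1)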
Now, there might be some trickiness depending on how many notifications you want when there are multiple messages and whether you care about missing a notification for a message.
An approach that preserves the use of tail would be to instead use tail -f. Open all of the files with tail -f and then use the select module to have the OS tell you when there's additional input on one of the file descriptors open for tail -f. Your main loop would call select and then iterate over each of the readable descriptors to generate notifications. (You could probably do this without using tail and just calling readline() when it's readable.)
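And a sketch of the tail -f / select variant, again with placeholder paths (Linux only):
import os
import select
import subprocess

paths = ['/path/to/log1', '/path/to/log2']   # placeholder logfile paths
procs = {p: subprocess.Popen(['tail', '-f', p], stdout=subprocess.PIPE) for p in paths}
fd_to_path = {proc.stdout.fileno(): p for p, proc in procs.items()}

while True:
    readable, _, _ = select.select(list(fd_to_path), [], [])
    for fd in readable:
        # raw read on the file descriptor avoids stdio buffering surprises
        chunk = os.read(fd, 4096)
        if chunk:
            print('%s: %s' % (fd_to_path[fd], chunk.decode(errors='replace').rstrip()))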
Other areas of improvement in your script:
Use os.listdir and native Python filtering (say, using list comprehensions) instead of a popen with a bunch of grep filters (see the sketch after this list).
Update the list of buffers to scan periodically instead of only doing it at program boot.
Use subprocess.Popen instead of os.popen.
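A hypothetical illustration of the first and last points (the directory name and suffix are made up):
import os
import subprocess

log_dir = '/path/to/chatlogs'   # placeholder directory
logfiles = [os.path.join(log_dir, f)
            for f in os.listdir(log_dir)
            if f.endswith('.log')]          # native filtering instead of grep

# subprocess instead of os.popen for the occasional external command:
last_line = subprocess.check_output(['tail', '-n', '1', logfiles[0]])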
If you're already using the pyinotify module, it's easy to do this in pure Python (i.e. no need to spawn a separate process to tail each file).
Here is an example that is event-driven by inotify, and should use very little cpu. When IN_MODIFY occurs for a given path we read all available data from the file handle and output any complete lines found, buffering the incomplete line until more data is available:
import os
import select
import sys
import pynotify
import pyinotify
class Watcher(pyinotify.ProcessEvent):
    def __init__(self, paths):
        self._manager = pyinotify.WatchManager()
        self._notify = pyinotify.Notifier(self._manager, self)
        self._paths = {}
        for path in paths:
            self._manager.add_watch(path, pyinotify.IN_MODIFY)
            fh = open(path, 'rb')
            fh.seek(0, os.SEEK_END)
            self._paths[os.path.realpath(path)] = [fh, '']

    def run(self):
        while True:
            self._notify.process_events()
            if self._notify.check_events():
                self._notify.read_events()

    def process_default(self, evt):
        path = evt.pathname
        fh, buf = self._paths[path]
        data = fh.read()
        lines = data.split('\n')
        # prepend the previously buffered incomplete line.
        if buf:
            lines[0] = buf + lines[0]
        # buffer the last element if it is an incomplete line; it is '' if the
        # data ended with a newline. Drop it from the output either way.
        buf = lines[-1]
        lines.pop()
        # display a notification
        notice = pynotify.Notification('%s changed' % path, '\n'.join(lines))
        notice.show()
        # and output to stdout
        for line in lines:
            sys.stdout.write(path + ': ' + line + '\n')
        sys.stdout.flush()
        self._paths[path][1] = buf

pynotify.init('watcher')
paths = sys.argv[1:]
Watcher(paths).run()
Usage:
% python watcher.py [path1 path2 ... pathN]
Simple pure-Python solution (not the best, but it doesn't fork, spits out 4 empty lines after an idle period, and marks the source of each chunk when it changes):
#!/usr/bin/env python
from __future__ import with_statement
'''
Implement multi-file tail
'''
import os
import sys
import time
def print_file_from(filename, pos):
    with open(filename, 'rb') as fh:
        fh.seek(pos)
        while True:
            chunk = fh.read(8192)
            if not chunk:
                break
            sys.stdout.write(chunk)

def _fstat(filename):
    st_results = os.stat(filename)
    return (st_results[6], st_results[8])

def _print_if_needed(filename, last_stats, no_fn, last_fn):
    changed = False
    # Find the size of the file and move to the end
    tup = _fstat(filename)
    # print tup
    if last_stats[filename] != tup:
        changed = True
        if not no_fn and last_fn != filename:
            print '\n<%s>' % filename
        print_file_from(filename, last_stats[filename][0])
        last_stats[filename] = tup
    return changed

def multi_tail(filenames, stdout=sys.stdout, interval=1, idle=10, no_fn=False):
    S = lambda (st_size, st_mtime): (max(0, st_size - 124), st_mtime)
    last_stats = dict((fn, S(_fstat(fn))) for fn in filenames)
    last_fn = None
    last_print = 0
    while 1:
        # print last_stats
        changed = False
        for filename in filenames:
            if _print_if_needed(filename, last_stats, no_fn, last_fn):
                changed = True
                last_fn = filename
        if changed:
            if idle > 0:
                last_print = time.time()
        else:
            if idle > 0 and last_print is not None:
                if time.time() - last_print >= idle:
                    last_print = None
                    print '\n' * 4
        time.sleep(interval)

if '__main__' == __name__:
    from optparse import OptionParser

    op = OptionParser()
    op.add_option('-F', '--no-fn', help="don't print filename when changes",
                  default=False, action='store_true')
    op.add_option('-i', '--idle', help='idle time, in seconds (0 turns off)',
                  type='int', default=10)
    op.add_option('--interval', help='check interval, in seconds', type='int',
                  default=1)
    opts, args = op.parse_args()
    try:
        multi_tail(args, interval=opts.interval, idle=opts.idle,
                   no_fn=opts.no_fn)
    except KeyboardInterrupt:
        pass