Why do scripts behave differently when called from the command line vs git attributes? - python

Updated scripts attached below, these are now working on my sample document
Why do the following python scripts perform differently when called via git attributes than from the command line?
What I have are two scripts modeled on the Mercurial zipdoc functionality. All I'm attempting to do is unzip docx files on store (filter.clean) and zip them on restore (filter.smudge). I have the two scripts working well, but once I wire them into git attributes they don't work, and I don't understand why.
I've tested by doing the following
Testing the Scripts (git bash)
$ cat original.docx | python ~/Documents/pyscripts/unzip.py > uncompress.docx
$ cat uncompress.docx | python ~/Documents/pyscripts/zip.py > compress.docx
$ md5sum uncompress.docx compress.docx
I can open both the uncompressed and compressed files with Microsoft Word with no errors. The scripts work as expected.
Test Git Attributes
I set both clean and smudge to cat and verified my file saves and restores without problems.
I set clean to python ~/Documents/pyscripts/unzip.py. After a commit and checkout, the file is now larger (uncompressed) but errors when opened in MS Word. The md5 does not match the "script only" test above, although the file size is identical.
I set clean back to cat and set smudge to python ~/Documents/pyscripts/zip.py. After a commit and checkout, the file is now smaller (compressed) but again errors when opened in MS Word. Again, the md5 differs from the "script only" test but the file size matches.
Setting both clean and smudge to the python scripts produces an error, as expected.
I'm really lost here. I thought git attributes simply provides input on stdin and reads output from stdout. I've tested both scripts with a pipe from cat and a redirect of the output just fine. I know the scripts are running because the files change size as expected, but a small change is introduced somewhere in the file.
Additional References
I'm using msysgit on Windows 7; all commands above were typed into the git bash window.
Git Attributes Description
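For reference, the filter setup the question describes can be wired up roughly like this. This is a sketch: the filter name zipdoc is my own label (any name works), and the script paths are the ones used in the tests above.

```shell
# In .gitattributes (or .git/info/attributes):
#   *.docx filter=zipdoc

# Register the filter; clean runs when staging, smudge runs on checkout:
git config filter.zipdoc.clean  "python ~/Documents/pyscripts/unzip.py"
git config filter.zipdoc.smudge "python ~/Documents/pyscripts/zip.py"
```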
Uncompress Script
import sys
import zipfile

# Set stdin and stdout to binary read/write
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

# Wrap stdio into file-like objects
inString = StringIO(sys.stdin.read())
outString = StringIO()

# Store each member uncompressed
try:
    with zipfile.ZipFile(inString, 'r') as inFile:
        outFile = zipfile.ZipFile(outString, 'w', zipfile.ZIP_STORED)
        for memberInfo in inFile.infolist():
            member = inFile.read(memberInfo)
            memberInfo.compress_type = zipfile.ZIP_STORED
            outFile.writestr(memberInfo, member)
        outFile.close()
except zipfile.BadZipfile:
    # Not a zip archive: pass the input through unchanged
    sys.stdout.write(inString.getvalue())

sys.stdout.write(outString.getvalue())
Compress Script
import sys
import zipfile

# Set stdin and stdout to binary read/write
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

# Wrap stdio into file-like objects
inString = StringIO(sys.stdin.read())
outString = StringIO()

# Store each member compressed
try:
    with zipfile.ZipFile(inString, 'r') as inFile:
        outFile = zipfile.ZipFile(outString, 'w', zipfile.ZIP_DEFLATED)
        for memberInfo in inFile.infolist():
            member = inFile.read(memberInfo)
            memberInfo.compress_type = zipfile.ZIP_DEFLATED
            outFile.writestr(memberInfo, member)
        outFile.close()
except zipfile.BadZipfile:
    # Not a zip archive: pass the input through unchanged
    sys.stdout.write(inString.getvalue())

sys.stdout.write(outString.getvalue())
Edit: Formatting
Edit 2: Scripts updated to write to memory rather than stdout during file processing.

I've found that using zipfile.ZipFile() with the target being stdout was causing the problem. Opening the zipfile with a StringIO() target and then writing the full StringIO buffer to stdout at the end has solved it.
I haven't tested this extensively and it's possible some .docx contents won't be handled well, but only time will tell. My test files now open without error, and as a bonus the .docx file in your working directory is smaller, due to using higher compression than the standard .docx format.
I have confirmed that after performing several edits and commits on a .docx file I can open the file, edit one line, and commit without a large delta being added to the repo size. For example, a 19KB file with 3 previous edits in the repo history had a new line added at the top, and this created a delta of only 1KB in the repo after garbage collection. Running the same test (as closely as I could) with Mercurial resulted in a 9.3KB delta commit. I'm no Mercurial expert, but my understanding is that there is no "gc" command for Mercurial, so none was run.

Related

Extracting a text file from tar with tarfile module in python3

Is there a simple way to extract a text file from a tar file as a file object of text I/O in python 3.4 or later?
I am revising my python2 code to python3, and I found that TarFile.extractfile, which used to return a file object with text I/O, now returns an io.BufferedReader object which seems to have binary I/O. The other part of my code expects text I/O, and I need to absorb this change in some way.
One method I can think of is to use TarFile.extract and write the file to a directory, and open it by open function, but I wonder if there is a way to get the text I/O stream directly.
Try io.TextIOWrapper to wrap the io.BufferedReader.
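A minimal sketch of that wrapping. The archive is built in memory here so the example is self-contained; the member name notes.txt and its contents are made up.

```python
import io
import tarfile

# Build a tiny in-memory tar with one text member (a stand-in for test.tar)
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = "hello\nworld\n".encode("utf-8")
    info = tarfile.TarInfo(name="notes.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

# extractfile() returns a binary reader; TextIOWrapper turns it into text I/O
with tarfile.open(fileobj=buf, mode="r") as tar:
    binary = tar.extractfile("notes.txt")
    text = io.TextIOWrapper(binary, encoding="utf-8")
    first_line = text.readline()  # a str, not bytes
```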
you can use getmembers()
import tarfile

tar = tarfile.open("test.tar")
tar.getmembers()
After that, you can use extractfile() to extract the members as file object. Just an example
import tarfile
import os

os.chdir("/tmp/foo")
tar = tarfile.open("test.tar")
for member in tar.getmembers():
    f = tar.extractfile(member)
    content = f.read()
    # do operations with your content
tar.close()

How to open a file on mac OSX 10.8.2 in python

I am writing python code in Eclipse and want to open a file that is present in the Downloads folder. I am using Mac OSX 10.8.2. I tried with f=os.path.expanduser("~/Downloads/DeletingDocs.txt")
and also with
ss=subprocess.Popen("~/Downloads/DeletingDocs.txt",shell=True)
ss.communicate()
I basically want to open the file in a subprocess, to listen to the changes in the opened file. But the file is not opening in either case.
from os.path import abspath, expanduser

filepath = abspath(expanduser("~/Downloads/DeletingDocs.txt"))
print('Opening file', filepath)
with open(filepath, 'r') as fh:
    print(fh.read())
Take note of OSX file handling, though; the IO is a bit different depending on the filetype.
For instance, a .txt file which under Windows would be considered a "plain text file" is actually a compressed data stream under OSX, because OSX tries to be "smart" about the storage space.
This can literally ruin your day unless you know about it (been there, had the headache, moved on).
When double-clicking a .txt file in OSX, normally the text editor pops up, and what it does is call os.open() instead of accessing the file on a lower level, which lets the OSX middle layers do disk-area | decompression pipe | file-handle -> text editor. But if you access the file object on a lower level, you'll end up opening the disk area where the file is stored, and if you print the data you'll get garbage because it's not the data you'd expect.
So try using:
import os

fd = os.open("foo.txt", os.O_RDONLY)
print(os.read(fd, 1024))
os.close(fd)
And fiddle around with the flags.
I honestly can't remember which of the two opens the file as-is from disk (open() or os.open()), but one of them makes your data look like garbage, and sometimes you just get the pointer to the decompression pipe (giving you like 4 bytes of data even though the text file is huge).
If it's tracking/catching updates on a file you want
from time import ctime
from os.path import getmtime, expanduser, abspath
from os import walk

for root, dirs, files in walk(expanduser('~/')):
    for fname in files:
        modtime = ctime(getmtime(abspath(root + '/' + fname)))
        print('File', fname, 'was last modified at', modtime)
And if the time differs from your last check, well then do something cool with it.
For instance, you have these libraries for Python to work with:
.csv
.pdf
.odf
.xlsx
And MANY more, so instead of opening an external application as your first fix, try opening the files via Python and modifying them to your liking instead, and only as a last resort (if even then) open external applications via Popen.
But since you requested it (sort of... erm), here's a Popen approach:
from subprocess import Popen, PIPE, STDOUT
from os.path import abspath, expanduser
from time import sleep

run = Popen('open -t ' + abspath(expanduser('~/example.txt')),
            shell=True, stdout=PIPE, stdin=PIPE, stderr=STDOUT)

##== Here's an example where you could interact with the process:
##== run.stdin.write('Hey you!\n')
##== run.stdin.flush()

while run.poll() is None:
    sleep(1)
Over-explaining your job:
This will print a files contents every time it's changed.
import time

with open('test.txt', 'r') as fh:
    while 1:
        new_data = fh.read()
        if len(new_data) > 0:
            fh.seek(0)
            print(fh.read())
        time.sleep(5)
How it works:
The regular file opener with open() as fh will open the file and place it as a handle in fh; once you call .read() without any parameters, it will fetch the entire contents of the file.
This doesn't close the file; it simply places the "reading" pointer at the end of the file (let's say at position 50 for convenience).
So now your pointer is at character 50 in your file, at the end.
Whenever you write something to your file, that puts more data into it, so the next .read() will fetch data from position 50 onward, making the .read() non-empty. We then place the "reading" pointer back to position 0 by issuing .seek(0) and print all the data.
Combine that with os.path.getmtime() to find any reversed changes or 1:1 ratio changes (replacing a character, fixing a misspelling, etc.).
I am hesitant to answer because this question was double-posted and is confusingly phrased, but... if you want to "open" the file in OSX using the default editor, then add the open command to your subprocess. This works for me:
subprocess.Popen("open myfile.txt",shell=True)
It is most likely a permissions issue. If you try your code in the Python interpreter, you will probably receive a "Permission denied" error from the shell when you call subprocess.Popen. If this is the case, you will need to make the file a minimum of 700 (it's probably 644 by default), and you'll probably want 744.
Try the code in the Python interpreter and check for the "Permission denied" error, if you see that then do this in a shell:
chmod 744 ~/Downloads/DeletingDocs.txt
Then run the script. To do it all in Python you can use os.system:
import os
import subprocess

filename = "~/Downloads/DeletingDocs.txt"
os.system("chmod 744 " + filename)
ss = subprocess.Popen(filename, shell=True)
ss.communicate()
The reason it "just works" in Windows is because Windows doesn't support file permission types (read, write and execute) in the same way as *nix systems (e.g. Linux, BSD, OS X, etc.) do.
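If you'd rather not shell out, os.chmod can set the same mode directly. A sketch, using a temporary file as a stand-in for the real path:

```python
import os
import stat
import tempfile

# Create a temp file as a stand-in for ~/Downloads/DeletingDocs.txt
fd, path = tempfile.mkstemp()
os.close(fd)

# Equivalent of `chmod 744 path`: rwx for owner, r for group and others
os.chmod(path, stat.S_IRWXU | stat.S_IRGRP | stat.S_IROTH)
mode = stat.S_IMODE(os.stat(path).st_mode)  # 0o744 on POSIX systems
```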

Opening .exe from .txt error

I'm currently creating a script that will simply open a program in the SAME directory as the script. I want to have a text file named "target.txt", and basically the script will read what's in "target.txt" and open a file based on its contents.
For example.. The text file will read "program.exe" inside, and the script will read that and open program.exe. The reason I'm doing this is to easily change the program the script opens without having to actually change whats inside.
The current script Im using for this is:
import subprocess

def openclient():
    with open("target.txt", "rb") as f:
        subprocess.call(f.read())
    print '''Your file is opening'''
It's giving me an error saying it cannot find target.txt, even though I have it in the same directory. I have tried taking away the .txt, still nothing. This code actually worked before, but it stopped working for some strange reason. I'm using PythonWin instead of IDLE; I don't know if this is the reason.
There are two possible issues:
target.txt probably ends with a newline, which messes up subprocess.call()
If target.txt is not in the current directory, you can access the directory containing the currently executing Python file by using the magic variable __file__.
However, __file__ is set at script load time, and if the current directory is changed between loading the script and calling openclient(), the value of __file__ may be relative to the old current directory. So you have to save __file__ as an absolute path when the script is first read in, then use it later to access files in the same directory as the script.
This code works for me, with target.txt containing the string date to run the Unix date command:
#!/usr/bin/env python2.7
import os
import subprocess

def openclient(orig__file__=os.path.abspath(__file__)):
    target = os.path.join(os.path.dirname(orig__file__), 'target.txt')
    with open(target, "rb") as f:
        subprocess.call(f.read().strip())
    print '''Your file is opening'''

if __name__ == '__main__':
    os.chdir('foo')
    openclient()

can you print a file from python?

Is there some way of sending output to the printer instead of the screen in Python? Or is there a service routine that can be called from within python to print a file? Maybe there is a module I can import that allows me to do this?
Most platforms—including Windows—have special file objects that represent the printer, and let you print text by just writing that text to the file.
On Windows, the special file objects have names like LPT1:, LPT2:, COM1:, etc. You will need to know which one your printer is connected to (or ask the user in some way).
It's possible that your printer is not connected to any such special file, in which case you'll need to fire up the Control Panel and configure it properly. (For remote printers, this may even require setting up a "virtual port".)
At any rate, writing to LPT1: or COM1: is exactly the same as writing to any other file. For example:
with open('LPT1:', 'w') as lpt:
    lpt.write(mytext)
Or:
lpt = open('LPT1:', 'w')
print >>lpt, mytext
print >>lpt, moretext
lpt.close()
And so on.
If you've already got the text to print in a file, you can print it like this:
with open(path, 'r') as f, open('LPT1:', 'w') as lpt:
    while True:
        buf = f.read(8192)
        if not buf:
            break
        lpt.write(buf)
Or, more simply (untested, because I don't have a Windows box here), this should work:
import shutil

with open(path, 'r') as f, open('LPT1:', 'w') as lpt:
    shutil.copyfileobj(f, lpt)
It's possible that just shutil.copyfile(path, 'LPT1:') would also work, but the documentation says "Special files such as character or block devices and pipes cannot be copied with this function", so I think it's safer to use copyfileobj.
Python doesn't (unless you're using graphical libraries) ever send stuff to "The screen". It writes to stdout and stderr, which are, as far as Python is concerned, just things that look like files.
It's simple enough to have python direct those streams to anything else that looks like a file; for instance, see Redirect stdout to a file in Python?
On unix systems, there are file-like devices that happen to be printers (/dev/lp*); on windows, LPT1 serves a similar purpose.
Regardless of the OS, you'll have to make sure that LPT1 or /dev/lp* are actually hooked up to a printer somehow.
If you are on Linux, the following works if you have your printer set up and set as your default. Note that lpr reads the document from stdin, so the text can be fed through a pipe rather than a file-like object.
from subprocess import Popen, PIPE

# call the system's lpr command and feed it the text on stdin
p = Popen(["lpr"], stdin=PIPE)
output = p.communicate(output_string)[0]

tail -f does not seem to work in the shell when file is being populated through file.write()

I am trying to daemonize a python script that currently runs in the foreground. However, I still need to be able to see its output which it currently dumps to stdout.
So I am using the following piece of code which generates a unique file name in /tmp and then it assigns sys.stdout to this new file. All subsequent calls to 'print' are then redirected to this log file.
import sys
import uuid

outfile = open('/tmp/outfile-' + str(uuid.uuid4()), 'w')
outfile.write("Log file for daemon script...\n")
sys.stdout = outfile
# Rest of script uses print statements to dump information into the /tmp file
...
The problem I am facing is that when I tail -f the file created in /tmp, I don't see any output. However, once I kill my daemon process, the output is visible in the /tmp logfile, because python flushes out the file data.
I want to monitor the /tmp log file in real time, hence it would be great if the output could somehow be made visible in real time.
One solution that I tried was using unbuffered IO, but that didn't help.
Try harder to use unbuffered I/O. The problem is almost certainly that your output is buffered.
Opening the file like this should work:
outfile = open(name, 'w', 0)
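Note that this is Python 2; in Python 3, open(name, 'w', 0) raises a ValueError because unbuffered mode is only available for binary files. A line-buffered text file is the usual alternative there. A sketch (the temp-file path stands in for the /tmp/outfile-<uuid> name above):

```python
import os
import tempfile

# Unique log path, standing in for the /tmp/outfile-<uuid> name
fd, path = tempfile.mkstemp(suffix='.log')
os.close(fd)

# buffering=1 means line-buffered: every newline-terminated write is
# flushed to disk immediately, so `tail -f` sees it in real time.
outfile = open(path, 'w', buffering=1)
outfile.write("Log file for daemon script...\n")
print('status update', file=outfile)  # also flushed on its newline

# The data is already on disk while outfile is still open:
with open(path) as f:
    visible = f.read()
outfile.close()
```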
