Most Efficient Way to "Slurp" All of STDIN Into a String - Python

I'm writing an email parser in Python 2.7 that will be invoked by sendmail via an alias; the message is parsed using the email module, then processed and stored in an Oracle database:
From /etc/aliases:
myalias: | /my/python/script.py
I'm having trouble "slurping" all of stdin into a string object that I can use with the email module:
import email
# Slurp stdin and store into message
message = # ??? (this is the part I can't figure out)
msg = email.message_from_string(message)
# Do something with it
print msg['Subject']
What would be the most efficient way to do this? I've tried sys.stdin.readlines(), but that returns a list.
Thanks for any help. (Sorry if this is noobish... I'm a Perl convert who's been forced to standardize my tools on Python, and this is only my second script. Well, not really "forced": I've been wanting to make the switch for some time, just not under the gun of a deadline like now.)

sys.stdin.readlines() returns a list of lines. If what you want is one long (multi-line) string, use sys.stdin.read().
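Slotted into the question's skeleton, a minimal sketch:
import sys
import email

# Slurp all of stdin into one string (reads until EOF)
message = sys.stdin.read()

msg = email.message_from_string(message)
print msg['Subject']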

Related

Python 2.7 - How to programmatically read binary data from stdin

I'd like to be able to read binary data from stdin with Python.
However, when I use input = sys.stdin.buffer.read(), I get AttributeError: 'file' object has no attribute 'buffer'. This seems strange because the docs say I should be able to use the underlying buffer object. How can I fix or work around this?
Notes: I've checked out the last time this was asked, but the answers there are all either "use -u", "use buffer" (which I'm trying), or something about reading from files. The first and last don't help because I have no control over the users of this program (so I can't tell them to use particular arguments), and because this is stdin, not files.
In Python 2, sys.stdin is already a plain byte-oriented file object (the .buffer attribute only exists on Python 3's text-mode streams), so just drop the buffer:
import sys

input = sys.stdin.read()  # returns raw bytes (str) on Python 2
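One caveat worth adding (my note, not from the answer): on Windows, Python 2 opens stdin in text mode, so CRLF translation can mangle binary input. The usual fix is to flip the descriptor into binary mode first:
import sys

if sys.platform == 'win32':
    # stdin defaults to text mode on Windows; switch it to binary
    # so \r\n bytes pass through untranslated.
    import os, msvcrt
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)

data = sys.stdin.read()  # raw bytes (str) on Python 2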

Efficient way to find a string based on a list

I'm new to scripting and have been reading up on Python for about 6 weeks. The code below is meant to read a log file and send an alert if one of the keywords defined in srchstring is found. It works as expected and doesn't re-alert on strings previously found. However, the file it's processing is actively being written to by an application, and the script is too slow on files around 500 MB; under 200 MB it works fine, i.e. finishes within 20 seconds.
Could someone suggest a more efficient way to search for a string within a file based on a pre-defined list?
import os

srchstring = ["Shutdown", "Disconnecting", "Stopping Event Thread"]
if os.path.isfile(r"\\server\share\logfile.txt"):
    with open(r"\\server\share\logfile.txt", "r") as F:
        for line in F:
            for st in srchstring:
                if st in line:
                    print line,
                    # do some slicing of the string to get dd/mm/yy hh:mm:ss:ms
                    # then create a marker file called file_dd/mm/yy hh:mm:ss:ms
                    if os.path.isfile("file_dd/mm/yy hh:mm:ss:ms"):  # marker file already exists
                        print "string previously found - ignoring, continuing search"
                    else:
                        open("file_dd/mm/yy hh:mm:ss:ms", 'a')  # no marker file: create it, then send email
                        print "error string found - creating marker file, sending email alert"
else:
    print "file does not exist"
Refactoring the search expression to a precompiled regular expression avoids the (explicit) innermost loop.
import os, re

regex = re.compile(r'Shutdown|Disconnecting|Stopping Event Thread')
if os.path.isfile(r"\\server\share\logfile.txt"):
    # Indentation fixed as per comment
    with open(r"\\server\share\logfile.txt", "r") as F:
        for line in F:
            if regex.search(line):
                # ...
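To round this out, here is one way the question's marker-file logic could slot into that loop. This is only a sketch: the timestamp slicing is a placeholder, and note that a name like file_dd/mm/yy hh:mm:ss:ms can't be used literally as a filename (slashes and colons aren't legal in path components), so the sketch normalizes it first:
import os, re

regex = re.compile(r'Shutdown|Disconnecting|Stopping Event Thread')
logfile = r"\\server\share\logfile.txt"

if os.path.isfile(logfile):
    with open(logfile, "r") as F:
        for line in F:
            if regex.search(line):
                # Placeholder: slice the dd/mm/yy hh:mm:ss:ms timestamp
                # out of the matched line here.
                timestamp = line[:23]
                # Slashes, colons and spaces are unsafe in filenames,
                # so fold them into underscores for the marker name.
                marker = "file_" + re.sub(r'[/:\s]', '_', timestamp)
                if os.path.isfile(marker):
                    print "string previously found - ignoring, continuing search"
                else:
                    open(marker, 'a').close()
                    print "error string found - creating marker file, sending email alert"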
I assume here that you're on Linux. If you're not, install MinGW on Windows and the solution below becomes suitable too.
Leave the hard part to the most efficient tool available: filter your data before it ever reaches the Python script. Use grep to pull out just the lines containing "Shutdown", "Disconnecting" or "Stopping Event Thread":
grep 'Shutdown\|Disconnecting\|Stopping Event Thread' /server/share/logfile.txt
and pipe the matching lines to your script:
grep 'Shutdown\|Disconnecting\|Stopping Event Thread' /server/share/logfile.txt | python log.py
Edit: a Windows solution. You can wrap this in a .bat file to make it executable:
findstr /c:"Shutdown" /c:"Disconnecting" /c:"Stopping Event Thread" \\server\share\logfile.txt | python log.py
In log.py, read from stdin. It's a file-like object, so no difficulties here:
import sys

for line in sys.stdin:
    print line,
    # do some slicing of the string to get dd/mm/yy hh:mm:ss:ms
    # then create a marker file called file_dd/mm/yy hh:mm:ss:ms
    # and so on
This solution reduces the amount of work your script has to do, and since Python isn't a fast language, that may speed up the task. I suspect the whole thing could be rewritten in pure bash and would be even faster (20+ years of optimization that went into a C program like grep is not something you out-compete easily), but I don't know bash well enough.

Python : Postfix stdin

I want to make Postfix send all emails to a Python script that will scan them.
However, how do I pipe the output from Postfix to Python?
What is stdin in Python?
Can you give a code example?
Rather than calling sys.stdin.readlines() then looping and passing the lines to email.FeedParser.FeedParser().feed() as suggested by Michael, you should instead pass the file object directly to the email parser.
The standard library provides a convenience function, email.message_from_file(fp), for this purpose. Thus your code becomes much simpler:
import sys
import email

msg = email.message_from_file(sys.stdin)
To push mail from postfix to a python script, add a line like this to your postfix alias file:
# send to emailname@example.com
emailname: "|/path/to/script.py"
The Python email.FeedParser module can construct an object representing a MIME email message from stdin, by doing something like this:
import sys
import email

# Read from stdin into an array of lines.
email_input = sys.stdin.readlines()

# email.FeedParser.feed() expects to receive lines one at a time;
# msg will hold the complete email Message object.
parser = email.FeedParser.FeedParser()
for msg_line in email_input:
    parser.feed(msg_line)
msg = parser.close()
From here, you need to iterate over the MIME parts of msg and act on them accordingly. Refer to the documentation on email.Message objects for the methods you'll need. For example, msg.get("Header") returns the value of the header named Header.
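For instance, a minimal sketch of walking the parts (all methods used below are standard email.Message API):
for part in msg.walk():
    if part.get_content_maintype() == 'multipart':
        continue  # multipart containers hold other parts, not content
    print part.get_content_type()
    body = part.get_payload(decode=True)  # decoded payload bytes, or None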

How do I generate a multipart/mime message with correct CRLF in Python?

I need to generate a multipart/mime message to send as a response to an HTTP request, but I'm hitting either a bug or a limitation in the Python email.* package.
The problem is that with Python 2.6, the message.as_string() call below generates a string with \n rather than CRLF as the line endings:
from email.mime.multipart import MIMEMultipart
from email.mime.image import MIMEImage
from email.encoders import encode_7or8bit

message = MIMEMultipart()
for image in images:
    f = open(image, 'rb')  # image is a file path here
    img = MIMEImage(f.read(), _encoder=encode_7or8bit)
    message.attach(img)
message.as_string()
There doesn't seem to be any way to persuade it to use the (MIME standard) CRLF. The Generator class, which looks like it should be able to do this, doesn't.
What have other people done to get round this?
This was a bug in Python that has now been fixed: http://hg.python.org/lookup/r85811
It should now be possible to use the MIME libraries over non-email transports and have sensible things happen.
What about a simple hack?
message.as_string().replace('\n', '\r\n')
Inelegant, but it should work (and a bug report should be filed on the Python tracker).
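If any part of the output might already contain CRLF (a blind replace would double those), a slightly safer variant of the same hack uses a negative lookbehind so only bare \n gets converted:
import re

# Convert \n to \r\n, but leave existing \r\n pairs alone.
wire_text = re.sub(r'(?<!\r)\n', '\r\n', message.as_string())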

Embed pickle (or arbitrary) data in python script

In Perl, the interpreter essentially stops when it encounters a line with
__END__
in it. This is often used to embed arbitrary data at the end of a Perl script; that way the script can fetch and store data it carries 'inside itself', which allows for quite nice opportunities.
In my case I have a pickled object that I want to store somewhere. While I can use a file.pickle file just fine, I was looking for a more compact approach (to distribute the script more easily).
Is there a mechanism that allows for embedding arbitrary data inside a python script somehow?
With pickle you can also work directly on strings:
import pickle

s = pickle.dumps(obj)    # serialize obj to a string
obj = pickle.loads(s)    # reconstruct the object from that string
If you combine that with """ (triple-quoted strings) you can easily store any pickled data in your file.
If the data is not particularly large (many K) I would just .encode('base64') it and include that in a triple-quoted string, with .decode('base64') to get back the binary data, and a pickle.loads() call around it.
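Putting those two suggestions together, a small self-contained sketch (the object and variable names are illustrative, not from either answer):
import pickle

obj = {'answer': 42}

# One-off step: serialize and base64-encode the object.
blob = pickle.dumps(obj).encode('base64')

# In the distributed script, the blob would be pasted into a
# triple-quoted string literal; here we reuse the value just computed.
DATA = """%s""" % blob
restored = pickle.loads(DATA.decode('base64'))
assert restored == obj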
In Python, you can use """ (triple-quoted strings) to embed long runs of text data in your program.
In your case, however, don't waste time on this.
If you have an object you've pickled, you'd be much, much happier dumping that object as Python source and simply including the source.
The repr function, applied to most objects, will emit a Python source-code version of the object. If you implement __repr__ for all of your custom classes, you can trivially dump your structure as Python source.
If, on the other hand, your pickled structure started out as Python code, just leave it as Python code.
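A quick illustration of that round trip for plain data structures (ast.literal_eval reads the repr output back safely; the sample data is made up):
import ast

data = {'hosts': ['alpha', 'beta'], 'retries': 3}
source = repr(data)                  # Python-source text for the object
restored = ast.literal_eval(source)  # safe, eval-free reconstruction
assert restored == data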
I made the code below. You run it as something like python comp.py foofile.tar.gz, and it creates decomp.py with foofile.tar.gz's contents embedded in it. I don't think it's really portable to Windows, though, because of the Popen call to chmod.
import base64
import sys
import subprocess

inf = open(sys.argv[1], "r+b").read()
outs = base64.b64encode(inf)

decomppy = '''#!/usr/bin/python
import base64

def decomp(data):
    fname = "%s"
    outf = open(fname, "w+b")
    outf.write(base64.b64decode(data))
    outf.close()

# You can put the rest of your code here.
# Like this, to unzip an archive:
#import subprocess
#subprocess.Popen("tar xzf " + fname, shell=True)
#subprocess.Popen("rm " + fname, shell=True)
''' % (sys.argv[1])

taildata = '''uudata = """%s"""
decomp(uudata)
''' % (outs)

outpy = open("decomp.py", "w+b")
outpy.write(decomppy)
outpy.write(taildata)
outpy.close()
subprocess.Popen("chmod +x decomp.py", shell=True)
