I'd like to be able to read binary data from stdin with Python.
However, when I use input = sys.stdin.buffer.read(), I get AttributeError: 'file' object has no attribute 'buffer'. This seems strange, because the docs say I should be able to use the underlying buffer object. How can I fix or work around this?
Notes: I've checked the last time this was asked, but the answers there are all either "use -u", "use buffer" (which I'm trying), or something about reading from files. The first and last don't help, because I have no control over how users invoke this program (so I can't tell them to pass particular arguments), and because this is stdin, not a file.
Just remove .buffer for Python 2; there, sys.stdin is already a byte stream:
import sys

input = sys.stdin.read()  # on Python 2 this already returns raw bytes (str)
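If you can't control which Python version runs the script, a small feature check keeps both working; here's a minimal sketch (the commented msvcrt lines are only relevant on Windows under Python 2, where stdin defaults to text mode):
import sys

# Minimal sketch: read raw bytes from stdin on both Python 2 and 3.
if hasattr(sys.stdin, 'buffer'):
    data = sys.stdin.buffer.read()    # Python 3: underlying binary buffer
else:
    # Python 2: sys.stdin is already a byte stream (str).
    # On Windows, stdin is opened in text mode; switch it to binary first:
    #   import os, msvcrt
    #   msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
    data = sys.stdin.read()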
Related
After using stdin.read()/stdin.readline()/stdin.readlines(), every following call to input() fails with an EOFError, because stdin has already been consumed up to EOF.
Is there any way to clear or reset the sys.stdin buffer, though?
Related question (since deleted): input() after readlines() from sys.stdin?
The answer is no, but there may be a way to re-construct your input using the msvcrt module (Windows only).
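As a rough, Windows-only sketch of that idea (the helper name here is made up; it reads keystrokes straight from the console via msvcrt, which still works after sys.stdin has hit EOF):
import msvcrt

def read_line_from_console():
    """Read one line of keystrokes directly from the Windows console."""
    chars = []
    while True:
        ch = msvcrt.getwch()       # one keystroke, not taken from sys.stdin
        if ch in ('\r', '\n'):
            print('')              # echo the newline
            return ''.join(chars)
        msvcrt.putwch(ch)          # echo the typed character
        chars.append(ch)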
I am trying to convert a .pdf file into several .png files using Ghostscript in Python. The other answers on here are pretty old, hence this new thread.
The following code is given on pypi.org as an example of the 'high level' interface, and I am trying to model my code after it.
import sys
import locale
import ghostscript
args = [
    "ps2pdf",                      # actual value doesn't matter
    "-dNOPAUSE", "-dBATCH", "-dSAFER",
    "-sDEVICE=pdfwrite",
    "-sOutputFile=" + sys.argv[1],
    "-c", ".setpdfwrite",
    "-f", sys.argv[2]
]
# arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]
ghostscript.Ghostscript(*args)
Can someone explain what this code is doing? And can it be used somehow to convert a .pdf into .png files?
I am new to this and am truly confused. Thanks so much!
That's calling Ghostscript, obviously. From the arguments, it's not spawning a process; it's using the Ghostscript library directly (linked either dynamically or statically).
The args are Ghostscript arguments. These are documented in the Ghostscript documentation, which you can find online. Because the interface mimics the command line, where the first argument is the name of the calling program, the first argument here is meaningless and can be anything you want (as the comment says).
The next three arguments turn on SAFER (which prevents some potentially dangerous operations and is now the default anyway), NOPAUSE (so the entire input is processed without pausing between pages), and BATCH (so that on completion Ghostscript exits instead of returning to the interactive prompt).
Then it selects a device. In Ghostscript (due to the PostScript language) devices are what actually output stuff. In this case the device selected is the pdfwrite device, which outputs PDF.
Then there's OutputFile; you can probably guess that this is the name (and path) of the file where the output is to be written.
The next three arguments, -c .setpdfwrite -f, are frankly archaic and pointless. They were once recommended when using the pdfwrite device (and only the pdfwrite device), but they have no useful effect these days.
The very last argument is, of course, the input file.
Certainly you can use Ghostscript to render PDF files to PNG. You want to use one of the PNG devices; there are several, depending on what colour depth you want to support. Unless you have some more unusual requirement, just use png16m. If your input file contains more than one page, you'll want the OutputFile to include %d so that Ghostscript writes one file per page.
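For example, adapting the question's snippet to PNG output might look something like this (a sketch; the -r300 resolution flag and the output filename pattern are just illustrative choices):
import locale
import sys

import ghostscript

args = [
    "gs",                         # actual value doesn't matter
    "-dNOPAUSE", "-dBATCH", "-dSAFER",
    "-sDEVICE=png16m",            # 24-bit colour PNG output
    "-r300",                      # render at 300 dpi
    "-sOutputFile=page-%d.png",   # %d -> one file per page
    "-f", sys.argv[1],            # the input PDF
]

# arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]
ghostscript.Ghostscript(*args)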
More details on all of this can, of course, be found in the documentation.
I have been trying to use the twobitreader package (http://pythonhosted.org//twobitreader/) to extract DNA sequence information, but I have run into a problem. Whenever I use the twobitreader.twobit_reader() function, I only get printed output. What I would like to do is write the output into a new file.
This is the information on this module from http://pythonhosted.org//twobitreader/:
twobit_reader takes a twobit_file (of class TwoBitFile) and an “input_stream”, which can be any iterable (incl. file-like objects); writes output (FASTA format) using write (print if write=None); logs errors/warnings to stderr
Likely, my limited knowledge with python programming is impeding me from accomplishing this task.
For example, here is some code that I wrote:
def get_a(n):
    """get sequences from genome"""
    genome = twobitreader.TwoBitFile('hg19.2bit')
    bedfile = open(n+'.bed', 'r')
    o_f = open(n+'_FASTA.txt', 'w')
    twobitreader.twobit_reader(genome, bedfile)
    bedfile.close()
    o_f.close()
This ends up printing my sequences.
If I try to alter the twobitreader line to twobitreader.twobit_reader(genome, bedfile, o_f), in an attempt to write the data to the file o_f, I get the error 'file' object is not callable.
OP confirmed this worked:
twobitreader.twobit_reader(genome, bedfile, o_f.write)
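Putting that together with the question's function, a sketch of the fixed version might look like this:
import twobitreader

def get_a(n):
    """Get sequences from the genome and write them out in FASTA format."""
    genome = twobitreader.TwoBitFile('hg19.2bit')
    with open(n + '.bed') as bedfile, open(n + '_FASTA.txt', 'w') as o_f:
        # Pass the bound write method, not the file object itself.
        twobitreader.twobit_reader(genome, bedfile, o_f.write)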
I need to generate a tar file, but as a string in memory rather than as an actual file. What I have as input is a single filename and a string containing its associated contents. I'm looking for a Python library I can use, so I can avoid having to roll my own.
A little more work turned up these functions, but using a memory stream object seems a little... inelegant. And making it accept input from strings looks even more... inelegant. OTOH it works, I assume, as most of it is new to me. Anyone see any bugs in it?
Use tarfile in conjunction with cStringIO:
import cStringIO
import tarfile

c = cStringIO.StringIO()
t = tarfile.open(mode='w', fileobj=c)
# here: do your work on t, then close it so the archive is finalised...:
t.close()
s = c.getvalue()  # extract the bytestring you need
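To fill in the "do your work on t" part for the original question (one filename plus its contents as a string), something like this should work; the sketch below uses io.BytesIO, which plays the same role as cStringIO on Python 3, and the member name and contents are made up for illustration:
import io
import tarfile
import time

name = "hello.txt"               # the single filename you have
contents = b"Hello, world!\n"    # its associated contents

buf = io.BytesIO()
t = tarfile.open(mode='w', fileobj=buf)
info = tarfile.TarInfo(name=name)
info.size = len(contents)
info.mtime = time.time()
t.addfile(info, io.BytesIO(contents))   # add the in-memory "file"
t.close()                               # finalise the archive

archive_bytes = buf.getvalue()          # the whole tar file as a bytestring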
In Perl, the interpreter kind of stops when it encounters a line with
__END__
in it. This is often used to embed arbitrary data at the end of a Perl script. In this way the script can fetch and store data that it keeps 'in itself', which opens up some nice possibilities.
In my case I have a pickled object that I want to store somewhere. While I can use a file.pickle file just fine, I was looking for a more compact approach (to distribute the script more easily).
Is there a mechanism that allows for embedding arbitrary data inside a python script somehow?
With pickle you can also work directly on strings.
import pickle

s = pickle.dumps(obj)    # serialise the object to a (byte)string
obj = pickle.loads(s)    # rebuild the object from that string
If you combine that with """ (triple-quoted strings) you can easily store any pickled data in your file.
If the data is not particularly large (many K) I would just .encode('base64') it and include that in a triple-quoted string, with .decode('base64') to get back the binary data, and a pickle.loads() call around it.
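A minimal sketch of that approach, using the base64 module so it reads the same on Python 2 and 3 (the embedded object here is made up):
import base64
import pickle

obj = {'answer': 42}

# At build time: pickle and base64-encode, then paste the result into the script.
blob = base64.b64encode(pickle.dumps(obj))
print(blob)

# In the distributed script: the pasted text lives in a (triple-quoted) string.
EMBEDDED = blob   # in practice: b"""...pasted base64 text..."""
restored = pickle.loads(base64.b64decode(EMBEDDED))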
In Python, you can use """ (triple-quoted strings) to embed long runs of text data in your program.
In your case, however, don't waste time on this.
If you have an object you've pickled, you'd be much, much happier dumping that object as Python source and simply including the source.
The repr function, applied to most objects, will emit a Python source-code version of the object. If you implement __repr__ for all of your custom classes, you can trivially dump your structure as Python source.
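For example (a tiny illustration with a made-up structure; for plain literals, ast.literal_eval safely rebuilds data pasted in as source):
import ast

data = {'name': 'example', 'values': [1, 2, 3]}
source = repr(data)              # "{'name': 'example', 'values': [1, 2, 3]}"
# Paste `source` into your script, or rebuild it from the text directly:
restored = ast.literal_eval(source)
assert restored == data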
If, on the other hand, your pickled structure started out as Python code, just leave it as Python code.
I made this code. You run something like python comp.py foofile.tar.gz, and it creates decomp.py with foofile.tar.gz's contents embedded in it. I don't think this is really portable to Windows, though, because of the Popen calls.
import base64
import sys
import subprocess
inf = open(sys.argv[1],"r+b").read()
outs = base64.b64encode(inf)
decomppy = '''#!/usr/bin/python
import base64

def decomp(data):
    fname = "%s"
    outf = open(fname, "w+b")
    outf.write(base64.b64decode(data))
    outf.close()

# You can put the rest of your code here.
# Like this, to unzip an archive:
#import subprocess
#subprocess.Popen("tar xzf " + fname, shell=True)
#subprocess.Popen("rm " + fname, shell=True)
''' % (sys.argv[1])
taildata = '''uudata = """%s"""
decomp(uudata)
''' %(outs)
outpy = open("decomp.py","w+b")
outpy.write(decomppy)
outpy.write(taildata)
outpy.close()
subprocess.Popen("chmod +x decomp.py",shell=True)