Create a tar file in a string using Python

I need to generate a tar file, but as a string in memory rather than as an actual file. What I have as input is a single filename and a string containing the associated contents. I'm looking for a Python lib I can use so I can avoid having to roll my own.
A little more work found these functions, but using a memory stream object seems a little... inelegant. And making it accept input from strings looks even more... inelegant. OTOH it works. I assume so, anyway, as most of it is new to me. Anyone see any bugs in it?

Use tarfile in conjunction with cStringIO:
import cStringIO
import tarfile
c = cStringIO.StringIO()
t = tarfile.open(mode='w', fileobj=c)
# here: do your work on t, then close it so the end-of-archive blocks get written:
t.close()
s = c.getvalue() # extract the bytestring you need
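Under Python 3 (where cStringIO no longer exists), the same idea is usually written with io.BytesIO. The sketch below covers the original case of one filename plus a string of contents: it wraps the encoded contents in a tarfile.TarInfo and hands both to TarFile.addfile(). The helper name make_tar_bytes and the UTF-8 encoding of the contents are assumptions for illustration.

```python
import io
import tarfile
import time

def make_tar_bytes(filename, contents, encoding="utf-8"):
    """Return an uncompressed tar archive, as bytes, holding one file.

    Hypothetical helper: `filename` is the member name and `contents` a str
    that is encoded with `encoding` (UTF-8 assumed) before being stored.
    """
    data = contents.encode(encoding)
    buf = io.BytesIO()
    # Leaving the with-block closes the TarFile, which writes the
    # end-of-archive blocks; reading buf earlier gives a truncated archive.
    with tarfile.open(mode="w", fileobj=buf) as tar:
        info = tarfile.TarInfo(name=filename)
        info.size = len(data)          # addfile() copies exactly this many bytes
        info.mtime = int(time.time())  # otherwise the member is dated 1970
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archive = make_tar_bytes("hello.txt", "Hello, tar!\n")

# Round-trip check: open the bytes again and read the member back.
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    print(tar.getnames())                                # ['hello.txt']
    print(tar.extractfile("hello.txt").read().decode())  # Hello, tar!
```

Using the TarFile as a context manager is the main design choice here: it guarantees close() runs before getvalue(), which is the step that is easy to forget with the bare open/getvalue pattern.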

Related

Unicode issues with tarfile.extractall() (Python 2.7)

I'm using Python 2.7.6 on Windows, and I'm using the tarfile module to extract a gzip file. The mode option of tarfile.open() is set to "r:gz". After the open call, if I print the contents of the archive via TarFile.list(), I see the following directory in the list:
./静态分析 Part 1.v1/
However, after I call tarfile.extractall(), I don't see the above directory in the extracted list of files, instead I see this:
é™æ€åˆ†æž Part 1.v1/
If I were to extract the archive via 7zip, I see a directory with the same name as the first item above. So, clearly, the extractall() method is screwing up, but I don't know how to fix this.
I learned that tar doesn't retain encoding information as part of the archive and treats filenames as raw byte sequences. So the output I saw from tarfile.extractall() was simply the raw byte sequence that made up the file's name prior to compression. To get the extractall() method to recreate the original filenames, I discovered that you have to manually decode the member names of the TarFile object with the appropriate encoding before calling extractall(). In my case, the following did the trick:
import tarfile
modeltar = tarfile.open(zippath, mode="r:gz")
updatedMembers = []
for m in modeltar.getmembers():
    # tar stores names as raw bytes; decode them as UTF-8 (Python 2's unicode())
    m.name = unicode(m.name, 'utf-8')
    updatedMembers.append(m)
modeltar.extractall(members=updatedMembers, path=dbpath)
The above code is based on this superuser answer: https://superuser.com/a/190786/354642
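In Python 3 the same decoding is usually handled by the encoding argument of tarfile.open() rather than by patching member names. The sketch below builds a small GNU-format archive in memory (standing in for the poster's .tar.gz; the member name is made up) whose names are stored as UTF-8 bytes, then reads it back once with the right encoding and once with a wrong one. PAX-format archives, the default for writing since Python 3.8, store non-ASCII names as UTF-8 in their own headers, so the problem typically shows up with older GNU/ustar archives.

```python
import io
import tarfile

NAME = "静态分析 Part 1.v1/readme.txt"   # hypothetical member name
DATA = b"example"

def build_gnu_archive():
    """Build an in-memory GNU-format tar whose member name is stored as UTF-8 bytes."""
    buf = io.BytesIO()
    with tarfile.open(mode="w", fileobj=buf, format=tarfile.GNU_FORMAT,
                      encoding="utf-8") as tar:
        info = tarfile.TarInfo(NAME)
        info.size = len(DATA)
        tar.addfile(info, io.BytesIO(DATA))
    return buf.getvalue()

def names_as(archive, encoding):
    """List member names, decoding the raw header bytes with `encoding`."""
    with tarfile.open(fileobj=io.BytesIO(archive), encoding=encoding) as tar:
        return tar.getnames()

archive = build_gnu_archive()
print(names_as(archive, "utf-8"))    # ['静态分析 Part 1.v1/readme.txt']
print(names_as(archive, "latin-1"))  # mojibake: same bytes, wrong decoding
```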

How do I edit an executable with python by address/offset/bytes, like in a hex editor?

I've used Hex-Rays IDA to find the bytes of code I need to change in a Windows executable. I would like to write a Python script that will programmatically edit those bytes.
I know the address (as given in Hex-Rays IDA) and I know the hex value I want to overwrite it with. How do I do this in Python? I'm sure there is a simple answer, but I can't find it.
(For example: address = 0x00436411, and new hexadecimal = 0xFA)
You just need to open the executable as a file, for writing, in binary mode; then seek to the position you want to write; then write. So:
with open(path, 'r+b') as f:
    f.seek(position)    # a file offset, not a virtual address
    f.write(new_bytes)  # a byte string, e.g. '\xFA'
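A runnable Python 3 sketch of the seek-and-write approach, wrapped in a hypothetical helper patch_bytes and tried on a scratch file rather than a real executable. Note that in Python 3 the value must be a bytes object; writing the int 0xFA directly would raise a TypeError.

```python
import os
import tempfile

def patch_bytes(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes of the file at `path`, starting at `offset`.

    Hypothetical helper: `offset` is a position in the file (not a virtual
    address), and `new_bytes` must be a bytes object, e.g. bytes([0xFA]).
    """
    with open(path, "r+b") as f:   # read/write, binary, no truncation
        f.seek(offset)
        f.write(new_bytes)

# Demo on a throwaway file rather than a real executable.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x90" * 8)

patch_bytes(path, 3, bytes([0xFA]))   # the 0xFA from the question, as one byte
with open(path, "rb") as f:
    patched = f.read()
os.remove(path)
print(patched.hex())                  # 909090fa90909090
```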
If you're going to be changing a lot of bytes, you may find it simpler to use mmap, which lets you treat the file as a giant list:
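import contextlib
import mmap
with open(path, 'r+b') as f:
    # a length of 0 maps the whole file
    with contextlib.closing(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)) as m:
        m[first_position] = first_new_byte  # an int 0-255 in Python 3, a 1-char str in Python 2
        m[other_position] = other_new_byte
        # ...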
If you're trying to write multi-byte values (e.g., a 32-bit int), you probably want to use the struct module.
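For example, struct.pack_into() can write a packed value straight into a writable mmap. This is a Python 3 sketch: the helper name patch_u32_le, the value 0xDEADBEEF, and the little-endian layout (what x86 PE files use) are assumptions for illustration.

```python
import mmap
import os
import struct
import tempfile

def patch_u32_le(path, offset, value):
    """Write `value` as a 32-bit little-endian unsigned int at file `offset`.

    Hypothetical helper: offset + 4 must lie inside the file, because an
    mmap cannot grow the file it maps.
    """
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; Python 3 mmap objects are context managers.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            struct.pack_into("<I", m, offset, value)

# Demo on an 8-byte scratch file of zeros.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(8))

patch_u32_le(path, 2, 0xDEADBEEF)
with open(path, "rb") as f:
    patched = f.read()
os.remove(path)
print(patched.hex())   # 0000efbeadde0000
```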
If what you know is an address in memory at runtime, rather than a file position, you have to be able to map that to the right place in the executable file. That may not even be possible (e.g., a memory-mapped region). But if it is, you should be able to find out from the debugger where it's mapped. From inside a debugger, this is easy; from outside, you need to parse the PE header structures and do a lot of complicated logic, and there is no reason to do that.
I believe that when you use Hex-Rays IDA as a static disassembler with all the default settings, the addresses it gives you are the addresses where the code and data segments will be mapped into memory if they aren't forced to relocate. Those are, obviously, not offsets into the file.
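To turn such an address into a file offset, the usual arithmetic is: subtract the image base to get an RVA, find the section whose virtual range contains that RVA, then rebase into that section's raw data. Below is a sketch of just that arithmetic. The image base and section numbers are hypothetical placeholders (0x400000 is a common default image base for 32-bit executables), and the real values must come from the file's PE section headers.

```python
from collections import namedtuple

# The subset of a PE section header needed for address translation.
Section = namedtuple("Section", "name virtual_address virtual_size raw_pointer raw_size")

def va_to_file_offset(va, image_base, sections):
    """Translate a virtual address into an offset within the executable file.

    Returns None if the address is not backed by bytes in the file
    (e.g. it falls between sections or in uninitialised, zero-filled data).
    """
    rva = va - image_base
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + max(s.virtual_size, s.raw_size):
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                return None        # inside the section, but not stored in the file
            return s.raw_pointer + delta
    return None

# Hypothetical values; read the real ones from the PE headers.
IMAGE_BASE = 0x400000
SECTIONS = [
    Section(".text", 0x1000, 0x40000, 0x400, 0x40000),
    Section(".data", 0x42000, 0x3000, 0x40400, 0x1000),
]
print(hex(va_to_file_offset(0x00436411, IMAGE_BASE, SECTIONS)))  # 0x35811
```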

Output python into python-readable format

We're using a python based application which reads a configuration file containing a couple of arrays:
Example layout of config file:
array1 = [
'bob',
'sue',
'jayne'
]
Currently changes to the configuration are done by hand, but I've written a little interface to streamline the process (mainly to avoid errors).
It currently reads in the existing configuration using a simple "import". However, what I'm not sure how to do is get my script to write its output in valid Python, so that the main application can read it again.
How can I dump the array back into the file, but in valid Python?
Cheers!
I'd suggest JSON or YAML (less verbose than JSON) for configuration files. That way, the configuration file becomes more readable for the less pythonate ;) It's also easier to raise meaningful errors, e.g. if the configuration is incomplete.
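If you switch to JSON, the standard json module handles both directions. A minimal Python 3 sketch; the helper names, the file name config.json, and the required key array1 are assumptions that mirror the example above.

```python
import json
import os
import tempfile

def save_config(path, config):
    """Write the config dict as indented, human-editable JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)

def load_config(path):
    """Read the config back, failing with a clear message if a key is missing."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    if "array1" not in config:
        raise KeyError("config is missing required key 'array1'")
    return config

path = os.path.join(tempfile.mkdtemp(), "config.json")
save_config(path, {"array1": ["bob", "sue", "jayne"]})
print(load_config(path)["array1"])   # ['bob', 'sue', 'jayne']
```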
To save python objects you can always use pickle.
Generally, repr() creates a string that can be re-evaluated, but pprint produces slightly nicer output.
from pprint import pprint
outf.write("array1 = "); pprint(array1, outf)
Writing repr(array1) into the file is a very simple solution, and it should work here.
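Combining the repr/pprint suggestions: pprint.pformat() returns the text pprint() would print, so the interface can build the whole file as a string, and ast.literal_eval() can check that the result reads back correctly without executing arbitrary code. A Python 3 sketch; render_config is a hypothetical helper name.

```python
import ast
from pprint import pformat

def render_config(**arrays):
    """Return Python source assigning each keyword argument to a variable."""
    lines = ["%s = %s" % (name, pformat(value)) for name, value in arrays.items()]
    return "\n".join(lines) + "\n"

source = render_config(array1=["bob", "sue", "jayne"])
print(source)   # array1 = ['bob', 'sue', 'jayne']

# Sanity check: the output parses as Python and evaluates to the same list.
value = ast.literal_eval(ast.parse(source).body[0].value)
print(value == ["bob", "sue", "jayne"])   # True
```

Longer lists are wrapped across lines by pformat(), which is still valid Python inside the brackets.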

Python parsing xml directly from web address

Hey. I tried to find a way, but I can't. I have set up an xml.sax parser in Python and it works perfectly when I read a local file (for example calendar.xml), but I need to read an XML file from a web address.
I figured it would work if i do this:
toursxml='http://api.songkick.com/api/3.0/artists/mbid:'+mbid+'/calendar.xml?apikey=---------'
toursurl=urllib2.urlopen(toursxml)
toursurl=toursurl.read()
parser.parse(toursurl)
But it doesn't. I'm sure there's an easy way, but I can't find it.
So yeah, I can easily go to the URL, download the file and open it by doing
parser.parse("calendar.xml")
As a workaround I've set it up to download the XML, write it to a local file, close the file, and then parse it. But as you can guess, it's slow as hell.
Is there any way to read the XML directly? Also note that the URL does not end in ".xml", so that may be a problem later.
First, your example is mixed up. Please don't reuse variables.
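toursurl = urllib2.urlopen(toursxml)
toursurl_string = toursurl.read()
xml.sax.parseString(toursurl_string, parser.getContentHandler())
This reads the entire response into a string named toursurl_string.
To parse a string, use the module-level xml.sax.parseString(string, handler) function. The reader returned by xml.sax.make_parser() has no parseString() method, so pass in the content handler you have already registered on it.
http://docs.python.org/library/xml.sax.html#xml.sax.parseString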
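A self-contained Python 3 sketch of the string route. The handler class TagCounter and the sample XML are made up for illustration (they are not the Songkick schema), and the byte string stands in for what urlopen(...).read() would return.

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Count how many times each element name appears."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

# Stand-in for the bytes returned by urlopen(...).read().
xml_bytes = b"<calendar><event id='1'/><event id='2'/></calendar>"

handler = TagCounter()
xml.sax.parseString(xml_bytes, handler)
print(handler.counts)   # {'calendar': 1, 'event': 2}
```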
If you want to combine reading and parsing, you have to pass the "stream" or filename to parse.
toursurl= urllib2.urlopen(toursxml)
parser.parse(toursurl)
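The stream route, again as a Python 3 sketch: urllib2.urlopen() became urllib.request.urlopen() in Python 3, and its response object can be passed to xml.sax.parse() directly. To keep the example runnable offline, io.BytesIO plays the part of the response; EventCollector and the <event id=...> layout are hypothetical. Because the parser reads from the response object itself, a URL that does not end in ".xml" makes no difference.

```python
import io
import xml.sax

class EventCollector(xml.sax.ContentHandler):
    """Collect the id attribute of every <event> element (hypothetical schema)."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "event":
            self.ids.append(attrs.get("id"))

# io.BytesIO stands in for the file-like object urlopen() returns; with a
# real response you would write:
#   with urllib.request.urlopen(url) as response:
#       xml.sax.parse(response, handler)
response = io.BytesIO(b"<calendar><event id='1'/><event id='2'/></calendar>")

handler = EventCollector()
xml.sax.parse(response, handler)   # reads from the stream while parsing
print(handler.ids)   # ['1', '2']
```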
parser.parse(xyz)
expects xyz to be a filename, a URL, or a file-like object; you are looking for
xml.sax.parseString(xyz, handler)
which expects xyz to be a string containing XML.

Embed pickle (or arbitrary) data in python script

In Perl, the interpreter kind of stops when it encounters a line with
__END__
in it. This is often used to embed arbitrary data at the end of a Perl script. That way the Perl script can fetch and store data that it keeps 'in itself', which opens up some nice possibilities.
In my case I have a pickled object that I want to store somewhere. While I can use a file.pickle file just fine, I was looking for a more compact approach (to distribute the script more easily).
Is there a mechanism that allows for embedding arbitrary data inside a python script somehow?
With pickle you can also work directly on strings.
s = pickle.dumps(obj)
pickle.loads(s)
If you combine that with """ (triple-quoted strings) you can easily store any pickled data in your file.
If the data is not particularly large (say, up to many KB), I would just .encode('base64') it (Python 2 only; use base64.b64encode() in Python 3) and include that in a triple-quoted string, with .decode('base64') to get back the binary data, and a pickle.loads() call around it.
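A Python 3 sketch of the same approach with the base64 module; the sample object is hypothetical. It uses encodebytes() because that wraps the output at 76 characters per line, which keeps a large embedded string readable, and b64decode() discards the newlines by default. As with any pickle, only load data you trust.

```python
import base64
import pickle

# Generating side: run once and paste the printed text into your script.
obj = {"name": "example", "values": [1, 2, 3]}   # hypothetical payload
encoded = base64.encodebytes(pickle.dumps(obj)).decode("ascii")
print(encoded)

# Consuming side: the pasted text sits in a triple-quoted string. It is
# filled in with % here only so that this sketch runs on its own.
EMBEDDED = """
%s""" % encoded
restored = pickle.loads(base64.b64decode(EMBEDDED))
print(restored == obj)   # True
```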
In Python, you can use """ (triple-quoted strings) to embed long runs of text data in your program.
In your case, however, don't waste time on this.
If you have an object you've pickled, you'd be much, much happier dumping that object as Python source and simply including the source.
The repr function, applied to most objects, will emit a Python source-code version of the object. If you implement __repr__ for all of your custom classes, you can trivially dump your structure as Python source.
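A sketch of the __repr__ approach. The class Point is a made-up example, and the generated text is run with exec() here only to show that it reproduces the structure; in practice you would write it to a .py file and import it.

```python
class Point:
    """Small example class whose repr is valid Python source."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # %r applies repr() to each field, so nested values emit their own source form.
        return "Point(%r, %r)" % (self.x, self.y)

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

data = {"origin": Point(0, 0), "path": [Point(1, 2), Point(3, 4)]}
source = "data = %r\n" % (data,)
print(source)   # data = {'origin': Point(0, 0), 'path': [Point(1, 2), Point(3, 4)]}

# Executing the generated source (as importing it would) rebuilds the structure.
namespace = {"Point": Point}
exec(source, namespace)
print(namespace["data"] == data)   # True
```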
If, on the other hand, your pickled structure started out as Python code, just leave it as Python code.
I made this code. You run something like python comp.py foofile.tar.gz, and it creates decomp.py with foofile.tar.gz's contents embedded in it. I don't think this is really portable to Windows, though, because of the Popen calls.
import base64
import sys
import subprocess
inf = open(sys.argv[1], "rb").read()  # read-only access is enough here
outs = base64.b64encode(inf)
decomppy = '''#!/usr/bin/python
import base64
def decomp(data):
    fname = "%s"
    outf = open(fname,"w+b")
    outf.write(base64.b64decode(data))
    outf.close()
    # You can put the rest of your code here.
    #Like this, to unzip an archive
    #import subprocess
    #subprocess.Popen("tar xzf " + fname, shell=True)
    #subprocess.Popen("rm " + fname, shell=True)
''' %(sys.argv[1])
taildata = '''uudata = """%s"""
decomp(uudata)
''' %(outs)
outpy = open("decomp.py","w+b")
outpy.write(decomppy)
outpy.write(taildata)
outpy.close()
subprocess.Popen("chmod +x decomp.py",shell=True)
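A Python 3 variant of the same idea, as a sketch rather than a drop-in replacement: it sets the execute bits with os.chmod instead of shelling out to chmod, wraps the base64 text across lines, and stores only the base name of the input. The function name make_extractor is illustrative, and writing the recreated file next to the generated script (instead of into the current directory, as the original does) is a deliberate change.

```python
import base64
import os
import runpy
import stat
import tempfile

# Source of the generated script; {fname} and {data} are filled in below.
TEMPLATE = '''#!/usr/bin/env python3
import base64
import os

FNAME = {fname!r}
DATA = """
{data}"""

def decomp(data):
    # Recreate the embedded file in the same directory as this script.
    target = os.path.join(os.path.dirname(os.path.abspath(__file__)), FNAME)
    with open(target, "wb") as outf:
        outf.write(base64.b64decode(data))

if __name__ == "__main__":
    decomp(DATA)
'''

def make_extractor(src_path, out_path="decomp.py"):
    """Write a script to out_path that recreates src_path's bytes when run."""
    with open(src_path, "rb") as inf:
        data = base64.encodebytes(inf.read()).decode("ascii")
    script = TEMPLATE.format(fname=os.path.basename(src_path), data=data)
    with open(out_path, "w", encoding="utf-8") as outpy:
        outpy.write(script)
    # Set the execute bits without a subprocess (Windows ignores these bits).
    mode = os.stat(out_path).st_mode
    os.chmod(out_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Demo in a scratch directory: embed a file, delete it, run the generated script.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "foo.bin")
with open(src, "wb") as f:
    f.write(b"\x00\x01 payload \xff")
make_extractor(src, os.path.join(workdir, "decomp.py"))
os.remove(src)
runpy.run_path(os.path.join(workdir, "decomp.py"), run_name="__main__")
with open(src, "rb") as f:
    restored = f.read()
print(restored)   # b'\x00\x01 payload \xff'
```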
