Grabbing program header information with pyelftools

Grabbing program header information with pyelftools - python

I am simply trying to grab the program header information with pyelftools (the offset, virtual address, and physical address).
This can be done from the terminal by running:
readelf -l <elf_file>
But I am having trouble getting the same information from pyelftools. From the examples, I have pieced together this:
elffile = ELFFile(stream)
section_header = elffile.structs.Elf_Shdr.parse_stream(stream)
print (section_header)
Note: Elf_Shdr is the program header file.
This will print the offset, virtual address, physical address, etc. But not in the hexadecimal format I want, or like how readelf prints it. Is there a way to get it to print out the hex format like readelf?

Some strange things in your post happen.
You say:
I am simply trying to grab the program header information with
pyelftools (the offset, virtual address, and physical address).
Note: Elf_Shdr is the program header file.
But Elf_Shdr is not Program header, it is Section header. Look at elftools/elf/structs.py
Elf_Shdr:
Section header
Then in your code you parse file twice by some reason. First string is enough to parse it, you can access all header data from elffile object:
for segment in elffile.iter_segments():
header = segment.header
for key, value in header.items():
if isinstance(value, int):
print(key, hex(value))
Here I iterate over all segments (they are described by Program headers) of ELF file, then iterate over all attributes in header and print them as hex if it is integer. No magic here, header is just Container for standard dictionary.
Also you may be interested in readelf implemented with pyelftools - here.

Related

Error (Hung process?) when using COPY INTO with ANSI file

I'm trying to load a set of public flat files (using COPY INTO from Python) - that apparently are saved in ANSI format. Some of the files load with no issue, but there is at least one case where the COPY INTO statement hangs (no error is returned, & nothing is logged, as far as I can tell). I isolated the error to a particular row with a non-standard character e.g., the ¢ character in the 2nd row -
O}10}49771}2020}02}202002}4977110}141077}71052900}R }}N}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}08}CWI STATE A}CENTENNIAL RESOURCE PROD, LLC}PHANTOM (WOLFCAMP)
O}10}50367}2020}01}202001}5036710}027348}73933500}R }}N}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}08}A¢C 34-197}APC WATER HOLDINGS 1, LLC}QUITO, WEST (DELAWARE)
Re-saving these rows into a file with UTF-8 encoding solves the issue, but I thought I'd pass this along in case someone wants to take a look at the back-end to handle these types of characters and/or return some kind of error.

Why do you save into a file?
If it is possible, just play with strings internally from Python with:
resultstr= bytestr.encode("utf-8")

zapier - python - pass bytes to output to be used for next action

I am trying to make an automation on Zapier with the flow like this:
Trigger: a web hook that receive POST request. The body is a file key with a value of base64 string of a certain PDF, so the type is str
Action: a Zapier Python Code that retrieve the file from web hooks, decode the base64 string to bytes to get the real valid content of the PDF to say a variable named file_bytes
Action: a dropbox that retrieve the file_bytes from the step before, and upload it to dropbox
I coded the decoder myself (point 2) and tested that it worked well on my local system.
The problem is that Dropbox (point 3) only receive binary, while Python (point 2) can not pass a value other than JSON serializable. This is a clear limitation from Zapier:
output A dictionary or list of dictionaries that will be the "return value" of this code. You can explicitly return early if you like. This must be JSON serializable!
...
The close to what I can get from other question on this sites are these two, but it did not give me any luck.
Why am I getting a Runtime.MarshalError when using this code in Zapier?
Use Python to get image in Zapier
...
The code to decode base64 string to bytes is like so:
file_bytes = base64.b64decode(input_data['file'])
What I already did:
pass the file_bytes to output like so:
output = [{'file': input_data['file_bytes']}]}]
but it gave me This must be JSON serializable!
pass the file_bytes as string like so:
output = [{'file': str(input_data['file_bytes'])}]
it do uploaded to dropbox, but the file content is corrupt. (of course it is, duh)
pass the file_bytes as decoded string with latin-1 encoding:
output = [{'file': input_data['file_bytes'].decode('latin-1')}]
it do uploaded to dropbox, the PDF can also be opened, even having the same page number as the original PDF, but it is all blank (white, no content)
...
So, is this kind of feature really visible in Zapier platform? Or I was already at dead end even since the beginning?

How do I edit an executable with python by address/offset/bytes, like in a hex editor?

I've used Hex-rays IDA to find the bytes of code I need changed in a windows executable. I would like to write a python script that will programmatically edit those bytes.
I know the address (as given in hex-rays IDA) and I know the hexadecimal I wish to overwrite it with. How do I do this in python? I'm sure there is a simple answer, but I can't find it.
(For example: address = 0x00436411, and new hexadecimal = 0xFA)

You just need to open the executable as a file, for writing, in binary mode; then seek to the position you want to write; then write. So:
with open(path, 'r+b') as f:
f.seek(position)
f.write(new_bytes)
If you're going to be changing a lot of bytes, you may find it simpler to use mmap, which lets you treat the file as a giant list:
with open(path, 'r+b') as f:
with contextlib.closing(mmap.mmap(f.fileno(), access=mmap.ACCESS_WRITE)) as m:
m[first_position] = first_new_byte
m[other_position] = other_new_byte
# ...
If you're trying to write multi-byte values (e.g., a 32-bit int), you probably want to use the struct module.
If what you know is an address in memory at runtime, rather than a file position, you have to be able to map that to the right place in the executable file. That may not even be possible (e.g., a memory-mapped region). But if it is, you should be able to find out from the debugger where it's mapped. From inside a debugger, this is easy; from outside, you need to parse the PE header structures and do a lot of complicated logic, and there is no reason to do that.
I believe when using hex-ray IDA as a static disassembler, with all the default settings, the addresses it gives you are the addresses where the code and data segments will be mapped into memory if they aren't forced to relocate. Those are, obviously, not offsets into the file.

Writing raw IP data to an interface (linux)

I have a file which contains raw IP packets in binary form. The data in the file contains a full IP header, TCP\UDP header, and data. I would like to use any language (preferably python) to read this file and dump the data onto the line.
In Linux I know you can write to some devices directly (echo "DATA" > /dev/device_handle). Would using python to do an open on /dev/eth1 achieve the same effect (i.e. could I do echo "DATA" > /dev/eth1)

Something like:
#!/usr/bin/env python
import socket
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
s.bind(("ethX", 0))
blocksize = 100;
with open('filename.txt') as fh:
while True:
block = fh.read(blocksize)
if block == "": break #EOF
s.send(block)
Should work, haven't tested it however.
ethX needs to be changed to your interface (e.g. eth1, eth2, wlan1, etc.)
You may want to play around with blocksize. 100 bytes at a time should be fine, you may consider going up but I'd stay below the 1500 byte Ethernet PDU.
It's possible you'll need root/sudoer permissions for this. I've needed them before when reading from a raw socket, never tried simply writing to one.
This is provided that you literally have the packet (and only the packet) dumped to file. Not in any sort of encoding (e.g. hex) either. If a byte is 0x30 it should be '0' in your text file, not "0x30", "30" or anything like that. If this is not the case you'll need to replace the while loop with some processing, but the send is still the same.
Since I just read that you're trying to send IP packets -- In this case, it's also likely that you need to build the entire packet at once, and then push that to the socket. The simple while loop won't be sufficient.

No; there is no /dev/eth1 device node -- network devices are in a different namespace from character/block devices like terminals and hard drives. You must create an AF_PACKET socket to send raw IP packets.

How to separate content from a file that is a container for binary and other forms of content

I am trying to parse some .txt files. These files serve as containers for a variable number of 'children' files that are set off or identified within the container with SGML tags. With python I can easily separate the children files. However I am having trouble writing the binary content back out as a binary file (say a gif or jpg). In the simplest case the container might have an embedded html file followed by a graphic that is called by the html. I am assuming that my problem is because I am reading the original .txt file using open(filename,'r'). But that seems the only option to find the sgml tags to split the file.
I would appreciate any help to identify some relevant reading material.
I appreciate the suggestions but I am still struggling with the most basic questions. For example when I open the file with wordpad and scroll down to the section tagged as a gif I see this:
<FILENAME>h65803h6580301.gif
<DESCRIPTION>GRAPHIC
<TEXT>
begin 644 h65803h6580301.gif
M1TE&.#EA(P)I`=4#`("`#,#`P$!`0+^_OW]_?_#P\*"#H.##X-#0T&!#8!`0
M$+"PL"`#('!P<)"0D#`P,%!04#\_/^_O[Y^?GZ^OK]_?WX^/C\_/SV]O;U]?
I can handle finding the section easily enough but where does the gif file begin. Does the header start with 644, the blanks after the word begin or the line beginning with MITE?
Next, when the file is read into python does it do anything to the binary code that has to be undone when it is read back out?
I can find the lines where the graphics begin:
filerefbin=file('myfile.txt','rb')
wholeFile=filerefbin.read()
import re
graphicReg=re.compile('<DESCRIPTION>GRAPHIC')
locationGraphics=graphicReg.finditer(wholeFile)
graphicsTags=[]
for match in locationGraphics:
graphicsTags.append(match.span())
I can easily use the same process to get to the word begin, or to identify the filename and get to the end of the filename in the 'first' line. I have also successefully gotten to the end of the embedded gif file. But I can't seem to write out the correct combination of things so when I double click on h65803h6580301.gif when it has been isolated and saved I get to see the graphic.
Interestingly, when I open the file in rb, the line endings appear to still be present even though they don't seem to have any effect in notebpad. So that is clearly one of my problems I might need to readlines and join the lines together after stripping out the \n
I love this site and I love PYTHON
This was too easy once I read bendin's post. I just had to snip the section that began with the word begin and save that in a txt file and then run the following command:
import uu
uu.decode(r'c:\test2.txt',r'c:\test.gif')
I have to work with some other stuff for the rest of the day but I will post more here as I look at this more closely. The first thing I need to discover is how to use something other than a file, that is since I read the whole .txt file into memory and clipped out the section that has the image I need to work with the clipped section instead of writing it out to test2.txt. I am sure that can be done its just figuring out how to do it.

What you're looking at isn't "binary", it's uuencoded. Python's standard library includes the module uu, to handle uuencoded data.
The module uu requires the use of temporary files for encoding and decoding. You can accomplish this without resorting to temporary files by using Python's codecs module like this:
import codecs
data = "Let's just pretend that this is binary data, ok?"
uuencode = codecs.getencoder("uu")
data_uu, n = uuencode(data)
uudecode = codecs.getdecoder("uu")
decoded, m = uudecode(data_uu)
print """* The initial input:
%(data)s
* Encoding these %(n)d bytes produces:
%(data_uu)s
* When we decode these %(m)d bytes, we get the original data back:
%(decoded)s""" % globals()

You definitely need to be reading in binary mode if the content includes JPEG images.
As well, Python includes an SGML parser, http://docs.python.org/library/sgmllib.html .
There is no example there, but all you need to do is setup do_ methods to handle the sgml tags you wish.

You need to open(filename,'rb') to open the file in binary mode. Be aware that this will cause python to give You confusing, two-byte line endings on some operating systems.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Grabbing program header information with pyelftools - python

Related

Error (Hung process?) when using COPY INTO with ANSI file

zapier - python - pass bytes to output to be used for next action

How do I edit an executable with python by address/offset/bytes, like in a hex editor?

Writing raw IP data to an interface (linux)

How to separate content from a file that is a container for binary and other forms of content

Categories

Resources