I want to write a Python script which extracts a function's opcodes from an ELF binary, knowing its address (e.g. 0x437310) and size. How can I map this address to the corresponding offset in the binary file, so I can start reading from it?
Using a hex editor I can figure out that the function at 0x437310 starts at offset 0x37310 in the hexdump.
How can I calculate this in a generic way, since the image base of a binary is not always the same?
Any help will be appreciated.
Let's say I want to extract the instructions of maybe_make_export_env from bash.
The first thing you want to do is find this symbol in the symbol table:
$ readelf -s /bin/bash
Num: Value Size Type Bind Vis Ndx Name
[...]
216: 000000000043ed80 18 FUNC GLOBAL DEFAULT 14 maybe_make_export_env
[...]
This gives us the address of the function in memory (0x43ed80) and its length (18).
We have the address in memory (in the process image). We now want to find the relevant address in the file. In order to do that we need to look at the program header table:
$ readelf -l /bin/bash
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000f3ad4 0x00000000000f3ad4 R E 200000
LOAD 0x00000000000f3de0 0x00000000006f3de0 0x00000000006f3de0
0x0000000000008ea8 0x000000000000ea78 RW 200000
DYNAMIC 0x00000000000f3df8 0x00000000006f3df8 0x00000000006f3df8
0x0000000000000200 0x0000000000000200 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000d8ab0 0x00000000004d8ab0 0x00000000004d8ab0
0x0000000000004094 0x0000000000004094 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x00000000000f3de0 0x00000000006f3de0 0x00000000006f3de0
0x0000000000000220 0x0000000000000220 R 1
We want to find which PT_LOAD entry this address belongs to (based on VirtAddr and MemSiz). The first PT_LOAD entry ranges from 0x400000 to 0x400000 + 0xf3ad4 = 0x4f3ad4 (excluded), so the symbol belongs to this PT_LOAD entry.
We can find the location of the function in the file with: symbol_value - VirtAddr + Offset = 0x43ed80 - 0x400000 + 0x0 = 0x3ed80.
This is the relevant part of the file:
0003ed80: 8b05 3260 2b00 85c0 7406 e911 feff ff90 ..2`+...t.......
0003ed90: f3c3 0f1f 4000 662e 0f1f 8400 0000 0000 ....@.f.........
We indeed have the same bytes as the ones given by objdump -d /bin/bash:
000000000043ed80 <maybe_make_export_env@@Base>:
43ed80: 8b 05 32 60 2b 00 mov 0x2b6032(%rip),%eax # 6f4db8 <array_needs_making@@Base>
43ed86: 85 c0 test %eax,%eax
43ed88: 74 06 je 43ed90 <maybe_make_export_env@@Base+0x10>
43ed8a: e9 11 fe ff ff jmpq 43eba0 <bind_global_variable@@Base+0x60>
43ed8f: 90 nop
43ed90: f3 c3 repz retq
43ed92: 0f 1f 40 00 nopl 0x0(%rax)
43ed96: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
43ed9d: 00 00 00
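To do all of this generically from Python, both steps (symbol lookup and PT_LOAD mapping) can be automated. Here is a minimal sketch using the third-party pyelftools package (it falls back to .dynsym if the binary is stripped; error handling omitted):

from elftools.elf.elffile import ELFFile

def extract_function(path, func_name):
    with open(path, 'rb') as f:
        elf = ELFFile(f)
        # Look up the symbol to get its virtual address and size
        symtab = (elf.get_section_by_name('.symtab')
                  or elf.get_section_by_name('.dynsym'))
        sym = symtab.get_symbol_by_name(func_name)[0]
        vaddr, size = sym['st_value'], sym['st_size']
        # Find the PT_LOAD segment containing vaddr and map it to a file offset
        for seg in elf.iter_segments():
            if (seg['p_type'] == 'PT_LOAD'
                    and seg['p_vaddr'] <= vaddr < seg['p_vaddr'] + seg['p_memsz']):
                f.seek(vaddr - seg['p_vaddr'] + seg['p_offset'])
                return f.read(size)
        raise ValueError('address not mapped by any PT_LOAD segment')

print(extract_function('/bin/bash', 'maybe_make_export_env').hex())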
I am trying to read values from an inverter (for solar panels). The inverter uses Modbus (TCP/IP) with the SunSpec protocol.
I am using pyModbusTCP and I can connect to the inverter and get values of type int16 and uint16 but not string or acc64.
I am using this code to get the values:
from pyModbusTCP.client import ModbusClient
client = ModbusClient(host="192.168.40.10", port=502)
client.open()
client.read_holding_registers(40084)
Do I need to import something else to be able to read these values?
I am pretty new to python and it is the first time I have worked with pyModbusTCP.
PDF with some SunSpec info: https://sunspec.org/wp-content/uploads/2015/06/SunSpec-Information-Models-12041.pdf
In the official documentation of pyModbusTCP (https://pymodbustcp.readthedocs.io/en/latest/package/class_ModbusClient.html)
we can see the following description:
read_holding_registers(reg_addr, reg_nb=1)
Modbus function READ_HOLDING_REGISTERS (0x03)
Parameters:
reg_addr (int) – register address (0 to 65535)
reg_nb (int) – number of registers to read (1 to 125)
Returns:
registers list or None if fail
Return type:
list of int or None
The return type is a list, so we can do:
values = client.read_holding_registers(40084)
if values:
    for value in values:
        # you can convert int to string
        converted_value = str(value)
        # implement your own algorithm here
First of all, be sure register 40084 holds the value you're looking for. We're using a rev. 4 SunSpec protocol, and AC Lifetime active (real) energy output is stored in an acc64 register at address 40187.
Anyway, in order to read an acc64 value you must read 4 values of 16 bits each:
client.read_holding_registers(40084, 4)
You'll get back 4 values like:
0
0
869
55690
Each of these now needs to be converted to hex, concatenated together, and then converted back to an integer:
0 (dec) --> 00 00 (hex)
0 (dec) --> 00 00 (hex)
869 (dec) --> 03 65 (hex)
55690 (dec) --> D9 8A (hex)
The final hex value will be 00 00 00 00 03 65 D9 8A, which converted to decimal is 57,006,474.
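In Python the conversion is easier done with bit shifts than with hex strings. A minimal sketch, reusing the client from the question (the registers arrive most significant word first):

regs = client.read_holding_registers(40084, 4)
if regs:
    acc64 = 0
    for word in regs:
        acc64 = (acc64 << 16) | word  # append each 16-bit register
    print(acc64)  # [0, 0, 869, 55690] -> 57006474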
I have a 1024 * 1024 * 1024 array whose dtype is float32. First I save this array to a file in the '.bigfile' format, and then I convert this bigfile to a Fortran unformatted file by running the code below.
import numpy as np
import bigfile

with bigfile.File('filename.bigfile') as bf:
    shape = bf['Field'].attrs['ndarray.shape']
    data = bf['Field'][:].reshape(shape)
    np.asfortranarray(data).tofile('filename.dat')
Next, to test this binary file, i.e. 'filename.dat', I read it with Python and with Fortran 95 respectively. The Python code runs fine; the snippet is shown below.
field = np.fromfile('filename.dat',
dtype='float32', count=1024*1024*1024)
density_field = field.reshape(1024, 1024, 1024)
However, a Fortran runtime error occurred when I ran the Fortran reading code:
Program readout00
Implicit None
Integer, Parameter :: Ng = 1024
Real, Allocatable, Dimension(:,:,:) :: dens
Integer :: istat, ix, iy, iz
! -------------------------------------------------------------------------
! Allocate the arrays for the original simulation data
! -------------------------------------------------------------------------
Allocate(dens(0:Ng-1, 0:Ng-1, 0:Ng-1), STAT=istat)
If( istat/=0 ) Stop "Wrong Allocation-1"
! -------------------------------------------------------------------------
Open(10, file="filename.dat", status="old", form="unformatted")
Read(10) dens
Close(10)
Write(*,*) "read-in finished"
! -------------------------------------------------------------------------
Do ix = 0, 1
Do iy = 0, 1
Do iz = 0, 1
Write(*,*) "ix, iy, iz, rho=", ix, iy, iz, dens(ix, iy, iz)
EndDo
EndDo
EndDo
!--------------------------------------------------------------------------
End Program readout00
The error message:
At line 13 of file readout00.f90 (unit = 10, file = 'filename.dat')
Fortran runtime error: I/O past end of record on unformatted file
Error termination. Backtrace:
#0 0x7f7d8aff8e3a
#1 0x7f7d8aff9985
#2 0x7f7d8affa13c
#3 0x7f7d8b0c96e0
#4 0x7f7d8b0c59a6
#5 0x400d24
#6 0x400fe1
#7 0x7f7d8a4db730
#8 0x400a58
#9 0xffffffffffffffff
I don't understand why those errors appear.
Note: the overall operation is carried out on a remote Linux server.
After repeatedly modifying the read statement, I found that the Fortran code ran fine if ix<=632, iy<=632, iz<=632. If they are greater than 632, the runtime error appears. What should I do to correct this error so that dens can read in all 1024^3 elements?
Read(10) (((dens(ix, iy, iz), ix=0,632), iy=0,632), iz=0,632)
Supplementary:
Today I added the clause access='stream' to the open statement, and a read(10) header before the read(10) dens, i.e.
Integer :: header
......
Open(10, file="filename.dat", status="old", &
form="unformatted", access='stream')
Read(10) header
Read(10) dens
After this modification, the Fortran code 'readout00.f95' reads in the 1024 * 1024 * 1024 array dens successfully.
Why does the original 'readout00.f95' fail to read in dens?
@IanH has correctly answered your question in the comments, or more precisely pointed to the correct answer to a different question.
The 'unformatted' format just means that the file is not interpreted as human-readable, but the data in the file still needs to be laid out in a specific way. While the exact format is not standardized and is compiler- and system-dependent, usually each record has its own header and footer that gives the length of the data.
numpy.asfortranarray does not impact the file layout at all; it only ensures that the layout of the array in memory is Fortran order (column-major, i.e. first index changing most quickly), as opposed to the usual C order (row-major, last index changing most quickly).
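A quick way to convince yourself of this (a sketch with hypothetical file names; numpy's documentation for ndarray.tofile states that data is always written in C order regardless of the array's memory layout):

import numpy as np

a = np.arange(12, dtype=np.int16).reshape(3, 4)
a.tofile('c_order.dat')
np.asfortranarray(a).tofile('f_order.dat')
# The two files are byte-identical: asfortranarray only changed the
# in-memory layout, not what tofile writes.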
See this example:
I created the same data (type int16, values 0 through 11) in Python and in Fortran, and stored it in two files, the Python version with np.asfortranarray(data).tofile(...) and the Fortran version with an unformatted write. These are the results:
With Python:
0000000 00 00 01 00 02 00 03 00 04 00 05 00 06 00 07 00
0000010 08 00 09 00 0a 00 0b 00
With Fortran:
0000000 18 00 00 00 00 00 01 00 02 00 03 00 04 00 05 00
0000010 06 00 07 00 08 00 09 00 0a 00 0b 00 18 00 00 00
In the Python file, the data starts immediately (00 00 for 0, then 01 00 for 1, and so forth, up to 0b 00 for 11), but in Fortran there's a 4-byte header, 18 00 00 00, or 24, which is the number of bytes of data, and this value is then repeated at the end.
When you try to read a file with Fortran using form='unformatted', that is the kind of layout the program expects to find, but that's not the data you have.
The solution is exactly what you have done: use a stream. With stream access, the program expects the data to come in continuously, without any headers or metadata.
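For completeness, here is a sketch of how the record markers could be added from Python, assuming the 4-byte little-endian markers gfortran uses by default (note that gfortran splits records larger than 2 GiB into subrecords, so for the full 4 GiB array the stream approach you used remains the simpler option):

import struct
import numpy as np

data = np.arange(12, dtype=np.int16)
payload = data.tobytes()
marker = struct.pack('<i', len(payload))  # record length: 24 -> 18 00 00 00
with open('seq.dat', 'wb') as f:
    f.write(marker + payload + marker)    # header + data + footer, as in the Fortran dump above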
I am trying to convert a hex dump obtained from a Cisco router via the embedded packet capture feature to a pcap file.
My input format is as listed below:
0
0000: 70E42273 90D2003A 7D36A502 81000183 p."s...:}6......
0010: 080045C0 003BB1BF 40000106 8FA20A10 ..E..;..@.......
0020: 91BD0A10 91BEAC03 00B313C4 EE96E803 ................
0030: 1C875018 3D41832D 0000FFFF FFFFFFFF ..P.=A.-........
0040: FFFFFFFF FFFFFFFF FFFF0013 04 .............
1
0000: 003A7D36 A50270E4 227390D2 81000183 .:}6..p."s......
0010: 08004500 00281097 40000106 319E0A10 ..E..(..@...1...
0020: 91BE0A10 91BD00B3 AC03E803 1C8713C4 ................
0030: EEA95010 7B534936 0000 ..P.{SI6..
2
0000: 003A7D36 A50270E4 227390D2 81000183 .:}6..p."s......
0010: 08004500 003B1197 40000106 308B0A10 ..E..;..@...0...
0020: 91BE0A10 91BD00B3 AC03E803 1C8713C4 ................
0030: EEA95018 7B534508 0000FFFF FFFFFFFF ..P.{SE.........
0040: FFFFFFFF FFFFFFFF FFFF0013 04 .............
The above format is not accepted by text2pcap, as text2pcap is expecting:
0000: 70 E4 22 73 90 D2 00 3A 7D 36 A5 02 81 00 01 83
0010: 08 00 45 C0 00 3B B1 BF 40 00 01 06 8F A2 0A 10
Is there any converter tools or scripts available for the same?
As you know, text2pcap doesn't currently support this data format; however, I have opened a Wireshark bug report so that one day text2pcap may natively support reading data in such a format. Feel free to follow Wireshark Bug 16193 - text2pcap could be enhanced to accept input in other formats for any updates to this enhancement request.
In the meantime, you will either have to write your own script/command(s), find someone to write one for you, or use/modify an existing script/command in order to convert the data into a format readable by text2pcap. To help get you going, I'm providing you with one method that seems to work in my testing. Assuming your output is saved in a dump.in file, you can run the following:
cat dump.in | sed 's/\([0-9A-F]\{2\}\)/\1 /g' | sed 's/\([0-9A-F]\{2\}\) \([0-9A-F]\{2\}\) : /\1\2 /g' > dump.out
Both cat and sed should be available on most platforms. I actually ran this command on Windows 10 under Cygwin.
NOTE: I am no sed expert, but there are almost certainly sed experts out there who can probably figure out how to get this to work in 1 pass; I couldn't in the time I was willing to spend on this.
Using the command provided, I was able to convert the data to a format that text2pcap could read and then ran text2pcap -a dump.out dump.pcap to generate a valid pcap file. Running tshark -r dump.pcap generates the following output:
1 387 2019-11-12 21:49:23.000000 0.000000 0.000000 10.16.145.189 → 10.16.145.190 BGP 77 KEEPALIVE Message
2 387 2019-11-12 21:49:23.000001 0.000001 0.000001 10.16.145.190 → 10.16.145.189 TCP 58 bgp(179) → 44035 [ACK] Seq=1 Ack=20 Win=31571 Len=0
3 387 2019-11-12 21:49:23.000002 0.000002 0.000001 10.16.145.190 → 10.16.145.189 BGP 77 KEEPALIVE Message
I assume that's the correct and expected output.
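As an alternative to the sed pipeline, here is a rough Python sketch of the same conversion; it assumes the exact layout shown above and makes no attempt to be robust against ASCII columns that happen to look like hex:

import re
import sys

# Reads the Cisco dump on stdin, writes a text2pcap-friendly dump on stdout.
for line in sys.stdin:
    m = re.match(r'^([0-9A-Fa-f]{4}):\s+(.*)$', line)
    if not m:
        continue  # skip the bare packet-number lines
    offset, rest = m.groups()
    hexbytes = []
    for token in rest.split():
        if re.fullmatch(r'[0-9A-Fa-f]{2,8}', token) and len(token) % 2 == 0:
            hexbytes.extend(token[i:i + 2] for i in range(0, len(token), 2))
        else:
            break  # reached the ASCII column
    print(offset + '  ' + ' '.join(hexbytes))

Run it as python3 convert.py < dump.in > dump.out (convert.py being whatever you name the script), then feed dump.out to text2pcap as above.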
See also: How to convert hex dump from 4 hex digit groups to 2 hex digit groups
First of all, I know this might look like a duplicate of: ePassport reading with PN532, Keep Getting SW1 SW2 = 0x69 0x88 (Incorrect Secure Messaging Data Objects)
...but I'm one step further in the process.
I'm trying to read an MRTD (ePassport) using Python and a PN7120 NFC reader. I used pyPassport 2.0 as a basis.
I know the reader is OK because I can read passports with the same device using an Android setup.
I followed the ICAO 9303 Part 11 documentation and simulated the "worked example" (appendix D, same file).
The problem
When using a real passport, the "select applet", "get challenge", "do bac", and "select file" steps work fine, but "read binary" results in 69 88 (Incorrect Secure Messaging Data Objects).
When I simulate the "worked example" by injecting the KSmac/SSC, I get exactly the same ProtectedAPDU outcome as stated on page 75 (AppD-7), point g.
Also, the "select file" step uses almost exactly the same procedure (see def protect) and succeeds (rAPDU 90 00).
I've compared everything in extreme detail at least twice and really don't see where to look next. I hope someone can give some advice or insights.
The relevant log (error at the end)
Calculate Session Keys (KSenc and KSmac) using Appendix 5.1
KSenc: 3DE649F8AEA41C04FB6D4CD9043757AD
KSmac: 8C34AD61974F68CEBA3E0EAEA1456476
Calculate Send Sequence Counter
SSC: AB1D2F337FD997D6
Reading Common
Select File
APDU 00 A4 02 0C 02 [011E]
Mask class byte and pad command header
CmdHeader: 0CA4020C80000000
Pad data
Data: 011E800000000000
Encrypt data with KSenc 3DE649F8AEA41C04FB6D4CD9043757AD
EncryptedData: FF0E241E2F94B508
Build DO'87
DO87: 870901FF0E241E2F94B508
Concatenate CmdHeader and DO87
M: 0CA4020C80000000870901FF0E241E2F94B508
Compute MAC of M
Increment SSC with 1
SSC: AB1D2F337FD997D7
Concatenate SSC and M and add padding
N: AB1D2F337FD997D70CA4020C80000000870901FF0E241E2F94B5088000000000
Compute MAC over N with KSmac 8C34AD61974F68CEBA3E0EAEA1456476
CC: 22FF803EC3104336
Build DO'8E
DO8E: 8E0822FF803EC3104336
Construct and send protected APDU
ProtectedAPDU: 0CA4020C15870901FF0E241E2F94B5088E0822FF803EC310433600
[SM] - 0C A4 02 0C 15 [870901FF0E241E2F94B5088E0822FF803EC3104336] 00
[SM] - [990290008E08AAEA3B783FD6CA9D] 90 00
Receive response APDU of MRTD's chip
RAPDU: 990290008E08AAEA3B783FD6CA9DC29000
Read Binary
APDU 00 B0 00 00 [] 04
Mask class byte and pad command header
CmdHeader: 0CB0000080000000
Build DO'97
DO97: 970104
Concatenate CmdHeader and DO97
M: 0CB0000080000000970104
Compute MAC of M
Increment SSC with 1
SSC: AB1D2F337FD997D8
Concatenate SSC and M and add padding
N: AB1D2F337FD997D80CB00000800000009701048000000000
Compute MAC over N with KSmac 8C34AD61974F68CEBA3E0EAEA1456476
CC: 68DD9FD88472834A
Build DO'8E
DO8E: 8E0868DD9FD88472834A
Construct and send protected APDU
ProtectedAPDU: 0CB000000D9701048E0868DD9FD88472834A00
[SM] - 0C B0 00 00 0D [9701048E0868DD9FD88472834A] 00
[SM] - [] 69 88 //SM data objects incorrect
Thanks!!
Figured it out:
Due to a binary/hex/string conversion error (here), the SM validation step for the SELECT FILE response was skipped, and thus the SSC wasn't incremented correctly.
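For anyone else debugging a 69 88 here: the SSC must be incremented twice per exchange, once before MACing the command APDU and once before verifying the response MAC. A minimal sketch of that bookkeeping (initial value taken from the log above):

ssc = 0xAB1D2F337FD997D6  # established during BAC

def next_ssc():
    global ssc
    ssc = (ssc + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit counter
    return ssc.to_bytes(8, 'big')

cmd_ssc = next_ssc()   # prepended to M when MACing the command APDU
resp_ssc = next_ssc()  # used to verify the response MAC; skipping this
                       # step leaves the SSC one increment behind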
I have traced a memory leak in my program to a Python module I wrote in C to efficiently parse an array expressed in ASCII hex (e.g. "FF 39 00 FC ...").
char* buf;
unsigned short bytesPerTable;
if (!PyArg_ParseTuple(args, "sH", &buf, &bytesPerTable))
{
    return NULL;
}
unsigned short rowSize = bytesPerTable;
char* CArray = malloc(rowSize * sizeof(char));
// Populate CArray with data parsed from buf
ascii_buf_to_table(buf, bytesPerTable, rowSize, CArray);
npy_intp dims[1] = {rowSize};  /* note: dims must really be npy_intp, not a casted int */
PyObject* pythonArray = PyArray_SimpleNewFromData(1, dims, NPY_INT8, (void*)CArray);
return Py_BuildValue("(O)", pythonArray);
I realized that numpy does not know to free the memory allocated for CArray, thus causing a memory leak. After some research into this issue, at the suggestion of comments in this article, I added the following line, which is supposed to tell the array that it "owns" its data and to free it when the array is deleted.
PyArray_ENABLEFLAGS((PyArrayObject*)pythonArray, NPY_ARRAY_OWNDATA);
But I am still getting the memory leak. What am I doing wrong? How do I get the NPY_ARRAY_OWNDATA flag to work properly?
For reference, the documentation in ndarraytypes.h makes it seem like this should work:
/*
* If set, the array owns the data: it will be free'd when the array
* is deleted.
*
* This flag may be tested for in PyArray_FLAGS(arr).
*/
#define NPY_ARRAY_OWNDATA 0x0004
Also for reference, the following code (calling the Python function defined in C) demonstrates the memory leak.
tableData = "FF 39 00 FC FD 37 FF FF F9 38 FE FF F1 39 FE FC \n" \
"EF 38 FF FE 47 40 00 FB 3D 3B 00 FE 41 3D 00 FE \n" \
"43 3E 00 FF 42 3C FE 02 3C 40 FD 02 31 40 FE FF \n" \
"2E 3E FF FE 24 3D FF FE 15 3E 00 FC 0D 3C 01 FA \n" \
"02 3E 01 FE 01 3E 00 FF F7 3F FF FB F4 3F FF FB \n" \
"F1 3D FE 00 F4 3D FE 00 F9 3E FE FC FE 3E FD FE \n" \
"F6 3E FE 02 03 3E 00 FE 04 3E 00 FC 0B 3D 00 FD \n" \
"09 3A 00 01 03 3D 00 FD FB 3B FE FB FD 3E FD FF \n"
for i in xrange(1000000):
    PES = ParseTable(tableData, 128, 4)  # causes memory usage to skyrocket
It's probably a reference-count issue (from How to extend NumPy):
One common source of reference-count errors is the Py_BuildValue function. Pay careful attention to the difference between the ‘N’ format character and the ‘O’ format character. If you create a new object in your subroutine (such as an output array), and you are passing it back in a tuple of return values, then you should most likely use the ‘N’ format character in Py_BuildValue. The ‘O’ character will increase the reference count by one. This will leave the caller with two reference counts for a brand-new array. When the variable is deleted and the reference count decremented by one, there will still be that extra reference count, and the array will never be deallocated. You will have a reference-counting induced memory leak. Using the ‘N’ character will avoid this situation as it will return to the caller an object (inside the tuple) with a single reference count.
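Applied to the code above, that means the fix is the one-character change return Py_BuildValue("(N)", pythonArray); so the returned tuple steals the array's only reference; combined with the NPY_ARRAY_OWNDATA flag, the malloc'd buffer is then freed when the array is garbage collected.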