How to parse values like \x00, \x95x95? - python

I have a string like:
\x00\x00\x01\x90\x27 and so on.
How can I parse this value into a simple long value?
For example, what is the long value of \x00? 0 or 00?

I am assuming that you think your data is character data. So, using Python 3:
a = b'\x00\x00\x01\x90'
b = a.decode('utf-8')
If your binary data has some 'nasty', i.e. invalid, characters this will fail (the lone \x90 byte in this example is one of them), but adding an option like errors='ignore' or errors='replace' will help you in handling those issues.

Your question is not very clear, so I am making some assumptions here. If you need fixed-length conversions, you can use the standard struct module.
For example:
In [10]: from struct import unpack
In [11]: unpack('>i', b'\x00\x01\x90\x27')
Out[11]: (102439,)
If you need a variable-length string, then something like this will work (for unsigned values; watch the endianness):
s = '\x00\x00\x01\x90\x27'
i = 0
for c in s:
    i <<= 8      # make room for the next byte
    i |= ord(c)  # add this byte's value
print(i)
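In Python 3 (3.2+), the loop above can be replaced by int.from_bytes, which performs the same unsigned, big-endian conversion for input of any length. A minimal sketch, assuming the data is already a bytes object; it also answers the \x00 question directly, since a single zero byte is simply the number 0:

```python
# Variable-length bytes -> int, most significant byte first.
data = b'\x00\x00\x01\x90\x27'
value = int.from_bytes(data, byteorder='big')
print(value)  # 102439

# A lone \x00 byte is just the integer 0 (there is no separate "00").
zero = int.from_bytes(b'\x00', byteorder='big')
print(zero)  # 0
```

Pass byteorder='little' for little-endian data, or signed=True if the bytes hold a two's-complement value.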

Related

How to convert hexadecimal string to character with that code point?

I have the string x = '0x32' and would like to turn it into y = '\x32'.
Note that len(x) == 4 and len(y) == 1.
I've tried to use z = x.replace("0", "\\"), but that causes z = '\\x32' and len(z) == 4. How can I achieve this?
You do not have to make it that hard: you can use int(x, 16) to parse a hex string of the form 0x.... Then simply use chr() to convert that number into the character with that Unicode code point (for codes below 128, that is also the ASCII character):
y = chr(int(x,16))
This results in:
>>> chr(int(x,16))
'2'
But \x32 is equal to '2' (you can look it up in the ASCII table):
>>> chr(int(x,16)) == '\x32'
True
and:
>>> len(chr(int(x,16)))
1
Try this (Python 2 only; Python 3 strings have no decode method):
z = x[2:].decode('hex')
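Python 3 strings no longer support decode('hex'). A rough equivalent goes through bytes.fromhex; this sketch assumes the digits after 0x describe whole bytes (an even number of hex digits) and decodes them as Latin-1 so that each byte maps to the character with the same code point:

```python
x = '0x32'
# Strip the '0x' prefix, turn the hex digits into raw bytes, then into text.
raw = bytes.fromhex(x[2:])    # b'2'
y = raw.decode('latin-1')     # '2'
print(repr(y), len(y))        # '2' 1
```

For a single code point, chr(int(x, 16)) is simpler and does not need an even digit count.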
The ability to include code points like '\x32' inside a quoted string is a convenience for the programmer that only works in literal values inside the source code. Once you're manipulating strings in memory, that option is no longer available to you, but there are other ways of getting a character into a string based on its code point value.
Also note that '\x32' results in exactly the same string as '2'; it's just typed out differently.
Given a string containing a hexadecimal literal, you can convert it to its numeric value with int(str,16). Once you have a numeric value, you can convert it to the character with that code point via chr(). So putting it all together:
x = '0x32'
print(chr(int(x,16)))
#=> 2

Convert Hexadecimal string to long python

I want to get the value of 99997 in big endian which is (2642804992) and then return the answer as a long value
here is my code in python:
v = 99997
ttm = pack('>i', v) # change the integer to big endian form
print ("%x"), {ttm}
r = long(ttm, 16) # convert to long (ERROR)
return r
Output: %x set(['\x00\x01\x86\x9d'])
Error: invalid literal for long() with base 16: '\x00\x01\x86\x9d'
As the string is already in hex form why isn't it converting to a long? How would I remove this error and what is the solution to this problem.
pack returns a string (bytes) holding the binary representation of the data you provide.
That binary representation is different from the base-16 text of a long number. Notice the \x before each byte.
Edit:
Try this:
from struct import pack, unpack
ttm = pack('>I', v)
final, = unpack('<I', ttm)
print(final)
Notice the use of I, so that the number is treated as an unsigned value.
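In Python 3 the same byte swap can also be written without struct, using int.to_bytes and int.from_bytes. A sketch, assuming a 4-byte unsigned value as in the question:

```python
v = 99997
big = v.to_bytes(4, byteorder='big')               # b'\x00\x01\x86\x9d'
swapped = int.from_bytes(big, byteorder='little')  # read the same bytes reversed
print(swapped)  # 2642804992
```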
You have to use struct.unpack as the reverse operation of struct.pack.
r, = unpack('<i', ttm)
This will set r to -1652162304.
You just converted the integer value to big-endian binary bytes.
This is useful mostly for embedding in messages addressed to big-endian machines (PowerPC, M68K, ...).
Converting to long like this would mean parsing ttm as ASCII text, which would only work if it contained the characters 0x1869D.
(And the print statement does not work either, BTW.)
If I just follow your question title, "Convert hexadecimal string to long",
just use long("0x1869D", 16). No need to serialize it.
(BTW, long only exists in Python 2. In Python 3, use int, since all integers there have the arbitrary precision of the old long type.)
Well, I'm answering to explain why it's bound to fail, but I'll edit my answer when I really know what you want to do.
This is a nice question.
Here is what you are looking for.
s = str(ttm)
for ch in r"\bx'":
    s = s.replace(ch, '')
print(int(s, 16))
The problem is that ttm is similar to a string in some aspects. This is what it looks like: b'\x00\x01\x86\x9d'. Passing its text form to int (or long) includes all the non-hex characters. I removed them, and then it worked.
After removing the non-hex-digit characters, you are left with 0001869d, which is indeed 99997.
Comment: I tried it on Python 3. On Python 2 it is almost the same: there is no b prefix, but str(ttm) returns the raw bytes rather than their escaped form, so use repr(ttm) there instead.
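Stripping characters from the repr is fragile: removing b also deletes the hex digit b (so a byte such as \xbb would be damaged), and bytes in the printable ASCII range appear in the repr as characters rather than \x escapes. On Python 3.5+, bytes.hex() produces the hex digits directly; a sketch reusing the question's value:

```python
import struct

ttm = struct.pack('>i', 99997)   # b'\x00\x01\x86\x9d'
digits = ttm.hex()               # '0001869d', two hex digits per byte
print(int(digits, 16))           # 99997
```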

Finding the length of first half of a string using string slicing in Python?

I'm working on an assignment in PyCharm, and have been tasked with the following problem:
The len() function is used to count how many characters a string contains. Get the first half of the string stored in the variable 'phrase'.
Note: Remember about type conversion.
Here's my code so far that it's given me:
phrase = """
It is a really long string
triple-quoted strings are used
to define multi-line strings
"""
first_half = len(phrase)
print(first_half)
I have no idea what to do. I need to use string slicing to find the first half of the string "phrase". Any help appreciated. I apologize for my ignorance.
Just slice the first half of the string, and be sure to use // in case the string has an odd length, like:
print(phrase[:len(phrase) // 2])  # notice the whitespace in your triple-quoted literal
Try something like:
first_half = len(phrase)
print(phrase[0:first_half // 2])
Using // (floor division) keeps the index an integer, which matters in Python 3 and for strings of odd length. See this question for more on slicing.
first_half = phrase[:len(phrase)//2] or phrase[:int(len(phrase)/2)]
Note: Remember about type conversion.
In Python 2, dividing two ints yields an int; in Python 3, / always yields a float, so use integer (floor) division instead: half = len(phrase) // 2
Below is a Python 2 version
>>> half = len(phrase) / 2
>>> phrase[:half]
'\nIt is a really long string\ntriple-quoted st'
No need for the 0 in phrase[0:half], phrase[:half] looks better :)
Try this: print(string[:int(len(string)/2)])
In Python 3, len(string)/2 returns a float, which cannot be used as a slice index, so that's why I used int().
Use slicing and bit shifting (which may be slightly faster should you have to do this many times):
>>> s = "This is a string with an arbitrary length"
>>> half = len(s) >> 1
>>> s[:half]
'This is a string wit'
>>> s[half:]
'h an arbitrary length'
Try this:
phrase = """
It is a really long string
triple-quoted strings are used
to define multi-line strings
"""
first_half = phrase[0: len(phrase) // 2]
print(first_half)
You can simply slice a string using its indexes.
For example:
def first_half(s):
    return s[:len(s) // 2]
The above function first_half accepts a string and returns its first half using slicing.
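A small Python 3 sketch that returns both halves; the helper name split_in_half is hypothetical. Floor division keeps the index an integer, so for odd lengths the extra character ends up in the second half:

```python
def split_in_half(text):
    """Return (first_half, second_half) of text using slicing."""
    mid = len(text) // 2   # integer index, even for odd lengths
    return text[:mid], text[mid:]

first, second = split_in_half("abcde")
print(first, second)  # ab cde
```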

Python 2,3 Convert Integer to "bytes" Cleanly

The shortest ways I have found are:
n = 5
# Python 2.
s = str(n)
i = int(s)
# Python 3.
s = bytes(str(n), "ascii")
i = int(s)
I am particularly concerned with two factors: readability and portability. The second method, for Python 3, is ugly. However, I think it may be backwards compatible.
Is there a shorter, cleaner way that I have missed? I currently make a lambda expression to fix it with a new function, but maybe that's unnecessary.
Answer 1:
To convert a string to a sequence of bytes in either Python 2 or Python 3, you use the string's encode method. If you don't supply an encoding parameter, the default encoding is used ('ascii' in Python 2, 'utf-8' in Python 3); either is good enough for numeric digits.
s = str(n).encode()
Python 2: http://ideone.com/Y05zVY
Python 3: http://ideone.com/XqFyOj
In Python 2 str(n) already produces bytes; the encode will do a double conversion as this string is implicitly converted to Unicode and back again to bytes. It's unnecessary work, but it's harmless and is completely compatible with Python 3.
Answer 2:
Above is the answer to the question that was actually asked, which was to produce a string of ASCII bytes in human-readable form. But since people keep coming here trying to get the answer to a different question, I'll answer that question too. If you want to convert 10 to b'10' use the answer above, but if you want to convert 10 to b'\x0a\x00\x00\x00' then keep reading.
The struct module was specifically provided for converting between various types and their binary representation as a sequence of bytes. The conversion from a type to bytes is done with struct.pack. There's a format parameter fmt that determines which conversion it should perform. For a 4-byte integer, that would be i for signed numbers or I for unsigned numbers. For more possibilities see the format character table, and see the byte order, size, and alignment table for options when the output is more than a single byte.
import struct
s = struct.pack('<i', 5) # b'\x05\x00\x00\x00'
You can use the struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
This works the same on both Python 2 and Python 3 (Python 3 displays the results as bytes, e.g. b'\x00\x00\x00\x01').
Note: the inverse operation (bytes to int) can be done with unpack.
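For completeness, a minimal round trip with struct.unpack, which always returns a tuple (hence the parentheses or trailing comma when unpacking a single value):

```python
import struct

packed = struct.pack('>I', 1)          # b'\x00\x00\x00\x01'
(value,) = struct.unpack('>I', packed)
print(value)  # 1
```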
I have found the only reliable, portable method to be
bytes(bytearray([n]))
Just bytes([n]) does not work in Python 2, where bytes is an alias of str and returns '[5]'. Taking the scenic route through bytearray seems like the only reasonable solution.
Converting an int to a byte in Python 3:
>>> n = 5
>>> bytes([n])
b'\x05'
;) I guess that'll be better than messing around with strings.
source: http://docs.python.org/3/library/stdtypes.html#binaryseq
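Note that bytes([n]) only accepts values in range(256); anything larger needs int.to_bytes or struct. A quick check in Python 3:

```python
small = bytes([5])        # b'\x05'

try:
    bytes([300])
    too_big_ok = True
except ValueError:        # 300 does not fit in a single byte
    too_big_ok = False

print(small, too_big_ok)
```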
In Python 3.x, you can convert an integer value (including large ones, which the other answers don't allow for) into a series of bytes like this:
import math
x = 0x1234
number_of_bytes = int(math.ceil(x.bit_length() / 8))
x_bytes = x.to_bytes(number_of_bytes, byteorder='big')
x_int = int.from_bytes(x_bytes, byteorder='big')
x == x_int
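The byte count can also be computed with integer arithmetic, avoiding the float division and math.ceil. The helper name int_to_min_bytes is hypothetical; this sketch assumes a non-negative integer and returns at least one byte, so that 0 becomes b'\x00' instead of b'':

```python
def int_to_min_bytes(x):
    """Big-endian bytes of a non-negative int, using as few bytes as possible."""
    length = max(1, (x.bit_length() + 7) // 8)  # round bits up to whole bytes
    return x.to_bytes(length, byteorder='big')

print(int_to_min_bytes(0x1234).hex())  # 1234
```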
From int to bytes:
bytes_string = int_v.to_bytes(length, endian)
where length is 1/2/3/4..., and endian can be 'big' or 'little'.
From bytes to int:
int_v = int.from_bytes(bytes_string, endian)
or, for a list with one int per byte:
data_list = list(bytes_string)
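Negative integers need signed=True on both sides; without it, to_bytes raises OverflowError. A short sketch with a 2-byte, big-endian value:

```python
raw = (-2).to_bytes(2, byteorder='big', signed=True)     # two's complement: b'\xff\xfe'
back = int.from_bytes(raw, byteorder='big', signed=True)  # -2 again
byte_values = list(raw)                                   # one int per byte
print(raw, back, byte_values)
```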
When converting old Python 2 code, you often have "%s" % number; for Python 3 (3.5+) this can be converted to b"%d" % number (b"%s" % number does not work).
The format b"%d" % number is also another clean way to convert an int to a bytes string.
b"%d" % number
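A quick Python 3.5+ comparison (bytes %-formatting was added in 3.5) showing that b"%d" matches the str(n).encode() approach, while b"%s" rejects a plain int:

```python
number = 10
via_format = b"%d" % number
via_encode = str(number).encode()
print(via_format == via_encode)  # True

try:
    b"%s" % number
    s_works = True
except TypeError:   # %s in bytes formatting needs a bytes-like object
    s_works = False
```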

python struct unpack

I'm trying to convert the following perl code:
unpack(.., "Z*")
to Python; however, the lack of a "*" format modifier in struct.unpack() seems to make this impossible. Is there a way I can do this in Python?
P.S. The "*" modifier in perl from the perldoc - Supplying a * for the repeat count instead of a number means to use however many items are left, ...
So although python has a numeric repeat count like perl, it seems to lack a * repeat count.
Python's struct.unpack doesn't have the Z format:
Z  A null-terminated (ASCIZ) string, will be null padded.
I think this
unpack(.., "Z*")
would be:
data.split(b'\x00')
although that strips the nulls.
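If the intent is to read just the NUL-terminated string at the start of the data and drop the terminator (assuming that is what the Perl "Z*" call is used for here), bytes.partition is a closer sketch than a full split; this assumes the data is a Python 3 bytes object:

```python
data = b'hello\x00world\x00'
# Everything before the first NUL byte; the NUL itself is dropped.
first, _, rest = data.partition(b'\x00')
print(first)  # b'hello'

# split() returns every NUL-separated piece, including a trailing empty one.
pieces = data.split(b'\x00')
```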
I am assuming that you create the struct datatype and know the size of the struct. If that is the case, you can allocate a buffer for that struct and then pack the value into the buffer. When unpacking, you can use the same buffer and unpack directly by just specifying the starting offset.
For example:
import ctypes
import struct
s = struct.Struct('I')
b = ctypes.create_string_buffer(s.size)
s.pack_into(b, 0, 42)
s.unpack_from(b, 0)
You must calculate the repeat count yourself:
import struct
n = len(s) // struct.calcsize(your_fmt_string)
f = '%d%s' % (n, your_fmt_string)
data = struct.unpack(f, s)
I am assuming your_fmt_string doesn't unpack more than one element (and has no byte-order prefix such as '<'), and that len(s) is evenly divisible by that element's packed size.
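On Python 3.4+, struct.iter_unpack covers the "repeat until the buffer runs out" case without building a format string by hand. Like the approach above, it assumes the buffer length is a multiple of the format's size (it raises struct.error otherwise). A sketch:

```python
import struct

data = struct.pack('<3I', 1, 2, 3)   # twelve bytes of little-endian uint32
# iter_unpack yields one tuple per repetition of the format.
values = [v for (v,) in struct.iter_unpack('<I', data)]
print(values)  # [1, 2, 3]
```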
