How to add a character before a string? - python

I am new and I'm trying to insert a character before a string.
If I have a string like so:
'wB0JSYuEUshUkgpKi8TRTwv/EABgBAQADAQAAAAAAA'
I want to add b before the string, but not as part of the string, like so:
b'wB0JSYuEUshUkgpKi8TRTwv/EABgBAQADAQAAAAAAA'
Here's what I tried:
test = 'b' + words[1]
test
but this obviously returns the b within the string, which is not what I want.

That b is not part of the string. It's special syntax in Python 3.x that marks a bytes literal. If you want to convert a "normal" string (str) into a bytes object, encode it:
st = 'abc'
bl = st.encode()
bl
=> b'abc'
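The sketch below makes the difference concrete. It uses the sample value from the question in place of words[1], and compares plain concatenation with encoding:

```python
s = 'wB0JSYuEUshUkgpKi8TRTwv/EABgBAQADAQAAAAAAA'

# Concatenation only prepends the character 'b'; the result is still a str.
concatenated = 'b' + s
print(type(concatenated), concatenated[:5])   # <class 'str'> bwB0J

# Encoding produces a bytes object; only its repr shows the b prefix.
as_bytes = s.encode()          # UTF-8 by default
print(type(as_bytes), repr(as_bytes)[:6])     # <class 'bytes'> b'wB0J

# Decoding reverses the conversion.
assert as_bytes.decode() == s
```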

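I'm not exactly sure what you mean, but assuming words is a list of strings and words[1] is 'wB0JSYuEUshUkgpKi8TRTwv/EABgBAQADAQAAAAAAA', you could use print(f'b {words[1]}'). Note that this only prints a b and a space in front of the text; the value is still a str, not bytes.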

There is a bit of confusion here. In Python, "" is a string (str) and b"" is a byte string (bytes). These are completely different types. They can be converted into one another, but they are not the same thing, so you can't "add b to a string". Essentially, a byte string b"" holds the encoded bytes that represent some text, while a str holds the text itself. For example:
x = 'STRING' # The string itself.
y = x.encode() # The bytes for the string. ASCII bytes are displayed as ASCII characters.
a = 'MyName®' # The string itself.
b = a.encode() # The bytes for the string. In UTF-8, the last character takes two non-ASCII bytes.
c = b.decode() # Convert the bytes back to a string.
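Printing the values from that example shows the difference. This is a short sketch that assumes Python 3 and the default UTF-8 encoding:

```python
x = 'STRING'
y = x.encode()                 # ASCII characters map to single bytes
print(y)                       # b'STRING'

a = 'MyName®'
b = a.encode()                 # UTF-8: '®' (U+00AE) becomes two bytes
print(b)                       # b'MyName\xc2\xae'
print(len(a), len(b))          # 7 8

c = b.decode()                 # back to the original str
print(c == a)                  # True
```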

Related

Why does a string converted from an array behave differently from another initialized with the same value?

The goal of the program is to convert the little_endian string into another string equal to clean_data_little_endian, and then convert that with struct.unpack. However, clean_data_little_endian behaves differently from strBinary_Values, which is built by joining a list of characters.
While debugging, clean_data_little_endian is à1ÿÏÿÊÿÄ and strBinary_Values is \xE0\x31\xFF\xCF\xFF\xCA\xFF\xC4, and if I print them I get:
clean_data_little_endian: b'\xe01\xff\xcf\xff\xca\xff\xc4' <class 'str'>
strBinary_Values: b'\\xE0\\x31\\xFF\\xCF\\xFF\\xCA\\xFF\\xC4' <class 'str'>
(strBinary_Values has two backslashes instead of one.)
There must be some difference between them that I don't know how to remove, because struct.unpack works only with clean_data_little_endian and not with strBinary_Values.
The error returned is:
unpack requires a buffer of 8 bytes
and if I change the buffer, the number of bytes required doubles, and so on.
Here's the code I used, even though I don't think it's necessary to read it.
import struct

little_endian = '#800000100?xE0??x31??xFF??xCF??xFF??xCA??xFF??xC4?'
clean_data_little_endian = '\xE0\x31\xFF\xCF\xFF\xCA\xFF\xC4'
#from raw string to clean string
listBinary_Values = []  # collects the characters kept from each group
listValuesToClean = list(little_endian[10:len(little_endian)])
for i in range(0,len(listValuesToClean)-1):
    mod = i % 5
    if ((mod == 2) or (mod == 3) or (mod == 1)):
        listBinary_Values.append(listValuesToClean[i])
    if (mod == 0):
        # '\\' is one literal backslash character, not an escape sequence
        listBinary_Values.append('\\')
strBinary_Values=''.join(listBinary_Values)
print('expected: ',clean_data_little_endian.encode('raw_unicode_escape'),type(clean_data_little_endian), '\n' 'real: ', strBinary_Values.encode('raw_unicode_escape'),type(strBinary_Values))
#from clean string to initial values
iqty_of_values = len(strBinary_Values)/8
h = "H" * int(iqty_of_values)
#correct result:
ivalues = struct.unpack("<"+h,clean_data_little_endian.encode('raw_unicode_escape'))
#wrong result (raises struct.error: unpack requires a buffer of 8 bytes):
ivalues = struct.unpack("<"+h,strBinary_Values.encode('raw_unicode_escape'))
The double backslashes indicate a literal backslash character: strBinary_Values contains the four characters \, x, E, 0 for each byte instead of the single character \xE0, so encoding it doesn't produce the byte values you want. This fixes it. latin1 maps Unicode code points 0-255 one-to-one onto byte values, which is required because the unicode_escape codec decodes bytes, turning the literal escape codes into Unicode code points. Encoding to latin1 again then turns the string back into the bytes required by unpack:
ivalues = struct.unpack("<"+h,strBinary_Values.encode('latin1').decode('unicode_escape').encode('latin1'))
print(ivalues)
# (12768, 53247, 51967, 50431)
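To see why that chain works, here is a minimal self-contained sketch. The names escaped, real, fixed and data are introduced only for illustration. It compares a string holding literal backslashes with one holding the real characters, then applies the same latin1/unicode_escape round trip:

```python
import struct

# Four characters per byte: a backslash, 'x', and two hex digits.
escaped = '\\xE0\\x31\\xFF\\xCF\\xFF\\xCA\\xFF\\xC4'
# One character per byte: here the parser interprets the escapes.
real = '\xE0\x31\xFF\xCF\xFF\xCA\xFF\xC4'
print(len(escaped), len(real))      # 32 8

# latin1 maps code points 0-255 to the same byte values, so it is a
# lossless bridge between these str objects and bytes.
fixed = escaped.encode('latin1').decode('unicode_escape')
print(fixed == real)                # True

data = fixed.encode('latin1')
print(struct.unpack('<4H', data))   # (12768, 53247, 51967, 50431)
```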
From the looks of it, a regular expression to capture the hexadecimal bytes and a direct conversion using bytes.fromhex would be more straightforward:
import re
import struct
little_endian = '#800000100?xE0??x31??xFF??xCF??xFF??xCA??xFF??xC4?'
s = ''.join(re.findall(r'x([0-9A-F]{2})',little_endian))
print(s)
b = bytes.fromhex(s)
print(b)
data = struct.unpack(f'<{len(b)//2}H',b)
print(data)
Output:
E031FFCFFFCAFFC4
b'\xe01\xff\xcf\xff\xca\xff\xc4'
(12768, 53247, 51967, 50431)
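To check what the '<...H' format does, the same numbers can be computed by hand: each H is an unsigned 16-bit value read from two bytes, least-significant byte first. The sketch below uses int.from_bytes on 2-byte slices. This is an alternative to struct shown only for comparison, not what the answer uses:

```python
import struct

b = bytes.fromhex('E031FFCFFFCAFFC4')

# struct: '<' = little-endian, 'H' = unsigned 16-bit integer
via_struct = struct.unpack(f'<{len(b)//2}H', b)

# Same thing by hand: two bytes at a time, low byte first.
via_ints = tuple(int.from_bytes(b[i:i + 2], 'little') for i in range(0, len(b), 2))

print(via_struct == via_ints)   # True
print(hex(via_ints[0]))         # 0x31e0  (bytes e0 31, swapped)
```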

Python incorrectly converts between bytes and hex for me

I have an info_address that I want to convert to delimited hex
info_address_original = b'002dd748'
What I want is
info_address_coded = b'\x00\x2d\xd7\x48'
I tried this solution
info_address_original = b'002dd748'
info_address_intermediary = info_address_original.decode("utf-8") # '002dd748'
info_address_coded = bytes.fromhex( info_address_intermediary ) # b'\x00-\xd7H'
and I get
info_address_coded = b'\x00-\xd7H'
How would one go about correctly turning a byte string like that into delimited hex? It worked implicitly in Python 2, but it doesn't work the way I want in Python 3.
This is only a representation of the bytes. '-' is the same as '\x2d'.
>>> b'\x00\x2d\xd7\x48' == b'\x00-\xd7H'
True
The default representation of a byte string displays the character itself for every printable ASCII byte and, for every other byte, an escape such as \xhh, where hh is the hexadecimal value of the byte.
That means that b'\x00\x2d\xd7\x48' and b'\x00-\xd7H' are exactly the same byte string, containing 4 bytes.
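If the goal is to see every byte as hex regardless of whether it is printable, format the bytes explicitly instead of relying on the repr. This is a sketch; note that the sep argument to bytes.hex() requires Python 3.8 or newer:

```python
info_address_coded = bytes.fromhex('002dd748')

print(info_address_coded)              # b'\x00-\xd7H'  (repr shows printable ASCII as characters)
print(info_address_coded.hex())        # 002dd748
print(info_address_coded.hex(' '))     # 00 2d d7 48  (Python 3.8+)

# Spell out every byte as \xhh, which is how the question wanted to see it.
spelled = ''.join(f'\\x{byte:02x}' for byte in info_address_coded)
print(spelled)                         # \x00\x2d\xd7\x48
```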

Converting a hex values to ASCII

What is the most 'Pythonic' way of translating
'\xff\xab\x12'
into
'ffab12'
I looked for functions that can do it, but they all want to translate to ASCII (so '\x41' to 'A'). I want the hexadecimal digits themselves as ASCII text.
There's a module called binascii that contains functions for just this:
>>> import binascii
>>> binascii.hexlify('\xff\xab\x12')
'ffab12'
>>> binascii.unhexlify('ffab12')
'\xff\xab\x12'
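The session above is Python 2, where '\xff\xab\x12' is already a byte string. In Python 3, binascii.hexlify needs bytes and returns bytes, while bytes.hex() (3.5+) returns a str directly. Here is a sketch of the Python 3 equivalents:

```python
import binascii

raw = b'\xff\xab\x12'                  # Python 3: use a bytes literal

print(binascii.hexlify(raw))           # b'ffab12'  (bytes)
print(raw.hex())                       # ffab12     (str, Python 3.5+)
print(binascii.unhexlify('ffab12'))    # b'\xff\xab\x12'
print(bytes.fromhex('ffab12') == raw)  # True
```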
original = '\xff\xab\x12'
result = original.replace('\\x', '')
print result
It's \x because it's escaped. a.replace(b, c) just replaces all occurrences of b with c in a.
What you want is not ASCII, because ASCII translates 0x41 to 'A'. You just want it in hexadecimal without the \x (or 0x, in some cases).
Edit:
Sorry, I thought the \x was a literal, escaped backslash followed by x. In fact, \x followed by two hex digits represents a single character, not four, so there is nothing for replace() to remove. For example,
print "\x41"
Will print
A
So what we have to do is to convert each char to hex, then print it like that:
res = ""
for i in original:
    res += hex(ord(i))[2:].zfill(2)
print res
Now let's go over this line:
hex(ord(i))[2:]
ord(c) - returns the numerical value of the char c
hex(i) - returns the hex string of the int i (e.g. if i=65 it returns '0x41').
[2:] - cuts the "0x" prefix off the hex string.
.zfill(2) - pads with zeros to two digits (hex(10) is '0xa', which would otherwise give just 'a').
Written as a list comprehension, this is much shorter:
result = "".join([hex(ord(c))[2:].zfill(2) for c in original])
print result
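The same per-character idea also works in Python 3. This sketch assumes original is a str whose characters are all below U+0100, so each character stands for one byte. Here format(..., '02x') replaces the hex()/[2:]/zfill chain:

```python
original = '\xff\xab\x12'    # a str of three characters in Python 3

# '02x' = lowercase hex, at least two digits, zero-padded, no '0x' prefix.
result = ''.join(format(ord(c), '02x') for c in original)
print(result)                 # ffab12

# Starting from bytes instead: iterating over bytes yields ints directly.
from_bytes = ''.join(format(byte, '02x') for byte in b'\xff\xab\x12')
print(from_bytes)             # ffab12
```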

How to convert byte string with non-printable chars to hexadecimal in python? [duplicate]

This question already has answers here:
What's the correct way to convert bytes to a hex string in Python 3?
(9 answers)
Closed 7 years ago.
I have an ANSI string Ď–ór˙rXüď\ő‡íQl7 and I need to convert it to hexadecimal like this:
06cf96f30a7258fcef5cf587ed51156c37 (converted with XVI32).
The problem is that Python cannot encode all the characters correctly (some of them are displayed incorrectly even here on Stack Overflow), so I have to work with a byte string.
As bytes, the string above is: b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
And that's what I need to convert to hexadecimal.
So far I've tried binascii with no success. I've also tried this:
h = ""
for i in b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7':
    h += hex(i)
print(h)
It prints:
0x60xcf0x960xf30xa0x720x830xff0x720x580xfc0xef0x5c0xf50x870xed0x510x150x6c0x37
Okay. It looks like I'm getting somewhere... but what's up with the 0x thing?
When I remove 0x from the string like this:
h.replace("0x", "")
I get 6cf96f3a7283ff7258fcef5cf587ed51156c37 which looks like it's correct.
But sometimes the byte string has a 0 next to an x, and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning).
Any ideas?
If you're running Python 3.5+, the bytes type has a bytes.hex() method that returns the hexadecimal string representation:
>>> h = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h
b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h.hex()
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
Otherwise, you can use binascii.hexlify() to do the same thing:
>>> import binascii
>>> binascii.hexlify(h).decode('utf8')
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
As per the documentation, hex() converts “an integer number to a lowercase hexadecimal string prefixed with ‘0x’.” So when using hex() you always get a 0x prefix. You will always have to remove that if you want to concatenate multiple hex representations.
But sometimes the byte string has a 0 next to an x, and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning).
That does not make any sense. x is not a valid hexadecimal character, so in your solution it can only be generated by the hex() call. And that, as said above, will always create a 0x. So the sequence 0x can never appear in a different way in your resulting string, so replacing 0x by nothing should work just fine.
The actual problem in your solution is that hex() does not enforce a two-digit result, as simply shown by this example:
>>> hex(10)
'0xa'
>>> hex(2)
'0x2'
So in your case, since the byte string starts with \x06, which represents the number 6, hex(6) returns only 0x6. You get a single digit for that byte, which is the real cause of your problem.
What you can do is use format strings to perform the conversion to hexadecimal. That way you can both leave out the prefix and enforce a length of two digits. You can then use str.join to combine it all into a single hexadecimal string:
>>> value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> ''.join(['{:02x}'.format(x) for x in value])
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
This solution works not only with a bytes string but with anything whose items can be formatted as hexadecimal (e.g. a list of integers):
>>> value = [1, 2, 3, 4]
>>> ''.join(['{:02x}'.format(x) for x in value])
'01020304'
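A small sketch that contrasts the question's approach with the padded format, using just the first two bytes, shows exactly where the leading zero goes missing:

```python
value = b'\x06\xcf'

# hex() does not pad: 6 -> '0x6', so stripping '0x' leaves one digit.
unpadded = ''.join(hex(x) for x in value).replace('0x', '')
print(unpadded)                 # 6cf

# '02x' always emits two digits per byte.
padded = ''.join(f'{x:02x}' for x in value)
print(padded)                   # 06cf

print(padded == value.hex())    # True
```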

Python store non numeric string as number

I am currently trying to find a way to convert any sort of text to a number, so that it can later be converted back to text.
So something like this:
text = "some string"
number = somefunction(text)
text = someotherfunction(number)
print(text) #output "some string"
If you're using Python 3, it's pretty easy. First, convert the str to bytes in a chosen encoding (utf-8 is usually appropriate), then use int.from_bytes to convert to an int:
number = int.from_bytes(mystring.encode('utf-8'), 'little')
Converting back is slightly trickier (and will lose trailing NUL bytes unless you've stored how long the resulting string should be somewhere else; if you switch to 'big' endianness, you lose leading NUL bytes instead of trailing):
recoveredstring = number.to_bytes((number.bit_length() + 7) // 8, 'little').decode('utf-8')
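The two lines above can be combined into a round trip. In this sketch, the helper names text_to_int and int_to_text are hypothetical. The optional length parameter is one way to keep the trailing NUL bytes that the bit_length() calculation would otherwise drop:

```python
from typing import Optional


def text_to_int(text: str) -> int:
    # Encode to UTF-8 bytes, then read those bytes as one little-endian integer.
    return int.from_bytes(text.encode('utf-8'), 'little')


def int_to_text(number: int, length: Optional[int] = None) -> str:
    # Without a stored length, use the minimum number of bytes for the value.
    if length is None:
        length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, 'little').decode('utf-8')


n = text_to_int('some string')
print(int_to_text(n))                    # some string

# Trailing NUL bytes are lost unless the byte length is kept separately.
padded = 'abc\x00'
m = text_to_int(padded)
print(repr(int_to_text(m)))              # 'abc'
print(repr(int_to_text(m, len(padded.encode('utf-8')))))   # 'abc\x00'
```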
You can do something similar in Python 2, but it's less efficient/direct:
import binascii
number = int(binascii.hexlify(mystring.encode('utf-8')), 16)
hx = '%x' % number
hx = hx.zfill(len(hx) + (len(hx) & 1)) # Make even length hex nibbles
recoveredstring = binascii.unhexlify(hx).decode('utf-8')
That's equivalent to the 'big' endian approach in Python 3; reversing the intermediate bytes as you go in each direction would get the 'little' effect.
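The claimed equivalence is easy to check in Python 3, where binascii still exists. Reading the hex digits as base 16 gives the same integer as int.from_bytes(..., 'big'), and reversing the bytes gives the 'little' result. Here is a sketch:

```python
import binascii

data = 'some string'.encode('utf-8')

via_hex = int(binascii.hexlify(data), 16)       # int() accepts the bytes hexlify returns
via_big = int.from_bytes(data, 'big')
via_little = int.from_bytes(data[::-1], 'big')  # reversed bytes, read big-endian

print(via_hex == via_big)                              # True
print(via_little == int.from_bytes(data, 'little'))    # True
```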
You can use the ASCII values to do this:
ASCII to int:
ord('a') # = 97
Back to a string (Python 2; in Python 3 use chr(97)):
str(unichr(97)) # = 'a'
From there you could iterate over the string one character at a time and store these in another string. Assuming you are using standard ASCII characters, you would need to zero pad the numbers (because some are two digits and some three) like so:
s = 'My string'
number_string = ''
for c in s:
    number_string += str(ord(c)).zfill(3)
To decode this, you will read the new string three characters at a time and decode them into a new string.
This assumes a few things:
all characters can be represented by ASCII (you could use Unicode code points if not)
you are storing the numeric value as a string, not as an actual int (not a big deal in Python; it saves you from having to deal with maximum int values on different systems)
you absolutely must have a numeric value, i.e. some kind of hexadecimal representation (which could be converted into an int) and cryptographic algorithms won't work
we're not talking about GB+ of text that needs to be converted in this manner
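The encoding loop and the three-characters-at-a-time decoding step described above can be written as a pair of functions in Python 3, where chr replaces unichr. The function names are hypothetical. The sketch assumes every code point is below 1000 so that three digits are enough, which covers ASCII:

```python
def encode_digits(s: str) -> str:
    # Each character becomes its code point, zero-padded to three digits.
    for c in s:
        if ord(c) > 999:
            raise ValueError(f'code point too large for 3 digits: {c!r}')
    return ''.join(str(ord(c)).zfill(3) for c in s)


def decode_digits(digits: str) -> str:
    # Read the digit string three characters at a time.
    return ''.join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))


number_string = encode_digits('My string')
print(number_string)                  # 077121032115116114105110103
print(decode_digits(number_string))   # My string
```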
