Python Base64 print problem - python

I have a base64-encoded string.
When I decode the string this way:
>>> import base64
>>> base64.b64decode("XH13fXM=")
'\\}w}s'
The output is fine.
But when I use it like this:
>>> d = base64.b64decode("XH13fXM=")
>>> print d
\}w}s
some characters are missing.
Can anyone advise?
Thank you in advance.

It is just a matter of presentation:
>>> '\\}w}s'
'\\}w}s'
>>> print(_, len(_))
\}w}s 5
This string has 5 characters. When you write it in code, you need to escape the backslash or use a raw string literal:
>>> r'\}w}s'
'\\}w}s'
>>> r'\}w}s' == '\\}w}s'
True

When you print a string, the characters in the string are output. When the interactive shell shows you the value of your last statement, it prints the __repr__ of the string, not the string itself. That's why there are single-quotes around it, and your backslash has been escaped.
No characters are missing from your second example; those are the 5 characters in your string. The first example has had characters added to make the output a legal Python string literal.
If you want to use the print statement and have the output look like the first example, then use:
print repr(d)
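For readers on Python 3, the same distinction can be sketched as follows. This is an illustration rather than part of the original answers: it assumes Python 3, where b64decode returns bytes, so the decode("ascii") step and the variable name text are additions.

```python
import base64

# In Python 3, b64decode returns bytes, so decode to str for display.
d = base64.b64decode("XH13fXM=")
text = d.decode("ascii")

print(text)        # str(): the characters themselves
print(repr(text))  # repr(): a valid Python literal, with the backslash doubled
print(len(text))   # number of characters actually stored
```

The doubled backslash only ever appears in the repr() output; the stored string is unchanged.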

Related

replace padding in base64 encoding in python 3

import base64
s = "05052020"
Python 2.7:
base64.b64encode(s)
The output is a string: 'MDUwNTIwMjA='
Python 3.7:
base64.b64encode(b"05052020")
The output is bytes:
b'MDUwNTIwMjA='
I want to replace = with "a"
s = str(base64.b64encode(b"05052020"))[2:-1]
s = s.replace("=", "a")
I realise this is a dirty way, so how can I do it better?
EDIT:
Expected result:
Python 3 code that outputs a string with the padding replaced.
In Python 3, a byte string supports almost the same methods as a unicode string (except for encode/decode). So you can just do:
s = base64.b64encode(b"05052020").replace(b'=', b'a')
to get the b'MDUwNTIwMjAa' byte string.
If you want a unicode string, just decode it:
s = base64.b64encode(b"05052020").replace(b'=', b'a').decode()
will give 'MDUwNTIwMjAa' as a plain (unicode) Python 3 string.
Why do you need to replace the padding? If the = character breaks something, just remove it. The padding characters carry no information, and the encoded data is complete without them.
When decoding back, you may append a few = characters just in case (valid base64 never needs more than 2, but extra padding characters don't break anything):
>>> import base64
>>> base64.b64encode('aa')
'YWE='
>>> base64.b64decode('YWE==')
'aa'
>>> base64.b64decode('YWE===')
'aa'
>>> base64.b64decode('YWE======')
'aa'
>>>
On the other hand, replacing the padding with a character that is itself a valid base64 character might corrupt your decoded string:
>>> base64.b64encode('aa')
'YWE='
>>> base64.b64decode('YWEa')
'aa\x1a'
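The strip-and-restore approach described above can be sketched in Python 3 like this. The helper names encode_unpadded and decode_unpadded are hypothetical, introduced only for illustration; the padding count is computed from the length instead of appending a fixed number of = characters.

```python
import base64

def encode_unpadded(data: bytes) -> str:
    """Base64-encode and drop the trailing '=' padding (hypothetical helper)."""
    return base64.b64encode(data).rstrip(b"=").decode("ascii")

def decode_unpadded(text: str) -> bytes:
    """Restore the padding needed to reach a multiple of 4, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.b64decode(text + padding)

token = encode_unpadded(b"05052020")
print(token)
print(decode_unpadded(token))
```

Because the padding is removed rather than replaced with a letter, the round trip cannot introduce a stray byte such as the \x1a shown above.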

How to remove '\x' from a hex string in Python?

I'm reading a wav audio file in Python using the wave module. The readframes() function in this library returns frames as a hex string. I want to remove the \x from this string, but the translate() function doesn't work as I want:
>>> input = wave.open(r"G:\Workspace\wav\1.wav",'r')
>>> input.readframes (1)
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\\x')
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\x')
ValueError: invalid \x escape
>>> '\xff\x1f\x00\xe8'.translate(None,r'\x')
'\xff\x1f\x00\xe8'
>>>
Anyway, I want to divide the resulting values by 2, then add \x again and generate a new wav file containing these new values. Does anyone have a better idea?
What's wrong?
Your string doesn't actually contain any backslashes, which is why you can't remove them.
If you inspect each character of this string (using the ord() and len() functions), you'll see their real values. Besides, the length of your string is just 4, not 16.
You can play with several solutions to achieve your result:
'hex' encode:
'\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
Or use repr() function:
repr('\xff\x1f\x00\xe8').translate(None,r'\\x')
One way to do what you want is:
>>> s = '\xff\x1f\x00\xe8'
>>> ''.join('%02x' % ord(c) for c in s)
'ff1f00e8'
The reason why translate is not working is that what you are seeing is not the string itself, but its representation. In other words, \x is not contained in the string:
>>> '\\x' in '\xff\x1f\x00\xe8'
False
\xff, \x1f, \x00 and \xe8 are the hexadecimal representations of four characters (in fact, len(s) == 4, not 16).
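In Python 3 the same point can be checked directly on a bytes object. The sketch below is an addition, not from the original answers; it assumes Python 3, where bytes.hex() and bytes.fromhex() replace the Python 2 'hex' codec.

```python
frame = b"\xff\x1f\x00\xe8"  # the same four bytes as in the question

values = list(frame)                 # integer value of each byte
hex_text = frame.hex()               # hex digits only, no \x
restored = bytes.fromhex(hex_text)   # back to the original bytes

print(len(frame), values)
print(hex_text)
print(b"\\x" in frame)  # a literal backslash followed by x is not in the data
```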
Use the encode method:
>>> s = '\xff\x1f\x00\xe8'
>>> print s.encode("hex")
ff1f00e8
As this is a hexadecimal representation, encode with hex:
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
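For the stated goal of halving the values, converting to hex text is probably not needed at all. The following Python 3 sketch makes a loud assumption that the question does not confirm: the frames hold 16-bit signed little-endian samples (check getsampwidth() on the wave object before relying on this). The function name halve_samples is hypothetical.

```python
import struct

def halve_samples(frames: bytes) -> bytes:
    """Halve each sample, ASSUMING 16-bit signed little-endian audio."""
    count = len(frames) // 2                     # two bytes per sample
    samples = struct.unpack("<%dh" % count, frames)
    halved = [s // 2 for s in samples]           # integer division keeps values in range
    return struct.pack("<%dh" % count, *halved)

quieter = halve_samples(b"\xff\x1f\x00\xe8")
print(struct.unpack("<2h", quieter))
```

The returned bytes can be passed to writeframes() on an output wave object without ever adding \x back, since \x was never part of the data.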

How do I keep a string in hexadecimal format without it being decoded?

The general problem is that I need the hexadecimal string to stay in that format when I assign it to a variable, rather than being stored as the decoded characters.
What I get:
>>> '\x61\x74'
'at'
>>> a = '\x61\x74'
>>> a
'at'
What I want (but this is not what happens):
>>> '\x61\x74'
'\x61\x74' ????????
>>> a = '\x61\x74'
>>> a
'\x61\x74' ????????
Use the r prefix (a raw string literal):
a = r'\x61\x74'
b = '\x61\x74'
print (a) #prints \x61\x74
print (b) # prints at
It is the same data. Python lets you specify a literal string using different methods, one of which is to use escape codes to represent bytes.
As such, '\x61' is the same character value as 'a'. Python just chooses to show printable ASCII characters as printable ASCII characters instead of the escape code, just because that makes working with bytestrings that much easier.
If you need the literal backslash, the x character and the two digits 6 and 1 (so a string of length 4), you need to double the backslash or use a raw string.
To illustrate:
>>> '\x61' == 'a' # two notations for the same value
True
>>> len('\x61') # it's just 1 character
1
>>> '\\x61' # escape the escape
'\\x61'
>>> r'\x61' # or use a raw literal instead
'\\x61'
>>> len('\\x61') # which produces 4 characters
4
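If the string already exists as a value (for example 'at') and the goal is to get its \xNN notation back as text, it has to be built explicitly. A minimal Python 3 sketch, assuming every character fits in Latin-1; the helper name to_hex_escapes is hypothetical:

```python
def to_hex_escapes(text: str) -> str:
    """Return text spelled out as literal \\xNN escapes (hypothetical helper)."""
    return "".join("\\x{:02x}".format(b) for b in text.encode("latin-1"))

a = to_hex_escapes("at")
print(a)       # the escaped spelling, as plain characters
print(len(a))  # 4 characters per original character
```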

Similar C string format in Python

I need to read a file with some strange string lines like: \x72\xFE\x20TEST_STRING\0\0\0
but when I print this string (with repr()) it prints this: r\xfe TEST_STRING\x00\x00\x00
Example:
>>> test = '\x72\xFE\x20TEST_STRING\0\0\0'
>>> print test
r? TEST_STRING
>>> print repr(test)
'r\xfe TEST_STRING\x00\x00\x00'
How can I get the same line from a file in Python and in my editor?
Is Python changing the encoding during string manipulation?
You should use Python's raw strings, like this (note the 'r' in front of the string):
test = r'\x72\xFE\x20TEST_STRING\0\0\0'
Then it won't try to interpret the escapes as special characters.
When reading from a text file, Python shouldn't try to interpret the string as containing multi-byte unicode characters. You should get exactly what's in the file:
In [22]: fp = open("test.txt", "r")
In [23]: s = fp.read()
In [24]: s
Out[24]: '\\x72\\xFE\\x20TEST_STRING\\0\\0\\0\n\n'
In [25]: print s
\x72\xFE\x20TEST_STRING\0\0\0
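The file-reading behaviour shown above can be reproduced in Python 3 with a temporary file. This is a sketch added for illustration; the temporary file stands in for the test.txt used in the answer.

```python
import os
import tempfile

line = r"\x72\xFE\x20TEST_STRING\0\0\0"  # literal backslashes, as stored in the file

# Write the line to a temporary text file, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fp:
    fp.write(line + "\n")
    path = fp.name

with open(path, "r") as fp:
    s = fp.read()
os.remove(path)

print(s, end="")       # the backslashes come back as ordinary characters
print(s.count("\\"))   # escapes are only interpreted in source-code literals
```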
\x20 is a space. When you put that into a Python string it is stored exactly the same way as a space.
If you have printable characters in a string it does not matter whether they were typed as the actual character or some escape sequence, they will be represented the same way because they are in fact the same value.
Consider the following examples:
>>> ' ' == '\x20'
True
>>> hex(ord('a'))
'0x61'
>>> '\x61'
'a'
Python did not change the encoding:
When printing Python just resolved the printable chars in your string: chr(0x72) is a "r", chr(0xfe) is not printable, so you get the "?", chr(0x20) is chr(32) that is a space " ", and zero bytes are not printed at all.
repr() shows the "r" as a plain character, keeps chr(0xfe) as the escape \xfe, and writes each chr(0) in full hexadecimal notation as \x00.
So if you want the same line in your editor and from repr(), you have to type your string in your editor in the same notation repr() uses, that is, you write:
test='r\xfe TEST_STRING\x00\x00\x00'
and repr(test) should print the same string.
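A quick Python 3 check of this claim, using a bytes literal in place of the Python 2 str (that substitution is an assumption of this sketch):

```python
test = b"\x72\xFE\x20TEST_STRING\0\0\0"       # notation from the question
same = b"r\xfe TEST_STRING\x00\x00\x00"       # notation that repr() produces

print(test == same)  # both literals describe the same bytes
print(repr(test))
print(len(test))
```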
To avoid having python interpret the backslashes as escaped characters, prefix your string with an "r" character:
>>> test = r'\x72\xFE\x20TEST_STRING\0\0\0'
>>> print test
\x72\xFE\x20TEST_STRING\0\0\0

I want one backslash - not two

I have a string that after print is like this: \x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71
But I want to change this string to "\x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71", which is not printable (I need to write it to a serial port). I know that the problem is with '\'. How can I replace these printable backslashes with unprintable ones?
If you want to decode your string, use decode() with 'string_escape' as the parameter. It interprets the escape sequences in your variable as if they had been typed in a Python string literal in your code.
mystr.decode('string_escape')
Use decode():
>>> st = r'\x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71'
>>> print st
\x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71
>>> print st.decode('string-escape')
MÿýfHq
That last garbage is what my Python prints when trying to print your unprintable string.
You are confusing the printable representation of a string literal with the string itself:
>>> c = '\x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71'
>>> c
'M\xff\xfd\x00\x02\x8f\x0e\x80fHq'
>>> len(c)
11
>>> len('\x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71')
11
>>> len(r'\x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71')
44
your_string.decode('string_escape')
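The 'string_escape' codec exists only in Python 2. On Python 3, one possible equivalent is sketched below, assuming the input contains only ASCII characters and \xNN-style escapes; the helper name unescape_to_bytes is hypothetical.

```python
def unescape_to_bytes(escaped: str) -> bytes:
    """Turn text such as r'\\x4d\\xff' into the raw bytes it spells out."""
    # unicode_escape interprets the backslash escapes; latin-1 then maps
    # each resulting code point 0-255 back to a single byte.
    return escaped.encode("ascii").decode("unicode_escape").encode("latin-1")

st = r"\x4d\xff\xfd\x00\x02\x8f\x0e\x80\x66\x48\x71"
payload = unescape_to_bytes(st)

print(len(st), len(payload))
print(payload)
```

The resulting bytes object is the kind of value a serial port write call expects, rather than the 44-character text.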
