Why do we even need two characters that look like a space?
And why is space chr(32) and not chr(0)?
Also, is chr(160) a half space?
chr(0) isn't actually a space, it's a NULL character. chr(n) returns the ASCII character for the number n.
When you print(chr(0)), it just prints the representation of the NULL character, which is nothing.
Observe:
>>> print('hi'+chr(0)+'hello')
hihello
>>> print('hi'+chr(32)+'hello')
hi hello
Note that NULL is not None, nor is it even an empty string:
>>> chr(0) is None
False
>>> chr(0) == ''
False
It is literally nothing.
chr(0) is the NULL character, which is very significant, and chr(32) is ' '. One purpose of the NULL character is to terminate strings, for example in C. There, what you write as x = "abcd" is actually stored as "abcd\0", where \0 is the same byte as chr(0). Without the null character you could not determine where the string ends: if you read memory byte by byte and something else, say y = "efgh", is stored right after "abcd", then printing x would print 'abcdefgh' and maybe even more garbage that is not part of x, because the computer would not know when to stop. (Python's own str objects record their length, so an embedded chr(0) does not cut a Python string short.)
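As a quick check of how Python itself treats chr(0), here is a minimal sketch for Python 3 (the variable names are made up for illustration). The string keeps its full length and nothing after the NULL character is lost:

```python
# Python 3 sketch: a str knows its own length, so chr(0) is just another character.
x = "abcd" + chr(0) + "efgh"

length = len(x)                  # counts the NULL character too
after_null = x.split(chr(0))[1]  # the text following the NULL is still there

print(length)
print(after_null)
print(repr(x))  # repr shows the NULL as an escape instead of printing it raw
```

So the terminator story applies to C-style strings; in Python the NULL is simply an invisible character in the middle of the string.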
If I'm not mistaken, chr(int) converts the int (decimal value) to the character with that ASCII code...
chr(0) is NULL
chr(32) is space
Actually, chr(n) returns not the ASCII character but the character at Unicode code point n. The first 128 Unicode code points happen to be the same as the ASCII ones.
Try it yourself: chr(15265) returns '㮡' in Python 3.6
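To see what the three code points from the question actually are, one option is to ask the standard unicodedata module for their names. This is only a sketch for Python 3; describe() is a hypothetical helper, not something from the answers above:

```python
import unicodedata

def describe(code_point):
    """Return (length, repr, Unicode name) for the character at code_point."""
    ch = chr(code_point)
    # Control characters such as chr(0) have no Unicode name, so pass a default.
    return (len(ch), repr(ch), unicodedata.name(ch, "<unnamed control character>"))

for n in (0, 32, 160):
    print(n, describe(n))
```

chr(32) is named SPACE and chr(160) is NO-BREAK SPACE, so chr(160) is not a half space but a space at which a line may not be broken. chr(0) is a one-character string with no name.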
Related
I am learning about strings and bytestrings in Python. I don't understand why certain hexadecimal escape sequences are displayed in \xNN form and some are not.
s = 'A\x31B\tC'
s1 = 'A\x00B\tC'
In this case, when I type s1 into the console, it prints the exact string of characters within the quotes,'A\x00B\tC', but when I type s into the console, it prints 'A1B\tC'. It is only when I print s1 that the screen shows 'AB C'. I don't understand why certain escape characters are shown and others are not? And why does it then show when you print them?
Cheers
http://www.asciitable.com/
If you look at the ASCII table, you would see that some characters are printable, while others are not.
In particular, '\x31' == '1' (hexadecimal 31 == decimal 49 == the ASCII character 1).
On the other hand \x00 is not printable. It represents the null terminator (or \0)
>>> '\x31' == '1'
True
>>> '\x00' == '\0'
True
A more interesting question is why \x31 gets converted to 1 and \x09 gets converted to \t, while \x00 is not converted to \0. That I don't know.
When you type name into the interpreter, it is using the result of calling repr on that name. Since \x31 can be represented as 1, it uses that. Since \x00 cannot be represented as a printable character, it falls back to using the hex escape notation.
Note that:
>>> '\x31' == '1'
True
So the result of repr is valid.
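The difference between echoing a name and printing it can be reproduced directly. This is a small Python 3 sketch; echoed_s, echoed_s1 and printable_flags are names chosen here for illustration:

```python
s = 'A\x31B\tC'
s1 = 'A\x00B\tC'

# Typing a name at the prompt shows repr(); print() writes the characters themselves.
echoed_s = repr(s)
echoed_s1 = repr(s1)
print(echoed_s)
print(echoed_s1)

# str.isprintable() tells which characters repr() has to escape.
printable_flags = [(repr(ch), ch.isprintable()) for ch in s1]
print(printable_flags)
```

The tab has its own short escape (\t), while the NULL character has none in repr's output, so it falls back to \x00.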
What is the most 'Pythonic' way of translating
'\xff\xab\x12'
into
'ffab12'
I looked for functions that can do it, but they all want to translate to ASCII (so '\x41' to 'A'). I want to have the hexadecimal digits in ASCII.
There's a module called binascii that contains functions for just this:
>>> import binascii
>>> binascii.hexlify('\xff\xab\x12')
'ffab12'
>>> binascii.unhexlify('ffab12')
'\xff\xab\x12'
original = '\xff\xab\x12'
result = original.replace('\\x', '')
print result
It's \x because it's escaped. a.replace(b,c) just replaces all occurrences of b with c in a.
What you want is not ascii, because ascii translates 0x41 to 'A'. You just want it in hexadecimal base without the \x (or 0x, in some cases)
Edit!!
Sorry, I thought the \x is escaped. So, \x followed by 2 hex digits represents a single char, not 4..
print "\x41"
Will print
A
So what we have to do is to convert each char to hex, then print it like that:
res = ""
for i in original:
    res += hex(ord(i))[2:].zfill(2)
print res
Now let's go over this line:
hex(ord(i))[2:]
ord(c) - returns the numerical value of the char c
hex(i) - returns the hex string value of the int i (e.g. if i=65 it will return '0x41').
[2:] - cutting the "0x" prefix out of the hex string.
.zfill(2) - padding with zeroes
So, making that with a list comprehension will be much shorter:
result = "".join([hex(ord(c))[2:].zfill(2) for c in original])
print result
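The snippets above are Python 2. On Python 3, where binary data lives in bytes objects, the same conversion might look like the sketch below (the variable names are illustrative):

```python
import binascii

original = b'\xff\xab\x12'

via_method = original.hex()                                # bytes.hex(), Python 3.5+
via_binascii = binascii.hexlify(original).decode('ascii')  # hexlify returns bytes
via_format = ''.join(format(byte, '02x') for byte in original)

print(via_method, via_binascii, via_format)

# And back again:
round_trip = bytes.fromhex(via_method)
print(round_trip == original)
```

Iterating over a bytes object yields ints, so no ord() call is needed in the comprehension.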
I am testing struct module because I would like to send simple commands with parameters in bytes (char) and unsigned int to another application.
However I found some weird things when converting to little endian unsigned int, these examples print the correct hexadecimal representation:
>>> import struct
>>> struct.pack('<I',7)
b'\x07\x00\x00\x00'
>>> struct.pack('<I',11)
b'\x0b\x00\x00\x00'
>>> struct.pack('<I',16)
b'\x10\x00\x00\x00'
>>> struct.pack('<I',15)
b'\x0f\x00\x00\x00'
but these examples apparently not:
>>> struct.pack('<I',10)
b'\n\x00\x00\x00'
>>> struct.pack('<I',32)
b' \x00\x00\x00'
>>> struct.pack('<I',64)
b'@\x00\x00\x00'
I would appreciate any explanation or hint. Thanks beforehand!
Python is being helpful.
The bytes representation will use ASCII characters for any bytes that are printable and escape codes for the rest.
Thus, 0x40 is printed as @, because that's a printable byte. But 0x0a is represented as \n instead, because that is the standard Python escape sequence for a newline character. 0x00 is represented as \x00, a hex escape sequence denoting the NULL byte value. Etc.
All this is just the Python representation when echoing the values, for your debugging benefit. The actual value itself still consists of actual byte values.
>>> b'\x40' == b'@'
True
>>> b'\x0a' == b'\n'
True
It's just that any byte in the printable ASCII range will be shown as that ASCII character rather than a \xhh hex escape or dedicated \c one-character escape sequence.
If you wanted to see only hexadecimal representations, use the binascii.hexlify() function:
>>> import binascii
>>> binascii.hexlify(b'@\x00\x00\x00')
b'40000000'
>>> binascii.hexlify(b'\n\x00\x00\x00')
b'0a000000'
which returns bytes as hex characters (with no prefixes), instead. The return value is of course no longer the same value; you now have a bytestring of twice the original length consisting of characters representing hexadecimal values: literal a through f and 0 through 9.
"\xNN" is just the way to represent a non-printable character ... it will give you the printable character if it can
print "\x0a" == "\n" == chr(10)
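To convince yourself that only the display differs, you can look at the same packed value in several ways. A minimal Python 3 sketch, where hex_view() is a hypothetical helper introduced just for this illustration:

```python
import struct

def hex_view(value):
    """Pack value as a little-endian unsigned int; return (raw bytes, hex text)."""
    packed = struct.pack('<I', value)
    return packed, packed.hex()

for value in (7, 10, 32, 64):
    packed, as_hex = hex_view(value)
    # list(packed) shows the plain integer value of every byte
    print(value, packed, as_hex, list(packed))
```

Every result is four bytes long, whether the first byte is echoed as \x07, \n, a space or @.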
I'm trying to work out a way to encode/decode binary data in such a way that the new line character is not part of the encoded string.
It seems to be a recursive problem, but I can't seem to work out a solution.
e.g. A naive implementation:
>>> original = 'binary\ndata'
>>> encoded = original.replace('\n', '=n')
'binary=ndata'
>>> decoded = encoded.replace('=n', '\n')
'binary\ndata'
What happens if there is already a =n in the original string?
>>> original = 'binary\ndata=n'
>>> encoded = original.replace('\n', '=n')
'binary=ndata=n'
>>> decoded = encoded.replace('=n', '\n')
'binary\ndata\n' # wrong
Try to escape existing =n's, but then what happens if there is already an escaped =n?
>>> original = '++nbinary\ndata=n'
>>> encoded = original.replace('=n', '++n').replace('\n', '=n')
'++nbinary=ndata++n'
How can I get around this recursive problem?
Solution
original = 'binary\ndata \\n'
# encoded = original.encode('string_escape') # escape many chr
encoded = original.replace('\\', '\\\\').replace('\n', '\\n') # escape \n and \\
decoded = encoded.decode('string_escape')
verified
>>> print encoded
binary\ndata \\n
>>> print decoded
binary
data \n
The solution is from How do I un-escape a backslash-escaped string in python?
Edit: I wrote it also with your ad-hoc economic encoding. The original "string_escape" codec escapes backslash, apostrophe and everything below chr(32) and above chr(126). Decoding is the same for both.
The way to encode strings that might contain the "escape" character is to escape the escape character as well. In python, the escape character is a backslash, but you could use anything you want. Your cost is one character for every occurrence of newline or the escape.
To avoid confusing you, I'll use forward slash:
# original
>>> print "slashes / and /newline/\nhere"
slashes / and /newline/
here
# encoding
>>> print "slashes / and /newline/\nhere".replace("/", "//").replace("\n", "/n")
slashes // and //newline///nhere
This encoding is unambiguous, since all real slashes are doubled; but it must be decoded in a single pass, so you can't just use two successive calls to replace():
# decoding
>>> import re
>>> def decode(c):
...     # Expand this into a real mapping if you have more substitutions
...     return '\n' if c == '/n' else c[0]
...
>>> print "".join(decode(c) for c in re.findall(r"(/.|.)",
...     "slashes // and //newline///nhere"))
slashes / and /newline/
here
Note that there is an actual /n in the input (and another slash before the newline): it all works correctly anyway.
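The same escape-the-escape idea can be written for Python 3, where the 'string_escape' codec no longer exists. The following is only a sketch under the assumption that backslash is the escape character; encode() and decode() are illustrative names, not a library API:

```python
import re

ESC = '\\'  # assumed escape character: a single backslash

def encode(text):
    # Double every escape character first, then replace real newlines.
    return text.replace(ESC, ESC + ESC).replace('\n', ESC + 'n')

def decode(text):
    # Single pass: every backslash starts a two-character escape pair.
    def expand(match):
        following = match.group(1)
        return '\n' if following == 'n' else following
    return re.sub(r'\\(.)', expand, text, flags=re.DOTALL)

samples = ['binary\ndata', 'binary\ndata \\n', '\\\\n\n\\', 'no escapes at all']
for sample in samples:
    encoded = encode(sample)
    print(repr(sample), '->', repr(encoded), '->', repr(decode(encoded)))
```

Decoding uses one re.sub() call rather than two chained replace() calls, for the reason given above: a doubled backslash followed by n must not be mistaken for an encoded newline.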
If you encoded the entire string systematically, would you not end up escaping it? Say for every character you do chr(ord(char) + 1) or something trivial like that?
I don't have a great deal of experience with binary data, so this may be completely off/inefficient/both, but would this get around your issue?
In [40]: original = 'binary\ndata\nmorestuff'
In [41]: nlines = [index for index, i in enumerate(original) if i == '\n']
In [42]: encoded = original.replace('\n', '')
In [43]: encoded
Out[43]: 'binarydatamorestuff'
In [44]: decoded = list(encoded)
In [45]: map(lambda x: decoded.insert(x, '\n'), nlines)
Out[45]: [None, None]
In [46]: decoded = ''.join(decoded)
In [47]: decoded
Out[47]: 'binary\ndata\nmorestuff'
Again, I am sure there is a much better/more accurate way - this is just from a novice perspective.
If you are encoding an alphabet of n symbols (e.g. ASCII) into a smaller set of m symbols (e.g. ASCII except newline) you must allow the encoded string to be longer than the original string.
The typical way of doing this is to define one character as an "escape" character; the character following the "escape" represents an encoded character. This technique has been used since the 1940s in teletypewriters; that's where the "Esc" key you see on your keyboard came from.
Python (and other languages) already provide this in strings with the backslash character. Newlines are encoded as '\n' (or '\r\n'). Backslashes escape themselves, so the literal string '\r\n' would be encoded '\\r\\n'.
Note that the encoded length of a string that includes only the escaped character will be double that of the original string. If that is not acceptable you will have to use an encoding that uses a larger alphabet to avoid the escape characters (which may be longer than the original string) or compress it (which may also be longer than the original string).
How about:
In [8]: import urllib
In [9]: original = 'binary\ndata'
In [10]: encoded = urllib.quote(original)
In [11]: encoded
Out[11]: 'binary%0Adata'
In [12]: urllib.unquote(encoded)
Out[12]: 'binary\ndata'
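On Python 3 the quoting functions live in urllib.parse. A short sketch of the same round trip, including a bytes variant for genuinely binary data (the sample values are made up):

```python
from urllib.parse import quote, unquote, quote_from_bytes, unquote_to_bytes

original = 'binary\ndata'
encoded = quote(original)
decoded = unquote(encoded)
print(encoded, decoded == original)

# For raw bytes, use the *_bytes variants so no text encoding is assumed.
raw = b'\x00binary\n\xffdata'
encoded_raw = quote_from_bytes(raw)
decoded_raw = unquote_to_bytes(encoded_raw)
print(encoded_raw, decoded_raw == raw)
```

Percent-encoding costs three characters for every escaped byte, so it is more expensive than a one-character escape when the data has many unsafe bytes.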
The escapeless encodings are specifically designed to trim off certain characters from binary data. In your case of removing just the \n character, the overhead will be less than 0.4%.
I am trying to convert a big integer to hexadecimal, but in the result I get an extra "0x" at the beginning and "L" at the end. Is there any way to remove them? Thanks.
The number is:
44199528911754184119951207843369973680110397865530452125410391627149413347233422
34022212251821456884124472887618492329254364432818044014624401131830518339656484
40715571509533543461663355144401169142245599341189968078513301836094272490476436
03241723155291875985122856369808620004482511813588136695132933174030714932470268
09981252011612514384959816764532268676171324293234703159707742021429539550603471
00313840833815860718888322205486842202237569406420900108504810
In hex I get:
0x2ef1c78d2b66b31edec83f695809d2f86e5d135fb08f91b865675684e27e16c2faba5fcea548f3
b1f3a4139942584d90f8b2a64f48e698c1321eee4b431d81ae049e11a5aa85ff85adc2c891db9126
1f7f2c1a4d12403688002266798ddd053c2e2670ef2e3a506e41acd8cd346a79c091183febdda3ca
a852ce9ee2e126ca8ac66d3b196567ebd58d615955ed7c17fec5cca53ce1b1d84a323dc03e4fea63
461089e91b29e3834a60020437db8a76ea85ec75b4c07b3829597cfed185a70eeaL
The 0x is literal representation of hex numbers. And L at the end means it is a Long integer.
If you just want a hex representation of the number as a string without 0x and L, you can use string formatting with %x.
>>> a = 44199528911754184119951207843369973680110397
>>> hex(a)
'0x1fb62bdc9e54b041e61857943271b44aafb3dL'
>>> b = '%x' % a
>>> b
'1fb62bdc9e54b041e61857943271b44aafb3d'
Sure, go ahead and remove them.
hex(bignum).rstrip("L").lstrip("0x") or "0"
(Went the strip() route so it'll still work if those extra characters happen to not be there.)
Similar to Praveen's answer, you can also directly use built-in format().
>>> a = 44199528911754184119951207843369973680110397
>>> format(a, 'x')
'1fb62bdc9e54b041e61857943271b44aafb3d'
I think it's a dangerous idea to use strip, because lstrip('0x') removes every leading '0' and 'x' character, not just the "0x" prefix.
For example:
a = '0x0'
a.lstrip('0x')
''
result is '', not '0'.
In your case, you can simply use replace to prevent above situation.
Here's sample code.
hex(bignum).replace("L","").replace("0x","")
Be careful when using the accepted answer as lstrip('0x') will also remove any leading zeros, which may not be what you want, see below:
>>> account = '0x000067'
>>> account.lstrip('0x')
'67'
>>>
If you are sure that the '0x' prefix will always be there, it can be removed simply as follows:
>>> hex(42)
'0x2a'
>>> hex(42)[2:]
'2a'
>>>
[2:] will get every character in the string except for the first two.
A more compact way, which only works when the string really ends in 'L' (a Python 2 long), would be
hex(_number)[2:-1]
but you have to be careful if you're working with gmpy mpz types,
then the 'L' doesn't exist at the end and you can just use
hex(mpz(_number))[2:]
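On Python 3 there is no separate long type, so hex() never appends an 'L', and format() avoids the prefix altogether. A minimal sketch; to_hex() is a hypothetical wrapper and the sample number is made up:

```python
def to_hex(number):
    # format() with 'x' emits lowercase hex digits with no '0x' prefix.
    return format(number, 'x')

big = 2 ** 200 + 0x67
print(to_hex(big))
print(to_hex(0))

# The strip pitfall mentioned above: zero loses its only digit.
stripped_zero = hex(0).lstrip('0x')
print(repr(stripped_zero))
```

Because nothing is sliced or stripped, this stays correct for zero and for values whose hex digits start or end with 0.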