Python strings as long number - python

I have a long string and I want to represent it as one long number.
I tried:
l = [ord(i) for i in str1]
but this is not what I need.
I need a single long number, not a list of numbers.
This line gives me [23,21,45,34,242,32]
and I want one long number that I can convert back to the same string.
Any ideas?

Here is a translation of Paulo Bu's answer (with base64 encoding) into Python 3:
>>> import base64
>>> s = 'abcde'
>>> e = base64.b64encode(s.encode('utf-8'))
>>> print(e)
b'YWJjZGU='
>>> base64.b64decode(e).decode('utf-8')
'abcde'
Basically the difference is that your workflow has gone from:
string -> base64
base64 -> string
To:
string -> bytes
bytes -> base64
base64 -> bytes
bytes -> string

Is this what you are looking for?
>>> str = 'abcdef'
>>> ''.join([chr(y) for y in [ ord(x) for x in str ]])
'abcdef'
>>>

#! /usr/bin/python2
# coding: utf-8
def encode(s):
    # Fold the UTF-8 bytes into one integer, most significant byte first.
    result = 0
    for ch in s.encode('utf-8'):
        result *= 256
        result += ord(ch)
    return result
def decode(i):
    # Peel off one byte at a time, least significant byte first.
    result = []
    while i:
        result.append(chr(i % 256))
        i //= 256
    result = reversed(result)
    result = ''.join(result)
    result = result.decode('utf-8')
    return result
orig = u'Once in Persia reigned a king …'
cipher = encode(orig)
clear = decode(cipher)
print '%s -> %s -> %s' % (orig, cipher, clear)

This is code that I found that works.
str='sdfsdfsdfdsfsdfcxvvdfvxcvsdcsdcs sdcsdcasd'
I=int.from_bytes(bytes([ord (i)for i in str]),byteorder='big')
print(I)
print(I.to_bytes(len(str),byteorder='big'))
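A minimal Python 3 sketch of the same int.from_bytes idea, wrapped in two helper functions. The names str_to_int and int_to_str are made up for illustration, and UTF-8 is assumed so that characters outside Latin-1 survive the round trip. One limitation to be aware of: leading NUL characters are lost, because leading zero bytes do not change the integer.

```python
def str_to_int(text, encoding="utf-8"):
    """Pack the encoded bytes of text into one big-endian integer."""
    return int.from_bytes(text.encode(encoding), byteorder="big")


def int_to_str(number, encoding="utf-8"):
    """Unpack an integer produced by str_to_int back into text."""
    # Smallest number of bytes that can hold the integer.
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, byteorder="big").decode(encoding)


n = str_to_int("abc")
print(n)              # 6382179, the same value as int('616263', 16)
print(int_to_str(n))  # abc
```

Computing the length from bit_length() means the original string length does not have to be stored alongside the number.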

What about using base64 encoding? Are you fine with it? Here's an example:
>>> import base64
>>> s = 'abcde'
>>> e = base64.b64encode(s)
>>> print e
YWJjZGU=
>>> base64.b64decode(e)
'abcde'
The encoding is not purely numeric, but you can go back and forth from the string without much trouble.
You can also try encoding the string to hexadecimal. This yields only hex digits, and decoding the hex text gives back the original string:
>>> s = 'abc'
>>> n = s.encode('hex')
>>> print n
616263
>>> n.decode('hex')
'abc'
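If you need an actual integer, you can extend the trick:
>>> s = 'abc'
>>> n = int(s.encode('hex'), 16)  # convert it to an integer
>>> print n
6382179
>>> hex(n)[2:].decode('hex')  # convert the integer back to a string
'abc'
Note: this does not work out of the box in Python 3, where str.encode has no 'hex' codec. Also, for long integers Python 2 appends an 'L' to the output of hex(n), which has to be stripped before decoding.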
UPDATE: To make it work with Python 3, I suggest using the binascii module this way:
>>> import binascii
>>> s = 'abcd'
>>> n = int(binascii.hexlify(s.encode()), 16)  # encode() is needed to convert str to bytes
>>> print(n)
1633837924
>>> binascii.unhexlify(hex(n)[2:].encode()).decode()
'abcd'
The encode and decode methods are needed to convert between strings and bytes. If you plan to include special (non-ASCII) characters, you will probably need to specify the encodings explicitly.
Hope this helps!

Related

Remove hex character from string in Python

I'm trying to remove one character from a string of hex values, but I can't find a solution that works. If I print remove2 I get the string "\x42", but when I print buffer I get "ABCD". I don't understand why printing remove2 shows the hex escape while printing buffer shows the ASCII characters. I think this is the root of the problem. How can I fix this using Python 2?
>>> buffer = "\x41\x42\x43\x44"
>>> remove = raw_input("Enter hex value to remove: ")
Enter hex value to remove: 42
>>> remove2 = "\\x" + remove
>>> print buffer
ABCD
>>> print remove2
\x42
>>> buffer2 = buffer.replace(remove2, '')
>>> print buffer2
ABCD
I wish buffer2 = "\x41\x43\x44".
Here's the problem:
remove2 = "\\x" + remove
You can't programmatically build escape sequences like that. Instead, do this:
remove2 = chr(int(remove, 16))
Alternatively, you'd have to make buffer contain the literal backslashes instead of the escaped characters:
buffer = "\\x41\\x42\\x43\\x44"
The problem is that if you display remove2 in the interpreter without print, you'll see
>>> remove2
'\\x42'
that the \ stays a literal backslash instead of forming a hexadecimal escape. To get the actual character you need to do:
remove.decode('hex')
so the code becomes:
>>> buffer = "\x41\x42\x43\x44"
>>> remove = raw_input("Enter hex value to remove: ")
Enter hex value to remove: 42
>>> remove2=remove.decode('hex')
>>> buffer.replace(remove2, '')
'ACD'
Does that help answer your question?
You will need to escape the \ in your buffer string, otherwise each \xhh will be treated as a hex escape. So:
>>> buffer = "\\x41\\x42\\x43"
>>> remove = "42"
>>> remove = "\\x" + remove
>>> buffer = buffer.replace(remove, '')
>>> print buffer  # prints \x41\x43
You can use filter() and construct a filtered bytes object using the user input of "42" and original bytes (just a string in Python2).
>>> inp = "42"
>>> filter(lambda x: x != chr(int(inp, 16)), 'ABCD')
'ACD'
Python 3
>>> inp = "42"
>>> bytes(filter(lambda x: x != int(inp, 16), b'ABCD'))
b'ACD'
Anyway, simpler to use replace(), this is just an alternative way to filter out specific values from a bytes object. It illustrates the basic idea other answers point out. The user input needs to be correctly converted to the value you intend to remove.
When the interpreter renders the output, bytes or str values that correspond to printable ASCII characters are shown as those characters, without backslashes. If a value has no corresponding printable character, an escaped version of it is shown instead.
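For Python 3, the replace() approach mentioned above can be sketched as follows. The helper name remove_hex_value is hypothetical, and the user input is assumed to be valid hex text such as "42".

```python
def remove_hex_value(data, hex_value):
    """Return data with every occurrence of the byte(s) named by hex_value removed."""
    target = bytes.fromhex(hex_value)  # "42" -> b"B"
    return data.replace(target, b"")


buffer = b"\x41\x42\x43\x44"
print(remove_hex_value(buffer, "42"))  # b'ACD'
```

bytes.fromhex raises ValueError on malformed input, so bad user input is rejected instead of silently removing nothing.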

How to convert \\xhh into \xhh python

I have encountered a case where I need to convert a string of escaped characters into the actual characters in Python.
s = "\\x80\\x78\\x07\\x00\\x75\\xb3"
print s #gives: \x80\x78\x07\x00\x75\xb3
What I want is, given the string s, to get the real characters stored in s, which in this case are "\x80, \x78, \x07, \x00, \x75, and \xb3".
You can use string-escape encoding (Python 2.x):
>>> s = "\\x80\\x78\\x07\\x00\\x75\\xb3"
>>> s.decode('string-escape')
'\x80x\x07\x00u\xb3'
Use unicode-escape encoding (in Python 3.x, need to convert to bytes first):
>>> s.encode().decode('unicode-escape')
'\x80x\x07\x00u³'
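If raw bytes are wanted in Python 3 rather than a str, one possible sketch is to follow the unicode-escape step with a latin-1 encode. The helper name unescape_to_bytes is made up here, and the input is assumed to be ASCII text containing only \xhh escapes.

```python
def unescape_to_bytes(escaped):
    """Interpret literal backslash-x escapes in text and return the raw bytes."""
    # unicode_escape interprets the backslash sequences and yields code
    # points 0-255; latin-1 maps each of those code points to one byte.
    return escaped.encode("ascii").decode("unicode_escape").encode("latin-1")


s = "\\x80\\x78\\x07\\x00\\x75\\xb3"
raw = unescape_to_bytes(s)
print(raw)       # b'\x80x\x07\x00u\xb3'
print(len(raw))  # 6
```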
You can simply write a function that takes the string and returns the converted form, something like this:
def str_to_chr(s):
    res = ""
    parts = s.split("\\")[1:]  # "\\x33\\x45" -> ["x33", "x45"]
    for i in parts:
        res += chr(int('0' + i, 16))  # parse "0x33" as hex, then take the chr
    return res
Remember to print the return value of the function.
To find out what each line does, run that line; if you still have questions, comment and I'll answer.
Or you can build a string from the byte values, but they might not all be "printable" depending on your encoding. Example:
# -*- coding: utf-8 -*-
s = "\\x80\\x78\\x07\\x00\\x75\\xb3"
r = ''
for byte in s.split('\\x'):
    if byte:  # to get rid of empties
        r += chr(int(byte, 16))  # convert to int from hex string first
print(r)  # given the example, not all bytes are printable chars in utf-8
HTH, Edwin

Converting string to raw bytes

I wrote a program that works with raw bytes (I don't know if this is the right name!) but the user will input the data as plain strings.
How to convert them?
I've tried with a method, but it returns a string of length 0!
Here's the starting string:
5A05705DC25CA15123C8E4750B80D0A9
Here's the result that I need:
\x5A\x05\x70\x5D\xC2\x5C\xA1\x51\x23\xC8\xE4\x75\x0B\x80\xD0\xA9
And here's the method I wrote:
def convertStringToByte(string):
    byte_char = "\\x"
    n=2
    result = ""
    bytesList = [string[i:i+n] for i in range(0, len(string), n)]
    for i in range(0, len(bytesList)):
        bytesList[i] = byte_char + bytesList[i]
    return result
Use binascii.unhexlify():
import binascii
binary = binascii.unhexlify(text)
The same module has binascii.hexlify() too, to mirror the operation.
Demo:
>>> import binascii
>>> binary = '\x5A\x05\x70\x5D\xC2\x5C\xA1\x51\x23\xC8\xE4\x75\x0B\x80\xD0\xA9'
>>> text = '5A05705DC25CA15123C8E4750B80D0A9'
>>> binary == binascii.unhexlify(text)
True
>>> text == binascii.hexlify(binary).upper()
True
The hexlify() operation produces lowercase hex, but that is easily fixed with a .upper() call.
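In Python 3 the same conversion is also available directly on the bytes type. A short sketch using the hex string from the question:

```python
text = "5A05705DC25CA15123C8E4750B80D0A9"

binary = bytes.fromhex(text)         # hex text -> raw bytes
print(len(binary))                   # 16
print(binary.hex().upper() == text)  # True
```

As with hexlify(), bytes.hex() produces lowercase digits, hence the .upper() call.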
You must get from 5A (a string representing a hexadecimal number) to 0x5A or 90 (integers) and feed them into chr(). You can do the first conversion with int('0x5A', 16), so you'll get something like
chr(int('0x5A', 16))

Print a string as hexadecimal bytes

I have this string: Hello, World! and I want to print it using Python as '48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21'.
hex() works only for integers.
How can it be done?
You can transform your string to an integer generator. Apply hexadecimal formatting for each element and intercalate with a separator:
>>> s = "Hello, World!"
>>> ":".join("{:02x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21'
':'.join(x.encode('hex') for x in 'Hello, World!')
For Python 2.x:
':'.join(x.encode('hex') for x in 'Hello, World!')
The code above will not work with Python 3.x. For 3.x, the code below will work:
':'.join(hex(ord(x))[2:] for x in 'Hello, World!')
Another answer in two lines that some might find easier to read, and helps with debugging line breaks or other odd characters in a string:
For Python 2.7
for character in string:
    print character, character.encode('hex')
For Python 3.7 (not tested on all releases of 3)
for character in string:
    print(character, character.encode('utf-8').hex())
Some complements to Fedor Gogolev's answer:
First, if the string contains characters whose code is below 16 (0x10), they will not be displayed as required, because only a single hex digit is printed. In that case, the correct format is {:02x}:
>>> s = "Hello unicode \u0005!!"
>>> ":".join("{0:x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:20:75:6e:69:63:6f:64:65:20:5:21:21'
                                           ^
>>> ":".join("{:02x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:20:75:6e:69:63:6f:64:65:20:05:21:21'
                                           ^^
Second, if your "string" is in reality a "byte string" -- and since the difference matters in Python 3 -- you might prefer the following:
>>> s = b"Hello bytes \x05!!"
>>> ":".join("{:02x}".format(c) for c in s)
'48:65:6c:6c:6f:20:62:79:74:65:73:20:05:21:21'
Please note there is no need for conversion in the above code as a bytes object is defined as "an immutable sequence of integers in the range 0 <= x < 256".
Print a string as hex bytes?
The accepted answer gives:
s = "Hello world !!"
":".join("{:02x}".format(ord(c)) for c in s)
returns:
'48:65:6c:6c:6f:20:77:6f:72:6c:64:20:21:21'
The accepted answer works only so long as you use bytes (mostly ascii characters). But if you use unicode, e.g.:
a_string = u"Привет мир!!" # "Prevyet mir", or "Hello World" in Russian.
You need to convert to bytes somehow.
If your terminal doesn't accept these characters, you can decode from UTF-8 or use the names (so you can paste and run the code along with me):
a_string = (
"\N{CYRILLIC CAPITAL LETTER PE}"
"\N{CYRILLIC SMALL LETTER ER}"
"\N{CYRILLIC SMALL LETTER I}"
"\N{CYRILLIC SMALL LETTER VE}"
"\N{CYRILLIC SMALL LETTER IE}"
"\N{CYRILLIC SMALL LETTER TE}"
"\N{SPACE}"
"\N{CYRILLIC SMALL LETTER EM}"
"\N{CYRILLIC SMALL LETTER I}"
"\N{CYRILLIC SMALL LETTER ER}"
"\N{EXCLAMATION MARK}"
"\N{EXCLAMATION MARK}"
)
So we see that:
":".join("{:02x}".format(ord(c)) for c in a_string)
returns
'41f:440:438:432:435:442:20:43c:438:440:21:21'
a poor/unexpected result - these are the code points that combine to make the graphemes we see in Unicode, from the Unicode Consortium - representing languages all over the world. This is not how we actually store this information so it can be interpreted by other sources, though.
To allow another source to use this data, we would usually need to convert to UTF-8 encoding, for example, to save this string in bytes to disk or to publish to html. So we need that encoding to convert the code points to the code units of UTF-8 - in Python 3, ord is not needed because bytes are iterables of integers:
>>> ":".join("{:02x}".format(c) for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
Or perhaps more elegantly, using the new f-strings (only available in Python 3):
>>> ":".join(f'{c:02x}' for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
In Python 2, pass c to ord first, i.e. ord(c) - more examples:
>>> ":".join("{:02x}".format(ord(c)) for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
>>> ":".join(format(ord(c), '02x') for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
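On Python 3.8 and later, bytes.hex() accepts a separator argument, which makes the join unnecessary. A small sketch; the wrapper name hex_bytes is made up, and the separator has to be a single ASCII character:

```python
def hex_bytes(text, sep=":", encoding="utf-8"):
    """Format the encoded bytes of text as separator-joined lowercase hex."""
    return text.encode(encoding).hex(sep)


print(hex_bytes("Hello, World!"))  # 48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21
print(hex_bytes("Привет мир!!"))   # d0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21
```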
You can use hexdump's dump function:
import hexdump
hexdump.dump("Hello, World!", sep=":")
(append .lower() if you require lower-case). This works for both Python 2 and 3.
Using map and lambda function can produce a list of hex values, which can be printed (or used for other purposes)
>>> s = 'Hello 1 2 3 \x01\x02\x03 :)'
>>> map(lambda c: hex(ord(c)), s)
['0x48', '0x65', '0x6c', '0x6c', '0x6f', '0x20', '0x31', '0x20', '0x32', '0x20', '0x33', '0x20', '0x1', '0x2', '0x3', '0x20', '0x3a', '0x29']
A bit more general for those who don't care about Python 3 or colons:
from codecs import encode
data = open('/dev/urandom', 'rb').read(20)
print(encode(data, 'hex')) # Data
print(encode(b"hello", 'hex')) # String
This can be done in the following way:
from __future__ import print_function
str = "Hello, World!"
for char in str:
    mm = int(char.encode('hex'), 16)
    print(hex(mm), end=' ')
The output of this will be in hexadecimal as follows:
0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x57 0x6f 0x72 0x6c 0x64 0x21
For something that offers more performance than ''.format(), you can use this:
>>> ':'.join( '%02X'%(v if type(v) is int else ord(v)) for v in 'Hello, World!' )
'48:65:6C:6C:6F:2C:20:57:6F:72:6C:64:21'
>>>
>>> ':'.join( '%02X'%(v if type(v) is int else ord(v)) for v in b'Hello, World!' )
'48:65:6C:6C:6F:2C:20:57:6F:72:6C:64:21'
>>>
I am sorry this couldn't look nicer.
It would be nice if one could simply do '%02x'%v, but that only takes int...
But you'll be stuck with byte-strings b'' without the logic to select ord(v).
With f-string:
"".join(f"{ord(c):x}" for c in "Hello")
Use any delimiter:
>>> "⚡".join(f"{ord(c):x}" for c in "Hello")
'48⚡65⚡6c⚡6c⚡6f'
Just for convenience, very simple.
def hexlify_byteString(byteString, delim="%"):
    ''' Very simple way to hexlify a byte string using delimiters '''
    retval = ""
    for intval in byteString:
        retval += ('0123456789ABCDEF'[int(intval / 16)])
        retval += ('0123456789ABCDEF'[int(intval % 16)])
        retval += delim
    return(retval[:-1])
hexlify_byteString(b'Hello, World!', ":")
# Out[439]: '48:65:6C:6C:6F:2C:20:57:6F:72:6C:64:21'

Python get character code in different encoding?

Given a character code as integer number in one encoding, how can you get the character code in, say, utf-8 and again as integer?
UTF-8 is a variable-length encoding, so I'll assume you really meant "Unicode code point". Use chr() to convert the character code to a character, decode it, and use ord() to get the code point.
>>> ord(chr(145).decode('koi8-r'))
9618
You can only map an "integer number" from one encoding to another if they are both single-byte encodings.
Here's an example using "iso-8859-15" and "cp1252" (aka "ANSI"):
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('cp1252')
'\x80'
>>> ord(s.encode('cp1252'))
128
>>> ord(s.encode('iso-8859-15'))
164
Note that ord is here being used to get the ordinal number of the encoded byte. Using ord on the original unicode string would give its unicode code point:
>>> ord(s)
8364
The reverse operation to ord can be done using either chr (for codes in the range 0 to 255, returning a byte string) or unichr (for codes in the range 0 to sys.maxunicode):
>>> print chr(65)
A
>>> print unichr(8364)
€
For multi-byte encodings, a simple "integer number" mapping is usually not possible.
Here's the same example as above, but using "iso-8859-15" and "utf-8":
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('utf-8')
'\xe2\x82\xac'
>>> [ord(c) for c in s.encode('iso-8859-15')]
[164]
>>> [ord(c) for c in s.encode('utf-8')]
[226, 130, 172]
The "utf-8" encoding uses three bytes to encode the same character, so a one-to-one mapping is not possible. Having said that, many encodings (including "utf-8") are designed to be ASCII-compatible, so a mapping is usually possible for codes in the range 0-127 (but only trivially so, because the code will always be the same).
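The examples above are Python 2. A rough Python 3 equivalent is sketched below; the function name recode_byte is hypothetical, and the source encoding is assumed to be a single-byte encoding so that one integer identifies one character.

```python
def recode_byte(code, source, target):
    """Decode one byte value from source and re-encode the character in target.

    Returns the character's Unicode code point and its byte values in target.
    """
    char = bytes([code]).decode(source)
    return ord(char), list(char.encode(target))


print(recode_byte(0xA4, "iso-8859-15", "cp1252"))  # (8364, [128])
print(recode_byte(0xA4, "iso-8859-15", "utf-8"))   # (8364, [226, 130, 172])
```

Returning a list of byte values rather than a single integer reflects the point made above: a multi-byte target encoding has no one-to-one integer mapping.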
Here's an example of how the encode/decode dance works:
>>> s = b'd\x06' # perhaps start with bytes encoded in utf-16
>>> map(ord, s) # show those bytes as integers
[100, 6]
>>> u = s.decode('utf-16') # turn the bytes into unicode
>>> print u # show what the character looks like
٤
>>> print ord(u) # show the unicode code point as an integer
1636
>>> t = u.encode('utf-8') # turn the unicode into bytes with a different encoding
>>> map(ord, t) # show that encoding as integers
[217, 164]
Hope this helps :-)
If you need to construct the unicode directly from an integer, use unichr:
>>> u = unichr(1636)
>>> print u
٤
