Python 3 code:
def md5hex(data):
""" return hex string of md5 of the given string """
h = MD5.new()
h.update(data.encode('utf-8'))
return b2a_hex(h.digest()).decode('utf-8')
Python 2 code:
def md5hex(data):
""" return hex string of md5 of the given string """
h = MD5.new()
h.update(data)
return b2a_hex(h.digest())
Input python 3:
>>> md5hex('bf5¤7¤8¤3')
'61d91bafe643c282bd7d7af7083c14d6'
Input python 2:
>>> md5hex('bf5¤7¤8¤3')
'46440745dd89d0211de4a72c7cea3720'
Whats going on?
EDIT:
def genurlkey(songid, md5origin, mediaver=4, fmt=1):
""" Calculate the deezer download url given the songid, origin and media+format """
data = b'\xa4'.join(_.encode("utf-8") for _ in [md5origin, str(fmt), str(songid), str(mediaver)])
data = b'\xa4'.join([md5hex(data), data])+b'\xa4'
if len(data)%16:
data += b'\x00' * (16-len(data)%16)
return hexaescrypt(data, "jo6aey6haid2Teih").decode('utf-8')
All this problem started with this b'\xa4' in python 2 code in another function. This byte doesn't work in python 3.
And with that one I get the correct MD5 hash...
Use hashlib & a language agnostic implementation instead:
import hashlib
text = u'bf5¤7¤8¤3'
text = text.encode('utf-8')
print(hashlib.md5(text).hexdigest())
works in Python 2/3 with the same result:
Python2:
'61d91bafe643c282bd7d7af7083c14d6'
Python3 (via repl.it):
'61d91bafe643c282bd7d7af7083c14d6'
The reason your code is failing is the encoded string is not the same string as the un-encoded one: You are only encoding for Python 3.
If you need it to match the unencoded Python 2:
import hashlib
text = u'bf5¤7¤8¤3'
print(hashlib.md5(text.encode("latin1")).hexdigest())
works:
46440745dd89d0211de4a72c7cea3720
the default encoding for Python 2 is latin1 not utf-8
Default encoding in python3 is Unicode. In python 2 it's ASCII. So even if string matches when read they are presented differently.
Related
I have Unicode Code Point of an emoticon represented as U+1F498:
emoticon = u'\U0001f498'
I would like to get utf-16 decimal groups of this character, which according to this website are 55357 and 56472.
I tried to do print emoticon.encode("utf16") but did not help me at all because it gives some other characters.
Also, trying to decode from UTF-8 before encode it to UTF-16 as follow print str(int("0001F498", 16)).decode("utf-8").encode("utf16") does not help either.
How do I correctly get the utf-16 decimal groups of a unicode character?
You can encode the character with the utf-16 encoding, and then convert every 2 bytes of the encoded data to integers with int.from_bytes (or struct.unpack in python 2).
Python 3
def utf16_decimals(char, chunk_size=2):
# encode the character as big-endian utf-16
encoded_char = char.encode('utf-16-be')
# convert every `chunk_size` bytes to an integer
decimals = []
for i in range(0, len(encoded_char), chunk_size):
chunk = encoded_char[i:i+chunk_size]
decimals.append(int.from_bytes(chunk, 'big'))
return decimals
Python 2 + Python 3
import struct
def utf16_decimals(char):
# encode the character as big-endian utf-16
encoded_char = char.encode('utf-16-be')
# convert every 2 bytes to an integer
decimals = []
for i in range(0, len(encoded_char), 2):
chunk = encoded_char[i:i+2]
decimals.append(struct.unpack('>H', chunk)[0])
return decimals
Result:
>>> utf16_decimals(u'\U0001f498')
[55357, 56472]
In a Python 2 "narrow" build, it is as simple as:
>>> emoticon = u'\U0001f498'
>>> map(ord,emoticon)
[55357, 56472]
This works in Python 2 (narrow and wide builds) and Python 3:
from __future__ import print_function
import struct
emoticon = u'\U0001f498'
print(struct.unpack('<2H',emoticon.encode('utf-16le')))
Output:
(55357, 56472)
This is a more general solution that prints the UTF-16 code points for any length of string:
from __future__ import print_function,division
import struct
def utf16words(s):
encoded = s.encode('utf-16le')
num_words = len(encoded) // 2
return struct.unpack('<{}H'.format(num_words),encoded)
emoticon = u'ABC\U0001f498'
print(utf16words(emoticon))
Output:
(65, 66, 67, 55357, 56472)
I'm wondering how can I convert ISO-8859-2 (latin-2) characters (I mean integer or hex values that represents ISO-8859-2 encoded characters) to UTF-8 characters.
What I need to do with my project in python:
Receive hex values from serial port, which are characters encoded in ISO-8859-2.
Decode them, this is - get "standard" python unicode strings from them.
Prepare and write xml file.
Using Python 3.4.3
txt_str = "ąęłóźć"
txt_str.decode('ISO-8859-2')
Traceback (most recent call last): File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
The main problem is still to prepare valid input for the "decode" method (it works in python 2.7.10, and thats the one I'm using in this project). How to prepare valid string from decimal value, which are Latin-2 code numbers?
Note that it would be uber complicated to receive utf-8 characters from serial port, thanks to devices I'm using and communication protocol limitations.
Sample data, on request:
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069
This is some sample data. ISO-8859-2 pushed into uint32, 4 chars per int.
bit of code that manages unboxing:
l = l[7:].replace(",", "").replace(".", "").replace("\n","").replace("\r","") # crop string from uart, only data left
vl = [l[0:2], l[2:4], l[4:6], l[6:8]] # list of bytes
vl = vl[::-1] # reverse them - now in actual order
To get integer value out of hex string I can simply use:
int_vals = [int(hs, 16) for hs in vl]
Your example doesn't work because you've tried to use a str to hold bytes. In Python 3 you must use byte strings.
In reality, if you're using PySerial then you'll be reading byte strings anyway, which you can convert as required:
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
s = ser.read(10)
# Py3: s == bytes
# Py2.x: s == str
my_unicode_string = s.decode('iso-8859-2')
If your iso-8895-2 data is actually then encoded to ASCII hex representation of the bytes, then you have to apply an extra layer of encoding:
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
hex_repr = ser.read(10)
# Py3: hex_repr == bytes
# Py2.x: hex_repr == str
# Decodes hex representation to bytes
# Eg. b"A3" = b'\xa3'
hex_decoded = codecs.decode(hex_repr, "hex")
my_unicode_string = hex_decoded.decode('iso-8859-2')
Now you can pass my_unicode_string to your favourite XML library.
Interesting sample data. Ideally your sample data should be a direct print of the raw data received from PySerial. If you actually are receiving the raw bytes as 8-digit hexadecimal values, then:
#!python3
from binascii import unhexlify
data = b''.join(unhexlify(x)[::-1] for x in b'''\
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069'''.splitlines())
print(data.decode('iso-8859-2'))
Output:
W chuj bardzo długa nazwa jakiejś zapyziałej pipidówy, brudnej ulicyumer najgorszej rudery we wsi
Google Translate of Polish to English:
The dick very long name some zapyziałej Small Town , dirty ulicyumer worst hovel in the village
This topic is closed. Working code, that handles what need to be done:
x=177
x.to_bytes(1, byteorder='big').decode("ISO-8859-2")
For my class assignment we need to decrypt a message that used RSA Encryption. We were given code that should help us with the decryption, but its not helping.
def block_decode(x):
output = ""
i = BLOCK_SIZE+1
while i > 0:
b1 = int(pow(95,i-1))
y = int(x/b1)
i = i - 1
x = x - y*b1
output = output + chr(y+32)
return output
I'm not great with python yet but it looks like it is doing something one character at a time. What really has me stuck is the data we were given. Can't figure out where or how to store it or if it is really decrypted data using RSA. below are just 3 lines of 38 lines some lines have ' or " or even multiple.
FWfk ?0oQ!#|eO Wgny 1>a^ 80*^!(l{4! 3lL qj'b!.9#'!/s2_
!BH+V YFKq _#:X &?A8 j_p< 7\[0 la.[ a%}b E`3# d3N? ;%FW
KyYM!"4Tz yuok J;b^!,V4) \JkT .E[i i-y* O~$? o*1u d3N?
How do I get this into a string list?
You are looking for the function ord which is a built-in function that
Returns the integer ordinal of a one-character string.
So for instance, you can do:
my_file = open("file_containing_encrypted_message")
data = my_file.read()
to read in the encrypted contents.
Then, you can iterate over each character doing
char_val = ord(each_character)
block_decode(char_val)
I receive a bunch of data into a variable using a mechanize and urllib in Python 2.7. However, certain characters are not decoded despite using .decode(UTF-8). The full code is as follows:
#!/usr/bin/python
import urllib
import mechanize
from time import time
total_time = 0
count = 0
def send_this(url):
global count
count = count + 1
this_browser=mechanize.Browser()
this_browser.set_handle_robots(False)
this_browser.addheaders=[('User-agent','Chrome')]
translated=this_browser.open(url).read().decode("UTF-8")
return translated
def collect_this(my_ltarget,my_lhome,data):
global total_time
data = data.replace(" ","%20")
get_url="http://mymemory.translated.net/api/ajaxfetch?q="+data+"&langpair="+my_lhome+"|"+my_ltarget+"&mtonly=1"
return send_this(get_url)
ctr = 0
print collect_this("hi-IN","en-GB","This is my first proper computer program.")
The output of the print statement is:
{"responseData":{"translatedText":"\u092f\u0939 \u092e\u0947\u0930\u093e \u092a\u0939
u0932\u093e \u0938\u092e\u0941\u091a\u093f\u0924 \u0915\u0902\u092a\u094d\u092f\u0942\u091f
\u0930 \u092a\u094d\u0930\u094b\u0917\u094d\u0930\u093e\u092e \u0939\u0948
\u0964"},"responseDetails":"","responseStatus":200,"matches":[{"id":0,"segment":"This is my
first proper computer program.","translation":"\u092f\u0939 \u092e\u0947\u0930\u093e \u092a
\u0939\u0932\u093e \u0938\u092e\u0941\u091a\u093f\u0924 \u0915\u0902\u092a\u094d\u092f\u0942
\u091f\u0930 \u092a\u094d\u0930\u094b\u0917\u094d\u0930\u093e\u092e \u0939\u0948
\u0964","quality":"70","reference":"Machine Translation provided by Google, Microsoft,
Worldlingo or MyMemory customized engine.","usage-count":0,"subject":"All","created-
by":"MT!","last-updated-by":"MT!","create-date":"2013-12-20","last-update-
date":"2013-12-20","match":0.85}]}
The characters starting with \u... are supposed to be the characters that were supposed to be converted.
Where have I gone wrong?
You don't have a UTF-8-encoded string. You have JSON with JSON unicode escapes in it. Decode it with a JSON decoder:
import json
json.loads(your_json_string)
I need help converting the following PHP to python
$plaintext = "MyPassword";
$utf_text = mb_convert_encoding( $plaintext, 'UTF-16LE' );
$sha1_text = sha1( $utf_text, true );
$base64_text = base64_encode( $sha1_text );
echo $base64_text; //ouput = QEy4TXy9dNgleLq+IEcjsQDYm0A=
Convert the string to UTF16LE
Hash the output of 1. using SHA1
Encode the output of 2. using base64 encoding.
Im trying hashlib.sha1 but its not working. Maybe due to this, maybe encodings. Can anyone help
Your PHP code encodes the password to UTF16, little endian; if your password value is a unicode value, that works just fine in Python:
>>> import hashlib
>>> plaintext = u"MyPassword"
>>> utf_text = plaintext.encode('UTF-16LE')
>>> hashlib.sha1(utf_text).digest().encode('base64')
'QEy4TXy9dNgleLq+IEcjsQDYm0A=\n'
The above session was done in Python 2; in Python 3 the u prefix can be omitted as all string values are Unicode.
Got it, i was using the hashlib.sha1 incorrectly.
import hashlib
password = "MYPASSWORD"
result ='QEy4TXy9dNgleLq+IEcjsQDYm0A='
utftext = password.encode("utf-16LE")
m = hashlib.sha1()
m.update(utftext)
sha1text = m.digest()
output = sha1text.encode('base64','strict')
print "output: %s" % output
print "expect: %s" % result
I don't see why this wouldn't work. This works like a charm and prints out the desired result, QEy4TXy9dNgleLq+IEcjsQDYm0A=
from hashlib import sha1
from base64 import b64encode
print(b64encode(sha1("MyPassword".encode("utf-16le")).digest()))