WM_GETTEXT returns text separated with nulls - python

import time
import win32gui
import win32con
while True:
    time.sleep(1)
    buf = win32gui.PyMakeBuffer(255)
    window = win32gui.GetForegroundWindow()
    title = win32gui.GetWindowText(window)
    control = win32gui.FindWindowEx(window, 0, 'Edit', None)
    length = win32gui.SendMessage(control, win32con.WM_GETTEXT, 255, buf)
    result = buf[:length]
    print('Title: ', win32gui.GetWindowText(window))
    print(str(buf[:length*2], "UTF_8"))
Why does it return a string separated with nulls? When I tried just buf[:length], I got only half of my string because of those nulls.
bytearray(b'H\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x9dL\x03E\x888P\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe0\xedL\x03\xa9\xc4\xffb\xa0\tO\x00j\x8c\x1bZ\xa04\xc6\x02IP\x12\x8d\x00\x00\x00\x00\x00\x00\x00\x00\xa0X?\x03\xed`\x05\x89\xa0n\xfb\x02.\x02\xea\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0*X\x00\xf4b\x9c\xf9\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd6\x8d\x02\x98?n\xb2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00D\xcc\x02\xbey\xee\x08\x00\x00\x00\x00\x00\x00\x00')
edit:
result = buf.tobytes()[:length*2:2]
print(result.decode("UTF-8"))
This code works as I wanted, but I'm not sure it is written correctly.

What you are getting back from the Win32 API is a UTF-16 string. Each code unit is 16 bits, which is why a null byte appears between the ASCII characters when you view it as a byte array.
This is the correct way to interpret that string:
length = win32gui.SendMessage(control, win32con.WM_GETTEXT, 255, buf)
result = bytes(buf[:length*2])
text = result.decode("utf-16")
Your solution manages to work with a UTF-8 decode because you are skipping over all the null bytes. That works for plain ASCII, but it will generate weird results (and possibly throw an exception) as soon as non-ASCII characters are typed into that edit control.
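To see the difference without a live window, here is a minimal sketch that decodes a buffer laid out the way WM_GETTEXT filled the one dumped above. The helper name decode_wm_gettext and the sample buffer are hypothetical; the sketch assumes the buffer holds UTF-16 little-endian code units followed by leftover bytes.

```python
def decode_wm_gettext(buf, length):
    """Decode the first `length` UTF-16 code units of a bytes-like buffer."""
    raw = bytes(buf[:length * 2])  # two bytes per UTF-16 code unit
    return raw.decode("utf-16-le")


# Hypothetical buffer: "Héllo" followed by leftover garbage bytes.
sample = bytearray("Héllo".encode("utf-16-le") + b"\x00\x00\x80\x9d")
print(decode_wm_gettext(sample, 5))
# Dropping every second byte only happens to work for ASCII text:
print(bytes(sample[:10:2]))
```

Because the slice is copied with bytes(), the helper should accept a bytearray as well as a memoryview.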

Trying to send string variable via Python socket

I'm in a CTF competition and I'm stuck on a challenge where I have to retrieve a string from a socket, reverse it and get it back. The string changes too fast to do it manually. I'm able to get the string and reverse it but am failing at sending it back. I'm pretty sure I'm either trying to do something that's not possible or am just too inexperienced at Python/sockets/etc. to kung fu my way through.
Here's my code:
import socket
aliensocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
aliensocket.connect(('localhost', 10000))
aliensocket.send('GET_KEY'.encode())
key = aliensocket.recv(1024)
truncKey = str(key)[2:16]
revKey = truncKey[::-1]
print(truncKey)
print(revKey)
aliensocket.send(bytes(revKey.encode('UTF-8')))
print(aliensocket.recv(1024))
aliensocket.close()
And here is the output:
F9SIJINIK4DF7M
M7FD4KINIJIS9F
b'Server expects key to unlock or GET_KEY to retrieve the reversed key'
key is received as a byte string. The b'' wrapped around it when printed just indicates it is a byte string. It is not part of the string. .encode() turns a Unicode string into a byte string, but you can just mark a string as a byte string by prefixing with b.
Just do:
aliensocket.send(b'GET_KEY')
key = aliensocket.recv(1024)
truncKey = key[:14]  # same 14 bytes as str(key)[2:16], without the b'' wrapper
revKey = truncKey[::-1]
print(truncKey) # or do truncKey.decode() if you don't want to see b''
print(revKey)
aliensocket.send(revKey)
# Alternatively, read one byte at a time until recv() returns an empty chunk:
data = b''
while True:
    chunk = aliensocket.recv(1)
    data += chunk
    if not chunk:
        rev = data[::-1]
        aliensocket.sendall(rev)
        break
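The point about byte strings can be checked without a server. The following is a small sketch, not the challenge's actual protocol: the helper reverse_key and the sample value (including its trailing newline) are made up for illustration.

```python
def reverse_key(key: bytes, size: int = 14) -> bytes:
    """Return the first `size` bytes of `key` in reverse order."""
    return key[:size][::-1]


received = b"F9SIJINIK4DF7M\n"   # made-up stand-in for aliensocket.recv(1024)
print(str(received)[:4])          # the b' wrapper comes from str(), not from the data
print(reverse_key(received))      # still bytes, so it can be passed to send() as-is
```

Slicing the bytes object directly avoids both the str() round trip and the later encode().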

Exclude escaped byte char from serial.read_until()

I'm writing code to communicate back and forth with a module over serial which returns specific byte values to indicate the start/end of its communication. The length of the data returned can vary as can all content between the start header and end footer.
In an ideal scenario, I'd be able to use the following code to receive all data from the module:
import serial
start = b'\x5a'
end = b'\x5b'
max_size = 1024
def get_from_serial(ser: serial.Serial) -> bytes:
    with ser:
        _ = ser.read_until(expected=start, size=max_size)
        data = ser.read_until(expected=end, size=max_size)
        return start + data
Unfortunately, there are circumstances where the data sent by the module includes bytes that match either the start or end byte values. In these instances, the module prepends an escape character to them:
valid_start = b'\x5a'
valid_end = b'\x5b'
escaped_start = b'\x5c\x5a'
escaped_end = b'\x5c\x5b'
A valid start/end byte can be preceded by ANY byte value other than an escape one:
good_result = b'\x5a\xff\x5c\x5b\xff\x5b'
bad_result = b'\x5a\xff\x5c\x5b' # missed b'\xff\x5b'
Is there a way to configure ser.read_until() to ignore any escaped instance of a start/end byte and only return when encountering a valid start/end byte?
There's probably a way to do this with a loop that checks whether data[-2:-1] == b'\x5c' each time ser.read_until() returns something, though I feel it could get complicated if the module returns multiple escaped start/end bytes scattered throughout the data.
Any thoughts or suggestions would be greatly appreciated.
Edit:
I'm starting to think this isn't actually possible from inside ser.read_until(), so I have added a check before returning the data.
import serial
start = b'\x5a'
end = b'\x5b'
escape = b'\x5c'
max_size = 1024
def get_from_serial(ser: serial.Serial) -> bytes:
    with ser:
        _ = ser.read_until(expected=start, size=max_size)
        data = ser.read_until(expected=end, size=max_size)
    packet = start + data  # the first read_until() consumed the header, so restore it
    if valid_packet(packet):
        return packet
    else:
        raise Exception("Invalid packet")
def valid_packet(packet: bytes) -> bool:
    header = packet[:1]
    footer = packet[-1:]
    escape_check = packet[-2:-1]
    valid_header = header == start
    valid_footer = footer == end
    not_escaped = escape_check != escape
    return all([
        valid_header,
        valid_footer,
        not_escaped
    ])
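One way to keep reading past escaped footers, rather than rejecting the packet, is to read byte by byte and track whether the previous byte was an escape. This is only a sketch under assumptions: read_frame is a hypothetical helper, `stream` is any object whose read(1) returns bytes (a serial.Serial instance or, as here, io.BytesIO), and an escape byte is assumed to escape whatever single byte follows it.

```python
import io

START, END, ESCAPE = b"\x5a", b"\x5b", b"\x5c"


def read_frame(stream, max_size=1024):
    """Return one frame, from START through the first unescaped END."""
    escaped = False
    while True:  # skip everything before the first unescaped START
        byte = stream.read(1)
        if not byte:
            raise ValueError("no start byte before end of stream")
        if byte == START and not escaped:
            break
        escaped = (byte == ESCAPE) and not escaped
    frame = bytearray(START)
    escaped = False
    while len(frame) < max_size:
        byte = stream.read(1)
        if not byte:  # empty read: timeout or end of data
            raise ValueError("stream ended before an unescaped end byte")
        frame += byte
        if byte == END and not escaped:
            return bytes(frame)
        escaped = (byte == ESCAPE) and not escaped
    raise ValueError("frame exceeds max_size")


good_result = b"\x5a\xff\x5c\x5b\xff\x5b"
print(read_frame(io.BytesIO(b"\x00" + good_result + b"\x99")) == good_result)
```

Reading one byte at a time is slower than read_until(), but the state stays in a single flag no matter how many escaped bytes the payload contains.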

Hex String to Image File from varbinary(max)

I have a table in a database which stores image files in a varbinary(max) column. I would like to extract, convert and save an image file. I used a cast to varchar(max) to extract it:
cast([IMG_FILE] as varchar(max))
The result of this cast looks like a hex string (I've removed part of the string to protect the person's privacy):
\
I tried this hex string in an online tool (https://codepen.io/abdhass/full/jdRNdj), and the image is displayed correctly (remember that I've cut part of the string to preserve the person's privacy).
Then I tried to convert this hex string to an image file using Python 3. I've tried a lot of things (most of them found here), but so far I couldn't save a correct file.
Saving directly doesn't generate the image.
with open(photo_path + 'file.jpg', 'wb') as new_jpg:
    new_jpg.write(hexString)
Using binascii.unhexlify returns "Non-hexadecimal digit found"
binascii.unhexlify(hexString)
Converting to int/bin returns "invalid literal for int() with base 16":
bin(int(hexString, 16))[2:]
How can I solve this problem? That is, I would like to take this hex string and save an image file on my computer.
If I have a string without \x, then I can convert every two chars to an integer value, create a bytearray and save it.
text = ''  # the hex digits, without the leading \x
integers = []
while text:
    value = int(text[:2], 16)
    integers.append(value)
    text = text[2:]
data = bytearray(integers)
with open('output.jpg', 'wb') as fh:
    print(fh.write(data))
If I have a string with \x, then the first \xff is treated as a single character's code, so I have to use ord() to convert it to an integer.
text = '\xffd8ffe000104a46494600010100000100010000fffe003b43524541544f523a2067642d6a7065672076312e3020287573696e6720494a47204a50454720763632292c207175616c697479203d2037350affdb004300080606070605080707070909080a0c140d0c0b0b0c1912130f141d1a1f1e1d1a1c1c20242e2720222c231c1c2837292c30313434341f27393d38323c2e333432ffdb0043010909090c0b0c180d0d1832211c213232323232323232323232323232323232323232323232323232323232323232323232323232323232323232323232323232ffc000110801e0018003012200021101031101ffc4001f0000010501010101010100000000000000000102030405060708090a0bffc400b5100002010303020403050504040000017d01020300041105122131410613516107227114328191a1082342b1c11552d1f02433627282090a161718191a25262728292a3435363738393a434445464748494a535455565758595a636465666768696a737475767778797a838485868788898a92939495969798999aa2a3a4a5a6a7a8a9aab2b3b4b5b6b7b8b9bac2c3c4c5c6c7c8c9cad2d3d4d5d6d7d8d9dae1e2e3e4e5e6e7e8e9eaf1f2f3f4f5f6f7f8f9faffc4001f0100030101010101010101010000000000000102030405060708090a0bffc400b51100020102040403040705040400010277000102031104052131061241510761711322328108144291a1b1c109233352f0156272d10a162434e125f11718191a262728292a35363738393a434445464748494a535455565758595a636465666768696a737475767778797a82838485868788898a92939495969798999aa2a3a4a5a6a7a8a9aab2b3b4b5b6b7b8b9bac2c3c4c5c6c7c8c9cad2d3d4d5d6d7d8d9dae2e3e4e5e6e7e8e9eaf2f3f4f5f6f7f8f9faffda000c03010002110311003f00f7676c606d18c0ea334ddf8ecbff007cd39f185e7f845338f5cd4bb8c5de40fbaa3fe034d2e78e17fef914eebd3a5340cd1d01313cc3d36aff00df229371ecabff007c8a5231d2914366a6ed0d88243fdd5ffbe452ef20636a7fdf228c61b19a1ba8a1b121379c1ca271fec8a6f9a48e113fef914ee71d6902e471473581ec024207089ff7c8a42e7b2aff00df229083d05041f5a24f99d90262190a8fba9ff7c8a6f9849190bff7c8a561d39cd23039cd2d7a0085f031b53fef914d121e8427fdf229db78dd9a611cd2d52d07a5c56705c602ff00df2290ccd9c6d4ff00be4518fce9a179343040643d36affdf229a642171b53fef914bef4d61934936210c8c31f2a7fdf229a643fdd5ffbe45291c719a0a9229df5b8fd06994f4da9ff007c8a634ac38dabff007c8a52bd293193d687a8ae233f1d17
1fee8a42e71d17fef914e6038c1cd308c8a86f41a1bbdb00617fef914d676c636aff00df229d8f7a61269a62b06ef970557fef9151ef6033b57fef914f24e0d37195c114ef71dc6bb96c7cabff007c8a432118185ffbe4538819f6a6e016a96f501a58e7eeaffdf229a5db7745ff00be4539b03a1a615ee0d1cce3a0c6ef2011b57fef914d0c7d17fef9141fad2018f5a96f415c52e76e36affdf229be6606085ffbe452b76a8ce0e0f38a2d7455f5144879e17fef9151b16c7dd5ff00be4548704719151382070c69abf526fa15dc9cfdd5ff00be4522363b2ffdf2295f205469cd26c49dc9fcc6c636afb7ca283237a2ff00df22902e7192683c526afb957d86b139e8bff7c8a371e0e17fef9a70e4649a61f41569f98842fcf45ffbe453d5b7c4f9000c0edef4c200e94e8c651f24f4feb431791e8cc01c67fba29360fc2a43dbe94801c62ba04479c0c0a403d0d49b71de82075c54bd1e8047b493d38a0549819c83499ed498c888e71da9b8e7bd4d8079a4e33d68695f4023dbc74a4c10b8c91529eb8ce69a49c63934ec85a918000e4d379ed521507bd2600e452d2d6434bb8c2a7bd21008e49a93934d2bef495c08f0318e69a4735285c0e4d371f5a56761dc6601ee69303d4d3c820f34d2bde86b5b8911ed1d3b5260f41526063bd23151f74934f60647d38a6924f7a70cd0400696c088b83de902d48401ce6a3660bd5b152ec03700714de808aad2dec68df7c71eb5564d6add7ab8cfb5351e657173a5a17cae7b9a36918f4ace4d66d9cffac03eb436b100e932fe747237b20724b7341867a66a307b66a9c7abc0cdb44c326a65ba46e323eb4493ec11922561900534a91eb8a72c8a69c5b90012452bad8a202a0fd28381c0cd48cbe9d29a411d6a795010b2827ad0063d85388f9860d29c639a1bf20b11300718a6e0edc53c8cf20d3704f4a4af14090cdb918cd46e3039353e2a365c8a399ee162948d938a8d783562551ce0557546dd424ac277e8591961405e2841814fc6053e55d4688f0aa29ac722a4007f11e69a403d2a7975125d488e3a629ea0847c7a0fe74a149e7a53c708ff004feb54a36d4773d0db181f4a074eb4a7b7d290574589131ef46294d203c53b2189ed40e99a70028153cb740348a6ed1ef4fe28346880600077a4c11df8a7e28a16c0c8f68c505401c53f6f7cf148dd695900cda09e4d211da9fb79e4d21e33536ea806119a691934fcf3487079a76ee172365c91cd211818e952718a630cf7a997604308c0c03c5340515260014d2bc668f31a23200a85e403a9a49ee1501f9b8ae6b58f11dad846c4c9f30a23173d8994d47566bdcde2c6092c063d6b95d5fc5f05aa952ca71ef5c0f883e20e4be1be5ed835e6ba9789aeaf9d88f941f7aeba54a0a3
766337397c27a36b9e3d241f29f1f8d7232f8c6e8b191676c7a66b93dcf28def2135195ddc838acf9aced1d098d1b3bc99d4ff00c26b7f21ff005ac07d6abcde30bfdf949dc7a8cd736a769c1a693939c50a524ef73a9f2f2d91d647e33d45006331c7d6ba2d1fe225ce4099d881d6bcc2aca4e513e5f94d5aa925e6651a315767d13a5f8cadaee252b2e4f7aeaec7515b98c152bcfbd7cad61ab5c5a4c1a3948f519eb5dbe87f1027b670928e3d41a9f62aa2e65a14ef17eeea7d04a463ef0a4233d49ae0b44f18c37e5433ed26bb3b6bd8e75051b7715cd38d8719296c4c579143026a45f9f9348eac0641e2a514c8f6fa7029a40031ba9f83c734854679a4eedea043823a1a69e739a94e33c1a615fad16b0fc8a9228a8d5483c1a9a5003707351e08a495d581d895464629714a99f5a083db9a6d7612b11b73d79a4cf1c715260e6908e714d30b0c1d39a545051c7b0fe74ec63922950e56407d3fad55fa01e80c3a7d293da949e9f4a4e86ba0421f4a31ef4b4829083a714734b4668630c51d28a0000d0f4013149c8a773499a5600c66931834bda8a9d980d2bcd210318079a791e949b78e4d0047b78eb49803a53f6e3bd348ed52d8c65260639a7851de98dd7da9598ba8c6031ed54e7ba11a9cb60517b7ab164ee000af35f1978a1adcb2452ede3b1ada9d273d889cf976347c4be26b7b457559b0476af14f13f8a26ba91d6397afa1aced6b5fb8bbb8649a46e7eed614a8001963b8f352deb62e8c1cfdf90af70f71018db9e725aaa84cf19a452c0100d27dd6cf5a6958d5b5bd8937793f2900d46f206ed8a473939a6d34ba99b7d828141a314c91c8477141618c629a0e0d14157d0703ed4e572ad9076fd2a3a2815d9b1a6eb773672ae18919af5ef09f8ba2b88d10cbf37a5785e48e86af699a94da75d2ca8c719e4538422fdd9688c9c1a7cd13eb0b4bd49e31b5b922ade094af29f09f8ba3bb31aac849e339af5082713a2b29e08ac2709537666b1929ec4b8ed9a4239c669e17d4d042f1827359b5a94884a1ce3148460e2a43b9bbe714c6f9693f20b6a539940622a2079c5589706a118f4fc695da0f21e9c7069eb919c522e0e3e6cd4b803a1a7b088f1914c0073cf22a5c0cf34dc00738a1ad6e0b61873b7ad2c60857ebd07f3a5dbd48a55dc11ce7b0fe7556ea80ef5bb7d29bdb14e6edf4a4ae9243b6281c503ad0681876c51db1477a3bd0213be28f6a5a08a000fa5276a5a3bd2b0076a4c714a7eb476a76b0c4f5148697bd045660371410734b9e682734b96c0467e5fa551bbb95894f356e66db19e7a5717e25d51a0b497071c75a50d7414df2ab9cd78d7c5f1d8a3471b027d735e23acf88a4d46566790e074029fe2dd5e4bbbd740f900fad73182a37706bae724a
3c9131a51937cecb25bcf21dcfe350ce36b0c396146c326020fcaa55b296460a4115cd751ea77c94a5d0a59a39abeda5caa326a06b4901c053f88a6aa45eccc7d9c8af454be4b02734d319155742e576b8ca4a52a451834c9128a28a005a01a28c7340077a706229761a530b0ed4ae8a519762ee93aa4fa65da4b1390b9f987ad7bcf84bc4d1dfc119593823a1af9e30548ae87c37aecda7dea0f30aa67a55b9732b313f755cfa962963950609a918818c571fe1bd716ea2521f7715d746448809ae59e9a3634efaa0c0e7b5464362a52bef4846062b392b2d0ab3dd95258cf7aae0f241e055bb824f7aa6793823f1a6924ee4dee4a80638a93b535146060d498c1da47e349eaf41b637b734cf518a93be0d34a91c814db698ad713660511802393e83f9d29fad3907cae4fa0fe745f41eccee58703e9494a7b7d290f15d64074a3a504d1de801314bda8a0f4a0618e28a3b518a00314514502000628ed8a5c521a560128a5e31450d0c4c519e0d29e941e10e6a5ad00ccd4a4f2e1246315e39f10b5ef22d6684100b29e6bd33c4774d0c2c41e315f3af8e6fcdddcb82c0807079a984b91f31369549282382690ccecce4e7b1a96d2d1e7942004e7d7a540ff7f6818adfd1d0aa82cd914abd4718b91d14e9ddf2a5b1a5a7e86a8a3705e7a915a83478970579ab369e4ac39cfd6a413a76af0aa55ab2773b13e5d119d269e8010462b367b34f73f856ccf3ef6212a8cae429c8c0aba72982bee667d8e0f2cef8c7b5567b18d81250afd2b48b231e7a52336480bfad752a9240a16462bd826dc8e9555ecdb38515bb28557dbc54261dc8f81c8e95b46b48538dd6a60b5b32f0460d362b7dcc41ad10e76ec75c9f5a6e571854c9f5ae8f68f617b185d3b6840205098da2905ba93f2e39ab009c95dbf8d34a283d714b999a4a104d6831ad9d0659463da9de5ee5ce0f4a91d8ecfbd9a4492458b8a57614d3bbe633a48ca37351862a4107906af5c4259039618aa2c306ba212ba386b53e591e9fe00d564dca8ec79af6eb09fcd814af422be78f025c289b637506bdf7459035aa63d2b9eaa49e88c29c3951b5b7e514c2083d6a4ec29a473cd4a7a17b15a6191554801aaeca7e5354c8c9a6a4d6c0da1e83d2a6078a8d3afb54c38031d296ec345a91f6e39a42c777b548700f069b81914df611195a910028f9f41fce9a7ae29c87f76ff41fcea40ed8f403da93bd2b76fa52576884c76a53d28a3b5020c7140eb8a0d140c38a3bf147d6814083b52e78c5277a281876a4141345020c52f6c514679a0031da9929db19a79351ceb9898fa54caf60381f165c1585f7600c57cd9e2194b6a3721c900b715f4278cc3341213d81af9d3c452017854f27359a4ddac441be666282cec3daba2d2c648504f2
2b0202049c9c0ae97492b8c1fc2b2c53b44edc3adce8208418301c83de9aca51701b8a9229422e579f5aaf24aace49c827a578cb99b3a9a4d5c8598231c3735565bb6fbbb43525c3fcdc1aafbf6c6db860f635d5082dd9297ba364726e1146429a9d8a6e0031355a396e0e3f761aa54573212ca466b592b0e1252f7424453306627e9eb434721ced6280f6a74aac17383ed512a4b2725cfd285b5ee5db5d0a8ec1414da091dea247081976edcd5d36c77161d7bd40f6eccd8ad949131e5bd9958c9c1403f1a74414fdf19a9becf818c73eb519421f6e0d55d3d8bbab0d7556caa80bef51a02b959188fa55a11c663225e0f6a3ecb1ba125d81edc53524b7314efa14278c795f23122a8739c56c4b6fb21c2f35972aed278eb5b5295cc2bd3b2b9d1f8324db7d8248e6be85f0ecb9b7418ea2be6ff0b4c22d4d3271935f4678625530a1073c56357994fc8c15ba6e74e14e3a527be69fe664f14c391ef46b6021917208aaa54038356e56238aaac79eb4d790ada8a9c0c54aa78a621ce2a618c74a76e50d861a41d6a461f2f0298471cf152ddec3e846d9dd8a7a0da8ff0041fce838cd3971b1fe83f9d0c0ecdba0fa537da95bb7d28aec2043e94bda93bd2f7a061da8068ef4628101e4d1d28a281873d293da97a0a2810518e29334b40c00a4a7629a68105046518514a7ee9a996c079ef8ce3416f2640e86be63f1044dfda92c871b43702bea1f18dbef864e07435f3278ad366aac07af22953dac42b29ea6270f90a315ada4b3f98a8188ac78ced6cf6addd3514ca8c8c49c56588d22ceea366d3674510645209e290c0eec5bb76ab36f1798b935656061f285fcebc3752cce8e5d6e658b3129da78a73d80520655beb566e008589dea3f1aa12dca93c30fceb48b94b54529ae83bcb58895db83ed51b3739e955e4bbf9b99302a27bb4f3026e24915b2a7264a945487cd2bb0c7040a8fcc60463bd33ce0723207d0d264e460d6aa36d0b7364f23055f94f5eb9a58e48c00acb93ea6a84d2beff5aad2ddc8770dd8c7a55aa4da26d6dcb525c33ccea170a0f14d424b162df8566c770fbf96a735f1070056dec9ad110a50b6a5d2a6460ac6ac44932f2d82a2b23edac1f76734f8f5173260b10b4dd295b412ab052d4d6914b9e00c5646a518598053c62b421bc8647f2c125'
integers = []
value = ord(text[0])
integers.append(value)
text = text[1:]
while text:
    value = int(text[:2], 16)
    integers.append(value)
    text = text[2:]
data = bytearray(integers)
with open('output.jpg', 'wb') as fh:
    print(fh.write(data))
Because the string in your question is incomplete, it creates an incomplete image.
But with the data from the link it creates a correct JPG file.
EDIT:
It seems you have a raw string, so \x is treated as ordinary text, not as part of the byte \xff, and you have to remove the \x at the start using text = text[2:]
text = r'\xffd8ff...'
integers = []
text = text[2:]
while text:
    value = int(text[:2], 16)
    integers.append(value)
    text = text[2:]
data = bytearray(integers)
with open('output.jpg', 'wb') as fh:
    print(fh.write(data))
EDIT:
A simpler version with the standard module codecs. It still needs to remove \x from the string.
If you have bytes:
text = b'\\xffd8ff...' # bytes
import codecs
text = text[2:] # remove `\x`
data = codecs.decode(text, 'hex_codec')
with open('output.jpg', 'wb') as fh:
fh.write(data)
If you have a string, then you first have to encode() it to bytes:
text = '\\xffd8ff...' # string
import codecs
text = text.encode() # bytes
text = text[2:] # remove `\x`
data = codecs.decode(text, 'hex_codec')
with open('output-1.jpg', 'wb') as fh:
fh.write(data)
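For comparison, the same conversion can be sketched with bytes.fromhex from the standard library. The helper name hex_text_to_bytes is made up for illustration, and it assumes the text starts with a literal backslash followed by x (two separate characters), as in the raw-string case above; the short sample is only the first four bytes of a JPEG header, not a complete image.

```python
def hex_text_to_bytes(text: str) -> bytes:
    """Strip a literal backslash-x prefix, if present, and decode the hex digits."""
    if text.startswith("\\x"):
        text = text[2:]
    return bytes.fromhex(text)


sample = r"\xffd8ffe0"             # hypothetical, truncated input
data = hex_text_to_bytes(sample)
print(data[:2] == b"\xff\xd8")     # JPEG files start with the bytes FF D8
```

The resulting bytes can then be written with open('output.jpg', 'wb') exactly as in the versions above.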

Encoding a file with ord function

I'm trying to encode a file and write the encoded output to a new file, but I got this error:
TypeError: ord() expected string of length 1, but int found
My code:
from sys import argv, exit
def encode(data):
    encoded = ''
    while data:
        current = data[0]
        count = 1
        for i in data[1:]:
            if i == current:
                count += 1
            else:
                break
            if count == 255:
                break
        encoded += '{}{}'.format(chr(ord(current) & 255), chr(count & 255)) #error occurs here.
        data = data[count:]
    return encoded
if __name__ == '__main__':
    if len(argv) < 2:
        print('Please specify input file!')
        exit(0)
    with open(argv[1], 'rb') as (f):
        data = f.read()
    with open(argv[1] + '.out', 'wb') as (f):
        f.write(encode(data))
Additional question: How do I decode the encoded file?
You are reading bytes (open(..., 'rb')), so when you take one element of the byte string, you get a byte, i.e. a number. This number already is the character code, so just leave out the ord. Alternatively, you could open the file without the b modifier (open(..., 'r')), which will return a string; I would advise keeping it as a byte string though (or you could run into encoding issues if you are parsing something non-ASCII).
You will run into a similar problem saving your file: you cannot write a string into a file opened with the b modifier. Since you have characters outside the ASCII range (>= 128), writing as a string is not a good idea, since Python will try to encode your characters (e.g. in UTF-8), and you will end up with completely different bytes. Therefore, the best solution is probably not to concatenate your data to a string in your loop (the part where you do '{}{}'.format(...)), but to build a list (encoded = [], extended with encoded.append(current) and encoded.append(count)) and convert that to a byte string using bytes(encoded) after your loop. You can then pass that to write without a problem.
As for how to decode your file, you can open the file like you do for encoding, read two bytes b1 and b2 at a time, extend your output with [b1]*b2 (again, as a list), and convert that to a byte string with bytes().
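The approach described above can be sketched as a pair of functions that work on bytes throughout. The names rle_encode and rle_decode are illustrative, and the (value, count) pair layout with a 255 cap per run is taken from the question's code.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode runs as (value, count) byte pairs, with count capped at 255."""
    encoded = bytearray()
    i = 0
    while i < len(data):
        current = data[i]  # indexing bytes yields an int, so no ord() is needed
        count = 1
        while i + count < len(data) and data[i + count] == current and count < 255:
            count += 1
        encoded += bytes((current, count))
        i += count
    return bytes(encoded)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (value, count) byte pairs back into the original data."""
    decoded = bytearray()
    for value, count in zip(encoded[0::2], encoded[1::2]):
        decoded += bytes([value]) * count
    return bytes(decoded)


original = b"aaab" + b"\xff" * 300
print(rle_decode(rle_encode(original)) == original)
```

Both functions return bytes, so the results can be written directly to a file opened with 'wb'.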

Conversion between ISO-8859-2 and UTF-8 in Python

I'm wondering how I can convert ISO-8859-2 (Latin-2) characters (I mean integer or hex values that represent ISO-8859-2 encoded characters) to UTF-8 characters.
What I need to do in my Python project:
Receive hex values from the serial port, which are characters encoded in ISO-8859-2.
Decode them, that is, get "standard" Python unicode strings from them.
Prepare and write an XML file.
Using Python 3.4.3
txt_str = "ąęłóźć"
txt_str.decode('ISO-8859-2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
The main problem is still to prepare valid input for the "decode" method (it works in Python 2.7.10, and that's the one I'm using in this project). How do I prepare a valid string from decimal values, which are Latin-2 code numbers?
Note that it would be uber complicated to receive UTF-8 characters from the serial port, because of the devices I'm using and the communication protocol's limitations.
Sample data, on request:
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069
This is some sample data: ISO-8859-2 characters packed into uint32 values, 4 chars per int.
The bit of code that manages the unpacking:
l = l[7:].replace(",", "").replace(".", "").replace("\n","").replace("\r","") # crop string from uart, only data left
vl = [l[0:2], l[2:4], l[4:6], l[6:8]] # list of bytes
vl = vl[::-1] # reverse them - now in actual order
To get integer value out of hex string I can simply use:
int_vals = [int(hs, 16) for hs in vl]
Your example doesn't work because you've tried to use a str to hold bytes. In Python 3 you must use byte strings.
In reality, if you're using PySerial then you'll be reading byte strings anyway, which you can convert as required:
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    s = ser.read(10)
    # Py3: s == bytes
    # Py2.x: s == str
    my_unicode_string = s.decode('iso-8859-2')
If your ISO-8859-2 data is then additionally encoded as an ASCII hex representation of the bytes, you have to apply an extra layer of decoding:
import codecs
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    hex_repr = ser.read(10)
    # Py3: hex_repr == bytes
    # Py2.x: hex_repr == str
    # Decodes hex representation to bytes
    # E.g. b"A3" -> b'\xa3'
    hex_decoded = codecs.decode(hex_repr, "hex")
    my_unicode_string = hex_decoded.decode('iso-8859-2')
Now you can pass my_unicode_string to your favourite XML library.
Interesting sample data. Ideally your sample data should be a direct print of the raw data received from PySerial. If you actually are receiving the raw bytes as 8-digit hexadecimal values, then:
#!python3
from binascii import unhexlify
data = b''.join(unhexlify(x)[::-1] for x in b'''\
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069'''.splitlines())
print(data.decode('iso-8859-2'))
Output:
W chuj bardzo długa nazwa jakiejś zapyziałej pipidówy, brudnej ulicyumer najgorszej rudery we wsi
Google Translate of Polish to English:
The dick very long name some zapyziałej Small Town , dirty ulicyumer worst hovel in the village
This topic is closed. Working code that handles what needs to be done:
x=177
x.to_bytes(1, byteorder='big').decode("ISO-8859-2")
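Putting the pieces from this thread together, the word-by-word unpacking might be sketched as a single function. The name decode_words is hypothetical; the sketch assumes each word is an 8-digit hex string holding four ISO-8859-2 bytes in reversed order, as in the sample data, and that trailing NUL padding in the last word should be dropped.

```python
def decode_words(words):
    """Decode 8-digit hex words, each holding 4 reversed ISO-8859-2 bytes."""
    raw = b"".join(bytes.fromhex(word)[::-1] for word in words)
    # Assumption: NUL bytes at the end are padding, not part of the text.
    return raw.decode("iso-8859-2").rstrip("\x00")


sample_words = ["68632057", "B364206F", "00000069"]  # picked from the sample data
print(decode_words(sample_words))
```

The result is an ordinary Python 3 str, so it can be handed to an XML library and written out as UTF-8.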
