Accentuation in python: structure and for loop

I have a set filled with values taken from a JSON document. When I print the set, I get the following output:
set(['Path\xc3\xa9', 'Synergy Cin\xc3\xa9ma'])
But if I print each element in a for loop, I get the following output:
Pathé
Synergy Cinéma
Why don't I get the same encoding in both cases?

I guess you are using Python 2, and this is related to how it displays byte strings. The values stored in your set are UTF-8 encoded byte strings. Printing the set shows the repr() of each element, which escapes every non-ASCII byte (hence \xc3\xa9). Printing a single element uses its __str__, which sends the raw bytes to your terminal, and a UTF-8 terminal renders them as é.
You can check the interpreter's default encoding with sys.getdefaultencoding().
Note that in Python 3 the default encoding is UTF-8 and str is Unicode (by default "any string created (...) is stored as Unicode", according to the documentation), so you won't get exactly the same behavior. In the Python 2 snippet the two hashes are equal because 'Pathé' typed in a UTF-8 terminal is the very same byte string as b'Path\xc3\xa9'; in Python 3 the text string and the byte string are different objects with different hashes:
Python 2 :
>>> a = b'Path\xc3\xa9'
>>> a
'Path\xc3\xa9'
>>> print(a)
Pathé
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> hash('Pathé')
8776754739882320435
>>> hash(b'Path\xc3\xa9')
8776754739882320435
Python 3:
>>> a = b'Path\xc3\xa9'
>>> a
b'Path\xc3\xa9'
>>> print(a)
b'Path\xc3\xa9'
>>> print(a.decode())
Pathé
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> hash("Pathé")
1530394699459763000
>>> hash(b"Path\xc3\xa9")
1621747577200686773
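The container-versus-element difference can also be reproduced in Python 3 with byte strings. The sketch below assumes the JSON values arrive as UTF-8 encoded bytes, as in the question; the variable names are made up for illustration. Decoding once to str makes the container display readable as well.

```python
# Assumption: the values are UTF-8 encoded byte strings, as in the question.
raw_names = {b'Path\xc3\xa9', b'Synergy Cin\xc3\xa9ma'}

# Displaying a container uses repr() of each element, so non-ASCII bytes stay escaped.
container_view = repr(sorted(raw_names))

# Decoding yields text (str); repr() of str keeps printable characters as they are.
names = sorted(name.decode('utf-8') for name in raw_names)
decoded_view = repr(names)

print(container_view)  # ASCII only: the \xc3\xa9 escapes are still visible
```

Printing names, or each name in a loop, then shows the accented text directly, provided the terminal can display it.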

Related

Print all numbers as hex format in Python command line [duplicate]

I'm doing a bunch of work in the Python console, and most of it is referring to addresses, which I'd prefer to see in hex.
So if a = 0xBADF00D, when I simply enter Python> a into the console to view its value, I'd prefer python to reply with 0xBADF00D instead of 195948557.
I know I can enter '0x%X' % a to see it in hex, but I'm looking for some sort of Python console option that does this automatically. Does something like this exist? Thanks!

The regular Python interpreter will call sys.displayhook to do the actual displaying of expressions you enter. You can replace it with something that displays exactly what you want, but you have to keep in mind that it is called for all expressions the interactive interpreter wants to display:
>>> import sys
>>> 1
1
>>> "1"
'1'
>>> def display_as_hex(item):
...     if isinstance(item, (int, long)):
...         print hex(item)
...     else:
...         print repr(item)
...
>>> sys.displayhook = display_as_hex
>>> 1
0x1
>>> "1"
'1'
I suspect you'll quickly get tired of seeing all integers as hex, though, and switch to explicitly converting the ones you want to see as hex accordingly.

Building on previous answers, here's a version that works for Python 2/3, doesn't display bools as hex, and also properly sets the _ variable:
import sys
def _displayhook(o):
    if type(o).__name__ in ('int', 'long'):
        print(hex(o))
        __builtins__._ = o
    else:
        sys.__displayhook__(o)
def hexon():
    sys.displayhook = _displayhook
def hexoff():
    sys.displayhook = sys.__displayhook__
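A Python 3 only variant of the same idea might look like the sketch below. It uses type(value) is int to leave bools alone and the builtins module to set the _ variable. The names hex_displayhook and render are hypothetical; render exists only so the hook's output can be inspected without an interactive session.

```python
import builtins
import contextlib
import io
import sys


def hex_displayhook(value):
    """Show plain ints in hex; defer everything else to the default hook."""
    if type(value) is int:  # bool is a subclass of int, so it is excluded here
        builtins._ = value  # keep the interactive `_` variable working
        print(hex(value))
    else:
        sys.__displayhook__(value)


def render(value):
    """Hypothetical helper: capture what the hook would print for `value`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        hex_displayhook(value)
    return buffer.getvalue()


print(render(255), end="")
print(render(True), end="")
```

In an interactive session the hook would be installed with sys.displayhook = hex_displayhook and removed again with sys.displayhook = sys.__displayhook__.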

Something like this, perhaps?
class HexInt(int):
    "Same as int, but __repr__() uses hex"
    def __repr__(self):
        return hex(self)
So you'd use that when creating all your integers that you want to be shown as hex values.
Example:
>>> a = HexInt(12345)
>>> b = HexInt(54321)
>>> a
0x3039
>>> b
0xd431
>>> c = HexInt(a + b)
>>> c
0x1046a
Note that if you wanted to skip the explicit creation of a new HexInt when doing arithmetic operations, you'd have to override the existing int versions of methods such as __add__(), __sub__(), etc., such that they'd return HexInts.
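One possible way to do that overriding is sketched below for Python 3. The helper _wrap and the short list of operators are choices made for this illustration, not part of the answer above; a complete version would cover more of the numeric methods.

```python
class HexInt(int):
    """int whose repr() is hexadecimal and whose arithmetic stays HexInt."""

    def __repr__(self):
        return hex(self)


def _wrap(method_name):
    """Build a method that calls int's version and re-wraps int results."""
    int_method = getattr(int, method_name)

    def method(self, other):
        result = int_method(self, other)
        # NotImplemented (e.g. when `other` is a float) is passed through untouched.
        return HexInt(result) if type(result) is int else result

    method.__name__ = method_name
    return method


# Only a handful of operators, chosen for illustration.
for _name in ("__add__", "__radd__", "__sub__", "__rsub__", "__mul__", "__rmul__"):
    setattr(HexInt, _name, _wrap(_name))

print(repr(HexInt(12345) + HexInt(54321)))
```

With this in place, a + b is already a HexInt, so the explicit HexInt(a + b) from the example is no longer needed for the wrapped operators.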

Modifying the top python2 answer for python3...
def display_as_hex(item):
    if isinstance(item, int):
        print(hex(item))
    else:
        print(repr(item))
import sys
sys.displayhook = display_as_hex

You could do something like this:
while True:
    print hex(input('> '))
This gives you a basic prompt that prints the hex value of every result (in Python 2, input evaluates what you type). You could even make it conditional: check whether the value returned by input is a number; if it is, print the hex value, otherwise print the value normally.

How can you get char values from non ascii character

Hello. I have data in Python of type POINTER(wintypes.BYTE); I am using a DATA_BLOB structure:
class CREATE_DATA_BLOB(Structure):
    _fields_ = [('cbData', wintypes.DWORD), ('pbData', POINTER(wintypes.BYTE))]
I have a DLL that encrypts the data. After encryption, the data is stored in the pbData member of the DATA_BLOB structure. The problem is that some of the values in pbData are negative (pbData[0], for example, is -42). Others are between 0 and 255, which is fine, but the rest look like completely random numbers, and I can't figure out how to turn these non-ASCII numbers into characters. In C++ I use the writeFile function, passing pbData, and everything works. In Python that is not the case; I get this error when I try to write pbData to a text file:
file.write(data_out.pbData)
TypeError: write() argument must be str, not LP_c_byte
I really don't know how to fix this problem.

Listing [Python 3.Docs]: ctypes - A foreign function library for Python.
There are several problems:
wintypes.BYTE is signed ([Python.Bugs]: wrong type for wintypes.BYTE)
file.write works with Python strings (in your case) not ctypes pointers (and there's no implicit conversion between them)
Going further (this would appear after solving the other 2): you have "special" chars in your buffer. That means that you shouldn't treat it as a "normal string", but as a binary sequence (otherwise you may get encode / decode errors). As a consequence, open the file that you want to dump its contents to in binary mode (e.g.: file = open(file_name, "wb")).
>>> import ctypes as ct
>>> from ctypes import wintypes as wt
>>>
>>> class CREATE_DATA_BLOB(ct.Structure):
...     _fields_ = [
...         ("cbData", wt.DWORD),
...         ("pbData", ct.POINTER(ct.c_ubyte)),  # wt.BYTE is signed !!!
...     ]
...
>>>
>>> buf_initial = b"AB\xD6CD\xD9EF\x9CGH" # Contains the 3 chars you mentioned
>>> buf_initial
b'AB\xd6CD\xd9EF\x9cGH'
>>> # Populate the structure as it was done from C++
...
>>> blob = CREATE_DATA_BLOB(len(buf_initial), ct.cast(ct.create_string_buffer(buf_initial), ct.POINTER(ct.c_ubyte)))
>>> blob.cbData, blob.pbData
(11, <__main__.LP_c_ubyte object at 0x00000154FF6998C8>)
>>>
>>> buf_final = bytes(blob.pbData[:blob.cbData]) # Convert the pointer explicitly to Python bytes
>>> buf_final
b'AB\xd6CD\xd9EF\x9cGH'
>>> buf_initial == buf_final
True
>>>
>>> with open("q058436070_out.bin", "wb") as file:
...     file.write(buf_final)
...
11
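As an alternative to slicing the pointer, ctypes.string_at copies a given number of bytes from an address into a bytes object. The sketch below is a platform-neutral stand-in: DataBlob and its c_uint32 field are assumptions that replace the Windows-specific wintypes.DWORD, and the payload is the same sample buffer as above.

```python
import ctypes as ct


class DataBlob(ct.Structure):
    # Stand-in for the DATA_BLOB layout; c_uint32 plays the role of DWORD here.
    _fields_ = [("cbData", ct.c_uint32), ("pbData", ct.POINTER(ct.c_ubyte))]


payload = b"AB\xd6CD\xd9EF\x9cGH"
backing = ct.create_string_buffer(payload)  # keep a reference so the memory stays alive
blob = DataBlob(len(payload), ct.cast(backing, ct.POINTER(ct.c_ubyte)))

# Copy exactly cbData bytes starting at pbData into an immutable bytes object.
recovered = ct.string_at(blob.pbData, blob.cbData)

# A signed BYTE of -42 and an unsigned byte of 0xD6 are the same bit pattern.
unsigned_value = -42 & 0xFF

print(recovered == payload, hex(unsigned_value))
```

The masking line also shows where the -42 in the question comes from: it is the byte 0xD6 read through a signed type.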

Non-ASCII Python identifiers and reflectivity [duplicate]

I have learnt from PEP 3131 that non-ASCII identifiers were supported in Python, though it's not considered best practice.
However, I get this strange behaviour, where my 𝜏 identifier (U+1D70F) seems to be automatically converted to τ (U+03C4).
class Base(object):
    def __init__(self):
        self.𝜏 = 5 # defined with U+1D70F
a = Base()
print(a.𝜏) # 5 # (U+1D70F)
print(a.τ) # 5 as well # (U+03C4) ? another way to access it?
d = a.__dict__ # {'τ': 5} # (U+03C4) ? seems converted
print(d['τ']) # 5 # (U+03C4) ? consistent with the conversion
print(d['𝜏']) # KeyError: '𝜏' # (U+1D70F) ?! unexpected!
Is that expected behaviour? Why does this silent conversion occur? Does it have anything to do with NFKC normalization? I thought this was only for canonically ordering Unicode character sequences...

Per the documentation on identifiers:
All identifiers are converted into the normal form NFKC while parsing;
comparison of identifiers is based on NFKC.
You can see that U+03C4 is the appropriate result using unicodedata:
>>> import unicodedata
>>> unicodedata.normalize('NFKC', '𝜏')
'τ'
However, this conversion doesn't apply to string literals, like the one you're using as a dictionary key, hence it's looking for the unconverted character in a dictionary that only contains the converted character.
self.𝜏 = 5 # implicitly converted to "self.τ = 5"
a.𝜏 # implicitly converted to "a.τ"
d['𝜏'] # not converted
You can see similar problems with e.g. string literals used with getattr:
>>> getattr(a, '𝜏')
Traceback (most recent call last):
File "python", line 1, in <module>
AttributeError: 'Base' object has no attribute '𝜏'
>>> getattr(a, unicodedata.normalize('NFKC', '𝜏'))
5
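If attribute names come from strings regularly, the normalization can be wrapped in a small helper. This is only a sketch: normalized_getattr is a hypothetical name, and the two characters are written as escapes so the code points are unambiguous.

```python
import unicodedata

MATH_TAU = "\U0001D70F"   # MATHEMATICAL ITALIC SMALL TAU, as typed in the question
GREEK_TAU = "\u03C4"      # GREEK SMALL LETTER TAU, what the parser actually stores


def normalized_getattr(obj, name):
    """Look up `name` the way the parser would: after NFKC normalization."""
    return getattr(obj, unicodedata.normalize("NFKC", name))


class Base:
    def __init__(self):
        # Equivalent to writing the attribute with U+1D70F in source code,
        # because identifiers are NFKC-normalized while parsing.
        setattr(self, GREEK_TAU, 5)


a = Base()
print(normalized_getattr(a, MATH_TAU))
```

The same normalization step applies to keys looked up in a.__dict__.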

Python reading process memory with ctypes

I am attempting to read my player's health. I have been on a roll but have run into a problem. I am able to read what type of information is at a certain address, but I can't read the actual value. For example, here is the response I receive:
<ctypes.c_char_Array_64 object at 0x0000000002EBF9C8>
I am looking for what information is held in the c_char_Array_64 object but have no idea how I would go about it.
Here is my code:
class User:
    ctypes.wintypes.DWORD = "Entity"
    ctypes.wintypes.c_int = "Team"
    ctypes.wintypes.c_int = "Health"
    ctypes.wintypes.c_int = "Player"
    def getSelfInfo(self):
        adr1 = clientdll + dw_LocalPlayer
        adr2 = ctypes.create_string_buffer(64)
        bytes_read = ctypes.c_size_t()
        (rPM(PROCESS.handle, adr1, adr2, sys.getsizeof(ctypes.wintypes.DWORD), ctypes.byref(bytes_read)))
        print adr2
t = User()
t.getSelfInfo()

You need to get the value:
print(adr2.value)
From the docs:
If you need mutable memory blocks, ctypes has a create_string_buffer()
function which creates these in various ways. The current memory block
contents can be accessed (or changed) with the raw property; if you
want to access it as NUL terminated string, use the value property:
>>> from ctypes import *
>>> p = create_string_buffer(3) # create a 3 byte buffer, initialized to NUL bytes
>>> print sizeof(p), repr(p.raw)
3 '\x00\x00\x00'
>>> p = create_string_buffer("Hello") # create a buffer containing a NUL terminated string
>>> print sizeof(p), repr(p.raw)
6 'Hello\x00'
>>> print repr(p.value)
'Hello'
>>> p = create_string_buffer("Hello", 10) # create a 10 byte buffer
>>> print sizeof(p), repr(p.raw)
10 'Hello\x00\x00\x00\x00\x00'
>>> p.value = "Hi"
>>> print sizeof(p), repr(p.raw)
10 'Hi\x00lo\x00\x00\x00\x00\x00'
>>>
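The quoted example is written for Python 2. A Python 3 rendering of the same steps might look like this sketch, where the only real difference is that the buffer contents are bytes rather than str:

```python
import ctypes

# 10-byte buffer initialised with b"Hello" followed by NUL padding.
buf = ctypes.create_string_buffer(b"Hello", 10)

size = ctypes.sizeof(buf)
raw_before = buf.raw       # every byte in the buffer, padding included
text_before = buf.value    # bytes up to (not including) the first NUL

buf.value = b"Hi"          # overwrites the start and writes a terminating NUL
raw_after = buf.raw

print(size, raw_before, text_before, raw_after)
```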

Slicing most ctypes array types with [:] returns the equivalent Python type. So to convert your 64-byte buffer to a str (bytes in Python 3), you can do:
print adr2[:]
Mind you, that will read the full raw 64 bytes. If you want to read it as a C-style string (so the first NUL byte terminates the equivalent Python str), you'd use .value:
print adr2.value
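When the memory holds a number rather than text, the raw bytes still have to be interpreted as an integer. The Python 3 sketch below simulates the read by writing a value into the buffer with struct.pack_into; the value 100 and the little-endian 4-byte layout are assumptions for illustration, not facts about the game's memory.

```python
import ctypes
import struct

buf = ctypes.create_string_buffer(64)

# Stand-in for the memory read: pretend a 4-byte little-endian 100 was written.
struct.pack_into("<I", buf, 0, 100)

as_c_string = buf.value   # stops at the first NUL byte, so only 1 byte survives
first_four = buf[:4]      # slicing returns the raw bytes, NULs included
health = int.from_bytes(first_four, "little")

# ctypes.sizeof gives the C size of a type (4 here); sys.getsizeof would
# report the size of the Python object instead.
dword_size = ctypes.sizeof(ctypes.c_uint32)

print(as_c_string, first_four, health, dword_size)
```

This also shows why .value can look wrong for binary data: the integer's zero bytes are treated as string terminators.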

Writing a pickle.dumps output to a file

I have the following code:
some_dict = {'a':0, 'b':1}
line = "some_dict_b = %s\n" % pickle.dumps(some_dict,2)
exec(line)
decoded_dict = pickle.loads(some_dict_b)
decoded_dict == some_dict
In python 3 this code prints True. In python 2 (2.7.8) I get an error in the exec line. I know dumps returns str in 2.7 while it returns a byte-stream in 3.
I am writing a program that parses data from an input file then creates certain memory objects and should write out a python script that uses these objects. I write these objects in the script file using pickle.dumps() and inserting it into a variable declaration line as per the idea sketched above. But I need to be able to run this code in python 2.
I did notice that in python 3 the line variable gets each backslash properly escaped and a b prefix:
>>> line
"some_dict_b = b'\\x80\\x02...
while in python 2 I get:
>>> line
'some_dict_b = \x80\x02...

The Python 3 bytes type doesn't have a separate string representation, so when it is converted to a string with %s, the object representation is used instead. If you want to produce Python-compatible syntax from objects, you can use the %r formatter instead, to use the representation directly.
In Python 2:
>>> import pickle
>>> some_dict = {'a':0, 'b':1}
>>> p = pickle.dumps(some_dict, 2)
>>> print 'string: %s\nrepresentation: %r' % (p, p)
string: ?}q(UaqKUbqKu.
representation: '\x80\x02}q\x00(U\x01aq\x01K\x00U\x01bq\x02K\x01u.'
In Python 3:
>>> import pickle
>>> some_dict = {'a':0, 'b':1}
>>> p = pickle.dumps(some_dict, 2)
>>> print('string: %s\nrepresentation: %r' % (p, p))
string: b'\x80\x02}q\x00(X\x01\x00\x00\x00bq\x01K\x01X\x01\x00\x00\x00aq\x02K\x00u.'
representation: b'\x80\x02}q\x00(X\x01\x00\x00\x00bq\x01K\x01X\x01\x00\x00\x00aq\x02K\x00u.'
Object representations (the output of the repr() function, which uses the object.__repr__ special method) generally will attempt to provide you with a representation that can be pasted back into a Python script or interactive prompt to recreate the same value.
From the documentation for repr():
For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.
None of this is specific to pickle, really.
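Applied to the code in the question, the %r approach might look like the sketch below (shown here for Python 3). Running exec with an explicit namespace dict is a choice made for this illustration, so the generated variable does not leak into the surrounding module.

```python
import pickle

some_dict = {'a': 0, 'b': 1}

# %r inserts repr(), which is a valid bytes (or str) literal in source code.
line = "some_dict_b = %r\n" % pickle.dumps(some_dict, 2)

namespace = {}
exec(line, namespace)  # defines some_dict_b inside `namespace` only

decoded_dict = pickle.loads(namespace["some_dict_b"])
print(decoded_dict == some_dict)
```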

Whenever you think "I use exec", think again. You don't. Instead of evaluating data like this, store the contents of the data inside a dict itself.
Then, assign the data explicitly to the variable.
some_dict = {'a':0, 'b':1}
line = pickle.dumps(some_dict)
decoded_dict = pickle.loads(line)
decoded_dict == some_dict

You can call repr on the string or bytes object before inserting them into the line.
# Python 2
>>> 'some_dict = %s' % repr(pickle.dumps(d))
'some_dict = "(dp0\\nS\'a\'\\np1\\nI12\\nsS\'b\'\\np2\\nI24\\ns."'
# Python 3
>>> 'some_dict = %s' % repr(pickle.dumps(d))
"some_dict = b'\\x80\\x03}q\\x00(X\\x01\\x00\\x00\\x00bq\\x01K\\x18X\\x01\\x00\\x00\\x00aq\\x02K\\x0cu.'"
Or use the format method, using !r to automatically call repr:
>>> 'some_dict = {!r}'.format(pickle.dumps(d))
"some_dict = b'\\x80\\x03}q\\x00(X\\x01\\x00\\x00\\x00bq\\x01K\\x18X\\x01\\x00\\x00\\x00aq\\x02K\\x0cu.'"
(Also works in python 2)
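If the generated line only needs to be read back, not executed, ast.literal_eval can parse the literal on the right-hand side. This swaps in a different technique from the exec used in the question; the sketch assumes Python 3 and a line of the exact form name = literal, with d chosen here for illustration.

```python
import ast
import pickle

d = {'a': 12, 'b': 24}
line = 'some_dict = {!r}'.format(pickle.dumps(d))

# Split "name = literal" and parse only the literal part; no code is executed.
name, _, literal = line.partition(' = ')
payload = ast.literal_eval(literal)

restored = pickle.loads(payload)
print(name, restored == d)
```

Note that pickle.loads itself should still only be used on data from a trusted source.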
