Once again, UnicodeEncodeError (ascii codec can't encode) - python

I'm running Python 3.6 + gunicorn + Django 2.0.5 in a Docker container with a Cyrillic project, and this is what I see when I try to log Cyrillic strings to the console with Django:
'ascii' codec can't encode character '\u0410' in position 0: ordinal not in range(128)
This is also what happens in the shell:
Python 3.6.5 (default, May 3 2018, 10:08:28)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> :�ириллица
At the same time, when I run Python 3.5 outside the Docker container, everything is OK:
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> Кириллица
Any ideas how to make Python 3.6 inside Docker work correctly with Cyrillic strings?

Use # -*- coding: utf-8 -*- as the first line of your Python code.
And add this to your Dockerfile:
ENV PYTHONIOENCODING=utf-8
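To see why the output encoding matters here, a minimal sketch that reproduces the failure in-process, without Docker: it writes the character from the error message to an in-memory text stream, first with an ASCII encoding (as the container's console apparently uses) and then with UTF-8 (what PYTHONIOENCODING=utf-8 asks for on the standard streams). The helper name write_text is made up for illustration.

```python
import io


def write_text(text, encoding):
    """Write text through a text stream with the given encoding; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


text = "\u0410"  # CYRILLIC CAPITAL LETTER A, the character from the error message

# An ASCII stream cannot represent the character, so the write fails.
try:
    write_text(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii stream:", exc)

# A UTF-8 stream encodes the same character without complaint.
utf8_bytes = write_text(text, "utf-8")
print("utf-8 stream:", utf8_bytes)
```

If the first branch fails and the second succeeds, the strings themselves are fine and only the stream encoding inside the container needs to change.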

Related

bytearray to string python2 to 3

I need to get a bytearray as a string on Python 3.
On Python 2.7, str(bytearray) returns the contents of the bytearray as a string.
Python 2.7.18 (default, Feb 8 2022, 09:11:29)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> c = bytearray(b'\x80\x04\x95h\x00\x00')
>>> str(c)
'\x80\x04\x95h\x00\x00'
>>>
On Python 3.6, even the word "bytearray" is added to the resulting string.
Python 3.6.8 (default, Aug 12 2021, 07:06:15)
[GCC 8.4.1 20200928 (Red Hat 8.4.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> c = bytearray(b'\x80\x04\x95h\x00\x00')
>>> str(c)
"bytearray(b'\\x80\\x04\\x95h\\x00\\x00')"
>>>
Why does this happen on 3.6?
How can I get exactly the same behavior on 3.6 as on 2.7?
Note: I cannot do c.decode(), as this is compressed/pickled data, which results in invalid start byte errors.
Any suggestions, please?
The __str__ method of the bytearray class is different in Python 3. To obtain a similar result, you could try the snippet below.
>>> str(bytes(c))
"b'\\x80\\x04\\x95h\\x00\\x00'"
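Note that str(bytes(c)) still returns the repr, including the b'' prefix. If what is needed is a str whose characters correspond one-to-one to the original bytes (an assumption about the use case, since the question does not say how the string is consumed), decoding as Latin-1 is one option: it maps every byte value 0-255 to the code point with the same number, so it cannot hit an invalid start byte. A minimal sketch:

```python
c = bytearray(b'\x80\x04\x95h\x00\x00')

# Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
# so this decode never fails, whatever the data contains.
as_text = c.decode('latin-1')

# The conversion is lossless: encoding back gives the original bytes.
round_trip = as_text.encode('latin-1')

print(repr(as_text))
print(round_trip == bytes(c))
```

If a str is not strictly required, bytes(c) is the closest Python 3 counterpart of a Python 2 str and is what functions such as pickle.loads expect.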

disable python greeting/version info

When I start a Python interactive session from the command line, I am greeted by:
Python 3.9.6 (default, Jun 30 2021, 10:22:16)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Is there a way of disabling that message so that I go immediately to the >>> prompt?
Yes, there is a way to do so.
Type on the command line:
python -q
instead of
python
and this should do the trick.

pwntools' p32 function is weird

I'm testing on Intel x86_64, Ubuntu 64bit, Python3, Pwntools v4.3.1
$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pwn import *
>>> addr = 0xbffffb78
>>> print(p32(addr))
b'x\xfb\xff\xbf'
In my opinion, the correct packing result for 0xbffffb78 should be \x78\xfb\xff\xbf.
But why did I get b'x\xfb\xff\xbf'?
Where is \x78?
And what is the correct way of packing without using p32()?
This is just how Python renders bytes objects. If a byte can be rendered as an ASCII character, it is displayed as one.
>>> b"\x78"
b'x'
To see the bytes rendered as hex you can use the hex method of the bytes object:
>>> b'x\xfb\xff\xbf'.hex()
'78fbffbf'
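As for packing without p32(), a minimal sketch using the standard library's struct module. The '<I' format (little-endian unsigned 32-bit integer) is chosen to match the byte order shown in the question's output; the variable names are just for illustration.

```python
import struct

addr = 0xbffffb78

# '<' = little-endian, 'I' = unsigned 32-bit integer.
packed = struct.pack('<I', addr)
print(packed)        # b'x\xfb\xff\xbf' (0x78 is displayed as the ASCII letter x)
print(packed.hex())  # 78fbffbf

# Spell out every byte as a \xNN escape, including the printable ones.
escaped = ''.join('\\x{:02x}'.format(b) for b in packed)
print(escaped)       # \x78\xfb\xff\xbf
```

The bytes are identical in all three views; only the way they are displayed differs.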

How to suppress Python's start-up information?

When I type "python" and press return in the shell, the following lines come out:
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
How can I suppress these lines, please?
An easy way is to call Python as python -i -c "". This will also disable any start-up scripts, though. If you have a start-up script, you can also use python -i ~/.pythonrc.py (or however that script is named).

Make Emacs use UTF-8 with Python Interactive Mode

When I start Python from Mac OS' Terminal.app, python recognises the encoding as UTF-8:
$ python3.0
Python 3.0.1 (r301:69556, May 18 2009, 16:44:01)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
This works the same for python2.5.
But inside Emacs, the encoding is US-ASCII.
Python 3.0.1 (r301:69556, May 18 2009, 16:44:01)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'US-ASCII'
How do I make Emacs communicate with Python so that sys.stdout knows to use UTF-8?
Edit: Since I don't have the rep to edit the accepted answer, here is precisely what worked for me on Aquaemacs 1.6, Mac OS 10.5.6.
In the python-mode-hook, I added the line
(setenv "LANG" "en_GB.UTF-8")
Apparently, Mac OS requires "UTF-8", while dfa says that Ubuntu requires "UTF8".
Additionally, I had to set the input/output encoding by doing C-x RET p and then typing "utf-8" twice. I should probably find out how to set this permanently.
Thanks to dfa and Jouni for collectively helping me find the answer.
Here is my final python-mode-hook:
(add-hook 'python-mode-hook
          (lambda ()
            (set (make-variable-buffer-local 'beginning-of-defun-function)
                 'py-beginning-of-def-or-class)
            (define-key py-mode-map "\C-c\C-z" 'py-shell)
            (setq outline-regexp "def\\|class ")
            (setenv "LANG" "en_GB.UTF-8"))) ; <-- *this* line is new
Check your environment variables:
$ LANG="en_US.UTF8" python -c "import sys; print(sys.stdout.encoding)"
UTF-8
$ LANG="en_US" python -c "import sys; print(sys.stdout.encoding)"
ANSI_X3.4-1968
In your Python hook, try:
(setenv "LANG" "en_US.UTF8")
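To check whether the hook took effect, it may help to run a small diagnostic in both the Terminal session and the Emacs Python shell and compare the results. The sketch below is written for Python 3; the function name describe_encoding is hypothetical, and which variables matter can differ between platforms and Python versions.

```python
import locale
import os
import sys


def describe_encoding():
    """Collect the settings that usually determine the encoding of sys.stdout."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


info = describe_encoding()
for name in sorted(info):
    print(name, "=", info[name])
```

A difference in LANG or LC_ALL between the two sessions would point to the environment Emacs passes to the Python process, rather than to Python itself.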
