Proper replacement of the QString().arg method in python3

When it comes to internationalization - using python2 and PyQt4 - the "proposed way" to format a translated string is using the QString.arg() method:
from PyQt4.QtGui import QDialog
#somewhere in a QDialog:
self.tr("string %1 %2").arg(arg1).arg(arg2)
But QString() doesn't exist in python3-PyQt4.
So my question is, what is the best way to format any translated strings in python3? Should I use the standard python method str.format() or maybe there is something more suitable?

The QString::arg method is really there as a workaround for C++'s limited string formatting support, to make sure you don't use sprintf with all the problems that entails (not handling placeholders that are in different orders in different languages, buffer overruns, etc.). Because Python doesn't have any such problems, there's no good reason to use it.
In fact, there's very little reason to ever use QString explicitly. In PyQt4, it wasn't phased out completely, but by PyQt5, it was. (Technically, PyQt4 supports "string API v2" in both Python 2.x and 3.x, but only enables it by default in 3.x; PyQt5 enables v2 by default in both, and hides the ability to switch back to v1.) See Python Strings, Qt Strings and Unicode in the documentation for some added info.
There is one uncommon, but major if it affects you, exception: If you're writing an app that's partly in Qt/C++ and partly in PyQt, you're going to have problems sharing I18N data when some of the strings are in "string %1 %2" format and others are in "string {0} {1}" format. (The first time you ship one of your files out to an outsourced translation company, they're going to get it wrong, guaranteed.)

Yes, just use standard python string formatting.
QString is gone because it's pretty much interchangeable with python's unicode strings (which are str in python3 and unicode in python2), so PyQt takes care of converting one into the other as needed.
QString being disabled isn't limited to python3, it's just the default there. You can get the same on python2 by doing this before importing anything from PyQt4:
import sip
sip.setapi('QString', 2)
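As a minimal sketch of the str.format() approach: the snippet below uses a stand-in tr() function and a made-up one-entry translation table (both hypothetical, standing in for Qt's translation machinery) to show that numbered placeholders may appear in a different order in the translated string. In a real dialog the call would presumably look like self.tr("...").format(arg1, arg2).

```python
# Hypothetical translation catalog; a real application would get this from Qt.
TRANSLATIONS = {
    "Copied {0} files to {1}": "Nach {1} wurden {0} Dateien kopiert",
}


def tr(source):
    """Stand-in for QObject.tr(): return the translation, or the source text."""
    return TRANSLATIONS.get(source, source)


# The placeholders are numbered, so the translator is free to reorder them.
message = tr("Copied {0} files to {1}").format(3, "/tmp")
untranslated = tr("Removed {0} files").format(2)

print(message)
print(untranslated)
```

Numbered placeholders do the same job as %1 and %2 in QString.arg(), except that str.format() counts from zero.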

Related

Should I include this boilerplate code in every Python script I write?

My boss asked me to put the following lines (from this answer) into a Python 3 script I wrote:
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
He says it's to prevent UnicodeEncodeErrors when printing Unicode characters in non-UTF8 locales. I am wondering whether this is really necessary, and why Python wouldn't handle encoding/decoding correctly without boilerplate code.
What is the most Pythonic way to make Python scripts compatible with different operating system locales? And what does this boilerplate code do exactly?
The answer provided here has a good excerpt from the Python mailing list regarding your question. I guess it is not necessary to do this.
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-8
If you change these, you are on your own and strange things will start
to happen. The default encoding does not only affect the translation
between Python and the outside world, but also all internal
conversions between 8-bit strings and Unicode.
Hacks like what's happening in the pango module (setting the default
encoding to 'utf-8' by reloading the site module in order to get the
sys.setdefaultencoding() API back) are just downright wrong and will
cause serious problems since Unicode objects cache their default
encoded representation.
Please don't enable the use of a locale based default encoding.
If all you want to achieve is getting the encodings of stdout and
stdin correctly setup for pipes, you should instead change the
.encoding attribute of those (only).
--
Marc-Andre Lemburg
eGenix.com
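To see what the boilerplate from the question does, here is a small sketch that applies the same codecs.getwriter("utf-8") wrapper to an in-memory byte buffer instead of the real sys.stdout, so it behaves the same on any locale. The sample text and variable names are made up for illustration. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is a shorter way to change the encoding of the real stream.

```python
import codecs
import io

text = "caf\u00e9 \u2603\n"  # contains non-ASCII characters

# What a non-UTF-8 (here: ASCII) output encoding would do with this text.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The boilerplate wraps the underlying *binary* stream in a UTF-8 writer.
# io.BytesIO stands in for sys.stdout.detach() in this sketch.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
writer.write(text)

encoded = raw.getvalue()
print(ascii_failed, encoded)
```

The writer always encodes as UTF-8 regardless of the locale, which is why the UnicodeEncodeError goes away. Whether the terminal then displays those bytes correctly is a separate matter.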

Alternatives to Qt.escape in PySide?

I am porting some code from PyQt to PySide, which includes a home-grown XML exporter. The code is peppered with lines like:
Qt.escape(textNote)
This is new to me. My PyQt book (Summerfield, 2008) writes:
The Qt.escape() function takes a QString and returns it with any XML
metacharacters properly escaped. And we...convert any
paragraph and line breaks in the notes to their Unicode equivalents.
But unfortunately for my goal of creating XML from text, escape seems to no longer be in use.
This issue is discussed at two sources I found:
http://qt-project.org/wiki/Transition_from_Qt_4.x_to_Qt5#f166611e9788f9dbdff69088d622663e
http://www.kdab.com/automated-porting-from-qt-4-to-qt-5/
Unfortunately, they both suggest using QString.toHtmlEscaped(), but this method does not seem to exist in PySide (indeed, QString is not part of PySide's lexicon).
Finally, as of four years ago, it seemed escape is not something that they intended to support in PySide, as discussed at a bug report:
After a discussion with other PySide developers, we decided not export
this function, the reasons are:
This function is part of QtGui, if we create a QtGui.Qt, this will cause some headaches with QtCore.Qt.
PyQt4 also didn't export this function.
There are functions in python std lib that you can use to achieve the same goals, like xml.sax.saxutils.escape().
So, I'll mark this bug as WONTFIX.
This seems to answer my question, but it is four years old, and I am curious if it still holds. That is, is there no PySide escape functionality, so is the best option to go to saxutils? Or perhaps is there some workaround akin to toHtmlEscaped in PySide that I've overlooked?
Every time you would have used
Qt.escape(yourText)
you can get the exact same functionality with
from xml.sax.saxutils import escape
escape(yourText)
It's a little less elegant, but it works. The PySide developers have remained consistent with their initial reaction to a question about this four years ago.
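A short sketch of the standard-library call, with made-up sample strings. By default xml.sax.saxutils.escape() replaces only &, < and >; the optional second argument is a dictionary of extra replacements, used here to escape double quotes as well.

```python
from xml.sax.saxutils import escape

text_note = 'Tom & Jerry <say> "hi"'  # sample input, made up for this sketch

# Default behaviour: only &, < and > are replaced.
basic = escape(text_note)

# Extra entities can be supplied, e.g. to also escape double quotes.
with_quotes = escape(text_note, {'"': "&quot;"})

print(basic)
print(with_quotes)
```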

Why does python logging module use old string formatting?

Python has two main string formatting options: % and str.format. The logging module formats its messages lazily:
logging.debug('The value is %s', huge_arg)
This does not construct the string if the log line is not going to be printed. However, this feature works only if the string uses old-style % formatting. Is there a way to use str.format with this lazy feature? There could be a named arg like:
logging.debug('The value is {}', fmt_arg=(huge_arg))
The only answer to your question is that - so far - nobody has volunteered to change the logging code to support the newer format feature. If you're volunteering, why not ask about it on the Python-Dev mailing list? I expect it's trickier than you realize (e.g., there may be calls to logging functions already that happen to pass a fmt_arg keyword argument). Good luck ;-)
I'm a little scared to contradict the TimBot, but I have another answer :-)
The logging module uses % formatting because it pre-dates the appearance of {}-formatting in Python (logging appeared in 2.3, str.format in 2.6). Logging has not been converted over to {}-formatting because:
You can't just switch over without breaking a lot of existing code in third party libraries and applications, so %-formatting is here to stay.
When {}-formatting arrived, it was a bit slower than %-formatting (AFAIK, it still is, though it offers more control over output), and people regard logging as an overhead as it is, never mind adding to that overhead ;-)
There is already support for {}-formatting and even $-style formatting (string.Template), as described in this post from 2010. The approach described supports logging's lazy formatting.
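One common way to get lazy {}-formatting is sketched below. It is not necessarily identical to the approach in the linked post. The message is wrapped in a small object whose __str__ does the formatting, and logging only calls str() on it when a record is actually emitted. The names BraceMessage, Expensive and brace_demo are illustrative.

```python
import io
import logging


class BraceMessage:
    """Defer str.format() until logging actually needs the text."""

    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)


class Expensive:
    """Counts how often it is converted to a string."""

    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


# Log to an in-memory stream so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("brace_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

value = Expensive()
logger.debug(BraceMessage("The value is {}", value))  # below INFO: never formatted
calls_after_debug = Expensive.calls
logger.info(BraceMessage("The value is {}", value))   # emitted: formatted once
output = stream.getvalue()
print(calls_after_debug, repr(output))
```

Existing calls that rely on %-formatting are unaffected, because the wrapper is opt-in for each call.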

Setting and getting "data" from PyQt widget items?

This is not so much a question as it is a request for an explanation. I'm following Mark Summerfield's "Rapid GUI Programming with Python and Qt", and I must've missed something because I cannot make sense of the following mechanism to link together a real "instance_item" which I am using and is full of various types of data, and a "widget_item" which represents it in a QTreeWidget model for convenience.
Setting:
widget_item.setData(0, Qt.UserRole, QVariant(long(id(instance_item))))
Getting
widget_item.data(0, Qt.UserRole).toLongLong()[0]
Stuff like toLongLong() doesn't seem "Pythonic" at all, and why are we invoking Qt.UserRole and QVariant? Are the "setData" and "data" functions part of the Qt framework, or are they more general Python functions?
There are at least 2 better solutions. In order of increasing pythonicity:
1) You don't need quite so much data type packing
widget_item.setData(0, Qt.UserRole, QVariant(instance_item))
widget_item.data(0, Qt.UserRole).toPyObject()
2) There is an alternate API to PyQt4 where QVariant is done away with, and the conversion to-from QVariant happens transparently. To enable it, you need to add the following lines before any PyQt4 import statements:
import sip
sip.setapi('QVariant', 2)
Then, your code looks like this:
widget_item.setData(0, Qt.UserRole, instance_item)
widget_item.data(0, Qt.UserRole) # original python object
Note that there is also an option sip.setapi('QString', 2) where QString is done away with, and you can use unicode instead.
All of these methods (setData(), data(), toLongLong()) are part of Qt and were originally intended to be used in C++, where they make a lot more sense. I'm not really sure what the author is trying to do here, but if you find yourself doing something terribly un-pythonic, there is probably a better way:
## The setter:
widget_item.instance_item = instance_item
## The getter:
instance_item = widget_item.instance_item
The Qt docs can't recommend this, of course, because there are no dynamic attribute assignments in C++. There are a few very specific instances when you may have to deal with QVariant and other such nonsense (for example, when dealing with databases via QtSQL), but they are quite rare.

Should I use Unicode string by default?

Is it considered good practice to pick Unicode strings over regular strings when coding in Python? I mainly work on the Windows platform, where most of the string types are Unicode these days (i.e. .NET String, '_UNICODE' turned on by default on a new C++ project, etc.). Therefore, I tend to think that non-Unicode string objects are the rare case. Anyway, I'm curious about what Python practitioners do in real-world projects.
From my practice -- use unicode.
At the beginning of one project we used ordinary strings. However, as the project grew, we implemented new features and started using new third-party libraries. In that mix of non-unicode and unicode strings, some functions started failing, and we spent time locating and fixing these problems. Conversely, some third-party modules didn't support unicode and started failing after we switched to it (but this is the exception rather than the rule).
I also have experience of needing to rewrite some third-party modules (e.g. SendKeys) because they did not support unicode. If they had been written with unicode from the beginning, it would have been better :)
So I think today we should use unicode.
P.S. All of the above is only my humble opinion :)
As you ask this question, I suppose you are using Python 2.x.
Python 3.0 changed quite a lot in string representation, and all text now is unicode.
I would go for unicode in any new project - in a way compatible with the switch to Python 3.0 (see details).
Yes, use unicode.
Some hints:
When doing input/output in any sort of binary format, decode directly after reading and encode directly before writing, so that you never need to mix strings and unicode. Mixing them tends to lead to UnicodeEncodeDecodeErrors sooner or later.
[Forget about this one, my explanations just made it even more confusing. It's only an issue when porting to Python 3, you can care about it then.]
Common Python newbie errors with Unicode (not saying you are a newbie, but this may be read by newbies): Don't confuse encode and decode. Remember, UTF-8 is an ENcoding, so you ENcode Unicode to UTF-8 and DEcode from it.
Do not fall into the temptation of setting the default encoding in Python (by setdefaultencoding in sitecustomize.py or similar) to whatever you use most. That is just going to give you problems if you reinstall or move to another computer or suddenly need to use another encoding. Be explicit.
Remember, not all of Python 2's standard library accepts unicode. If you feed a method unicode and it doesn't work, but it should, try feeding it ascii and see. Example: urllib.urlopen(), which fails with unhelpful errors if you give it a unicode object instead of a string.
Hm. That's all I can think of now!
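The "decode directly after reading, encode directly before writing" hint can be sketched as follows, using Python 3 syntax (where str is the unicode type and bytes is the byte string) and made-up sample data:

```python
# Bytes as they might arrive from a file or socket (sample data for this sketch).
raw_in = "na\u00efve caf\u00e9".encode("utf-8")

# DEcode once, right after reading: bytes -> text.
text = raw_in.decode("utf-8")

# All processing happens on text, where one character is one element.
char_count = len(text)
byte_count = len(raw_in)
title = text.title()

# ENcode once, right before writing: text -> bytes.
raw_out = title.encode("utf-8")

print(char_count, byte_count, raw_out)
```

The two lengths differ because each accented character takes two bytes in UTF-8. This is why processing the undecoded bytes as if they were text tends to go wrong.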
It can be tricky to consistently use unicode strings in Python 2.x - be it because somebody inadvertently uses the more natural str(blah) where they meant unicode(blah), forgetting the u prefix on string literals, third-party module incompatibilities - whatever. So in Python 2.x, use unicode only if you have to, and are prepared to provide good unit test coverage.
If you have the option of using Python 3.x however, you don't need to care - strings will be unicode with no extra effort.
In addition to Mihail's comment I would say: use Unicode, since it is the future. In Python 3.0, non-Unicode strings will be gone, and as far as I know, all the "u" prefixes will cause trouble, since they are also gone.
If you are dealing with severely constrained memory or disk space, use ASCII strings. In this case, you should additionally write your software in C or something even more compact :)
