Python - non-English characters don't work in one case

Although I searched both English and native-language sites, I was unable to find a solution to my problem.
I'm querying an online dictionary to get translated words, but non-English characters are displayed as e.g. x86 or x84. However, if I just do print(the_same_non-english_character), the letter is displayed properly. I use Python 3.3.2, and the HTML source of the site I extract the words from has charset=UTF-8 set.
Moreover, if I use e.g. replace("x86", "non-english_character"), nothing gets replaced, although replacing ordinary characters works.

You need to escape with a backslash (\):
In [1]: s= "\x86"
In [2]: s.replace("\x86","non-english_character")
Out[2]: 'non-english_character'
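A likely explanation, assuming the dictionary response was read as a bytes object and printed without decoding (the question does not show that code): Python 3 then prints the repr of the raw UTF-8 bytes, and replace("x86", ...) looks for the literal characters x, 8 and 6, which never occur in the data. The sketch below uses the Polish letter Ć as a stand-in for the real response; decoding once makes printing and replacing behave normally.

```python
# Stand-in for the bytes an HTTP response would return (assumption: UTF-8 page).
raw = "Ć".encode("utf-8")
print(raw)                      # b'\xc4\x86'  (the "x86" seen in the output)

# Decode once, right after reading; from here on work with str only.
text = raw.decode("utf-8")
print(text)                     # Ć

# Replacement now operates on real characters rather than escape sequences.
replaced = text.replace("Ć", "C")
print(replaced)                 # C
```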

Related

pywinauto escaping special characters

I am using type_keys() on a combobox to upload files via a file dialog. As mentioned in similar SO posts, this function omits certain special characters in the text that it actually types into the combobox. I'm resolving this by simply replacing every "specialchar" in that string with "{specialchar}". So far I've found the need to replace the following chars: + ^ % ( ).
I'm wondering where I can find the complete list of characters that require this treatment. I don't think it's this list because I'm not seeing, for example, the % symbol there. I also tried checking keyboard.py from the keyboard library but I don't know if it can be found there.
PS. I realize that instead of using type_keys(), for example, using send_keys() or set_edit_text(), the escaping of special characters might be done for me automatically. However, for various reasons, it looks like type_keys() works the best for my particular file dialog/situation.
Thanks
This is the full documentation: https://pywinauto.readthedocs.io/en/latest/code/pywinauto.keyboard.html
All special characters can be escaped by wrapping them in {}.
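The wrapping approach can be sketched as a small helper. The function name escape_for_type_keys and the exact character set are assumptions for illustration: the set covers the characters listed in the question plus ~ and the braces themselves, and should be checked against the linked documentation.

```python
# Characters assumed to need escaping: the modifiers + ^ %, the Enter
# shortcut ~, grouping parentheses, and the braces themselves.
# This set is an assumption; verify it against the pywinauto keyboard docs.
SPECIAL_CHARS = set("+^%~(){}")


def escape_for_type_keys(text):
    """Wrap every special character in braces, e.g. '+' becomes '{+}'."""
    return "".join("{" + ch + "}" if ch in SPECIAL_CHARS else ch for ch in text)


escaped = escape_for_type_keys(r"C:\data\report (50%+).txt")
print(escaped)  # C:\data\report {(}50{%}{+}{)}.txt
```

The escaped string would then be passed to type_keys(). Note that the pywinauto documentation also describes a with_spaces argument, since spaces are skipped by default.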

Django __istartswith doesn't work. Python 2.7, Django 1.8

I am working with Cyrillic text. There is a parser written in Python 2.7 which saves strings from another site into the database as unicode: [u'\u041a\u043e\u043d\u0446\u0435\u0440\u0442\u044b']
type:
<type 'lxml.etree._ElementUnicodeResult'>
In templates it is shown as normal text (Russian in this case), but search doesn't work. The Django filters return nothing for the search key even though the relevant data is present. Is this correct behaviour? What is the right solution?
I tried a Cyrillic letter, and there must be results for this search. The string is of type unicode, but the filter finds nothing even though many titles start with that letter, so something is wrong and I can't understand why. At the same time, English search keys return relevant results for English words, while Russian ones do not.
var = u'б'
print(type(var))
result = Event.objects.filter(title__istartswith=var)
EDIT: a simple exact filter actually works, but anything more complex doesn't.
EDIT: title__contains also works, so I had better rename the question so as not to get beaten. But the behaviour is pretty strange: sometimes it doesn't show anything even though the exact same title is the key. Why? And I still need the starts-with functionality, which doesn't work.
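One possible cause, assuming the database is SQLite (the question does not say which backend is used): SQLite's LIKE operator is case-insensitive only for ASCII characters by default, so a lowercase Cyrillic key does not match a title stored with an uppercase first letter, while English keys match regardless of case. The sketch below reproduces this with the standard sqlite3 module and made-up sample titles; count_like is a hypothetical helper, not Django code.

```python
import sqlite3

# In-memory database with one Cyrillic and one English title (sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (title TEXT)")
conn.executemany("INSERT INTO event VALUES (?)", [("Концерты",), ("Concerts",)])


def count_like(pattern):
    """Count titles matching a LIKE pattern, as a starts-with filter would."""
    row = conn.execute(
        "SELECT COUNT(*) FROM event WHERE title LIKE ?", (pattern,)
    ).fetchone()
    return row[0]


ascii_hits = count_like("c%")        # lowercase ASCII key vs. "Concerts"
cyrillic_hits = count_like("к%")     # lowercase Cyrillic key vs. "Концерты"
exact_case_hits = count_like("К%")   # same case as the stored title
print(ascii_hits, cyrillic_hits, exact_case_hits)
```

On a SQLite build without the ICU extension the middle count is typically 0, while the other two are 1. If this is the cause, searching with the same case as the stored data, or using a backend with full Unicode case folding, would be the things to try.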

Python3 src encodings of Emojis

I'd like to print emojis from python(3) src
I'm working on a project that analyzes Facebook Message histories and in the raw htm data file downloaded I find a lot of emojis are displayed as boxes with question marks, as happens when the value can't be displayed. If I copy paste these symbols into terminal as strings, I get values such as \U000fe328. This is also the output I'm getting when I run the htm files through BeautifulSoup and output the data.
I Googled this string (and others), and one of the only sites that consistently comes up is iemoji.com; for the above string, this page lists the string as a "Python Src". I want to be able to print these strings as their corresponding emojis (after all, they were originally emojis when sent as messages). After looking around I found a mapping of src encodings at this page, which maps strings like the one above to emoji names. I then found this list of emoji names to Unicode, which for the most part seems to map the emoji names to Unicode. If I try printing these values, I get good output, like the following:
>>> print(u'\U0001F624')
😤
Is there a way to map these "Python src" encodings to their unicode values? Chaining both libraries would work if not for the fact that the original src mapping is missing around 50% of the unicode values found in the unicode library. And if I do end up having to do that, is there a good way to find the Python Src value of a given emoji? From my testing emoji as strings equal their Unicode, such as '😤' == u'\U0001F624', but I can't seem to get any sort of relations to \U000fe328
This has nothing to do with Python. An escape like \U000fe328 just contains the hexadecimal representation of a code point, so this one is U+0FE328 (which is a private use character).
These days a lot of emoji are assigned to code points, eg. 😤 is U+01F624 — FACE WITH LOOK OF TRIUMPH.
Before these were assigned, various programs used various code points in the private use ranges to represent emoji. Facebook apparently used the private use character U+0FE328. The mapping from these code points to the standard code points is arbitrary. Some of them may not have a standard equivalent at all.
So what you have to look for is a table which tells you which of these old assignments correspond to which standard code point.
There's php-emoji on GitHub which appears to contain these mappings. But note that this is PHP code, and the characters are represented as UTF-8 (eg. the character above would be "\xf3\xbe\x8c\xa8").
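The points above can be checked from Python 3 with the standard unicodedata module. The lookup table at the end is hypothetical: its single entry is a placeholder for illustration, not a verified mapping, and a real table would have to come from a source such as the one mentioned above.

```python
import unicodedata

legacy = "\U000fe328"      # private-use code point from the question
standard = "\U0001F624"    # a standard emoji code point

print(hex(ord(legacy)))                # 0xfe328
print(unicodedata.category(legacy))    # Co (private use)
print(legacy.encode("utf-8"))          # b'\xf3\xbe\x8c\xa8'
print(unicodedata.name(standard))      # FACE WITH LOOK OF TRIUMPH

# Hypothetical table: placeholder entry only, not a verified mapping.
LEGACY_TO_STANDARD = {legacy: standard}


def to_standard(text):
    """Replace known legacy code points; leave everything else untouched."""
    return "".join(LEGACY_TO_STANDARD.get(ch, ch) for ch in text)


converted = to_standard("ok " + legacy)
```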

How to get the length of a unicode string? [duplicate]

Given a character like "✮" (\xe2\x9c\xae; it could also be others like "Σ", "д" or "Λ"), I want to find the "actual" length that character takes when printed onscreen,
for example
len("✮")
len("\xe2\x9c\xae")
both return 3, but it should be 1
You may try like this:
unicodedata.normalize('NFC', u'✮')
len(u"✮")
UTF-8 is a Unicode encoding that uses more than one byte for non-ASCII characters. Check unicodedata.normalize().
My answer to a similar question:
You are looking for the rendering width from the current output context. For graphical UIs, there is usually a method to directly query this information; for text environments, all you can do is guess what a conformant rendering engine would probably do, and hope that the actual engine matches your expectations.
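A minimal sketch of the distinction in Python 3 (an assumption, since the question appears to use Python 2, where a plain string literal holds bytes): len() on a str counts code points, len() on the encoded bytes counts bytes, and the display width can only be estimated. The rough_width helper is hypothetical and relies on unicodedata.east_asian_width as a heuristic; it is a guess at what a terminal does, not a measurement.

```python
import unicodedata

star = "✮"
print(len(star))                                # 1 code point
print(len(star.encode("utf-8")))                # 3 bytes
print(b"\xe2\x9c\xae".decode("utf-8") == star)  # True


def rough_width(text):
    """Estimate terminal columns: wide chars count 2, combining marks 0."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


print(rough_width("abc"))   # 3
print(rough_width("日本"))   # 4
```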

Problems writing a regex in testcases.xml of pylot

I have to verify that a list of strings is present in the response to a SOAP request. I am using the pylot testing tool. I know that if I use a string inside a <verify>abcd</verify> element it works fine. I have to use a regex though, and I am running into problems because I am not good with regex.
I have to verify if <TestName>Abcd Hijk</TestName> is present in my response for the request sent.
Following is my attempt to write the regex inside testcases.xml
<verify>[.TestName.][\w][./TestName.]</verify>
Is this the correct way to write a regex in testcases.xml file? I want to exactly verify the tagnames and its values mentioned above.
When I run the tool, it gives me no errors. But if I change the characters to <verify>[.TesttttName.][\w][./TestttttName.]</verify> and run the tool, it still runs without errors, although this should be a failed run since no such tags are present in the response!
Could someone please tell me what I am doing wrong in the regex here?
Any help would be appreciated. Thanks!
The regex used should be like the following.
<verify>&lt;TestName&gt;[\w\s]+&lt;/TestName&gt;</verify>
The reason is that Pylot holds the response content as text, i.e. the relevant part of the response looks like the following:
.......<TestName>ABCd Hijk</TestName>.....
When Pylot parses an element in testcases.xml, it takes the value of the element as text. It then searches for that 'verify text' in the response it received for the request.
Hence, whenever we want to verify anything in Pylot using a regex, we need to write the regex in the above format so that it gives the required results.
Note: Be careful about the format of the response received. To view the response, enable log messages in the tool or, if you want to view it on the console, edit the tool's engine.py module and insert print statements.
The raw regular expression (no XML escape). I assume you want to accept English alphabet a-zA-Z, digits 0-9, underscore _ and space characters (space, new line, carriage return, and a few others - check documentation for details).
<TestName>[\w\s]+</TestName>
You need to escape the < and > when placing the regex inside the <verify> tag:
&lt;TestName&gt;[\w\s]+&lt;/TestName&gt;
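To see why the original attempt never fails, note that [...] is a character class: [.TestName.][\w][./TestName.] matches any three characters drawn from those sets, not the literal tag. The sketch below uses Python's re module on a made-up response, on the assumption that Pylot performs an ordinary regex search over the response text, and uses xml.sax.saxutils.escape to produce the form that goes into testcases.xml.

```python
import re
from xml.sax.saxutils import escape

# Made-up sample response for illustration.
response = "<Result><TestName>Abcd Hijk</TestName></Result>"

# Misspelled attempt from the question: three character classes,
# so it still finds a match in almost any text.
loose = r"[.TesttttName.][\w][./TestttttName.]"
loose_match = re.search(loose, response)
print(loose_match is not None)          # True

# Literal tags with a value made of word characters and whitespace.
strict = r"<TestName>[\w\s]+</TestName>"
strict_match = re.search(strict, response)
print(strict_match.group(0))            # <TestName>Abcd Hijk</TestName>

# XML-escaped form to place inside the <verify> element.
xml_safe = escape(strict)
print(xml_safe)                         # &lt;TestName&gt;[\w\s]+&lt;/TestName&gt;
```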
