Alphabetic characters issue - python

I have to use Turkish characters. When I run "chcp" in cmd, it reports 857.
So I tried starting my programs with:
# -*- coding: cp857 -*-
but nothing changed. I still can't see Turkish characters like "ş, İ, Ş, Ğ", etc.
Then I tried starting my programs with:
# -*- coding: cp1254 -*-
With this, I can see the Turkish characters, BUT when my program reads data from the user, the Turkish characters break again, so my program is useless. It looks like this:
name=raw_input("Please enter your name: ")
print name
--------
Please enter your name: Ayşe
A*/8e
So if I have to find the user's name in a list in my program, I can't find "Ayşe", because the program doesn't understand the Turkish characters in the input, even though it displays the Turkish characters written in the program itself. The problem only appears when I need data from the user.
It doesn't make any sense. I really want to know why this happens and how I can fix it. I tried tons of methods, and none of them work.

Have a read of How to read Unicode input and compare Unicode strings in Python?, it should help you understand why raw_input isn't reading the name as you expect.
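A minimal sketch of the idea behind that link, assuming the console really uses code page 857 as chcp reports: the bytes that raw_input returns have to be decoded with the console's encoding, not with the encoding declared for the source file. The helper name decode_console_bytes is hypothetical, and the byte string is built by hand to stand in for what the console would deliver.

```python
# -*- coding: utf-8 -*-

def decode_console_bytes(raw, console_encoding, errors="strict"):
    """Turn bytes read from the console into a unicode string."""
    return raw.decode(console_encoding, errors)

# Simulate what a cp857 console hands to the program when the user types "Ayşe".
typed = u"Ay\u015fe"
raw = typed.encode("cp857")

# Decoding with the console's code page recovers the name.
name = decode_console_bytes(raw, "cp857")

# Decoding the same bytes with another code page (here cp1254) gives a
# different string, which is why a later lookup in a list of names fails.
wrong = decode_console_bytes(raw, "cp1254", "replace")

names = [u"Ay\u015fe", u"Mehmet"]
print(name in names)
print(wrong in names)
```

In Python 2 the analogous step would be raw_input(...).decode(sys.stdin.encoding), assuming sys.stdin.encoding is set for the console in use.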

Type chcp 65001 in the console.
Right-click the cmd window and change the font to Lucida Console.
This is a duplicate of Unicode characters in Windows command line - how?
Also, you should really avoid having users type into the console; it creates unnecessary complications for both you and the user.

Related

Using unicode / umlauts in Python: Dictionary v manual input

I am using a dictionary to store some character pairs in Python (I am replacing umlaut characters). Here is what it looks like:
umlautdict={
'ae': 'ä',
'ue': 'ü',
'oe': 'ö'
}
Then I run my inputwords through it like so:
for item in umlautdict.keys():
    outputword=inputword.replace(item,umlautdict[item])
But this does not do anything (no replacement happens). When I printed out my umlautdict, I saw that it looks like this:
{'ue': '\xfc', 'oe': '\xf6', 'ae': '\xc3\xa4'}
Of course that is not what I want; however, trying things like unicode() (--> Error) or pre-fixing u did not improve things.
If I type the 'ä' or 'ö' into the replace() command by hand, everything works just fine. I also changed the settings in my script (working in TextWrangler) to # -*- coding: utf-8 -*- as it would not even let me execute the script containing umlauts without it.
So I don't get...
Why does this happen? Why and when do the umlauts change from "good
to evil" when I store them in the dictionary?
How do I fix it?
Also, if anyone knows: what is a good resource to learn about
encoding in Python? I have issues all the time and so many things
don't make sense to me / I can't wrap my head around.
I'm working on a Mac in Python 2.7.10. Thanks for your help!
Converting to Unicode is done by decoding your string (assuming you're getting bytes):
data = "haer ueber loess"
word = data.decode('utf-8') # actual encoding depends on your data
Define your dict with unicode strings as well:
umlautdict={
u'ae': u'ä',
u'ue': u'ü',
u'oe': u'ö'
}
Finally, print umlautdict will print some representation of that dict, usually involving escapes. That's normal; you don't have to worry about it.
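Putting those two pieces together might look like the sketch below. It assumes the input arrives as UTF-8 bytes; the helper name replace_umlauts is hypothetical. Each replacement is applied to the result of the previous one, so no substitution is lost between loop iterations.

```python
# -*- coding: utf-8 -*-

umlautdict = {
    u'ae': u'ä',
    u'ue': u'ü',
    u'oe': u'ö',
}

def replace_umlauts(word):
    """Apply every replacement in turn, carrying the result forward."""
    for plain, umlaut in umlautdict.items():
        word = word.replace(plain, umlaut)
    return word

data = b"haer ueber loess"     # bytes, as they might arrive from a file
word = data.decode('utf-8')    # actual encoding depends on your data
result = replace_umlauts(word)
print(result == u'h\xe4r \xfcber l\xf6ss')
```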
Declare your coding.
Use raw format for the special characters.
Iterate properly on your string: keep the changes from each loop iteration as you head to the next.
Here's code to get the job done:
# -*- coding: utf-8 -*-
umlautdict = {
'ae': r'ä',
'ue': r'ü',
'oe': r'ö'
}
print umlautdict
inputword = "haer ueber loess"
for item in umlautdict.keys():
    inputword = inputword.replace(item, umlautdict[item])
print inputword
Output:
{'ue': '\xc3\xbc', 'oe': '\xc3\xb6', 'ae': '\xc3\xa4'}
här über löss

Input: Applescript display dialog. Output: Python

I'm coding a little app that asks a question using display dialog with default answer "", takes whatever the user input is (let's say die Universität), and sends it to a Python file. Python checks the spelling of the word, translates it, spits out a display dialog with the English translation.
The problem I'm having is that Applescript is not giving Python a nice encoding. Here's my code in Applescript:
set phrase to the text returned of (display dialog "Enter German Phrase" default answer "")
set command to "python /Users/Eli/Documents/Alias\\ Scripts/gm_en.py " & phrase
do shell script command
I get the input into Python. It's breaking everything, so I'm using chardet to figure out what the encoding is. It's giving me this: {'confidence': 0.7696762680042672, 'encoding': 'ISO-8859-2'}
Not only is this pretty inaccurate, it's an encoding I can find very little about online. Trying to convert with decode('iso-8859-2') gives very strange symbols.
Any ideas?

Python UTF-8 REGEX

I have a problem while trying to find text specified in regex.
Everything worked perfectly fine, but when I added "\£" to my regex it started causing problems. I get a SyntaxError: "Non-ASCII character '\xc2' in file (...) but no encoding declared...".
I've tried to solve this problem by using
import sys
reload(sys) # to enable `setdefaultencoding` again
sys.setdefaultencoding("UTF-8")
but it doesn't help. The re.UNICODE flag doesn't help, and storing the pattern as a unicode string (pat) doesn't help either. Is there any way to fix this regex? I just want to build a regular expression that includes the pound sign. Thanks for your help.
k = text.encode('utf-8')
pat = u'salar.{1,6}?([0-9\-,\. \tkFFRroOMmTtAanNuUMm\$\&\;\£]{2,})'
pattern = re.compile(pat, flags = re.DOTALL|re.I|re.UNICODE)
salary = pattern.search(k).group(1)
print (salary)
The error is still there even if I comment out all of those lines (by putting "#" in front of them). Maybe it's not connected with the re library but with my settings?
The error message means Python cannot guess which character set you are using. It also tells you that you can fix it by telling it the encoding of your script.
# coding: utf-8
string = "£"
or equivalently
string = u"\u00a3"
Without an encoding declaration, Python sees a bunch of bytes which mean different things in different encodings. Rather than guess, it forces you to tell it what they mean. This is codified in PEP 263.
(ASCII is unambiguous [except if your system is EBCDIC I guess] so it knows what you mean if you use a pure-ASCII representation for everything.)
The encoding settings you were fiddling with affect how files and streams are read, and program I/O generally, but not how the program source is interpreted.
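A minimal sketch of a pattern that includes the pound sign, written with the \u00a3 escape so the source file stays pure ASCII. It keeps the searched text as a unicode string rather than encoding it to bytes first. The character class is simplified for illustration, and both the sample text and the helper name find_salary are hypothetical, since the question does not show its input.

```python
# -*- coding: utf-8 -*-
import re

# Simplified character class for illustration; \u00a3 is the pound sign.
PATTERN = re.compile(
    u"salar.{1,6}?([0-9,.\u00a3$ kKmM-]{2,})",
    flags=re.DOTALL | re.I | re.UNICODE,
)

def find_salary(text):
    """Return the matched salary fragment, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(1).strip() if match else None

# Hypothetical input; the question does not show the text being searched.
sample = u"Salary: \u00a345,000 per annum"
salary = find_salary(sample)
print(salary == u"\u00a345,000")
```

Returning None when nothing matches avoids the AttributeError that calling .group(1) directly on a failed search would raise.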

Python Printing in Terminal

I am sorry to post what I think may be a very basic question, but my attempts at solving this have been futile, and I can't find a useful solution that has already been suggested to similar questions on this site.
My basic issue is this: I am attempting to run a file (coding UTF-8) as a program in Mac terminal (running Python 2.7.5). This works fine when I print the results of mathematical operations, but for some reason I cannot print a simple string of characters.
I have tried running both:
# coding: utf-8
print "Hello, World."
exit()
and
# coding: utf-8
print("Hello, World.")
exit()
Both return an invalid syntax error, with the caret pointing at the first set of quotation marks that I've used. What am I missing here?
Thank you for your help!
It turned out that I needed to disable smart quotes in TextEdit.
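For a file that has already been saved with curly quotes, a small cleanup along these lines could restore the straight quotes Python expects. This is only a sketch: the helper name straighten_quotes is hypothetical, and it covers just the four most common curly quote characters.

```python
# -*- coding: utf-8 -*-

# Curly quotes that an editor's smart-quotes feature may substitute.
SMART_QUOTES = {
    u"\u201c": u'"',  # left double quotation mark
    u"\u201d": u'"',  # right double quotation mark
    u"\u2018": u"'",  # left single quotation mark
    u"\u2019": u"'",  # right single quotation mark
}

def straighten_quotes(source):
    """Replace curly quotes in source text with plain ASCII quotes."""
    for curly, straight in SMART_QUOTES.items():
        source = source.replace(curly, straight)
    return source

broken = u"print \u201cHello, World.\u201d"
fixed = straighten_quotes(broken)
print(fixed)
```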

Cannot display MySQL latin2_croatian_ci characters in Python script

I am trying to apply MySQL records in a Python script. The fields I am concerned with use latin2_croatian_ci collation characters. When I try to print out the following characters,
Karadžić
Stanišić & Simatović
Boškoski & Tarčulovski
Đorđević
Ražnatović, Željko
I get only,
Karadži?
Staniši? & Simatovi?
Boškoski & Tar?ulovski
?or?evi?
Ražnatovi?, Željko
I have tried numerous strategies from both MySQL and Python. In MySQL I have tried both CONVERT and CAST in various combinations. In Python I have tried applying the unicode() function. Nothing seems to work. In Flash, I had to set the embedding to "latin extended A" to solve a similar problem.
Any and all tips or clues would be appreciated.
Put this line as the second line of your Python script (after #!/usr/bin/python):
# -*- coding: utf-8 -*-
With UTF-8 you should be OK.
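One possible explanation for the exact pattern of question marks, offered as an assumption because the question does not show its connection settings: MySQL's latin1 character set corresponds to Windows cp1252, which contains š and ž but not ć, č, Đ or đ. Pushing the names through cp1252 loses precisely the characters that appear as "?" in the output above, while UTF-8 preserves them. The helper name through_charset is hypothetical.

```python
# -*- coding: utf-8 -*-

def through_charset(text, charset):
    """Encode text to a charset, replacing unsupported characters with '?'."""
    return text.encode(charset, "replace").decode(charset)

# "Karadžić" and "Đorđević", written with escapes to keep the source ASCII.
names = [u"Karad\u017ei\u0107", u"\u0110or\u0111evi\u0107"]

lossy = [through_charset(n, "cp1252") for n in names]
intact = [through_charset(n, "utf-8") for n in names]

print(lossy == [u"Karad\u017ei?", u"?or?evi?"])
print(intact == names)
```

If that is the cause, asking the database driver for a UTF-8 connection character set would be the thing to try; the driver in use is not stated in the question.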
