How to check if the first character is ñ - Django - python

I get a word from a form, and I want its slug to distinguish Ñ from N.
Using Django's slugify, the word 'Ñandu' becomes the slug 'nandu', and the word 'Nandu' also becomes 'nandu'.
So I decided that if the word starts with 'Ñ' the slug will become 'word_ene'.
The problem is I can't find a way to check if the first character from the input is really a 'ñ' (or 'ñ').
I have tried both self.palabra[0]==u"ñ" and self.palabra[0]=="ñ", with and without encoding palabra first, but I can't get it to work.
Thanks in advance.

This works for me:
>>> str = u"Ñandu"
>>> str[0] == u"\xd1"
True
>>> if str[0] == u"\xd1": print "Begins with \xd1!"
Begins with Ñ!
Watch out for case; lower case ñ is encoded as u"\xf1".
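The comparison above can be made robust against case and against decomposed input (N followed by a combining tilde, U+0303), which a form may also submit. A minimal sketch, written for Python 3 where every str is already unicode; the helper names starts_with_enye and slug_for, and the accent-stripping used as a stand-in for Django's slugify, are assumptions made for illustration:

```python
import unicodedata


def starts_with_enye(word):
    """Return True if word begins with Ñ or ñ, composed or decomposed."""
    # NFC folds 'N' + U+0303 (combining tilde) into the single code point U+00D1.
    normalized = unicodedata.normalize("NFC", word)
    return normalized[:1].lower() == "\u00f1"


def slug_for(word):
    # Hypothetical stand-in for slugify: strip accents, lowercase, keep ASCII.
    decomposed = unicodedata.normalize("NFKD", word)
    base = decomposed.encode("ascii", "ignore").decode("ascii").lower()
    return base + "_ene" if starts_with_enye(word) else base


print(slug_for("\u00d1andu"))  # nandu_ene
print(slug_for("Nandu"))       # nandu
```

Writing the character as the escape \u00f1 keeps the source file pure ASCII, so the check does not depend on a coding declaration.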

If you type things like u"ñ" directly in the code, you have to remember to put something like this (with your encoding of choice, of course):
# -*- coding: utf8 -*-
at the top of your .py file, otherwise Python doesn't know what to do.
http://www.python.org/dev/peps/pep-0263/

Related

python loop breaks on special chars in django template

So I am working with Django and an external list of names and values. With a custom template tag I am trying to display some values in my HTML template.
Here is an example of what the list could look like:
names.txt
hello 4343.5
bye 43233.4
Hëllo 554.3
whatever 4343.8
My template tag looks like this (simplified names of variables):
# -*- coding: utf-8 -*-
from django import template
register = template.Library()
@register.filter(name='operation_name')
def operation_name(member):
    with open('/pathtofile/member.txt','r') as f:
        for line in f:
            if member.member_name in line:
                number = float(line.split()[1])
                if number is not member.member_number:
                    member.member_number = number
                    member.save()
                return member.member_number
    return 'Not in List'
It works fine for entries without special characters, but it stops when a name in member.member_name contains one. So if member.member_name is Hëllo, the entire script just stops and I can't return anything. This is driving me crazy. Even names without special characters are no longer displayed once a name with special characters has occurred.
I appreciate any help, thanks in advance.
EDIT:
So this did the trick:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
But I don't know if this is a good solution.
This may help. Try decoding both to unicode before comparing:
if (member.member_name).decode('latin1') in (line).decode('latin1'):
    number = float(line.split()[1])
    # compare by value; "is not" tests identity and is almost always True for floats
    if number != member.member_number:
        member.member_number = number
        member.save()
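An alternative to sys.setdefaultencoding is to decode the file once, at the point where it is opened, so that every comparison happens between unicode strings. A minimal sketch, assuming the list is saved as UTF-8 and using a hypothetical helper find_number in place of the template filter (the Django model and the save() call are left out); io.open accepts the encoding argument on both Python 2.6+ and Python 3:

```python
import io
import os
import tempfile


def find_number(path, name):
    """Return the number listed next to name in the file at path, or None."""
    # Decoding on read means each line is unicode, so a name such as
    # u'H\xebllo' can be compared without an implicit ASCII conversion.
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[0] == name:
                return float(parts[1])
    return None


# Demo with a temporary copy of the sample list from the question.
sample = u"hello 4343.5\nbye 43233.4\nH\xebllo 554.3\nwhatever 4343.8\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
print(find_number(tmp.name, u"H\xebllo"))  # 554.3
os.remove(tmp.name)
```

Matching the first field exactly, rather than testing with in, also avoids a short name matching a line that merely contains it.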

Using unicode / umlauts in Python: Dictionary v manual input

I am using a dictionary to store some character pairs in Python (I am replacing umlaut characters). Here is what it looks like:
umlautdict={
    'ae': 'ä',
    'ue': 'ü',
    'oe': 'ö'
}
Then I run my inputwords through it like so:
for item in umlautdict.keys():
    outputword=inputword.replace(item,umlautdict[item])
But this does not do anything (no replacement happens). When I printed out my umlautdict, I saw that it looks like this:
{'ue': '\xfc', 'oe': '\xf6', 'ae': '\xc3\xa4'}
Of course that is not what I want; however, trying things like unicode() (--> Error) or pre-fixing u did not improve things.
If I type the 'ä' or 'ö' into the replace() command by hand, everything works just fine. I also changed the settings in my script (working in TextWrangler) to # -*- coding: utf-8 -*-, as it would not even let me execute the script containing umlauts without it.
So I don't get:
Why does this happen? Why and when do the umlauts change from "good to evil" when I store them in the dictionary?
How do I fix it?
Also, if anyone knows: what is a good resource to learn about encoding in Python? I have issues all the time, and so many things don't make sense to me / I can't wrap my head around them.
I'm working on a Mac in Python 2.7.10. Thanks for your help!
Converting to Unicode is done by decoding your string (assuming you're getting bytes):
data = "haer ueber loess"
word = data.decode('utf-8') # actual encoding depends on your data
Define your dict with unicode strings as well:
umlautdict={
    u'ae': u'ä',
    u'ue': u'ü',
    u'oe': u'ö'
}
Finally, print umlautdict will print some representation of that dict, usually involving escapes. That's normal; you don't have to worry about it.
Declare your coding.
Use raw format for the special characters.
Iterate properly on your string: keep the changes from each loop iteration as you head to the next.
Here's code to get the job done:
# -*- coding: utf-8 -*-
umlautdict = {
    'ae': r'ä',
    'ue': r'ü',
    'oe': r'ö'
}
print umlautdict
inputword = "haer ueber loess"
for item in umlautdict.keys():
    inputword = inputword.replace(item, umlautdict[item])
print inputword
Output:
{'ue': '\xc3\xbc', 'oe': '\xc3\xb6', 'ae': '\xc3\xa4'}
här über löss
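Two separate things go wrong in the question: the dictionary holds byte strings rather than unicode, and the loop assigns to outputword from the unchanged inputword on every pass, so only the last replacement can survive. A minimal sketch of the accumulate-as-you-go loop, written for Python 3, where string literals are already unicode and no u prefix is needed; the function name to_umlauts is made up for illustration:

```python
UMLAUTS = {"ae": "\u00e4", "ue": "\u00fc", "oe": "\u00f6"}


def to_umlauts(word, table=UMLAUTS):
    """Replace each digraph key in table with its umlaut value."""
    result = word
    for digraph, umlaut in table.items():
        # Reassign result so each replacement builds on the previous one.
        result = result.replace(digraph, umlaut)
    return result


print(to_umlauts("haer ueber loess"))  # här über löss
```

A blind replacement like this also rewrites words in which the letter pair is not an umlaut (for example the ue in "aktuell"), so real text usually needs a word list or a manual check.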

Python: Non-UTF-8 code starting with '\xe8' in file [duplicate]

I am trying to write a binary search program for a class, and I am pretty sure that my logic is right, but I keep getting a non-UTF-8 error. I have never seen this error and any help/clarification would be great! Thanks a bunch.
Here's the code.
def main():
    str names = [‘Ava Fischer’, ‘Bob White’, ‘Chris Rich’, ‘Danielle Porter’, ‘Gordon Pike’, ‘Hannah Beauregard’, ‘Matt Hoyle’, ‘Ross Harrison’, ‘Sasha Ricci’, ‘Xavier Adams’]
    binarySearch(names, input(str("Please Enter a Name.")))
    print("That name is at position "+position)
def binarySearch(array, searchedValue):
    begin = 0
    end = len(array) - 1
    position = -1
    found = False
    while !=found & begin<=end:
        middle=(begin+end)/2
        if array[middle]== searchedValue:
            found=True
            position = middle
        elif array[middle] >value:
            end=middle-1
        else:
            first =middle+1
    return position
Add this line at the top of your code. It may work.
# coding=utf8
Your editor replaced ' (ASCII 39) with U+2018 LEFT SINGLE QUOTATION MARK characters, usually a sign you used Word or a similar word processor instead of a plain text editor; a word processor tries to make your text 'prettier' and auto-replaces things like simple quotes with fancy ones. This was then saved in the Windows 1252 codepage encoding, where the fancy quotes were saved as hex 91 characters.
Python is having none of it. It wants source code saved in UTF-8 and using ' or " for quotation marks. Use Notepad, or better still, IDLE to edit your Python code instead.
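The byte-level story can be checked directly. This short sketch (Python 3) shows that byte 0x91 is the left curly quote in Windows-1252, that the same byte is not valid UTF-8, and what the character looks like when it is saved as UTF-8 instead:

```python
raw = b"\x91Ava Fischer\x92"  # bytes as a cp1252 editor would save them

# In Windows-1252, 0x91 and 0x92 are the curly single quotes.
text = raw.decode("cp1252")
print(ascii(text))  # '\u2018Ava Fischer\u2019'

# The same bytes cannot be read as UTF-8, which is what Python complains about.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False

# Saved as UTF-8, the left quote becomes three bytes instead of one.
print("\u2018".encode("utf-8"))  # b'\xe2\x80\x98'
```

Re-saving the file as UTF-8 only removes the encoding error; the curly quotes still have to be replaced with ' or " before the code will parse.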
You have numerous other errors in your code; you cannot use spaces in your variable names, for example, and Python uses and, not & as the boolean AND operator. != is an operator requiring 2 operands (it means 'not equal', the opposite of ==), the boolean NOT operator is called not.
If you're using Notepad++, click Encoding at the top and choose Encode in UTF-8.
The character you are beginning your constant strings with is not the right string delimiter. You are using
‘Ava Fischer’ # ‘ and ’ as string delimiters
when it should have been either
'Ava Fischer' # Ascii 39 as string delimiter
or maybe
"Ava Fischer" # Ascii 34 as string delimiter
Add this line to the top of your code, it might help
# -*- coding:utf-8 -*-
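For reference, here is one way the program could look once the quotes and the other errors mentioned above are fixed. This is a sketch for Python 3, not the asker's final code; the snake_case names are a stylistic choice, and the list must already be sorted for binary search to work:

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    begin = 0
    end = len(items) - 1
    while begin <= end:
        middle = (begin + end) // 2  # integer division; / would give a float
        if items[middle] == target:
            return middle
        elif items[middle] > target:
            end = middle - 1
        else:
            begin = middle + 1
    return -1


names = ['Ava Fischer', 'Bob White', 'Chris Rich', 'Danielle Porter',
         'Gordon Pike', 'Hannah Beauregard', 'Matt Hoyle', 'Ross Harrison',
         'Sasha Ricci', 'Xavier Adams']

print(binary_search(names, 'Gordon Pike'))  # 4
print(binary_search(names, 'Zoe Quinn'))    # -1
```

Returning the index from the function replaces the original's attempt to read position inside main(), where that name was never defined.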

How to search and replace utf-8 special characters in Python?

I'm a Python beginner, and I have a utf-8 problem.
I have a utf-8 string and I would like to replace all German umlauts with ASCII replacements (in German, u-umlaut 'ü' may be rewritten as 'ue').
u-umlaut has unicode code point 252, so I tried this:
>>> str = unichr(252) + 'ber'
>>> print repr(str)
u'\xfcber'
>>> print repr(str).replace(unichr(252), 'ue')
u'\xfcber'
I expected the last string to be u'ueber'.
What I ultimately want to do is replace all u-umlauts in a file with 'ue':
import sys
import codecs
f = codecs.open(sys.argv[1],encoding='utf-8')
for line in f:
    print repr(line).replace(unichr(252), 'ue')
Thanks for your help! (I'm using Python 2.3.)
I would define a dictionary of the special characters I want to map and then use the translate method:
line = 'Ich möchte die Qualität des Produkts überprüfen, bevor ich es kaufe.'
special_char_map = {ord('ä'):'ae', ord('ü'):'ue', ord('ö'):'oe', ord('ß'):'ss'}
print(line.translate(special_char_map))
You will get the following result:
Ich moechte die Qualitaet des Produkts ueberpruefen, bevor ich es kaufe.
I think it's easier and clearer to do it in a more straightforward way, using the unicode literal 'ü' directly rather than unichr(252).
>>> s = u'über'
>>> s.replace(u'ü', 'ue')
u'ueber'
There's no need to use repr, as it produces the 'Python representation' of the string; you just need to print the readable string.
You will also need to include the following line at the beginning of the .py file, if it's not already present, to declare the encoding of the file:
#-*- coding: UTF-8 -*-
Added: Of course, the declared coding must match the actual encoding of the file. Please check that, as a mismatch can cause problems (I had problems with Eclipse on Windows, for example, as it saves files as cp1252 by default). It should also match the encoding of the system, which could be utf-8, latin-1 or something else.
Also, don't use str as a variable name, as it shadows the built-in str type. You could have problems later.
(I tried this on Python 2.6; I think the result is the same in Python 2.3.)
repr(str) returns a quoted version of str, that when printed out, will be something you could type back in as Python to get the string back. So, it's a string that literally contains \xfcber, instead of a string that contains über.
You can just use str.replace(unichr(252), 'ue') to replace the ü with ue.
If you need to get a quoted version of the result of that, though I don't believe you should need it, you can wrap the entire expression in repr:
repr(str.replace(unichr(252), 'ue'))
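The difference between the string and its escaped representation is easy to see side by side. A small sketch for Python 3, where ascii() produces the same kind of escaped output that repr() gives for unicode strings on Python 2:

```python
word = "\u00fcber"  # the string über

escaped = ascii(word)       # a new string: quote, backslash, x, f, c, ...
print(escaped)              # '\xfcber'
print("\u00fc" in escaped)  # False: the escaped text contains no ü to replace

print(escaped.replace("\u00fc", "ue"))  # '\xfcber' (unchanged)
print(word.replace("\u00fc", "ue"))     # ueber
```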
You can avoid all that source-file encoding stuff and its problems. Use the Unicode names; then it's screamingly obvious what you are doing, and the code can be read and modified anywhere.
I don't know of any language where the only accented Latin letter is lower-case-u-with-umlaut-aka-diaeresis, so I've added code to loop over a table of translations under the assumption that you'll need it.
# coding: ascii
translations = (
    (u'\N{LATIN SMALL LETTER U WITH DIAERESIS}', u'ue'),
    (u'\N{LATIN SMALL LETTER O WITH DIAERESIS}', u'oe'),
    # et cetera
)
test = u'M\N{LATIN SMALL LETTER O WITH DIAERESIS}ller von M\N{LATIN SMALL LETTER U WITH DIAERESIS}nchen'
out = test
for from_str, to_str in translations:
    out = out.replace(from_str, to_str)
print out
output:
Moeller von Muenchen
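The table-driven approach also covers the original goal of rewriting a whole file. A minimal sketch for Python 3 that builds a str.translate table from Unicode names and applies it line by line; the function names are assumptions for illustration, and the input file is assumed to be UTF-8:

```python
import io

# Map each character to its ASCII replacement; \N{...} keeps the source ASCII-only.
TABLE = str.maketrans({
    "\N{LATIN SMALL LETTER A WITH DIAERESIS}": "ae",
    "\N{LATIN SMALL LETTER O WITH DIAERESIS}": "oe",
    "\N{LATIN SMALL LETTER U WITH DIAERESIS}": "ue",
    "\N{LATIN SMALL LETTER SHARP S}": "ss",
})


def transliterate(text):
    """Return text with the mapped characters replaced in a single pass."""
    return text.translate(TABLE)


def transliterate_file(path):
    """Yield each line of the UTF-8 file at path, transliterated."""
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            yield transliterate(line)


sample = ("M\N{LATIN SMALL LETTER O WITH DIAERESIS}ller von "
          "M\N{LATIN SMALL LETTER U WITH DIAERESIS}nchen")
print(transliterate(sample))  # Moeller von Muenchen
```

Upper-case umlauts are not in this table; they would need their own entries (for example mapping Ü to 'Ue').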
