python loop breaks on special chars in django template - python

I am working with Django and an external list of names and values. With a custom template tag I am trying to display some of the values in my HTML template.
Here is an example of what the list could look like:
names.txt
hello 4343.5
bye 43233.4
Hëllo 554.3
whatever 4343.8
My template tag looks like this (simplified names of variables):
# -*- coding: utf-8 -*-
from django import template
register = template.Library()
@register.filter(name='operation_name')
def operation_name(member):
    with open('/pathtofile/member.txt','r') as f:
        for line in f:
            if member.member_name in line:
                number = float(line.split()[1])
                if number is not member.member_number:
                    member.member_number = number
                    member.save()
                return member.member_number
    return 'Not in List'
It works fine for entries without special characters, but it stops when a name in member.member_name contains one. So if member.member_name is Hëllo, the entire script just stops and I can't return anything. This is driving me crazy. Even the names without special characters are no longer displayed once a name with special characters has occurred.
I appreciate any help, thanks in advance.
EDIT:
So this did the trick:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
But I don't know if this is a good solution.

This may help. Try decoding both values to unicode before comparing them:
if member.member_name.decode('latin1') in line.decode('latin1'):
    number = float(line.split()[1])
    # compare by value; "is not" would test object identity
    if number != member.member_number:
        member.member_number = number
        member.save()
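As an alternative to decoding each value by hand, here is a minimal sketch that lets the file object do the decoding. It assumes the names file is UTF-8 encoded and that Python 3 is available (the code in this thread targets Python 2). `find_number` is a hypothetical helper, not part of Django, and it compares the first column exactly instead of using a substring test, which is a design choice of this sketch.

```python
import io
import os
import tempfile


def find_number(path, name, encoding="utf-8"):
    """Return the number stored next to `name`, or None if it is absent."""
    # Decode the file explicitly so non-ASCII names compare as text.
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[0] == name:
                return float(parts[1])
    return None


# Build a small sample file that mirrors names.txt from the question.
sample = u"hello 4343.5\nbye 43233.4\nH\u00ebllo 554.3\nwhatever 4343.8\n"
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8") as f:
    f.write(sample)

found = find_number(sample_path, u"H\u00ebllo")
missing = find_number(sample_path, u"nobody")
os.remove(sample_path)
print(found, missing)
```

Because the text is decoded once at the file boundary, the comparison never mixes byte strings and unicode strings, which is what triggers the implicit ASCII conversion errors on Python 2.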

Related

String splitting in python in CSV columns

I am working with a CSV that has a many-to-one relationship, and I have two problems I need help solving. The first is that my string is set up like this:
thisismystr=thisisanemail#addy.com,blah,blah,blah, startnewCSVcol
So I need to split the string twice, once on = and once on , because I am trying to get the portion that is an e-mail address (thisisanemail#addy.com). So far I have figured out how to split the string on the = using something like this:
str = 'thisismystr=thisisanemail#addy.com,blah,blah,blah'
print str.split("=")
This returns "thisisanemail#addy.com,blah,blah,blah" as the second element, which still leaves the ,blah,blah,blah portion to be removed. After a bit of research I am stumped, as nothing explains how to remove text from the middle, only the first or the last part. Does anyone know how to do this?
For the second part, I need to do this for multiple lines, so this is more of an advice question. Is it best to plug this into a variable and loop through it (something like i = 1 for i, #endofCSV do splitcmd), or is there a more efficient way to do this? I am more familiar with Lua, and I am learning that the more I work with Python, the more it differs from Lua.
Please help. Thanks!
Does this solve your problem?
#!/usr/bin/env python
#-*- coding:utf-8 -*-
myString = 'thisismystr=thisisanemail#addy.com,blah,blah,blah'
a = myString.split('=')
b = []
for i in a:
    b.extend(i.split(','))
print b
I believe you want the email out of strings in this format: 'thisismystr=thisisanemail#addy.com,blah,blah,blah'
This is how you would do that:
str = 'thisismystr=thisisanemail#addy.com,blah,blah,blah'
email = str.split('=')[1].split(',')[0]
print email
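For the second part of the question (doing this for many lines), a plain loop or list comprehension over the rows is usually enough. The following is a minimal sketch in Python 3; `extract_email` is a hypothetical helper name, and the second sample row is made up for illustration.

```python
def extract_email(row):
    """Return the text between the first '=' and the next ','."""
    # partition() splits once and never raises if the separator is missing.
    _, _, rest = row.partition("=")
    email, _, _ = rest.partition(",")
    return email.strip()


rows = [
    "thisismystr=thisisanemail#addy.com,blah,blah,blah",
    "other=second#addy.com,foo,bar",  # hypothetical extra row
]
emails = [extract_email(row) for row in rows]
print(emails)
```

When the rows come from a file, iterating over the open file object in place of `rows` processes one line at a time without loading everything into memory.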

Writing normal texts in python file

It's just a curiosity about python. Is there a way to write anything in python files without getting any error and without using comments? It could be a macro / pre processor word / python option to ignore the lines.
For instance:
#!/usr/bin/python
# coding: utf-8
I am writing anything I want here!
def func1(number):
    print(number)
func1(3)
and the result wouldn't trigger any error, printing the number 3.
A similar C++ question: Force the compiler to ignore some lines in the program
You could use comments and possibly add a special prefix inside them if your goal is to apply a custom pre-processor. This would be similar to what the #! line at the top of your file does.
For example, in the following, I just used M: as the special prefix:
#!/usr/bin/python
# coding: utf-8
# M: I am writing anything I want here!
def func1(number):
    print(number)
func1(3)
Without comments:
Without comments, you could surround with quotes and assign to a variable:
tmp = "I am writing anything I want here!"
As such:
#!/usr/bin/python
# coding: utf-8
tmp = "I am writing anything I want here!"
def func1(number):
    print(number)
func1(3)
With comments
You can comment it out as such:
1. Multi-line comments/strings
#!/usr/bin/python
# coding: utf-8
"""I am writing anything I want here!
Yep, literally anything!
"""
def func1(number):
    print(number)
func1(3)
2. Single-line comments
#!/usr/bin/python
# coding: utf-8
#I am writing anything I want here!
def func1(number):
    print(number)
func1(3)
The easiest way to do this, although it's technically not Pythonic and should not be left in the final draft as it creates an unused string, is with triple quotes on each side. It's the simplest way to toggle medium-large portions of code. Something like:
x = 5
""" DEBUG:
x += 5
x *= 5 * 5
x = x % 3
x -= 2
"""
print x
If it's one or two lines, you can always use normal comments,
#like this
For your code, you'd want:
#!/usr/bin/python
# coding: utf-8
"""
I am writing anything I want here!
"""
def func1(number):
    print(number)
func1(3)
I just found what I was looking for.
The answer to my question is available here:
Conditional compilation in Python
https://code.google.com/p/pypreprocessor/

Replacing from dictionary - Python

I'm building a program that is able to replace characters in a message with characters the user has entered into a dictionary. Some of the characters are given in a text file. So, to import them, I used this code:
d = {}
with open("dictionary.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[str(key)] = val
It works well, except it adds "ï»¿" to the start of the dictionary. The array of to-be-replaced text is called 'words'. This is the code I have for that:
for each in d:
    words = ";".join(words)
    words = words.replace(d[each],each)
    words = words.split(";")
print words
When I hit F5, however, I get a load of gobbledygook. Here's an example:
\xef\xbb\xbf\xef\xbb\xbfA+/084&
I'm just a newbie at Python, so any help would be appreciated.
Make sure the dictionary file is saved as UTF-8 without a byte order mark (BOM).
Notepad++ (Windows) has conversion functions in case your file is not in that encoding.
The "ï»¿" pattern is what the UTF-8 BOM (the bytes \xef\xbb\xbf in your output) looks like when it is displayed as latin-1; you won't have it if the file is saved as UTF-8 without a BOM.
Then, instead of str(key), use key.encode("utf-8") to avoid other possible errors in the future.
If you want to know more, take a look at the Python documentation on this topic: http://docs.python.org/2/howto/unicode.html
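If re-saving the file is not an option, the stray \xef\xbb\xbf bytes can also be dropped while reading. The sketch below assumes Python 3 and a UTF-8 file of "key value" lines; `load_mapping` is a hypothetical helper and the sample data is invented. It relies on the standard `utf-8-sig` codec, which decodes UTF-8 and skips a leading BOM if one is present.

```python
import codecs
import io
import os
import tempfile


def load_mapping(path):
    """Read 'key value' pairs from a UTF-8 file, ignoring a leading BOM."""
    mapping = {}
    # "utf-8-sig" decodes UTF-8 and drops the byte order mark if present.
    with io.open(path, "r", encoding="utf-8-sig") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                key, val = parts
                mapping[key] = val
    return mapping


# Write a sample file that starts with a BOM, as some editors do.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + u"A +\nB /\n".encode("utf-8"))

mapping = load_mapping(sample_path)
os.remove(sample_path)
print(mapping)
```

The first key comes back as a clean "A" and not as "A" prefixed with the BOM character, so later replacements do not carry the mark into the output.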

String slugification in Python

I am looking for the best way to "slugify" a string (see what a "slug" is), and my current solution is based on this recipe.
I have changed it a little bit to:
import re
import unicodedata
s = u'String to slugify'  # normalize() needs a unicode string on Python 2
slug = unicodedata.normalize('NFKD', s)
slug = slug.encode('ascii', 'ignore').lower()
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)
Anyone see any problems with this code? It is working fine, but maybe I am missing something or you know a better way?
There is a python package named python-slugify, which does a pretty good job of slugifying:
pip install python-slugify
Works like this:
from slugify import slugify
txt = "This is a test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")
txt = "This -- is a ## test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")
txt = 'C\'est déjà l\'été.'
r = slugify(txt)
self.assertEquals(r, "cest-deja-lete")
txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
self.assertEquals(r, "nin-hao-wo-shi-zhong-guo-ren")
txt = 'Компьютер'
r = slugify(txt)
self.assertEquals(r, "kompiuter")
txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
self.assertEquals(r, "jaja-lol-mememeoo-a")
See More examples
This package does a bit more than what you posted (take a look at the source, it's just one file). The project is still active: it had been updated 2 days before I originally answered, and over nine years later (last checked 2022-03-30) it still gets updated.
Careful: there is a second package around, named slugify. If you have both of them installed, you might run into a problem, as they use the same name for import. The one just named slugify didn't handle everything in my quick check: "Ich heiße" became "ich-heie" (it should be "ich-heisse"), so be sure to pick the right one when using pip or easy_install.
Install unidecode for unicode support:
pip install unidecode
# -*- coding: utf-8 -*-
import re
import unidecode
def slugify(text):
    text = unidecode.unidecode(text).lower()
    return re.sub(r'[\W_]+', '-', text)
text = u"My custom хелло ворлд"
print slugify(text)
>>> my-custom-khello-vorld
There is a Python package named awesome-slugify:
pip install awesome-slugify
Works like this:
from slugify import slugify
slugify('one kožušček') # one-kozuscek
awesome-slugify github page
def slugify(value):
    """
    Converts to lowercase, removes non-word characters (alphanumerics and
    underscores) and converts spaces to hyphens. Also strips leading and
    trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub('[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub('[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)
This is the slugify function present in django.utils.text
This should be sufficient for your requirement.
It works well in Django, so I don't see why it wouldn't be a good general purpose slugify function.
Are you having any problems with it?
The problem is with the ASCII normalization line:
slug = unicodedata.normalize('NFKD', s)
This is Unicode normalization, which does not decompose many characters into ASCII equivalents. For example, the ASCII encoding step that follows simply strips the non-ASCII characters from these strings:
Mørdag -> mrdag
Æther -> ther
A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:
import unidecode
slug = unidecode.unidecode(s)
You get better results for the above strings and for many Greek and Russian characters too:
Mørdag -> mordag
Æther -> aether
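The NFKD behaviour described above can be checked with the standard library alone. This is a minimal sketch in Python 3; `nfkd_ascii` is a hypothetical helper name, and unidecode is deliberately not used so that the example has no third-party dependency.

```python
import unicodedata


def nfkd_ascii(text):
    """Decompose with NFKD, then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Accented letters split into base letter + combining mark and survive;
# letters such as "ø" and "Æ" have no decomposition and are dropped.
for word in (u"d\u00e9j\u00e0", u"M\u00f8rdag", u"\u00c6ther"):
    print(word, "->", nfkd_ascii(word).lower())
```

Accented letters keep their base letter because NFKD splits off the combining mark, while characters without a decomposition disappear entirely, which is the gap a transliteration library fills.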
Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one
A couple of options on GitHub:
https://github.com/dimka665/awesome-slugify
https://github.com/un33k/python-slugify
https://github.com/mozilla/unicode-slugify
Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.
In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the unicode handling differences in these slugify'ing libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.
Also note that currently awesome-slugify is GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24
You might consider changing the last line to
slug=re.sub(r'--+',r'-',slug)
since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.
But, of course, this is quite minor.
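A quick way to convince yourself that the two patterns behave the same is to run both over a few inputs. This is only a sketch in Python 3 with made-up sample strings; `collapse_a` and `collapse_b` are hypothetical names for the two variants.

```python
import re


def collapse_a(slug):
    # Original pattern: one or more hyphens become a single hyphen.
    return re.sub(r"[-]+", "-", slug)


def collapse_b(slug):
    # Suggested pattern: only runs of two or more hyphens are rewritten.
    return re.sub(r"--+", "-", slug)


samples = ["a--b", "a-b", "a----b--c", "-a-", "no hyphen"]
results = [(collapse_a(s), collapse_b(s)) for s in samples]
print(results)
```

The outputs match because replacing a single hyphen with a single hyphen changes nothing; the second pattern just skips that no-op substitution.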
Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.
Going by your example, a quick way to do that could be:
s = 'String to slugify'
slug = s.replace(" ", "-").lower()
Another nice way to create it is this:
import re
st = 'String to slugify'
re.sub(r'\W+', '-', st).strip('-').lower()

How to check if the first character is ñ - Django

I get a word from a form, and when I slugify it I want to keep certain words distinguishable.
Using Django's slugify, if I get the word 'Ñandu', the slug becomes 'nandu'. And if I get the word 'Nandu', the slug also becomes 'nandu'.
So I decided that if the word starts with 'Ñ', the slug will become 'word_ene'.
The problem is that I can't find a way to check whether the first character of the input really is a 'ñ' (or 'Ñ').
I have tried both self.palabra[0]==u"ñ" and self.palabra[0]=="ñ", with and without encoding palabra first, but I can't get it to work.
Thanks in advance.
This works for me:
>>> str = u"Ñandu"
>>> str[0] == u"\xd1"
True
>>> if str[0] == u"\xd1": print "Begins with \xd1!"
Begins with Ñ!
Watch out for case; lower case ñ is encoded as u"\xf1".
If you type things like u"ñ" directly in the code, then you have to remember to put something like this (with your encoding of choice, of course):
# -*- coding: utf8 -*-
at the top of your .py file; otherwise Python doesn't know how to interpret the non-ASCII characters in the source.
http://www.python.org/dev/peps/pep-0263/
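The check can also be written with escape sequences only, so it does not depend on the source file encoding. The following is a minimal sketch in Python 3; `starts_with_enye` is a hypothetical helper, and the NFC normalization step is an added assumption that the input might arrive as "N" followed by a combining tilde (U+0303).

```python
import unicodedata

ENYE_UPPER = u"\u00d1"  # Ñ
ENYE_LOWER = u"\u00f1"  # ñ


def starts_with_enye(word):
    """True if the first character is Ñ or ñ, in any normalization form."""
    # NFC composes "N" + combining tilde into the single character Ñ.
    composed = unicodedata.normalize("NFC", word)
    return composed[:1] in (ENYE_UPPER, ENYE_LOWER)


print(starts_with_enye(u"\u00d1andu"))
print(starts_with_enye(u"Nandu"))
```

Slicing with `[:1]` instead of indexing with `[0]` keeps the function from raising on an empty string.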
