I'm getting prices in different currencies and want to display Brazilian R$
My formatting doesn't work and the display looks like this:
Price: 1.15..000.,00 R$
For good flexibility I've stored the price as a string: price=db.StringProperty(verbose_name="price")
I tried to implement my own filter and it didn't work:
{{ ad.price|separate }} R$
def separate(n, sep='.'):
    ln = list(str(n))
    ln.reverse()
    newn = []
    while len(ln) > 3:
        newn.extend(ln[:3])
        newn.append(sep)
        ln = ln[3:]
    newn.extend(ln)
    newn.reverse()
    return "".join(newn)
Can you help me? Should I just remove the filter? Should I validate the input with a regex instead? A link to my site is http://www.koolbusiness.com/servead/4252196
UPDATE: I'm considering using something like one of these filters:
import locale
locale.setlocale(locale.LC_ALL, '')
def currency(value): # doesn't work
    locale.setlocale(locale.LC_ALL, '')
    return locale.currency(value, grouping=True)
register.filter(currency)
def currencyWithoutUsingLocale(value): # needs adjustment
    value=float(value)
    symbol = '$'
    thousand_sep = ''
    decimal_sep = ''
    # try to use settings if set
    try:
        symbol = settings.CURRENCY_SYMBOL
    except AttributeError:
        pass
    try:
        thousand_sep = settings.THOUSAND_SEPARATOR
        decimal_sep = settings.DECIMAL_SEPARATOR
    except AttributeError:
        thousand_sep = ','
        decimal_sep = '.'
    intstr = str(int(value))
    f = lambda x, n, acc=[]: f(x[:-n], n, [(x[-n:])]+acc) if x else acc
    intpart = thousand_sep.join(f(intstr, 3))
    return "%s%s%s%s" % (symbol, intpart, decimal_sep, ("%0.2f" % value)[-2:])
register.filter(currencyWithoutUsingLocale)
Storing the price as a string is the first problem. It should be a Decimal. The Python standard library documentation for Decimal has a recipes section: http://docs.python.org/library/decimal.html#recipes
The moneyfmt recipe there should do what you want.
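To illustrate the Decimal approach, here is a minimal sketch. It is not the moneyfmt recipe itself; the function name format_brl and the choice of putting the symbol first are assumptions made for this example. It relies only on Decimal's built-in support for the "," thousands option in format specifications and then swaps the two separators.

```python
from decimal import Decimal, ROUND_HALF_UP

def format_brl(amount, symbol="R$"):
    """Format an amount Brazilian style: '.' for thousands, ',' for decimals.

    format_brl is a hypothetical helper, not part of any library.
    """
    # Convert via str so that floats and strings are both accepted.
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Decimal understands the ',' grouping option: 1150000 -> '1,150,000.00'
    text = "{:,.2f}".format(value)
    # Swap the two separators in one pass.
    text = text.translate(str.maketrans(",.", ".,"))
    return "{} {}".format(symbol, text)

print(format_brl("1150000"))
print(format_brl("1234.5"))
```

Because the value never passes through the template as a pre-formatted string, the separators cannot be applied twice.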
Currency formatting is a feature of locale.
http://docs.python.org/library/locale.html#locale.currency
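You can use the locale._override_localeconv dict to set your own overrides (the leading underscore marks it as a private, undocumented attribute, so it may change between Python versions):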
>>> import locale
>>> locale.setlocale(locale.LC_MONETARY, 'pt_BR.UTF-8')
>>> locale.currency(1234.56)
'1234,56 R$'
>>> locale._override_localeconv.update({'p_cs_precedes': 1, 'n_cs_precedes': 1})
>>> locale.currency(1234.56)
'R$ 1234,56'
You can put the locale._override_localeconv.update({'p_cs_precedes': 1, 'n_cs_precedes': 1}) line right after import locale, if you want.
import urllib.request
import json
from collections import Counter
def count_coauthors(author_id):
    coauthors_dict = {}
    url_str = ('https://api.semanticscholar.org/graph/v1/author/47490276?fields=name,papers.authors')
    respons = urllib.request.urlopen(url_str)
    text = respons.read().decode()
    for line in respons:
        print(line.decode().rstip())
    data = json.loads(text)
    print(type(data))
    print(list(data.keys()))
    print(data["name"])
    print(data["authorId"])
    name = []
    for lines in data["papers"]:
        for authors in lines["authors"]:
            name.append(authors.get("name"))
    print(name)
    count = dict()
    names = name
    for i in names:
        if i not in count:
            count[i] = 1
        else:
            count[i] += 1
    print(count)
    c = Counter(count)
    top = c.most_common(10)
    print(top)
    return coauthors_dict
author_id = '47490276'
cc = count_coauthors(author_id)
top_coauthors = sorted(cc.items(), key=lambda item: item[1], reverse=True)
for co_author in top_coauthors[:10]:
    print(co_author)
This is how my code looks so far; it runs without errors. I need to get rid of the rest of the printed text when I run it, so that the output looks like this:
('Diego Calvanese', 47)
('D. Lanti', 28)
('Martín Rezk', 21)
('Elem Güzel Kalayci', 18)
('B. Cogrel', 17)
('E. Botoeva', 16)
('E. Kharlamov', 16)
('I. Horrocks', 12)
('S. Brandt', 11)
('V. Ryzhikov', 11)
I have tried using rstrip and split on my 'c' variable, but it doesn't work. I'm only allowed to import what I have already imported, and I must use the link that is included.
Tips on simplifying or improving the code are also appreciated!
("Extend the program below so that it prints the names of the top-10 coauthors together with the numbers of the coauthored publications")
From what I understand you are not quite sure where your successful output originates from. It is not the 5 lines at the end.
Your result is printed by the print(top) call just before the return statement. This top variable is what you want to return from the function, as the coauthors_dict you are currently returning never actually gets any data written to it.
You will also have to slightly adjust your sorted(...) as you now have a list and not a dictionary, but you should then get the correct result.
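As a rough sketch of that change, the counting and the top-N selection can both be left to Counter.most_common, which already returns a list of (name, count) tuples sorted by count. The sample names and the function name top_coauthors below are made up for illustration; in the real program the list would come from data["papers"].

```python
from collections import Counter

# Hypothetical sample standing in for the names collected from data["papers"].
names = ["Ana", "Bo", "Ana", "Cy", "Bo", "Ana"]

def top_coauthors(names, n=10):
    # most_common(n) returns the n most frequent items as (item, count) tuples,
    # already sorted from most to least frequent.
    return Counter(names).most_common(n)

for pair in top_coauthors(names, n=2):
    print(pair)
```

With a list returned, the caller no longer needs .items() or its own sorted(...) call.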
If I understand correctly, you want this function to return a count of each distinct co-author (excluding the author). You already have that in your count variable, but you don't return it; the variable you DO return is empty.
Instead consider:
import urllib.request
import json
from collections import Counter
def count_coauthors(author_id):
    url_str = (f'https://api.semanticscholar.org/graph/v1/author/{author_id}?fields=name,papers.authors')
    response = urllib.request.urlopen(url_str)
    text = response.read().decode()
    data = json.loads(text)
    names = [a.get("name") for l in data["papers"] for a in l["authors"] if a['authorId'] != author_id]
    #The statement above can be written long-hand like:
    #names=[]
    #for l in data["papers"]:
    #    for a in l["authors"]:
    #        if a['authorId'] != author_id:
    #            names.append(a.get("name"))
    return list(Counter(names).items())
author_id = '47490276'
cc = count_coauthors(author_id)
top_coauthors = sorted(cc, key=lambda item: item[1], reverse=True)
for co_author in top_coauthors[:10]:
    print(co_author)
('Diego Calvanese', 47)
('D. Lanti', 28)
('Martín Rezk', 21)
('Elem Güzel Kalayci', 18)
('B. Cogrel', 17)
('E. Botoeva', 16)
('E. Kharlamov', 16)
('I. Horrocks', 12)
('S. Brandt', 11)
('V. Ryzhikov', 11)
You might also consider moving the top-N logic into the function as an optional parameter:
import urllib.request
import json
from collections import Counter
def count_coauthors(author_id, top=0):
    url_str = (f'https://api.semanticscholar.org/graph/v1/author/{author_id}?fields=name,papers.authors')
    response = urllib.request.urlopen(url_str)
    text = response.read().decode()
    data = json.loads(text)
    names = [a.get("name") for l in data["papers"] for a in l["authors"] if a['authorId'] != author_id]
    name_count = list(Counter(names).items())
    top = top if top!=0 else len(name_count)
    return sorted(name_count, key=lambda x: x[1], reverse=True)[:top]
author_id = '47490276'
for auth in count_coauthors(author_id, top=10):
    print(auth)
Using a Korean Input Method Editor (IME), it's possible to type 버리 + 어 and it will automatically become 버려.
Is there a way to programmatically do that in Python?
>>> x, y = '버리', '어'
>>> z = '버려'
>>> ord(z[-1])
47140
>>> ord(x[-1]), ord(y)
(47532, 50612)
Is there a way to compute that 47532 + 50612 -> 47140?
Here are some more examples:
가보 + 아 -> 가봐
끝나 + ㄹ -> 끝날
I'm Korean. First, if you type 버리 + 어, it becomes 버리어, not 버려. 버려 is a contraction of 버리어 and is not generated automatically. For the same reason, 가보아 does not automatically become 가봐 while typing.
Second, by contrast, 끝나 + ㄹ becomes 끝날 because 나 has no jongseong (종성). Note that one Hangul character is made of a choseong (초성), a jungseong (중성), and a jongseong. The choseong and jongseong are consonants and the jungseong is a vowel. See more at Wikipedia. So only when the last character has no jongseong while typing (as in 끝나) can it take one (ㄹ).
If you want to turn 버리 + 어 into 버려, you have to implement some Korean grammar, in this case the contraction of jungseong: for example ㅣ + ㅓ = ㅕ and ㅗ + ㅏ = ㅘ, as in your examples. 한글 맞춤법, chapter 4, section 5 (I can't find English pages right now) defines contractions like these. It's possible, but not an easy job, especially for non-Koreans.
Next, if all you want is to turn 끝나 + ㄹ into 끝날, it is a relatively easy job, since there are libraries that handle composition and decomposition of choseong, jungseong, and jongseong. For Python, I found hgtk. You can try something like this (not practical code):
import hgtk

# hgtk methods take one character at a time
cjj1 = hgtk.letter.decompose('나') # ('ㄴ', 'ㅏ', '')
cjj2 = hgtk.letter.decompose('ㄹ') # ('ㄹ', '', '')
# Merge only when the first character has no jongseong and the second is a bare consonant
if cjj1[2] == '' and cjj2[1] == '':
    cjj = (cjj1[0], cjj1[1], cjj2[0])
    cjj2 = None
Still, without proper knowledge of Hangul, it will be very hard to get it done.
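For the arithmetic part of the question, the code point of a precomposed Hangul syllable follows the Unicode formula 0xAC00 + (choseong * 21 + jungseong) * 28 + jongseong, so the pieces can be taken apart and recombined without any library. The following is only a sketch under stated assumptions: the function names are invented for this example, and the CONTRACTIONS table covers just the two vowel pairs mentioned above, not Korean contraction rules in general.

```python
S_BASE = 0xAC00                      # code point of the first syllable, 가
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 jungseong in Unicode order
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
SILENT_INITIAL = 11                  # index of the choseong ㅇ
# Assumption: only the two contractions from the examples are handled.
CONTRACTIONS = {("ㅣ", "ㅓ"): "ㅕ", ("ㅗ", "ㅏ"): "ㅘ"}

def is_syllable(ch):
    return len(ch) == 1 and 0xAC00 <= ord(ch) <= 0xD7A3

def split_syllable(ch):
    """Return (choseong, jungseong, jongseong) indices of one syllable."""
    index = ord(ch) - S_BASE
    return index // (21 * 28), (index % (21 * 28)) // 28, index % 28

def join_syllable(lead, vowel, final=0):
    return chr(S_BASE + (lead * 21 + vowel) * 28 + final)

def attach(stem, suffix):
    """Attach suffix to stem, merging into the last syllable where possible."""
    lead, vowel, final = split_syllable(stem[-1])
    if final == 0 and suffix and suffix in FINALS:
        # Bare consonant: it becomes the jongseong of the last syllable.
        return stem[:-1] + join_syllable(lead, vowel, FINALS.index(suffix))
    if final == 0 and is_syllable(suffix):
        s_lead, s_vowel, s_final = split_syllable(suffix)
        merged = CONTRACTIONS.get((VOWELS[vowel], VOWELS[s_vowel]))
        if s_lead == SILENT_INITIAL and merged:
            return stem[:-1] + join_syllable(lead, VOWELS.index(merged), s_final)
    return stem + suffix             # nothing to merge

print(attach("버리", "어"), attach("가보", "아"), attach("끝나", "ㄹ"))
```

In these terms, 47532 (리) splits into indices (5, 20, 0) and 50612 (어) into (11, 4, 0); replacing the vowel index 20 with 6 (ㅕ) gives 47140 (려).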
You could use your own translation table.
The drawback is that you have to enter all pairs manually, or have a file to load them from.
For instance:
# Sample Korean chars to map
k = [[('버리', '어'), ('버려')], [('가보', '아'), ('가봐')], [('끝나', 'ㄹ'), ('끝날')]]

class Korean(object):
    def __init__(self):
        self.map = {}
        for m in k:
            key = m[0][0] + m[0][1]
            self.map[hash(key)] = m[1]

    def __getitem__(self, item):
        return self.map[hash(item)]

    def translate(self, s):
        return [ self.map[hash(token)] for token in s]

if __name__ == '__main__':
    k_map = Korean()
    k_chars = [ m[0][0] + m[0][1] for m in k]
    print('Input: %s' % k_chars)
    print('Output: %s' % k_map.translate(k_chars))
    one_char_3 = k[0][0][0] + k[0][0][1]
    print('%s = %s' % (one_char_3, k_map[ one_char_3 ]) )
Input: ['버리어', '가보아', '끝나ㄹ']
Output: ['버려', '가봐', '끝날']
버리어 = 버려
Tested with Python:3.4.2
Can we use regex to detect text within a pdf (using pdfquery or another tool)?
I know we can do this:
pdf = pdfquery.PDFQuery("tests/samples/IRS_1040A.pdf")
pdf.load()
label = pdf.pq('LTTextLineHorizontal:contains("Cash")')
left_corner = float(label.attr('x0'))
bottom_corner = float(label.attr('y0'))
cash = pdf.pq('LTTextLineHorizontal:in_bbox("%s, %s, %s, %s")' % \
(left_corner, bottom_corner-30, \
left_corner+150, bottom_corner)).text()
print cash
'179,000.00'
But we need something like this:
pdf = pdfquery.PDFQuery("tests/samples/IRS_1040A.pdf")
pdf.load()
label = pdf.pq('LTTextLineHorizontal:regex("\d{1,3}(?:,\d{3})*(?:\.\d{2})?")')
cash = str(label.attr('x0'))
print cash
'179,000.00'
This is not exactly a lookup for a regex, but it works to format/filter the possible extractions:
import re
import pdfquery

def regex_function(pattern, match):
    # Return the first capture group of pattern found in match, or None
    re_obj = re.search(pattern, match)
    if re_obj is not None and len(re_obj.groups()) > 0:
        return re_obj.group(1)
    return None

pdf = pdfquery.PDFQuery("tests/samples/IRS_1040A.pdf")
pdf.extract( [
    ('with_parent','LTPage[pageid=1]'),
    ('with_formatter', 'text'),
    # SOME_PATTERN_HERE is a placeholder: supply a pattern with one capture group
    ('year', 'LTTextLineHorizontal:contains("Form 1040A (")',
     lambda match: regex_function(SOME_PATTERN_HERE, match))
])
I haven't tested this next one, but it might also work:
def some_regex_function_feature():
    # here you could use some regex.
    # pyquery makes the current element available as `this` inside filter callbacks
    return float(this.get('width',0)) * float(this.get('height',0)) > 40000
pdf.pq('LTPage[page_index="1"] *').filter(some_regex_function_feature)
[<LTTextBoxHorizontal>, <LTRect>, <LTRect>]
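If the goal is simply to pick out amounts by pattern, another option is to extract the text first and apply re to the resulting strings afterwards. A minimal sketch, assuming the text lines have already been pulled out of the PDF: the texts list and the find_amounts name are hypothetical, and the pattern is the one from the question.

```python
import re

# Pattern from the question: digits grouped by commas, optional two decimals.
AMOUNT = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

def find_amounts(texts):
    """Keep only the strings that consist entirely of one amount."""
    # fullmatch avoids partial hits such as the '104' inside 'Form 1040A'.
    return [t.strip() for t in texts if AMOUNT.fullmatch(t.strip())]

# Hypothetical sample standing in for text extracted from the PDF.
texts = ["Cash", " 179,000.00 ", "Form 1040A (2007)"]
print(find_amounts(texts))
```

This does not add a regex selector to pdfquery; it only filters what a broader selector has already returned.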
I am quite new to regular expressions. I have a string that looks like this:
str = "abc/def/([default], [testing])"
and a dictionary
dict = {'abc/def/[default]' : '2.7', 'abc/def/[testing]' : '2.1'}
and using Python RE, I want str in this form, after comparisons of each element in dict to str:
str = "abc/def/(2.7, 2.1)"
Any help how to do it using Python RE?
P.S. It's not part of any assignment; it is part of my project at work, and I have spent many hours trying to figure out a solution, but in vain.
import re
st = "abc/def/([default], [testing], [something])"
dic = {'abc/def/[default]' : '2.7',
       'abc/def/[testing]' : '2.1',
       'bcd/xed/[something]' : '3.1'}
# Raw strings keep the backslashes in the patterns intact
prefix_regex = r"^[\w*/]*"
tag_regex = r"\[\w*\]"
prefix = re.findall(prefix_regex, st)[0]
tags = re.findall(tag_regex, st)
for key in dic:
    key_prefix = re.findall(prefix_regex, key)[0]
    key_tag = re.findall(tag_regex, key)[0]
    # Replace a tag only when both the prefix and the tag match the key
    if prefix == key_prefix:
        for tag in tags:
            if tag == key_tag:
                st = st.replace(tag, dic[key])
print(st)
OUTPUT:
abc/def/(2.7, 2.1, [something])
Here is a solution using the re module.
Assumptions:
- there is a dictionary whose keys are composed of a prefix and a variable part; the variable part is enclosed in brackets ([])
- the values are the strings that replace the variable parts in the string
- the string is composed of a prefix, a (, a list of variable parts and a )
- the variable parts in the string are enclosed in []
- the variable parts in the string are separated by a comma followed by optional spaces
Python code:
import re
class splitter:
    # Raw strings keep the backslashes in the patterns intact
    pref = re.compile(r"[^(]+")
    iden = re.compile(r"\[[^]]*\]")
    def __init__(self, d):
        self.d = d
    def split(self, s):
        # Return the prefix and the list of bracketed parts, or None if s has no prefix
        m = self.pref.match(s)
        if m is not None:
            p = m.group(0)
            elts = self.iden.findall(s, m.span()[1])
            return p, elts
        return None
    def convert(self, s):
        p, elts = self.split(s)
        return p + "(" + ", ".join((self.d[p + elt] for elt in elts)) + ")"
Usage :
s = "abc/def/([default], [testing])"
d = {'abc/def/[default]' : '2.7', 'abc/def/[testing]' : '2.1'}
sp = splitter(d)
print(sp.convert(s))
output :
abc/def/(2.7, 2.1)
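The same substitution can also be sketched with re.sub and a replacement callback, which looks up each bracketed part as it is matched. The function name fill_tags is made up for this example, and leaving unknown tags untouched is a design choice of the sketch rather than something the question specifies.

```python
import re

def fill_tags(text, values):
    """Replace each [tag] in text with values[prefix + '[tag]'] when present."""
    # Everything before the opening parenthesis is the key prefix.
    prefix = text.split("(", 1)[0]

    def lookup(match):
        tag = match.group(0)
        # Unknown tags are left as they are.
        return values.get(prefix + tag, tag)

    return re.sub(r"\[[^\]]*\]", lookup, text)

s = "abc/def/([default], [testing])"
d = {'abc/def/[default]': '2.7', 'abc/def/[testing]': '2.1'}
print(fill_tags(s, d))
```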
Regex is probably not required here. Hope this helps:
# str and dict are the variables from the question
lhs,rhs = str.split("/(")
rhs1,rhs2 = rhs.strip(")").split(", ")
lhs+="/"
print("{0}({1},{2})".format(lhs,dict[lhs+rhs1],dict[lhs+rhs2]))
output
abc/def/(2.7,2.1)
I want to compare two strings in a python unittest which contain html.
Is there a method which outputs the result in a human friendly (diff like) version?
A simple method is to strip whitespace from the HTML and split it into a list. Python 2.7's unittest (or the backported unittest2) then gives a human-readable diff between the lists.
import re
def split_html(html):
    return re.split(r'\s*\n\s*', html.strip())
# Method of a unittest.TestCase subclass; render_html() is the function under test
def test_render_html(self):
    expected = ['<div>', '...', '</div>']
    got = split_html(render_html())
    self.assertEqual(expected, got)
If I'm writing a test for working code, I usually first set expected = [], insert a self.maxDiff = None before the assert and let the test fail once. The expected list can then be copy-pasted from the test output.
You might need to tweak how whitespace is stripped depending on what your HTML looks like.
I submitted a patch to do this some years back. The patch was rejected but you can still view it on the python bug list.
I doubt you would want to hack your unittest.py to apply the patch (if it even still works after all this time), but here's the function for reducing two strings to a manageable size while still keeping at least part of what differs. As long as you don't need the complete differences, this might be what you want:
def shortdiff(x,y):
    '''shortdiff(x,y)
    Compare strings x and y and display differences.
    If the strings are too long, shorten them to fit
    in one line, while still keeping at least some difference.
    '''
    import difflib
    LINELEN = 79
    def limit(s):
        if len(s) > LINELEN:
            return s[:LINELEN-3] + '...'
        return s
    def firstdiff(s, t):
        span = 1000
        for pos in range(0, max(len(s), len(t)), span):
            if s[pos:pos+span] != t[pos:pos+span]:
                for index in range(pos, pos+span):
                    if s[index:index+1] != t[index:index+1]:
                        return index
        return 0  # identical strings: nothing to skip
    left = LINELEN // 4  # integer division so the value can be used in slices
    index = firstdiff(x, y)
    if index > left + 7:
        x = x[:left] + '...' + x[index-4:index+LINELEN]
        y = y[:left] + '...' + y[index-4:index+LINELEN]
    else:
        x, y = x[:LINELEN+1], y[:LINELEN+1]
        left = 0
    cruncher = difflib.SequenceMatcher(None)
    xtags = ytags = ""
    cruncher.set_seqs(x, y)
    editchars = { 'replace': ('^', '^'),
                  'delete': ('-', ''),
                  'insert': ('', '+'),
                  'equal': (' ',' ') }
    for tag, xi1, xi2, yj1, yj2 in cruncher.get_opcodes():
        lx, ly = xi2 - xi1, yj2 - yj1
        edits = editchars[tag]
        xtags += edits[0] * lx
        ytags += edits[1] * ly
    # Include ellipsis in edits line.
    if left:
        xtags = xtags[:left] + '...' + xtags[left+3:]
        ytags = ytags[:left] + '...' + ytags[left+3:]
    diffs = [ x, xtags, y, ytags ]
    if max([len(s) for s in diffs]) < LINELEN:
        return '\n'.join(diffs)
    diffs = [ limit(s) for s in diffs ]
    return '\n'.join(diffs)
Maybe this is a rather verbose solution. You could add a new equality function for a user-defined type (e.g. HTMLString), which you have to define first:
class HTMLString(str):
    pass
Now you have to define a type equality function:
def assertHTMLStringEqual(first, second, msg=None):
    # unittest passes msg as a keyword argument, so the parameter is required
    if first != second:
        message = ... # TODO here: format your message, e.g a diff
        raise AssertionError(message)
All you have to do is format your message as you like. You can also use a method of your specific TestCase as the type equality function. This gives you more options for formatting your message, since unittest.TestCase itself does this a lot.
Now you have to register this equality function in your unittest.TestCase:
...
    def __init__(self, *args, **kwargs):
        # TestCase needs its own initialisation (it receives the test method name)
        unittest.TestCase.__init__(self, *args, **kwargs)
        self.addTypeEqualityFunc(HTMLString, assertHTMLStringEqual)
The same for a method:
...
    def __init__(self, *args, **kwargs):
        unittest.TestCase.__init__(self, *args, **kwargs)
        self.addTypeEqualityFunc(HTMLString, 'assertHTMLStringEqual')
And now you can use it in your tests:
def test_something(self):
htmlstring1 = HTMLString(...)
htmlstring2 = HTMLString(...)
self.assertEqual(htmlstring1, htmlstring2)
This should work well with python 2.7.
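Put together, the pieces above might look like the following sketch. It registers the function in setUp rather than __init__ and fills the message with difflib.unified_diff; both are choices made for this example, and the class name HTMLTestCase is hypothetical.

```python
import difflib
import unittest

class HTMLString(str):
    """Marker type so that assertEqual picks the HTML-specific comparison."""

class HTMLTestCase(unittest.TestCase):
    def setUp(self):
        # assertEqual uses this function when both arguments are HTMLString.
        self.addTypeEqualityFunc(HTMLString, self.assertHTMLStringEqual)

    def assertHTMLStringEqual(self, first, second, msg=None):
        if first != second:
            diff = "\n".join(difflib.unified_diff(
                first.splitlines(), second.splitlines(),
                fromfile="first", tofile="second", lineterm=""))
            raise self.failureException(msg or "HTML differs:\n" + diff)

    def test_same(self):
        self.assertEqual(HTMLString("<p>a</p>"), HTMLString("<p>a</p>"))
```

Saved in a test module, this runs with python -m unittest; a failing comparison then reports a unified diff of the two strings instead of two long one-line reprs.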
I (the one asking this question) use BeautifulSoup now:
# Python 2 code (unicode type, BeautifulSoup 3 import)
import logging

def assertEqualHTML(string1, string2, file1='', file2=''):
    u'''
    Compare two unicode strings containing HTML.
    A human friendly diff goes to logging.error() if they
    are not equal, and an exception gets raised.
    '''
    from BeautifulSoup import BeautifulSoup as bs
    import difflib
    def short(mystr):
        max=20
        if len(mystr)>max:
            return mystr[:max]
        return mystr
    p=[]
    for mystr, file in [(string1, file1), (string2, file2)]:
        if not isinstance(mystr, unicode):
            raise Exception(u'string is not unicode: %r %s' % (short(mystr), file))
        # Pretty-printing normalises whitespace so the diff shows real differences
        soup=bs(mystr)
        pretty=soup.prettify()
        p.append(pretty)
    if p[0]!=p[1]:
        for line in difflib.unified_diff(p[0].splitlines(), p[1].splitlines(), fromfile=file1, tofile=file2):
            logging.error(line)
        raise Exception('Not equal %s %s' % (file1, file2))