YouTube API search list_next() throws UnicodeEncodeError - python
When I feed a non-English string into the YouTube API client library's
search, only the initial search works. As soon as I call list_next(),
it throws a UnicodeEncodeError.
With a plain ASCII string everything works correctly.
Any suggestions on what I should do?
Here's a simplified version of what I'm doing:
# -*- coding: utf-8 -*-
import apiclient.discovery

def test(query):
    youtube = apiclient.discovery.build('youtube', 'v3', developerKey='xxx')
    ys = youtube.search()
    req = ys.list(
        q=query.encode('utf-8'),
        type='video',
        part='id,snippet',
        maxResults=50
    )
    while req:
        res = req.execute()
        for i in res['items']:
            print(i['id']['videoId'])
        req = ys.list_next(req, res)

test(u'한글')
test(u'日本語')
test(u'\uD55C\uAE00')
test(u'\u65E5\u672C\u8A9E')
Error message:
Traceback (most recent call last):
File "E:\prj\scripts\yt\search.py", line 316, in _search
req = ys.list_next(req, res)
File "D:\Apps\Python\lib\site-packages\googleapiclient\discovery.py", line 966, in methodNext
parsed[4] = urlencode(newq)
File "D:\Apps\Python\lib\urllib.py", line 1343, in urlencode
v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
Versions:
google-api-python-client (1.6.2)
Python 2.7.13 (Win32)
EDIT: I posted a workaround below.
If anyone else is interested, here's a workaround that works for me.
In googleapiclient/discovery.py, change:
(old) q = parse_qsl(parsed[4])
(new) q = parse_qsl(parsed[4].encode('ascii'))
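If you'd rather not edit the installed library, the same fix can be applied from the calling code. This is only a sketch; safe_list_next is a made-up helper name, and it assumes request.uri holds the percent-escaped request URL (as the traceback suggests), so encoding it to ASCII is lossless:

def safe_list_next(collection, request, response):
    # Force the previous request's URI back to a byte string so that
    # parse_qsl() inside list_next() returns byte-string pairs, which
    # urlencode() can handle (same idea as the discovery.py patch).
    if isinstance(request.uri, unicode):
        request.uri = request.uri.encode('ascii')
    return collection.list_next(request, response)

# in the loop from the question:
# req = safe_list_next(ys, req, res)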
Explanation
In discovery.py, list_next() parses and unescapes the previous request's URL, replaces the pageToken query parameter, and builds a new URL from it:
pageToken = previous_response['nextPageToken']
parsed = list(urlparse(request.uri))
q = parse_qsl(parsed[4])
# Find and remove old 'pageToken' value from URI
newq = [(key, value) for (key, value) in q if key != 'pageToken']
newq.append(('pageToken', pageToken))
parsed[4] = urlencode(newq)
uri = urlunparse(parsed)
The problem seems to be that when parse_qsl is handed the query string as a
unicode object, it returns the percent-decoded UTF-8 bytes wrapped in unicode
values. urlencode then calls str() on each value, which fails for anything
outside ASCII:
>>> q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80')
>>> q
[(u'q', u'\xed\x95\x9c\xea\xb8\x80')]
>>> urllib.urlencode(q)
UnicodeEncodeError
If parse_qsl is given a plain ASCII byte string instead, it returns plain byte strings (the percent-decoded UTF-8 bytes), which urlencode handles fine:
>>> q = urlparse.parse_qsl(u'q=%ED%95%9C%EA%B8%80'.encode('ascii'))
>>> q
[('q', '\xed\x95\x9c\xea\xb8\x80')]
>>> urllib.urlencode(q)
'q=%ED%95%9C%EA%B8%80'
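Another way around the bug, without touching the library at all, is to skip list_next() and manage pageToken yourself; pageToken and nextPageToken are part of the documented search.list API. A rough sketch of the question's loop rewritten that way, with the same placeholder developerKey:

# -*- coding: utf-8 -*-
import apiclient.discovery

def test_without_list_next(query):
    youtube = apiclient.discovery.build('youtube', 'v3', developerKey='xxx')
    ys = youtube.search()
    page_token = None
    while True:
        params = {
            'q': query.encode('utf-8'),
            'type': 'video',
            'part': 'id,snippet',
            'maxResults': 50,
        }
        if page_token:
            # Pass the token explicitly instead of letting list_next()
            # rebuild (and re-encode) the previous URL.
            params['pageToken'] = page_token
        res = ys.list(**params).execute()
        for item in res['items']:
            print(item['id']['videoId'])
        page_token = res.get('nextPageToken')
        if not page_token:
            break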
Related
decode binary from xmlrpc python
I'm new to Python and XML-RPC, and I'm stuck decoding binary data coming from a public service. The response the service gives for this request:

from xmlrpc.client import Server
import xmlrpc.client
from pprint import pprint

DEV_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
logFile = open('stat.txt', 'w')
s1 = Server('http://muovi.roma.it/ws/xml/autenticazione/1')
s2 = Server('http://muovi.roma.it/ws/xml/paline/7')
token = s1.autenticazione.Accedi(DEV_KEY, '')
res = s2.paline.GetStatPassaggi(token)
pprint(res, logFile)

is:

{'id_richiesta': '257a369dbf46e41ba275f8c821c7e1e0',
 'risposta': {'periodi_aggregazione': <xmlrpc.client.Binary object at 0x0000027B7D6E2588>,
              'tempi_attesa_percorsi': <xmlrpc.client.Binary object at 0x0000027B7D9276D8>}}

I need to decode these two Binary objects, and I'm stuck with this code:

from xmlrpc.client import Server
import xmlrpc.client
from pprint import pprint

DEV_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxx'
logFile = open('stat.txt', 'w')
s1 = Server('http://muovi.roma.it/ws/xml/autenticazione/1')
s2 = Server('http://muovi.roma.it/ws/xml/paline/7')
token = s1.autenticazione.Accedi(DEV_KEY, '')
res = s2.paline.GetStatPassaggi(token)
dat = xmlrpc.client.Binary(res)
out = xmlrpc.client.Binary.decode(dat)
pprint(out, logFile)

which ends in:

Traceback (most recent call last):
  File "stat.py", line 18, in <module>
    dat = xmlrpc.client.Binary(res)
  File "C:\Users\Leonardo\AppData\Local\Programs\Python\Python35\lib\xmlrpc\client.py", line 389, in __init__
    data.__class__.__name__)
TypeError: expected bytes or bytearray, not dict

The only doc I found for xmlrpc.client is the one at docs.python.org, but I can't figure out how I could decode these binaries.
If the content of the res variable (what you get from the 2nd (s2) server) is the response you pasted into the question, then you already have two Binary objects in the res dictionary, so you should modify the last 3 lines of your 2nd snippet to:

# ... Existing code
res = s2.paline.GetStatPassaggi(token)

answer = res.get("risposta", dict())
aggregation_periods = answer.get("periodi_aggregazione", xmlrpc.client.Binary())
timeout_paths = answer.get("tempi_attesa_percorsi", xmlrpc.client.Binary())

print(aggregation_periods.data)
print(timeout_paths.data)

Notes:
According to [Python.Docs]: xmlrpc.client - Binary Objects (emphasis is mine): "Binary objects have the following methods, supported mainly for internal use by the marshalling/unmarshalling code" - in other words, you normally just read the .data attribute rather than calling decode() yourself.
I wasn't able to connect (and thus test the solution), since DEV_KEY is (obviously) fake.
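As a minimal illustration of why the original code failed (not tied to the real service; the payload bytes are made up): xmlrpc.client.Binary wraps raw bytes, so handing it the whole response dict raises exactly the TypeError from the question, while an existing Binary instance exposes its payload through .data:

import xmlrpc.client

blob = xmlrpc.client.Binary(b'raw payload bytes')
print(blob.data)                             # b'raw payload bytes'

try:
    xmlrpc.client.Binary({'not': 'bytes'})   # same mistake as Binary(res)
except TypeError as exc:
    print(exc)                               # expected bytes or bytearray, not dict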
How do I read this stringified javascript variable into Python?
I'm trying to read _pageData from https://www.simpliowebstudio.com/wp-content/uploads/2014/07/aWfyh1 into Python 2.7.11 so that I can process it, using this code: #!/usr/bin/env python # -*- coding: utf-8 -*- """ Testing _pageData processing. """ import urllib2 import re import ast import json import yaml BASE_URL = 'https://www.simpliowebstudio.com/wp-content/uploads/2014/07/aWfyh1' def main(): """ Do the business. """ response = urllib2.urlopen(BASE_URL, None) results = re.findall('var _pageData = \\"(.*?)\\";</script>', response.read()) first_result = results[0] # These all fail data = ast.literal_eval(first_result) # data = yaml.load(first_result) # data = json.loads(first_result) if __name__ == '__main__': main() but get the following error: Traceback (most recent call last): File "./temp.py", line 24, in <module> main() File "./temp.py", line 19, in main data = ast.literal_eval(first_result) File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 49, in literal_eval node_or_string = parse(node_or_string, mode='eval') File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 37, in parse return compile(source, filename, mode, PyCF_ONLY_AST) File "<unknown>", line 1 [[1,true,true,true,true,true,true,true,true,,\"at\",\"\",\"\",1450364255674,\"\",\"en_US\",false,[]\n,\"https://www.google.com/maps/d/viewer?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/embed?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/edit?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/thumbnail?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",,,true,\"https://www.google.com/maps/d/print?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/pdf?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/viewer?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",false,false,\"/maps/d\",\"maps/sharing\",\"//www.google.com/intl/en_US/help/terms_maps.html\",true,\"https://docs.google.com/picker\",[]\n,false,true,[[[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-regular-001.png\",143,25]\n,[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-regular-2x-001.png\",286,50]\n]\n,[[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-small-001.png\",113,20]\n,[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-small-2x-001.png\",226,40]\n]\n]\n,1,\"https://www.gstatic.com/mapspro/_/js/k\\u003dmapspro.gmeviewer.en_US.8b9lQX3ifcs.O/m\\u003dgmeviewer_base/rt\\u003dj/d\\u003d0/rs\\u003dABjfnFWonctWGGtD63MaO3UZxCxF6UPKJQ\",true,true,false,true,\"US\",false,true,true,5,false]\n,[\"mf.map\",\"zBghbRiSwHlg.k2ATNtn6BCk0\",\"Hollywood, FL\",\"\",[-80.16005,26.01043,-80.16005,26.01043]\n,[-80.16005,26.01043,-80.16005,26.01043]\n,[[,\"zBghbRiSwHlg.kq4rrF9BNRIg\",\"Untitled layer\",\"\",[[[\"https://mt.googleapis.com/vt/icon/name\\u003dicons/onion/22-blue-dot.png\\u0026scale\\u003d1.0\"]\n,[]\n,1,1,[[,[26.01043,-80.16005]\n]\n,\"MDZBMzJCQjRBOTAwMDAwMQ~CjISKmdlby1tYXBzcHJvLm1hcHNob3AtbGF5ZXItNDUyOWUwMTc0YzhkNmI2ZBgAKAAwABIZACBawIJBU4Fe8v7vNSoAg0dtnhhVotEBLg\",\"vdb:\",\"zBghbRiSwHlg.kq4rrF9BNRIg\",[26.01043,-80.16005]\n,[0,-32]\n,\"06A32BB4A9000001\"]\n,[[\"Hollywood, 
FL\"]\n]\n,[]\n]\n]\n,,1.0,true,true,,,,[[\"zBghbRiSwHlg.kq4rrF9BNRIg\",1,,,,\"https://mapsengine.google.com/map/kml?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\\u0026lid\\u003dzBghbRiSwHlg.kq4rrF9BNRIg\",,,,,0,2,true,[[[\"06A32BB4A9000001\",[[[26.01043,-80.16005]\n]\n]\n,[]\n,[]\n,0,[[\"name\",[\"Hollywood, FL\"]\n,1]\n,,[]\n,[]\n]\n,,0]\n]\n,[[[\"https://mt.googleapis.com/vt/icon/name\\u003dicons/onion/22-blue-dot.png\\u0026filter\\u003dff\\u0026scale\\u003d1.0\",[16,32]\n,1.0]\n,[[\"0000FF\",0.45098039215686275]\n,5000]\n,[[\"0000FF\",0.45098039215686275]\n,[\"000000\",0.25098039215686274]\n,3000]\n]\n]\n]\n]\n]\n,[]\n,,,,,1]\n]\n,[2]\n,,,\"mapspro\",\"zBghbRiSwHlg.k2ATNtn6BCk0\",,true,false,false,\"\",2,false,\"https://mapsengine.google.com/map/kml?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",3807]\n]\n ^ SyntaxError: invalid syntax var _pageData is in this format: "[[1,true,true,true,true,true,true,true,true,,\"at\",\"\",\"\",1450364255674,\"\",\"en_US\",false,[]\n,\"https://www.google.com/maps/d/viewer?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/embed?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/edit?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/thumbnail?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",,,true,\"https://www.google.com/maps/d/print?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/pdf?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/viewer?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",false,false,\"/maps/d\",\"maps/sharing\",\"//www.google.com/intl/en_US/help/terms_maps.html\",true,\"https://docs.google.com/picker\",[]\n,false,true,[[[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-regular-001.png\",143,25]\n,[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-regular-2x-001.png\",286,50]\n]\n,[[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-small-001.png\",113,20]\n,[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-small-2x-001.png\",226,40]\n]\n]\n,1,\"https://www.gstatic.com/mapspro/_/js/k\\u003dmapspro.gmeviewer.en_US.8b9lQX3ifcs.O/m\\u003dgmeviewer_base/rt\\u003dj/d\\u003d0/rs\\u003dABjfnFWonctWGGtD63MaO3UZxCxF6UPKJQ\",true,true,false,true,\"US\",false,true,true,5,false]\n,[\"mf.map\",\"zBghbRiSwHlg.k2ATNtn6BCk0\",\"Hollywood, FL\",\"\",[-80.16005,26.01043,-80.16005,26.01043]\n,[-80.16005,26.01043,-80.16005,26.01043]\n,[[,\"zBghbRiSwHlg.kq4rrF9BNRIg\",\"Untitled layer\",\"\",[[[\"https://mt.googleapis.com/vt/icon/name\\u003dicons/onion/22-blue-dot.png\\u0026scale\\u003d1.0\"]\n,[]\n,1,1,[[,[26.01043,-80.16005]\n]\n,\"MDZBMzJCQjRBOTAwMDAwMQ~CjISKmdlby1tYXBzcHJvLm1hcHNob3AtbGF5ZXItNDUyOWUwMTc0YzhkNmI2ZBgAKAAwABIZACBawIJBU4Fe8v7vNSoAg0dtnhhVotEBLg\",\"vdb:\",\"zBghbRiSwHlg.kq4rrF9BNRIg\",[26.01043,-80.16005]\n,[0,-32]\n,\"06A32BB4A9000001\"]\n,[[\"Hollywood, FL\"]\n]\n,[]\n]\n]\n,,1.0,true,true,,,,[[\"zBghbRiSwHlg.kq4rrF9BNRIg\",1,,,,\"https://mapsengine.google.com/map/kml?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\\u0026lid\\u003dzBghbRiSwHlg.kq4rrF9BNRIg\",,,,,0,2,true,[[[\"06A32BB4A9000001\",[[[26.01043,-80.16005]\n]\n]\n,[]\n,[]\n,0,[[\"name\",[\"Hollywood, 
FL\"]\n,1]\n,,[]\n,[]\n]\n,,0]\n]\n,[[[\"https://mt.googleapis.com/vt/icon/name\\u003dicons/onion/22-blue-dot.png\\u0026filter\\u003dff\\u0026scale\\u003d1.0\",[16,32]\n,1.0]\n,[[\"0000FF\",0.45098039215686275]\n,5000]\n,[[\"0000FF\",0.45098039215686275]\n,[\"000000\",0.25098039215686274]\n,3000]\n]\n]\n]\n]\n]\n,[]\n,,,,,1]\n]\n,[2]\n,,,\"mapspro\",\"zBghbRiSwHlg.k2ATNtn6BCk0\",,true,false,false,\"\",2,false,\"https://mapsengine.google.com/map/kml?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",3807]\n]\n" I've tried replacing the \" and \n and decoding the \uXXXX before using, without success. I've also tried replacing ,, with ,"", and ,'', without success. Thank you.
It seems like there are three kinds of syntactic errors in your string:

, followed by ,
[ followed by ,
, followed by ]

Assuming that those are supposed to be null elements (or ''?), you can just replace them in the original string -- exactly like you did for the ,, case, but you missed the others. Also, you have to do the ,, replacement twice, otherwise you will miss cases such as ,,,,. Then you can load the JSON string with json.loads.

>>> s = "your messed up json string"
>>> s = re.sub(r",\s*,", ", null,", s)
>>> s = re.sub(r",\s*,", ", null,", s)
>>> s = re.sub(r"\[\s*,", "[ null,", s)
>>> s = re.sub(r",\s*\]", ", null]", s)
>>> json.loads(s)
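A quick self-contained check of that approach on a toy string (not the real _pageData, just an illustration of the three error kinds):

import json
import re

s = '[1,,2,[,3,],4,,]'          # made-up sample containing all three cases
s = re.sub(r",\s*,", ", null,", s)
s = re.sub(r",\s*,", ", null,", s)   # second pass catches runs like ,,,,
s = re.sub(r"\[\s*,", "[ null,", s)
s = re.sub(r",\s*\]", ", null]", s)
print(json.loads(s))            # [1, None, 2, [None, 3, None], 4, None, None]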
I started off using ast.literal_eval(...) because I was under the (mistaken?) impression that JavaScript arrays and Python lists were mutually compatible, so all I had to do was destringify _pageData. However, I hadn't noticed that Python doesn't like ,, (and longer runs of commas), true, false or [,. Fixing them does the trick (thank you @Two-Bit Alchemist and @tobias_k), so the following appears to work:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
""" Testing _pageData processing. """
import urllib2
import re
import ast
import json
import yaml

BASE_URL = 'https://www.simpliowebstudio.com/wp-content/uploads/2014/07/aWfyh1'

def main():
    """ Do the business. """
    response = urllib2.urlopen(BASE_URL, None)
    results = re.findall('var _pageData = \\"(.*?)\\";</script>', response.read())
    first_result = results[0]
    first_result = first_result.replace(',,,,,,', ',None,None,None,None,None,')
    first_result = first_result.replace(',,,,,', ',None,None,None,None,')
    first_result = first_result.replace(',,,,', ',None,None,None,')
    first_result = first_result.replace(',,,', ',None,None,')
    first_result = first_result.replace(',,', ',None,')
    first_result = first_result.replace('[,', '[None,')
    first_result = first_result.replace('\\"', '\'')
    first_result = first_result.replace('\\n', '')
    first_result = first_result.replace('true', 'True')
    first_result = first_result.replace('false', 'False')
    data = ast.literal_eval(first_result)
    for entry in data:
        print entry

if __name__ == '__main__':
    main()
Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP)
I am playing around with NLTK to do an assignment on sentiment analysis. I am using Python 2.7. NLTK 3.0 and NumPy1.9.1 version. This is the code : __author__ = 'karan' import nltk import re import sys def main(): print("Start"); # getting the stop words stopWords = open("english.txt","r"); stop_word = stopWords.read().split(); AllStopWrd = [] for wd in stop_word: AllStopWrd.append(wd); print("stop words-> ",AllStopWrd); # sample and also cleaning it tweet1= 'Love, my new toyí ½í¸í ½í¸#iPhone6. Its good https://twitter.com/Sandra_Ortega/status/513807261769424897/photo/1' print("old tweet-> ",tweet1) tweet1 = tweet1.lower() tweet1 = ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet1).split()) print(tweet1); tw = tweet1.split() print(tw) #tokenize sentences = nltk.word_tokenize(tweet1) print("tokenized ->", sentences) #remove stop words Otweet =[] for w in tw: if w not in AllStopWrd: Otweet.append(w); print("sans stop word-> ",Otweet) # get taggers for neg/pos/inc/dec/inv words taggers ={} negWords = open("neg.txt","r"); neg_word = negWords.read().split(); print("ned words-> ",neg_word) posWords = open("pos.txt","r"); pos_word = posWords.read().split(); print("pos words-> ",pos_word) incrWords = open("incr.txt","r"); inc_word = incrWords.read().split(); print("incr words-> ",inc_word) decrWords = open("decr.txt","r"); dec_word = decrWords.read().split(); print("dec wrds-> ",dec_word) invWords = open("inverse.txt","r"); inv_word = invWords.read().split(); print("inverse words-> ",inv_word) for nw in neg_word: taggers.update({nw:'negative'}); for pw in pos_word: taggers.update({pw:'positive'}); for iw in inc_word: taggers.update({iw:'inc'}); for dw in dec_word: taggers.update({dw:'dec'}); for ivw in inv_word: taggers.update({ivw:'inv'}); print("tagger-> ",taggers) print(taggers.get('little')) # get parts of speech posTagger = [nltk.pos_tag(tw)] print("posTagger-> ",posTagger) main(); This is the error that I am getting when running my code: SyntaxError: Non-ASCII character '\xc3' in file C:/Users/karan/PycharmProjects/mainProject/sentiment.py on line 19, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details How do I fix this error? I also tried the code using Python 3.4.2 and with NLTK 3.0 and NumPy 1.9.1 but then I get the error: Traceback (most recent call last): File "C:/Users/karan/PycharmProjects/mainProject/sentiment.py", line 80, in <module> main(); File "C:/Users/karan/PycharmProjects/mainProject/sentiment.py", line 72, in main posTagger = [nltk.pos_tag(tw)] File "C:\Python34\lib\site-packages\nltk\tag\__init__.py", line 100, in pos_tag tagger = load(_POS_TAGGER) File "C:\Python34\lib\site-packages\nltk\data.py", line 779, in load resource_val = pickle.load(opened_resource) UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)
Add the following to the top of your file:

# coding=utf-8

If you go to the link in the error you can see the reason why:

Defining the Encoding
Python will default to ASCII as standard encoding if no other encoding hints are given. To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:
# coding=<encoding name>
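For example, the top of sentiment.py could start like this (the shebang line is optional; the declaration just has to be on the first or second line):

#!/usr/bin/env python
# coding=utf-8
__author__ = 'karan'

# ... rest of the script unchanged; the non-ASCII characters in the tweet1
# literal on line 19 no longer trigger the SyntaxError.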
TypeError: decode() argument 1 must be str, not None
I'm creating a new project and I wrote two functions. The first one runs normally, but the second doesn't work.

First func:

# it works
def twitter(adress):
    reqURL = request.urlopen("https://cdn.api.twitter.com/1/urls/count.json?url=%s" % adress)
    encodingData = reqURL.headers.get_content_charset()
    jsonLoad = json.loads(reqURL.read().decode(encodingData))
    print("Sharing:", jsonLoad['count'])

Second func:

# doesn't work
def facebook(adress):
    reqURL = request.urlopen("https://api.facebook.com/method/links.getStats?urls=%s&format=json" % adress)
    encodingData = reqURL.headers.get_content_charset()
    jsonLoad = json.loads(reqURL.read().decode(encodingData))
    print("Sharing:", jsonLoad['share_count'])

How do I fix the second func (facebook)? This is the error I get for it:

Traceback (most recent call last):
  File "/myproject/main.py", line 24, in <module>
    facebook(url)
  File "/myproject/main.py", line 15, in facebook
    jsonLoad = json.loads(reqURL.read().decode(encodingData))
TypeError: decode() argument 1 must be str, not None

Output of the twitter func:

Sharing: 951

How can I solve this problem? Thanks
The Facebook response doesn't include a charset parameter indicating the encoding used:

>>> from urllib import request
>>> adress = 'www.zopatista.com'
>>> reqURL = request.urlopen("https://cdn.api.twitter.com/1/urls/count.json?url=%s" % adress)
>>> reqURL.info().get('content-type')
'application/json;charset=utf-8'
>>> reqURL = request.urlopen("https://api.facebook.com/method/links.getStats?urls=%s&format=json" % adress)
>>> reqURL.info().get('content-type')
'application/json'

Note the charset=utf-8 part in the Twitter response. The JSON standard states that the default character set is UTF-8, so pass that to the get_content_charset() method:

encodingData = reqURL.headers.get_content_charset('utf8')
jsonLoad = json.loads(reqURL.read().decode(encodingData))

Now, when no content charset parameter is set, the default 'utf8' is returned instead.

Note that Facebook's JSON response contains a list of matches; because you are passing in just one URL, you could take just the first result:

print("Sharing:", jsonLoad[0]['share_count'])
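Putting the answer's two changes together, the fixed function would look roughly like this (same names and imports as in the question):

def facebook(adress):
    reqURL = request.urlopen(
        "https://api.facebook.com/method/links.getStats?urls=%s&format=json" % adress)
    # Fall back to UTF-8 when the response headers carry no charset parameter.
    encodingData = reqURL.headers.get_content_charset('utf8')
    jsonLoad = json.loads(reqURL.read().decode(encodingData))
    # The Facebook endpoint returns a list of results, one per requested URL.
    print("Sharing:", jsonLoad[0]['share_count'])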
unicode error displayed on the server on running app (django)
my views.py code: #!/usr/bin/python from django.template import loader, RequestContext from django.http import HttpResponse #from skey import find_root_tags, count, sorting_list from search.models import Keywords from django.shortcuts import render_to_response as rr def front_page(request): if request.method == 'POST' : from skey import find_root_tags, count, sorting_list str1 = request.POST['word'] fo = open("/home/pooja/Desktop/xml.txt","r") for i in range(count.__len__()): file = fo.readline() file = file.rstrip('\n') find_root_tags(file,str1,i) list.append((file,count[i])) sorting_list(list) for name, count1 in list: s = Keywords(file_name=name,frequency_count=count1) s.save() fo.close() list1 = Keywords.objects.all() t = loader.get_template('search/results.html') c = RequestContext({'list1':list1, }) return HttpResponse(t.render(c)) else : str1 = '' list = [] template = loader.get_template('search/front_page.html') c = RequestContext(request) response = template.render(c) return HttpResponse(response) skey.py has another function called within from find_root_tags(): def find_text(file,str1,i): str1 = str1.lower() exp = re.compile(r'<.*?>') with open(file) as f: lines = ''.join(line for line in f.readlines()) text_only = exp.sub('',lines).strip() text_only = text_only.lower() k = text_only.count(str1) #**line 34** count[i] = count[i]+k when I ran my app on server it gave me this error: UnicodeDecodeError at /search/ 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128) Request Method: POST Request URL: http://127.0.0.1:8000/search/ Django Version: 1.4 Exception Type: UnicodeDecodeError Exception Value: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128) Exception Location: /home/pooja/Desktop/mysite/search/skey.py in find_text, line 34 Python Executable: /usr/bin/python Python Version: 2.6.5 Python Path: ['/home/pooja/Desktop/mysite', '/usr/lib/python2.6', '/usr/lib/python2.6/plat-linux2', '/usr/lib/python2.6/lib-tk', '/usr/lib/python2.6/lib-old', '/usr/lib/python2.6/lib-dynload', '/usr/lib/python2.6/dist-packages', '/usr/lib/python2.6/dist-packages/PIL', '/usr/lib/python2.6/dist-packages/gst-0.10', '/usr/lib/pymodules/python2.6', '/usr/lib/python2.6/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.6/gtk-2.0', '/usr/local/lib/python2.6/dist-packages'] error : Can anyone tell me why is it showing this error?How can I remove this error Please help.
You're mixing Unicode strings and bytestrings: str1 = request.POST['word'] is probably a Unicode string while text_only is a bytestring, and Python fails to implicitly convert the latter to Unicode. You could use codecs.open() to specify the character encoding of the file. See Pragmatic Unicode and The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
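A sketch of what that change could look like in find_text(), assuming the XML files are UTF-8 encoded (adjust the encoding if they are not); count is the module-level list from the question:

import codecs
import re

def find_text(file, str1, i):
    str1 = str1.lower()
    exp = re.compile(r'<.*?>')
    # codecs.open() decodes the file for us, so text_only is unicode and
    # text_only.count(str1) compares unicode with unicode.
    with codecs.open(file, encoding='utf-8') as f:
        lines = f.read()
    text_only = exp.sub('', lines).strip().lower()
    k = text_only.count(str1)
    count[i] = count[i] + k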
Probably your str1 is unicode, but text_only is not (on line 34). The following is not a panacea, but if it corrects your problem then I am right:

k = u"{0}".format(text_only).count(str1)