I'm trying to send a response with multilingual characters to a web page in Python 3, but every time it comes out like this:
"\\xd8\\xa7\\xd9\\x84\\xd9\\x82\\xd8\\xa7\\xd9\\x85\\xd9\\x88\\xd8\\xb3 \\xd8\\xa7\\xd9\\x84\\xd8\\xb9\\xd8\\xb1\\xd8\\xa8\\xd9\\x8a Espa\\xc3\\xb1a".
But the expected output is:
القاموس العربي España.
This is the code:
s="القاموس العربي España".encode(encoding='UTF-8')
Where is my mistake?
I found it! The problem was in the JSON response writer: I was calling it with ensure_ascii=True, and the response was being sent as JSON instead of HTML. With ensure_ascii=False, the system prints any JSON answer correctly.
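A minimal standard-library sketch of both effects (the web framework used in the question isn't shown, so sending the body is left out): calling .encode() turns the str into bytes, whose printed form shows the \x escapes from the question, and json.dumps escapes every non-ASCII character unless ensure_ascii=False is passed.

```python
import json

text = "القاموس العربي España"

# .encode() returns bytes; printing bytes shows the escaped byte values,
# which is what the output in the question looks like.
raw = text.encode("utf-8")
print(raw[:4])  # b'\xd8\xa7\xd9\x84'

# By default (ensure_ascii=True) json.dumps escapes non-ASCII characters as \uXXXX.
escaped = json.dumps({"text": text})

# ensure_ascii=False keeps the characters themselves in the JSON text.
readable = json.dumps({"text": text}, ensure_ascii=False)

# Encode to UTF-8 once, only when the body is actually sent, and declare the
# charset in the response (e.g. "Content-Type: application/json; charset=utf-8").
body = readable.encode("utf-8")
```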
I am trying to use Python to read a Japanese PDF or HTML file as input, and I want to get the Unicode code point of each Japanese character in the file.
Someone suggested that I use the 'tika' library to read the PDF file. I ran the following code and got garbled text like this:
import tika
from tika import parser
parsed = parser.from_file('jpn.pdf')
print(parsed["content"])
result:
��������������������������������
�1948.12.10
������
����������������
Is there a recommended Python library or approach for dealing with this problem?
This is my first time asking a question on this platform. Please help.
You can solve your problem with the tika-python library.
Try this code:
import tika
tika.initVM()
from tika import parser
parsed = parser.from_file('fileName.pdf')
print(parsed["metadata"])
print(parsed["content"])
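Once the extraction returns readable text (a str), getting each character's Unicode code point is just ord(). A minimal sketch: the sample string is made up, and the block ranges used to decide what counts as "Japanese" are an assumption (hiragana, katakana and the common CJK ideographs, not every kanji block).

```python
import unicodedata

# Hypothetical sample; in practice this is the str returned by the PDF/HTML
# extraction step (e.g. parsed["content"]), already decoded to Unicode.
text = "世界人権宣言 1948.12.10"

# Assumed ranges: Hiragana, Katakana, CJK Unified Ideographs (not exhaustive).
JAPANESE_RANGES = [(0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF)]


def is_japanese(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def code_points(s):
    """Return the U+XXXX code point of every Japanese character in s, in order."""
    return ["U+%04X" % ord(ch) for ch in s if is_japanese(ch)]


for ch in text:
    if is_japanese(ch):
        # unicodedata.name gives the official Unicode character name.
        print("U+%04X" % ord(ch), unicodedata.name(ch, "?"))
```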
I'm using this program, and all the tweets I'm getting look like this (because they are in Arabic):
"text": "\\u0637\\u0627\\u0644\\u0628\\u0629 \\u062c\\u0633\\u0645\\u0647\\u0627 \\u062c\\u0628\\u0627\\u0631 \\u062a\\u062a\\u062e\\u062f \\u0645\\u0646 \\u0627\\u0644\\u0634\\u0627\\u0631\\u0639 \\u0648 \\u062a\\u062a\\u0646\\u0627\\u0643..\\n\\n\\u0633\\u0643\\u0633_\\u0627\\u062c\\u0646\\u0628\\u064a\\n\\u0645
I asked a question about it and got the answer here.
My question is: where in the program do I put ensure_ascii=False so that it handles the Arabic tweets correctly? I don't know where I need to add it.
You need to modify twitter_search.py. Replace every call of the form
json.dump(<something>,fd)
with
json.dump(<something>,fd,ensure_ascii=False)
You'll also need to replace every file descriptor with one opened for UTF-8:
import codecs
...
...
fd = codecs.open("/tmp/lol", "w", "utf-8")
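A self-contained Python 3 sketch of the two changes together, where the built-in open() accepts an encoding (codecs.open as above works too). The tweet dict and the temporary file path are made up for illustration.

```python
import json
import os
import tempfile

# Made-up tweet record; the text is the first word of the escaped example above.
tweet = {"text": "\u0637\u0627\u0644\u0628\u0629"}

path = os.path.join(tempfile.mkdtemp(), "tweets.json")  # hypothetical output file

# Open the file as UTF-8 text; codecs.open(path, "w", "utf-8") behaves the same here.
with open(path, "w", encoding="utf-8") as fd:
    # ensure_ascii=False writes the Arabic characters instead of \uXXXX escapes.
    json.dump(tweet, fd, ensure_ascii=False)

with open(path, encoding="utf-8") as fd:
    written = fd.read()
```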
If you're processing the results with Python, another approach is to unescape the ASCII string:
s='\\u0637\\u0627\\u0644\\u0628\\u0629...'
print(s.encode("utf-8").decode('unicode_escape'))
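Since the tweets are JSON, json.loads decodes the \uXXXX escapes directly and is usually the safer choice: the unicode_escape codec treats every byte that is not part of an escape as Latin-1, so it garbles non-ASCII text that was already unescaped. A small comparison with made-up sample strings:

```python
import json

# Escaped text as it appears in the file: the backslashes are literal characters.
escaped = "\\u0637\\u0627\\u0644\\u0628\\u0629"

# Both approaches give the same result for pure-ASCII input.
via_json = json.loads('"' + escaped + '"')   # wrap in quotes to form a JSON string
via_codec = escaped.encode("utf-8").decode("unicode_escape")

# With text that already contains non-ASCII characters, they diverge.
mixed = "Espa\u00f1a \\u0637"   # a real ñ plus a literal \u0637 escape
json_mixed = json.loads('"' + mixed + '"')                     # 'España ط'
codec_mixed = mixed.encode("utf-8").decode("unicode_escape")   # 'EspaÃ±a ط' (mojibake)
```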
I'm new to Python and am writing my first Python script. I have made good progress, but I am having trouble handling the response from a web service. Here is some code that will get a sample response:
import urllib
import json
urlstring = 'http://geonb-t.snb.ca/arcgis/rest/services/Utilities/Geometry/GeometryServer/project?inSR=2219&outSR=2953&geometries=674728.283,5319788.292&transformation=1841&transformForward=TRUE&f=json'
ro1 = urllib.urlopen(urlstring)
ro2 = ro1.read()
print ro2
Sample response:
{"geometries":[{"x":2488268.7116061845,"y":7667607.8963871095}]}
The web service response looks like a Python dictionary, but when I save it I get a string. How do I read this response into a Python list or dictionary? I need to extract the 'x' and 'y' values. I am working in Python 2.6.5.
It's a JSON string. Use the json module to parse it, as in
json.loads(ro2)
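To get at the 'x' and 'y' values, index into the parsed result: the top-level JSON object becomes a dict, and "geometries" becomes a list of dicts. A short sketch using the sample response from the question as a literal string (in Python 3, urllib.request.urlopen(...).read() returns bytes, which json.loads also accepts from Python 3.6 on):

```python
import json

# Sample response from the question, as returned by .read()
ro2 = '{"geometries":[{"x":2488268.7116061845,"y":7667607.8963871095}]}'

data = json.loads(ro2)          # JSON object -> dict
point = data["geometries"][0]   # "geometries" is a list; take the first entry
x, y = point["x"], point["y"]   # JSON numbers -> float
print(x, y)
```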
I am going around in circles and have tried many different approaches, so I suspect my core understanding is wrong. I would be grateful for help in understanding my encoding/decoding issues.
import urllib2
result = urllib2.urlopen("https://graph.facebook.com/163146530455639")
rawdata = result.read().decode('utf-8')
print "HEADER: " + str(result.info())
print "I want this to work ", rawdata.find('http://www.facebook.com')
print "I dont want this to work ", rawdata.find('http:\/\/www.facebook.com')
I guess what I'm getting isn't UTF-8 even though the header seems to say it is. Or, as a Python newbie, I'm doing something dumb. :(
Thanks for any help,
Phil
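You're getting JSON back from Facebook, so the easiest thing to do is decode it with the built-in json module (included since Python 2.6; on older versions you'll have to install it separately, as simplejson). The raw text contains http:\/\/ because JSON allows a forward slash to be escaped as \/; decoding the JSON removes those escapes, so the encoding itself isn't the problem.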
import json
import urllib2
result = urllib2.urlopen("https://graph.facebook.com/163146530455639")
rawdata = result.read()
jsondata = json.loads(rawdata)  # loads() parses a string; load() expects a file object
print jsondata['link']
gives you:
u'http://www.facebook.com/GrosvenorCafe'
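The \/ escaping is why rawdata.find('http://www.facebook.com') returns -1 on the raw text. A minimal Python 3 sketch with a hypothetical one-field excerpt of such a response (the real Graph API payload has more fields):

```python
import json

# Hypothetical excerpt of the response body. In the JSON text every "/" in the URL
# is escaped as "\/" (the doubled backslashes below are Python string escaping).
raw = '{"link": "http:\\/\\/www.facebook.com\\/GrosvenorCafe"}'

print(raw.find("http://www.facebook.com"))      # -1: only the escaped form is present
print(raw.find("http:\\/\\/www.facebook.com"))  # 10: the escaped form is found

data = json.loads(raw)   # decoding removes the \/ escapes
link = data["link"]      # 'http://www.facebook.com/GrosvenorCafe'
```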
I capture the screen of my pygame program like this:
data = pygame.image.tostring(pygame.display.get_surface(),"RGB")
How can I convert it into a base64 string without saving it to the hard drive? It's important that nothing gets written to disk. I know I can save it to a file and then base64-encode the file, but I can't seem to encode it "on the fly".
Thanks
If you want, you can save it to a StringIO, which is basically a virtual file held in memory as a string.
However, I'd really recommend the base64 module, which has a function called base64.b64encode. It handles your "on the fly" requirement well.
Code example:
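import base64
import pygame
# Raw RGB pixel bytes of the current display surface (requires an initialized display)
data = pygame.image.tostring(pygame.display.get_surface(),"RGB")
base64data = base64.b64encode(data)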
Happy coding!
Actually, pygame.image.tostring() is a rather strange function: I don't really understand the binary string it returns, and I can't find anything that processes it correctly.
There seems to be an enhancement request for this on pygame's Bitbucket issue tracker:
https://bitbucket.org/pygame/pygame/issue/48/add-optional-format-argument-to
I got around it like this:
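import base64
import io
import pygame
data = io.BytesIO()  # in-memory file (cStringIO.StringIO on Python 2); nothing touches the disk
pygame.image.save(pygame.display.get_surface(), data)
data = base64.b64encode(data.getvalue())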
In the end you get a valid base64 string of an actual image file, and it seems to work. I'm not sure about the image format yet; I'll add more info tomorrow.
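For what it's worth, the string tostring() returns is raw pixel data: with "RGB" it is 3 bytes per pixel, one row after another, with no header recording the width, height or format. Its base64 encoding is valid, but it is not an image file, and whoever decodes it has to be told the dimensions separately. A pygame-free sketch with a made-up 2x2 image:

```python
import base64

# Made-up 2x2 image laid out like tostring(surface, "RGB"):
# 3 bytes (R, G, B) per pixel, one row after another, no header.
width, height = 2, 2
pixels = bytes([255, 0, 0,  0, 255, 0,       # row 1: red, green
                0, 0, 255,  255, 255, 255])  # row 2: blue, white

encoded = base64.b64encode(pixels)  # bytes; use .decode("ascii") for a str
decoded = base64.b64decode(encoded)

# Nothing in the buffer records the size, so width and height must travel
# alongside it; pygame.image.fromstring(decoded, (width, height), "RGB")
# could then rebuild the surface.
print(encoded)  # b'/wAAAP8AAAD/////'
```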