Convert string to correct charset

I'm trying to save Unicode data to an external web service.
When I try to save æ-ø-å, it gets saved as Ã¦-Ã¸-Ã¥ in the external system.
Edit:
(My firstname value is Jørn) (Value from django J\\xf8rn)
firstname.value=user_firstname = JÃ¸rn
Here is my result if I try to use encode:
firstname.value=user_firstname.encode('ascii', 'replace') = J?rn
firstname.value=user_firstname.encode('ascii', 'xmlcharrefreplace') = J&#248;rn
firstname.value=user_firstname.encode('ascii', 'backslashreplace') = J\xf8rn
firstname.value=user_firstname.encode('ascii', 'ignore') = I get a unicode error using ignore.
My form for updating a user:
def show_userform(request):
    if request.method == 'POST':
        form = UserForm(request.POST, request.user)
        if form.is_valid():
            u = UserProfile.objects.get(username = request.user)
            firstname = form.cleaned_data['first_name']
            lastname = form.cleaned_data['last_name']
            tasks.update_webservice.delay(user_firstname=firstname, user_lastname=lastname)
            return HttpResponseRedirect('/thank-you/')
    else:
        form = UserForm(instance=request.user) # An unbound form
    return render(request, 'myapp/form.html', {
        'form': form,
    })
Here is my task:
from suds.client import Client
@task()
def update_webservice(user_firstname, user_lastname):
    membermap = client.factory.create('ns2:Map')
    firstname = client.factory.create('ns2:mapItem')
    firstname.key="Firstname"
    firstname.value=user_firstname
    lastname = client.factory.create('ns2:mapItem')
    lastname.key="Lastname"
    lastname.value=user_lastname
    membermap.item.append(firstname)
    membermap.item.append(lastname)
    d = dict(CustomerId='xxx', Password='xxx', PersonId='xxx', ContactData=membermap)
    try:
        #Send updates to SetPerson function
        result = client.service.SetPerson(**d)
    except WebFault, e:
        print e
What do I need to do to make the data save correctly?

Your external system is interpreting your UTF-8 as if it were Latin-1, or maybe Windows-1252. That's bad.
Encoding or decoding ASCII is not going to help. Your string is definitely not plain ASCII.
If you're lucky, it's just that you're missing some option in that web service's API, with which you could tell it that you're sending it UTF-8.
If not, you've got quite a maintenance headache on your hands, but you can still fix what you get back. The web service took the string you encoded as UTF-8 and decoded it as Latin-1, so you just need to do the exact reverse of that:
user_firstname = user_firstname.encode('latin-1').decode('utf-8')
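To see why that one-liner works, here is a minimal sketch in Python 3 that reproduces the garbling and then reverses it. It assumes the web service really does decode the UTF-8 bytes as Latin-1; if it uses Windows-1252 instead, the codec names would need to change. The variable names are illustrative only.

```python
# The name is the one from the question; the variable names are illustrative.
original = "Jørn"

# What the web service appears to do: take UTF-8 bytes and decode them as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

# The reverse step from the answer: re-encode as Latin-1, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)   # the mojibake form seen in the external system
print(repaired)  # the original name again
```

The round trip is lossless here because Latin-1 maps every byte value to exactly one character, so no information is destroyed by the wrong decode.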

Use the encode and decode methods to convert between text and bytes.
For example:
x = "this is a test" # text containing only ASCII characters
x = x.encode("utf-8") # bytes, encoded as UTF-8
x = x.decode("utf-8") # decoded back to text

Related

Why do I obtain a string when serializing a QuerySet?

I'm using a js function to obtain some data from my django models. Concretely, I want to obtain the last value from my sensors.
I'm doing the following,
from django.core import serializers
def getData(request):
    ctx = {}
    if request.method == 'POST':
        select = int(request.POST['Select'])
        last_val = DevData.objects.order_by('dev_id','-data_timestamp').distinct('dev_id')
        data = serializers.serialize('json', last_val)
        print(data)
        print('****************')
        print(data[0]) # I just obtain a "[" then is a string not a list
        ctx = {'Select':data}
    return JsonResponse(ctx)
My question is: why is the output a string? How can I convert it to a JSON object and then pass it to my JS function?
Thank you very much!!

You obtain a string, because JSON is a text format. You can for example use json.loads to convert it back to a list of dictionaries:
from json import loads as jsonloads
from django.core import serializers
def getData(request):
    ctx = {}
    if request.method == 'POST':
        select = int(request.POST['Select'])
        last_val = DevData.objects.order_by('dev_id','-data_timestamp').distinct('dev_id')
        data = jsonloads(serializers.serialize('json', last_val))
        ctx = {'Select':data}
    return JsonResponse(ctx)
Django's JSON serializer uses a json.JSONEncoder subclass named DjangoJSONEncoder, which adds special handling for datetime objects and similar types.
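The difference between the serialized string and the parsed structure can be seen without Django at all. The sketch below uses a hand-written string shaped like the serializer's output; the model name and field names in it are made up for illustration.

```python
import json

# Hypothetical payload shaped like the output of serializers.serialize('json', ...);
# the model and field names are invented for this example.
serialized = '[{"model": "app.devdata", "pk": 1, "fields": {"dev_id": 7, "value": 21.5}}]'

# Indexing a string yields a single character, which is what the question observed.
print(type(serialized).__name__, serialized[0])

# json.loads turns the text into Python lists and dicts.
records = json.loads(serialized)
print(type(records).__name__, records[0]["fields"]["value"])
```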

Uploading Image in Django: 'utf-8' codec can't decode byte 0x89 in position 246: invalid start byte

I'm trying to upload images to my API using pure JSON, but when I send the image in request.FILES and the token in request.body, I get this error:
UnicodeDecodeError at /api/auth/set-profile-image/
'utf-8' codec can't decode byte 0x89 in position 246: invalid start byte
and it says:
Unicode error hint
The string that could not be encoded/decoded was: " �PNG
but I'm sending a JPG! :D
View.py
@csrf_exempt
def set_profile_image(request):
    if request.method == 'POST':
        request_info = json.loads(request.body)
        token = request_info.get('token')
        img = request.FILES['image']
        if token is not None and img is not None:
            user = User.objects.filter(token=token)
            if user.exists():
                user = user.get()
                form = UploadImage(request.POST, request.FILES['image'])
                if form.is_valid():
                    user.profile_image = form.cleaned_data['image']
                    user.save()
                    response = {
                        'status_code': 200,
                        'image_set': True,
                        'image_url': user.profile_image.url
                    }
Form.py
from django import forms
class UploadImage(forms.Form):
    image = forms.ImageField()
and my test code is:
import requests
data = {
    'token': 'helloworld1--_aFsV-ZVG9lVpi0KSydrx3pG3TSMPqqHVKWD2Yc8bE'
}
url = 'http://localhost:8000/api/auth/set-profile-image/'
with open('test.jpg', 'rb') as img:
    response = requests.post(
        url=url,
        data=data,
        files={
            'image': img
        }
    )
# print(response.text)
f = open('test.html', 'wb')
f.write(response.content)
f.close()
I'm using ImageField as my db field in models.py
Warmest Regards,
Poor Maintenance guy who's stuck in others code!

Image files contain arbitrary bytes that are not valid UTF-8, so they cannot be placed directly in a text payload. If you need to transport image data this way, first encode it into something that is safe as text, such as base64.
To do this in your Python sender, you can use something like base64.b64encode(img.read()), where img is the file opened in binary mode.
To decode it again on the other side, use base64.b64decode() on the received value.
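A minimal sketch of that round trip, independent of Django and requests, might look like this. The token value and the key names are placeholders, and the byte string stands in for the contents of a real image file.

```python
import base64
import json

# Stand-in for the bytes of an image file (this is the start of a PNG header);
# in practice they would come from open(path, 'rb').read().
image_bytes = b"\x89PNG\r\n\x1a\n"

# Sender: base64 output is ASCII-only, so it fits safely inside a JSON string.
encoded_image = base64.b64encode(image_bytes).decode("ascii")
payload = json.dumps({"token": "example-token", "image": encoded_image})

# Receiver: parse the JSON, then reverse the base64 step to get the raw bytes back.
received = json.loads(payload)
restored = base64.b64decode(received["image"])

print(restored == image_bytes)
```

Base64 makes the payload roughly a third larger, which is the price of sending binary data through a text-only channel.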

OK, after some tests, and thanks to Jacob See's answer, I found a solution.
First I encode my image as base64 and send it through the API like this:
with open('test.jpg', 'rb') as img:
    data = {
        "token": "some token"
    }
    response = requests.post(
        url=url,
        data={
            'data': json.dumps(data),
            'image': b64encode(img.read())
        }
    )
And in my API I do this:
new_base64_img = request.POST.get('image', None)
try:
    raw_img = b64decode(new_base64_img)  # decode once and reuse the bytes
    image = Image.open(BytesIO(raw_img))
    image_content = ContentFile(raw_img)
    file_name = "some file name" + image.format
    user.profile_image.save(file_name, image_content, save=True)
    # JsonResponse expects a dict by default, not a set literal
    return JsonResponse({"status": "OK"})
except IOError:
    return JsonResponse({"error": "Image should be base64 encoded!"})
Problem Solved :D

Decode 'quoted-printable' in python

I want to decode 'quoted-printable' encoded strings in Python, but I seem to be stuck at a point.
I fetch certain mails from my gmail account based on the following code:
import imaplib
import email
import quopri
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('mail@gmail.com', '*******')
mail.list()
mail.select('"[Gmail]/All Mail"')
typ, data = mail.search(None, 'SUBJECT', '"{}"'.format('123456'))
data[0].split()
print(data[0].split())
for e_mail in data[0].split():
    typ, data = mail.fetch('{}'.format(e_mail.decode()),'(RFC822)')
    raw_mail = data[0][1]
    email_message = email.message_from_bytes(raw_mail)
    if email_message.is_multipart():
        for part in email_message.walk():
            if part.get_content_type() == 'text/plain':
                body = part.get_payload()
                to = email_message['To']
                utf = quopri.decodestring(to)
                text = utf.decode('utf-8')
                print(text)
.
.
.
If I print 'to', for example, and it contains characters like é, á, ó, the result is:
=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=
I can decode the 'body' quoted-printable encoded string successfully using the quopri library as such:
quopri.decodestring(sometext).decode('utf-8')
But the same logic doesn't work for other parts of the e-mail, such as the To, From and Subject headers.
Does anyone have a hint?

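The header string you have is not plain quoted-printable, so quopri cannot handle it directly. It is an RFC 2047 encoded word: the =?...?= wrapper names the charset (UTF-8) and the transfer encoding, where B means base64 and Q means a quoted-printable variant. You can decode it with the standard library: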
from email.header import decode_header
result = decode_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=')
# ^ the result is a list of tuples of the form [(decoded_bytes, encoding),]
for data, encoding in result:
    print(data.decode(encoding))
    # outputs: Péter Petőcz
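Headers that are not encoded come back from decode_header with an encoding of None, so the loop above would fail on them. One way to cover both cases, sketched here with a hypothetical helper name, is to let make_header from the same module reassemble the parts:

```python
from email.header import decode_header, make_header

def decode_mail_header(raw_header):
    """Return a header value as plain text, decoding any RFC 2047 encoded words."""
    # decode_header splits the value into (data, charset) pairs;
    # make_header joins them again, applying each part's own charset.
    return str(make_header(decode_header(raw_header)))

print(decode_mail_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?='))
print(decode_mail_header('plain@example.com'))
```

The second call shows that an unencoded header passes through unchanged.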

You are trying to decode latin characters using utf-8. The output you are getting is base64. It reads:
No printable characters found, try another source charset, or upload your data as a file for binary decoding.
Give this a try.
Python: Converting from ISO-8859-1/latin1 to UTF-8

This solves it:
from email.header import decode_header
def mail_header_decoder(header):
    """Decode an RFC 2047 encoded mail header into a plain string."""
    if header is None:
        return None
    parts = decode_header(header)
    # No part carries a charset: the header was not encoded at all.
    if all(charset is None for _, charset in parts):
        return header
    decoded = []
    for data, charset in parts:
        # Encoded parts arrive as bytes; decode each with its own charset.
        if isinstance(data, bytes):
            data = data.decode(charset or 'ascii')
        decoded.append(data)
    return ''.join(decoded)

python unicode # -*- coding: utf-8 -*- not doing the job...where to encode? I think I need to remove str and encode

Hello, this is what I want to do: I simply want to replace the word "with" here with its non-English translation.
return "<a href='%(verify_read)s?next=%(target_url)s'>%(sender)s %(verb)s %(target)s with %(action)s</a>" %context
I'm unable to use non-English text in the return value, and I don't know why. I'm using Django, and I have put # -*- coding: utf-8 -*-
at the top of my Python file.
This is my full code:
def __unicode__(self):
    target_url = self.target_object.get_absolute_url()
    context = {
        "sender":self.sender_object,
        "verb":self.verb,
        "action":self.action_object,
        "target":self.target_object,
        "verify_read": reverse("notifications_read", kwargs={"id": self.id}),
        "target_url":target_url,
    }
    if self.target_object:
        if self.action_object and target_url:
            return "%(sender)s %(verb)s <a href='%(verify_read)s?next=%(target_url)s'>%(target)s</a> with %(action)s" %context
        if self.action_object and not target_url:
            return "%(sender)s %(verb)s %(target)s with %(action)s" %context
        return "%(sender)s %(verb)s %(target)s" %context
    return "%(sender)s %(verb)s" %context
@property
def get_link(self):
    try:
        target_url = self.target_object.get_absolute_url()
    except:
        target_url = reverse("notifications_all")
    context = {
        "sender": self.sender_object,
        "verb": self.verb,
        "action": self.action_object,
        "target": self.target_object,
        "verify_read": reverse("notifications_read", kwargs={"id": self.id}),
        "target_url": target_url,
    }
    if self.target_object:
        return "<a href='%(verify_read)s?next=%(target_url)s'>%(sender)s %(verb)s %(target)s with %(action)s</a>" %context
    else:
        return "<a href='%(verify_read)s?next=%(target_url)s'>%(sender)s %(verb)s</a>" %context
Edit: I had a similar problem, and I solved it by removing str and encoding instead:
@login_required
def get_notifications_ajax(request):
    if request.is_ajax() and request.method == "POST":
        notifications = Notification.objects.all_for_user(MyProfile.objects.get(user=request.user)).recent()
        count = notifications.count()
        notes = []
        for note in notifications:
            notes.append(note.get_link.encode('utf-8'))
        data = {
            "notifications": notes,
            "count": count,
        }
        print data
        json_data = json.dumps(data)
        print json_data
        return HttpResponse(json_data, content_type='application/json')
    else:
        raise Http404
I encoded this line:
notes.append(note.get_link.encode('utf-8'))
I think I need to do something similar here, but I don't know how.

The __unicode__() magic method MUST return a unicode string (an instance of the unicode type), but you are returning a byte string (an instance of the str type).
Adding the "# coding" mark at the top of your code won't turn byte strings into unicode ones. It only tells Python that your byte string literals are UTF-8 encoded, but they are still byte strings.
The solution is dead simple: make sure you return a unicode string. First make sure each and every string in your context is unicode, then make all your literal strings unicode too by prefixing them with a u, i.e.:
return u"%(sender)s %(verb)s <a href='%(verify_read)s?next=%(target_url)s'>%(target)s</a> with %(action)s" % context
# ...
return u"%(sender)s %(verb)s %(target)s with %(action)s" % context
# ...
return u"%(sender)s %(verb)s %(target)s" % context
# ...
return u"%(sender)s %(verb)s" % context
If you don't grasp the difference between a unicode string and a UTF-8 encoded byte string, you definitely want to read this: http://www.joelonsoftware.com/articles/Unicode.html

python URL handling issue

I am trying to write a Python web app that takes some SQL and a bunch of other things and returns a JSON file. The latter part is not the issue, and I have not even put it in the script yet. The issue is that the URL being passed is UTF-8 encoded and then URL encoded,
turning our example
query :SELECT + ;
test: 2
into
test=2&query=SELECT+%2B+%3B
This seems to be OK,
but the receiving GET handler seems to think that it can expand the codes back into characters,
and it receives
test=2&query=SELECT+++;
This is then URL decoded, which chops off the semicolon, and I want to keep the semicolon!
It also turns the +'s, which rightly stand for spaces, into spaces, but the previous bug turned the real plus code into a literal plus, which then also becomes a space!
{'test': '2', 'query': 'SELECT '}
code is as follows:
#!/usr/bin/python
import web
import psycopg2
import re
import urllib
import urlparse
urls = (
    '/query', 'query',
    '/data/(.*)', 'data'
)
app = web.application(urls, globals())
render = web.template.render('templates/')
class query:
    def GET(self):
        return render.query()
    def POST(self):
        i = web.input()
        data = {}
        data['query'] = i.sql.encode('utf-8')
        data['test'] = '2'
        murl = urllib.urlencode(data)
        return "go!"
class data:
    def GET(self, urlEncodedDict):
        print "raw type:", type(urlEncodedDict)
        print "raw:", urlEncodedDict
        urlEncodedDict = urlEncodedDict.encode('ascii', 'ignore')
        print "ascii type:", type(urlEncodedDict)
        print "ascii:", urlEncodedDict
        data = dict(urlparse.parse_qsl(urlEncodedDict, 1)) #bad bit
        print "dict:", data
        print "element:", data['query']
        if ( re.match('SELECT [^;]+ ;', data['query'])):
            return 'good::'+data['query']
        else:
            return 'Bad::'+data['query']
if __name__ == "__main__":
    app.run()
Url generated from my test form is:
http://localhost:8080/data/test=2&query=SELECT+%2B+%3B
Output is as follows:
raw type: <type 'unicode'>
raw: test=2&query=SELECT+++;
ascii type: <type 'str'>
ascii: test=2&query=SELECT+++;
dict: {'test': '2', 'query': 'SELECT '}
element: SELECT
127.0.0.1:53272 - - [16/Nov/2012 11:05:44] "HTTP/1.1 GET /data/test=2&query=SELECT+++;" - 200 OK
127.0.0.1:53272 - - [16/Nov/2012 11:05:44] "HTTP/1.1 GET /favicon.ico" - 404 Not Found
I wish to get the same dict out of the GET that I encoded in the first place.

If you want to pass data into a GET request, you need to use the query string syntax using the question mark character [?] as a delimiter.
The URL should be:
http://localhost:8080/data/?test=2&query=SELECT+%2B+%3B
After that, you just have to use web.input() to get a dictionary with all arguments already decoded.
urls = (
    '/query', 'query',
    '/data/', 'data'
)
[...]
class data:
    def GET(self):
        data = web.input()
        print "dict:", data
        print "element:", data['query']
        if ( re.match('SELECT [^;]+ ;', data['query'])):
            return 'good::'+data['query']
        else:
            return 'Bad::'+data['query']
Result:
dict: <Storage {'test': u'2', 'query': u'SELECT + ;'}>
element: SELECT + ;
127.0.0.1:44761 - - [16/Nov/2012 15:06:06] "HTTP/1.1 GET /data/" - 200 OK
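The encode and decode steps can also be checked in isolation with the standard library. This is a sketch in Python 3, where the functions live in urllib.parse (the question's Python 2 code imports them from urllib and urlparse); it uses the same example values as the question.

```python
from urllib.parse import parse_qsl, urlencode

# Same example values as in the question.
params = {"test": "2", "query": "SELECT + ;"}

# Encoding: spaces become '+', a literal '+' becomes %2B and ';' becomes %3B.
query_string = urlencode(params)
print(query_string)

# Decoding the still-encoded query string restores the original values.
decoded = dict(parse_qsl(query_string, keep_blank_values=True))
print(decoded)
```

The round trip only works when parse_qsl sees the string exactly once in its encoded form. In the question, the path segment had already been percent-decoded before parse_qsl ran, so the decoding was effectively applied twice.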
