Sending UTF-8 formatted emojis from Android to Python API - python

I have been trying to send emojis through POST requests to my server (Python server side) to store them in a database. I get the full string and convert it to UTF-8, but the problem is that some emojis are sent fine while others throw an error on the server side: Incorrect string value: '\\xF0\\x9F\\x8E\\xAE
I think this is because some emojis are percent-encoded as %E2%9D%A4%EF%B8%8F when sent, like ❤️, while others are encoded as %F0%9F%8E%AE, like 🎮.
I have tested the requests through Postman and the red heart one works, but the ones with four codes, like the gamepad, don't, and I see that error.
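For reference, the difference between the two encodings can be reproduced with a quick Python check (illustrative only): the heart is two 3-byte UTF-8 sequences, while the gamepad is a single 4-byte sequence.
from urllib.parse import quote

# ❤️ is U+2764 + U+FE0F: two 3-byte UTF-8 sequences, so plain MySQL "utf8" columns accept it.
# 🎮 is U+1F3AE: a single 4-byte UTF-8 sequence, which needs utf8mb4.
print(quote("❤️"))                 # %E2%9D%A4%EF%B8%8F
print(quote("🎮"))                 # %F0%9F%8E%AE
print(len("🎮".encode("utf-8")))   # 4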
Here is the error from the Python Django API (I also captured the request in Postman):
OperationalError at /api/addcomment
(1366, "Incorrect string value: '\\xF0\\x9F\\x8E\\xAE' for column 'text' at row 1")
Django Version: 2.2.5
Exception Type: OperationalError
Exception Value:
(1366, "Incorrect string value: '\\xF0\\x9F\\x8E\\xAE' for column 'text' at row 1")
Exception Location: /var/www/vhosts/*/httpdocs/pythonvenv/lib/python3.5/site-packages/MySQLdb/connections.py in query, line 226
Python Executable: /var/www/vhosts/*/httpdocs/pythonvenv/bin/python
Python Version: 3.5.2
Python Path:
['/var/www/vhosts/*/httpdocs/pythonvenv/bin',
'/var/www/vhosts/*/httpdocs/app/app',
'/var/www/vhosts/*/httpdocs/app',
'/var/www/vhosts/*/httpdocs',
'/usr/share/passenger/helper-scripts',
'/var/www/vhosts/*/httpdocs/pythonvenv/lib/python35.zip',
'/var/www/vhosts/*/httpdocs/pythonvenv/lib/python3.5',
'/var/www/vhosts/*/httpdocs/pythonvenv/lib/python3.5/plat-x86_64-linux-gnu',
'/var/www/vhosts/*/httpdocs/pythonvenv/lib/python3.5/lib-dynload',
'/usr/lib/python3.5',
'/usr/lib/python3.5/plat-x86_64-linux-gnu',
'/var/www/vhosts/*/httpdocs/pythonvenv/lib/python3.5/site-packages']
I have replaced the original URL with *.
For more info: in phpMyAdmin I cannot insert those emojis either (the four-code ones like the gamepad) via the SQL or Insert tab, but I can insert the six-code ones like the red heart. I have tried several utf8 and utf8mb4 collations for both the column and the table.
This happens whether or not the database, table and column are set to utf8mb4.
Any help? Thanks!

Both of these need to be set to utf8mb4:
The column charset
The database connection charset
The first one determines what strings can be stored in the column. The second determines the character set for string literals. (Oddly, if you put a 4-byte UTF-8 sequence in a string literal, MySQL can still think it's "3-byte utf8" and doesn't give an error until you try to use it)
To find if the database connection charset is the problem, you can try setting the character set on the string literal explicitly. If this works, the column encoding is fine, but the connection isn't:
insert into demo_table set `text` = _utf8mb4'🎮';
You seem to be using Django. I don't know much about Django but it looks like the connection encoding is set somewhere in the database connection options. Going by https://chriskief.com/2017/06/18/django-and-mysql-emoticons/ :
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
        'OPTIONS': {'charset': 'utf8mb4'},
    }
}
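To confirm the setting took effect, one quick check (a minimal sketch, assuming the Django MySQL backend) is to ask the server which connection charset was actually negotiated:
from django.db import connection

with connection.cursor() as cursor:
    # Reports the charset MySQL negotiated for this connection.
    cursor.execute("SHOW VARIABLES LIKE 'character_set_connection'")
    print(cursor.fetchone())   # expected: ('character_set_connection', 'utf8mb4')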

Related

Best practices when inserting a JSON variable into a MySQL table column of type JSON, using Python's pymysql library

I have a Python script that uses PyMySQL to connect to a MySQL database and insert rows there. Some of the columns in the database table are of type json.
I know that in order to insert a JSON value, we can run something like:
my_json = {"key" : "value"}
insert_query = """INSERT INTO my_table (my_json_column) VALUES ('%s')""" % (json.dumps(my_json))
cursor = connection.cursor()
cursor.execute(insert_query)
connection.commit()
The problem in my case is that the JSON is a variable over which I do not have much control (it comes from an API call to a third-party endpoint), so my script keeps throwing new errors for JSON values it cannot handle.
For example, the JSON could very well contain a stringified JSON as a value, so my_json would look like:
{"key": "{\"key_str\":\"val_str\"}"}
→ In this case, running the usual insert script would throw a [ERROR] OperationalError: (3140, 'Invalid JSON text: "Missing a comma or \'}\' after an object member." at position 1234 in value for column \'my_table.my_json_column\'.')
Another example is JSON values that contain a single quotation mark in some of the values, something like:
{"key" : "Here goes my value with a ' quotation mark"}
→ In this case, the usual insert script returns an error similar to the one below, unless I manually escape those single quotation marks in the script by replacing them.
[ERROR] ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'key': 'Here goes my value with a ' quotation mark' at line 1")
So my question is the following:
Are there any best practices that I might be missing, and that I can use to avoid my script breaking in the two scenarios mentioned above, but also with any other JSON that might break the insert query?
I read some existing posts like this one here or this one, where it's recommended to insert the JSON into a string or a blob column, but I'm not sure whether that's good practice, or whether other issues (like string length limitations, for example) might arise from using a string column instead of json.
Thanks!
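For what it's worth, both failure modes described above go away if the driver does the quoting itself via a parameterized query rather than interpolating json.dumps(...) into the SQL string. A minimal sketch, assuming a PyMySQL connection (connection details are placeholders; my_table and my_json_column are from the question):
import json
import pymysql

# Connection details are placeholders.
connection = pymysql.connect(host="localhost", user="user",
                             password="password", database="mydb")

# Values with embedded quotes or stringified JSON no longer break the statement,
# because the driver escapes the bound parameter itself.
my_json = {"key": "Here goes my value with a ' quotation mark"}

with connection.cursor() as cursor:
    cursor.execute(
        "INSERT INTO my_table (my_json_column) VALUES (%s)",
        (json.dumps(my_json),),
    )
connection.commit()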

Flask SQLAlchemy can't insert emoji to MySQL

I'm using Python 2.7 and the Flask framework with the flask-sqlalchemy module.
I always get the following exception when trying to insert: Exception Type: OperationalError. Exception Value: (1366, "Incorrect string value: \xF09...
I already set MySQL database, table and corresponding column to utf8mb4_general_ci and I can insert emoji string using terminal.
Flask's app config already contains app.config['MYSQL_DATABASE_CHARSET'] = 'utf8mb4', however it doesn't help at all and I still get the exception.
Any help is appreciated
Maybe this will help someone in the future:
All I did was edit the SQL connection string in my config file:
SQLALCHEMY_DATABASE_URI = 'mysql://user:password@localhost/database?charset=utf8mb4'
This way I'm able to store emojis without altering the database or tables.
Source: https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-iv-database/page/13 (comment #322)
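For context, that URI typically goes into the Flask config before the SQLAlchemy object is created (a minimal sketch; the credentials and database name are placeholders):
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# The charset parameter makes the MySQL driver negotiate utf8mb4 for the connection.
app.config['SQLALCHEMY_DATABASE_URI'] = (
    'mysql://user:password@localhost/database?charset=utf8mb4'
)
db = SQLAlchemy(app)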
It works for me:
import pickle
data = request.get_json().get("data")
data = pickle.dumps(data)
Then you can insert "data" into the database.
You can send "data" like "😢" ... whatever emoji you like.
Next time, when you get "data" from the database, you should do:
data = pickle.loads(data)
Then you get "data" back as "😢".
Add 'charset' => 'utf8mb4' to your main config file.
You have to edit the field in which you want to store emoji and set its collation to utf8mb4_unicode_ci.
Make sure to use a proper Python Unicode object, like the ones created with the u"..." literal. In other words, the type of your object should be unicode, not str:
>>> type('ą')
<type 'str'>
>>> type(u'ą')
<type 'unicode'>
Please note that this only applies to Python 2; in Python 3 all string literals are Unicode by default.

Trying to save special characters in MySQL DB

I have a string that looks like this: 🔴Use O Mozilla Que Não Trava! Testei! $vip ou $apoio
When I try to save it to my database with ...SET description = %s... and cursor.execute(sql, description), it gives me an error:
Warning: (1366, "Incorrect string value: '\xF0\x9F\x94\xB4Us...' for column 'description' ...
Assuming this is an ASCII symbol, I tried description.decode('ascii') but this leads to
'str' object has no attribute 'decode'
How can I determine what encoding it is, and how can I store something like that in the database? The database is UTF-8 encoded, if that is important.
I am using Python3 and PyMySQL.
Any hints appreciated!
First, you need to make sure the table column has the correct character set. If it is "latin1", you will not be able to store content that contains Unicode characters.
You can use the following query to determine the column character set:
SELECT CHARACTER_SET_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='your_database_name' AND TABLE_NAME='your_table_name' AND COLUMN_NAME='description'
Follow the MySQL documentation here if you want to change the column character set.
Also, you need to make sure the character set is properly configured for the MySQL connection. Quoted from the MySQL docs:
Character set issues affect not only data storage, but also communication between client programs and the MySQL server. If you want the client program to communicate with the server using a character set different from the default, you'll need to indicate which one. For example, to use the utf8 Unicode character set, issue this statement after connecting to the server:
SET NAMES 'utf8';
Once the character set setting is correct, you will be able to execute your SQL statement. There is no need to encode or decode on the Python side; that is used for different purposes.
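With PyMySQL (which the asker is using), the connection character set can also be set directly in connect(), which has the same effect as issuing SET NAMES after connecting. A minimal sketch (host, credentials and table name are placeholders):
import pymysql

# charset="utf8mb4" covers emoji and other 4-byte UTF-8 characters.
connection = pymysql.connect(host="localhost", user="user",
                             password="password", database="mydb",
                             charset="utf8mb4")

description = "🔴Use O Mozilla Que Não Trava! Testei! $vip ou $apoio"
with connection.cursor() as cursor:
    # my_table is a placeholder; the column itself must also be utf8mb4.
    cursor.execute("INSERT INTO my_table (description) VALUES (%s)", (description,))
connection.commit()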

Unable to enter data into SQL DB

I am trying to insert values into a SQL DB, pulling the data from a dictionary. I ran into a problem when my program tries to enter 0xqb_QWQDrabGr7FTBREfhCLMZLw4ztx into a column named VersionId. The following is my sample code and error.
cursor.execute("""insert into [TestDB].[dbo].[S3_Files] ([Key],[IsLatest],[LastModified],[Size(Bytes)],[VersionID]) values (%s,%s,%s,%s,%s)""",(item['Key'],item['IsLatest'],item['LastModified'],item['Size'],item['VersionId']))
conn_db.commit()
pymssql.ProgrammingError: (102, "Incorrect syntax near 'qb_QWQDrabGr7FTBREfhCLMZLw4ztx'.DB-Lib error message 20018, severity 15:\nGeneral SQL Server error: Check messages from the SQL Server\n")
Based on the error, I assume SQL does not like the 0x at the beginning of the VersionId string, perhaps for security reasons. If my assumption is correct, what are my options? I also cannot change the value of the VersionId.
Edit: This is what I get when I print that cursor command:
insert into [TestDB].[dbo].[S3_Files] ([Key],[IsLatest],[LastModified],[Size(Bytes)],[VersionID]) values (Docs/F1/Trades/Buy/Person1/Seller_Provided_-_Raw_Data/GTF/PDF/GTF's_v2/NID3154229_23351201.pdf,True,2015-07-22 22:05:38+00:00,753854,0xqb_QWQDrabGr7FTBREfhCLMZLw4ztx)
Edit 2: The odd thing is that when I try to enter the insert command manually in SQL Server Management Studio, it doesn't like the (') in the path name in the first parameter, so I escaped that character, added (') around each value except the number, and the command worked. At this point I am pretty stumped as to why the insert is not working.
Edit 3: I decided to do a try/except around every insert, and I see that the VersionIds that get caught all match the pattern 0x... Again, does anyone know if my security assumption is correct?
I guess that's what happens when our libraries try to be smarter than us...
No SQL Server around to test, but I assume the reason the 0x values are failing is that the way pymssql passes the parameter causes the server to interpret it as a hexadecimal literal, and the 'q' following the '0x' does not fit its expectation of 0-9 and A-F characters.
I don't have enough information to know whether this is a library bug and/or whether it can be worked around; the pymssql documentation is not very extensive, but I would try the following:
if you can, check in MSSQL Profiler what command is actually coming in
build your own command as a string and see if the error persists (see the sketch after this list, and remember Bobby Tables before putting that in production: https://xkcd.com/327/)
try to work around it by adding quotes etc
switch to another library / use SQLAlchemy
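As a rough illustration of the second bullet (debugging only, not for production; item, cursor and conn_db are the ones from the question):
# Build the statement as plain text so you can see exactly what reaches the server.
def quote(value):
    # T-SQL string literal: wrap in single quotes, double any embedded quotes.
    return "'" + str(value).replace("'", "''") + "'"

sql = (
    "insert into [TestDB].[dbo].[S3_Files] "
    "([Key],[IsLatest],[LastModified],[Size(Bytes)],[VersionID]) "
    "values ({},{},{},{},{})"
).format(quote(item['Key']), quote(item['IsLatest']), quote(item['LastModified']),
         quote(item['Size']), quote(item['VersionId']))

print(sql)          # compare with what MSSQL Profiler shows
cursor.execute(sql)
conn_db.commit()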

How to read national characters (>127) from US7ASCII Oracle using Python cx_Oracle?

I have a problem displaying national characters from an "ENGLISH_UNITED KINGDOM.US7ASCII" Oracle 11 database using Python 3.3, cx_Oracle 5.1.2 and the "NLS_LANG" environment variable.
The table column type is "VARCHAR2(2000 BYTE)".
How can I display the string "£aÀÁÂÃÄÅÆÇÈ" from Oracle US7ASCII in Python? This will be some sort of hack.
The hack works in every other scripting language (Perl, PHP, PL/SQL) and in Python 2.7, but it does not work in Python 3.3.
In Oracle 11 Database I created SECURITY_HINTS.ANSWER="£aÀÁÂÃÄÅÆÇÈ". ANSWER column type is "VARCHAR2(2000 BYTE)".
Now when using cx_Oracle and default NLS_LANG, I get "¿a¿¿¿¿¿¿¿¿¿"
and when using NLS_LANG="ENGLISH_UNITED KINGDOM.US7ASCII" I get
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)"
Update1
I made some progress. When switching to Python 2.7 and cx_Oracle 5.1.2 for Python 2.7, the problem goes away (I get all >127 characters from the db). In Python 2 strings are represented as bytes, while in Python 3+ strings are represented as Unicode. I still need the best possible solution for Python 3.3.
Update2
One possible solution to the problem is to use rawtohex(utl_raw.cast_to_raw(...)); see the code below.
cursor.execute("select rawtohex(utl_raw.cast_to_raw(ANSWER)) from security_hints where userid = '...'")
for rawValue in cursor:
    print(''.join(['%c' % iterating_var for iterating_var in binascii.unhexlify(rawValue[0])]))
The source code of my script is below, or on GitHub and GitHub Solution.
import os
# (excerpt; get_connection() is defined elsewhere in the full script)

def test_nls(nls_lang=None):
    print(">>> run test_nls for %s" % (nls_lang))
    if nls_lang:
        os.environ["NLS_LANG"] = nls_lang
        os.environ["ORA_NCHAR_LITERAL_REPLACE"] = "TRUE"
    connection = get_connection()
    cursor = connection.cursor()
    print("version=%s\nencoding=%s\tnencoding=%s\tmaxBytesPerCharacter=%s" % (connection.version, connection.encoding,
                                                                              connection.nencoding, connection.maxBytesPerCharacter))
    cursor.execute("SELECT USERENV ('language') FROM DUAL")
    for result in cursor:
        print("%s" % (result))
    cursor.execute("select ANSWER from SECURITY_HINTS where USERID = '...'")
    for rawValue in cursor:
        print("query returned [%s]" % (rawValue))
        answer = rawValue[0]
        str = ""
        for iterating_var in answer:
            str = ("%s [%d]" % (str, ord(iterating_var)))
        print("str %s" % (str))
    cursor.close()
    connection.close()

if __name__ == '__main__':
    test_nls()
    test_nls(".AL32UTF8")
    test_nls("ENGLISH_UNITED KINGDOM.US7ASCII")
see log output below.
run test_nls for None
version=11.1.0.7.0
encoding=WINDOWS-1252 nencoding=WINDOWS-1252 maxBytesPerCharacter=1
ENGLISH_UNITED KINGDOM.US7ASCII
query returned [¿a¿¿¿¿¿¿¿¿¿]
str [191] [97] [191] [191] [191] [191] [191] [191] [191] [191] [191
run test_nls for .AL32UTF8
version=11.1.0.7.0
encoding=UTF-8 nencoding=UTF-8 maxBytesPerCharacter=4
AMERICAN_AMERICA.US7ASCII
query returned [�a���������]
str [65533] [97] [65533] [65533] [65533] [65533] [65533] [65533] [65533] [65533] [65533]
run test_nls for ENGLISH_UNITED KINGDOM.US7ASCII
version=11.1.0.7.0
encoding=US-ASCII nencoding=US-ASCII maxBytesPerCharacter=1
ENGLISH_UNITED KINGDOM.US7ASCII
Traceback (most recent call last):
File "C:/dev/tmp/Python_US7ASCII_cx_Oracle/showUS7ASCII.py", line 71, in <module>
test_nls("ENGLISH_UNITED KINGDOM.US7ASCII")
File "C:/dev/tmp/Python_US7ASCII_cx_Oracle/showUS7ASCII.py", line 55, in test_nls
for rawValue in cursor:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)
I am trying to display it in a Django web page, but each character comes back as character code 191 or 65533.
I looked at
choosing NLS_LANG for Oracle and
Importing from Oracle using the correct encoding with Python
Cannot Insert Unicode Using cx-Oracle
If you want to get the unchanged ASCII string in the client application, the best way is to transfer it from the DB in binary mode. So the first conversion must be done on the server side with the help of the UTL_RAW package and the standard rawtohex function.
Your select in cursor.execute may look like that:
select rawtohex(utl_raw.cast_to_raw(ANSWER)) from SECURITY_HINTS where USERID = '...'
On the client you get a string of hexadecimal characters, which can be converted back to a string representation with the help of the binascii.unhexlify function:
for rawValue in cursor:
    print("query returned [%s]" % (binascii.unhexlify(rawValue[0])))
P.S. I don't know Python, so the last statement may be incorrect.
I think you should not resort to such evil trickery. NLS_LANG should simply be set to the client's default encoding. Look at more solid options instead:
Extend the character set of the database to allow these characters in a VARCHAR column.
Upgrade this particular column to NVARCHAR. You could perhaps use a new name for this column and create a VARCHAR computed column with the old name for the legacy applications to read.
Keep the database as is, but check the data when it gets entered and replace all non-ASCII characters with an acceptable ASCII equivalent (see the sketch below).
Which option is best depends on how common the non-ASCII characters are. If there are more tables with the same issue, I'd suggest option 1. If this is the only table, option 2. If there are only a couple of non-ASCII characters in the entire table, and their loss is not that big a deal: option 3.
One of the tasks of a database is to preserve the quality of your data, after all, and if you cheat by forcibly inserting illegal characters into the column, it cannot do its job properly, and each new client, upgrade or export will come with interesting new undefined behavior.
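A rough sketch of option 3, assuming the replacement happens in the Python layer before the insert (the helper name is illustrative):
import unicodedata

def to_ascii(text):
    # Decompose accented characters (e.g. À -> A + combining grave accent),
    # then drop anything that still falls outside the ASCII range.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("£aÀÁÂÃÄÅÆÇÈ"))  # -> 'aAAAAAACE' (£ and Æ have no ASCII decomposition)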
EDIT: See Oracle's comment on an example of a similar setup in the NLS_LANG faq (my emphasis):
A database is created on a UNIX system with the US7ASCII character set. A Windows client connecting to the database works with the WE8MSWIN1252 character set (regional settings -> Western Europe / ACP 1252) and the DBA uses the UNIX shell (ROMAN8) to work on the database. The NLS_LANG is set to american_america.US7ASCII on the clients and the server.
Note: This is an INCORRECT setup to explain character set conversion, don't use it in your environment!
