I am using the Python mysql-connector module to insert Unicode code point 128049 (U+1F431) into a MariaDB SQL table.
My SQL table is defined as:
show create table t1;
CREATE TABLE `t1` (
`c1` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
And the python code is:
import mysql.connector as db
conn = db.connect(sql_mode = 'STRICT_ALL_TABLES')
curs = conn.cursor(prepared = True)
curs.execute('insert into t1 (c1) values (%s)', (chr(128049),))
Since this is a plane 1 Unicode value it needs 4 bytes in UTF-8, but changing the table and column to utf8mb4 as suggested here didn't work.
The error I'm getting is:
Incorrect string value: '\xF0\x9F\x90\xB1' for column 'c1' at row 1
The string being inserted looks correct when compared to:
chr(128049).encode('utf-8')
The sql_mode for this version of MariaDB is not strict by default. The insert works when I do not specify strict mode, but the character is converted to the replacement character '?'.
I can't figure out why MySQL thinks this is an invalid string.
I am connecting to MariaDB 10.1.9 via mysql-connector 2.1.4 in Python 3.6.1.
The connection needs to specify utf8mb4. Or SET NAMES utf8mb4. This is to specify the encoding of the client's bytes.
🐱 is a 4-byte Emoji.
More Python tips: http://mysql.rjweb.org/doc.php/charcoll#python
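For example, a minimal sketch of that first approach with mysql-connector; the host, credentials, and database name are placeholders for your own:
import mysql.connector as db

# charset='utf8mb4' declares the encoding of the client's bytes,
# so 4-byte characters survive the round trip.
conn = db.connect(host='localhost', user='user', password='secret',
                  database='test', charset='utf8mb4',
                  sql_mode='STRICT_ALL_TABLES')
curs = conn.cursor(prepared=True)
curs.execute('insert into t1 (c1) values (%s)', (chr(128049),))
conn.commit()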
Rick James' answer is correct. From it I was able to create a solution that worked for me.
SET NAMES 'utf8mb4';
This sets 3 session variables, as seen here. The only issue is that it only sets session variables, so you have to issue this command for every connection.
It doesn't appear possible to set those 3 variables in the mysqld group of the my.cnf file (I believe this is because they cannot be set on the command line; note the missing command-line detail in their definitions here).
Instead I set the init_file option in the mysqld group of the my.cnf options file.
[mysqld]
init_file=/path/to/file.sql
Within that file I set the 3 variables:
set @@global.character_set_client='utf8mb4';
set @@global.character_set_connection='utf8mb4';
set @@global.character_set_results='utf8mb4';
Setting these globally forced the session variables to the same value. Problem solved.
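To double-check, every new connection should now report utf8mb4; a quick sketch reusing the connection from the question:
curs = conn.cursor()
curs.execute("SHOW VARIABLES LIKE 'character_set_%'")
for name, value in curs:
    print(name, value)  # client, connection and results should all be utf8mb4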
I have a preexisting database which I am trying to access. I have already run the command
python manage.py makemigrations dashboard
and
python manage.py migrate
However, I am getting an error when trying to migrate:
Unable to create the django_migrations table (ORA-2000: missing ALWAYS keyword)
(Just a side note: there is no ORA-2000; the actual error code is ORA-02000.)
The error you got sounds as if you are trying to create a table that uses an identity column, but your database version doesn't support that. As identity columns were introduced in 12c, you're probably using 11g or lower.
SQL> select * From v$version where rownum = 1;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
SQL> create table test
2 (id number generated by default on null as identity);
(id number generated by default on null as identity)
*
ERROR at line 2:
ORA-02000: missing ALWAYS keyword
SQL>
What to do? Either use a higher database version, or don't try to create an identity column. If you choose the second option, in those "lower" database versions the same effect ("autoincrementing") was achieved by a sequence and a database trigger.
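For illustration, a hedged sketch of that sequence-plus-trigger pattern run through Django's connection; the table and column names echo the TOOL_WEBPAGE/ID mentioned below, and the sequence and trigger names are made up:
from django.db import connection

with connection.cursor() as cursor:
    # The sequence supplies the values an identity column would have generated.
    cursor.execute("CREATE SEQUENCE tool_webpage_seq")
    # The trigger fills ID on every insert, emulating "autoincrement".
    cursor.execute("""
        CREATE OR REPLACE TRIGGER tool_webpage_bi
        BEFORE INSERT ON tool_webpage
        FOR EACH ROW
        BEGIN
            :new.id := tool_webpage_seq.NEXTVAL;
        END;
    """)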
Finally: how are the question title and question text related? The title says that there's an invalid identifier. If that bothers you as well, that error usually means that you're referencing a column (ID) in a table (TOOL_WEBPAGE), but there's no such column in that table. Hint: letter case (lower? Mixed? It should be uppercase).
I have a string that looks like this: 🔴Use O Mozilla Que Não Trava! Testei! $vip ou $apoio
When I try to save it to my database with ...SET description = %s... and cursor.execute(sql, description), it gives me an error:
Warning: (1366, "Incorrect string value: '\xF0\x9F\x94\xB4Us...' for column 'description' ...
Assuming this is an ASCII symbol, I tried description.decode('ascii') but this leads to
'str' object has no attribute 'decode'
How can I determine what encoding it is and how could I store anything like that to the database? The database is utf-8 encoded if that is important.
I am using Python3 and PyMySQL.
Any hints appreciated!
First, you need to make sure the table column has the correct character set. If it is "latin1" you will not be able to store content that contains Unicode characters; and since your string contains a 4-byte emoji, the column needs "utf8mb4" rather than MySQL's 3-byte "utf8".
You can use the following query to determine the column character set:
SELECT CHARACTER_SET_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='your_database_name' AND TABLE_NAME='your_table_name' AND COLUMN_NAME='description'
Follow the MySQL documentation here if you want to change the column character set.
Also, you need to make sure the character set is properly configured for the MySQL connection. Quoting from the MySQL docs:
Character set issues affect not only data storage, but also
communication between client programs and the MySQL server. If you
want the client program to communicate with the server using a
character set different from the default, you'll need to indicate
which one. For example, to use the utf8 Unicode character set, issue
this statement after connecting to the server:
SET NAMES 'utf8';
Once the character set settings are correct (again, prefer utf8mb4 over utf8 for 4-byte characters like this emoji), you will be able to execute your SQL statement. There is no need to encode or decode on the Python side; that is used for different purposes.
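Putting both pieces together, a hedged sketch with PyMySQL; the host, credentials, table name, and column length are assumptions:
import pymysql

conn = pymysql.connect(host='localhost', user='user', password='secret',
                       database='your_database_name', charset='utf8mb4')
with conn.cursor() as cursor:
    # One-time fix: make the column 4-byte capable (the length is a guess).
    cursor.execute("ALTER TABLE your_table_name "
                   "MODIFY description VARCHAR(255) CHARACTER SET utf8mb4")
    # Pass parameters as a sequence rather than interpolating them yourself.
    cursor.execute("UPDATE your_table_name SET description = %s WHERE id = %s",
                   ("🔴Use O Mozilla Que Não Trava! Testei! $vip ou $apoio", 1))
conn.commit()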
I have a SQL Server database hosted on Azure. I have put a string with smart quotes ('“test”') into the database. I can connect to it and run a simple query:
import pymssql
import json
conn = pymssql.connect(
server='coconut.database.windows.net',
user='kingfish@coconut',
password='********',
database='coconut',
charset='UTF-8',
)
sql = """
SELECT * FROM messages WHERE id = '548a72cc-f584-7e21-2725-fe4dd594982f'
"""
cursor = conn.cursor()
cursor.execute(sql)
row = cursor.fetchone()
json.dumps(row[3])
When I run this query on my Mac (macOS 10.11.6, Python 3.4.4, pymssql 2.1.3) I get back the string:
"\u201ctest\u201d"
This is correctly interpreted as smart quotes and displays properly.
When I run this query on an Azure web deployment (Python 3.4, Azure App service) I get back a different (and incorrect) encoding for that same string:
"\u0093test\u0094"
I specified the charset as 'UTF-8' on the pymssql connection. Why does the Windows/Azure environment get back a different charset?
(note: I have put the pre-built binary pymssql-2.1.3-cp34-none-win32.whl in the wheelhouse of my project repo on Azure. This is the same as the pymssql pre-built binary pymssql-2.1.3-cp34-cp34m-win32.whl on PyPI only I had to rename the 'cp34m' to 'none' to convince pip to install it.)
According to your description, I think the issue was caused by the default charset encoding of the SQL Database on Azure. To verify this, I did some testing in Python 3, shown below.
The default charset encoding of SQL Database on Azure is Windows-1252 (CP-1252).
SQL Server Collation Support
The default database collation used by Microsoft Azure SQL Database is SQL_LATIN1_GENERAL_CP1_CI_AS, where LATIN1_GENERAL is English (United States), CP1 is code page 1252, CI is case-insensitive, and AS is accent-sensitive. It is not possible to alter the collation for V12 databases. For more information about how to set the collation, see COLLATE (Transact-SQL).
>>> u"\u201c".encode('cp1252')
b'\x93'
>>> u"\u201d".encode('cp1252')
b'\x94'
As the code above shows, \u0093 and \u0094 are exactly what appears when \u201c and \u201d are encoded as CP-1252 and the resulting single bytes are read back as code points.
And,
>>> u"\u0093".encode('utf-8')
b'\xc2\x93'
>>> u"\u0093".encode('utf-8').decode('cp1252')[1]
'“' # It's `\u201c`
>>> u"\u201c" == u"\u0093".encode('utf-8').decode('cp1252')[1]
True
So I think the charset encoding your current SQL Database uses for data storage is Latin-1, not UTF-8: when you created the SQL Database, the default Collation property on the Azure portal was SQL_Latin1_General_CP1_CI_AS. Please try to use a collation that supports these characters instead of the default one.
I ended up recasting the column type from VARCHAR to NVARCHAR. This solved my problem, characters are correctly interpreted, regardless of platform.
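For reference, a minimal sketch of such a recast with pymssql; the column name message_text and the length are assumptions (only the messages table name appears above):
cursor = conn.cursor()
# NVARCHAR stores UTF-16, so the text no longer depends on the server code page.
cursor.execute("ALTER TABLE messages ALTER COLUMN message_text NVARCHAR(4000)")
conn.commit()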
There is one row in a MySQL table as follows:
1000, Intel® Rapid Storage Technology
The table's charset was 'utf8' when it was created.
When I used Python code to read it, it became the following:
Intel® Management Engine Firmware
My python code as following:
db = MySQLdb.connect(db,user,passwd,dbName,port,charset='utf8')
The weird thing was that when I removed the charset='utf8', as follows:
db = MySQLdb.connect(db,user,passwd,dbName,port), the result became correct.
Why do I get the wrong result when I specify charset='utf8' in my code?
Have you tried leaving the charset out of the connect call and then setting it afterwards?
db = MySQLdb.connect(db,user,passwd,dbName,port)
db.set_character_set('utf8')
When trying to use utf8/utf8mb4, if you see Mojibake, check the following.
This discussion also applies to Double Encoding, which is not necessarily visible.
The bytes to be stored need to be utf8-encoded.
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4.
The column needs to be declared CHARACTER SET utf8 (or utf8mb4).
HTML should start with <meta charset=UTF-8>.
See also Python notes
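As a quick Python illustration of Mojibake, reusing the Intel® string from the question above: utf8 bytes read back through a latin1 connection turn each multi-byte character into two or three wrong ones.
s = 'Intel®'
bad = s.encode('utf-8').decode('latin-1')
print(bad)  # Intel® -- what a latin1 connection displays
print(bad.encode('latin-1').decode('utf-8'))  # Intel® -- reversing the trip recovers it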
I have a table systesttab that contains a few columns. One of those columns is of type CLOB and is supposed to hold a string of a base64-encoded image.
CREATE TABLE systesttab(
...
f_picture CLOB DEFAULT ' ' NOT NULL,
...
)
However, when I try to update the table with a large base64 string (over 100k characters), it fails and my Python application crashes (even when the call is wrapped in a try...except block).
UPDATE systesttab SET f_picture = 'data:image/png;base64,iVBORw0KGgoASU ...'
I have even tried casting the value to CLOB:
UPDATE systesttab SET f_picture = TO_CLOB('data:image/png;base64,iVBORw0KGgoASU ...')
But all I get is this error:
Input string too long, limit 8192
Now, I guess that this is trying to tell me something about the chunk size, but it's not really helpful to me.
How can I update the table with a single statement?
Do I have to declare the table's create statement differently?
If there is a way to get this done in a single statement, it should also work when updating multiple columns on the same table.
Environment: python 3.4 & pyodbc
I have solved this by using bind parameters; it seems the character limit does not apply then.
In Python the statement now looks like this:
pic = 'data:image/png;base64,iVBORw0KGgoASU ...'
sql = "UPDATE systesttab SET f_picture = ?"
cursor.execute( sql, [pic] )
This also works fine when updating multiple fields at the same time.
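For example, binding a second column the same way (f_name is a hypothetical column):
sql = "UPDATE systesttab SET f_picture = ?, f_name = ?"
cursor.execute(sql, [pic, 'cat.png'])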