In a Python script, I'm generating a scrypt hash using a salt made up of data from os.urandom and I would like to save these in a MySQL table. If I attempt to use the standard method I've seen used for efficiently storing hashes in a database, using a CHAR column, I get "Incorrect string value:" errors for both the hash and the salt. The only data type I've been able to find that allows the random data is blob, but since blobs are stored outside the table they have obvious efficiency problems.
What is the proper way to do this? Should I do something to the data prior to INSERTing it into the db to massage it into being accepted by CHAR? Is there another MySQL datatype that would be more appropriate for this?
Edit:
Someone asked for code, so, when I do this:
salt = os.urandom(255)
hash = scrypt.hash(password,salt,1<<15,8,1,255)
cursor.execute("INSERT INTO users (email,hash,salt) values (%s,%s,%s)", [email,hash,salt])
MySQL gives me the "Incorrect string value" errors when I attempt to insert these values.
Edit 2:
As per Joran's request, here is the schema that doesn't like this:
CREATE TABLE `users` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(254) NOT NULL DEFAULT '',
`hash` char(255) NOT NULL,
`salt` char(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Your hash is a binary value that most likely will contain "unprintable characters" if it is interpreted as a string. To store arbitrary binary data, use the BINARY or VARBINARY data type.
If you have to use a string datatype, you can use base64 encoding to convert arbitrary data to an ASCII string.
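As a minimal sketch of the base64 route (the helper names `to_ascii` and `from_ascii` are illustrative, not part of any library), the random bytes can be encoded to plain ASCII before the INSERT and decoded again after the SELECT. Keep in mind that base64 grows the data by about a third, so 255 raw bytes become 340 characters and would no longer fit in a CHAR(255) column:

```python
import base64
import os


def to_ascii(raw: bytes) -> str:
    """Encode arbitrary bytes as an ASCII-only base64 string."""
    return base64.b64encode(raw).decode("ascii")


def from_ascii(text: str) -> bytes:
    """Reverse to_ascii() and recover the original bytes."""
    return base64.b64decode(text)


salt = os.urandom(255)
stored_salt = to_ascii(salt)             # safe for a CHAR/VARCHAR column
restored_salt = from_ascii(stored_salt)  # what you get back after a SELECT
```

With a BINARY or VARBINARY column, none of this is needed and the raw bytes can be passed to the driver directly.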
I'm basically building a secured online diary application with Flask. However my Python source code returns a syntax error when I try to test the app. I can't detect what's wrong with the syntax. Your help will be appreciated.
I'm attaching a screenshot of the error. And here's my SQL database's schema:
CREATE TABLE users (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
username TEXT NOT NULL,
hash TEXT NOT NULL
);
CREATE TABLE sqlite_sequence(name,seq);
CREATE UNIQUE INDEX username ON users (username);
CREATE TABLE diaries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER NOT NULL,
name TEXT NOT NULL,
time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
title TEXT NOT NULL,
description TEXT NOT NULL,
img_url TEXT,
FOREIGN KEY(user_id) REFERENCES users(id)
);
New error: unsupported value
It is the INSERT statement that causes the error.
Well, not the INSERT itself, but the way you're using it.
Values should be passed as a tuple (values between "(" and ")").
So you need to update the db.execute line to something like this:
db.execute("insert into table_name(col1, col2) values(?, ?)", (col1_val, col2_val))
UPD, regarding the error in the second screenshot:
db.execute("SELECT ...") does not return a single value but a set of rows.
So you might want to use fetchone(), as in the docs:
res = cur.execute('SELECT count(rowid) FROM stocks')  # gives you a set of records
print(res.fetchone())  # fetch the first record
Anyway, check the docs I linked to.
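Putting both points together, here is a small sketch that assumes the standard-library sqlite3 module and an in-memory database; the user name and hash value are made up for illustration, and only the users table from the question is reproduced:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "username TEXT NOT NULL, "
    "hash TEXT NOT NULL)"
)

# Values go in as one tuple, separate from the SQL text.
conn.execute(
    "INSERT INTO users (username, hash) VALUES (?, ?)",
    ("alice", "not-a-real-hash"),
)

# execute() hands back a cursor; fetchone() pulls a single row out of it.
row = conn.execute(
    "SELECT id, username FROM users WHERE username = ?",
    ("alice",),  # a one-element tuple still needs the trailing comma
).fetchone()
```

If you use a different database wrapper, its execute() may return something other than a cursor, so check what your library documents.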
I am trying to iterate through a JSON object and save that information into Django fields and have had pretty good success so far. However when processing data from foreign countries I am having problems ignoring special characters.
A simplified version of the code block in customers.views is below:
customer_list = getcustomers()  # pulls standard JSON object
if customer_list:
    for mycustomer in customer_list:
        entry = Customer(pressid=mycustomer['id'],
                         email=mycustomer['email'],
                         first_name=mycustomer['first_name']
                         )
The code above works great... until you introduce a foreign character, say a name containing non-ASCII characters.
An example error is:
Warning at /customers/update/
Incorrect string value: '\xC4\x97dos' for column 'first_name' at row 1
I have tried adding the .encode('utf-8') to the end of strings, but I still get this error, and haven't found a way to avoid it. I am okay with truncation of data in a particular field if it uses invalid characters, but I can't make a list of all possible characters because next thing you know a new customer will use a letter I didn't know existed.
Thanks in advance for the help!
Your database is not configured correctly.
https://docs.djangoproject.com/en/1.7/ref/unicode/
For example, a table like this:
CREATE TABLE IF NOT EXISTS `api_projekt` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nazwa` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `nazwa` (`nazwa`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ;
This will raise an error when you try to add a character that latin1 cannot represent. You need to change the encoding from latin1 to utf8.
It should look like this:
CREATE TABLE IF NOT EXISTS `api_projekt` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nazwa` varchar(30) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `nazwa` (`nazwa`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
To fix it:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
I had a look at the Python Unicode documentation and found a line that appears to solve things: https://docs.python.org/2/howto/unicode.html.
I added .encode('ascii', 'ignore') instead of .encode('utf-8') and it is now working on all values.
This method drops every character that is not plain ASCII, and it is the best I could come up with.
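To make the trade-off concrete, here is a small Python 3 sketch of what that call does; the name is a hypothetical example containing the character from the error message above (U+0117, bytes \xC4\x97 in UTF-8):

```python
name = "Merc\u0117dos"  # hypothetical customer name with a non-ASCII letter

# UTF-8 keeps the character as two bytes, which the latin1 column rejected.
utf8_bytes = name.encode("utf-8")

# 'ignore' silently drops anything outside ASCII instead of raising an error.
ascii_only = name.encode("ascii", "ignore").decode("ascii")
```

The stored name loses a letter, so this only hides the problem; converting the table to a Unicode character set, as in the other answer, keeps the data intact.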
Here is my Mysql table schema
Table: booking
Columns:
id int(11) PK AI
apt_id varchar(200)
checkin_date date
checkout_date date
price decimal(10,0)
deposit decimal(10,0)
adults int(11)
source_id int(11)
confirmationCode varchar(100)
client_id int(11)
booking_date datetime
note mediumtext
Related Tables:property (apt_id → apt_id)
booking_source (source_id → id)
I am trying to insert values using Python. Here is what I have done:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
x.execute(sql)
But while executing the above script I get this error:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
TypeError: %d format: a number is required, not NoneType
I think my string format specifiers are not correct. Please help me out.
It looks like one of the values formatted with %d (adults, source_id, or client_id) is None. (Also, are you inserting the notes value twice?)
You could check/validate each value before inserting.
Also, please use parameterized queries, NOT string formatting:
Usually your SQL operations will need to use values from Python
variables. You shouldn’t assemble your query using Python’s string
operations because doing so is insecure; it makes your program
vulnerable to an SQL injection attack (see http://xkcd.com/327/ for
humorous example of what can go wrong).
Instead, use the DB-API’s parameter substitution. Put ? as a
placeholder wherever you want to use a value, and then provide a tuple
of values as the second argument to the cursor’s execute() method.
something like:
x.execute("INSERT INTO thing (test_one, test_two) VALUES (?, ?)", (python_var_one, python_var_two,))
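Here is a runnable sketch of the same idea. It uses an in-memory sqlite3 database and a cut-down, hypothetical version of the booking table so it can run anywhere. With MySQL drivers such as MySQLdb or PyMySQL the placeholder is %s for every column (unquoted, and never %d) rather than ?, but the pattern is otherwise the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking ("
    "id INTEGER PRIMARY KEY, apt_id TEXT, adults INTEGER, "
    "source_id INTEGER, note TEXT)"
)

# None is a legal parameter value: the driver stores it as SQL NULL
# instead of failing the way the %d string formatting did.
values = ("apt-1", None, 3, None)
conn.execute(
    "INSERT INTO booking (apt_id, adults, source_id, note) VALUES (?, ?, ?, ?)",
    values,
)

stored = conn.execute(
    "SELECT apt_id, adults, source_id, note FROM booking"
).fetchone()
```

Whether NULL is acceptable for a given column is a separate question, so validating the values first is still worthwhile.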
I have a table in a MySQL Database which has this structure:
CREATE TABLE `papers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(1000) COLLATE utf8_bin DEFAULT NULL,
`booktitle` varchar(300) COLLATE utf8_bin DEFAULT NULL,
`journal` varchar(300) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `title_fulltext` (`title`),
FULLTEXT KEY `booktitle_fulltext` (`booktitle`),
FULLTEXT KEY `journal_fulltext` (`journal`)
) ENGINE=MyISAM AUTO_INCREMENT=1601769 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Now I know that in the column title, somewhere within the millions of rows, there is a row which contains the string
nFOIL: Integrating Naïve Bayes and FOIL.
I want to look for
my_string = "nFOIL: integrating Naïve Bayes and FOIL"
and find the right row. You see it has to be a case insensitive search and the dot at the end is missing in the query. How do I implement this?
I tried
SELECT id FROM papers WHERE UPPER(title) LIKE %s
and converted my_string to upper case in Python and put a "%" at the end of my_string, but this doesn't seem like a good way of handling this. It did not work either. =)
Thanks for any suggestions!
I see you have added FULLTEXT indexes, so I thought you already knew about MySQL's MATCH ... AGAINST syntax.
You should try
-- The MATCH() column list has to correspond to one FULLTEXT index; here, the one on title.
SELECT id FROM papers
WHERE MATCH (title) AGAINST ('nFOIL: integrating Naïve Bayes and FOIL' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION);
Change your collation to utf8_general_ci.
That way your searches will be case insensitive.
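If changing the column collation is not an option, one possible alternative is to override the collation for a single comparison and keep the prefix match from the question. The sketch below only builds the query and its parameter; `like_prefix_pattern` is a hypothetical helper, and the COLLATE clause assumes the column uses the utf8 character set as in the schema above:

```python
def like_prefix_pattern(text):
    """Escape LIKE wildcards in text and append % for a prefix match."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash is MySQL's default LIKE escape
            .replace("%", "\\%")
            .replace("_", "\\_")
    )
    return escaped + "%"


my_string = "nFOIL: integrating Naïve Bayes and FOIL"

# The pattern travels as a parameter, never pasted into the SQL text.
query = "SELECT id FROM papers WHERE title COLLATE utf8_general_ci LIKE %s"
params = (like_prefix_pattern(my_string),)
# With a MySQLdb-style cursor this would be: cursor.execute(query, params)
```

Because the match is a prefix match, the missing dot at the end of the search string no longer matters.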
I'm using the MySQLdb package for interacting with MySQL. I'm having trouble getting the proper type conversions.
I am using a 16-byte binary uuid as a primary key for the table and have a mediumblob holding zlib compressed json information.
I'm using the following schema:
CREATE TABLE repositories (
added_id int auto_increment not null,
id binary(16) not null,
data mediumblob not null,
create_date int not null,
update_date int not null,
PRIMARY KEY (added_id),
UNIQUE(id)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ENGINE=InnoDB;
Then I create a new row in the table using the following code:
data = zlib.compress(json.dumps({'hello': 'how are you :D'}))
row_id = uuid.uuid4().hex
added_id = cursor.execute('''
    INSERT INTO repositories (id, data, create_date, update_date)
    VALUES (%s, %s, %s, %s)''',
    (
        binascii.a2b_hex(row_id),
        data,
        time.time(),
        time.time(),
    )
)
Then to retrieve data I use a similar query:
query = cursor.execute('SELECT added_id, id, data, create_date, update_date ' \
'FROM repositories WHERE id = %s',
binascii.a2b_hex(row_id)
)
Then the query returns an empty result.
Any help would be appreciated. Also, as an aside, is it better to store unix epoch dates as integers or TIMESTAMP?
NOTE: I am not having problems inserting the data, just trying to retrieve it from the database. The row exists when I check via mysqlclient.
Thanks a lot!
One tip: you should be able to call uuid.uuid4().bytes to get the raw
bytes. As for timestamps, if you want to perform time/date manipulation
in SQL it's often easier to deal with real TIMESTAMP types.
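As a quick sketch of that tip (written for Python 3), the raw bytes from the uuid object are the same 16 bytes you get by converting the hex string yourself, and they can be turned back into a UUID after reading the row:

```python
import binascii
import uuid

key = uuid.uuid4()

raw = key.bytes                      # 16 raw bytes, ready for a BINARY(16) column
via_hex = binascii.a2b_hex(key.hex)  # the conversion used in the question

# Rebuild the UUID object from the bytes returned by a SELECT.
restored = uuid.UUID(bytes=raw)
```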
I created a test table to try to reproduce what you're seeing:
CREATE TABLE xyz (
added_id INT AUTO_INCREMENT NOT NULL,
id BINARY(16) NOT NULL,
PRIMARY KEY (added_id),
UNIQUE (id)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ENGINE=InnoDB;
My script is able to insert and query for the rows using the binary field as a
key without problem. Perhaps you are incorrectly fetching / iterating over the
results returned by the cursor?
import binascii
import MySQLdb
import uuid
conn = MySQLdb.connect(host='localhost')
key = uuid.uuid4()
print 'inserting', repr(key.bytes)
r = conn.cursor()
r.execute('INSERT INTO xyz (id) VALUES (%s)', key.bytes)
conn.commit()
print 'selecting', repr(key.bytes)
r.execute('SELECT added_id, id FROM xyz WHERE id = %s', key.bytes)
for row in r.fetchall():
    print row[0], binascii.b2a_hex(row[1])
Output:
% python qu.py
inserting '\x96\xc5\xa4\xc3Z+L\xf0\x86\x1e\x05\xebt\xf7\\\xd5'
selecting '\x96\xc5\xa4\xc3Z+L\xf0\x86\x1e\x05\xebt\xf7\\\xd5'
1 96c5a4c35a2b4cf0861e05eb74f75cd5
% python qu.py
inserting '\xac\xc9,jn\xb2O#\xbb\xa27h\xcd<B\xda'
selecting '\xac\xc9,jn\xb2O#\xbb\xa27h\xcd<B\xda'
2 acc92c6a6eb24f40bba23768cd3c42da
To supplement existing answers, there's also an issue with the following warning when dealing with binary strings in queries:
Warning: (1300, "Invalid utf8 character string: 'ABCDEF'")
It is reproduced by the following:
cursor.execute('''
CREATE TABLE `table`(
`bin_field` BINARY(16) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
''')
bin_value = uuid.uuid4().bytes
cursor.execute('INSERT INTO `table`(bin_field) VALUES(%s)', (bin_value,))
Whenever MySQL sees a string literal in a query that isn't valid in the current character_set_connection, it emits this warning. There are several solutions:
Explicitly set _binary charset literal
INSERT INTO `table`(bin_field) VALUES(_binary %s)
Manually construct queries with hexadecimal literals
INSERT INTO `table`(bin_field) VALUES(x'abcdef')
Change connection charset if you're only working with binary strings
For more details see MySQL Bug 79317.
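The first two options can be sketched in Python as follows. `hex_literal` is a hypothetical helper; building SQL text by hand is only reasonable here because the output of bytes.hex() can contain nothing but the characters 0-9 and a-f:

```python
import uuid


def hex_literal(raw: bytes) -> str:
    """Render bytes as a MySQL hexadecimal literal such as x'abcdef'."""
    return "x'{}'".format(raw.hex())


bin_value = uuid.uuid4().bytes

# Option 1: keep the placeholder and mark it as binary.
prefixed_sql = "INSERT INTO `table`(bin_field) VALUES(_binary %s)"
prefixed_params = (bin_value,)

# Option 2: inline the value as a hexadecimal literal, with no parameters.
literal_sql = "INSERT INTO `table`(bin_field) VALUES({})".format(
    hex_literal(bin_value)
)
```

Either statement would then be passed to cursor.execute(), the first one together with prefixed_params.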
Update
As @charlax pointed out, there's a binary_prefix flag which can be passed to the connection's initialiser to automatically prepend the _binary prefix when interpolating arguments. It's supported by recent versions of both mysqlclient and PyMySQL.