MySQL query to match similar words/sentences - python

I have a table in a MySQL Database which has this structure:
CREATE TABLE `papers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(1000) COLLATE utf8_bin DEFAULT NULL,
`booktitle` varchar(300) COLLATE utf8_bin DEFAULT NULL,
`journal` varchar(300) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `title_fulltext` (`title`),
FULLTEXT KEY `booktitle_fulltext` (`booktitle`),
FULLTEXT KEY `journal_fulltext` (`journal`)
) ENGINE=MyISAM AUTO_INCREMENT=1601769 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Now I know that in the column title, somewhere within the millions of rows, there is a row which contains the string
nFOIL: Integrating Naïve Bayes and FOIL.
I want to look for
my_string = "nFOIL: integrating Naïve Bayes and FOIL"
and find the right row. You see it has to be a case insensitive search and the dot at the end is missing in the query. How do I implement this?
I tried
SELECT id FROM papers WHERE UPPER(title) LIKE %s
and converted my_string to upper case in Python and appended a "%" to it, but this doesn't seem like a good way of handling this. It did not work either. =)
Thanks for any suggestions!

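I see you have added FULLTEXT indexes, so I thought you already knew about the MATCH ... AGAINST syntax of MySQL.
You should try
SELECT id FROM papers
WHERE MATCH (title) AGAINST ('nFOIL: integrating Naïve Bayes and FOIL' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION);
Note that the column list in MATCH() has to correspond to one of the FULLTEXT indexes, and your table has a separate index per column, so match against title only.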

Change your collation to utf8_general_ci.
utf8_bin compares strings byte by byte, so it is case sensitive; with utf8_general_ci your searches will be case insensitive.
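If you would rather keep the LIKE approach from the question, a minimal sketch of building the query on the Python side could look like this. The helper name build_title_lookup is hypothetical, the %s placeholder assumes a driver such as MySQLdb, and the COLLATE clause assumes the column's character set is utf8. The query is only built here, not executed.

```python
def build_title_lookup(my_string):
    """Return (sql, params) for a case-insensitive prefix match on papers.title."""
    # Escape LIKE wildcards so that they are matched literally
    # (backslash is MySQL's default LIKE escape character).
    escaped = (
        my_string.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    # Drop trailing dots and spaces, then allow anything after the prefix,
    # so a missing "." at the end of the search string no longer matters.
    pattern = escaped.rstrip(" .") + "%"
    # Force a case-insensitive collation for this comparison only.
    sql = "SELECT id FROM papers WHERE title COLLATE utf8_general_ci LIKE %s"
    return sql, (pattern,)


sql, params = build_title_lookup("nFOIL: integrating Naïve Bayes and FOIL")
# cursor.execute(sql, params)  # cursor: an open DB-API cursor (assumed)
```

On a table with millions of rows a comparison like this probably cannot use an index, so the FULLTEXT route above is likely the faster one.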

Related

SQLite AUTO_INCREMENT id field not working

I am trying to create a database using python to execute the SQL commands (for CS50x problem set 7).
I have created a table with an id field set to AUTO_INCREMENT, but the field in the database is populated only by NULL values. I just want it to have an incrementing id starting at 1.
I've tried searching online to see if I'm using the right syntax and can't find anything obvious, nor can I find someone else with a similar problem, so any help would be much appreciated.
Here is the SQL command I am running:
# For creating the table
db.execute("""
CREATE TABLE students (
id INTEGER AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(255) NOT NULL,
middle_name VARCHAR(255) DEFAULT (NULL),
last_name VARCHAR(255) NOT NULL,
house VARCHAR(10),
birth INTEGER
);
""")
# An example insert statement
db.execute("""
INSERT INTO students (
first_name,
middle_name,
last_name,
house,
birth
)
VALUES (
?, ?, ?, ?, ?
);
""", "Harry", "James", "Potter", "Gryffindor", 1980)
Here is a screenshot of the database schema shown in phpliteadmin :
And here is a screenshot of the resulting database:
My guess is that you are using SQLite with phpliteadmin and not MySQL, in which case this:
id INTEGER AUTO_INCREMENT PRIMARY KEY
is not the correct definition of an auto-increment primary key.
In fact, the data type of this column is set to INTEGER AUTO_INCREMENT, as you can see in phpliteadmin, which, according to section 3.1, Determination Of Column Affinity, of the SQLite datatype documentation, has INTEGER affinity.
The column is still the PRIMARY KEY of the table, but because its declared type is not exactly INTEGER it is not an alias for the rowid, and SQLite allows NULL values in such a primary key column.
The correct syntax to have an integer primary key is this:
id INTEGER PRIMARY KEY AUTOINCREMENT
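The difference between the two definitions can be checked with a small sketch like the one below. It uses Python's built-in sqlite3 module and an in-memory database rather than the CS50 db object from the question, and the table names broken and fixed are made up for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Definition from the question: the declared type becomes "INTEGER AUTO_INCREMENT".
conn.execute(
    "CREATE TABLE broken (id INTEGER AUTO_INCREMENT PRIMARY KEY, first_name TEXT)"
)
# Corrected definition: id is an alias for the rowid and is filled in automatically.
conn.execute(
    "CREATE TABLE fixed (id INTEGER PRIMARY KEY AUTOINCREMENT, first_name TEXT)"
)

rows = [("Harry",), ("Hermione",)]
conn.executemany("INSERT INTO broken (first_name) VALUES (?)", rows)
conn.executemany("INSERT INTO fixed (first_name) VALUES (?)", rows)

broken_ids = [row[0] for row in conn.execute("SELECT id FROM broken ORDER BY rowid")]
fixed_ids = [row[0] for row in conn.execute("SELECT id FROM fixed ORDER BY id")]
print(broken_ids)  # [None, None]
print(fixed_ids)   # [1, 2]
```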
This cannot happen, if your statements are executed correctly.
I notice that you are not checking for errors in your code. You should be doing that!
My guess is that the table is already created without the auto_increment attribute. The create table is generating an error and you are inserting into the older version.
You can fix this by dropping the table before you create it. You should also modify the code to check for errors.

Unconsumed column names sqlalchemy python

I am facing the following error using SQLAlchemy: Unconsumed column names: company
I want to insert data for 1 specific column, and not all columns in the table: INSERT INTO customers (company) VALUES ('sample name');
My code:
engine.execute(table('customers').insert().values({'company': 'sample name'}))
Create Table:
'CREATE TABLE `customers` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`company` varchar(255) DEFAULT NULL,
`first_name` varchar(255) DEFAULT NULL,
`last_name` varchar(255) DEFAULT NULL,
`phone` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
UNIQUE KEY `company_UNIQUE` (`company`)
) ENGINE=InnoDB AUTO_INCREMENT=63 DEFAULT CHARSET=utf8'
After hours of frustration, I was able to test a way that I think works for my use case. As we know, you can insert to specific columns, or all columns in a table. In my use case, I dynamically need to insert to the customers table, depending on what columns a user has permissions to insert to.
I found that I needed to define all columns in the table() method of sqlalchemy, but I can pass in whatever columns and values that I need dynamically to the values() method.
Final code:
engine.execute(table('customers', column('company'), column('first_name'), column('last_name'), column('email'), column('phone')).insert().values({'company': 'sample name'}))
The original solution works great, however I'd like to add another approach that will allow working with tables dynamically, without specifying all of their columns. This can be useful when working with multiple tables.
We can use the Table class from sqlalchemy.schema and provide our engine to its autoload_with parameter, which will reflect the schema and populate the columns for us.
Then, we can work just like in the OP's answer.
from sqlalchemy.schema import Table, MetaData
my_table_name = 'customers' # Could be passed as an argument as well :)
my_table = Table(my_table_name, MetaData(), autoload_with=engine)
engine.execute(my_table.insert().values({'company': 'sample name'}))

Django adding items with JSON, error Incorrect string value: '\xC4\x97dos'

I am trying to iterate through a JSON object and save that information into Django fields and have had pretty good success so far. However when processing data from foreign countries I am having problems ignoring special characters.
A simplified version of the code block in customers.views is below:
customer_list = getcustomers()  # pulls standard JSON object
if customer_list:
    for mycustomer in customer_list:
        entry = Customer(pressid=mycustomer['id'],
                         email=mycustomer['email'],
                         first_name=mycustomer['first_name'])
The code above works great... until you introduce a foreign character, say a name with non-utf-8 charset.
An example error is:
Warning at /customers/update/
Incorrect string value: '\xC4\x97dos' for column 'first_name' at row 1
I have tried adding the .encode('utf-8') to the end of strings, but I still get this error, and haven't found a way to avoid it. I am okay with truncation of data in a particular field if it uses invalid characters, but I can't make a list of all possible characters because next thing you know a new customer will use a letter I didn't know existed.
Thanks in advance for the help!
Your database is not configured correctly.
https://docs.djangoproject.com/en/1.7/ref/unicode/
For example, a table like this:
CREATE TABLE IF NOT EXISTS `api_projekt` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nazwa` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `nazwa` (`nazwa`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ;
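will raise an error when you try to add a character that latin1 cannot represent, such as the ė (UTF-8 bytes \xC4\x97) in your error message. You need to change the encoding from latin1 to utf8.
It should look like this: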
CREATE TABLE IF NOT EXISTS `api_projekt` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nazwa` varchar(30) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `nazwa` (`nazwa`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
To fix it:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
I had a look at the Python Unicode HOWTO and found a line that appears to solve things: https://docs.python.org/2/howto/unicode.html.
I added .encode('ascii', 'ignore') instead of .encode('utf-8') and it is now working on all values.
This method drops every non-ASCII character, and it is the best I could come up with.
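To see what the .encode('ascii', 'ignore') workaround actually does to a value, here is a small sketch. It assumes Python 3 (the thread itself is from the Python 2 era), the sample name is made up to end like the string in the error message, and Python's latin-1 codec is only a rough stand-in for MySQL's latin1 character set.

```python
name = "ėdos"  # hypothetical value ending like the one in the error message

# These are the bytes MySQL complained about.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'\xc4\x97dos'

# The workaround silently drops every character that is not ASCII.
ascii_only = name.encode("ascii", "ignore").decode("ascii")
print(ascii_only)  # dos

# The character has no latin-1 representation at all.
try:
    name.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print(fits_latin1)  # False
```

So the workaround loses data, while converting the table to utf8 keeps the name intact.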

Storing Random Data in MySQL (directly in table)

In a Python script, I'm generating a scrypt hash using a salt made up of data from os.urandom and I would like to save these in a MySQL table. If I attempt to use the standard method I've seen used for efficiently storing hashes in a database, using a CHAR column, I get "Incorrect string value:" errors for both the hash and the salt. The only data type I've been able to find that allows the random data is blob, but since blobs are stored outside the table they have obvious efficiency problems.
What is the proper way to do this? Should I do something to the data prior to INSERTing it into the db to massage it into being accepted by CHAR? Is there another MySQL datatype that would be more appropriate for this?
Edit:
Someone asked for code, so, when I do this:
salt = os.urandom(255)
hash = scrypt.hash(password,salt,1<<15,8,1,255)
cursor.execute("INSERT INTO users (email,hash,salt) values (%s,%s,%s)", [email,hash,salt])
MySQL gives me the "Incorrect string value" errors when I attempt to insert these values.
Edit 2:
As per Joran's request, here is the schema that doesn't like this:
CREATE TABLE `users` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(254) NOT NULL DEFAULT '',
`hash` char(255) NOT NULL,
`salt` char(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Your hash is a binary value that most likely will contain "unprintable characters" if it is interpreted as a string. To store arbitrary binary data, use the BINARY or VARBINARY data type.
If you have to use a string datatype, you can use base64 encoding to convert arbitrary data to an ASCII string.
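A minimal sketch of the base64 route, using only the standard library. The variable names are illustrative, and the random bytes stand in for both the salt and the hash from the question.

```python
import base64
import os

salt = os.urandom(255)  # raw bytes, same size as in the question

# BINARY/VARBINARY route: pass the raw bytes to the driver unchanged.
# String route: base64-encode them to plain ASCII first.
salt_b64 = base64.b64encode(salt).decode("ascii")

print(len(salt_b64))  # 340: base64 turns every 3 bytes into 4 characters

# Decoding gives back exactly the original bytes.
restored = base64.b64decode(salt_b64)
```

Note that the encoded form is a third longer than the input, so 255 random bytes would no longer fit into a CHAR(255) column.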

Inserting values into a MySQL DB using Python

Here is my MySQL table schema
Table: booking
Columns:
id int(11) PK AI
apt_id varchar(200)
checkin_date date
checkout_date date
price decimal(10,0)
deposit decimal(10,0)
adults int(11)
source_id int(11)
confirmationCode varchar(100)
client_id int(11)
booking_date datetime
note mediumtext
Related Tables:property (apt_id → apt_id)
booking_source (source_id → id)
I am trying to insert values using Python. Here is what I have done:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
x.execute(sql)
But while executing the above script I am getting this error:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
TypeError: %d format: a number is required, not NoneType
I think my string formatters are not correct. Please help me out.
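It looks like one of the values formatted with %d (adults, source_id or client_id) is None. You could check/validate each value before inserting.
Also note that you pass self.notes twice, so there are 11 values for 10 placeholders, and that a comma is missing between `deposite` and `adults` (the column in your schema is called deposit).
Also, please use parameterized queries, NOT string formatting. From the Python sqlite3 documentation: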
Usually your SQL operations will need to use values from Python
variables. You shouldn’t assemble your query using Python’s string
operations because doing so is insecure; it makes your program
vulnerable to an SQL injection attack (see http://xkcd.com/327/ for
humorous example of what can go wrong).
Instead, use the DB-API’s parameter substitution. Put ? as a
placeholder wherever you want to use a value, and then provide a tuple
of values as the second argument to the cursor’s execute() method.
With a MySQL driver such as MySQLdb the placeholder is %s rather than ?, so something like:
x.execute("INSERT INTO thing (test_one, test_two) VALUES (%s, %s)", (python_var_one, python_var_two))
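Putting both suggestions together (validate first, then let the driver do the substitution), a rough sketch for the booking table could look like this. The helper build_booking_insert and the choice of required columns are assumptions made for illustration, the column names come from the schema in the question, and %s assumes a MySQL driver such as MySQLdb.

```python
# Column names taken from the booking schema in the question.
COLUMNS = (
    "apt_id", "checkin_date", "checkout_date", "price", "deposit", "adults",
    "source_id", "confirmationCode", "client_id", "booking_date", "note",
)
# Assumption: the integer fields that were formatted with %d must be present.
REQUIRED = ("adults", "source_id", "client_id")


def build_booking_insert(values):
    """Return (sql, params) for a parameterized INSERT into booking."""
    missing = [col for col in REQUIRED if values.get(col) is None]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    column_list = ", ".join("`{}`".format(col) for col in COLUMNS)
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = "INSERT INTO `booking` ({}) VALUES ({})".format(column_list, placeholders)
    # Columns that are not supplied become None, which the driver sends as NULL.
    params = tuple(values.get(col) for col in COLUMNS)
    return sql, params


booking = {
    "apt_id": "A1", "checkin_date": "2015-01-10", "checkout_date": "2015-01-12",
    "price": 200, "deposit": 50, "adults": 2, "source_id": 1, "client_id": 7,
}
sql, params = build_booking_insert(booking)
# x.execute(sql, params)  # x: the cursor from the question (assumed to be open)
```

Because the placeholders are generated from the same tuple as the parameters, their counts cannot drift apart the way they did in the hand-written format string.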
