I'm building software to combine chemicals into different compounds (each compound can have 1, 2, 3 or 4 chemicals), but some chemicals cannot be combined with certain other chemicals.
I have a table in my MySQL db that has the following columns:
chemical_id, chemicalName, and one column for each chemical in my list.
Each row represents one of the chemicals. The value in each field tells me whether the row's chemical and the column's chemical can go together in a compound (1) or not (0). So every chemical has both a row and a column, and they were created in the same order, too. Here (dummy data): https://imgur.com/a/e2Fbq1K
I have a Python list of chemical ids, which I'm going to combine with each other to make compounds of 1, 2, 3 and 4 chemicals, but I need a function to determine whether any two of them are incompatible.
I was trying to mess around with INFORMATION_SCHEMA COLUMN_NAME but I'm kinda lost.
A loop around something like this would work, but the syntax doesn't.
list_of_chemicals = ['ChemName1','ChemName2','ChemName3'] #etc
def verify_comp(a,b): #will be passed with chem names
    mycursor.execute("SELECT chemicalName FROM chemical_compatibility WHERE chemical_id = 'ChemName1' AND 'ChemName2' = 0")
    #etc
I have tried to use %s placeholders, but they seem to work only in certain parts of a MySQL query. I'm a beginner at both Python and SQL, so any light will be much appreciated.
Thanks!
I followed @Akina's suggestion and made a new table containing pairs of chemicals and a compatibility value for each pair.
I also learned that %s placeholders can only be used for values in a Python cursor's execute() call. For other parts of the query, such as a table name, you can build the query string from a Python variable by doing something like this:
mycursor.execute("SELECT * FROM "+variablename+" WHERE condition = 1")
I'm not worried about SQL injection for this project, and I'm not sure that what I say here is 100% correct, but maybe it can help people who are lost nevertheless.
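For anyone who wants to see the pair-table idea end to end, here is a minimal sketch. It is not the code from my project: the table name chemical_pairs, its columns and the helper compound_ok are made up for illustration, and it uses an in-memory SQLite database only so that it runs without a server (SQLite binds values with ? where the MySQL drivers use %s).

```python
import sqlite3
from itertools import combinations

# Hypothetical schema: one row per unordered pair, stored with chem_a < chem_b.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chemical_pairs ("
    "chem_a TEXT, chem_b TEXT, compatible INTEGER, "
    "PRIMARY KEY (chem_a, chem_b))"
)
conn.executemany(
    "INSERT INTO chemical_pairs VALUES (?, ?, ?)",
    [
        ("ChemName1", "ChemName2", 1),
        ("ChemName1", "ChemName3", 0),
        ("ChemName2", "ChemName3", 1),
    ],
)

def verify_comp(a, b):
    """Return True only if the pair (a, b) is recorded as compatible."""
    first, second = sorted((a, b))  # match the storage order of the pair
    row = conn.execute(
        "SELECT compatible FROM chemical_pairs WHERE chem_a = ? AND chem_b = ?",
        (first, second),
    ).fetchone()
    return row is not None and row[0] == 1

def compound_ok(chemicals):
    """A compound is valid when every pair inside it is compatible."""
    return all(verify_comp(a, b) for a, b in combinations(chemicals, 2))

list_of_chemicals = ["ChemName1", "ChemName2", "ChemName3"]
valid_compounds = [
    combo
    for size in range(1, 5)  # compounds of 1, 2, 3 and 4 chemicals
    for combo in combinations(list_of_chemicals, size)
    if compound_ok(combo)
]
```

The chemical names are passed as bound values here, so no string concatenation is needed for the compatibility check itself.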
Related
I'm trying to link my code to a MySQL database using pymysql. In general everything has gone smoothly, but I'm having difficulty with the following function to find the minimum of a variable column.
def findmin(column):
    cur = db.cursor()
    sql = "SELECT MIN(%s) FROM table"
    cur.execute(sql,column)
    mintup = cur.fetchone()
If everything went smoothly, this would return me a tuple with the minimum, e.g. (1,).
However, if I run the function:
findmin(column_name)
I have to put the column name in quotes (i.e. "column_name"), else Python sees it as an unknown variable. But if I put the quotation marks around column_name then SQL sees
SELECT MIN("column_name") FROM table
which just returns the column header, not the value.
How can I get around this?
The issue is likely the use of %s for the column name. That means the SQL Driver will try to escape that variable when interpolating it, including quoting, which is not what you want for things like column names, table names, etc.
When using a value in SELECT, WHERE, etc. then you do want to use %s to prevent SQL injections and enable quoting, among other things.
Here, you just want to interpolate using pure Python (assuming a trusted value; please see below for more information). That also means no bindings tuple passed to the execute method.
def findmin(column):
    cur = db.cursor()
    sql = "SELECT MIN({0}) FROM table".format(column)
    cur.execute(sql)
    mintup = cur.fetchone()
SQL fiddle showing the SQL working:
http://sqlfiddle.com/#!2/e70a41/1
In response to the Jul 15, 2014 comment from Colin Phipps (September 2022):
The relatively recent edit on this post by another community member brought it to my attention, and I wanted to respond to Colin's comment from many years ago.
I totally agree re: being careful about one's input if one interpolates like this. Certainly one needs to know exactly what is being interpolated. In this case, I would say a defined value within a trusted internal script or one supplied by a trusted internal source would be fine. But if, as Colin mentioned, there is any external input, then that is much different and additional precautions should be taken.
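One possible form of those additional precautions is to accept only column names from a fixed allow-list before interpolating. The sketch below is an illustration under assumptions, not part of the original answer: the table items, its columns and ALLOWED_COLUMNS are hypothetical, and an in-memory SQLite database stands in for MySQL so that the example runs on its own.

```python
import sqlite3

# Hypothetical allow-list: the only identifiers that may be interpolated.
ALLOWED_COLUMNS = {"price", "quantity"}

def findmin(db, column):
    """Return a one-element tuple with the minimum of an allow-listed column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column name: %r" % (column,))
    cur = db.cursor()
    # Safe to format here because column was checked against the allow-list.
    cur.execute("SELECT MIN({0}) FROM items".format(column))
    return cur.fetchone()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (price INTEGER, quantity INTEGER)")
db.executemany("INSERT INTO items VALUES (?, ?)", [(5, 10), (1, 7), (3, 2)])

mintup = findmin(db, "price")
```

Anything that is not in the allow-list raises ValueError before a query is built, so a value coming from outside cannot alter the statement.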
At my work, I have two SQL tables. One is called jobs, with string attributes job and code. The other is called skills, with string attributes code and skill.
job  code
---  ----
j1   s0001,s0003
j2   s0002,20003
j3   s0003,s0004

code   skills
-----  ------
s0001  python programming language
s0002  oracle java
s0003  structured query language sql
s0004  microsoft excel
What my boss wants me to do is: Take values from the attribute code in jobs, split the string into an array, join this array on the codes (from skills table) and return the query in the format of job skills like:
job  skills
---  ------
j1   python programming language,structured query language sql
At this point, I'd just like to know if (A) this is possible and (B) if there is a preferred alternative to this approach. I've listed my Python solution, using dictionaries, below to illustrate the concept:
jobs = {'j1':'s0001,s0003',
        'j2':'s0002,20003',
        'j3':'s0003,s0004'}
skills = {'s0001':'python programming language',
          's0002':'oracle java',
          's0003':'structured query language sql',
          's0004':'microsoft excel'}
job_skills = {k:[] for k in jobs.keys()}
for j,s in jobs.items():
    for code,skill in skills.items():
        for i in s.split(','):
            if i == code:
                job_skills[j].append(skill)
for k,v in job_skills.items():
    job_skills[k] = ','.join(v)
And the output:
{'j1': 'python programming language,structured query language sql',
'j2': 'oracle java',
'j3': 'structured query language sql,microsoft excel'}
The real crux of this problem is that there aren't just 4 different skills in our data. Our company's data includes ~5000 skills. My boss would greatly like to avoid creating a table with 5000 attributes, 1 for each skill; he believes the above approach will result in simpler queries, with potentially better memory management.
I'm still pretty new to SQL, and technically only do SQLite3 anyway so the best I can probably do is a Python solution. I'll tell you how I would solve it, and hopefully someone can come along and fix it, because doing things purely in SQL is vastly faster than ever using Python.
I'm going to assume that this is SQLite, because you tagged Python. If it's not, there's probably ways to convert the database to the .db format in order to use that if you prefer this solution.
I'm assuming that conn is your connection to the database, i.e. conn = sqlite3.connect(your_database_path), or a cursor for it. I don't use cursors, but it's almost certainly better practice to use them.
First, I would fetch the 'skills' table and convert it to a dict. I would do so with:
skills_array = conn.execute("""SELECT * FROM skills""")
skills_dict = dict()
#replace i with something else. I just did it so that I could use 'skill' as a variable
for i in skills_array:
    #skills array is an iterator of tuples, which means the first position is the code number, and the second position is the skill itself
    code = i[0]
    skill = i[1]
    skills_dict[code] = skill
There are probably better ways to do this. If it's important, I recommend researching them. But if it's a one-time thing, this will work just fine. All this does is give you an easy way to look up a skill given its code. You could do this dozens of ways. You didn't mention it being a particularly large database, so this should be fine.
Before the next part, something should be mentioned about SQLite. It has very limited table modifying mechanics-- something I coincidentally found out about today. The recommended method is just to create a new table instead of trying to finagle with an old one. But there are easy ways to modify them with SQLiteBrowser-- something I highly recommend you use. At the very least it's much easier to view info in it for me, and it's available on all the important OS's.
Second, we need to combine the jobs table and the skills dict. There are much better ways to go about it, but I chose the easy approach: splitting the code column of jobs on commas and going from there. I'll also create a new table and insert directly into it.
conn.execute("""CREATE TABLE combined (job TEXT PRIMARY KEY, skills text)""")
conn.commit()
job_array = conn.execute("""SELECT * FROM jobs""")
for i in job_array:
    job = i[0]
    skill = i[1]
    for code in skill.split(","):
        #str.replace returns a new string, so the result has to be assigned back
        #codes that are missing from skills_dict are left unchanged
        skill = skill.replace(code, skills_dict.get(code, code))
    conn.execute("""INSERT INTO combined VALUES (?, ?)""", (job, skill,))
conn.commit()
And to combine it all...
import sqlite3
conn = sqlite3.connect(your_database_path)
skills_array = conn.execute("""SELECT * FROM skills""")
skills_dict = dict()
#replace i with something else. I just did it so that I could use 'skill' as a variable
for i in skills_array:
    #skills array is an iterator of tuples, which means the first position is the code number, and the second position is the skill itself
    code = i[0]
    skill = i[1]
    skills_dict[code] = skill
conn.execute("""CREATE TABLE combined (job TEXT PRIMARY KEY, skills text)""")
conn.commit()
job_array = conn.execute("""SELECT * FROM jobs""")
for i in job_array:
    job = i[0]
    skill = i[1]
    for code in skill.split(","):
        #str.replace returns a new string, so the result has to be assigned back
        #codes that are missing from skills_dict are left unchanged
        skill = skill.replace(code, skills_dict.get(code, code))
    conn.execute("""INSERT INTO combined VALUES (?, ?)""", (job, skill,))
conn.commit()
To explain a little further if you/someone is confused on the job_array for loop:
Splitting skills allows you to see each individual code, meaning that all you have to do is replace every instance of the code being looked up with the corresponding skill.
And that's it. There's probably a mistake or two in the above code, so I would backup your database/tables before trying it, but this should work. One thing that you might find helpful are context managers, that would make it far more Pythonic. If you plan to use this consistently (for some strange reason), refactoring for speed and readability may also be prudent.
I would also like to believe that there's an SQLite only approach, since this is exactly what databases are made for.
Hope this helps. If it did, let me know. :>
P.S. If you're confused by something/want more explanation feel free to comment.
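For what it's worth, an SQLite-only version of the lookup could look roughly like the sketch below. It is an untested-on-your-data illustration, not a definitive answer: it assumes the column names job/code and code/skill from the question, matches each code inside the comma-separated string with instr(), and joins the names with group_concat(). The in-memory database and sample rows are only there to make it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job TEXT PRIMARY KEY, code TEXT)")
conn.execute("CREATE TABLE skills (code TEXT PRIMARY KEY, skill TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("j1", "s0001,s0003"), ("j2", "s0002,20003"), ("j3", "s0003,s0004")],
)
conn.executemany(
    "INSERT INTO skills VALUES (?, ?)",
    [
        ("s0001", "python programming language"),
        ("s0002", "oracle java"),
        ("s0003", "structured query language sql"),
        ("s0004", "microsoft excel"),
    ],
)

# Wrapping both sides in commas makes ',s0001,' match only a whole code,
# never a fragment of a longer one.
query = """
    SELECT j.job, group_concat(s.skill, ',') AS skills
    FROM jobs AS j
    JOIN skills AS s
      ON instr(',' || j.code || ',', ',' || s.code || ',') > 0
    GROUP BY j.job
    ORDER BY j.job
"""
job_skills = dict(conn.execute(query).fetchall())
```

Two caveats: SQLite does not promise the order of the names inside group_concat(), and a join on a substring match cannot use an index, so a separate table with one row per (job, code) pair would probably scale better for ~5000 skills.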
Hello StackEx community.
I am implementing a relational database using SQLite interfaced with Python. My table consists of 5 attributes with around a million tuples.
To avoid large number of database queries, I wish to execute a single query that updates 2 attributes of multiple tuples. These updated values depend on the tuples' Primary Key value and so, are different for each tuple.
I am trying something like the following in Python 2.7:
stmt= 'UPDATE Users SET Userid (?,?), Neighbours (?,?) WHERE Username IN (?,?)'
cursor.execute(stmt, [(_id1, _Ngbr1, _name1), (_id2, _Ngbr2, _name2)])
In other words, I am trying to update the rows that have Primary Keys _name1 and _name2 by substituting the Neighbours and Userid columns with corresponding values. The execution of the two statements returns the following error:
OperationalError: near "(": syntax error
I am reluctant to use executemany() because I want to reduce the number of trips across the database.
I have been struggling with this issue for a couple of hours now but couldn't figure out either the error or an alternative on the web. Please help.
Thanks in advance.
If the column that is used to look up the row to update is properly indexed, then executing multiple UPDATE statements would be likely to be more efficient than a single statement, because in the latter case the database would probably need to scan all rows.
Anyway, if you really want to do this, you can use CASE expressions (and explicitly numbered parameters, to avoid duplicates):
UPDATE Users
SET Userid = CASE Username
             WHEN ?5 THEN ?1
             WHEN ?6 THEN ?2
             END,
    Neighbours = CASE Username
                 WHEN ?5 THEN ?3
                 WHEN ?6 THEN ?4
                 END
WHERE Username IN (?5, ?6);
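From Python, the same CASE statement can be driven as in the sketch below. This is an illustration under assumptions: it swaps the numbered ?N parameters for named ones (:name1, :id1, ...) passed as a dict, and the column types, sample users and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (Username TEXT PRIMARY KEY, Userid INTEGER, Neighbours TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [("alice", 0, ""), ("bob", 0, ""), ("carol", 0, "")],
)

# Named parameters let each value be written once and reused in several places.
stmt = """
    UPDATE Users
    SET Userid = CASE Username
                 WHEN :name1 THEN :id1
                 WHEN :name2 THEN :id2
                 END,
        Neighbours = CASE Username
                     WHEN :name1 THEN :ngbr1
                     WHEN :name2 THEN :ngbr2
                     END
    WHERE Username IN (:name1, :name2)
"""
params = {
    "name1": "alice", "id1": 1, "ngbr1": "bob",
    "name2": "bob", "id2": 2, "ngbr2": "alice",
}
with conn:  # commits on success, rolls back on an exception
    conn.execute(stmt, params)

rows = conn.execute(
    "SELECT Username, Userid, Neighbours FROM Users ORDER BY Username"
).fetchall()
```

The WHERE clause matters: without it, every row that matches no WHEN branch would have both columns set to NULL.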
I am having troubles finding out if I can even do this. Basically, I have a csv file that looks like the following:
1111,804442232,1
1112,312908721,1
1113,A*2434,1
1114,A*512343128760987,1
1115,3512748,1
1116,1111,1
1117,1234,1
This is imported into a sqlite database in memory for manipulation. I will be importing multiple files into this database after some manipulation. Sqlite is allowing me to keep constraints on the tables and receive errors where needed without creating additional functions just to check each constraint while using arrays in python. I want to do a few things but the first of which is to prepend field2 where all field2 strings match an entry in field1.
For example, in the above data field2 in entry 6 matches entry 1. In this case I would like to prepend field2 in entry 6 with '555'
If this is not possible I do believe I could make do using a regex and just do this on every row with 4 digits in field2... though... I have yet to successfully get REGEX working using python/sqlite as it always throws me an error.
I am working within Python using Sqlite3 to connect/manipulate my sqlite database.
EDIT: I am looking for a method to manipulate the resultant tables which reside in a sqlite database rather than manipulating just the csv data. The data above is just a simple representation of what is contained in the files I am working with. Would it be better to work with arrays containing the data from the csv files? These files have 10,000+ entries and about 20-30 columns.
If you must do it in SQLite, how about this:
First, get the column names of the table by running the following and parsing the result
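import sqlite3
def get_columns(table_name, cursor):
    #table names cannot be bound as parameters, so table_name must be a trusted value
    cursor.execute('pragma table_info(%s)' % table_name)
    #the second field of each pragma row is the column name
    return [row[1] for row in cursor]
conn = sqlite3.connect('test.db')
columns = get_columns('test_table',conn.cursor())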
For each of those columns, run the following update, that does your prepending
def prepend(table, column, reference, prefix, cursor):
    #identifiers are interpolated (trusted values only); the prefix is bound as a value
    query = '''
        UPDATE %s
        SET %s = ? || %s
        WHERE %s IN (SELECT %s FROM %s)
    ''' % (table, column, column, column, reference, table)
    cursor.execute(query, (prefix,))
reference = 'field1'
for column in columns:
    if column != reference:
        prepend('test_table', column, reference, '555', conn.cursor())
conn.commit()
Note that this is expensive: O(n^2) for each column you want to do it for.
As per your edit and Nathan's answer, it might be better to simply work with python's builtin datastructures. You can always insert it into SQLite after.
10,000 entries is not really much so it might not matter in the end. It all depends on your reason for requiring it to be done in SQLite (which we don't have much visibility of).
There is no need for regular expressions to do this. Just put the contents of the first column into a set, then iterate through the rows and update the second field.
first_col_values = set(row[0] for row in rows)
for row in rows:
    if row[1] in first_col_values:
        row[1] = '555' + row[1]
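A self-contained version of the same idea, reading the sample data from the question with the csv module, might look like this. The csv_text string stands in for the real file, which is an assumption made so the sketch runs on its own.

```python
import csv
import io

# Sample data from the question; in practice this would come from open(path).
csv_text = (
    "1111,804442232,1\n"
    "1112,312908721,1\n"
    "1113,A*2434,1\n"
    "1114,A*512343128760987,1\n"
    "1115,3512748,1\n"
    "1116,1111,1\n"
    "1117,1234,1\n"
)

# csv.reader yields each row as a list, so fields can be modified in place.
rows = list(csv.reader(io.StringIO(csv_text)))

# Set membership tests are constant time on average, so this is one pass
# to build the set and one pass to update.
first_col_values = set(row[0] for row in rows)
for row in rows:
    if row[1] in first_col_values:
        row[1] = '555' + row[1]
```

Only entry 1116 changes here, because its second field (1111) is the only one that also appears in the first column.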
So... I found the answer to my own question after a ridiculous amount of my own searching and trial and error. My unfamiliarity with SQL had me stumped as I was trying all kinds of crazy things. In the end... this was the simple type of solution I was looking for:
prefix="555"
cur.execute("UPDATE table SET field2 = ? || field2 WHERE field2 IN (SELECT field1 FROM table)", (prefix,))
I kept the small amount of python in there but what I was looking for was the SQL statement. Not sure why nobody else came up with something that simple =/. Unsatisfied with the answers so far, I had been searching far and wide for this simple line >_<.
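For completeness, here is a small runnable sketch of that statement against an in-memory database. The table name entries is hypothetical (table itself is an SQL keyword, so it cannot be used unquoted as a real table name), all three fields are assumed to be stored as TEXT, and only three of the sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entries (field1 TEXT, field2 TEXT, field3 TEXT)")
cur.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        ("1111", "804442232", "1"),
        ("1116", "1111", "1"),
        ("1117", "1234", "1"),
    ],
)

prefix = "555"
# The prefix is bound as a value, so it is always treated as text.
cur.execute(
    "UPDATE entries SET field2 = ? || field2 "
    "WHERE field2 IN (SELECT field1 FROM entries)",
    (prefix,),
)
updated = cur.rowcount  # number of rows changed by the UPDATE
conn.commit()

result = cur.execute(
    "SELECT field1, field2 FROM entries ORDER BY field1"
).fetchall()
```

Binding the prefix keeps leading zeros intact, which a number pasted directly into the SQL text would lose.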