How do you cleanly pass column names into cursor, Python/SQLite?

I'm new to cursors and I'm trying to practice by building a dynamic Python SQL insert statement, using the parameterized method for sqlite3:
import sqlite3
conn = sqlite3.connect("db.sqlite")
cursor = conn.cursor()
params = ['column1', 'column2', 'column3', 'value1', 'value2', 'value3']
cursor.execute("""insert into table_name (?,?,?)
                  values (?,?,?)""", params)
When I attempt to run this, I get sqlite3.OperationalError: near "?": syntax error on the line with the values. This is despite the fact that when I hard-code the column names (and remove them from the list), I have no problem. I could build the statement with %s formatting instead, but I know the parameterized method is preferred.
How do I insert these cleanly? Or am I missing something obvious?

The (?, ?, ?) placeholder syntax works only for the tuple containing the values, not for table or column names; that would be the reason for the sqlite3.OperationalError.
I believe(!) that you ought to build it similar to this:
cursor.execute("INSERT INTO {tn} ({f1}, {f2}) VALUES (?, ?)".format(tn='testable', f1='foo', f1='bar'), ('test', 'test2',))
But this does not solve the injection problem if the user is allowed to provide the table name or field names himself.
I do not know any built-in method to help against that, but you could use a function like this:
def clean(some_string):
    # Keep only alphanumeric characters.
    return ''.join(char for char in some_string if char.isalnum())
to sanitize the user-given table name or field names. This should suffice, because table and field names usually consist only of alphanumeric characters.
Perhaps it would also be smart to check whether
some_string == clean(some_string)
and, if that is False, raise a nice exception to be on the safe side.
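As a minimal sketch of that check (validate_identifier is a hypothetical helper name, not anything built into sqlite3):
def validate_identifier(name):
    # Refuse any identifier that clean() would have altered.
    if name != clean(name):
        raise ValueError("unsafe SQL identifier: {!r}".format(name))
    return name

table_name = validate_identifier('table_name')    # passes
validate_identifier('users; DROP TABLE users')    # raises ValueError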
During my work with SQL and Python, I felt that you rarely need to let users name their own tables and fields, so it was seldom necessary for me.
If anyone could elaborate some more and give his insights, I would greatly appreciate it.

First, I would start by creating a mapping of columns and values:
data = {'column1': 'value1', 'column2': 'value2', 'column3': 'value3'}
And, then get the columns from here:
columns = list(data.keys())
# ['column1', 'column2', 'column3']  (dicts preserve insertion order in Python 3.7+)
Now, we need to create placeholders for both columns and for values:
placeholder_columns = ", ".join(columns)
# 'column1, column2, column3'
placeholder_values = ", ".join(":{0}".format(col) for col in columns)
# ':column1, :column2, :column3'
Then, we create the INSERT SQL statement:
sql = "INSERT INTO table_name ({placeholder_columns}) VALUES ({placeholder_values})".format(
placeholder_columns=placeholder_columns,
placeholder_values=placeholder_values
)
# 'INSERT INTO table_name (column1, column3, column2) VALUES (:column1, :column3, :column2)'
What we have in sql now is a valid SQL statement with named parameters, and you can execute it with the data:
cursor.execute(sql, data)
And since data maps column names to values, the named placeholders in the query pick up the values for the correct columns.
Have a look at the documentation to see how named parameters are used. From what I can see, you only need to worry about sanitization for the values being inserted, and there are two ways to do that: 1) the question-mark style, or 2) the named-parameter style.
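Putting it all together, a minimal runnable sketch (the in-memory database and the CREATE TABLE are only there to make the example self-contained):
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE table_name (column1 TEXT, column2 TEXT, column3 TEXT)")

data = {'column1': 'value1', 'column2': 'value2', 'column3': 'value3'}
columns = list(data.keys())
sql = "INSERT INTO table_name ({0}) VALUES ({1})".format(
    ", ".join(columns),                                # column list
    ", ".join(":{0}".format(col) for col in columns),  # named placeholders
)
cursor.execute(sql, data)
conn.commit()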

So, here's what I ended up implementing. I thought it was pretty Pythonic, but I couldn't have answered it without Krysopath's insight:
columns = ['column1', 'column2', 'column3']
values = ['value1', 'value2', 'value3']
columns = ', '.join(columns)
insertString=("insert into table_name (%s) values (?,?,?,?)" %columns)
cursor.execute(insertString, values)
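If the number of columns varies, the same idea can be extended so the placeholder count always matches the values (a sketch, assuming columns and values stay in step):
columns = ['column1', 'column2', 'column3']
values = ['value1', 'value2', 'value3']
placeholders = ', '.join('?' for _ in values)  # one ? per value
insertString = "insert into table_name (%s) values (%s)" % (', '.join(columns), placeholders)
cursor.execute(insertString, values)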

import sqlite3
conn = sqlite3.connect("db.sqlite")
cursor = conn.cursor()
# Break your list into two: one for column names and one for values.
columns = ['column1', 'column2', 'column3']
values = ['value1', 'value2', 'value3']
# Note: building SQL by string concatenation is vulnerable to injection,
# and the values must be quoted or the statement is a syntax error.
cursor.execute("insert into table_name (" + columns[0] + "," + columns[1] + "," + columns[2] + ")"
               " values ('" + values[0] + "','" + values[1] + "','" + values[2] + "')")

Related

Using SQL Minus Operator in python

I want to perform a MINUS operation, like the code below, on two tables.
SELECT
column_list_1
FROM
T1
MINUS
SELECT
column_list_2
FROM
T2;
This is after a migration has happened. I have these two databases that I have connected like this:
import cx_Oracle
import pandas as pd
import pypyodbc
source = cx_Oracle.connect(user, password, name)
df = pd.read_sql(""" SELECT * from some_table """, source)
target = pypyodbc.connect(blah, blah, db)
df_2 = pd.read_sql(""" SELECT * from some_table """, target)
How can I run a minus operation on the source and target databases in python using a query?
Choose either one:
Use Python to perform a "manual" MINUS operation between the two result sets (see the sketch after this list).
Use Oracle by means of a dblink. In this case, you won't need to open two connections from Python.
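A minimal sketch of that manual MINUS with pandas, assuming df and df_2 from the question share the same column layout:
# Rows of df that do not appear in df_2, i.e. df MINUS df_2.
minus_result = (
    df.merge(df_2.drop_duplicates(), how='left', indicator=True)
      .query('_merge == "left_only"')
      .drop(columns='_merge')
)
print(minus_result)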
If you have a DB link then you can do a MINUS in SQL, or you can use merge from pandas:
df = pd.read_sql(""" SELECT * from some_table """, source)
df_2 = pd.read_sql(""" SELECT * from some_table """, target)
df_combine = df.merge(df_2.drop_duplicates(), how='right', indicator=True)
print(df_combine)
A new column _merge is created in df_combine, containing the value both (row present in both data frames) or right_only (row present only in df_2).
In the same way, you can do a left merge.

how to compare databases with tables using pandas

I am trying to compare different databases and figure out whether the tables inside those databases are the same. For example, I have set it up as follows:
Database 'a' has only one table called "abc"
Database 'b' has only one table called "abc"
Database 'c' has two tables called "abc" & "xyz"
I have written the following code and it works fine when executed, but as you can see from the output, it prints "False" in both cases. Given my setup, database 'a' and database 'b' each have only one identical table, so I expect the first comparison to print "True", BUT it prints "False". When you compare database 'b' and database 'c', they are not identical because database 'c' has an extra table called 'xyz', so I expect it to print "False", which is correct.
Please let me know what is wrong with my code, or whether there is a workaround. Basically, I want to diff two databases and check whether they have the same tables or not.
import pandas as pd
import mysql.connector
mydb1 = mysql.connector.connect(host="localhost", user="xxxxxxxx", passwd="xxxxxxxx", database="a")
mydb2 = mysql.connector.connect(host="localhost", user="xxxxxxxx", passwd="xxxxxxxx", database="b")
mydb3 = mysql.connector.connect(host="localhost", user="xxxxxxxx", passwd="xxxxxxxx", database="c")
querystmt1 = "SHOW TABLES"
querystmt2 = "SHOW TABLES"
querystmt3 = "SHOW TABLES"
df1 = pd.read_sql(querystmt1, mydb1)
df2 = pd.read_sql(querystmt2, mydb2)
df3 = pd.read_sql(querystmt3, mydb3)
print(df1)
print(df2)
print(df3)
print(df1.equals(df2))
print(df2.equals(df3))
Since you are interested in the values of the dataframes, a solution would be to convert the dataframes to dictionaries and then check whether the values are the same:
df1 = pd.read_sql(querystmt1, mydb1)
d1 = df1.to_dict()
df2 = pd.read_sql(querystmt2, mydb2)
d2 = df2.to_dict()
df3 = pd.read_sql(querystmt3, mydb3)
d3 = df3.to_dict()
# Checking
print(list(d1.values()) == list(d2.values())) # True
print(list(d2.values()) == list(d3.values())) # False
This is not the most computationally efficient way to do it (it involves a lot of type conversions), but it's sufficient if it's a one-time thing.
If you want to check if the two dataframes contain at least one common value then you may use:
print(any(i in list(d3.values()) for i in list(d2.values())))
# The output is True since 'abc' is a table in both df2 and df3.
The headers are possibly different: SHOW TABLES returns a column named Tables_in_<database>, so the column labels differ between the dataframes even when the values match.
Try setting the headers to plain integer indexes before comparing:
df1.columns = range(df1.shape[1])
df2.columns = range(df2.shape[1])
df3.columns = range(df3.shape[1])
This works under the assumption that the column order in all dataframes is the same.
Try pd.testing.assert_frame_equal: it returns nothing if the two dataframes are equal and raises an AssertionError if they are not.
It accepts all sorts of keyword arguments to select what to check for in the comparison (e.g. you can pass check_names=False if you don't want to compare column names).
It is also explicit about where the dataframes are not equal: different sizes, different column names, different values; whatever it is, it will be explicit about it.
Give it a try!
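A minimal usage sketch, assuming df1 and df2 are the SHOW TABLES frames from the question (check_names=False ignores the differing Tables_in_<database> labels):
import pandas as pd

try:
    pd.testing.assert_frame_equal(df1, df2, check_names=False)
    print("Tables are identical")
except AssertionError as err:
    print("Tables differ:", err)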

How to parse sql statement insert into to get values with pyspark

I have a SQL dump with several INSERT INTO statements like the following one:
query ="INSERT INTO `temptable` VALUES (1773,0,'morne',0),(6004,0,'ATT',0)"
I'm trying to get only the values in a dataframe
(1773,0,'morne',0)
(6004,0,'ATT',0)
I tried
spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
and get
'InsertIntoTable 'UnresolvedRelation `temptable`, false, false
+- 'UnresolvedInlineTable [col1, col2, col3, col4], [List(1773, 0, morne, 0), List(6004, 0, ATT, 0)]
But I don't know how to retrieve those lists of values. Is there a way to get them without Hive?
If you are trying to get only the list of values from multiple insert statements, then you may try the approach below:
from pyspark.sql.functions import substring_index

listOfInserts = [('''INSERT INTO temptable VALUES (1773,0,'morne',0),(6004,0,'ATT',0)''',), ('''INSERT INTO temptable VALUES (1673,0,'morne',0),(5004,0,'ATT',0)''',)]
df = spark.createDataFrame(listOfInserts, ['VALUES'])
# Keep everything after the last occurrence of the literal word VALUES.
df.select(substring_index(df.VALUES, 'VALUES', -1).alias('right')).show(truncate=False)
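If Spark's parser isn't required at all, a plain-Python sketch using re can pull the tuples out of each statement (this assumes there are no parentheses or commas inside the quoted values):
import re

query = "INSERT INTO `temptable` VALUES (1773,0,'morne',0),(6004,0,'ATT',0)"
values_part = query.split("VALUES", 1)[1]
tuples = re.findall(r"\(([^)]*)\)", values_part)
rows = [tuple(v.strip() for v in t.split(",")) for t in tuples]
# rows == [('1773', '0', "'morne'", '0'), ('6004', '0', "'ATT'", '0')]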

Only a column name can be used for the key in a dtype mappings argument

I've successfully brought in one table using dask's read_sql_table from an Oracle database. However, when I try to bring in another table I get this error: KeyError: 'Only a column name can be used for the key in a dtype mappings argument.'
I've checked my connection string and schema and all of that is fine. I know the table name exists, and the column I'm trying to use as an index is a primary key on the table in the Oracle database.
Can someone please explain why this error occurs when the column name clearly exists?
I know I can use pandas chunks, but I would rather use dask in this scenario.
Below is how I'm connecting to the Oracle database, and the last bit of the error message:
import dask.dataframe as ddf
from sqlalchemy import create_engine

host = '*******'
port = '*****'
sid = '****'
user = '******'
password = '*****'
con_string = 'oracle://' + user + ':' + password + '@' + host + ':' + port + '/' + sid
engine = create_engine(con_string)
df = ddf.read_sql_table('table_name', uri=con_string, index_col='id', npartitions=None, schema='*****')
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5855     if col_name not in self:
   5856         raise KeyError(
-> 5857             "Only a column name can be used for the "
   5858             "key in a dtype mappings argument."
   5859         )
KeyError: 'Only a column name can be used for the key in a dtype mappings argument.'
Today, for another table, I added all the column names in ddf.read_sql_table and the query worked. But for yet another table I tried the same thing, listing all the column names, and I got the KeyError above.
Thanks, everyone.
This error generally occurs when there is a mismatch in a column name, or when the dtype mapping contains a column that is not present in the table.
The index_col is not one of the columns (it becomes the index of the dataframe). To fix your issue, provide the columns argument to read_sql_table with a list of all the columns except the index_col.
E.g. in your case, let's say your table has columns id, foo and bar:
df = ddf.read_sql_table('table_name', uri=con_string, index_col='id', npartitions=None, schema='*****', columns=['foo', 'bar'])

Python insert array into mysql table column avoiding record duplication

I have a simple array like this:
a=['a','b','c']
I need to insert all the elements of a into the MySQL table 'items', column 'names', skipping any element that is already present in the 'names' column, while avoiding iteration and multiple INSERT queries.
Thanks
1) You can use the MySQL-specific INSERT ... ON DUPLICATE KEY UPDATE syntax. (I assume there is a PRIMARY KEY or UNIQUE KEY on the column 'names'.)
(Additionally: a = list(set(a)) to remove duplicates in a.)
a = ['a', 'b', 'c']
c = conn.cursor()
# executemany expects one parameter tuple per row.
c.executemany('INSERT INTO items (names) VALUES (%s) ON DUPLICATE KEY UPDATE names = names', [(name,) for name in a])
2) If there is no uniqueness constraint on the column 'names', you can check which names are already in the database and remove them from your list before inserting:
a = ['a', 'b', 'c']
c = conn.cursor()
c.execute('SELECT names FROM items')
existent_names = [name[0] for name in c]
a = list(set(a) - set(existent_names))
c.executemany('INSERT INTO items (names) VALUES (%s)', [(name,) for name in a])
Hi, if you don't want duplicate 'names' in your table 'items', maybe that column should be your primary key; then, whenever you try to insert a duplicate value, MySQL simply won't allow it. However, this code may help you if you are using a different primary key for your table:
import MySQLdb
conn = MySQLdb.connect(host="localhost",
                       user="root",
                       passwd="yourpass",
                       db="yourdb")
x = conn.cursor()
a = ['a', 'b', 'c']
for name in a:
    try:
        # Insert only if the name is not already present.
        x.execute("""SELECT names FROM items WHERE names=%s""", (name,))
        data = x.fetchone()
        if data is None:
            x.execute("""INSERT INTO items(names) VALUES(%s)""", (name,))
            conn.commit()
    except MySQLdb.Error:
        conn.rollback()
This is what I understand from your post: you want to add unique values from the list into the database. You can use Python's set for it.
a = ['a', 'b', 'c']
set_a = list(set(a))
insert_into_db(set_a)  # insert_into_db stands in for your own insert helper
If holding all of items.names in memory won't be too costly, you can run a query to select all names and keep them in a set:
cursor.execute('SELECT names FROM items')
names_set = set(name[0] for name in cursor)
Before you execute the insert query, filter out existing names using this set:
fresh_a = [v for v in a if v not in names_set]
If you are concerned about duplicate names within a itself, you can cast it to a set as well.
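A minimal sketch combining the set filter with executemany (it assumes conn is an open MySQL connection and that items.names has no uniqueness constraint):
cursor = conn.cursor()
cursor.execute('SELECT names FROM items')
names_set = set(row[0] for row in cursor)

a = ['a', 'b', 'c']
fresh = [(name,) for name in set(a) if name not in names_set]
cursor.executemany('INSERT INTO items (names) VALUES (%s)', fresh)
conn.commit()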
