sqlite3: Preserve table names in query with JOIN

I want to execute a SQL query with a JOIN where I can propagate the table aliases into the resulting dictionary keys. For example, I may have a query
query = """
SELECT t1.col1,t2.col1,t2.col2
FROM table1 t1 JOIN table2 t2
ON t1.col0=t2.col0
"""
and I want the output to maintain the t1, t2 aliases, since I have duplicate column names (col1). I would run
con = sqlite3.connect(dbpath, isolation_level=None, detect_types=sqlite3.PARSE_DECLTYPES)
def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d
db.dict = con.cursor()
db.dict.row_factory = dict_factory
result = db.dict.execute(query).fetchone()
But this overwrites the col1 value. How can I have it return, say,
{'t1.col1':123, 't2.col1':234, 't2.col2':345}
Thanks!

The documentation says:
The name of a result column is the value of the "AS" clause for that column, if there is an AS clause. If there is no AS clause then the name of the column is unspecified and may change from one release of SQLite to the next.
So you have to do:
SELECT t1.col1 AS "t1.col1", t2.col1 AS "t2.col1", ...
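A minimal sketch of this combined with the row factory from the question, assuming an in-memory database and two tiny tables whose contents are made up for illustration:

```python
import sqlite3

def dict_factory(cursor, row):
    # cursor.description has one tuple per result column; item 0 is the column name
    return {col[0]: row[idx] for idx, col in enumerate(cursor.description)}

con = sqlite3.connect(":memory:")
con.row_factory = dict_factory
con.executescript("""
    CREATE TABLE table1 (col0 INTEGER, col1 INTEGER);
    CREATE TABLE table2 (col0 INTEGER, col1 INTEGER, col2 INTEGER);
    INSERT INTO table1 VALUES (1, 123);
    INSERT INTO table2 VALUES (1, 234, 345);
""")

# Each column gets an explicit alias, so the names are well defined and unique
query = """
SELECT t1.col1 AS "t1.col1", t2.col1 AS "t2.col1", t2.col2 AS "t2.col2"
FROM table1 t1 JOIN table2 t2
ON t1.col0 = t2.col0
"""
result = con.execute(query).fetchone()
print(result)  # {'t1.col1': 123, 't2.col1': 234, 't2.col2': 345}
```

The aliases have to be written out by hand (or generated when the query string is built); the row factory itself only sees whatever names the query reports.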

Related

What's the meaning of table1.{0} == table2.{0} in SQL join clause?

I'm trying to understand the following code snip:
import pandasql
data_sql = data[['account_id', 'id', 'date', 'amount']]
# data_sql is a table that has the above columns
data_sql.loc[:, 'date_hist_min'] = data_sql.date.apply(lambda x: x + pd.DateOffset(months=-6))
# add one more column, 'date_hist_min', derived from the column 'date' minus 6 months
sqlcode = '''
SELECT t1.id,
t1.date,
t2.account_id as "account_id_hist",
t2.date as "date_hist",
t2.amount as "amount_hist"
FROM data_sql as t1 JOIN data_sql as t2
ON (cast(strftime('%s', t2.date) as integer) BETWEEN
(cast(strftime('%s', t1.date_hist_min) as integer))
AND (cast(strftime('%s', t1.date) as integer)))
AND (t1.{0} == t2.{0})
'''
# perform the SQL query on the table with sqlcode:
newdf = pandasql.sqldf(sqlcode.format(column), locals())
The code uses Python's pandasql, which lets you query a dataframe as if it were a SQL table, so you can treat the dataframe above as a SQL table.
The definition of the table is in the comments.
What is the meaning of t1.{0} == t2.{0}? What does {0} stand for in this context?

sqlcode.format(column) formats the string and injects the value of column into every {0}.
The 0 means format will use its first argument.
print("This {1} a {0}".format("string", "is")) would print "This is a string"
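Applied to the join condition from the question, this looks roughly as follows. The value of column is not shown in the question, so 'account_id' is only an assumed example:

```python
# {0} is a positional placeholder; format() replaces it with its first argument
template = "AND (t1.{0} == t2.{0})"

column = "account_id"  # assumed value; the question does not show how column is set
clause = template.format(column)
print(clause)  # AND (t1.account_id == t2.account_id)
```

So the final query joins each row to the earlier rows that share the same value in the chosen column.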

Count the number of non-null values in each column of each table in MySQL

Is there a way to produce this output using SQL for all tables in a given database (using MySQL) without having to specify individual table names and columns?
Table   Column  Count
------  ------  -----
Table1  Col1    0
Table1  Col2    100
Table1  Col3    0
Table1  Col4    67
Table1  Col5    0
Table2  Col1    30
Table2  Col2    0
Table2  Col3    2
...     ...     ...
The purpose is to identify columns for analysis based on how much data they contain (a significant number of columns are empty).
The 'workaround' solution using python (one table at a time):
# Libraries
import pymysql
import pandas as pd
import pymysql.cursors
# Connect to mariaDB
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='my_password',
                             db='my_database',
                             charset='latin1',
                             cursorclass=pymysql.cursors.DictCursor)
# Get column metadata
sql = """SELECT *
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA`='my_database'
"""
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()
# Store in dataframe
df = pd.DataFrame(result)
df = df[['TABLE_NAME', 'COLUMN_NAME']]
# Build SQL string (one table at a time for now)
my_table = 'my_table'
df_my_table = df[df.TABLE_NAME==my_table].copy()
cols = list(df_my_table.COLUMN_NAME)
col_strings = [''.join(['COUNT(', x, ') AS ', x, ', ']) for x in cols]
col_strings[-1] = col_strings[-1].replace(',','')
sql = ''.join(['SELECT '] + col_strings + ['FROM ', my_table])
# Execute
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()
The result is a dictionary of column names and counts.

Basically, no. See also this answer.
Also, note that the closest match of the answer above is actually the method you're already using, but less efficiently implemented in reflective SQL.
I'd do the same as you did - build a SQL like
SELECT
COUNT(*) AS `count`,
SUM(IF(columnName1 IS NULL,1,0)) AS columnName1,
...
SUM(IF(columnNameN IS NULL,1,0)) AS columnNameN
FROM tableName;
using information_schema as a source for table and column names, then execute it for each table in MySQL, then disassemble the single row returned into N tuple entries (tableName, columnName, total, nulls).
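The build-and-disassemble step might be sketched as below. The helper names build_null_count_sql and unpack_counts are hypothetical, the column list would come from information_schema, and the result row is assumed to arrive as a dictionary (as with the DictCursor in the question). The sample row is made up.

```python
def build_null_count_sql(table, columns):
    """Build one query returning the row count and the NULL count of each column."""
    parts = ["COUNT(*) AS `count`"]
    for col in columns:
        parts.append("SUM(IF(`{0}` IS NULL,1,0)) AS `{0}`".format(col))
    return "SELECT " + ", ".join(parts) + " FROM `{0}`".format(table)

def unpack_counts(table, row):
    """Turn the single result row into (table, column, total, nulls) tuples."""
    # Assumes no real column is itself named "count"
    total = row["count"]
    return [(table, col, total, nulls) for col, nulls in row.items() if col != "count"]

sql = build_null_count_sql("Table1", ["Col1", "Col2"])
print(sql)

# A made-up result row, shaped like what a DictCursor would return
sample_row = {"count": 100, "Col1": 100, "Col2": 0}
print(unpack_counts("Table1", sample_row))
```

The non-null count the question asks for is then total minus nulls for each tuple.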

It is possible, but it's not going to be quick.
As mentioned in a previous answer, you can work your way through the columns table in information_schema to build queries that get the counts. It's then just a question of how long you are prepared to wait for the answer, because you end up counting every row, for every column, in every table. You can speed things up a bit by excluding columns that are defined as NOT NULL, i.e. by restricting the cursor query to IS_NULLABLE = 'YES'.
The solution suggested by LSerni is going to be much faster, particularly if you have very wide tables and/or high row counts, but would require more work handling the results.
e.g.
DELIMITER //
DROP PROCEDURE IF EXISTS non_nulls //
CREATE PROCEDURE non_nulls (IN sname VARCHAR(64))
BEGIN
-- Parameters:
-- Schema name to check
-- call non_nulls('sakila');
DECLARE vTABLE_NAME varchar(64);
DECLARE vCOLUMN_NAME varchar(64);
DECLARE vIS_NULLABLE varchar(3);
DECLARE vCOLUMN_KEY varchar(3);
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE cur1 CURSOR FOR
SELECT `TABLE_NAME`, `COLUMN_NAME`, `IS_NULLABLE`, `COLUMN_KEY`
FROM `information_schema`.`columns`
WHERE `TABLE_SCHEMA` = sname
ORDER BY `TABLE_NAME` ASC, `ORDINAL_POSITION` ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;
DROP TEMPORARY TABLE IF EXISTS non_nulls;
CREATE TEMPORARY TABLE non_nulls(
table_name VARCHAR(64),
column_name VARCHAR(64),
column_key CHAR(3),
is_nullable CHAR(3),
rows BIGINT,
populated BIGINT
);
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO vTABLE_NAME, vCOLUMN_NAME, vIS_NULLABLE, vCOLUMN_KEY;
IF done THEN
LEAVE read_loop;
END IF;
SET @sql := CONCAT('INSERT INTO non_nulls ',
'(table_name,column_name,column_key,is_nullable,rows,populated) ',
'SELECT \'', vTABLE_NAME, '\',\'', vCOLUMN_NAME, '\',\'', vCOLUMN_KEY, '\',\'',
vIS_NULLABLE, '\', COUNT(*), COUNT(`', vCOLUMN_NAME, '`) ',
'FROM `', sname, '`.`', vTABLE_NAME, '`');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
END LOOP;
CLOSE cur1;
SELECT * FROM non_nulls;
END //
DELIMITER ;
call non_nulls('sakila');

how to set date value or null in sql formatted query string for a date column

I have a task to copy data from one db table to another db table using the psycopg2 library in Python.
After fetching a row from the first db I have to insert it into the second db table. The problem is that I need to format the query and insert the value for the date column from a variable which may or may not hold a date, so my query looks like this:
cur.execute("""update table_name set column1 = '%s', column2 = '%s',column_date = '%s'""" % (value1, value2, value_date))
Now value_date may be a date or None, so how do I convert None to a SQL null (or something similar) so that it can be stored in the date column?
Note: value1, value2 and value_date are variables containing values.

Psycopg adapts a Python None to a Postgresql null. It is not necessary to do anything. If there is no processing at the Python side skip that step and update directly between the tables:
cur.execute("""
update t1
set column1 = t2.c1, column2 = t2.c2, column_date = t2.c3
from t2
where t1.pk = t2.pk
""")
This is how to pass date and None to Psycopg:
from datetime import date
query = '''
update t
set (date_1, date_2) = (%s, %s)
'''
# mogrify returns the query string
print (cursor.mogrify(query, (date.today(), None)).decode('utf8'))
cursor.execute(query, (date.today(), None))
query = 'select * from t'
cursor.execute(query)
print (cursor.fetchone())
Output:
update t
set (date_1, date_2) = ('2017-03-16'::date, NULL)
(datetime.date(2017, 3, 16), None)
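The problem in the question comes from building the statement with Python's % operator, which turns None into the text 'None' before the driver ever sees it. The sketch below illustrates the difference using the standard-library sqlite3 module purely as a stand-in, so that it runs without a server. sqlite3 uses ? placeholders where Psycopg uses %s, and the table t here is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (column1 TEXT, column_date TEXT)")
con.execute("INSERT INTO t VALUES ('old', '2017-01-01')")

value1, value_date = "new", None

# String formatting bakes the text 'None' into the statement itself
formatted = "update t set column1 = '%s', column_date = '%s'" % (value1, value_date)
print(formatted)

# Passing the values separately lets the driver translate None to NULL
con.execute("update t set column1 = ?, column_date = ?", (value1, value_date))
row = con.execute("SELECT column1, column_date FROM t").fetchone()
print(row)  # ('new', None)
```

With Psycopg the same idea applies: keep the placeholders unquoted in the query and pass the values as the second argument to execute, as in the answer above.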

Passing parameter in psycopg2

I am trying to access PostgreSQL using psycopg2:
sql = """
SELECT
%s
FROM
table;
"""
cur = con.cursor()
input = (['id', 'name'], )
cur.execute(sql, input)
data = pd.DataFrame.from_records(cur.fetchall())
However, the returned result is:
0
0 [id, name]
1 [id, name]
2 [id, name]
3 [id, name]
4 [id, name]
If I try to access single column, it looks like:
0
0 id
1 id
2 id
3 id
4 id
It looks like something is wrong with the quoting around column name (single quote which should not be there):
In [49]: print cur.mogrify(sql, input)
SELECT
'id'
FROM
table;
but I am following doc: http://initd.org/psycopg/docs/usage.html#
Can anyone tell me what is going on here? Thanks a lot!!!

Use the AsIs extension
import psycopg2
from psycopg2.extensions import AsIs
column_list = ['id','name']
columns = ', '.join(column_list)
cursor.execute("SELECT %s FROM table", (AsIs(columns),))
And mogrify will show that it is not quoting the column names and passing them in as is.

Nowadays, you can use sql.Identifier to do this in a clean and secure way :
from psycopg2 import sql
statement = """
SELECT
{id}, {name}
FROM
table;
"""
with con.cursor() as cur:
    cur.execute(sql.SQL(statement).format(
        id=sql.Identifier("id"),
        name=sql.Identifier("name")
    ))
    data = pd.DataFrame.from_records(cur.fetchall())
More information on query composition here : https://www.psycopg.org/docs/sql.html

The reason is that you were passing the list ['id', 'name'] as a SQL query parameter, so it was treated as a value rather than as column names. The resulting query was similar to
SELECT 'id, name' FROM table
It looks like your table has 5 rows, so the result was just this literal repeated for each row.
Column names cannot be passed as SQL query parameters, but you can insert them with ordinary string formatting before executing the query:
sql = """
SELECT
%s
FROM
table;
"""
input = 'id, name'
sql = sql % input
print(sql)
cur = con.cursor()
cur.execute(sql)
data = pd.DataFrame.from_records(cur.fetchall())
In this case the resulting query is
SELECT
id, name
FROM
table;
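Because this approach pastes text straight into the statement, it is only safe when the column names are trusted. One possible guard, sketched below with a hypothetical helper select_columns_sql and an assumed set of allowed names, is to check each name against a whitelist before formatting:

```python
ALLOWED_COLUMNS = {"id", "name"}  # assumed whitelist; adapt to the real table

def select_columns_sql(columns, table="table"):
    """Return a SELECT statement after checking every requested column name."""
    unknown = [c for c in columns if c not in ALLOWED_COLUMNS]
    if unknown:
        raise ValueError("unexpected column names: %r" % (unknown,))
    return "SELECT %s FROM %s;" % (", ".join(columns), table)

print(select_columns_sql(["id", "name"]))  # SELECT id, name FROM table;
```

Anything that is not on the list raises ValueError instead of reaching the database.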

Python Sqlite3 insert operation with a list of column names

Normally, if I want to insert values into a table, I do something like this (assuming that I know which columns the values belong to):
conn = sqlite3.connect('mydatabase.db')
conn.execute("INSERT INTO MYTABLE (ID,COLUMN1,COLUMN2)\
VALUES(?,?,?)",[myid,value1,value2])
But now I have a list of columns (the length of the list may vary) and a list of values, one for each column in the list.
For example, say I have a table with 10 columns (column1, column2, ..., column10). I have a list of columns that I want to fill, let's say [column3, column4], and a list of values for those columns: [value for column3, value for column4].
How do I insert the values in the list into the columns they belong to?

As far as I know the parameter list in conn.execute works only for values, so we have to use string formatting like this:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a integer, b integer, c integer)')
col_names = ['a', 'b', 'c']
values = [0, 1, 2]
conn.execute('INSERT INTO t (%s, %s, %s) values(?,?,?)'%tuple(col_names), values)
Please note that this is risky: strings interpolated into a SQL statement should always be checked to prevent injection attacks. You could, however, pass the list of column names through a validation function before the insertion.
EDITED:
For lists of varying length you could try something like
exec_text = 'INSERT INTO t (' + ','.join(col_names) +') values(' + ','.join(['?'] * len(values)) + ')'
conn.execute(exec_text, values)
# as long as len(col_names) == len(values)

Of course string formatting will work, you just need to be a bit cleverer about it.
col_names = ','.join(col_list)
col_spaces = ','.join(['?'] * len(col_list))
sql = 'INSERT INTO t (%s) values(%s)' % (col_names, col_spaces)
conn.execute(sql, values)
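Put together as a runnable sketch, with one possible safety check added: the hypothetical helper insert_row compares the requested names against the columns reported by PRAGMA table_info before formatting them into the statement. The table name itself is assumed to be trusted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a integer, b integer, c integer)")

def insert_row(conn, table, col_list, values):
    """Insert values into the given columns after checking the column names."""
    if len(col_list) != len(values):
        raise ValueError("col_list and values must have the same length")
    # PRAGMA table_info returns one row per column; index 1 is the column name
    known = {row[1] for row in conn.execute('PRAGMA table_info("%s")' % table)}
    if not set(col_list) <= known:
        raise ValueError("unknown column name")
    col_names = ",".join(col_list)
    col_spaces = ",".join(["?"] * len(col_list))
    sql = "INSERT INTO %s (%s) values(%s)" % (table, col_names, col_spaces)
    conn.execute(sql, values)

insert_row(conn, "t", ["b", "c"], [1, 2])
print(conn.execute("SELECT a, b, c FROM t").fetchall())  # [(None, 1, 2)]
```

Columns that are not listed are simply left at their default, which is NULL here.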

I was looking for a solution to create columns based on a list of unknown / variable length and found this question. However, I managed to find a nicer solution (for me anyway), that's also a bit more modern, so thought I'd include it in case it helps someone:
import sqlite3
def create_sql_db(my_list):
    file = 'my_sql.db'
    table_name = 'table_1'
    init_col = 'id'
    col_type = 'TEXT'
    conn = sqlite3.connect(file)
    c = conn.cursor()
    # CREATE TABLE (IF IT DOESN'T ALREADY EXIST)
    c.execute('CREATE TABLE IF NOT EXISTS {tn} ({nf} {ft})'.format(
        tn=table_name, nf=init_col, ft=col_type))
    # CREATE A COLUMN FOR EACH ITEM IN THE LIST
    for new_column in my_list:
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))
    conn.close()

my_list = ["Col1", "Col2", "Col3"]
create_sql_db(my_list)
All my data is of the type text, so I just have a single variable "col_type" - but you could for example feed in a list of tuples (or a tuple of tuples, if that's what you're into):
my_other_list = [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")]
and change the CREATE A COLUMN step to:
    for tupl in my_other_list:
        new_column = tupl[0] # "ColA", "ColB", "ColC"
        col_type = tupl[1] # "TEXT", "INTEGER", "BLOB"
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))
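A self-contained variant of the tuple version, as a sketch only: it uses an in-memory database so nothing is left on disk between test runs, and a hypothetical helper add_columns so the ALTER TABLE formatting lives in one place. The result is checked with PRAGMA table_info.

```python
import sqlite3

def add_columns(conn, table_name, columns):
    """Add one column per (name, type) pair to an existing table."""
    for new_column, col_type in columns:
        conn.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
conn.execute("CREATE TABLE IF NOT EXISTS table_1 (id TEXT)")
add_columns(conn, "table_1", [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")])

# PRAGMA table_info lists the columns; index 1 is the name, index 2 the declared type
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(table_1)")]
print(schema)
conn.close()
```

As in the original, the names are formatted directly into the statement, so they should come from a trusted source.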

As a noob, I can't comment on the very succinct, updated solution @ron_g offered. While testing, though, I had to frequently delete the sample database itself, so for any other noobs using this to test, I would advise adding:
c.execute('DROP TABLE IF EXISTS {tn}'.format(
    tn=table_name))
prior to the 'CREATE TABLE ...' portion.
There appear to be multiple instances of
.format(
    tn=table_name ....)
in both 'CREATE TABLE ...' and 'ALTER TABLE ...', so I'm trying to figure out whether it's possible to create a single instance (similar to, or included in, the def section).
