I'm using pd.read_sql with SQL Server to get the column names from a specific table. However, pandas only returns an empty DataFrame, whereas I get valid results from SSMS. Here is the code I'm using:
query = f'''select * from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME = 'Table_name'
'''
df_output = pd.read_sql(query, connection)
The result is as follows:
Empty DataFrame
Columns: [TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, ORDINAL_POSITION, COLUMN_DEFAULT, IS_NULLABLE, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH, NUMERIC_PRECISION, NUMERIC_PRECISION_RADIX, NUMERIC_SCALE, DATETIME_PRECISION, CHARACTER_SET_CATALOG, CHARACTER_SET_SCHEMA, CHARACTER_SET_NAME, COLLATION_CATALOG, COLLATION_SCHEMA, COLLATION_NAME, DOMAIN_CATALOG, DOMAIN_SCHEMA, DOMAIN_NAME]
Index: []
Thanks in advance
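For reference, a parameterized version of the same lookup (assuming a pyodbc-style connection whose placeholder is ?, and with 'MyRealTable' standing in for the actual table name) would be:
import pandas as pd

query = '''SELECT COLUMN_NAME, DATA_TYPE
           FROM INFORMATION_SCHEMA.COLUMNS
           WHERE TABLE_NAME = ?'''
# an empty result here usually means the name does not match any table
# in the database that the connection points to
df_output = pd.read_sql(query, connection, params=('MyRealTable',))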
I want to get a database table into a pandas DataFrame in Python. I use the following code:
self.cursor = self.connection.cursor()
query = """
SELECT * FROM `an_visit` AS `visit`
JOIN `an_ip` AS `ip` ON (`visit`.`ip_id` = `ip`.`ip_id`)
JOIN `an_useragent` AS `useragent` ON (`visit`.`useragent_id` = `useragent`.`useragent_id`)
JOIN `an_pageview` AS `pageview` ON (`visit`.`visit_id` = `pageview`.`visit_id`)
WHERE `visit`.`visit_id` BETWEEN %s AND %s
"""
self.cursor.execute(query, (start_id, end_id))
df = pd.DataFrame(self.cursor.fetchall())
This code works, but I want to get the column names as well. I tried the approach from the question "MySQL: Get column name or alias from query",
but it did not work:
fields = map(lambda x: x[0], self.cursor.description)
result = [dict(zip(fields, row)) for row in self.cursor.fetchall()]
How can I get column names from db into df? Thanks
The easy way to include column names in the result set is to set dictionary=True, as follows:
self.cursor = self.connection.cursor(dictionary=True)
Then fetchone(), fetchmany(), and fetchall() all return rows as dictionaries keyed by column name.
See:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursordict.html
https://mariadb-corporation.github.io/mariadb-connector-python/connection.html
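A minimal sketch of the whole flow with MySQL Connector/Python, reusing start_id and end_id from the question (the connection parameters are placeholders):
import mysql.connector
import pandas as pd

connection = mysql.connector.connect(
    host='localhost', user='user', password='secret', database='mydb')  # placeholder credentials
cursor = connection.cursor(dictionary=True)
cursor.execute("SELECT * FROM an_visit WHERE visit_id BETWEEN %s AND %s", (start_id, end_id))
rows = cursor.fetchall()   # each row is a dict keyed by column name
df = pd.DataFrame(rows)    # the column names carry over automatically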
What worked for me is:
field_names = [i[0] for i in self.cursor.description]
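Plugged into the code from the question, a sketch (note that fetchall() is called only once, since a second call would return nothing):
self.cursor.execute(query, (start_id, end_id))
rows = self.cursor.fetchall()
field_names = [i[0] for i in self.cursor.description]
df = pd.DataFrame(rows, columns=field_names)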
The best way to list all the columns of a table is to execute this query from the connection cursor:
SELECT TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME,DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA='<schema>' AND TABLE_NAME = '<table_name>'
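A sketch of running that from a cursor with bound parameters rather than string substitution (assuming a MySQL-style driver whose placeholder is %s; the schema and table names are placeholders):
cursor = connection.cursor()
cursor.execute(
    """SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
       FROM INFORMATION_SCHEMA.COLUMNS
       WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s""",
    ('mydb', 'an_visit'),  # placeholder schema and table name
)
for catalog, schema, table, column_name, data_type in cursor.fetchall():
    print(column_name, data_type)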
There is a column_names property on the MySQL Connector/Python cursor that you can use:
row = dict(zip(self.cursor.column_names, self.cursor.fetchone()))
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-column-names.html
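Applied to the code from the question, a sketch (column_names is specific to MySQL Connector/Python):
self.cursor.execute(query, (start_id, end_id))
df = pd.DataFrame(self.cursor.fetchall(), columns=self.cursor.column_names)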
I recently switched to Python from SAS. I want to do some SQL queries in Python. I do them as follows (table1 and table2 are pandas DataFrames):
import pandas as pd
import sqlite3
sql = sqlite3.connect(':memory:')
c = sql.cursor()
table1.to_sql('table1sql', sql, if_exists='replace', index=False)
table2.to_sql('table2sql', sql, if_exists='replace', index=False)
df_sql = c.execute('''
SELECT a.*, b.*
FROM table1sql as a
LEFT JOIN table2sql as b
ON a.id = b.id
''')
df = pd.DataFrame(df_sql.fetchall())
df.columns = list(map(lambda x: x[0], c.description)) # get column names from sql cursor
I work with very large datasets, sometimes up to 60 million observations. The query itself takes seconds. However, "fetching" the dataset, i.e. turning the SQL result into a pandas DataFrame, takes ages.
In SAS, the entire SQL query would take seconds. Is the way I am doing it inefficient? Is there any other way of doing what I am trying to do?
import pandas as pd
import sqlite3
# Connect to sqlite3 instance
con = sqlite3.connect(':memory:')
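# (assumes table1 and table2 were already written to this connection with to_sql, as in the question)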
# Read sqlite query results into a pandas DataFrame
df = pd.read_sql_query('''
SELECT a.*, b.*
FROM table1sql as a
LEFT JOIN table2sql as b
ON a.id = b.id
''',
con
)
# Verify that result of SQL query is stored in the dataframe
print(df.head())
con.close()
Docs : https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html?highlight=read%20sql%20query#pandas.read_sql_query
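If even a single fetch of the full result is too heavy, pd.read_sql_query also accepts a chunksize argument and then returns the result in pieces instead of one giant DataFrame; a sketch, on a still-open connection:
chunks = pd.read_sql_query(
    'SELECT a.*, b.* FROM table1sql AS a LEFT JOIN table2sql AS b ON a.id = b.id',
    con,
    chunksize=100_000,  # rows per chunk; tune to available memory
)
df = pd.concat(chunks, ignore_index=True)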
EDIT
Wait, I just re-read your question, the sources are already pandas dataframes???
Why are you pushing them to SQLite, just to read them back out again? Just use pd.merge()?
df = pd.merge(
table1,
table2,
how="left",
on="id",
suffixes=("_x", "_y"),
copy=True
)
Docs : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html#pandas.merge
I am using pyodbc to connect to my SQL Server. I have a table from which I need to delete a column.
I can read this table; the code I used to read it is as follows:
import pandas as pd
import pyodbc
cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0}; Server=xyz; database=db; Trusted_Connection=yes;")
cursor = cnxn.cursor()
df = pd.read_sql("select * from [db].[username].[mytable]", cnxn)
df.shape
The above code works as expected, but when I try to drop a column from this table it says it cannot find the object.
Here is what I tried:
query = 'ALTER TABLE [db].[username].[mytable] DROP COLUMN [TEMP CELCIUS]'
cursor.execute(query)
My question is how to drop this column. Note that the column name contains a space.
Try:
query = 'ALTER TABLE [db].[username].[mytable] DROP COLUMN "TEMP CELCIUS"'
Or, with backticks (note that backticks are MySQL-style identifier quoting and will generally not work on SQL Server, where square brackets or double quotes are used):
query = 'ALTER TABLE [db].[username].[mytable] DROP COLUMN `TEMP CELCIUS`'
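Whichever quoting you end up with, note that pyodbc connections do not autocommit by default, so the DDL needs an explicit commit; a sketch:
cursor.execute(query)
cnxn.commit()  # without this, the ALTER TABLE is rolled back when the connection closes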
I'm new to Python and pandas and am facing the following issue:
I would like to pass multiple strings into a SQL query and am struggling to insert the ',' delimiter:
Example data
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)
# Remove header (Not sure whether that is necessary)
df.columns = df.iloc[0]
pd.read_sql(
"""
SELECT
*
FROM emptable
WHERE empID IN ('{}',)
""".format(df.ix[:, 0]), # Which corresponds to 'Alex','Bob','Clarke'
con = connection)
I tried different combinations; however, none of them worked.
Demo:
sql_ = """
SELECT *
FROM emptable
WHERE empID IN ({})
"""
sql = sql_.format(','.join(['?'] * len(df)))
print(sql)
new = pd.read_sql(sql, conn, params=tuple(df['Name']))
Output:
In [166]: print(sql)
SELECT *
FROM emptable
WHERE empID IN (?,?,?)
NOTE: this approach will not work if your DF is large, because the generated SQL string would be too big.
In this case you can save/dump Names in a helper temporary table and use it in SQL:
df[['Name']].to_sql('tmp', conn, if_exists='replace')
sql = """
SELECT *
FROM emptable
WHERE empID IN (select Name from tmp)
"""
new = pd.read_sql(sql, conn)
Python Version - 2.7.6
Pandas Version - 0.17.1
MySQLdb Version - 1.2.5
In my database (PRODUCT) I have a table (XML_FEED). The table XML_FEED is huge (millions of records).
I have a pandas.DataFrame() (PROCESSED_DF). The dataframe has thousands of rows.
Now I need to run this
REPLACE INTO PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5)
VALUES (PROCESSED_DF.values)
Question:
Is there a way to run REPLACE INTO in pandas? I already checked pandas.DataFrame.to_sql(), but that is not what I need. I would prefer not to read the XML_FEED table into pandas because it is very large.
With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.
I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert

    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        # compile the INSERT normally, then swap the leading keyword
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s

    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(table.table.insert(replace_string=""), data)
You would pass it like so (the table name and SQLAlchemy engine are whatever you normally give to to_sql):
df.to_sql(table_name, engine, if_exists='append', method=mysql_replace_into)
Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert

    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    update_stmt = stmt.on_duplicate_key_update(
        **dict(zip(stmt.inserted.keys(), stmt.inserted.values()))
    )
    conn.execute(update_stmt)
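Either callable is passed to to_sql the same way as above; if the DataFrame's index is not a real column, it is worth excluding it (engine and table name here are placeholders):
df.to_sql('XML_FEED', engine, if_exists='append', index=False, method=mysql_replace_into)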
Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
As of this version (0.17.1) I am unable to find any direct way to do this in pandas. I reported a feature request for it.
I did this in my project by executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append').
Suppose
1) product_id is my primary key in table PRODUCT
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas
con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)  # you can use flavor='mysql' if you do not want to create a sqlalchemy engine, but it is deprecated
Please note:
The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().
It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.
def to_sql_update(df, engine, schema, table):
    df.reset_index(inplace=True)
    sql = ''' SELECT column_name from information_schema.columns
              WHERE table_schema = '{schema}' AND table_name = '{table}' AND
                    COLUMN_KEY = 'PRI';
          '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]
    sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)
    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
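A usage sketch, reusing the engine from the earlier answer (the connection string, schema, and table names are placeholders for your own):
engine = sqlalchemy.create_engine('mysql+mysqldb://root:my_password@localhost/database_name')
to_sql_update(PROCESSED_DF, engine, 'PRODUCT', 'XML_FEED')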
If you use to_sql you should be able to define it so that you replace values if they exist. For a table named 'mydb', a connection or engine con, and a DataFrame named 'df', you'd use:
df.to_sql('mydb', con, if_exists='replace')
Note, though, that if_exists='replace' drops and recreates the whole table rather than replacing individual rows, so I am not 100% sure that's what you're looking for.