Missing column names when importing data from a database (Python + PostgreSQL)

I am trying to import some data from a database (PostgreSQL) to work with it in Python. I tried the code below, which is quite similar to examples I've found on the internet.
import psycopg2
import sqlalchemy as db
import pandas as pd
engine = db.create_engine('database specifications')
connection = engine.connect()
metadata = db.MetaData()
data = db.Table(tabela, metadata, schema=shema, autoload=True, autoload_with=engine)
query = db.select([data])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
df = pd.DataFrame(ResultSet)
However, it returns data without column names. What did I forget?

It turned out the only thing needed was to set the column names on the DataFrame after creating it:
columns = data.columns.keys()
df.columns = columns
There is a great debate about that in this thread.
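
As a minimal sketch of the same idea, the example below passes the column names straight to the DataFrame constructor, taking them from the result's `keys()`. An in-memory SQLite database stands in for PostgreSQL (an assumption made so it runs without a server), and the `people` table and its contents are hypothetical.

```python
import pandas as pd
import sqlalchemy as sa

# In-memory SQLite stands in for the PostgreSQL database (assumption).
engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()

# Hypothetical table used only for illustration.
people = sa.Table(
    "people",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        people.insert(),
        [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}],
    )

with engine.connect() as conn:
    result = conn.execute(sa.select(people).order_by(people.c.id))
    # The result object knows the column names of the statement.
    column_names = list(result.keys())
    rows = [tuple(row) for row in result.fetchall()]

# Supplying columns= avoids the default 0, 1, 2, ... labels.
df = pd.DataFrame(rows, columns=column_names)
print(df)
```

Supplying `columns=` at construction time does the same job as assigning `df.columns` afterwards, in one step.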

Related

Passing a non-executable SQL object to pandas read_sql() method

I'm facing this new warning within some Python 3.9 code:
/usr/local/lib/python3.9/site-packages/pandas/io/sql.py:761:
UserWarning:
pandas only support SQLAlchemy connectable(engine/connection) or
database string URI or sqlite3 DBAPI2 connectionother DBAPI2
objects are not tested, please consider using SQLAlchemy
on such snippet:
import pandas as pd
from psycopg2 import sql
fields = ('object', 'category', 'number', 'mode')
query = sql.SQL("SELECT {} FROM categories;").format(
    sql.SQL(', ').join(map(sql.Identifier, fields))
)
df = pd.read_sql(
    sql=query,
    con=connector()  # custom function which returns db parameters as a psycopg2 connection object
)
It works like a charm for the moment, but according to the warning message, I'd like to switch to SQLAlchemy.
But by doing so:
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://', creator=connector)
df = pd.read_sql(
    sql=query,
    con=engine
)
it says:
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object:
Composed([SQL('SELECT '), Composed([Identifier('object'), SQL(', '),
Identifier('category'), SQL(', '), Identifier('number'), SQL(', '),
Identifier('mode')]), SQL(' FROM categories;')])
So I have to tweak it this way to avoid this error:
engine = create_engine('postgresql+psycopg2://', creator=connector)
conn = connector()
curs = conn.cursor()
df = pd.read_sql(
    sql=query.as_string(conn),  # non-pythonic, isn't it?
    con=engine
)
I'm wondering what the benefit is of using an SQLAlchemy engine with pandas if I have to "decode" the query string using a psycopg2 connection context... (in some specific cases where the query string is a binary string, I have to "decode" it by applying .decode('UTF-8')...)
How can I rewrite the DataFrame construction in a proper (i.e. the best) way by using an SQLAlchemy engine with pandas?
The pandas doc is not 100% clear to me:
Parameters
sql : str or SQLAlchemy Selectable (select or text object)
    SQL query to be executed or a table name.
Version info:
python: 3.9
pandas: '1.4.3'
sqlalchemy: '1.4.35'
psycopg2: '2.9.3 (dt dec pq3 ext lo64)'

The query can be expressed in SQLAlchemy syntax like this:
import pandas as pd
import sqlalchemy as sa
fields = ('object', 'category', 'number', 'mode')
# Adjust engine configuration to match your environment.
engine = sa.create_engine('postgresql+psycopg2:///test')
metadata = sa.MetaData()
# Reflect the table from the database.
tbl = sa.Table('categories', metadata, autoload_with=engine)
# Get column objects for column names.
columns = [tbl.c[name] for name in fields]
query = sa.select(*columns)
df = pd.read_sql(sql=query, con=engine)
print(df)
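
If reflecting the table is not desirable, a lighter-weight variant is to build the same statement from `sa.table()` and `sa.column()`, which need no database connection. This is offered as an alternative sketch rather than as part of the answer above; the resulting `query` is an ordinary SQLAlchemy selectable, so it could be passed to `pd.read_sql` in the same way.

```python
import sqlalchemy as sa

fields = ("object", "category", "number", "mode")

# Lightweight table/column constructs: no reflection, no connection needed.
categories = sa.table("categories", *[sa.column(name) for name in fields])

# Select only the requested columns.
query = sa.select(*[categories.c[name] for name in fields])

# Render the statement with the default dialect to inspect it.
rendered = str(query)
print(rendered)
```

Because the column names become SQLAlchemy identifiers rather than pieces of a formatted string, quoting is left to the dialect, much as `sql.Identifier` does in psycopg2.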

How to keep a Postgresql database connection open with Python

I am running a loop, and in each iteration I go to the PostgreSQL database to find certain matching records. My issue is that it takes too long to open a new database connection on each pass through the loop.
import pandas as pd
import psycopg2
from psycopg2 import sql
from sqlalchemy import create_engine
engine = create_engine(mycredentials)
# Open the connection once, outside the loop.
db_connect = engine.connect()
query = ("""SELECT ("CustomerId") AS "CustomerId",
    ("LastName") AS "LastName",
    ("FirstName") AS "FirstName",
    ("City") AS "City"
    FROM
    public."postgres_table"
    WHERE (LEFT("LastName", 4) = %(blocking_lastname)s
    AND LEFT("City", 4) = %(blocking_city)s)
    OR (LEFT("FirstName", 3) = %(blocking_firstname)s
    AND LEFT("City", 4) = %(blocking_city)s)
    """)
for index, row in df_loop.iterrows():
    blocking_lastname = row['LastName'][:4]
    blocking_firstname = row['FirstName'][:3]
    blocking_city = row['City'][:4]
    df = pd.read_sql(query, db_connect, params={
        'blocking_firstname': blocking_firstname,
        'blocking_lastname': blocking_lastname,
        'blocking_city': blocking_city})
This code works, but it doesn't appear to be keeping the database connection open since there is too much latency with the query. (Note: The table columns have indexes.)
UPDATE: I updated the code above, using db_connect as an object outside of the loop rather than engine.connect() as a call in pd.read_sql() as I originally had it written.
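
A minimal sketch of the pattern described in the update: open one connection before the loop and reuse it for every parameterized query. The standard-library `sqlite3` module stands in for PostgreSQL here (an assumption so the example runs without a server), so the placeholders use SQLite's `:name` style and `substr` replaces `LEFT`. The `customers` table, its rows, and the `lookups` list are all made up for illustration.

```python
import sqlite3

# One connection, opened once and reused for every iteration.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE customers '
    '("CustomerId" INTEGER, "LastName" TEXT, "FirstName" TEXT, "City" TEXT)'
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Johnson", "Alice", "Boston"),
        (2, "Johnston", "Alan", "Bostonia"),
        (3, "Smith", "Carol", "Denver"),
    ],
)

query = """
SELECT "CustomerId", "LastName", "FirstName", "City"
FROM customers
WHERE (substr("LastName", 1, 4) = :blocking_lastname
       AND substr("City", 1, 4) = :blocking_city)
   OR (substr("FirstName", 1, 3) = :blocking_firstname
       AND substr("City", 1, 4) = :blocking_city)
ORDER BY "CustomerId"
"""

# Hypothetical rows to look up; in the question these come from df_loop.
lookups = [
    {"LastName": "Johnson", "FirstName": "Alice", "City": "Boston"},
    {"LastName": "Smithers", "FirstName": "Carl", "City": "Denver"},
]

matches = []
for row in lookups:
    params = {
        "blocking_lastname": row["LastName"][:4],
        "blocking_firstname": row["FirstName"][:3],
        "blocking_city": row["City"][:4],
    }
    # The same connection object is used each time; nothing is reopened.
    matches.append([r[0] for r in conn.execute(query, params)])

conn.close()
print(matches)
```

If latency stays high with a single open connection, the time is probably going into the per-row round trips themselves rather than into connecting, which would be worth measuring before changing anything else.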

How do I insert my Python dictionary into my SQL Server database table?

I have a dictionary with 3 keys which correspond to field names in a SQL Server table. The values of these keys come from an excel file and I store this dictionary in a dataframe which I now need to insert into a SQL table. This can all be seen in the code below:
import pandas as pd
import pymssql
df=[]
fp = "file path"
data = pd.read_excel(fp,sheetname ="CRM View" )
row_date = data.loc[3, ]
row_sita = "ABZPD"
row_event = data.iloc[12, :]
df = pd.DataFrame({'date': row_date,
'sita': row_sita,
'event': row_event
}, index=None)
df = df[4:]
df = df.fillna("")
print(df)
My question is how do I insert this dictionary into a SQL table now?
Also, as a side note, this code is part of a loop which needs to go through several Excel files one by one: insert the data into the dictionary, then into SQL, then delete the data in the dictionary and start again with the next Excel file.

You could try something like this:
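import MySQLdb
# connect (arguments are host, user, password, database name)
# Note: MySQLdb targets MySQL; for SQL Server, pymssql follows the same DB-API pattern.
conn = MySQLdb.connect("127.0.0.1", "username", "password", "table")
x = conn.cursor()
# write: pass the values as parameters instead of formatting them into the SQL string
x.execute('INSERT INTO table_name (row_date, sita, event) VALUES (%s, %s, %s)', (row_date, sita, event))
# close
conn.commit()
conn.close()
You might have to change it a little based on your SQL restrictions, but it should give you a good start anyway.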
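
To illustrate passing values as query parameters rather than formatting them into the SQL string, here is a small sketch using the standard-library `sqlite3` module. That choice is an assumption so the example runs anywhere; placeholder syntax differs per driver (for example `%s` in pymssql and MySQLdb, `?` or `:name` in sqlite3). The `events` table and the sample values are hypothetical.

```python
import sqlite3

# Hypothetical records shaped like the dictionary in the question.
records = [
    {"date": "2018-01-01", "sita": "ABZPD", "event": "Conference"},
    {"date": "2018-01-02", "sita": "ABZPD", "event": ""},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, sita TEXT, event TEXT)")

# Named placeholders are filled from each dict; the driver handles quoting.
conn.executemany(
    "INSERT INTO events (date, sita, event) VALUES (:date, :sita, :event)",
    records,
)
conn.commit()

inserted = conn.execute(
    "SELECT date, sita, event FROM events ORDER BY date"
).fetchall()
conn.close()
print(inserted)
```

Using `executemany` with a list of dictionaries also fits the loop over several files: build the list for one file, insert it, then start a fresh list for the next.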

For a pandas DataFrame, you can use the built-in to_sql method to store it in the database. Here is how to use it:
import urllib.parse
import sqlalchemy as sa
# Replace the <...> placeholders with your own server, database, table and schema names.
params = urllib.parse.quote_plus("DRIVER={};SERVER={};DATABASE={};Trusted_Connection=True;".format(
    "{SQL Server}",
    "<db_server_url>",
    "<db_name>"))
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = sa.create_engine(conn_str)
df.to_sql("<table_name>", engine, schema="<schema_name>", if_exists="append", index=False)
For this method you will need to install the sqlalchemy package.
pip install sqlalchemy
You will also need to set up the MSSQL DSN on the machine.
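
A self-contained sketch of `to_sql` appending rows from several DataFrames follows. It writes to an in-memory SQLite database through a plain `sqlite3` connection, which pandas accepts directly; this is an assumption made so the example runs without SQL Server. The table name `crm_events` and the data are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical per-file data; in the question each frame comes from one Excel file.
frames = [
    pd.DataFrame(
        {"date": ["2018-01-01", "2018-01-02"], "sita": "ABZPD", "event": ["Expo", ""]}
    ),
    pd.DataFrame({"date": ["2018-02-01"], "sita": "ABZPD", "event": ["Fair"]}),
]

for df in frames:
    # The first call creates the table; later calls append to it.
    df.to_sql("crm_events", conn, if_exists="append", index=False)

row_count = conn.execute("SELECT COUNT(*) FROM crm_events").fetchone()[0]
stored = pd.read_sql("SELECT * FROM crm_events", conn)
conn.close()
print(row_count)
print(stored)
```

With `if_exists="append"`, the same call can sit inside the loop over Excel files without any special handling for the first file.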

Python to group SQL data

I used to read the data from a CSV file. I have now imported all the CSV data into a SQL database, but I am having difficulty extracting the data from SQL with Python.
My original code for reading the CSV is like this:
import pandas as pd
stock_data = pd.read_csv(filepath_or_buffer='stock_data_w.csv', parse_dates=[u'date'], encoding='gbk')
stock_data[u'change_weekly'] = stock_data.groupby(u'code')[u'change'].shift(-1)
Now I want to read the data from SQL. Here is my code, but it doesn't work and I am not sure how to fix it:
import pandas as pd
import MySQLdb
db = MySQLdb.connect(host='localhost', user='root', passwd='232323', db='test', port=3306)
cur = db.cursor()
cur.execute("SELECT * FROM stock_data_w")
stock_data = pd.DataFrame(data=cur.fetchall(), columns=[i[0] for i in cur.description])
stock_data[u'change_weekly'] = stock_data.groupby(u'code')[u'change'].shift(-1)
The error is: "raise PandasError('DataFrame constructor not properly called!') pandas.core.common.PandasError: DataFrame constructor not properly called!"

Use the approach below to convert your query result into a DataFrame (keys() is available on SQLAlchemy result objects, not on plain DB-API cursors).
stock_data = pd.DataFrame(data=cursor.fetchall(), index=None,
                          columns=cursor.keys())
print(stock_data)
With MySQLdb, use columns=[i[0] for i in cursor.description] instead.
Alternatively, make your connection with SQLAlchemy and use:
stock_data = pd.read_sql("SELECT * from stock_data_w",
                         con=cnx, parse_dates=['date'])
I'm not sure whether mysql.connector is supported by pandas read_sql(). You can give it a try and let us know :)
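
The `read_sql` route can be sketched end to end as follows, with an in-memory SQLite database standing in for MySQL (an assumption so the example runs without a server) and made-up rows in `stock_data_w`. As a side note on the manual route: with older pandas versions, a commonly reported cause of the "DataFrame constructor not properly called!" error is that MySQLdb's `fetchall()` returns a tuple of tuples, so wrapping it in `list(...)` may be worth trying.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_data_w (code TEXT, date TEXT, change REAL)")
conn.executemany(
    "INSERT INTO stock_data_w VALUES (?, ?, ?)",
    [
        ("A", "2016-01-01", 0.1),
        ("A", "2016-01-08", 0.2),
        ("B", "2016-01-01", -0.3),
        ("B", "2016-01-08", 0.4),
    ],
)

# read_sql fills in the column names and can parse date columns.
stock_data = pd.read_sql(
    "SELECT * FROM stock_data_w ORDER BY code, date",
    con=conn,
    parse_dates=["date"],
)
conn.close()

# Same grouping step as in the CSV version: next week's change per stock code.
stock_data["change_weekly"] = stock_data.groupby("code")["change"].shift(-1)
print(stock_data)
```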

How can I convert Sqlalchemy table object to Pandas DataFrame?

Is it possible to convert a retrieved SQLAlchemy table object into a Pandas DataFrame, or do I need to write a dedicated function for that?

This might not be the most efficient way, but it has worked for me to reflect a database table using automap_base and then convert it to a Pandas DataFrame.
import pandas as pd
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
connection_string = "your:db:connection:string:here"
engine = create_engine(connection_string, echo=False)
session = Session(engine)
# sqlalchemy: Reflect the tables
Base = automap_base()
Base.prepare(autoload_with=engine)  # SQLAlchemy 1.4+; older versions used Base.prepare(engine, reflect=True)
# Mapped classes are now created with names by default matching that of the table name.
Table_Name = Base.classes.table_name
# Example query with filtering
query = session.query(Table_Name).filter(Table_Name.language != 'english')
# Convert to DataFrame
df = pd.read_sql(query.statement, engine)
df.head()

I think I've tried this before. It's hacky, but for whole-table ORM query results, this should work:
import pandas as pd
cols = [c.name for c in SQLA_Table.__table__.columns]
pk = [c.name for c in SQLA_Table.__table__.primary_key]
tuplefied_list = [tuple(getattr(item, col) for col in cols) for item in result_list]
df = pd.DataFrame.from_records(tuplefied_list, index=pk, columns=cols)
Partial query results (NamedTuples) will also work, but you have to construct the DataFrame columns and index to match your query.
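
Here is a runnable sketch of that approach under some assumptions: a hypothetical `Book` model takes the place of `SQLA_Table`, an in-memory SQLite database holds two made-up rows, and SQLAlchemy 1.4 or later is installed. Each ORM object is turned into a real tuple before being handed to `from_records`.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Book(Base):
    """Hypothetical mapped class standing in for SQLA_Table."""

    __tablename__ = "book"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    language = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(
        [
            Book(id=1, title="Dune", language="english"),
            Book(id=2, title="Solaris", language="polish"),
        ]
    )
    session.commit()

    result_list = session.query(Book).order_by(Book.id).all()
    cols = [c.name for c in Book.__table__.columns]
    pk = [c.name for c in Book.__table__.primary_key.columns]
    # Build one tuple per ORM object, in column order.
    records = [tuple(getattr(item, col) for col in cols) for item in result_list]

# The primary key column becomes the DataFrame index.
df = pd.DataFrame.from_records(records, index=pk, columns=cols)
print(df)
```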

Pandas database functions such as read_sql_query accept SQLAlchemy connection objects (so-called SQLAlchemy connectables, see pandas docs and sqlalchemy docs). Here's one example using such an object, called my_connection:
import pandas as pd
import sqlalchemy
# create SQLAlchemy Engine object instance
my_engine = sqlalchemy.create_engine(f"{dialect}+{driver}://{login}:{password}@{host}/{db_name}")
# connect to the database using the newly created Engine instance
my_connection = my_engine.connect()
# run SQL query
my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)

I have a simpler way:
# Step1: import
import pandas as pd
from sqlalchemy import create_engine, inspect
# Step2: create_engine
connection_string = "sqlite:////absolute/path/to/database.db"
engine = create_engine(connection_string)
# Step3: select table
print(inspect(engine).get_table_names())  # engine.table_names() is deprecated since SQLAlchemy 1.4
# Step4: read table
table_df = pd.read_sql_table('table_name', engine)
table_df.head()
For other types of connection_string, see the SQLAlchemy 1.4 documentation.
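
For listing tables, note that `engine.table_names()` was deprecated in SQLAlchemy 1.4 and removed in 2.0. A small sketch of the inspector-based equivalent is shown below, using an in-memory SQLite database and a hypothetical table called `table_name` so that it runs without an existing database file.

```python
import sqlalchemy as sa

# In-memory SQLite database (assumption, so no file is needed).
engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()

# Hypothetical table created only so there is something to list.
sa.Table("table_name", metadata, sa.Column("id", sa.Integer, primary_key=True))
metadata.create_all(engine)

# The inspector reports the tables that exist in the database.
table_names = sa.inspect(engine).get_table_names()
print(table_names)
```

Any name in that list can then be passed to `pd.read_sql_table` together with the engine.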
