I'm using the PyTd teradata module to query data from Teradata and want to read it into a Pandas DataFrame
import teradata
import pandas as pd
# teradata connection
udaExec = teradata.UdaExec(appName="Example", version="1.0",
logConsole=False)
session = udaExec.connect(method="odbc", system="", username="", password="")
# Create empty dataframe with column names
query = session.execute("SELECT TOP 1 * FROM table")
cols = [str(d[0]) for d in query.description]
df = pd.DataFrame(columns=cols)
# Read data into dataframe
for row in session.execute("SELECT * FROM table"):
    print(type(row))
    df.append(row)
row is of the teradata.util.Row class and can't be appended to the dataframe. I tried converting it to a list, but the format gets messed up.
How can I read my data into a dataframe from Teradata using the teradata module? I'm not able to use the pyodbc module for this.
Is there a better way to create the empty dataframe with column names matching those in the database?
You can use pandas.read_sql :)
import teradata
import pandas as pd
# teradata connection
udaExec = teradata.UdaExec(appName="Example", version="1.0",
logConsole=False)
with udaExec.connect(method="odbc", system="", username="", password="") as session:
    query = "SELECT * FROM table"
    df = pd.read_sql(query, session)
Using 'with' ensures the session is closed after the query. I hope that helped :)
I know it's a little late, but I'm putting a note here nevertheless.
There are a few questions here.
How can I read my data into a dataframe from Teradata using the
teradata module?
At the end of the day, a teradata.util.Row is essentially a list, so a simple list operation should get things out of a Row, something like:
','.join(str(item) for item in row)
Pushing that into a pandas dataframe should then be a list-to-DataFrame conversion exercise, as sketched below.
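For example, a minimal sketch of that conversion, assuming each Row can be unpacked like a sequence and reusing the session from the question:
import pandas as pd

# collect the whole result set, then build the DataFrame in one go
cursor = session.execute("SELECT * FROM table")
cols = [str(d[0]) for d in cursor.description]  # column names from the cursor metadata
data = [list(row) for row in cursor]            # each Row unpacks into a plain list
df = pd.DataFrame(data, columns=cols)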
I'm not able to use the pyodbc module for this.
I used teradata's Python module to do LDAP auth. It all worked fine, so I didn't have this requirement. Sorry.
Is there a better way to create the empty dataframe with column names matching those in the database?
I assume that, given a table name, you can query the data dictionary to figure out its schema (column names), convert that to a list, and create your pandas df? A sketch of that idea follows.
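A minimal sketch of that, assuming the open session from the question and access to the DBC.ColumnsV dictionary view (the database name 'my_db' is a placeholder):
import pandas as pd

# pull the column names for the table from Teradata's data dictionary
schema_rows = session.execute(
    "SELECT ColumnName FROM DBC.ColumnsV "
    "WHERE DatabaseName = 'my_db' AND TableName = 'table'")
cols = [row[0].strip() for row in schema_rows]  # names come back blank-padded
empty_df = pd.DataFrame(columns=cols)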
I know this is very late.
You can use read_sql() from the pandas module. It returns a pandas dataframe.
Here is the reference:
http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.read_sql.html
Related
I have a table in a sqlite3 database to which I am trying to add a list of values from Python as a new column. I can only find how to add a new column without values or how to change specific rows; could somebody help me with this?
This is probably the sort of thing you can google.
I can't find any way to add data to a column on creation, but you can add a default value (ALTER TABLE table_name ADD COLUMN column_name NOT NULL DEFAULT default_value) if that helps at all. Afterwards you will have to add the data separately; a sketch of that two-step approach follows the links below. These questions might be relevant:
Populate Sqlite3 column with data from Python list using for loop
Adding column in SQLite3, then filling it
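A minimal sketch of that two-step approach (the table name my_table, an integer primary key named id, and the list values are all assumptions for illustration):
import sqlite3

my_list = [10, 20, 30]

conn = sqlite3.connect("my_data.db")
cur = conn.cursor()
# 1) add the column (with a default, since existing rows need some value)
cur.execute("ALTER TABLE my_table ADD COLUMN new_column INTEGER DEFAULT 0")
# 2) fill it row by row, matching list positions to primary keys
cur.executemany(
    "UPDATE my_table SET new_column = ? WHERE id = ?",
    [(value, row_id) for row_id, value in enumerate(my_list, start=1)],
)
conn.commit()
conn.close()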
You can read the database table into a pandas dataframe, add the list as a column to that dataframe, then write the dataframe back over the original table:
import sqlite3
import pandas as pd

# read the existing table into a dataframe
conn = sqlite3.connect("my_data.db")
df = pd.read_sql_query("SELECT * FROM my_table", conn)
conn.close()

# add the Python list as a new column
df['new_column'] = my_list

# write the dataframe back, replacing the original table
conn = sqlite3.connect("my_data.db")
df.to_sql(name='my_table', if_exists='replace', con=conn)
conn.close()
I am having trouble trying to insert data from a df into an Oracle database table; this is the error: DatabaseError: ORA-01036: illegal variable name/number
These are the steps I did:
This is the dataframe I imported from the yfinance package and processed so that the data types of my df are consistent:
I transformed my df into a list; this is the data in the list:
This is the table where I want to insert my data:
This is the code:
sql_insert_temp = "INSERT INTO TEMPO('GIORNO','MESE','ANNO') VALUES(:2,:3,:4)"

index = 0
for i in df.iterrows():
    cursor.execute(sql_insert_temp, df_list[index])
    index += 1
connection.commit()
I tried a single insert in the SQL Developer worksheet, using the data you can see in the list, and it worked, so I guess I have made some mistake in the code. I have looked at other discussions, but I couldn't find any solution to my problem. Do you have any idea how I can solve this, or is it maybe possible to do this in another way?
I have tried to print the iterated queries and this is the result; that's why it's not inserting my data:
If you already have a pandas DataFrame, then you should be able to use the to_sql() method provided by the pandas library.
import cx_Oracle
import sqlalchemy
import pandas as pd
DATABASE = 'DB'
SCHEMA = 'DEV'
PASSWORD = 'password'
connection_string = f'oracle://{SCHEMA}:{PASSWORD}@{DATABASE}'
db_conn = sqlalchemy.create_engine(connection_string)
df_to_insert = df[['GIORNO', 'MESE', 'ANNO']]  # dataframe with only the columns you want to insert
df_to_insert.to_sql(name='TEMPO', con=db_conn, if_exists='append')
name is the name of the table.
con is the connection object.
if_exists='append' will add the rows to the end of the table; the other options are 'fail' (error if the table already exists) and 'replace' (drop and re-create the table).
Other parameters can be found on the pandas website: pandas.DataFrame.to_sql()
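As an aside, if you prefer the cursor-based insert from the question, ORA-01036 usually means the bind placeholders don't line up with the values supplied. A hedged sketch of how such an insert is commonly written with cx_Oracle (unquoted column names, binds numbered from :1, executemany over the whole list; connection and df_list are reused from the question):
# assumes connection is an open cx_Oracle connection and df_list is a list of
# [giorno, mese, anno] rows, as in the question
sql_insert_temp = "INSERT INTO TEMPO (GIORNO, MESE, ANNO) VALUES (:1, :2, :3)"
cursor = connection.cursor()
cursor.executemany(sql_insert_temp, [tuple(row) for row in df_list])
connection.commit()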
So I have a large (7GB) dataset stored in postgres that I'm trying to import into Dask. I'm trying the read_sql_table function, but keep getting ArgumentErrors.
My info in postgres is the following:
database is "my_database"
schema is "public"
data table is "table"
username is "fred"
password is "my_pass"
index in postgres is 'idx'
I am trying to get this piece of code to work:
df = dd.read_sql_table('public.table', 'jdbc:postgresql://localhost/my_database?user=fred&password=my_pass', index_col='idx')
Am I formatting something incorrectly?
I was finally able to figure it out by using psycopg2. The answer is below:
df = dd.read_sql_table('table', 'postgresql+psycopg2://postgres:fred@localhost/my_database', index_col='idx')
Additionally, I had to create a different index in the postgres table. The original index needed to be a whole separate column. I did this with the following line in Postgres:
alter table table add idx serial;
I have a data frame in jupyter notebook. My objective is to import this df into snowflake as a new table.
Is there any way to write a new table into snowflake directly without defining any table columns' names and types?
I am using:
import snowflake.connector as snow
from snowflake.connector.pandas_tools import write_pandas
from sqlalchemy import create_engine
import pandas as pd
connection = snow.connect(
    user='XXX',
    password='XXX',
    account='XXX',
    warehouse='COMPUTE_WH',
    database='SNOWPLOW',
    schema='DBT_WN'
)
df.to_sql('aaa', connection, index = False)
It ran into an error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
Can anyone provide the sample code to fix this issue?
Here's one way to do it -- apologies in advance for my code formatting in SO combined with python's spaces vs tabs "model". Check the tabs/spaces if you cut-n-paste ...
Because of the Snowsql security model, in your connection parameters be sure to specify the ROLE you are using as well. (Often the default role is 'PUBLIC')
Since you already have sqlAlchemy in the mix, this idea doesn't use Snowflake's write_pandas, so it isn't a good answer for large dataframes (a sketch of the write_pandas route is added at the end of this answer). There are some odd behaviors with sqlAlchemy and Snowflake: make sure the dataframe column names are upper case, yet use a lowercase table name in the argument to to_sql() ...
def df2sf_alch(target_df, target_table):
    # create a sqlAlchemy connection object
    engine = create_engine(f"snowflake://{your-sf-account-url}",
                           creator=lambda: connection)
    # re/create table in Snowflake
    try:
        # sqlAlchemy creates the table based on a lower-case table name
        # and it works to have uppercase df column names
        target_df.to_sql(target_table.lower(), con=engine, if_exists='replace', index=False)
        print(f"Table {target_table.upper()} re/created")
    except Exception as e:
        print(f"Could not replace table {target_table.upper()}: {e}")
    nrows = connection.cursor().execute(f"select count(*) from {target_table}").fetchone()[0]
    print(f"Table {target_table.upper()} rows = {nrows}")
Note this function needs to be changed to reflect the appropriate 'snowflake account url' in order to create the sqlAlchemy connection object. Also, assuming the case naming oddities are taken care of in the df, along with your already defined connection, you'd call this function simply passing the df and the name of the table, like df2sf_alch(my_df, 'MY_TABLE')
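For larger dataframes, here is a hedged sketch of the write_pandas route mentioned above. The table name 'AAA' is a placeholder, and the auto_create_table flag assumes a reasonably recent snowflake-connector-python (older versions need the table created beforehand):
from snowflake.connector.pandas_tools import write_pandas

# bulk-load the dataframe through write_pandas instead of to_sql;
# connection is the snowflake connection already defined in the question
success, nchunks, nrows, _ = write_pandas(
    connection,
    df,
    table_name='AAA',
    auto_create_table=True,  # assumption: supported by your connector version
)
print(f"loaded={success}, chunks={nchunks}, rows={nrows}")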
I want to query a PostgreSQL database and return the output as a Pandas dataframe.
I created a connection to the database with 'SqlAlchemy':
from sqlalchemy import create_engine
engine = create_engine('postgresql://user@localhost:5432/mydb')
I write a Pandas dataframe to a database table:
i=pd.read_csv(path)
i.to_sql('Stat_Table',engine,if_exists='replace')
Based on the docs, looks like pd.read_sql_query() should accept a SQLAlchemy engine:
a=pd.read_sql_query('select * from Stat_Table',con=engine)
But it throws an error:
ProgrammingError: (ProgrammingError) relation "stat_table" does not exist
I'm using Pandas version 0.14.1.
What's the right way to do this?
You are bitten by the case (in)sensitivity issues with PostgreSQL. If you quote the table name in the query, it will work:
df = pd.read_sql_query('select * from "Stat_Table"',con=engine)
But personally, I would advise always using lower-case table names (and column names), also when writing the table to the database, to prevent such issues (a short sketch of that follows the explanation below).
From the PostgreSQL docs (http://www.postgresql.org/docs/8.0/static/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS):
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case
To explain a bit more: you have written a table with the name Stat_Table to the database (and sqlalchemy will quote this name, so it will be written as "Stat_Table" in the postgres database). When doing the query 'select * from Stat_Table' the unquoted table name will be converted to lower case stat_table, and so you get the message that this table is not found.
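A short sketch of that advice, reusing the i dataframe and engine from the question: lower-case the column names and the table name before writing, and the unquoted query then just works.
# keep everything lower case so unquoted identifiers match
i.columns = [c.lower() for c in i.columns]
i.to_sql('stat_table', engine, if_exists='replace')

df = pd.read_sql_query('select * from stat_table', con=engine)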
See eg also Are PostgreSQL column names case-sensitive?
Read PostgreSQL data into pandas as shown below (screenshot at the image link):
import psycopg2 as pg
import pandas.io.sql as psql
connection = pg.connect("host=localhost dbname=kinder user=your_username password=your_password")
dataframe = psql.read_sql('SELECT * FROM product_product', connection)
product_category = psql.read_sql_query('select * from product_category', connection)
https://i.stack.imgur.com/1bege.png
Late to the party here, but to give you a full example of this:
import pandas as pd
import psycopg2 as pg
engine = pg.connect("dbname='my_db_name' user='pguser' host='127.0.0.1' port='15432' password='pgpassword'")
df = pd.read_sql('select * from Stat_Table', con=engine)
You need to run the following to install the dependencies on Ubuntu:
pip install pandas psycopg2-binary SQLAlchemy
Pandas docs on the subject here
The error message is telling you that a table named:
stat_table
does not exist (a relation is a table in postgres speak). So, of course, you can't select rows from it. Check your db after executing:
i.to_sql('Stat_Table',engine,if_exists='replace')
and see if a table by that name got created in your db.
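One quick way to do that check from Python (a sketch, reusing the engine from the question):
import pandas as pd

# list the tables Postgres actually knows about, to see how the name was folded/quoted
tables = pd.read_sql_query(
    "select table_schema, table_name from information_schema.tables "
    "where table_schema not in ('pg_catalog', 'information_schema')",
    con=engine,
)
print(tables)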
When I use your read statement:
df = pd.read_sql_query('select * from Stat_Table',con=engine)
I get the data back from a postgres db, so there's nothing wrong with it.
import sqlalchemy
import psycopg2
engine = sqlalchemy.create_engine('postgresql://user@localhost:5432/mydb')
You must specify the schema and the table:
df = pd.read_sql_query("""select * from "dvd-rental".film""", con=engine)