I am having trouble trying to insert data from a DataFrame into an Oracle database table; this is the error: DatabaseError: ORA-01036: illegal variable name/number
These are the steps I did:
This is the dataframe I imported from the yfinance package and processed in order to respect the integrity of the data types of my df
I transformed my df into a list, these are my data in the list:
this is the table where I want to insert my data:
This is the code:
sql_insert_temp = "INSERT INTO TEMPO('GIORNO','MESE','ANNO') VALUES(:2,:3,:4)"
index = 0
for i in df.iterrows():
    cursor.execute(sql_insert_temp, df_list[index])
    index += 1
connection.commit()
I have tried a single insert in the SQL Developer worksheet, using the data you can see in the list, and it worked, so I guess I have made some mistake in the code. I have seen other discussions, but I couldn't find any solution to my problem. Do you have any idea how I can solve this, or is it maybe possible to do this in another way?
I have tried to print the iterated queries and that's the result, that's why it's not inserting my data:
If you already have a pandas DataFrame, then you should be able to use the to_sql() method provided by the pandas library.
import cx_Oracle
import sqlalchemy
import pandas as pd
DATABASE = 'DB'
SCHEMA = 'DEV'
PASSWORD = 'password'
connection_string = f'oracle://{SCHEMA}:{PASSWORD}@{DATABASE}'
db_conn = sqlalchemy.create_engine(connection_string)
df_to_insert = df[['GIORNO', 'MESE', 'ANNO']] #creates a dataframe with only the columns you want to insert
df_to_insert.to_sql(name='TEMPO', con=db_conn, if_exists='append')
name is the name of the table
con is the connection object
if_exists='append' will add the rows to the end of the table. The other options are 'fail' (raise an error if the table already exists) and 'replace' (drop and re-create the table).
Other parameters can be found in the pandas documentation for DataFrame.to_sql().
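If you would rather stay with cx_Oracle directly, here is a minimal sketch of a corrected insert, assuming df_list is a list of (giorno, mese, anno) tuples: the column identifiers are not wrapped in single quotes (which would make them string literals) and the positional binds start at :1.
sql_insert_temp = "INSERT INTO TEMPO (GIORNO, MESE, ANNO) VALUES (:1, :2, :3)"
# executemany binds one tuple per row in a single round trip
cursor.executemany(sql_insert_temp, df_list)
connection.commit()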
Related
I have a table in an sqlite3 database to which I am trying to add a Python list of values as a new column. I can only find how to add a new column without values or how to change specific rows; could somebody help me with this?
This is probably the sort of thing you can google.
I can't find any way to add data to a column on creation, but you can add a default value (ALTER TABLE table_name ADD COLUMN column_name NOT NULL DEFAULT default_value) if that helps at all. Then afterwards you are going to have to add the data separately. There are a few places to find how to do that. These questions might be relevant (a short sketch follows after them):
Populate Sqlite3 column with data from Python list using for loop
Adding column in SQLite3, then filling it
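If it helps, here is a minimal sqlite3 sketch of that two-step approach; the table, column, and list are placeholders, and it assumes the rowids run contiguously from 1 to N (which may not hold if rows were deleted):
import sqlite3

my_list = [10, 20, 30]  # hypothetical values, one per existing row

conn = sqlite3.connect("my_data.db")
cur = conn.cursor()
# add the empty column first
cur.execute("ALTER TABLE my_table ADD COLUMN new_column INTEGER")
# then fill it row by row, matching on SQLite's implicit rowid
for rowid, value in enumerate(my_list, start=1):
    cur.execute("UPDATE my_table SET new_column = ? WHERE rowid = ?", (value, rowid))
conn.commit()
conn.close()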
You can read the database into a pandas dataframe, add the list as a column to that dataframe, then write the dataframe back, replacing the original table:
import sqlite3
import pandas as pd
conn = sqlite3.connect("my_data.db")
df = pd.read_sql_query("SELECT * FROM my_table", conn)
conn.close()
df['new_column'] = my_list
conn = sqlite3.connect("my_data.db")
df.to_sql(name='my_table', if_exists='replace', con=conn)
conn.close()
So I have a large (7GB) dataset stored in postgres that I'm trying to import into Dask. I'm trying the read_sql_table function, but keep getting ArgumentErrors.
My info in postgres is the following:
database is "my_database"
schema is "public"
data table is "table"
username is "fred"
password is "my_pass"
index in postgres is 'idx'
I am trying to get this piece of code to work:
df = dd.read_sql_table('public.table', 'jdbc:postgresql://localhost/my_database?user=fred&password=my_pass', index_col='idx')
Am I formatting something incorrectly?
I was finally able to figure it out by using psycopg2. The answer is below:
df = dd.read_sql_table('table', 'postgresql+psycopg2://postgres:fred@localhost/my_database', index_col = 'idx')
Additionally, I had to create a different index in the postgres table. The original index needed to be a whole separate column. I did this with the following line in Postgres:
alter table table add idx serial;
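Putting the pieces together, a rough sketch of the working call with the question's connection details (fred / my_pass) written as a SQLAlchemy-style URI rather than a JDBC string:
import dask.dataframe as dd

uri = 'postgresql+psycopg2://fred:my_pass@localhost/my_database'
# index_col must point at the new serial column created above
df = dd.read_sql_table('table', uri, index_col='idx', schema='public')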
I have a data frame in jupyter notebook. My objective is to import this df into snowflake as a new table.
Is there any way to write a new table into snowflake directly without defining any table columns' names and types?
I am using
import snowflake.connector as snow
from snowflake.connector.pandas_tools import write_pandas
from sqlalchemy import create_engine
import pandas as pd
connection = snow.connect(
    user='XXX',
    password='XXX',
    account='XXX',
    warehouse='COMPUTE_WH',
    database='SNOWPLOW',
    schema='DBT_WN'
)
df.to_sql('aaa', connection, index = False)
it ran into an error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
Can anyone provide the sample code to fix this issue?
Here's one way to do it -- apologies in advance for my code formatting in SO combined with python's spaces vs tabs "model". Check the tabs/spaces if you cut-n-paste ...
Because of the Snowsql security model, in your connection parameters be sure to specify the ROLE you are using as well. (Often the default role is 'PUBLIC')
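For example, a sketch of the connection itself; the role name below is an assumption, so use whatever role your account actually grants:
connection = snow.connect(
    user='XXX',
    password='XXX',
    account='XXX',
    role='PUBLIC',  # assumed role; often the default
    warehouse='COMPUTE_WH',
    database='SNOWPLOW',
    schema='DBT_WN'
)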
Since you already have sqlAlchemy in the mix ... this idea doesn't use the snowflake write_pandas, so it isn't a good answer for large dataframes ... Some odd behaviors with sqlAlchemy and Snowflake; make sure the dataframe column names are upper case; yet use a lowercase table name in the argument to to_sql() ...
def df2sf_alch(target_df, target_table):
    # create a sqlAlchemy connection object
    engine = create_engine(f"snowflake://{your_sf_account_url}",
                           creator=lambda: connection)
    # re/create table in Snowflake
    try:
        # sqlAlchemy creates the table based on a lower-case table name
        # and it works to have uppercase df column names
        target_df.to_sql(target_table.lower(), con=engine, if_exists='replace', index=False)
        print(f"Table {target_table.upper()} re/created")
    except Exception as e:
        print(f"Could not replace table {target_table.upper()}: {e}")
    nrows = connection.cursor().execute(f"select count(*) from {target_table}").fetchone()[0]
    print(f"Table {target_table.upper()} rows = {nrows}")
Note this function needs to be changed to reflect the appropriate 'snowflake account url' in order to create the sqlAlchemy connection object. Also, assuming the case naming oddities are taken care of in the df, along with your already defined connection, you'd call this function simply passing the df and the name of the table, like df2sf_alch(my_df, 'MY_TABLE')
Is there any way to do an SQL update-where from a dataframe without iterating through each line? I have a postgresql database and to update a table in the db from a dataframe I would use psycopg2 and do something like:
con = psycopg2.connect(database='mydb', user='abc', password='xyz')
cur = con.cursor()
for index, row in df.iterrows():
    sql = 'update table set column = %s where column = %s'
    cur.execute(sql, (row['whatver'], row['something']))
con.commit()
But on the other hand, if I'm either reading a table from sql or writing an entire dataframe to sql (with no update-where), then I would just use pandas and sqlalchemy. Something like:
engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('table', engine, if_exists='append')
It's great just having a 'one-liner' using to_sql. Isn't there something similar to do an update-where from pandas to postgresql? Or is the only way to do it by iterating through each row, like I've done above? Isn't iterating through each row an inefficient way to do it?
Consider a temp table which would be an exact replica of your final table, cleaned out with each run:
engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('temp_table', engine, if_exists='replace')
sql = """
UPDATE final_table AS f
SET col1 = t.col1
FROM temp_table AS t
WHERE f.id = t.id
"""
with engine.begin() as conn:  # TRANSACTION
    conn.execute(sql)
It looks like you are using some external data stored in df for the conditions on updating your database table. If it is possible, why not just do a one-line SQL update?
If you are working with a smallish database (where loading the whole data to the python dataframe object isn't going to kill you) then you can definitely conditionally update the dataframe after loading it using read_sql. Then you can use a keyword arg if_exists="replace" to replace the DB table with the new updated table.
df = pandas.read_sql("select * from your_table;", engine)
#update information (update your_table set column = "new value" where column = "old value")
#still may need to iterate for many old value/new value pairs
df.loc[df['column'] == "old value", "column"] = "new value"
#send data back to sql
df.to_sql("your_table", engine, if_exists="replace")
Pandas is a powerful tool, where limited SQL support was just a small feature at first. As time goes by people are trying to use pandas as their only database interface software. I don't think pandas was ever meant to be an end-all for database interaction, but there are a lot of people working on new features all the time. See: https://github.com/pandas-dev/pandas/issues
I have so far not seen a case where the pandas sql connector can be used in any scalable way to update database data. It may have seemed like a good idea to build one, but really, for operational work it just does not scale.
What I would recommend is to dump your entire dataframe as CSV using
df.to_csv('filename.csv', encoding='utf-8')
Then loading the CSV into the database using COPY for PostgreSQL or LOAD DATA INFILE for MySQL.
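For PostgreSQL, a minimal sketch of that load with psycopg2; the table name, file name, and credentials are placeholders, and it assumes the CSV columns line up with the table (e.g. the dataframe was written with index=False):
import psycopg2

conn = psycopg2.connect(database='mydb', user='abc', password='xyz')
cur = conn.cursor()
with open('filename.csv', 'r', encoding='utf-8') as f:
    # copy_expert streams the file to the server over STDIN; HEADER skips the header row written by to_csv
    cur.copy_expert("COPY your_table FROM STDIN WITH (FORMAT csv, HEADER true)", f)
conn.commit()
conn.close()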
If you do not make other changes to the table in question while the data is being manipulated by pandas, you can just load into the table.
If there are concurrency issues, you will have to load the data into a staging table that you then use to update your primary table from.
In the latter case, your primary table needs to have a datetime column which tells you when the latest modification to it was made, so you can determine whether your pandas changes are the latest or whether the database changes should remain.
I was wondering: why don't you update the df first based on your equation and then store the df back to the database? You could use if_exists='replace' to store it in the same table.
In case the column names have not changed, I prefer removing all rows and then appending the data to the now-empty table. Otherwise, dependent views would have to be regenerated as well:
from sqlalchemy import create_engine
from sqlalchemy import MetaData
engine = create_engine(f'postgresql://postgres:{pw}@localhost:5432/table')
# Get main table and delete all rows
# without deleting the table
meta = MetaData(engine)
meta.reflect(engine)
table = meta.tables['table']
del_st = table.delete()
conn = engine.connect()
res = conn.execute(del_st)
# Insert new data
df.to_sql('table', engine, if_exists='append', index=False)
I tried the first answer and found it did not work so well in every situation, so I changed some parts to cover all cases, still using pandas + sqlalchemy to do the update.
def update_to_sql(self, table_name, key_name):
    a = []
    self.table = table_name
    self.primary_key = key_name
    for col in df.columns:
        if col == self.primary_key:
            continue
        a.append("f.{col}=t.{col}".format(col=col))
    df.to_sql('temporary_table', self.sql_engine, if_exists='replace', index=False)
    update_stmt_1 = "UPDATE {final_table} AS f".format(final_table=self.table)
    update_stmt_2 = " INNER JOIN (SELECT * FROM temporary_table) AS t ON t.{primary_key}=f.{primary_key} ".format(primary_key=self.primary_key)
    update_stmt_3 = "SET "
    update_stmt_4 = ", ".join(a)
    update_stmt_5 = update_stmt_1 + update_stmt_2 + update_stmt_3 + update_stmt_4 + ";"
    print(update_stmt_5)
    with self.sql_engine.begin() as cnx:
        cnx.execute(update_stmt_5)
Here is an approach that I found to be somewhat clean. This utilizes sqlalchemy. It only updates one column at a time but can easily be generalized.
from sqlalchemy import MetaData, Table
from sqlalchemy.orm import sessionmaker

def dataframe_update(df, table, engine, primary_key, column):
    md = MetaData(engine)
    table = Table(table, md, autoload=True)
    session = sessionmaker(bind=engine)()
    for _, row in df.iterrows():
        session.query(table).filter(table.columns[primary_key] == row[primary_key]).update({column: row[column]})
    session.commit()
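For example, a call might look like this (the table, key, and column names are purely illustrative):
dataframe_update(df, 'my_table', engine, 'id', 'price')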
I'm using the PyTd teradata module to query data from Teradata and want to read it into a Pandas DataFrame
import teradata
import pandas as pd
# teradata connection
udaExec = teradata.UdaExec(appName="Example", version="1.0",
logConsole=False)
session = udaExec.connect(method="odbc", system="", username="", password="")
# Create empty dataframe with column names
query = session.execute("SELECT TOP 1 * FROM table")
cols = [str(d[0]) for d in query.description]
df = pd.DataFrame(columns=cols)
# Read data into dataframe
for row in session.execute("SELECT * FROM table"):
    print type(row)
    df.append(row)
row is of teradata.util.Row class and can't be appended to the dataframe. I tried converting it to a list but the format gets messed up.
How can I read my data into a dataframe from Teradata using the teradata module? I'm not able to use the pyodbc module for this.
Is there a better way to create the empty dataframe with column names matching those in the database?
You can use pandas.read_sql :)
import teradata
import pandas as pd
# teradata connection
udaExec = teradata.UdaExec(appName="Example", version="1.0",
logConsole=False)
with udaExec.connect(method="odbc", system="", username="", password="") as session:
    query = "SELECT * FROM table"
    df = pd.read_sql(query, session)
Using ‘with’ will ensure the session is closed after the query. I hope that helped :)
I know it's a little late, but I'm putting a note here nevertheless.
There are a few questions here.
How can I read my data into a dataframe from Teradata using the teradata module?
At the end of the day, a teradata.util.Row is simply a list. So a simple list operation should help you get things out of Row.
','.join(str(item) for item in row)
kinda thing.
Pushing that into a pandas dataframe should be a list to df conversion exercise.
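A rough sketch of that, assuming the object returned by session.execute exposes description the way your first snippet already uses it:
rows = session.execute("SELECT * FROM table")
cols = [str(d[0]) for d in rows.description]
# each Row behaves like a list, so it can be materialised directly
df = pd.DataFrame([list(r) for r in rows], columns=cols)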
I'm not able to use the pyodbc module for this.
I used teradata's python module to do an LDAP auth. All worked fine. Didn't have this requirement. Sorry.
Is there a better way to create the empty dataframe with column names matching those in the database?
I assume, given a table name, you can query to figure out its schema (column names), convert that to a list, and create your pandas df?
I know this is very late.
You can use read_sql() from pandas module. It returns pandas dataframe.
Here is the reference:
http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.read_sql.html