I have a Python script which retrieves data through an API. The data returned is a dictionary. I want to save the data in an sqlite3 database. There are two main columns ('scan', 'tests'), and I'm only interested in the data inside these two columns, e.g. 'grade': 'D+', 'likelihood_indicator': 'MEDIUM'.
Any help is appreciated.
import pandas as pd
from httpobs.scanner.local import scan
import sqlite3
website_to_scan = 'digitalnz.org'
scan_site = scan(website_to_scan)
df = pd.DataFrame(scan_site)
print(scan_site)
print(df)
Results of print(scan_site):
Results of print(df) attached:
This depends on how you have set up your table in sqlite, but essentially you would write an INSERT INTO SQL statement and pass your SQL string as an argument to the connection's execute() function in Python.
It's difficult to give a more precise answer to your question (i.e. code) because you haven't shown the connection variable. Let's imagine you already have your sqlite DB set up with the connection:
connection_variable.execute("""INSERT INTO table_name
(column_name1, column_name2) VALUES (value1, value2);""")
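For the data in the question, a minimal sketch might look like this (it assumes 'grade' and 'likelihood_indicator' sit under the 'scan' key of the returned dictionary; adjust the key names and table schema to your actual output):
import sqlite3

# minimal sketch -- the key names are assumptions based on the question
conn = sqlite3.connect('scans.db')
conn.execute("""CREATE TABLE IF NOT EXISTS scan_results
                (site TEXT, grade TEXT, likelihood_indicator TEXT)""")
scan_data = scan_site.get('scan', {})
conn.execute("INSERT INTO scan_results (site, grade, likelihood_indicator) VALUES (?, ?, ?)",
             (website_to_scan, scan_data.get('grade'), scan_data.get('likelihood_indicator')))
conn.commit()
conn.close()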
I am having trouble when trying to insert data from a df into an Oracle database table; this is the error: DatabaseError: ORA-01036: illegal variable name/number
These are the steps I took:
This is the dataframe I imported from the yfinance package and processed so that the data types of my df are respected:
I transformed my df into a list; this is the data in the list:
This is the table where I want to insert my data:
This is the code:
sql_insert_temp = "INSERT INTO TEMPO('GIORNO','MESE','ANNO') VALUES(:2,:3,:4)"
index = 0
for i in df.iterrows():
cursor.execute(sql_insert_temp,df_list[index])
index += 1
connection.commit()
I have tried a single insert in the SQL Developer worksheet, using the data you can see in the list, and it worked, so I guess I have made some mistake in the code. I have seen other discussions, but I couldn't find any solution to my problem. Do you have any idea how I can solve this, or is it possible to do this in another way?
I have tried to print the iterated queries, and that's the result; that's why it's not inserting my data:
If you already have a pandas DataFrame, then you should be able to use the to_sql() method provided by the pandas library.
import cx_Oracle
import sqlalchemy
import pandas as pd
DATABASE = 'DB'
SCHEMA = 'DEV'
PASSWORD = 'password'
connection_string = f'oracle://{SCHEMA}:{PASSWORD}@{DATABASE}'
db_connection = sqlalchemy.create_engine(connection_string)
df_to_insert = df[['GIORNO', 'MESE', 'ANNO']]  # a dataframe with only the columns you want to insert
df_to_insert.to_sql(name='TEMPO', con=db_connection, if_exists='append')
name is the name of the table
con is the connection object
if_exists='append' will add the rows to the end of the table. The other options are 'fail' (raise an error if the table already exists) and 'replace' (drop and re-create the table).
Other parameters can be found in the pandas documentation for pandas.DataFrame.to_sql().
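As an illustrative sketch of those options (the chunksize value here is arbitrary):
# index=False skips writing the DataFrame index as an extra column;
# chunksize batches the INSERTs instead of sending every row in one go
df_to_insert.to_sql(name='TEMPO', con=db_connection, if_exists='append',
                    index=False, chunksize=1000)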
I have a data frame in a Jupyter notebook. My objective is to import this df into Snowflake as a new table.
Is there any way to write a new table into Snowflake directly without defining the table columns' names and types?
I am using:
import snowflake.connector as snow
from snowflake.connector.pandas_tools import write_pandas
from sqlalchemy import create_engine
import pandas as pd
connection = snow.connect(
    user='XXX',
    password='XXX',
    account='XXX',
    warehouse='COMPUTE_WH',
    database='SNOWPLOW',
    schema='DBT_WN'
)
df.to_sql('aaa', connection, index = False)
It ran into an error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
Can anyone provide the sample code to fix this issue?
Here's one way to do it; apologies in advance for the code formatting (SO combined with Python's spaces-vs-tabs "model"), so check the tabs/spaces if you cut-and-paste.
Because of the Snowflake security model, be sure to specify the ROLE you are using in your connection parameters as well (often the default role is 'PUBLIC').
Since you already have SQLAlchemy in the mix, this idea doesn't use Snowflake's write_pandas, so it isn't a good answer for large dataframes. There are some odd behaviors with SQLAlchemy and Snowflake: make sure the dataframe column names are upper case, yet use a lowercase table name in the argument to to_sql().
def df2sf_alch(target_df, target_table):
    # create a sqlAlchemy connection object
    engine = create_engine(f"snowflake://{your_sf_account_url}",
                           creator=lambda: connection)
    # re/create table in Snowflake
    try:
        # sqlAlchemy creates the table based on a lower-case table name
        # and it works to have uppercase df column names
        target_df.to_sql(target_table.lower(), con=engine, if_exists='replace', index=False)
        print(f"Table {target_table.upper()} re/created")
    except Exception as e:
        print(f"Could not replace table {target_table.upper()}: {e}")
    nrows = connection.cursor().execute(f"select count(*) from {target_table}").fetchone()[0]
    print(f"Table {target_table.upper()} rows = {nrows}")
Note this function needs to be changed to reflect the appropriate 'snowflake account url' in order to create the sqlAlchemy connection object. Also, assuming the case naming oddities are taken care of in the df, along with your already defined connection, you'd call this function simply passing the df and the name of the table, like df2sf_alch(my_df, 'MY_TABLE')
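If the dataframe is large, a sketch using the write_pandas helper mentioned above might look like this (it assumes the target table, here called 'MY_TABLE' as a placeholder, already exists with matching upper-case column names):
from snowflake.connector.pandas_tools import write_pandas

# write_pandas bulk-loads the dataframe through a temporary stage;
# 'MY_TABLE' is a placeholder table name
success, num_chunks, num_rows, _ = write_pandas(connection, df, 'MY_TABLE')
print(f"Loaded {num_rows} rows in {num_chunks} chunk(s), success={success}")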
I'm trying to make a specific insert statement that has an ON CONFLICT argument (I'm uploading to a Postgres database); will the df.to_sql(method='callable') allow that? Or is it intended for another purpose? I've read through the documentation, but I wasn't able to grasp the concept. I looked around on this website and others for similar questions, but I haven't found one yet. If possible I would love to see an example of how to use the 'callable' method in practice. Any other ideas on how to effectively load large numbers of rows from pandas using ON CONFLICT logic would be much appreciated as well. Thanks in advance for the help!
Here's an example of how to use Postgres's ON CONFLICT DO NOTHING with to_sql:
# import postgres specific insert
from sqlalchemy.dialects.postgresql import insert

def to_sql_on_conflict_do_nothing(pd_table, conn, keys, data_iter):
    # This is very similar to the default to_sql function in pandas
    # Only the conn.execute line is changed
    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(insert(pd_table.table).on_conflict_do_nothing(), data)

conn = engine.connect()
df.to_sql("some_table", conn, if_exists="append", index=False, method=to_sql_on_conflict_do_nothing)
I have just had a similar problem, and following this answer I came up with a solution for how to send a df to PostgreSQL with ON CONFLICT:
1. Send some initial data to the database to create the table
from sqlalchemy import create_engine
engine = create_engine(connection_string)
df.to_sql(table_name,engine)
2. Add a primary key:
ALTER TABLE table_name ADD COLUMN id SERIAL PRIMARY KEY;
3. Create a unique index on the column (or columns) whose uniqueness you want to check:
CREATE UNIQUE INDEX review_id ON test(review_id);
4. Map the SQL table with sqlalchemy:
from sqlalchemy.ext.automap import automap_base
ABase = automap_base()
ABase.prepare(engine, reflect=True)  # reflect the existing table so it appears in ABase.classes
Table = ABase.classes.table_name
Table.__tablename__ = 'table_name'
5. Do your insert on conflict with:
from sqlalchemy.dialects.postgresql import insert
insrt_vals = df.to_dict(orient='records')
insrt_stmnt = insert(Table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['review_id'])
results = engine.execute(do_nothing_stmt)
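If you want conflicting rows to be updated rather than skipped, the same dialect also provides on_conflict_do_update; a hedged sketch (the 'review_text' column is just a placeholder for whatever columns you want refreshed):
from sqlalchemy.dialects.postgresql import insert

insrt_stmnt = insert(Table).values(insrt_vals)
# update the conflicting rows with the incoming ("excluded") values
do_update_stmt = insrt_stmnt.on_conflict_do_update(
    index_elements=['review_id'],
    set_={'review_text': insrt_stmnt.excluded.review_text})
results = engine.execute(do_update_stmt)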
I am fairly new to the world of programming. I'm using Python, Pandas and SQLite, and recently I've started to build PostgreSQL databases. I am trying to query a Postgres database and create a Pandas dataframe with the results. I've found that the following works:
import pandas as pd
from sqlalchemy import create_engine # database connection
engine = create_engine('postgresql://postgres:xxxxx@localhost:xxxx/my_postgres_db')
df = pd.read_sql("SELECT * FROM my_table Where province='Saskatchewan'", engine)
This works perfectly, but my problem is how to pass user input to the SQL query. Specifically, I want to do the following:
province_name = 'Saskatchewan' #user input
df = pd.read_sql("SELECT * FROM my_table Where province=province_name", engine)
However, this returns an error message:
ProgrammingError: (psycopg2.ProgrammingError) column "province_selected" does not exist
LINE 1: SELECT * FROM my_table Where province =province_selec...
Can anyone provide guidance on this matter? In addition, can anyone advise me as to how to handle field names in a postgres database that have characters such as '/'. My database has a field (column header) called CD/CSD and when I try to run a query on that field (similar to code above) I just get error messages. Any help would be greatly appreciated.
You should use the functionality provided by the DBAPI module that SQLAlchemy uses to send parameters to the query. Using psycopg2 that could look like this:
province_name = 'Saskatchewan' #user input
df = pd.read_sql("SELECT * FROM my_table Where province=%s", engine, params=(province_name,))
This is safer than using Python's string formatting to insert the parameter into the query.
Passing parameters using psycopg2
pandas.read_sql documentation
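Regarding the field names with characters such as '/': in Postgres you can quote the identifier with double quotes, so a query against the CD/CSD column might look like this sketch (table name as in the question):
# double quotes preserve the exact column name, including the '/'
df = pd.read_sql('SELECT "CD/CSD", province FROM my_table WHERE province = %s',
                 engine, params=(province_name,))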
I have been looking for answers to my questions, but haven't found a definitive answer. I am new to Python, MySQL, and data science, so any advice is appreciated.
What I want to be able to do is:
use python to pull daily close data from quandl for n securities
store the data in a database
retrieve, clean, and normalize the data
run regressions on different pairs
write the results to a csv file
The pseudocode below shows in a nutshell what I want to be able to do.
The questions I have are:
How do I store the quandl data in MySQL?
How do I retrieve that data from MySQL? Do I store it into lists and use statsmodels?
tickers = ['AAPL', 'FB', 'GOOG', 'YHOO', 'XRAY', 'CSCO']
qCodes = ['WIKI/' + x for x in tickers]
for i in range(0, len(qCodes)):
    # ADD TO MYSQL DB -> Quandl.get(qCodes[i], collapse='daily', start_date=start, end_date=end)
for x in range(0, len(qCodes)-1):
    for y in range(x+1, len(qCodes)):
        # GET FROM MYSQL DB -> x, y
        # clean(x, y)
        # normalize(x, y)
        # write to csv file -> regression(x, y)
There is a nice Python library called MySQLdb, which helps you interact with MySQL databases. For the following to execute successfully, you have to have both your Python shell and the MySQL server up and running.
How do I store the quandl data in MySQL?
import MySQLdb

# Setting up connection
db = MySQLdb.connect("localhost", user_name, password, db_name)
cursor = db.cursor()

# Inserting records into the employee table
sql = """INSERT INTO EMPLOYEE(FIRST_NAME, LAST_NAME, AGE, SEX, INCOME)
         VALUES('Steven', 'Karpinski', '50', 'M', '43290')"""
try:
    cursor.execute(sql)
    db.commit()
except:
    db.rollback()
db.close()
I did it with hard-coded values. So, for the Quandl data, create the schema in a similar way and store the rows by executing a loop, as sketched below.
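A rough sketch of that loop, with a made-up PRICES(TICKER, TRADE_DATE, CLOSE) schema and a quandl_df dataframe standing in for one Quandl.get() result (adapt the names to your own schema):
insert_sql = "INSERT INTO PRICES (TICKER, TRADE_DATE, CLOSE) VALUES (%s, %s, %s)"
try:
    # quandl_df is assumed to be indexed by date with a 'Close' column
    for trade_date, row in quandl_df.iterrows():
        cursor.execute(insert_sql, (ticker, trade_date.date(), float(row['Close'])))
    db.commit()
except:
    db.rollback()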
How do I retrieve that data from MySQL? Do I store it into lists and use statsmodels?
For data retrieval, you execute the following command, similar to the above command.
sql2 = """SELECT * FROM EMPLOYEE;
"""
try:
cursor.execute(sql2)
db.commit()
except:
db.rollback()
result = cursor.fetchall()
The result variable now contains the result of the query in sql2, in the form of a tuple of tuples.
So, now you can convert those tuples into a data structure of your choice.
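For example, a short sketch turning those tuples into a pandas DataFrame, taking the column names from the cursor metadata:
import pandas as pd

# cursor.description holds one entry per column; the first field is the column name
columns = [col[0] for col in cursor.description]
df = pd.DataFrame(result, columns=columns)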
Quandl has a python package that makes interacting with the site trivial.
From Quandl's python page:
import Quandl
mydata = Quandl.get("WIKI/AAPL")
By default, Quandl's package returns a pandas dataframe. You can use Pandas to manipulate/clean/normalize your data as you see fit and use Pandas to upload the data directly to a SQL database:
import sqlalchemy as sql
engine = sql.create_engine('mysql://name:blah@location/testdb')
mydata.to_sql('db_table_name', engine, if_exists='append')
To get the data back from your database, you can also use Pandas:
import pandas as pd
import sqlalchemy as sql
engine = sql.create_engine('mysql://name:blah@location/testdb')
query = sql.text('''select * from quandltable''')
mydata = pd.read_sql_query(query, engine)
After using statsmodels to run your analyses, you can use either pandas' df.to_csv() method or numpy's savetxt() function. (Sorry, I cannot post the links for those functions; I don't have enough reputation yet!)
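To close the loop on the pseudocode above, here is a hedged sketch of the regression-and-export step; the 'close_x' and 'close_y' column names and the output file name are placeholders:
import pandas as pd
import statsmodels.api as sm

# ordinary least squares of one ticker's closes on another's
x = sm.add_constant(mydata['close_x'])
y = mydata['close_y']
results = sm.OLS(y, x).fit()
pd.DataFrame({'param': results.params, 'pvalue': results.pvalues}).to_csv('regression_results.csv')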