I have a script that downloads data from an API and turns all of this info into a CSV. I need this data to be in a table in MySQL (I already created the table and established the connection with MySQL Connector). Is there any way to do this?
Pandas.DataFrame has a method to_sql which writes a dataframe into a SQL table.
The simplest way to use the method is to create a connection with SQLAlchemy (you will need to install a MySQL driver such as mysql-python) and use .to_sql to write the data into the table.
from sqlalchemy import create_engine
engine = create_engine('mysql://username:password@host:port/database')  # change this to connect to your MySQL
#if you want to append the data to an existing table
df.to_sql(name='SQL Table name', con=engine, if_exists='append', index=False)
#if you want to create a new table
df.to_sql(name='New SQL Table name', con=engine, if_exists='fail', index=False)
Please note that you may need to use the dtype parameter to define the dtypes of the table columns if you have created the table beforehand.
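For example, a minimal sketch of passing dtype (the table name, column names, and CSV path below are placeholders, not from your setup):
import pandas as pd
from sqlalchemy import create_engine, types

engine = create_engine('mysql://username:password@host:port/database')  # adjust to your connection details
df = pd.read_csv('api_output.csv')  # placeholder: the CSV produced by your script

# map DataFrame columns to explicit SQL types so they match the table you created
df.to_sql(
    name='my_table',  # placeholder table name
    con=engine,
    if_exists='append',
    index=False,
    dtype={'id': types.Integer(), 'name': types.String(length=100)}
)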
I have a dataframe that contains some columns, and a Snowflake table that has some columns. Some columns are the same and some are different between them. As of now, I am extracting the Snowflake table into my Python code, concatenating both, and replacing the table again. But the table has a huge amount of data, so this is very slow. Is it possible to append the dataframe directly to the Snowflake table when some columns are different and some are the same? If yes, please tell me how I can do this. No solution is working for me. How can I do it effectively, with less time?
Yes, it's possible to append the data to an existing table in Snowflake.
Set up your connection.
You can use SQLAlchemy to create an engine; then you can push the df to Snowflake using:
from snowflake.connector.pandas_tools import pd_writer
df.to_sql('<snowflaketablename>', engine, index=False, method=pd_writer, if_exists='append')
Remember to pass the option if_exists='append' to append the dataframe to the existing table.
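For completeness, a rough sketch of how the engine can be created with the snowflake-sqlalchemy package, and of keeping only the columns that exist in the target table before appending (the account details and column names below are placeholders):
from sqlalchemy import create_engine
from snowflake.connector.pandas_tools import pd_writer

# snowflake-sqlalchemy connection URL; replace every placeholder with your own values
engine = create_engine(
    'snowflake://<user>:<password>@<account_identifier>/<database>/<schema>?warehouse=<warehouse>'
)

# if the dataframe has extra columns, keep only the ones present in the target table
df = df[['COL_A', 'COL_B']]  # placeholder column names

df.to_sql('<snowflaketablename>', engine, index=False, method=pd_writer, if_exists='append')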
Hello, can someone help me? I must create a program in Python that connects to my database, reads the database records, and creates a pivot table with Pandas from that data.
You might be looking for pandas.read_sql
You can use the con parameter to pass your database connection string.
Here's a link on how to connect SQL to Python. Depending on the database you use, the library will vary; search for your particular database-to-Python connection.
https://www.geeksforgeeks.org/how-to-connect-python-with-sql-database/
This will give you a 5-minute crash course on Pandas and how to use pivot tables.
https://medium.com/bhavaniravi/python-pandas-tutorial-92018da85a33
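As a rough sketch of the whole flow (the connection string, table name, and column names here are made-up placeholders):
import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string; adjust the driver and credentials for your database
engine = create_engine('mysql://username:password@host:port/database')

# read the records into a DataFrame
df = pd.read_sql('SELECT * FROM sales', con=engine)  # 'sales' is a placeholder table

# build a pivot table; 'region', 'month' and 'amount' are placeholder columns
pivot = pd.pivot_table(df, index='region', columns='month', values='amount', aggfunc='sum')
print(pivot)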
I wonder if there's any way to upload a dataframe and create a new table in Exasol? import_from_pandas assumes the table already exists. Do we need to run a separate SQL statement to create the table? For other databases, to_sql can just create the table if it doesn't exist.
Yes, as you mentioned, import_from_pandas requires a table, so you need to create the table before writing to it. You can run a CREATE TABLE ... script via connection.execute before using import_from_pandas. Also, to_sql needs a table, since based on the documentation it will be translated to a SQL insert command.
Pandas to_sql can create a new table if it does not exist, but it needs a SQLAlchemy connection, which is not supported for Exasol out of the box. However, there seems to be a SQLAlchemy dialect for Exasol you could use (I haven't tried it yet): sqlalchemy-exasol.
Alternatively, I think you have to use a create table statement and then populate the table via pyexasol's import_from_pandas.
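A minimal sketch of that second approach with pyexasol (the DSN, credentials, table and column names are placeholders):
import pandas as pd
import pyexasol

# placeholder connection details
connection = pyexasol.connect(dsn='exasol-host:8563', user='sys', password='***', schema='MY_SCHEMA')

# create the table first, since import_from_pandas expects it to exist
connection.execute('CREATE TABLE IF NOT EXISTS MY_TABLE (ID DECIMAL(18,0), NAME VARCHAR(100))')

df = pd.DataFrame({'ID': [1, 2], 'NAME': ['a', 'b']})
connection.import_from_pandas(df, 'MY_TABLE')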
I want to delete a record in a Snowflake table based on a dataframe object.
Similarly, I want to perform an update on the Snowflake table on the basis of a "key" in the dataframe.
My research indicates that the Utils method can perform the DDL operation, but I am unable to find an example to refer to.
As you mentioned, you can use the runQuery() method of the Utils object to execute DDL/DML SQL statements:
https://docs.snowflake.net/manuals/user-guide/spark-connector-use.html#executing-ddl-dml-sql-statements
If you want to do it based on some keys, then you can iterate the items of the DataFrame and run an SQL statement for each item:
how to loop through each row of dataFrame in pyspark
But this will be a performance killer. Snowflake is a data warehouse, so you should always prefer batch updates over single-row updates.
I would suggest you write your dataframe to a staging table in Snowflake, and then call a SQL statement to update the rows in the target table based on the staging table.
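A rough sketch of that pattern with the Spark Snowflake connector (the sfOptions values, table and column names are placeholders, and the MERGE condition depends on your own key):
# placeholder connection options for the Spark Snowflake connector
sfOptions = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# 1. write the dataframe to a staging table
df.write \
  .format("net.snowflake.spark.snowflake") \
  .options(**sfOptions) \
  .option("dbtable", "STAGING_TABLE") \
  .mode("overwrite") \
  .save()

# 2. run one batch MERGE from the staging table into the target table
merge_sql = """
    MERGE INTO TARGET_TABLE t
    USING STAGING_TABLE s
    ON t.KEY = s.KEY
    WHEN MATCHED THEN UPDATE SET t.VALUE = s.VALUE
    WHEN NOT MATCHED THEN INSERT (KEY, VALUE) VALUES (s.KEY, s.VALUE)
"""
spark._jvm.net.snowflake.spark.snowflake.Utils.runQuery(sfOptions, merge_sql)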
I am using python pandas to load data from a MySQL database, change it, then update another table. There are 100,000+ rows, so the UPDATE queries take some time.
Is there a more efficient way to update the data in the database than to use df.iterrows() and run an UPDATE query for each row?
The problem here is not pandas; it is the UPDATE operations. Each row fires its own UPDATE query, meaning lots of overhead for the database connector to handle.
You are better off using the df.to_csv('filename.csv') method to dump your dataframe into a CSV file, then reading that CSV file into your MySQL database using LOAD DATA INFILE.
Load it into a new table, then DROP the old one and RENAME the new one to the old one's name.
Furthermore, I suggest you do the same when loading data into pandas: use the SELECT ... INTO OUTFILE MySQL command and then load that file into pandas using the pd.read_csv() method.
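A hedged sketch of that flow with mysql.connector (the file path, table names, and connection details are placeholders; LOAD DATA LOCAL INFILE must be enabled on both client and server):
import mysql.connector

df.to_csv('dump.csv', index=False, header=False)  # placeholder file name

conn = mysql.connector.connect(
    host='host', user='username', password='password',
    database='database', allow_local_infile=True,  # needed for LOAD DATA LOCAL INFILE
)
cur = conn.cursor()

# load the CSV into a fresh table, then swap it in place of the old one
cur.execute("CREATE TABLE my_table_new LIKE my_table")  # placeholder table names
cur.execute("LOAD DATA LOCAL INFILE 'dump.csv' INTO TABLE my_table_new "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
cur.execute("DROP TABLE my_table")
cur.execute("RENAME TABLE my_table_new TO my_table")
conn.commit()
cur.close()
conn.close()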