Error while saving a dataframe to Redshift using Python

I'm trying to copy a table from a Redshift database into a dataframe in Python and then save it back to Redshift.
The first step works, but I have problems with the second step: I get an error when I try to save a dataframe with 100 rows.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://database")
df = pd.read_sql_query('select * from testing.table1 limit 100', engine)
df.to_sql(name='table2', schema='testing', con=engine, index=False, if_exists='append')
And I'm getting this error:
DBAPIError: (pyodbc.Error) ('HY000', '[HY000] [Amazon][ODBC] (10920) No data can be obtained from input parameter whose value has already been pushed down.
It's strange, because when I save a dataframe with only 10 rows there is no error at all.
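One direction worth trying (a sketch, not a confirmed fix; the connection string and table names below simply mirror the question) is to let to_sql write in small batches, since a 10-row write succeeds while a 100-row write fails:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://database")
df = pd.read_sql_query('select * from testing.table1 limit 100', engine)
# Write in small batches so each INSERT pushes down fewer bound parameters;
# chunksize=10 mirrors the batch size that is known to work.
df.to_sql(name='table2', schema='testing', con=engine,
          index=False, if_exists='append', chunksize=10)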

Related

Invalid precision value (0) when using df.to_sql

I am attempting to save a pandas dataframe to a local Microsoft SQL Server instance; however, I keep getting an invalid precision value error when using the .to_sql function.
When I try the following code:
from sqlalchemy import create_engine
import urllib
import pyodbc
import pandas as pd
quoted = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=localhost;DATABASE=Database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted),echo=False)
df = pd.read_csv('file_name')
df.to_sql(name='WW_Table', con=engine,if_exists='replace',method='multi',chunksize=500,index=False)
I receive this error:
Error: ('HY104', '[HY104] [Microsoft][ODBC SQL Server Driver]Invalid precision value (0) (SQLBindParameter)')
I have tried multiple things, including creating the table from scratch and preloading the CSV file and replacing it, but with no luck.
Any help would be greatly appreciated.
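A commonly suggested direction for this particular error (a sketch, not a confirmed fix: it assumes "ODBC Driver 17 for SQL Server" is installed on the machine) is to move off the legacy "SQL Server" driver, which often trips over NaN/empty values when binding parameters, and to replace method='multi' with pyodbc's fast_executemany batching:
from sqlalchemy import create_engine
import urllib
import pandas as pd

# "ODBC Driver 17 for SQL Server" handles parameter binding more robustly
# than the legacy "SQL Server" driver.
quoted = urllib.parse.quote_plus("DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Database")
# fast_executemany batches the inserts at the pyodbc level,
# so method='multi' is no longer needed in to_sql.
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted), fast_executemany=True, echo=False)
df = pd.read_csv('file_name')
df.to_sql(name='WW_Table', con=engine, if_exists='replace', chunksize=500, index=False)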

Save a pandas on Spark API dataframe to a new table in Azure Databricks

Context: I have a dataframe that I queried using SQL. From this query, I saved the result to a dataframe using the pandas on Spark API. Now, after some transformations, I'd like to save this new dataframe to a new table in a given database.
Example:
from pyspark.sql import SparkSession
import pyspark.pandas as ps

spark = SparkSession.builder.appName('transformation').getOrCreate()
df_final = spark.sql("SELECT * FROM table")
df_final = ps.DataFrame(df_final)

## Write frame out as table
spark_df_final = spark.createDataFrame(df_final)
spark_df_final.write.mode("overwrite").saveAsTable("new_database.new_table")
but this doesn't work. How can I save a pandas on Spark API dataframe directly to a new table in a database (when the database doesn't exist yet)?
Thanks
You can use the following procedure. Suppose you start from a demo table called demo.
You can convert it to a pandas on Spark API dataframe using the following code:
df_final = spark.sql("SELECT * FROM demo")
pdf = df_final.to_pandas_on_spark()
#print(type(pdf))
#<class 'pyspark.pandas.frame.DataFrame'>
Now, after performing your required operations on this pandas on Spark API dataframe, you can convert it back to a Spark dataframe using the following code:
spark_df = pdf.to_spark()
print(type(spark_df))
display(spark_df)
Now, to write this dataframe to a table in a new database, you must first create the database and then write the dataframe to the table.
spark.sql("create database newdb")
spark_df.write.mode("overwrite").saveAsTable("newdb.new_table")
You can see that the table is written to the new database.
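If you want to verify the write without a screenshot, a quick check (standard Spark SQL, not part of the original answer) is:
# List the tables in the new database to confirm the write succeeded
spark.sql("SHOW TABLES IN newdb").show()
# Or read the table back and inspect a few rows
spark.table("newdb.new_table").show(5)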

How to fix 'Python sqlite3.OperationalError: no such table' issue

I received a .db file from my colleague (it contains text and numeric data) which I need to load into a pandas dataframe for further processing. I have never worked with SQLite and don't know anything about it, but after a few Google searches I wrote the following code:
import pandas as pd
import numpy as np
import sqlite3

# Opens data.db; note that connect() silently creates an empty
# database file if the path does not point to an existing one.
conn = sqlite3.connect('data.db')
sql = """
SELECT * FROM data;
"""
df = pd.read_sql_query(sql, conn)
df.head()
This gives me the following error:
'error Execution failed on sql ' SELECT * FROM data;
': no such table: data
Which table is this code referring to? I only have data.db.
I do not quite understand where I am going wrong with this. Any advice on how to get my data into the dataframe df?
I'm also new to SQL, but based on what you've provided, "data" refers to a table inside your database file data.db.
The query you typed instructs the program to select all rows from the table called "data". This website helped me with creating tables: https://www.tutorialspoint.com/sqlite/sqlite_create_table.htm
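To see which tables the file actually contains, you can query SQLite's built-in sqlite_master catalog (standard sqlite3 usage, assuming the file is data.db as in the question):
import sqlite3

conn = sqlite3.connect('data.db')
# sqlite_master lists every object in the database; filter for tables
cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cursor.fetchall())  # e.g. [('some_table',)] -- use this name in your SELECT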

Python SQL Pandas - Cannot import dataframe larger than 27650 rows into database

I am trying to import a large csv file (5 million rows) into a local MySQL database using the code below:
Code:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://[username]:[password]@[host]:[port]/[schema]', echo=False)
df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
df = df.head(27650)
df.to_sql(con=engine, name='data', if_exists='replace', chunksize=50000)
If I execute this code, it works as long as the row limit passed to df.head() is at most 27650. However, as soon as I increase this row limit by just a single row, the import fails and no data is transferred to MySQL. Does anyone know why this happens?
A pandas DataFrame has no row limit of its own; it is bounded only by your local machine's memory, so I think your machine is running out of memory. You can use memory_profiler, a Python library I like, to check real-time memory usage. More info can be found in the docs here: https://pypi.org/project/memory-profiler/
You should never read big files in one go, since that is both slow and a single point of failure. Load the data into the database in chunks, as done in this post: https://soprasteriaanalytics.se/2020/10/22/working-with-large-csv-files-in-pandas-create-a-sql-database-by-reading-files-in-chunks/
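A minimal sketch of that chunked approach (the bracketed placeholders mirror the question, and the 50,000-row chunk size is an arbitrary choice):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://[username]:[password]@[host]:[port]/[schema]', echo=False)

# Stream the CSV in 50,000-row chunks instead of loading all 5 million rows at once.
first = True
for chunk in pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv', chunksize=50000):
    # Replace the table with the first chunk, then append the rest.
    chunk.to_sql(con=engine, name='data', if_exists='replace' if first else 'append', index=False)
    first = False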

How to turn an Excel workbook into a MySQL database using Python

Is there any way to turn my Excel workbook into a MySQL database? Say, for example, my Excel workbook name is copybook.xls; then the MySQL database name would be copybook. I am unable to do it. Your help would be really appreciated.
Here I give an outline and explanation of the process, including links to the relevant documentation. As some of the details were missing from the original question, the approach will need to be tailored to your particular needs.
The solution
There are two steps in the process:
1) Import the Excel workbook as a pandas data frame
Here we use the standard pandas.read_excel method to get the data out of the Excel file. If there is a specific sheet we want, it can be selected with sheet_name. If the file contains row labels, we can include them using the index_col parameter.
import pandas as pd
# Let's select the Characters sheet and use the first column as row labels
df = pd.read_excel("copybook.xls", sheet_name = "Characters", index_col = 0)
df now contains the following imaginary data frame, which represents the data in the original Excel file:
   first   last
0   John   Snow
1  Sansa  Stark
2   Bran  Stark
2) Write records stored in a DataFrame to a SQL database
Pandas has a neat method, pandas.DataFrame.to_sql, for interacting with SQL databases through the SQLAlchemy library. The original question mentioned MySQL, so here we assume we already have a running MySQL instance. To connect to the database, we use create_engine. Lastly, we write the records stored in the data frame to the SQL table called characters.
from sqlalchemy import create_engine
engine = create_engine('mysql://USERNAME:PASSWORD@localhost/copybook')
# Write records stored in a DataFrame to a SQL database
df.to_sql("characters", con = engine)
We can check that the data has been stored:
engine.execute("SELECT * FROM characters").fetchall()
Out:
[(0, 'John', 'Snow'), (1, 'Sansa', 'Stark'), (2, 'Bran', 'Stark')]
or, better, use pandas.read_sql_table to read the data back directly as a data frame:
pd.read_sql_table("characters", engine)
Out:
   index  first   last
0      0   John   Snow
1      1  Sansa  Stark
2      2   Bran  Stark
No MySQL instance available?
You can test the approach using an in-memory SQLite database. Just copy-paste the following code to play around:
import pandas as pd
from sqlalchemy import create_engine
# Create a new SQLite instance in memory
engine = create_engine("sqlite://")
# Create a dummy data frame for testing or read it from Excel file using pandas.read_excel
df = pd.DataFrame({'first' : ['John', 'Sansa', 'Bran'], 'last' : ['Snow', 'Stark', 'Stark']})
# Write records stored in a DataFrame to a SQL database
df.to_sql("characters", con = engine)
# Read SQL database table into a DataFrame
pd.read_sql_table('characters', engine)
