I am trying to read data from Excel into a pandas DataFrame and then write the DataFrame to a Snowflake table. The code is below.
The connection is established and reading the Excel file works fine, but writing to the Snowflake table does not. I am getting the error below and would appreciate help resolving it:
snowflake.connector.errors.MissingDependencyError: Missing optional dependency: pandas
Process finished with exit code 1
import pandas as pd
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
from snowflake.connector.pandas_tools import pd_writer
url = URL(
account = '',
user = '',
schema = 'TMP',
database = 'TMP',
warehouse= 'DATABRICKS',
role = '',
authenticator='externalbrowser',
)
engine = create_engine(url)
con = engine.connect()
df = pd.read_excel("C:\\Final.xlsx")
df.columns = df.columns.astype(str)
table_name = 'test_connect'
if_exists = 'replace'
df.to_sql(name=table_name.lower(), con=con, index=False, if_exists=if_exists, method=pd_writer)
Detailed error info:
Traceback (most recent call last):
File "C:\Users\XYZ\AppData\Roaming\JetBrains\DataSpell2022.2\scratches\scratch.py", line 32, in <module>
df.to_sql(name=table_name.lower(), con=con,index= False, if_exists=if_exists, method=pd_writer)
File "C:\Users\XYZ\AppData\Roaming\Python\Python310\site-packages\pandas\core\generic.py", line 2963, in to_sql
return sql.to_sql(
File "C:\Users\XYZ\AppData\Roaming\Python\Python310\site-packages\pandas\io\sql.py", line 697, in to_sql
return pandas_sql.to_sql(
File "C:\Users\XYZ\AppData\Roaming\Python\Python310\site-packages\pandas\io\sql.py", line 1739, in to_sql
total_inserted = sql_engine.insert_records(
File "C:\Users\XYZ\AppData\Roaming\Python\Python310\site-packages\pandas\io\sql.py", line 1322, in insert_records
return table.insert(chunksize=chunksize, method=method)
File "C:\Users\XYZ\AppData\Roaming\Python\Python310\site-packages\pandas\io\sql.py", line 950, in insert
num_inserted = exec_insert(conn, keys, chunk_iter)
File "C:\Users\XYZ\AppData\Roaming\Python\Python310\site-packages\snowflake\connector\pandas_tools.py", line 320, in pd_writer
df = pandas.DataFrame(data_iter, columns=keys)
File "C:\Users\XYZ\AppData\Roaming\Python\Python310\site-packages\snowflake\connector\options.py", line 36, in __getattr__
raise MissingDependencyError(self._dep_name)
snowflake.connector.errors.MissingDependencyError: Missing optional dependency: pandas
Process finished with exit code 1
I believe the connector's pandas dependency step (installing the snowflake-connector-python[pandas] extra) has not been completed: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#installation
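Since pandas itself imports fine in the script (read_excel works), "Missing optional dependency: pandas" most likely means the connector's own optional import failed. As far as I know, the connector imports pandas and pyarrow together and reports this same error when pyarrow (which the [pandas] extra installs) is missing. The check below is a small hypothetical helper, not part of the connector's API; run it with the same interpreter DataSpell uses to see which module is absent:

```python
import importlib.util
import sys

# Modules the Snowflake pandas integration needs (assumption: pandas + pyarrow,
# the packages pulled in by the "snowflake-connector-python[pandas]" extra).
REQUIRED = ["pandas", "pyarrow"]


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Interpreter:", sys.executable)
    if missing:
        print("Missing:", ", ".join(missing))
        print('Try: "%s" -m pip install "snowflake-connector-python[pandas]"' % sys.executable)
    else:
        print("pandas and pyarrow are both importable from this interpreter.")
```

Running pip through sys.executable makes sure the packages land in the interpreter that actually runs the script, rather than in some other Python on the machine.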
I'm running a small Python script that creates a pandas DataFrame from BigQuery table results. When I run it, I get the error below. db_dtypes is already installed, so I'm not sure what other dependencies I need to add. Any help is appreciated.
Here is the code:
import pandas
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(
'/Users/kar/Downloads/data-4045ff698b4f.json')
project_id = 'data-platform'
client = bigquery.Client(credentials=credentials, project=project_id)
sql = """SELECT * FROM `data-platform.airbnb.raw_hosts` LIMIT 1"""
query_job = client.query(sql)
df = query_job.to_dataframe()
Error:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/ka/PycharmProjects/pythonProject4/main.py", line 17, in <module>
df = query_job.to_dataframe()
File "/Users/ka/PycharmProjects/pythonProject4/venv/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 1689, in to_dataframe
geography_as_object=geography_as_object,
File "/Users/ka/PycharmProjects/pythonProject4/venv/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1965, in to_dataframe
_pandas_helpers.verify_pandas_imports()
File "/Users/ka/PycharmProjects/pythonProject4/venv/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 991, in verify_pandas_imports
raise ValueError(_NO_DB_TYPES_ERROR) from db_dtypes_import_exception
ValueError: Please install the 'db-dtypes' package to use this function.
Process finished with exit code 1
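The traceback shows BigQuery raising this ValueError from db_dtypes_import_exception, so it is the import of db_dtypes itself that failed, and the generic message hides the real cause. The script runs from the project venv (venv/lib/python3.7), so one plausible cause, assumed here, is that db-dtypes was installed into a different Python. The helper below is a hypothetical diagnostic, not part of the BigQuery client; run it with the venv's interpreter to print which Python is active and the actual import error:

```python
import importlib
import sys


def import_error_for(module_name):
    """Try to import module_name; return None on success, else the error text."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or anything raised while the module initialises
        return "%s: %s" % (type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    err = import_error_for("db_dtypes")  # the module name for the db-dtypes package
    print("db_dtypes imports cleanly" if err is None else err)
```

If it reports ModuleNotFoundError, installing with that same interpreter (python -m pip install db-dtypes from the activated venv, or via PyCharm's interpreter settings) should resolve it; any other error text points at the real underlying problem.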
I have data in a .json file, one JSON object per line. The end goal is to load the data into an SQLite table and store it in a database for further analysis.
Here is a sample of the data:
{"_id":{"$oid":"60551"},"barcode":"511111019862","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac114be37ce2ead437550"},"$ref":"Cogs"},"name":"test brand #1612366101024","topBrand":false}
{"_id":{"$oid":"601c5460be37ce2ead43755f"},"barcode":"511111519928","brandCode":"STARBUCKS","category":"Beverages","categoryCode":"BEVERAGES","cpg":{"$id":{"$oid":"5332f5fbe4b03c9a25efd0ba"},"$ref":"Cogs"},"name":"Starbucks","topBrand":false}
{"_id":{"$oid":"601ac142be37ce2ead43755d"},"barcode":"511111819905","brandCode":"TEST BRANDCODE #1612366146176","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac142be37ce2ead437559"},"$ref":"Cogs"},"name":"test brand #1612366146176","topBrand":false}
{"_id":{"$oid":"601ac142be37ce2ead43755a"},"barcode":"511111519874","brandCode":"TEST BRANDCODE #1612366146051","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac142be37ce2ead437559"},"$ref":"Cogs"},"name":"test brand #1612366146051","topBrand":false}
Here is the code:
import pandas as pd
import json
import sqlite3
# Open the JSON file and parse each line into a dict
with open("users.json") as f:
    dat = [json.loads(line.strip()) for line in f]
# Create a DataFrame from the parsed records
df = pd.DataFrame(dat)
# Open the database connection
con = sqlite3.connect("fetch_rewards.db")
c = con.cursor()
df.to_sql("users", con)
c.close()
The error I am getting:
Traceback (most recent call last):
File "C:\Users\mohammed.alabbas\Desktop\sqlite\import_csv.py", line 16, in <module>
df.to_sql("users", con)
File "C:\Users\name\AppData\Roaming\Python\Python39\site-packages\pandas\core\generic.py", line 2605, in to_sql
sql.to_sql(
File "C:\Users\name\AppData\Roaming\Python\Python39\site-packages\pandas\io\sql.py", line 589, in to_sql
pandas_sql.to_sql(
File "C:\Users\name\AppData\Roaming\Python\Python39\site-packages\pandas\io\sql.py", line 1828, in to_sql
table.insert(chunksize, method)
File "C:\Users\mname\AppData\Roaming\Python\Python39\site-packages\pandas\io\sql.py", line 830, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Users\mname\AppData\Roaming\Python\Python39\site-packages\pandas\io\sql.py", line 1555, in _execute_insert
conn.executemany(self.insert_statement(num_rows=1), data_list)
sqlite3.InterfaceError: Error binding parameter 1 - probably unsupported type.
Thanks in advance
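The error most likely comes from the nested objects: pd.DataFrame(dat) keeps _id and cpg as Python dicts, and sqlite3 cannot bind a dict as a query parameter (parameter 1 is the _id column, right after the index). A sketch of one fix, assuming flat columns are acceptable, flattens the records with pd.json_normalize before writing. The column renaming (dropping "$" and turning "." into "_") is an optional choice to keep names easy to query, and only two of the sample lines are inlined so the example runs on its own:

```python
import json
import sqlite3

import pandas as pd

# Two of the sample records from the question, one JSON object per line.
SAMPLE_LINES = [
    '{"_id":{"$oid":"60551"},"barcode":"511111019862","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac114be37ce2ead437550"},"$ref":"Cogs"},"name":"test brand #1612366101024","topBrand":false}',
    '{"_id":{"$oid":"601c5460be37ce2ead43755f"},"barcode":"511111519928","brandCode":"STARBUCKS","category":"Beverages","categoryCode":"BEVERAGES","cpg":{"$id":{"$oid":"5332f5fbe4b03c9a25efd0ba"},"$ref":"Cogs"},"name":"Starbucks","topBrand":false}',
]


def load_json_lines(lines):
    """Parse JSON Lines and flatten nested objects into plain columns."""
    records = [json.loads(line) for line in lines if line.strip()]
    # json_normalize expands {"_id": {"$oid": ...}} into a column named "_id.$oid".
    flat = pd.json_normalize(records)
    # Drop "$" and turn "." into "_" so column names are simple identifiers.
    flat.columns = [c.replace("$", "").replace(".", "_") for c in flat.columns]
    return flat


if __name__ == "__main__":
    df = load_json_lines(SAMPLE_LINES)
    con = sqlite3.connect(":memory:")  # use "fetch_rewards.db" for a file on disk
    df.to_sql("users", con, index=False, if_exists="replace")
    print(con.execute('SELECT "_id_oid", "cpg_ref" FROM users').fetchall())
    con.close()
```

For the real file, pass the open file object instead: with open("users.json") as f: df = load_json_lines(f). Records without a brandCode simply get NULL in that column.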
This is my code.
import sqlite3
import pandas
db = sqlite3.connect('testdb.db')
df = pandas.read_csv('testcsv.csv')
df.to_sql('testTable', 'db', if_exists='append', index=False)
I got the last two lines of code from another Stack Overflow post, but they don't work for me. This is the error I get, even after installing SQLAlchemy (pandas had complained that it wasn't installed):
Traceback (most recent call last):
File "C:/Users/pitye/PycharmProjects/gradeCalcV2/venv/sqlite.py", line 7, in <module>
df.to_sql('testTable', 'db', if_exists='append', index=False)
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\pandas\core\generic.py", line 2663, in to_sql
method=method,
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\pandas\io\sql.py", line 503, in to_sql
pandas_sql = pandasSQL_builder(con, schema=schema)
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\pandas\io\sql.py", line 577, in pandasSQL_builder
con = _engine_builder(con)
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\pandas\io\sql.py", line 564, in _engine_builder
con = sqlalchemy.create_engine(con)
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\sqlalchemy\engine\__init__.py", line 479, in create_engine
return strategy.create(*args, **kwargs)
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\sqlalchemy\engine\strategies.py", line 54, in create
u = url.make_url(name_or_url)
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\sqlalchemy\engine\url.py", line 229, in make_url
return _parse_rfc1738_args(name_or_url)
File "C:\Users\pitye\PycharmProjects\gradeCalcV2\venv\lib\site-packages\sqlalchemy\engine\url.py", line 291, in _parse_rfc1738_args
"Could not parse rfc1738 URL from string '%s'" % name
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'db'
I just want to create a table from a CSV file in SQLite. Is this even the right way of doing it, or am I waaay off?
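I think you just have to replace
df.to_sql('testTable', 'db', if_exists='append', index=False)
with
df.to_sql('testTable', db, if_exists='append', index=False)
Passing the string 'db' makes pandas treat it as a SQLAlchemy connection URL, which is why you get the rfc1738 parse error. Passing the sqlite3 connection object itself lets pandas write to SQLite directly, without SQLAlchemy.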
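A complete version of the corrected approach, as a sketch: the helper name csv_to_table is hypothetical, and an in-memory CSV stands in for testcsv.csv so the example runs on its own:

```python
import io
import sqlite3

import pandas as pd


def csv_to_table(csv_source, db, table):
    """Load a CSV (path or file-like object) into table on an open sqlite3 connection."""
    df = pd.read_csv(csv_source)
    # Pass the connection object itself; a string such as 'db' would be parsed as a SQLAlchemy URL.
    df.to_sql(table, db, if_exists="append", index=False)
    db.commit()
    return len(df)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # sqlite3.connect("testdb.db") for a file on disk
    sample = io.StringIO("name,grade\nalice,90\nbob,85\n")  # stand-in for "testcsv.csv"
    csv_to_table(sample, db, "testTable")
    print(db.execute("SELECT * FROM testTable").fetchall())
    db.close()
```

With if_exists="append", the first call creates the table and later calls add rows to it, so this is a reasonable way to build an SQLite table from a CSV.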
I am simply trying to write a pandas DataFrame to a local MySQL database on Ubuntu.
from sqlalchemy import create_engine
import tushare as ts
df = ts.get_tick_data('600848', date='2014-12-22')
engine = create_engine('mysql://user:passwd@127.0.0.1/db_name?charset=utf8')
df.to_sql('tick_data',engine, flavor = 'mysql', if_exists= 'append')
and it raises this error:
biggreyhairboy@ubuntu:~/git/python/fjb$ python tushareDB.py
Error on sql SHOW TABLES LIKE 'tick_data'
Traceback (most recent call last):
File "tushareDB.py", line 13, in <module>
df.to_sql('tick_data', con = engine,flavor ='mysql', if_exists= 'append')
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1261, in to_sql
self, name, con, flavor=flavor, if_exists=if_exists, **kwargs)
File "/usr/lib/python2.7/dist-packages/pandas/io/sql.py", line 207, in write_frame
exists = table_exists(name, con, flavor)
File "/usr/lib/python2.7/dist-packages/pandas/io/sql.py", line 275, in table_exists
return len(tquery(query, con)) > 0
File "/usr/lib/python2.7/dist-packages/pandas/io/sql.py", line 90, in tquery
cur = execute(sql, con, cur=cur)
File "/usr/lib/python2.7/dist-packages/pandas/io/sql.py", line 53, in execute
con.rollback()
AttributeError: 'Engine' object has no attribute 'rollback'
The DataFrame is not empty, and the database exists but has no tables yet. I have tried another way of creating a table in Python with MySQLdb, and it works fine.
There is a related question, "Writing to MySQL database with pandas using SQLAlchemy, to_sql", but no actual reason was explained there.
You appear to be using an older version of pandas. I did a quick git bisect to find the version of pandas where line 53 contains con.rollback(), and found pandas at v0.12, which is before SQLAlchemy support was added to the execute function.
If you're stuck on this version of pandas, you'll need to use a raw DBAPI connection:
df.to_sql('tick_data', engine.raw_connection(), flavor='mysql', if_exists='append')
Otherwise, update pandas and use the engine as you intend to. Note that you don't need to use the flavor parameter when using SQLAlchemy:
df.to_sql('tick_data', engine, if_exists='append')
In Python 2.7, I'm connecting to an external data source using the following:
import pypyodbc
import pandas as pd
import datetime
import csv
import boto3
import os
# Connect to the DataSource
conn = pypyodbc.connect("DSN = FAKE DATA SOURCE; UID=FAKEID; PWD=FAKEPASSWORD")
# Specify the query we're going to run on it
script = ("SELECT * FROM table")
# Create a dataframe from the above query
df = pd.read_sql_query(script, conn)
I get the following error:
C:\Python27\python.exe "C:/Thing.py"
Traceback (most recent call last):
File "C:/Thing.py", line 30, in <module>
df = pd.read_sql_query(script,conn)
File "C:\Python27\lib\site-packages\pandas-0.18.1-py2.7-win32.egg\pandas\io\sql.py", line 431, in read_sql_query
parse_dates=parse_dates, chunksize=chunksize)
File "C:\Python27\lib\site-packages\pandas-0.18.1-py2.7-win32.egg\pandas\io\sql.py", line 1608, in read_query
data = self._fetchall_as_list(cursor)
File "C:\Python27\lib\site-packages\pandas-0.18.1-py2.7-win32.egg\pandas\io\sql.py", line 1617, in _fetchall_as_list
result = cur.fetchall()
File "build\bdist.win32\egg\pypyodbc.py", line 1819, in fetchall
File "build\bdist.win32\egg\pypyodbc.py", line 1871, in fetchone
ValueError: could not convert string to float: ?
It seems to me that one of the float columns contains a '?' symbol for some reason. I've reached out to the owner of the data source, but they cannot change the underlying table.
Is there a way to replace incorrect data like this using pandas? I've tried using replace after the read_sql_query statement, but I get the same error.
It's hard to know for certain without your data, but you could try setting coerce_float to False, i.e. replace your last line with
df = pd.read_sql_query(script, conn, coerce_float=False)
See the documentation of read_sql_query.
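Note that the traceback ends inside pypyodbc's fetchone, which suggests the driver fails while converting the row, before pandas receives any data; that would also explain why replace after read_sql_query never gets a chance to run. If coerce_float=False does not help, one workaround, assuming the data source accepts a standard CAST, is to fetch the problem column as text and convert it in pandas with pd.to_numeric(errors="coerce"), which turns '?' into NaN. The column names id and price are hypothetical placeholders. pd.to_numeric exists in pandas 0.17 and later (the traceback shows 0.18.1), and the code avoids f-strings so it also runs on Python 2.7:

```python
import pandas as pd

# Hypothetical query: cast the problem column to text so the driver never
# tries to parse '?' as a float. Column names are placeholders.
SCRIPT = "SELECT id, CAST(price AS VARCHAR(50)) AS price FROM table"


def clean_numeric(df, column):
    """Return a copy of df with column converted to float; unparseable values become NaN."""
    out = df.copy()
    out[column] = pd.to_numeric(out[column], errors="coerce")
    return out


if __name__ == "__main__":
    # Stand-in for: raw = pd.read_sql_query(SCRIPT, conn)
    raw = pd.DataFrame({"id": [1, 2, 3], "price": ["1.5", "?", "2.25"]})
    print(clean_numeric(raw, "price"))
```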