Extracting data from Cassandra DB into Excel using Python

I am trying to extract Cassandra DB results into Excel using Python. When I run the code below, I get the following error, and I have not been able to resolve it. Can someone please help me? The error appears for all columns.
Error: AttributeError: 'dict' object has no attribute "Column1"
Code:
import pandas as pd
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import dict_factory
auth_provider = PlainTextAuthProvider(username=CASSANDRA_USER, password=CASSANDRA_PASS)
cluster = Cluster(contact_points= ["host"], port=xxx, auth_provider=auth_provider)
session = cluster.connect("keyspace")
session.row_factory = dict_factory
sql_query = "SELECT * FROM db.tablename"
dictionary = {"column1":[],"column2":[],"column3":[],"column4":[]}
for row in session.execute(sql_query):
    dictionary["column1"].append(row.column1)
    dictionary["column2"].append(row.column2)
    dictionary["column3"].append(row.column3)
    dictionary["column4"].append(row.column4)
df = pd.DataFrame(dictionary)
df.to_excel(r'C:\Users\data.xlsx')
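Note that with dict_factory set as the row factory, each row comes back as a plain Python dict, so attribute access like row.column1 raises this AttributeError. A minimal sketch of the loop using subscript access instead (column names as in the question):
for row in session.execute(sql_query):
    # rows produced by dict_factory are dicts keyed by column name
    dictionary["column1"].append(row["column1"])
    dictionary["column2"].append(row["column2"])
    dictionary["column3"].append(row["column3"])
    dictionary["column4"].append(row["column4"])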

Related

How to read a SQLite database file using polars package in Python

I want to read a SQLite database file (database.sqlite) using the polars package. I tried the following, unsuccessfully:
import sqlite3
import polars as pl
conn = sqlite3.connect('database.sqlite')
df = pl.read_sql("SELECT * from table_name", conn)
print(df)
I get the following error:
AttributeError: 'sqlite3.Connection' object has no attribute 'split'
Any suggestions?
From the docs you can see that pl.read_sql accepts a connection string as its parameter, but you are passing a sqlite3.Connection object, and that's why you get that message.
You should first build the connection string, which is the URL for your database:
db_path = 'database.sqlite'
connection_string = 'sqlite://' + db_path
After that, the line that gave you problems becomes:
df = pl.read_sql("SELECT * from table_name", connection_string)
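Put together, a minimal self-contained version of the fix (assuming database.sqlite sits in the working directory):
import polars as pl

db_path = 'database.sqlite'
connection_string = 'sqlite://' + db_path

df = pl.read_sql("SELECT * from table_name", connection_string)
print(df)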

pd.read_sql, how to convert data types?

I am running a SQL query from my Python code and attempting to create a DataFrame from it. When I execute the code, pandas produces the following error message:
pandas.io.sql.DatabaseError: Execution failed on sql '*my connection info*' : expecting string or bytes object
The relevant code is:
import cx_Oracle
import cx_Oracle as cx
import pandas as pd
dsn_tns = cx.makedsn('x.x.x.x', 'y',
                     service_name='xxx')
conn = cx.connect(user='x', password='y', dsn=dsn_tns)
sql_query1 = conn.cursor()
sql_query1.execute("""select * from *table_name* partition(p20210712) t""")
df = pd.read_sql(sql_query1,conn)
I was thinking of converting all values in the query result to strings with the df.astype(str) function, but I cannot find a proper way to accomplish this within the pd.read_sql call. Would data type conversion correct this issue?
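For reference, pd.read_sql expects the SQL text (or a SQLAlchemy selectable) as its first argument and the connection as the second, not an already-executed cursor, and passing a cursor is the likeliest source of the "expecting string or bytes object" message; the string conversion can then be applied after the read. A minimal sketch under that reading (connection details and table name are the question's placeholders):
import cx_Oracle as cx
import pandas as pd

dsn_tns = cx.makedsn('x.x.x.x', 'y', service_name='xxx')
conn = cx.connect(user='x', password='y', dsn=dsn_tns)

# pass the query text to pandas and let it execute on the connection
sql = "select * from table_name partition(p20210712) t"
df = pd.read_sql(sql, conn)

# cast every column to string afterwards, as the question suggests
df = df.astype(str)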

How to Connect MySQL database to AFS server

I have a Python script that scrapes real-time data from Yahoo Finance (using the yfinance library), converts it into a DataFrame, and then needs to store that data in a MySQL table (using the sqlalchemy library). The code runs perfectly fine on my laptop and stores the data in the desired table on my local system. Now I am running this code on an AFS remote server (I don't know how many of you are familiar with AFS; it is a remote server that universities use). The problem I am facing is that I cannot set up a MySQL server on AFS, and I cannot connect MySQL to AFS. Here's my code:
import pandas as pd
import yfinance as yf
from pandas.io import sql
import sys
from sys import argv
from sqlalchemy import create_engine
import requests
link = "link from where the ticker list is taken"
f = requests.get(link)
ticker= f.text
ticker= ticker.strip('[]').split('\n')
ticker= list(filter(None, ticker))
data= yf.download(tickers= ticker,period= '1d',interval='1m',group_by='Ticker')
column_list = data.columns
comp_list = list(set([clm[0] for clm in column_list]))
temp_df = data[comp_list[0]]
temp_df.insert(0,'Ticker','')
temp_df['Ticker'] = comp_list[0]
for comp_name in comp_list[1:]:
    new_df = data[comp_name]
    new_df['Ticker'] = comp_name
    temp_df = pd.concat([temp_df, new_df], axis=0)
final_data = temp_df.reset_index()
final_data=final_data.sort_values(by='Datetime',ascending=False)
final_data= final_data.dropna()
engine = create_engine('mysql+pymysql://username:password@localhost/database')
pandas_sql = pd.io.sql.pandasSQL_builder(engine)
final_data.to_sql('stocks', con=engine,
                  if_exists='replace', index=False)
print('task_done')
Please help me connect MySQL to AFS and save the data in a database table. Any leads would be appreciated.
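One thing worth checking regardless of where the server ends up living: the SQLAlchemy URL format for pymysql is user:password@host/database, and since a MySQL server cannot be set up on AFS itself, the engine has to point at a host that is reachable from the AFS machine. A minimal sketch with a hypothetical remote host:
from sqlalchemy import create_engine

# db.example.edu is a placeholder for whatever MySQL host the AFS machine can reach
engine = create_engine('mysql+pymysql://username:password@db.example.edu:3306/database')
final_data.to_sql('stocks', con=engine, if_exists='replace', index=False)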

Getting an error going from Dataframe to SQL Server

I'm looking at the documentation here.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
I keep getting this error.
'DataFrame' object has no attribute 'to_sql'
Below is all my code. I don't see what's wrong here. What is going on?
import pandas as pd
from sqlalchemy import create_engine
import urllib
import pyodbc
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};SERVER=server_name.database.windows.net;DATABASE=my_db;UID=my_id;PWD=my_pw")
myeng = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
df.to_sql(name="dbo.my_table", con=myeng, if_exists='append', index=False)
As it turns out, the object wasn't an actual DataFrame that pandas could interpret; it was a PySpark DataFrame. This fixed the problem:
# convert pyspark.sql DF to Pandas DF
df = df.toPandas()
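For completeness, a sketch of the full working sequence under that assumption; as a side note, pandas.DataFrame.to_sql takes a separate schema argument, so passing "dbo" there is usually safer than folding it into the table name:
# df starts life as a pyspark.sql DataFrame, so convert it first
df = df.toPandas()
df.to_sql(name="my_table", schema="dbo", con=myeng, if_exists='append', index=False)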

Is it possible to use dask dataframe with teradata python module?

I have this code:
import teradata
import dask.dataframe as dd
login = login
pwd = password
udaExec = teradata.UdaExec(appName="CAF", version="1.0",
                           logConsole=False)
session = udaExec.connect(method="odbc", DSN="Teradata",
                          USEREGIONALSETTINGS='N', username=login,
                          password=pwd, authentication="LDAP")
And the connection is working.
I want to get a dask dataframe. I have tried this:
sqlStmt = "SOME SQL STATEMENT"
df = dd.read_sql_table(sqlStmt, session, index_col='id')
And I'm getting this error message:
AttributeError: 'UdaExecConnection' object has no attribute '_instantiate_plugins'
Does anyone have a suggestion?
Thanks in advance.
read_sql_table expects a SQLAlchemy connection string, not a "session" as you are passing. I have not heard of Teradata being used via SQLAlchemy, but apparently there is at least one connector you could install, and possibly other solutions using the generic ODBC driver.
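For illustration, this is the shape read_sql_table expects, assuming some SQLAlchemy dialect for Teradata is installed (the teradatasql:// scheme below comes from the teradatasqlalchemy connector and is only an assumption here); note the first argument is a table name rather than a full SELECT:
import dask.dataframe as dd

# hypothetical URI; requires a Teradata SQLAlchemy dialect to be installed
uri = "teradatasql://login:password@hostname"
df = dd.read_sql_table("my_table", uri, index_col="id")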
However, you may wish to take a more direct approach with dask.delayed, something like:
from dask import delayed
# make a set of statements for each partition
statements = [sqlStmt + " where id > {} and id <= {}".format(bounds)
              for bounds in boundslist]  # I don't know syntax for tera

def get_part(statement):
    # however you make a concrete dataframe from a SQL statement
    udaExec = ..
    session = ..
    df = ..
    return dataframe

# ideally you should provide the meta and divisions info here
df = dd.from_delayed([delayed(get_part)(stm) for stm in statements],
                     meta=, divisions=)
We will be interested to hear of your success.
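To make the delayed sketch concrete, here is one hedged way to fill in the blanks, assuming the partitions are split on an integer id column and that pandas can read from the UdaExec session (the teradata session is DBAPI-like, but that compatibility, the base query, and the bounds are all assumptions):
import teradata
import pandas as pd
import dask.dataframe as dd
from dask import delayed

login, pwd = "login", "password"                          # credentials as in the question
sqlStmt = "SELECT * FROM my_table"                         # hypothetical base query
bounds_list = [(0, 1_000_000), (1_000_000, 2_000_000)]     # hypothetical id ranges

statements = [sqlStmt + " where id > {} and id <= {}".format(lo, hi)
              for lo, hi in bounds_list]

def get_part(statement):
    # open a fresh session per partition so the reads can run independently
    udaExec = teradata.UdaExec(appName="CAF", version="1.0", logConsole=False)
    session = udaExec.connect(method="odbc", DSN="Teradata",
                              USEREGIONALSETTINGS='N', username=login,
                              password=pwd, authentication="LDAP")
    return pd.read_sql(statement, session)

# meta and divisions omitted for brevity; as noted above, supplying them is ideal
df = dd.from_delayed([delayed(get_part)(stm) for stm in statements])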
