Python Teradata Connector - Column Names - python

Does anyone know how to retrieve the column names using Python's Teradata library?
Here is a code sample:
import teradata
udaExec = teradata.UdaExec(appName="HelloWorld", version="1.0", logConsole=False)
session = udaExec.connect(method='odbc', system='system_name', authentication='LDAP', username='username', password='$$tdwallet(tdprod)')
lst_results = []
for row in session.execute("select * from table_name"):
    print(row)
    lst_results.append(row)
The code above does not return the column names. Ultimately, I would like to put the query's results into a pandas DataFrame.

You do not need a for loop:
df = pandas.read_sql(query, session)
will do the job, and the resulting DataFrame carries the column names. See my answer here:
Python PyTd teradata Query Into Pandas DataFrame
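If you want the column names without going through pandas, DB-API 2.0 cursors expose them through cursor.description. The sketch below uses the standard-library sqlite3 module as a stand-in for the Teradata session (an assumption, so that the example runs anywhere), and the table and its columns are made up. The teradata module describes itself as DB-API 2.0 compatible, so the same attribute should be available on the cursor returned by session.execute, but I have not verified that against a live Teradata system.

```python
import sqlite3


def fetch_with_columns(cursor):
    """Return (column_names, rows) for a cursor that has just run a SELECT."""
    # DB-API 2.0: description holds one 7-item sequence per column,
    # and item 0 of each sequence is the column name.
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return columns, rows


# Stand-in database (hypothetical table and data, not Teradata).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)", [(1, "a"), (2, "b")])

cursor = conn.execute("SELECT * FROM table_name ORDER BY id")
columns, rows = fetch_with_columns(cursor)

# Pair every row with the column names.
records = [dict(zip(columns, row)) for row in rows]
print(columns)
print(records)
conn.close()
```

From there, pandas.DataFrame(rows, columns=columns) gives a DataFrame with the right headers.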

Related

How do I get the schema of all the tables in Hive db using Python?

How do I get the schema of all the tables in a Hive db using Python?
Can I use "SHOW TABLES" as a query, as in the following example?
with pyodbc.connect('DSN=Hive_Connection', autocommit=True) as conn:
    df = pd.read_sql('SHOW TABLES', conn)
Thank You (-:
I think the command SHOW DATABASES gives you the names of all databases, which are the Hive equivalent of schemas in other DBMSs.
Combining the two commands should therefore let you build the complete name of every table in Hive.
import pandas as pd
import pyodbc
cnxn = pyodbc.connect("DSN=Hive_Connection", autocommit=True)
my_databases = """SHOW DATABASES;"""
df_databases = pd.read_sql(my_databases, cnxn)
print(df_databases)
for ind in df_databases.index:
    my_database_tables = "SHOW TABLES IN " + df_databases["database_name"][ind]
    print(my_database_tables)
df_new
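The idea of combining the two commands can be sketched as a small helper. This is only an illustration under assumptions: run_query is a hypothetical callable that takes a SQL string and returns a DataFrame (with a real connection it could be lambda sql: pd.read_sql(sql, cnxn)), and the fake results below stand in for a Hive server, with made-up database and table names. The helper reads the first column by position, because the column label reported by the driver may differ between setups.

```python
import pandas as pd


def list_qualified_tables(run_query):
    """Return 'database.table' names built from SHOW DATABASES and SHOW TABLES IN."""
    qualified = []
    databases = run_query("SHOW DATABASES")
    # Use the first column by position instead of relying on its label.
    for database in databases.iloc[:, 0]:
        tables = run_query("SHOW TABLES IN " + database)
        for table in tables.iloc[:, 0]:
            qualified.append(database + "." + table)
    return qualified


# Stand-in for a real Hive connection (hypothetical databases and tables).
_FAKE_RESULTS = {
    "SHOW DATABASES": pd.DataFrame({"database_name": ["default", "sales"]}),
    "SHOW TABLES IN default": pd.DataFrame({"tab_name": ["users"]}),
    "SHOW TABLES IN sales": pd.DataFrame({"tab_name": ["orders", "items"]}),
}


def fake_run_query(sql):
    return _FAKE_RESULTS[sql]


qualified_names = list_qualified_tables(fake_run_query)
print(qualified_names)
```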

How do I insert my Python dictionary into my SQL Server database table?

I have a dictionary with 3 keys which correspond to field names in a SQL Server table. The values of these keys come from an excel file and I store this dictionary in a dataframe which I now need to insert into a SQL table. This can all be seen in the code below:
import pandas as pd
import pymssql
fp = "file path"
# newer pandas versions spell this keyword sheet_name (formerly sheetname)
data = pd.read_excel(fp, sheet_name="CRM View")
row_date = data.loc[3, ]
row_sita = "ABZPD"
row_event = data.iloc[12, :]
df = pd.DataFrame({'date': row_date,
                   'sita': row_sita,
                   'event': row_event
                   }, index=None)
df = df[4:]
df = df.fillna("")
print(df)
My question is: how do I insert this dictionary into a SQL table now?
Also, as a side note, this code is part of a loop that goes through several Excel files one by one: it inserts the data into the dictionary, then into SQL, then clears the dictionary and starts again with the next Excel file.
You could try something like this:
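import MySQLdb
# connect (arguments: host, user, password, database name)
conn = MySQLdb.connect("127.0.0.1", "username", "password", "database_name")
x = conn.cursor()
# write: pass the values as query parameters instead of formatting them into the SQL string
x.execute('INSERT INTO table_name (row_date, sita, event) VALUES (%s, %s, %s)', (row_date, sita, event))
# close
conn.commit()
conn.close()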
You might have to change it a little based on your SQL restrictions, but should give you a good start anyway.
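Since the question is about a dictionary whose keys match the field names, here is a minimal sketch of inserting such a dictionary with a parameterized query. It uses the standard-library sqlite3 module as a stand-in for SQL Server (an assumption, so that it runs without a server); the table name crm_events and the sample values are made up. sqlite3 uses ? as its placeholder, whereas pymssql and MySQLdb use %s, so the SQL string would need adjusting for those drivers.

```python
import sqlite3

# Hypothetical data: one list of values per field name.
record = {
    "date": ["2018-01-01", "2018-01-02"],
    "sita": ["ABZPD", "ABZPD"],
    "event": ["Conference", ""],
}


def insert_record(conn, table, record):
    """Insert a dict of equal-length lists as rows; return the number of rows."""
    columns = list(record)
    # Turn the column-oriented dict into a list of row tuples.
    rows = list(zip(*(record[col] for col in columns)))
    placeholders = ", ".join("?" for _ in columns)
    # Table and column names are formatted into the SQL text, so they must come
    # from trusted code, never from user input. The values go in as parameters.
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, ", ".join(columns), placeholders)
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_events (date TEXT, sita TEXT, event TEXT)")
inserted = insert_record(conn, "crm_events", record)
stored = conn.execute("SELECT date, sita, event FROM crm_events ORDER BY date").fetchall()
print(inserted, stored)
conn.close()
```

Inside the loop over Excel files, you would build a fresh dictionary for each file and call insert_record once per file.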
For a pandas DataFrame, you can use the built-in to_sql method to store it in the database. Here is how to use it:
import urllib.parse
import sqlalchemy as sa
# on Python 2 the function is urllib.quote_plus
params = urllib.parse.quote_plus("DRIVER={};SERVER={};DATABASE={};Trusted_Connection=True;".format("{SQL Server}",
                                                                                                    "<db_server_url>",
                                                                                                    "<db_name>"))
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = sa.create_engine(conn_str)
df.to_sql("<table_name>", engine, schema="<schema_name>", if_exists="append", index=False)
For this method you will need to install the sqlalchemy package (the mssql+pyodbc dialect also needs pyodbc):
pip install sqlalchemy
You will also need to set up the MSSQL DSN on the machine.
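For the loop over several Excel files mentioned in the question, if_exists="append" lets each file's DataFrame be added to the same table. A minimal sketch, assuming an in-memory SQLite connection in place of the SQL Server engine (pandas accepts a sqlite3 connection directly) and two made-up frames in place of the Excel data; the table name crm_events is hypothetical:

```python
import sqlite3

import pandas as pd


def append_frames(frames, conn, table):
    """Append every DataFrame to the same table; return the total row count."""
    total = 0
    for frame in frames:
        # "append" creates the table on the first call and adds rows afterwards.
        frame.to_sql(table, conn, if_exists="append", index=False)
        total += len(frame)
    return total


# Hypothetical frames standing in for the per-file Excel data.
frames = [
    pd.DataFrame({"date": ["2018-01-01"], "sita": ["ABZPD"], "event": ["Conference"]}),
    pd.DataFrame({"date": ["2018-02-01"], "sita": ["ABZPD"], "event": [""]}),
]

conn = sqlite3.connect(":memory:")
total = append_frames(frames, conn, "crm_events")
stored = pd.read_sql("SELECT * FROM crm_events", conn)
print(total, list(stored.columns))
conn.close()
```

With SQL Server you would pass the SQLAlchemy engine instead of the sqlite3 connection.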

Create a table in SQLite by using a DataFrame

I'm new to sqlite3 and trying to understand how to create a table in the SQL environment from my existing DataFrame. I already have a database that I created as "pythonsqlite.db".
#import my csv to python
import pandas as pd
my_data = pd.read_csv("my_input_file.csv")
## connect to database
import sqlite3
conn = sqlite3.connect("pythonsqlite.db")
##push the dataframe to sql
my_data.to_sql("my_data", conn, if_exists="replace")
##create the table
conn.execute(
"""
create table my_table as
select * from my_data
""")
However, when I open SQLiteStudio and check the tables under my database, I cannot see the table I've created. I'd really appreciate it if someone could tell me what I'm missing here.
I changed only one part of the code: instead of calling read_csv, I create a small DataFrame (see below). I think the issue may be with the name of your script (for example, pandas.py).
import pandas as pd
# my_data = pd.read_csv("my_input_file.csv")
columns = ['a','b']
my_data = pd.DataFrame([[1, 2], [3, 4]], columns=columns)
## connect to database
import sqlite3
conn = sqlite3.connect("pythonsqlite.db")
##push the dataframe to sql
my_data.to_sql("my_data", conn, if_exists="replace")
##create the table
conn.execute(
"""
create table my_table as
select * from my_data
""")
I ran it and I don't seem to have a problem.
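One way to check from Python whether the table really exists is to query sqlite_master. The sketch below is an illustration, not a confirmed diagnosis of the original problem: it uses an in-memory database and creates my_data with plain SQL instead of to_sql so that it needs no CSV file. Committing and closing the connection before opening the file in another tool is a precaution, and DROP TABLE IF EXISTS lets the script be rerun without a "table already exists" error.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Use "pythonsqlite.db" instead of ":memory:" to work on the file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO my_data VALUES (?, ?)", [(1, 2), (3, 4)])

# Recreate my_table from my_data, then make sure everything is committed.
conn.execute("DROP TABLE IF EXISTS my_table")
conn.execute("CREATE TABLE my_table AS SELECT * FROM my_data")
conn.commit()

tables = list_tables(conn)
print(tables)
conn.close()
```

If my_table shows up here but not in SQLiteStudio, it is worth checking that both are pointing at the same file path.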

Querying json object in dataframe using Pyspark

I have a MySQL table with the following schema:
id-int
path-varchar
info-json {"name":"pat", "address":"NY, USA"....}
I used the JDBC driver to connect PySpark to MySQL. I can retrieve data from MySQL using
df = sqlContext.sql("select * from dbTable")
This query works fine. My question is, how can I query the "info" column? For example, the query below works in the MySQL shell and retrieves data, but it is not supported in PySpark (2+).
select id, info->"$.name" from dbTable where info->"$.name"='pat'
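from pyspark.sql.functions import get_json_object
# select the "name" field of the JSON column
res = df.select(get_json_object(df['info'], "$.name").alias('name'))
# or keep only the rows whose "name" field equals 'pat'
res = df.filter(get_json_object(df['info'], "$.name") == 'pat')
PySpark already has a function for this, named get_json_object.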
For your situation:
df = spark.read.jdbc(url='jdbc:mysql://localhost:3306', table='test.test_json',
                     properties={'user': 'hive', 'password': '123456'})
df.createOrReplaceTempView('test_json')
res = spark.sql("""
    select col_json, get_json_object(col_json, '$.name') from test_json
""")
res.show()
Spark SQL is almost like Hive SQL; see
https://cwiki.apache.org/confluence/display/Hive/Home

Python to group SQL data

I used to read the data from a CSV file. I have now imported all the CSV data into a SQL database, but I am having difficulty extracting the data from SQL with Python.
My original code for reading the CSV looks like this:
import pandas as pd
stock_data = pd.read_csv(filepath_or_buffer='stock_data_w.csv', parse_dates=[u'date'], encoding='gbk')
stock_data[u'change_weekly'] = stock_data.groupby(u'code')[u'change'].shift(-1)
Now I want to read the data from SQL. Here is my code, but it doesn't work and I am not sure how to fix it:
import pandas as pd
import MySQLdb
db = MySQLdb.connect(host='localhost', user='root', passwd='232323', db='test', port=3306)
cur = db.cursor()
cur.execute("SELECT * FROM stock_data_w")
stock_data = pd.DataFrame(data=cur.fetchall(), columns=[i[0] for i in cur.description])
stock_data[u'change_weekly'] = stock_data.groupby(u'code')[u'change'].shift(-1)
The error is: "raise PandasError('DataFrame constructor not properly called!') pandas.core.common.PandasError: DataFrame constructor not properly called!"
Use the approach below to convert your cursor object into a DataFrame.
stock_data = pd.DataFrame(data=cursor.fetchall(), index=None,
                          columns=cursor.keys())
print(stock_data)
If your cursor has no keys() method, as with MySQLdb, use columns=[i[0] for i in cursor.description] instead.
Alternatively, make your connection with SQLAlchemy and use:
stock_data = pd.read_sql("SELECT * from stock_data_w",
                         con=cnx, parse_dates=['date'])
I'm not sure whether mysql.connector is supported in pandas read_sql(). You can give a try and let us know :)
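A commonly reported cause of this error on older pandas versions is that MySQLdb's fetchall() returns a tuple of tuples, which the DataFrame constructor of that era did not accept; wrapping the result in list() is the usual workaround. I can't confirm that this is what happened here, so treat the following as a sketch. It uses sqlite3 as a stand-in for MySQL (an assumption, so that it runs without a server) with made-up rows, and it keeps the column names code, date and change from the question.

```python
import sqlite3

import pandas as pd

# Stand-in database with hypothetical weekly data for two stock codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_data_w (code TEXT, date TEXT, change REAL)")
conn.executemany(
    "INSERT INTO stock_data_w VALUES (?, ?, ?)",
    [
        ("A", "2017-01-06", 0.1),
        ("A", "2017-01-13", 0.2),
        ("B", "2017-01-06", 0.3),
        ("B", "2017-01-13", 0.4),
    ],
)

cur = conn.execute("SELECT code, date, change FROM stock_data_w ORDER BY code, date")
columns = [col[0] for col in cur.description]
# list(...) makes sure the constructor receives a list of row tuples.
stock_data = pd.DataFrame(list(cur.fetchall()), columns=columns)
stock_data["date"] = pd.to_datetime(stock_data["date"])

# Same grouping as in the CSV version: next week's change within each code.
stock_data["change_weekly"] = stock_data.groupby("code")["change"].shift(-1)
print(stock_data)
conn.close()
```

The last row of each code has no following week, so its change_weekly is NaN.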
