I have a requirement to export a pandas DataFrame into a Teradata temp table.
I usually connect to Teradata with udaExec.
The temp table has to be created on the fly while loading the data, since the DataFrame might have 100 columns today and 200 columns tomorrow; because of this instability in the arriving data, I'm afraid I can't create a DDL up front and then load.
Please suggest an approach.
You can use the pandas DataFrame.to_sql function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html).
You'll need to pass a SQLAlchemy connection and the name of your SQL table,
something like this:
my_pandas_df.to_sql(
    name="my_table_in_teradata",
    con=my_connection_to_teradata,
    if_exists="replace",
)
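For the connection itself, a minimal sketch, assuming the teradatasql driver and the teradatasqlalchemy dialect are installed (the host and credentials below are placeholders):
from sqlalchemy import create_engine

# Hypothetical Teradata host and credentials; replace with your own.
engine = create_engine("teradatasql://user:password@tdhost")

# to_sql infers the column types from the DataFrame, so the table is
# (re)created on the fly no matter how many columns arrive on a given day.
my_pandas_df.to_sql(
    name="my_table_in_teradata",
    con=engine,
    if_exists="replace",  # drop and recreate the table if it already exists
    index=False,
)
Note that this creates an ordinary table whose schema follows the DataFrame; a true Teradata volatile/temporary table would still need its own DDL.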
So I have a large (7GB) dataset stored in postgres that I'm trying to import into Dask. I'm trying the read_sql_table function, but keep getting ArgumentErrors.
My info in postgres is the following:
database is "my_database"
schema is "public"
data table is "table"
username is "fred"
password is "my_pass"
index in postgres is 'idx'
I am trying to get this piece of code to work:
df = dd.read_sql_table('public.table', 'jdbc:postgresql://localhost/my_database?user=fred&password=my_pass', index_col='idx')
Am I formatting something incorrectly?
I was finally able to figure it out by using psycopg2. The answer is below:
df = dd.read_sql_table('table', 'postgresql+psycopg2://postgres:fred@localhost/my_database', index_col='idx')
Additionally, I had to create a different index in the postgres table. The original index needed to be a whole separate column. I did this with the following line in Postgres:
alter table "table" add column idx serial;
I am trying to query data from Snowflake into a Jupyter Notebook. Since some columns were not present in the original table, I created a temporary table with the required new columns. Unfortunately, due to work restrictions, I can't show the whole output here, but when I ran the CREATE TEMPORARY TABLE command I got the following output.
Table CUSTOMER_ACCOUNT_NEW successfully created.
Here is the query I used to make the TEMP table.
CREATE OR REPLACE TEMPORARY TABLE DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW AS
SELECT ID,
VERIFICATION_PROFILE,
get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults')::VARCHAR AS identitymind,
get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults."mm:1"')::VARCHAR AS mm1,
get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults."mm:2"')::VARCHAR AS mm2,
get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults.res')::VARCHAR AS res,
get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults."ss:1"')::VARCHAR AS sanctions,
get_path(VERIFICATION_PROFILE,'autoVerified.facts.account.riskScore')::VARCHAR AS riskscore,
get_path(VERIFICATION_PROFILE,'autoVerified.facts.giact.verificationResponse')::VARCHAR AS GIACT,
get_path(VERIFICATION_PROFILE,'autoVerified.facts.account.type')::VARCHAR AS acct_type,
get_path(VERIFICATION_PROFILE,'autoVerified.verified')::VARCHAR AS verified,
get_path(VERIFICATION_PROFILE,'bankInformationProvided')::VARCHAR AS Bank_info_given,
get_path(VERIFICATION_PROFILE,'businessInformationProvided')::VARCHAR AS Business_info_given,
get_path(VERIFICATION_PROFILE,'autoVerified.facts.account.industry.riskLevel')::VARCHAR AS industry_risk
FROM DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT
WHERE DATEDIFF('day',TO_DATE(TIME_UPDATED),CURRENT_DATE())<=90
I would like to mention that VERIFICATION_PROFILE is a JSON blob, hence I had to use get_path to retrieve the values. Moreover, keys like mm:1 are originally in double quotes, so I used them as-is, and the query works fine in Snowflake.
Then, using the Snowflake connector for Python, I tried to run the following query:
import pandas as pd
import warnings
import snowflake.connector as sf

ctx = sf.connect(
    user='*****',
    password='*****',
    account='*******',
    warehouse='********',
    database='DATA_LAKE',
    schema='CUSTOMER'
)

# create cursor
curs = ctx.cursor()

sqlnew2 = "SELECT * \
FROM DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW;"
curs.execute(sqlnew2)
df = curs.fetch_pandas_all()
Here curs is the cursor object created earlier. I then got the following error:
ProgrammingError: 002003 (42S02): SQL compilation error:
Object 'DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW' does not exist or not authorized.
Does the Snowflake connector allow us to query data from temporary tables or not? Help/advice is greatly appreciated.
Temp tables only live as long as the session in which they were created:
Temporary tables can have a Time Travel retention period of 1 day; however, a temporary table is purged once the session (in which the table was created) ends so the actual retention period is for 24 hours or the remainder of the session, whichever is shorter.
You might want to use a transient table instead:
https://docs.snowflake.com/en/user-guide/tables-temp-transient.html#comparison-of-table-types
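If you do need the temporary table, the CREATE and the SELECT have to run over the same connection, because the table disappears when that session ends. A minimal sketch, reusing the connection details from the question (the CTAS column list is abbreviated):
import snowflake.connector as sf

ctx = sf.connect(
    user='*****',
    password='*****',
    account='*******',
    warehouse='********',
    database='DATA_LAKE',
    schema='CUSTOMER'
)
curs = ctx.cursor()

# Create the temp table and query it on the SAME session; a temporary
# table is invisible to every other session, including a fresh notebook kernel.
curs.execute("""
    CREATE OR REPLACE TEMPORARY TABLE DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW AS
    SELECT ID, VERIFICATION_PROFILE  -- ...rest of the column list from above
    FROM DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT
    WHERE DATEDIFF('day', TO_DATE(TIME_UPDATED), CURRENT_DATE()) <= 90
""")
curs.execute("SELECT * FROM DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW")
df = curs.fetch_pandas_all()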
I read the following data from an API into a pandas dataframe:
Now, I want to write this data into a MySQL DB table, using pandas to_sql:
In MySQL, the column is set up correctly, but the values have not been written:
Then I looked at the dataframe in the debugger:
I thought it might be a formatting issue, and added the following lines:
In the debugger, it now looks fine:
But now, in the database, it wants to write the index column as text
... and interrupts the execution with an error:
Is there a way to get this going, i.e., to write the df index data as a date into a MySQL DB using pandas to_sql together with a SQLAlchemy engine?
Edit:
Table schema:
DataFrame Header:
It seems you are using the Date column as the primary key. I would suggest not using that as the primary key; instead, use Date + Ticker as a composite primary key.
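To actually get the index written as a date, one option is to reset the index and pass an explicit dtype mapping to to_sql. A minimal sketch; the "Date" index name, the "prices" table name, and the connection string are placeholder assumptions, not taken from the original post:
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine

# Hypothetical connection string; adjust driver and credentials to your setup.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# Make sure the index really holds datetimes, then expose it as a named column.
df.index = pd.to_datetime(df.index)
df.index.name = "Date"
out = df.reset_index()

# Pin the column to a DATE type so MySQL does not receive it as text.
out.to_sql(
    "prices",
    con=engine,
    if_exists="append",
    index=False,
    dtype={"Date": sqlalchemy.Date()},
)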
I am working with a Teradata table that has a TIMESTAMP(6) column, with data that looks like this:
2/14/2019 13:09:51.210000
Currently I have a Python time variable that I want to send into the Teradata table via SQL; it looks like this:
from datetime import datetime
time = datetime.now().strftime("%m/%d/%Y %H:%M:%S")
02/14/2019 13:23:24
How can I reformat that so it inserts correctly? It is erroring out with:
teradata.api.DatabaseError: (6760, '[22008] [Teradata][ODBC Teradata Driver][Teradata Database](-6760)Invalid timestamp.')
I tried using the same format the Teradata timestamp column uses:
time = datetime.now().strftime("%mm/%dd/%YYYY %HH24:%MI:%SS")
Same error message.
Thanks
Figured it out. It turned out to be unrelated to the timestamp, and I had to reformat the DataFrame column it was being read from. Changing the data type fixed it:
final_result_set['RECORD_INSERTED'] = pd.to_datetime(final_result_set['RECORD_INSERTED'])
Now, when looping through and inserting via SQL, the following worked fine for populating 'RECORD_INSERTED':
time = datetime.now().strftime("%m/%d/%Y %H:%M:%S")
Sorry for the confusion.
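For reference, a minimal sketch of that flow with the PyTd teradata module; the table name is a placeholder and session is assumed to be an open udaExec session (PyTd accepts qmark-style bind parameters):
import pandas as pd
from datetime import datetime

# Make the DataFrame column hold real datetimes instead of strings.
final_result_set['RECORD_INSERTED'] = pd.to_datetime(
    final_result_set['RECORD_INSERTED'])

time = datetime.now().strftime("%m/%d/%Y %H:%M:%S")

# Bind the value instead of splicing it into the SQL string.
session.execute("INSERT INTO my_table (RECORD_INSERTED) VALUES (?)",
                (time,))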
I'm using the PyTd teradata module to query data from Teradata and want to read it into a pandas DataFrame.
import teradata
import pandas as pd

# teradata connection
udaExec = teradata.UdaExec(appName="Example", version="1.0",
                           logConsole=False)
session = udaExec.connect(method="odbc", system="", username="", password="")

# Create empty dataframe with column names
query = session.execute("SELECT TOP 1 * FROM table")
cols = [str(d[0]) for d in query.description]
df = pd.DataFrame(columns=cols)

# Read data into dataframe
for row in session.execute("SELECT * FROM table"):
    print(type(row))
    df.append(row)
row is of the teradata.util.Row class and can't be appended to the dataframe. I tried converting it to a list, but the format gets messed up.
How can I read my data into a dataframe from Teradata using the teradata module? I'm not able to use the pyodbc module for this.
Is there a better way to create the empty dataframe with column names matching those in the database?
You can use pandas.read_sql :)
import teradata
import pandas as pd

# teradata connection
udaExec = teradata.UdaExec(appName="Example", version="1.0",
                           logConsole=False)
with udaExec.connect(method="odbc", system="", username="", password="") as session:
    query = "SELECT * FROM table"
    df = pd.read_sql(query, session)
Using 'with' ensures the session is closed after the query. I hope that helped :)
I know it's a little late, but putting a note nevertheless.
There are a few questions here.
How can I read my data into a dataframe from Teradata using the teradata module?
At the end of the day, a teradata.util.Row is simply a list, so a simple list operation should help you get things out of a Row.
','.join(str(item) for item in row)
kinda thing.
Pushing that into a pandas dataframe should be a list-to-df conversion exercise, as sketched below.
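For instance, a minimal sketch of that conversion, reusing session and cols from the question ("table" is the placeholder table name from the original post):
import pandas as pd

# Each teradata.util.Row behaves like a list, so collect the rows as
# lists and let pandas attach the column names gathered earlier.
rows = [list(row) for row in session.execute("SELECT * FROM table")]
df = pd.DataFrame(rows, columns=cols)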
I'm not able to use the pyodbc module for this.
I used Teradata's Python module to do LDAP auth. All worked fine. I didn't have this requirement, sorry.
Is there a better way to create the empty dataframe with column names matching those in the database?
I assume, given a table name, you can query to figure out its schema (column names), convert that to a list, and create your pandas df.
I know this is very late, but you can use read_sql() from the pandas module; it returns a pandas dataframe.
Here is the reference:
http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.read_sql.html