Function for loading data from Python to Postgres - python

I am trying to load data from a CSV file, transform it, and load it into Postgres (viewed through Adminer). I have some code which I don't fully understand but which works when I insert two or more columns of data into a table. However, I have a table with just one column I want to insert, and in that case the code gives me 'Series' object has no attribute 'columns'; the traceback points to the line cols = ','.join(list(datafrm.columns)).
How can I account for this so I can load one or more columns without error?
How can I make this code more agnostic?
Any support will be greatly appreciated.
import psycopg2
import psycopg2.extras as extras

# Define function using psycopg2.extras.execute_values() to insert the dataframe.
def execute_values(conn, datafrm, table):
    # Create a list of tuples from the dataframe values
    tpls = [tuple(x) for x in datafrm.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(datafrm.columns))
    # SQL query to execute
    sql = "INSERT INTO %s(%s) VALUES %%s" % (table, cols)
    cursor = conn.cursor()
    try:
        extras.execute_values(cursor, sql, tpls)
        print("Data inserted using execute_values() successfully..")
    except (Exception, psycopg2.DatabaseError) as err:
        # pass exception to function
        show_psycopg2_exception(err)
        cursor.close()

# Connect to the database
conn = connect(conn_params)
conn.autocommit = True
execute_values(conn, datafrm, 'table')
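In this setup the error usually means datafrm is a pandas Series rather than a DataFrame, because selecting a single column with datafrm['col'] returns a Series, which has no .columns attribute. A minimal sketch of one way to keep the helper column-count agnostic (the column name 'amount' is hypothetical):

import pandas as pd

# Select with double brackets so the result stays a one-column DataFrame...
single_col = datafrm[['amount']]
# ...or, if you already have a Series in hand, wrap it back into a DataFrame:
# single_col = datafrm['amount'].to_frame()

execute_values(conn, single_col, 'table')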

Related

Dask Dataframe read_sql_table & to_sql method throwing TypeError - Provided index column is of type "object"

Problem/Context: I am trying to insert a Dask dataframe into a Postgres DB, but with no luck. I have tried 20-30 times, changing parameters and using different methods, and I still get the same errors. This might look similar to other questions, but it isn't.
Error: TypeError: Provided index column is of type "object". If divisions is not provided the index column type must be numeric or datetime. OR: cannot process UnicodeResultProcessor objects
Some functions to help get started:
This will help in creating the table in Postgres:
import psycopg2

def create_table(db, table):
    """
    Function to create the table if it does not exist.
    """
    try:
        conn = psycopg2.connect('postgresql://postgres:root@localhost:5432/dask_ml_table')
    except:
        print("Unable to connect to the database!")
    cur = conn.cursor()
    if table == "one_table":
        try:
            cur.execute("""CREATE TABLE one_table (date date, Sales integer, X_1 integer, X_2 integer, X_3 integer, X_4 integer, X_5 integer);""")
        except Exception as e:
            print(e)
    conn.commit()
    conn.close()
    cur.close()

# Call the function
create_table('dask_ml_table', 'one_table')
This will help in creating the dataframe to insert:
from faker import Faker
from dask.dataframe import from_pandas
import pandas as pd
import random

fake = Faker()

def create_rows_faker(num=1):
    output = [{"X_5": fake.name(),
               "X_4": fake.address(),
               "X_3": fake.email(),
               #"bs": fake.bs(),
               "X_2": fake.city(),
               "X_1": fake.state(),
               "date": fake.date_time(),
               #"paragraph": fake.paragraph(),
               #"Conrad": fake.catch_phrase(),
               "Sales": random.randint(1000, 2000)} for x in range(num)]
    return output

df_faker = pd.DataFrame(create_rows_faker(1000))
ddf = from_pandas(df_faker, npartitions=1)
Action Plan:
Insert the above generated dataframe into the Postgres DB
Fetch all rows back into a Dask dataframe
My Approach:
i) First write
ddf.to_sql('one_table', uri='postgresql://postgres:root@localhost/dask_ml_test',
           if_exists='append', method='multi', index=False)
dataframe.compute()
Error: TypeError: can't pickle sqlalchemy.cprocessors.UnicodeResultProcessor objects
ii) Read
from sqlalchemy import create_engine, MetaData, Table, Column, select
import dask.dataframe as dd

username = 'postgres'
password = 'XXXX'
server = 'localhost'
database = 'dask_ml_table'
connection_string = f'postgresql+psycopg2://{username}:{password}@{server}/{database}'
engine = create_engine(connection_string)

metadata = MetaData()
t = Table('one_table', metadata,
          Column('date'))
sel = select([t]).limit(5).alias('foo')
dd.read_sql_table(sel, connection_string, index_col='date')
Error: TypeError: Provided index column is of type "object". If divisions is not provided the index column type must be numeric or datetime
iii) Second read
ndf = dd.read_sql_table("select * from one_table", "postgresql://postgres:root@localhost:5432/dask_ml_table", 'date', npartitions=1)
Error: Table not found
Please help. I need help with both inserting and reading the data from Dask. I know this might look similar to other questions, but I have tried most of the solutions from Stack Overflow and hardly any of them worked.
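For what it's worth, dask.dataframe.read_sql_table expects a table name (or a SQLAlchemy selectable), not a raw "select * from ..." string, and the index column must be numeric or datetime unless explicit divisions are supplied. A minimal sketch of how the read could look, assuming the date column of one_table really is a date/timestamp type and the credentials above are valid (this is an illustration, not a verified fix):

import dask.dataframe as dd

uri = "postgresql+psycopg2://postgres:root@localhost:5432/dask_ml_table"

# Pass the table name rather than a raw SQL string, and index on the datetime column.
ndf = dd.read_sql_table("one_table", uri, index_col="date", npartitions=1)
print(ndf.head())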

Error when writing dask dataframe to mssql via turbodbc

I have a dask dataframe which has 220 partitions and 7 columns. I imported this file from a bcp file and completed some wrangling in Dask. I then want to write this whole file to MSSQL using turbodbc. I connect to the DB as follows:
mydb = 'TEST'
from turbodbc import connect, make_options
connection = connect(driver="ODBC Driver 17 for SQL Server",
                     server="TEST SERVER",
                     port="1433",
                     database=mydb,
                     uid="sa",
                     pwd="5pITfir3")
I then use a function I found in a Medium article to write to a test table in the DB:
import time

def turbo_write(mydb, df, table):
    """Use turbodbc to insert data into sql."""
    start = time.time()
    # preparing columns
    columns = '('
    columns += ', '.join(df.columns)
    columns += ')'
    # preparing value place holders
    val_place_holder = ['?' for col in df.columns]
    sql_val = '('
    sql_val += ', '.join(val_place_holder)
    sql_val += ')'
    # writing sql query for turbodbc
    sql = f"""
    INSERT INTO {mydb}.dbo.{table} {columns}
    VALUES {sql_val}
    """
    print(sql)
    print(sql_val)
    # writing array of values for turbodbc
    values_df = [df[col].values for col in df.columns]
    print(values_df)
    # cleans the previous head insert
    with connection.cursor() as cursor:
        cursor.execute(f"delete from {mydb}.dbo.{table}")
        connection.commit()
    # inserts data, for real
    with connection.cursor() as cursor:
        # try:
        cursor.executemanycolumns(sql, values_df)
        connection.commit()
        # except Exception:
        #     connection.rollback()
        #     print('something went wrong')
    stop = time.time() - start
    return print(f'finished in {stop} seconds')
This works when I upload a small number of rows, as follows:
turbo_write(mydb, df_train.head(1000), table)
When I try a larger number of rows, it fails:
turbo_write(mydb, df_train.head(10000), table)
I get the error:
RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)
How do I write the whole dask dataframe to MSSQL without any errors?
I needed to convert to a maskedarray by changing:
# writing array of values for turbodbc
values_df = [df[col].values for col in df.columns]
to
values_df = [np.ma.MaskedArray(df[col].values, pd.isnull(df[col].values)) for col in df.columns]
I can then write all the data using:
for i in range(df_train.npartitions):
    partition = df_train.get_partition(i)
    turbo_write(mydb, partition, table)
    i += 1  # redundant: the for loop already advances i
This still takes a long time to write compared with saving the file and loading it into the DB using BCP. If anyone has any more efficient suggestions, I would love to see them.
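One tweak that might help when looping over partitions (an untested sketch, not something verified against this setup) is to materialise each Dask partition as a pandas DataFrame before handing it to turbo_write, so that df[col].values inside the function is a plain NumPy array rather than a lazy Dask array. Note that turbo_write as written deletes the table's existing rows first, so that delete would need to move outside the loop when writing multiple partitions:

for i in range(df_train.npartitions):
    # .compute() turns the lazy Dask partition into an in-memory pandas DataFrame
    partition = df_train.get_partition(i).compute()
    turbo_write(mydb, partition, table)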

How to interact with Python-Mysql

I have written the following code, and I would like to ask the user how many new records they want and then fill those records column by column.
import MySQLdb
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="Adam!977",
    database="testdb1"
)

cur = mydb.cursor()
get_tables_statement = """SHOW TABLES"""
cur.execute(get_tables_statement)
tables = cur.fetchall()
table = tables(gene)
x = input("How many records you desire: ")
x
print "Please enter the data you would like to insert into table %s" % (table)
columns = []
values = []
for j in xrange(0, len(gene)):
    column = gene[j][0]
    value = raw_input("Value to insert for column '%s'?" % (gene[j][0]))
    columns.append(str(column))
    values.append('"' + str(value) + '"')
columns = ','.join(columns)
values = ','.join(values)
print columns
print values
The error that I get is about the table gene (the table exists in the SQL database):
Traceback (most recent call last):
File "C:\Users\Admin\Desktop\π.py", line 25, in
table = tables(gene)
NameError: name 'gene' is not defined
Also, I don't even know whether the code works properly. Please, I need help. Thank you.
The error returned by Python is due to the variable gene never being defined. In the following line you reference gene without it existing:
table = tables(gene)
In the documentation for the Python MySQL connector, under cursor.fetchall(), you'll notice that this method returns either a list of tuples or an empty list. It is therefore somewhat puzzling why you call tables as a function and attempt to pass a parameter to it - this is not correct syntax for accessing a list or a tuple.
At the beginning of your code example you fetch a list of all of the tables in your database, despite knowing that you only want to update a specific table. It would make more sense to simply reference the name of the table in your SQL query, rather than querying all of the tables that exist and then selecting one in Python. For example, the following query would give you 10 records from the table 'gene':
SELECT * FROM gene LIMIT 10
Below is an attempt to correct your code:
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="Adam!977",
    database="testdb1"
)

x = input("How many records you desire: ")
cur = mydb.cursor()
get_rows_statement = """SELECT * FROM gene"""
cur.execute(get_rows_statement)
results = cur.fetchall()
This should give you all of the rows within the table.
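To then collect and insert the user-supplied records, one option (a sketch, assuming mysql.connector throughout and that the column names of gene are taken from a DESCRIBE query) is to build a parameterized INSERT instead of concatenating quoted strings:

# Fetch the column names of the gene table
cur.execute("DESCRIBE gene")
columns = [row[0] for row in cur.fetchall()]

# Prompt for each requested record and insert it with placeholders
for _ in range(int(x)):
    values = [input("Value to insert for column '%s'? " % col) for col in columns]
    placeholders = ', '.join(['%s'] * len(columns))
    sql = "INSERT INTO gene (%s) VALUES (%s)" % (', '.join(columns), placeholders)
    cur.execute(sql, values)

mydb.commit()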

How to insert a dictionary into a PostgreSQL table with Psycopg2

How do I insert a Python dictionary into a PostgreSQL table? I keep getting the following error, so my query is not formatted correctly:
Error syntax error at or near "To" LINE 1: INSERT INTO bill_summary VALUES(To designate the facility of...
import psycopg2
import json
import psycopg2.extras
import sys

with open('data.json', 'r') as f:
    data = json.load(f)

con = None
try:
    con = psycopg2.connect(database='sanctionsdb', user='dbuser')
    cur = con.cursor(cursor_factory=psycopg2.extras.DictCursor)
    cur.execute("CREATE TABLE bill_summary(title VARCHAR PRIMARY KEY, summary_text VARCHAR, action_date VARCHAR, action_desc VARCHAR)")
    for d in data:
        action_date = d['action-date']
        title = d['title']
        summary_text = d['summary-text']
        action_date = d['action-date']
        action_desc = d['action-desc']
        q = "INSERT INTO bill_summary VALUES(" + str(title) + str(summary_text) + str(action_date) + str(action_desc) + ")"
        cur.execute(q)
    con.commit()
except psycopg2.DatabaseError, e:
    if con:
        con.rollback()
    print 'Error %s' % e
    sys.exit(1)
finally:
    if con:
        con.close()
You should use the dictionary as the second parameter to cursor.execute(). See the example code after this statement in the documentation:
Named arguments are supported too using %(name)s placeholders in the query and specifying the values into a mapping.
So your code may be as simple as this:
with open('data.json', 'r') as f:
    data = json.load(f)

print(data)
""" above prints something like this:
{'title': 'the first action', 'summary-text': 'some summary', 'action-date': '2018-08-08', 'action-desc': 'action description'}

use the json keys as named parameters:
"""
cur = con.cursor()
q = "INSERT INTO bill_summary VALUES(%(title)s, %(summary-text)s, %(action-date)s, %(action-desc)s)"
cur.execute(q, data)
con.commit()
Note also this warning (from the same page of the documentation):
Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
q = "INSERT INTO bill_summary VALUES(" +str(title)+str(summary_text)+str(action_date)+str(action_desc)+")"
You're writing your query the wrong way, by concatenating the values; they should instead be comma-separated elements, like this:
q = "INSERT INTO bill_summary VALUES({0},{1},{2},{3})".format(str(title), str(summary_text), str(action_date), str(action_desc))
Since you're not specifying the column names, I assume they are in the same order as the values written in your insert query. There are basically two ways of writing an insert query in PostgreSQL. One is by specifying the column names and their corresponding values, like this:
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);
The other way is to omit the column names, which you can do if you are adding values for all the columns of the table; just make sure the values are in the same order as the columns in the table. This is the form you have used in your query, like this:
INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
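For completeness, both forms can be combined with the placeholder style recommended in the warning above; a brief sketch using the bill_summary table and the variables from the question:

# Columns named explicitly; values passed as parameters, not interpolated
q1 = ("INSERT INTO bill_summary (title, summary_text, action_date, action_desc) "
      "VALUES (%s, %s, %s, %s)")
cur.execute(q1, (title, summary_text, action_date, action_desc))

# Columns omitted; the values must follow the table's column order
q2 = "INSERT INTO bill_summary VALUES (%s, %s, %s, %s)"
cur.execute(q2, (title, summary_text, action_date, action_desc))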

How to get table column-name/header for SQL query in python

I have data in a pandas dataframe which I am storing in an SQLite database using Python. When I query the tables inside it, I get the results but without the column names. Can someone please guide me?
sql_query = """Select date(report_date), insertion_order_id, sum(impressions), sum(clicks), (sum(clicks)+0.0)/sum(impressions)*100 as CTR
               from RawDailySummaries
               Group By report_date, insertion_order_id
               Having report_date like '2014-08-12%' """
cursor.execute(sql_query)
query1 = cursor.fetchall()
for i in query1:
    print i
Below is the output that I get
(u'2014-08-12', 10187, 2024, 8, 0.3952569169960474)
(u'2014-08-12', 12419, 15054, 176, 1.1691244851866613)
What do I need to do to display the results in a tabular form with column names?
In DB-API 2.0 compliant clients, cursor.description is a sequence of 7-item sequences of the form (<name>, <type_code>, <display_size>, <internal_size>, <precision>, <scale>, <null_ok>), one for each column, as described in PEP 249. Note that description will be None for statements that do not return rows.
If you want to create a list of the column names, you can use list comprehension like this: column_names = [i[0] for i in cursor.description] then do with them whatever you'd like.
Alternatively, you can set the row_factory parameter of the connection object to something that provides column names with the results. An example of a dictionary-based row factory for SQLite is found here, and you can see a discussion of the sqlite3.Row type below that.
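As an illustration of the row_factory approach for SQLite (a sketch using the built-in sqlite3.Row type; the database file name is hypothetical):

import sqlite3

conn = sqlite3.connect("reports.db")   # hypothetical database file
conn.row_factory = sqlite3.Row         # rows now expose their column names

cur = conn.execute("SELECT * FROM RawDailySummaries LIMIT 1")
row = cur.fetchone()
print(row.keys())                      # list of column names
print(row["insertion_order_id"])       # access a value by column name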
Step 1: Select your database driver/engine, e.g. pyodbc, SQLAlchemy, etc.
Step 2: Establish connection
cursor = connection.cursor()
Step 3: Execute SQL statement
cursor.execute("Select * from db.table where condition=1")
Step 4: Extract Header from connection variable description
headers = [i[0] for i in cursor.description]
print(headers)
Try pandas .read_sql(); I can't check it right now, but it should be something like:
pd.read_sql(Q, connection)
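A slightly fuller sketch of that idea, assuming connection is the underlying SQLite connection object and sql_query is the query from the question:

import pandas as pd

# read_sql returns a DataFrame whose columns carry the query's column names
df = pd.read_sql(sql_query, connection)
print(df.columns.tolist())
print(df.head())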
Here is sample code using cx_Oracle that should do what is expected:
import cx_Oracle

def test_oracle():
    connection = cx_Oracle.connect('user', 'password', 'tns')
    try:
        cursor = connection.cursor()
        cursor.execute('SELECT day_no, area_code, start_date from dic.b_td_m_area where rownum < 10')
        # only print head
        title = [i[0] for i in cursor.description]
        print(title)
        # column info
        for x in cursor.description:
            print(x)
    finally:
        cursor.close()

if __name__ == "__main__":
    test_oracle()
