Setting Data Frame Column Names with Data Frame includes extra characters: ('ColumnName',) - python

I've got a Python script set up to pull data and column names from a Pervasive PSQL database, and it then creates the table and records in MS SQL. I'm creating data frames for the data and for the column names, then renaming the data's columns from the column name data frame.
However, when the table is created in MS SQL the column names come through as ('ColumnName',).
The desired column names should not include the ('',) wrapper and should simply read as ColumnName.
Below is the code I'm using to get here. Any help on formatting the column names so they don't include those extra characters would be very helpful!
'''
Start - Pull Table Data
'''
conn = pyodbc.connect(conn_str, autocommit=True)
cursor = conn.cursor()
statement = 'select * from '+db_tbl_name
stRows = cursor.execute(statement)
df_data = pandas.DataFrame((tuple(t) for t in stRows))
df_data = df_data.applymap(str)
'''
End - Pull Table Data
'''
'''
Start - Pull Column Names
'''
conn = pyodbc.connect(conn_str, autocommit=True)
cursor_col = conn.cursor()
statement_col = "select CustomColumnName from "+db_col_tbl_name+" where CustomTableName = '"+db_tbl_name+"' and ODBCOptions > 0 order by FieldNumber"
stRows_col = cursor_col.execute(statement_col)
df_col = pandas.DataFrame((tuple(t) for t in stRows_col))
'''
End - Pull Column Names
'''
'''
Start - Add Column Names to Table data (df_data)
'''
df_data.columns = df_col
'''
End - Add Column Names to Table data (df_data)
'''
'''
Start - Create a sqlalchemy engine
'''
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
"SERVER=Server;"
"DATABASE=DB;"
"UID=UID;"
"PWD=PWD;")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
'''
End - Create a sqlalchemy engine
'''
'''
Start - Create sql table and rows from sqlalchemy engine
'''
df_data.to_sql(name=db_tbl_name, con=engine, if_exists='replace')
'''
End - Create sql table and rows from sqlalchemy engine
'''

This worked for me and should resolve the issue. Change:
df_col = pandas.DataFrame((tuple(t) for t in stRows_col))
to
df_col = []
for row in stRows_col:
    df_col.append(row[0])
pyodbc moves the data it captures into pyodbc objects: type(stRows_col) yields <class 'pyodbc.Cursor'>, and each row is a <class 'pyodbc.Row'>. You can get the underlying value of a pyodbc.Row by using row[0].
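For context, here is a minimal sketch of how that fix slots into the original script; it assumes the same cursor_col, statement_col, and df_data built above.
stRows_col = cursor_col.execute(statement_col)

# Collect the plain string values instead of wrapping the rows in a DataFrame.
df_col = []
for row in stRows_col:
    df_col.append(row[0])  # row[0] is the CustomColumnName string itself

# Assigning a flat list of strings gives clean column names (no ('Name',) wrapper).
df_data.columns = df_col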

Related

How to insert data frame result into one specific column in python?

I have a data frame result which I need to insert into an existing table line by line. How do I insert the result into a specific column named imgtext?
Table structure (screenshot not reproduced here).
In SQL I know I can write the query as:
INSERT INTO tableName(imgtext) VALUES('Learn MySQL INSERT Statement');
Python script:
This code takes some values from a CSV and returns some data with the help of BeautifulSoup; the resulting data is saved to a CSV.
Problem:
Now, rather than saving to a CSV, how do I insert the resulting data into the SQL table, in the specific column named imgtext?
Seeking solution:
How do I process the CSV data using a data frame so that the result is inserted into SQL rather than a CSV?
import io
import pathlib

import pandas as pd
import requests
import pytesseract as pt
from bs4 import BeautifulSoup
from PIL import Image

img_text_list = []
df1 = pd.DataFrame(columns=['imgtext'])
img_formats = [".jpg", ".jpeg"]

df = pd.read_csv("urls.csv")
urls = df["urls"].tolist()

for y in urls:
    response = requests.get(y)
    soup = BeautifulSoup(response.text, 'html.parser')
    img_tags = soup.find_all('img', class_='pick')
    img_srcs = ["https://myimpact.in/" + img['src'].replace('\\', '/')
                if img.has_attr('src') else '-' for img in img_tags]
    for count, x in enumerate(img_srcs):
        if x != '-':
            if pathlib.Path(x).suffix in img_formats:
                response = requests.get(x)
                img = Image.open(io.BytesIO(response.content))
                text = pt.image_to_string(img, lang="hin")
                # how to insert this text value into sql table column name - imgtext
                img_text_list.append(text)

# use the imgtext column defined above
df1['imgtext'] = img_text_list
df1.to_csv('data.csv', encoding='utf-8')
To add values from the CSV to your SQL table you will need to use a Python SQL driver (pyodbc). Please see the sample code below for connecting Python to SQL.
sample code:
import pyodbc
import pandas as pd
server = 'yourservername'
database = 'yourdatabasename'
username = 'username'
password = 'yourpassword'
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
cursor = cnxn.cursor()

# Insert Dataframe into SQL Server:
for index, row in df.iterrows():
    cursor.execute("Insert your QUERY here")
cnxn.commit()
cursor.close()
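For the question's specific case, the placeholder query could look something like the sketch below; your_table is a hypothetical table name, and df1 is the data frame built in the question.
# Hypothetical sketch: insert each OCR result into the imgtext column.
for index, row in df1.iterrows():
    cursor.execute("INSERT INTO your_table (imgtext) VALUES (?)", row['imgtext'])
cnxn.commit()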
Prerequisite:
Please install the pyodbc package: https://mkleehammer.github.io/pyodbc/
Reference:
https://learn.microsoft.com/en-us/sql/machine-learning/data-exploration/python-dataframe-sql-server?view=sql-server-ver16

Combining external temp table from cloud storage with pre-existing BigQuery table - append from Python

I have a permanent table in BigQuery that I want to append to with data coming from a CSV in Google Cloud Storage. I first read the CSV file into a BigQuery temp table:
table_id = "incremental_custs"
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = [
    "gs://location/to/csv/customers_5083983446185_test.csv"
]
external_config.schema = schema
external_config.options.skip_leading_rows = 1
job_config = bigquery.QueryJobConfig(table_definitions={table_id: external_config})
sql_test = "SELECT * FROM `{table_id}`;".format(table_id=table_id)
query_job = bq_client.query(sql_test, job_config=job_config)
customer_updates = query_job.result()
print(customer_updates.total_rows)
Up to here everything works and I retrieve the records from the temp table. The issue arises when I try to then combine it with a permanent table:
sql = """
create table `{project_id}.{dataset}.{table_new}` as (
select customer_id, email, accepts_marketing, first_name, last_name,phone,updated_at,orders_count,state,
total_spent,last_order_name,tags,ll_email,points_approved,points_spent,guest,enrolled_at,ll_updated_at,referral_id,
referred_by,referral_url,loyalty_tier_membership,insights_segment,rewards_claimed
from (
select * from `{project_id}.{dataset}.{old_table}`
union all
select * from `{table_id}`
ORDER BY customer_id, orders_count DESC
))
order by orders_count desc
""".format(project_id=project_id, dataset=dataset_id, table_new=table_new, old_table=old_table, table_id=table_id)
query_job = bq_client.query(sql)
query_result = query_job.result()
I get the following error:
BadRequest: 400 Table name "incremental_custs" missing dataset while no default dataset is set in the request.
Am I missing something here? Thanks !
Arf, you forgot the external config! You don't pass it in your second script:
query_job = bq_client.query(sql)
Simply update it like in the first one:
query_job = bq_client.query(sql_test, job_config=job_config)
A fresh look is always easier!
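Concretely, the second query just needs the same table_definitions job config as the first one; a minimal sketch, reusing the job_config built earlier in the question:
# Pass the external table definition so `incremental_custs` resolves in this query too.
query_job = bq_client.query(sql, job_config=job_config)
query_result = query_job.result()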

SQL query not running in Python

I have the following Python code:
import pandas as pd
from sqlalchemy import create_engine
import mysql.connector
# Give the location of the file
loc = ("C:\\Users\\27826\\Desktop\\11Sixteen\\Models and Reports\\Historical results files\\EPL 1993-94.csv")
df = pd.read_csv(loc)
# Remove empty columns then rows
df = df.dropna(axis=1, how='all')
df = df.dropna(axis=0, how='all')
# Create DataFrame and then import to db (new game results table)
engine = create_engine("mysql://root:xxx@localhost/11sixteen")
df.to_sql('new_game_results', con=engine, if_exists="replace")
# Move from new games results table to game results table
db = mysql.connector.connect(host="localhost",
                             user="root",
                             passwd="xxx",
                             database="11sixteen")
my_cursor = db.cursor()
my_cursor.execute("INSERT INTO 11sixteen.game_results "
                  "SELECT * FROM 11sixteen.new_game_results WHERE "
                  "NOT EXISTS (SELECT date, HomeTeam "
                  "FROM 11sixteen.game_results WHERE "
                  "11sixteen.game_results.date = 11sixteen.new_game_results.date AND "
                  "11sixteen.game_results.HomeTeam = 11sixteen.new_game_results.HomeTeam)")
print("complete")
Basically, the objective is to copy data from several Excel files to a SQL table (one at a time) and then transfer it from there to the fuller table where ALL the data will be aggregated (hopefully without duplicates).
Everything works 100% except the SQL query below:
INSERT INTO 11sixteen.game_results
SELECT * FROM 11sixteen.new_game_results
WHERE NOT EXISTS ( SELECT date, HomeTeam
FROM 11sixteen.game_results WHERE
11sixteen.game_results.date = 11sixteen.new_game_results.date AND
11sixteen.game_results.HomeTeam = 11sixteen.new_game_results.HomeTeam)
If I run the same query in MySQL Workbench it works perfectly. Any ideas why I can't get Python to execute the query as expected?
Add a commit at the end:
db.commit()
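In other words, in the script above the commit goes right after the execute call; a minimal sketch:
my_cursor.execute("INSERT INTO 11sixteen.game_results ...")  # the existing INSERT ... WHERE NOT EXISTS query
db.commit()  # mysql.connector does not autocommit by default, so without a commit the insert is discarded
print("complete")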

Sqlalchemy: add into mysql table new rows from pandas dataframe, if they don't exist already in the table

I created a table by inserting data fetched from an API and stored in a pandas dataframe, using sqlalchemy.
I will need to query the API every 4 hours to get new data.
The problem is that the API gives me back not only the new data but also the old data that has already been imported into MySQL.
How can I import just the new data into the MySQL table?
So far I have retrieved the data from the API, stored it in a pandas object, created the connection to the MySQL db and created a fresh new table.
import requests
import json
import pandas as pd
from pandas.io.json import json_normalize

myToken = 'xxx'
myUrl = 'somewebsite'
head = {'Authorization': 'token {}'.format(myToken)}
response = requests.get(myUrl, headers=head)
data = response.json()
#print(data.dumps(data, indent=4, sort_keys=True))
results = json_normalize(data['results'])
results.rename(columns={'datastream.name': 'datastream_name',
                        'datastream.url': 'datastream_url',
                        'datastream.datastream_type_id': 'datastream_id',
                        'start': 'error_date'}, inplace=True)
results_final = pd.DataFrame([results.datastream_name,
                              results.datastream_url,
                              results.error_date,
                              results.datastream_id,
                              results.message,
                              results.type_label]).transpose()

from sqlalchemy import create_engine
from sqlalchemy import exc

engine = create_engine('mysql://usr:psw@ip/schema')
con = engine.connect()
results_final.to_sql(name='error', con=con, if_exists='replace')
con.close()
End goal is to insert into the table, just the not existing data coming from the api
You could pull the results already in the database into a new dataframe and then compare the two dataframes. After that you would only insert the rows not in the table. Not knowing the format of your table or data I'm just using a generic SELECT statement here.
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import exc

engine = create_engine('mysql://usr:psw@ip/schema')
con = engine.connect()

sql = "SELECT * FROM table_name"
old_results = pd.read_sql(sql, con)

df = pd.merge(old_results, results_final, how='outer', indicator=True)
new_results = df[df['_merge'] == 'right_only'][results_final.columns]

new_results.to_sql(name='error', con=con, if_exists='append')
con.close()
You also need to change if_exists to append because, when set to replace, it drops all values in the table and replaces them with the values in the pandas dataframe.
I developed this function to handle both cases: new values, and when the columns of the source table and the target table are not equal.
def load_data(df):
    engine = create_engine('mysql+pymysql://root:pass@localhost/dw', echo_pool=True, pool_size=10, max_overflow=20)
    with engine.connect() as conn, conn.begin():
        try:
            df_old = pd.read_sql('SELECT * FROM table', conn)
            # Check if there are new rows to be inserted
            if len(df) > len(df_old) or df.disconnected_time.max() > df_old.disconnected_time.max():
                print("There are new rows to be inserted.")
                df_merged = pd.merge(df_old, df, how='outer', indicator=True)
                df_final = df_merged[df_merged['_merge'] == 'right_only'][df.columns]
                df_final.to_sql(name='table', con=conn, index=False, if_exists='append')
        except Exception as err:
            print(str(err))
        else:
            # This handles the case where the source and target column counts are not equal
            if df.shape[1] > df_old.shape[1]:
                data = pd.read_sql('SELECT * FROM table', conn)
                df2 = pd.concat([df, data])
                df2.to_sql('table', conn, index=False, if_exists='replace')
        outcome = conn.execute("select count(1) from table")
        countRow = outcome.first()[0]
        return print(f"Total of {countRow} rows loaded.")
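A hypothetical usage, assuming results_final is the dataframe built earlier and the same imports are in scope:
# Insert only the rows that are not already present in the MySQL table.
load_data(results_final)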

Postgresql: how to copy a column from a table to another using psycopg2?

I am using psycopg2 in python to manage my database.
I created two tables. One is called data and I want to populate it with some data. To do so, I created a temporary table called temporarytable from a .csv file.
I want to copy the column numberofinhabitants. So what I am doing is the following:
### Add Column
import psycopg2 as ps
stat1 = """ ALTER TABLE data ADD COLUMN numberofinhabitants integer"""
con = ps.connect(dbname = 'mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()
cur.execute(stat1)
cur.close()
con.close()
### Copy Column
stat2 = """INSERT INTO data (numberofinhabitants) SELECT numberofinhabitants FROM temporarytable"""
con = ps.connect(dbname = 'mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()
cur.execute(stat2)
cur.close()
con.close()
but I get the following error
ProgrammingError: column "numberofinhabitants" does not exist
LINE 1: INSERT INTO data (numberofinhabitants) SELECT numberofinhabi...
^
HINT: There is a column named "numberofinhabitants" in table "data", but it cannot be referenced from this part of the query.
Below is a screenshot from pgAdmin3 after SELECT * FROM temporarytable (screenshot not reproduced here).
I think the problem is that quoted column names in PostgreSQL are case-sensitive, while unquoted names are folded to lowercase. You should try this as stat2:
stat2 = """INSERT INTO data (numberofinhabitants) SELECT "numberOfInhabitants" FROM temporarytable"""
Note that you need to use double quotes (") around columns with uppercase characters in their names.
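If you are unsure of the exact stored name, you can check how the column was actually created in the catalog; a small sketch with psycopg2, assuming the same connection parameters as above:
import psycopg2 as ps

con = ps.connect(dbname='mydb', user='postgres', host='localhost', password='mypd')
cur = con.cursor()
# information_schema preserves the case the column was created with.
cur.execute("""SELECT column_name FROM information_schema.columns
               WHERE table_name = 'temporarytable'""")
print(cur.fetchall())
cur.close()
con.close()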
