I have thousands of related CSVs and I want to write their contents to a Postgres table in a way that includes metadata about where each row came from.
I am not clear on how to write the variables I created near the top of my script into the table.
Can anyone advise?
target_directory = Path(sys.argv[1]).resolve()
# FOR THE WAC AND RAC DATASETS
for file in target_directory.rglob('*.csv'):
    print(str(file.stem).split('_'))
    state = str(file.stem).split('_')[0]
    data_category = str(file.stem).split('_')[1]
    workforce_segment = str(file.stem).split('_')[2] # THIS IS DIFFERENT FROM THE O-D DATASETS
    job_type = str(file.stem).split('_')[3]
    year = str(file.stem).split('_')[4]
    print('Writing: ' + str(file.name))
    # MAKE SURE THIS IS THE RIGHT TABLE FOR THE FILES
    cur.execute(create_table_WAC)
    with open(file,'r') as file_in:
        # INSERT THE DATA IN USING THE COLUMN NAMES....SO YOU CAN ADD YOUR SPLIT STRING INFO ABOVE.....
        # MAKE SURE THIS HAS THE RIGHT TABLE NAME IN THE COPY STATEMENT
        cur.execute("INSERT INTO opendata_uscensus_usa_lodes_wac (serial_id, state_name, data_category, workforce_segment, job_type, year, w_geocode, C000, CA01, CA02, CA03, CE01, CE02) \
        VALUES (%s, state_name, data_category, workforce_segment, job_type, year, %s, %s, %s, %s, %s, %s)")
conn.commit()
conn.close()
As per PEP 249 (the Python Database API Specification), which most DB-APIs adhere to, including pymssql, cx_Oracle, ibm_db, pymysql, sqlite3 and pyodbc, the variables to be bound as parameters of a parameterized query go into the second argument of cur.execute(query, params). psycopg2 follows the same convention.
Specifically, combine your file-level variables with the CSV values during iteration and pass them as a list or tuple of parameters to the execute call. The code below uses csv.DictReader, which yields a dictionary for every row of the CSV data.
NOTE: the query below leaves out the primary key, serial_id, which should be populated by a sequence (for example a serial or identity column) in the Postgres table.
for file in target_directory.rglob('*.csv'):
    print(str(file.stem).split('_'))
    # FILE LEVEL VARIABLES
    state_name = str(file.stem).split('_')[0]
    data_category = str(file.stem).split('_')[1]
    workforce_segment = str(file.stem).split('_')[2]
    job_type = str(file.stem).split('_')[3]
    year = str(file.stem).split('_')[4]
    # PREPARED STATEMENT
    sql = """INSERT INTO opendata_uscensus_usa_lodes_wac
             (state_name, data_category, workforce_segment,
              job_type, year, w_geocode, C000, CA01, CA02, CA03, CE01, CE02)
             VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""
    # newline='' is what the csv module recommends for files it reads
    with open(file, 'r', newline='') as file_in:
        # ITERATE THROUGH FOR CSV VARIABLES
        reader = csv.DictReader(file_in)
        for row in reader:
            cur.execute(sql, (state_name, data_category, workforce_segment, job_type, year,
                              row['w_geocode'], row['C000'], row['CA01'],
                              row['CA02'], row['CA03'], row['CE01'], row['CE02'])
            )
conn.commit()
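Another option is to parse the file name once per file and yield one parameter tuple per CSV row, which keeps the metadata logic separate from the database calls and makes it testable without Postgres. This is a minimal sketch. It assumes every file name has at least five underscore-separated parts (state, category, segment, job type, year, as in a name like ca_wac_S000_JT00_2015.csv) and that the CSVs have the header columns used above. The helper names `file_metadata` and `rows_with_metadata` are hypothetical.

```python
import csv
from pathlib import Path

# CSV columns copied into the table after the file-level metadata.
CSV_COLUMNS = ["w_geocode", "C000", "CA01", "CA02", "CA03", "CE01", "CE02"]


def file_metadata(file_name):
    """Return (state_name, data_category, workforce_segment, job_type, year)
    parsed from a file name such as 'ca_wac_S000_JT00_2015.csv'."""
    parts = Path(file_name).stem.split("_")
    if len(parts) < 5:
        raise ValueError(f"unexpected file name: {file_name}")
    return tuple(parts[:5])


def rows_with_metadata(file_name, lines):
    """Yield one parameter tuple per CSV row: file metadata + CSV values.

    `lines` is any iterable of CSV text lines, e.g. a file opened with newline=''.
    """
    metadata = file_metadata(file_name)
    for row in csv.DictReader(lines):
        yield metadata + tuple(row[col] for col in CSV_COLUMNS)
```

Inside the directory loop this becomes `cur.executemany(sql, list(rows_with_metadata(file.name, file_in)))` with the same 12-placeholder `sql` as above. For thousands of files, `psycopg2.extras.execute_values` or `COPY` is usually much faster than one round trip per row.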
Related
I am trying to format an INSERT query string for multiple rows that also uses ON CONFLICT, and I got all mixed up with the formatting arguments.
ses_crud_sql = """INSERT INTO session_crud(orgid, appid, sessionid, userid, customlabel, uploadedon)
VALUES (%s, %s, %s, %s, %s, %s)
ON CONFLICT (orgid, appid)
DO UPDATE SET customlabel = array_append(customlabel, '%(label_name)s') WHERE sessionid=sessionid
"""
ses_crud_rows = [(org_id, app_id, sessionid, userid, str({label_name}), datetime.strftime(current_time, '%Y-%m-%d %H:%M:%S'))
for sessionid in session_ids]
cursor.executemany(ses_crud_sql, ses_crud_rows)
I need to insert multiple rows, one for every session in the session_ids list.
I also want to use %(label_name)s, but this gives me:
psycopg2.ProgrammingError: argument formats can't be mixed
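psycopg2 raises this error when positional `%s` and named `%(name)s` placeholders appear in the same query. One way out is to make every placeholder named and pass one dict per row. This is a sketch under several assumptions. It assumes `customlabel` is a `text[]` column. It relies on psycopg2 adapting a Python list to `ARRAY[...]` and a `datetime` to a timestamp. The quotes around `%(label_name)s` are dropped because the driver quotes values itself. The original `WHERE sessionid=sessionid` compares a column with itself, so it is left out. The helper names are hypothetical.

```python
import re

# Every placeholder is named, so psycopg2 never sees mixed formats.
SES_CRUD_SQL = """
INSERT INTO session_crud (orgid, appid, sessionid, userid, customlabel, uploadedon)
VALUES (%(org_id)s, %(app_id)s, %(session_id)s, %(user_id)s,
        %(custom_label)s, %(uploaded_on)s)
ON CONFLICT (orgid, appid)
DO UPDATE SET customlabel = array_append(session_crud.customlabel, %(label_name)s)
"""


def build_session_rows(org_id, app_id, user_id, label_name, session_ids, uploaded_on):
    """One parameter dict per session; each key matches a %(name)s placeholder."""
    return [
        {
            "org_id": org_id,
            "app_id": app_id,
            "session_id": session_id,
            "user_id": user_id,
            "custom_label": [label_name],  # psycopg2 adapts a list to ARRAY[...]
            "label_name": label_name,      # bound unquoted; the driver quotes it
            "uploaded_on": uploaded_on,    # a datetime can be passed directly
        }
        for session_id in session_ids
    ]


def placeholder_names(sql):
    """Names used in %(name)s placeholders, for a quick consistency check."""
    return set(re.findall(r"%\((\w+)\)s", sql))
```

With this, the call becomes `cursor.executemany(SES_CRUD_SQL, build_session_rows(org_id, app_id, userid, label_name, session_ids, current_time))`.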
I'm trying to import some CSV files into a table in a MySQL database. The CSV files are updated daily, and my intention is to use this Python program to automate the process.
The idea is: if the information already exists, I want to update it. If the information doesn't exist, I want to insert the data.
But I'm getting this error:
AttributeError
'DictCursor' object has no attribute 'update'
Thanks in advance.
csv_data = csv.reader(open('ATEG_REGIONAL_MG_DADOS_TB_ATIVIDADE.csv', encoding='ISO-8859-15'), delimiter=';')
next(csv_data)
for row in csv_data:
    for i, l in enumerate(row):
        if row[i] == '':
            row[i] = None
    cursor.execute('SELECT * FROM atividade WHERE CD_ATIVIDADE=%s', row[0])
    if cursor.fetchall():
        cursor.update('UPDATE atividade WHERE CD_ATIVIDADE = row[0]'),
    else:
        cursor.execute('INSERT INTO atividade (CD_ATIVIDADE, NM_ATIVIDADE, ST_ATIVO, COD_USUARIO_INCLUSAO, COD_USUARIO_ALTERACAO, DAT_INCLUSAO, DAT_ALTERACAO, CO_ATIVIDADE_REZOLVE, ROWID, FLG_SAFRA, FLG_PRODUTIVO, FLG_TIPO_ATIVIDADE, FLG_INDICADOR_ISA) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', row)
# close the connection to the database.
db.commit()
cursor.close()
print("Imported!")
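Cursor objects have no update() method. This is true for psycopg2 and for the MySQL drivers alike. Run the UPDATE with cursor.execute() instead.
Also, row[0] is written literally inside your query string, so the database never sees its value, and an UPDATE also needs a SET clause. Pass the values as parameters, for example:
cursor.execute('UPDATE atividade SET NM_ATIVIDADE = %s WHERE CD_ATIVIDADE = %s', (row[1], row[0]))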
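MySQL can also do the insert-or-update in one statement with INSERT ... ON DUPLICATE KEY UPDATE, which removes the SELECT and the if/else entirely. The sketch below builds that statement from the column list in the question. It assumes CD_ATIVIDADE is the primary key (or has a unique index) and that the column names are trusted constants, never user input. `build_upsert` is a hypothetical helper. MySQL 8.0.20 and later deprecate VALUES() in this clause in favour of a row alias, but it still works.

```python
# Column order must match the order of the values in each CSV row.
ATIVIDADE_COLUMNS = [
    "CD_ATIVIDADE", "NM_ATIVIDADE", "ST_ATIVO", "COD_USUARIO_INCLUSAO",
    "COD_USUARIO_ALTERACAO", "DAT_INCLUSAO", "DAT_ALTERACAO",
    "CO_ATIVIDADE_REZOLVE", "ROWID", "FLG_SAFRA", "FLG_PRODUTIVO",
    "FLG_TIPO_ATIVIDADE", "FLG_INDICADOR_ISA",
]


def build_upsert(table, columns, key):
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE with %s placeholders.

    Every column except the key is overwritten with the incoming value when
    a row with the same key already exists.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{col} = VALUES({col})" for col in columns if col != key)
    return (f"INSERT INTO {table} ({column_list}) VALUES ({placeholders}) "
            f"ON DUPLICATE KEY UPDATE {updates}")


UPSERT_SQL = build_upsert("atividade", ATIVIDADE_COLUMNS, "CD_ATIVIDADE")
```

Inside the loop, `cursor.execute(UPSERT_SQL, row)` then replaces the SELECT plus the update/insert branches.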
Seems like you are confusing two different libraries.
import MySQLdb and from flask_mysqldb import MySQL are two different libraries.
Since you are using Flask, adding the line app.config['MYSQL_CURSORCLASS'] = 'DictCursor' and then creating the cursor with cursor = db.connection.cursor() should solve your problem.
Taken from the official git page
Upsert to MySQL using Python and data from Excel.
I'm working on populating a MySQL DB using Python.
The data is stored in Excel sheets.
Because the DB is supposed to be used for monitoring "projects", a primary key can repeat (a project can have many stages), so in that case the row needs to be updated instead of inserted.
Also, there's a value to be inserted into the DB table that can't come from the spreadsheet. I'm wondering whether that value must be inserted with a separate query or whether there's a way to insert it in the same query. The value is the supplier ID, and it needs to be inserted between id_ops and cif_store.
Finally, I need to perform an inner join to import the store_id, using the store CIF, from another table called store. I know how to do it, but I'm wondering whether it must also be executed as a separate query or can be performed in the same one.
So far, I have done this:
import xlrd
import MySQLdb
def insert():
    book = xlrd.open_workbook(r"C:\Users\DevEnviroment\Desktop\OPERACIONES.xlsx")
    sheet = book.sheet_by_name("Sheet1")
    database = MySQLdb.connect (host="localhost", user = "pytest", passwd = "password", db = "opstest1")
    cursor = database.cursor()
    query = """INSERT INTO operation (id_ops, cif_store, date, client,
        time_resp, id_area_service) VALUES (%s, %s, %s, %s, %s, %s)"""
    for r in range(1, sheet.nrows):
        id_ops = sheet.cell(r,0).value
        cif_store = sheet.cell(r,1).value
        date = sheet.cell(r,2).value
        client = sheet.cell(r,3).value
        time_resp = sheet.cell(r,4).value
        id_area_service = sheet.cell(r,5).value
        values = (id_ops, cif_store, date, client, time_resp, id_area_service)
        cursor.execute(query, values)
    # Close the cursor
    cursor.close()
    # Commit the transaction
    database.commit()
    # Close the database connection
    database.close()
    # Print results
    print ("")
    print ("")
    columns = str(sheet.ncols)
    rows = str(sheet.nrows)
    print ("Imported", columns,"columns and", rows, "rows. All Done!")
insert()
What you are looking for is INSERT ... ON DUPLICATE KEY UPDATE ...
Take a look here https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
Regarding the extra value: if it's static for all rows, you can hard-code it right into the INSERT query. If it's dynamic, you'll have to write some additional logic.
For example:
query = """INSERT INTO operation (id_ops, hard_coded_value, cif_store, date, client,
time_resp, id_area_service) VALUES (%s, "my hard coded value", %s, %s, %s, %s, %s)"""
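The store lookup and the supplier ID can also go into the same statement. INSERT ... SELECT takes the store row from the join, and ON DUPLICATE KEY UPDATE covers repeated primary keys. The order of the column list does not have to match the table's column order; each value maps to the column named at the same position in the list. This sketch rests on assumptions: that `operation` has `id_supplier` and `store_id` columns, and that `store` has `store_id` and `cif_store` columns. Neither table name is confirmed beyond the question. `operation_params` is a hypothetical helper. If no store matches the CIF, the SELECT returns no row and nothing is inserted.

```python
# Hypothetical column names: id_supplier, store_id, store.cif_store.
OPERATION_UPSERT = """
INSERT INTO operation (id_ops, id_supplier, cif_store, store_id, date, client,
                       time_resp, id_area_service)
SELECT %s, %s, s.cif_store, s.store_id, %s, %s, %s, %s
FROM store AS s
WHERE s.cif_store = %s
ON DUPLICATE KEY UPDATE
    date = VALUES(date),
    client = VALUES(client),
    time_resp = VALUES(time_resp),
    id_area_service = VALUES(id_area_service)
"""


def operation_params(sheet_row, supplier_id):
    """Order one spreadsheet row to match the %s placeholders above.

    sheet_row = (id_ops, cif_store, date, client, time_resp, id_area_service)
    """
    id_ops, cif_store, date, client, time_resp, id_area_service = sheet_row
    # The CIF goes last because it is only used in the WHERE clause.
    return (id_ops, supplier_id, date, client, time_resp, id_area_service, cif_store)
```

In the loop over the sheet, `cursor.execute(OPERATION_UPSERT, operation_params(values, supplier_id))` would then replace the plain INSERT.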
I am scraping a website to get company details, and now I am trying to store the data in a database. But I am getting an error like this:
raise InternalError(errno, errorvalue)
pymysql.err.InternalError: (1054, "Unknown column 'companyaddress' in 'field list'")
Here is my code
try:
    for d in companydetail:
        lis = d.find_all('li')
        companyname = lis[0].get_text().strip()
        companyaddress = lis[1].get_text().strip()
        companycity = lis[2].get_text().strip()
        try:
            companypostalcode = lis[3].get_text().strip()
            companypostalcode = companypostalcode.replace(",","")
        except:
            companypostalcode = lis[3].get_text().strip()
        try:
            companywebsite = lis[4].get_text().strip()
        except IndexError:
            companywebsite = 'null'
        print (companyname)
        print (companyaddress)
        print (companycity)
        print (companypostalcode)
        print (companywebsite)
        with connection.cursor() as cursor:
            print ('saving to db')
            cursor.execute("INSERT INTO company(companyname,address,city,pincode,website) VALUES (companyname,companyaddress,companycity,companypostalcode,companywebsite)")
        connection.commit()
finally:
    connection.close()
I am getting the data I want, but I am not able to store it in the database.
The output of print (companyname), print (companyaddress) and the other print calls is:
NINGBO BOIGLE DIGITAL TECHNOLOGY CO.,LTD.
TIANYUAN INDUSTRIAL ZONE CIXI NINGBO
ZHEJIANGNINGBO
315325
http://www.boigle.com.cn
You cannot simply use variable names inside a query string as you do:
cursor.execute("INSERT INTO company(companyname,address,city,pincode,website) VALUES (companyname,companyaddress,companycity,companypostalcode,companywebsite)")
Instead, pass your variables into the query making it parameterized:
params = (companyname, companyaddress, companycity, companypostalcode, companywebsite)
cursor.execute("""
INSERT INTO
company
(companyname, address, city, pincode, website)
VALUES
(%s, %s, %s, %s, %s)
""", params)
In
cursor.execute("INSERT INTO company(companyname,address,city,pincode,website) VALUES (companyname,companyaddress,companycity,companypostalcode,companywebsite)")
the values in the second pair of parentheses are interpreted as column names, not as Python variables. Try
cursor.execute("""INSERT INTO company(
companyname,address,city,pincode,website)
VALUES (%s, %s, %s, %s, %s)""",
(companyname, companyaddress, companycity,
companypostalcode, companywebsite))
instead. You may also want to consult the PyMySQL documentation on passing parameters to cursor.execute().
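To see the parameterized version work end to end without a MySQL server, here is a sketch that swaps in Python's built-in sqlite3 module, used here only for illustration. With PyMySQL the placeholders are %s instead of ?. The sketch also passes None for a missing website, which the driver stores as SQL NULL. By contrast, the string 'null' in the question's code would be stored as the literal text "null". The `save_company` helper and the second, placeholder company are hypothetical.

```python
import sqlite3

# sqlite3 stands in for PyMySQL here; the only difference that matters is
# the placeholder style: sqlite3 uses ?, PyMySQL uses %s.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (companyname TEXT, address TEXT, city TEXT, "
             "pincode TEXT, website TEXT)")


def save_company(conn, companyname, companyaddress, companycity,
                 companypostalcode, companywebsite=None):
    """Insert one scraped company; the driver binds and quotes each value."""
    conn.execute(
        "INSERT INTO company (companyname, address, city, pincode, website) "
        "VALUES (?, ?, ?, ?, ?)",
        (companyname, companyaddress, companycity, companypostalcode, companywebsite),
    )
    conn.commit()


# The comma in the company name needs no manual escaping.
save_company(conn, "NINGBO BOIGLE DIGITAL TECHNOLOGY CO.,LTD.",
             "TIANYUAN INDUSTRIAL ZONE CIXI NINGBO", "ZHEJIANGNINGBO", "315325",
             "http://www.boigle.com.cn")
# Hypothetical company without a website: stored as NULL, not the text 'null'.
save_company(conn, "EXAMPLE CO.", "EXAMPLE STREET", "EXAMPLE CITY", "000000")
```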
I need to take data from a CSV file and import it into two MySQL tables within the same database.
CSV file:
username,password,path
FP_Baby,7tO0Oj/QjRSSs16,FP_Baby
lukebryan,uu00U62SKhO.sgE,lukebryan
saul,r320QdyLJEXKEsQ,saul
jencarlos,LOO07D5ZxpyzMAg,jencarlos
abepark,HUo0/XGUeJ28jaA,abepark
From the CSV file, username and password go into the USERS table, and path goes into the VFS_PERMISSIONS table.
The inserts for the two tables look like this:
INSERT INTO `USERS` (`userid`, `username`, `password`, `server_group`) VALUES
(23, 'username', 'password', 'MainUsers'),
INSERT INTO `VFS_PERMISSIONS` (`userid`, `path`, `privs`) VALUES
(23, '/path/', '(read)(write)(view)(delete)(resume)(share)(slideshow)(rename)(makedir)(deletedir)'),
If possible, I'd like to start the userid in both tables at 24 and increment it by 1 for each row in the CSV.
So far I can read the CSV file, but I can't figure out how to insert into two MySQL tables.
#!/usr/bin/env python
import csv
import sys
import MySQLdb
conn = MySQLdb.connect(host= "localhost",
                       user="crushlb",
                       passwd="password",
                       db="crushlb")
x = conn.cursor()
f = open(sys.argv[1], 'rt')
try:
    reader = csv.reader(f)
    for row in reader:
        ## mysql stuff goes here right?
        pass
finally:
    f.close()
You can reduce the number of calls to cursor.execute by preparing the arguments in advance (in the loop), and calling cursor.executemany after the loop has completed:
cursor = conn.cursor()
user_args = []
perm_args = []
perms = '(read)(write)(view)(delete)(resume)(share)(slideshow)(rename)(makedir)(deletedir)'
with open(sys.argv[1], 'rt', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the "username,password,path" header row
    for user_id, row in enumerate(reader, start=24):
        username, password, path = row
        user_args.append((user_id, username, password, 'MainUsers'))
        perm_args.append((user_id, path, perms))
insert_users = '''
    INSERT IGNORE INTO `USERS`
    (`userid`, `username`, `password`, `server_group`)
    VALUES (%s, %s, %s, %s)
    '''
insert_vfs_permissions = '''
    INSERT IGNORE INTO `VFS_PERMISSIONS`
    (`userid`, `path`, `privs`)
    VALUES (%s, %s, %s)
    '''
cursor.executemany(insert_users, user_args)
cursor.executemany(insert_vfs_permissions, perm_args)
conn.commit()  # MySQLdb does not autocommit by default
INSERT IGNORE tells MySQL to try to insert the rows into the table, but to skip any row that conflicts with an existing one. For example, if userid is the PRIMARY KEY and there is already a row with the same userid, INSERT IGNORE skips that row instead of creating two rows with the same PRIMARY KEY.
Without the IGNORE, the cursor.executemany command would raise an exception and fail to insert any rows.
I used INSERT IGNORE so you can run the code more than once without cursor.executemany raising an exception.
There is also an INSERT ... ON DUPLICATE KEY UPDATE command, which tells MySQL to try to insert a row but to update the existing row if there is a conflict. I'll leave it at that unless you want to know more about ON DUPLICATE KEY.
Since you already know the SQL statements that you want to execute, it should be more or less straightforward to use the cursor.execute method:
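The argument-building step can be pulled into a function, which makes it easy to check the ids and the header handling before touching MySQL. This is a sketch, not the answer's exact code. `build_args` is a hypothetical name, and it assumes the first CSV line is the username,password,path header shown in the question. Paths are stored exactly as they appear in the CSV.

```python
import csv

PERMS = "(read)(write)(view)(delete)(resume)(share)(slideshow)(rename)(makedir)(deletedir)"


def build_args(lines, first_id=24, server_group="MainUsers"):
    """Return (user_args, perm_args) ready for cursor.executemany.

    `lines` is any iterable of CSV text lines (e.g. an open file); the
    header row is skipped so it is not inserted as a user.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip "username,password,path"
    user_args, perm_args = [], []
    for user_id, (username, password, path) in enumerate(reader, start=first_id):
        user_args.append((user_id, username, password, server_group))
        perm_args.append((user_id, path, PERMS))
    return user_args, perm_args
```

The result plugs into the two `executemany` calls: `user_args, perm_args = build_args(f)`, followed by `conn.commit()`.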
next(reader)  # skip the "username,password,path" header row
offset = 24
for row_number, row in enumerate(reader):
    username, password, path = row
    x.execute("INSERT INTO `USERS` (`userid`, `username`, `password`, `server_group`) "
              "VALUES (%s, %s, %s, 'MainUsers')", (row_number + offset, username, password))
    x.execute("INSERT INTO `VFS_PERMISSIONS` (`userid`, `path`, `privs`) "
              "VALUES (%s, %s, '(read)(write)(view)(delete)(resume)(share)(slideshow)(rename)(makedir)(deletedir)')",
              (row_number + offset, path))
conn.commit()