Python: generating column header attributes from an unseen CSV file into an SQL table - python

I understand how to populate an SQL table from a CSV file by using:
cursor.execute('CREATE TABLE test (name, age, number)')
csvfile = open('test.csv', 'rb')
creader = csv.reader(csvfile, delimiter=',')
for t in creader:
    cursor.execute('INSERT INTO sentence VALUES (?,?,?)', t)
However, I'm faced with an issue where I don't know what the CSV file may hold, so I can't explicitly create a table with named column attributes. All I know is that the file will have column headers. My question is: how do I make those headers the column attributes? For example:
Row 1 in the CSV has the (unknown) headers, e.g. name, number, group. I'd like those to be the column attributes of the table t.
I attempted this:
import csv, sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (col1, col2);")

with open('data.csv','rb') as fin:
    dr = csv.reader(fin)
    dicts = ({'col1': line[0], 'col2': line[1]} for line in dr)
    to_db = ((i['col1'], i['col2']) for i in dicts)

cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)
con.commit()
But I'm getting the error ValueError: I/O operation on closed file.

I'll make an assumption based on your indentation being wrong in the question: you're probably still trying to work with the file after your with statement has finished.
import csv, sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (col1, col2);")

with open('data.csv','rb') as fin:
    dr = csv.reader(fin)
    dicts = ({'col1': line[0], 'col2': line[1]} for line in dr)
    to_db = ((i['col1'], i['col2']) for i in dicts)
    cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)
    con.commit()
When you use with open(), the file is automatically closed as soon as the block ends, so anything that still reads from the file (such as a lazy generator fed to executemany) has to run inside the block.
The alternative, older method would be similar to:
fin = open('data.csv', 'rb')  # opens the csv file
try:
    dr = csv.reader(fin)
    dicts = ({'col1': line[0], 'col2': line[1]} for line in dr)
    to_db = ((i['col1'], i['col2']) for i in dicts)
    cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)
finally:
    fin.close()  # closing
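To address the original question of building the table from an unknown header row: a minimal sketch with sqlite3 (Python 3, so the file is opened in text mode) that takes the column names from the first line of the CSV. The names come straight from the file, so this assumes they are trusted enough to splice into the DDL.

import csv, sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

with open('data.csv', newline='') as fin:
    reader = csv.reader(fin)
    headers = next(reader)  # first row holds the (unknown) column names
    # Build the CREATE TABLE and INSERT statements from the header row
    cur.execute('CREATE TABLE t ({})'.format(', '.join('"{}"'.format(h) for h in headers)))
    placeholders = ', '.join('?' * len(headers))
    cur.executemany('INSERT INTO t VALUES ({})'.format(placeholders), reader)

con.commit()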

Related

How to change a column's datatype when loading a CSV into PostgreSQL

I have a big script; the result is that the data is stored in a dataframe and then in a CSV. The CSV is then opened and written to PostgreSQL. But there is a problem: the data type of one column should be int4, yet after opening the CSV the column's format is 'text'. I cannot change the data type in the database; the values must be stored there exactly as int. Please tell me how to do this.
total.to_csv("C:/Users/.../total_type19.csv", index=False, sep=';')

conn5 = psycopg2.connect(dbname='', user='', password='', host='', port='')
cursor5 = conn5.cursor()

with open("C:/Users/.../total_type19.csv", "r", encoding='utf-8') as file:
    reader = csv.reader(file, delimiter=";")
    for row in reader:
        # print(row)
        cursor5.execute(
            'INSERT INTO interaction_fillword (test_id, data_size, data_matrix, data_words_selection, data_colors, data_answers) '
            'VALUES (%s, %s, %s, %s, %s, %s)',
            (row[0], row[1], row[2], row[3], row[4], row[5]))
conn5.commit()
The test_id column must be in int4 format
['312229', "['[{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3},{from:[5,8],to:[9,8],color:5},{from:[5,11],to:[10,11],color:6},{from:[1,0],to:[1,11],color:0},{from:[10,1],to:[10,6],color:4},{from:[3,0],to:[8,0],color:1}],']", '[\'["v","b","c","c","a","h","i","e","r","s","f","j"],["d","i","w","s","s","r","i","f","y","y","f","c"],["j","b","m","w","d","q","s","q","t","w","e","m"],["x","l","m","m","l","s","o","x","d","q","u","t"],["l","i","f","p","l","a","c","e","t","u","t","o"],["m","o","s","b","r","t","c","y","z","v","r","r"],["j","t","x","c","a","r","t","a","b","l","e","o"],["b","h","k","m","d","b","r","y","q","u","i","y"],["y","è","s","r","h","g","o","m","m","e","w","h"],["u","q","p","c","s","c","x","b","k","e","d","o"],["u","u","o","l","q","v","y","y","b","y","e","h"],["r","e","o","u","j","b","u","r","e","a","u","k"]],\']', '[\'"#ff0000","#00fe00","#0000ff","#d2ea9a","#407f76","#211f95","#e1f233"\']', '[\'"place","cartable","gomme","bureau","bibliothèque","feutre","cahier"\']']
This is an example of one line from the CSV. It looks bad, but that's the way it should be.
Can you change your data to int, or is it something non-integer like "m22"?
# remove non-digit characters from the string
with open("C:/Users/.../total_type19.csv", "r", encoding='utf-8') as file:
    reader = csv.reader(file, delimiter=";")
    header = next(reader)
    print(f"HEADER {header}")
    counter = 1  # or whatever number you want to start with
    for row in reader:
        print(row)
        test_id = row[0]
        test_id = ''.join([i for i in test_id if i.isdigit()])
        if test_id == '':
            counter += 1
            test_id = counter
        else:
            test_id = int(test_id)
        print(test_id)
        cursor5.execute(
            'INSERT INTO interaction_fillword (test_id, data_size, data_matrix, data_words_selection, data_colors, data_answers) '
            'VALUES (%s, %s, %s, %s, %s, %s)',
            (test_id, row[1], row[2], row[3], row[4], row[5]))
Use copy_expert from psycopg2:
import psycopg2

conn5 = psycopg2.connect(dbname='', user='', password='', host='', port='')
cursor5 = conn5.cursor()

with open("C:/Users/.../total_type19.csv", "r") as csv_file:
    cursor5.copy_expert("COPY interaction_fillword FROM STDIN WITH CSV HEADER", csv_file)
conn5.commit()
The CSV HEADER option will do a couple of things:
Skip the header line automatically.
Treat empty, non-quoted strings as NULL.
copy_expert uses the Postgres COPY command to do bulk data import (or export) a lot quicker than row-by-row inserts. The downside is that COPY is all or nothing: either the entire import/export succeeds, or a single error rolls back the entire thing.
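One detail worth noting for this particular file: it was written with sep=';', while plain WITH CSV HEADER assumes comma-separated input, and COPY parses each field according to the target column's declared type (so test_id lands as int4 if the column is declared that way). A sketch with the delimiter spelled out, using the question's path and table name:

with open("C:/Users/.../total_type19.csv", "r", encoding='utf-8') as csv_file:
    # Explicit column list; COPY casts each CSV field to the column's declared type
    cursor5.copy_expert(
        "COPY interaction_fillword (test_id, data_size, data_matrix, data_words_selection, data_colors, data_answers) "
        "FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ';')",
        csv_file
    )
conn5.commit()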

Insert values from csv file with no header to a SQL table with headers

I have a CSV file with no headers and created an SQL table with fields A, B, C, D and E. I need to import the data from the CSV file into the table using Python.
file = open(path)
data = csv.reader(file)
cursor = connection.cursor()
query = '''INSERT INTO table (id1, id2, name, birthday, score) VALUES (?, ?, ?, ?, ?)'''
cursor.executemany(query, data)
cursor.close()
connection.commit()
I have also tried to loop through the rows
file = open(path)
data = csv.reader(file)
cursor = connection.cursor()
query = '''INSERT INTO table (id1, id2, name, birthday, score) VALUES (?, ?, ?, ?, ?)'''
for row in data:
    cursor.executemany(query, row)
connection.commit()
I ran this in Jupyter instead of Visual Studio Code and it worked. Actually, the output of this table was feeding into another one, which was showing no elements; but after running the same code in Jupyter, the table was generated. Not sure what the actual cause was.
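As a side note, executemany expects the whole iterable of parameter rows, so it can consume the csv.reader directly rather than being called once per row. A sketch assuming sqlite3, a hypothetical database file, and a table that already exists with the five columns from the question (the name table is quoted because it is a reserved word):

import csv
import sqlite3

path = "data.csv"  # hypothetical CSV path with no header row
connection = sqlite3.connect("movies.db")  # hypothetical database file
cursor = connection.cursor()

with open(path, newline='') as file:
    data = csv.reader(file)
    query = 'INSERT INTO "table" (id1, id2, name, birthday, score) VALUES (?, ?, ?, ?, ?)'
    cursor.executemany(query, data)  # one call; the reader yields one row per CSV line

connection.commit()
connection.close()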

Insert data from csv to postgreSQL database via Python

I'm brand new to PostgreSQL, and to SQL in general.
I'm trying to create a table in a database via Python and then load data from a .csv file into the table.
My code looks like this:
import csv
import psycopg2

# Establish connection to database
con = psycopg2.connect(
    host="localhost",
    database="kundeavgang",
    user="postgres",
    password="postgres",
)

# Cursor
cur = con.cursor()

# If a mistake is made, start from scratch
cur.execute("DROP TABLE IF EXISTS kundeavgang")

# Create table
cur.execute('''
CREATE TABLE "kundeavgang"(
    "customerID" TEXT,
    "gender" TEXT,
    "SeniorCitizen" TEXT,
    "Partner" TEXT,
    "Dependents" TEXT,
    "tenure" INT,
    "PhoneService" TEXT,
    "MultipleLines" TEXT,
    "InternetService" TEXT,
    "OnlineSecurity" TEXT,
    "DeviceProtection" TEXT,
    "TechSupport" TEXT,
    "StreamingMovies" TEXT,
    "Contract" TEXT,
    "PaperlessBilling" TEXT,
    "PaymentMethod" TEXT,
    "MonthlyCharges" FLOAT,
    "TotalCharges" FLOAT,
    "Churn" TEXT
)
''')
# Access .csv file
with open('kundeavgang.csv') as csvFile:
    reader = csv.reader(csvFile)
    skipHeader = next(reader)  # Account for header
    for row in reader:
        customerID = row[0]
        gender = row[1]
        SeniorCitizen = row[2]
        Partner = row[3]
        Dependents = row[4]
        tenure = row[5]
        PhoneService = row[6]
        MultipleLines = row[7]
        InternetService = row[8]
        OnlineSecurity = row[9]
        OnlineBackup = row[10]
        DeviceProtection = row[11]
        TechSupport = row[12]
        StreamingTV = [13]
        StreamingMovies = row[14]
        Contract = row[15]
        PaperlessBilling = row[16]
        PaymentMethod = row[17]
        MonthlyCharges = row[18]
        TotalCharges = row[19]
        Churn = row[20]
        cur.execute('''INSERT INTO kundeavgang(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
                       VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',
                    (customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))
# Commit the transaction
con.commit()

# End connection
con.close()
In pgAdmin, the table shows up as existing in the database; however, I cannot find the actual table. Further, I have no idea about this line of code:
cur.execute('''INSERT INTO kundeavgang(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))
What do all the %s stand for? I found this in an online example which was not very helpful, so I tried it without knowing what it means. I have seen some examples where question marks are inserted instead, again without explanation.
Lastly, as the code stands now, I get the error message:
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''',(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn))
IndexError: tuple index out of range
All help or explanations will be appreciated.
For bulk inserts from text files, consider copy_from or copy_expert of psycopg2. Also, be sure to commit your execution:
cur.execute("DROP TABLE IF EXISTS kundeavgang")
con.commit()
cur.execute('''CREATE TABLE "kundeavgang" ... ''')
con.commit()
with open('kundeavgang.csv') as csvFile:
next(csvFile) # SKIP HEADERS
cur.copy_from(csvFile, "kundeavgang", sep=",")
# POSTGRES COPY COMMAND FOR CSV MODE
# cur.copy_expert("""COPY "kundeavgang" FROM STDIN WITH CSV""", csvFile)
con.commit()
The %s are placeholders for the values that will be inserted and passed through the following tuple:
(customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn)
The problem is that your insert statement lists 20 columns and you provide 20 values in your tuple, but you have 22 placeholders (%s).
The problem is a mismatch between the number of columns to be populated and the length of the value list provided. This is an easy mistake to make when dealing with a lot of columns. One way to reduce the risk of error is to use the length of the column or value list to build the statement.
cols = ['name1', 'name2', ...]  # column names as strings
vals = [val1, val2, ...]        # corresponding values
assert len(cols) == len(vals), 'mismatch between number of columns and number of values'

template = """INSERT INTO tbl ({}) VALUES ({})"""
stmt = template.format(', '.join(cols), ', '.join(['%s'] * len(vals)))
cur.execute(stmt, vals)
Note that when building the column names dynamically it's good practice to quote them - psycopg2 provides tools for this.
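For example, a sketch using psycopg2's sql module to quote the identifiers safely (the column names and values here are illustrative, not from the question):

from psycopg2 import sql

cols = ['customerID', 'gender', 'tenure']  # hypothetical column names
vals = ['7590-VHVEG', 'Female', 1]         # matching values

stmt = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
    sql.Identifier('kundeavgang'),                        # table name, safely quoted
    sql.SQL(', ').join(sql.Identifier(c) for c in cols),  # quoted column names
    sql.SQL(', ').join(sql.Placeholder() for _ in cols),  # one %s per column
)
cur.execute(stmt, vals)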
Change the line reader = csv.reader(csvFile) to:
reader = csv.reader(csvFile, delimiter=';')

Python: Insert data to database from CSV and then selecting a generated UUID from the table

I have an Excel sheet that is to be inserted into a database. I wrote a Python script which takes an Excel file, converts it into a CSV, and then inserts it into the database.
The problem is that the database contains two tables, one of which has a unique ID that is auto-generated and gets set when the data is inserted into the table. The other table uses this as a foreign key.
This is how my tables are created:
create table table (
    id uuid DEFAULT uuid_generate_v4() PRIMARY KEY NOT NULL,
    foo1 varchar(255),
    foo2 varchar(255),
    foo3 varchar(255),
    foo4 varchar(255)
);

create table another_table (
    id uuid PRIMARY KEY references table (id),
    foo1 varchar(255),
    foo2 varchar(255)
);
This is the code I use to insert the data into the database:
with open(csv_file, 'rb') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        cur.execute(
            "INSERT INTO table (foo1, foo2, foo3, foo4) VALUES (%s, %s, %s, %s); "
            "INSERT INTO another_table (foo1, foo2) VALUES (%s, %s)",
            row
        )
conn.commit()
This will insert data into the database, but the ID field in another_table will be empty. Does anyone know how I can acquire this ID and put it into the second table?
I was able to solve this myself without many tweaks to my code. I first had to solve another problem: several values in the CSV file were null values, but converting to CSV made them look like empty strings instead. By using pandas I was able to set all null values to "None", and then clean each row before inserting it into the database:
with open(csv_file, 'rb') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        clean_row = []
        for x in row:
            if x == "None":
                clean_row.append(None)
            else:
                clean_row.append(x)
        cur.execute(
            "INSERT INTO table (foo1, foo2, foo3, foo4) VALUES (%s, %s, %s, %s); "
            "INSERT INTO another_table (foo1, foo2) VALUES (%s, %s)",
            clean_row
        )
conn.commit()
The values from the CSV are now put into a list which I can use in my query to ask table for its id, like this:
with open(csv_file, 'rb') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        clean_row = []
        for x in row:
            if x == "None":
                clean_row.append(None)
            else:
                clean_row.append(x)
        cur.execute(
            "INSERT INTO table (foo1, foo2, foo3, foo4) VALUES (%s, %s, %s, %s); "
            "INSERT INTO another_table (foo1, foo2, id) VALUES (%s, %s, "
            "(SELECT id FROM table WHERE foo1 = '" + clean_row[0] + "' AND foo2 = '" + clean_row[1] + "'))",
            clean_row
        )
conn.commit()
This will acquire the ID and put it into another_table, and it works as long as you have unique values in table.
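A common alternative, not from the answer above, is to let PostgreSQL hand back the generated UUID with INSERT ... RETURNING, which avoids the extra lookup and the string concatenation. A minimal sketch, assuming the same two tables and that each CSV row carries the four columns for table followed by the two for another_table:

with open(csv_file, newline='') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)
    for row in reader:
        clean_row = [None if x == "None" else x for x in row]
        # Insert the parent row and get its generated UUID back in the same round trip
        cur.execute(
            "INSERT INTO table (foo1, foo2, foo3, foo4) VALUES (%s, %s, %s, %s) RETURNING id",
            clean_row[:4]
        )
        new_id = cur.fetchone()[0]
        # Reuse the UUID as the primary/foreign key of the child row
        cur.execute(
            "INSERT INTO another_table (id, foo1, foo2) VALUES (%s, %s, %s)",
            (new_id, clean_row[4], clean_row[5])
        )
conn.commit()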

Trying to insert data into database but this error appears: KeyError: 'identifier' (Python)

What I want to accomplish is to grab data from an API endpoint and store it as CSV, then import the CSV file into an SQLite database. I can fetch the data from the API just fine and write it to a CSV file, but storing the data from the CSV file into SQLite is the problem. I keep getting this error:
Traceback (most recent call last):
  File "info.py", line 46, in <module>
    i['col1'], i['col2'], i['col3'],
    i['col4'], i['col5']) for i in dr]
  File "info.py", line 46, in <listcomp>
    i['col1'], i['col2'], i['col3'],
    i['col4'], i['col5']) for i in dr]
KeyError: 'col1'
I wonder why this error appears?
If there are any other solutions, I'm open to them as well. The goal is to store the data from the CSV file into the SQLite database.
Here is what I have so far:
import requests
import csv
import sqlite3

con = sqlite3.connect("business.db")
cur = con.cursor()

cur.execute('DROP TABLE IF EXISTS firms')
cur.execute(
    "CREATE TABLE firms (col1 PRIMARY KEY, col2 TEXT, col3 TEXT, "
    "col4 TEXT, col5 TEXT, ...);"
)

r = requests.get('http://url/url/url')

outfile = open(r"C:\Users\...\test.csv", "w")
outfile.write(r.text)

with open(r'C:\Users\...\...\test.csv', 'r') as fin:
    dr = csv.DictReader(fin)
    to_db = [(i['col1'], i['col2'], i['col3'], i['col4'], i['col5'], ...) for i in dr]

cur.executemany(
    "INSERT INTO firms (col1, col2, col3, col4, col5, ...) "
    "VALUES (?, ?, ?, ?, ?, ...);",
    to_db)
con.commit()
con.close()
Output from this command (print(next(dr))):
OrderedDict([('col1;"col2";"col3";"col4";"col5";"..."', 'value1;"value2";"value3";"value4";"value5";"..."')])
The delimiter in your CSV file is ';' instead of the default ','. So the reader takes the entire row as one entry, since it's trying to split it at non-existent commas.
Use dr = csv.DictReader(fin, delimiter=';')
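Applied to the snippet above, the read step might then look like this (a sketch; the path stays elided and the column names follow the question):

with open(r'C:\Users\...\...\test.csv', 'r') as fin:
    # Tell DictReader the file is semicolon-separated so each field gets its own key
    dr = csv.DictReader(fin, delimiter=';')
    to_db = [(i['col1'], i['col2'], i['col3'], i['col4'], i['col5']) for i in dr]  # remaining columns elided as in the question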
