Python, converting CSV file to SQL table

I have a CSV file without headers and am trying to create a SQL table from certain columns in the file. I tried the solutions given here: Importing a CSV file into a sqlite3 database table using Python,
but keep getting the error that col1 is not defined. I then tried inserting headers in my CSV file and am still getting a KeyError.
Any help is appreciated! (I am not very familiar with SQL at all)

If the .csv file has no headers, you don't want to use DictReader; DictReader assumes line 1 is a set of headers and uses them as keys for every subsequent line. This is probably why you're getting KeyErrors.
A modified version of the example from that link:
import csv, sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (col1, col2);")
with open('data.csv', 'r', newline='') as fin:  # text mode; 'rb' was for Python 2
    dr = csv.reader(fin)
    # Build (col1, col2) pairs from the first two columns of each row.
    dicts = ({'col1': line[0], 'col2': line[1]} for line in dr)
    to_db = ((i['col1'], i['col2']) for i in dicts)
    # The generators are lazy, so consume them while the file is still open.
    cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)

con.commit()
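If the file does have a header row, DictReader is the simpler route; a minimal sketch, assuming the header row literally names the columns col1 and col2 (otherwise the keys won't match and you get a KeyError again):

import csv, sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (col1, col2);")

with open('data.csv', 'r', newline='') as fin:
    # DictReader takes the first line as field names and keys every row by them.
    dr = csv.DictReader(fin)
    to_db = [(row['col1'], row['col2']) for row in dr]

cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)
con.commit()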

The code below reads all the .csv files under a path and loads their data into a table in a SQLite 3 database. (SQLite has no LOAD DATA LOCAL INFILE statement, which is MySQL syntax, so the rows are inserted with executemany instead.)
import csv
import glob
import sqlite3

cnx = sqlite3.connect('dbname.db')
cursor = cnx.cursor()

path = 'path/to/csv/folder'
for filename in glob.glob(path + "/*.csv"):
    print("loading csv file: %s" % filename)
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header line in each file
        cursor.executemany("INSERT INTO tablename VALUES (?, ?)", reader)  # one ? per column

cnx.commit()
cursor.close()
cnx.close()
Let me know if this works.

Related

How to ignore duplicate keys using the psycopg2 copy_from command copying .csv file into postgresql database

I'm using Python. I have a daily csv file that I need to copy into a postgresql table each day. Some of those .csv records may be the same day over day, so I want to ignore the duplicates based on a primary key field. Using cursor.copy_from: on day 1 all is fine and the new table is created. On day 2, copy_from throws a duplicate key error (as it should), but copy_from stops on the first error. Is there a copy_from parameter that would ignore the duplicates and continue? If not, any other recommendations other than copy_from?
f = open(csv_file_name, 'r')
c.copy_from(f, 'mytable', sep=',')
This is how I'm doing it with psycopg3.
Assumes the file is in the same folder as the script and that it has a header row.
from pathlib import Path

file = Path(__file__).parent / "the_data.csv"
target_table = "mytable"
conn = <your connection>

with conn.cursor() as cur:
    # Create an empty temp table with the same columns as target_table.
    # The csv file imports as text; because the temp table inherits the
    # target table's column types, postgres converts the text for us.
    cur.execute(f"CREATE TEMP TABLE tmp_table (LIKE {target_table})")
    query = "COPY tmp_table FROM STDIN WITH (FORMAT csv, HEADER true)"
    with cur.copy(query) as copy:
        with file.open() as csv_data:
            copy.write(csv_data.read())
    # Move the staged rows into the real table, skipping duplicate keys.
    cur.execute(
        f"INSERT INTO {target_table} SELECT * FROM tmp_table ON CONFLICT DO NOTHING"
    )
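If you are staying on psycopg2 (as in the question), the same staging-table idea works with copy_from; a minimal sketch, assuming the target table is mytable and the csv has no header row (the connection string is a hypothetical placeholder):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # Stage the csv into a temp table shaped like the target table.
    cur.execute("CREATE TEMP TABLE tmp_table (LIKE mytable)")
    with open(csv_file_name) as f:
        cur.copy_from(f, 'tmp_table', sep=',')
    # Insert into the real table, silently skipping duplicate keys.
    cur.execute("INSERT INTO mytable SELECT * FROM tmp_table ON CONFLICT DO NOTHING")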

Retrieve zipped file from bytea column in PostgreSQL using Python

I have a table in my PostgreSQL database in which a column type is set to bytea in order to store zipped files.
The storing procedure works fine. I have problems when I need to retrieve the zipped file I uploaded.
def getAnsibleByLibrary(projectId):
    con = psycopg2.connect(
        database="xyz",
        user="user",
        password="pwd",
        host="localhost",
        port="5432",
    )
    print("Database opened successfully")
    cur = con.cursor()
    query = "SELECT ansiblezip FROM library WHERE library.id = (SELECT libraryid from project WHERE project.id = '"
    query += str(projectId)
    query += "')"
    cur.execute(query)
    rows = cur.fetchall()
    repository = rows[0][0]
    con.commit()
    con.close()
    print(repository, type(repository))
    with open("zippedOne.zip", "wb") as fin:
        fin.write(repository)
This code creates a zippedOne.zip file but it seems to be an invalid archive.
I also tried saving repository.tobytes(), but it gives the same result.
I don't understand how I can handle memoryview objects.
If I try:
print(repository, type(repository))
the result is:
<memory at 0x7f6b62879348> <class 'memoryview'>
If I try to unzip the file:
chain#wraware:~$ unzip zippedOne.zip
The result is:
Archive: zippedOne.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of zippedOne.zip or
zippedOne.zip.zip, and cannot find zippedOne.zip.ZIP, period.
Trying to extract it in windows gives me the error: "The compressed (zipped) folder is invalid"
This code, based on the example in the question, works for me:
import io
import zipfile

import psycopg2

DROP = """DROP TABLE IF EXISTS so69434887"""
CREATE = """\
CREATE TABLE so69434887 (
    id serial primary key,
    ansiblezip bytea
)
"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('so69434887.txt', 'abc')

with psycopg2.connect(database="test") as conn:
    cur = conn.cursor()
    cur.execute(DROP)
    cur.execute(CREATE)
    conn.commit()
    cur.execute("""INSERT INTO so69434887 (ansiblezip) VALUES (%s)""", (buf.getvalue(),))
    conn.commit()
    cur.execute("""SELECT ansiblezip FROM so69434887""")
    memview, = cur.fetchone()

with open('so69434887.zip', 'wb') as f:
    f.write(memview)
and the resulting file unzips cleanly (on Linux, at least):
$ unzip -p so69434887.zip so69434887.txt
abc
So perhaps the data is not being inserted correctly.
FWIW I got the "End-of-central-directory signature not found" until I made sure I closed the zipfile object before writing to the database.
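The ordering matters because ZipFile only writes the central directory and end-of-central-directory record when it is closed. A minimal sketch of the insert side, reusing the question's library table and ansiblezip column (the file name and content are hypothetical):

import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('example.txt', 'abc')  # hypothetical content

# Read the buffer only after the with-block has closed the ZipFile;
# otherwise the central directory is missing and the zip is invalid.
payload = buf.getvalue()
# cur.execute("INSERT INTO library (ansiblezip) VALUES (%s)", (payload,))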

Python Error - no viable alternative input when trying to insert values from file

I'm trying to insert some values from a csv file through Python, but I'm getting a "no viable alternative at input" error. When I specify the values instead of %s the code works, but when I try to use %s it fails. This is my code:
import jaydebeapi
import jpype
import pyodbc
import pandas as pd
import csv

conn = pyodbc.connect("myconnection")
cursor = conn.cursor()

with open('/Users/user/Desktop/TEST.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        cursor.execute("INSERT INTO mytable (user_id, email) VALUES(%s,%s)", row)

# close the connection to the database.
mydb.commit()
cursor.close()
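One likely cause is the placeholder style: pyodbc and jaydebeapi both use the qmark paramstyle, so the SQL is expected to contain ? placeholders rather than %s, and the literal %s tokens end up in front of the SQL parser. A minimal sketch of the same loop with ? placeholders, keeping the question's table and column names:

import csv
import pyodbc

conn = pyodbc.connect("myconnection")
cursor = conn.cursor()

with open('/Users/user/Desktop/TEST.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        # pyodbc uses the qmark paramstyle, so placeholders are ? not %s.
        cursor.execute("INSERT INTO mytable (user_id, email) VALUES (?, ?)", row)

conn.commit()
cursor.close()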

trying to import csv file into postgresql database through python3

This is for my cs50w project. I'm trying to import the books.csv file into the postgresql database, but I'm getting some errors. I think there's a problem with my script; can someone correct it?
import psycopg2
import csv

# For connecting to the database
conn = psycopg2.connect("host=hostname_here port=5432 dbname=dbname_here user=username_here password=pass_here")
cur = conn.cursor()

# importing csv file
with open('books.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        cur.execute("INSERT INTO book VALUES (%s, %s, %s, %s)", row)
conn.commit()
Traceback (most recent call last):
  File "import.py", line 15, in <module>
    row
psycopg2.errors.SyntaxError: INSERT has more expressions than target columns
LINE 1: INSERT INTO book VALUES ('0380795272', 'Krondor: The Betraya...
sample of csv file:
INSERT has more expressions than target columns.
You are trying to insert a row with 4 values into a table that has fewer than 4 columns.
However, if the table indeed has 4 columns, you need to review your data source (books.csv). The source data may contain stray single quotes or commas. Either remove the problematic data from the file or modify your program to handle the data correctly.
Problem solved: in my postgres table I had set isbn to integer, but I hadn't noticed that some isbn values contain letters. I changed the isbn column to varchar and the problem is solved.
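For reference, a minimal sketch of a table and insert that match a four-column books.csv; the column names isbn, title, author and year are assumed (not given in the question), and listing them explicitly in the INSERT makes a column-count mismatch easier to spot:

import csv
import psycopg2

conn = psycopg2.connect("host=hostname_here port=5432 dbname=dbname_here user=username_here password=pass_here")
cur = conn.cursor()

# isbn is varchar, not integer: many ISBNs contain letters (e.g. a trailing X).
cur.execute("""
    CREATE TABLE IF NOT EXISTS book (
        isbn VARCHAR PRIMARY KEY,
        title VARCHAR NOT NULL,
        author VARCHAR NOT NULL,
        year INTEGER NOT NULL
    )
""")

with open('books.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for isbn, title, author, year in reader:
        cur.execute(
            "INSERT INTO book (isbn, title, author, year) VALUES (%s, %s, %s, %s)",
            (isbn, title, author, year),
        )
conn.commit()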

insert into mysql table using python from text file

import MySQLdb

db = MySQLdb.connect(host="?",
                     user="root",
                     passwd="?",
                     db="test")
cursor = db.cursor()

file = open('...../EM.txt', 'r')
file_content = file.read()
file.close()

query = "INSERT INTO EM VALUES (%s,%s,%s,%s,%s,%s)"
cursor.execute(query, (file_content,))
db.commit()
db.close()
I have tried this code to read from a text file and insert into the EM table... can anyone help me make this work?
I have no idea how your text file is formatted, but file.read() gives you the whole file as a single string and it seems like you have six fields to fill. So maybe the file consists of 6 tab- or space-separated fields?
First, split the file into lines with file.readlines() instead of file.read(). Next, build a list of rows that you can feed into executemany:
file_content = file.readlines()  # a list of lines, not one big string
values = [line.split() for line in file_content]
cursor.executemany(query, values)
The split method splits each line on whitespace into a list, e.g., the string 'a b c' is turned into ['a', 'b', 'c'], so the list comprehension produces a list of rows that can be fed into cursor.executemany to perform a bulk insert.
As #Evan points out, you also have to specify the columns the values are associated to in your SQL query, e.g.,
INSERT INTO EM (field, spam, ham, eggs, price, ni) VALUES (%s, %s, %s, %s, %s, %s)
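Putting those pieces together, a minimal sketch under the assumption that EM.txt holds six whitespace-separated fields per line (the column names field, spam, ham, eggs, price and ni are just the placeholders from above, and the connection values are the question's placeholders):

import MySQLdb

db = MySQLdb.connect(host="?", user="root", passwd="?", db="test")
cursor = db.cursor()

with open('...../EM.txt') as f:
    # One row per line, split on whitespace into six fields.
    values = [line.split() for line in f]

cursor.executemany(
    "INSERT INTO EM (field, spam, ham, eggs, price, ni) "
    "VALUES (%s, %s, %s, %s, %s, %s)",
    values,
)
db.commit()
db.close()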
Are you trying to insert the entire text file into a string column in the database, or do you want to import tabular data into the database? It seems like you want to import tabular data based on the way you specified the values, so I'll use that as my assumption.
To do this, you need to read each row from your file and insert it into the database one row at a time. You also need to update your query syntax to specify the column names you are inserting into.
import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='?',
                       db='test')
cursor = mydb.cursor()

with open('...../EM.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        cursor.execute(
            """INSERT INTO testcsv (col1, col2, col3, col4, col5, col6)
               VALUES (%s, %s, %s, %s, %s, %s)""",
            row,
        )

# close the connection to the database.
mydb.commit()
cursor.close()
print("Done")
I think your problem is readlines() and values = [line.split() for line in file_content].
To start off, for Python to communicate with your SQL database you have to use an adapter such as MySQLdb (or psycopg2 for PostgreSQL). Once you have installed the adapter, you have to import it. Another thing that would improve the simplicity and functionality of your code is using a csv reader, which you get by importing csv. CSV is a common format for moving tabular data in and out of SQL databases, so the csv module will help you read such files from Python.
You can find more about using the csv reader here:
https://docs.python.org/2/library/csv.html
In terms of your code try this:
import MySQLdb
import psycopg2

db = MySQLdb.connect(host="?",
                     user="root",
                     passwd="?",
                     db="test")
cursor = db.cursor()

file = open('...../EM.txt', 'r')
file_content = file.read()
file.close()

query = "INSERT INTO EM VALUES (%s,%s,%s,%s,%s,%s)"
cursor.execute(query, (file_content,))
db.commit()
db.close()
Installing and importing the right database adapter is what allows Python to execute the SQL queries you are trying to run.
