I am trying to follow a copy_from example described on Stack Overflow, but I modified it a little because I need to read the data from a CSV file. Following that example I wrote a small program where the file is read from disk and its data is copied into the created table. My code is:
def importFile():
    path = "C:\myfile.csv"
    curs = conn.cursor()
    curs.execute("Drop table if exists test_copy; ")
    data = StringIO.StringIO()
    data.write(path)
    data.seek(0)
    curs.copy_from(data, 'MyTable')
    print("Data copied")
But I get this error:
psycopg2.DataError: invalid input syntax for integer:
Does this mean there is a mismatch between the CSV file and my table? Or is this syntax enough to copy a CSV file, or do I need some more code? I am new to Python, so any help will be appreciated.
Look at your .csv file with a text editor. You want to be sure that
- the field separator is a tab character
- there are no quote chars
- there is no header row
If this is true, the following should work:
import psycopg2

def importFromCsv(conn, fname, table):
    with open(fname) as inf:
        conn.cursor().copy_from(inf, table)

def main():
    conn = ??  # set up database connection
    importFromCsv(conn, "c:/myfile.csv", "MyTable")
    print("Data copied")

if __name__ == "__main__":
    main()
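If your file is actually a regular comma-separated CSV with a header row, copy_expert lets you spell out the COPY options instead of relying on copy_from's tab-separated defaults. A minimal sketch, assuming the same connection and table as above:

import psycopg2

def importCsvWithHeader(conn, fname, table):
    # explicit CSV options: comma separator, quoted fields allowed, header row skipped
    sql = "COPY {} FROM STDIN WITH (FORMAT csv, HEADER true)".format(table)
    with open(fname) as inf:
        conn.cursor().copy_expert(sql, inf)
    conn.commit()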
I have this simple SQL query in a .sql file:
SELECT * FROM public.flores_comahue
WHERE codigo_postal::int > 7000
In this case I need to replace the number 7000, but it could be any other number.
I tried this, but obviously it didn't work:
fin = open("prueba.sql", "r")
fout = open("prueba.sql", "w")
for line in fin:
    for i in line:
        if isinstance(i, int):
            fout.write(fout.replace(i, 5))
fin.close()
fout.close()
I would really appreciate your help.
When you open a file in w mode, the file is truncated. So you emptied the file before you read it.
You should do the read and write as separate steps -- first read the whole thing, then open it for writing.
Another problem is your for i in line: loop. line is a string, so i is a single character (a string of length one); it will never be an int.
You can use a regular expression to find a number and replace it.
import re

with open("prueba.sql", "r") as fin:
    contents = fin.read()

contents = re.sub(r'\b\d+\b', '5000', contents)

with open("prueba.sql", "w") as fout:
    fout.write(contents)
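Note that \b\d+\b replaces every standalone number in the file. If the query may contain other numbers, a pattern anchored to the comparison itself is safer; you could swap the re.sub line above for:

contents = re.sub(r'(codigo_postal::int\s*>\s*)\d+', r'\g<1>5000', contents)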
If you are running the query from a script, a function like the one below may be a more useful way to change the query:
import pyodbc
import pandas as pd

def run_query(number):
    query = f"SELECT * FROM public.flores_comahue WHERE codigo_postal::int > {number}"
    conn = pyodbc.connect(server_connection)  # some connection
    results = pd.read_sql_query(query, conn)  # run query
    conn.close()
    return results
This is just an example of how it could be done, but in general constructing the query string should solve your issue.
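That said, interpolating the value directly into the SQL text is fragile (and unsafe if the number ever comes from user input). A safer sketch of the same function, binding the value with a ? placeholder (pyodbc's paramstyle); server_connection is still whatever connection string you already use:

import pyodbc
import pandas as pd

def run_query(number):
    # bind the value instead of formatting it into the SQL string
    query = "SELECT * FROM public.flores_comahue WHERE codigo_postal::int > ?"
    conn = pyodbc.connect(server_connection)  # some connection
    results = pd.read_sql_query(query, conn, params=[number])
    conn.close()
    return results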
I have to store a small PDF file in a Postgres database (I already have a table ready with a bytea column for the data), then be able to delete the file and later use the data in the database to restore the PDF as it was.
For context, I'm working with FastAPI in Python 3, so I can get the file as bytes or as a whole file. The main steps are:
- Getting the file as bytes or as a whole file via FastAPI
- Inserting it into the Postgres DB
- Retrieving the data from the DB
- Making a new PDF file from the data
How can I do that in a clean way?
The uploading function from FastAPI :
def import_data(file: UploadFile= File(...)):
# Put the whole data into a variable as bytes
pdfFile = file.file.read()
database.insertPdfInDb(pdfFile)
# Saving the file we just got to check if it's intact (it is)
file_name = file.filename.replace(" ", "-")
with open(file_name,'wb+') as f:
f.write(pdfFile)
f.close()
return {"filename": file.filename}
The function inserting the data into the Postgres DB :
def insertPdfInDb(pdfFile):
conn = connectToDb()
curs = conn.cursor()
curs.execute("INSERT INTO PDFSTORAGE(pdf, description) values (%s, 'Some description...')", (psycopg2.Binary(pdfFile),))
conn.commit()
print("PDF insertion in the database attempted.")
disconnectFromDb(conn)
return 0
The exporting part has only just been started and is entirely trial-and-error code.
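In case it helps, here is a minimal sketch of the export side, assuming the same connectToDb/disconnectFromDb helpers and PDFSTORAGE table as above; it just grabs one row, so adapt the query to select the PDF you actually want. psycopg2 returns the bytea column as a memoryview, which can be converted to bytes and written straight to a file:

def exportPdfFromDb(output_path):
    conn = connectToDb()
    curs = conn.cursor()
    # adapt the WHERE clause / LIMIT to pick the row you want
    curs.execute("SELECT pdf FROM PDFSTORAGE LIMIT 1")
    row = curs.fetchone()
    disconnectFromDb(conn)
    if row is None:
        return None
    # bytea comes back as a memoryview; bytes() makes it writable to disk
    with open(output_path, 'wb') as f:
        f.write(bytes(row[0]))
    return output_path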
I have a large JSON data file of 3.7 GB. I am going to load the JSON file into a dataframe, delete unused columns, then convert it to CSV and load it into SQL.
My RAM is 40 GB.
My JSON file structure:
{"a":"Ho Chi Minh City, Vietnam","gender":"female","t":"841675194476","id":"100012998502085","n":"Lee Mến"}
{"t":"84945474479","id":"100012998505399","n":"Hoàng Giagia"}
{"t":"841679770421","id":"100012998505466","n":"Thoại Mỹ"}
I tried to load the data, but it fails with an out-of-memory error:
import ijson

data_phone = []
with open('data.json', 'r', encoding="UTF-8") as f:
    numbers = ijson.items(f, 't', multiple_values=True)
    for num in numbers:
        data_phone.append(num)
It shows the error:
Out of memory
I tried another way:
import json

fb_data = {}
i = 1
with open('output.csv', 'w') as csv_file:
    with open("Vietnam_Facebook_Scrape.json", encoding="UTF-8") as json_file:
        for line in json_file:
            data = json.loads(line)
            try:
                csv_file.write('; '.join([str(i), "/", data["t"], data["fbid"]]))
            except:
                pass
Then I convert from CSV to SQL, and it still shows the error "MemoryError:":
con = db.connect("fbproject.db")
cur = con.cursor()
with open('output.csv', 'r', encoding="UTF-8") as csv_file:
    for item in csv_file:
        cur.execute('insert into fbdata values (?)', (item,))
con.commit()
con.close()
Thanks for reading
Your proposal is:
Step 1: read the JSON file
Step 2: load it into a dataframe
Step 3: save the file as a CSV
Step 4: load the CSV into SQL
Step 5: load the data into Django to search
The problem with your first example is that you still append everything to a global list (data_phone), which grows over time.
Here's what you should try for huge files:
Step 1: read the JSON
- line by line
- do not save any data into a global list
- write the data directly into SQL (a sketch follows after this list)
Step 2: add indexes to your database
Step 3: use SQL from Django
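Here is a minimal sketch of step 1, assuming an SQLite database like your fbproject.db; the three-column fbdata table is an assumption here (your real schema may differ). Rows are parsed one line at a time and flushed in batches, so memory use stays flat:

import json
import sqlite3

con = sqlite3.connect("fbproject.db")
cur = con.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS fbdata (id TEXT, t TEXT, n TEXT)")

batch = []
with open("Vietnam_Facebook_Scrape.json", encoding="UTF-8") as json_file:
    for line in json_file:
        data = json.loads(line)
        # keep only the columns you need; .get() tolerates missing keys
        batch.append((data.get("id"), data.get("t"), data.get("n")))
        if len(batch) >= 10000:
            cur.executemany("INSERT INTO fbdata VALUES (?, ?, ?)", batch)
            con.commit()
            batch = []

if batch:
    cur.executemany("INSERT INTO fbdata VALUES (?, ?, ?)", batch)
    con.commit()
con.close()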
You don't need to write anything to CSV. If you really want to, you could simply write the file line by line:
import json

with open('output.csv', 'w') as csv_file:
    with open("Vietnam_Facebook_Scrape.json", encoding="UTF-8") as json_file:
        for line in json_file:
            data = json.loads(line)
            # one output line per JSON record
            csv_file.write(';'.join([data['id'], data['t']]) + '\n')
Here's a question which might help you (Python and SQLite: insert into table), in order to write to a database row by row.
If you want to use your CSV instead, be sure that the program you use to convert CSV to SQL doesn't read the whole file, but parses it line by line or in batches.
I want to fetch the SQL query from a text file and run it in a Python program. This is my code:
csvfilelist = os.listdir(inputPath)
mycursor = mydb.cursor()
for csvfilename in csvfilelist:
    with open(inputPath + csvfilename, 'r') as csvFile:
        reader = csv.reader(csvFile)
        for row in reader:
            '''r = "INSERT INTO Terminate.RAW VALUES('%s','%s','%s','%s','%s')" %(row[0],row[1],row[2],row[3],row[4],row[5])'''
            try:
                result = mycursor.execute(r)
                mydb.commit()
            except mysql.connector.Error as err:
                print(err)
    csvFile.close()
Say you have an INI file containing the query:
[main]
query=INSERT INTO Terminate.RAW VALUES('%s','%s','%s','%s','%s')
you may load it with configparser:
import configparser

config = configparser.ConfigParser()
config.read('myfile.ini')
query = config['main']['query']
and later you can call it with
r = query % (row[0],row[1],row[2],row[3],row[4],row[5])
As pointed out in the comments, using "%" in queries is not a good solution; you should bind your variables when executing the query. I don't remember the exact syntax, but it's something like:
r = query
mycursor.execute(r, (row[0],row[1],row[2],row[3],row[4],row[5]))
Edit: sorry, I just read that your file is JSON, not INI. You wrote that in the title, not in the post. If so, you should use the json module instead of the configparser module.
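For example, with a JSON file laid out something like {"query": "INSERT INTO Terminate.RAW VALUES (%s,%s,%s,%s,%s)"} (both the file name and the layout here are assumptions, since the file itself isn't shown), the loading and binding side could look like this, reusing mydb, mycursor and reader from your code above; note the placeholders are not quoted when you bind:

import json

with open('myquery.json') as f:
    query = json.load(f)['query']

for row in reader:
    # values are bound by the driver, so no quoting or string formatting is needed
    mycursor.execute(query, (row[0], row[1], row[2], row[3], row[4]))
mydb.commit()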
I am trying to load data into an Oracle table. Here is my previous question, where I posted the whole code showing how I am doing that.
This is the code:
import csv
import cx_Oracle

def Create_list():
    reader = csv.reader(open("Query_result_combined.csv", "r"))
    lines = []
    print("Creating a list")
    for line in reader:
        lines.append(line)
    return lines

def Insert_data():
    # connection logic goes here
    print("Connecting Now!!")
    # conn = cx_Oracle.connect(connstr)
    con = cx_Oracle.connect(db_user, db_password, db_connection_name)
    print("Connected to Oracle!!")
    lines = Create_list()
    cur = con.cursor()
    print("Inserting data")
    for line in lines:
        cur.execute("INSERT INTO A608232_QUERY_RESULT (InteractionId,QueryId,Score,StartOffsetInMs,EndOffsetInMs,SpeakerRole,QueryIdentity,SpeakerId) VALUES(:1,:2,:3,:4,:5,:6,:7,:8)", line)
    con.commit()
    cur.close()
    print("completed")
So I rectified all the approaches suggested in the answer to my question, and that error has been solved. Now I am getting this new error: ORA-01722: invalid number.
When I load the data directly into Oracle using the import option, the data gets loaded. But when I read the same file in this code and try to push the data, I get this error.
When I print lines[:1] and lines[:2], this is the output I get:
[['InteractionId', 'QueryId', 'Score', 'StartOffsetInMs', 'EndOffsetInMs',
'SpeakerRole', 'QueryIdentity', 'SpeakerId']]
[['InteractionId', 'QueryId', 'Score', 'StartOffsetInMs', 'EndOffsetInMs',
'SpeakerRole', 'QueryIdentity', 'SpeakerId'], ['34118470', '27', '45.63345',
'89900', '90980', 'U', 'e54fd492-8877-4534-997b-9dbe9a8fbd74', '']]
Inserting data
Can someone please point out the mistake I am making in the code?
You have a header line in your csv file, which is not part of the numerical data. You have to skip it with next(reader) just after creating the csv.reader on the file handle.
Aside: use a context manager to ensure the file will be closed, and there is no need for a loop; once the first line has been skipped, just convert the row iterator to a list to read all the remaining lines (this will also be faster):
def Create_list():
    with open("Query_result_combined.csv", "r") as f:
        reader = csv.reader(f)
        next(reader)  # skips the header line
        return list(reader)  # directly consumes the rest of the lines
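With the header skipped, you could also let cx_Oracle push all rows in one round trip using executemany instead of calling execute once per row; a sketch of the same Insert_data, keeping your table and bind positions (untested against your schema):

def Insert_data():
    print("Connecting Now!!")
    con = cx_Oracle.connect(db_user, db_password, db_connection_name)
    print("Connected to Oracle!!")
    lines = Create_list()
    cur = con.cursor()
    print("Inserting data")
    # one executemany call binds every row in a single round trip
    cur.executemany(
        "INSERT INTO A608232_QUERY_RESULT "
        "(InteractionId,QueryId,Score,StartOffsetInMs,EndOffsetInMs,"
        "SpeakerRole,QueryIdentity,SpeakerId) "
        "VALUES(:1,:2,:3,:4,:5,:6,:7,:8)",
        lines)
    con.commit()
    cur.close()
    print("completed")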