I'm using a simple script to pull data from an Oracle DB and write it to a CSV file using the csv writer.
The table I'm querying contains about 25k records. The script runs correctly, but it's very slow: it takes 25 minutes to finish.
How could I speed this up by altering the code? Any tips from you heroes are welcome.
#
# Load libraries
#
from __future__ import print_function
import cx_Oracle
import time
import csv
#
# Connect to Oracle and select the proper data
#
con = cx_Oracle.connect('secret')
cursor = con.cursor()
sql = "select * from table"
#
# Determine how and where the filename is created
#
path = "c:\\path\\"
filename = time.strftime("%Y%m%d-%H%M%S")
extension = ".csv"
csv_file = open(path + filename + extension, "w")
writer = csv.writer(csv_file, delimiter=',', lineterminator="\n",
                    quoting=csv.QUOTE_NONNUMERIC)
cursor.execute(sql)
for row in cursor:
    writer.writerow(row)
cursor.close()
con.close()
csv_file.close()
Did you try the writerows function from the csv module? Instead of writing each record one by one, it lets you write them all at once, which should speed things up.
data = []  # data rows, as a list of dicts
with open('csv_file.csv', 'w', newline='') as csv_file:
    # DictWriter requires the column names up front; these are placeholders
    writer = csv.DictWriter(csv_file, fieldnames=['col1', 'col2'])
    writer.writeheader()
    writer.writerows(data)
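Applied to the cursor from the original script, a minimal sketch might look like the following (sqlite3 stands in for cx_Oracle here, since both follow the DB-API; the table, columns, and batch size are made up for illustration):

```python
import csv
import sqlite3

# Stand-in database; with cx_Oracle the connect/cursor calls are analogous (DB-API 2.0)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(i, "name%d" % i) for i in range(10)])

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    cur.execute("SELECT * FROM t")
    while True:
        rows = cur.fetchmany(1000)  # pull rows from the server in batches
        if not rows:
            break
        writer.writerows(rows)      # write each whole batch at once
con.close()
```

With cx_Oracle, raising `cursor.arraysize` before the fetch loop is the usual complementary tweak, since it controls how many rows each network round trip returns.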
Alternatively, you can use the pandas module to write a big chunk of data to a CSV file.
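A rough sketch of the pandas route, assuming pandas is installed (sqlite3 again stands in for the Oracle connection, and the table name is made up):

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, name TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

# read_sql pulls the whole result set into a DataFrame; to_csv writes it in one call
df = pd.read_sql("SELECT * FROM t", con)
df.to_csv("out_pandas.csv", index=False)
con.close()
```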
I am trying to debug an issue I have been having for a couple of weeks now. I am copying the result of a query on a PostgreSQL DB into a CSV file using psycopg2 and copy_expert, but when my script finishes running I sometimes end up with fewer rows than if I ran the query directly against the DB in pgAdmin. This is the code that runs the query and saves it into a CSV:
cursor = pqlconn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
fd = open("query.sql", 'r')
sql_query = fd.read()
fd.close()
csv_path = 'test.csv'
query = "copy (" + sql_query + \
        ") TO STDOUT WITH (FORMAT csv, DELIMITER ',', HEADER)"
with open(csv_path, 'w', encoding='utf-8') as f_output:
    cursor.copy_expert(query, f_output)
print("Saved information to csv: ", csv_path)
When it runs I will sometimes end up with fewer rows than if I ran it directly on the DB, and running it again still returns fewer rows than what I see in the DB directly. I would appreciate any guidance on this, thanks!
I have a large JSON data file of 3.7 GB. I am going to load the JSON file into a dataframe, delete the unused columns, then convert it to CSV and load it into SQL.
The machine has 40 GB of RAM.
My JSON file structure:
{"a":"Ho Chi Minh City, Vietnam","gender":"female","t":"841675194476","id":"100012998502085","n":"Lee Mến"}
{"t":"84945474479","id":"100012998505399","n":"Hoàng Giagia"}
{"t":"841679770421","id":"100012998505466","n":"Thoại Mỹ"}
I tried to load the data, but it fails because it runs out of memory:
import ijson

data_phone = []
with open('data.json', 'r', encoding="UTF-8") as f:
    numbers = ijson.items(f, 't', multiple_values=True)
    for num in numbers:
        data_phone.append(num)
It fails with the error:
Out of memory
So I tried another way:
import json

fb_data = {}
i = 1
with open('output.csv', 'w') as csv_file:
    with open("Vietnam_Facebook_Scrape.json", encoding="UTF-8") as json_file:
        for line in json_file:
            data = json.loads(line)
            try:
                csv_file.write('; '.join([str(i), "/", data["t"], data["fbid"]]))
            except:
                pass
Then I convert from CSV to SQL, and it still shows the error "MemoryError:":
import sqlite3 as db  # assuming "db" here is the sqlite3 module

con = db.connect("fbproject.db")
cur = con.cursor()
with open('output.csv', 'r', encoding="UTF-8") as csv_file:
    for item in csv_file:
        cur.execute('insert into fbdata values (?)', (item,))
con.commit()
con.close()
Thanks for reading
Your proposal is:
Step 1 read json file
Step 2 load to dataframe
Step 3 save file as a csv
Step 4 load csv to sql
Step 5 load data to django to search
The problem with your second example is that you still use global lists (data_phone, data_name), which grow over time.
Here's what you should try, for huge files:
Step 1 read json
line by line
do not save any data into a global list
write data directly into SQL
Step 2 Add indexes to your database
Step 3 use SQL from django
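A minimal sketch of steps 1 and 2, assuming the line-delimited JSON format shown in the question (sqlite3 here; the phones table and column choice are illustrative):

```python
import json
import sqlite3

# Two sample lines in the question's format, written here so the sketch is runnable
with open("data.json", "w", encoding="UTF-8") as f:
    f.write('{"t":"84945474479","id":"100012998505399","n":"Hoàng Giagia"}\n')
    f.write('{"t":"841679770421","id":"100012998505466","n":"Thoại Mỹ"}\n')

con = sqlite3.connect(":memory:")  # use a file path for a real database
con.execute("CREATE TABLE phones (id TEXT, t TEXT)")

with open("data.json", encoding="UTF-8") as json_file:
    for line in json_file:  # one JSON object per line; nothing accumulates in memory
        record = json.loads(line)
        con.execute("INSERT INTO phones VALUES (?, ?)",
                    (record.get("id"), record.get("t")))
con.commit()

# Step 2: add an index on the column you will search from Django
con.execute("CREATE INDEX idx_phones_t ON phones (t)")
rows = con.execute("SELECT t FROM phones ORDER BY id").fetchall()
```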
You don't need to write anything to CSV. If you really want to, you could simply write the file line by line:
import json

with open('output.csv', 'w') as csv_file:
    with open("Vietnam_Facebook_Scrape.json", encoding="UTF-8") as json_file:
        for line in json_file:
            data = json.loads(line)
            csv_file.write(';'.join([data['id'], data['t']]) + '\n')  # newline after each row
Here's a question which might help you (Python and SQLite: insert into table), in order to write to a database row by row.
If you want to use your CSV instead, make sure the program you use to convert CSV to SQL doesn't read the whole file, but parses it line by line or in batches.
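A hedged sketch of the batch variant: read the CSV in fixed-size chunks and insert each chunk with executemany (sqlite3 shown; the table, columns, and batch size are illustrative):

```python
import csv
import itertools
import sqlite3

# Small demo CSV so the sketch is self-contained
with open("output.csv", "w", newline="") as f:
    csv.writer(f, delimiter=";").writerows([("100012998505399", "84945474479"),
                                            ("100012998505466", "841679770421")])

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fbdata (id TEXT, t TEXT)")

BATCH = 1000
with open("output.csv", newline="", encoding="UTF-8") as f:
    reader = csv.reader(f, delimiter=";")
    while True:
        batch = list(itertools.islice(reader, BATCH))  # at most BATCH rows in memory
        if not batch:
            break
        con.executemany("INSERT INTO fbdata VALUES (?, ?)", batch)
con.commit()
count = con.execute("SELECT COUNT(*) FROM fbdata").fetchone()[0]
```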
I am new to Python and trying to write some code. I am running a select query, but I am not able to render the data to a CSV file.
This is the psql query:
# \copy (
# SELECT
# sr.imei,
# sensors.label,sr.created_at,
# sr.received_at,
# sr.type_id,
#
but how do I write it in Python so that it renders to a CSV file?
Thank you,
Vikas
sql = "COPY (SELECT * FROM sensor_readings WHERE reading=blahblahblah) TO STDOUT WITH CSV DELIMITER ';'"
with open("/tmp/sensor_readings.csv", "w") as file:
    cur.copy_expert(sql, file)
I think you just need to change the sql for your use, and it should work.
Install psycopg2 via pip install psycopg2, then you need something like this:
import csv
import psycopg2

query = """
SELECT
sr.imei,
sensors.label, sr.created_at,
sr.received_at,
sr.type_id,
sr.data FROM sensor_readings as sr LEFT JOIN sensors on sr.imei = sensors.imei
WHERE sr.imei not like 'test%' AND sr.created_at > '2019-02-01'
ORDER BY sr.received_at desc
"""

conn = psycopg2.connect(database="routing_template", user="postgres", host="localhost", password="xxxx")
cur = conn.cursor()
cur.execute(query)
with open('result.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',')
    for row in cur.fetchall():
        writer.writerow(row)
cur.close()
conn.close()
So I have a working piece of code which creates and modifies data in an SQL table. I now want to transfer all the data in the SQL table to an Excel file. Which libraries would I use, and what functions in those libraries?
An example with SQLite: the database is memory.db
and the table is called Table1 in the example.
import os
import csv
import sqlite3

def db2csv(file, table):
    con = sqlite3.connect("memory.db")
    cur = con.cursor()
    # Create the CSV file's parent folder if it does not exist
    folder = os.path.dirname(file)
    if folder and not os.path.exists(folder):
        os.makedirs(folder)
    with open(file, 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for row in cur.execute('SELECT * FROM ' + table):
            spamwriter.writerow(row)
    con.close()
I am trying to migrate some code from Python 2 to Python 3 and cannot figure out why it is printing one character at a time, as if it is reading the file as one long string.
I have been looking into it; maybe I need to use newline='' when opening the file?
But how can I do that when using urlopen()?
import csv
import urllib.request

url = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"
ftpstream = urllib.request.urlopen(url)
csvfile = ftpstream.read().decode('utf-8')
csvfile = csv.reader(csvfile, delimiter=',')
for row in csvfile:
    print(row)
Try changing
csvfile = ftpstream.read().decode('utf-8')
to
csvfile = ftpstream.read().decode('utf-8').splitlines()
csv.reader expects an iterable of lines; iterating over a plain string yields one character at a time, which is why each character was printed separately.
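Alternatively, the response can be wrapped in io.TextIOWrapper so csv.reader consumes it line by line without decoding the whole download into one string first. A sketch, with io.BytesIO standing in for the urlopen response (both are binary file-like objects):

```python
import csv
import io

# Stand-in for urllib.request.urlopen(url): any binary file-like object works
raw = io.BytesIO(b"street,price\r\n123 Main St,120000\r\n45 Oak Ave,95000\r\n")

# TextIOWrapper decodes the byte stream incrementally; newline="" lets the
# csv module handle line endings itself, as its docs recommend
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
rows = list(csv.reader(text))
```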