How to efficiently change fetched data from a Django raw query - Python

For my project, I needed to run a raw query to improve performance. The problem is that for one part I need to change the value of the fetched data to something else (a Gregorian date to Jalali), and this slows things down a lot.
cursor.execute("select date, number from my_db")
while True:
    row = cursor.fetchone()
    if row is None:
        break
    data.append(row)
This section takes about 1 minute for 4 million rows, but I need to change the date like this:
cursor.execute("select date, number from my_db")
while True:
    row = cursor.fetchone()
    if row is None:
        break
    row = list(row)
    row[0] = jdatetime.datetime.fromgregorian(datetime=row[0]).strftime('%y/%m/%d, %H:%M')
    data.append(row)
This causes my code to run in 7 minutes. I wonder if there is a way to make this change efficiently.

The first part should not take 1 minute to finish. Instead of raw SQL and a loop, do:
data = YourModel.objects.values_list('date', 'number')
For the second part, you can write your own SQL function so the conversion happens on the database side:
https://raresql.com/tag/iranian-calendar-to-georgian-calendar-in-sql-server/
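If the conversion has to stay in Python, caching it can also help a lot, since many fetched rows typically share the same date value. A minimal sketch of the idea using functools.lru_cache; the conversion function here is a stand-in so the example is self-contained, with a comment showing where the question's jdatetime call would slot in:

```python
import datetime
from functools import lru_cache

def cached_converter(convert):
    """Wrap an expensive per-value conversion in a cache, so it runs
    once per distinct value instead of once per fetched row."""
    return lru_cache(maxsize=None)(convert)

# In the question's code this would wrap jdatetime, e.g.:
#   to_jalali = cached_converter(lambda d: jdatetime.datetime
#       .fromgregorian(datetime=d).strftime('%y/%m/%d, %H:%M'))

# Stand-in conversion so the sketch runs on its own:
calls = []
def slow_convert(value):
    calls.append(value)                # record how often real work happens
    return value.strftime('%Y/%m/%d')

to_display = cached_converter(slow_convert)
rows = [datetime.date(2023, 3, 21)] * 3 + [datetime.date(2023, 3, 22)]
converted = [to_display(d) for d in rows]
# slow_convert ran only twice for the four rows
```

Whether this helps depends on how many distinct dates the 4 million rows contain; the fewer, the bigger the win.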

Related

Can I get only the updated data from the database instead of all the data?

I am using sqlite3 in Python 3 and I want to get only the updated data from the database. What I mean by that: the database already has 2 rows of data, and I add 2 more rows. How can I read only the newly added rows instead of all the rows?
Note: indexing may not help here because the number of rows updated will change.
def read_all():
    cur = con.cursor()
    cur.execute("SELECT * FROM CVT")
    rows = cur.fetchall()
    # print(rows[-1])
    assert cur.rowcount == len(rows)
    lastrowids = range(cur.lastrowid - cur.rowcount + 1, cur.lastrowid + 1)
    print(lastrowids)
If you insert rows one by one like this:
cursor.execute('INSERT INTO foo (xxxx) VALUES (xxxx)')
you can then retrieve the last inserted row's id:
last_inserted_id = cursor.lastrowid
BUT this works ONLY if you insert a single row with execute(). It returns None if you use it after an executemany().
If you are trying to get multiple ids of rows that were inserted at the same time, see that answer; it may help you.

Is there some way to save a MySQL column's value in a Python variable?

I'm trying to save a column value into a Python variable. I query my DB for an int in columns called id_mType and id_meter, depending on a value that I get from an XML. To test it I do the following (I'm new to using databases):
m = 'R1'
id_cont1 = 'LGZ0019800712'
xdb = cursor.execute("SELECT id_mType FROM mType WHERE m_symbol = %s", m)
xdb1 = cursor.execute("select id_meter from meter where nombre = %s",
                      id_cont1)
print(xdb)
print(xdb1)
I get the value "1" every time, whereas the id_mType for 'R1' is 3 and the id_meter for the id_cont1 value is 7. I need this to insert into another table (which has both FKs: id_meter and id_mType; I don't know if there is an easier way).
You can store it in a list. Is that okay?
results = cursor.fetchall()
my_list = []
for result in results:
    my_list.append(result[0])
Now my_list should hold the SQL column you get returned with your query.
Use the fetchone() method to fetch a row from the cursor. Note that execute() itself does not return a result set (in some drivers, including MySQLdb, it returns the row count, which is why you keep seeing "1"), so fetch from the cursor right after each execute():
row = cursor.fetchone()  # after the first execute()
if row:
    mtype = row[0]
row = cursor.fetchone()  # after the second execute()
if row:
    meter = row[0]
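A self-contained sqlite3 version of the same pattern (the question uses MySQL, where the placeholder is %s instead of ?, but the fetch logic is identical; the table contents here are made up to match the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE mType (id_mType INTEGER, m_symbol TEXT)")
cur.execute("INSERT INTO mType VALUES (3, 'R1')")

# execute() runs the query; the rows are then read from the cursor.
# Query parameters go in a tuple, even when there is only one.
cur.execute("SELECT id_mType FROM mType WHERE m_symbol = ?", ("R1",))
row = cur.fetchone()
mtype = row[0] if row else None
```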

How to match two Postgres queries in Python

I have two queries and I want to match the rows of those two queries, that is, I want to fetch the same number of rows from both queries. The code below fetches the dates of the present month; for the score I have to change a limit manually every day, which is not workable.
cursor.execute("select TO_CHAR(i :: DATE, 'dd/mm/yyyy') from generate_series(date_trunc('month', current_date), current_date, '1 day'::interval) i ")
# data = cursor.fetchone()
rows = cursor.fetchall()
labels6 = list()
i = 0
for row in rows:
    labels6.append(row[i])
Above is the code which fetches the dates of the current month.
cursor.execute("select score*100 from daily_stats1 where user_id=102")
rows = cursor.fetchall()
# Convert query to objects of key-value pairs
presentmonth1 = list()
i = 0
for row in rows[:28]:
    presentmonth1.append(row[i])
Above is the code which fetches the present month's scores. The '28' is hard-coded, and I have to change it manually every day, which is not workable. So I want a solution where the date rows match the score rows.
I assume the excess indentation in your code is a mistake.
If that is the case, I think this will solve your problem:
cursor.execute("select TO_CHAR(i :: DATE, 'dd/mm/yyyy') from "
               "generate_series(date_trunc('month', current_date), current_date, '1 day'::interval) i ")
labels6 = cursor.fetchall()
cursor.execute("select score*100 from daily_stats1 where user_id=102")
presentmonth1 = cursor.fetchall()[:len(labels6)]
I removed some unneeded code, but the result should be correct.
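To make the "match the lengths" step concrete, a small sketch with made-up data in place of the two query results:

```python
# Dates for the current month so far (what the generate_series query returns).
labels6 = ['01/06/2024', '02/06/2024', '03/06/2024']
# Scores: possibly more rows than there are days so far this month.
scores = [81.0, 75.5, 90.0, 66.0, 50.0]

# Truncate to the number of dates instead of a hard-coded 28.
presentmonth1 = scores[:len(labels6)]

# zip() pairs them up and stops at the shorter sequence anyway.
paired = list(zip(labels6, presentmonth1))
```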

Trying to change values in a field with a cursor

We are trying to change the values in a column from a shapefile to create a new field/column with the new values. We've tried using a search and update cursor to accomplish this as well as a for loop and field calculator. Does anyone have an idea for how to go about accomplishing this in a simplistic way?
Here's a sample of our script, in which we try to convert all of the items labeled "Montane_Chaparral" in the column to a new value "Dry_Forest" in the new column:
veg = "Cal_Veg.shp"
field = ["WHRNAME"]
therows = arcpy.UpdateCursor(veg, field)
for therow in therows:
    if (therow.getValue(field) == " "):
        therow.setValue("Montane_Chaparral", "Dry_Forest")
        therows.updateRow(therow)
print "Done."
I have also tried:
veg = "Cal_Veg.shp"
with arcpy.da.SearchCursor(veg, ["WHRNAME", "NewVeg"]) as SCursor:
    for row in SCursor:
        if row[0] == "Montane Chaparral":
            with arcpy.da.InsertCursor(veg, ["NewVeg"]) as ICursor:
                for new_row in ICursor:
                    NewVeg = "Dry Forest"
                    ICursor.insertRow(new_row)
print "done"
I would definitely recommend the more modern da Update Cursor for your script, for a variety of reasons, but mainly for speed and efficiency.
It appears as if you are incorrectly running a nested cursor where all you need is a single Update Cursor. Try this.
veg = "Cal_Veg.shp"
with arcpy.da.UpdateCursor(veg, ["WHRNAME", "NewVeg"]) as cursor:
    for row in cursor:
        if row[0] == "Montane Chaparral":  # row[0] references "WHRNAME"
            row[1] = "Dry Forest"          # row[1] references "NewVeg"
            cursor.updateRow(row)
print "done"

Python process abruptly killed during execution

I'm new to Python and am facing what seems to be a memory leak.
I've written a simple script that fetches multiple columns from a Postgres database, performs a simple subtraction on these columns, and stores the result in a temporary variable which is written to a file. I need to do this for multiple pairs of columns from the DB, and I'm using a list of lists to store the different column names.
I loop over the individual elements of this list until it is exhausted. While I'm getting valid results (by valid I mean that the output file contains the expected values) for the first few column pairs, the program abruptly gets "Killed" somewhere during execution. Code below:
varList = [['table1', 'col1', 'col2'],
           ['table1', 'col3', 'col4'],
           ['table2', 'col1', 'col2'],
           # ..
           # and many more such lines
           # ..
           ['table2', 'col3', 'col4']]

try:
    conn = psycopg2.connect(database='somename', user='someuser', password='somepasswd')
    c = conn.cursor()
    for listVar in varList:
        c.execute("SELECT %s FROM %s" % (listVar[1], listVar[0]))
        rowsList1 = c.fetchall()
        c.execute("SELECT %s FROM %s" % (listVar[2], listVar[0]))
        rowsList2 = c.fetchall()
        outfile = file('%s__%s' % (listVar[1], listVar[2]), 'w')
        for i in range(0, len(rowsList1)):
            if rowsList1[i][0] == None or rowsList2[i][0] == None:
                timeDiff = -1
            else:
                timestamp1 = time.mktime(rowsList1[i][0].timetuple())
                timestamp2 = time.mktime(rowsList2[i][0].timetuple())
                timeDiff = timestamp2 - timestamp1
            outfile.write(str(timeDiff) + '\n')
        outfile.close()
        del rowsList1, rowsList2
        # numpy.savetxt('output.dat', column_stack(rows))
except psycopg2.DatabaseError, e:
    print 'Error %s' % e
    sys.exit(1)
finally:
    if conn:
        conn.close()
My initial guess was that there was some form of memory leak, so in an attempt to fix it I added a del statement on the two large lists, hoping that the memory would be properly collected. This time I got slightly better output (slightly better meaning that more output files were created for the DB column pairs).
However, after the 10th or 11th pair of columns, my program was "Killed" again. Can someone tell me what could be wrong here? Is there a better way of getting this done?
Any help is appreciated.
PS: I know this is a fairly inefficient implementation, as I'm looping many times, but I needed something quick and dirty as a proof of concept.
I think the problem here is that you are selecting everything and then filtering it in the application code, when you should be selecting only what you want with the SQL query. If you select what you want in SQL, like this:
for listVar in varList:
    c.execute("SELECT %s, %s FROM %s WHERE %s IS NOT NULL AND %s IS NOT NULL"
              % (listVar[1], listVar[2], listVar[0], listVar[1], listVar[2]))
    rows = c.fetchall()
    # then...
    timeDiff = {}
    for row in rows:
        timestamp1 = time.mktime(row[0].timetuple())
        timestamp2 = time.mktime(row[1].timetuple())
        timeDiff[identifier] = timestamp2 - timestamp1  # still need to associate the diff with its row... maybe you need to also select a unique identifier?
    # and possibly a separate query (this may not be necessary depending on your
    # application code: do you really need -1's for irrelevant data, or can you
    # just return the important data?)
    c.execute("SELECT %s, %s FROM %s WHERE %s IS NULL OR %s IS NULL"
              % (listVar[1], listVar[2], listVar[0], listVar[1], listVar[2]))
    for row in c.fetchall():
        timeDiff[identifier] = -1  # or None
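A separate point worth checking: "Killed" usually means the kernel's OOM killer terminated the process, and fetchall() on a large table pulls every row into memory at once. Fetching in fixed-size batches keeps memory bounded; with psycopg2, a server-side (named) cursor, conn.cursor(name='...'), achieves the same effect. A self-contained sketch of the batching pattern, using sqlite3 and made-up data so it runs on its own:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(i, i + 5) for i in range(1000)])

# Stream rows in batches instead of fetchall(), so at most
# 100 rows are held in memory at any one time.
cur.execute("SELECT col1, col2 FROM t")
diffs = []
while True:
    batch = cur.fetchmany(100)
    if not batch:
        break
    for a, b in batch:
        diffs.append(b - a)
```

The same loop works unchanged with a psycopg2 cursor; with a named cursor, even the server avoids materializing the full result set at once.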
