How to make a rowcount in PonyORM? - Python

I am using this Python ORM to manage the database in my application: ponyorm.com
I just changed an attribute in my table, and I would like to do a rowcount that returns 1 for TRUE and 0 for FALSE.
For example, using sqlite3 directly I would do it like this:
user = conn.execute('SELECT * FROM users')
count = user.rowcount
if count == 1:
    print('Returned %d lines' % count)
else:
    print('Bad... returned %d lines' % count)

Using the rowcount attribute is usually not the right way to count the number of rows. According to the Python sqlite3 module's documentation of the rowcount attribute:
Although the Cursor class of the sqlite3 module implements this attribute, the database engine’s own support for the determination of “rows affected”/”rows selected” is quirky.
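For example, a minimal sketch showing the quirk: sqlite3 reports -1 for SELECT statements, because rows are only counted as they are fetched.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.execute("INSERT INTO users VALUES ('alice')")

cur = conn.execute('SELECT * FROM users')
print(cur.rowcount)          # -1: sqlite3 cannot determine it for a SELECT
print(len(cur.fetchall()))   # 1: counting fetched rows works, but is wasteful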
If you use SQL, the standard way to get the count of rows is the COUNT(*) function. In SQLite it may be achieved in the following way:
cursor = conn.cursor()
cursor.execute('SELECT COUNT(*) FROM users')
rowcount = cursor.fetchone()[0]
With PonyORM you can do the count in three alternative ways:
rowcount = select(u for u in User).count() # approach 1
rowcount = User.select().count() # approach 2
rowcount = count(u for u in User) # approach 3
All lines above should produce the same query. You can choose the line which looks the most intuitive to you.
If you want to count not all rows but only specific ones, you can add a condition to the query. For example, to count the number of products whose price is greater than 100, you can write any of the following lines:
rowcount = select(p for p in Product if p.price > 100).count() # approach 1
rowcount = Product.select(lambda p: p.price > 100).count() # approach 2
rowcount = count(p for p in Product if p.price > 100) # approach 3
Also, you may want to count not the number of rows but the number of different values in a specific column, for example the number of distinct user countries. This may be done in the following way:
user_countries_count = select(count(u.country) for u in User).get()
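For completeness, here is a minimal self-contained sketch of all of the above; the User entity and its attributes are hypothetical, just for illustration:
from pony.orm import Database, Required, db_session, select, count

db = Database()

class User(db.Entity):  # hypothetical entity, for illustration only
    name = Required(str)
    country = Required(str)

db.bind(provider='sqlite', filename=':memory:')
db.generate_mapping(create_tables=True)

with db_session:
    User(name='Alice', country='US')
    User(name='Bob', country='UK')
    print(select(u for u in User).count())               # 2
    print(count(u for u in User))                        # 2
    print(select(count(u.country) for u in User).get())  # 2 distinct countries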
Hope I answered your question.

Related

psycopg2 - sum of prices

I have a seemingly easy question, but I ran into a little problem. I would like to iterate through the price column in the products table and then sum the prices.
I know an easy solution would be to change the SQL query to sum(price), but in my exercise I need to avoid this solution.
import psycopg2

connection = psycopg2.connect(
    host='host',
    user='user',
    password='password',
    dbname='dbname',
)
cursor = connection.cursor()
sql = "select price from products"
cursor.execute(sql)
for price in cursor:
    print(sum(price))  # note: each row is a 1-tuple, so this prints every price separately
Figured it out (renamed the accumulator so it no longer shadows the built-in sum):
total = 0
for price in cursor:
    total = total + price[0]
print(total)
You can iterate over the cursor directly:
column_sum = sum(row[0] for row in cursor)
Or you can use one of the various "fetch" methods to access the results and save them to a variable first.
Your cursor has 3 methods for fetching:
fetchone() returns a single row
fetchmany(n) returns a list of up to n rows
fetchall() returns a list of all rows
results = cursor.fetchall()
column_sum = sum(row[0] for row in results)
Note that in all cases one row of data is a tuple. This is the case even if you're only selecting one column (a 1-tuple).
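For example, a rough sketch of the other two fetch methods, assuming the cursor and query from the question above:
# fetchone() returns one row at a time, or None when the results are exhausted
cursor.execute(sql)
first_row = cursor.fetchone()   # e.g. (9.99,) -- still a 1-tuple

# fetchmany(n) is handy for large result sets: sum in batches
cursor.execute(sql)
column_sum = 0
while True:
    batch = cursor.fetchmany(1000)
    if not batch:
        break
    column_sum += sum(row[0] for row in batch)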

Can I get only the updated data from the database instead of all the data?

I am using sqlite3 in Python 3 and I want to get only the updated data from the database. What I mean by that can be explained as follows: the database already has 2 rows of data, and I add 2 more rows. How can I read only the newly added rows instead of all the rows?
Note: indexing may not help here because the number of rows updated will change.
def read_all():
    cur = con.cursor()
    cur.execute("SELECT * FROM CVT")
    rows = cur.fetchall()
    # print(rows[-1])
    # caveat: sqlite3 reports rowcount as -1 for SELECT statements, so this
    # assert fails; lastrowid is only meaningful after an INSERT, not a SELECT
    assert cur.rowcount == len(rows)
    lastrowids = range(cur.lastrowid - cur.rowcount + 1, cur.lastrowid + 1)
    print(lastrowids)
If you insert rows one by one, like this:
cursor.execute('INSERT INTO foo (xxxx) VALUES (xxxx)')
you can then retrieve the last inserted row's id:
last_inserted_id = cursor.lastrowid
BUT it works ONLY if you insert a single row with execute. It returns None if you try to use it after an executemany.
If you are trying to get multiple ids of rows that were inserted at the same time, see this answer, which may help you.
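If the goal is to read only rows added since the last read, one alternative sketch (reusing the con connection from the question, and assuming the CVT table keeps SQLite's implicit rowid, i.e. it was not created WITHOUT ROWID) is to remember the highest rowid seen so far:
def read_new(last_seen_rowid):
    cur = con.cursor()
    cur.execute("SELECT rowid, * FROM CVT WHERE rowid > ?", (last_seen_rowid,))
    rows = cur.fetchall()
    if rows:
        last_seen_rowid = rows[-1][0]  # new high-water mark
    return rows, last_seen_rowid

rows, marker = read_new(0)       # first call reads everything
rows, marker = read_new(marker)  # later calls read only newly inserted rows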

How to get last record in sqlite3 db with where condition

I am trying to search records in a sqlite3 table to get the last record inserted, with a WHERE condition. I can do it with only one condition, WHERE CODE = df = "DS3243". But what I want is multiple WHERE conditions, jf = "QS2134", df = "DS3243", sf = "MS5787", so that I can get the last record inserted for each of the codes provided.
DEMONSTRATION
CODE POINT
QS2134 1000
DS3244 2000
MS5787 3000
QS2134 130
QS2134 200 # want to get this because it last with such code
DS3244 300
MS5787 4500
DS3244 860 # want to get this because it last with such code
MS5787 567
MS5787 45009 # want to get this because it last with such code
I am able to do it for only one variable, cur.execute("SELECT * FROM PROFILE WHERE CODE=? ORDER BY POINT ASC LIMIT 1", (df,)), but I want to do it for multiple variables.
import sqlite3

jf = "QS2134"
df = "DS3243"
sf = "MS5787"
con = sqlite3.connect("TEST.db")
cur = con.cursor()
cur.execute("SELECT * FROM PROFILE WHERE CODE=? ORDER BY POINT ASC LIMIT 1", (df,))  # LIMIT 1 returns a single row
rows = cur.fetchall()
for row in rows:
    print(row)
con.commit()
con.close()
I'm not sure I understand your question, but is it possible that you want to group the results?
Is it the "group by" clause that you're looking for?
Something like:
select CODE, MAX(POINT) from PROFILE group by CODE;
I think you are simply trying to extend your query, in which case, why don't you try string formatting?
x = "SELECT * FROM my_table where col1 = '{0}' or col2 ='{1}';".format(var_1, var_2)
cur.execute(x)
That way you can extend your query with as many conditions as you like.
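Note that string formatting opens the query to SQL injection if the values come from user input. A parameterized sketch that also picks the last inserted row per code (reusing cur from the question, and assuming "last" means highest rowid) could look like this:
codes = ("QS2134", "DS3243", "MS5787")
placeholders = ",".join("?" * len(codes))   # one "?" per code
query = ("SELECT CODE, POINT FROM PROFILE "
         "WHERE rowid IN (SELECT MAX(rowid) FROM PROFILE "
         "WHERE CODE IN (" + placeholders + ") GROUP BY CODE)")
cur.execute(query, codes)
for row in cur.fetchall():
    print(row)   # one row per code: the most recently inserted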

Fast complex SQL queries on PostgreSQL database with Python

I have a large dataset with 50M+ records in a PostgreSQL database that requires massive calculations and inner joins.
Python is the tool of choice with Psycopg2.
Running the process with fetchmany of 20,000 records takes a couple of hours to finish.
The execution needs to take place sequentially: each record of the 50M needs to be fetched separately, then another query (in the example below) needs to run before a result is returned and saved in a separate table.
Indexes are properly configured on each table (5 tables in total), and the complex query (which returns a calculated value, example below) takes around 240 ms to return results (when the database is not under load).
Celery is used to take care of database inserts of the calculated values in a separate table.
My question is about common strategies to reduce overall running time and produce results/calculations faster.
In other words, what is an effective way to go through all the records one by one, calculate the value of a field via a second query, and then save the result?
UPDATE:
There is an important piece of information that I unintentionally left out while trying to obfuscate sensitive details. Sorry about that.
The original SELECT query calculates a value aggregated from different tables as follows:
SELECT CR.gg, (AX.b + BF.f)/CR.d AS calculated_field
FROM table_one CR
LEFT JOIN table_two AX ON AX.x = CR.x
LEFT JOIN table_three BF ON BF.x = CR.x
WHERE CR.gg = '123'
GROUP BY CR.gg;
PS: the SQL query was written by our experienced DBA, so I trust that it is optimised.
Don't loop over records, calling the DBMS repeatedly for every record.
Instead, let the DBMS process large chunks (preferably: all) of the data,
and let it spit out all the results.
Below is a snippet of my twitter-sucker (with a rather complex, ugly query):
def fetch_referred_tweets(self):
    self.curs = self.conn.cursor()
    tups = ()
    selrefd = """SELECT twx.id, twx.in_reply_to_id, twx.seq, twx.created_at
    FROM (
        SELECT tw1.id, tw1.in_reply_to_id, tw1.seq, tw1.created_at
        FROM tt_tweets tw1
        WHERE 1=1
        AND tw1.in_reply_to_id > 0
        AND tw1.is_retweet = False
        AND tw1.did_resolve = False
        AND NOT EXISTS ( SELECT * FROM tweets nx
            WHERE nx.id = tw1.in_reply_to_id)
        AND NOT EXISTS ( SELECT * FROM tt_tweets nx
            WHERE nx.id = tw1.in_reply_to_id)
        UNION ALL
        SELECT tw2.id, tw2.in_reply_to_id, tw2.seq, tw2.created_at
        FROM tweets tw2
        WHERE 1=1
        AND tw2.in_reply_to_id > 0
        AND tw2.is_retweet = False
        AND tw2.did_resolve = False
        AND NOT EXISTS ( SELECT * FROM tweets nx
            WHERE nx.id = tw2.in_reply_to_id)
        AND NOT EXISTS ( SELECT * FROM tt_tweets nx
            WHERE nx.id = tw2.in_reply_to_id)
        -- ORDER BY tw2.created_at DESC
    ) twx
    LIMIT %s;"""
    # -- AND tw.created_at < now() - '15 min':: interval
    # -- AND tw.created_at >= now() - '72 hour':: interval
    count = 0
    uniqs = 0
    self.curs.execute(selrefd, (quotum_referred_tweets, ))
    tups = self.curs.fetchmany(quotum_referred_tweets)
    for tup in tups:
        if tup is None:
            break
        print('%d -->> %d [seq=%d] datum=%s' % tup)
        self.resolve_list.append(tup[0])  # this tweet
        if tup[1] not in self.refetch_tweets:
            self.refetch_tweets[tup[1]] = [tup[0]]  # referred tweet
            uniqs += 1
        count += 1
    self.curs.close()
Note: your query (in its original, pre-update form) makes no sense:
you only select fields from the er table
so, the two LEFT JOINed tables could be omitted
if ex and ef do contain multiple matching rows, the result set could be larger than just all the rows selected from er, resulting in duplicated er records
there is a GROUP BY present, but no aggregates are in the select list
select er.gg, er.z, er.y
from table_one er
where er.gg = '123'
-- or:
-- where er.gg >= '123'
-- and er.gg <= '456'
ORDER BY er.gg, er.z, er.y -- Or: some other ordering
;
Since you are doing a join in your query, the logical thing to do is to work around it, meaning create what's known as a summary table. This summary table, residing on the database, will hold the final joined dataset, so in your Python code you will just fetch/select data from it.
Another way is to use a materialized view.
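A rough sketch of the materialized-view route, reusing the table and column names from the query above (the view name calc_summary is hypothetical):
import psycopg2

conn = psycopg2.connect(host='host', user='user', password='password', dbname='dbname')
cur = conn.cursor()

# build the joined dataset once, on the server
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS calc_summary AS
    SELECT CR.gg, (AX.b + BF.f)/CR.d AS calculated_field
    FROM table_one CR
    LEFT JOIN table_two AX ON AX.x = CR.x
    LEFT JOIN table_three BF ON BF.x = CR.x
""")
conn.commit()

# after the base tables change, rebuild it server-side
cur.execute("REFRESH MATERIALIZED VIEW calc_summary")
conn.commit()

# Python then only reads precomputed rows
cur.execute("SELECT * FROM calc_summary WHERE gg = %s", ('123',))
rows = cur.fetchall()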
I took @wildplasser's advice and moved the calculation operation inside the database as a function.
The result has been impressively efficient, to say the least, and the total run time dropped to minutes/~1 hour.
To recap:
Database records are no longer fetched in the sequence mentioned earlier
Calculations happen inside the database via a PostgreSQL function
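The OP does not show the function itself; a hypothetical reconstruction of the set-based approach (one INSERT ... SELECT over all records instead of 50M separate round trips, assuming the psycopg2 conn and cur from above and a hypothetical results target table) might look like:
calc_all = """
CREATE OR REPLACE FUNCTION calc_all() RETURNS void AS $$
    INSERT INTO results (gg, calculated_field)
    SELECT CR.gg, (AX.b + BF.f)/CR.d
    FROM table_one CR
    LEFT JOIN table_two AX ON AX.x = CR.x
    LEFT JOIN table_three BF ON BF.x = CR.x;
$$ LANGUAGE sql;
"""
cur.execute(calc_all)            # define the server-side function once
cur.execute("SELECT calc_all()") # run the whole calculation in one call
conn.commit()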

Select single item in MYSQLdb - Python

I've been learning Python recently and have learned how to connect to a database and retrieve data using MySQLdb. However, all the examples show how to get multiple rows of data. I want to know how to retrieve only one row.
This is my current method.
cur.execute("SELECT number, name FROM myTable WHERE id='" + id + "'")
results = cur.fetchall()
number = 0
name = ""
for result in results:
number = result['number']
name = result['name']
It seems redundant to do for result in results: since I know there is only going to be one result.
How can I just get one row of data without using the for loop?
.fetchone() to the rescue:
result = cur.fetchone()
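A fuller sketch, assuming a DictCursor (the original code indexes rows by column name) and switching to a parameterized query to avoid the injection risk of string concatenation:
import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host='host', user='user', passwd='password',
                       db='dbname', cursorclass=MySQLdb.cursors.DictCursor)
cur = conn.cursor()

# %s placeholders let the driver escape the value safely
cur.execute("SELECT number, name FROM myTable WHERE id = %s", (id,))
result = cur.fetchone()          # a dict, or None if no row matched
if result is not None:
    number = result['number']
    name = result['name']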
Use .pop():
if results:
    result = results.pop()
    number = result['number']
    name = result['name']
