I'm trying to fetch a list of timestamps from MySQL in Python. Once I have the list, I check the current time and see which timestamps are more than 15 minutes old. Once I have those, I would really like a final total number. This seems more challenging to pull off than I had originally thought.
So, I'm using this to fetch the list from MySQL:
db = MySQLdb.connect(host=server, user=mysql_user, passwd=mysql_pwd, db=mysql_db, connect_timeout=10)
cur = db.cursor()
cur.execute("SELECT heartbeat_time FROM machines")
row = cur.fetchone()
print row
while row is not None:
    print ", ".join([str(c) for c in row])
    row = cur.fetchone()
cur.close()
db.close()
>> 2016-06-04 23:41:17
>> 2016-06-05 03:36:02
>> 2016-06-04 19:08:56
And this is the snippet I use to check whether they are more than 15 minutes old:
fmt = '%Y-%m-%d %H:%M:%S'
d2 = datetime.strptime('2016-06-05 07:51:48', fmt)
d1 = datetime.strptime('2016-06-04 23:41:17', fmt)
d1_ts = time.mktime(d1.timetuple())
d2_ts = time.mktime(d2.timetuple())
result = int(d2_ts - d1_ts) / 60
if str(result) >= 15:
    print "more than 15m ago"
I'm at a loss as to how to combine these, though. Also, now that I put it in writing, there must be an easier/better way to filter these?
Thanks for the suggestions!
You could incorporate the 15-minute check directly into your SQL query. That way there is no need to mess around with timestamps in Python, and IMO the code is far easier to read.
If you need data from other columns of your table:
SELECT * FROM machines WHERE NOW() > heartbeat_time + INTERVAL 15 MINUTE;
If the total count is the only thing you are interested in:
SELECT count(*) FROM machines WHERE NOW() > heartbeat_time + INTERVAL 15 MINUTE;
That way you can do a cur.fetchone() and get either None or a tuple where the first value is the number of rows with a timestamp older than 15 minutes.
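On the Python side that would look roughly like this (a minimal sketch, reusing the db connection from the question):
cur = db.cursor()
cur.execute("SELECT COUNT(*) FROM machines WHERE NOW() > heartbeat_time + INTERVAL 15 MINUTE")
row = cur.fetchone()                       # COUNT(*) comes back as a single row
stale_total = row[0] if row is not None else 0
print stale_total                          # the final total number of stale machines
cur.close()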
For iterating over a resultset it should be sufficient to write
cur.execute('SELECT * FROM machines')
for row in cur:
    print row
because the base cursor already behaves like an iterator using .fetchone().
(all assuming you have timestamps in your DB as you stated in the question)
#user5740843: if str(result) >= 15: will not work as intended. This will always be True because of the str().
I assume the heartbeat_time field is a datetime field.
import datetime

import MySQLdb
import MySQLdb.cursors

db = MySQLdb.connect(host=server, user=mysql_user, passwd=mysql_pwd, db=mysql_db, connect_timeout=10,
                     cursorclass=MySQLdb.cursors.DictCursor)
cur = db.cursor()
ago = datetime.datetime.utcnow() - datetime.timedelta(minutes=15)
try:
    cur.execute("SELECT heartbeat_time FROM machines")
    for row in cur:
        if row['heartbeat_time'] <= ago:
            print row['heartbeat_time'], 'more than 15 minutes ago'
finally:
    cur.close()
    db.close()
If the data size is not that huge, loading it all into memory is good practice, as it releases the result buffer on the MySQL server. And for DictCursor, there is no real difference between
rows = cur.fetchall()
for r in rows:
    ...
and
for r in cur:
    ...
They both load the data onto the client. MySQLdb.SSCursor and SSDictCursor will transfer data as needed, provided the MySQL server supports it.
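If the data does turn out to be huge, a server-side cursor is the usual way to stream it; a minimal sketch, assuming the same connection parameters as in the question:
import MySQLdb
import MySQLdb.cursors

# SSDictCursor leaves the result set on the MySQL server and streams rows as they are read
db = MySQLdb.connect(host=server, user=mysql_user, passwd=mysql_pwd, db=mysql_db,
                     connect_timeout=10, cursorclass=MySQLdb.cursors.SSDictCursor)
cur = db.cursor()
try:
    cur.execute("SELECT heartbeat_time FROM machines")
    for row in cur:                        # rows are fetched from the server as needed
        print row['heartbeat_time']
finally:
    cur.close()                            # exhaust/close the cursor before reusing the connection
    db.close()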
Related
I'm trying to read data from an Oracle DB.
I have to read in Python the results of a simple select that returns a million rows.
I use the fetchall() function, changing the arraysize property of the cursor.
import time

import cx_Oracle

select_qry = db_functions.read_sql_file('src/data/scripts/03_perimetro_select.sql')
dsn_tns = cx_Oracle.makedsn(ip, port, sid)
con = cx_Oracle.connect(user, pwd, dsn_tns)

start = time.time()
cur = con.cursor()
cur.arraysize = 1000
cur.execute('select * from bigtable where rownum < 10000')
res = cur.fetchall()
# print res  # uncomment to display the query results
elapsed = (time.time() - start)
print(elapsed, " seconds")
cur.close()
con.close()
If I remove the where condition where rownum < 10000, the Python environment freezes and the fetchall() function never ends.
After some trials I found a limit for this particular select: it works up to 50k rows, but it fails if I select 60k rows.
What is causing this problem? Do I have to find another way to fetch this amount of data, or is the problem the ODBC connection? How can I test it?
Consider running in batches using Oracle's ROWNUM. To combine the batches back into a single object, append to a growing list. The below assumes the total row count for the table is 1 million; adjust as needed:
table_row_count = 1000000
batch_size = 10000

# PREPARED STATEMENT (Oracle table aliases take no AS keyword, and the
# statement passed to cursor.execute() must not end with a semicolon)
sql = """SELECT t.* FROM
           (SELECT sub_t.*, ROWNUM AS row_num
            FROM
              (SELECT * FROM bigtable ORDER BY primary_id) sub_t
           ) t
         WHERE t.row_num BETWEEN :LOWER_BOUND AND :UPPER_BOUND"""

data = []
for lower_bound in range(0, table_row_count, batch_size):
    # BIND PARAMS WITH BOUND LIMITS (ROWNUM starts at 1)
    cur.execute(sql, {'LOWER_BOUND': lower_bound + 1,
                      'UPPER_BOUND': lower_bound + batch_size})
    for row in cur.fetchall():
        data.append(row)
You are probably running out of memory on the computer running cx_Oracle. Don't use fetchall() because this will require cx_Oracle to hold all results in memory. Use something like this to fetch batches of records:
cursor = connection.cursor()
cursor.execute("select employee_id from employees")
res = cursor.fetchmany(numRows=3)
print(res)
res = cursor.fetchmany(numRows=3)
print(res)
Stick the fetchmany() calls in a loop, process each batch of rows in your app before fetching the next set of rows, and exit the loop when there is no more data.
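A minimal sketch of such a loop (process() is a hypothetical placeholder for your own handling):
cursor = connection.cursor()
cursor.arraysize = 1000                    # fetchmany() defaults to arraysize rows per batch
cursor.execute("select * from bigtable")
while True:
    rows = cursor.fetchmany()              # fetch the next batch
    if not rows:
        break                              # no more data
    for row in rows:
        process(row)                       # placeholder: handle one row at a time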
Whatever solution you use, tune cursor.arraysize to get the best performance.
The already given suggestion to repeat the query and select subsets of rows is also worth considering. If you are using Oracle DB 12 there is a newer (easier) syntax like SELECT * FROM mytab ORDER BY id OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY.
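With that syntax, batching could look roughly like this (a sketch, assuming Oracle 12c or later and the primary_id ordering column from the earlier answer):
batch_size = 10000
offset = 0
data = []
while True:
    cur.execute("SELECT * FROM bigtable ORDER BY primary_id "
                "OFFSET :off ROWS FETCH NEXT :batch ROWS ONLY",
                off=offset, batch=batch_size)
    rows = cur.fetchall()
    if not rows:
        break                              # no more rows to page through
    data.extend(rows)
    offset += batch_size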
PS cx_Oracle does not use ODBC.
This is my R piece of code, but I want to do the same thing in Python. As I am new to it and having problems writing the correct code, can anybody guide me on how to write this in Python? I have already made the database connection and also tried simple queries, but here I am struggling.
sql_command <- "SELECT COUNT(DISTINCT Id) FROM \"Bowlers\";"
total<-as.numeric(dbGetQuery(con, sql_command))
data<-setNames(data.frame(matrix(ncol=8,
nrow=total)),c("Name","Wkts","Ave","Econ","SR","WicketTaker","totalovers",
"Matches"))
for (i in 1:total){
sql_command <- paste("SELECT * FROM \"Bowlers\" where Id = ", i ,";",
sep="")
p<-dbGetQuery(con, sql_command)
p[is.na(p)] <- 0
data$Name[i] = p$bowler[1]
}
After this, which works fine, how should I proceed to write the loop code:
with engine.connect() as con:
    rs = con.execute('SELECT COUNT(DISTINCT id) FROM "Bowlers"')
    for row in rs:
        print (row)
Use the format method for strings in Python to achieve it.
I am using PostgreSQL, but your connection should be similar. Something like:
connect to test database:
import psycopg2
con = psycopg2.connect("dbname='test' user='your_user' host='your_host' password='your_password'")
cur = con.cursor() # cursor method may differ for your connection
loop over your ids:
for i in range(1, total+1):
    sql_command = 'SELECT * FROM "Bowlers" WHERE id = {}'.format(i)

    cur.execute(sql_command)  # execute and fetchall method may differ
    rows = cur.fetchall()     # check for your connection

    print ("output first row for id = {}".format(i))
    print (rows[0])  # sanity check, printing first row for ids
    print('\n')      # rows is a list of tuples
    # you can convert them into numpy arrays
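As a side note, since i is just a loop counter here, the format() call is safe, but passing the value as a query parameter is the more robust habit; a minimal psycopg2-style sketch:
for i in range(1, total+1):
    cur.execute('SELECT * FROM "Bowlers" WHERE id = %s', (i,))  # %s is a bind placeholder, not string formatting
    rows = cur.fetchall()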
I got about 15 days of data, from which I would like to query the last 5 days. I would also like to work with epochTime in my conditions.
Here is what I'm using right now:
To get x values every 200 rows, a very fast method working with rowid only:
import sqlite3 as sql  # assuming the sql alias refers to the sqlite3 module

connection = sql.connect("/home/pi/data.db")
cursor = connection.cursor()
cursor.execute("SELECT epochTime,x from (select rowid,epochTime,x from table1 where (rowid % 200 = 0)) order by rowid ASC")
x = cursor.fetchall()
First try, using a classic timestamp; however, I'm surprised this is taking way more time than solution 1 (about +2 seconds when fetching the 15 days). Am I doing it wrong?
cursor.execute("SELECT timestamp, epochTime, x from (select rowid,epochTime,x from table1 where (rowid % 200 = 0) and timestamp>(select datetime('now','-5 day'))) order by rowid ASC")
Finally, a first attempt using epochTime, but it's not working:
cursor.execute("SELECT epochTime, x from (select rowid,epochTime,x from table1 where (rowid % 200 = 0) and epochTime>(select datetime('now','-68400'))) order by rowid ASC")
Note: here is how epochTime is parsed. I had to add %s100 to work with Highcharts; maybe that is causing an issue for the sqlite conditions?
now = datetime.datetime.now()
epochTime = now.strftime("%s100")
Does working with a "time" condition automatically slow down the request?
How should I write this to work properly with Unix epoch time?
Thanks a lot!
I'm using the adaptor pg8000 to read in records in my db with the following code:
cursor = conn.cursor()
results = cursor.execute(
    "SELECT * from data_" + country + " WHERE date >= %s AND date <= %s AND time >= %s AND time <= %s",
    (datetime.date(fromdate[0], fromdate[1], fromdate[2]),
     datetime.date(todate[0], todate[1], todate[2]),
     datetime.time(fromtime[0], fromtime[1]),
     datetime.time(totime[0], totime[1])))
results = cursor.fetchall()
The problem emerges when I select a date range that brings in, say, 100 records. It's not a huge number of records, but it is enough to cause the following issue, and I cannot see where the issue might come from, as it seems to depend on the number of records brought back. For example, for a smaller date range, results = cursor.fetchall() appears to work just fine and returns one result.
The error message I get is:
File "/mnt/opt/Centos5.8/python-2.7.8/lib/python2.7/site-packages/pg8000/core.py", line 1650, in handle_messages
raise error
pg8000.errors.ProgrammingError: ('ERROR', '34000', 'portal "pg8000_portal_0" does not exist')
Despite exploring, I cannot find a way of fixing this.
When using fetchmany(), here are the results:
results = cursor.fetchmany(100) WORKS - limited to 100
results = cursor.fetchmany(101) FAILS - same error as above
In autocommit mode you can't retrieve more rows than the pg8000 cache holds (which is 100 by default).
I've made a commit that gives a better error message when this happens, which will be in the next release of pg8000.
The reason is that if the number of rows returned by a query is greater than the number of rows in the pg8000 cache, the database portal is kept open, and then when the cache is empty, more rows are fetched from the portal. A portal can only exist within a transaction, so in autocommit mode a portal is immediately closed after the first batch of rows is retrieved. If you try and retrieve a second batch from the portal, you get the 'portal does not exist' error that was reported in the question.
It appears this can be solved by setting:
conn.autocommit = False
Now the code looks like:
conn.autocommit = False
cursor = conn.cursor()
results = cursor.execute(
    "SELECT * from data_" + country + " WHERE date >= %s AND date <= %s AND time >= %s AND time <= %s",
    (datetime.date(fromdate[0], fromdate[1], fromdate[2]),
     datetime.date(todate[0], todate[1], todate[2]),
     datetime.time(fromtime[0], fromtime[1]),
     datetime.time(totime[0], totime[1])))
results = cursor.fetchall()
I'm not sure why this should be the case, but there seems to be a limit of 100 records when autocommit is set to True.
I'm using pyodbc to connect to a Teradata database and it seems that something is not working properly:
This:
conn = connect(params)
cur = conn.cursor()

if len(argv) > 1:
    query = ''.join(open(argv[1]).readlines())
else:
    query = "SELECT count(*) FROM my_table"

cur.execute(query)
print "...done"
print cur.fetchall()
returns what seems to be an overflow, a number like 140630114173190, but in fact there are only 260 entries in the table (which I do get when querying directly in SQL Assistant from Teradata).
However, when doing a select * the result seems to be correct.
Any idea of what could be going on?
Running on:
Linux eron-redhat-100338 2.6.32-131.0.15.el6.x86_64
Thanks
EDIT: I don't think this is a fetchall() issue. That's only going to change whether I get a list, or a tuple, or whatever, but the number won't change.
Interestingly, I discovered that changing to
query = "SELECT CAST(count(*) AS DECIMAL(10,2)) FROM my_table"
does get the right number, only as a float. Something is going on with the integers.
fetchall() returns a recordset; since you need the first column of the first record, you should use something like:
print('# of rows: [%s]' % (c.fetchall()[0][0]))
or:
for row in c.fetchall():
    print('# of rows: [%s]' % (row[0]))