postgres query to extract 1 hour old data using python

I am trying to run the code below to extract data from the last hour, filtering on start_time and end_time. However, it fails with this error:
not enough arguments for format string
I cannot understand the error.
curr = datetime.now()
starttime = modules.rounder(curr)
print("Current time rounded off : ", modules.rounder(curr))
cursor.execute("select node_name, node_ip, object_name, start_time, end_time, report_type, rxgemidle, rxploams, rxdroppedtoolong, txploams, fectotal_s, rxpacketsdropped, fec0to1_s, rxallocationsdisabled, rxcrcerrors, rxgem, rxploamserror, fecpost_s, rxgemdropped, rxfeccodewordsuncorrected, txomci, rxbip8bytes, rxploamsdropped, txcpu, rxfragmentserrors, rxploamsnonidle, rxomci, rxfeccodewords, section_interval_valid, fecpre_s, rxbip8errors, rxgemcorrected, rxgemillegal, fec1to0_s, rxkeyerrors, txdroppedtpidmiss, txdroppedillegallength, txgem, fecr_s, rxallocationsvalid, txdroppedvidmiss, rxallocationsinvalid, rxcpu, rxdroppedtooshort, time_interval, data_time from %s where start_time >= %s - interval '1 hour' and end_time <= %s" %data % starttime )
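The error most likely comes from how `%` is applied: the query string has three `%s` placeholders, and `% data % starttime` is evaluated left to right, so the first `%` receives a single value for three placeholders. A minimal sketch of one way around it is below, assuming a DB-API driver such as psycopg2 that accepts `%s` parameters in `cursor.execute(sql, params)`. The helper name `build_hourly_query`, the allow-list, the table name `gpon_stats`, and the shortened column list are hypothetical and only for illustration.

```python
from datetime import datetime

# Hypothetical allow-list: a table name cannot be sent as a query parameter,
# so it is checked before being placed into the SQL text.
ALLOWED_TABLES = {"gpon_stats"}


def build_hourly_query(table, starttime):
    """Return (sql, params) for rows inside the hour that ends at starttime."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    sql = (
        f"select node_name, start_time, end_time from {table} "
        "where start_time >= %s - interval '1 hour' and end_time <= %s"
    )
    # One value per remaining %s placeholder, passed together as a tuple.
    return sql, (starttime, starttime)


sql, params = build_hourly_query("gpon_stats", datetime(2022, 7, 22, 6, 0))
print(sql)
print(params)
# With a real connection this would be run as: cursor.execute(sql, params)
```

Passing the values as a separate tuple lets the driver do the quoting, which also avoids building the timestamp into the SQL string by hand.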

Related

discord.py How can I send a message if difference between the time of message and current time is above time I specified?

timestamp = f'{message.created_at}'
msg_time = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S.%f')
now_time = datetime.now()
diff = now_time - msg_time
print('msg_time:', msg_time)
print('now_time:', now_time)
print('diff :', diff)
output :
msg_time: 2022-07-22 06:02:12.934000
now_time: 2022-07-23 01:53:52.375086
diff : 19:51:39.441086
If diff is greater than the time I specify, I want it to send a message to the channel like this:
if diff > 00:01:00.000000:
    title2 = "test."
    embed2 = discord.Embed(title=title2, color=0xf1c40f)
    msg = await channels.send(embed=embed2)
I wrote this so that the message is sent when the difference is longer than 1 minute, but I don't know the correct way to write the comparison, so it doesn't work. How can I do it?

Try using timedelta.total_seconds():
diff = diff.total_seconds()
This converts the time difference into a float giving the duration in seconds, which is much easier to compare against a threshold.
References:
https://pythontic.com/datetime/timedelta/total_seconds
https://www.geeksforgeeks.org/python-timedelta-total_seconds-method-with-example/
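As a small sketch of the comparison itself: a `timedelta` can be compared with another `timedelta` directly, or converted with `total_seconds()` and compared with a number. The helper name `is_older_than` is made up for this example, and the two timestamps are the ones printed in the question.

```python
from datetime import datetime, timedelta


def is_older_than(msg_time, now_time, limit=timedelta(minutes=1)):
    """Return True when more than `limit` lies between the two times."""
    return now_time - msg_time > limit


msg_time = datetime(2022, 7, 22, 6, 2, 12, 934000)
now_time = datetime(2022, 7, 23, 1, 53, 52, 375086)

# Option 1: compare timedelta objects directly.
print(is_older_than(msg_time, now_time))

# Option 2: compare the number of seconds.
print((now_time - msg_time).total_seconds() > 60)
```

In the bot, the `if` from the question would then read `if diff > timedelta(minutes=1):` before sending the embed.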

Error calculating time difference in Python

Quick question. Does someone know why I'm getting an 'Invalid Syntax' error using this code? Thank you all.
def get_time_difference(date, time_string):
    time_difference = datetime.now() - datetime.strptime(f"{date} {time_string}", "%d-%m-%Y %H:%M")
    return f"{time_difference.hour}:{time_difference.minute}"
get_time_difference(1-1-2020 1:50)

You should call get_time_difference("1-1-2020", "1:50").
However, you will get another error:
AttributeError: 'datetime.timedelta' object has no attribute 'hour'
You can adapt get_time_difference as follows:
def get_time_difference(date, time_string):
    time_difference = datetime.now() - datetime.strptime(
        f"{date} {time_string}", "%d-%m-%Y %H:%M"
    )
    # timedelta.seconds covers only the part below one day; whole days are in .days
    hours = time_difference.seconds // 3600
    minutes = time_difference.seconds // 60 % 60
    return f"{hours}:{minutes}"

from datetime import datetime
def get_time_difference(date, time_string):
    time_difference = datetime.now() - datetime.strptime(f"{date} {time_string}", "%d-%m-%Y %H:%M")
    return f"{time_difference.seconds // 3600}:{time_difference.seconds // 60 % 60}"
print(get_time_difference('1-1-2020', '1:50'))
Output
15:50
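One caveat with the answers above: `timedelta.seconds` leaves out whole days, so a difference of more than 24 hours is reported as if the days were not there. If the total number of hours is wanted, a sketch based on `total_seconds()` could look like this. The function name `format_time_difference` is made up, and it takes both ends of the interval as arguments so the result does not depend on the current time.

```python
from datetime import datetime


def format_time_difference(start, end):
    """Format end - start as H:MM, counting whole days as extra hours."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


start = datetime.strptime("1-1-2020 1:50", "%d-%m-%Y %H:%M")
end = datetime(2020, 1, 2, 3, 55)
print(format_time_difference(start, end))  # 26:05
```

The `:02d` format pads the minutes, so five minutes prints as `05` rather than `5`.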

How can I speed up a python loop with a timestamp interval condition

I have this code that was written in a hurry, but it works in general. The only problem is that it runs forever. The idea is to update 2 columns in a table holding 1495748 rows, which is also the number of timestamps queried in the first place. For each update value, a comparison is needed: the timestamp has to fall in an hourly interval formed by two timestamps coming from the API in two different dicts. Is there a way to speed things up a little, or maybe to multiprocess it?
Hint: db_mac = db_connection to a Postgres database.
the response looks like this:
{'meta': {'source': 'National Oceanic and Atmospheric Administration, Deutscher Wetterdienst'}, 'data': [{'time': '2019-11-26 23:00:00', 'time_local': '2019-11-27 00:00', 'temperature': 8.3, 'dewpoint': 5.9, 'humidity': 85, 'precipitation': 0, 'precipitation_3': None, 'precipitation_6': None, 'snowdepth': None, 'windspeed': 11, 'peakgust': 21, 'winddirection': 160, 'pressure': 1004.2, 'condition': 4}, {'time': '2019-11-27 00:00:00', ....
import requests
import db_mac
from collections import defaultdict
import datetime
import time
t = time.time()
station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]
dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)
hist_weather_list = []
for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=<APIKEY>".format(station=station[0], start=start, end=end, timezone=station[-1])
    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)
sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
for row in result:
    try:
        ts_dump = datetime.datetime.timestamp(row[0])
        for i, hour in enumerate(hist_weather_list):
            ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
            ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
            if ts1 <= ts_dump and ts_dump < ts2:
                insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
    except Exception as e:
        pass
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
    db_mac.execute(sql2)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
UPDATE: the code adapted for multiprocessing. I'll let it run overnight and report the running time.
import requests
import db_mac
from collections import defaultdict
import datetime
import time
import multiprocessing as mp
t = time.time()
station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]
dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)
hist_weather_list = []
for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=<APIKEY>".format(station=station[0], start=start, end=end, timezone=station[-1])
    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)
sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
def find_parameters(x):
    for row in result[x[0]:x[1]]:
        try:
            ts_dump = datetime.datetime.timestamp(row[0])
            for i, hour in enumerate(hist_weather_list):
                ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
                ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
                if ts1 <= ts_dump and ts_dump < ts2:
                    insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
        except Exception as e:
            pass
step1 = int(len(result) /4)
step2 = 2 * step1
step3 = 3 * step1
step4 = len(result)
steps = [[0,step1],[step1,step2],[step2,step3], [step3,step4]]
pool = mp.Pool(mp.cpu_count())
pool.map(find_parameters, steps)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
    db_mac.execute(sql2)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
UPDATE 2
It finished after running for 2:45 hours on 4 cores of a Raspberry Pi. Still, is there a more efficient way to do such things?

There are a few minor things I can think of to speed this up a little. I figure every little bit helps, especially if you have a lot of rows to process. For starters, print statements can slow down your code a lot. I'd get rid of those if they are not needed.
Also, you are calling the API on every iteration of the `for d in dates` loop, and waiting for each response takes time. I looked a bit at the API you are using, but I don't know the exact case you're using it for or what your "start" and "end" dates look like; if you could do it in fewer calls, that would speed up this loop.
Another option: it looks like the API has a .csv version of the data that you can download and use. Running this on local data would be way faster. If you choose to go this route, I'd suggest using pandas. (Sorry if you already know pandas and I'm over-explaining.) You can use df = pd.read_csv("filename.csv") and edit the table from there easily. You can also do df.to_sql(params) to write to your database. Let me know if you want help forming a pandas version of this code.
Also, I'm not sure from your code whether this would cause an error, but instead of your for loop (for i in weather["data"]) I would try:
hist_weather_list += weather["data"]
or, equivalently:
hist_weather_list.extend(weather["data"])
Let me know how it goes!
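For what it's worth, most of the time in the question's code probably goes into the inner loop, which re-parses every hourly record with `strptime` for each of the 1495748 rows. A rough sketch of an alternative is to parse the hourly times once and find the matching interval with a binary search from the standard `bisect` module. The function name `match_hours` is made up, and the sketch assumes the database timestamps are naive `datetime` objects in the same time zone as the API's `time` field. Note also that with `multiprocessing.Pool` each worker process fills its own copy of `insert_dict`, so matches would need to be returned from the worker function to reach the parent process.

```python
from bisect import bisect_right
from datetime import datetime


def match_hours(row_times, hist_weather_list):
    """Map each row timestamp to (temperature, pressure) of its hourly interval."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # Parse every hourly record once and keep the records sorted by time.
    parsed = sorted(
        ((datetime.strptime(h["time"], fmt), h) for h in hist_weather_list),
        key=lambda pair: pair[0],
    )
    starts = [pair[0] for pair in parsed]
    matches = {}
    for ts in row_times:
        # Index of the last record whose time is <= ts.
        i = bisect_right(starts, ts) - 1
        # As in the question, a following record must exist (ts1 <= ts < ts2).
        if 0 <= i < len(starts) - 1:
            hour = parsed[i][1]
            matches[ts] = (hour["temperature"], hour["pressure"])
    return matches


hist = [
    {"time": "2019-11-26 23:00:00", "temperature": 8.3, "pressure": 1004.2},
    {"time": "2019-11-27 00:00:00", "temperature": 8.0, "pressure": 1004.0},
]
rows = [datetime(2019, 11, 26, 23, 30), datetime(2019, 11, 27, 0, 30)]
print(match_hours(rows, hist))
```

Each lookup is then logarithmic in the number of hourly records instead of linear, and the resulting dict can feed the same UPDATE loop as before.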

(Python) Retrieve the sunrise and sunset times from Google

Hello!
I recently went to the LACMA museum of art and stumbled upon this clock. Basically, it uses a light sensor to determine the percentage of the day that has passed. This means sunrise would be 0.00% and sunset would be 100%. I wanted to create an easier version of this, having a program Google the sunset and sunrise times for the day and work from there. Eventually this would all be transferred to a Raspberry Pi 3 (another problem for another day), so the code would have to be in Python. Could I maybe get some help writing it?
TLDR Version
I need a Python program that googles and returns the times of the sunset and sunrise for the day. Mind helping?

It's not pretty but it should work, just use your coordinates as the parameters.
From their website "NOTE: All times are in UTC and summer time adjustments are not included in the returned data."
import requests
from datetime import datetime
from datetime import timedelta
def get_sunrise_sunset(lat, long):
    link = "http://api.sunrise-sunset.org/json?lat=%f&lng=%f&formatted=0" % (lat, long)
    f = requests.get(link)
    data = f.text
    sunrise = data[34:42]
    sunset = data[71:79]
    print("Sunrise = %s, Sunset = %s" % (sunrise, sunset))
    s1 = sunrise
    s2 = sunset
    FMT = '%H:%M:%S'
    tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
    daylight = timedelta(days=0,seconds=tdelta.seconds, microseconds=tdelta.microseconds)
    print('Total daylight = %s' % daylight)
    t1 = datetime.strptime(str(daylight), '%H:%M:%S')
    t2 = datetime(1900, 1, 1)
    daylight_as_minutes = (t1 - t2).total_seconds() / 60.0
    print('Daylight in minutes = %s' % daylight_as_minutes)
    sr1 = datetime.strptime(str(sunrise), '%H:%M:%S')
    sr2 = datetime(1900, 1, 1)
    sunrise_as_minutes = (sr1 - sr2).total_seconds() / 60.0
    print('Sunrise in minutes = %s' % sunrise_as_minutes)
    ss1 = datetime.strptime(str(sunset), '%H:%M:%S')
    ss2 = datetime(1900, 1, 1)
    sunset_as_minutes = (ss1 - ss2).total_seconds() / 60.0
    print('Sunset in minutes = %s' % sunset_as_minutes)
if __name__ == '__main__':
    get_sunrise_sunset(42.9633599,-86.6680863)
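To get from those two times to the clock described in the question (0% at sunrise, 100% at sunset), something along these lines might work. It is only a sketch: the function name `daylight_percent` is made up, the timestamps are illustrative rather than real API output, and it assumes the sunrise and sunset values are ISO 8601 strings with a UTC offset, which is what `formatted=0` requests.

```python
from datetime import datetime, timezone


def daylight_percent(sunrise_iso, sunset_iso, now):
    """Return how far `now` lies between sunrise and sunset, as 0 to 100."""
    sunrise = datetime.fromisoformat(sunrise_iso)
    sunset = datetime.fromisoformat(sunset_iso)
    # Dividing one timedelta by another gives a plain float.
    fraction = (now - sunrise) / (sunset - sunrise)
    # Clamp so that times before sunrise read 0 and times after sunset read 100.
    return max(0.0, min(1.0, fraction)) * 100


now = datetime(2015, 5, 21, 9, 0, 0, tzinfo=timezone.utc)
print(daylight_percent("2015-05-21T06:00:00+00:00", "2015-05-21T18:00:00+00:00", now))  # 25.0
```

Because the parsed values carry their UTC offset, `now` has to be timezone-aware as well, for example `datetime.now(timezone.utc)`.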

Operation timed out error in Cassandra cluster

My cluster has 6 machines, and I often receive this error message and don't really know how to solve it:
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'LOCAL_ONE'}
The error occurs in this part of the code:
batch.add(schedule_remove_stmt, (source, type, row['scheduled_for'],row['id']));session.execute(batch,30)
Complete code:
cluster = Cluster(['localhost'])
session = cluster.connect('keyspace')
d = datetime.utcnow()
scheduled_for = d.replace(second=0, microsecond=0)
rowid=[]
stmt = session.prepare('SELECT * FROM schedules WHERE source=? AND type= ? AND scheduled_for = ?')
schedule_remove_stmt = session.prepare("DELETE FROM schedules WHERE source = ? AND type = ? AND scheduled_for = ? AND id = ?")
schedule_insert_stmt = session.prepare("INSERT INTO schedules(source, type, scheduled_for, id) VALUES (?, ?, ?, ?)")
schedules_to_delete = []
articles={}
source=''
type=''
try:
rows = session.execute(stmt, [source,type, scheduled_for])
article_schedule_delete = ''
for row in rows:
schedules_to_delete.append({'id':row.id,'scheduled_for':row.scheduled_for})
article_schedule_delete=article_schedule_delete+'\''+row.id+'\','
rowid.append(row.id)
article_schedule_delete = article_schedule_delete[0:-1]
cql = 'SELECT * FROM articles WHERE id in (%s)' % article_schedule_delete
articles_row = session.execute(cql)
for row in articles_row:
articles[row.id]=row.created_at
except Exception as e:
print e
log.info('select error is:%s' % e)
try:
for row in schedules_to_delete:
batch = BatchStatement()
batch.add(schedule_remove_stmt, (source, type, row['scheduled_for'],row['id']))
try:
if row['id'] in articles.keys():
next_schedule =d
elapsed = datetime.utcnow() - articles[row['id']]
if elapsed <= timedelta(hours=1):
next_schedule += timedelta(minutes=6)
elif elapsed <= timedelta(hours=3):
next_schedule += timedelta(minutes=18)
elif elapsed <= timedelta(hours=6):
next_schedule += timedelta(minutes=36)
elif elapsed <= timedelta(hours=12):
next_schedule += timedelta(minutes=72)
elif elapsed <= timedelta(days=1):
next_schedule += timedelta(minutes=144)
elif elapsed <= timedelta(days=3):
next_schedule += timedelta(minutes=432)
elif elapsed <= timedelta(days=30) :
next_schedule += timedelta(minutes=1440)
if not next_schedule==d:
batch.add(schedule_insert_stmt, (source,type, next_schedule.replace(second=0, microsecond=0),row['id']))
#log.info('schedule id:%s' % row['id'])
except Exception as e:
print 'key error:',e
log.info('HOW IT CHANGES %s %s %s %s ERROR:%s' % (source,type, next_schedule.replace(second=0, microsecond=0), row['id'],e))
session.execute(batch,30)
except Exception as e:
print 'schedules error is =======================>',e
log.info('schedules error is:%s' % e)
Thanks a lot for the help I really don't know how to solve this!

I think you shouldn't use a batch statement in this case, because you are trying to use the batch to perform a large number of operations on different partition keys, which leads to timeout exceptions. Batches should be used to keep tables in sync, not as a performance optimization.
You can find more about misusing batches in this article
Using the driver's asynchronous API is more suitable for performing a lot of delete queries in your case. It lets you keep the performance of your code and avoids overloading the coordinator.
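A minimal sketch of that approach, assuming the DataStax Python driver, where `session.execute_async(statement, parameters)` returns a future whose `result()` blocks until the request finishes and raises if it failed. The function name `delete_schedules` and the `window` parameter are hypothetical. `schedules` is the list of dicts built in the question, and `delete_stmt` is the prepared DELETE statement.

```python
def delete_schedules(session, delete_stmt, source, type_, schedules, window=100):
    """Send one DELETE per schedule row without a batch, `window` at a time."""
    deleted = 0
    for start in range(0, len(schedules), window):
        chunk = schedules[start:start + window]
        # Fire the requests for this chunk without waiting on each one.
        futures = [
            session.execute_async(
                delete_stmt, (source, type_, row["scheduled_for"], row["id"])
            )
            for row in chunk
        ]
        # Wait for the chunk to finish before sending more, so the
        # coordinator never has an unbounded number of requests in flight.
        for future in futures:
            future.result()
            deleted += 1
    return deleted
```

It would be called as `delete_schedules(session, schedule_remove_stmt, source, type, schedules_to_delete)`. The re-insert of the next schedule could be sent the same way with `schedule_insert_stmt`.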
