Operation timed out error in Cassandra cluster - python

My cluster has 6 machines, and I often receive this error message and don't really know how to solve it:
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'LOCAL_ONE'}
This is my complete code and the part of the code where the error message occurs is this:
batch.add(schedule_remove_stmt, (source, type, row['scheduled_for'], row['id']))
session.execute(batch, 30)
Complete code:
cluster = Cluster(['localhost'])
session = cluster.connect('keyspace')
d = datetime.utcnow()
scheduled_for = d.replace(second=0, microsecond=0)
rowid=[]
stmt = session.prepare('SELECT * FROM schedules WHERE source=? AND type= ? AND scheduled_for = ?')
schedule_remove_stmt = session.prepare("DELETE FROM schedules WHERE source = ? AND type = ? AND scheduled_for = ? AND id = ?")
schedule_insert_stmt = session.prepare("INSERT INTO schedules(source, type, scheduled_for, id) VALUES (?, ?, ?, ?)")
schedules_to_delete = []
articles={}
source=''
type=''
try:
    rows = session.execute(stmt, [source, type, scheduled_for])
    article_schedule_delete = ''
    for row in rows:
        schedules_to_delete.append({'id': row.id, 'scheduled_for': row.scheduled_for})
        article_schedule_delete = article_schedule_delete + '\'' + row.id + '\','
        rowid.append(row.id)
    article_schedule_delete = article_schedule_delete[0:-1]
    cql = 'SELECT * FROM articles WHERE id in (%s)' % article_schedule_delete
    articles_row = session.execute(cql)
    for row in articles_row:
        articles[row.id] = row.created_at
except Exception as e:
    print e
    log.info('select error is:%s' % e)

try:
    for row in schedules_to_delete:
        batch = BatchStatement()
        batch.add(schedule_remove_stmt, (source, type, row['scheduled_for'], row['id']))
        try:
            if row['id'] in articles.keys():
                next_schedule = d
                elapsed = datetime.utcnow() - articles[row['id']]
                if elapsed <= timedelta(hours=1):
                    next_schedule += timedelta(minutes=6)
                elif elapsed <= timedelta(hours=3):
                    next_schedule += timedelta(minutes=18)
                elif elapsed <= timedelta(hours=6):
                    next_schedule += timedelta(minutes=36)
                elif elapsed <= timedelta(hours=12):
                    next_schedule += timedelta(minutes=72)
                elif elapsed <= timedelta(days=1):
                    next_schedule += timedelta(minutes=144)
                elif elapsed <= timedelta(days=3):
                    next_schedule += timedelta(minutes=432)
                elif elapsed <= timedelta(days=30):
                    next_schedule += timedelta(minutes=1440)
                if not next_schedule == d:
                    batch.add(schedule_insert_stmt, (source, type, next_schedule.replace(second=0, microsecond=0), row['id']))
                    #log.info('schedule id:%s' % row['id'])
        except Exception as e:
            print 'key error:', e
            log.info('HOW IT CHANGES %s %s %s %s ERROR:%s' % (source, type, next_schedule.replace(second=0, microsecond=0), row['id'], e))
        session.execute(batch, 30)
except Exception as e:
    print 'schedules error is =======================>', e
    log.info('schedules error is:%s' % e)
Thanks a lot for the help, I really don't know how to solve this!

I think you shouldn't use a batch statement in this case, because you are trying to use the batch to perform a large number of operations for different partition keys, and that leads to timeout exceptions. You should use batches to keep tables in sync, not for performance optimization.
You can find more about misusing batches in this article.
Using an asynchronous driver API is more suitable for performing a lot of delete queries in your case. It will let you keep the performance of your code and avoid overloading the coordinator.
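For illustration, here is a minimal sketch of what the delete loop could look like with the driver's asynchronous API instead of a batch. It reuses the prepared statement and variables from the question (session, schedule_remove_stmt, schedules_to_delete, source, type); everything else is an assumption, not a drop-in fix.
# Sketch only: issue the deletes asynchronously instead of batching them.
# Assumes session, schedule_remove_stmt, schedules_to_delete, source and type
# exist exactly as in the question's code.
futures = []
for row in schedules_to_delete:
    # execute_async returns a ResponseFuture immediately instead of blocking
    futures.append(session.execute_async(
        schedule_remove_stmt,
        (source, type, row['scheduled_for'], row['id'])
    ))

# Collect the results; result() re-raises any per-query error such as a timeout
for future in futures:
    try:
        future.result()
    except Exception as e:
        log.info('async delete error is:%s' % e)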

Related

Is there a way to automate VMware snapshot deletion by setting a schedule through Ansible or the REST API

I am trying to figure out if there is a way to automate VMware snapshot deletion through Ansible.
I have found vmware_guest_powerstate.py to be the closest to it and tried to modify it, but it's failing with "Failed to create scheduled task present as specifications given are invalid: A specified parameter was not correct: spec.action"
pstate = {
    'present': vim.VirtualMachine.CreateSnapshot,
    'absent': vim.VirtualMachine.RemoveAllSnapshots,
}
dt = ""
try:
    dt = datetime.strptime(scheduled_at, "%d/%m/%Y %H:%M")
except ValueError as e:
    module.fail_json(
        msg="Failed to convert given date and time string to Python datetime object,"
            "please specify string in 'dd/mm/yyyy hh:mm' format: %s"
            % to_native(e)
    )
schedule_task_spec = vim.scheduler.ScheduledTaskSpec()
schedule_task_name = module.params["schedule_task_name"] or "task_%s" % str(
    randint(10000, 99999)
)
schedule_task_desc = module.params["schedule_task_description"]
if schedule_task_desc is None:
    schedule_task_desc = (
        "Schedule task for vm %s for "
        "operation %s at %s"
        % (vm.name, scheduled_at)
    )
schedule_task_spec.name = schedule_task_name
schedule_task_spec.description = schedule_task_desc
schedule_task_spec.scheduler = vim.scheduler.OnceTaskScheduler()
schedule_task_spec.scheduler.runAt = dt
schedule_task_spec.action = vim.action.MethodAction()
schedule_task_spec.action.name = pstate[module.params['state']]
schedule_task_spec.enabled = module.params["schedule_task_enabled"]
I suggest trying "vim.VirtualMachine.CreateSnapshot_Task" and "vim.VirtualMachine.RemoveAllSnapshots_Task" instead of "vim.VirtualMachine.CreateSnapshot" and "vim.VirtualMachine.RemoveAllSnapshots" to see if that works.
pstate = {'present': vim.VirtualMachine.CreateSnapshot_Task,'absent': vim.VirtualMachine.RemoveAllSnapshots_Task}
....
schedule_task_spec.name = schedule_task_name
schedule_task_spec.description = schedule_task_desc
schedule_task_spec.scheduler = vim.scheduler.OnceTaskScheduler()
schedule_task_spec.scheduler.runAt = dt
schedule_task_spec.action = vim.action.MethodAction()
schedule_task_spec.action.name = pstate[module.params['state']]
schedule_task_spec.enabled = module.params["schedule_task_enabled"]
print(schedule_task_spec.action)

Is there a way to use FORALL to insert data from an array?

I am running Oracle 19c and I want to get the best insert performance I can. Currently, I insert using INSERT /*+APPEND */ ..., which is fine, but not the speed I wanted.
I read that using FORALL is a lot faster, but I couldn't really find any examples.
Here is the code snippet (Python 3):
connection = pool.acquire()
cursor = connection.cursor()
cursor.executemany("INSERT /*+APPEND*/ INTO RANDOM VALUES (:1, :2, :3)", list(random))
connection.commit()
cursor.close()
connection.close()
I got really interested in what would be faster, so I've tested some possible ways to compare them:
simple executemany with no tricks.
the same with the APPEND_VALUES hint inside the statement.
the union all approach you've tried in another question. This should be slower than the above, since it generates a very large statement (that can potentially require more network than the data itself). It then has to be parsed on the DB side, which also consumes a lot of time and negates all the benefits (not talking about the potential size limit). I then ran it with executemany in chunks, so as not to build a single statement for 100k records. I didn't use concatenation of values inside the statement, because I wanted to keep it safe.
insert all. The same downsides, but no unions. Compare it with the union version.
serialize the data in JSON and do the deserialization on the DB side with json_table. Potentially good performance with a single short statement and a single data transfer with little JSON overhead.
your suggested FORALL in a PL/SQL wrapper procedure. Should be the same as executemany since it does the same thing, but on the database side, plus the overhead of transforming the data into the collection.
the same FORALL, but with a columnar approach to pass the data: simple lists of column values instead of a complex type. Should be much faster than FORALL with a collection, since there's no need to serialize the data into the collection's type.
I've used Oracle Autonomous Database in Oracle Cloud with a free account. Each method was executed 10 times in a loop with the same input dataset of 100k records, and the table was recreated before each test. This is the result I've got. Preparation and execution times here are data transformation on the client side and the DB call itself, respectively.
>>> t = PerfTest(100000)
>>> t.run("exec_many", 10)
Method: exec_many.
Duration, avg: 2.3083874 s
Preparation time, avg: 0.0 s
Execution time, avg: 2.3083874 s
>>> t.run("exec_many_append", 10)
Method: exec_many_append.
Duration, avg: 2.6031369 s
Preparation time, avg: 0.0 s
Execution time, avg: 2.6031369 s
>>> t.run("union_all", 10, 10000)
Method: union_all.
Duration, avg: 27.9444233 s
Preparation time, avg: 0.0408773 s
Execution time, avg: 27.8457551 s
>>> t.run("insert_all", 10, 10000)
Method: insert_all.
Duration, avg: 70.6442494 s
Preparation time, avg: 0.0289269 s
Execution time, avg: 70.5541995 s
>>> t.run("json_table", 10)
Method: json_table.
Duration, avg: 10.4648237 s
Preparation time, avg: 9.7907693 s
Execution time, avg: 0.621006 s
>>> t.run("forall", 10)
Method: forall.
Duration, avg: 5.5622837 s
Preparation time, avg: 1.8972456000000002 s
Execution time, avg: 3.6650380999999994 s
>>> t.run("forall_columnar", 10)
Method: forall_columnar.
Duration, avg: 2.6702698000000002 s
Preparation time, avg: 0.055710800000000005 s
Execution time, avg: 2.6105702 s
>>>
The fastest way is just executemany, which is not much of a surprise. What is interesting here is that APPEND_VALUES does not improve the query and takes more time on average, so this needs more investigation.
About FORALL: as expected, an individual array for each column takes less time, as there's no data preparation for it. It is more or less comparable with executemany, but I think the PL/SQL overhead plays some role here.
Another interesting part for me is JSON: most of the time was spent on writing the LOB into the database and on serialization, but the query itself was very fast. Maybe the write operation can be improved in some way with the chunk size or some other way to pass LOB data into the select statement, but as far as my code goes, it is far from the very simple and straightforward approach with executemany.
There are also possible approaches without Python that should be faster, as they are native tools for external data, but I didn't test them:
Oracle SQL*Loader
External table
Below is the code I've used for testing.
import cx_Oracle as db
import os, random, json
import datetime as dt


class PerfTest:

    def __init__(self, size):
        self._con = db.connect(
            os.environ["ora_cloud_usr"],
            os.environ["ora_cloud_pwd"],
            "test_low",
            encoding="UTF-8"
        )
        self._cur = self._con.cursor()
        self.inp = [(i, "Test {i}".format(i=i), random.random()) for i in range(size)]

    def __del__(self):
        if self._con:
            self._con.rollback()
            self._con.close()

    #Create objects
    def setup(self):
        try:
            self._cur.execute("drop table rand")
            #print("table dropped")
        except:
            pass
        self._cur.execute("""create table rand(
            id int,
            str varchar2(100),
            val number
        )""")
        self._cur.execute("""create or replace package pkg_test as
            type ts_test is record (
                id rand.id%type,
                str rand.str%type,
                val rand.val%type
            );
            type tt_test is table of ts_test index by pls_integer;
            type tt_ids is table of rand.id%type index by pls_integer;
            type tt_strs is table of rand.str%type index by pls_integer;
            type tt_vals is table of rand.val%type index by pls_integer;
            procedure write_data(p_data in tt_test);
            procedure write_data_columnar(
                p_ids in tt_ids,
                p_strs in tt_strs,
                p_vals in tt_vals
            );
        end;""")
        self._cur.execute("""create or replace package body pkg_test as
            procedure write_data(p_data in tt_test)
            as
            begin
                forall i in indices of p_data
                    insert into rand(id, str, val)
                    values (p_data(i).id, p_data(i).str, p_data(i).val)
                ;
                commit;
            end;

            procedure write_data_columnar(
                p_ids in tt_ids,
                p_strs in tt_strs,
                p_vals in tt_vals
            ) as
            begin
                forall i in indices of p_ids
                    insert into rand(id, str, val)
                    values (p_ids(i), p_strs(i), p_vals(i))
                ;
                commit;
            end;
        end;
        """)

    def build_union(self, size):
        return """insert into rand(id, str, val)
            select id, str, val from rand where 1 = 0 union all
            """ + """ union all """.join(
                ["select :{}, :{}, :{} from dual".format(i*3+1, i*3+2, i*3+3)
                 for i in range(size)]
            )

    def build_insert_all(self, size):
        return """
            """.join(
                ["into rand(id, str, val) values (:{}, :{}, :{})".format(i*3+1, i*3+2, i*3+3)
                 for i in range(size)]
            )

    #Test case with executemany
    def exec_many(self):
        start = dt.datetime.now()
        self._cur.executemany("insert into rand(id, str, val) values (:1, :2, :3)", self.inp)
        self._con.commit()
        return (dt.timedelta(0), dt.datetime.now() - start)

    #The same as above but with prepared statement (no parsing)
    def exec_many_append(self):
        start = dt.datetime.now()
        self._cur.executemany("insert /*+APPEND_VALUES*/ into rand(id, str, val) values (:1, :2, :3)", self.inp)
        self._con.commit()
        return (dt.timedelta(0), dt.datetime.now() - start)

    #Union All approach (chunked). Should have large parse time
    def union_all(self, size):
        ##Chunked list of big tuples
        start_prepare = dt.datetime.now()
        new_inp = [
            tuple([item for t in r for item in t])
            for r in list(zip(*[iter(self.inp)]*size))
        ]
        new_stmt = self.build_union(size)
        dur_prepare = dt.datetime.now() - start_prepare
        #Execute unions
        start_exec = dt.datetime.now()
        self._cur.executemany(new_stmt, new_inp)
        dur_exec = dt.datetime.now() - start_exec
        ##In case the size is not a divisor
        remainder = len(self.inp) % size
        if remainder > 0:
            start_prepare = dt.datetime.now()
            new_stmt = self.build_union(remainder)
            new_inp = tuple([
                item for t in self.inp[-remainder:] for item in t
            ])
            dur_prepare += dt.datetime.now() - start_prepare
            start_exec = dt.datetime.now()
            self._cur.execute(new_stmt, new_inp)
            dur_exec += dt.datetime.now() - start_exec
        self._con.commit()
        return (dur_prepare, dur_exec)

    #The same as union all, but with no need to union something
    def insert_all(self, size):
        ##Chunked list of big tuples
        start_prepare = dt.datetime.now()
        new_inp = [
            tuple([item for t in r for item in t])
            for r in list(zip(*[iter(self.inp)]*size))
        ]
        new_stmt = """insert all
            {}
            select * from dual"""
        dur_prepare = dt.datetime.now() - start_prepare
        #Execute
        start_exec = dt.datetime.now()
        self._cur.executemany(
            new_stmt.format(self.build_insert_all(size)),
            new_inp
        )
        dur_exec = dt.datetime.now() - start_exec
        ##In case the size is not a divisor
        remainder = len(self.inp) % size
        if remainder > 0:
            start_prepare = dt.datetime.now()
            new_inp = tuple([
                item for t in self.inp[-remainder:] for item in t
            ])
            dur_prepare += dt.datetime.now() - start_prepare
            start_exec = dt.datetime.now()
            self._cur.execute(
                new_stmt.format(self.build_insert_all(remainder)),
                new_inp
            )
            dur_exec += dt.datetime.now() - start_exec
        self._con.commit()
        return (dur_prepare, dur_exec)

    #Serialize at server side and do deserialization at DB side
    def json_table(self):
        start_prepare = dt.datetime.now()
        new_inp = json.dumps([
            {"id": t[0], "str": t[1], "val": t[2]} for t in self.inp
        ])
        lob_var = self._con.createlob(db.DB_TYPE_CLOB)
        lob_var.write(new_inp)
        start_exec = dt.datetime.now()
        self._cur.execute("""
            insert into rand(id, str, val)
            select id, str, val
            from json_table(
                to_clob(:json), '$[*]'
                columns
                    id int,
                    str varchar2(100),
                    val number
            )
        """, json=lob_var)
        dur_exec = dt.datetime.now() - start_exec
        self._con.commit()
        return (start_exec - start_prepare, dur_exec)

    #PL/SQL with FORALL
    def forall(self):
        start_prepare = dt.datetime.now()
        collection_type = self._con.gettype("PKG_TEST.TT_TEST")
        record_type = self._con.gettype("PKG_TEST.TS_TEST")

        def recBuilder(x):
            rec = record_type.newobject()
            rec.ID = x[0]
            rec.STR = x[1]
            rec.VAL = x[2]
            return rec

        inp_collection = collection_type.newobject([
            recBuilder(i) for i in self.inp
        ])
        start_exec = dt.datetime.now()
        self._cur.callproc("pkg_test.write_data", [inp_collection])
        dur_exec = dt.datetime.now() - start_exec
        return (start_exec - start_prepare, dur_exec)

    #PL/SQL with FORALL and plain collections
    def forall_columnar(self):
        start_prepare = dt.datetime.now()
        ids, strs, vals = map(list, zip(*self.inp))
        start_exec = dt.datetime.now()
        self._cur.callproc("pkg_test.write_data_columnar", [ids, strs, vals])
        dur_exec = dt.datetime.now() - start_exec
        return (start_exec - start_prepare, dur_exec)

    #Run test
    def run(self, method, iterations, *args):
        #Cleanup schema
        self.setup()
        start = dt.datetime.now()
        runtime = []
        for i in range(iterations):
            single_run = getattr(self, method)(*args)
            runtime.append(single_run)
        dur = dt.datetime.now() - start
        dur_prep_total = sum([i.total_seconds() for i, _ in runtime])
        dur_exec_total = sum([i.total_seconds() for _, i in runtime])
        print("""Method: {meth}.
Duration, avg: {run_dur} s
Preparation time, avg: {prep} s
Execution time, avg: {ex} s""".format(
            inp_s=len(self.inp),
            meth=method,
            run_dur=dur.total_seconds() / iterations,
            prep=dur_prep_total / iterations,
            ex=dur_exec_total / iterations
        ))

Postgres query to extract 1 hour old data using Python

I am trying to run the code below to extract data that is 1 hour old, keeping a check on start_time and end_time. However, it is giving an error:
Error is: not enough arguments for format string
I cannot understand the error.
curr = datetime.now()
starttime = modules.rounder(curr)
print("Current time rounded off : ", modules.rounder(curr))
cursor.execute("select node_name, node_ip, object_name, start_time, end_time, report_type, rxgemidle, rxploams, rxdroppedtoolong, txploams, fectotal_s, rxpacketsdropped, fec0to1_s, rxallocationsdisabled, rxcrcerrors, rxgem, rxploamserror, fecpost_s, rxgemdropped, rxfeccodewordsuncorrected, txomci, rxbip8bytes, rxploamsdropped, txcpu, rxfragmentserrors, rxploamsnonidle, rxomci, rxfeccodewords, section_interval_valid, fecpre_s, rxbip8errors, rxgemcorrected, rxgemillegal, fec1to0_s, rxkeyerrors, txdroppedtpidmiss, txdroppedillegallength, txgem, fecr_s, rxallocationsvalid, txdroppedvidmiss, rxallocationsinvalid, rxcpu, rxdroppedtooshort, time_interval, data_time from %s where start_time >= %s - interval '1 hour' and end_time <= %s" %data % starttime )

How can I speed up a Python loop with a timestamp interval condition

I have this code that was done rather in a hurry, but it works in general. The only problem is that it runs forever. The idea is to update 2 columns on a table holding 1,495,748 rows, which is the number of timestamps queried in the first place. For each value to update, a comparison has to be made in which the timestamp has to fall into an hourly interval formed by two timestamps coming from the API in two different dicts. Is there a way to speed things up a little, or maybe multiprocess it?
Hint: db_mac = db_connection to a Postgres database.
The response looks like this:
{'meta': {'source': 'National Oceanic and Atmospheric Administration, Deutscher Wetterdienst'}, 'data': [{'time': '2019-11-26 23:00:00', 'time_local': '2019-11-27 00:00', 'temperature': 8.3, 'dewpoint': 5.9, 'humidity': 85, 'precipitation': 0, 'precipitation_3': None, 'precipitation_6': None, 'snowdepth': None, 'windspeed': 11, 'peakgust': 21, 'winddirection': 160, 'pressure': 1004.2, 'condition': 4}, {'time': '2019-11-27 00:00:00', ....
import requests
import db_mac
from collections import defaultdict
import datetime
import time

t = time.time()

station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]
dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)
hist_weather_list = []

for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=<APIKEY>".format(station=station[0], start=start, end=end, timezone=station[-1])
    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)

sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))

for row in result:
    try:
        ts_dump = datetime.datetime.timestamp(row[0])
        for i, hour in enumerate(hist_weather_list):
            ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
            ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
            if ts1 <= ts_dump and ts_dump < ts2:
                insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
    except Exception as e:
        pass

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))

for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
    db_mac.execute(sql2)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
UPDATE: the code rewritten for multiprocessing. I'll let it run overnight and give an update on the running time.
import requests
import db_mac
from collections import defaultdict
import datetime
import time
import multiprocessing as mp

t = time.time()

station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]
dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)
hist_weather_list = []

for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=wzwi2YR5".format(station=station[0], start=start, end=end, timezone=station[-1])
    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)

sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))

def find_parameters(x):
    for row in result[x[0]:x[1]]:
        try:
            ts_dump = datetime.datetime.timestamp(row[0])
            for i, hour in enumerate(hist_weather_list):
                ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
                ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
                if ts1 <= ts_dump and ts_dump < ts2:
                    insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
        except Exception as e:
            pass

step1 = int(len(result) /4)
step2 = 2 * step1
step3 = 3 * step1
step4 = len(result)
steps = [[0,step1],[step1,step2],[step2,step3], [step3,step4]]

pool = mp.Pool(mp.cpu_count())
pool.map(find_parameters, steps)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))

for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
    db_mac.execute(sql2)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
UPDATE 2
It finished and ran for 2:45 hours on 4 cores on a Raspberry Pi. Still, is there a more efficient way to do such things?
So there are a few minor things I can think of to speed this up a little. I figure every little bit helps, especially if you have a lot of rows to process. For starters, print statements can slow down your code a lot; I'd get rid of those if they are unneeded.
Most importantly, you are calling the API in every iteration of the loop. Waiting for a response from the API is probably taking up the bulk of your time. I looked a bit at the API you are using, but I don't know the exact case you're using it for or what your dates "start" and "end" look like; if you could do it in fewer calls, that would surely speed up this loop by a lot. Another option: it looks like the API has a .csv version of the data you can download and use, and running this on local data would be way faster. If you choose to go this route I'd suggest using pandas. (Sorry if you already know pandas and I'm over-explaining.) You can use df = pd.read_csv("filename.csv") and edit the table from there easily. You can also do df.to_sql(params) to write to your database. Let me know if you want help forming a pandas version of this code.
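To make that concrete, here is a rough, untested sketch of the pandas route; the CSV file name, the connection string and the staging table name are placeholders I made up, not something from your setup.
# Rough sketch of the pandas idea described above; "hourly.csv", the connection
# string and the "dump_weather_staging" table are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/yourdb")

# Work from a local copy of the API data instead of hitting the API per date range
weather_df = pd.read_csv("hourly.csv", parse_dates=["time"])

# Keep only the columns you actually need
weather_df = weather_df[["time", "temperature", "pressure"]]

# Write the prepared data to Postgres in one call; "dump" could then be updated
# from this staging table with a single UPDATE ... FROM statement
weather_df.to_sql("dump_weather_staging", engine, if_exists="replace", index=False)
pandas.merge_asof could also replace the nested interval loop by matching each dump timestamp to the most recent hourly record, but that depends on your exact data.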
Also, I'm not sure from your code whether this would cause an error, but instead of your for loop (for i in weather["data"]) I would try:
hist_weather_list += weather["data"]
or possibly
hist_weather_list += [weather["data"]]
Let me know how it goes!

Python 3 verification script not checking properly

I've been working on a Python script and am having issues with some verifications I set up. I have this procedure file with a function that uses an order number and a customer number to check some past history about the customer's orders. I've been testing live on our server and I keep failing the last if statement. The order number and customer number I'm using do have more than one order and some are over 60 days old, so it should pass the test, but it doesn't. I've been looking over my code and I just can't see what could be causing this.
Edit: here are the print results of the current and retrieved timestamps:
current_timestamp = 1531849617.921927
retrieved_timestamp = 1489622400
two_month_seconds = 5184000
one_month_seconds = 2592000
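Just to spell out the arithmetic of the last elif with those printed values (this is only a check of the numbers, not a fix):
# Plugging the printed values into the 60-day comparison from the last elif
current_timestamp = 1531849617.921927
retrieved_timestamp = 1489622400
two_month_seconds = 5184000

cutoff = current_timestamp - two_month_seconds   # 1526665617.921927
print(retrieved_timestamp < cutoff)              # True, i.e. this order is older than 60 days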
Python3
from classes import helper
from classes import api
from classes import order
from procedures import orderReleaseProcedure
import time
import datetime
import re
def verifyCustomer(customer_id, order_id):
    self_helper = helper.Helper()
    customer_blocked_reasons = self_helper.getConfig('customer_blocked_reasons')
    order_statuses = self_helper.getConfig('order_statuses')
    customer_is_blocked = False
    self_api = api.Api()
    self_order = order.Order(order_id)

    status = {
        'success': 0,
        'message': 'verify_payment_method'
    }

    results = self_api.which_api('orders?customer_id={}'.format(customer_id))
    order_count = results['total_count']

    if order_count > 1:
        for result in results['orders']:
            order_status_info = self_api.which_api('order_statuses/%d' % result['order_status_id'])
            for customer_blocked_reason in customer_blocked_reasons:
                if customer_blocked_reason in order_status_info['name']:
                    customer_is_blocked = True
                    order_id = 0

            order_date = result['ordered_at']
            two_month_seconds = (3600 * 24) * 60
            one_month_seconds = (3600 * 24) * 30
            stripped_date = order_date[:order_date.find("T")]
            current_timestamp = time.time()
            retrieved_timestamp = int(datetime.datetime.strptime(stripped_date, '%Y-%m-%d').strftime("%s"))

            if retrieved_timestamp > (current_timestamp - one_month_seconds) and not customer_is_blocked:
                status['success'] = 1
                status['message'] = "Customer Verified with orders older than 30 days and no blocking reasons"
                print(' 30 day check was triggered ')
                print(status)
                break
            elif customer_is_blocked:
                status_change_result = self_order.update_status(order_statuses['order_hold_manager_review'])
                status['success'] = 1
                status['message'] = "Changed order status to Order Hold - Manager Review"
                print(' Customer block was triggered ')
                print(status_change_result)
                break
            elif not retrieved_timestamp < (current_timestamp - two_month_seconds):
                status['success'] = 0
                status['message'] = "There is more than 1 order, and none are greater than 60 days, we need to check manually"
                print(' 60 day check was triggered ')
                print(status)
                break

    return status
