This is my first question here, so please pardon any mistakes. I am trying to develop software using Python/Flask that continuously receives IoT data from multiple devices and logs the data at fixed time intervals. For example, with 3 devices and a 30-second logging interval: device1, device2 and device3 send their first data at 5:04:20, 5:04:29 and 5:04:31 respectively. The devices then keep sending data every 1 or 2 seconds; I want to keep track of the latest data and make sure the next rows are logged at 5:04:50, 5:04:59 and 5:05:01 respectively, then again at 5:05:20, and so on.
I have written a script that does this for a single device using threading. Here is the code:
import paho.mqtt.client as mqtt
import csv
import os
import datetime
import threading
import queue
import time

q = queue.Queue()
header = ["Date", "Time", "Real_Speed"]
Rspd_key_1 = "key1="
Rspd_key_2 = "key2="
state = 0
message = ""
values = {"Date": 0, "Time": 0, "Real_Speed": 0}
writeFlag = True
logTime = 0
locallog = 0
nowTime = 0
dataUpdated = False
F_checkTime = True
prev_spd = 9999


def checkTime():
    global logTime
    global locallog
    global values
    global dataUpdated
    timesDataMissed = 0
    while F_checkTime:
        nowTime = time.time()
        if logTime != 0 and nowTime - logTime >= 30:
            values['Date'] = datetime.date.today().strftime("%d/%m/%Y")
            now = datetime.datetime.now()
            values['Time'] = now.strftime("%H:%M:%S")
            if dataUpdated:
                q.put(values)
                logTime += 30
                dataUpdated = False
                print(values)
                timesDataMissed = 0
            else:
                values['Real_Speed'] = 'NULL'
                q.put(values)
                logTime = nowTime
                dataUpdated = False
                timesDataMissed += 1
                print(values)
                if timesDataMissed > 10:
                    timesDataMissed = 0
                    logTime = 0


def on_connect(client, userdata, flags, rc):
    print("Connected with result code " + str(rc))
    client.subscribe("something")


def write_csv():
    csvfile = open('InverterDataLogger01.csv', mode='w',
                   newline='', encoding='utf-8')
    spamwriter = csv.DictWriter(csvfile, fieldnames=header)
    spamwriter.writeheader()
    csvfile.close()
    while writeFlag:
        # print("worker running ", csv_flag)
        time.sleep(0.01)
        # time.sleep(2)
        while not q.empty():
            results = q.get()
            if results is None:
                continue
            csvfile = open('InverterDataLogger01.csv', mode='a',
                           newline='', encoding='utf-8')
            spamwriter = csv.DictWriter(csvfile, fieldnames=header)
            spamwriter.writerow(results)
            csvfile.close()
            print("Written in csv File")


def find_spd_val(message):
    # Do something: parse the payload and extract the speed value
    return realspd


def on_message(client, userdata, msg):
    message = str(msg.payload.decode("utf-8", "ignore"))
    topic = str(msg.topic)
    global values
    global dataUpdated
    global r_index
    global prev_spd
    global rspd
    global locallog
    global logTime
    if logTime == 0:
        logTime = time.time()
        locallog = logTime
    else:
        try:
            rspd = int(find_spd_val(message))
        except:
            pass
        if prev_spd == 9999:
            prev_spd = rspd
        else:
            values['Real_Speed'] = rspd
            dataUpdated = True  # mark that a fresh value arrived since the last log


def on_publish(client, userdata, mid):
    print("Message Published")


client = mqtt.Client("hidden")
client.on_connect = on_connect
client.on_message = on_message
client.on_publish = on_publish
client.connect("hidden")
client.loop_start()

t1 = threading.Thread(target=write_csv)   # CSV writer thread
t2 = threading.Thread(target=checkTime)   # interval-check thread
t1.start()
t2.start()
print('written')

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("interrupted by keyboard")
    client.loop_stop()        # stop the MQTT network loop
    writeFlag = False         # stop the CSV writer thread
    F_checkTime = False       # stop the interval-check thread
    time.sleep(5)
I want to do the same thing using Python/Flask to handle multiple devices. I am new to Flask; can you please give me some guidelines on how to ensure this functionality in Flask and what technology I should use?
I'll propose a simple solution that avoids the need to run any timers, at the cost of delaying the write to file by a second or so (whether this is an issue or not depends upon your requirements).
Data from the devices can be stored in a structure that looks something like (a very rough example!):
from datetime import datetime

dev_info = {
    'sensor1': {'last_value': .310, 'last_receive': datetime(2021, 8, 28, 12, 8, 1, 1234), 'last_write': datetime(2021, 8, 28, 12, 8, 0, 541)},
    'sensor2': {'last_value': 5.2, 'last_receive': datetime(2021, 8, 28, 12, 7, 59, 1234), 'last_write': datetime(2021, 8, 28, 12, 7, 58, 921)}
}
Every time a new sample is received (you can probably use a single subscription for this and determine which device the message is from by checking the message topic in on_message):
Retrieve the last_write time for the device from the structure
If it's more than the desired interval old, then write out the last_value to your CSV (using the timestamp last_write + interval) and update last_write (a bit of logic is needed here; consider what happens if no info is received for a minute).
Update the info for the device in the structure (last_value / last_receive).
As I mentioned earlier, the disadvantage of this is that a value is only written out after you receive a new value outside of the desired time window; however, for many use-cases this is fine and considerably simpler than using timers. If you need more frequent writes, you could periodically scan the structure for old data and write it out. A rough sketch of such a handler follows.
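For illustration only, here is a minimal sketch of an on_message handler along those lines, assuming one topic per device and a 30-second interval (the names dev_info and WRITE_INTERVAL, the topic layout and the CSV layout are assumptions, not part of the question):
import csv
from datetime import datetime, timedelta

WRITE_INTERVAL = timedelta(seconds=30)   # assumed logging interval
dev_info = {}  # device_id -> {'last_value': ..., 'last_receive': ..., 'last_write': ...}

def on_message(client, userdata, msg):
    device_id = msg.topic.split('/')[-1]            # e.g. topic "devices/speed/device1"
    value = msg.payload.decode("utf-8", "ignore")
    now = datetime.now()
    info = dev_info.get(device_id)
    if info is None:
        # first sample from this device: remember it and start its interval here
        dev_info[device_id] = {'last_value': value, 'last_receive': now, 'last_write': now}
        return
    if now - info['last_write'] >= WRITE_INTERVAL:
        # the previously stored sample is the last one inside the window;
        # log it with the nominal timestamp last_write + interval
        stamp = info['last_write'] + WRITE_INTERVAL
        with open('log_{}.csv'.format(device_id), mode='a', newline='', encoding='utf-8') as f:
            csv.writer(f).writerow([stamp.isoformat(), device_id, info['last_value']])
        info['last_write'] = stamp
    info['last_value'] = value
    info['last_receive'] = now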
There are a few other factors you may want to consider:
MQTT does not guarantee real-time delivery (particularly at QOS 1+).
The comms to IoT units can often be spotty, so using QOS 1+ (and clean_session=False) is worth considering; a small example follows after this list.
Based on the above you may want to consider embedding timestamps in the messages (but this does lead to a need to keep the remote device clocks synchronised).
Storage is cheap - are you sure there is no benefit to storing all data received and then downsampling later?
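As a small illustration of the QOS / clean_session suggestion above (the client id, broker address and topic are placeholders):
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="logger01", clean_session=False)  # broker queues messages while the client is offline
client.connect("broker.example.com")
client.subscribe("devices/+/speed", qos=1)   # at-least-once delivery
client.loop_forever()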
I think that to properly answer your question more context is required. But simply put: can your devices make HTTP requests? If so, you can create a Flask web app to receive those HTTP calls and store the information. Also, I see you mentioned CSV, which is not a good way of storing data, as relying on reading/writing files is not good practice. I would recommend using a proper database (e.g. MySQL) to store the information in a transactional manner. A rough sketch of such an endpoint is shown below.
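A minimal Flask sketch of that idea (the route, payload fields and the database call are assumptions, not a working design):
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/reading/<device_id>", methods=["POST"])
def reading(device_id):
    payload = request.get_json(force=True)   # e.g. {"speed": 42}
    # instead of appending to a CSV, insert into a transactional database here,
    # e.g. with SQLAlchemy: db.session.add(Reading(device=device_id, **payload)); db.session.commit()
    return jsonify(status="ok"), 201

if __name__ == "__main__":
    app.run()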
Related
I have a use case which consists of loading huge tables from Oracle to Snowflake.
The Oracle server sits far away from the Snowflake endpoint, so we do have connection issues when loading tables (views, in fact) bigger than 12 GB via a spool script or cx_Oracle.
I was thinking of using a ThreadPoolExecutor with 4 threads max, to test, and a SessionPool. With this I get a connection per thread, which is the whole point. This means I would have to distribute the data fetches in batches across the threads.
My question is: how can I achieve this? Is it correct to do something like
"Select * from table where rownum between x and y" (not this exact syntax, I know... but you get my point), or should I rely on OFFSET, ...?
My idea was that each thread gets a "slice" of the select, fetches the data in batches, and writes the rows to CSV in batches as well, because I would rather have many small files than one huge file to send to Snowflake.
import threading
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor, as_completed

import cx_Oracle


def query(start_off, pool):
    start_conn = datetime.now()
    con = pool.acquire()
    end_conn = datetime.now()
    print(f"Conn/Acquire time: {end_conn-start_conn}")
    with con.cursor() as cur:
        start_exec_ts = datetime.now()
        cur.execute(QUERY, start_pos=start_off, end_pos=start_off+(OFFSET-1))
        end_exec_ts = datetime.now()
        rows = cur.fetchall()
        end_fetch_ts = datetime.now()
        total_exec_ts = end_exec_ts-start_exec_ts
        total_fetch_ts = end_fetch_ts-end_exec_ts
        print(f"Exec time : {total_exec_ts}")
        print(f"Fetch time : {total_fetch_ts}")
        print(f"Task executed {threading.current_thread().getName()}, {threading.get_ident()}")
    return rows


def main():
    pool = cx_Oracle.SessionPool(c.oracle_conn['oracle']['username'],
                                 c.oracle_conn['oracle']['password'],
                                 c.oracle_conn['oracle']['dsn'],
                                 min=2, max=4, increment=1,
                                 threaded=True,
                                 getmode=cx_Oracle.SPOOL_ATTRVAL_WAIT
                                 )
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(query, d, pool) for d in range(1, 13, OFFSET)]
        for future in as_completed(futures):
            # process your records from each thread
            print(repr(future.result()))
            # process_records(future.result())


if __name__ == '__main__':
    main()
Also, using fetchmany in the query function, how could I send the results back so I can process them each time?
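For reference, the fetchmany idea could look roughly like this: turning query() into a generator lets each batch be processed (or written to its own CSV file) as soon as it is fetched, instead of holding all rows in memory. This is only a sketch; the batch size is arbitrary and QUERY/OFFSET are the placeholders from the snippet above.
def query_batches(start_off, pool, batch_size=10000):
    con = pool.acquire()
    try:
        with con.cursor() as cur:
            cur.execute(QUERY, start_pos=start_off, end_pos=start_off + (OFFSET - 1))
            while True:
                rows = cur.fetchmany(batch_size)
                if not rows:
                    break
                yield rows          # the worker thread consumes this and writes each batch out
    finally:
        pool.release(con)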
If you want to transfer the data with a Python script, you can create a producer -> queue -> consumer workflow to do this, where the consumers rely on the IDs of the data.
Producer
The producer fetches the IDs of the data and puts a "slice of IDs" on the queue as a job.
Consumer
Each consumer fetches a job from the queue, fetches the data for those IDs (e.g. "select * from table where id in ...") and saves the data somewhere.
Example
A quick example of this concept:
import time
import threading
import queue
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor


@dataclass
class Job:
    ids: list


jobs = queue.Queue()
executor = ThreadPoolExecutor(max_workers=4)
fake_data = [i for i in range(0, 200)]


def consumer():
    try:
        job = jobs.get_nowait()
        # select * from table where id in job.ids
        # save the data
        time.sleep(5)
        print(f"done {job}")
    except Exception as exc:
        print(exc)


def fake_select(limit, offset):
    if offset >= len(fake_data):
        return None
    return fake_data[offset:(offset + limit)]


def producer():
    limit = 10
    offset = 0
    stop_fetch = False
    while not stop_fetch:
        # select id from table limit l offset o
        ids = fake_select(limit, offset)
        if ids is None:
            stop_fetch = True
        else:
            job = Job(ids=ids)
            jobs.put(job)
            print(f"put {job}")
            offset += limit
            executor.submit(consumer)
        time.sleep(0.2)


def main():
    th = threading.Thread(target=producer)
    th.start()
    th.join()
    while not jobs.empty():
        time.sleep(1)
    executor.shutdown(wait=True)
    print("all jobs done")


if __name__ == "__main__":
    main()
Besides, if you want to do more operations after the consumer has fetched the data, you can either do this in the consumer flow or add another queue and another consumer stage to do the extra operations. The workflow then becomes:
producer -> queue -> fetch-and-save-data consumers -> queue -> consumers doing the extra operations
A rough sketch of that second stage is below.
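Building on the example above, the second stage could look roughly like this (the names results, fetch_consumer and extra_consumer are made up; jobs and Job come from the previous snippet):
import queue

results = queue.Queue()   # second queue between the two consumer stages

def fetch_consumer():
    job = jobs.get_nowait()
    # rows = select * from table where id in job.ids
    rows = job.ids            # placeholder for the fetched data
    results.put(rows)         # hand the data to the next stage instead of saving it here

def extra_consumer():
    rows = results.get()
    # do the extra operation (transform, upload, etc.) and then save
    print(f"processed {len(rows)} rows")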
I have a socket connection which I want to monitor; it receives market data in high bursts.
while 1:
    socket.recv()
    print('data received')
The while loop should only execute the print once every sixty seconds.
Try this:
from datetime import datetime

last = datetime.now()
while 1:
    socket.recv()
    if (datetime.now() - last).seconds >= 60:
        print("data received")
        last = datetime.now()
You want some kind of asynchronous processing here: on one hand you want to continuously receive data, on the other hand you want to display a message every 60 seconds.
So threading would be my first idea: the foreground displays messages while the background receives.
def recv_loop(socket, end_cond):
    while True:
        socket.recv(1024)
    # assuming something ends the loop above
    end_cond[0] = True

end = [False]   # a mutable flag so the receiving thread can signal the main loop
recv_thr = threading.Thread(target=recv_loop, args=(socket, end), daemon=True)
recv_thr.start()

while not end[0]:
    time.sleep(60)
    print('data received')
You haven't shown any interesting data in the messages, so I haven't either. But as all global variables are shared, it would be trivial to display, for example, the number of bytes received since the last message.
Alternatively, you could use select.select because it gives you a timeout, so you trade threading for slightly more complex timeout processing.
import select
import datetime

last = datetime.datetime.now()
while True:
    # time remaining until the next 60-second mark
    timeout = 60 - (datetime.datetime.now() - last).seconds
    if timeout <= 0:
        last = datetime.datetime.now()
        print("data received")
        timeout = 60
    rl, _, _ = select.select([socket], [], [], timeout)
    if len(rl) > 0:
        socket.recv(1024)
I am using paho-mqtt in Django to receive messages. Everything works fine, except that the on_message() function is executed twice.
I tried debugging, and the function seems to be called once, but the database insertion happens twice, the message is printed twice, everything within on_message() happens twice, and my data is inserted twice for each publish.
I suspected it was happening in a parallel thread and installed a Celery Redis backend to queue the insertions and avoid duplicates, but the data is still inserted twice.
I also tried locking the variables to avoid problems with parallel threading, but the data is still inserted twice.
I am using Postgres DB
How do I solve this issue? I want the on_message() function to execute only once for each publish
my __init__.py
from . import mqtt
mqtt.client.loop_start()
my mqtt.py
import ast
import json
import paho.mqtt.client as mqtt
# Broker CONNACK response
from datetime import datetime
from raven.utils import logger
from kctsmarttransport import settings


def on_connect(client, userdata, flags, rc):
    # Subscribing to topic
    client.subscribe("data/gpsdata/server/#")
    print 'subscribed to data/gpsdata/server/#'


# Receive message
def on_message(client, userdata, msg):
    # from kctsmarttransport.celery import bus_position_insert_task
    # bus_position_insert_task.delay(msg.payload)
    from Transport.models import BusPosition
    from Transport.models import Student, SpeedWarningLog, Bus
    from Transport.models import Location
    from Transport.models import IdleTimeLog
    from pytz import timezone
    try:
        dumpData = json.dumps(msg.payload)
        rawGpsData = json.loads(dumpData)
        jsonGps = ast.literal_eval(rawGpsData)
        bus = Bus.objects.get(bus_no=jsonGps['Busno'])
        student = None
        stop = None
        if jsonGps['card'] is not False:
            try:
                student = Student.objects.get(rfid_value=jsonGps['UID'])
            except Student.DoesNotExist:
                student = None
        if 'stop_id' in jsonGps:
            stop = Location.objects.get(pk=jsonGps['stop_id'])
        dates = datetime.strptime(jsonGps['Date&Time'], '%Y-%m-%d %H:%M:%S')
        tz = timezone('Asia/Kolkata')
        dates = tz.localize(dates)
        lat = float(jsonGps['Latitude'])
        lng = float(jsonGps['Longitude'])
        speed = float(jsonGps['speed'])
        # print msg.topic + " " + str(msg.payload)
        busPosition = BusPosition.objects.filter(bus=bus, created_at=dates,
                                                 lat=lat,
                                                 lng=lng,
                                                 speed=speed,
                                                 geofence=stop,
                                                 student=student)
        if busPosition.count() == 0:
            busPosition = BusPosition.objects.create(bus=bus, created_at=dates,
                                                     lat=lat,
                                                     lng=lng,
                                                     speed=speed,
                                                     geofence=stop,
                                                     student=student)
        if speed > 60:
            SpeedWarningLog.objects.create(bus=busPosition.bus, speed=busPosition.speed,
                                           lat=lat, lng=lng, created_at=dates)
            sendSMS(settings.TRANSPORT_OFFICER_NUMBER, jsonGps['Busno'], jsonGps['speed'])
        if speed <= 2:
            try:
                old_entry_query = IdleTimeLog.objects.filter(bus=bus, done=False).order_by('idle_start_time')
                if old_entry_query.count() > 0:
                    old_entry = old_entry_query.reverse()[0]
                    old_entry.idle_end_time = dates
                    old_entry.save()
                else:
                    new_entry = IdleTimeLog.objects.create(bus=bus, idle_start_time=dates, lat=lat, lng=lng)
            except IdleTimeLog.DoesNotExist:
                new_entry = IdleTimeLog.objects.create(bus=bus, idle_start_time=dates, lat=lat, lng=lng)
        else:
            try:
                old_entry_query = IdleTimeLog.objects.filter(bus=bus, done=False).order_by('idle_start_time')
                if old_entry_query.count() > 0:
                    old_entry = old_entry_query.reverse()[0]
                    old_entry.idle_end_time = dates
                    old_entry.done = True
                    old_entry.save()
            except IdleTimeLog.DoesNotExist:
                pass
    except Exception, e:
        logger.error(e.message, exc_info=True)


client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("10.1.75.106", 1883, 60)
As someone mentioned in the comments, run your server using --noreload,
e.g.: python manage.py runserver --noreload
The development server's auto-reloader causes your project modules to be imported twice, so the module-level mqtt.client.loop_start() in __init__.py ends up creating two MQTT clients and every publish is handled twice.
(posted here for better visibility.)
I had the same problem!
Try using:
def on_disconnect(client, userdata, rc):
    client.loop_stop(force=False)
    if rc != 0:
        print("Unexpected disconnection.")
    else:
        print("Disconnected")
I was using Python to publish and subscribe to Redis message queueing.
publisher:
import redis

rc = redis.Redis(host='127.0.0.1', port=6379)
rc.ping()
ps = rc.pubsub()
ps.subscribe('bdwaf')
r_str = "--8198b507-A--"
for i in range(0, 20000):
    rc.publish('bdwaf', r_str)
subscriber:
import redis

rc = redis.Redis(host='localhost', port=6379)
rc.ping()
ps = rc.pubsub()
ps.subscribe('bdwaf')
num = 0
while True:
    item = ps.get_message()
    if item:
        num += 1
        if item['type'] == 'message':
            a.parser(item['data'])
            print num
When the publisher loop range is higher than 20000, the subscriber seems not to get all the data; it only works when I add a sleep to the publisher.
How can I make it work without adding a sleep to the publisher, so that no matter how large the publisher's range is, the subscriber gets all the data?
You can persist the messages in a distributed task queue. Commonly used with Redis is a distributed task queue written in Python called Celery (http://www.celeryproject.org/); a rough sketch is below.
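A minimal sketch of that idea (the app name, broker URL and task body are assumptions): instead of fire-and-forget pub/sub, the publisher enqueues a task per message and a Celery worker processes each one, so nothing is lost if the consumer falls behind.
from celery import Celery

app = Celery('bdwaf', broker='redis://127.0.0.1:6379/0')

@app.task
def handle_message(data):
    # parse and persist the message here, e.g. a.parser(data)
    print(data)

# publisher side: enqueue instead of rc.publish(...)
# handle_message.delay("--8198b507-A--")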
I am working on creating an HTTP client which can generate hundreds of connections each second and send up to 10 requests on each of those connections. I am using threading so that concurrency can be achieved.
Here is my code:
def generate_req(reqSession):
    requestCounter = 0
    while requestCounter < requestRate:
        try:
            response1 = reqSession.get('http://20.20.1.2/tempurl.html')
            if response1.status_code == 200:
                client_notify('r')
        except (exceptions.ConnectionError, exceptions.HTTPError, exceptions.Timeout) as Err:
            client_notify('F')
            break
        requestCounter += 1


def main():
    for q in range(connectionPerSec):
        s1 = requests.session()
        t1 = threading.Thread(target=generate_req, args=(s1,))
        t1.start()
Issues:
It is not scaling above 200 connections/sec with requestRate = 1. I ran other available HTTP clients on the same client machine against the same server; those tests run fine and are able to scale.
When requestRate = 10, connections/sec drops to 30.
Reason: it is not able to create the targeted number of threads every second.
For issue #2, the client machine is not able to create enough request sessions and start new threads. As soon as requestRate is set to more than 1, things start to fall apart.
I suspect it has something to do with the HTTP connection pooling which requests uses; a rough sketch of raising the pool size is shown below.
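If pooling is the issue, the pool size of a requests Session can be raised explicitly, roughly like this (the numbers are placeholders, not a recommendation):
import requests
from requests.adapters import HTTPAdapter

s1 = requests.Session()
adapter = HTTPAdapter(pool_connections=100, pool_maxsize=100)
s1.mount('http://', adapter)
s1.mount('https://', adapter)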
Please suggest what I am doing wrong here.
I wasn't able to get things to fall apart; however, the following code has some new features:
1) extended logging, including specific per-thread information
2) all threads are join()ed at the end to make sure the parent process doesn't leave them hanging
3) multithreaded print tends to interleave the messages, which can be unwieldy. This version has client_notify() return the message as a tuple so a future version can collect the messages and print them clearly.
source
import exceptions, requests, threading, time

requestRate = 1
connectionPerSec = 2


def client_notify(msg):
    return time.time(), threading.current_thread().name, msg


def generate_req(reqSession):
    requestCounter = 0
    while requestCounter < requestRate:
        try:
            response1 = reqSession.get('http://127.0.0.1/')
            if response1.status_code == 200:
                print client_notify('r')
        except (exceptions.ConnectionError, exceptions.HTTPError, exceptions.Timeout):
            print client_notify('F')
            break
        requestCounter += 1


def main():
    for cnum in range(connectionPerSec):
        s1 = requests.session()
        th = threading.Thread(
            target=generate_req, args=(s1,),
            name='thread-{:03d}'.format(cnum),
        )
        th.start()
    for th in threading.enumerate():
        if th != threading.current_thread():
            th.join()


if __name__ == '__main__':
    main()
output
(1407275951.954147, 'thread-000', 'r')
(1407275951.95479, 'thread-001', 'r')