Generating a nested dictionary in Python through iterations

I'm new to Python and I have to retrieve data from a txt file (which I have already done), and then I need to build a nested dictionary like this:
new_dict = {"2009-10-16": {"KitchenSensor": "active for x minutes today",
                           "BathroomSensor": "active for y minutes today", ...},
            "2009-10-24": {"KitchenSensor": "active for x minutes today",
                           "BathroomSensor": "active for y minutes today", ...},
            "2009-11-13": {"KitchenSensor": "active for x minutes today",
                           "BathroomSensor": "active for y minutes today", ...}}
My code looks like this:
namesFile = open("data.txt", "r")
listaDati = namesFile.readlines()
listaDivisa = []
for i in listaDati:
    if i[27] != "T":
        listaDivisa.append(i.split())
The data in my txt file has this format:
2009-10-16 00:01:04.000059 KITCHENSENSOR ON
2009-10-16 02:33:12.000093 KITCHENSENSOR OFF
2009-10-24 21:25:52.000023 BATHROOMSENSOR ON
2009-10-24 23:13:52.000014 BATHROOMSENSOR OFF
2009-11-13 09:03:23.000053 BATHROOMSENSOR ON
2009-11-13 12:13:42.000014 BATHROOMSENSOR OFF
The date changes every now and then, so I want to create a new key with the new date every time I meet one, and save the information I need to save under it. I was trying to do this with an enumerate for loop, but I don't understand how I can create the dictionary.
Thank you!

You may be looking for something like this. I separated the task into two steps:
- parsing the input lines (they could come from a file, but here they're just a list) into events (3-tuples of datetime, sensor name, and on/off state), and
- grouping the events by date and looking at the state changes.
import datetime
from itertools import groupby

def parse_line(line):
    # The timestamp spans the first two space-separated fields (date and time).
    date_part, time_part, event = line.split(" ", 2)
    # The next field is the sensor name; the rest is the ON/OFF state.
    sensor, event = event.split(" ", 1)
    # Parse the timestamp into a real datetime object.
    t = datetime.datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S.%f")
    # strip() guards against trailing newlines when lines come from a file.
    return (t, sensor, event.strip() == "ON")

def collate_sorted_events(sorted_events):
    zero_delta = datetime.timedelta(0)
    for day, events in groupby(
        sorted_events, lambda event_triple: event_triple[0].date()
    ):
        # We're assuming all sensors start off each day.
        turn_on_times = {}
        durations = {}
        for time, sensor, state in events:
            if state:  # Turning on?
                # If it was on already, that's not an issue; we just consider that a glitch.
                if sensor not in turn_on_times:
                    turn_on_times[sensor] = time
            else:
                if sensor not in turn_on_times:
                    raise ValueError("Sensor was turned off before it was turned on.")
                this_duration = time - turn_on_times[sensor]
                durations[sensor] = durations.get(sensor, zero_delta) + this_duration
                del turn_on_times[sensor]
        yield (day, durations)
        if turn_on_times:
            # This check could be removed, but for now it's a good sanity check...
            raise ValueError(
                "Some sensors were left on at the end of the day; this could be a problem."
            )

listaDati = [
    "2009-10-16 00:01:04.000059 KITCHENSENSOR ON",
    "2009-10-16 02:33:12.000093 KITCHENSENSOR OFF",
    "2009-10-24 21:25:52.000023 BATHROOMSENSOR ON",
    "2009-10-24 23:13:52.000014 BATHROOMSENSOR OFF",
    "2009-11-13 09:03:23.000053 BATHROOMSENSOR ON",
    "2009-11-13 12:13:42.000014 BATHROOMSENSOR OFF",
]

# Parse and sort the input lines. It's imperative that the events are sorted
# so the rest of the code works as it should.
sorted_events = sorted(parse_line(i) for i in listaDati)

# Collate events by day; the function yields (day, durations) tuples,
# and `dict` accepts that format to create a dict with.
output = dict(collate_sorted_events(sorted_events))

print(output)

for date, deltas in sorted(output.items()):
    for sensor, delta in sorted(deltas.items()):
        print(f"{date} {sensor} {delta.total_seconds() / 60:.2f} minutes")
The output is
{
    datetime.date(2009, 10, 16): {'KITCHENSENSOR': datetime.timedelta(seconds=9128, microseconds=34)},
    datetime.date(2009, 10, 24): {'BATHROOMSENSOR': datetime.timedelta(seconds=6479, microseconds=999991)},
    datetime.date(2009, 11, 13): {'BATHROOMSENSOR': datetime.timedelta(seconds=11418, microseconds=999961)},
}
followed by the formatted output:
2009-10-16 KITCHENSENSOR 152.13 minutes
2009-10-24 BATHROOMSENSOR 108.00 minutes
2009-11-13 BATHROOMSENSOR 190.32 minutes
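If you then want the exact shape from the question (date strings mapping to "active for x minutes today" messages), you could post-process output along these lines; this is only a sketch built on the code above, and the rounding to whole minutes is an assumption:

# Sketch: convert the {date: {sensor: timedelta}} result above into the
# string-based nested dict from the question.
pretty = {
    day.isoformat(): {
        sensor: f"active for {delta.total_seconds() / 60:.0f} minutes today"
        for sensor, delta in durations.items()
    }
    for day, durations in output.items()
}
print(pretty)
# e.g. {'2009-10-16': {'KITCHENSENSOR': 'active for 152 minutes today'}, ...}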

Related

Check if the number of slots is > 0 before picking a date and an hour?

I am building a vaccination appointment program that automatically assigns a slot to the user.
This builds the table and saves it into a CSV file:
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv')
And this code randomly picks a date and an hour from that date in the CSV table we just created:
import pandas
import random
from random import randrange
# randrange randomly picks an index for date and time for the user
random_date = randrange(365)
random_hour = randrange(10)
list = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00"]
hour = random.choice(list)
df = pandas.read_csv('new.csv')
date = df.iloc[random_date][0]
# 1 is subtracted from that cell as 1 slot will be assigned to the user
df.loc[random_date, hour] -= 1
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv', index=False)
print(date)
print(hour)
I need help with making the program check if the random hour it chose on that date has vacant slots. I can manage the while loops that are needed if the number of vacant slots is 0. And no, I have not tried much because I have no clue of how to do this.
P.S. If you're going to try running the code, please remember to change the save and read location.
Here is how I would do it. I've also cleaned it up a bit.
import random

import pandas as pd

start_date, end_date = '1/1/2022', '31/12/2022'
hours = [f'{hour}:00' for hour in range(8, 18)]

df = pd.DataFrame(
    data=pd.date_range(start_date, end_date),
    columns=['Date/Time']
)
for hour in hours:
    df[hour] = 100

# 1000 simulations
for _ in range(1000):
    random_date, random_hour = random.randrange(365), random.choice(hours)
    # Check if the chosen slot still has vacancies
    if df.at[random_date, random_hour] > 0:
        df.at[random_date, random_hour] -= 1
    else:
        # Pass here, but you can add whatever logic you want;
        # for instance you could give it the next free slot in the same day.
        pass

print(df.describe())
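If you want to act on the "next free slot in the same day" idea from the comment above, a minimal sketch could look like this; book_slot is a made-up helper name, and it assumes the same df and hours as in the code:

def book_slot(df, hours, date_index, preferred_hour):
    # Try the preferred hour first, then any later hour on the same day.
    start = hours.index(preferred_hour)
    for hour in hours[start:]:
        if df.at[date_index, hour] > 0:
            df.at[date_index, hour] -= 1
            return hour
    return None  # no vacancies left that day

booked = book_slot(df, hours, random.randrange(365), random.choice(hours))
print(booked)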
import pandas
import random
from random import randrange

# randrange randomly picks an index for date and time for the user
random_date = randrange(365)
# random_hour = randrange(10)  # consider removing this line since it's not used
lista = [  # consider avoiding Python built-in names such as `list`
    "8:00",
    "9:00",
    "10:00",
    "11:00",
    "12:00",
    "13:00",
    "14:00",
    "15:00",
    "16:00",
    "17:00",
]
hour = random.choice(lista)
df = pandas.read_csv("new.csv")
date = df.iloc[random_date][0]
# 1 is subtracted from that cell as 1 slot will be assigned to the user
if df.loc[random_date, hour] > 0:  # here is the check you asked for
    df.loc[random_date, hour] -= 1
else:
    print(f"No Vacant Slots in {random_date}, {hour}")
df.to_csv(r"new.csv", index=False)
print(date)
print(hour)
Here's another alternative. I'm not sure you really need the very large and slow-to-load pandas module for this; this version does it with plain Python structures. I tried to run the simulation until it got a failure, but with 365,000 open slots, and flushing the database to disk each time, it takes too long. I changed the 100 to 8, just to see it hit a full slot in a reasonable time.
import csv
import datetime
import random


def create():
    start = datetime.date(2022, 1, 1)
    oneday = datetime.timedelta(days=1)
    headers = ["date"] + [f"{i}:00" for i in range(8, 18)]
    data = []
    for _ in range(365):
        data.append([start.strftime("%Y-%m-%d")] + [8] * 10)  # not 100
        start += oneday
    write(headers, data)


def write(headers, rows):
    fcsv = csv.writer(open('data.csv', 'w', newline=''))
    fcsv.writerow(headers)
    fcsv.writerows(rows)


def read():
    days = []
    headers = []
    for row in csv.reader(open('data.csv')):
        if not headers:
            headers = row
        else:
            days.append([row[0]] + list(map(int, row[1:])))
    return headers, days


def choose(headers, days):
    random_date = random.randrange(365)
    random_hour = random.randrange(len(headers) - 1) + 1
    choice = days[random_date][0] + " " + headers[random_hour]
    print("Chose", choice)
    if days[random_date][random_hour]:
        days[random_date][random_hour] -= 1
        write(headers, days)
        return choice
    else:
        print("Randomly chosen slot is full.")
        return None


create()
data = read()
while choose(*data):
    pass

Python/Discord.py remove milliseconds from time.time()

I did this for my Python Discord bot (basically it's a voice activity tracker). Everything works fine, but I want to remove the milliseconds from total_time; I would like to get something in the format '%H:%M:%S'.
Is this possible?
Here's a part of the code:
if before.channel == None:
    join_time = round(time.time())
    userdata["join_time"] = join_time
elif after.channel == None:
    if userdata["join_time"] == None:
        return
    userdata = voice_data[guild_id][new_user]
    leave_time = time.time()
    passed_time = leave_time - userdata["join_time"]
    userdata["total_time"] += passed_time
    userdata["join_time"] = None
And here's the output:
{
"total_time": 7.4658853358879,
}
You can use a datetime.timedelta object, with some caveats.
>>> import datetime as dt
>>> data = {"total_time": 7.4658853358879}
>>> data["total_time"] = str(dt.timedelta(seconds=int(data["total_time"])))
>>> data
{'total_time': '0:00:07'}
If your time is greater than 1 day, or less than zero, the format starts including days:
>>> str(dt.timedelta(days=1))
'1 day, 0:00:00'
>>> str(dt.timedelta(seconds=-1))
'-1 day, 23:59:59'
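If you specifically want an '%H:%M:%S'-style string where the hours keep counting past 24 instead of rolling over into days, a small helper like this would do it; this is just a sketch, independent of discord.py:

def format_hms(total_seconds):
    # Split a duration in seconds into hours, minutes and seconds,
    # letting the hour count grow past 24 instead of switching to days.
    total_seconds = int(total_seconds)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

print(format_hms(7.4658853358879))  # 00:00:07
print(format_hms(90061))            # 25:01:01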

How can I speed up a python loop with a timestamp interval condition

I have this code that was rather done in a hurry, but it works in general. The only problem is that it runs forever. The idea is to update 2 columns on a table holding 1,495,748 rows, which is also the length of the list of timestamps queried in the first place. For each value to update, the timestamp has to be compared against an hourly interval formed by two consecutive timestamps coming from the API (two different dicts). Is there a way to speed things up a little, or maybe multiprocess it?
Hint: db_mac is a DB connection to a Postgres database.
the response looks like this:
{'meta': {'source': 'National Oceanic and Atmospheric Administration, Deutscher Wetterdienst'}, 'data': [{'time': '2019-11-26 23:00:00', 'time_local': '2019-11-27 00:00', 'temperature': 8.3, 'dewpoint': 5.9, 'humidity': 85, 'precipitation': 0, 'precipitation_3': None, 'precipitation_6': None, 'snowdepth': None, 'windspeed': 11, 'peakgust': 21, 'winddirection': 160, 'pressure': 1004.2, 'condition': 4}, {'time': '2019-11-27 00:00:00', ....
import requests
import db_mac
from collections import defaultdict
import datetime
import time

t = time.time()

station = [10382, "DE", "Berlin / Tegel", 52.5667, 13.3167, 37, "EDDT", 10382, "TXL", "Europe/Berlin"]
dates = [("2019-11-20", "2019-11-22"), ("2019-11-27", "2019-12-02")]

insert_dict = defaultdict(tuple)
hist_weather_list = []

for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=<APIKEY>".format(station=station[0], start=start, end=end, timezone=station[-1])
    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)

sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))

for row in result:
    try:
        ts_dump = datetime.datetime.timestamp(row[0])
        for i, hour in enumerate(hist_weather_list):
            ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
            ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
            if ts1 <= ts_dump and ts_dump < ts2:
                insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
    except Exception as e:
        pass

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))

for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """ + str(value[1]) + """ WHERE timestamp = '""" + str(key) + """';"""
    db_mac.execute(sql2)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))
UPDATE: here is the code adapted for multiprocessing. I'll let it run overnight and give an update on the running time.
import requests
import db_mac
from collections import defaultdict
import datetime
import time
import multiprocessing as mp

t = time.time()

station = [10382, "DE", "Berlin / Tegel", 52.5667, 13.3167, 37, "EDDT", 10382, "TXL", "Europe/Berlin"]
dates = [("2019-11-20", "2019-11-22"), ("2019-11-27", "2019-12-02")]

insert_dict = defaultdict(tuple)
hist_weather_list = []

for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=<APIKEY>".format(station=station[0], start=start, end=end, timezone=station[-1])
    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)

sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))

def find_parameters(x):
    for row in result[x[0]:x[1]]:
        try:
            ts_dump = datetime.datetime.timestamp(row[0])
            for i, hour in enumerate(hist_weather_list):
                ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
                ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
                if ts1 <= ts_dump and ts_dump < ts2:
                    insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
        except Exception as e:
            pass

step1 = int(len(result) / 4)
step2 = 2 * step1
step3 = 3 * step1
step4 = len(result)
steps = [[0, step1], [step1, step2], [step2, step3], [step3, step4]]

pool = mp.Pool(mp.cpu_count())
pool.map(find_parameters, steps)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))

for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """ + str(value[1]) + """ WHERE timestamp = '""" + str(key) + """';"""
    db_mac.execute(sql2)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours), int(minutes), seconds))
UPDATE 2
It finished and ran for 2:45 hours on 4 cores on a Raspberry Pi. Still, is there a more efficient way to do this kind of thing?
There are a few minor things I can think of to speed this up a little, and I figure every little bit helps, especially if you have a lot of rows to process. For starters, print statements can slow down your code a lot; I'd get rid of those if they are unneeded.
Most importantly, you are calling the API in every iteration of the loop, and waiting for a response from the API is probably taking up the bulk of your time. I looked a bit at the API you are using, but I don't know the exact case you're using it for or what your dates "start" and "end" look like; if you could do it in fewer calls, that would surely speed up this loop by a lot. Another option: it looks like the API has a .csv version of the data you can download and use, and running this on local data would be way faster. If you choose to go this route, I'd suggest using pandas. (Sorry if you already know pandas and I'm over-explaining.) You can use df = pd.read_csv("filename.csv") and edit the table from there easily, and you can also do df.to_sql(params) to write to your database. Let me know if you want help forming a pandas version of this code.
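To give a rough idea of what the pandas route could look like, here is a sketch that matches each dump timestamp to the hourly weather record it falls into using pandas.merge_asof; the column names and the shapes of result and hist_weather_list are assumptions taken from your code, so treat it as a starting point rather than a tested solution.

import pandas as pd

# Hourly weather records already collected from the API (list of dicts).
weather_df = pd.DataFrame(hist_weather_list)
weather_df["time"] = pd.to_datetime(weather_df["time"])
weather_df = weather_df.sort_values("time")

# Timestamps from the dump table (assumed to come back as single-column rows).
dump_df = pd.DataFrame(result, columns=["timestamp"]).sort_values("timestamp")

# For each timestamp, take the latest weather record at or before it; with
# contiguous hourly records this matches the ts1 <= ts_dump < ts2 check above.
matched = pd.merge_asof(
    dump_df,
    weather_df[["time", "temperature", "pressure"]],
    left_on="timestamp",
    right_on="time",
    direction="backward",
)

For writing the values back, df.to_sql into a staging table followed by a single UPDATE ... FROM join on the Postgres side would avoid firing one UPDATE per row, though that depends on how db_mac is set up.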
Also (I'm not sure from your code whether this would cause an error), instead of your for loop (for i in weather["data"]) I would try
hist_weather_list += weather["data"]
or possibly
hist_weather_list += [weather["data"]]
Let me know how it goes!

Python 2.7 - manipulate some data from a CSV file

First of all I want to emphasize that I'm a total beginner at Python. I made the code below to manipulate some data from a CSV file. I know that it's not the prettiest code and I probably could have made it more elegant, but it works, up to a certain point, and that's the reason I opened this question.
import csv
from numpy import interp
from operator import sub
import math
import pandas as pd
from Tkinter import *
import Tkinter as tk
import tkFileDialog as filedialog

root = Tk()
root.withdraw()
filename = filedialog.askopenfilename(initialdir="C:/", title="select file", filetypes=(("CSV files", "*.CSV"), ("all files", "*.*")))

id_uri = []
ore = []
minute = []
zile = []
activi = []
listx = []
listsa = []
list_ore = []
listspi = []
listspf = []
list_min = []
zile_luna = 0
test = []
nume = []

with open(filename) as p, open('activi.csv') as a:
    reader = csv.reader(p, delimiter=',')
    for row in reader:
        id_uri.append(row[0])
        ore.append(row[1])
        minute.append(row[2])
        zile.append(row[3])
    reader = csv.reader(a)
    for row in reader:
        activi.append(row[0])
        nume.append(row[1])

id_uri = map(int, id_uri)
ore = map(float, ore)
minute = map(float, minute)
minute = interp(minute, [0, 60], [0, 100])
ore = ore + minute / 100
zile = map(int, zile)
activi = map(int, activi)
zile_luna = len(set(zile)) + 1

mimin = 0
maxim = 0

def pontaj():
    global listx
    global listsa
    global listspi
    global listspf
    global list_ore
    global list_min
    global maxim
    global minim
    for x in range(3):
        for y in range(len(id_uri)):
            if zile[y] == z:
                if activi[x] == id_uri[y]:
                    listx.append(ore[y])
        minim = min(listx)
        maxim = max(listx)
        listsa.append(maxim - minim)
        listx = []
    listspi = [int(i) for i in listsa]
    listspf = [i % 1 for i in listsa]
    for i in range(len(listspf)):
        listspf[i] = round(listspf[i], 2)
        listspf[i] = listspf[i] * 100
        listspf[i] = interp(listspf[i], [0, 100], [0, 60])
        listspf[i] = int(listspf[i])
    list_ore.append(listspi)
    list_min.append(listspf)
    listsa = []

for z in range(1, zile_luna):
    pontaj()

for sublst in list_ore:
    for item in range(len(sublst)):
        sublst[item] = str(sublst[item])
for sublst in list_min:
    for item in range(len(sublst)):
        sublst[item] = str(sublst[item])

for i in range(len(list_ore)):
    for j in range(len(list_ore[i])):
        list_ore[i][j] = ' '.join(i + ':' + j for i, j in zip(list_ore[i][j], list_min[i][j]))

df = pd.DataFrame(list_ore)
df = df.T
nume = pd.Series(nume)
df['e'] = nume.values
df.to_csv('pontaj.csv', index=False, header=False)
print df
The CSV file I read all the info from looks like this (employee code, hour, minute, day):
23,5,00,1
23,6,00,1
24,7,00,1
25,8,00,1
24,9,00,1
25,11,00,1
24,7,00,2
25,8,00,2
24,9,00,2
25,11,00,2
23,5,00,4
23,6,00,4
24,7,00,4
25,8,00,4
24,9,00,4
25,11,00,4
I have another CSV file that has the employee code followed by the employee name, like this:
23,aqwe
24,beww
25,cwww
Basically it's an attendance logger: it compares info from one CSV to the other, finds the min and max hours for each employee on a certain day, subtracts min from max, and writes this info to a list that is written to another CSV.
The thing is, if all employees attend on a certain day, all goes well: it calculates the attendance hours and puts them in the CSV. But what happens if an employee skips a day? Well, as I found out, it ruins the calculation, because the code requires all data to be consistent and in perfect order.
The data written to the CSV file must finally look like this:
day1 day2 day3
hours hours hours employee_a
hours hours hours employee_b
hours hours hours employee_c
But if one skips a day, the hours get scrambled.
I've tried a few different approaches but none worked, and I realize the problem is due to my simple way of thinking; as I said, I only started with Python a few days ago.
Do you have any suggestions on how I could improve the code to take an employee's missed day into consideration and generate the data like so:
day1 day2 day3
1:20 2:30 3:40 employee_a
1:20 2:30 3:40 employee_b
0:0 2:30 3:40 employee_c
Any advice would be appreciated, thanks!
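Not a full rewrite, but one way to make a missing day harmless is to aggregate into dictionaries keyed by day and employee instead of relying on list positions. Here is a rough Python 2.7 sketch of the idea; rows and names are invented placeholder variables (rows would hold (employee_id, hour_as_float, day) tuples parsed from the first CSV, with hour_as_float assumed to be hours + minutes/60.0, and names would map employee codes to names from the second CSV):

from collections import defaultdict

first_seen = defaultdict(dict)  # first_seen[day][employee] = earliest hour that day
last_seen = defaultdict(dict)   # last_seen[day][employee]  = latest hour that day

for emp, hour, day in rows:
    if emp not in first_seen[day] or hour < first_seen[day][emp]:
        first_seen[day][emp] = hour
    if emp not in last_seen[day] or hour > last_seen[day][emp]:
        last_seen[day][emp] = hour

days = sorted(first_seen)
for emp in sorted(names):
    worked = []
    for day in days:
        if emp in first_seen[day]:
            delta = last_seen[day][emp] - first_seen[day][emp]
            whole_hours = int(delta)
            minutes = int(round((delta - whole_hours) * 60))
            worked.append("%d:%d" % (whole_hours, minutes))
        else:
            worked.append("0:0")  # the employee missed this day
    print names[emp], worked

Because the lookup is by day and employee, an absent employee simply has no entry for that day and gets "0:0" instead of shifting everyone else's hours.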

Returning different time frames from datetime

I am parsing a file this way:
for d in csvReader:
    print datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f").date()
date() returns 2000-01-08, which is correct.
time() returns 06:20:00, which is also correct.
How would I go about returning information like "date + time" or "date + hours + minutes"?
EDIT
Sorry, I should have been more precise; here is what I am trying to achieve:
lmb = lambda d: datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f").date()
daily_quotes = {}
for k, g in itertools.groupby(csvReader, key=lmb):
    lowBids = []
    highBids = []
    openBids = []
    closeBids = []
    for i in g:
        lowBids.append(float(i["Low Bid"]))
        highBids.append(float(i["High Bid"]))
        openBids.append(float(i["Open Bid"]))
        closeBids.append(float(i["Close Bid"]))
    dayMin = min(lowBids)
    dayMax = max(highBids)
    open = openBids[0]
    close = closeBids[-1]
    daily_quotes[k.strftime("%Y-%m-%d")] = [dayMin, dayMax, open, close]
As you can see, right now I'm grouping values by day; I would like to group them by hour (for which I would need date + hour) or by minute (date + hour + minute).
Thanks in advance!
Don't use the date method of the datetime object you're getting from strptime. Instead, apply strftime directly to the return from strptime, which gets you access to all the member fields, including year, month, day, hour, minute, seconds, etc...
d = {"Date": "01-Jan-2000", "Time": "01:02:03.456"}
dt = datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f")
print dt.strftime("%Y-%m-%d-%H-%M-%S")
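Applied to the grouping code from your edit, that means you can build the groupby key with strftime instead of .date(); here is a minimal sketch, assuming the same csvReader fields as in your code:

import datetime

# Group by hour: format the parsed datetime down to the hour.
hourly_key = lambda d: datetime.datetime.strptime(
    d["Date"] + "-" + d["Time"], "%d-%b-%Y-%H:%M:%S.%f"
).strftime("%Y-%m-%d %H")

# Group by minute: extend the format string with the minute field.
minute_key = lambda d: datetime.datetime.strptime(
    d["Date"] + "-" + d["Time"], "%d-%b-%Y-%H:%M:%S.%f"
).strftime("%Y-%m-%d %H:%M")

# e.g. for k, g in itertools.groupby(csvReader, key=hourly_key): ...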
