I am parsing a file this way :
for d in csvReader:
print datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f").date()
date() returns : 2000-01-08, which is correct
time() returns : 06:20:00, which is also correct
How would I go about returning informations like "date+time" or "date+hours+minutes"
EDIT
Sorry I should have been more precise, here is what I am trying to achieve :
lmb = lambda d: datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f").date()
daily_quotes = {}
for k, g in itertools.groupby(csvReader, key = lmb):
lowBids = []
highBids = []
openBids = []
closeBids = []
for i in g:
lowBids.append(float(i["Low Bid"]))
highBids.append(float(i["High Bid"]))
openBids.append(float(i["Open Bid"]))
closeBids.append(float(i["Close Bid"]))
dayMin = min(lowBids)
dayMax = max(highBids)
open = openBids[0]
close = closeBids[-1]
daily_quotes[k.strftime("%Y-%m-%d")] = [dayMin,dayMax,open,close]
As you can see, right now I'm grouping values by day, I would like to group them by hour ( for which I would need date + hour ) or minutes ( date + hour + minute )
thanks in advance !
Don't use the date method of the datetime object you're getting from strptime. Instead, apply strftime directly to the return from strptime, which gets you access to all the member fields, including year, month, day, hour, minute, seconds, etc...
d = {"Date": "01-Jan-2000", "Time": "01:02:03.456"}
dt = datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f")
print dt.strftime("%Y-%m-%d-%H-%M-%S")
Related
I'm new in Python and I have to retrieve datas from a txt file (which I have already did) and then I need to make a nested dictionary like this:
new_dict = {"2009-10-16": {"KitchenSensor":"active for x minutes today",
"BathroomSensor":"active for y minutes today"...}
"2009-10-24":{"KitchenSensor":"active for x minutes today",
"BathroomSensor":"active for y minutes today"...}
"2009-11-13":{"KitchenSensor":"active for x minutes today",
"BathroomSensor":"active for y minutes today"...}}
my code looks like this
namesFile = open("data.txt", "r")
listaDati = namesFile.readlines()
listaDivisa = []
for i in listaDati:
if i[27] != "T":
listaDivisa.append(
i.split())
and the datas in my txt file have this format:
2009-10-16 00:01:04.000059 KITCHENSENSOR ON
2009-10-16 02:33:12.000093 KITCHENSENSOR OFF
2009-10-24 21:25:52.000023 BATHROOMSENSOR ON
2009-10-24 23:13:52.000014 BATHROOMSENSOR OFF
2009-11-13 09:03:23.000053 BATHROOMSENSOR ON
2009-11-13 12:13:42.000014 BATHROOMSENSOR OFF
the timestamp changes every now and then so I want to create a new key with the new timestamp everytime I meet a new one and saving the infos I have to save. I was trying doing this with an enumerative for loop but I don't understand how I can create the dictionary.
Thank you!
You're maybe looking for something like this; I separated the task into
parsing the input lines (could be from a file, but here they're just a list) into events (3-tuples of datetime, sensor name and state)
grouping the events by date, and looking at the state changes.
import datetime
from itertools import groupby
def parse_line(line):
# Split the line at the two spaces.
time_string, event = line.split(" ", 1)
# Split the rest of the line at one space.
sensor, event = event.split(" ", 1)
# Parse the time string to a real datetime object.
t = datetime.datetime.strptime(time_string, "%Y-%m-%d %H:%M:%S.%f")
return (t, sensor, event == "ON")
def collate_sorted_events(sorted_events):
zero_delta = datetime.timedelta(0)
for day, events in groupby(
sorted_events, lambda event_triple: event_triple[0].date()
):
# We're assuming all sensors start off each day.
turn_on_times = {}
durations = {}
for time, sensor, state in events:
if state: # Turning on?
# If it was on already, that's not an issue; we just consider that a glitch.
if sensor not in turn_on_times:
turn_on_times[sensor] = time
else:
if sensor not in turn_on_times:
raise ValueError("Sensor was turned off before it was turned on.")
this_duration = time - turn_on_times[sensor]
durations[sensor] = durations.get(sensor, zero_delta) + this_duration
del turn_on_times[sensor]
yield (day, durations)
if turn_on_times:
# This check could be removed, but for now it's a good sanity check...
raise ValueError(
"Some sensors were left on at the end of the day; this could be a problem"
)
listaDati = [
"2009-10-16 00:01:04.000059 KITCHENSENSOR ON",
"2009-10-16 02:33:12.000093 KITCHENSENSOR OFF",
"2009-10-24 21:25:52.000023 BATHROOMSENSOR ON",
"2009-10-24 23:13:52.000014 BATHROOMSENSOR OFF",
"2009-11-13 09:03:23.000053 BATHROOMSENSOR ON",
"2009-11-13 12:13:42.000014 BATHROOMSENSOR OFF",
]
# Parse and sort input lines. It's imperative that the events are sorted
# so the rest of the code works as it should.
sorted_events = sorted(parse_line(i) for i in listaDati)
# Collate events by day; the function yields day/durations tuples,
# and `dict` accepts that format to create a dict with.
output = dict(collate_sorted_events(sorted_events))
print(output)
for date, deltas in sorted(output.items()):
for sensor, delta in sorted(deltas.items()):
print(f"{date} {sensor} {delta.total_seconds() / 60:.2f} minutes")
The output is
{
datetime.date(2009, 10, 16): {'KITCHENSENSOR': datetime.timedelta(seconds=9128, microseconds=34)},
datetime.date(2009, 10, 24): {'BATHROOMSENSOR': datetime.timedelta(seconds=6479, microseconds=999991)},
datetime.date(2009, 11, 13): {'BATHROOMSENSOR': datetime.timedelta(seconds=11418, microseconds=999961)},
}
followed by the formatted
2009-10-16 KITCHENSENSOR 152.13 minutes
2009-10-24 BATHROOMSENSOR 108.00 minutes
2009-11-13 BATHROOMSENSOR 190.32 minutes
I am building a vaccination appointment program that automatically assigns a slot to the user.
This builds the table and saves it into a CSV file:
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv')
And this code randomly pick a date and an hour from that date in the CSV table we just created:
import pandas
import random
from random import randrange
#randrange randomly picks an index for date and time for the user
random_date = randrange(365)
random_hour = randrange(10)
list = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00"]
hour = random.choice(list)
df = pandas.read_csv('new.csv')
date=df.iloc[random_date][0]
# 1 is substracted from that cell as 1 slot will be assigned to the user
df.loc[random_date, hour] -= 1
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv',index=False)
print(date)
print(hour)
I need help with making the program check if the random hour it chose on that date has vacant slots. I can manage the while loops that are needed if the number of vacant slots is 0. And no, I have not tried much because I have no clue of how to do this.
P.S. If you're going to try running the code, please remember to change the save and read location.
Here is how I would do it. I've also cleaned it up a bit.
import random
import pandas as pd
start_date, end_date = '1/1/2022', '31/12/2022'
hours = [f'{hour}:00' for hour in range(8, 18)]
df = pd.DataFrame(
data=pd.date_range(start_date, end_date),
columns=['Date/Time']
)
for hour in hours:
df[hour] = 100
# 1000 simulations
for _ in range(1000):
random_date, random_hour = random.randrange(365), random.choice(hours)
# Check if slot has vacant slot
if df.at[random_date, random_hour] > 0:
df.at[random_date, random_hour] -= 1
else:
# Pass here, but you can add whatever logic you want
# for instance you could give it the next free slot in the same day
pass
print(df.describe())
import pandas
import random
from random import randrange
# randrange randomly picks an index for date and time for the user
random_date = randrange(365)
# random_hour = randrange(10) #consider removing this line since it's not used
lista = [# consider avoid using Python preserved names
"8:00",
"9:00",
"10:00",
"11:00",
"12:00",
"13:00",
"14:00",
"15:00",
"16:00",
"17:00",
]
hour = random.choice(lista)
df = pandas.read_csv("new.csv")
date = df.iloc[random_date][0]
# 1 is substracted from that cell as 1 slot will be assigned to the user
if df.loc[random_date, hour] > 0:#here is what you asked for
df.loc[random_date, hour] -= 1
else:
print(f"No Vacant Slots in {random_date}, {hour}")
df.to_csv(r"new.csv", index=False)
print(date)
print(hour)
Here's another alternative. I'm not sure you really need the very large and slow-to-load pandas module for this. This does it with plan Python structures. I tried to run the simulation until it got a failure, but with 365,000 open slots, and flushing the database to disk each time, it takes too long. I changed the 100 to 8, just to see it find a dup in reasonable time.
import csv
import datetime
import random
def create():
start = datetime.date( 2022, 1, 1 )
oneday = datetime.timedelta(days=1)
headers = ["date"] + [f"{i}:00" for i in range(8,18)]
data = []
for _ in range(365):
data.append( [start.strftime("%Y-%m-%d")] + [8]*10 ) # not 100
start += oneday
write( headers, data )
def write(headers, rows):
fcsv = csv.writer(open('data.csv','w',newline=''))
fcsv.writerow( headers )
fcsv.writerows( rows )
def read():
days = []
headers = []
for row in csv.reader(open('data.csv')):
if not headers:
headers = row
else:
days.append( [row[0]] + list(map(int,row[1:])))
return headers, days
def choose( headers, days ):
random_date = random.randrange(365)
random_hour = random.randrange(len(headers)-1)+1
choice = days[random_date][0] + " " + headers[random_hour]
print( "Chose", choice )
if days[random_date][random_hour]:
days[random_date][random_hour] -= 1
write(headers,days)
return choice
else:
print("Randomly chosen slot is full.")
return None
create()
data = read()
while choose( *data ):
pass
I did this for my python Discord bot (basically it's a voice activity tracker), everything works fine but I want to remove the milliseconds from total_time. I would like to get something in this format '%H:%M:%S'
Is this possible ?
Here's a part of the code:
if(before.channel == None):
join_time = round(time.time())
userdata["join_time"] = join_time
elif(after.channel == None):
if(userdata["join_time"] == None): return
userdata = voice_data[guild_id][new_user]
leave_time = time.time()
passed_time = leave_time - userdata["join_time"]
userdata["total_time"] += passed_time
userdata["join_time"] = None
And here's the output:
{
"total_time": 7.4658853358879,
}
You can use a datetime.timedelta object, with some caveats.
>>> import datetime as dt
>>> data = {"total_time": 7.4658853358879}
>>> data["total_time"] = str(dt.timedelta(seconds=int(data["total_time"])))
>>> data
{'total_time': '0:00:07'}
If your time is greater than 1 day, or less than zero, the format starts including days
>>> str(dt.timedelta(days=1))
'1 day, 0:00:00'
>>> str(dt.timedelta(seconds=-1))
'-1 day, 23:59:59'
>>>
I have this code that is rather done in a hurry but it works in general. The only thing it runs forever. The idea is to update 2 columns on a table that is holding 1495748 rows, so the number of the list of timestamp being queried in first place. For each update value there has to be done a comparison in which the timestamp has to be in an hourly interval that is formed by two timestamps coming from the api in two different dicts. Is there a way to speed up things a little or maybe multiprocess it?
Hint: db_mac = db_connection to a Postgres database.
the response looks like this:
{'meta': {'source': 'National Oceanic and Atmospheric Administration, Deutscher Wetterdienst'}, 'data': [{'time': '2019-11-26 23:00:00', 'time_local': '2019-11-27 00:00', 'temperature': 8.3, 'dewpoint': 5.9, 'humidity': 85, 'precipitation': 0, 'precipitation_3': None, 'precipitation_6': None, 'snowdepth': None, 'windspeed': 11, 'peakgust': 21, 'winddirection': 160, 'pressure': 1004.2, 'condition': 4}, {'time': '2019-11-27 00:00:00', ....
import requests
import db_mac
from collections import defaultdict
import datetime
import time
t = time.time()
station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]
dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)
hist_weather_list = []
for d in dates:
end = d[1]
start = d[0]
print(start, end)
url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=<APIKEY>".format(station=station[0], start=start, end=end, timezone=station[-1])
response = requests.get(url)
weather = response.json()
print(weather)
for i in weather["data"]:
hist_weather_list.append(i)
sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
for row in result:
try:
ts_dump = datetime.datetime.timestamp(row[0])
for i, hour in enumerate(hist_weather_list):
ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
if ts1 <= ts_dump and ts_dump < ts2:
insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
except Exception as e:
pass
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
for key, value in insert_dict.items():
sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
db_mac.execute(sql2)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
UPDATE the code for multiprocessing. I'll let it run the night and give an update of the running time.
import requests
import db_mac
from collections import defaultdict
import datetime
import time
import multiprocessing as mp
t = time.time()
station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]
dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)
hist_weather_list = []
for d in dates:
end = d[1]
start = d[0]
print(start, end)
url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=wzwi2YR5".format(station=station[0], start=start, end=end, timezone=station[-1])
response = requests.get(url)
weather = response.json()
print(weather)
for i in weather["data"]:
hist_weather_list.append(i)
sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
def find_parameters(x):
for row in result[x[0]:x[1]]:
try:
ts_dump = datetime.datetime.timestamp(row[0])
for i, hour in enumerate(hist_weather_list):
ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))
ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))
if ts1 <= ts_dump and ts_dump < ts2:
insert_dict[row[0]] = (hour["temperature"], hour["pressure"])
except Exception as e:
pass
step1 = int(len(result) /4)
step2 = 2 * step1
step3 = 3 * step1
step4 = len(result)
steps = [[0,step1],[step1,step2],[step2,step3], [step3,step4]]
pool = mp.Pool(mp.cpu_count())
pool.map(find_parameters, steps)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
for key, value in insert_dict.items():
sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
db_mac.execute(sql2)
hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
UPDATE 2
It finished and ran for 2:45 hours in 4 cores on a raspberry pi. Though is there a more efficient way to do such things?
So theres a few minor things I can think of to speed this up a little. I figure anything little bit helps especially if you have a lot of rows to process. For starters, print statements can slow down your code a lot. I'd get rid of those if they are unneeded.
Most importantly, you are calling the api in every iteration of the loop. Waiting for a response from the API is probably taking up the bulk of your time. I looked a bit at the api you are using, but don't know the exact case you're using it for or what your dates "start" and "end" look like, but if you could do it in less calls that would surely speed up this loop by a lot. Another way you can do this is, it looks like the api has a .csv version of the data you can download and use. Running this on local data would be way faster. If you choose to go this route i'd suggest using pandas. (Sorry if you already know pandas and i'm over explaining) You can use: df = pd.read_csv("filename.csv") and edit the table from there easily. You can also do df.to_sql(params) to write to your data base. Let me know if you want help forming a pandas version of this code.
Also, not sure from your code if this would cause an error, but I would try, instead of your for loop (for i in weather["data"]).
hist_weather_list += weather["data"]
or possibly
hist_weather_list += [weather["data"]
Let me know how it goes!
I want to pass parameters to an API and get results
Sample Data
df_results.Unique_Coords
0 51.213:4.386
1 41.294:36.342
2 -7.203:112.733
The parmaeter should be in format
Param Format( expected Format)
(('coords', '44.164:28.641'),
('fromDate', '2019-12-03'),
('toDate', '2019-11-26'))
Response
response = requests.post('https://example.com/geocoder/geocode', headers=headers, params=d_, ,timeout=(3.05, 27))
I am trying to pass values into the API iteratievely.
fromDate = today_date
toDate =shifted_date
My code till now
today_date =date.today().strftime("%Y-%m-%d")
shifted_date = date.today() + timedelta(days=7)
shifted_date =shifted_date.strftime("%Y-%m-%d")
for i, row in df_results.iterrows():
d_ = '( \'coords\' , \'{0}\' )'.format(str(row["Unique_Coords"]))
How to get the correct format?
It looks like you are looking for tuple of tuples, not string with ()
today_date = date.today().strftime("%Y-%m-%d")
shifted_date = date.today() + timedelta(days=7)
shifted_date = shifted_date.strftime("%Y-%m-%d")
data = [(('coords', str(row["Unique_Coords"])), ('fromDate', today_date), ('toDate', shifted_date)) for i, row in df_results.iterrows()]
This will produce a list of tuple of tuples
(('coords', '51.213:4.386'), ('fromDate', '2019-11-26'), ('toDate', '2019-12-03'))
(('coords', '41.294:36.342'), ('fromDate', '2019-11-26'), ('toDate', '2019-12-03'))
(('coords', '-7.203:112.733'), ('fromDate', '2019-11-26'), ('toDate', '2019-12-03'))