I would like to calculate the average slope of multiple numbers. For example:
I've been given 5 numbers (e.g. 1.1523, 1.4626, 1.5734, 1.8583, 1.6899). Every 15 minutes I get a new number, delete the oldest one, and want to calculate the average slope again.
I've already seen formulas, but I don't really understand how to calculate the slope when the (imaginary) x-axis is time. For example, I have:
X: 14:00, 14:15, 14:30, 14:45, 15:00
Y: 1.1523, 1.4626, 1.5734, 1.8583, 1.6899
Assuming all times are in HH:MM format and we don't need to worry about passing midnight, this should work:
X = ['14:00', '14:15', '14:30', '14:45', '15:00']
Y = [1.1523, 1.4626, 1.5734, 1.8583, 1.6899]
minutes = [int(s[:2]) * 60 + int(s[3:]) for s in X]
slope = (Y[-1] - Y[0]) / ((minutes[-1] - minutes[0]) / 60)
print(slope)
slopes = [(Y[i] - Y[i - 1]) / ((minutes[i] - minutes[i - 1]) / 60) for i in range(1, len(X))]
print(slopes)
averageSlope = sum(slopes) / (len(X) - 1)
print(averageSlope)
Results:
0.5375999999999999
[1.2411999999999992, 0.44320000000000004, 1.1396000000000006, -0.6736000000000004]
0.5375999999999999
I could be wrong, but isn't average slope determined the same way as average velocity - which is Delta d / Delta t? If that is the case, shouldn't it be Delta Y / Delta X?
from datetime import datetime

X = ['14:00', '14:15', '14:30', '14:45', '15:00']
Y = [1.1523, 1.4626, 1.5734, 1.8583, 1.6899]
today = datetime.now()
avg = []
for i in range(len(X) - 1):
    # end and start of the interval as datetime objects on today's date
    e = X[i + 1].split(":")
    e = datetime(today.year, today.month, today.day, int(e[0]), int(e[1]))
    s = X[i].split(":")
    s = datetime(today.year, today.month, today.day, int(s[0]), int(s[1]))
    deltaX = e - s
    deltaY = Y[i + 1] - Y[i]
    # slope in units of Y per minute: Delta Y / Delta X, with Delta X in minutes
    ans = deltaY / (deltaX.total_seconds() / 60)
    avg.append(f'{ans:.9f}')
print(avg)
['0.020686667', '0.007386667', '0.018993333', '-0.011226667']
Consider this data:
1 minute(s) 1 meter(s)
2 minute(s) 2 meter(s)
3 minute(s) 3 meter(s)
You should have a slope of 1 meter/minute, no matter how you cut the cake.
Likewise for this:
1 minute(s) 2 meter(s)
2 minute(s) 3 meter(s)
3 minute(s) 5 meter(s)
From minute 1 to minute 2 the average slope is 1 meter/minute, from minute 2 to minute 3 it's 2 meters/minute, and from minute 1 to minute 3 it's 1.5 meters/minute.
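For equally spaced samples, the average of the consecutive slopes collapses to (last - first) / total time, which is what the numbers above show. A minimal sketch of the rolling setup from the question follows, assuming readings arrive every 15 minutes (0.25 hours). The function name `average_slope`, the parameter `step_hours`, and the extra reading 1.7 are hypothetical and only serve as illustration.

```python
from collections import deque


def average_slope(values, step_hours=0.25):
    """Average slope (Y per hour) of equally spaced samples.

    With equal spacing, the mean of the consecutive slopes equals
    (last - first) / total elapsed time.
    """
    if len(values) < 2:
        raise ValueError("need at least two samples")
    return (values[-1] - values[0]) / (step_hours * (len(values) - 1))


# A deque with maxlen=5 drops the oldest reading when a new one is appended.
window = deque([1.1523, 1.4626, 1.5734, 1.8583, 1.6899], maxlen=5)
first_slope = average_slope(window)

window.append(1.7)  # hypothetical new reading, not from the question
next_slope = average_slope(window)

print(first_slope)
print(next_slope)
```

The deque keeps the window at five readings, so the slope can be recomputed after every new value without managing the list by hand.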
I have two lists with different values. I tried to put list a into an organized format with re.findall and split. It works fine on list a, but it can't filter list b properly.
a = ['Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM']
b = ['Sehingga 9 Ogos 2021. Jumlah kes COVID-19 yang dilaporkan adalah 17,236 kes (1,279,776 kes).\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 5,740 (470,755)\nWPKL - 1,567 (141,971)\nJohor - 1,232 (101,684)\nSabah -Lagi 1,247 (95,082)\nSarawak - 589 (81,917)\nNegeri Sembilan - 1,215 (79,992)\nKedah - 1,328 (57,926)\nPulau Pinang - 908 (53,276)\nKelantan - 914 (50,347)\nPerak - 935 (44,859)\nMelaka - 360 (35,944)\nPahang - 604 (29,729)\nTerengganu - 501 (21,197)\nWP Labuan - 8 (9,719)\nWP Putrajaya - 66 (4,544)\nPerlis - 22 (834)\n\n- KPK KKM']
My code
out = []
for v in b:
    for g in re.findall(r"^(.*?\(.*?\))\n", v, flags=re.M):
        out.append(g.split(":")[0])
print(*out[0])
Whenever I print out[0] for list b, it only shows Selangor - 5,740 (470,755), which is wrong; it should be Sehingga 9 Ogos 2021.
I tried the same code on list a and it works properly without any issues. However, I noticed a minor difference between the two lists: the character after Sehingga 8 Ogos 2021 is ':' in a but '.' in b. How can I make the code work on both lists? I'm still new to re and split; does anyone have any idea? Thanks.
There are issues with both your data format and your regex. I am not that good at regex, but this works for me.
import re
a = ['Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM']
b = ['Sehingga 9 Ogos 2021. Jumlah kes COVID-19 yang dilaporkan adalah 17,236 kes (1,279,776 kes).\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 5,740 (470,755)\nWPKL - 1,567 (141,971)\nJohor - 1,232 (101,684)\nSabah -Lagi 1,247 (95,082)\nSarawak - 589 (81,917)\nNegeri Sembilan - 1,215 (79,992)\nKedah - 1,328 (57,926)\nPulau Pinang - 908 (53,276)\nKelantan - 914 (50,347)\nPerak - 935 (44,859)\nMelaka - 360 (35,944)\nPahang - 604 (29,729)\nTerengganu - 501 (21,197)\nWP Labuan - 8 (9,719)\nWP Putrajaya - 66 (4,544)\nPerlis - 22 (834)\n\n- KPK KKM']
out = []
for v in b:
    # normalise list b: drop the '.' before a newline, turn the remaining '.' into ':'
    regex_list = re.findall(r"^(.*?\(.*?\))\n", v.replace('.\n', '\n').replace('.', ':'), flags=re.M)
    for g in regex_list:
        print(g)
        out.append(g.split(":")[0])
print(*out[0])
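If only the date header is needed, another option is to split the first line on either ':' or '.', so the same code handles both lists without rewriting the data. This is a rough sketch, not a drop-in replacement for the loop above. The helper name `report_date` is hypothetical, and the two sample strings are shortened versions of a[0] and b[0].

```python
import re


def report_date(report):
    """Return the text before the first ':' or '.' on the first line."""
    first_line = report.split("\n", 1)[0]
    return re.split(r"[:.]", first_line, maxsplit=1)[0].strip()


# Shortened versions of a[0] and b[0] from the question.
a_text = ("Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah "
          "18,688 kes (1,262,540 kes)\n\nSelangor - 6,565 (465,015)\n")
b_text = ("Sehingga 9 Ogos 2021. Jumlah kes COVID-19 yang dilaporkan adalah "
          "17,236 kes (1,279,776 kes).\n\nSelangor - 5,740 (470,755)\n")

print(report_date(a_text))
print(report_date(b_text))
```

Note that print(out[0]) prints the string as is, while print(*out[0]) unpacks it and prints every character separated by a space.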
I have the sample df below:
Project Name Industry Due Date
P - ABC FI 1/31/2020
P - POA FI 1/8/2020
P - BCD MANU 1/25/2020
P - QWE RES 3/6/2020
P - POI FI 6/7/2020
P - RLK TECH 6/9/2020
P - MJK RET 3/18/2020
P - KIU TECH 4/19/2020
P - KNJ RES 3/9/2020
P - ISA TECH 4/3/2020
P - YUI FI 4/2/2020
I want to create a grouped view like the one below. I tried a pandas pivot table, but it didn't give what I expected and raised an error:
pd.pivot_table(df,index=['Industry'],columns=['Due Date'],values=['Project Name'])
Expected outputs:
          Jan      Mar      Apr      Jun
Industry
FI        P - POA           P - YUI  P - POI
          P - ABC
MANU      P - BCD
RES                P - QWE
                   P - KNJ
RET                P - MJK
TECH                        P - ISA  P - RLK
                            P - KIU
Does anyone have any thoughts? Thank you for the help in advance!
You can try the below:
m = (df[['Industry', 'Project Name']]
     .assign(Month=pd.to_datetime(df['Due Date']).dt.month_name()))
idx = m['Industry'].unique()
final = (m.pivot_table('Project Name',
                       ['Industry', m.groupby(['Industry', 'Month']).cumcount()],
                       'Month', aggfunc='first', fill_value='')
         .rename_axis(None, axis=1).reindex(idx, level=0))
print(final)
            April    January  June     March
Industry
FI       0  P - YUI  P - ABC  P - POI
         1           P - POA
MANU     0           P - BCD
RES      0                             P - QWE
         1                             P - KNJ
TECH     0  P - KIU           P - RLK
         1  P - ISA
RET      0                             P - MJK
Here's an alternative using .groupby, which is more of a pipeline approach rather than using pivot_table:
import pandas as pd
df = pd.DataFrame(
    [
        ("P - ABC", "FI", "1/31/2020"),
        ("P - POA", "FI", "1/8/2020"),
        ("P - BCD", "MANU", "1/25/2020"),
        ("P - QWE", "RES", "3/6/2020"),
        ("P - POI", "FI", "6/7/2020"),
        ("P - RLK", "TECH", "6/9/2020"),
        ("P - MJK", "RET", "3/18/2020"),
        ("P - KIU", "TECH", "4/19/2020"),
        ("P - KNJ", "RES", "3/9/2020"),
        ("P - ISA", "TECH", "4/3/2020"),
        ("P - YUI", "FI", "4/2/2020"),
    ],
    columns=("Project Name", "Industry", "Due Date")
)
# I've wrapped the Pandas pipeline in parentheses to allow for line breaks
(
    df
    .set_index(pd.to_datetime(df["Due Date"]).dt.month_name())
    .pipe(lambda x: x.groupby([x["Industry"], x.index]))
    .max()  # This technically works, but it keeps only one project per industry/month; there might be better opts
    .unstack()
    ["Project Name"]
)
Out[]:
Due Date    April  January     June    March
Industry
FI        P - YUI  P - POA  P - POI      NaN
MANU          NaN  P - BCD      NaN      NaN
RES           NaN      NaN      NaN  P - QWE
RET           NaN      NaN      NaN  P - MJK
TECH      P - KIU      NaN  P - RLK      NaN
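Because .max() keeps a single project per industry and month, entries such as P - ABC and P - KNJ drop out of the result. One possible variation, sketched here under the assumption that all dates fall in the same year and use the month/day/year format, joins the project names instead and orders the columns by calendar month. The names `dates`, `calendar_order`, and `result` are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ("P - ABC", "FI", "1/31/2020"),
        ("P - POA", "FI", "1/8/2020"),
        ("P - BCD", "MANU", "1/25/2020"),
        ("P - QWE", "RES", "3/6/2020"),
        ("P - POI", "FI", "6/7/2020"),
        ("P - RLK", "TECH", "6/9/2020"),
        ("P - MJK", "RET", "3/18/2020"),
        ("P - KIU", "TECH", "4/19/2020"),
        ("P - KNJ", "RES", "3/9/2020"),
        ("P - ISA", "TECH", "4/3/2020"),
        ("P - YUI", "FI", "4/2/2020"),
    ],
    columns=("Project Name", "Industry", "Due Date"),
)

dates = pd.to_datetime(df["Due Date"], format="%m/%d/%Y")
# Month names in chronological order (assumes a single year of data).
calendar_order = dates.sort_values().dt.month_name().unique()

result = (
    df.assign(Month=dates.dt.month_name())
    .groupby(["Industry", "Month"])["Project Name"]
    .agg(", ".join)          # keep every project, joined into one cell
    .unstack(fill_value="")  # months become columns, gaps become ''
    .reindex(columns=calendar_order)
)
print(result)
```

Each cell then holds all projects for that industry and month as one comma-separated string, rather than one project per row as in the expected output of the question.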
I am trying to plot a number of bar charts with matplotlib, each having exactly 26 timestamps / slots on the x-axis and two integers for the y-axis. For most data sets this scales fine, but in some cases matplotlib lets the bars overlap:
[Figure: left chart overlapping and not aligned to the xticks, right chart OK]
[Figure: chart with overlapping bars]
So instead of being given enough space, the bars overlap, although my width is set to 0.1 and my datasets have 26 values, which I checked.
My code to plot these charts is as follows:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.dates as matplotdates

# Plot something
rows = len(data_dict) // 2 + 1
fig = plt.figure(figsize=(15, 5*rows))
gs1 = gridspec.GridSpec(rows, 2)
grid_x = 0
grid_y = 0
for dataset_name in data_dict:
    message1_list = []
    message2_list = []
    ts_list = []
    slot_list = []
    for slot, counts in data_dict[dataset_name].items():
        slot_list.append(slot)
        message1_list.append(counts["Message1"])
        message2_list.append(counts["Message2"])
        ts_list.append(counts["TS"])
    ax = fig.add_subplot(gs1[grid_y, grid_x])
    ax.set_title("Activity: " + dataset_name, fontsize=24)
    ax.set_xlabel("Timestamps", fontsize=14)
    ax.set_ylabel("Number of messages", fontsize=14)
    ax.xaxis_date()
    hfmt = matplotdates.DateFormatter('%d.%m,%H:%M')
    ax.xaxis.set_major_formatter(hfmt)
    ax.set_xticks(ts_list)
    plt.setp(ax.get_xticklabels(), rotation=60, ha='right')
    ax.tick_params(axis='x', pad=0.75, length=5.0)
    rects = ax.bar(ts_list, message2_list, align='center', width=0.1)
    rects2 = ax.bar(ts_list, message1_list, align='center', width=0.1, bottom=message2_list)
    # update grid position
    if (grid_x == 1):
        grid_x = 0
        grid_y += 1
    else:
        grid_x = 1
plt.tight_layout(pad=0.01)
plt.savefig(r"output_files\activity_barcharts.svg", bbox_inches='tight')
plt.gcf().clear()
The input data looks as follows (example of a plot with overlapping bars, second picture)
slot - message1 - message2 - timestamp
0 - 0 - 42 - 2017-09-11 07:59:53.517000+00:00
1 - 0 - 4 - 2017-09-11 09:02:28.827875+00:00
2 - 0 - 0 - 2017-09-11 10:05:04.138750+00:00
3 - 0 - 0 - 2017-09-11 11:07:39.449625+00:00
4 - 0 - 0 - 2017-09-11 12:10:14.760500+00:00
5 - 0 - 0 - 2017-09-11 13:12:50.071375+00:00
6 - 0 - 13 - 2017-09-11 14:15:25.382250+00:00
7 - 0 - 0 - 2017-09-11 15:18:00.693125+00:00
8 - 0 - 0 - 2017-09-11 16:20:36.004000+00:00
9 - 0 - 0 - 2017-09-11 17:23:11.314875+00:00
10 - 0 - 0 - 2017-09-11 18:25:46.625750+00:00
11 - 0 - 0 - 2017-09-11 19:28:21.936625+00:00
12 - 0 - 0 - 2017-09-11 20:30:57.247500+00:00
13 - 0 - 0 - 2017-09-11 21:33:32.558375+00:00
14 - 0 - 0 - 2017-09-11 22:36:07.869250+00:00
15 - 0 - 0 - 2017-09-11 23:38:43.180125+00:00
16 - 0 - 0 - 2017-09-12 00:41:18.491000+00:00
17 - 0 - 0 - 2017-09-12 01:43:53.801875+00:00
18 - 0 - 0 - 2017-09-12 02:46:29.112750+00:00
19 - 0 - 0 - 2017-09-12 03:49:04.423625+00:00
20 - 0 - 0 - 2017-09-12 04:51:39.734500+00:00
21 - 0 - 0 - 2017-09-12 05:54:15.045375+00:00
22 - 0 - 0 - 2017-09-12 06:56:50.356250+00:00
23 - 0 - 0 - 2017-09-12 07:59:25.667125+00:00
24 - 0 - 20 - 2017-09-12 09:02:00.978000+00:00
25 - 0 - 0 - 2017-09-12 10:04:36.288875+00:00
Does anyone know how to prevent this from happening?
I calculated exactly 26 bars for every chart and expected them to have equal width. I also tried replacing the 0 values with 1e-5 (which another post proposed), but that did not prevent the overlapping.
The width of the bar is the width in data units. I.e. if you want to have a bar of width 1 minute, you would set the width to
plt.bar(..., width=1./(24*60.))
because the numeric axis unit for datetime axes in matplotlib is days and there are 24*60 minutes in a day.
To determine the bar width automatically, you can use the smallest difference between any two successive values in the input time list. In that case, something like the following will do the trick:
import numpy as np
import matplotlib.pyplot as plt
import datetime
import matplotlib.dates

t = [datetime.datetime(2017, 9, 12, 8, i) for i in range(60)]
x = np.random.rand(60)
# smallest gap between two successive timestamps (a timedelta)
td = np.diff(t).min()
# convert the gap to date units (days); use one reference time for both ends
now = datetime.datetime.now()
s1 = matplotlib.dates.date2num(now)
s2 = matplotlib.dates.date2num(now + td)
plt.bar(t, x, width=s2 - s1, ec="k")
plt.show()
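Applied to the question, width=0.1 means 0.1 days, i.e. 2.4 hours, while the slots in the sample data are only about 62.6 minutes apart, which is why neighbouring bars overlap. The same conversion can be sketched with the standard library alone. The helper name `bar_width_days` and the `fraction` parameter (leaving a small gap between bars) are assumptions made for this example, and the evenly spaced `slots` list is made-up stand-in data.

```python
from datetime import datetime, timedelta


def bar_width_days(timestamps, fraction=0.8):
    """Bar width in days: a fraction of the smallest gap between timestamps."""
    ordered = sorted(timestamps)
    smallest_gap = min(later - earlier
                       for earlier, later in zip(ordered, ordered[1:]))
    # timedelta / timedelta gives a float, here the gap expressed in days
    return fraction * (smallest_gap / timedelta(days=1))


# Stand-in data: 26 slots, one hour apart (the question's slots are ~62.6 min apart).
slots = [datetime(2017, 9, 11, 8, 0) + i * timedelta(hours=1) for i in range(26)]
width = bar_width_days(slots)
print(width)  # pass this as width= to ax.bar(...)
```

With fraction=1.0 the bars touch exactly; smaller values leave a gap between neighbouring bars.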
periodsList = []
su = '0:'
Su = []
sun = []
SUN = ''
I'm formatting timetables by converting
extendedPeriods = ['0: 1200 - 1500',
'0: 1800 - 2330',
'2: 1200 - 1500',
'2: 1800 - 2330',
'3: 1200 - 1500',
'3: 1800 - 2330',
'4: 1200 - 1500',
'4: 1800 - 2330',
'5: 1200 - 1500',
'5: 1800 - 2330',
'6: 1200 - 1500',
'6: 1800 - 2330']
into '1200 - 1500/1800 - 2330'
su is the day identifier
Su, sun store some values
SUN stores the converted timetable
for line in extendedPeriods:
    if su in line:
        Su.append(line)
for item in Su:
    sun.append(item.replace(su, '', 1).strip())
SUN = '/'.join([str(x) for x in sun])
Then I tried to write a function to apply my "converter" to the other days as well:
def formatPeriods(id, store1, store2, periodsDay):
    for line in extendedPeriods:
        if id in line:
            store1.append(line)
            for item in store1:
                store2.append(item.replace(id, '', 1).strip())
            periodsDay = '/'.join([str(x) for x in store2])
    return periodsDay
But the function returns 12 misformatted strings...
'1200 - 1500', '1200 - 1500/1200 - 1500/1800 - 2330',
You can use collections.OrderedDict here; if order doesn't matter, use collections.defaultdict instead.
>>> from collections import OrderedDict
>>> dic = OrderedDict()
>>> for item in extendedPeriods:
...     k, v = item.split(': ')
...     dic.setdefault(k, []).append(v)
...
>>> for k, v in dic.items():
...     print("/".join(v))
...
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
1200 - 1500/1800 - 2330
To access a particular day you can use:
>>> print("/".join(dic['0']))  # sunday
1200 - 1500/1800 - 2330
>>> print("/".join(dic['2']))  # tuesday
1200 - 1500/1800 - 2330
This is your general logic:
from collections import defaultdict
d = defaultdict(list)
for i in extendedPeriods:
    # split '0: 1200 - 1500' into the day id and the period
    bits = i.split(':')
    d[bits[0].strip()].append(bits[1].strip())
for i, v in d.items():
    print(i, '/'.join(v))
The output is:
0 1200 - 1500/1800 - 2330
2 1200 - 1500/1800 - 2330
3 1200 - 1500/1800 - 2330
4 1200 - 1500/1800 - 2330
5 1200 - 1500/1800 - 2330
6 1200 - 1500/1800 - 2330
To make it a function for a single day, simply select d['0'] (for Sunday, for example):
def schedule_per_day(day):
    d = defaultdict(list)
    for i in extendedPeriods:
        bits = i.split(':')
        d[bits[0].strip()].append(bits[1].strip())
    # day is the string id, e.g. '0'; returns None if the day has no periods
    return '/'.join(d[day]) if d.get(day) else None
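As a self-contained variation on the grouping idea above, the mapping can be built once and then looked up per day, instead of being rebuilt on every call. This is only a sketch: the function name `periods_by_day` is hypothetical, and the list here is a shortened copy of extendedPeriods from the question.

```python
from collections import defaultdict


def periods_by_day(lines):
    """Map each day id to its periods joined with '/'."""
    grouped = defaultdict(list)
    for line in lines:
        # '0: 1200 - 1500' -> day '0', period '1200 - 1500'
        day, period = line.split(":", 1)
        grouped[day.strip()].append(period.strip())
    return {day: "/".join(periods) for day, periods in grouped.items()}


# Shortened copy of extendedPeriods from the question.
extendedPeriods = ['0: 1200 - 1500',
                   '0: 1800 - 2330',
                   '2: 1200 - 1500',
                   '2: 1800 - 2330']

schedule = periods_by_day(extendedPeriods)
print(schedule['0'])
print(schedule.get('1'))  # day '1' has no entries
```

Passing the list as an argument keeps the function independent of the global variable, and `schedule.get(day)` returns None for a day without entries.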