index date miles
0 7/8/2015 14:00:00 10
1 7/8/2015 15:00:01 2
2 7/8/2015 16:00:01 5
3 7/9/2015 09:00:02 12
4 7/10/2015 12:00:00 4
5 7/11/2015 11:00:00 25
6 7/12/2015 04:34:33 10
7 7/12/2015 05:35:35 22
8 7/12/2015 23:11:11 14
9 7/13/2015 01:00:23 10
10 7/13/2015 03:00:03 2
I want to turn this table into the following:
7/8/2015 17
7/9/2015 12
7/10/2015 4
7/11/2015 25
7/12/2015 46
7/13/2015 12
How can I do this in Python? I want to group by date and get the sum of miles for each day.
If you want to add up the miles for each day, one way is to loop over all the rows with a for loop, add the miles of every row that shares the same date to a running total for that date, and then print one line per date.
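The loop described above might look like the sketch below. It assumes the rows are available as (timestamp, miles) pairs, which is an illustrative layout and not something the question specifies; the helper name miles_per_day is made up for this example.

```python
# Rows copied from the question's table as (timestamp string, miles) pairs.
rows = [
    ("7/8/2015 14:00:00", 10),
    ("7/8/2015 15:00:01", 2),
    ("7/8/2015 16:00:01", 5),
    ("7/9/2015 09:00:02", 12),
    ("7/10/2015 12:00:00", 4),
    ("7/11/2015 11:00:00", 25),
    ("7/12/2015 04:34:33", 10),
    ("7/12/2015 05:35:35", 22),
    ("7/12/2015 23:11:11", 14),
    ("7/13/2015 01:00:23", 10),
    ("7/13/2015 03:00:03", 2),
]


def miles_per_day(rows):
    """Sum miles per calendar day (hypothetical helper for this sketch)."""
    totals = {}
    for stamp, miles in rows:
        day = stamp.split(" ")[0]  # keep only the date part of the timestamp
        totals[day] = totals.get(day, 0) + miles
    return totals


totals = miles_per_day(rows)
for day, miles in totals.items():
    print(day, miles)
```

A plain dict keeps insertion order in Python 3.7+, so the days print in the order they first appear in the input.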
Using resample:
df.set_index('date', inplace=True)
ddf = df.resample('1D').sum()
resample needs a DatetimeIndex, so if 'date' is still text, convert it with pd.to_datetime first and then set it as the index.
If df is your sample input, ddf will look like this:
miles
date
2015-07-08 17
2015-07-09 12
2015-07-10 4
2015-07-11 25
2015-07-12 46
2015-07-13 12
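For reference, here is a self-contained sketch of the resample approach, including the conversion to datetimes. The four sample rows are a subset of the question's data chosen to leave a gap on 7/10; the variable name daily is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["7/8/2015 14:00:00", "7/8/2015 15:00:01",
             "7/9/2015 09:00:02", "7/11/2015 11:00:00"],
    "miles": [10, 2, 12, 25],
})

# resample only works on a datetime-like index, so parse the strings first.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y %H:%M:%S")

# One bin per calendar day; days with no rows still get a bin, with sum 0.
daily = df.set_index("date").resample("1D")["miles"].sum()
print(daily)
```

Note that resample produces a row for every day in the range, so 2015-07-10 shows up with 0 miles here. If you only want days that actually occur in the data, a groupby on the date is the closer fit.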
As @Valentino mentioned:
import pandas
data = {
'date': ['7/8/2015 14:00:00', '7/8/2015 14:00:00', '7/8/2015 14:00:00', '7/9/2015 14:00:00'],
'miles': [10, 2, 5, 12]
}
df = pandas.DataFrame(data)
df['date'] = pandas.to_datetime(df.date)
df['date'] = df['date'].dt.strftime('%m/%d/%Y')
print(df)
Out:
date miles
0 07/08/2015 10
1 07/08/2015 2
2 07/08/2015 5
3 07/09/2015 12
print(df.groupby('date').sum())
Out:
miles
date
07/08/2015 17
07/09/2015 12
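One caveat with grouping on formatted strings: the group keys are sorted as text, so month/day/year strings do not stay in chronological order across a year boundary. A possible variant, sketched below with made-up sample rows, groups on the normalized datetime instead (dt.normalize sets the time part to midnight).

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-12-31 08:00", "2014-12-31 20:00",
                            "2015-01-01 09:00"]),
    "miles": [3, 4, 5],
})

# String keys: '01/01/2015' sorts before '12/31/2014'.
by_string = df.groupby(df["date"].dt.strftime("%m/%d/%Y"))["miles"].sum()

# Datetime keys: groups stay in chronological order.
by_day = df.groupby(df["date"].dt.normalize())["miles"].sum()

print(by_string)
print(by_day)
```

If you need the m/d/Y text for display, you can format the index after grouping instead of before.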
I want to create a new column called DateTime using numerical columns "Year","Month","Day","Hour","Minute".
Year Month Day Hour Minute
2019 5 9 11 0
2019 5 9 11 10
2019 5 9 11 20
This is my code:
df["DateTime"] = pd.to_datetime(df[["Year","Month","Day","Hour","Minute"]])
The expected result is:
DateTime
2019-05-09 11:00:00
2019-05-09 11:10:00
2019-05-09 11:20:00
However, I get this wrong result:
DateTime
2019-05-09
2019-05-09
2019-05-09
Try formatting the result as text with dt.strftime:
import pandas as pd
d = {
'Year': [2019,2019],
'Month': [5,6],
'Day': [12,13],
'Hour': [12,20],
'Minute': [30,45],
}
df = pd.DataFrame(d)
df["DateTime"] = pd.to_datetime(df[["Year","Month","Day","Hour","Minute"]]).dt.strftime('%d/%m/%y %H:%M')
df
Year Month Day Hour Minute DateTime
0 2019 5 12 12 30 12/05/19 12:30
1 2019 6 13 20 45 13/06/19 20:45
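To make the difference between the two results explicit, here is a small sketch using the question's three rows. pd.to_datetime assembles a datetime64 column from the component columns; dt.strftime then produces plain strings. The column name DateTimeText is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2019, 2019, 2019],
    "Month": [5, 5, 5],
    "Day": [9, 9, 9],
    "Hour": [11, 11, 11],
    "Minute": [0, 10, 20],
})

# Assemble real datetimes from the component columns.
stamps = pd.to_datetime(df[["Year", "Month", "Day", "Hour", "Minute"]])
df["DateTime"] = stamps

# Optional: a text version with an explicit format (this column holds strings).
df["DateTimeText"] = stamps.dt.strftime("%Y-%m-%d %H:%M:%S")

print(df.dtypes)
print(df[["DateTime", "DateTimeText"]])
```

Keeping the datetime64 column is usually the better choice if you want to sort, resample, or do date arithmetic later; the string column is only useful for display or export.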
I have the following dataframe:
     date    wind (°)   wind (kt)  temp (C°)  humidity(%)  currents (°)  currents (kt)  stemp (C°)  sea_temp_diff  wind_distance_diff  wind_speed_diff    temp_diff  humidity_diff  current_distance_diff  current_speed_diff
 8  12018  175.000000   16.333333  25.500000    82.500000     60.000000       0.100000   25.400000      -1.066667           23.333333        -0.500000    -0.333333     -12.000000             160.000000        6.666667e-02
 9  12019  180.000000   17.000000  23.344828    79.724138    230.000000       0.100000   23.827586      -0.379310           22.068966         1.068966     0.827586      -7.275862             315.172414        3.449034e+02
10  12020  365.000000  208.653846  24.192308    79.346154    355.769231     192.500000   24.730769     574.653846         1121.923077      1151.153846  1149.346154     -19.538462            1500.000000        1.538454e+03
14  22019  530.357143  372.964286  23.964286    81.964286   1270.714286    1071.560714  735.642857    -533.642857         -327.500000      -356.892857     1.857143     -10.321429            -873.571429       -8.928107e+02
15  22020  216.551724   12.689655  24.517241    81.137931    288.275862     172.565517  196.827586    -171.379310           -8.965517         3.724138     1.413793      -7.137931            -105.517241       -1.722724e+02
16  32019  323.225806  174.709677  25.225806    80.741935    260.000000     161.451613   25.709677     480.709677          486.451613       483.967742     0.387097     153.193548            1044.516129        9.677065e+02
17  32020  351.333333  178.566667  25.533333    78.800000    427.666667     166.666667   26.600000     165.533333         -141.000000      -165.766667   166.633333     158.933333               8.333333        1.500000e-01
18  42017  180.000000   14.000000  27.000000  5000.000000    200.000000       0.400000   25.400000       2.600000           20.000000        -4.000000     0.000000       0.000000             -90.000000       -1.000000e-01
19  42019  694.230769  589.769231  24.038462    69.461538    681.153846     577.046154   26.884615      -1.346154           37.307692        -1.692308     1.500000       4.769231              98.846154        1.538462e-01
20  42020  306.666667  180.066667  24.733333    75.166667    427.666667     166.666667   26.800000     165.066667          205.333333       165.200000     1.100000      -4.066667             360.333333        3.334233e+02
21  52017  146.333333   11.966667  22.900000  5000.000000    116.333333       0.410000   26.066667      -1.553333            8.666667         0.833333    -0.766667       0.000000              95.000000       -1.300000e-01
22  52019  107.741935   12.322581  23.419355    63.032258    129.354839       0.332258   25.935484      -1.774194           14.838710         0.096774    -0.612903     -14.451613             130.967742
I need to sort the 'date' column chronologically, and I'm wondering if there's a way for me to split it two ways, with the '10' in one column and 2017 in another, sort both of them in ascending order, and then bring them back together.
I tried this:
australia_overview[['month','year']] = australia_overview['date'].str.split("2",expand=True)
But I get this error:
ValueError: Columns must be same length as key
How can I solve this?
Starting from a DataFrame like yours:
>>> import pandas as pd
>>> df = pd.DataFrame({'id': [1, 2, 3, 4],
... 'date': ['1 42018', '12 32019', '8 112020', '23 42021']},
... index = [0, 1, 2, 3])
>>> df
id date
0 1 1 42018
1 2 12 32019
2 3 8 112020
3 4 23 42021
We can split the column on the space and keep the first part as the day:
>>> df['day'] = df['date'].str.split(' ', expand=True)[0]
>>> df
id date day
0 1 1 42018 1
1 2 12 32019 12
2 3 8 112020 8
3 4 23 42021 23
Then take the last 4 characters of the date column as the year:
>>> df['year'] = df['date'].str[-4:].astype(int)
>>> df
id date day year
0 1 1 42018 1 2018
1 2 12 32019 12 2019
2 3 8 112020 8 2020
3 4 23 42021 23 2021
Bonus: as asked in the comments, you can get the month using the same principle:
>>> df['month'] = df['date'].str.split(' ', expand=True)[1].str[:-4].astype(int)
>>> df
id date day year month
0 1 1 42018 1 2018 4
1 2 12 32019 12 2019 3
2 3 8 112020 8 2020 11
3 4 23 42021 23 2021 4
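Once year and month are separate integer columns, the chronological sort the question asks for is a sort_values on both. The sketch below is an assumption-laden example: it treats the date values as month followed by a 4-digit year (as in '12018' or '102017'), and the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical values in the month + 4-digit-year layout.
df = pd.DataFrame({"date": ["12020", "42017", "12018", "102017", "22019"]})

as_text = df["date"].astype(str)          # also covers a numeric date column
df["year"] = as_text.str[-4:].astype(int)   # last four characters
df["month"] = as_text.str[:-4].astype(int)  # everything before the year

# Sort by year first, then by month within each year.
ordered = df.sort_values(["year", "month"]).reset_index(drop=True)
print(ordered)
```

Converting to int matters here: sorted as text, month '10' would come before month '4'.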
I have a dataframe with different columns (like price, id, product and date) and I need to divide this dataframe into several dataframes based on the current date of the system (current_date = np.datetime64(date.today())).
For example, if today is 2020-02-07 I want to divide my main dataframe into three different ones where df1 would be the data of the last month (data of 2020-01-07 to 2020-02-07), df2 would be the data of the last three months (excluding the month already in df1 so it would be more accurate to say from 2019-10-07 to 2020-01-07) and df3 would be the data left on the original dataframe.
Is there an easy way to do this? Also, I've been trying to use Grouper but I keep getting this error over and over again: NameError: name 'Grouper' is not defined (my Pandas version is 0.24.2)
You can use offsets.DateOffset to get the datetimes one month and three months back, then filter by boolean indexing:
import pandas as pd
rng = pd.date_range('2019-10-10', periods=20, freq='5d')
df = pd.DataFrame({'date': rng, 'id': range(20)})
print (df)
date id
0 2019-10-10 0
1 2019-10-15 1
2 2019-10-20 2
3 2019-10-25 3
4 2019-10-30 4
5 2019-11-04 5
6 2019-11-09 6
7 2019-11-14 7
8 2019-11-19 8
9 2019-11-24 9
10 2019-11-29 10
11 2019-12-04 11
12 2019-12-09 12
13 2019-12-14 13
14 2019-12-19 14
15 2019-12-24 15
16 2019-12-29 16
17 2020-01-03 17
18 2020-01-08 18
19 2020-01-13 19
current_date = pd.to_datetime('now').floor('d')
print (current_date)
2020-02-07 00:00:00
last1m = current_date - pd.DateOffset(months=1)
last3m = current_date - pd.DateOffset(months=3)
m1 = (df['date'] > last1m) & (df['date'] <= current_date)
m2 = (df['date'] > last3m) & (df['date'] <= last1m)
# everything that matches neither m1 nor m2
m3 = ~(m1 | m2)
df1 = df[m1]
df2 = df[m2]
df3 = df[m3]
print (df1)
date id
18 2020-01-08 18
19 2020-01-13 19
print (df2)
date id
6 2019-11-09 6
7 2019-11-14 7
8 2019-11-19 8
9 2019-11-24 9
10 2019-11-29 10
11 2019-12-04 11
12 2019-12-09 12
13 2019-12-14 13
14 2019-12-19 14
15 2019-12-24 15
16 2019-12-29 16
17 2020-01-03 17
print (df3)
date id
0 2019-10-10 0
1 2019-10-15 1
2 2019-10-20 2
3 2019-10-25 3
4 2019-10-30 4
5 2019-11-04 5
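If you need this split in more than one place, the same masks can be wrapped in a function that takes the reference date as an argument, which also makes the result reproducible. This is only a sketch: the function name split_by_recency and its parameters are invented here, and the reference date is fixed to the 2020-02-07 used in the question.

```python
import pandas as pd


def split_by_recency(df, today, date_col="date"):
    """Return (last month, previous two months, everything else)."""
    today = pd.Timestamp(today).normalize()
    last1m = today - pd.DateOffset(months=1)
    last3m = today - pd.DateOffset(months=3)
    recent = (df[date_col] > last1m) & (df[date_col] <= today)
    older = (df[date_col] > last3m) & (df[date_col] <= last1m)
    return df[recent], df[older], df[~(recent | older)]


rng = pd.date_range("2019-10-10", periods=20, freq="5D")
df = pd.DataFrame({"date": rng, "id": range(20)})

df1, df2, df3 = split_by_recency(df, "2020-02-07")
print(df1["id"].tolist(), df2["id"].tolist(), df3["id"].tolist())
```

As for the NameError: Grouper lives in the pandas namespace, so it has to be written as pd.Grouper (or imported with from pandas import Grouper).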
I have read a couple of similar posts about this issue, but none of the solutions worked for me. I have the following csv:
Score date term
0 72 3 Feb · 1
1 47 1 Feb · 1
2 119 6 Feb · 1
8 101 7 hrs · 1
9 536 11 min · 1
10 53 2 hrs · 1
11 20 11 Feb · 3
3 15 1 hrs · 2
4 33 7 Feb · 1
5 153 4 Feb · 3
6 34 3 min · 2
7 26 3 Feb · 3
I want to sort the csv by date. What's the easiest way to do that?
You can create two helper columns: one of datetimes built with to_datetime and one of timedeltas built with to_timedelta. to_timedelta needs the HH:MM:SS format, so the values are first rewritten with Series.replace and regexes. After that you can sort by both columns with DataFrame.sort_values:
df['date1'] = pd.to_datetime(df['date'], format='%d %b', errors='coerce')
times = df['date'].replace({r'(\d+)\s+min': r'00:\1:00',
                            r'\s+hrs': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
df = df.sort_values(['times','date1'])
print (df)
Score date term date1 times
6 34 3 min 2 NaT 00:03:00
9 536 11 min 1 NaT 00:11:00
3 15 1 hrs 2 NaT 01:00:00
10 53 2 hrs 1 NaT 02:00:00
8 101 7 hrs 1 NaT 07:00:00
1 47 1 Feb 1 1900-02-01 NaT
0 72 3 Feb 1 1900-02-03 NaT
7 26 3 Feb 3 1900-02-03 NaT
5 153 4 Feb 3 1900-02-04 NaT
2 119 6 Feb 1 1900-02-06 NaT
4 33 7 Feb 1 1900-02-07 NaT
11 20 11 Feb 3 1900-02-11 NaT
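Note that the result above lists the relative entries from newest to oldest and then the dated entries from oldest to newest. If you want a single timeline instead, one alternative (not what the answer above does) is to convert every value to an absolute timestamp first. The sketch below assumes a reference time now and that dated entries belong to the same year as now; both are assumptions, as are the helper name to_timestamp and the sample rows. Parsing 'Feb' with %b also assumes an English locale.

```python
import pandas as pd


def to_timestamp(text, now):
    """Map '11 min', '7 hrs' or '3 Feb' to an absolute timestamp."""
    value, unit = text.split()
    if unit == "min":
        return now - pd.Timedelta(minutes=int(value))
    if unit == "hrs":
        return now - pd.Timedelta(hours=int(value))
    # Otherwise treat it as '<day> <month abbreviation>' in the year of `now`.
    return pd.to_datetime(f"{value} {unit} {now.year}", format="%d %b %Y")


now = pd.Timestamp("2020-02-12 12:00:00")  # assumed reference time

df = pd.DataFrame({
    "Score": [72, 101, 536, 20, 15],
    "date": ["3 Feb", "7 hrs", "11 min", "11 Feb", "1 hrs"],
})
df["when"] = df["date"].apply(to_timestamp, now=now)

ordered = df.sort_values("when")  # oldest first
print(ordered)
```

Pass ascending=False to sort_values if you want the newest entries first.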
Currently I have a time series data frame as follows:
dfMain =
Date Portfolio Value
0 2016-07-01 1.000000e+06
1 2016-07-08 1.025168e+06
2 2016-07-15 1.028053e+06
3 2016-07-22 1.024184e+06
4 2016-07-29 1.022491e+06
5 2016-08-05 1.023241e+06
6 2016-08-12 1.030325e+06
7 2016-08-19 1.032742e+06
8 2016-08-26 1.032567e+06
9 2016-09-02 1.028614e+06
10 2016-09-09 9.930876e+05
11 2016-09-16 9.956875e+05
12 2016-09-23 1.010174e+06
13 2016-09-30 1.010388e+06
14 2016-10-07 1.004989e+06
15 2016-10-14 9.924929e+05
16 2016-10-21 9.969708e+05
17 2016-10-28 9.816373e+05
18 2016-11-04 9.563689e+05
19 2016-11-11 9.869579e+05
20 2016-11-18 9.936929e+05
21 2016-11-25 1.009625e+06
Given that the dataframe can be different (so I can't just pull specific rows from this example), what would be the best way to pull the dates closest to the end of each month? For example, index 4 would be pulled because it is the closest date to the end of July.
Any tips would be greatly appreciated!
Group on the month number and find the last record:
df.Date = pd.to_datetime(df.Date, errors='coerce')
df.groupby(df.Date.dt.month).last()
Date Portfolio Value
Date
7 2016-07-29 1022491.0
8 2016-08-26 1032567.0
9 2016-09-30 1010388.0
10 2016-10-28 981637.3
11 2016-11-25 1009625.0
If rows aren't sorted by Date, call sort_values first:
df.sort_values('Date').groupby(df.Date.dt.month).last()
Date Portfolio Value
Date
7 2016-07-29 1022491.0
8 2016-08-26 1032567.0
9 2016-09-30 1010388.0
10 2016-10-28 981637.3
11 2016-11-25 1009625.0
This works regardless of the original row order.
If your dates span multiple years, it is better to group on year and month:
df.sort_values('Date').groupby([df.Date.dt.year, df.Date.dt.month]).last()
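Another way to handle multiple years, sketched below, is to group on a monthly period with dt.to_period('M') and use idxmax to pick the row with the latest date in each month. This keeps the original row labels and does not need a prior sort. The sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical data spanning two years, deliberately not fully sorted.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-12-23", "2016-12-30", "2017-01-06",
                            "2017-12-22", "2017-12-29", "2017-01-27"]),
    "Portfolio Value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# One group per (year, month); idxmax returns the label of the latest date.
last_idx = df.groupby(df["Date"].dt.to_period("M"))["Date"].idxmax()
month_ends = df.loc[last_idx]
print(month_ends)
```

Because the period includes the year, December 2016 and December 2017 end up in separate groups.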
You need to sort the dates and then find the last value for each group.
df['Date'] = pd.to_datetime(df['Date'])
grp = df.sort_values('Date').groupby(df['Date'].dt.month)
pd.DataFrame([grp.get_group(x).iloc[-1] for x in grp.groups])
Output:
Date Portfolio Value
4 2016-07-29 1022491.0
8 2016-08-26 1032567.0
13 2016-09-30 1010388.0
17 2016-10-28 981637.3
21 2016-11-25 1009625.0