How can I convert a Series to a DataFrame?
The difficulty is turning the inner index level of the Series into the DataFrame's column names.
I have a Series like this (built with groupby and concat):
CUS_ID DAY
2 MON 0.176644
TUE 0.246489
WED 0.160569
THU 0.234109
FRI 0.170916
...
dtype: float64
And what I want to get is like this:
CUS_ID MON TUE WED THU FRI
2 0.176644 0.246489 0.160569 0.234109 0.170916
The result must be a DataFrame.
Is there any way to get it without using a for loop?
You can simply unstack the inner index level:
s=pd.Series(data=[1,2,3,4,5],index=[[2,2,2,2,2],['mon','tue','wed','thu','fri']])
2 mon 1
tue 2
wed 3
thu 4
fri 5
s.unstack()
fri mon thu tue wed
2 5 1 4 2 3
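As the output above shows, the unstacked columns come out in alphabetical order rather than weekday order. A minimal sketch of restoring the original order, assuming the day labels are known up front (the `days` list is introduced here for illustration):

```python
import pandas as pd

days = ["mon", "tue", "wed", "thu", "fri"]
s = pd.Series([1, 2, 3, 4, 5], index=[[2] * 5, days])

# unstack() moves the inner index level into the columns;
# reindex() then puts the columns back into weekday order.
df = s.unstack().reindex(columns=days)

print(type(df).__name__)
print(df)
```

The result is a regular DataFrame with one row per outer index value, so no explicit loop is needed.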
I have a column called datetime of type datetime64[ns]; a typical value is 2019-10-27 06:00:00. I would like to create a new column called waves that maps each date in the datetime column to a categorical value based on the interval it falls in. For example:
Before covid: 16 Nov 2019 until 28 Feb 2020
First wave: 1 Mar 2020 until 15 Jun 2020
Between waves: 16 Jun 2020 until 30 Sep 2020
Second wave: 1 Oct 2020 until 15 Jan 2021
How do I achieve this in Python, maybe using a loop?
My dataset called df looks like this:
provider fid pid datetime
0 CHE-223 2bfc9a62 2f43d557 2021-09-26T23:18:00
1 CHE-223 fff669e9 295b82e2 2021-08-13T09:10:00
2 CHE-223 8693e564 9df9c555 2021-11-05T20:03:00
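One loop-free way to approach this is pd.cut with datetime bin edges. The sketch below is only an illustration: the sample rows are made up, the bin edges are derived from the intervals listed in the question, and it assumes each interval runs up to (but not including) the start of the next one, which also places 29 Feb 2020 in "Before covid". Dates outside all intervals become NaN.

```python
import pandas as pd

# Hypothetical sample values; only the column name comes from the question.
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2019-12-01 08:00:00",
    "2020-03-01 00:00:00",
    "2020-06-16 12:30:00",
    "2020-10-27 06:00:00",
    "2021-09-26 23:18:00",
])})

# Start of each interval, plus one closing edge (the day after 15 Jan 2021).
edges = pd.to_datetime(
    ["2019-11-16", "2020-03-01", "2020-06-16", "2020-10-01", "2021-01-16"]
)
labels = ["Before covid", "First wave", "Between waves", "Second wave"]

# right=False makes every bin half-open: [start, next_start)
df["waves"] = pd.cut(df["datetime"], bins=edges, labels=labels, right=False)
print(df)
```

The new column has a categorical dtype, which fits the stated goal of grouping dates into categories.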
Hi all, I have a column in a dataframe that looks like this:
print(df['Date']):
29-Nov-16
4-Dec-16
1-Oct-16
30-Nov-19
30-Jun-20
28-Apr-16
24-May-16
And I am trying to get output that looks like this:
print(df):
Date Month Year
29-Nov-16 Nov 2016
4-Dec-16 Dec 2016
1-Oct-16 Oct 2016
30-Nov-19 Nov 2019
30-Jun-20 Jun 2020
28-Apr-16 Apr 2016
24-May-16 May 2016
I have tried the following:
df['Month'] = pd.datetime(df['Date']).month
df['Year'] = pd.datetime(df['Date']).year
but I am getting a TypeError: cannot convert the series to <class 'int'>
Any ideas or references to help out?
Thanks!
Use strftime and str.split, and assign the results to new columns:
df_final = df.assign(**pd.to_datetime(df['Date']).dt.strftime('%b-%Y')
.str.split('-', expand=True)
.set_axis(['Month','Year'], axis=1))
Out[32]:
Date Month Year
0 29-Nov-16 Nov 2016
1 4-Dec-16 Dec 2016
2 1-Oct-16 Oct 2016
3 30-Nov-19 Nov 2019
4 30-Jun-20 Jun 2020
5 28-Apr-16 Apr 2016
6 24-May-16 May 2016
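pd.datetime is just the datetime class, so it cannot take a Series. Use pd.to_datetime instead, then go through the .dt accessor:
df['Month'] = pd.to_datetime(df['Date']).dt.month
df['Year'] = pd.to_datetime(df['Date']).dt.year
Note that .dt.month returns the month number; use .dt.strftime('%b') to get abbreviations such as Nov.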
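A self-contained sketch of this approach, using the sample dates from the question. The explicit format string is an assumption about how the dates are laid out (day, abbreviated English month name, two-digit year), and the month abbreviations depend on the locale in use.

```python
import pandas as pd

df = pd.DataFrame({"Date": [
    "29-Nov-16", "4-Dec-16", "1-Oct-16", "30-Nov-19",
    "30-Jun-20", "28-Apr-16", "24-May-16",
]})

# Parse once, then derive both columns from the parsed values.
parsed = pd.to_datetime(df["Date"], format="%d-%b-%y")
df["Month"] = parsed.dt.strftime("%b")  # abbreviated month name, e.g. "Nov"
df["Year"] = parsed.dt.year             # four-digit year as an integer

print(df)
```

Parsing once and reusing the result avoids converting the same column twice.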
I have a dataframe that looks like this:
Year vl
2017 20
2017 21
2017 22
2017 23
2017 24
2017 25
2017 26
...
I need to convert the year into the format dd.mm.yyyy, always starting from the first day of the year; for example, 2017 becomes 01.01.2017. Then I need to multiply each value in the column "vl" by 7 and add the result, as a number of days, to that start date, row by row, storing the dates in a new column in the same format.
The result should be something like this:
Year vl new_date
2017 20 21.05.2017
2017 21 28.05.2017
2017 22 04.06.2017
2017 23 11.06.2017
2017 24 18.06.2017
2017 25 25.06.2017
2017 26 02.07.2017
...
Here is one option: paste the year (%Y) and the day of the year (%j) together, then parse and reformat the result:
from datetime import datetime
df.apply(lambda r: datetime.strptime("{}{}".format(r.Year, r.vl*7+1), "%Y%j").strftime("%d.%m.%Y"), axis=1)
#0 21.05.2017
#1 28.05.2017
#2 04.06.2017
#3 11.06.2017
#4 18.06.2017
#5 25.06.2017
#6 02.07.2017
#dtype: object
Assign the column back to the original data frame:
df['new_date'] = df.apply(lambda r: datetime.strptime("{}{}".format(r.Year, r.vl*7+1), "%Y%j").strftime("%d.%m.%Y"), axis=1)
Unfortunately %U and %W aren't implemented in Pandas
But we can use the following vectorized approach:
In [160]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
pd.to_timedelta(df.vl.mul(7).astype(str) + ' days')
Out[160]:
0 2017-05-21
1 2017-05-28
2 2017-06-04
3 2017-06-11
4 2017-06-18
5 2017-06-25
6 2017-07-02
dtype: datetime64[ns]
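The vectorized result above is a datetime64 Series, while the question asks for dd.mm.yyyy strings. A minimal sketch of the same idea with that final formatting step added, assuming the column names from the question (the sample frame is rebuilt here so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({"Year": [2017] * 7, "vl": range(20, 27)})

# 1 January of each year
start = pd.to_datetime(df["Year"].astype(str), format="%Y")
# vl weeks expressed as a number of days
offset = pd.to_timedelta(df["vl"] * 7, unit="D")

# Add the offset and format the result as dd.mm.yyyy strings
df["new_date"] = (start + offset).dt.strftime("%d.%m.%Y")
print(df)
```

Passing unit="D" to pd.to_timedelta avoids building intermediate strings such as "140 days".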
I have a very large dataframe in which one column, ['date'], holds datetimes (the dtype is still string) formatted as shown below. The time sometimes appears as hh:mm:ss and sometimes as h:mm:ss (for hours 9 and earlier).
Tue Mar 1 9:23:58 2016
Tue Mar 1 9:29:04 2016
Tue Mar 1 9:42:22 2016
Tue Mar 1 09:43:50 2016
pd.to_datetime() won't work when I'm trying to convert the string into datetime format so I was hoping to find some help in getting 0's in front of the time where missing.
Any help is greatly appreciated!
import pandas as pd
date_stngs = ('Tue Mar 1 9:23:58 2016','Tue Mar 1 9:29:04 2016','Tue Mar 1 9:42:22 2016','Tue Mar 1 09:43:50 2016')
a = pd.Series([pd.to_datetime(date) for date in date_stngs])
print(a)
Output:
0 2016-03-01 09:23:58
1 2016-03-01 09:29:04
2 2016-03-01 09:42:22
3 2016-03-01 09:43:50
time = df[0].str.split(' ').str.get(3).str.split('').str.get(0).str.strip().str[:8]
year = df[0].str.split('--').str.get(0).str[-5:].str.strip()
daynmonth = df[0].str[:10].str.strip()
df_1['date'] = daynmonth + ' ' +year + ' ' + time
df_1['date'] = pd.to_datetime(df_1['date'])
I found that this worked for me: rearranging the pieces into day and month, then year, then time lets pd.to_datetime parse the result.
Assuming you have a one-column DataFrame with strings as above and the column name is 0, the following splits each string on whitespace, takes the time field (position 3, counting from 0) and zero-fills it to eight characters with zfill.
Assuming this starting df:
0
0 Tue Mar 1 9:23:58 2016
1 Tue Mar 1 9:29:04 2016
2 Tue Mar 1 9:42:22 2016
3 Tue Mar 1 09:43:50 2016
df1 = df[0].str.split(expand=True)
df1[3] = df1[3].str.zfill(8)
pd.to_datetime(df1.apply(lambda x: ' '.join(x.tolist()), axis=1))
Output
0 2016-03-01 09:23:58
1 2016-03-01 09:29:04
2 2016-03-01 09:42:22
3 2016-03-01 09:43:50
dtype: datetime64[ns]
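The zfill approach above can be put together as a runnable sketch. The explicit format string is an addition here and assumes every row follows the "weekday month day time year" layout shown in the question; the names `parts`, `joined` and `parsed` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({0: [
    "Tue Mar 1 9:23:58 2016",
    "Tue Mar 1 9:29:04 2016",
    "Tue Mar 1 9:42:22 2016",
    "Tue Mar 1 09:43:50 2016",
]})

# Split on whitespace into columns 0..4; column 3 holds the time field.
parts = df[0].str.split(expand=True)
# Left-pad the time with zeros so "9:23:58" becomes "09:23:58".
parts[3] = parts[3].str.zfill(8)

# Re-join each row into one string and parse it with an explicit format.
joined = parts.apply(" ".join, axis=1)
parsed = pd.to_datetime(joined, format="%a %b %d %H:%M:%S %Y")
print(parsed)
```

Supplying the format once for the whole column is usually faster on large frames than letting pandas guess the layout.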
I have a DataFrame with a many-levelled MultiIndex.
I know that there are duplicates in the MultiIndex (because I don't care about a distinction that the underlying database does care about).
I want to sum over these duplicates:
>>> x = pd.DataFrame({'month':['Sep', 'Sep', 'Oct', 'Oct'], 'day':['Mon', 'Mon', 'Mon', 'Tue'], 'sales':[1,2,3,4]})
>>> x
day month sales
0 Mon Sep 1
1 Mon Sep 2
2 Mon Oct 3
3 Tue Oct 4
>>> x = x.set_index(['day', 'month'])
sales
day month
Mon Sep 1
Sep 2
Oct 3
Tue Oct 4
To give me
day month
Mon Sep 3
Oct 3
Tue Oct 4
Buried deep in this SO answer to a similar question is the suggestion:
df.groupby(level=df.index.names).sum()
But this seems to me to fail the 'readability counts' criterion of good Python code.
Does anyone know of a more human-readable way?
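For reference, here is the quoted suggestion run against the example frame, next to a variant that names the levels explicitly. Whether the second form reads better is a matter of taste; it relies on groupby accepting index level names as keys, and the variable names `summed` and `by_name` are only illustrative.

```python
import pandas as pd

x = pd.DataFrame({
    "month": ["Sep", "Sep", "Oct", "Oct"],
    "day": ["Mon", "Mon", "Mon", "Tue"],
    "sales": [1, 2, 3, 4],
}).set_index(["day", "month"])

# The suggestion from the question: group on every index level.
summed = x.groupby(level=list(x.index.names)).sum()

# Same grouping with the level names spelled out.
by_name = x.groupby(["day", "month"]).sum()

print(summed)
```

Note that groupby sorts the group keys by default, so the rows may come back in a different order than in the input; passing sort=False keeps the order of first appearance.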