Convert integer to dates in python - python

Is there any way of converting integers in range(0, 365) to dates (dtype='datetime64[D]')?
Eg:
0 -> 1 jan
1 -> 2 jan
.
.
31 -> 1 feb
.
.
364-> 31 dec
P.S: I don't need the year. Only date and month for a non-leap year.

Since you need the result for a non-leap year, build the datetime object from a non-leap year such as 2015:
from datetime import datetime, timedelta
for day_count in range(0, 365):
    curr_date_object = datetime.strptime('2015-01-01', '%Y-%m-%d') + timedelta(days=day_count)
    print(curr_date_object.strftime("%d %b"))
This will return your desired result.
01 Jan
02 Jan
03 Jan
04 Jan
05 Jan
...

If you want the mapping to be in a dictionary, this is how it would look:
import datetime
In [35]: days = {}
In [36]: for i in range(0, 365):
    ...:     days[i] = (datetime.datetime(2017, 1, 1) + datetime.timedelta(days=i)).strftime("%d %b")
    ...:
You'll get this:
In [37]: days
Out[37]:
{0: '01 Jan',
1: '02 Jan',
2: '03 Jan',
3: '04 Jan',
...
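Since the question asks specifically for dtype='datetime64[D]', the same idea can also be sketched with NumPy. This is a minimal example that does not come from the answers above: it assumes NumPy is installed and reuses 2015 as an arbitrary non-leap year, and the names base, offsets and dates are illustrative only.

```python
import numpy as np

# Anchor on 1 January of a non-leap year (2015 is an arbitrary choice).
base = np.datetime64("2015-01-01", "D")

# Interpret the integers 0..364 as day offsets from the anchor.
offsets = np.arange(365).astype("timedelta64[D]")
dates = base + offsets  # array with dtype datetime64[D]

print(dates.dtype)   # datetime64[D]
print(dates[31])     # 2015-02-01
print(dates[364])    # 2015-12-31
```

A datetime64 value always carries a year, so the day and month still have to be formatted out at display time, as the strftime calls above do.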

Related

Python list forward fill elements according to thresholds

I have a list
a = ["Today, 30 Dec",
"01:10",
"02:30",
"Tomorrow, 31 Dec",
"00:00",
"04:30",
"05:30",
"01 Jan 2023",
"01:00",
"10:00"]
and would like to kind of forward fill this list so that the result looks like this
b = ["Today, 30 Dec 01:10",
"Today, 30 Dec 02:30",
"Tomorrow, 31 Dec 00:00",
"Tomorrow, 31 Dec 04:30",
"Tomorrow, 31 Dec 05:30",
"01 Jan 2023 01:00",
"01 Jan 2023 10:00"]
I iterate over the list and use a regex to check whether each item is a time. If it isn't, I save it so that it can be prepended to the following items; if it is, I prepend the saved value and append the result to the output.
Code:
import re
from pprint import pprint
def forward(input_list):
    output = []
    for item in input_list:
        if not re.fullmatch(r"\d\d:\d\d", item):
            forwarded = item
        else:
            output.append(f"{forwarded} {item}")
    return output
a = ["Today, 30 Dec",
"01:10",
"02:30",
"Tomorrow, 31 Dec",
"00:00",
"04:30",
"05:30",
"01 Jan 2023",
"01:00",
"10:00"]
b = forward(a)
pprint(b)
Output:
['Today, 30 Dec 01:10',
'Today, 30 Dec 02:30',
'Tomorrow, 31 Dec 00:00',
'Tomorrow, 31 Dec 04:30',
'Tomorrow, 31 Dec 05:30',
'01 Jan 2023 01:00',
'01 Jan 2023 10:00']
Looks like that list contains dates and times.
Any item that contains a space is a date value; otherwise it is a time value.
Iterate over the list. If you see a date value, save it as the current date. If you see a time value, append it to the current date and save that value to the new list.
How about:
a = ["Today, 30 Dec",
"01:10",
"02:30",
"Tomorrow, 31 Dec",
"00:00",
"04:30",
"05:30",
"01 Jan 2023",
"01:00",
"10:00"]
b = []
base = ""
for x in a:
    if ":" in x:
        b.append(base + " " + x)
    else:
        base = x
print(b)
Simply iterate over your data and store the most recent date string; whenever the current element contains a colon, append it to that stored string.
Output:
['Today, 30 Dec 01:10',
'Today, 30 Dec 02:30',
'Tomorrow, 31 Dec 00:00',
'Tomorrow, 31 Dec 04:30',
'Tomorrow, 31 Dec 05:30',
'01 Jan 2023 01:00',
'01 Jan 2023 10:00']
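Both versions above assume that the list starts with a date entry. A slightly more defensive variant is sketched below, under the assumption that times always look like HH:MM: it skips any time that appears before the first date header instead of failing or emitting a leading space. The function name forward_fill_times is made up for illustration.

```python
import re

TIME_PATTERN = re.compile(r"\d\d:\d\d")  # assumes times are always HH:MM


def forward_fill_times(items):
    """Prefix each time entry with the most recent non-time entry."""
    output = []
    current_date = None  # no date header seen yet
    for item in items:
        if TIME_PATTERN.fullmatch(item):
            if current_date is not None:  # skip times with no preceding date
                output.append(f"{current_date} {item}")
        else:
            current_date = item
    return output


print(forward_fill_times(["09:00", "Today, 30 Dec", "01:10", "01 Jan 2023", "10:00"]))
# ['Today, 30 Dec 01:10', '01 Jan 2023 10:00']
```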

converting str to YYYYmmdd format in python

I have year, month and date in three columns. I am concatenating them into one column and then trying to convert that column to YYYY/mm/dd format as follows:
dfyz_m_d['dt'] = '01'  # use the first day of each month
dfyz_m_d['CalendarWeek1'] = dfyz_m_d['year'].map(str) + dfyz_m_d['mon'].map(str) + dfyz_m_d['dt'].map(str)
dfyz_m_d['CalendarWeek'] = pd.to_datetime(dfyz_m_d['CalendarWeek1'], format='%Y%m%d')
But for both month 1 (Jan) and month 10 (Oct), I get only Oct in the final outcome: the CalendarWeek column doesn't have any Jan. All records are retained, but the Jan rows are also being formatted as Oct.
The issue is Jan is single digit numerically, so you end up with something like 2021101 which will be interpreted as Oct instead of Jan. Make sure your mon column is always converted to two digit months with leading zeros if needed using .zfill(2):
dfyz_m_d['year'].astype(str) + dfyz_m_d['mon'].astype(str).str.zfill(2) + dfyz_m_d['dt'].astype(str)
zfill example:
df = pd.DataFrame({'mon': [1,2,10]})
df.mon.astype(str).str.zfill(2)
0 01
1 02
2 10
Name: mon, dtype: object
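The ambiguity can be reproduced without pandas. The sketch below uses only the standard library's datetime.strptime to show how an unpadded month is misread, and how zero-padding removes the ambiguity. It uses an f-string format spec in place of .zfill(2), and the variable names are illustrative.

```python
from datetime import datetime

year, month, day = 2021, 1, 1

# Unpadded concatenation: "2021" + "1" + "01" -> "2021101"
unpadded = f"{year}{month}{day:02d}"
# %m matches "10" first, leaving "1" for %d, so January becomes October.
print(datetime.strptime(unpadded, "%Y%m%d").date())  # 2021-10-01

# Zero-padded month: "20210101" parses unambiguously.
padded = f"{year}{month:02d}{day:02d}"
print(datetime.strptime(padded, "%Y%m%d").date())    # 2021-01-01
```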
I usually do
pd.to_datetime(df.mon,format='%m').dt.strftime('%m')
0 01
1 02
2 10
Name: mon, dtype: object
Also, if you name the columns year, month and day, pd.to_datetime can assemble the date directly from the dataframe:
df['day'] = '01'
df['new'] = pd.to_datetime(df.rename(columns={'mon':'month'})).dt.strftime('%m/%d/%Y')
df
year mon day new
0 2020 1 1 01/01/2020
1 2020 1 1 01/01/2020
I like str.pad :)
dfyz_m_d['year'].astype(str) + dfyz_m_d['mon'].astype(str).str.pad(2, 'left', '0') + dfyz_m_d['dt'].astype(str)
It pads zeros on the left so that each string has a length of two: 1 becomes 01, while 10 stays 10.
You should be able to use pandas.to_datetime with your input dataframe. You may need to rename your columns.
import pandas as pd
df = pd.DataFrame({'year': [2015, 2016],
'month': [2, 3],
'dt': [4, 5]})
print(pd.to_datetime(df.rename(columns={"dt": "day"})))
Output
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
You can add / between year, mon and dt and amend the format string to include it, as follows:
dfyz_m_d['dt'] = '01'
dfyz_m_d['CalendarWeek1'] = dfyz_m_d['year'].astype(str) + '/' + dfyz_m_d['mon'].astype(str) + '/' + dfyz_m_d['dt'].astype(str)
dfyz_m_d['CalendarWeek'] = pd.to_datetime(dfyz_m_d['CalendarWeek1'], format='%Y/%m/%d')
Data Input
year mon dt
0 2021 1 01
1 2021 2 01
2 2021 10 01
3 2021 11 01
Output
year mon dt CalendarWeek1 CalendarWeek
0 2021 1 01 2021/1/01 2021-01-01
1 2021 2 01 2021/2/01 2021-02-01
2 2021 10 01 2021/10/01 2021-10-01
3 2021 11 01 2021/11/01 2021-11-01
If you want the final output date format to be YYYY/mm/dd, you can additionally call .dt.strftime after pd.to_datetime, as follows:
dfyz_m_d['dt'] = '01'
dfyz_m_d['CalendarWeek1'] = dfyz_m_d['year'].astype(str) + '/' + dfyz_m_d['mon'].astype(str) + '/' + dfyz_m_d['dt'].astype(str)
dfyz_m_d['CalendarWeek'] = pd.to_datetime(dfyz_m_d['CalendarWeek1'], format='%Y/%m/%d').dt.strftime('%Y/%m/%d')
Output
year mon dt CalendarWeek1 CalendarWeek
0 2021 1 01 2021/1/01 2021/01/01
1 2021 2 01 2021/2/01 2021/02/01
2 2021 10 01 2021/10/01 2021/10/01
3 2021 11 01 2021/11/01 2021/11/01

How can I convert day of week, Month, Date to Year - Month - Date

I have dates from 2018 until 2021 in a pandas column and they look like this:
Date
Sun, Dec 30
Mon, Dec 31
Any idea how I can convert this to:
Date
Dec 30 2018
Dec 31 2018
In other words, knowing the day of the week (Monday, Tuesday, etc.), is it possible to work out the year of that specific date?
I would take a look at this conversation. As mentioned, you will probably need to define a range of years, since it is possible that December 30th (for example) falls on a Sunday in more than one year. Otherwise, it is possible to collect a list of years where the input (Sun, Dec 30) is valid. You will probably need to use datetime to convert your strings to a Python readable format.
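The idea of collecting every year in which the weekday matches can be sketched with the standard library alone. This is an illustration, not necessarily the approach from the linked conversation: the function name candidate_years is hypothetical, and English weekday and month abbreviations are hard-coded so that the result does not depend on the locale.

```python
from datetime import date

# English abbreviations are hard-coded so the result does not depend on the locale.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def candidate_years(label, years):
    """Return the years in which a label like 'Sun, Dec 30' is a valid date."""
    weekday, month_day = [part.strip() for part in label.split(",")]
    month_name, day = month_day.split()
    month = MONTHS.index(month_name) + 1
    matches = []
    for year in years:
        try:
            candidate = date(year, month, int(day))
        except ValueError:  # e.g. Feb 29 in a non-leap year
            continue
        if WEEKDAYS[candidate.weekday()] == weekday:
            matches.append(year)
    return matches


print(candidate_years("Sun, Dec 30", range(2018, 2022)))  # [2018]
print(candidate_years("Sun, Dec 30", range(2012, 2022)))  # [2012, 2018]
```

With the wider range the label is ambiguous, which is why the range of years has to be narrowed before a single year can be assigned.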
You can iterate over the years 2018 through 2021 (range(2018, 2022)), compute the weekday name of each candidate date, and then find the year that matches.
df = pd.DataFrame({'Date': {0: 'Sun, Dec 30',
1: 'Mon, Dec 31'}})
for col in range(2018, 2022):
    df[col] = '%s' % col + df['Date'].str.split(',').str[-1]
    df[col] = pd.to_datetime(df[col], format='%Y %b %d').dt.strftime('%a, %b %d')
dfn = df.set_index('Date').stack().reset_index()
cond = dfn['Date'] == dfn[0]
obj = dfn[cond].set_index('Date')['level_1'].rename('year')
result:
print(obj)
Date
Sun, Dec 30 2018
Mon, Dec 31 2018
Name: year, dtype: int64
print(df.join(obj, on='Date'))
Date 2018 2019 2020 2021 year
0 Sun, Dec 30 Sun, Dec 30 Mon, Dec 30 Wed, Dec 30 Thu, Dec 30 2018
1 Mon, Dec 31 Mon, Dec 31 Tue, Dec 31 Thu, Dec 31 Fri, Dec 31 2018
df_result = obj.reset_index()
df_result['Date_new'] = df_result['Date'].str.split(',').str[-1] + ' ' + df_result['year'].astype(str)
print(df_result)
Date year Date_new
0 Sun, Dec 30 2018 Dec 30 2018
1 Mon, Dec 31 2018 Dec 31 2018

Return dataframe with range of dates

I need a Python function to return a Pandas DataFrame with range of dates, only year and month, for example, from November 2016 to March 2017 and have this as result:
year month
2016 11
2016 12
2017 01
2017 02
2017 03
My dates are in string format Y-m (from = '2016-11', to = '2017-03'). I'm not sure on turning them to datetime type or to separate them into two different integer values.
Any ideas on how to achieve it properly?
Are you looking at something like this?
pd.date_range('November 2016', 'April 2017', freq = 'M')
You get
DatetimeIndex(['2016-11-30', '2016-12-31', '2017-01-31', '2017-02-28',
'2017-03-31'],
dtype='datetime64[ns]', freq='M')
To get dataframe
index = pd.date_range('November 2016', 'April 2017', freq = 'M')
df = pd.DataFrame(index = index)
pd.Series(pd.date_range('2016-11', '2017-4', freq='M').strftime('%Y-%m')) \
.str.split('-', expand=True) \
.rename(columns={0: 'year', 1: 'month'})
year month
0 2016 11
1 2016 12
2 2017 01
3 2017 02
4 2017 03
You can use a combination of pd.to_datetime and pd.date_range.
import pandas as pd
start = 'November 2016'
end = 'March 2017'
s = pd.Series(pd.date_range(*(pd.to_datetime([start, end]) \
+ pd.offsets.MonthEnd()), freq='1M'))
Construct a dataframe using the .dt accessor attributes.
df = pd.DataFrame({'year' : s.dt.year, 'month' : s.dt.month})
df
month year
0 11 2016
1 12 2016
2 1 2017
3 2 2017
4 3 2017
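Another possible route, offered as a sketch rather than a definitive recommendation, is pd.period_range. It accepts the question's 'Y-m' strings directly and includes both endpoints, so the end does not have to be bumped to April. The function name month_range is an assumption for illustration. Note that recent pandas releases deprecate the 'M' alias in date_range in favour of 'ME', whereas 'M' remains the monthly alias for periods.

```python
import pandas as pd


def month_range(start, end):
    """Return a DataFrame of (year, month) rows from start to end inclusive.

    start and end are 'YYYY-MM' strings, as in the question.
    """
    periods = pd.period_range(start=start, end=end, freq="M")
    rows = [(p.year, p.month) for p in periods]
    return pd.DataFrame(rows, columns=["year", "month"])


df = month_range("2016-11", "2017-03")
print(df)
```

The months come out as integers here; if zero-padded strings such as 01 are needed, each period can be formatted with p.strftime('%m') instead.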

Calculating Elapsed Days From Pandas Dataframe Strings

I have a Pandas dataframe that stores travel dates of people. I'd like to add a column that shows the length of the stay. To do this, each string needs to be parsed, converted to datetimes, and subtracted. Pandas seems to be treating the datetime conversion as a whole series and not as individual strings, as I get TypeError: must be string, not Series. I'd like to do this with a non-looping option as the actual dataset is quite large, but I need a bit of help.
import pandas as pd
from datetime import datetime
df = pd.DataFrame(data=[['Bob', '12 Mar 2015 - 31 Mar 2015'], ['Jessica', '27 Mar 2015 - 31 Mar 2015']], columns=['Names', 'Day of Visit'])
df['Length of Stay'] = (datetime.strptime(df['Day of Visit'][:11], '%d %b %Y') - datetime.strptime(df['Day of Visit'][-11:], '%d %b %Y')).days + 1
print(df)
Desired Output:
Names Day of Visit Length of Stay
0 Bob 12 Mar 2015 - 31 Mar 2015 20
1 Jessica 27 Mar 2015 - 31 Mar 2015 5
Use Series.str.extract to split the Day of Visit column into two separate columns.
Then use pd.to_datetime to parse the columns as dates.
Computing the Length of Stay can then be done by subtracting the date columns and adding 1:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=[['Bob', '12 Mar 2015 - 31 Mar 2015'], ['Jessica', '27 Mar 2015 - 31 Mar 2015']], columns=['Names', 'Day of Visit'])
tmp = df['Day of Visit'].str.extract(r'([^-]+)-(.*)', expand=True).apply(pd.to_datetime)
df['Length of Stay'] = (tmp[1] - tmp[0]).dt.days + 1
print(df)
yields
Names Day of Visit Length of Stay
0 Bob 12 Mar 2015 - 31 Mar 2015 20
1 Jessica 27 Mar 2015 - 31 Mar 2015 5
The regex pattern ([^-]+)-(.*) means
(       # start group #1
  [     # begin character class
    ^-  # any character except a literal minus sign `-`
  ]     # end character class
  +     # match 1-or-more characters from the character class
)       # end group #1
-       # match a literal minus sign
(       # start group #2
  .*    # match 0-or-more of any character
)       # end group #2
.str.extract returns a DataFrame with the matching text from groups #1 and #2 in columns.
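To see what the two groups capture, the same pattern can be tried on a single string with the standard library's re module. This is only an illustrative sketch: the variable names are made up, and %b assumes English month abbreviations. Note that the captured groups keep the spaces around the dash, so they are stripped before parsing.

```python
import re
from datetime import datetime

visit = "12 Mar 2015 - 31 Mar 2015"

# Group 1 is everything before the first "-", group 2 everything after it.
match = re.match(r"([^-]+)-(.*)", visit)
start_text, end_text = match.groups()
print(repr(start_text), repr(end_text))  # '12 Mar 2015 ' ' 31 Mar 2015'

# Strip the surrounding spaces before parsing; %b assumes English month names.
start = datetime.strptime(start_text.strip(), "%d %b %Y")
end = datetime.strptime(end_text.strip(), "%d %b %Y")
print((end - start).days + 1)  # 20
```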
Solution
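from datetime import datetime
def length_of_stay(x):
    # Split "start - end" and parse both dates
    start, end = [datetime.strptime(d, '%d %b %Y') for d in x.split(' - ')]
    # Inclusive day count, matching the desired output (20 and 5)
    return (end - start).days + 1
df['Length of Stay'] = df['Day of Visit'].apply(length_of_stay)
print(df)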
