How do you convert a date to a number? [duplicate] - python

This question already has answers here:
How to calculate number of days between two given dates
(15 answers)
Closed 1 year ago.
How do you convert a pandas dataframe column from a date formatted as below to a number as shown below:
date
0 4/5/2010
1 9/26/2014
2 8/3/2010
To this
date newFormat
0 4/5/2010 40273
1 9/26/2014 41908
2 8/3/2010 40393
Where the second column is the number of days since 1/1/1900.

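Use the following to get an integer of the form YYYYMMDD (for example 20100405). Note that this is a reformatted date, not a day count:
data['newFormat'] = pd.to_datetime(data['date'], format='%m/%d/%Y').dt.strftime('%Y%m%d').astype(int)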
This has been answered before:
Pandas: convert date 'object' to int
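The sample values in the question (40273 for 4/5/2010) match Excel-style serial dates, which work out to the number of days since 1899-12-30 rather than literally since 1/1/1900. A minimal sketch under that assumption follows; the epoch value is inferred from the sample output and is not stated by the asker.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"date": ["4/5/2010", "9/26/2014", "8/3/2010"]})

# Parse the month/day/year strings into real datetimes.
parsed = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Assumption: Excel-style serial numbers, i.e. days elapsed since 1899-12-30.
epoch = pd.Timestamp("1899-12-30")
df["newFormat"] = (parsed - epoch).dt.days

print(df)
```

If a literal count from 1/1/1900 is wanted instead, swapping the epoch for pd.Timestamp("1900-01-01") gives values that are 2 smaller.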

Related

Pyspark dataframe sum variable up to the current row's month [duplicate]

This question already has answers here:
Calculating Cumulative sum in PySpark using Window Functions
(2 answers)
Closed 3 months ago.
I have a pyspark dataframe that looks as follows:
date, loan
1.1.2020, 0
1.2.2020, 0
1.3.2020, 0
1.4.2020, 10000
1.5.2020, 200
1.6.2020, 0
I would like the loan taken out in month 4 to carry forward into the later months as well, so the resulting dataframe would be:
date, loan
1.1.2020, 0
1.2.2020, 0
1.3.2020, 0
1.4.2020, 10000
1.5.2020, 10200
1.6.2020, 10200
Is there any simple way to do this in pyspark? Thanks.
@Ehrendil, do you want to calculate a running total? If so:
select date, loan,
sum(loan) over (order by date rows between unbounded preceding and current row) as running_total from table

Select rows with conditions based on two columns(Start date and end date) [duplicate]

This question already has answers here:
pandas: multiple conditions while indexing data frame - unexpected behavior
(5 answers)
Pandas slicing/selecting with multiple conditions with or statement
(1 answer)
Closed 2 years ago.
I have a dataframe which looks like this:
id start_date end_date
0 1 2017/06/01 2021/05/31
1 2 2018/10/01 2022/09/30
2 3 2015/01/01 2019/02/28
3 4 2017/11/01 2021/10/31
Can anyone tell me how I can slice only the rows with a start date of 2017/06/01 and an end date of 2021/10/31?
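One possible reading is that the two dates bound a range: keep rows that start on or after 2017/06/01 and end on or before 2021/10/31. A minimal sketch under that assumption, combining two boolean masks with &, is below; the variable names lower, upper, mask and selected are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "start_date": ["2017/06/01", "2018/10/01", "2015/01/01", "2017/11/01"],
    "end_date": ["2021/05/31", "2022/09/30", "2019/02/28", "2021/10/31"],
})

# Convert the string columns to datetimes so comparisons are chronological.
for col in ("start_date", "end_date"):
    df[col] = pd.to_datetime(df[col], format="%Y/%m/%d")

# Assumed interpretation: the two dates are the bounds of a range.
lower = pd.Timestamp("2017-06-01")
upper = pd.Timestamp("2021-10-31")

# Each comparison needs its own parentheses when combined with & or |.
mask = (df["start_date"] >= lower) & (df["end_date"] <= upper)
selected = df.loc[mask]

print(selected)
```

If exact matches are wanted instead, the same pattern works with == in place of the inequalities, and | in place of & selects rows that satisfy either condition.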

How to concat hour with date in python [duplicate]

This question already has an answer here:
Python: Adding hours to pandas timestamp
(1 answer)
Closed 3 years ago.
I have a pandas dataframe where the date and hour are in two different columns, as shown below.
I want to combine these two columns into a new datetime column so that I can apply pandas window/shift functions. Please share your views.
date hour
0 20190409 0
1 20190409 0
2 20190409 0
3 20190409 0
4 20190409 0
Use pd.to_datetime and pd.to_timedelta and add the results together:
df['datetime'] = pd.to_datetime(df['date'], format='%Y%m%d') + pd.to_timedelta(df['hour'], unit='h')
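As a runnable sketch of the same idea, the example below uses hypothetical hour values (0, 1, 2) so the effect is visible, and assumes the date column holds integers like 20190409. It then applies shift to the combined column, which is the kind of operation the asker mentions.

```python
import pandas as pd

# Hypothetical sample: same date as in the question, but with varying hours.
df = pd.DataFrame({"date": [20190409, 20190409, 20190409], "hour": [0, 1, 2]})

# Convert the date to text first so the format string applies cleanly,
# then add the hour as a timedelta.
df["datetime"] = (
    pd.to_datetime(df["date"].astype(str), format="%Y%m%d")
    + pd.to_timedelta(df["hour"], unit="h")
)

# With a real datetime column, shift and window functions work as usual.
df["prev_datetime"] = df["datetime"].shift(1)

print(df)
```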

How do I clean phone numbers in pandas [duplicate]

This question already has answers here:
How to only do string manupilation on column of pandas that have 4 digits or less?
(3 answers)
Closed 3 years ago.
I have a pandas dataframe with a Phone column; however, the data is a bit inconsistent. Here are some examples that I would like to focus on.
df["Phone"]
0 732009852
1 738073222
2 755920306
3 0755353288
Row 3 has the necessary leading 0 for an Australian number. How do I update rows like 0, 1 and 2?
Use pandas.Series.str.zfill:
s = pd.Series(['732009852', '0755353288'])
s.str.zfill(10)
Output:
0 0732009852
1 0755353288
Or pd.Series.str.rjust:
print(df["Phone"].str.rjust(10, '0'))
Output:
0 0732009852
1 0738073222
2 0755920306
3 0755353288

Python pandas. how to delete date rows by condition? [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Pandas filter dataframe rows with a specific year
(2 answers)
Closed 4 years ago.
I have a dataframe with dates as the index, ranging from 2013 to 2018.
How can I delete all rows where the year is before 2018?
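One way to approach this, assuming the index is (or can be converted to) a DatetimeIndex, is to keep only the rows whose index year is 2018 or later. The sample frame below is hypothetical, since the original data was only shown as an image.

```python
import pandas as pd

# Hypothetical data standing in for the frame in the question.
idx = pd.to_datetime(["2013-05-01", "2016-12-31", "2018-01-01", "2018-07-15"])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Keeping rows with year >= 2018 is equivalent to deleting rows with year < 2018.
df = df[df.index.year >= 2018]

print(df)
```

If the index holds date strings rather than datetimes, converting it first with df.index = pd.to_datetime(df.index) makes the .year attribute available.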
