using index in calculations, pandas [duplicate] - python

This question already has answers here:
How to directly use Pandas Date-time index in calculations?
(1 answer)
selecting from multi-index pandas
(7 answers)
Closed 4 years ago.
I have a df that contains a date index and another column which is a different date. I would like to add a column to my df that is the difference between these two dates in days. How can one use the index in the computation directly without having to bring it into the df as a column?
MWE:
import numpy as np
import pandas as pd
df = pd.DataFrame(data = {"val": [1,2,3,4,5], "some_date": np.arange("2000-02-01", "2000-02-06", dtype="datetime64[D]")}, index = pd.date_range(start = "2000-01-01", end = "2000-01-05", periods = 5, name="date"))
# would like to do something like this
df["delta"] = df["some_date"] - df["date"]  # raises KeyError: "date" is the index name, not a column
What's the best way to access the index in calculations of this type?
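One possible approach, offered as a minimal sketch rather than the accepted answer from the linked duplicate: the index is available as df.index and can be used in arithmetic directly, since subtracting a DatetimeIndex from a datetime column is done position by position. The frame below mirrors the MWE, except that both date ranges are built with pd.date_range (an assumption made to keep the example short).

```python
import pandas as pd

# Same shape as the MWE: a "date" index plus a separate date column.
df = pd.DataFrame(
    {
        "val": [1, 2, 3, 4, 5],
        "some_date": pd.date_range("2000-02-01", periods=5, freq="D"),
    },
    index=pd.date_range("2000-01-01", periods=5, freq="D", name="date"),
)

# Use the index directly; the result is a timedelta Series,
# and .dt.days turns it into whole days.
df["delta"] = (df["some_date"] - df.index).dt.days
```

If the index were a MultiIndex, df.index.get_level_values("date") could be used in place of df.index.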

Related

How to convert pandas dataframe column to string and delete some text of column in pandas dataframe [duplicate]

This question already has answers here:
Extracting the hour from a time column in pandas
(3 answers)
Convert string to timedelta in pandas
(4 answers)
Closed 2 years ago.
I want to convert each value in a pandas dataframe column to a string and then delete some text. The values are times. For example, if the value is 11:21, I would like to delete everything to the right of the : in every element of the column, so 11:21 should be converted to 11.
Let's say you have the following dataset:
df = pd.DataFrame({
'time': ['09:30:00','09:40:01','09:50:02','10:00:03']
})
df.head()
If you want to treat the time column as a string, the following code can be used:
df['hour'] = df['time'].apply(lambda time : time.split(':')[0])
df.head()
Alternatively, the time column can be converted to datetime and the hour extracted:
df['hour'] = pd.to_datetime(df['time'], format='%H:%M:%S').dt.hour
df.head()
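As a further option, the string route can probably be written without apply by using the vectorized .str accessor. This is a small sketch on the same sample data; the astype(str) call is only there to cover the case where the column does not hold strings yet.

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['09:30:00', '09:40:01', '09:50:02', '10:00:03']
})

# Convert to string, split on ':' and keep the first piece (the hour).
df['hour'] = df['time'].astype(str).str.split(':').str[0]
```

The result stays a string column ('09', '10'), whereas the datetime route above returns integers.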

Find months between dates pandas [duplicate]

This question already has an answer here:
Create date range list with pandas
(1 answer)
Closed 2 years ago.
I have a large DataFrame with two columns, start_date and finish_date, holding dates in string format, e.g. "2018-06-01".
I want to create a third column with the list of months between the two dates.
So, if I have start_date "2018-06-01" and finish_date "2018-08-01", in the third column I expect ["2018-06-01", "2018-07-01", "2018-08-01"]. The day doesn't matter to me, so it can be dropped.
I have found many ways to do this for plain strings, but none for a pandas DataFrame.
Pandas has a function called apply which allows you to apply logic to every row of a dataframe.
We can use dateutil to get all months between the start and end date, then apply the logic to every row of your dataframe as a new column.
import datetime
import pandas as pd
from dateutil.rrule import rrule, MONTHLY
# DataFrame creation, just for the example; use the one you already have.
data = {'start': datetime.datetime.strptime("10-10-2020", "%d-%m-%Y"), 'end': datetime.datetime.strptime("10-12-2020", "%d-%m-%Y")}
df = pd.DataFrame(data, index=[0])
# df
#        start        end
# 0 2020-10-10 2020-12-10
# For every row, list all months from the start date to the end date. The result is a list.
df['months'] = df.apply(lambda x: [date.strftime("%m/%Y") for date in rrule(MONTHLY, dtstart=x.start, until=x.end)], axis=1)
# df
#        start        end                       months
# 0 2020-10-10 2020-12-10  [10/2020, 11/2020, 12/2020]
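A pandas-only variant may also work here, in the spirit of the linked "Create date range list" question. The sketch below assumes the column names start_date and finish_date from the question and string dates in "YYYY-MM-DD" form; months_between is a hypothetical helper name, and the second sample row is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": ["2018-06-01", "2019-11-15"],
    "finish_date": ["2018-08-01", "2020-01-20"],
})

def months_between(start, finish):
    """Return the months from start to finish (inclusive) as 'YYYY-MM' strings."""
    # freq="MS" yields month-start dates, so snap the start to the first
    # of its month to make sure the starting month is included.
    first = pd.Timestamp(start).replace(day=1)
    return pd.date_range(first, pd.Timestamp(finish), freq="MS").strftime("%Y-%m").tolist()

df["months"] = [
    months_between(s, f) for s, f in zip(df["start_date"], df["finish_date"])
]
```

The day is dropped in the output because the question says it does not matter; changing the strftime pattern to "%Y-%m-%d" would keep it.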

How can I filter this DataFrame by month? [duplicate]

This question already has answers here:
How to filter a dataframe of dates by a particular month/day?
(3 answers)
Closed 2 years ago.
I have a DataFrame with a column that holds time values in a day/month/year hour:minute format, i.e. '%d/%m/%Y %H:%M' (e.g. '31/12/2018 23:35'). I want to get a DataFrame containing only the rows from a given month.
You can filter a DataFrame by a column's value with a boolean mask. Use df.loc (or plain df[...]) for this, because df.iloc does not accept a boolean Series:
filtered_df = df.loc[df["month"] == desired_cell_value]
Edit for OP's comment, who wants to know how to iterate through the months:
date_col = "col_name"
# dayfirst=True because the dates look like '31/12/2018 23:35'
dates = pd.to_datetime(df[date_col], dayfirst=True)
# Year-month key, e.g. 201812 for December 2018
df["yyyymm"] = dates.dt.year * 100 + dates.dt.month
for key in df["yyyymm"].unique():
    filtered_df = df.loc[df["yyyymm"] == key]
    # do your stuff
This creates a column holding the year and month. You can then take the unique year/month values in the df and iterate through them one by one.
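For completeness, here is a small self-contained sketch of month filtering on made-up data. The column names when and val and the sample rows are assumptions for illustration; the date format follows the example in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "when": ["31/12/2018 23:35", "01/02/2019 08:00", "15/12/2018 10:10"],
    "val": [1, 2, 3],
})

# Parse the strings once, with an explicit day-first format.
df["when"] = pd.to_datetime(df["when"], format="%d/%m/%Y %H:%M")

# Keep only the rows from December (of any year).
december = df.loc[df["when"].dt.month == 12]

# To walk through every calendar month, group by a monthly period instead.
by_month = {
    str(period): group
    for period, group in df.groupby(df["when"].dt.to_period("M"))
}
```

Grouping by period avoids building a separate key column and keeps different years apart.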

Panda: Summing multiple columns in dataframe to a new column [duplicate]

This question already has answers here:
Pandas: sum DataFrame rows for given columns
(8 answers)
Closed 4 years ago.
I want to sum multiple columns of a DataFrame into a new column. For two columns I was using this:
import pandas as pd, numpy as np
df=pd.read_csv("Calculation_test.csv")
#creating new column
df["Test1"] = 0
#sum of 2 columns
df["Test1"]= df['col1']+df['col2']
df.to_csv('test_cal.csv', index=False)
But for my project I need to sum around 15-20 columns, and I do not want to write df['col1']+df['col2']+...................... every time.
I have the list of columns, which I have to add. Like:
'col1'+'col2'+ 'col5'+'col8'+----+'col18'
or like this:
'col1', 'col2', 'col5', 'col8',----,'col18'
How can I use this list directly to do the sum of columns?
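Try slicing the columns. Note that a label slice takes every column positioned between 'col1' and 'col18', both ends included, so this only fits if all of those columns should be summed:
import pandas as pd
df = pd.read_csv("whatever.csv")
df["Test1"] = df.loc[:, 'col1':'col18'].sum(axis=1)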
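Since the question asks about an explicit list of columns rather than a contiguous range, selecting by list is likely the closer fit. A minimal sketch with made-up data follows; the column values and the cols_to_sum name are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2],
    "col2": [10, 20],
    "col3": [100, 200],   # present in the frame but not part of the sum
    "col5": [1000, 2000],
})

# The list of columns to add up; it does not need to be contiguous.
cols_to_sum = ["col1", "col2", "col5"]

# Selecting with a list returns a DataFrame; sum(axis=1) adds across each row.
df["Test1"] = df[cols_to_sum].sum(axis=1)
```

By default sum skips NaN values, which differs from chaining + on the columns, where a single NaN makes the whole row NaN.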

Losing the column name when getting a subset of a dataframe [duplicate]

This question already has answers here:
Keep selected column as DataFrame instead of Series
(5 answers)
Closed 5 years ago.
I want to get a subset of my dataframe, but its column name disappears. This is the code I use:
my_dataframe = df["column1"]
How can I have the subset of df in my_dataframe without losing the column name?
This gives a Series:
my_series = df["column1"]
but this gives a sub-DataFrame:
my_dataframe = df[["column1"]]
which shows the column name.
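The difference can be checked on a toy frame. This is a short sketch with made-up data; note that the Series still carries the column label in its name attribute, it is just displayed differently.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})

my_series = df["column1"]       # single brackets: a Series
my_dataframe = df[["column1"]]  # double brackets: a one-column DataFrame

# A Series can also be turned back into a one-column DataFrame.
also_dataframe = my_series.to_frame()
```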
