I have a dataframe that looks like this. What I want to do is:
Sort my DF by DateTime
After sorting my DF by date, add a new column that counts and accumulates values for EACH value in "cod_atc".
The problem is that every time I add this new column, no matter what I do, I cannot get my DF sorted by DateTime.
This is the code I am using. I am just adding a column called "DateTime" and sorting everything by that column. The problem appears when I add the new column called "count".
df1['DateTime'] = pd.to_datetime(df1['local_date'])
df1.sort_values(by='DateTime')
df1['count']=df1.groupby(['cod_atc']).cumcount() #sort=False
df1
This is the result I get. The problem is that if I try to sort my DF by DateTime again, it works, but the "count" column no longer makes any sense! The "count" column should count and accumulate values for EACH value in "cod_atc", but following the DateTime order!
Did you forget to add inplace=True when you sorted df1?
Without it, sort_values returns a new sorted DataFrame and leaves df1 unchanged, so the sort step is lost.
df1['DateTime'] = pd.to_datetime(df1['local_date'])
df1.sort_values(by='DateTime', inplace=True)
df1['count']=df1.groupby(['cod_atc']).cumcount() #sort=False
df1
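As a rough illustration of the fix, here is a minimal sketch on made-up data. The sample rows are hypothetical, and assigning the sorted result back is used in place of inplace=True, which has the same effect. The count is computed only after the rows are in DateTime order, so it follows that order within each cod_atc group.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df1 = pd.DataFrame({
    "cod_atc": ["A", "B", "A", "B", "A"],
    "local_date": ["2021-03-01", "2021-01-15", "2021-01-01",
                   "2021-02-15", "2021-02-01"],
})
df1["DateTime"] = pd.to_datetime(df1["local_date"])

# sort_values returns a new DataFrame, so the result is assigned back.
df1 = df1.sort_values(by="DateTime").reset_index(drop=True)

# cumcount numbers the rows of each group in their current (sorted) order.
df1["count"] = df1.groupby("cod_atc").cumcount()

print(df1[["cod_atc", "DateTime", "count"]])
```

The reset_index call is optional; it only renumbers the rows so the index matches the sorted order.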
I have a dataframe with a column 'mon/yr' that stores month and year in this format: Jun/19, Jan/22, etc.
I want to extract only these values from that column: ['Jul/19','Oct/19','Jan/20','Apr/20','Jul/20','Oct/20','Jan/21','Apr/21','Jul/21','Oct/21','Jan/22']
and put them into a variable called 'dates' so that I can use it for plotting.
My code, which does not work:
dates = df["mon/yr"] == ['Jul/19','Oct/19','Jan/20','Apr/20','Jul/20','Oct/20','Jan/21','Apr/21','Jul/21','Oct/21','Jan/22']
This is Python code.
This is how to filter rows:
df.loc[df['column_name'].isin(some_values)]
Using your dates list, if we wanted to extract just 'Jul/20' and 'Oct/20' we can do:
import pandas as pd
df = pd.DataFrame(['Jul/19','Oct/19','Jan/20','Apr/20','Jul/20','Oct/20','Jan/21','Apr/21','Jul/21','Oct/21','Jan/22'], columns = ['dates'])
mydates = ['Jul/20','Oct/20']
df.loc[df['dates'].isin(mydates)]
which produces:
    dates
4  Jul/20
5  Oct/20
So, for your actual use case, assuming that df is a pandas dataframe, and mon/yr is the name of the column, you can do:
dates = df.loc[df['mon/yr'].isin(['Jul/19','Oct/19','Jan/20','Apr/20','Jul/20','Oct/20','Jan/21','Apr/21','Jul/21','Oct/21','Jan/22'])]
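Note that the expression above returns the matching rows as a DataFrame. If only the labels are needed for plotting, one possible variation is to select the single column and convert it to a list. This is a sketch on hypothetical data; the "value" column and the shortened wanted list are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame; only the 'mon/yr' column name comes from the question.
df = pd.DataFrame({
    "mon/yr": ["Jun/19", "Jul/19", "Aug/19", "Oct/19", "Jan/20"],
    "value": [1, 2, 3, 4, 5],
})

wanted = ["Jul/19", "Oct/19", "Jan/20"]

# isin builds a boolean mask: True where the cell is one of the wanted labels.
mask = df["mon/yr"].isin(wanted)

# Select just the 'mon/yr' column for the matching rows, as a plain list.
dates = df.loc[mask, "mon/yr"].tolist()
print(dates)
```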
I have a dataframe called df which looks like this:
and I have another dataframe called vix which looks like this:
Previously I added the columns 'open ndaq', 'open jpm' and 'open nya' like this:
df['open jpm'] = jpm_dataframe['open']
df['open ndaq'] = ndaq_dataframe['open']
df['open nya'] = nya_dataframe['open']
This worked because those dataframes had exactly the same index as df (date and time as strings). However, the vix dataframe has its date and time in a different format. How do I add the open column from vix to df so that it still corresponds to the same indices? I want the result to look like what I'd get if I could do
df['vix open'] = vix['open']
#(assuming vix and df somehow have the exact same index)
You need a DatetimeIndex in both DataFrames; after that, the assignment df['vix open'] = vix['open'] aligns on the index:
vix.index = pd.to_datetime(vix.index, format='%m/%d/%Y')
df.index = pd.to_datetime(df.index)
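To show the alignment end to end, here is a small sketch with made-up values. The index strings and prices are hypothetical, and the '%m/%d/%Y' format is assumed to match the vix index as in the lines above. Once both indexes are DatetimeIndex objects, a plain column assignment matches rows by timestamp, and rows of df with no matching vix entry get NaN.

```python
import pandas as pd

# Hypothetical data: df is indexed by ISO-style strings,
# vix by month/day/year strings.
df = pd.DataFrame(
    {"open jpm": [100.0, 101.0, 102.0]},
    index=["2020-01-02", "2020-01-03", "2020-01-06"],
)
vix = pd.DataFrame(
    {"open": [13.5, 12.9]},
    index=["01/02/2020", "01/06/2020"],
)

# Convert both indexes to DatetimeIndex so equal dates compare equal.
vix.index = pd.to_datetime(vix.index, format="%m/%d/%Y")
df.index = pd.to_datetime(df.index)

# Assigning a Series aligns on the index; 2020-01-03 has no vix row, so NaN.
df["vix open"] = vix["open"]
print(df)
```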
A DataFrame has Date as its index. I need to add a column whose value is days_since_epoch. This value can be calculated with
(date_value - datetime.datetime(1970,1,1)).days
How can this value be calculated for all rows in the dataframe?
The following code demonstrates the operation with a sample DataFrame. Is there a better way of doing this?
import pandas as pd
date_range = pd.date_range(start='1/1/1970', end='12/31/2018', freq='D')
df = pd.DataFrame(date_range, columns=['date'])
df['days_since_epoch']=range(0,len(df))
df = df.set_index('date')
Note: this is an example; dates in the DataFrame need not start from 1st Jan 1970.
Subtract a scalar Timestamp from the DatetimeIndex, then use the days attribute of the resulting TimedeltaIndex:
df['days_since_epoch1']= (df.index - pd.Timestamp('1970-01-01')).days
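A quick check that the same expression works when the dates do not start at the epoch. The three dates and the "value" column are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose DatetimeIndex is irregular and not a daily range.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["1970-01-01", "1970-01-11", "2000-01-01"]),
)

# DatetimeIndex minus a Timestamp gives a TimedeltaIndex;
# its .days attribute holds the whole-day counts.
df["days_since_epoch"] = (df.index - pd.Timestamp("1970-01-01")).days
print(df)
```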
I have data in excel like this
I want to combine columns of Date and Time using the following code
import pandas as pd
df = pd.read_excel('selfmade.xlsx')
df['new'] = df['Date'].map(str) + df['Time'].map(str)
print(df)
but it prints the results like this.
I want the last column in a format like 2016-06-14 10:00:00.
What should I change in my code to get the desired result?
I think you need to_datetime and to_timedelta; it is also necessary to convert the Time column to strings with astype:
df['new'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'].astype(str))
If the dtype of the Date column is already datetime:
df['new'] = df['Date'] + pd.to_timedelta(df['Time'].astype(str))
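Here is a self-contained sketch of the second variant. The Excel file is not available, so the frame below is built by hand under the assumption that read_excel gave a datetime Date column and datetime.time objects in Time; the values are made up.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for the Excel data: Date is already datetime64,
# Time holds datetime.time objects.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-06-14", "2016-06-15"]),
    "Time": [dt.time(10, 0), dt.time(14, 30)],
})

# str() of a time object is 'HH:MM:SS', which to_timedelta can parse;
# adding the timedelta to the date yields a full timestamp.
df["new"] = df["Date"] + pd.to_timedelta(df["Time"].astype(str))
print(df)
```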