I have the data frame below, with dates ranging from 2016-01-01 to 2021-03-27:
timestamp close circulating_supply issuance_native
0 2016-01-01 0.944695 7.389026e+07 26070.31250
1 2016-01-02 0.931646 7.391764e+07 27383.90625
2 2016-01-03 0.962863 7.394532e+07 27675.78125
3 2016-01-04 0.944515 7.397274e+07 27420.62500
4 2016-01-05 0.950312 7.400058e+07 27839.21875
I'm looking to filter this dataframe by Month & Day to look at the circulating supply on December 31st for each year.
Here is an output of the datatypes of the data frame:
timestamp datetime64[ns]
close float64
circulating_supply float64
issuance_native float64
dtype: object
I'm able to pull single rows using this:
ts = pd.to_datetime('2016-12-31')
df.loc[df['timestamp'] == ts]
but I've had no luck passing a list of datetimes inside df.loc[].
The result should look like this, showing the rows for December 31st of each year:
timestamp close circulating_supply issuance_native
0 2016-12-31 0.944695 7.389026e+07 26070.31250
1 2017-12-31 0.931646 7.391764e+07 27383.90625
2 2018-12-31 0.962863 7.394532e+07 27675.78125
3 2019-12-31 0.944515 7.397274e+07 27420.62500
4 2020-12-31 0.950312 7.400058e+07 27839.21875
This is the closest I've gotten, but I get this warning:
#query dataframe for the circulating supply at the end of the year
circulating_supply = df.query("timestamp == '2016-12-31' or timestamp =='2017-12-31' or timestamp =='2018-12-31' or timestamp =='2019-12-31' or timestamp =='2020-12-31' or timestamp =='2021-03-01'")
circulating_supply.drop(columns=['close', 'issuance_native'], inplace=True)
circulating_supply.copy()
circulating_supply.head()
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py:4308: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
return super().drop(
Try something like this:
end_of_year = pd.to_datetime([
    "2016-12-31",
    "2017-12-31",
    "2018-12-31",
    "2019-12-31",
    "2020-12-31",
    "2021-03-01",
])
end_of_year_df = df.loc[df["timestamp"].isin(end_of_year), :]
circulating_supply = end_of_year_df.drop(columns=["close", "issuance_native"])
circulating_supply.head()
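Since the underlying goal is to filter by month and day rather than by an explicit list of dates, here is a sketch of that more general approach (assuming timestamp is datetime64, as the dtypes above show):
# Select December 31st of every year via the .dt accessor,
# so new years are picked up without editing a date list
mask = (df["timestamp"].dt.month == 12) & (df["timestamp"].dt.day == 31)
circulating_supply = df.loc[mask, ["timestamp", "circulating_supply"]]
circulating_supply.head()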
I was able to solve this by ignoring the warning raised when using the .drop() function on my df.query() result:
#query dataframe for the circulating supply at the end of the year
circulating_supply = df.query("timestamp == '2016-12-31' or timestamp =='2017-12-31' or timestamp =='2018-12-31' or timestamp =='2019-12-31' or timestamp =='2020-12-31' or timestamp =='2021-03-01'")
circulating_supply.drop(columns=['close', 'issuance_native'], inplace=True)
circulating_supply.copy() #not sure if this did anything
circulating_supply.head()
#add the column (yearly_issuance is built earlier, not shown)
yearly_issuance['EOY Supply'] = circulating_supply['circulating_supply'].values
yearly_issuance.head()
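For reference, the SettingWithCopyWarning appears because .query() returns a slice of df and .drop(..., inplace=True) then mutates that slice; the bare circulating_supply.copy() line does nothing, since its result is never assigned. A sketch of a version that avoids the warning (shown with one date for brevity):
# Copy the query result first, so the in-place drop operates on an
# independent DataFrame rather than a view of df
circulating_supply = df.query("timestamp == '2016-12-31'").copy()
circulating_supply.drop(columns=['close', 'issuance_native'], inplace=True)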
I am trying to filter a DataFrame to only show values 1-hour before and 1-hour after a specified time/date, but am having trouble finding the right function for this. I am working in Python with Pandas.
The posts I see regarding masking by date mostly cover the case of masking rows between a specified start and end date, but I am having trouble finding help on how to mask rows based around a single date.
I have time series data as a DataFrame that spans about a year, so thousands of rows. This data is at 1-minute intervals, and so each row corresponds to a row ID, a timestamp, and a value.
Example of DataFrame:
ID timestamp value
0 2011-01-15 03:25:00 34
1 2011-01-15 03:26:00 36
2 2011-01-15 03:27:00 37
3 2011-01-15 03:28:00 37
4 2011-01-15 03:29:00 39
5 2011-01-15 03:30:00 29
6 2011-01-15 03:31:00 28
...
I am trying to create a function that outputs a DataFrame containing only the rows of the initial DataFrame that fall within 1 hour before and 1 hour after a specified timestamp, i.e. only the rows inside that 2-hour window.
To be more clear:
I have a DataFrame that has 1-minute interval data throughout a year (as exemplified above).
I now identify a specific timestamp: 2011-07-14 06:15:00
I now want to output a DataFrame that is the initial input DataFrame, but now only contains rows that are within 1-hour before 2011-07-14 06:15:00, and 1-hour after 2011-07-14 06:15:00.
Do you know how I can do this? I understand that I could just create a filter that drops all rows before 2011-07-14 05:15:00 and after 2011-07-14 07:15:00, but my goal is to have the user simply enter a single date/time (e.g. 2011-07-14 06:15:00) to produce the output DataFrame.
This is what I have tried so far:
hour = pd.DateOffset(hours=1)
date = pd.Timestamp("2011-07-14 06:15:00")
df = df.set_index("timestamp")
df([date - hour: date + hour])
which returns:
File "<ipython-input-49-d42254baba8f>", line 4
df([date - hour: date + hour])
^
SyntaxError: invalid syntax
I am not sure if this is really only a syntax error, or something deeper and more complex. How can I fix this?
Thanks!
You can do it with a boolean mask:
import pandas as pd
import datetime as dt
data = {"date": ["2011-01-15 03:10:00","2011-01-15 03:40:00","2011-01-15 04:10:00","2011-01-15 04:40:00","2011-01-15 05:10:00","2011-01-15 07:10:00"],
"value":[1,2,3,4,5,6]}
df=pd.DataFrame(data)
df['date']=pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S', errors='ignore')
date_search= dt.datetime.strptime("2011-01-15 05:20:00",'%Y-%m-%d %H:%M:%S')
mask = (df['date'] > date_search-dt.timedelta(hours = 1)) & (df['date'] <= date_search+dt.timedelta(hours = 1))
print(df.loc[mask])
result:
date value
3 2011-01-15 04:40:00 4
4 2011-01-15 05:10:00 5
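The original attempt was actually close: once timestamp is the index, label-based slicing works, but it must go through .loc[...] with square brackets (df(...) with parentheses is what raises the SyntaxError). A sketch that also wraps this into the single-input function the question asks for (the function name is illustrative):
import pandas as pd

def window_around(df, when, hours=1):
    # Assumes df is indexed by a sorted DatetimeIndex
    when = pd.Timestamp(when)
    offset = pd.DateOffset(hours=hours)
    return df.loc[when - offset : when + offset]

windowed = window_around(df.set_index("timestamp").sort_index(),
                         "2011-07-14 06:15:00")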
I am running into an issue with a DataFrame grouped by date:
byDate = df.groupby('Date').count()
Date Value
2019-08-15 2
2019-08-19 1
2019-08-23 7
2019-08-28 4
2019-09-04 7
2019-09-09 2
I know that type(df["Date"].iloc[0])
returns datetime.date
I want to plot the data in such a way that days for which no value is available are shown as 0.
I have played around with
ax = sns.lineplot(x=byDate.index.fillna(0), y="Value", data=byDate)
However, I am only able to get output where the line is not drawn down to 0 on days for which no value is available.
Have you ever tried creating a new DataFrame object indexed by all the dates ranging from startDate to endDate and then filling in the missing values with 0.0?
The output would look something like:
import pandas as pd
import seaborn as sns

dates = pd.to_datetime(['2019-08-15', '2019-08-19', '2019-08-23',
                        '2019-08-28', '2019-09-04', '2019-09-09']).date
byDate = pd.DataFrame({'Value': [2, 1, 7, 4, 7, 2]}, index=dates)
startDate = byDate.index.min()
endDate = byDate.index.max()
# One entry per calendar day between the first and last observation
newDates = pd.date_range(startDate, endDate, freq='D').date.tolist()
newDatesDf = pd.DataFrame(index=newDates)
# Align on the daily index; days with no observation become 0
newByDate = pd.concat([newDatesDf, byDate], axis=1).fillna(0)
sns.lineplot(x=newByDate.index, y="Value", data=newByDate)
Output: a line plot where days with no value are now drawn at 0.
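A more compact alternative (a sketch of the same idea) lets reindex insert the missing days and fill them in one step:
# reindex aligns byDate onto a full daily index; days with no
# observation get fill_value instead of NaN
full_index = pd.date_range(byDate.index.min(), byDate.index.max(), freq='D').date
newByDate = byDate.reindex(full_index, fill_value=0)
sns.lineplot(x=newByDate.index, y="Value", data=newByDate)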
I am trying to add a set of common date-related columns to my data frame, and my approach is to build these columns off the .date_range() pandas method that holds the date range for my dataframe.
While I can use attributes like .index.day or .index.weekday_name for general date columns, I would like to set a business-day column based on the date_range I constructed, but I am not sure whether I can use the freq nickname 'B' or whether I need to create a new date range.
Further, I am hoping not to count as business days any dates that appear in a list of holidays that I have.
Here is my setup:
Holiday table
holiday_table = holiday_table.set_index('date')
holiday_table_dates = holiday_table.index.to_list() # ['2019-12-31', etc..]
Base Date Table
data_date_range = pd.date_range(start=date_range_start, end=date_range_end)
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.weekday_name
# Business day
df['business_day'] = data_date_range.freq("B")
Error at df['business_day'] = data_date_range.freq("B"):
---> 13 df['business_day'] = data_date_range.freq("B")
ApplyTypeError: Unhandled type: str
OK, I think I understand your question now. You are looking to create a new column of working business days, excluding your custom holidays. In my example I just used the regular US holidays from pandas, but you already have your holidays as a list in holiday_table_dates, so you should still be able to follow the general layout of my example for your specific use. I also assumed that you are OK with boolean values for your business_day column:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as h_cal
# sample data
data_date_range = pd.date_range(start='1/1/2019', end='12/31/2019')
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.day_name()  # day_name() replaces the removed weekday_name attribute
# this is just a sample using US holidays
hday = h_cal().holidays(df.index.min(), df.index.max())
# b is the same date range as above, just with freq set to business days
b = pd.date_range(start='1/1/2019', end='12/31/2019', freq='B')
# keep only the business days that are not holidays
bday = b[~b.isin(hday)]
# boolean column: True where the date index falls on one of those business days
df['bday'] = df.index.isin(bday)
day_index weekday_name bday
date
2019-01-01 1 Tuesday False
2019-01-02 2 Wednesday True
2019-01-03 3 Thursday True
2019-01-04 4 Friday True
2019-01-05 5 Saturday False
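For the custom holiday list from the question, numpy's busday helpers give the same boolean column directly. A sketch, assuming holiday_table_dates contains parseable date strings as shown above:
import numpy as np
import pandas as pd

# np.is_busday counts Mon-Fri as business days and excludes the
# supplied holidays; the result aligns with df's daily index
holidays = pd.to_datetime(holiday_table_dates).date
df['business_day'] = np.is_busday(df.index.date, holidays=holidays)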
This is related to a previous question which I asked here (pandas average by timestamp and day of the week).
Here, I perform a groupby operation as follows:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(2838),
                  index=pd.date_range('2019-09-13 12:40:00', periods=2838, freq='5T'))
# Reset the index so the timestamps become a regular column named 'index'
df.reset_index(inplace=True)
# Group on the day-name/time strings and average only the numeric column
df = df.groupby(df['index'].dt.strftime('%A %H:%M'))[[0]].mean()
df.reset_index(inplace=True)
Now if I check the data types of the column, we have:
index object
0 float64
The column does not retain its datetime data type. How can I still preserve the column data type?
I wouldn't do the grouping like that; instead, I would group on two levels:
# This assumes df still has its original DatetimeIndex
# (i.e. skip the reset_index calls above)
days = df.index.day_name()
times = df.index.time
df.groupby([days, times]).mean()
which gives (head):
0
Friday 00:00:00 0.524322
00:05:00 0.857684
00:10:00 0.593461
00:15:00 0.755158
00:20:00 0.049511
where the first-level index holds the (string) day names, and the second-level index holds datetime.time objects.
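A usage note on the resulting MultiIndex (a small sketch): whole days or single slots can be pulled directly with .loc:
import datetime

result = df.groupby([days, times]).mean()
result.loc['Friday']                         # every time slot for Fridays
result.loc[('Friday', datetime.time(0, 5))]  # one specific 5-minute slot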
I have the following column in a dataframe. I would like to add a column to the end of this dataframe holding, for each date, the number of business days between that date and today (6/24).
The BDay() offset does not seem to have this capability.
Date
2019-6-21
2019-6-20
2019-6-14
I am looking for a result that looks like following:
Date Business days
2019-6-21 1
2019-6-20 2
2019-6-14 6
Is there an easy way to do this, other than doing individual manipulations or using the datetime library?
Use np.busday_count:
import numpy as np

# df['Date'] = pd.to_datetime(df['Date'])  # if needed
np.busday_count(df['Date'].dt.date, np.datetime64('today'))
# array([1, 2, 6])
df['bdays'] = np.busday_count(df['Date'].dt.date, np.datetime64('today'))
df
Date bdays
0 2019-06-21 1
1 2019-06-20 2
2 2019-06-14 6
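If holidays matter here too, np.busday_count accepts a holidays list (and a weekmask), so non-working dates can be excluded from the count; the holiday date below is illustrative:
# Business days between each date and today, skipping listed holidays
np.busday_count(df['Date'].dt.date, np.datetime64('today'),
                holidays=['2019-06-19'])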