I have a data frame like this (it's just the head):
Timestamp Function_code Node_id Delta
0 2000-01-01 10:39:51.790683 Tx_PDO_2 54 551.0
1 2000-01-01 10:39:51.791650 Tx_PDO_2 54 601.0
2 2000-01-01 10:39:51.792564 Tx_PDO_3 54 545.0
3 2000-01-01 10:39:51.793511 Tx_PDO_3 54 564.0
There are only two types of Function_code: Tx_PDO_2 and Tx_PDO_3.
I plot, in two separate windows, a graph with Timestamp on the x-axis and Delta on the y-axis: one for Tx_PDO_2 and the other for Tx_PDO_3:
delta_rx_tx_df.groupby("Function_code").plot(x="Timestamp", y="Delta")
Now, I want to know which window corresponds to which Function_code.
I tried title=delta_rx_tx_df.groupby("Function_code").groups, but it did not work.
There may be a better way, but for starters, you can assign the titles to the plots after they are created:
plots = delta_rx_tx_df.groupby("Function_code").plot(x="Timestamp", y="Delta")
plots.reset_index() \
     .apply(lambda row: row[0].set_title(row["Function_code"]), axis=1)
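Alternatively, a minimal sketch (assuming just the two groups above): iterating over the groupby yields each group's key, which you can pass as the title when the plot is created. Each .plot() call opens its own figure, matching the two-window behavior:

import matplotlib.pyplot as plt

# one window per group, titled with its Function_code at creation time
for code, group in delta_rx_tx_df.groupby("Function_code"):
    group.plot(x="Timestamp", y="Delta", title=code)
plt.show()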
I have a dataframe like this:
DATE        VALUE  TYPE
2021-01-11  57     A
2021-02-11  34     B
2021-03-11  43     A
2021-04-11  15     B
...
My question is how I can plot a bar graph of the monthly mean, ordered by date and grouped by 'TYPE'.
I'm using Pandas with this snippet of code:
df = df.set_index('DATE')
df.index = pd.to_datetime(df.index)
df = df.resample('M').mean()
df.plot(kind='bar',stacked=True)
I want to draw a stacked bar plot but I don't know how...
Not sure if I understand correctly, but if you want to stack values by type with the date on the x-axis, I would use pivot (do not set the index first):
df = df.pivot(index='DATE', columns='TYPE', values='VALUE')
df.plot(kind='bar', stacked=True, rot=0)
plt.show()
With the slightly edited table to show the stacking:
DATE        VALUE  TYPE
2021-01-11  57     A
2021-02-11  34     A
2021-02-11  12     B
2021-03-11  43     A
2021-04-11  15     B
You get one bar per date, with the values for types A and B stacked on top of each other.
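To get the monthly means the question actually asks about, here is a hedged sketch (going back to the original long-format df with the DATE/VALUE/TYPE columns above): resample per TYPE, then unstack before plotting:

import pandas as pd
import matplotlib.pyplot as plt

df['DATE'] = pd.to_datetime(df['DATE'])
monthly = (df.set_index('DATE')
             .groupby('TYPE')['VALUE']
             .resample('M').mean()   # monthly mean per TYPE
             .unstack('TYPE'))       # one column per TYPE, month-end dates as index
monthly.plot(kind='bar', stacked=True, rot=0)
plt.show()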
I am trying to filter a DataFrame to only show values 1-hour before and 1-hour after a specified time/date, but am having trouble finding the right function for this. I am working in Python with Pandas.
The posts I see regarding masking by date mostly cover the case of masking rows between a specified start and end date, but I am having trouble finding help on how to mask rows based around a single date.
I have time series data as a DataFrame that spans about a year, so thousands of rows. This data is at 1-minute intervals, and so each row corresponds to a row ID, a timestamp, and a value.
Example of DataFrame:
ID timestamp value
0 2011-01-15 03:25:00 34
1 2011-01-15 03:26:00 36
2 2011-01-15 03:27:00 37
3 2011-01-15 03:28:00 37
4 2011-01-15 03:29:00 39
5 2011-01-15 03:30:00 29
6 2011-01-15 03:31:00 28
...
I am trying to create a function that outputs a DataFrame that is the initial DataFrame, but contains only the rows from 1 hour before to 1 hour after a specified timestamp, i.e. only the rows within this 2-hour window.
To be more clear:
I have a DataFrame that has 1-minute interval data throughout a year (as exemplified above).
I now identify a specific timestamp: 2011-07-14 06:15:00
I now want to output a DataFrame that is the initial input DataFrame, but now only contains rows that are within 1-hour before 2011-07-14 06:15:00, and 1-hour after 2011-07-14 06:15:00.
Do you know how I can do this? I understand that I could just create a filter that removes all values before 2011-07-14 05:15:00 and after 2011-07-14 07:15:00, but my goal is to have the user simply enter a single date/time (e.g. 2011-07-14 06:15:00) to produce the output DataFrame.
This is what I have tried so far:
hour = pd.DateOffset(hours=1)
date = pd.Timestamp("2011-07-14 06:15:00")
df = df.set_index("timestamp")
df([date - hour: date + hour])
which returns:
File "<ipython-input-49-d42254baba8f>", line 4
df([date - hour: date + hour])
^
SyntaxError: invalid syntax
I am not sure if this is really only a syntax error, or something deeper and more complex. How can I fix this?
Thanks!
You can do it with:
import pandas as pd
import datetime as dt
data = {"date": ["2011-01-15 03:10:00", "2011-01-15 03:40:00", "2011-01-15 04:10:00",
                 "2011-01-15 04:40:00", "2011-01-15 05:10:00", "2011-01-15 07:10:00"],
        "value": [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S')

date_search = dt.datetime.strptime("2011-01-15 05:20:00", '%Y-%m-%d %H:%M:%S')
# keep rows within one hour on either side of the search date
mask = (df['date'] > date_search - dt.timedelta(hours=1)) & \
       (df['date'] <= date_search + dt.timedelta(hours=1))
print(df.loc[mask])
result:
date value
3 2011-01-15 04:40:00 4
4 2011-01-15 05:10:00 5
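For what it's worth, the original attempt was close. With the timestamp column as a sorted DatetimeIndex, label-based slicing via .loc does exactly this; a minimal sketch, assuming the df and column names from the question:

import pandas as pd

hour = pd.Timedelta(hours=1)
date = pd.Timestamp("2011-07-14 06:15:00")

# .loc slicing on a sorted DatetimeIndex is inclusive on both ends
window = df.set_index("timestamp").sort_index().loc[date - hour : date + hour]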
I have the following dataset:
dataset.head(7)
Transaction_date  Product  Product Code  Description
2019-01-01        A        123           A123
2019-01-02        B        267           B267
2019-01-09        B        267           B267
2019-02-11        C        139           C139
2019-02-11        A        125           C125
2019-02-12        C        139           C139
2019-02-12        A        123           A123
The dataset stores transaction information, for which a transaction date is available. In other words, data is not available for every day.
Ultimately, I want to create a time series plot, showing me the number of transactions per day.
So far, I have done a simple countplot:
ax = sns.countplot(x="Transaction_date", data=dataset)
This plot shows me the dates where a transaction happened. But I would also like to see the dates where no transaction happened, preferably shown as 0.
I have tried the following:
groupbydate = dataset.groupby("Transaction_date")
ax = sns.tsplot(x="Transaction_date", y="Product", data=groupbydate.fillna(0))
But I get the error
cannot label index with a null key
Due to restrictions, I can only use seaborn 0.8.1
I believe reindex should work for you:
# First convert the index to datetime
dataset.index = pd.DatetimeIndex(dataset.index)
# Then reindex! You can also use the min and max of the index for the limits.
# Missing dates get NaN by default.
dataset = dataset.reindex(pd.date_range("2019-01-01", "2019-02-12"))
You can drop the rows containing NaN values using pandas.DataFrame.dropna, and then plot the chart. For example:
dataset.dropna(thresh=2)
keeps only the rows that have at least two non-NaN values and drops the rest.
You may also want to fill the NaN values using pandas.DataFrame.fillna
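Since the end goal is a count of transactions per day with zeros on empty days, a minimal sketch along those lines (assuming the dataset and column name from the question) might be:

import pandas as pd

# count transactions per day
counts = dataset.groupby("Transaction_date").size()
counts.index = pd.DatetimeIndex(counts.index)

# reindex over the full date range so days without transactions show as 0
full_range = pd.date_range(counts.index.min(), counts.index.max())
counts.reindex(full_range, fill_value=0).plot()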
I am struggling to aggregate a timedelta column, including plotting the result.
Essentially the data has Submit (datetime), Resolved (datetime), PauseTime (timedelta) and Resolved-Submit-Pausetime (which is the actual time to resolve):
test_df = pd.read_csv('test_df.csv')
# convert Submit and Resolved to datetime
test_df[['Submit', 'Resolved']] = test_df[['Submit', 'Resolved']].apply(pd.to_datetime)
# convert PauseTime and Resolved-Submit-Pausetime to timedelta
test_df['PauseTime'] = pd.to_timedelta(test_df['PauseTime'])
test_df['Resolved-Submit-Pausetime'] = pd.to_timedelta(test_df['Resolved-Submit-Pausetime'])
I am trying to aggregate the mean for each day of 'Resolved':
test_df.groupby([pd.Grouper(key='Resolved', freq='D')])['Resolved-Submit-Pausetime'].mean()
which gives me the error DataError: No numeric types to aggregate.
1) How can I aggregate the mean?
2) Also, some guidance for plotting the trend of the mean time to resolve (the x-axis will have all the dates, the y-axis the aggregated mean timedelta of 'Resolved-Submit-Pausetime').
Use this step to convert your timedelta column into (float) seconds:
test_df['Resolved-Submit-Pausetime'] = test_df['Resolved-Submit-Pausetime'].dt.total_seconds()
0 1234.0
1 27380.0
2 33017.0
3 5454.0
4 433.0
5 2302.0
6 21753.0
7 3405.0
8 4779.0
9 3974.0
10 3389.0
11 114.0
Name: Resolved-Submit-Pausetime, dtype: float64
Then run your groupby statement to compute the mean:
test_df.groupby([pd.Grouper(key='Resolved', freq='D')])['Resolved-Submit-Pausetime'].mean()
Resolved
2017-04-01 20543.666667
2017-04-02 7485.500000
2017-04-03 3132.200000
Name: Resolved-Submit-Pausetime, dtype: float64
You can use Pandas' built-in plotting tools to make a quick-and-dirty plot of the mean time with respect to the groupby day:
test_df.groupby([pd.Grouper(key='Resolved', freq='D')])['Resolved-Submit-Pausetime'].mean().plot()
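If you would rather read the daily means as timedeltas again, a small sketch is to wrap the result in pd.to_timedelta:

means = test_df.groupby(pd.Grouper(key='Resolved', freq='D'))['Resolved-Submit-Pausetime'].mean()
print(pd.to_timedelta(means, unit='s'))  # e.g. 20543.666667 -> 0 days 05:42:23.666667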
I have two data frames in pandas. One holds four time series with second-by-second data, like the following:
timestamp ID value1 value2 value3 value4
2016/01/01T01:01:01 1234 100 50 50 60
2016/01/01T01:01:02 1234 101 48 48 52
2016/01/01T01:01:02 1234 101 48 48 52
....
and the second holds averages over selected intervals:
ID start_time end_time avg_value1 avg_value2 avg_value3 avg_value4
1234 01:01:01 01:01:15 100.1 50.2 49 55
...
I would like to plot these two as timeseries superimposed over each other with the averages appearing as flat lines starting at start_time and ending at end_time. How would I go about doing this in the latest version of pandas?
The easiest way is to put all the data into a single DataFrame and use the built-in .plot() method.
Assuming your original DataFrame is called df, the code below should solve your issue (you might need to strip out the "ID" column):
means = df.groupby(pd.Grouper(freq='15s')).mean()
means.columns = ['avg_'+col for col in df.columns]
merged_df = pd.concat([df, means], axis=1).fillna(method='ffill')
merged_df.plot()
Using some intraday 1s candle stock data, you get something like this:
If you want to further customize your plots I am afraid you will have to spend a few hours/days studying the basics of matplotlib.
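That said, here is a hedged sketch of the flat average lines themselves (assuming a hypothetical avg_df laid out as in the question, with full timestamps in start_time/end_time, and plotting only value1):

import matplotlib.pyplot as plt

# plot the raw second-by-second series
ax = df.set_index('timestamp')['value1'].plot()

# draw each interval average as a flat dashed line from start_time to end_time
for _, row in avg_df.iterrows():
    ax.hlines(row['avg_value1'], row['start_time'], row['end_time'],
              colors='red', linestyles='dashed')
plt.show()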