My Code:
import matplotlib.pyplot as plt
plt.style.use('seaborn-ticks')
import pandas as pd
import numpy as np
path = 'C:\\File\\Data.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '08/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/1000) < 60, 0, (df.volume/1000))
df.plot('Time','Price')
dff = df[df.Volume > 60].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
plt.subplots_adjust(left=0.05, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=None)
plt.show()
My Plot Output is as below:
The output of the dff DataFrame is as below:
Date Time Price Volume
0 08/02/2019 13:39:43 685.35 97.0
1 08/02/2019 13:39:57 688.80 68.0
2 08/02/2019 13:43:50 683.00 68.0
3 08/02/2019 13:43:51 681.65 92.0
4 08/02/2019 13:49:42 689.95 70.0
5 08/02/2019 13:52:00 695.20 64.0
6 08/02/2019 14:56:42 686.25 68.0
7 08/02/2019 15:03:15 685.35 63.0
8 08/02/2019 15:03:31 683.15 69.0
9 08/02/2019 15:08:08 684.00 61.0
I want to plot the prices in this table as vertical lines, as in the image below. Any help is appreciated.
Based on your image, I think you mean horizontal lines. Either way it's pretty simple, Pyplot has hlines/vlines builtins. In your case, try something like
plt.hlines(dff['Price'], '08/02/2019', '09/02/2019')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
path = 'File.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '05/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/7500) < 39, 0, (df.volume/7500))
df["Time"] = pd.to_datetime(df['Time'])
df.plot(x="Time",y='Price', rot=0)
plt.title("Date: " + str(df['Date'].iloc[0]))
dff = df[df.Volume > 39].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
# Draw one horizontal line per high-volume price
for price in dff['Price']:
    plt.axhline(y=price, linewidth=2, color='blue')
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.95, top=0.96, wspace=None, hspace=None)
plt.show()
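The approach in the code above can be tried without the original data file. The following is a minimal sketch under stated assumptions: the rows are made up for illustration, and `plot_price_levels` and its `threshold` parameter are hypothetical names, not part of the original answer. It plots Price against Time and then calls `axhline` once per price whose Volume exceeds the threshold.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the tick data; the values are for illustration only.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2019-02-08 13:39:43", "2019-02-08 13:39:57", "2019-02-08 13:43:50",
        "2019-02-08 13:49:42", "2019-02-08 13:52:00",
    ]),
    "Price": [685.35, 688.80, 683.00, 689.95, 695.20],
    "Volume": [97.0, 20.0, 68.0, 10.0, 64.0],
})


def plot_price_levels(frame, threshold):
    """Plot Price against Time and mark high-volume prices with horizontal lines."""
    ax = frame.plot(x="Time", y="Price", rot=0)
    # Keep only the prices whose volume is above the threshold.
    levels = frame.loc[frame["Volume"] > threshold, "Price"]
    for price in levels:
        # axhline spans the full width of the axes at the given y value.
        ax.axhline(y=price, linewidth=2, color="blue")
    return ax, levels.tolist()


ax, levels = plot_price_levels(df, threshold=60)

if __name__ == "__main__":
    plt.show()
```

Iterating over the Price column directly avoids converting the frame to a dictionary first, and `axhline` needs no x limits because it always spans the whole axes.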
Related
I have a data frame which looks as given below. First, I want the count of each status on each date. For example, the number of 'COMPLETED' on 2017-11-02 is 2. I then want a stacked plot of these counts.
status start_time end_time \
0 COMPLETED 2017-11-01 19:58:54.726 2017-11-01 20:01:05.414
1 COMPLETED 2017-11-02 19:43:04.000 2017-11-02 19:47:54.877
2 ABANDONED_BY_USER 2017-11-03 23:36:19.059 2017-11-03 23:36:41.045
3 ABANDONED_BY_TIMEOUT 2017-10-31 17:02:38.689 2017-10-31 17:12:38.844
4 COMPLETED 2017-11-02 19:35:33.192 2017-11-02 19:42:51.074
Here is the csv for the dataframe:
status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
To achieve this:
df_['status'].astype('category')
df_ = df_.set_index('start_time')
grouped = df_.groupby('status')
color = {'COMPLETED':'green','ABANDONED_BY_TIMEOUT':'blue',"MISSED":'red',"ABANDONED_BY_USER":'yellow'}
for key_, group in grouped:
    print(key_)
    df_ = group.groupby(lambda x: x.date).count()
    print(df_)
    df_['status'].plot(label=key_, kind='bar', stacked=True,
                       color=color[key_], rot=90)
plt.show()
The output of the following is :
ABANDONED_BY_TIMEOUT
status end_time
2017-10-31 1 1
ABANDONED_BY_USER
status end_time
2017-11-03 1 1
COMPLETED
status end_time
2017-11-01 1 1
2017-11-02 2 2
The problem, as we can see, is that the plot takes into account only the last two dates, '2017-11-01' and '2017-11-02', instead of all the dates in all the categories.
How can I solve this problem? I am open to a whole new approach for the stacked plot. Thanks in advance.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
df_ = df_.set_index('start_time')
grouped = pd.DataFrame(df_.groupby(['date', 'status']).size().reset_index(name="count")).pivot(columns='status', index='date', values='count')
print(grouped)
sns.set()
grouped.plot(kind='bar', stacked=True)
# g = grouped.plot(x='date', kind='bar', stacked=True)
plt.show()
output:
Try restructuring df_ with pandas.crosstab instead:
color = ['blue', 'yellow', 'green', 'red']
df_xtab = pd.crosstab(df_.start_time.dt.date, df_.status)
This DataFrame will look like:
status ABANDONED_BY_TIMEOUT ABANDONED_BY_USER COMPLETED
start_time
2017-10-31 1 0 0
2017-11-01 0 0 1
2017-11-02 1 0 2
2017-11-03 0 1 0
and will be easier to plot.
df_xtab.plot(kind='bar',stacked=True, color=color, rot=90)
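For reference, here is a self-contained sketch of the crosstab approach using the CSV rows from the question. It assumes `start_time` is parsed as a datetime (otherwise `.dt.date` is not available), and it builds the color list from a dictionary keyed by status, which is an addition of this sketch so that the colors do not depend on column order.

```python
from io import StringIO

import pandas as pd

csv_text = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""

# Parse the timestamp columns so that the .dt accessor works.
df_ = pd.read_csv(StringIO(csv_text), parse_dates=["start_time", "end_time"])

# Rows are calendar dates, columns are statuses, cells are counts (0 when absent).
df_xtab = pd.crosstab(df_["start_time"].dt.date, df_["status"])
print(df_xtab)

# Look colors up by status name instead of relying on column order.
color_map = {
    "COMPLETED": "green",
    "ABANDONED_BY_TIMEOUT": "blue",
    "MISSED": "red",
    "ABANDONED_BY_USER": "yellow",
}
colors = [color_map[status] for status in df_xtab.columns]

ax = df_xtab.plot(kind="bar", stacked=True, color=colors, rot=90)
```

Call `plt.show()` from `matplotlib.pyplot` afterwards to display the chart in a script.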
Use seaborn's barplot with its hue parameter.
code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
df_ = df_.set_index('start_time')
print(df_)
grouped = pd.DataFrame(df_.groupby(['date', 'status']).size().reset_index(name="count"))
print(grouped)
g = sns.barplot(x='date', y='count', hue='status', data=grouped)
plt.show()
output:
data:
status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
My goal is simply to plot this data as a graph, with dates on the x-axis and price on the y-axis. As I understand it, the dtype of the NumPy record array for the date field is datetime64[D], meaning a 64-bit np.datetime64 in 'day' units. While this format is more portable, Matplotlib cannot plot it natively yet. The data can be plotted by converting the dates to datetime.date instances instead, which is done by converting to an object array; I did that below via astype('O'). But I am still getting
this error:
view limit minimum -36838.00750000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r'avocado.csv')
df2 = df[['Date','AveragePrice','region']]
df2 = (df2.loc[df2['region'] == 'Albany'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2['Date'] = df2.Date.astype('O')
plt.style.use('ggplot')
ax = df2[['Date','AveragePrice']].plot(kind='line', title ="Price Change",figsize=(15,10),legend=True, fontsize=12)
ax.set_xlabel("Period",fontsize=12)
ax.set_ylabel("Price",fontsize=12)
plt.show()
df.head(3)
Unnamed: 0 Date AveragePrice Total Volume 4046 4225 4770 Total Bags Small Bags Large Bags XLarge Bags type year region
0 0 2015-12-27 1.33 64236.62 1036.74 54454.85 48.16 8696.87 8603.62 93.25 0.0 conventional 2015 Albany
1 1 2015-12-20 1.35 54876.98 674.28 44638.81 58.33 9505.56 9408.07 97.49 0.0 conventional 2015 Albany
2 2 2015-12-13 0.93 118220.22 794.70 109149.67 130.50 8145.35 8042.21 103.14 0.0 conventional 2015 Albany
df2 = df[['Date', 'AveragePrice', 'region']]
df2 = (df2.loc[df2['region'] == 'Albany'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2 = df2[['Date', 'AveragePrice']]
df2 = df2.sort_values(['Date'])
df2 = df2.set_index('Date')
print(df2)
ax = df2.plot(kind='line', title="Price Change")
ax.set_xlabel("Period", fontsize=12)
ax.set_ylabel("Price", fontsize=12)
plt.show()
output:
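The same steps can be checked without avocado.csv. The sketch below uses a few made-up rows shaped like the question's data (the Boston row and its price are invented). The likely cause of the original error is that Date was left as an ordinary column and plotted as a series of its own, so the date units clashed with the small price values; making Date the index lets pandas use it for the x-axis. The `.copy()` call is an addition here to avoid assigning into a slice of `df`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows shaped like the avocado data; not the real file.
df = pd.DataFrame({
    "Date": ["2015-12-27", "2015-12-20", "2015-12-13", "2015-12-27"],
    "AveragePrice": [1.33, 1.35, 0.93, 1.10],
    "region": ["Albany", "Albany", "Albany", "Boston"],
})

# Filter one region and keep an independent copy before modifying it.
albany = df.loc[df["region"] == "Albany", ["Date", "AveragePrice"]].copy()
albany["Date"] = pd.to_datetime(albany["Date"])

# Sort chronologically and use the dates as the index, so they become the x-axis.
albany = albany.sort_values("Date").set_index("Date")

ax = albany.plot(kind="line", title="Price Change")
ax.set_xlabel("Period", fontsize=12)
ax.set_ylabel("Price", fontsize=12)

if __name__ == "__main__":
    plt.show()
```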
I have a dataframe which looks like this (left column is the index):
YYYY-MO-DD HH-MI-SS_SSS ATMOSPHERIC PRESSURE (hPa) mean
2016-11-07 14:00:00 1014.028782
2016-11-07 15:00:00 1014.034111
.... ....
2016-11-30 09:00:00 1006.516436
2016-11-30 10:00:00 1006.216156
Now I want to plot a colormap with this data - so I want to create an X (horizontal axis) to be just the dates:
2016-11-07, 2016-11-08,...,2016-11-30
and the Y (Vertical axis) to be the time:
00:00:00, 01:00:00, 02:00:00, ..., 23:00:00
And finally the Z (color map) to be the pressure data for each date and time [f(x,y)].
How can I arrange the data for this kind of plotting?
Thank you!
With test data prepared like so:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
samples = 24 * 365
index = pd.date_range('2017-01-01', freq='1H', periods=samples)
data = pd.DataFrame(np.random.rand(samples), index=index, columns=['data'])
I would do something like this:
data = data.reset_index()
data['date'] = data['index'].apply(lambda x: x.date())
data['time'] = data['index'].apply(lambda x: x.time())
pivoted = data.pivot(index='time', columns='date', values='data')
fig, ax = plt.subplots(1, 1)
ax.imshow(pivoted, origin='lower', cmap='viridis')
plt.show()
Which produces:
To improve the axis labeling, this is a start:
ax.set_yticklabels(['{:%H:%M:%S}'.format(t) for t in data['time'].unique()])
ax.set_xticklabels(['{:%Y-%m-%d}'.format(t) for t in data['date'].unique()])
but you'll need to choose how often a label appears with set_xticks() and set_yticks(), so that each label lines up with the row or column it describes.
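One possible way to handle the tick placement is sketched below. It assumes 30 days of random hourly data, and the spacing of one label every 7 days and every 6 hours is an arbitrary choice for illustration. The tick positions are the pixel indices of the image, and the labels are read from the pivoted frame at those same positions so the two stay aligned.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random hourly test data for 30 days (an assumption for this sketch).
samples = 24 * 30
index = pd.date_range("2017-01-01", periods=samples, freq=pd.Timedelta(hours=1))
data = pd.DataFrame({"data": np.random.default_rng(0).random(samples)}, index=index)

# Split the timestamp into a date part and a time part, then pivot.
data = data.reset_index()
data["date"] = data["index"].dt.date
data["time"] = data["index"].dt.time
pivoted = data.pivot(index="time", columns="date", values="data")

fig, ax = plt.subplots()
ax.imshow(pivoted.to_numpy(), origin="lower", cmap="viridis", aspect="auto")

# Tick positions are column/row indices of the image.
x_pos = np.arange(0, pivoted.shape[1], 7)  # every 7th day
y_pos = np.arange(0, pivoted.shape[0], 6)  # every 6th hour
ax.set_xticks(x_pos)
ax.set_xticklabels([f"{pivoted.columns[i]:%Y-%m-%d}" for i in x_pos], rotation=90)
ax.set_yticks(y_pos)
ax.set_yticklabels([f"{pivoted.index[i]:%H:%M}" for i in y_pos])
fig.tight_layout()

if __name__ == "__main__":
    plt.show()
```

Setting the tick positions before the labels matters: labels are assigned to ticks in order, so without fixed positions they would attach to whatever ticks Matplotlib picked automatically.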
I have a DataFrame that looks like this when unstacked.
Start Date 2016-07-11 2016-07-12 2016-07-13
Period
0 1.000000 1.000000 1.0
1 0.684211 0.738095 NaN
2 0.592105 NaN NaN
I'm trying to plot it in Seaborn as a heatmap but it's giving me unintended results.
Here's my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.array(data), columns=['Start Date', 'Period', 'Users'])
df = df.fillna(0)
df = df.set_index(['Start Date', 'Period'])
sizes = df['Users'].groupby(level=0).first()
df = df['Users'].unstack(0).divide(sizes, axis=1)
plt.title("Test")
sns.heatmap(df.T, mask=df.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.savefig(table._v_name + "fig.png")
I want it so that text doesn't overlap and there aren't 6 heat legends on the side. Also if possible, how do I fix the date so that it only displays %Y-%m-%d?
Exact reproducible data was not posted, so the example below uses the data from the posted snippet. It runs pivot_table() to build the structure shown in the question, with StartDate values across the columns. The multiple color bars and overlapping figures in your heatmap possibly come from the unstack() processing, where you seem to be dividing by users (look into seaborn.FacetGrid if you want to split the plot). The code below therefore passes the pivoted frame to heatmap as is. An apply() call also re-formats the datetime values to the requested %Y-%m-%d form.
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
data = '''Period,StartDate,Value
0,2016-07-11,1.000000
0,2016-07-12,1.000000
0,2016-07-13,1.0
1,2016-07-11,0.684211
1,2016-07-12,0.738095
1,2016-07-13
2,2016-07-11,0.592105
2,2016-07-12
2,2016-07-13'''
df = pd.read_csv(StringIO(data))
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['StartDate'] = df['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
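# min_count=1 keeps all-NaN cells as NaN (a plain sum can turn them into 0)
pvtdf = df.pivot_table(values='Value', index=['Period'], columns='StartDate',
                       aggfunc=lambda s: s.sum(min_count=1))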
print(pvtdf)
# StartDate 2016-07-11 2016-07-12 2016-07-13
# Period
# 0 1.000000 1.000000 1.0
# 1 0.684211 0.738095 NaN
# 2 0.592105 NaN NaN
sns.set()
plt.title("Test")
ax = sns.heatmap(pvtdf.T, mask=pvtdf.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.show()
I am plotting a simple chart and adding a number taken from a DataFrame via plot.text(). The number is plotted as intended, but details of its properties are also displayed. I would like to suppress the display of properties and plot just the number.
The following code reproduces the issue.
import numpy as np
import pandas as pd
from pandas import *
import matplotlib.pyplot as plot
%matplotlib inline
rand = np.random.RandomState(1)
index = np.arange(8)
df = DataFrame(rand.randn(8, 1), index=index, columns=list('A'))
df['date'] = date_range('1/1/2014', periods=8)
print(df)
A date
0 1.624345 2014-01-01
1 -0.611756 2014-01-02
2 -0.528172 2014-01-03
3 -1.072969 2014-01-04
4 0.865408 2014-01-05
5 -2.301539 2014-01-06
6 1.744812 2014-01-07
7 -0.761207 2014-01-08
df2 = pd.DataFrame(index = ['1'], columns=['example'])
df2['example'] = 1.436792
print(df2)
example
1 1.436792
fig, ax = plot.subplots(figsize=(15,10))
df.plot(x='date', y='A', ax=ax)
plot.text(0.05, 0.95, df2['example'],
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes)
The plot is showing index, name and dtype data along with the example number. Can anybody show how to suppress this detail and just plot the number? Any help much appreciated.
Just plot with the DataFrame values:
plot.text(0.05, 0.95, df2['example'].values,
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes)
Or set visible=False to hide the text entirely:
plot.text(0.05, 0.95, df2['example'],
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes, visible=False)
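Another option, sketched here as an assumption rather than as part of the answers above, is to pull the single scalar out of the column and format it as a string before handing it to text(). Passing a whole Series makes Matplotlib draw its printed representation, which is where the index, name and dtype lines come from. The six-decimal format below is an arbitrary choice for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

df2 = pd.DataFrame({"example": [1.436792]}, index=["1"])

fig, ax = plt.subplots(figsize=(15, 10))

# Take the first (and only) element as a plain scalar, then format it.
value = df2["example"].iloc[0]
label = f"{value:.6f}"

text = ax.text(0.05, 0.95, label,
               horizontalalignment="left",
               verticalalignment="center",
               transform=ax.transAxes)

if __name__ == "__main__":
    plt.show()
```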