I am trying to plot the graph below using Python, but I am getting an error.
The Python commands I am using are:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data/filtro_bovespa_final.csv')
data.loc[(data['codigo'] == 'BBAS3') & (data['codigo'] == 'BBDC4')]
data.date = pd.to_datetime(data['date'],format='%Y%m%d')
data.set_index(['date','codigo'])
plt.plot(data.date,data.preco)
plt.show()
The error I am getting is:
I got this graph, but it is not what I need:
The csv file I am using: Bovespa
I need a graph that lets me compare the prices for both codes (BBAS3 and BBDC4), like the first graph I showed.
What else should I do to get the graph I need?
To draw one line per code, pivot the data frame so that each code becomes its own column. I've also changed the filter condition from AND to OR, because a single row can never hold both codes at once.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('./Data/filtro_bovespa_final.csv')
data.date = pd.to_datetime(data['date'],format='%Y%m%d')
data = data.loc[(data['codigo'] == 'BBAS3') | (data['codigo'] == 'BBDC4')]
data.set_index('date', inplace=True)
data = data.pivot(columns='codigo')
data.columns = ['BBAS3','BBDC4']
data.plot()
plt.show()
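To make the reshaping step concrete, here is a minimal sketch on a small made-up table. The column names (date, codigo, preco) follow the question, but the prices and the extra PETR4 row are assumptions for illustration only, and isin is used as an equivalent of the OR condition.

```python
import pandas as pd

# Hypothetical sample in long format: one row per (date, codigo) pair.
# Column names follow the question; the prices are made up.
data = pd.DataFrame({
    "date": ["20200102", "20200102", "20200103", "20200103", "20200103"],
    "codigo": ["BBAS3", "BBDC4", "BBAS3", "BBDC4", "PETR4"],
    "preco": [53.1, 35.9, 52.8, 36.2, 30.7],
})
data["date"] = pd.to_datetime(data["date"], format="%Y%m%d")

# AND can never match: a single row holds only one code.
both = data[(data["codigo"] == "BBAS3") & (data["codigo"] == "BBDC4")]

# isin is equivalent to the OR condition and keeps rows for either code.
selected = data[data["codigo"].isin(["BBAS3", "BBDC4"])]

# Reshape long -> wide: dates as the index, one price column per code.
wide = selected.pivot(index="date", columns="codigo", values="preco")
print(wide)
```

Calling wide.plot() on the result draws one line per column, so both codes share the same axes.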
I have output nested dictionary variable called all_count_details_dictionary. Using that variable I saved data to the CSV file using the following command.
import pandas as pd
csv_path = '../results_v6/output_01.csv'
# creating pandas dataframe using concat method to extract data from dictionary
df = pd.concat([pd.DataFrame(l) for l in all_count_details_dictionary],axis=1).T
# saving the dataframe to the csv file
df.to_csv(csv_path, index=True)
The output CSV file looks like this:
The CSV file can be downloaded using this link
So I used the following code to plot a graph
import matplotlib.pyplot as plt
import numpy as np
def extract_csv_gen_plot(csv_path):
    length = 1503 #len(dataframe_colums_list)
    data = np.genfromtxt(csv_path, delimiter=",", skip_header=True, usecols=range(3, (length+1)))
    print(data)
    # renaming data axes
    #fig, ax = plt.subplots()
    #fig.canvas.draw()
    #labels =[item.get_text() for item in ax.get_xticklabels()]
    #labels[1] = 'testing'
    #ax.set_xticklabels(labels)
    #ax.set_xticklabels(list)
    #ax.set_yticklabels(list)
    #plt.setp(ax.get_xticklabels(), rotation = 90)
    plt.imshow(data, cmap='hot',interpolation='nearest')
    plt.show()
I tried to put the column labels and the case-detail labels on the graph axes, but it didn't work. Can anyone tell me whether there is a better way to plot this table as a heat map?
Thank you!
I would suggest using pandas; the labels are picked up automatically:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
def extract_csv_gen_plot(csv_path):
    data = pd.read_csv(csv_path, index_col=1)
    data = data.drop(data.columns[[0, 1]], axis=1)
    data.index.names = ['Name']
    g = sns.heatmap(data)
    g.set_yticklabels(g.get_yticklabels(), rotation=0)
    g.set_title('Heatmap')
    plt.tight_layout()
    plt.show()
extract_csv_gen_plot("output_01.csv")
I recommend using Seaborn. It has a heatmap plotting function that works very well with pandas DataFrames:
import seaborn as sns
sns.heatmap(data)
https://seaborn.pydata.org/generated/seaborn.heatmap.html
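If you would rather stay with plain matplotlib, as in the question, the labels can also be set by hand. This is only a sketch: the small table and its label names are made up, and the Agg backend line is there just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical table: row labels in the index, column labels in the header.
data = pd.DataFrame(
    [[1, 5, 2], [4, 0, 3]],
    index=["case_a", "case_b"],
    columns=["col_1", "col_2", "col_3"],
)

fig, ax = plt.subplots()
image = ax.imshow(data.to_numpy(), cmap="hot", interpolation="nearest")

# Put one tick at the centre of every cell, then label it from the DataFrame.
ax.set_xticks(np.arange(data.shape[1]))
ax.set_xticklabels(data.columns, rotation=90)
ax.set_yticks(np.arange(data.shape[0]))
ax.set_yticklabels(data.index)

fig.colorbar(image, ax=ax)
fig.tight_layout()
```

With roughly 1500 columns, as in the question, labelling every cell will be unreadable, so you would probably label only every n-th tick.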
I have recently started a project where I need to evaluate and plot data using Python. The CSV files that I have to plot are structured like this:
date,ch1,ch2,ch3,date2
11:56:20.149766,0.909257531,0.909420371,1.140183687, 13:56:20.149980
11:56:20.154008,0.895447016,0.895601869,1.122751355, 13:56:20.154197
11:56:20.157245,0.881764293,0.881911397,1.105638862, 13:56:20.157404
11:56:20.160590,-0.009178977,-0.000108901,-1.486875653, 13:56:20.160750
11:56:20.190473,-1.473576546,-1.477073431,-1.846657276, 13:56:20.190605
11:56:20.193810,-1.460405469,-1.463766813,-1.8300246, 13:56:20.193933
11:56:20.197139,-1.447362065,-1.450844049,-1.813711882, 13:56:20.197262
11:56:20.200480,-1.434574604,-1.437921286,-1.797878742, 13:56:20.200604
11:56:20.203803,-1.422042727,-1.425382376,-1.782045603, 13:56:20.203926
11:56:20.207136,-1.40951097,-1.412971258,-1.7663728, 13:56:20.207258
11:56:20.210472,-0.436505407,-0.438260257,-0.54675138, 13:56:20.210595
11:56:20.213804,0.953246772,0.953690529,1.19551909, 13:56:20.213921
11:56:20.217136,0.93815738,0.938464701,1.176487565, 13:56:20.217252
11:56:20.220472,0.923707485,0.924006522,1.158255577, 13:56:20.220590
11:56:20.223807,0.909385324,0.909676254,1.140343547, 13:56:20.223922
11:56:20.227132,0.895447016,0.895729899,1.122911215, 13:56:20.227248
11:56:20.230466,0.881892085,0.882039428,1.105798721, 13:56:20.230582
I can already read the file and print it using pandas:
df = pd.read_csv (r'F:\Schule\HTL\Diplomarbeit\aw_python\datei_meas.csv')
print (df)
But now I want to plot the file using matplotlib. The first column, date, should be on the x-axis, and columns 2, 3 and 4 should be the y-values of separate graphs.
I hope that anyone can help me with my problem.
Kind regards
Matthias
Edit:
This is what I have tried in order to convert the date column into a usable format:
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
import pandas as pd
import matplotlib.dates as mdates
df = pd.read_csv (r'F:\Schule\HTL\Diplomarbeit\aw_python\datei_meas.csv')
print (df)
x_list = df.date
y = df.ch1
x = mdates.date2num(x_list)
plt.scatter(x,y)
plt.show()
And this is the occurring error message:
d = d.astype('datetime64[us]')
ValueError: Error parsing datetime string " 11:56:20.149766" at position 3
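One possible way around this error is to let pandas parse the time strings with an explicit format before plotting. The sketch below inlines the first rows of the sample above instead of reading the file; the skipinitialspace option, the strip call and the Agg backend line are my assumptions about what is needed, not something confirmed against the full file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to get an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the sample from the question, inlined so the sketch is self-contained.
csv_text = """date,ch1,ch2,ch3,date2
11:56:20.149766,0.909257531,0.909420371,1.140183687, 13:56:20.149980
11:56:20.154008,0.895447016,0.895601869,1.122751355, 13:56:20.154197
11:56:20.157245,0.881764293,0.881911397,1.105638862, 13:56:20.157404
11:56:20.160590,-0.009178977,-0.000108901,-1.486875653, 13:56:20.160750
"""

# skipinitialspace drops the blank that follows some of the commas.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# The column holds a time of day only, so the format is given explicitly.
# pandas fills in 1900-01-01 as a placeholder date.
df["date"] = pd.to_datetime(df["date"].str.strip(), format="%H:%M:%S.%f")

fig, ax = plt.subplots()
for channel in ["ch1", "ch2", "ch3"]:
    ax.plot(df["date"], df[channel], marker="o", label=channel)

# Show only the time part on the x-axis.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S.%f"))
fig.autofmt_xdate()
ax.legend()
```

To read the real file, replace io.StringIO(csv_text) with the path, and call plt.show() at the end when running interactively.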
I get errors every time I try to combine graphs using plotly. I have no problem when it's just x1,y1. But when I try to have x1,x2,.. and so on, it gives me the error mentioned in the title. Here is my code:
import pandas as pd
import plotly
#plotly.offline.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import chart_studio.plotly as py
excel_file = 'C:\\Users\\Taffy R. Mantang\\Desktop\\matrixtester.csv'
df = pd.read_csv(excel_file)
df.head()
data0 = [go.Scatter(x=df['Date'],y=df['0/0'],mode='lines',name='0/0')]
data1 = [go.Scatter(x=df['Date'],y=df['0/1'],mode='lines',name='0/1')]
data2 = [go.Scatter(x=df['Date'],y=df['0/2'],mode='lines',name='0/2')]
data3 = [go.Scatter(x=df['Date'],y=df['0/3'],mode='lines',name='0/3')]
layout = go.Layout(title='processor ISW-1',plot_bgcolor='rgb(230,230,230)',showlegend=True)
fig = go.Figure(data=[data0,data1,data2,data3],layout=layout)
py.offline.plot(fig)
When I only plot data0, or data1 and so on, it works. But when I try data = [data0,data1,data2,data3] it gives me the error.
What exactly is the problem? Help :'(((
I used the code from this website:
https://chart-studio.plotly.com/~notebook_demo/84.embed
You can add each trace separately with fig.add_trace(), which lets you draw several traces on a single plot. This page should help: https://plotly.com/python/creating-and-updating-figures/#adding-traces
In your case you could do:
import pandas as pd
import plotly
#plotly.offline.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import chart_studio.plotly as py
excel_file = 'C:\\Users\\Taffy R. Mantang\\Desktop\\matrixtester.csv'
df = pd.read_csv(excel_file)
df.head()
data0 = go.Scatter(x=df['Date'],y=df['0/0'],mode='lines',name='0/0')
data1 = go.Scatter(x=df['Date'],y=df['0/1'],mode='lines',name='0/1')
data2 = go.Scatter(x=df['Date'],y=df['0/2'],mode='lines',name='0/2')
data3 = go.Scatter(x=df['Date'],y=df['0/3'],mode='lines',name='0/3')
layout = go.Layout(title='processor ISW-1',plot_bgcolor='rgb(230,230,230)',showlegend=True)
fig = go.Figure(layout=layout)
fig.add_trace(data0)
fig.add_trace(data1)
fig.add_trace(data2)
fig.add_trace(data3)
plotly.offline.plot(fig)
I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code in Python:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is already computed and I don't want to calculate the correlation again, but I have not managed to plot it as is. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, instead of passing in corrMatrix, which is the recalculated value, pass the data read from Excel directly, either data or df (df is built from data and holds the same values). Thus, all the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note that this assumes, however, that your data IS ready for the heatmap as you suggest. As we do not have access to your data, we cannot confirm that.
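If the first column of the sheet holds the row names, one option is to load it as the index rather than as data. The sketch below uses a small made-up CSV excerpt in place of the Excel file (the values are invented and the header name "name" is an assumption); pd.read_excel accepts the same index_col argument.

```python
import io

import pandas as pd

# Hypothetical 3x3 excerpt of a precomputed correlation matrix (values made up),
# written as CSV to stand in for the Excel sheet.
csv_text = """name,CLC,GIEMS,GLWD
CLC,1.00,0.42,0.35
GIEMS,0.42,1.00,0.58
GLWD,0.35,0.58,1.00
"""

# index_col=0 turns the first column (the names) into row labels
# instead of leaving it in the table as a text column.
matrix = pd.read_csv(io.StringIO(csv_text), index_col=0)
matrix.index.name = None

# Row and column labels now match, and every remaining cell is numeric.
print(matrix)
```

Passing matrix to sn.heatmap(matrix, annot=True) should then label both axes from the DataFrame without calling corr() again.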
I have deleted the first column (names) and added the names back as axis labels, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:
I'm hoping to create a line graph which shows the changes to flowering and fruiting times (phenophases) from year to year. For each phenophase I'd like to plot the average Day of Year and, if possible, show the min and max for each year as an error bar. I've filtered down all the data I need in a few data frames, grouped it all in a sensible way, but I can't figure out how to get it all to plot. Here's a screen grab of where I'm at: Imgur
All the examples I've found that add error bars are based on formulas or on equal amounts over and under, but in my case the max and min will differ, so I'm not sure how to integrate that. Possibly just create a list of each column's data and feed that to plot? I'm playing with that now but not getting far.
Also, if anyone has general suggestions as to better ways to present this data I'm all ears. I've looked into Gantt plots but didn't get far with them, as this seems a bit more straight-forward just using matplotlib. I'm happy to put some demo data or the rest of my notebook up if anyone thinks that would help.
Edit: Here's some sample data and the code from my notebook: Gist
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
pd.set_option('display.max_columns', 40)
tick_spacing = 1
dfClean = df[['Site_Cluster', 'Species', 'Phenophase_Name',
              'Phenophase_Status', 'Observation_Year', 'Day_of_Year']]
dfClean = dfClean[dfClean.Phenophase_Status == 1]
PhenoNames = ['Open flowers', 'Ripe fruits']
dfLakes = dfClean[(dfClean.Phenophase_Name.isin(PhenoNames))
                  & (dfClean.Site_Cluster == 'Lakes')
                  & (dfClean.Species == 'lapponica')]
dfLakesGrouped = dfLakes.groupby(['Observation_Year', 'Phenophase_Name'])
# string names give stable column labels: 'min', 'mean', 'max'
dfLakesReady = dfLakesGrouped.Day_of_Year.agg(['min', 'mean', 'max']).round(0)
dfLakesReady = dfLakesReady.unstack()
print(dfLakesReady['mean'].plot())
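For min/max bars that differ above and below the mean, matplotlib's errorbar accepts yerr as a 2 x N array: the first row is the distance below each point, the second the distance above. Here is a minimal sketch with made-up observations; only the column names come from the code above, and the Agg backend line is there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical observations; column names follow the question, values are made up.
obs = pd.DataFrame({
    "Observation_Year": [2014, 2014, 2014, 2015, 2015, 2015, 2014, 2014, 2015, 2015],
    "Phenophase_Name": ["Open flowers"] * 6 + ["Ripe fruits"] * 4,
    "Day_of_Year": [150, 160, 176, 145, 155, 159, 200, 220, 205, 215],
})

# One row per (phenophase, year) with the min, mean and max day of year.
stats = (obs.groupby(["Phenophase_Name", "Observation_Year"])["Day_of_Year"]
            .agg(["min", "mean", "max"]))

fig, ax = plt.subplots()
for phase, group in stats.groupby(level="Phenophase_Name"):
    years = group.index.get_level_values("Observation_Year").to_numpy()
    mean = group["mean"].to_numpy()
    # Asymmetric bars: row 0 = distance down to the min, row 1 = distance up to the max.
    yerr = np.array([mean - group["min"].to_numpy(),
                     group["max"].to_numpy() - mean])
    ax.errorbar(years, mean, yerr=yerr, marker="o", capsize=4, label=phase)

ax.set_xlabel("Observation year")
ax.set_ylabel("Day of year")
ax.legend()
```

Each phenophase gets its own line, and the bars reach exactly from that year's minimum to its maximum.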
Here's another answer:
from pandas import DataFrame, date_range
import numpy as np
from matplotlib import pyplot as plt
rng = date_range(start='2015-01-01', periods=5, freq='D')
df = DataFrame({'y':np.random.normal(size=len(rng))}, index=rng)
y1 = df['y']
y2 = (y1*3)
# error-bar sizes must be non-negative, hence abs()
sd1 = (y1*2).abs()
sd2 = (y1*2).abs()
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
_ = y1.plot(yerr=sd1, ax=ax1)
_ = y2.plot(yerr=sd2, ax=ax2)
Output: