How to give different titles to matplotlib plots when parsing from csv? - python

I wrote the Python code below to read a CSV and generate multiple plots at once. However, I can't figure out how to pass matplotlib the Location value from the CSV: I would like each plot to be titled with the Location value that I am grouping on. Please see the code and screenshot below.
import pandas as pd
import datetime
import csv
import matplotlib.pyplot as plt
from google.colab import files
import numpy as np
myfile = files.upload()
df = pd.read_csv('Electricity.csv')
df2 = df.groupby(['Location'], as_index=False)
print(df2)
df2.plot(x='Period', y = 'score', ylim=(0,11))#IDK about ylim being necessary
plt.title('Location', color='black')
plt.show()
Plots

Get the axes returned by df2.plot. Since you're grouping by Location, you'll get something similar to:
Location NaN
0 Location1 AxesSubplot(0.125,0.11;0.775x0.77)
1 Location2 AxesSubplot(0.125,0.11;0.775x0.77)
2 Location3 AxesSubplot(0.125,0.11;0.775x0.77)
Use the apply function with axis=1 to visit each row, which holds both the Location and its Axes object. Then call set_title on the Axes object with that Location for each plot.
...
df2.apply(print)
ax = df2.plot(x='Period', y='score', ylim=(0, 11))
ax.apply(lambda x: x[1].set_title(x['Location']), axis=1)
plt.show()
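An alternative that avoids indexing into the returned table of axes is to loop over the groups yourself and title each plot as it is created. This is a minimal sketch, assuming the CSV has the Location, Period and score columns from the question; the function name plot_by_location and the sample values are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_location(df):
    """Draw one line plot per Location and use the Location as the title."""
    axes = {}
    # Iterating over a groupby yields (group key, sub-DataFrame) pairs.
    for location, group in df.groupby("Location"):
        ax = group.plot(x="Period", y="score", ylim=(0, 11))
        ax.set_title(str(location), color="black")
        axes[location] = ax
    return axes


# Hypothetical stand-in for pd.read_csv('Electricity.csv')
sample = pd.DataFrame({
    "Location": ["North", "North", "South", "South"],
    "Period": [1, 2, 1, 2],
    "score": [3, 7, 5, 9],
})
axes_by_location = plot_by_location(sample)
# plt.show() would display the figures in an interactive session
```

Each call to group.plot creates its own figure, so every Location gets a separate, individually titled plot.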

Related

Plotting a heatmap using CSV file data in python

I have a nested dictionary variable called all_count_details_dictionary. I saved its data to a CSV file using the following code.
import pandas as pd
csv_path = '../results_v6/output_01.csv'
# create a pandas DataFrame from the dictionary using the concat method
df = pd.concat([pd.DataFrame(l) for l in all_count_details_dictionary],axis=1).T
# saving the dataframe to the csv file
df.to_csv(csv_path, index=True)
The output CSV file looks like this:
The CSV file can be downloaded using this link.
So I used the following code to plot a graph
import numpy as np
import matplotlib.pyplot as plt
def extract_csv_gen_plot(csv_path):
    length = 1503 #len(dataframe_colums_list)
    data = np.genfromtxt(csv_path, delimiter=",", skip_header=True, usecols=range(3, (length+1)))
    print(data)
    # renaming data axes
    #fig, ax = plt.subplots()
    #fig.canvas.draw()
    #labels =[item.get_text() for item in ax.get_xticklabels()]
    #labels[1] = 'testing'
    #ax.set_xticklabels(labels)
    #ax.set_xticklabels(list)
    #ax.set_yticklabels(list)
    #plt.setp(ax.get_xticklabels(), rotation = 90)
    plt.imshow(data, cmap='hot',interpolation='nearest')
    plt.show()
I tried to put the column labels and the case-detail labels on the graph axes, but it didn't work. Is there a better way to plot this table as a heat map?
Thank you!
I would suggest using pandas; the labels are picked up automatically:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
def extract_csv_gen_plot(csv_path):
    data = pd.read_csv(csv_path, index_col=1)
    data = data.drop(data.columns[[0, 1]], axis=1)
    data.index.names = ['Name']
    g = sns.heatmap(data)
    g.set_yticklabels(g.get_yticklabels(), rotation=0)
    g.set_title('Heatmap')
    plt.tight_layout()
    plt.show()
extract_csv_gen_plot("output_01.csv")
I recommend using Seaborn: it has a heatmap plotting function that works very well with pandas DataFrames.
import seaborn as sns
sns.heatmap(data)
https://seaborn.pydata.org/generated/seaborn.heatmap.html
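If seaborn is not available, the same labelling can be done with matplotlib alone by setting the tick positions and labels from the DataFrame. This is a minimal sketch, assuming the row labels are in the index and every remaining column is numeric; labelled_heatmap and the sample table are hypothetical and not taken from the question's CSV.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def labelled_heatmap(table):
    """Draw a DataFrame as a heat map with its own row and column labels."""
    fig, ax = plt.subplots()
    image = ax.imshow(table.to_numpy(dtype=float), cmap="hot",
                      interpolation="nearest", aspect="auto")
    # One tick per column and per row, labelled from the DataFrame itself.
    ax.set_xticks(np.arange(table.shape[1]))
    ax.set_xticklabels(list(table.columns), rotation=90)
    ax.set_yticks(np.arange(table.shape[0]))
    ax.set_yticklabels(list(table.index))
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    return ax


# Made-up data standing in for the CSV contents
sample = pd.DataFrame(
    [[1, 0, 3], [2, 5, 1]],
    index=["case_a", "case_b"],
    columns=["col_1", "col_2", "col_3"],
)
heatmap_ax = labelled_heatmap(sample)
# plt.show() would display the figure in an interactive session
```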

Plot Correlation Table imported from excel with Python

I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is already computed and I don't want to calculate the correlation again, but I have not managed to plot it as is. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, pass the raw Excel data (data, or df, as df is just a copy of data) to the heatmap instead of corrMatrix, which holds the recalculated values. Thus, all the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note, however, that this assumes your data is ready for the heatmap, as you suggest. Since we do not have access to your data, we cannot confirm that.
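One way to make sure only numeric cells reach the heatmap is to load the column of names as the index, so the row labels come along without being treated as data. The sketch below is a stand-in built on assumptions: it reads a tiny made-up matrix from an in-memory CSV because the original workbook is not available; pd.read_excel accepts the same index_col argument.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

# Made-up, already-computed correlation matrix; the names are hypothetical.
raw = io.StringIO(
    "name,A,B,C\n"
    "A,1.0,0.8,0.3\n"
    "B,0.8,1.0,0.5\n"
    "C,0.3,0.5,1.0\n"
)
# index_col=0 turns the first column into row labels instead of data.
matrix = pd.read_csv(raw, index_col=0)

# Plot the values as they are; no call to .corr() is needed.
ax = sn.heatmap(matrix, annot=True, vmin=-1, vmax=1)
ax.set_title("Precomputed correlation matrix")
# plt.show() would display the figure in an interactive session
```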
I deleted the first column (the names) and added the names back later as tick labels, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:

Pandas line plot suppresses half of the xticks, how to stop it?

I am trying to make a line plot in which every one of the elements from the index appears as an xtick.
import pandas as pd
ind = ['16-12', '17-01', '17-02', '17-03', '17-04',
'17-05','17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1,3,5,2,3,6,4,7,8,5,3,8]
df = pd.DataFrame(data,index=ind)
df.plot(kind='line',x_compat=True)
However, the resulting plot skips every second element of the index, like so:
My call to plot includes the x_compat=True parameter, which the pandas documentation suggests should stop the automatic tick configuration, but it seems to have no effect.
You need to set a ticker locator on the x-axis of an Axes object and then pass that Axes object when plotting.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ind = ['16-12', '17-01', '17-02', '17-03', '17-04',
'17-05','17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1,3,5,2,3,6,4,7,8,5,3,8]
df = pd.DataFrame(data,index=ind)
ax2 = plt.axes()
ax2.xaxis.set_major_locator(ticker.MultipleLocator(1))
df.plot(kind='line', ax=ax2)
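Another option is to set the tick positions and labels explicitly after plotting, since pandas draws a string index at the integer positions 0, 1, 2 and so on. This is a sketch under that assumption rather than a documented fix for x_compat; the column name value is added here only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

ind = ['16-12', '17-01', '17-02', '17-03', '17-04', '17-05',
       '17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1, 3, 5, 2, 3, 6, 4, 7, 8, 5, 3, 8]
df = pd.DataFrame({'value': data}, index=ind)  # 'value' is an illustrative column name

ax = df.plot(kind='line')
# Put one tick on every row position and label it with the matching index entry.
ax.set_xticks(list(range(len(df.index))))
ax.set_xticklabels(list(df.index), rotation=45)
# plt.show() would display the figure in an interactive session
```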

How to plot a Python Dataframe with category values like this picture?

How can I achieve that using matplotlib?
Here is my code with the data you provided. As there are no classes (the values are all different, although the first example in your question does have classes), I assigned colors based on the numbers. You can take it from here toward whatever result you want to achieve. You just need pandas, seaborn and matplotlib:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import xls
df=pd.read_excel('data.xlsx')
# exclude Ranking values
df1 = df.iloc[:, 1:-1]  # .ix was removed in pandas 1.0; .iloc selects by position
# for each element it takes the value of the xls cell
df2=df1.applymap(lambda x: float(x.split('\n')[1]))
# now plot it
df_heatmap = df2
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(df_heatmap, square=True, ax=ax, annot=True, fmt="1.3f")
plt.yticks(rotation=0,fontsize=16);
plt.xticks(fontsize=12);
plt.tight_layout()
plt.savefig('dfcolorgraph.png')
Which produces the following picture.

plot histogram in python using csv file as input

I have a CSV file with two columns: the first is the fruit name and the second is the count. I need to plot a histogram using this CSV as input to the code below. How can I do that? Out of the 100 lines in the file, I only need to show the first 20 entries, with the fruit names on the x axis and the counts on the y axis.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', header = None ,quoting=2)
data.hist(bins=10)
plt.xlim([0,100])
plt.ylim([50,500])
plt.title("Data")
plt.xlabel("fruits")
plt.ylabel("Frequency")
plt.show()
I edited the above program to plot a bar chart:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None)
data.values
print data
plt.bar(data[:,0], data[:,1], color='g')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
but this gives me an 'unhashable type' error. Can anyone help with this?
You can use the built-in plot method of pandas, although you need to specify that the first column is the index:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None, index_col =0)
data.plot(kind='bar')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
If you need to use matplotlib directly, it may be easier to convert the DataFrame to a dictionary using data.to_dict() and extract the values into a NumPy array or similar.
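To show only the first 20 rows with plain matplotlib, one option is to name the two columns when reading the file and slice with head(20). This is a minimal sketch, assuming a header-less two-column CSV as in the question; the column names fruit and count and the in-memory sample are made up so the example runs without data.csv.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for data.csv: 100 lines of "name,count" with no header row.
csv_text = "\n".join(f"fruit_{i},{50 + i}" for i in range(100))

# names=... labels the columns; head(20) keeps only the first 20 entries.
data = pd.read_csv(io.StringIO(csv_text), header=None, names=["fruit", "count"])
first_20 = data.head(20)

fig, ax = plt.subplots(figsize=(10, 4))
# Select columns by name; data[:, 0] is NumPy-style indexing and fails on a DataFrame.
ax.bar(first_20["fruit"], first_20["count"], color="g")
ax.set_xlabel("fruits")
ax.set_ylabel("Frequency")
ax.tick_params(axis="x", labelrotation=90)
fig.tight_layout()
# plt.show() would display the figure in an interactive session
```

Because the counts are already aggregated, a bar chart of name against count fits this data better than data.hist, which would bin the count values themselves.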
