Plot histogram in Python using CSV file as input

I have a CSV file with two columns: the first is a fruit name and the second is a count. I need to plot a histogram using this CSV as input to the code below. How can I do this? The file has 100 lines, but I only need to show the first 20 entries, with fruit names on the x-axis and counts on the y-axis.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', header = None ,quoting=2)
data.hist(bins=10)
plt.xlim([0,100])
plt.ylim([50,500])
plt.title("Data")
plt.xlabel("fruits")
plt.ylabel("Frequency")
plt.show()
I edited the above program to plot a bar chart -
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None)
data.values
print data
plt.bar(data[:,0], data[:,1], color='g')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
but this gives me the error 'Unhashable Type'. Can anyone help with this?

You can use pandas' built-in plotting, although you need to specify that the first column is the index:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None, index_col =0)
data.plot(kind='bar')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
If you need to use matplotlib directly, it may be easier to convert the DataFrame to a dictionary using data.to_dict() and extract the values into NumPy arrays.
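As a minimal sketch of the matplotlib route, assuming the file has no header row and holds a fruit name and a count on each line, the first 20 rows can be selected with head(20) and passed to a bar plot. The inline sample data and the column names fruit and count are placeholders for illustration; replace the StringIO object with the path to data.csv.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data standing in for data.csv (no header row, name then count).
sample = io.StringIO("apple,120\nbanana,95\ncherry,60\n")

# Name the two columns explicitly because the file has no header.
data = pd.read_csv(sample, header=None, names=["fruit", "count"])

# Keep only the first 20 rows (fewer if the file is shorter).
first_rows = data.head(20)

fig, ax = plt.subplots()
ax.bar(first_rows["fruit"], first_rows["count"], color="g")
ax.set_xlabel("Fruit")
ax.set_ylabel("Count")
ax.set_title("First 20 entries")
ax.tick_params(axis="x", labelrotation=90)  # keep long names readable
fig.tight_layout()
# Call plt.show() to display the figure in an interactive session.
```

The error in the question comes from data[:,0], which is NumPy-style indexing that a DataFrame does not support; data.iloc[:, 0] is the positional equivalent.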

Related

Plotting a heatmap using CSV file data in python

I have a nested dictionary variable called all_count_details_dictionary that holds my output. I saved its data to a CSV file using the following code.
import pandas as pd
csv_path = '../results_v6/output_01.csv'
# creating a pandas dataframe using the concat method to extract data from the dictionary
df = pd.concat([pd.DataFrame(l) for l in all_count_details_dictionary],axis=1).T
# saving the dataframe to the csv file
df.to_csv(csv_path, index=True)
The output CSV file looks like this:
The CSV file can be downloaded using this link
So I used the following code to plot a graph
import matplotlib.pyplot as plt
import numpy as np
def extract_csv_gen_plot(csv_path):
    length = 1503  # len(dataframe_colums_list)
    # skip the header row and the first three columns, which are not numeric data
    data = np.genfromtxt(csv_path, delimiter=",", skip_header=True, usecols=range(3, (length+1)))
    print(data)
    # renaming data axes
    #fig, ax = plt.subplots()
    #fig.canvas.draw()
    #labels =[item.get_text() for item in ax.get_xticklabels()]
    #labels[1] = 'testing'
    #ax.set_xticklabels(labels)
    #ax.set_xticklabels(list)
    #ax.set_yticklabels(list)
    #plt.setp(ax.get_xticklabels(), rotation = 90)
    plt.imshow(data, cmap='hot', interpolation='nearest')
    plt.show()
I tried to put the column labels and case-detail labels on the graph axes, but it did not work. Can anyone tell me whether there is a better method of plotting this table as a heat map?
Thank you!
I would suggest using Pandas; the labels are picked up automatically:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
def extract_csv_gen_plot(csv_path):
    data = pd.read_csv(csv_path, index_col=1)
    data = data.drop(data.columns[[0, 1]], axis=1)
    data.index.names = ['Name']
    g = sns.heatmap(data)
    g.set_yticklabels(g.get_yticklabels(), rotation=0)
    g.set_title('Heatmap')
    plt.tight_layout()
    plt.show()
extract_csv_gen_plot("output_01.csv")
I recommend using Seaborn; it has a heatmap plotting function that works very well with Pandas DataFrames:
import seaborn as sns
sns.heatmap(data)
https://seaborn.pydata.org/generated/seaborn.heatmap.html
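If Seaborn is not available, a rough matplotlib-only sketch is to keep the data in a DataFrame and copy its row and column labels onto the axes of imshow. The small table and its labels below are hypothetical placeholders; the real data would come from pd.read_csv. With around 1500 columns, as in the question, labelling every tick would be unreadable, so this is only practical for a subset of columns.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical small table; the real data would come from pd.read_csv(csv_path, ...).
frame = pd.DataFrame(
    [[1, 5, 2], [4, 0, 3]],
    index=["case_a", "case_b"],
    columns=["col_1", "col_2", "col_3"],
)

fig, ax = plt.subplots()
image = ax.imshow(frame.to_numpy(), cmap="hot", interpolation="nearest")

# One tick per column and row, labelled from the DataFrame itself.
ax.set_xticks(range(frame.shape[1]))
ax.set_xticklabels(frame.columns, rotation=90)
ax.set_yticks(range(frame.shape[0]))
ax.set_yticklabels(frame.index)

fig.colorbar(image, ax=ax)
fig.tight_layout()
# Call plt.show() to display the figure in an interactive session.
```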

Plot Correlation Table imported from excel with Python

I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code in Python:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is ready and I don't want to calculate the correlation again, but I have not managed to do that. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, instead of passing in corrMatrix, which is the recalculated value, pass the Excel data itself: data or df (df is just a copy of data). Thus, all the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note that this assumes, however, that your data is ready for the heatmap as you suggest. As we do not have access to your data, we cannot confirm that.
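One way to check that the data is ready, sketched here under the assumption that the first column of the sheet holds the row names, is to load that column as the index and confirm the table is square with matching labels. The inline CSV and its values are hypothetical placeholders standing in for the spreadsheet; pd.read_excel accepts the same index_col argument.

```python
import io

import pandas as pd

# Hypothetical stand-in for the spreadsheet: first column holds the row names.
raw = io.StringIO(
    "name,CLC,GIEMS,GLWD\n"
    "CLC,1.0,0.4,0.2\n"
    "GIEMS,0.4,1.0,0.7\n"
    "GLWD,0.2,0.7,1.0\n"
)

# index_col=0 turns the name column into row labels instead of data.
matrix = pd.read_csv(raw, index_col=0)

# A ready-made correlation matrix should be square with matching labels.
is_labelled_square = list(matrix.index) == list(matrix.columns)
```

When is_labelled_square is true, matrix can be passed to sn.heatmap(matrix, annot=True) as it is, without calling .corr(), and the row and column names appear on the axes.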
I have deleted the first column (names) and added the names back later, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:

Make histogram from CSV file with python

I have written this code to plot a histogram from a .csv file. However, I do not get a histogram but what you see in the image.
How can I fix it?
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('test.csv', header=None)
plt.hist(data)
plt.show()
The first lines of the .csv file are:
-95.725
-78.477
-77.976
-77.01
-73.161
-72.505
-71.794
-71.036
-70.653
-70.476
-69.32
-68.787
-68.234
-67.968
-67.742
-67.611
-67.577
-66.69
-66.381
-66.172
-66.072
-65.773
-64.969
-64.897
-64.603
I'm not sure if this will work, but try adding the keyword parameters bins='auto', density=True and histtype='step' to the plt.hist function.
For example:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('test.csv', header=None)
plt.hist(data, bins='auto', density=True, histtype='step')
plt.show()
What they each do is:
bins='auto': Lets numpy automatically decide on the best bin edges;
density=True: Sets the area within the histogram to equal 1.0;
histtype='step': Draws the histogram as an unfilled outline instead of filled bars.
This and more can all be found in the matplotlib API.
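As a small sketch of what these parameters do, the example below reads a single-column file, selects the column as a Series, and checks the density=True normalisation with np.histogram, which is what matplotlib passes bins='auto' through to. The inline sample reuses the first few values from the question in place of test.csv; the variable names are illustrative.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First few values from the question, standing in for test.csv.
sample = io.StringIO("-95.725\n-78.477\n-77.976\n-77.01\n-73.161\n-72.505\n")

# With header=None the single column is labelled 0; select it as a Series.
values = pd.read_csv(sample, header=None)[0]

# Same 'auto' binning rule and normalisation that plt.hist applies.
density, edges = np.histogram(values, bins="auto", density=True)

# With density=True the bar areas (height * width) sum to 1.
total_area = float(np.sum(density * np.diff(edges)))

fig, ax = plt.subplots()
ax.hist(values, bins="auto", density=True, histtype="step")
ax.set_xlabel("Value")
ax.set_ylabel("Density")
# Call plt.show() to display the figure in an interactive session.
```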

draw line/scatter plot from specific cells in an excel file?

I have an excel file with my data in sheet named 'main'.
I want to draw a line plot (or scatter plot) for particular cells in the 'main' sheet.
The data I want to use in 'main' is:
X-axis data is in column A, i.e. from A36 to A136
and
Y-axis data is in column G, i.e. from G36 to G136
Here is the code I used to make the simpler version of the plot
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
import pandas as pd
x = pd.read_excel('ob_half_cd100_titration.xlsx', 'test', parse_cols='A')
y = pd.read_excel('ob_half_cd100_titration.xlsx', 'test', parse_cols='B')
plt.plot(x, y)
plt.show()
The final figure should look like the following image (made from the 'test' sheet):
Link to the excel file :
https://www.dropbox.com/s/2pq4pzq7y7ng29e/ob_half_cd100_titration.xlsx?dl=0
Use a slice of the data:
plt.plot(x[35:136], y[35:136])
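An alternative sketch is to let read_excel select the cells directly through its usecols, skiprows and nrows arguments, so no slicing is needed afterwards. The helper names cell_range_to_read_args and read_xy are made up for this illustration, and the approach assumes the x column lies to the left of the y column in the sheet.

```python
import pandas as pd


def cell_range_to_read_args(first_row, last_row):
    """Map 1-based Excel row numbers to read_excel's skiprows/nrows."""
    return {"skiprows": first_row - 1, "nrows": last_row - first_row + 1}


def read_xy(path, sheet, x_col, y_col, first_row, last_row):
    """Read two columns of a row range; x_col must be left of y_col."""
    frame = pd.read_excel(
        path,
        sheet_name=sheet,
        usecols=f"{x_col},{y_col}",  # Excel column letters, e.g. "A,G"
        header=None,                 # treat every row as data
        **cell_range_to_read_args(first_row, last_row),
    )
    frame.columns = ["x", "y"]
    return frame


# Rows 36..136: skip the first 35 rows, then read 101 rows.
read_args = cell_range_to_read_args(36, 136)
```

For the file in the question this would be called as xy = read_xy('ob_half_cd100_titration.xlsx', 'main', 'A', 'G', 36, 136), followed by plt.plot(xy['x'], xy['y']).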

How to plot a Python Dataframe with category values like this picture?

How can I achieve that using matplotlib?
Here is my code with the data you provided. As there are no classes (the values are all different, although the first example in your question does have classes), I assigned colors based on the numbers. You can build on this to get whatever result you want. You just need pandas, seaborn and matplotlib:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import xls
df=pd.read_excel('data.xlsx')
# exclude Ranking values
df1 = df.iloc[:, 1:-1]
# for each element it takes the value of the xls cell
df2=df1.applymap(lambda x: float(x.split('\n')[1]))
# now plot it
df_heatmap = df2
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(df_heatmap, square=True, ax=ax, annot=True, fmt="1.3f")
plt.yticks(rotation=0,fontsize=16);
plt.xticks(fontsize=12);
plt.tight_layout()
plt.savefig('dfcolorgraph.png')
Which produces the following picture.
