Python Plot Data From CSV - python

Hi everyone, I'm trying to plot data from a CSV file. There are 7 columns in my CSV. I've already plotted the Genre column with this code:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import numpy as np
df = pd.read_csv('booksellers.csv')
genre = df['Genre']
countFiction = 0
countNonFiction = 0
for i in genre:
    if i == "Fiction":
        countFiction+=1
    else:
        countNonFiction+=1
labels = 'Fiction','Non Fiction'
sizes = [countFiction,countNonFiction]
fig1, ax1 = plt.subplots()
ax1.pie(sizes,labels=labels,startangle=90,autopct='%1.1f%%')
plt.show()
Now I want to plot two more columns: 'Author' and the average 'User Rating'. If an author appears more than once, how can I get a single entry per author with their average user rating? Also, what kind of graph suits this data?

# You can iterate row by row and collect each author's ratings in a list
from statistics import mean
data = {}
for index, row in df.iterrows():
    author = row['Author']
    if author not in data:
        data[author] = []
    data[author].append(row['User Rating'])
rates_by_authors = {}
for k in data.keys():
    rates_by_authors[k] = mean(data[k])
# After building the data with that code:
# list(rates_by_authors.keys()) is the list of authors, usable as the X axis
# list(rates_by_authors.values()) is the list of average ratings per author, usable as the Y axis
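Since pandas is already loaded in the question, the same result can probably be had without an explicit loop. The sketch below assumes the column names 'Author' and 'User Rating' from the question; the small DataFrame is a made-up stand-in for booksellers.csv, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for booksellers.csv; only the two relevant columns are included.
df = pd.DataFrame({
    "Author": ["Author A", "Author B", "Author A", "Author C", "Author B"],
    "User Rating": [4.0, 4.5, 5.0, 3.5, 4.0],
})

# groupby collapses duplicate authors into one row each; mean() averages their ratings.
avg_rating = df.groupby("Author")["User Rating"].mean().sort_values()
```

Calling avg_rating.plot(kind="barh") on the result then gives a horizontal bar chart, which is usually a readable choice for one numeric value per category, especially when the author names are long.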

Related

Setting order to categorical data while creating countplot() results in erroneous plot

I have a column in my dataframe for different age groups which has 174 rows of data. A sample is as follows:
My code to print a countplot() of this column is below:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
sns.set(rc = {'figure.figsize':(15,8)})
sns.countplot(data = variable, x = 'Q2')
plt.title("Respondents as per age group")
The resulting output is:
[Image: count of respondents per age group]
As can be seen, the order of the categories on the x axis is not right.
I used the following code to set the order:
age_order = ['15-19','20-29','30-39','40-49','50-59','60 and above']
sns.countplot(x = 'Q2', data = variable, order = age_order)
plt.title("Respondents as per age group")
But the result is:
[Image: count of respondents per age group, after setting the order]
Which is obviously incorrect.
I have no idea why this is happening and how to rectify this.
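The question does not show the actual values in Q2, so the cause cannot be confirmed from here. One thing worth checking, purely as an assumption, is whether the labels in age_order match the values in the column exactly. As far as I know, countplot only draws bars for data values that appear in order, so a label with stray whitespace or slightly different wording would end up as an empty bar. A minimal check, using made-up sample values:

```python
import pandas as pd

# Hypothetical sample: the real Q2 values are not shown in the question.
variable = pd.DataFrame({"Q2": ["15-19 ", "20-29", "20-29", "60 and above", " 30-39"]})
age_order = ["15-19", "20-29", "30-39", "40-49", "50-59", "60 and above"]

# Values present in the data but absent from age_order would not be counted in the plot.
unmatched = sorted(set(variable["Q2"].unique()) - set(age_order))

# Stripping stray whitespace is one possible fix; re-check afterwards.
variable["Q2"] = variable["Q2"].str.strip()
still_unmatched = sorted(set(variable["Q2"].unique()) - set(age_order))
```

If unmatched is empty on the real data, the mismatch theory does not apply and the cause lies elsewhere.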

Graph the average line using sns lineplot

I have a dataframe that looks like this:
id|date |amount
1 |02-04-18|3000
1 |05-04-19|5000
1 |10-04-19|2600
2 |10-04-19|2600
2 |11-04-19|3000
I want to plot the amount spent over time for each unique id and add an average trend line. This is the code that I have:
import matplotlib.pyplot as plt
import pandas as pd
temp_m = df.pivot_table(index='id',columns='id',values='amount', fill_value=0)
temp_m = pd.melt(temp, id_vars=['id'])
temp_m['date'] = temp_m['date'].astype('str')
fig, ax = plt.subplots(figsize=(20,10))
for i, group in temp_m.groupby('id'):
    group.plot('id', y='amount', ax=ax,legend=None)
plt.xticks(rotation = 90)
Each line is a unique customer.
Goal: I want to add another line that is the average of all the individual customer trends.
Also, if there is a better way to graph the individual lines, please let me know.
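First we reshape the data so that each id becomes a column, indexed by date. This assumes the date column has already been converted with pd.to_datetime, since to_julian_date below needs a DatetimeIndex:
import seaborn as sns
agg = df.set_index(['date', 'id']).unstack()
agg.columns = agg.columns.get_level_values(-1)
This makes plotting very easy:
sns.lineplot(data=agg)
The average trend can be calculated by fitting a linear regression per id and averaging the predictions:
from sklearn.linear_model import LinearRegression
regress = {}
# convert to a NumPy array before adding the second axis that scikit-learn expects
idx = agg.index.to_julian_date().to_numpy()[:, None]
for c in agg.columns:
    regress[c] = LinearRegression().fit(idx, agg[c].fillna(0)).predict(idx)
trend = pd.Series(pd.DataFrame(regress).mean(axis=1).values, agg.index)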
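If a plain per-date mean across customers is enough, rather than averaged regression fits, a shorter sketch is possible. It rebuilds the sample from the question and assumes the dates are day-month-year, which the question does not state; the names wide and "average" are just illustrative.

```python
import pandas as pd

# Sample data from the question; the day-month-year reading is an assumption.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": ["02-04-18", "05-04-19", "10-04-19", "10-04-19", "11-04-19"],
    "amount": [3000, 5000, 2600, 2600, 3000],
})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%y")

# One column per id, one row per date; missing id/date combinations become NaN.
wide = df.pivot(index="date", columns="id", values="amount")

# Row-wise mean across ids; NaN entries are skipped by default.
wide["average"] = wide.mean(axis=1)
```

wide.plot() then draws one line per id plus the average column. Dates on which an id has no record stay NaN, so the average on those dates reflects only the ids that do have data.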

problem in plotting multiple lists using matplotlib

I am writing a script that plots country-wise COVID time-series data. It works fine when I plot a single country, but with several countries the scale on the Y axis is printed incorrectly.
The problem is that, after the maximum value for one country is printed, the Y axis continues with smaller values in order to plot the data points of the subsequent countries.
The code for my script is as follows:
import requests
from contextlib import closing
import csv
import matplotlib.pyplot as plt
url = "https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv"
def prepareCountryWiseData(country):
    countryWise = {}
    with closing(requests.get(url, stream=True)) as r:
        f = (line.decode('utf-8') for line in r.iter_lines())
        reader = csv.reader(f, delimiter=',', quotechar='"')
        active = []
        recovered = []
        dates = []
        for row in reader:
            if row[1] == country:
                dates.append(row[0])
                active.append(row[2])
                recovered.append(row[3])
    return (dates, active, recovered)
def plotCountryWiseData(countryList):
    plotable = []
    for country in countryList:
        dates,active,recovered = (prepareCountryWiseData(country))
        plt.plot(active)
    plt.ylabel('active_cases')
    plt.legend(countryList)
    plt.show()
plotCountryWiseData(['India','US','Italy'])
If you can use the pandas module your job would be much easier:
import pandas as pd, matplotlib.pyplot as plt
url = "https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv"
df = pd.read_csv(url)
fig,ax = plt.subplots()
for k,g in df[df['Country'].isin(['India','US','Italy'])].groupby('Country'):
    ax = g.plot(ax=ax,kind='line',x='Date',y='Confirmed',label=k)
plt.gcf().suptitle('Active Cases')
plt.show()
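As for why the original script misbehaves: csv.reader yields every field as a string, so plt.plot(active) receives strings, and matplotlib treats those as categories placed in order of first appearance rather than as numbers. Converting to int while reading is likely enough to fix the Y axis without switching to pandas. Below is a sketch with a made-up in-memory sample; the column layout is inferred from the question and answer, the values are invented, and prepare_country_data is only an illustrative name.

```python
import csv
import io

# Tiny made-up sample; the values are invented for illustration only.
sample = io.StringIO(
    "Date,Country,Confirmed,Recovered\n"
    "2020-03-01,India,10,1\n"
    "2020-03-02,India,20,2\n"
    "2020-03-01,Italy,300,30\n"
    "2020-03-02,Italy,400,40\n"
)

def prepare_country_data(rows, country):
    """Return (dates, confirmed, recovered) for one country, with counts as ints."""
    dates, confirmed, recovered = [], [], []
    for row in rows:
        if row["Country"] == country:
            dates.append(row["Date"])
            confirmed.append(int(row["Confirmed"]))  # int() keeps the y axis numeric
            recovered.append(int(row["Recovered"]))
    return dates, confirmed, recovered

rows = list(csv.DictReader(sample))
dates, confirmed, recovered = prepare_country_data(rows, "Italy")
```

Passing the integer lists to plt.plot for each country should then put all lines on one shared numeric scale. Reading the rows once and filtering in memory also avoids downloading the file again for every country.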

Add text to each point in a seaborn.relplot

I am using the code below to generate a relplot:
df = pd.read_csv(r"train.csv")
df.head()
p1=sns.relplot(x="OS_Packages",y="Vulnerabilities",hue="OS_Distro",
size="High_Vulnerabilities",sizes=(400,1000), data = df)
plt.show()
I need to add text to each point in the plot. How can I do that? I have searched but only found results for regplot; I am looking for a way to add text to the points of a relplot.
As noted in this answer, you have to access the axes of the FacetGrid that is returned by the relplot.
A simple reproduction of your question with a point annotated:
import seaborn as sns
import pandas as pd
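d = {'OS_Packages':[0,1,2,4], 'Vulnerabilities': [6,7,3,7],
     'text':['point1','point2','point3','point4']}
df = pd.DataFrame(d)
p1 = sns.relplot(x='OS_Packages', y='Vulnerabilities',data=df )
# relplot returns a FacetGrid; with no row/col facets it holds a single Axes at [0, 0]
ax = p1.axes[0,0]
for idx,row in df.iterrows():
    # look up each value by column label rather than by position
    x = row['OS_Packages']
    y = row['Vulnerabilities']
    text = row['text']
    ax.text(x+.05,y,text, horizontalalignment='left')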

Create a stacked graph or bar graph using plotly in python

I have data like this :
[ ('2018-04-09', '10:18:11',['s1',10],['s2',15],['s3',5]),
  ('2018-04-09', '10:20:11',['s4',8],['s2',20],['s1',10]),
  ('2018-04-10', '10:30:11',['s4',10],['s5',6],['s6',3]) ]
I want to plot this data, preferably as a stacked graph.
The X axis will be time.
It should look like this (I created this image in Paint just to illustrate).
The X axis will show time like a normal graph does (10:00, April 3, 2018).
I am stuck because the string values (like 's1' or 's2') change from one bar to the next.
To hard-code and verify, I tried this:
import plotly
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import matplotlib
plotly.offline.init_notebook_mode()
def createPage():
    graph_data = []
    l1=[('com.p1',1),('com.p2',2)('com.p3',3)]
    l2=[('com.p1',1),('com.p4',2)('com.p5',3)]
    l3=[('com.p2',8),('com.p3',2)('com.p6',30)]
    trace_temp = go.Bar(
        x='2018-04-09 10:18:11',
        y=l1[0],
        name = 'top',
    )
    graph_data.append(trace_temp)
    plotly.offline.plot(graph_data, filename='basic-scatter3.html')
createPage()
The error I am getting is: 'tuple' object is not callable.
Can someone please suggest some code for plotting such data?
If needed, I can store the data in some other form that is more helpful for plotting.
Edit:
I used the approach suggested in the accepted answer and succeeded in plotting with plotly like this:
fig=df.iplot(kind='bar',barmode='stack',asFigure=True)
plotly.offline.plot(fig,filename="stack1.html")
However, I ran into one problem:
1. When time intervals are very close, the data overlaps on the graph.
Is there a way to overcome this?
You could use a pandas stacked bar plot. The advantage is that pandas makes it easy to build the table of category/value pairs, which you have to generate anyhow.
from matplotlib import pyplot as plt
import pandas as pd
all_data = [('2018-04-09', '10:18:11', ['s1',10],['s2',15],['s3',5]),
            ('2018-04-09', '10:20:11', ['s4',8], ['s2',20],['s1',10]),
            ('2018-04-10', '10:30:11', ['s4',10],['s5',6], ['s6',3]) ]
#load data into dataframe
df = pd.DataFrame(all_data, columns = list("ABCDE"))
#combine the two descriptors
df["day/time"] = df["A"] + "\n" + df["B"]
#assign each list to a new row with the appropriate day/time label
df = df.melt(id_vars = ["day/time"], value_vars = ["C", "D", "E"])
#split each list into category and value
df[["category", "val"]] = pd.DataFrame(df.value.values.tolist(), index = df.index)
#create a table with category-value pairs from all lists, missing values are set to NaN
df = df.pivot(index = "day/time", columns = "category", values = "val")
#plot a stacked bar chart
df.plot(kind = "bar", stacked = True)
#give tick labels the right orientation
plt.xticks(rotation = 0)
plt.show()
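On the 'tuple' object is not callable error from the question: the lists l1, l2 and l3 are each missing a comma between the second and third tuples, so Python tries to call the second tuple with the third as its argument. A small pure-Python sketch of the failure and the fix (error_message, names and values are illustrative names, not part of the original code):

```python
# Missing comma: Python parses ('com.p2', 2)('com.p3', 3) as calling the first tuple.
try:
    l1 = [('com.p1', 1), ('com.p2', 2)('com.p3', 3)]
except TypeError as exc:
    error_message = str(exc)

# With the comma restored, the list holds three (name, value) tuples.
l1 = [('com.p1', 1), ('com.p2', 2), ('com.p3', 3)]

# Split the pairs into parallel lists, which is the shape most plotting calls expect.
names = [name for name, _ in l1]
values = [value for _, value in l1]
```

Recent Python versions also emit a SyntaxWarning for the first form, hinting at the missing comma. Once the lists are valid, the numeric values rather than a whole (name, value) tuple would be what goes into y for each bar trace.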
