I need to make three subplots from the data below. For each subplot I need to select one specific genre from the column 'genre_1', but I can't figure out how to select that data. I have provided an example of what the output should look like.
from plotly.subplots import make_subplots
import plotly.graph_objects as go
movies = {'title_year': {0: 2016, 1: 2016, 2: 2016, 3: 2016, 4:2016},'Title': {0: 'La La Land', 1: 'Zootopia',2: 'Lion',3: 'Arrival', 4: 'Manchester by the Sea'},'Runtime': {0: 128, 1: 108, 2: 118, 3: 116, 4: 137},'IMDb_rating': {0: 8.2, 1: 8.1, 2: 8.1, 3: 8.0, 4: 7.9},'genre_1': {0: 'Action',1: 'Animation',2: 'Biography',3: 'Drama',4: 'Drama'}}
# Create a subplot, using column, 'genre_1' for three genres - 'Action','Drama','Biography'
sub_fig = make_subplots(rows=1, cols=3)
fig.add_trace(go.Scatter(x='Runtime', y='IMDb_rating',row=1, col=1)
fig.add_trace(go.Scatter(x='Runtime', y='IMDb_rating',row=1, col=2)
fig.add_trace(go.Scatter(x='Runtime', y='IMDb_rating',row=1, col=3)
Output
This should work:
import pandas as pd
df = pd.DataFrame(movies)  # the question defines movies as a plain dict
list_genre = list(df.genre_1.unique())
sub_fig = make_subplots(rows=1, cols=len(list_genre), subplot_titles=list_genre)
for i, genre in enumerate(list_genre):
    sub_fig.add_trace(go.Scatter(x=df[df.genre_1 == genre]["Runtime"],
                                 y=df[df.genre_1 == genre]["IMDb_rating"]),
                      row=1, col=i + 1)
sub_fig.show()
Output:
EDIT:
This is the code that you need:
import pandas as pd
import plotly.express as px
movies = pd.DataFrame(movies)  # the question defines movies as a plain dict
genres_to_plot = ['Action', 'Drama', 'Biography']
subset_movies = movies[movies.genre_1.isin(genres_to_plot)].reset_index(drop=True)
fig = px.scatter(subset_movies, x="Runtime", y="IMDb_rating", color="genre_1", facet_col="genre_1", height=480, width=850)
fig.show()
Output figure:
You simply need to pass the parameter facet_col to px.scatter. If you want a bubble plot, add size="actor_1_facebook_likes" (this assumes your full dataset has that column; the sample data above does not).
I have the following table as a pandas DataFrame:
>>>date hour plant1 plant2 plant3 plant4 ...
0 2019-06-23 07:00:00 251.2 232.7 145.1 176.7
1 2019-06-23 07:02:00 123.4 173.1 121.5 180.4
2 2019-06-23 07:04:00 240.1 162.7 140.1 199.5
3 2019-06-23 07:06:00 224.8 196.5 134.1 200.5
4 2019-06-23 07:08:00 124.3 185.4 132.3 190.1
...
I want to interactively plot each plant (each column), creating one line plot with a line for every plant column.
Plotting only one line with plotly works for me:
import plotly.express as px
fig = px.line(df, x=df.iloc[:,2], y=df.iloc[:,3])
fig.show()
But when I try to plot all the columns using iloc, like this, it fails:
fig = px.line(df, x=df.iloc[:,2], y=df.iloc[:,3:])
ValueError: All arguments should have the same length. The length of
column argument df[wide_variable_0] is 2814, whereas the length of
previously-processed arguments ['x'] is 201
I understand that plotly doesn't know how to interpret my iloc input in order to plot each column separately.
How do I tell it to plot each column as a separate line (e.g. something like this, but with my data and a line for each column, so instead of countries we would have the column names)?
*This example is from the plotly manual (https://plotly.com/python/line-charts/)
My end goal: to plot each plant column as a line.
Edit: I have also tried to do that with pandas as described here, but for some reason when I try it like this I get an error:
dfs['2019-06-23'].iloc[:,2:].plot(kind='line')
>>>ImportError: matplotlib is required for plotting when the default backend "matplotlib" is selected.
But when I "change the order":
plt.plot(df.iloc[:,2:])
it works, but the plot is not interactive.
You can simply collect the column names you'd like to plot from df.columns by using to_plot = [v for v in list(df.columns) if v.startswith('plant')] and then use fig = px.line(df, x=df.index, y=to_plot) to get:
Complete code:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'date': {0: '2019-06-23',
1: '2019-06-23',
2: '2019-06-23',
3: '2019-06-23',
4: '2019-06-23'},
'hour': {0: '07:00:00',
1: '07:02:00',
2: '07:04:00',
3: '07:06:00',
4: '07:08:00'},
'plant1': {0: 251.2, 1: 123.4, 2: 240.1, 3: 224.8, 4: 124.3},
'plant2': {0: 232.7, 1: 173.1, 2: 162.7, 3: 196.5, 4: 185.4},
'plant3': {0: 145.1, 1: 121.5, 2: 140.1, 3: 134.1, 4: 132.3},
'plant4': {0: 176.7, 1: 180.4, 2: 199.5, 3: 200.5, 4: 190.1}})
df['ix'] = df['date']+' ' +df['hour']
df['ix'] = pd.to_datetime(df['ix'])
to_plot = [v for v in list(df.columns) if v.startswith('plant')]
fig = px.line(df, x=df.index, y=to_plot)
fig.show()
Could you provide a slice of your data?
I don't know what exactly you used as x; df.iloc[:,2] looks like plant1.
Generally, newer versions of plotly can take multiple y columns, while older versions might not. If updating the package still doesn't work, melt the dataframe into long format like this:
plants = ['plant1', 'plant2']  # all the lines you want to draw
df = pd.melt(df,
             id_vars=["date", "hour"],
             value_vars=plants,
             var_name="plant_number",
             value_name="y")
fig = px.line(df, x="date", y="y", color="plant_number")
fig.show()
I have a pandas dataframe. It looks like this:
I'm trying to create a scatter plot with a different color for each point. I tried:
df.plot.scatter(x='x',y='y',c=df.colors.tolist(), legend=True)
I get the scatter plot all right, but I'm not able to show a label indicating that "Financials" is associated with the color '#4ce068'.
How do I do that here? Thanks.
I tried:
df.plot.scatter(x='x',y='y',c=df.colors.tolist(),label=df.key.unique().tolist())
This almost works, but there are too many labels and it is hard to see which color is associated with which label.
I would like the key shown with its associated color, preferably at the top of the chart, i.e. next to the title. Is that possible?
My dataframe in dictionary format:
{'key': {0: 'Financials',
1: 'Consumer Discretionary',
2: 'Industrials',
3: 'Communication Services',
4: 'Communication Services',
5: 'Consumer Discretionary',
6: 'Health Care',
7: 'Information Technology',
8: 'Consumer Discretionary',
9: 'Information Technology'},
'x': {0: 1630000.0,
1: 495800000.0,
2: 562790000.0,
3: 690910000.0,
4: 690910000.0,
5: 753090000.0,
6: 947680000.0,
7: 1010000000.0,
8: 1090830000.0,
9: 1193600000.0},
'y': {0: 0.02549175,
1: 0.0383163,
2: -0.09842154,
3: 0.03876266,
4: 0.03596279,
5: 0.01897367,
6: 0.06159238,
7: 0.0291209,
8: 0.003931255,
9: -0.007134976},
'colors': {0: '#4ce068',
1: '#ecef4a',
2: '#ff3c83',
3: '#ff4d52',
4: '#ff4d52',
5: '#ecef4a',
6: '#00f1d1',
7: '#d9d9d9',
8: '#ecef4a',
9: '#d9d9d9'}}
If you don't mind using plotly, you can try
import pandas as pd
import plotly.express as px
px.scatter(df, 'x', 'y', color="key", color_discrete_map=dict(zip(df["key"], df["colors"])))
But given your choice of colors, just px.scatter(df, 'x', 'y', color="key") will look better.
Here's one way to do it:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for i in df.colors.unique():
    df_color = df[df.colors == i]
    df_color.plot.scatter(x='x', y='y', ax=ax, c=i, label=df_color.key.iloc[0])
ax.legend(loc='best')
You'll get
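The question also asks for the key to sit at the top of the chart. One possible way, sketched here with plain matplotlib, is to anchor the legend just above the axes with bbox_to_anchor. The grouping by key and the three-column legend layout are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question, retyped as plain lists
df = pd.DataFrame({
    "key": ["Financials", "Consumer Discretionary", "Industrials",
            "Communication Services", "Communication Services",
            "Consumer Discretionary", "Health Care", "Information Technology",
            "Consumer Discretionary", "Information Technology"],
    "x": [1630000.0, 495800000.0, 562790000.0, 690910000.0, 690910000.0,
          753090000.0, 947680000.0, 1010000000.0, 1090830000.0, 1193600000.0],
    "y": [0.02549175, 0.0383163, -0.09842154, 0.03876266, 0.03596279,
          0.01897367, 0.06159238, 0.0291209, 0.003931255, -0.007134976],
    "colors": ["#4ce068", "#ecef4a", "#ff3c83", "#ff4d52", "#ff4d52",
               "#ecef4a", "#00f1d1", "#d9d9d9", "#ecef4a", "#d9d9d9"],
})

fig, ax = plt.subplots()

# One scatter call per key, so each key gets exactly one legend entry
for key, group in df.groupby("key", sort=False):
    ax.scatter(group["x"], group["y"],
               color=group["colors"].iloc[0], label=key)

# Anchor the legend's bottom edge to the top edge of the axes
ax.legend(loc="lower center", bbox_to_anchor=(0.5, 1.0),
          ncol=3, frameon=False)
fig.tight_layout()
```

Call plt.show() to display the result. Adjusting ncol changes how many entries share a row in the legend.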
I need to plot a pie chart using matplotlib, but my DataFrame has 3 columns: gender, Segment and total_amount.
I have tried playing with the plt.pie() arguments, but it only takes x and labels for data. I tried setting gender as a legend, but then it doesn't look right.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'gender': {0: 'Female',
1: 'Female',
2: 'Female',
3: 'Male',
4: 'Male',
5: 'Male'},
'Segment': {0: 'Gold',
1: 'Platinum',
2: 'Silver',
3: 'Gold',
4: 'Platinum',
5: 'Silver'},
'total_amount': {0: 2110045.0,
1: 2369722.0,
2: 1897545.0,
3: 2655970.0,
4: 2096445.0,
5: 2347134.0}})
plt.pie(data = df,x="claim_amount",labels="Segment")
plt.legend(d3.gender)
plt.show()
The result I want is a pie chart of total_amount and its labels as gender and segment. If I can get the percentage, it will be a bonus.
I suggest the following:
# Data to plot
# Join the Segment and gender columns into one label string per row
labels = df["Segment"]+ " " + df["gender"].map(str)
# Extract the sizes of the segments
sizes = df["total_amount"]
# Plot with labels and percentage
plt.pie(sizes, labels=labels,autopct='%1.1f%%')
plt.show()
You should get this:
slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
I'd like to see this picture
But I don't know how to do it. I want red for Russian people, green for USA people and yellow for Chinese people.
My attempt to find a solution:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set(style="white")
slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
obj = pd.DataFrame(slov)
palette=["g", "b", "r"]
obj['Color']='r'
row_index = obj.Country == 'Russia'
obj.loc[row_index, 'Color'] = 'r'
row_index = obj.Country == 'USA'
obj.loc[row_index, 'Color'] = 'g'
row_index = obj.Country == 'China'
obj.loc[row_index, 'Color'] = 'y'
g = sns.factorplot(x="People", y="Height", data=obj, kind='bar', palette=obj['Color'])
plt.show()
Maybe my solution is not very good: I added a color column to the DataFrame, which doesn't seem quite right. How can I solve this task without adding the colors to my DataFrame?
You can map each country to a color with a dict (note that sns.factorplot was renamed sns.catplot in seaborn 0.9):
d = {'Russia':'r', 'USA':'g','China':'y'}
g = sns.factorplot(x="People",
y="Height",
data=obj,
kind='bar',
palette=obj['Country'].map(d))
plt.show()
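The same dict-based mapping also works without seaborn. The sketch below uses matplotlib's bar directly, which is a swapped-in technique rather than the answer's own method. The name country_colors is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'},
        'Country': {0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},
        'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
obj = pd.DataFrame(slov)

# Country -> color lookup kept outside the DataFrame
country_colors = {'Russia': 'r', 'USA': 'g', 'China': 'y'}

fig, ax = plt.subplots()
# map() turns the Country column into a per-bar list of colors
bars = ax.bar(obj['People'], obj['Height'],
              color=obj['Country'].map(country_colors).tolist())
ax.set_xlabel('People')
ax.set_ylabel('Height')
```

Call plt.show() to display the chart. The colors are computed on the fly, so no extra column is added to obj.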
Here is my data structure:
data = {'2013': {1:25,2:81,3:15}, '2014': {1:28, 2:65, 3:75}, '2015': {1:78,2:91,3:86 }}
My x-axis is the number [1,2,3]
My y-axis is the quantity of each number. For example: In 2013, 1 is x axis while its quantity is 25.
I want to plot an individual graph for each year.
I would like to draw a bar chart with matplotlib, with a legend on it.
import matplotlib.pyplot as plt
import pandas as pd
data = {'2013': {1:25,2:81,3:15}, '2014': {1:28, 2:65, 3:75}, '2015': {1:78,2:91,3:86 }}
df = pd.DataFrame(data)
df.plot(kind='bar')
plt.show()
I like pandas because it takes your data without having to do any manipulation to it and plot it.
You can access the keys of a dictionary via dict.keys() and the values via dict.values().
If you wanted to plot, say, the data for 2013 you can do:
import matplotlib.pyplot as pl
x_13 = list(data['2013'].keys())
y_13 = list(data['2013'].values())
pl.bar(x_13, y_13, label='2013')
pl.legend()
That should do the trick. More elegantly, you can simply do:
year = '2013'
pl.bar(list(data[year].keys()), list(data[year].values()), label=year)
which would allow you to loop over the years:
for year in ['2013','2014','2015']:
    pl.bar(list(data[year].keys()), list(data[year].values()), label=year)
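A loop like the one above draws every year onto the same axes, so the bars overlap at x = 1, 2, 3. Since the question asks for an individual graph per year, one option is to give each year its own subplot. This is a minimal sketch under that reading of the question. The side-by-side layout and the figure size are illustrative choices.

```python
import matplotlib.pyplot as plt

data = {'2013': {1: 25, 2: 81, 3: 15},
        '2014': {1: 28, 2: 65, 3: 75},
        '2015': {1: 78, 2: 91, 3: 86}}

# One subplot per year, side by side, sharing the y-axis for comparison
fig, axes = plt.subplots(1, len(data), sharey=True, figsize=(9, 3))

for ax, (year, counts) in zip(axes, data.items()):
    xs = list(counts.keys())    # the numbers 1, 2, 3
    ys = list(counts.values())  # the quantity of each number
    ax.bar(xs, ys, label=year)
    ax.set_xticks(xs)
    ax.set_title(year)
    ax.legend()
```

Call plt.show() to display the three charts.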
You can do this in a few ways.
The Functional way using bar():
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = {'2013': {1: 25, 2: 81, 3: 15}, '2014': {1: 28, 2: 65, 3: 75}, '2015': {1: 78, 2: 91, 3: 86}}
df = pd.DataFrame(data)
X_axis = np.arange(len(df))
plt.bar(X_axis - 0.1, height=df["2013"], label='2013', width=.1)
plt.bar(X_axis, height=df["2014"], label='2014', width=.1)
plt.bar(X_axis + 0.1, height=df["2015"], label='2015', width=.1)
plt.xticks(X_axis, df.index)  # label the groups 1, 2, 3 instead of 0, 1, 2
plt.legend()
plt.show()
The Object-Oriented way using figure():
data = {'2013': {1: 25, 2: 81, 3: 15}, '2014': {1: 28, 2: 65, 3: 75}, '2015': {1: 78, 2: 91, 3: 86}}
df = pd.DataFrame(data)
fig = plt.figure()
axes = fig.add_axes([.1, .1, .8, .8])
X_axis = np.arange(len(df))
axes.bar(X_axis - .25, df["2013"], color='b', width=.25, label='2013')
axes.bar(X_axis, df["2014"], color='r', width=.25, label='2014')
axes.bar(X_axis + .25, df["2015"], color='g', width=.25, label='2015')
axes.set_xticks(X_axis)
axes.set_xticklabels(df.index)  # label the groups 1, 2, 3
axes.legend()
plt.show()