scatter plot with different colors and labels - python

I have a pandas dataframe. It looks like this:
I'm trying to create a scatter plot with a different color for each point. I tried:
df.plot.scatter(x='x',y='y',c=df.colors.tolist(), legend=True)
I get the scatter plot all right, but I'm not able to show a label indicating that "Financials" is associated with the color '#4ce068'.
How do I do that here? Thanks.
I tried:
df.plot.scatter(x='x',y='y',c=df.colors.tolist(),label=df.key.unique().tolist())
This almost works, but there are too many labels and the color associated with each label is hard to see.
I would like the key shown with its associated color, preferably at the top of the chart, i.e. next to the title. Is that possible?
My dataframe in dictionary format:
{'key': {0: 'Financials',
1: 'Consumer Discretionary',
2: 'Industrials',
3: 'Communication Services',
4: 'Communication Services',
5: 'Consumer Discretionary',
6: 'Health Care',
7: 'Information Technology',
8: 'Consumer Discretionary',
9: 'Information Technology'},
'x': {0: 1630000.0,
1: 495800000.0,
2: 562790000.0,
3: 690910000.0,
4: 690910000.0,
5: 753090000.0,
6: 947680000.0,
7: 1010000000.0,
8: 1090830000.0,
9: 1193600000.0},
'y': {0: 0.02549175,
1: 0.0383163,
2: -0.09842154,
3: 0.03876266,
4: 0.03596279,
5: 0.01897367,
6: 0.06159238,
7: 0.0291209,
8: 0.003931255,
9: -0.007134976},
'colors': {0: '#4ce068',
1: '#ecef4a',
2: '#ff3c83',
3: '#ff4d52',
4: '#ff4d52',
5: '#ecef4a',
6: '#00f1d1',
7: '#d9d9d9',
8: '#ecef4a',
9: '#d9d9d9'}}

If you don't mind using plotly, you can try:
import pandas as pd
import plotly.express as px
# map each key to its own color; a plain color sequence would be assigned by order of appearance
px.scatter(df, 'x', 'y', color="key", color_discrete_map=dict(zip(df["key"], df["colors"])))
But given your choice of colors, plain px.scatter(df, 'x', 'y', color="key") will look better.

Here's one way to do it:
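import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# draw one scatter series per color so that each gets its own legend entry
for i in df.colors.unique():
    df_color = df[df.colors==i]
    df_color.plot.scatter(x='x',y='y', ax=ax, c=i, label=df_color.key.iloc[0])
ax.legend(loc='best')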
You'll get
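The question also asks for the key to sit at the top of the chart, next to the title. A minimal sketch of one way to do that, assuming plain matplotlib rather than df.plot.scatter: group the rows by key and color, draw one scatter series per group, and anchor the legend just above the axes. The title text and the legend offsets are placeholders, not values from the question, and will likely need tuning.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question, written as plain lists
df = pd.DataFrame({
    "key": ["Financials", "Consumer Discretionary", "Industrials",
            "Communication Services", "Communication Services",
            "Consumer Discretionary", "Health Care", "Information Technology",
            "Consumer Discretionary", "Information Technology"],
    "x": [1630000.0, 495800000.0, 562790000.0, 690910000.0, 690910000.0,
          753090000.0, 947680000.0, 1010000000.0, 1090830000.0, 1193600000.0],
    "y": [0.02549175, 0.0383163, -0.09842154, 0.03876266, 0.03596279,
          0.01897367, 0.06159238, 0.0291209, 0.003931255, -0.007134976],
    "colors": ["#4ce068", "#ecef4a", "#ff3c83", "#ff4d52", "#ff4d52",
               "#ecef4a", "#00f1d1", "#d9d9d9", "#ecef4a", "#d9d9d9"],
})

fig, ax = plt.subplots()
# One scatter series per (key, color) pair gives one legend entry per key
for (key, color), group in df.groupby(["key", "colors"], sort=False):
    ax.scatter(group["x"], group["y"], color=color, label=key)

ax.set_title("Sectors", loc="left")  # placeholder title
# Anchor the legend above the top-right corner of the axes (offsets are assumptions)
legend = ax.legend(loc="lower right", bbox_to_anchor=(1.0, 1.0), ncol=2, frameon=False)
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Because the colors come from the dataframe's own colors column, each legend entry shows the same color as its points.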

Related

matplotlib DateFormatter not showing correct dates with yyyy-mm-dd column

I have a pandas dataframe with two columns, a numeric column (total_outbounds) for the y-axis and a date column (month, pardon the bad name) for the x-axis:
When I run this code to create a graph from this dataframe:
fig,ax = plt.subplots()
my_df.plot(x='month', y='total_outbounds', ax=ax, label = 'Total Email Outbounds on LE Change')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%y'))
plt.xlabel('')
plt.title('Total LE Changes and Outbounds by Month', pad = 10)
I receive a graph where the x-axis is not what I was hoping for. Am I using mdates.DateFormatter wrong? I would like mm/yy on the x-axis instead of the Apr, Jul, etc. that currently appear.
For reproducibility, here is the dataframe output with my_df.to_dict()
{'month': {0: Timestamp('2020-01-01 00:00:00'),
1: Timestamp('2020-02-01 00:00:00'),
2: Timestamp('2020-03-01 00:00:00'),
3: Timestamp('2020-04-01 00:00:00'),
4: Timestamp('2020-05-01 00:00:00'),
5: Timestamp('2020-06-01 00:00:00'),
6: Timestamp('2020-07-01 00:00:00'),
7: Timestamp('2020-08-01 00:00:00'),
8: Timestamp('2020-09-01 00:00:00'),
9: Timestamp('2020-10-01 00:00:00'),
10: Timestamp('2020-11-01 00:00:00'),
11: Timestamp('2020-12-01 00:00:00'),
12: Timestamp('2021-01-01 00:00:00'),
13: Timestamp('2021-02-01 00:00:00'),
14: Timestamp('2021-03-01 00:00:00')},
'total_outbounds': {0: 26364,
1: 33081,
2: 35517,
3: 34975,
4: 40794,
5: 51659,
6: 50948,
7: 65332,
8: 82839,
9: 96408,
10: 86923,
11: 99176,
12: 122199,
13: 116057,
14: 108439}}
You should be able to use pd.DataFrame.from_dict() to turn that dictionary back into the dataframe my_df. Please let me know if there's a more reproducible way to share the dataframe.
Edit: the solution in the comments works, but now I cannot rotate the minor ticks using plt.xticks(rotation=50); this only rotates the two major ticks. The x-axis values that appear are also odd (showing 71 as the year?).
As discussed in the comments, the Apr/Jul/Oct are minor ticks.
However, rather than customizing both major/minor ticks, I suggest increasing the major tick frequency, disabling minor ticks, and using autofmt_xdate() to style the date ticks:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots()
ax.plot(my_df.month, my_df.total_outbounds, label='Total Email Outbounds on LE Change')
ax.legend()
# increase the major tick frequency (8 ticks in this example)
start, end = ax.get_xlim()
xticks = np.linspace(start, end, 8)
ax.set_xticks(xticks)
ax.set_xticklabels(xticks)
# set date format
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%y'))
# use matplotlib's auto date styling
fig.autofmt_xdate()
# disable minor ticks
plt.minorticks_off()
I also had this issue. I finally fixed it, and I hope my experience helps make things clearer.
The cause is that the 'dates' in your [month] column are strings, not datetime objects, so DateFormatter doesn't recognise them the way you want.
So the first step is to convert the 'dates' in your [month] column into datetime objects. To do this, simply use:
df['month'] = pd.to_datetime(df['month'])
(Please be aware that this line of code may stop working if you set xlim using Matplotlib 3.4.2, but it works fine with Matplotlib 3.3.0.)
Then use the converted df['month'] as the x-axis to plot (for example):
fig, ax = plt.subplots(figsize=(12, 12))
ax.plot(
df['month'],
df['total_outbounds']
)
Then you can add the formatting (for example):
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b, %Y"))
Then everything just worked, at least on my Mac.

Creating column based subplots

I have the question below, which asks me to make 3 subplots using the data provided. The problem I have run into is that I need to select a specific genre from the column 'genre_1' for each of the three subplots, and I cannot figure out how to select the specific data. I have provided an example of what the output should look like.
from plotly.subplots import make_subplots
import plotly.graph_objects as go
movies = {'title_year': {0: 2016, 1: 2016, 2: 2016, 3: 2016, 4:2016},'Title': {0: 'La La Land', 1: 'Zootopia',2: 'Lion',3: 'Arrival', 4: 'Manchester by the Sea'},'Runtime': {0: 128, 1: 108, 2: 118, 3: 116, 4: 137},'IMDb_rating': {0: 8.2, 1: 8.1, 2: 8.1, 3: 8.0, 4: 7.9},'genre_1': {0: 'Action',1: 'Animation',2: 'Biography',3: 'Drama',4: 'Drama'}}
# Create a subplot, using column, 'genre_1' for three genres - 'Action','Drama','Biography'
sub_fig = make_subplots(rows=1, cols=3)
sub_fig.add_trace(go.Scatter(x='Runtime', y='IMDb_rating'), row=1, col=1)
sub_fig.add_trace(go.Scatter(x='Runtime', y='IMDb_rating'), row=1, col=2)
sub_fig.add_trace(go.Scatter(x='Runtime', y='IMDb_rating'), row=1, col=3)
Output
This should work:
import pandas as pd
df = pd.DataFrame(movies)  # the question's dictionary as a dataframe
list_genre = list(df.genre_1.unique())
sub_fig = make_subplots(rows=1, cols=len(list_genre), subplot_titles=list_genre)
# one trace per genre, each in its own column
for i, genre in enumerate(list_genre):
    sub_fig.add_trace(go.Scatter(x=df[df.genre_1==genre]["Runtime"],
                                 y=df[df.genre_1==genre]["IMDb_rating"]), row=1, col=i+1)
sub_fig.show()
Output:
EDIT:
This is the code that you need:
import pandas as pd
import plotly.express as px
genres_to_plot = ['Action','Drama','Biography']
movies = pd.DataFrame(movies)  # the question posts the data as a dictionary
subset_movies = movies[movies.genre_1.isin(genres_to_plot)].reset_index(drop=True)
fig = px.scatter(subset_movies, x = "Runtime", y = "IMDb_rating", color = "genre_1", facet_col="genre_1", height = 480, width = 850)
fig.show()
Output figure:
You simply needed to add the parameter facet_col to px.scatter. If you want a bubble plot, add size="actor_1_facebook_likes" (this column is not part of the sample data posted above).
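The step the question gets stuck on, selecting the rows for one genre, can be looked at on its own. A minimal sketch using only pandas, with the question's sample data: a boolean mask on genre_1 picks the rows, and the two plotted columns are kept. The name per_genre is made up for illustration.

```python
import pandas as pd

# Sample data from the question
movies = {
    "title_year": {0: 2016, 1: 2016, 2: 2016, 3: 2016, 4: 2016},
    "Title": {0: "La La Land", 1: "Zootopia", 2: "Lion", 3: "Arrival",
              4: "Manchester by the Sea"},
    "Runtime": {0: 128, 1: 108, 2: 118, 3: 116, 4: 137},
    "IMDb_rating": {0: 8.2, 1: 8.1, 2: 8.1, 3: 8.0, 4: 7.9},
    "genre_1": {0: "Action", 1: "Animation", 2: "Biography", 3: "Drama", 4: "Drama"},
}
df = pd.DataFrame(movies)

genres_to_plot = ["Action", "Drama", "Biography"]

# For each requested genre, keep only its rows and the two plotted columns
per_genre = {
    genre: df.loc[df["genre_1"] == genre, ["Runtime", "IMDb_rating"]]
    for genre in genres_to_plot
}

for genre, rows in per_genre.items():
    print(genre, rows["Runtime"].tolist(), rows["IMDb_rating"].tolist())
```

Each pair of columns would then be passed as x and y to go.Scatter for that genre's subplot, in place of the column-name strings used in the question.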

How to plot a pie chart in matplotlib with 3 columns?

I need to plot a pie chart using matplotlib but my DataFrame has 3 columns namely gender, segment and total_amount.
I have tried playing with plt.pie() arguments but it only takes x and labels for data. I tried setting gender as a legend but then it doesn't look right.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'gender': {0: 'Female',
1: 'Female',
2: 'Female',
3: 'Male',
4: 'Male',
5: 'Male'},
'Segment': {0: 'Gold',
1: 'Platinum',
2: 'Silver',
3: 'Gold',
4: 'Platinum',
5: 'Silver'},
'total_amount': {0: 2110045.0,
1: 2369722.0,
2: 1897545.0,
3: 2655970.0,
4: 2096445.0,
5: 2347134.0}})
plt.pie(data = df,x="total_amount",labels="Segment")
plt.legend(df.gender)
plt.show()
The result I want is a pie chart of total_amount and its labels as gender and segment. If I can get the percentage, it will be a bonus.
I suggest the following:
# Data to plot
# Join the Segment and gender columns into one label string
labels = df["Segment"] + " " + df["gender"].map(str)
# Extract the sizes of the segments
sizes = df["total_amount"]
# Plot with labels and percentage
plt.pie(sizes, labels=labels,autopct='%1.1f%%')
plt.show()
You should get this:
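If the combined labels feel crowded, a different layout is a nested pie: an inner ring for gender and an outer ring for the segments within each gender. This is a swapped-in technique, not part of the answer above; a minimal sketch assuming the rows are sorted by gender so the two rings line up, with ring widths and label distances chosen arbitrarily.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question, written as plain lists
df = pd.DataFrame({
    "gender": ["Female"] * 3 + ["Male"] * 3,
    "Segment": ["Gold", "Platinum", "Silver"] * 2,
    "total_amount": [2110045.0, 2369722.0, 1897545.0,
                     2655970.0, 2096445.0, 2347134.0],
})

# Sort so each gender's segments sit next to that gender's inner wedge
df = df.sort_values(["gender", "Segment"])
by_gender = df.groupby("gender", sort=True)["total_amount"].sum()

fig, ax = plt.subplots()

# Outer ring: one wedge per (gender, Segment) row
outer_wedges, outer_texts, outer_pcts = ax.pie(
    df["total_amount"].tolist(), labels=df["Segment"].tolist(), radius=1.0,
    autopct="%1.1f%%", pctdistance=0.85,
    wedgeprops=dict(width=0.3, edgecolor="white"),
)

# Inner ring: one wedge per gender, same starting angle and direction
inner_wedges, inner_texts, inner_pcts = ax.pie(
    by_gender.tolist(), labels=by_gender.index.tolist(), radius=0.7,
    autopct="%1.1f%%", pctdistance=0.75, labeldistance=0.4,
    wedgeprops=dict(width=0.3, edgecolor="white"),
)
ax.set(aspect="equal")
```

Both rings start at the same angle and cover the same total, so each gender wedge spans exactly its three segment wedges.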

seaborn scatterplot datetime xaxis too wide

I have this dataframe:
pd.DataFrame({'Depth': {0: 0.2,
1: 0.4,
2: 0.4,
3: 0.4,
4: 0.4,
5: 0.4,
6: 0.6000000000000001,
7: 0.4,
8: 3.2,
9: 2.0},
'DateTimeUTC': {0: Timestamp('2018-03-28 06:25:08'),
1: Timestamp('2018-03-28 06:25:49'),
2: Timestamp('2018-03-28 06:27:06'),
3: Timestamp('2018-03-28 06:32:11'),
4: Timestamp('2018-03-28 06:32:59'),
5: Timestamp('2018-03-28 06:34:02'),
6: Timestamp('2018-03-28 06:35:38'),
7: Timestamp('2018-03-28 06:37:04'),
8: Timestamp('2018-03-28 06:39:08'),
9: Timestamp('2018-03-28 06:40:52')}})
which looks like this:
<table>
<tr><th></th><th>Depth</th><th>DateTimeUTC</th></tr>
<tr><th>0</th><td>0.2</td><td>2018-03-28 06:25:08</td></tr>
<tr><th>1</th><td>0.4</td><td>2018-03-28 06:25:49</td></tr>
<tr><th>2</th><td>0.4</td><td>2018-03-28 06:27:06</td></tr>
<tr><th>3</th><td>0.4</td><td>2018-03-28 06:32:11</td></tr>
<tr><th>4</th><td>0.4</td><td>2018-03-28 06:32:59</td></tr>
<tr><th>5</th><td>0.4</td><td>2018-03-28 06:34:02</td></tr>
<tr><th>6</th><td>0.6</td><td>2018-03-28 06:35:38</td></tr>
<tr><th>7</th><td>0.4</td><td>2018-03-28 06:37:04</td></tr>
<tr><th>8</th><td>3.2</td><td>2018-03-28 06:39:08</td></tr>
<tr><th>9</th><td>2.0</td><td>2018-03-28 06:40:52</td></tr>
</table>
Note that all DateTimeUTC values are within 2018. When I try to plot depth vs. time using sns.scatterplot I get:
sns.scatterplot(x='DateTimeUTC', y='Depth', data=df)
Why does the X-axis start at year 2000? Am I doing something wrong?
I posted the question as an issue on GitHub and got this great response. Basically, the problem is that plt.scatter does not deal with dates well, and seaborn uses it. If seaborn adds a type check for the x-axis that uses plt.plot_date for date values instead, this will be fixed. In the meantime, one can create a custom version of sns.scatterplot that does exactly that.
As an alternative, you could use seaborn's lineplot, which does have a correct x-axis:
sns.lineplot(x='DateTimeUTC', y='Depth', data=df, marker='o')
Or you could use:
plt.plot(df['DateTimeUTC'], df['Depth'], linestyle='None', marker='o')
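Another option, not taken from the answers above, is to pin the x-axis limits to the data range yourself. A minimal sketch with matplotlib and the question's data; the one-minute padding and the %H:%M tick format are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "Depth": [0.2, 0.4, 0.4, 0.4, 0.4, 0.4, 0.6, 0.4, 3.2, 2.0],
    "DateTimeUTC": pd.to_datetime([
        "2018-03-28 06:25:08", "2018-03-28 06:25:49", "2018-03-28 06:27:06",
        "2018-03-28 06:32:11", "2018-03-28 06:32:59", "2018-03-28 06:34:02",
        "2018-03-28 06:35:38", "2018-03-28 06:37:04", "2018-03-28 06:39:08",
        "2018-03-28 06:40:52",
    ]),
})

fig, ax = plt.subplots()
# Markers only, as in the plt.plot alternative above
ax.plot(df["DateTimeUTC"], df["Depth"], linestyle="None", marker="o")

# Pin the x-axis to the data range plus a small margin
pad = pd.Timedelta(minutes=1)
ax.set_xlim((df["DateTimeUTC"].min() - pad).to_pydatetime(),
            (df["DateTimeUTC"].max() + pad).to_pydatetime())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Read the limits back as datetimes to confirm the visible range
left, right = (mdates.num2date(value) for value in ax.get_xlim())
```

The same set_xlim call should also work on the Axes that sns.scatterplot returns, since it is an ordinary matplotlib Axes.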

Colors for Python (seaborn): colors without adding to DataFrame

slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
I'd like to see this picture:
But I don't know how to do it. I want a red color for Russian people, a green color for USA people, and a yellow color for Chinese people.
My attempt to find a solution:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set(style="white")
slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
obj = pd.DataFrame(slov)
palette=["g", "b", "r"]
obj['Color']='r'
row_index = obj.Country == 'Russia'
obj.loc[row_index, 'Color'] = 'r'
row_index = obj.Country == 'USA'
obj.loc[row_index, 'Color'] = 'g'
row_index = obj.Country == 'China'
obj.loc[row_index, 'Color'] = 'y'
g = sns.factorplot(x="People", y="Height", data=obj, kind='bar', palette=obj['Color'])
plt.show()
Maybe my solution is not very good: I added the colors to the DataFrame, which does not seem quite right. How can I solve my task without adding these colors to my DataFrame?
You can use map with a dict:
d = {'Russia':'r', 'USA':'g','China':'y'}
g = sns.factorplot(x="People",
y="Height",
data=obj,
kind='bar',
palette=obj['Country'].map(d))
plt.show()
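In recent seaborn releases factorplot has been renamed to catplot, so the call above may not run unchanged. The same dictionary mapping also works with plain matplotlib; a minimal sketch, assuming a simple bar chart is an acceptable substitute for the seaborn figure. The colors are computed on the fly and never stored in the DataFrame.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Data from the question
slov = {
    "People": {0: "Ivan", 1: "John", 2: "Peter", 3: "Ming"},
    "Country": {0: "Russia", 1: "USA", 2: "USA", 3: "China"},
    "Height": {0: 181, 1: 175, 2: 174, 3: 173},
}
obj = pd.DataFrame(slov)

# Country-to-color mapping kept outside the DataFrame
country_colors = {"Russia": "r", "USA": "g", "China": "y"}
bar_colors = obj["Country"].map(country_colors).tolist()

fig, ax = plt.subplots()
# One bar per person, colored by that person's country
bars = ax.bar(obj["People"], obj["Height"], color=bar_colors)
ax.set_xlabel("People")
ax.set_ylabel("Height")
```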
