[Python3] How to plot 55 columns using seaborn/matplotlib

Thanks in advance.
I wonder how to use the Seaborn + Matplotlib combination to make a beautiful bar chart.
Here is the sample dataset that I have:
2015-01 2015-02 2015-03 2015-04 2015-05
negative 28 13 12 33 7
positive 78 20 19 3 55
neutral 17 5 45 24 9
I want the bar chart to look like the one I created in Excel (linked image). How can I do the same thing, or something similar, in Python?

If your data is in a pandas DataFrame, you can do:
df.T.plot.bar(width=0.8, edgecolor='w', linewidth=3)
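which gives a grouped bar chart with one group of bars per month and one bar per sentiment, separated by white edges.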
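A self-contained version of this one-liner, with the question's sample typed in directly, is sketched below. It assumes the rows are sentiments and the columns are months, as in the question; the axis labels, `rot=0` and the `Agg` backend line are illustrative additions. `plot.bar` forwards `width`, `edgecolor` and `linewidth` to Matplotlib's `bar`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question: one row per sentiment, one column per month
df = pd.DataFrame(
    {
        "2015-01": [28, 78, 17],
        "2015-02": [13, 20, 5],
        "2015-03": [12, 19, 45],
        "2015-04": [33, 3, 24],
        "2015-05": [7, 55, 9],
    },
    index=["negative", "positive", "neutral"],
)

# Transpose so months become the x-axis groups and sentiments become the bars
ax = df.T.plot.bar(width=0.8, edgecolor="w", linewidth=3, rot=0, figsize=(8, 4))
ax.set_xlabel("Month")
ax.set_ylabel("Count")
ax.legend(title="Sentiment")
plt.tight_layout()
```

Each sentiment column becomes one bar series, so 3 sentiments x 5 months gives 15 bars.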

It is also possible with seaborn, after reshaping the data to long form:
import pandas as pd
import io
import seaborn as sns
import matplotlib.pyplot as plt
# Data
df = pd.read_csv(io.StringIO("""
2015-01 2015-02 2015-03 2015-04 2015-05
negative 28 13 12 33 7
positive 78 20 19 3 55
neutral 17 5 45 24 9
"""), sep=" ", engine="python")
ndf = df.stack().reset_index()
ndf.columns = ['Emotion','Date','Count']
# Seaborn
sns.set_theme(style="whitegrid")
ax = sns.barplot(x="Date", hue="Emotion", y="Count", data=ndf)
plt.show()
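If you prefer `melt` to `stack`, the same long-form table can be built as follows. This is a minimal sketch that constructs the sample DataFrame directly instead of parsing text; the rows come out in a different order than with `stack`, which should not change the plot, since seaborn orders categories by first appearance and that order is the same here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same sample data as the question, built directly
df = pd.DataFrame(
    [[28, 13, 12, 33, 7], [78, 20, 19, 3, 55], [17, 5, 45, 24, 9]],
    index=["negative", "positive", "neutral"],
    columns=["2015-01", "2015-02", "2015-03", "2015-04", "2015-05"],
)

# Name the index, move it into a column, then unpivot the month columns
ndf = (
    df.rename_axis("Emotion")
    .reset_index()
    .melt(id_vars="Emotion", var_name="Date", value_name="Count")
)

sns.set_theme(style="whitegrid")
ax = sns.barplot(data=ndf, x="Date", y="Count", hue="Emotion")
plt.tight_layout()
```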

Related

Seaborn figure with multiple axis (year) and month on x-axis

I'm trying to get familiar with seaborn. I want to create one or both of these figures (a bar plot and a line plot), with the 12 months on the x-axis and each of the 3 years in its own line or bar color.
Here is the script that creates the data, with its output shown in comments.
#!/usr/bin/env python3
import random as rd
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
rd.seed(0)
a = pd.DataFrame({
'Y': [2016]*12 + [2017]*12 + [2018]*12,
'M': list(range(1, 13)) * 3,
'n': rd.choices(range(100), k=36)
})
print(a)
# Y M n
# 0 2016 1 84
# 1 2016 2 75
# 2 2016 3 42
# ...
# 21 2017 10 72
# 22 2017 11 89
# 23 2017 12 68
# 24 2018 1 47
# 25 2018 2 10
# ...
# 34 2018 11 54
# 35 2018 12 1
b = a.pivot_table(columns='M', index='Y')
print(b)
# n
# M 1 2 3 4 5 6 7 8 9 10 11 12
# Y
# 2016 84 75 42 25 51 40 78 30 47 58 90 50
# 2017 28 75 61 25 90 98 81 90 31 72 89 68
# 2018 47 10 43 61 91 96 47 86 26 80 54 1
I'm not even sure which form of the DataFrame (a, b, or something else) I should use here.
What I tried
I assume that, in seaborn terms, what I want is a countplot(). Maybe I am wrong?
>>> sns.countplot(data=a)
<AxesSubplot:ylabel='count'>
>>> plt.show()
The result is meaningless.
I don't know how to pass the pivoted DataFrame b to seaborn.
You could do the first plot with a relplot, using hue as a categorical grouping variable:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line')
I'd use these colour and size settings to make it more similar to the plot you wanted:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line', palette='pastel', height=3, aspect=3)
The equivalent axes-level code would be sns.lineplot(data=a, x='M', y='n', hue='Y', palette='pastel')
The second plot can be done with catplot:
sns.catplot(kind='bar', data=a, x='M', y='n', hue='Y')
Or use the axes-level function sns.barplot. In that case, move the legend outside the axes:
sns.barplot(data=a, x='M', y='n', hue='Y')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
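Regarding the pivoted frame b: seaborn's `x`/`y`/`hue` interface expects long-form data like a, so one option is to melt b back into that shape. A minimal sketch, using deterministic values in place of the question's random ones (an assumption made only so the output is reproducible):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Deterministic stand-in for the question's random data: 3 years x 12 months
a = pd.DataFrame({
    "Y": [2016] * 12 + [2017] * 12 + [2018] * 12,
    "M": list(range(1, 13)) * 3,
    "n": list(range(36)),
})
b = a.pivot_table(columns="M", index="Y")  # wide form, as in the question

# Back to long form: one row per (Y, M) pair with its value n
long_df = (
    b["n"]              # drop the outer 'n' column level
    .reset_index()      # Y becomes a regular column
    .melt(id_vars="Y", var_name="M", value_name="n")
)

ax = sns.barplot(data=long_df, x="M", y="n", hue="Y")
plt.tight_layout()
```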

dates.YearLocator() does not show years

Unfortunately, I cannot upload my dataset, but here is what it looks like:
UMTMVS month
DATE
1992-01-01 209438.0 1
1992-02-01 232679.0 2
1992-03-01 249673.0 3
1992-04-01 239666.0 4
1992-05-01 243231.0 5
1992-06-01 262854.0 6
1992-07-01 222832.0 7
1992-08-01 240299.0 8
1992-09-01 260216.0 9
1992-10-01 252272.0 10
1992-11-01 245261.0 11
1992-12-01 245603.0 12
1993-01-01 223258.0 1
1993-02-01 246941.0 2
1993-03-01 264886.0 3
1993-04-01 249181.0 4
1993-05-01 250870.0 5
1993-06-01 271047.0 6
1993-07-01 224077.0 7
1993-08-01 248963.0 8
1993-09-01 269227.0 9
1993-10-01 263075.0 10
1993-11-01 256142.0 11
1993-12-01 252830.0 12
1994-01-01 234097.0 1
1994-02-01 259041.0 2
1994-03-01 277243.0 3
1994-04-01 261755.0 4
1994-05-01 267573.0 5
1994-06-01 287336.0 6
1994-07-01 239931.0 7
1994-08-01 276947.0 8
1994-09-01 291357.0 9
1994-10-01 282489.0 10
1994-11-01 280455.0 11
1994-12-01 279888.0 12
1995-01-01 260175.0 1
1995-02-01 286290.0 2
1995-03-01 303201.0 3
1995-04-01 283129.0 4
1995-05-01 289257.0 5
1995-06-01 310201.0 6
1995-07-01 255163.0 7
1995-08-01 293605.0 8
1995-09-01 313228.0 9
1995-10-01 301301.0 10
1995-11-01 293164.0 11
1995-12-01 290963.0 12
1996-01-01 263041.0 1
1996-02-01 290317.0 2
I want a major tick at each year, so I ran the following code:
ax = df.UMTMVS.plot(figsize=(12, 5))
ax.xaxis.set_major_locator(dates.YearLocator())
but the resulting figure shows no year ticks at all.
Why does the locator fail to point out the years?
Try calling set_major_locator() on the axis before df.plot(), like this:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
# reading your sample data into dataframe
df = pd.read_clipboard()
# dates should be dates (datetime), not strings
df.index = pd.to_datetime(df.index)
fig, ax = plt.subplots(1, 1, figsize=(12, 5))
# set locator before df.plot()
ax.xaxis.set_major_locator(dates.YearLocator())
df.UMTMVS.plot(ax=ax)
plt.show()
A slightly different result can be achieved by replacing the last part of the code above with the following:
fig, ax = plt.subplots(1,1,figsize=(12, 5))
ax.plot(df.UMTMVS)
ax.xaxis.set_major_locator(dates.YearLocator())
plt.xlabel('DATE')
plt.show()
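One more option, not part of the original answer: pandas' `x_compat=True` plot argument tells pandas to skip its own period-based tick handling for regular time series, so Matplotlib date locators such as `YearLocator` apply as expected. A minimal sketch on synthetic monthly data (the values are made up; only the monthly DatetimeIndex mirrors the question):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib import dates
import numpy as np
import pandas as pd

# Synthetic monthly series (made-up values) with a DatetimeIndex like the question's
idx = pd.date_range("1992-01-01", periods=50, freq="MS")
df = pd.DataFrame({"UMTMVS": np.linspace(200000.0, 300000.0, 50)}, index=idx)

fig, ax = plt.subplots(figsize=(12, 5))
# x_compat=True: use Matplotlib date units instead of pandas' period-based ticks
df["UMTMVS"].plot(ax=ax, x_compat=True)
# Major tick on January 1 of each year, labelled with the year only
ax.xaxis.set_major_locator(dates.YearLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter("%Y"))
fig.canvas.draw()  # render so the tick labels are populated
year_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Here the locator is set after plotting, because pandas no longer installs its own time-series locator when `x_compat=True`.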
With this version, note the extra "padding" at both ends of the x-axis and the loss of the minor ticks.

Plotting different predictions with same column names and categories Python/Seaborn

I have a DataFrame with different groups and two predictions (IQR and median).
cntx_iqr pred_iqr cntx_median pred_median
18-54 83 K18-54 72
R18-54 34 R18-54 48
25-54 33 18-34 47
K18-54 29 18-54 47
18-34 27 R25-54 29
K18-34 25 25-54 23
K25-54 24 K25-54 14
R18-34 22 R18-34 8
R25-54 17 K18-34 6
Now I want to plot them using seaborn, and I have melted the data for plotting. However, the result does not look right to me.
pd.melt(df, id_vars=['cntx_iqr', 'cntx_median'], value_name='category', var_name="kind")
I want to compare the predictions (pred_iqr, pred_median) across the two groupings (cntx_iqr, cntx_median), perhaps with a stacked bar plot or some other useful plot that shows how each group differs between the two predictions.
Any help or suggestions would be appreciated.
Thanks in advance.
I'm not sure how you obtained the DataFrame, but you need to match the categories first:
df = df[['cntx_iqr','pred_iqr']].merge(df[['cntx_median','pred_median']],
left_on="cntx_iqr",right_on="cntx_median")
df.head()
cntx_iqr pred_iqr cntx_median pred_median
0 18-54 83 18-54 47
1 R18-54 34 R18-54 48
2 25-54 33 25-54 23
3 K18-54 29 K18-54 72
4 18-34 27 18-34 47
Once you have this, you can just make a scatterplot:
sns.scatterplot(x = 'pred_iqr',y = 'pred_median',data=df)
The bar plot requires reshaping the data to long form with melt:
sns.barplot(x = 'cntx_iqr', y = 'value', hue='variable',
data = df.melt(id_vars='cntx_iqr',value_vars=['pred_iqr','pred_median']))
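Putting the two steps together, here is a self-contained sketch using the sample table from the question. Drawing the scatter plot and the bar plot side by side on one figure is an arbitrary layout choice, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Sample table from the question: the two category columns are sorted independently
df = pd.DataFrame({
    "cntx_iqr": ["18-54", "R18-54", "25-54", "K18-54", "18-34",
                 "K18-34", "K25-54", "R18-34", "R25-54"],
    "pred_iqr": [83, 34, 33, 29, 27, 25, 24, 22, 17],
    "cntx_median": ["K18-54", "R18-54", "18-34", "18-54", "R25-54",
                    "25-54", "K25-54", "R18-34", "K18-34"],
    "pred_median": [72, 48, 47, 47, 29, 23, 14, 8, 6],
})

# Align the median predictions to the IQR categories
aligned = df[["cntx_iqr", "pred_iqr"]].merge(
    df[["cntx_median", "pred_median"]], left_on="cntx_iqr", right_on="cntx_median"
)
# Long form: one row per (category, prediction type)
long_df = aligned.melt(id_vars="cntx_iqr", value_vars=["pred_iqr", "pred_median"],
                       var_name="prediction", value_name="value")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
sns.scatterplot(data=aligned, x="pred_iqr", y="pred_median", ax=ax1)
sns.barplot(data=long_df, x="cntx_iqr", y="value", hue="prediction", ax=ax2)
plt.tight_layout()
```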

Plotting Graph IP vs Occurrences

I am trying to plot the number of occurrences of each IP address against the IP address itself. So far I have done this in Excel, but I want to automate the entire process using Python. The data I have is as follows:
5122 172.20.10.2
2419 74.125.103.105
1677 74.125.158.169
252 216.58.196.78
116 216.58.196.68
72 172.20.10.1
38 216.58.220.162
34 216.58.196.65
22 216.58.196.67
21 42.106.128.49
18 216.58.203.163
15 172.217.163.194
14 66.117.28.68
14 216.58.203.170
14 216.58.199.130
13 151.101.1.69
12 216.58.196.66
12 117.18.237.29
11 172.217.27.214
10 216.58.196.70
10 157.240.16.20
10 157.240.16.16
9 151.101.129.69
8 192.0.73.2
8 172.217.166.78
8 104.69.158.16
8 104.16.109.18
4 139.59.43.68
2 172.20.10.3
2 14.139.56.74
So far I have tried various ways of plotting this, such as storing it in an array, but I just can't make it work.
A little nudge would be really helpful.
With your data in a pandas DataFrame with the columns "ip" and "count", try this:
import seaborn as sns
import matplotlib.pyplot as plt
sns.barplot(x = "ip", y = "count", data = data)
plt.show()
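The answer assumes the data is already in a DataFrame. One way to get it there from the raw `count ip` lines is `pd.read_csv` with a whitespace separator, sketched below on the first eight rows of the question's data. The log scale and rotated labels are optional suggestions, since the counts range from a few to several thousand and the IP labels are long:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First rows of the question's output: "<count> <ip>" per line
raw = """\
5122 172.20.10.2
2419 74.125.103.105
1677 74.125.158.169
252 216.58.196.78
116 216.58.196.68
72 172.20.10.1
38 216.58.220.162
34 216.58.196.65
"""

# Whitespace-separated, no header row; IPs stay strings because they are not numbers
data = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=["count", "ip"])

fig, ax = plt.subplots(figsize=(10, 4))
sns.barplot(data=data, x="ip", y="count", ax=ax)
ax.set_yscale("log")                          # counts span several orders of magnitude
ax.tick_params(axis="x", labelrotation=90)    # keep long IP labels readable
plt.tight_layout()
```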

Using pandas in python I am trying to group data from price ranges

Here is the code I am running. It creates a bar plot, but I would like each bar in the graph to group together values within $5 of each other. The bar graph currently shows all 50 values as individual bars, which makes the data nearly unreadable. Would a histogram be a better option? Also, bdf holds the bids and adf holds the asks.
import gdax
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
s = 'sequence'
b = 'bids'
a = 'asks'
public_client = gdax.PublicClient()
o = public_client.get_product_order_book('BTC-USD', level=2)
df = pd.DataFrame(o)
bdf = pd.DataFrame(o[b], columns=['price', 'size', 'null'], dtype='float')
# asks come from o[a], not o[b]
adf = pd.DataFrame(o[a], columns=['price', 'size', 'null'], dtype='float')
del bdf['null']
bdf.plot.bar(x='price', y='size')
plt.show()
pause = input('pause')
Here is an example of the data I receive as a DataFrame object.
price size
0 11390.99 13.686618
1 11389.40 0.002000
2 11389.00 0.090700
3 11386.53 0.060000
4 11385.26 0.010000
5 11385.20 0.453700
6 11381.33 0.006257
7 11380.06 0.011100
8 11380.00 0.001000
9 11378.61 0.729421
10 11378.60 0.159554
11 11375.00 0.012971
12 11374.00 0.297197
13 11373.82 0.005000
14 11373.72 0.661006
15 11373.39 0.001758
16 11373.00 1.000000
17 11370.00 0.082399
18 11367.22 1.002000
19 11366.90 0.010000
20 11364.67 1.000000
21 11364.65 6.900000
22 11364.37 0.002000
23 11361.23 0.250000
24 11361.22 0.058760
25 11360.89 0.001760
26 11360.00 0.026000
27 11358.82 0.900000
28 11358.30 0.020000
29 11355.83 0.002000
30 11355.15 1.000000
31 11354.72 8.900000
32 11354.41 0.250000
33 11353.00 0.002000
34 11352.88 1.313130
35 11352.19 0.510000
36 11350.00 1.650228
37 11349.90 0.477500
38 11348.41 0.001762
39 11347.43 0.900000
40 11347.18 0.874096
41 11345.42 7.800000
42 11343.21 1.700000
43 11343.02 0.001754
44 11341.73 0.900000
45 11341.62 0.002000
46 11341.00 0.024900
47 11340.00 0.400830
48 11339.77 0.002946
49 11337.00 0.050000
Is pandas the best way to manipulate this data?
I'm not sure I understand correctly, but if you want the total bid size for each $5 price step, here is how you can do it:
> df["size"].groupby((df["price"]//5)*5).sum()
price
11335.0 0.052946
11340.0 3.029484
11345.0 10.053358
11350.0 12.625358
11355.0 1.922000
11360.0 8.238520
11365.0 1.012000
11370.0 2.047360
11375.0 0.901946
11380.0 0.018357
11385.0 0.616400
11390.0 13.686618
Name: size, dtype: float64
You can also use pd.cut here:
df['bin']=pd.cut(df.price,bins=3)
df.groupby('bin')['size'].sum().plot(kind='bar')
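Note that `bins=3` splits the price range into three equal-width bins rather than $5 steps. To get $5 bins with `pd.cut`, pass explicit edges. A sketch on the first ten bid rows from the question; `right=False` makes each bin include its left edge, which matches the `//5` grouping above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First ten bid rows from the question's sample
bdf = pd.DataFrame({
    "price": [11390.99, 11389.40, 11389.00, 11386.53, 11385.26,
              11385.20, 11381.33, 11380.06, 11380.00, 11378.61],
    "size": [13.686618, 0.002000, 0.090700, 0.060000, 0.010000,
             0.453700, 0.006257, 0.011100, 0.001000, 0.729421],
})

step = 5.0
lo = np.floor(bdf["price"].min() / step) * step          # 11375.0 for this sample
hi = np.floor(bdf["price"].max() / step) * step + step   # 11395.0 for this sample
edges = np.arange(lo, hi + step / 2, step)                # 11375, 11380, ..., 11395

# Left-closed $5 bins, then total size per bin
price_bins = pd.cut(bdf["price"], bins=edges, right=False)
totals = bdf.groupby(price_bins, observed=False)["size"].sum()

# Cross-check against the floor-division grouping from the previous answer
ref = bdf["size"].groupby((bdf["price"] // step) * step).sum()

ax = totals.plot(kind="bar")
ax.set_xlabel("price bin")
ax.set_ylabel("total size")
plt.tight_layout()
```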
