I am trying to plot FRED's economic data using matplotlib/seaborn, but the values are floating-point numbers, and instead of using a continuous range, matplotlib quite literally uses every value as a distinct y-axis point. I need to plot this in a way where the changes are apparent. I tried to specify the y-axis range with yticks, but it still does not work. Here's my code:
mort30=pd.read_csv('Dataset/MORTGAGE30US.csv')
mort30['DATE']= pd.DateTimeIndex(mort30['DATE']).years # to get only year values on the x-axis
sns.lineplot(data=mort30, x='DATE', y='MORTGAGE30US')
plt.yticks(np.arange(1,11,step=1))
Any other ideas that could work? Here is the dataset link for the graph (P.S. go to edit graph and change frequency to Annual for simplicity)
Your y-data are objects, not numerical values. Take a look at the CSV: the last line contains "." instead of a number, so pandas reads the whole column as strings (dtype object).
mort30['MORTGAGE30US']
47 4.5446153846153846
48 3.9357692307692308
49 3.1116981132075472
50 .
Name: MORTGAGE30US, dtype: object
Next time, please add a runnable example. The code you showed does not run (pandas has pd.DatetimeIndex, not pd.DateTimeIndex, and the attribute is .year, not .years); it should be:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
mort30=pd.read_csv('MORTGAGE30US.csv')
mort30['DATE']= pd.DatetimeIndex(mort30['DATE']).year # to get only year values on the x-axis
sns.lineplot(data=mort30, x='DATE', y='MORTGAGE30US')
plt.yticks(np.arange(1,11,step=1))
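To actually fix the object dtype, you can tell pandas to treat the "." placeholder as missing, so the column is parsed as float. Below is a minimal sketch, assuming the CSV has the two columns shown above. The inline rows are a hypothetical stand-in for the real file: the values come from the output above, but the dates are my guess at the annual index.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for MORTGAGE30US.csv; FRED writes "." for a missing value
csv_text = """DATE,MORTGAGE30US
2018-01-01,4.5446153846153846
2019-01-01,3.9357692307692308
2020-01-01,3.1116981132075472
2021-01-01,.
"""

# na_values="." turns the placeholder into NaN, so the column becomes float64
mort30 = pd.read_csv(io.StringIO(csv_text), na_values=".")
mort30["DATE"] = pd.to_datetime(mort30["DATE"]).dt.year

# If the frame is already loaded with dtype object, this does the same conversion:
# mort30["MORTGAGE30US"] = pd.to_numeric(mort30["MORTGAGE30US"], errors="coerce")

fig, ax = plt.subplots()
ax.plot(mort30["DATE"], mort30["MORTGAGE30US"])  # the NaN point is simply not drawn
ax.set_ylabel("30-year mortgage rate (%)")
```

Once the column is numeric, matplotlib chooses a continuous y-range on its own, so the plt.yticks call should no longer be needed.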
I'm very new to Python and am trying to plot all the columns in my data frame in separate plots.
The data frame has 45 columns named V1_category, V2_category, V3_category, and so on up to V45_category.
Each entry has one of the four values: neutral, pleasant, unpleasant, painful. I need to somehow count how often these 4 values occur in each of the 45 columns and then plot these as 45 individual histograms (possibly in one figure?). I want the plots to be nicely formatted so I guess matplotlib would be the most useful?
Any help or suggestions would be much appreciated! :)
I guess what you need is a barplot. There are many options for visualizing these categories; see the seaborn documentation on categorical plots for more.
Below I create a DataFrame that looks like yours:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
categories = ['neutral','pleasant','unpleasant','painful']
df = pd.DataFrame(np.random.choice(categories,(100,45)),
columns=["V"+str(i)+"_category" for i in np.arange(1,46)])
To use it in seaborn, the DataFrame needs to be in long format, so we melt it:
df.melt()
variable value
0 V1_category neutral
1 V1_category unpleasant
2 V1_category pleasant
3 V1_category unpleasant
4 V1_category unpleasant
And we pass this directly into seaborn:
# newer seaborn versions require x to be passed as a keyword argument
sns.catplot(x='value', data=df.melt(), col="variable",
            kind="count", col_wrap=5, height=5, aspect=2)
plt.show()
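If you would rather stay in plain matplotlib, as the question suggests, one rough sketch is to count each column with value_counts and draw one bar chart per subplot. It rebuilds the random df from above so it runs on its own; the 9×5 grid and the figure size are my own choices, not anything required.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

categories = ['neutral', 'pleasant', 'unpleasant', 'painful']
df = pd.DataFrame(np.random.choice(categories, (100, 45)),
                  columns=["V" + str(i) + "_category" for i in np.arange(1, 46)])

# Rows = categories, columns = V1_category..V45_category;
# reindex fixes the category order and fills a category absent from a column with 0
counts = df.apply(lambda col: col.value_counts().reindex(categories, fill_value=0))

fig, axes = plt.subplots(9, 5, figsize=(20, 30), sharey=True)
for ax, column in zip(axes.flat, counts.columns):
    ax.bar(counts.index, counts[column])
    ax.set_title(column)
    ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
```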
I'm trying to create a single barplot from multiple dataframe columns each of which is a categorical variable (all based on the same levels). I want it to show a count of the levels occurring in each column.
The code below achieves what I want, but on 4 different bar plots. I'd like it all on one plot, so the bars are side by side (labels/legend would be rad). I'm trying to get a clean, simple solution using matplotlib, but so far I can't figure it out. Help?
Thanks!
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({"A":['cow','pig','horse','goat','cow'], "B":['cow','pig','horse','cow','goat'], "C":['pig','horse','goat','pig','cow'], "D":['cow','pig','horse','horse','goat'], "E":['pig','horse','goat','cow','goat']})
levels = np.sort(df['A'].unique())
df.A.value_counts()[levels].plot(kind='bar')
df.B.value_counts()[levels].plot(kind='bar')
df.C.value_counts()[levels].plot(kind='bar')
df.D.value_counts()[levels].plot(kind='bar')
You should apply pd.Series.value_counts to every column and plot a bar graph, stacked or unstacked.
If you need each column as its own set of bars, side by side:
df.apply(pd.Series.value_counts).plot(kind='bar')
If you need them stacked:
df.apply(pd.Series.value_counts).plot(kind='bar', stacked=True)
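Here is a slightly fuller sketch of the unstacked version, with axis labels and a legend title added. The fillna(0) matters only when some level never appears in a column (value_counts then leaves a NaN); the label strings are just placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": ['cow', 'pig', 'horse', 'goat', 'cow'],
                   "B": ['cow', 'pig', 'horse', 'cow', 'goat'],
                   "C": ['pig', 'horse', 'goat', 'pig', 'cow'],
                   "D": ['cow', 'pig', 'horse', 'horse', 'goat'],
                   "E": ['pig', 'horse', 'goat', 'cow', 'goat']})

# One row per animal, one column per original column; missing levels become 0
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)

ax = counts.plot(kind='bar', rot=0)  # grouped bars, one colour per column
ax.set_xlabel("animal")
ax.set_ylabel("count")
ax.legend(title="column")
```

Passing stacked=True to the same counts.plot call gives the stacked variant.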
I'm currently trying to plot 7 days of data whose values range from small to large.
The first set of data may look like this:
dates = ['2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23', '2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27']
values = [107.660514, 107.550403, 107.435041, 107.435003, 107.574965, 107.449961, 107.650052, 107.649974]
whereas another set of data may have the same dates but much smaller incremental changes in the values:
dates = ['2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23', '2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27']
values = [0.849215, 0.849655, 0.849655, 0.851095, 0.850885, 0.850135, 0.851203, 0.851865]
When I use this
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
plt.plot_date(x=dates, y=values, fmt="r--")
plt.ylabel(c)
plt.grid(True)
plt.savefig('static/%s.png' % c)
The resulting image for the 1st set of values comes out as a dashed line connecting the dots for each day. But the 2nd set of data produces an image of 7 parallel lines stacked on top of each other.
Should I be plotting this differently?
I assume you would like to compare the two sets of data you provided.
However, with such a large gap between the two sets, showing both on the same axes would make the small changes in the second set hard to see.
You could use plt.subplots() to put both sets in one figure.
A better way, though, is to show the two plots separately, which gives a much clearer picture.
If you just want to show two plots, give each data set its own subplot with its own y-axis.
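A minimal sketch of the two-subplot approach, using the data from the question. I convert the date strings with pd.to_datetime because matplotlib treats plain strings as categorical labels. The y-axis labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.to_datetime(['2018-09-20', '2018-09-21', '2018-09-22', '2018-09-23',
                        '2018-09-24', '2018-09-25', '2018-09-26', '2018-09-27'])
large = [107.660514, 107.550403, 107.435041, 107.435003,
         107.574965, 107.449961, 107.650052, 107.649974]
small = [0.849215, 0.849655, 0.849655, 0.851095,
         0.850885, 0.850135, 0.851203, 0.851865]

# Two stacked axes share the date axis, but each autoscales its own y-range
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
ax1.plot(dates, large, "r--o")
ax1.set_ylabel("first set")
ax1.grid(True)
ax2.plot(dates, small, "b--o")
ax2.set_ylabel("second set")
ax2.grid(True)
fig.autofmt_xdate()  # rotate the date labels so they don't overlap
```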
I'm plotting a scatter plot using a pandas dataframe. This works correctly, but I wanted to use seaborn themes and special functions. When I plot the same data points with seaborn, the y-axis values remain almost invisible. The x-axis values range from 5000 to 15000, while the y-axis values are in [-6, 6]×10^-7.
If I multiply the y-axis values by 10^6, they display correctly, but the actual values remain invisible/indistinguishable in a seaborn-generated plot.
How can I get seaborn to scale the y-axis automatically in the resulting plot?
Also, some rows contain NaN (not in this case). How can I ignore them while plotting, short of manually weeding out the rows containing NaN?
Below is the code I've used to plot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("datascale.csv")
subdf = df.loc[(df.types == "easy") & (df.weight > 1300), ]
subdf = subdf.iloc[1:61, ]
subdf.drop(subdf.index[[25]], inplace=True) #row containing NaN
subdf.plot(x='length', y='speed', style='s') #scales y-axis correctly
sns.lmplot(x="length", y="speed", data=subdf, fit_reg=True, lowess=True) #doesn't scale y-axis properly
# multiplying by 10^6 displays the plot correctly, in matplotlib
plt.scatter(subdf['length'], 10**6*subdf['speed'])
Strange that seaborn does not scale the axis correctly. Nonetheless, you can correct this behaviour. First, get a reference to the FacetGrid that lmplot returns:
lm = sns.lmplot(x="length", y="speed", data=subdf, fit_reg=True)
After that you can manually set the y-axis limits:
lm.axes[0,0].set_ylim(min(subdf.speed), max(subdf.speed))
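lm.axes[0,0] is an ordinary matplotlib Axes, so the same set_ylim trick works without seaborn. Below is a small sketch on hypothetical data shaped like the question (x from 5000 to 15000, y within ±6×10^-7, one NaN row). Padding the limits by 5% is my own choice: it keeps the extreme points off the frame, which exact min/max limits would not do.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for subdf: tiny y-values plus one missing entry
rng = np.random.default_rng(0)
subdf = pd.DataFrame({"length": np.linspace(5000, 15000, 60),
                      "speed": rng.uniform(-6e-7, 6e-7, 60)})
subdf.loc[25, "speed"] = np.nan

# Explicitly drop NaN rows (matplotlib would also just skip them)
clean = subdf.dropna(subset=["speed"])

fig, ax = plt.subplots()
ax.plot(clean["length"], clean["speed"], "s")

# Set the y-limits from the data range, padded by 5% on each side
lo, hi = clean["speed"].min(), clean["speed"].max()
pad = 0.05 * (hi - lo)
ax.set_ylim(lo - pad, hi + pad)
```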
Seaborn and matplotlib should just ignore NaN values when plotting. You should be able to leave them as is.
As for the y scaling: there might be a bug in seaborn.
The most basic workaround is still to scale the data before plotting.
Scale speed to microspeed in the DataFrame and plot microspeed instead:
subdf['microspeed']=subdf['speed']*10**6
Or log-transform y before plotting, e.g.:
import math
df = pd.DataFrame({'speed':[1, 100, 10**-6]})
df['logspeed'] = df['speed'].map(lambda x: math.log(x,10))
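then plot logspeed instead of speed. Note that math.log raises a ValueError for zero or negative values, so this only works if every speed is positive.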
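The same transform can be written without map and math.log by using NumPy's vectorized np.log10. Like the original, this assumes all speeds are strictly positive. NumPy returns -inf or NaN with a warning instead of raising an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'speed': [1, 100, 10**-6]})

# Vectorized base-10 log of the whole column at once
df['logspeed'] = np.log10(df['speed'])
```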
Another approach would be to use seaborn regplot instead.
Matplotlib correctly scales and plots it for me as follows:
plt.plot(subdf['length'], subdf['speed'], 'o')
I have a dataset similar to this format, X = [[1,4,5], [34,70,1,5], [43,89,4,11], [22,76,4]], where the inner lists have different lengths.
I want to create a checkerboard plot of 4 rows and 4 columns where the color of each cell corresponds to its value, with a colorbar. In this dataset some cells will be missing (e.g. the 4th column of the first row).
How would I plot this in python using matplotlib?
Thanks
You can use the seaborn library or matplotlib to generate a heatmap. First, convert the data to a pandas DataFrame, which pads the shorter rows with NaN to handle the missing values.
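import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
# %matplotlib inline  (Jupyter magic for older notebooks; not valid in a plain .py script)
# Shorter rows are padded with NaN; sns.heatmap leaves those cells blank
df = pd.DataFrame([[1,4,5],[34,70,1,5], [43,89,4,11],[22,76,4]])
sns.heatmap(df)
plt.show()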
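Since the question asked for matplotlib, here is a rough seaborn-free sketch. It pads the rows through pandas as above, masks the NaN cells so pcolormesh leaves them blank, and adds a colorbar. The white cell edges and the inverted y-axis (so the first list appears as the top row) are my own styling choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

X = [[1, 4, 5], [34, 70, 1, 5], [43, 89, 4, 11], [22, 76, 4]]

# Ragged rows are padded with NaN, then masked so those cells are not drawn
grid = np.ma.masked_invalid(pd.DataFrame(X).to_numpy(dtype=float))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(grid, edgecolors="white", linewidth=1)
ax.invert_yaxis()  # put X[0] at the top, matching the list order
fig.colorbar(mesh, ax=ax)
```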