I have this dataframe:
and I need a chart similar to this:
I recommend reading this documentation on plotting with pandas:
https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
After that, plotting is simply df.plot() plus a few additional lines for colors, axis labels, etc.
Additional remark: you should not post data as an image. See https://stackoverflow.com/questions/how-to-ask
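Neither the dataframe nor the target chart is shown, so this is only a generic sketch of the df.plot() approach. It assumes a line chart with one line per column. The function name plot_frame and its parameters are illustrative; they are not part of the pandas API.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_frame(df, title="", xlabel="", ylabel="", colors=None):
    """Line-plot every column of df on one Axes, then add a title, axis labels and a light grid."""
    # Pass color= only when the caller supplied colors, so pandas keeps its default color cycle otherwise
    kwargs = {} if colors is None else {"color": colors}
    ax = df.plot(figsize=(8, 4), **kwargs)  # one line per column, index on the x axis
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.grid(True, alpha=0.3)
    return ax
```

Usage would be something like plot_frame(df, title="...", xlabel="...", ylabel="...", colors=["tab:blue", "tab:orange"]) followed by plt.show(). A bar chart would use df.plot.bar() (or kind="bar") in the same way.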
I currently have a dataframe called df with 18 columns, and I have plotted a histogram of each to check the distribution shape of each variable using the hist() function in pandas:
df.hist(figsize=(30,30))
What I now want to do is add a boxplot above each histogram, so I can see at a glance which variables contain outliers. I want the plot to look like this:
I can plot the boxplot using the following code, but it displays all of the boxplots on a single plot:
df.boxplot(figsize=(30,30))
I can also add a group by, but that isn't what I need. I just want each histogram in my df.hist() plot to be overlaid with the boxplot of the same column. I suspect I could write a function to do this, but since the hist function is quite intuitive, I suspect there is a straightforward way that I'm missing.
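As far as I know, DataFrame.hist() has no option that adds boxplots. A minimal sketch does it with Matplotlib's nested GridSpec instead. Each numeric column gets one cell, and inside that cell a short horizontal boxplot sits above the histogram, sharing its x axis. The helper name hist_with_boxplots and its defaults are hypothetical.

```python
import math

import matplotlib.pyplot as plt


def hist_with_boxplots(df, ncols=3, bins=20, figsize=(15, 10)):
    """For every numeric column of df, draw a horizontal boxplot stacked above its histogram.

    Returns the figure and a dict mapping column name -> (boxplot Axes, histogram Axes).
    """
    cols = df.select_dtypes("number").columns
    nrows = math.ceil(len(cols) / ncols)
    fig = plt.figure(figsize=figsize)
    # Outer grid: one cell per numeric column, similar to the layout df.hist() produces
    outer = fig.add_gridspec(nrows, ncols, hspace=0.4, wspace=0.3)
    axes = {}
    for i, col in enumerate(cols):
        # Inner grid: a short row for the boxplot on top, a taller row for the histogram
        inner = outer[i // ncols, i % ncols].subgridspec(2, 1, height_ratios=[1, 4], hspace=0.05)
        ax_box = fig.add_subplot(inner[0])
        ax_hist = fig.add_subplot(inner[1], sharex=ax_box)  # same x scale, so outliers line up
        data = df[col].dropna().to_numpy()
        # vert=False draws the box horizontally (Matplotlib 3.10+ also offers orientation="horizontal")
        ax_box.boxplot(data, vert=False)
        ax_box.set_yticks([])                  # a single box needs no y tick
        ax_box.tick_params(labelbottom=False)  # show x tick labels only under the histogram
        ax_box.set_title(col)
        ax_hist.hist(data, bins=bins)
        axes[col] = (ax_box, ax_hist)
    return fig, axes
```

With 18 columns, hist_with_boxplots(df, ncols=3, figsize=(30, 30)) followed by plt.show() would give a 6 x 3 grid of boxplot/histogram pairs. Sharing the x axis is the key design choice: points drawn as outliers in the boxplot sit directly above the corresponding tail of the histogram.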
I posted this question about 3D plots of data frames:
3D plot of 2d Pandas data frame
and a user very helpfully referred me to this:
Plotting Pandas Crosstab Dataframe into 3D bar chart
It was useful and the code worked in principle, but the result looks like a mess (see image below) for several reasons:
I have a huge number of values to plot (470 or so along the y-axis), so perhaps a bar chart is not the best choice (I am going for a histogram-like look, so I assumed very narrow bars would be suitable)
my counts (z axis) convey almost no information, because the differences I need to see lie between 100 and the maximum value
how can I make the 3D plot interactive (able to rotate it, etc.)? I have seen it done in blogs/videos, but I'm not sure whether it's a setting under Tools -> Preferences that I can't find
Regarding the second issue, which seemed simple enough, I tried to change the z-axis limits as I would for a 2D plot, by adding:
ax.set_zlim([110,150])
just before the axis labels, but obviously this is the wrong way:
So do I have to filter the original data set (i.e. drop values < 110), or is there a way to do this from the plot?
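A minimal sketch of handling it on the plot side, assuming the crosstab has already been flattened into x, y and count arrays. The helper name bar3d_from_floor is hypothetical and does not come from the linked answer. In mplot3d, set_zlim() changes the view but by default does not clip the bars, which is why they stick out of the box. Starting each bar at the lower limit avoids that. Bars below the limit are still dropped, but only inside the plotting call, so the DataFrame itself stays untouched.

```python
import numpy as np
import matplotlib.pyplot as plt


def bar3d_from_floor(ax, x, y, counts, zmin, dx=0.4, dy=0.4, zmax=None):
    """Draw 3D bars whose base is z = zmin instead of z = 0.

    Bars with count <= zmin are skipped; the rest get height (count - zmin)
    starting at zmin, so they fit inside the [zmin, zmax] z-range.
    Returns the number of bars drawn.
    """
    x, y, counts = (np.asarray(a, dtype=float) for a in (x, y, counts))
    keep = counts > zmin
    if keep.any():
        # bar3d(x, y, z, dx, dy, dz): z is the base of each bar, dz its height
        ax.bar3d(x[keep], y[keep], np.full(keep.sum(), zmin), dx, dy, counts[keep] - zmin)
    ax.set_zlim(zmin, counts.max() if zmax is None else zmax)
    return int(keep.sum())
```

Used as fig = plt.figure(); ax = fig.add_subplot(projection="3d"); bar3d_from_floor(ax, x, y, counts, zmin=110). To rotate the result, the figure has to be shown through an interactive backend rather than as a static inline image. In an IPython console, for example, %matplotlib qt opens a separate window (this assumes a Qt binding is installed). In Spyder, the corresponding choice is the graphics backend option in the IPython console preferences.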
Might seem like a repeat question, but the solution in this post doesn't seem to work for me.
I have a bunch of data I want to plot as lines/curves, and another dataset linked to the curves consisting of XYZ data, where Z represents a labeling variable for the curves.
I've got some example code here with some XY data, and labels for anyone wanting to replicate what I'm doing:
plt.plot(xdata, ydata)
plt.scatter(xlab, ylab, c=lab) # needs a marker function adding
plt.show()
Ideally I want to add a unique marker for each label value: 0.1, 0.5, 1, 2, 3, 4, 6, 8, 10, 20. The labels are the same for each curve.
I have over 100 curves to plot, so something quick and effective is needed. Any help would be great!
My current solution would be to split the data by label value and plot each subset separately (long and messy, in my opinion). I figured someone might have a more elegant solution.
I'm guessing you could do this with a dictionary... but I might need some help doing that!
Cheers, KB
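The dictionary idea can be sketched in plain Matplotlib. The sketch assumes each curve is available as the five arrays from the question (xdata, ydata, xlab, ylab, lab), bundled into a tuple; that layout, the helper name, and the particular marker symbols are all hypothetical choices. Because the points are pooled across curves, there is one scatter() call per label value rather than one per curve, which should stay quick for 100+ curves.

```python
import numpy as np
import matplotlib.pyplot as plt

# Label values from the question mapped to arbitrary, distinct Matplotlib marker symbols
MARKER_FOR_LABEL = dict(zip(
    [0.1, 0.5, 1, 2, 3, 4, 6, 8, 10, 20],
    ["o", "s", "^", "v", "D", "P", "X", "*", "<", ">"],
))


def plot_curves_with_label_markers(ax, curves, marker_for=MARKER_FOR_LABEL):
    """Plot every curve as a line, then overlay the labelled points with one marker per label value.

    `curves` is an iterable of (xdata, ydata, xlab, ylab, lab) tuples (hypothetical layout).
    Returns a dict mapping label value -> the PathCollection drawn for it.
    """
    xs, ys, labs = [], [], []
    for xdata, ydata, xlab, ylab, lab in curves:
        ax.plot(xdata, ydata, color="0.7", linewidth=1)  # grey lines so the markers stand out
        xs.append(np.asarray(xlab, dtype=float))
        ys.append(np.asarray(ylab, dtype=float))
        labs.append(np.asarray(lab, dtype=float))
    xs, ys, labs = np.concatenate(xs), np.concatenate(ys), np.concatenate(labs)

    handles = {}
    for value, marker in marker_for.items():
        mask = np.isclose(labs, value)  # tolerant comparison for float labels such as 0.1
        if mask.any():
            handles[value] = ax.scatter(xs[mask], ys[mask], marker=marker, label=str(value), zorder=3)
    ax.legend(title="label")
    return handles
```

Because the markers are grouped by label value, the legend has one entry per value (at most ten) rather than one per curve.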
Matplotlib's scatter() does not accept a different marker for each point within a single call.
However, a less verbose and more robust solution for large datasets is to use the pandas and seaborn libraries:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
url = 'https://pastebin.com/raw/dwGBLqSb' # url of paste
df = pd.read_csv(url)
sns.scatterplot(data=df, x='labx', y='laby', style='lab')
plt.show()
and it produces the following example:
If you need more advanced labelling, you could also look at scikit-learn's LabelEncoder.
Additionally, you can use the pandas.cut function to plot bins. I regularly need this for graphs where I use a third, continuous value as a parameter.
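For the pandas.cut suggestion, here is a minimal sketch, assuming a continuous third column z that should be shown as a handful of marker styles. The helper name, bin edges and bin labels are hypothetical. pandas.cut turns z into named categories, and seaborn's style= then maps each category to its own marker.

```python
import pandas as pd
import seaborn as sns


def scatter_with_binned_style(df, x, y, z, bins, labels, ax=None):
    """Scatter x against y, with the continuous column z binned by pandas.cut into marker styles."""
    binned = df.copy()
    # include_lowest=True keeps values equal to the first bin edge in the first bin
    binned["z_bin"] = pd.cut(binned[z], bins=bins, labels=labels, include_lowest=True)
    ax = sns.scatterplot(data=binned, x=x, y=y, style="z_bin", ax=ax)
    return ax, binned
```

For example, scatter_with_binned_style(df, "x", "y", "z", bins=[0, 5, 10, 20], labels=["low", "mid", "high"]) draws three marker styles. Passing labels= keeps the legend readable; without it, the legend shows interval notation such as (5.0, 10.0].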
Hopefully I've edited this answer enough that it doesn't break the "don't post identical answers to multiple questions" rule. For what it's worth, I am not affiliated with the seaborn library in any way, nor am I trying to promote anything. I am only trying to help someone with a problem similar to one I ran into, for which I couldn't easily find a clear answer on Stack Exchange.
I have a data frame table "pandastable3" that looks like this:
I would like to plot histograms of the values of all columns separately, but so far I only get a single figure containing all the plots together. This is how I tried to plot the first 3 columns:
pandastable3.hist(layout=(1,2,3))
But I am not sure I am doing it correctly, because nothing is displayed.
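I suppose diff() gives a different plot for each column (note that diff() replaces each value with its difference from the previous row, so these histograms show those differences rather than the original values):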
pandastable3.diff().hist()
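As far as I know, DataFrame.hist() always draws into one figure. Its layout argument is a (rows, columns) tuple, so three columns side by side would be something like pandastable3.iloc[:, :3].hist(layout=(1, 3)). For genuinely separate figures, a minimal sketch is to loop over the columns. It assumes the columns of interest are numeric, and hist_per_column is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def hist_per_column(df, columns=None, bins=10):
    """Draw each column's histogram in its own, separate figure.

    Uses only the numeric columns when `columns` is not given.
    Returns a dict mapping column name -> Figure.
    """
    if columns is None:
        columns = df.select_dtypes("number").columns
    figures = {}
    for col in columns:
        fig, ax = plt.subplots()  # a new figure for every column
        df[col].plot.hist(ax=ax, bins=bins)
        ax.set_title(col)
        ax.set_xlabel(col)
        figures[col] = fig
    return figures
```

For the first three columns, that would be hist_per_column(pandastable3, columns=pandastable3.columns[:3]) followed by plt.show().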
I'm having trouble plotting two box plots in the same graph to make them easier to compare.
The problem is that each box plot comes from a different dataframe with a different length; however, both have the same columns.
My two dataframes are:
'headlamp_water' and 'headlamp_crack'; the column I want to use is called 'Use Period'.
How do I do it?
Any help will be highly appreciated
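You can concat() the two columns and call the boxplot() method on the result:
pd.concat([headlamp_water['Use Period'], headlamp_crack['Use Period']], axis=1, keys=['water', 'crack']).boxplot()
With axis=1, the two Series become separate columns. They are aligned on their index, and rows that exist in only one of them are filled with NaN, which boxplot() skips. The keys argument gives each box its own label instead of two columns that are both named 'Use Period'.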
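If the two frames have overlapping or duplicate index values, the index alignment done by concat() can get in the way. A minimal sketch avoids it by passing each column to Matplotlib as its own array. compare_boxplots is a hypothetical helper name, and the "water"/"crack" labels are just examples.

```python
import pandas as pd
import matplotlib.pyplot as plt


def compare_boxplots(series_by_name, ax=None):
    """Draw one box per Series, side by side on a single Axes.

    The Series may have different lengths: each is passed to Matplotlib as its own
    array (NaNs dropped), so no index alignment or NaN padding is involved.
    """
    if ax is None:
        _, ax = plt.subplots()
    names = list(series_by_name)
    data = [pd.Series(s).dropna().to_numpy() for s in series_by_name.values()]
    ax.boxplot(data)  # boxes are placed at x = 1, 2, ...
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    return ax
```

With the frames from the question: compare_boxplots({"water": headlamp_water["Use Period"], "crack": headlamp_crack["Use Period"]}) followed by plt.show().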