New Python learner here. This seems like a very simple task, but I can't do it to save my life.
All I want to do is grab one column from my DataFrame, sort it, and then plot it. THAT'S IT. But when I plot it, the graph is inverted. Upon examination, I find that the values are sorted, but the index is not...
Here is my simple three-line code:
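import pandas as pd
import matplotlib.pyplot as plt
# first argument is the data, second is the index
testData = pd.DataFrame([5,2,4,2,5,7,9,7,8,5,4,6],[9,4,3,1,5,6,7,5,4,3,7,8])
x = testData[0].sort_values()
plt.plot(x)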
edit:
Using matplotlib
If you mean ordering the values sequentially along the x-axis (0, 1, 2, 3, 4, ...), you need to re-index your values:
x = testData[0].sort_values()
x.index = range(len(x))
plt.plot(x)
Otherwise, if you want the values sorted in the data frame but displayed at their original index positions, you want a scatter plot, not a line plot:
plt.scatter(x.index, x.values)
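To see why the line plot comes out inverted, it can help to inspect the index after sorting. This is a minimal sketch using the question's data; `x_sequential` is a name made up for illustration, and `reset_index(drop=True)` is shown as one alternative to assigning `range(len(x))` by hand.

```python
import pandas as pd

# Same data and index as in the question
testData = pd.DataFrame([5, 2, 4, 2, 5, 7, 9, 7, 8, 5, 4, 6],
                        index=[9, 4, 3, 1, 5, 6, 7, 5, 4, 3, 7, 8])

x = testData[0].sort_values()
# sort_values keeps each value's original index label,
# so the index is no longer in ascending order
print(x.index.tolist())

# reset_index(drop=True) discards the old labels and
# swaps in a fresh 0..n-1 index
x_sequential = x.reset_index(drop=True)
print(x_sequential.index.tolist())
```

Calling plt.plot(x_sequential) should then draw the values left to right in sorted order, since plt.plot uses the Series index for the x positions.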
Much like the title says, I am trying to create a graph that shows 1-6 on the x-axis (the value's position in the row of the array) and its value on the y-axis. A snippet from the array is shown below, with each column representing a coefficient number from 1-6.
[0.99105 0.96213 0.96864 0.96833 0.96698 0.97381]
[0.99957 0.99709 0.9957 0.9927 0.98492 0.98864]
[0.9967 0.98796 0.9887 0.98613 0.98592 0.99125]
[0.9982 0.99347 0.98943 0.96873 0.91424 0.83831]
[0.9985 0.99585 0.99209 0.98399 0.97253 0.97942]
It's already set up as a numpy array. I think it's relatively straightforward, just drawing a complete mental blank.
Any ideas?
Do you want something like this?
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[0.99105, 0.96213, 0.96864, 0.96833, 0.96698, 0.97381],
              [0.99957, 0.99709, 0.9957, 0.9927, 0.98492, 0.98864],
              [0.9967, 0.98796, 0.9887, 0.98613, 0.98592, 0.99125],
              [0.9982, 0.99347, 0.98943, 0.96873, 0.91424, 0.83831],
              [0.9985, 0.99585, 0.99209, 0.98399, 0.97253, 0.97942]])
# x positions 1..6, repeated once per row so they line up with the flattened array
plt.scatter(x=np.tile(np.arange(a.shape[1]), a.shape[0])+1, y=a)
Note that you can emulate the same with groups using:
plt.plot(a.T, marker='o', ls='')
# one tick per column (a.shape[1]), labelled 1..6
x = np.arange(a.shape[1])
plt.xticks(x, x+1)
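If the np.tile expression in the scatter call looks opaque, here is a small sketch of what it builds. It only uses the first two rows of the question's array to stay short; the names `x`, `y`, `n_rows` and `n_cols` are chosen here for illustration.

```python
import numpy as np

# First two rows of the array from the question
a = np.array([[0.99105, 0.96213, 0.96864, 0.96833, 0.96698, 0.97381],
              [0.99957, 0.99709, 0.9957, 0.9927, 0.98492, 0.98864]])

n_rows, n_cols = a.shape

# Column positions 1..n_cols, repeated once for every row
x = np.tile(np.arange(1, n_cols + 1), n_rows)

# Flatten the array row by row so each value lines up with its position
y = a.ravel()

print(x)
print(y)
```

With matching shapes like this, plt.scatter(x, y) should place every value above its 1-based column position.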
How to detach the height of the stacked bars from the colors of the fill?
I have multiple categories that I want to present in a stacked bar chart so that the height represents the value and the color is conditionally defined by another variable (something like fill= in ggplot).
I am new to bokeh and struggling with the stacked bar chart mechanics. I tried to construct this type of chart, but I haven't got anything except all sorts of errors. Examples of stacked bar charts are very limited in the bokeh documentation.
My data is stored in a pandas dataframe:
data = [
['A', 1, 15, 1],
['A', 2, 14, 2],
['A', 3, 60, 1],
['B', 1, 15, 2],
['B', 2, 25, 2],
['B', 3, 20, 1],
['C', 1, 15, 1],
['C', 2, 25, 1],
['C', 3, 55, 2],
...
]
The columns represent Category, Regime, Value and State.
I want to plot Category on the x-axis and the Regimes stacked on the y-axis, where bar length represents Value and color represents State.
Is this achievable in bokeh?
Can anybody demonstrate, please?
I think this problem becomes much easier if you transform your data to the following form:
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.transform import stack, factor_cmap
import pandas as pd
df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4],
    "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5],
    "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6],
    "Regime3_State": ["B", "A"]})
p = figure(x_range=["a", "b"])
p.vbar_stack(["Regime1_Value", "Regime2_Value", "Regime3_Value"],
             x="Category",
             fill_color=[
                 factor_cmap(state, palette=["red", "green"], factors=["A", "B"])
                 for state in ["Regime1_State", "Regime2_State", "Regime3_State"]],
             line_color="black",
             width=0.9,
             source=df)
show(p)
This is a bit strange, because vbar_stack behaves unlike a "normal glyph". Normally you have three options for the attributes of a renderer (assume we want to plot n dots/rectangles/shapes/things):
Give a single value that is used for all n glyphs
Give a column name that is looked up in the source (source[column_name] must produce an "array" of length n)
Give an array of length n of data
But vbar_stack does not create one renderer; it creates as many as there are elements in the first array you give. Let's call this number k. Then, to make sense of the attributes, you again have three options:
Give a single value that is used for all glyphs
Give an array of k things that are used as columns names in the source (each lookup must produce an array of length n).
Give an array of length n of data (so all k glyphs share the same data).
So p.vbar(x=[a,b,c]) and p.vbar_stack(x=[a,b,c]) actually do different things (the first gives literal data, the second gives column names), which is confusing and not clear from the documentation.
But why do we have to transform your data so strangely? Let's unroll vbar_stack and write it on our own (details left out for brevity):
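regimes = ["Regime1_Value", "Regime2_Value", "Regime3_Value"]
plotted_regimes = []
for regime in regimes:
    if not plotted_regimes:
        # the first regime starts at the baseline
        bottom = 0
    else:
        # later regimes start where the sum of the previous ones ended
        bottom = stack(*plotted_regimes)
    p.vbar(x="Category", bottom=bottom, top=stack(*plotted_regimes, regime),
           width=0.9, source=df)
    plotted_regimes.append(regime)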
So for each regime we have a separate vbar whose bottom sits where the sum of the previous regimes ended. With the original data structure this is not really possible, because there doesn't need to be a value for each regime in each category. In the wide form we are forced to set such missing values to 0 explicitly, if that is what we actually want.
Because the stacked values correspond to column names, we have to put them all in one dataframe. The vbar_stack call in the beginning could also be written with stack (basically because vbar_stack is a convenience wrapper around stack).
The factor_cmap is used so that we don't have to manually assign colors. We could also simply add a Regime1_Color column, but this way the mapping is done automatically (and client side).
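The answer doesn't show how to get from the question's long table to the wide form, so here is one possible sketch with pandas pivot. It assumes the nine visible rows from the question (the elided rows are left out), and the names `long_df`, `wide`, `value_cols` and `state_cols` are made up for illustration. State is converted to strings on the assumption that factor_cmap will be given string factors.

```python
import pandas as pd

# The nine rows visible in the question: Category, Regime, Value, State
long_df = pd.DataFrame(
    [["A", 1, 15, 1], ["A", 2, 14, 2], ["A", 3, 60, 1],
     ["B", 1, 15, 2], ["B", 2, 25, 2], ["B", 3, 20, 1],
     ["C", 1, 15, 1], ["C", 2, 25, 1], ["C", 3, 55, 2]],
    columns=["Category", "Regime", "Value", "State"],
)

# One row per Category, one (Value, State) column pair per Regime
wide = long_df.pivot(index="Category", columns="Regime",
                     values=["Value", "State"])

# Flatten the two-level column labels, e.g. ("Value", 1) -> "Regime1_Value"
wide.columns = [f"Regime{regime}_{kind}" for kind, regime in wide.columns]

value_cols = [c for c in wide.columns if c.endswith("_Value")]
state_cols = [c for c in wide.columns if c.endswith("_State")]

# A regime missing from a category would show up as NaN; stack it as height 0
wide = wide.fillna({c: 0 for c in value_cols})
# Categorical color mapping works on strings (assumption, see lead-in)
wide = wide.astype({c: str for c in state_cols})

wide = wide.reset_index()
print(wide)
```

The resulting frame should be usable as the source in the vbar_stack call, with value_cols as the stackers and state_cols feeding the color mappers; the factors would then be "1" and "2" rather than "A" and "B".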
So I'm trying to create a barplot using seaborn. My data is in the form:
Packet number,Flavour,Contents
1,orange,4
2,orange,3
3,orange,2
4,orange,4
...
36, orange,3
1, coffee,5
2, coffee,3
...
1, raisin,4
etc.
My code is currently:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
revels_data = pd.read_csv("testtt.txt")
rd = revels_data
ax = sns.barplot(x="Packet number", y="Contents", data=rd)
plt.show()
I'm trying to create a bar for each packet number (on the x-axis), divided by colour inside each bar according to flavour, with the total contents per packet on the y-axis.
I started by trying to compute the total for each packet, i.e.
total_1 = (rd.loc[rd["Packet number"] == 1, "Contents"].sum())
but I'm not sure how I'd go from there, or whether there is an easier way to do it.
Any advice is much appreciated!
You want to use hue for that. Also, you are currently displaying the mean of each category; to aggregate with a different function, pass it as estimator.
Thus, your code should be:
ax = sns.barplot(x="Packet number", y="Contents", hue="Flavour", data=rd)
Or if you want to show the sum instead of the mean:
import numpy as np
ax = sns.barplot(x="Packet number", y="Contents", hue="Flavour", estimator=np.sum, data=rd)
Edit:
If you are interested in a stacked barplot, you can make it directly with pandas, but you need to group your data first:
# Sum (or mean if you'd rather) the Contents per packet number and flavour
# unstack() will turn the flavours into columns, and fillna will put 0 in
# all missing cells
grouped = rd.groupby(["Packet number", "Flavour"])["Contents"].sum().unstack().fillna(0)
# The x axis is taken from the index. The y axis from the columns
grouped.plot(kind="bar", stacked=True)
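To check what the grouping step produces before plotting, here is a small self-contained sketch. It uses only a handful of the rows visible in the question (the elided rows are left out), so the numbers are just for illustration.

```python
import pandas as pd

# A few of the rows shown in the question
rd = pd.DataFrame({
    "Packet number": [1, 2, 3, 4, 1, 2, 1],
    "Flavour": ["orange", "orange", "orange", "orange",
                "coffee", "coffee", "raisin"],
    "Contents": [4, 3, 2, 4, 5, 3, 4],
})

# Rows become packet numbers, columns become flavours;
# packet/flavour pairs with no data are filled with 0
grouped = (rd.groupby(["Packet number", "Flavour"])["Contents"]
             .sum()
             .unstack()
             .fillna(0))
print(grouped)
```

Each row of grouped is one bar and each column one coloured segment, which is the layout grouped.plot(kind="bar", stacked=True) expects.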
This picture
Please ignore the background image. The foreground chart is what I am interested in showing using pandas or numpy or scipy (or anything in iPython).
I have a dataframe where each row represents temperatures for a single day.
This is an example of some rows:
            100   200   300   400   500   600   ......  2300
10/3/2013   53*C  57*C  48*C  49*C  54*C  54*C          55*C
10/4/2013   45*C  47*C  48*C  49*C  50*C  52*C          57*C
Is there a way to get a chart that represents the changes from hour to hour, using the first column as a 'zero'?
Something quick and dirty that might get you most of the way there, assuming your data frame is named df:
import matplotlib.pyplot as plt
plt.imshow(df.T.diff().fillna(0.0).T.drop(0, axis=1).values)
Since I can't easily construct a sample version with your exact column labels, there might be slight additional tinkering with getting rid of any index columns that are included in the diff and moved with the transposition. But this worked to make a simple heat-map-ish plot for me on a random data example.
Then you can create a matplotlib figure or axis object and specify whatever you want for the x- and y-axis labels.
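For the column labels in the question, a variation along these lines might be closer to what's needed. This is only a sketch: it assumes the cells are strings like "53*C" that have to be converted to numbers first, it uses just the first four hours from the question's sample, and it uses diff(axis=1), which should be equivalent to the transpose/diff/transpose chain above. The names `temps` and `changes` are made up here.

```python
import pandas as pd

# First four hourly columns of the two sample rows from the question
df = pd.DataFrame(
    {100: ["53*C", "45*C"],
     200: ["57*C", "47*C"],
     300: ["48*C", "48*C"],
     400: ["49*C", "49*C"]},
    index=["10/3/2013", "10/4/2013"],
)

# Strip the "*C" suffix and convert every column to float
temps = df.apply(lambda col: col.str.replace("*C", "", regex=False).astype(float))

# Hour-to-hour change along each row; the first hour has no
# predecessor, so it becomes the 'zero' column
changes = temps.diff(axis=1).fillna(0.0)
print(changes)
```

Passing changes.values to plt.imshow should then give the same kind of heat-map-ish plot, with the first column kept as the zero reference instead of being dropped.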
You could just plot lines one at a time for each row with an offset:
import numpy as np
import matplotlib.pyplot as plt
nrows, ncols = 12, 30
# make up some fake data:
d = np.random.rand(nrows, ncols)
d *= np.sin(2*np.pi*np.arange(ncols)*4/ncols)
d *= np.exp(-0.5*(np.arange(nrows)-nrows/2)**2/(nrows/4)**2)[:,None]
# this is all you need, if you already have the data:
for i, r in enumerate(d):
    # the white fill hides the lines behind; the explicit edgecolor makes sure the outline is drawn
    plt.fill_between(np.arange(ncols), r+(nrows-i)/2., lw=2, facecolor='white', edgecolor='black')
You could do it all at once if you don't need the fill color to block the previous line:
d += np.arange(nrows)[:, None]
plt.plot(d.T)
Hi, I have a 3D list (I realise this may not be the best representation of my data, so any advice here is appreciated) like this:
y_data = [
[[a,0],[b,1],[c,None],[d,6],[e,7]],
[[a,5],[b,2],[c,1],[d,None],[e,1]],
[[a,3],[b,None],[c,4],[d,9],[e,None]],
]
The y-axis data is such that each sublist is a list of values for one hour. The hours are the x-axis data. Each sublist of this has the following format:
[label,value]
So essentially:
line a is [0,5,3] on the y-axis
line b is [1,2,None] on the y-axis etc.
My x-data is:
x_data = [0,1,2,3,4]
Now when I plot this list directly i.e.
for i in range(0,5):
ax.plot(x_data, [row[i][1] for row in y_data], label=y_data[0][i][0])
I get a line graph; however, where the value is None the point is not drawn and the line is not connected.
What I would like is a graph that plots my data in its current format, but ignores missing points and draws a line between the point before the missing data and the point after (i.e. interpolating the missing point).
I tried doing it like this https://stackoverflow.com/a/14399830/1800665 but I couldn't work out how to do this for a 3D list.
Thanks for any help!
The general approach that you linked to will work fine here; it looks like the question you're asking is how to apply that approach to your data. I'd like to suggest that by factoring out the data you're plotting, you'll see more clearly how to do it.
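import matplotlib.pyplot as plt
# labels written as strings so the snippet runs on its own
y_data = [
    [['a',0],['b',1],['c',None],['d',6],['e',7]],
    [['a',5],['b',2],['c',1],['d',None],['e',1]],
    [['a',3],['b',None],['c',4],['d',9],['e',None]],
]
x_data = [0, 1, 2, 3, 4]
fig, ax = plt.subplots()
for i in range(5):
    # collect only the points of line i whose value is present
    xv = []
    yv = []
    for j, v in enumerate(row[i][1] for row in y_data):
        if v is not None:
            xv.append(x_data[j])
            yv.append(v)
    ax.plot(xv, yv, label=y_data[0][i][0])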
Here instead of using a mask like in the linked question/answer, I've explicitly built up the lists of valid data points that are to be plotted.
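For comparison, a mask-based variant could look roughly like the sketch below. It assumes the labels are plain strings and relies on NumPy turning None into nan when the array is built with dtype=float; `values`, `hours` and `series` are names invented for this example.

```python
import numpy as np

y_data = [
    [["a", 0], ["b", 1], ["c", None], ["d", 6], ["e", 7]],
    [["a", 5], ["b", 2], ["c", 1], ["d", None], ["e", 1]],
    [["a", 3], ["b", None], ["c", 4], ["d", 9], ["e", None]],
]

labels = [pair[0] for pair in y_data[0]]

# Rows are hours, columns are lines; None becomes nan
values = np.array([[pair[1] for pair in row] for row in y_data], dtype=float)
hours = np.arange(values.shape[0])

series = {}
for i, label in enumerate(labels):
    # True where this line actually has a value
    mask = np.isfinite(values[:, i])
    series[label] = (hours[mask], values[mask, i])

for label, (xs, ys) in series.items():
    print(label, xs, ys)
```

Each (xs, ys) pair can then be passed to ax.plot(xs, ys, label=label), so the line is drawn straight across any gap.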