Highlighting last data point in pandas plot - python

I have a number of graphs similar to this:
import numpy as np
import pandas as pd
dates = pd.date_range('2012-01-01','2013-02-22')
y = np.random.randn(len(dates))/365
Y = pd.Series(y, index=dates)
Y.plot()
The graph is great for showing the shape of the data, but I would like the latest value to stand out as well. I would like to highlight the last data point with a marker 'x' and with a different color. Any idea how I can do this?
I have added Dan Allan's suggestion. It works, but I need something a bit more visible: as seen below, the x is hardly visible. Any ideas?
I have added the final answer to complete this: I changed the x to a D (diamond) for better visibility and increased the marker size.
Y.tail(1).plot(style='rD',markersize=10)

Add this line to your example to plot the last data point as a red X.
Y.tail(1).plot(style='rx')
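For more control over the marker, the same idea can be sketched with matplotlib directly instead of the pandas plot accessor. This is only a sketch under assumptions: the seeded random generator, the Agg backend and the variable name highlight are choices made for the example, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of data as in the question, with a fixed seed for repeatability
dates = pd.date_range("2012-01-01", "2013-02-22")
rng = np.random.default_rng(0)
Y = pd.Series(rng.standard_normal(len(dates)) / 365, index=dates)

fig, ax = plt.subplots()
ax.plot(Y.index, Y.values)  # the full series as a line

# Draw only the last observation again, as a large red diamond with no connecting line
(highlight,) = ax.plot(
    [Y.index[-1]], [Y.iloc[-1]],
    marker="D", color="red", markersize=10, linestyle="none",
)
fig.savefig("last_point.png")
```

Drawing both artists through the same axes object keeps the marker on the same date axis as the line.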

Related

How to modify time interval in altair line graph

I have a simple line graph that looks like this: line graph of stock returns
I have been trying to format the x axis such that the time interval is in years instead of months, as it currently is now. But when I use the timeUnit attribute, it produces a stunted graph like this: line graph of stock returns in years
Code:
alt.Chart(data).mark_line().encode(
x = alt.X('Date', timeUnit = 'year'),
y = alt.Y('Cumul_R', axis = alt.Axis(format='%', orient='right')),
color = 'Stock')
What I'm trying to produce is a graph that looks like the first graph, but with intervals expressed in years like 06-2010, 06-2011, ... etc without compressing the graph like in the second pic. In other words, how do I only show some tick labels and not all of them.
I've seen answers to my question but they deal with absolute values using tickCount or tickMinStep, not with datetime values. There is apparently an altair attribute called timeinterval in https://altair-viz.github.io/user_guide/generated/core/altair.TimeInterval.html#altair.TimeInterval.init that may solve the problem, but I'm not sure how to use it.
Appreciate all help on the matter. Thank you!
It appears that you are plotting your dates as nominal typed values, when you should probably be plotting them as temporal.
You should change x = alt.X('Date') to x = alt.X('Date:T') to specify that the x channel is temporal. When you do that, the renderer will use a temporal axis label that is probably closer to what you had in mind.
See Encoding Data Types in the documentation for more information.

Seaborn pairplot: how to change legend label text

I'm making a simple pairplot with Seaborn in Python that shows different levels of a categorical variable by the color of plot elements across variables in a Pandas DataFrame. Although the plot comes out exactly as I want it, the categorical variable is binary, which makes the legend quite meaningless to an audience not familiar with the data (categories are naturally labeled as 0 & 1).
An example of my code:
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
Is there a way to change legend label text with pairplot? Or should I use PairGrid, and if so how would I approach this?
Found it! It was answered here: Edit seaborn legend
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
g._legend.set_title(new_title)
Since you don't provide a full code example or mock data, I will use my own code to answer.
First solution
The easiest way is probably to keep your binary labels for analysis and to create a column with proper names for plotting. Here is some sample code of mine; you should get the idea:
def transconum(morph):
    if (morph == 'S'):
        return 1.0
    else:
        return 0.0
CompactGroups['MorphNum'] = CompactGroups['MorphGal'].apply(transconum)
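Going in the other direction, from binary codes to readable names for the legend, might look like the sketch below. The DataFrame, the label names and the new column name categorical_label are all hypothetical.

```python
import pandas as pd

# Hypothetical data with a binary column like the one in the question
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "categorical_var": [0, 1, 0, 1],
})

# Hypothetical display names for the two categories
label_names = {0: "Control", 1: "Treated"}

# Keep the numeric column for analysis and add a readable one for plotting
df["categorical_label"] = df["categorical_var"].map(label_names)

# The readable column can then be passed as hue, e.g.
# sns.pairplot(df, hue="categorical_label", vars=["x"], palette="Set3")
```

The legend then shows the mapped names without any changes to the plot object itself.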
Second solution
Another way would be to overwrite the labels on the fly. Here is some sample code of mine which works perfectly:
grid = sns.jointplot(x="MorphNum", y="PropS", data=CompactGroups, kind="reg")
grid.set_axis_labels("Central type", "Spiral proportion among satellites")
grid.ax_joint.set_xticks([0, 1, 1])
plt.xticks(range(2), ('$Red$', '$S$'))

Plotting 2 rasters over each other on a map with 2D colour scheme

This is rather a GIS question. What I am trying to do is make a map that shows the areas which are hot-dry, hot-wet, cold-dry and cold-wet. I have 2 rasters with precipitation and temperature values, and I want to plot them over each other so that each extreme combination of the 2 variables (hot-dry, hot-wet, cold-dry, cold-wet) has its own colour, with gradients for the intermediate values; this should produce a 2D colour legend. Below please see the concept image that I have produced for explanation. I saw such a thing once and thought it was a brilliant idea for showing how 2 variables interact, but then I totally forgot where it was. I have been googling for 2 days with no result. Any help is very much welcome: the name of the thing, the name of the software to do it (how to do it would be marvelous), keywords to google, workarounds, anything.
Concept image
Just a reminder to myself, a possible solution could be:
library(raster)
temp <- matrix(1:10000, 100)
temp <- raster(temp)
temp[] <- scales::rescale(temp[],to = c(0,255))
pp <- t(matrix(1:10000, 100))
pp <- raster(pp)
pp[] <- scales::rescale(pp[],to = c(0,255))
constant <- pp
constant[] <- rep(255,ncell(constant))
# Here you can vary the order of the bands (1,3,2) to get different colours
plotRGB(stack(list(constant,temp,pp)),1,3,2)
The resulting plot looks like this (it should look better with real temperature and precipitation data):
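The same band-stacking idea can be sketched in Python with NumPy, assuming both rasters are already available as 2D arrays of equal shape. The synthetic arrays and the rescale helper are illustrative only, and the channel order mirrors the (1,3,2) band order used above.

```python
import numpy as np

def rescale(a):
    """Linearly rescale an array to the range 0..1."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo)

# Synthetic stand-ins for the temperature and precipitation rasters
temp = np.arange(1, 10001, dtype=float).reshape(100, 100)
precip = temp.T

# Red is held constant, green carries precipitation, blue carries temperature
rgb = np.dstack([np.ones_like(temp), rescale(precip), rescale(temp)])

# rgb can be displayed with matplotlib's imshow, which accepts float RGB values in 0..1
```

Swapping which variable goes into which channel changes the corner colours of the resulting 2D legend.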

matplotlib: not plotting a curve correctly

I am trying to plot this curve, and am a little confused on why it looks the way that it does. I would like to plot the curve seen below, but I don't want the lines in the middle and can't figure out why they're there. Could it be because there are 0's in the middle of the vector representing the y values?
This is just from my phone, so apologies if the formatting is off...
This is happening because you have data with zeros in it. If you want to prune them out in some way, then either you can do it on the reads, or you can sort the data. Something like this should suffice:
x, y = zip(*sorted(zip(x, y)))
It is late, but I hope this helps someone. Taken from this answer: why my curve fitting plot using matplotlib looks obscured?
You need to sort your X values in ascending order before passing them to the plot function. Bear in mind that the x and y pairs must be preserved for the curve to be drawn correctly.
import numpy as np
sorted_indexes = np.argsort(X)
X = X[sorted_indexes]
y = y[sorted_indexes]
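A small self-contained check of that approach, with made-up data, could look like the following. The np.asarray calls are an added assumption to cover the case where X and y start out as plain lists, which cannot be indexed with an index array.

```python
import numpy as np

# Made-up unsorted data where y depends on X
X = np.asarray([3.0, 1.0, 2.0, 0.0])
y = np.asarray([9.0, 1.0, 4.0, 0.0])  # y = X squared

# One set of indexes reorders both arrays, so each (x, y) pair stays together
order = np.argsort(X)
X_sorted = X[order]
y_sorted = y[order]
```

Plotting X_sorted against y_sorted draws the curve from left to right without lines doubling back across the figure.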

Heatmap with varying y axis

I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components but these components are anchored to the y axis by means of labels (that I can supply) rather than by their first index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.
I am not completely sure if I understand your meaning correctly, but by looking at the picture you have linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. You fill the array with NaNs and then write whatever heatmap values you have to the correct positions.
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
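import numpy as np
import matplotlib.pyplot as plt
# create some masked data
a = np.cumsum(np.random.random((20, 200)), axis=0)
X, Y = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
a[Y < 15 * np.sin(X / 50.)] = np.nan
a[Y > 10 + 15 * np.sin(X / 50.)] = np.nan
# draw the image along with some curves
plt.imshow(a, interpolation='nearest', origin='lower', extent=[-2, 2, 0, 3])
xd = np.linspace(-2, 2, 200)
yd = 1 + .1 * np.cumsum(np.random.random(200) - .5)
plt.plot(xd, yd, 'w', linewidth=3)
plt.plot(xd, yd, 'k', linewidth=1)
# 'auto' replaces the old 'normal' option, which newer matplotlib versions no longer accept
plt.axis('auto')
plt.show()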
Gives:
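The first step described above, building a NaN-filled array with one row per label and one column per time slot, could be sketched as follows. The labels, the observations list and the names row_of and grid are hypothetical.

```python
import numpy as np

labels = [10, 20, 30, 40, 50]  # hypothetical y-axis labels, in display order
n_times = 4                    # hypothetical number of time slots

# Hypothetical observations as (time index, label, heatmap value)
observations = [
    (0, 10, 0.5), (0, 20, 0.7),
    (1, 20, 0.2), (1, 30, 0.9),
    (2, 40, 0.4), (3, 50, 0.1),
]

# Map each label to its row, then start from an all-NaN grid
row_of = {label: i for i, label in enumerate(labels)}
grid = np.full((len(labels), n_times), np.nan)

# Write each value at the row of its label and the column of its time slot
for t, label, value in observations:
    grid[row_of[label], t] = value

# Masking the NaNs lets imshow leave the empty cells blank
masked = np.ma.masked_invalid(grid)
```

The extent argument of imshow can then be set from the smallest and largest label so that the rows line up with the y axis.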
