How can I create an Altair map with n distinct categories (each with a specific color) while
having a second variable control the alpha/shading of these categories?
So far, I can produce a map colored by category with custom colors of my choice, and I can produce a map colored by a continuous variable with a colormap of my choice.
I am not sure, however, how to proceed to obtain what I am looking for.
I thought I could possibly add some extra color with something like this:
.encode(alt.Color('properties.Cat2:O', scale=alt.Scale(domain=domain, range=range_),alt.Color('properties.colmap2:Q', scale=alt.Scale(domain=domain, range=cm_range_))
but I feel like I am trying random stuff and not getting any closer.
EDIT
Following @jakevdp's comments, I am trying to include an opacity argument. However, I am unsure about the proper syntax.
chart_json = json.loads(gdf.to_json())
chart_data= alt.Data(values=chart_json ['features'])
data_1km_geojson = alt.InlineData(values=val_1km, format=alt.DataFormat(property='features',type='json'))
domain=['Label1','Label2']
range_=['#b0d247','#007bd1']
chart_layer1 = alt.Chart(chart_data).mark_geoshape().encode(
    alt.Color('properties.Cat2:O', scale=alt.Scale(domain=domain, range=range_), title="sometitle"),
    # nested field names use a dot; the colon only separates the type shorthand
    opacity=alt.Opacity('properties.OpacityVar:Q', bin=True),
).properties(
    width=1100,
    height=800
)
#Visualize the result
(background+chart_layer1).configure_view(stroke='white')
Additionally, the variable I am trying to use for the opacity argument spans a very broad range (from 10,000 to 100 billion). Should I do a min-max normalization first?
Solved in the comments by splitting up the variables on two different encodings (opacity and color) instead of having them both on color.
Related
I have an Altair chart where I am using mark_rect. I want to choose how many x-axis labels are displayed so that the marks form squares in the visualization, or perhaps choose the range of the x-axis labels. Right now far too many labels are being displayed. Below I have an example of what I am trying to achieve and what my current output chart is. I apologize if the issue is due to something else; I am still learning Altair.
My code currently:
alt.Chart(temdf).mark_rect().encode(
    x=alt.X('norm:O', title='', axis=alt.Axis(grid=False, labelAngle=360)),
    y=alt.Y('term:N', title='', axis=alt.Axis(grid=False)),
    color=alt.Color('norm:O', title='', scale=alt.Scale(scheme='blues'), legend=None),
    facet=alt.Facet('title:N', title='', columns=3, header=alt.Header(labelOrient='bottom', labelPadding=15, labelAngle=360),
                    sort=alt.EncodingSortField(field='title', order='ascending'))
)
What I am trying to achieve:
My current output:
You have declared that your x data is type O, meaning ordinal, i.e. ordered categories. This says that you want one distinct x bin for each unique value in your dataset. If you want fewer ordinal x bins, you should use a dataset with fewer unique values.
Alternatively, if you don't want each unique x value to have its own label, you can use the quantitative data type (i.e. x=alt.X('norm:Q')), or bin your data with x=alt.X('norm:O', bin=True). If you bin the x encoding, be sure to bin the color encoding as well.
I'm making a simple pairplot with Seaborn in Python that shows different levels of a categorical variable by the color of plot elements across variables in a Pandas DataFrame. Although the plot comes out exactly as I want it, the categorical variable is binary, which makes the legend quite meaningless to an audience not familiar with the data (categories are naturally labeled as 0 & 1).
An example of my code:
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
Is there a way to change legend label text with pairplot? Or should I use PairGrid, and if so how would I approach this?
Found it! It was answered here: Edit seaborn legend
g = sns.pairplot(df, hue='categorical_var', palette='Set3')
new_title = 'My title'  # placeholder legend title
g._legend.set_title(new_title)
Since you don't provide a full code example or mock data, I will use my own code to answer.
First solution
The easiest approach is probably to keep your binary labels for analysis and to create a column with proper names for plotting. Here is some sample code of mine that goes in the opposite direction (from names to numbers), but you should get the idea:
def transconum(morph):
    if (morph == 'S'):
        return 1.0
    else:
        return 0.0
CompactGroups['MorphNum'] = CompactGroups['MorphGal'].apply(transconum)
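For the direction the question needs (binary codes to readable names), a sketch along these lines should work. The column name 'categorical_label' and the names 'Control'/'Treated' are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with a binary column like the one in the question.
df = pd.DataFrame({
    "categorical_var": [0, 1, 1, 0],
    "value": [1.2, 3.4, 2.2, 0.7],
})

# Keep the 0/1 column for analysis and add a readable copy for plotting.
label_names = {0: "Control", 1: "Treated"}
df["categorical_label"] = df["categorical_var"].map(label_names)
```

The new column can then be passed as hue='categorical_label'. Since the original 0/1 column is numeric, pairplot would still plot it as a variable unless it is excluded, for example through the vars argument.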
Second solution
Another way would be to overwrite the labels on the fly. Here is some sample code of mine which works perfectly:
grid = sns.jointplot(x="MorphNum", y="PropS", data=CompactGroups, kind="reg")
grid.set_axis_labels("Central type", "Spiral proportion among satellites")
grid.ax_joint.set_xticks([0, 1])
plt.xticks(range(2), ('$Red$', '$S$'))
I would like to create a figure that shows how much money people earned in a game (continuous variable) as a function of the categorical values of three other variables. The first variable is whether people were included or excluded prior to the Money game, the second is whether people knew their decision-making partner, and the last is the round of the game (participants played 5 rounds with a known co-player and 5 rounds with an unknown co-player). I know how to draw violin plots as a function of the values of two categorical variables using FacetGrid (see below), but I did not manage to add another layer to it.
g = sns.FacetGrid(df_long, col='XP_Social_Condition', height=5, aspect=1)  # 'size' was renamed to 'height' in newer seaborn
g.map(sns.boxplot, 'DM partner', 'Money', palette = col_talk)
I have created two dataframe versions: my initial one and a melted one (see image below). I have also tried to create two plots together using f, (ax_l, ax_r) = but this does not seem to take FacetGrid plots as plots within the plot... You can see below links to see the data and the kind of plot I would like to use as a subplot - one showing a known player and one showing the unknown player. I am happy to share the data if it would help.
I have now tried the solution proposed
grid = sns.FacetGrid(melted_df, hue='DM partner', col='XP_Social_Condition')
grid.map(sns.violinplot, 'Round', 'Money')
But it still does not work: it produces the plot shown below, in which the third (hue) variable does not clearly separate the different conditions.
here is the new figure I get - almost there
data - original and melted
Thank you very much for your help.
OK, so you want to create one plot of continuous data depending on three different categorical variables?
I think what you're looking for is:
grid = sns.FacetGrid(melted_df, col='XP_Social_Condition')
grid.map(sns.violinplot, 'Round', 'Money', 'DM partner').add_legend()
The col argument results in two plots, one for each value of XP_Social_Condition. The three values passed to grid.map split the data so that 'Round' becomes the x-axis, 'Money' the y-axis and 'DM partner' the color. You can play around and swap 'DM partner', 'XP_Social_Condition' and 'Round'.
The result should now look something like this or this ('Round' and 'DM Partner' swapped).
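A possible alternative, not taken from the answer above: seaborn's figure-level catplot takes x, y, hue and col directly and adds the legend itself. The sketch below uses synthetic data with the column names from this thread; the condition and partner values are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic long-format data: 2 conditions x 2 partner types x 5 rounds x 6 players.
rng = np.random.default_rng(1)
rows = [
    {"XP_Social_Condition": cond, "DM partner": partner, "Round": rnd,
     "Money": rng.normal(10, 2)}
    for cond in ("Included", "Excluded")
    for partner in ("Known", "Unknown")
    for rnd in range(1, 6)
    for _ in range(6)
]
melted_df = pd.DataFrame(rows)

# One panel per social condition, rounds on x, partner type as color.
g = sns.catplot(data=melted_df, kind="violin", x="Round", y="Money",
                hue="DM partner", col="XP_Social_Condition")
n_panels = g.axes.size
```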
I'm not sure if binning is the correct term, but I want to implement the following for a project I am working on:
I have an array or maybe a dict describing boundaries and/or regions, for example:
boundaries = OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
The areas are indexed from 0 to 100 (for example). I want to assign each area the color of the first boundary key that it is less than, and then plot it. For example, if an area is less than 10, it is red.
So far, I have:
from collections import OrderedDict

boundaries = OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
areas = range(0,101)
binned = []
for area in areas:
    for border in boundaries.keys():
        if area < border:
            binned.append(boundaries[border])
            break
Also, I need to figure out a way to define the colors and find a package to plot them. Do you have any ideas on how I can make a 2-D color plot (the actual project will be in 2-D)? Maybe matplotlib or PIL? I have used matplotlib before, but never for this type of data.
Also, is there a scipy/numpy function that already does what I'm trying to do? It would be nice if the code is short and fast. This is not for an assignment of any sort (it's for a little experiment / data project of mine), so I don't want to reinvent the wheel here.
import collections
import matplotlib.pyplot as plt

boundaries = collections.OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])
areas = range(0,101)
# bin edges are 0 plus the boundary keys; values above the last edge (55) are not counted
n, bins, patches = plt.hist(areas, [0]+list(boundaries), histtype='bar', rwidth=1.0)
for (patch,color) in zip(patches,boundaries.values()):
    patch.set_color(color)
plt.show()
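On the question of an existing numpy function: np.digitize does the classification step without a Python loop. A short sketch with the same boundaries; the extra "none" entry is a hypothetical placeholder for areas at or above the last boundary, which the original loop simply skips:

```python
import numpy as np

edges = np.array([10, 20, 55])
# One color per interval, plus a hypothetical fallback for values >= 55.
colors = np.array(["red", "blue", "purple", "none"])

areas = np.arange(0, 101)
# digitize returns 0 for area < 10, 1 for 10..19, 2 for 20..54, 3 for >= 55
idx = np.digitize(areas, edges)
binned = colors[idx]
```

Because digitize works elementwise, the same call also handles a 2-D array of areas, and the resulting index array can be handed to a plotting function such as imshow together with a discrete colormap.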
I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components but these components are anchored to the y axis by means of labels (that I can supply) rather than by their first index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.
I am not completely sure if I understand your meaning correctly, but by looking at the picture you have linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. You fill the array with NaNs and then write whatever heatmap values you have to the correct positions.
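A small sketch of that first step, with made-up labels and observations (the (time slot, label, value) layout is only an assumption about how the data might arrive):

```python
import numpy as np

labels = ["A", "B", "C", "D"]  # hypothetical y-axis labels, one row each
n_slots = 5                    # hypothetical number of time slots

# Map each label to its row and start from an all-NaN array.
row_of = {label: i for i, label in enumerate(labels)}
heat = np.full((len(labels), n_slots), np.nan)

# Hypothetical observations as (time slot, label, value).
observations = [(0, "A", 1.0), (0, "B", 2.0), (1, "B", 0.5),
                (1, "C", 1.5), (4, "D", 3.0)]
for slot, label, value in observations:
    heat[row_of[label], slot] = value
```

Cells that stay NaN are left blank by imshow, and the y ticks can afterwards be set to the entries of labels.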
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
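import numpy as np
import matplotlib.pyplot as plt

# create some masked data
a = np.cumsum(np.random.random((20, 200)), axis=0)
X, Y = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
a[Y < 15 * np.sin(X / 50.)] = np.nan
a[Y > 10 + 15 * np.sin(X / 50.)] = np.nan
# draw the image along with some curves
plt.imshow(a, interpolation='nearest', origin='lower', extent=[-2, 2, 0, 3])
xd = np.linspace(-2, 2, 200)
yd = 1 + .1 * np.cumsum(np.random.random(200) - .5)
plt.plot(xd, yd, 'w', linewidth=3)
plt.plot(xd, yd, 'k', linewidth=1)
# 'auto' replaces the old 'normal' option, which newer Matplotlib releases no longer accept
plt.axis('auto')
plt.show()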
Gives: