How to make a heatmap in Python with aggregated/summarized data?

I'm trying to plot some X and Z coordinates on an image to show which parts of the image have higher counts. The Y values are heights in this case, so I am excluding them since I want a 2D plot. Because I have many millions of data points, I have grouped by each combination of X and Z coordinates and counted how many times it occurred. The data should contain almost all combinations of X and Z coordinates. It looks something like this (fake data):
I have experimented with matplotlib.pyplot's plt.hist2d(x, y) function, but it seems to take raw data rather than already-summarized data like mine.
Does anyone know if this is possible?
Note: I can figure out plotting on top of the image later; first I'm trying to get the scatter plot/heatmap to show the aggregated data.

I managed to figure this out. After loading the data in the format shown in the original post, the first step is to pivot it so that the X values become columns and the Z values become rows. Then plot it with seaborn's heatmap; X/Z combinations with no count end up as NaN and are left blank. See below:
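import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# raw: DataFrame with columns X_LOC, Z_LOC and COUNT_TICKS (one row per X/Z combination)
# pivot so that Z values become rows and X values become columns
values = pd.pivot_table(raw, values='COUNT_TICKS', index=['Z_LOC'], columns=['X_LOC'], aggfunc='sum')
plt.figure(figsize=(20, 20))
# the facecolor shows through wherever a cell is NaN (no data)
sns.set(rc={'axes.facecolor':'cornflowerblue', 'figure.facecolor':'cornflowerblue'})
# alternative: ax = sns.heatmap(values, vmin=100, vmax=5000, cmap="Oranges", robust=True, xticklabels=x_labels, yticklabels=y_labels, alpha=1)
ax = sns.heatmap(values,
                 # vmin=1,
                 vmax=1000,
                 cmap="Greens",  # BrBG is also good
                 robust=True,
                 alpha=1)
plt.show()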
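An alternative sketch that stays within matplotlib: plt.hist2d (and Axes.hist2d) does accept pre-counted data through its weights argument, so each (X, Z) combination can be passed once with its count as the weight. The column names follow the pivot code above; the small DataFrame is made-up illustration data, and one bin per integer coordinate is an assumption about the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up aggregated data: one row per (X, Z) combination with its count.
raw = pd.DataFrame({
    "X_LOC": [0, 0, 1, 1, 2, 2],
    "Z_LOC": [0, 1, 0, 1, 0, 1],
    "COUNT_TICKS": [5, 1, 3, 8, 0, 2],
})

# Assumption: integer coordinates, so half-integer bin edges give one bin per value.
x_edges = np.arange(raw["X_LOC"].min(), raw["X_LOC"].max() + 2) - 0.5
z_edges = np.arange(raw["Z_LOC"].min(), raw["Z_LOC"].max() + 2) - 0.5

fig, ax = plt.subplots()
# weights= makes each row count COUNT_TICKS times instead of once.
counts, xe, ze, image = ax.hist2d(
    raw["X_LOC"], raw["Z_LOC"],
    bins=[x_edges, z_edges],
    weights=raw["COUNT_TICKS"],
    cmap="Greens",
)
fig.colorbar(image, ax=ax, label="COUNT_TICKS")
ax.set_xlabel("X_LOC")
ax.set_ylabel("Z_LOC")
```

Unlike the pivot/heatmap approach, combinations missing from the data show up here as zero-count bins rather than blank cells.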

Related

DataFrame Plotting - Plotly Line Chart: Single X Values vs Multiple Y Values

I have a data frame as shown below
I need to plot a line chart with plotly, with "Supply[V]" on the X axis and all the columns shown in the blue box on the Y axis.
Below is my code, but nothing is displayed.
import plotly.express as px
Vcm_Settle_vs_supply_funct = px.line(df_vcm_set_funct_mode1, x = 'Supply[V]', y = df_vcm_set_funct_mode1.columns[5:-9])
Vcm_Settle_vs_supply_funct.show()
Where did I go wrong?
Is the column designation correct? I wrote code using the data you presented. The plot looks like just two lines because the series fall into roughly two groups of values and overlap. I also increased the graph height and set explicit tick values for the y-axis.
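import numpy as np
import plotly.express as px
# one line per listed column, all sharing Supply[V] on the x axis
fig = px.line(df,x='Supply[V]', y=['VCM_10ms','VCM_20ms','VCM_5s','VCM_DEL1A_10ms','VCNI_DELIA_20ms'])
# y-axis ticks every 0.05 from -0.1 up to (but not including) 1.5
fig.update_yaxes(tickvals=np.arange(-0.1, 1.5, 0.05))
fig.update_layout(height=600)
fig.show()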
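On the question of whether the column designation is correct: a positional slice such as columns[5:-9] is easy to get wrong, so it can help to print what it selects before plotting. The DataFrame below is hypothetical (made-up column names and values), and selecting by a "VCM" name prefix is only an assumption about the real columns; the resulting list is what would be passed as y= to px.line.

```python
import pandas as pd

# Hypothetical wide DataFrame: one x column, a few measurement columns, some extras.
df = pd.DataFrame({
    "Supply[V]": [3.0, 3.3, 3.6],
    "Temp": [25, 25, 25],
    "VCM_10ms": [0.10, 0.20, 0.30],
    "VCM_20ms": [0.15, 0.25, 0.35],
    "Notes": ["a", "b", "c"],
})

# Positional slice: silently changes meaning if columns are added or reordered.
by_position = list(df.columns[2:-1])
print(by_position)

# Name-based selection: numeric columns whose name starts with "VCM" (assumed prefix).
by_name = [c for c in df.columns
           if c.startswith("VCM") and pd.api.types.is_numeric_dtype(df[c])]
print(by_name)
```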

I have a large data set where the rows are a series of coordinates and need to plot specific rows

I have a very large dataset of coordinates that I need to plot, selecting specific rows instead of just editing the raw Excel file.
The data is organized as follows:
frames xsnout ysnout xMLA yMLA
0 532.732971 503.774200 617.231018 492.803711
1 532.472351 504.891632 617.638550 493.078583
2 532.453552 505.676300 615.956116 493.2839
3 532.356079 505.914642 616.226318 494.179047
4 532.360718 506.818054 615.836548 495.555298
The column "frames" is the video frame for each of these coordinate pairs (xsnout, ysnout) and (xMLA, yMLA). Below is my code, which plots all frames and all data points without selecting any rows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#import data
df = pd.read_excel("E:\\Clark\\Flow Tank\\Respirometry\\Cropped_videos\\F1\\MG\\F1_MG_4Hz_simplified.xlsx")
#different body points
ax1 = df.plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
How would I specify just a single row instead of plotting the whole dataset? And is there any way to connect the coordinates of a single row with a line?
Thank you, any help would be greatly appreciated.
How would I specify just a single row instead of plotting the whole dataset?
To do this, you can slice your DataFrame. There is a wide variety of ways to do this, and the right one depends on exactly what you're trying to do. For instance, you can use df.iloc[] to specify which rows you want; iloc is integer-location based, i.e. it selects rows (and columns) by position. Note the square brackets! If you want to select rows by their index labels (and columns by name), use df.loc[] instead. For example, the plot with the original data you provided is:
Slicing the dataframe with iloc:
ax1 = df.iloc[2:5, :].plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.iloc[2:5, :].plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
Gives you this:
If you specify something like this, you get only a single row:
df.iloc[1:2, :]
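The difference between position-based and label-based selection can be seen on the rows shown in the question. This is a small sketch; selecting by the "frames" column value is an extra option not mentioned above, useful if the index ever stops matching the frame numbers.

```python
import pandas as pd

# The rows shown in the question.
df = pd.DataFrame({
    "frames": [0, 1, 2, 3, 4],
    "xsnout": [532.732971, 532.472351, 532.453552, 532.356079, 532.360718],
    "ysnout": [503.774200, 504.891632, 505.676300, 505.914642, 506.818054],
    "xMLA": [617.231018, 617.638550, 615.956116, 616.226318, 615.836548],
    "yMLA": [492.803711, 493.078583, 493.2839, 494.179047, 495.555298],
})

# iloc: integer positions, end-exclusive -> positions 2, 3, 4
rows_by_position = df.iloc[2:5]

# loc: index labels, end-inclusive -> labels 2, 3, 4
rows_by_label = df.loc[2:4]

# Select by the value stored in the "frames" column instead of the index.
frame_3 = df[df["frames"] == 3]

# iloc[1:2] keeps a one-row DataFrame; iloc[1] returns that row as a Series.
one_row_df = df.iloc[1:2]
one_row_series = df.iloc[1]
```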
And is there any way to connect the coordinates of a single row with a line?
What exactly do you mean by this? Do you want to connect the points (xsnout, ysnout) and (xMLA, yMLA)? If so, you can do it with this (matplotlib draws one line per column, so each row becomes one segment):
plt.plot([df['xsnout'], df['xMLA']], [df['ysnout'], df['yMLA']])
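To draw only one frame, a small sketch combining the two answers: pick the row by its "frames" value, scatter the two body points and join them with a segment. plot_frame is a hypothetical helper name, and the DataFrame holds the first three rows from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "frames": [0, 1, 2],
    "xsnout": [532.732971, 532.472351, 532.453552],
    "ysnout": [503.774200, 504.891632, 505.676300],
    "xMLA": [617.231018, 617.638550, 615.956116],
    "yMLA": [492.803711, 493.078583, 493.2839],
})

def plot_frame(df, frame, ax=None):
    """Hypothetical helper: draw snout and MLA for one frame joined by a line."""
    if ax is None:
        ax = plt.gca()
    row = df.loc[df["frames"] == frame].iloc[0]  # first matching row as a Series
    xs = [row["xsnout"], row["xMLA"]]
    ys = [row["ysnout"], row["yMLA"]]
    ax.plot(xs, ys, color="0.5", zorder=1)  # grey connecting segment
    ax.scatter(xs[:1], ys[:1], color="r", label="snout", zorder=2)
    ax.scatter(xs[1:], ys[1:], color="g", label="MLA", zorder=2)
    return ax

fig, ax = plt.subplots()
plot_frame(df, frame=1, ax=ax)
ax.legend()
```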

How to draw a continuous contour plot with discrete coordinate data (DataFrame form)?

The raw data has 3 columns and cannot form a uniform grid based on 'x' and 'z', so I am not able to plot the contour as in the existing question: Create Contour Plot from Pandas Groupby Dataframe.
The raw data is attached here (updated): https://drive.google.com/drive/folders/1nm84hJynYK0d6J-ToRT9oiZZFt942RnJ?usp=sharing
I attempted to divide the data into 2 groups by z value, but got a plot with blank areas:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_pickle('sample.pkl')
# split the data into two z ranges
zone1 = df1[(df1['z'].between(0,4.8))]
zone2 = df1[(df1['z'].between(4.8,30))]
# pandas 2.x requires keyword arguments for pivot; the remaining column becomes the values
piv1 = zone1.pivot(index='x', columns='z')
piv2 = zone2.pivot(index='x', columns='z')
fig = plt.figure(figsize=(20,10),dpi=300)
vmin= 5.2e-6
vmax= 9e-6
levels=np.linspace(vmin,vmax,50)
ax1= fig.add_subplot(1,1,1)
X1=piv1.columns.levels[1].values  # z values
Y1=piv1.index.values  # x values
Z1=piv1.values
Xi1,Yi1 = np.meshgrid(X1, Y1)
X2=piv2.columns.levels[1].values
Y2=piv2.index.values
Z2=piv2.values
Xi2,Yi2 = np.meshgrid(X2, Y2)
# x on the horizontal axis, z on the vertical axis
cs1 = ax1.contourf(Yi1, Xi1, Z1, levels=levels,vmax=vmax,vmin=vmin,alpha=0.9, cmap=plt.cm.jet)
cs2 = ax1.contourf(Yi2, Xi2, Z2, levels=levels,vmax=vmax,vmin=vmin,alpha=0.9, cmap=plt.cm.jet)
I have also attempted 2D interpolation but could not get scipy.interpolate.interp2d to work correctly.
How can I get a continuous contourf plot without blank areas when part of the data is missing?
Update:
When I don't divide the data and pivot it all at once, the plot looks like this:
The raw data has these characteristics:
OK, I think I understand the problem now, and I agree you can't just plot the whole thing at once.
I think the quickest fix that would look like you want would be to do some quick interpolation and then make the plot.
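# ax1, levels, vmin and vmax come from the question's code
# pivot the whole dataset at once; x/z combinations with no data become NaN
piv0 = df1.pivot(index='x', columns='z')
X0=piv0.columns.levels[1].values
Y0=piv0.index.values
Xi0,Yi0 = np.meshgrid(X0, Y0)
# fill the NaNs by linear interpolation down each column (i.e. along x)
Z0int = piv0.interpolate(method='linear',limit_direction='both')
cs0 = ax1.contourf(Yi0, Xi0, Z0int, levels=levels,vmax=vmax,vmin=vmin,alpha=0.9, cmap=plt.cm.jet)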
This interpolation is pretty crude (1D), but I think it gives the look you wanted. If you want something better, I would go to SciPy (as you suggested), do the interpolation there, and then plot. I think griddata would be a better fit than interp2d, which is deprecated in recent SciPy versions. There are some examples here: https://scipython.com/book/chapter-8-scipy/examples/two-dimensional-interpolation-with-scipyinterpolategriddata/
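A minimal griddata sketch, assuming the scattered samples are available as plain arrays x, z and v (the value column's name is not shown in the question, so the data here is synthetic). Linear interpolation leaves NaN outside the convex hull of the samples, so those cells are filled with nearest-neighbour values; that fill strategy is a choice made for this sketch, not part of the answer above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

def grid_scattered(x, z, v, nx=200, nz=200):
    """Interpolate scattered (x, z, v) samples onto a regular nz-by-nx grid."""
    xi = np.linspace(x.min(), x.max(), nx)
    zi = np.linspace(z.min(), z.max(), nz)
    Xi, Zi = np.meshgrid(xi, zi)  # both have shape (nz, nx)
    points = np.column_stack([x, z])
    # Linear interpolation inside the convex hull of the samples...
    Vi = griddata(points, v, (Xi, Zi), method="linear")
    # ...and nearest-neighbour values for the NaN cells left outside it.
    Vn = griddata(points, v, (Xi, Zi), method="nearest")
    Vi = np.where(np.isnan(Vi), Vn, Vi)
    return Xi, Zi, Vi

# Synthetic scattered samples standing in for the x, z and value columns of df1.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
z = rng.uniform(0, 30, 500)
v = 5.2e-6 + 3.8e-6 * np.exp(-((x - 5) ** 2 + (z - 15) ** 2) / 50)

Xi, Zi, Vi = grid_scattered(x, z, v)
fig, ax = plt.subplots(figsize=(10, 5))
cs = ax.contourf(Xi, Zi, Vi, levels=np.linspace(5.2e-6, 9e-6, 50), cmap=plt.cm.jet)
fig.colorbar(cs, ax=ax)
```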

How to plot the legend of a set of data with different color labels in Matplotlib

I have a 1:1 plot in which the dot colours differ based on a condition (A-F) that comes from a single DataFrame column.
df is a DataFrame with data for every minute; df60 is a DataFrame with data for every hour.
plt.figure()
colors = {'A':'green', 'B':'aqua', 'C':'blue','D':'black','E':'yellow','F':'red'}
x = df['Method1'].loc['2020-01-01 00:00':'2020-01-15 23:59'].resample('h').mean()
y = df['Method2'].loc['2020-01-01 00:00':'2020-01-15 23:59'].resample('h').mean()
plt.scatter(x, y, c=df60['Method1'].loc['2020-01-01 00:00':'2020-01-15 23:59'].map(colors))
plt.show()
I have tried to add a legend showing which colour corresponds to each of A-F. However, since the categories come from a single column, the legend does not show what I expect. Is there a way to show the legend properly without splitting the column into several columns?
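You can define the legend manually, for instance:
from matplotlib.lines import Line2D
# one proxy handle per category; linestyle="None" draws only the marker
handles=[Line2D([0],[0],label=k,marker="o",markerfacecolor=v,markeredgecolor=v,linestyle="None") for k,v in colors.items()]
plt.legend(handles=handles)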
This should produce:
I hope this helps. Not really sure if there is a more elegant solution, though...
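Another option that avoids proxy handles, sketched on made-up data: group the rows by the category column and call scatter once per group with label=, so legend() picks up the labels automatically. The "condition" column name and the values are hypothetical; in the question the categories come from df60.

```python
import pandas as pd
import matplotlib.pyplot as plt

colors = {'A': 'green', 'B': 'aqua', 'C': 'blue', 'D': 'black', 'E': 'yellow', 'F': 'red'}

# Hypothetical hourly data: two methods plus the condition (A-F) for each hour.
data = pd.DataFrame({
    "Method1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.5],
    "Method2": [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 2.4],
    "condition": ["A", "B", "C", "D", "E", "F", "A"],
})

fig, ax = plt.subplots()
# One scatter call per category (groupby sorts the keys), each with its own label.
for cond, group in data.groupby("condition"):
    ax.scatter(group["Method1"], group["Method2"], color=colors[cond], label=cond)
ax.legend(title="condition")
```

The column is never split; groupby only partitions the rows for plotting.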

Plotting an Obscure Python Graph

I've been trying to work through an interesting problem but have had difficulty finding a proper solution. I am trying to plot columns of heatmaps, where each column may have a different number of rows. My current data structure is a nested list, where each inner list contains the heat values for its points. I'll attach an image below to make this clear.
So far we have mostly tried to make matplotlib work, but we haven't been able to produce the results we want. Please let me know if you have any ideas on what steps we should take next.
Thanks
I think the basic strategy would be to transform your initial nested list into a rectangular array with the missing values coded as NaN. Then you can simply use imshow to display the heatmap.
I used np.pad to fill in the missing values:
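import numpy as np
import matplotlib.pyplot as plt
data = [[0,0,1],[1,0,5,6,0]]  # each inner list is one heatmap column
N_cols = len(data)
N_lines = np.max([len(a) for a in data])
# pad every column with NaN up to the length of the longest one
data2 = np.array([np.pad(np.array(a, dtype=float), (0,N_lines-len(a)), mode='constant', constant_values=np.nan) for a in data])
fig, ax = plt.subplots()
im = ax.imshow(data2.T, cmap='viridis')  # transpose so the inner lists are drawn as columns
plt.colorbar(im)
plt.show()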
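A variant of the same idea, as a sketch: build the NaN-filled array with np.full instead of np.pad, and give the NaN ("bad") cells an explicit colour with Colormap.set_bad so the missing rows are clearly distinguishable. The choice of "lightgrey" is arbitrary; by default NaN cells are transparent and show the axes background.

```python
import copy
import numpy as np
import matplotlib.pyplot as plt

data = [[0, 0, 1], [1, 0, 5, 6, 0]]  # one inner list per column, uneven lengths

# Start from an all-NaN array and copy each column's values into it.
n_rows = max(len(col) for col in data)
padded = np.full((len(data), n_rows), np.nan)
for i, col in enumerate(data):
    padded[i, :len(col)] = col

# Copy the colormap before modifying it, then choose a colour for NaN cells.
cmap = copy.copy(plt.get_cmap("viridis"))
cmap.set_bad("lightgrey")

fig, ax = plt.subplots()
im = ax.imshow(padded.T, cmap=cmap)  # transpose: inner lists become columns
fig.colorbar(im, ax=ax)
```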
