Plotting an Obscure Python Graph - python

I've been trying to work through an interesting problem but have had difficulty finding a proper solution. I am trying to plot columns of heatmaps, with each column potentially having a different number of rows. The data structure I currently have is a nested list, where each inner list contains the heat values for the points of one column. I'll attach an image below to make this clear.
So far we have mostly tried to get matplotlib to work, but we haven't been able to produce the results we want. Please let me know if you have any idea what steps we should take next.
Thanks

I think the basic strategy would be to transform your initial array so that you have a rectangular array with the missing values coded as NaN. Then you can simply use imshow to display the heatmap.
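I used np.pad to fill the missing values:
import numpy as np
import matplotlib.pyplot as plt
data = [[0,0,1],[1,0,5,6,0]]
N_cols = len(data)
N_lines = np.max([len(a) for a in data])
# pad each inner list with NaN up to the length of the longest one
data2 = np.array([np.pad(np.array(a, dtype=float), (0,N_lines-len(a)), mode='constant', constant_values=np.nan) for a in data])
fig, ax = plt.subplots()
# transpose so that each inner list is drawn as a column
im = ax.imshow(data2.T, cmap='viridis')
plt.colorbar(im)
plt.show()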
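The padding step can also be written as a small helper. This is only a sketch: pad_ragged is a name I made up, and it assumes every inner list holds plain numbers. Masking the NaN padding is optional, but it makes it explicit that those cells are not data.

```python
import numpy as np

def pad_ragged(columns):
    """Pad a list of unequal-length lists with NaN into a 2-D float array."""
    n_rows = max(len(col) for col in columns)
    padded = np.full((len(columns), n_rows), np.nan)
    for i, col in enumerate(columns):
        padded[i, :len(col)] = col  # the tail of shorter lists stays NaN
    return padded

data = [[0, 0, 1], [1, 0, 5, 6, 0]]
grid = pad_ragged(data)

# Mask the padding so a plotting call can leave those cells blank
masked = np.ma.masked_invalid(grid)
```

Passing masked.T to ax.imshow should then draw one column per inner list, with the padded cells left empty.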

Related

I have a large data set where the rows are a series of coordinates and need to plot specific rows

I have a very large dataset of coordinates that I need to plot, and I want to select specific rows in code instead of editing the raw Excel file.
The data is organized like this:
frames xsnout ysnout xMLA yMLA
0 532.732971 503.774200 617.231018 492.803711
1 532.472351 504.891632 617.638550 493.078583
2 532.453552 505.676300 615.956116 493.2839
3 532.356079 505.914642 616.226318 494.179047
4 532.360718 506.818054 615.836548 495.555298
The column "frames" is the specific video frame for each of these coordinates (xsnout,ysnout) (xMLA,yMLA). Below is my code which is able to plot all frames and all data points without specifying the row
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#import data
df = pd.read_excel("E:\\Clark\\Flow Tank\\Respirometry\\Cropped_videos\\F1\\MG\\F1_MG_4Hz_simplified.xlsx")
#different body points
ax1 = df.plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
How would I specify just a single row instead of plotting the whole dataset? And is there any way to connect the coordinates of a single row with a line?
Thank you, any help would be greatly appreciated.
How would I specify just a single row instead of plotting the whole dataset?
To do this you can slice your dataframe. There are many ways of doing this, and the right one depends on exactly what you're trying to do. For instance, you can use df.iloc[] to select rows by their integer position (iloc stands for integer location). Note the square brackets: it is an indexer, not a method call. If you want to select rows (or columns) by their index labels instead, use .loc[]. For example, with the data you provided:
Slicing the dataframe with iloc:
ax1 = df.iloc[2:5, :].plot(kind='scatter', x='xsnout', y='ysnout', color='r', label='snout')
ax2 = df.iloc[2:5, :].plot(kind='scatter', x='xMLA', y='yMLA', color='g', ax=ax1)
Gives you this:
If you specify something like this, you get only a single row:
df.iloc[1:2, :]
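To make the difference between position-based and label-based selection concrete, here is a small sketch using the first rows of the table from the question. Using "frames" as the index is my own assumption, not something the question does.

```python
import pandas as pd

# First rows of the table from the question, indexed by video frame
# (setting "frames" as the index is an assumption made for this sketch)
df = pd.DataFrame(
    {
        "frames": [0, 1, 2, 3, 4],
        "xsnout": [532.732971, 532.472351, 532.453552, 532.356079, 532.360718],
        "ysnout": [503.774200, 504.891632, 505.676300, 505.914642, 506.818054],
    }
).set_index("frames")

by_position = df.iloc[2:5]  # positions 2, 3, 4 (end excluded)
by_label = df.loc[2:4]      # labels 2 through 4 (end included)
single_row = df.loc[[3]]    # one-row DataFrame for frame 3
```

The double brackets in df.loc[[3]] keep the result a DataFrame, so it can still be passed to .plot(kind='scatter', ...).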
And is there any way to connect the coordinates of a single row with a line?
What exactly do you mean by this? You want to connect the points (xsnout, ysnout) with (xMLA, yMLA)? If that's so, then you can do it with this:
plt.plot([df['xsnout'], df['xMLA']], [df['ysnout'], df['yMLA']])
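The call above draws one segment per row of the dataframe. If only one frame is wanted, a sketch along these lines should work; the two rows are copied from the question, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Two rows copied from the table in the question
df = pd.DataFrame(
    {
        "xsnout": [532.732971, 532.472351],
        "ysnout": [503.774200, 504.891632],
        "xMLA": [617.231018, 617.638550],
        "yMLA": [492.803711, 493.078583],
    }
)

row = df.iloc[1]  # a single frame, selected by position

fig, ax = plt.subplots()
# One segment from the snout point to the MLA point of that frame
(line,) = ax.plot(
    [row["xsnout"], row["xMLA"]],
    [row["ysnout"], row["yMLA"]],
    marker="o",
    color="k",
)
```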

How do I create a count plot with multiple columns without the axes being stored in a numpy.ndarray?

I'm new to coding and this is my first post. Sorry if it could be worded better!
I'm taking a free online course, and for one of the projects I have to make a count plot with 2 subplot columns.
I've managed to make a count plot with multiple subplots using the code below, and all of the values are correct.
fig = sns.catplot(x = 'variable', hue = 'value', order = ['active', 'alco', 'cholesterol', 'gluc', 'overweight', 'smoke'], col='cardio', data = df_cat, kind = 'count')
But because of the way I've done it, the fig.axes is stored in a 2 dimensional array. The only difference between both rows of the array is the title (cardio = 0 or cardio = 1). I'm assuming this is because of the col='cardio'. Does the col argument always cause the fig.axes to be stored in a 2D array? Is there a way around this or do I have to completely change how I'm making my graph?
I'm sure it's not usually a problem, but because of this, when I run my program through the test module, it fails since some of the functions in the test module don't work on numpy.ndarrays.
I pass the test if I change the reference from fig.axes[0] to fig.axes[0,0], but obviously I can't just change the test module to pass.
I found something. This is just an implementation detail, so it would be nuts to rely on it. If you set col_wrap, then you get an axes ndarray of a different shape.
Reproduced like this:
import seaborn as sns
# I don't have your data but I have this example
tips = sns.load_dataset("tips")
fig = sns.catplot(x='day', hue='sex', col='time', data=tips, kind='count', col_wrap=2)
fig.axes.shape
The result has shape (2,), i.e. it is 1D. Tested with seaborn==0.11.2.
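A possible way around the array that does not depend on col_wrap, offered as a sketch: sns.catplot returns a seaborn FacetGrid rather than a matplotlib Figure, and the Figure behind the grid keeps its axes in a plain list. The small dataframe below is made-up stand-in data with the same column names as the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd
import seaborn as sns

# Made-up stand-in for df_cat (same column names as in the question)
df_cat = pd.DataFrame(
    {
        "cardio": [0, 0, 0, 0, 1, 1, 1, 1],
        "variable": ["active", "smoke", "active", "smoke"] * 2,
        "value": [0, 1, 1, 1, 0, 0, 1, 0],
    }
)

grid = sns.catplot(x="variable", hue="value", col="cardio",
                   data=df_cat, kind="count")
fig = grid.fig  # the matplotlib Figure behind the FacetGrid

first_ax = fig.axes[0]       # Figure.axes is a plain list
same_ax = grid.axes.flat[0]  # .flat works whatever the array's shape
```

If the test module expects a matplotlib Figure, returning grid.fig (also available as grid.figure in newer seaborn versions) instead of the FacetGrid may be all that is needed.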

3D structured array plotting having different sizes

I am basically trying to create a 3-D plot. I have a 3D array and it has different sizes. Let's exemplify it.
I can import my data, but I need code that gives me a 3D plot of it. I tried the following, but it fails with "shape mismatch: objects cannot be broadcast to a single shape".
It's a kind of structured data. This is how the data looks in MATLAB:
col = ['UN_DTE','UN_DTF','UN_SM05','UN_SM10','UN_SM15','D1_DTE', 'D1_DTF', 'D1_SM05', 'D1_SM10', 'D1_SM15', 'D2_DTE','D2_DTF','D2_SM05','D2_SM10','D2_SM15','D3_DTE','D3_DTF','D3_SM05','D3_SM10','D3_SM15']
df = []
for i in range(len(col)):
    df.append(pd.DataFrame(mat[col[i]]))
fig = plt.figure()
ax = plt.axes(projection='3d')
xup = pd.DataFrame(df[0][0][0]).shape[0]
yup = pd.DataFrame(df[0][0][0]).shape[1]
xline = np.arange(0,xup,1)
yline = np.arange(0,yup,1)
X_line, Y_line = np.meshgrid(xline, yline)
ax.plot_surface(X_line, Y_line, df[0][0][0])
plt.show()
If I modify the last row as ax.plot_surface(X_line, Y_line, df[0][0][0][0]) it says:
ValueError: Argument Z must be 2-dimensional.
I'd appreciate any tips on how to plot these correctly. In addition, I was wondering whether I can plot all the data of one input, say UN_DTE, in a single figure.
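One likely cause of the shape mismatch, assuming each block is a plain 2-D array: np.meshgrid defaults to indexing='xy', so X_line and Y_line come out with shape (yup, xup) while the Z data has shape (xup, yup). The second error appears because indexing one level deeper yields a 1-D array. The sketch below uses a random array as a stand-in for one block, since the real data is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Random stand-in for one 2-D block of the data (the real data is not shown)
Z = np.random.default_rng(0).random((4, 6))
n_rows, n_cols = Z.shape

# indexing="ij" gives X and Y the same shape as Z: (n_rows, n_cols)
X, Y = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(X, Y, Z)
```

Keeping the default meshgrid and plotting Z.T instead would be an equivalent fix.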

How to make a heatmap in python with aggregated/summarized data?

I'm trying to plot some X and Z coordinates on an image to show which parts of the image have higher counts. Y values are height in this case, so I am excluding them since I want a 2D plot. Since I have many millions of data points, I have grouped by the combinations of X and Z coordinates and counted how many times each combination occurred. The data should contain almost all combinations of X and Z coordinates. It looks something like this (fake data):
I have experimented with matplotlib.pyplot by using the plt.hist2d(x,y) function but it seems like this takes raw data and not already-summarized data like I've got.
Does anyone know if this is possible?
Note: I can figure out the plotting on an image part later, first I'm trying to get the scatter-plot/heatmap to show aggregated data.
I managed to figure this out. After loading in the data in the format of the original post, step one is pivoting the data so you have x values as columns and z values as rows. Then you plot it using seaborn heatmap. See below:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# raw is the dataframe loaded in the format of the original post
#pivot columns: Z_LOC values become rows, X_LOC values become columns
values = pd.pivot_table(raw, values='COUNT_TICKS', index=['Z_LOC'], columns=['X_LOC'], aggfunc='sum')
plt.figure(figsize=(20, 20))
sns.set(rc={'axes.facecolor':'cornflowerblue', 'figure.facecolor':'cornflowerblue'})
#ax = sns.heatmap(values, vmin=100, vmax=5000, cmap="Oranges", robust = True, xticklabels = x_labels, yticklabels = y_labels, alpha = 1)
ax = sns.heatmap(values,
                 #vmin=1,
                 vmax=1000,
                 cmap="Greens", #BrBG is also good
                 robust = True,
                 alpha = 1)
plt.show()
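For anyone who wants to try the pivot step without the original data, here is a minimal self-contained sketch. The five rows are made up, the column names follow the answer above, and I draw the grid with plain matplotlib pcolormesh instead of seaborn. Filling missing combinations with 0 is an assumption about what a missing count means.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up aggregated data: one row per (X_LOC, Z_LOC) combination
raw = pd.DataFrame(
    {
        "X_LOC": [0, 0, 1, 1, 2],
        "Z_LOC": [0, 1, 0, 1, 1],
        "COUNT_TICKS": [5, 2, 7, 1, 9],
    }
)

# Rows are Z values, columns are X values; absent combinations count as 0
grid = raw.pivot_table(values="COUNT_TICKS", index="Z_LOC",
                       columns="X_LOC", aggfunc="sum", fill_value=0)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(grid.columns.to_numpy(), grid.index.to_numpy(),
                     grid.to_numpy(), cmap="Greens", shading="nearest")
fig.colorbar(mesh, ax=ax, label="COUNT_TICKS")
```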

How to control the cell size of a pyplot pcolor heatmap?

I have a pair of lists of numbers representing points in a 2-D space, and I want to represent the y/x ratios for these points as a 1-dimensional heatmap, with a diverging color map centered around 1, or the logs of my ratios, with a diverging color map centered around 0.
How do I do that?
My current attempt (borrowing somewhat from Heatmap in matplotlib with pcolor?):
import numpy as np
import matplotlib.pyplot as plt
# There must be a better way to generate arrays of random values
x_values = [np.random.random() for _ in range(10)]
y_values = [np.random.random() for _ in range(10)]
labels = list("abcdefghij")
ratios = np.asarray(y_values) / np.asarray(x_values)
axis = plt.gca()
# I transpose the array to get the points arranged vertically
heatmap = axis.pcolor(np.log2([ratios]).T, cmap=plt.cm.PuOr)
# Put labels left of the colour cells
axis.set_yticks(np.arange(len(labels)) + 0.5, minor=False)
# (Not sure I get the label order correct...)
axis.set_yticklabels(labels)
# I don't want ticks on the x-axis: this has no meaning here
axis.set_xticks([])
plt.show()
Some points I'm not satisfied with:
The coloured cells I obtain are horizontally-elongated rectangles. I would like to control the width of these cells and obtain a column of cells.
I would like to add a legend for the color map. heatmap.colorbar = plt.colorbar() fails with RuntimeError: No mappable was found to use for colorbar creation. First define a mappable such as an image (with imshow) or a contour set (with contourf).
One important point:
matplotlib/pyplot always leaves me confused: there seems to be a lot of ways to do things and I get lost in the documentation. I never know what would be the "clean" way to do what I want: I welcome suggestions of reading material that would help me clarify my very approximative understanding of these things.
Just 2 more lines:
axis.set_aspect('equal') # X scale matches Y scale
plt.colorbar(mappable=heatmap) # Tells plt where it should find the color info.
I can't answer your final question very well. Part of the confusion comes from matplotlib offering two ways of doing things: the object-oriented Axes way (axis.do_something...) and the MATLAB-style pyplot way (plt.some_plot_method). Unfortunately we can't change that, and the pyplot interface does make it easier for people to migrate to matplotlib. As far as the "clean" way is concerned, I prefer whatever produces the shorter code. I guess that is in line with the Zen of Python: "Simple is better than complex" and "Readability counts".
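Putting the pieces together, a sketch of the whole thing in the Axes style might look like the following. Centring the colour map on 0 by passing symmetric vmin/vmax is my own addition, as is the seeded random generator, which also gives a shorter way to create the random arrays.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
x_values = rng.random(10)
y_values = rng.random(10)
labels = list("abcdefghij")

log_ratios = np.log2(y_values / x_values)
# Symmetric limits put 0 at the middle of the diverging colour map
limit = np.abs(log_ratios).max()

fig, ax = plt.subplots()
# A (10, 1) array gives one column of cells, first label at the bottom
heatmap = ax.pcolor(log_ratios[:, np.newaxis], cmap="PuOr",
                    vmin=-limit, vmax=limit)
ax.set_aspect("equal")  # square cells
ax.set_yticks(np.arange(len(labels)) + 0.5)
ax.set_yticklabels(labels)
ax.set_xticks([])
fig.colorbar(heatmap, ax=ax)
```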
