Concatenate np arrays in pandas dataframe and plot - python

I have the following dataframe:
df =
sample measurements
1 [0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]
2 [0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]
5 [0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]
7 [0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2]
The data type in df['measurements'] is a 1-D np.array. I'm trying to concatenate each np.array in the column "measurements" and plot it as a time series, but the issue is that the samples are discontinuous, and the interval between points is not consistent due to missing data. What is the best way to concatenate the arrays and plot them such that there is just a gap in the plot between samples 2 and 5, and between 5 and 7?

Depending on how you want to use the data, you can either convert the individual elements to new rows ("long form"), or create new columns ("wide form").
Convert to new rows
This is the preferred format for seaborn. explode() creates new rows from the array elements. Optionally, groupby() together with cumcount() can add a position.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'sample': [1, 2, 5, 7],
'measurements': [np.array([0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]),
np.array([0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]),
np.array([0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]),
np.array([0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2])]})
df1 = df.explode('measurements', ignore_index=True)
df1['position'] = df1.groupby('sample').cumcount() + 1
sns.lineplot(df1, x='sample', y='measurements', hue='position', palette='bright')
plt.show()
Convert to new columns
If all arrays have the same length, each element can be converted to a new column. This is how pandas usually prefers to organize its data. New columns are created by applying to_list to the original column.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'sample': [1, 2, 5, 7],
'measurements': [np.array([0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]),
np.array([0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]),
np.array([0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]),
np.array([0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2])]})
df2 = pd.DataFrame(df['measurements'].to_list(),
columns=[f'measurement{i + 1}' for i in range(7)],
index=df['sample'])
df2.plot()
plt.show()
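Neither layout above leaves a visible gap for the missing samples on a single concatenated line. A minimal sketch of one way to do that, assuming each sample s should occupy the interval [s, s + 1) on the x-axis (an assumption, since the question does not give the time spacing) and using a hypothetical helper named concatenate_with_gaps: a NaN is inserted wherever the sample numbers are not consecutive, because matplotlib breaks a line at NaN values.

```python
import numpy as np

samples = [1, 2, 5, 7]
measurements = [
    np.array([0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]),
    np.array([0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]),
    np.array([0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]),
    np.array([0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2]),
]


def concatenate_with_gaps(samples, measurements):
    """Return x and y arrays with a NaN between non-consecutive samples."""
    xs, ys = [], []
    previous = None
    for s, values in zip(samples, measurements):
        if previous is not None and s - previous > 1:
            # One or more samples are missing: a NaN breaks the line here.
            xs.append(np.array([np.nan]))
            ys.append(np.array([np.nan]))
        n = len(values)
        # Spread the n points of sample s evenly over [s, s + 1).
        xs.append(s + np.arange(n) / n)
        ys.append(np.asarray(values, dtype=float))
        previous = s
    return np.concatenate(xs), np.concatenate(ys)


x, y = concatenate_with_gaps(samples, measurements)
print(len(x), int(np.isnan(y).sum()))
```

Passing x and y to plt.plot(x, y) should then draw samples 1 and 2 as one connected line, with breaks before samples 5 and 7.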

Related

Selecting items on a matrix based on indexes given by an array

Consider this matrix:
[0.9, 0.45, 0.4, 0.35],
[0.4, 0.8, 0.3, 0.25],
[0.5, 0.45, 0.9, 0.35],
[0.2, 0.18, 0.8, 0.1],
[0.6, 0.45, 0.4, 0.9]
and this list:
[0,1,2,3,3]
I want to create a list that looks like the following:
[0.9, 0.8, 0.9, 0.1, 0.9]
To clarify, for each row, I want the element of the matrix whose column index is contained in the first array. How can I accomplish this?
Zip the two lists together as below
a=[[0.9, 0.45, 0.4, 0.35],[0.4, 0.8, 0.3, 0.25],[0.5, 0.45, 0.9, 0.35],[0.2, 0.18, 0.8, 0.1],[0.6, 0.45, 0.4, 0.9]]
b=[0,1,2,3,3]
[i[j] for i,j in zip(a,b)]
Result
[0.9, 0.8, 0.9, 0.1, 0.9]
This pairs each sublist of the matrix with the corresponding element of the second list, in order, using zip(a, b).
Then, for each pair (i, j), it picks the element at index j of sublist i.
If this is a numpy array, you can pass in two numpy arrays to access the desired indices:
import numpy as np
data = np.array([[0.9, 0.45, 0.4, 0.35],
[0.4, 0.8, 0.3, 0.25],
[0.5, 0.45, 0.9, 0.35],
[0.2, 0.18, 0.8, 0.1],
[0.6, 0.45, 0.4, 0.9]])
indices = np.array([0,1,2,3,3])
data[np.arange(data.shape[0]), indices]
This outputs:
[0.9 0.8 0.9 0.1 0.9]
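As an alternative sketch of the same per-row selection, np.take_along_axis can be used instead of pairing np.arange with the index array. The variable names picked and result are only illustrative.

```python
import numpy as np

data = np.array([[0.9, 0.45, 0.4, 0.35],
                 [0.4, 0.8, 0.3, 0.25],
                 [0.5, 0.45, 0.9, 0.35],
                 [0.2, 0.18, 0.8, 0.1],
                 [0.6, 0.45, 0.4, 0.9]])
indices = np.array([0, 1, 2, 3, 3])

# take_along_axis needs an index array with the same number of dimensions
# as data, so add a trailing axis: shape (5,) becomes (5, 1).
picked = np.take_along_axis(data, indices[:, None], axis=1)

# Drop the trailing axis again to get a flat array with one value per row.
result = picked.ravel()
print(result)
```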
In the first array [0, 1, 2, 3, 3], the row is determined by the index of each element, and the value at that index is the column. This is a good case for enumerate:
matrix = [[ ... ], [ ... ], ...] # your matrix
selections = [0, 1, 2, 3, 3]
result = [matrix[i][j] for i, j in enumerate(selections)]
This will be much more efficient than looping through the entire matrix.
Loop through both arrays together using the zip function.
def create_array_from_matrix(matrix, indices):
    # one index is needed per row
    if len(matrix) != len(indices):
        return None
    res = []
    for row, index in zip(matrix, indices):
        res.append(row[index])
    return res
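A short usage sketch of the zip-based approach on the question's data. The name select_per_row is illustrative, and raising a ValueError on mismatched lengths (instead of returning None) is an assumption made here, not part of the answer above.

```python
def select_per_row(matrix, indices):
    """Pick matrix[r][indices[r]] for every row r."""
    if len(matrix) != len(indices):
        raise ValueError("need exactly one column index per row")
    return [row[index] for row, index in zip(matrix, indices)]


matrix = [[0.9, 0.45, 0.4, 0.35],
          [0.4, 0.8, 0.3, 0.25],
          [0.5, 0.45, 0.9, 0.35],
          [0.2, 0.18, 0.8, 0.1],
          [0.6, 0.45, 0.4, 0.9]]
selections = [0, 1, 2, 3, 3]

result = select_per_row(matrix, selections)
print(result)  # [0.9, 0.8, 0.9, 0.1, 0.9]
```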

Hungarian algorithm in Python for non-square cost matrices

I want to use the Hungarian assignment algorithm in python on a non-square numpy array.
My input matrix X looks like this:
X = np.array([[0.26, 0.64, 0.16, 0.46, 0.5 , 0.63, 0.29],
[0.49, 0.12, 0.61, 0.28, 0.74, 0.54, 0.25],
[0.22, 0.44, 0.25, 0.76, 0.28, 0.49, 0.89],
[0.56, 0.13, 0.45, 0.6 , 0.53, 0.56, 0.05],
[0.66, 0.24, 0.61, 0.21, 0.47, 0.31, 0.35],
[0.4 , 0.85, 0.45, 0.14, 0.26, 0.29, 0.24]])
The desired result is X with its columns reordered so that it becomes X_desired_output:
X_desired_output = np.array([[0.63, 0.5 , 0.29, 0.46, 0.26, 0.64, 0.16],
[0.54, 0.74, 0.25, 0.28, 0.49, 0.12, 0.61],
[0.49, 0.28, 0.89, 0.76, 0.22, 0.44, 0.25],
[0.56, 0.53, 0.05, 0.6 , 0.56, 0.13, 0.45],
[0.31, 0.47, 0.35, 0.21, 0.66, 0.24, 0.61],
[0.29, 0.26, 0.24, 0.14, 0.4 , 0.85, 0.45]])
Here I would like to maximize the cost rather than minimize it, so the input to the algorithm would in theory be either 1-X or simply -X.
I have found https://software.clapper.org/munkres/ that leads to:
from munkres import Munkres
m = Munkres()
indices = m.compute(-X)
indices
[(0, 5), (1, 4), (2, 6), (3, 3), (4, 0), (5, 1)]
# getting the indices in list format
ii = [i for (i,j) in indices]
jj = [j for (i,j) in indices]
How can I use these to sort X? jj only contains 6 elements, as opposed to the original 7 columns of X.
I am looking to actually get the matrix sorted.
After spending some hours working on it, I found a solution. The problem is that X.shape[1] > X.shape[0], so some columns are not assigned to any row at all.
The documentation states that
"The Munkres algorithm assumes that the cost matrix is square.
However, it’s possible to use a rectangular matrix if you first pad it
with 0 values to make it square. This module automatically pads
rectangular cost matrices to make them square."
from munkres import Munkres
m = Munkres()
indices = m.compute(-X)
indices
[(0, 5), (1, 4), (2, 6), (3, 3), (4, 0), (5, 1)]
# getting the indices in list format
ii = [i for (i,j) in indices]
jj = [j for (i,j) in indices]
# re-order matrix
X_=X[:,jj] # re-order columns
X_=X_[ii,:] # re-order rows
# HERE IS THE TRICK: since X is not square, some columns are not assigned to any row!
not_assigned_columns = X[:, [not_assigned for not_assigned in np.arange(X.shape[1]).tolist() if not_assigned not in jj]].reshape(-1,1)
X_desired = np.concatenate((X_, not_assigned_columns), axis=1)
print(X_desired)
array([[0.63, 0.5 , 0.29, 0.46, 0.26, 0.64, 0.16],
[0.54, 0.74, 0.25, 0.28, 0.49, 0.12, 0.61],
[0.49, 0.28, 0.89, 0.76, 0.22, 0.44, 0.25],
[0.56, 0.53, 0.05, 0.6 , 0.56, 0.13, 0.45],
[0.31, 0.47, 0.35, 0.21, 0.66, 0.24, 0.61],
[0.29, 0.26, 0.24, 0.14, 0.4 , 0.85, 0.45]])
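The reordering step can also be sketched as a small helper that works for any number of unassigned columns. The function name reorder_by_assignment is hypothetical, and the assignment pairs are copied from the munkres output shown above rather than recomputed.

```python
import numpy as np

X = np.array([[0.26, 0.64, 0.16, 0.46, 0.5, 0.63, 0.29],
              [0.49, 0.12, 0.61, 0.28, 0.74, 0.54, 0.25],
              [0.22, 0.44, 0.25, 0.76, 0.28, 0.49, 0.89],
              [0.56, 0.13, 0.45, 0.6, 0.53, 0.56, 0.05],
              [0.66, 0.24, 0.61, 0.21, 0.47, 0.31, 0.35],
              [0.4, 0.85, 0.45, 0.14, 0.26, 0.29, 0.24]])

# (row, column) pairs as returned by m.compute(-X) in the answer above.
pairs = [(0, 5), (1, 4), (2, 6), (3, 3), (4, 0), (5, 1)]


def reorder_by_assignment(X, pairs):
    """Put assigned columns first (in row order), unassigned columns last."""
    rows = [i for i, _ in pairs]
    cols = [j for _, j in pairs]
    unassigned = [j for j in range(X.shape[1]) if j not in cols]
    # np.ix_ builds an open mesh so rows and columns are selected together.
    return X[np.ix_(rows, cols + unassigned)]


X_desired = reorder_by_assignment(X, pairs)
print(X_desired)
```

Because the leftover columns are selected by index rather than reshaped, the same sketch should also work when more than one column is left unassigned.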

Creating a color map in python from a list of RGB colors

I want to create a color map in python similar to this image:
but my map only consists of 3 rows and 4 columns. And I want to assign a certain color value to each square using RGB values. I have tried this code
colors=np.array([[0.01, 0.08, 0.01], [0.01, 0.16, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
[0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
[0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01]])
fig, ax=plt.subplots()
ax.imshow(colors)
ax.set_aspect('equal')
plt.show()
but the output does not match my expectations. It seems that with this method I cannot use the RGB values to represent the color for a square. Can anyone help me, please? Thank you!
You have a (12,3) colors array, while you need a (4, 3, 3) image, one RGB color per pixel.
import numpy as np # type: ignore
import matplotlib.pyplot as plt # type: ignore
colors = np.array(
[
# this is the first row
[
# these are the 3 pixels in the first row
[0.01, 0.08, 0.01],
[0.01, 0.16, 0.01],
[0.01, 0.165, 0.01],
],
[
[0.01, 0.3, 0.01],
[0.01, 0.2, 0.01],
[0.666, 0.333, 0.01],
],
[
[0.01, 0.165, 0.01],
[0.01, 0.3, 0.01],
[0.01, 0.2, 0.01],
],
# this is the fourth row
[
[0.666, 0.333, 0.01],
[0.01, 0.165, 0.01],
[0.01, 0.3, 0.01],
],
]
)
print(colors.shape)
fig, ax = plt.subplots()
ax.imshow(colors)
ax.set_aspect("equal")
plt.show()
Rearrange the data as you need in rows/columns.
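If the flat (12, 3) array from the question is already available, the rearranging can be sketched with reshape instead of retyping the nested list. The 3 rows by 4 columns layout is taken from the question; the variable name grid is illustrative.

```python
import numpy as np

colors = np.array([[0.01, 0.08, 0.01], [0.01, 0.16, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
                   [0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01],
                   [0.01, 0.2, 0.01], [0.666, 0.333, 0.01], [0.01, 0.165, 0.01], [0.01, 0.3, 0.01]])

# Reshape to (rows, columns, RGB): 3 rows of 4 squares, filled row by row.
grid = colors.reshape(3, 4, 3)
print(grid.shape)  # (3, 4, 3)
```

The result can be passed to ax.imshow(grid) in place of the original colors array.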

How do I grab random elements on python from paired lists?

I tried to compare drop height versus rebound height and have some data here:
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
I want to select 5 random data points off of these variables, so I tried
smol_drop_heights = []
smol_rebound_heights = []
for each in range(0,5):
    smol_drop_heights.append(drop_heights[randint(0, 9)])
    smol_rebound_heights.append(rebound_heights[randint(0, 9)])
print(smol_drop_heights)
print(smol_rebound_heights)
When they print, they print different sets of data, and sometimes even repeat data, how do I fix this?
[0.8, 1.6, 0.6, 0.2, 0.12]
[1.02, 1.15, 0.88, 0.88, 0.6]
Here is a sample output, where you can see .88 is repeated.
A simple way to avoid repetitions and keep the data points paired is to sort the pairs in random order and take the first five:
from random import random
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
pairs = list(sorted(zip(drop_heights, rebound_heights), key=lambda _: random()))[:5]
smol_drop_heights = [d for d, _ in pairs]
smol_rebound_heights = [r for _, r in pairs]
One way to do it would be:
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
indices = [*range(len(drop_heights))]
from random import shuffle
shuffle(indices)
smol_drop_heights = []
smol_rebound_heights = []
for each in indices:
    smol_drop_heights.append(drop_heights[each])
    smol_rebound_heights.append(rebound_heights[each])
print(smol_drop_heights)
print(smol_rebound_heights)
Output:
[1.7, 0.8, 1.6, 1.2, 0.2, 0.4, 1.4, 2.0, 1.0, 0.6]
[1.34, 0.6, 1.15, 0.88, 0.16, 0.3, 1.02, 1.51, 0.74, 0.46]
Or, much shorter:
from random import sample
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
paired = [*zip(drop_heights, rebound_heights)]
smol_drop_heights, smol_rebound_heights = zip(*sample(paired,5))
print(smol_drop_heights[:5])
print(smol_rebound_heights[:5])
Here's what I would do.
import random
import numpy as np
k=5
drop_heights = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0])
rebound_heights = np.array([0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51])
idx = random.sample(range(len(drop_heights)), k)
print(drop_heights[idx])
print(rebound_heights[idx])
You could try shuffling and then use the index of the original items like,
>>> drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
>>> rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
>>>
>>> import random
>>> d = drop_heights[:] # keep a copy to get index for making pairs later
>>> random.shuffle(drop_heights)
>>> # iterate through the new list and get the index of the item
>>> # from the original lists
>>> nd, nr = zip(*[(x,rebound_heights[d.index(x)]) for x in drop_heights])
>>> nd[:5]
(1.4, 0.6, 1.7, 0.2, 1.0)
>>> nr[:5]
(1.02, 0.46, 1.34, 0.16, 0.74)
or just use operator.itemgetter and random.sample like,
>>> drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
>>> rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
>>>
>>> import random, operator
>>> indexes = random.sample(range(len(drop_heights)), 5)
>>> indexes
[5, 0, 4, 7, 3]
>>> f = operator.itemgetter(*indexes)
>>> f(drop_heights)
(1.2, 0.2, 1.0, 1.6, 0.8)
>>> f(rebound_heights)
(0.88, 0.16, 0.74, 1.15, 0.6)
Your problem is that every call to randint returns a different random number, so the two lists are indexed at different positions. To solve this, store one random index in a variable on each pass through the loop and use that same index for both lists.
for each in range(0, 5):
    index = randint(0, 9)  # one index, used for both lists
    smol_drop_heights.append(drop_heights[index])
    smol_rebound_heights.append(rebound_heights[index])
print(smol_drop_heights)
print(smol_rebound_heights)
To avoid repeats, check whether the lists already contain the value you are about to add. Either list can be used for the check, since neither contains duplicates. Because a draw may have to be rejected, a fixed-length for loop is not sufficient; you have to keep looping until the lists are full.
So my final solution is:
while True:
    index = randint(0, 9)
    # skip indices whose value was already picked
    if drop_heights[index] not in smol_drop_heights:
        smol_drop_heights.append(drop_heights[index])
        smol_rebound_heights.append(rebound_heights[index])
    if len(smol_drop_heights) == 5:
        break
print(smol_drop_heights)
print(smol_rebound_heights)
And since you may want to arrange those values in order, you may do this:
smol_drop_heights = []
smol_rebound_heights = []
while True:
    index = randint(0, 9)
    if drop_heights[index] not in smol_drop_heights:
        smol_drop_heights.append(drop_heights[index])
        smol_rebound_heights.append(rebound_heights[index])
    if len(smol_drop_heights) == 5:
        # sorting each list separately keeps the pairs aligned only
        # because both source lists increase together
        smol_drop_heights.sort()
        smol_rebound_heights.sort()
        break
print(smol_drop_heights)
print(smol_rebound_heights)
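Sorting the two lists independently, as in the snippet above, keeps the pairs aligned only because both lists happen to increase together. A minimal sketch that does not rely on that, assuming the goal is five pairs ordered by drop height, sorts the pairs themselves:

```python
import random

drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]

# Sample five (drop, rebound) pairs without repetition.
pairs = random.sample(list(zip(drop_heights, rebound_heights)), 5)

# Tuples sort by their first element, so each rebound height
# stays attached to its drop height.
pairs.sort()

smol_drop_heights = [d for d, _ in pairs]
smol_rebound_heights = [r for _, r in pairs]
print(smol_drop_heights)
print(smol_rebound_heights)
```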
OK, you want to do two things. First, pair your lists. The idiomatic way to do this is to use zip:
drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]
paired = list(zip(drop_heights, rebound_heights))
Then, you want to sample five pairs from this. So use random.sample:
sampled = random.sample(paired, 5)
Finally, if you need them to be in separate lists (you probably don't, but if you must), you can unpack it like this:
smol_drop_heights, smol_rebound_heights = zip(*sampled)
You can actually do this all at once, although it might become a bit unreadable:
smol_drop_heights, smol_rebound_heights = zip(*random.sample(list(zip(drop_heights, rebound_heights)), 5))
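When the same five points are needed on every run (for example while debugging), the sampling can be wrapped in a small helper with its own seeded generator. This is only a sketch: the name sample_pairs and the seed parameter are illustrative and not part of any answer above.

```python
import random

drop_heights = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.7, 2.0]
rebound_heights = [0.16, 0.30, 0.46, 0.6, 0.74, 0.88, 1.02, 1.15, 1.34, 1.51]


def sample_pairs(xs, ys, k, seed=None):
    """Return k matching elements of xs and ys, without repetition."""
    rng = random.Random(seed)  # private generator; seed=None means unseeded
    indices = rng.sample(range(len(xs)), k)
    # The same indices are used for both lists, so pairs stay together.
    return [xs[i] for i in indices], [ys[i] for i in indices]


smol_drop_heights, smol_rebound_heights = sample_pairs(
    drop_heights, rebound_heights, 5, seed=0)
print(smol_drop_heights)
print(smol_rebound_heights)
```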

How to share axes after adding subplots via add_subplot?

I have a dataframe like this:
df = pd.DataFrame({'A': [0.3, 0.2, 0.5, 0.2], 'B': [0.1, 0.0, 0.3, 0.1], 'C': [0.2, 0.5, 0.0, 0.7], 'D': [0.6, 0.3, 0.4, 0.6]}, index=list('abcd'))
A B C D
a 0.3 0.1 0.2 0.6
b 0.2 0.0 0.5 0.3
c 0.5 0.3 0.0 0.4
d 0.2 0.1 0.7 0.6
Now I want to plot each row as a bar plot, with the y-axis and the x-tick labels shared, using add_subplot.
Until now, I can only produce a plot that looks like this:
There is one problem:
The axes are not shared. How does one do this after using add_subplot? Here, this problem is solved by creating one huge subplot; is there any way to do this in a different manner?
My desired outcome looks like the plot above, with the only difference that there are no x-tick labels in the upper row and no y-tick labels in the right column.
My current attempt is the following:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({'A': [0.3, 0.2, 0.5, 0.2], 'B': [0.1, 0.0, 0.3, 0.1], 'C': [0.2, 0.5, 0.0, 0.7], 'D': [0.6, 0.3, 0.4, 0.6]}, index=list('abcd'))
fig = plt.figure()
bar_width = 0.35
counter = 1
index = np.arange(df.shape[0])
for indi, rowi in df.iterrows():
    ax = fig.add_subplot(2, 2, counter)
    ax.bar(index, rowi.values, width=bar_width, tick_label=df.columns)
    ax.set_ylim([0., 1.])
    ax.set_title(indi, fontsize=20)
    ax.set_xticks(index + bar_width / 2)
    counter += 1
plt.xticks(index + bar_width / 2, df.columns)
How to produce shared subplots in matplotlib is already covered by the SO search engine results and by the matplotlib recipes and examples pages.
What may be more interesting here, is that you could also directly use pandas to create the plot in a single line:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A': [0.3, 0.2, 0.5, 0.2], 'B': [0.1, 0.0, 0.3, 0.1], 'C': [0.2, 0.5, 0.0, 0.7], 'D': [0.6, 0.3, 0.4, 0.6]}, index=list('abcd'))
df.plot(kind="bar", subplots=True, layout=(2,2), sharey=True, sharex=True)
plt.show()
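For completeness, a sketch that stays with add_subplot as the question asks: each call accepts sharex and sharey keyword arguments, so every subplot after the first can share its axes with the first one, and label_outer() then hides the inner tick labels. The list name axes is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [0.3, 0.2, 0.5, 0.2], 'B': [0.1, 0.0, 0.3, 0.1],
                   'C': [0.2, 0.5, 0.0, 0.7], 'D': [0.6, 0.3, 0.4, 0.6]},
                  index=list('abcd'))

fig = plt.figure()
axes = []
for counter, (label, row) in enumerate(df.iterrows(), start=1):
    # Every subplot after the first shares both axes with the first one.
    share = axes[0] if axes else None
    ax = fig.add_subplot(2, 2, counter, sharex=share, sharey=share)
    ax.bar(list(df.columns), row.values, width=0.35)
    ax.set_title(label, fontsize=20)
    axes.append(ax)

# Because y is shared, setting the limits once applies to all subplots.
axes[0].set_ylim(0.0, 1.0)

# Keep tick labels only on the left column and the bottom row.
for ax in axes:
    ax.label_outer()
```

Calling plt.show() afterwards displays the figure.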
