I am trying to create a histogram for each rolling window across a DataFrame. The pandas rolling method (df.WaveData.rolling(14).mean()) can be used to calculate a sum or average, but how can it be used to plot a histogram of the data in each window?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = np.linspace(-10, 10, 1000)
y = np.sin(x)
plt.plot(x, y)
plt.show()
df = pd.DataFrame(y, columns=['WaveData'])
print(df)
print(df.WaveData.rolling(14).mean())
**Ideal**:
for data in window:
    histogram(data_in_window)
    n, edges = np.histogram(data, bins=25)
Here you go:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# generate random dataframe
df = pd.DataFrame(np.random.randint(0,1000,size=(1000, 4)), columns=list('ABCD'))
window_size = 100
# one histogram per full window; stop before the window runs past the end
for i in range(len(df) - window_size + 1):
    window = df.A.values[i:i+window_size]
    n, bins, patches = plt.hist(window, 25)
    plt.show()
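If only the bin counts are needed rather than one figure per window, the windows can be built in one step. This is a minimal sketch, assuming NumPy 1.20 or newer (for `sliding_window_view`) and bin edges shared across all windows; the helper name `rolling_histograms` is hypothetical.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_histograms(series, window_size, bins=25):
    """Return (counts, edges); counts[i] is the histogram of window i."""
    values = series.to_numpy()
    # Shared edges make the histograms comparable between windows.
    edges = np.histogram_bin_edges(values, bins=bins)
    # Shape (n_windows, window_size); only full windows are produced.
    windows = sliding_window_view(values, window_size)
    counts = np.stack([np.histogram(w, bins=edges)[0] for w in windows])
    return counts, edges


x = np.linspace(-10, 10, 1000)
df = pd.DataFrame({"WaveData": np.sin(x)})
counts, edges = rolling_histograms(df["WaveData"], window_size=14)
print(counts.shape)  # (987, 25)
```

Row `i` of `counts` can then be drawn with `plt.bar(edges[:-1], counts[i], width=np.diff(edges), align="edge")`.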
I have a DataFrame that I plotted, as you can see in the figure and code below:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
df = pd.read_excel('nötronn.xlsx')
fig, ax = plt.subplots(figsize=(20,40))
ax1 = plt.subplot2grid((1,5), (0,0), rowspan=1, colspan = 1)
ax1.plot(df["N/F*10"], df['Depth'], color = "green", linewidth = 0.5)
ax1.set_xlabel("Porosity")
ax1.xaxis.label.set_color("green")
ax1.set_xlim(10, 50)
ax1.set_ylabel("Depth (m)")
ax1.tick_params(axis='x', colors="green")
ax1.spines["top"].set_edgecolor("green")
ax1.title.set_color('green')
ax1.set_xticks([10, 20, 30, 40, 50])
I want to filter (smooth) the data so that I can see the differences more clearly. I tried this:
z = np.polyfit(df["N/F*10"], df['Depth'], 2)
p = np.poly1d(z)
plt.plot(df["N/F*10"], p(df["N/F*10"]))
But it gives: LinAlgError: SVD did not converge in Linear Least Squares
How can I solve it? Thanks.
Expected output:
This works!
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess
data = pd.read_excel('nötronn.xlsx')
sub_data = data[data['Depth'] > 21.5]
result = lowess(sub_data['Eksi'], sub_data['Depth'].values)
x_smooth = result[:,0]
y_smooth = result[:,1]
tot_result = lowess(data['Eksi'], data['Depth'].values, frac=0.01)
x_tot_smooth = tot_result[:,0]
y_tot_smooth = tot_result[:,1]
fig, ax = plt.subplots(figsize=(20, 8))
##ax.plot(data.depth.values, data['N/F*10'], label="raw")
ax.plot(x_tot_smooth, y_tot_smooth, label="lowess 1%", linewidth=3, color="g")
ax.plot(data['GR-V121B-ETi'])
ax.plot(data['Caliper'], linestyle = 'dashed')
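As background on the error in the question: `np.polyfit` commonly raises `LinAlgError: SVD did not converge` when the input contains NaN or infinite values, which is easy to get from spreadsheet data with empty cells. Whether that is the cause here is an assumption, since the file is not shown. A minimal sketch, with synthetic data standing in for the spreadsheet columns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for df["N/F*10"] and df["Depth"] (not the real data).
x = np.linspace(10, 50, 200)
y = 0.02 * x**2 - 0.5 * x + 3 + rng.normal(scale=0.1, size=x.size)
y[::25] = np.nan  # simulate empty spreadsheet cells

# Keep only the rows where both values are finite before fitting.
mask = np.isfinite(x) & np.isfinite(y)
coeffs = np.polyfit(x[mask], y[mask], 2)
p = np.poly1d(coeffs)

print(coeffs)
```

With a DataFrame, `df[["N/F*10", "Depth"]].dropna()` does the same filtering before the columns are passed to `np.polyfit`.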
I want to create a heatmap out of three one-dimensional arrays. Something that looks like this:
Up to this point, I was only able to create a scatter plot where the markers have a different color and marker size depending on the third value:
My code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
xf = np.random.rand(1000)
yf = np.random.rand(1000)
zf = 1e5*np.random.rand(1000)
ms1 = (zf).astype('int')
# Remove the middle 20% of the RdBu_r colormap (keep 0-0.4 and 0.6-1)
interval = np.hstack([np.linspace(0, 0.4), np.linspace(0.6, 1)])
colors = plt.cm.RdBu_r(interval)
cmap = LinearSegmentedColormap.from_list('name', colors)
col = cmap(np.linspace(0,1,len(ms1)))
#for i in range(len(ms1)):
plt.scatter(xf, yf, c=zf, s=5*ms1/1e4, cmap=cmap,alpha=0.8)#, norm =matplotlib.colors.LogNorm())
cbar = plt.colorbar(pad=0.01)
is giving me this result:
Any idea how I could make it look like the first figure?
Essentially, what I want to do is find the average of the z values for groups (bins) of the x and y arrays.
I think the functionality you are looking for is provided by scipy.stats.binned_statistic_2d. You can use it to organize values of xf and yf arrays into 2-dimensional bins, and compute the mean of zf values in each bin:
import numpy as np
from scipy import stats
np.random.seed(0)
xf = np.random.rand(1000)
yf = np.random.rand(1000)
zf = 1e5 * np.random.rand(1000)
means = stats.binned_statistic_2d(xf,
                                  yf,
                                  values=zf,
                                  statistic='mean',
                                  bins=(5, 5))[0]
Then you can use e.g. seaborn to plot a heatmap of the array of mean values:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(means,
            cmap="Reds_r",
            annot=True,
            annot_kws={"fontsize": 16},
            cbar=True,
            linewidth=2,
            square=True)
plt.show()
This gives:
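One detail to keep in mind: the array returned by `binned_statistic_2d` has x along its first axis, while `sns.heatmap` draws the first axis as rows from top to bottom, so the picture is rotated relative to the scatter plot. The sketch below shows one way to keep the original x/y orientation, using the bin edges returned by the same call and matplotlib's `pcolormesh` (swapped in for seaborn here).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
xf = rng.random(1000)
yf = rng.random(1000)
zf = 1e5 * rng.random(1000)

# statistic has shape (n_x_bins, n_y_bins); the edges have one more entry each.
means, x_edges, y_edges, _ = stats.binned_statistic_2d(
    xf, yf, values=zf, statistic="mean", bins=(5, 5)
)

fig, ax = plt.subplots(figsize=(6, 5))
# pcolormesh expects C[row=y, col=x], hence the transpose.
mesh = ax.pcolormesh(x_edges, y_edges, means.T, cmap="Reds_r")
fig.colorbar(mesh, ax=ax, label="mean of zf")
ax.set_xlabel("xf")
ax.set_ylabel("yf")

if __name__ == "__main__":
    plt.show()
```

Because the bin edges are passed explicitly, the axes show the real xf and yf ranges instead of bin indices.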
I want to arrange 5 histograms in a grid. Here is my code and the result:
I was able to create the graphs, but the difficulty is arranging them in a grid. I created a GridSpec for that, but I still need to link each graph to its place in the grid.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
Openness = df['O']
Conscientiousness = df['C']
Extraversion = df['E']
Agreeableness = df['A']
Neuroticism = df['N']
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)
# Plot 1
plt.hist(df['O'], bins = 100)
plt.title("Openness to experience")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 2
plt.hist(df['C'], bins = 100)
plt.title("Conscientiousness")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 3
plt.hist(df['E'], bins = 100)
plt.title("Extraversion")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 4
plt.hist(df['A'], bins = 100)
plt.title("Agreeableness")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 5
plt.hist(df['N'], bins = 100)
plt.title("Neuroticism")
plt.xlabel("Value")
plt.ylabel("Frequency")
The result merges everything into one chart.
But it should look like this:
Could you guys please help me out?
You can use plt.subplots:
fig, axes = plt.subplots(nrows=2, ncols=2)
This creates a 2x2 grid. You can access individual positions by indexing the axes object:
Top left:
ax = axes[0,0]
ax.hist(df['C'], bins = 100)
ax.set_title("Conscientiousness")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
and so on.
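For all five histograms, a 2x3 grid is needed, which leaves one cell unused. Below is a minimal sketch that loops over the columns and hides the spare axes; the random `df` is a placeholder for the real data, which the question does not show.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data; the real df with columns O, C, E, A, N is assumed.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 5)), columns=list("OCEAN"))

titles = {
    "O": "Openness to experience",
    "C": "Conscientiousness",
    "E": "Extraversion",
    "A": "Agreeableness",
    "N": "Neuroticism",
}

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 7))
flat_axes = axes.ravel()  # 2D array of Axes -> flat sequence

for ax, (col, title) in zip(flat_axes, titles.items()):
    ax.hist(df[col], bins=100)
    ax.set_title(title)
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")

# Hide the cells that did not receive a histogram.
for ax in flat_axes[len(titles):]:
    ax.set_visible(False)

fig.tight_layout()

if __name__ == "__main__":
    plt.show()
```

The key point is that every call goes through an `Axes` object (`ax.hist`, `ax.set_title`) rather than `plt.hist`, which always draws on the current axes.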
You can also continue to use GridSpec. See https://matplotlib.org/stable/tutorials/intermediate/gridspec.html
For example:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig2 = plt.figure(constrained_layout=True)
spec2 = gridspec.GridSpec(ncols=2, nrows=3, figure=fig2)
f2_ax1 = fig2.add_subplot(spec2[0, 0])
f2_ax2 = fig2.add_subplot(spec2[0, 1])
f2_ax3 = fig2.add_subplot(spec2[1, 0])
f2_ax4 = fig2.add_subplot(spec2[1, 1])
f2_ax5 = fig2.add_subplot(spec2[2, 1])
# Plot 1
f2_ax1.hist(df['O'])
f2_ax1.set_title("Openness to experience")
f2_ax1.set_xlabel("Value")
f2_ax1.set_ylabel("Frequency")
plt.show()
I want to make a bar chart with values binned every 10 x units. Here is my bins array: bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]. I do not want to use a histogram because I have specific y values that I want to use, not just the frequency of the binned values. I have a pandas DataFrame with two columns: yardline_100 (the values being binned; they always fall between 0 and 100) and epa. I want yardline_100 on the x-axis and epa on the y-axis. How do I do this? plt.hist() only takes one argument for data, and I can't figure out how to make plt.bar() work with binned values. Any advice?
IIUC, do you want something like this?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'yardline_100':np.random.randint(0,100,200), 'epa':np.random.random(200)})
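# include_lowest=True keeps a yardline of exactly 0 in the first bin instead of NaN
df['bin'] = pd.cut(df['yardline_100'], bins=range(0,101,10), labels=[f'{l}-{l+10}' for l in range(0,91,10)], include_lowest=True)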
fig,ax = plt.subplots(2,2, figsize=(15,8))
ax=ax.flatten()
sns.stripplot(x='bin', y='epa', data=df, ax=ax[0])
sns.violinplot(x='bin', y='epa', data=df, ax=ax[1])
sns.boxplot(x='bin', y='epa', data=df, ax=ax[2])
sns.barplot(x='bin', y='epa', data=df, ax=ax[3])
Output:
To change the bar width:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'yardline_100':np.random.randint(0,100,200), 'epa':np.random.random(200)})
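# include_lowest=True keeps a yardline of exactly 0 in the first bin instead of NaN
df['bin'] = pd.cut(df['yardline_100'], bins=range(0,101,10), labels=[f'{l}-{l+10}' for l in range(0,91,10)], include_lowest=True)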
fig,ax = plt.subplots(figsize=(15,8))
sns.barplot(x='bin', y='epa', data=df, ax=ax)
def change_width(ax, new_value):
    for patch in ax.patches:
        current_width = patch.get_width()
        diff = current_width - new_value
        # we change the bar width
        patch.set_width(new_value)
        # we recenter the bar
        patch.set_x(patch.get_x() + diff * .5)
change_width(ax, 1.)
Output:
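If plain `plt.bar()` is preferred, as the question asks, the binned values can be aggregated first. This sketch assumes the bar height should be the mean `epa` per bin (the question does not say which aggregate is wanted) and uses random placeholder data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "yardline_100": rng.integers(0, 101, 200),  # placeholder values in 0..100
    "epa": rng.random(200),
})

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# include_lowest=True puts a yardline of exactly 0 into the first bin.
df["bin"] = pd.cut(df["yardline_100"], bins=bins, include_lowest=True)

# Aggregate epa per bin; swap .mean() for .sum() or .median() as needed.
mean_epa = df.groupby("bin", observed=False)["epa"].mean()

fig, ax = plt.subplots(figsize=(10, 5))
# Place each bar at its bin's left edge and make it as wide as the bin.
ax.bar(bins[:-1], mean_epa.to_numpy(), width=np.diff(bins), align="edge",
       edgecolor="black")
ax.set_xticks(bins)
ax.set_xlabel("yardline_100")
ax.set_ylabel("mean epa")

if __name__ == "__main__":
    plt.show()
```

Positioning the bars by their left edges keeps the x-axis numeric, so the tick marks line up with the bin boundaries.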
I created a DataFrame with random columns and values. Now I am trying to iterate over a "time window" with a loop (maybe there is a more elegant solution than mine). I want to plot the calculated correlations in a heatmap, then iterate further and show the next result in the same figure. Like this:
https://datasoaring.blogspot.com/2018/07/gdp-correlation-matrix-top-10-economies.html
The current code plots a new figure for each correlation...
Thanks for ideas and help!
Creates Dataframe
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import time
import seaborn as sns
sns.set_style('white')
plt.style.use('dark_background')
num_days = 250  # must match the number of rows generated below
index = pd.date_range('01/01/2010',periods=num_days, freq='D')
data_KW = pd.DataFrame(np.random.randint(0,250,size=(250, 10)), columns=list('ABCDEFGHIJ'), index=index)
data_KW.head()
Iterate and plot (wrong :))
# Calculate the length of the DataFrame
end = 10 #len(data_KW.index)
# start and end of the rolling window
var_start = 0
var_end = 5
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(5, 5))
while var_end <= end:
    window = data_KW.iloc[var_start : var_end]
    # Compute the correlation matrix
    corr = window.corr()
    # Generate a mask for the upper triangle (np.bool was removed in NumPy 1.24; use bool)
    mask = np.zeros_like(corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True
    # Generate a custom diverging colormap
    cmap = sns.diverging_palette(220, 10, as_cmap=True)
    # Draw the heatmap with the mask and correct aspect ratio
    sns.heatmap(corr, mask=mask, cmap=cmap, vmax=1, center=0,
                square=True, linewidths=.5, cbar_kws={"shrink": .5})
    #plt.pause(3)
    plt.show()
    time.sleep(2)
    #time.sleep(5)
    var_start = var_start + 1
    var_end = var_end + 1
    print(var_start)
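One way to keep everything in a single figure is to create the image once and only replace its data on each step. This is a sketch rather than a drop-in fix: it uses matplotlib's `imshow` in place of `sns.heatmap`, assumes the code runs as a script or with an interactive backend (with `%matplotlib inline`, every redraw is rendered as a new static image), and the helper names `masked_corr` and `animate_rolling_corr` are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

num_days = 250
index = pd.date_range("01/01/2010", periods=num_days, freq="D")
rng = np.random.default_rng(0)
data_KW = pd.DataFrame(rng.integers(0, 250, size=(num_days, 10)),
                       columns=list("ABCDEFGHIJ"), index=index)


def masked_corr(window):
    """Correlation matrix with the upper triangle (and diagonal) masked out."""
    corr = window.corr().to_numpy()
    mask = np.triu(np.ones_like(corr, dtype=bool))
    return np.ma.masked_array(corr, mask=mask)


def animate_rolling_corr(data, window_size=5, n_steps=6, pause=0.5):
    """Redraw one heatmap in place for successive rolling windows."""
    fig, ax = plt.subplots(figsize=(5, 5))
    im = ax.imshow(masked_corr(data.iloc[:window_size]),
                   cmap="coolwarm", vmin=-1, vmax=1)
    fig.colorbar(im, ax=ax, shrink=0.5)
    ax.set_xticks(range(data.shape[1]))
    ax.set_xticklabels(data.columns)
    ax.set_yticks(range(data.shape[1]))
    ax.set_yticklabels(data.columns)
    for start in range(n_steps):
        window = data.iloc[start:start + window_size]
        im.set_data(masked_corr(window))  # update the existing image
        ax.set_title(f"{window.index[0].date()} to {window.index[-1].date()}")
        if pause > 0:
            plt.pause(pause)  # lets the GUI redraw; needs an interactive backend
    return fig, im


if __name__ == "__main__":
    animate_rolling_corr(data_KW)
    plt.show()
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the colorbar valid for every window, so it only has to be drawn once.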