I guess I just didn't use the right keywords, because this has probably been asked before, but I didn't find a solution. Anyway, I have a problem where the bars of a histogram do not line up with the xticks. I want each bar to be centred over the xtick it corresponds to, but the bars are placed between ticks so that they fill the space in between evenly.
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, rwidth = .3)
plt.xticks(bins)
plt.show()
Note that what you are plotting here is not a histogram. A histogram would look like this:
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, edgecolor="k", alpha=1)
plt.xticks(bins)
plt.show()
Here, the bars span the bins as expected, e.g. there are 3 values in the interval 1 <= x < 1.5.
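To see which interval each value lands in, you can ask NumPy for the counts directly. `np.histogram` uses the same half-open bins `[a, b)` as `plt.hist` (only the last bin also includes its right edge). A minimal sketch reusing `data` and `bins` from above:

```python
import numpy as np

data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9, 9, 9.5]
bins = [x + n for n in range(1, 10) for x in [0.0, 0.5]] + [10.0]

# counts[k] is the number of values with bins[k] <= value < bins[k + 1];
# the final bin [9.5, 10.0] is closed on the right as well.
counts, edges = np.histogram(data, bins)

for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{left}, {right}): {c}")
```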
Conceptually what you want to do here is get a bar plot of the counts of data values. This would not require any bins at all and could be done as follows:
import numpy as np
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
u, inv = np.unique(data, return_inverse=True)
counts = np.bincount(inv)
plt.bar(u, counts, width=0.3)
plt.xticks(np.arange(1,10,0.5))
plt.show()
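`np.unique` can also return the counts directly via `return_counts=True`, which skips the separate `np.bincount` step. A small sketch showing that both routes give the same counts:

```python
import numpy as np

data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9, 9, 9.5]

# Unique values and how often each occurs, in one call.
u, counts = np.unique(data, return_counts=True)

# Same counts via the inverse indices + bincount route used above.
_, inv = np.unique(data, return_inverse=True)
print((counts == np.bincount(inv)).all())
```

The resulting `u` and `counts` can be passed to `plt.bar(u, counts, width=0.3)` exactly as before.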
Of course, you can "misuse" a histogram plot to get a similar result. This requires moving the center of each bar to the left edge of its bin, plt.hist(..., align="left").
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, align="left", rwidth = .6)
plt.xticks(bins)
plt.show()
This results in the same plot as above.
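To see what `align="left"` does, you can inspect the rectangles that `plt.hist` returns. This sketch selects the non-interactive Agg backend (an assumption made only so it runs without a display) and checks that each bar is centred on the left edge of its bin:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only needed for this check
import matplotlib.pyplot as plt
import numpy as np

data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9, 9, 9.5]
bins = [x + n for n in range(1, 10) for x in [0.0, 0.5]] + [10.0]

counts, edges, patches = plt.hist(data, bins, align="left", rwidth=0.6)

# Centre of each bar = its left x position plus half its width.
centres = np.array([p.get_x() + p.get_width() / 2 for p in patches])

# With align="left" every bar is centred on its bin's left edge.
print(np.allclose(centres, edges[:-1]))
```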
I am trying to figure out how to make a correlation-matrix heatmap with seaborn (sns) whose values come from a target column. I am trying to identify whether a combination of two features has an effect on target_value.
I know I can do the following, but this gives the correlation between features, not the correlation of two features with target_value:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(dataframe.corr());
I have the following CSV:
feature_1, feature_2, feature_3, feature_4, target_value
4, 8, 9, 8, 0.1
9, 7, 2, 0, 0.2
4, 4, 1, 4, 0.6
9, 7, 8, 4, 0.7
0, 9, 0, 7, 0.9
I could encode them as follows, using a threshold to define whether each feature is present (1) or not present (0):
feature_1, feature_2, feature_3, feature_4, target_value
0, 1, 1, 1, 0.1
1, 1, 0, 0, 0.2
0, 0, 0, 0, 0.6
1, 1, 1, 0, 0.7
0, 1, 0, 1, 0.9
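The 0/1 table above can be produced with a single comparison in pandas. The question does not state the threshold; `> 4` is an assumption that happens to reproduce the encoded table exactly, so adjust it to your real rule:

```python
import pandas as pd

df = pd.DataFrame({'feature_1': [4, 9, 4, 9, 0],
                   'feature_2': [8, 7, 4, 7, 9],
                   'feature_3': [9, 2, 1, 8, 0],
                   'feature_4': [8, 0, 4, 4, 7],
                   'target_value': [0.1, 0.2, 0.6, 0.7, 0.9]})

THRESHOLD = 4  # assumed: a feature counts as "present" when its value > 4

features = ['feature_1', 'feature_2', 'feature_3', 'feature_4']
encoded = df.copy()
# Boolean comparison per cell, then cast True/False to 1/0.
encoded[features] = (df[features] > THRESHOLD).astype(int)
print(encoded)
```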
I would like to know the correlation of feature_1 to feature_4 with the target value. I would also like to know whether (and how) I can filter which features are shown on each axis of the heatmap. For this, I guess I can filter the dataframe based on target_value; however, I am not sure how to show/hide features on the axes.
e.g.
feature_1 and feature_2 on X axis /
feature_3 and feature_4 on Y axis
for target value >= 0.5
e.g.
feature_1 and feature_2 on X axis /
feature_3 and feature_4 on Y axis
for target value < 0.5
Pandas' corrwith() helps to find the correlation between one column and the others. As the result is a series and seaborn expects a dataframe, the series needs to be converted to one.
To find the correlation between feature_1/feature_2 and feature_3/feature_4 for a subset of the target values:
take the desired subset of the dataframe
calculate the correlation
take some rows/columns from the correlation matrix
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'feature_1': [4, 9, 4, 9, 0],
'feature_2': [8, 7, 4, 7, 9],
'feature_3': [9, 2, 1, 8, 0],
'feature_4': [8, 0, 4, 4, 7],
'target_value': [0.1, 0.2, 0.6, 0.7, 0.9]})
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(16, 5))
corr_with_target = df.drop(columns='target_value').corrwith(df['target_value'])
sns.heatmap(pd.DataFrame(corr_with_target), ax=ax1)
ax1.tick_params(rotation=0)
ax1.set_xticks([])
ax1.set_title('Correlation with target value')
corr_for_large_target = df[df['target_value'] >= 0.5].corr().loc[['feature_1', 'feature_2'], ['feature_3', 'feature_4']]
sns.heatmap(pd.DataFrame(corr_for_large_target), ax=ax2)
ax2.set_title('Correlation for large target')
corr_for_small_target = df[df['target_value'] < 0.5].corr().loc[['feature_1', 'feature_2'], ['feature_3', 'feature_4']]
sns.heatmap(pd.DataFrame(corr_for_small_target), ax=ax3)
ax3.set_title('Correlation for small target')
plt.tight_layout()
plt.show()
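Instead of wrapping the Series in `pd.DataFrame(...)`, `Series.to_frame()` does the same conversion and lets you name the single column. A small sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'feature_1': [4, 9, 4, 9, 0],
                   'feature_2': [8, 7, 4, 7, 9],
                   'feature_3': [9, 2, 1, 8, 0],
                   'feature_4': [8, 0, 4, 4, 7],
                   'target_value': [0.1, 0.2, 0.6, 0.7, 0.9]})

# corrwith returns a Series indexed by feature name ...
corr_with_target = df.drop(columns='target_value').corrwith(df['target_value'])
# ... and to_frame turns it into a one-column DataFrame for sns.heatmap.
corr_frame = corr_with_target.to_frame(name='target_value')
print(corr_frame)
```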
I have been trying to create a matplotlib subplot (1 x 3) with horizontal bar plots on either side of a lineplot.
It looks like this:
The code that generates the above plot:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
u_list = [2, 0, 0, 0, 1, 5, 0, 4, 0, 0]
n_list = [0, 0, 1, 0, 4, 3, 1, 1, 0, 6]
arr_ = list(np.arange(10, 11, 0.1))
data_ = pd.DataFrame({
'points': list(np.arange(0, 10, 1)),
'value': [10.4, 10.5, 10.3, 10.7, 10.9, 10.5, 10.6, 10.3, 10.2, 10.4][::-1]
})
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 8))
ax1 = plt.subplot(1, 3, 1)
sns.barplot(x=u_list, y=arr_, orient="h", ax=ax1)
ax2 = plt.subplot(1, 3, 2)
x = data_['points'].tolist()
y = data_['value'].tolist()
ax2.plot(x, y)
ax2.set_yticks(arr_)
plt.gca().invert_yaxis()
ax3 = plt.subplot(1, 3, 3, sharey=ax1, sharex=ax1)
sns.barplot(x=n_list, y=arr_, orient="h", ax=ax3)
fig.tight_layout()
plt.show()
Edit
How do I share the y-axis of the central line plot with the other horizontal bar plots?
I would set the limits of all y-axes to the same range, set the ticks in all axes and then set the ticks/tick labels of all but the leftmost axis to be empty. Here is what I mean:
from matplotlib import pyplot as plt
import numpy as np
u_list = [2, 0, 0, 0, 1, 5, 0, 4, 0, 0]
n_list = [0, 0, 1, 0, 4, 3, 1, 1, 0, 6]
arr_ = list(np.arange(10, 11, 0.1))
x = list(np.arange(0, 10, 1))
y = [10.4, 10.5, 10.3, 10.7, 10.9, 10.5, 10.6, 10.3, 10.2, 10.4]
fig, axs = plt.subplots(1, 3, figsize=(20, 8))
axs[0].barh(arr_,u_list,height=0.1)
axs[0].invert_yaxis()
axs[1].plot(x, y)
axs[1].invert_yaxis()
axs[2].barh(arr_,n_list,height=0.1)
axs[2].invert_yaxis()
for i in range(1, len(axs)):
    axs[i].set_ylim(axs[0].get_ylim())  # align axes
    axs[i].set_yticks([])  # set ticks to be empty (no ticks, no tick-labels)
fig.tight_layout()
plt.show()
This is a minimal example, and for the sake of conciseness I refrained from mixing matplotlib and seaborn. Since seaborn uses matplotlib under the hood, you can reproduce the same output there (but with nicer bars).
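Another option, not used in the answer above, is to let matplotlib share the y-axis when the subplots are created: `plt.subplots(..., sharey=True)` links the limits (and the inversion) of all three axes and hides the y tick labels of the inner columns. A minimal sketch with the same data; the Agg backend line is only there so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

u_list = [2, 0, 0, 0, 1, 5, 0, 4, 0, 0]
n_list = [0, 0, 1, 0, 4, 3, 1, 1, 0, 6]
arr_ = list(np.arange(10, 11, 0.1))
x = list(np.arange(0, 10, 1))
y = [10.4, 10.5, 10.3, 10.7, 10.9, 10.5, 10.6, 10.3, 10.2, 10.4]

# sharey=True ties the y-axis of all three axes together.
fig, axs = plt.subplots(1, 3, figsize=(20, 8), sharey=True)
axs[0].barh(arr_, u_list, height=0.1)
axs[1].plot(x, y)
axs[2].barh(arr_, n_list, height=0.1)

# Inverting one shared y-axis inverts all of them.
axs[0].invert_yaxis()
fig.tight_layout()
# Call plt.show() here when using an interactive backend.
```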
I have 60 numbers divided into 8 intervals:
[[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8], [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
Dividing the number of values in each interval by 6 gives:
[0.5, 0.5, 2.33, 1.33, 2.33, 1.5, 1.0, 0.5]
How do I create a histogram in which the bar heights correspond to these values, with the x-axis labelled according to my intervals? The result should be something like this.
Running F Blanchet's code generates the following graph in my IPython console:
That doesn't really look like your image. I think you're looking for something more like this, where the x-ticks are between the bars:
This is the code I used to generate the above plot:
import matplotlib.pyplot as plt
# Include one more value for final x-tick.
intervals = list(range(534, 583, 6))
# Include one more bar height that == 0.
bar_height = [0.5, 0.5, 2.33, 1.33, 2.33, 1.5, 1.0, 0.5, 0]
plt.bar(intervals,
bar_height,
width = [6] * 8 + [0], # Set width of 0 bar to 0.
align = "edge", # Align ticks at edge of bars.
tick_label = intervals) # Make tick labels explicit.
plt.show()
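A variant that avoids the zero-height dummy bar: draw one bar per interval with `align="edge"`, then put the ticks on all interval boundaries with `plt.xticks`. This sketch recomputes the heights from the question's interval list; the Agg backend line is an assumption so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it to get an interactive window
import matplotlib.pyplot as plt

data = [[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8],
        [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]

lefts = [start for start, _, _ in data]
widths = [end - start for start, end, _ in data]
heights = [count / 6 for _, _, count in data]
edges = lefts + [data[-1][1]]  # every boundary, including the final 582

# Each bar starts at its interval's left boundary and spans its full width.
bars = plt.bar(lefts, heights, width=widths, align="edge", edgecolor="k")
# Ticks on the boundaries, labelled without a trailing ".0".
plt.xticks(edges, [f"{e:g}" for e in edges])
```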
You can use matplotlib:
import matplotlib.pyplot as plt
data = [[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8], [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
x = [element[0]+3 for element in data]
y = [element[2]/6 for element in data]
width = 6
plt.bar(x, y, width, color="blue")
plt.show()
A few data points have been obtained from an experiment, but they are not in order, so the lines connecting the points are wrong.
I need to plot them in, say, increasing order along the x-axis.
C=[0.5,4,2,1,3,8,6,10]
D=[20,2,2,10,0.3,2.5,0.8,1]
%matplotlib inline
import matplotlib.pyplot as plt
#plot obtained from given data points
plt.plot(C,D)
## required plot
A=[0.5, 1, 2, 3, 4, 6, 8, 10]
B=[20, 10, 2, 0.3, 2, 0.8, 2.5, 1]
plt.plot(A,B)
Solution using pandas. I recommend using DataFrames in future for plotting tasks.
from matplotlib import pyplot as plt
import pandas as pd
C= [0.5, 4, 2, 1, 3, 8, 6, 10]
D= [20, 2, 2, 10, 0.3, 2.5, 0.8, 1]
xy = pd.DataFrame({'x': C, 'y': D})
xy.sort_values('x', inplace=True)
plt.plot(xy['x'], xy['y'])
plt.show()
Your C is not sorted, so the points, which are joined by a continuous line in the order given, look like a mess in the output of plot(C, D). I would personally use the np.argsort function to get the indices that sort C and use them to plot C and D as follows (showing only the relevant added lines):
import numpy as np
C = np.array([0.5,4,2,1,3,8,6,10])
D = np.array([20,2,2,10,0.3,2.5,0.8,1])
plt.plot(sorted(C), D[np.argsort(C)], 'b')
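Without pandas or NumPy, the same reordering can be done by sorting the (x, y) pairs together, since `sorted` orders tuples by their first element, i.e. by x. A minimal sketch; the Agg backend line is only there so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it to get an interactive window
import matplotlib.pyplot as plt

C = [0.5, 4, 2, 1, 3, 8, 6, 10]
D = [20, 2, 2, 10, 0.3, 2.5, 0.8, 1]

# Pair each x with its y, sort the pairs by x, then split them again.
xs, ys = zip(*sorted(zip(C, D)))

plt.plot(xs, ys)
```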
I have to plot multiple lines and their curve-fit lines on a single plot. All of these lines are plotted in a for loop. Because of the loop, the curve-fit line of each step is drawn over the markers of the previous step, as shown in the figure.
The reproducible code:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])
m, n = x.shape
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
for i in range(m):
    poly = np.polyfit(x[i, :], y[i, :], deg=1)
    plt.plot(x[i, :], poly[0] * x[i, :] + poly[1], linestyle='-')
    plt.plot(x[i, :], y[i, :], linestyle='', marker='o', markersize=20)
plot.set_ylabel('Y', labelpad = 6)
plot.set_xlabel('X', labelpad = 6)
plt.show()
I can fix this using another loop as:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
[6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])
m, n = x.shape
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
for i in range(m):
    poly = np.polyfit(x[i, :], y[i, :], deg=1)
    plt.plot(x[i, :], poly[0] * x[i, :] + poly[1], linestyle='-')
for i in range(m):
    plt.plot(x[i, :], y[i, :], linestyle='', marker='o', markersize=20)
plot.set_ylabel('Y', labelpad = 6)
plot.set_xlabel('X', labelpad = 6)
plt.show()
which gives me all the fit lines below the markers.
But is there any built-in function in Python/matplotlib to do this without using two loops?
Update
I have used m = 2 (two rows) only as an example; m can be greater than 2, i.e. the loop may run many times.
Update 2 after answer
Can I do this for the same line also? As an example:
plt.plot(x[i, :], y[i, :], linestyle = ':', marker = 'o', markersize = 20)
Can I give the line zorder = 1 and the markers zorder = 3 within that same call?
Editing just your plotting lines:
plt.plot(x[i, :], poly[0] * x[i, :] + poly[1], linestyle='-',
         zorder=-1)
plt.plot(x[i, :], y[i, :], linestyle='', marker='o', markersize=20,
         zorder=3)
Now all markers are drawn in front of all lines, although within the group of markers and within the group of lines the stacking still follows the plotting order.
Update answer
No. One call to plot, one zorder argument.
If you want to match the color and style of markers and line in each pass through the loop, set up an iterator or generator for colors and get current_color on each pass, then use that as an argument for plot calls.
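One way to follow that suggestion, sketched under the assumption that the default colour cycle is wanted: draw colours from `plt.rcParams['axes.prop_cycle']` with an iterator, and pass the same colour to the fit-line call (low `zorder`) and to the marker call (high `zorder`). The Agg backend line is only there so the sketch runs headless:

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # headless backend; remove it to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
y = np.array([[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
              [6, 5.2, 8.5, 9.1, 13.4, 15.1, 16.1, 18.3, 20.4, 22.1, 23.7]])

# Iterator over the default colour cycle.
colors = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])

fig, ax = plt.subplots(figsize=(5.15, 5.15))
fit_lines, marker_lines = [], []
for i in range(x.shape[0]):
    color = next(colors)  # one colour per row, shared by its fit and markers
    slope, intercept = np.polyfit(x[i, :], y[i, :], deg=1)
    # zorder=1: every fit line sits below every set of markers.
    fit_lines += ax.plot(x[i, :], slope * x[i, :] + intercept,
                         linestyle='-', color=color, zorder=1)
    # zorder=3: markers on top, regardless of loop order.
    marker_lines += ax.plot(x[i, :], y[i, :], linestyle='', marker='o',
                            markersize=20, color=color, zorder=3)
ax.set_xlabel('X', labelpad=6)
ax.set_ylabel('Y', labelpad=6)
```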