How to draw a bar chart in Python

I want to draw a bar chart for the data below:
4 1406575305 4
4 -220936570 2
4 2127249516 2
5 -1047108451 4
5 767099153 2
5 1980251728 2
5 -2015783241 2
6 -402215764 2
7 927697904 2
7 -631487113 2
7 329714360 2
7 1905727440 2
8 1417432814 2
8 1906874956 2
8 -1959144411 2
9 859830686 2
9 -1575740934 2
9 -1492701645 2
9 -539934491 2
9 -756482330 2
10 1273377106 2
10 -540812264 2
10 318171673 2
The 1st column is the x-axis value and the 3rd column is the y-axis value (the bar height). Several rows can share the same x value. For example,
4 1406575305 4
4 -220936570 2
4 2127249516 2
This means three bars at x = 4, each labelled with its tag (the value in the middle column). A sample of the kind of bar chart I mean:
http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I am using matplotlib.pyplot and NumPy (np). Thanks.

I followed the tutorial you linked to, but it's a bit tricky because the bars have to be shifted by a nonuniform amount (the groups differ in size):
import numpy as np
import matplotlib.pyplot as plt
x, label, y = np.genfromtxt('tmp.txt', dtype=int, unpack=True)
ux, uidx, uinv = np.unique(x, return_index=True, return_inverse=True)
max_width = np.bincount(x).max()
bar_width = 1/(max_width + 0.5)
locs = x.astype(float)
shifted = []
for i in range(max_width):
    where = np.setdiff1d(uidx + i, shifted)
    locs[where[where<len(locs)]] += i*bar_width
    shifted = np.concatenate([shifted, where])
plt.bar(locs, y, bar_width)
If you want you can label them with the second column instead of x:
plt.xticks(locs + bar_width/2, label, rotation=-90)
I'll leave doing both of them as an exercise to the reader (mainly because I have no idea how you want them to show up).
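Another way to get the same kind of bar positions is to number the rows within each x group and shift each bar by that many bar widths. The following is a minimal, dependency-free sketch; the helper name grouped_bar_positions is made up for illustration, and it assumes the bar width is chosen the same way as in the code above.

```python
from collections import Counter, defaultdict

def grouped_bar_positions(x_values):
    """Return (positions, bar_width) so bars sharing an x value sit side by side.

    Hypothetical helper, not part of matplotlib.
    """
    # Same width rule as above: the largest group almost fills one x unit.
    max_group = max(Counter(x_values).values())
    bar_width = 1 / (max_group + 0.5)

    seen = defaultdict(int)  # number of bars already placed at each x value
    positions = []
    for x in x_values:
        positions.append(x + seen[x] * bar_width)
        seen[x] += 1
    return positions, bar_width

x = [4, 4, 4, 5, 5, 5, 5, 6]
locs, width = grouped_bar_positions(x)
print(locs, width)
```

The returned values are meant to be used like locs and bar_width above, for example plt.bar(locs, y, width).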

Related

Different aggregate function based on value of column pandas

I have the following dataframe
import numpy as np
import pandas as pd
test = pd.DataFrame({'y':[1,2,3,4,5,6], 'label': ['bottom', 'top','bottom', 'top','bottom', 'top']})
   y   label
0  1  bottom
1  2     top
2  3  bottom
3  4     top
4  5  bottom
5  6     top
I would like to add a new column, agg_y, which would be max(y) if label=="bottom" and min(y) if label=="top". I have tried this:
test['min_y'] = test.groupby('label').y.transform('min')
test['max_y'] = test.groupby('label').y.transform('max')
test['agg_y'] = np.where(test.label == "bottom", test.max_y, test.min_y)
test.drop(columns=['min_y', 'max_y'], inplace=True)
which gives the correct result
   y   label  agg_y
0  1  bottom      5
1  2     top      2
2  3  bottom      5
3  4     top      2
4  5  bottom      5
5  6     top      2
I am just looking for a one-liner solution, if possible.
Your solution as a one-liner:
test['agg_y'] = np.where(test.label == "bottom",
test.groupby('label').y.transform('max'),
test.groupby('label').y.transform('min'))
A solution without groupby (thank you, @ouroboros1):
test['agg_y'] = np.where(test.label == 'bottom',
test.loc[test.label.eq('bottom'), 'y'].max(),
test.loc[test.label.ne('bottom'), 'y'].min())
Another idea, similar to ouroboros1's solution, is to map each label to its aggregated value:
d = {'bottom':'max', 'top':'min'}
test['agg_y'] = test['label'].map({val:test.loc[test.label.eq(val),'y'].agg(func)
for val, func in d.items()})
print (test)
   y   label  agg_y
0  1  bottom      5
1  2     top      2
2  3  bottom      5
3  4     top      2
4  5  bottom      5
5  6     top      2

ValueError: Points must be Nx2 array, got 2x5

I'm trying to make an animation and am working from the code in another Stack Overflow question. The code is the following:
import matplotlib.pyplot as plt
from matplotlib import animation as animation
import numpy as np
import pandas as pd
import io
u = u"""Time M1 M2 M3 M4 M5
1 1 2 3 1 2
2 1 3 3 1 2
3 1 3 2 1 3
4 2 2 3 1 2
5 3 3 3 1 3
6 2 3 4 1 4
7 2 3 4 3 3
8 3 4 4 3 4
9 4 4 5 3 3
10 4 4 5 5 4"""
df_Bubble = pd.read_csv(io.StringIO(u), delim_whitespace=True)
time_count = len(df_Bubble)
colors = np.arange(1, 6)
x = np.arange(1, 6)
max_radius = 25
fig, ax = plt.subplots()
pic = ax.scatter(x, df_Bubble.iloc[0, 1:], s=100, c=colors)
pic.set_offsets([[np.nan]*len(colors)]*2)
ax.axis([0,7,0,7])
def init():
    pic.set_offsets([[np.nan]*len(colors)]*2)
    return pic,
def updateData(i):
    y = df_Bubble.iloc[i, 1:]
    area = np.pi * (max_radius * y / 10.0) ** 2
    pic.set_offsets([x, y.values])
    pic._sizes = area
    i+=1
    return pic,
ani = animation.FuncAnimation(fig, updateData,
                              frames=10, interval = 50, blit=True, init_func=init)
plt.show()
When I run this code unchanged I get the error
ValueError: Points must be Nx2 array, got 2x5
I have looked at similar threads on this question and have come to the conclusion that the problem has to do with the line with [[np.nan]*len(colors)]*2. Based on the examples I found, I thought that changing a part of this line to an array might help, but none of my attempts have worked, and now I'm stuck. I would be grateful for any help.
set_offsets expects an N x 2 array (one row per point), but you pass two arrays of 5 elements each in updateData(i) and two lists of 5 elements each in init(), i.e. data of shape 2 x 5. Transposing the data fixes it:
def init():
    # N x 2 array of NaNs, so nothing is drawn until the first frame.
    pic.set_offsets(np.full((len(colors), 2), np.nan))
    return pic,
def updateData(i):
    y = df_Bubble.iloc[i, 1:]
    area = np.pi * (max_radius * y / 10.0) ** 2
    # Transpose the (2, 5) pair of arrays into the (5, 2) layout set_offsets expects.
    pic.set_offsets(np.transpose((x, y.values)))
    pic._sizes = area
    return pic,
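The shape difference can be checked with NumPy alone. This is a small sketch that uses the first data row of the table from the question and only compares the two ways of combining x and y; it does not call matplotlib.

```python
import numpy as np

x = np.arange(1, 6)
y = np.array([1, 2, 3, 1, 2])  # M1..M5 at Time == 1

wrong = np.array([x, y])          # two rows of five values: shape (2, 5)
right = np.column_stack((x, y))   # five (x, y) pairs: shape (5, 2)

# Placeholder of the right shape, intended as "no points yet".
placeholder = np.full((len(x), 2), np.nan)

print(wrong.shape, right.shape, placeholder.shape)
```

np.column_stack((x, y)) and np.transpose((x, y)) build the same array here, so either can be passed to set_offsets.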

How to create n subplots (box plots) automatically?

I need to show n (e.g. 5) box plots. How can I do it?
df =
col1  col2  col3  col4  col5  result
1     3     1     1     4     0
1     2     2     4     9     1
1     2     1     3     7     1
This is my current code, but the plots come out empty. Also, the plots are very thin when n is, for example, 10 (is it possible to wrap onto a new row automatically?).
n=5
columns = df.columns
i = 0
fig, axes = plt.subplots(1, n, figsize=(20,5))
for ax in axes:
    df.boxplot(by="result", column = [columns[i]], vert=False, grid=True)
    i = i + 1
display(fig)
This example is for Azure Databricks, but I appreciate just a matplotlib solution as well if it's applicable.
I am not sure I understood what you are trying to do, but the following code will show the plots. You can control the figure size by changing the figsize value (10,10).
Code:
df.boxplot(by="result",figsize=(10,10));
To draw the boxes horizontally and show the grid:
df.boxplot(by="result",figsize=(10,10),vert=False, grid=True);
I solved it myself as follows:
df.boxplot(by="result", column = columns[0:4], vert=False, grid=True, figsize=(30,10), layout = (3, 5))
If you want additional rows to be generated while keeping the number of columns fixed, adjust the layout as follows:
In [41]: ncol = 2
In [42]: df
Out[42]:
v0 v1 v2 v3 v4 v5 v6
0 0 3 6 9 12 15 18
1 1 4 7 10 13 16 19
2 2 5 8 11 14 17 20
In [43]: df.boxplot(by='v6', layout=(df.shape[1] // ncol + 1, ncol)) # use floor division to determine how many row are required
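Note that floor division plus one reserves a spare row whenever the number of plots divides evenly by ncol. A ceiling division gives exactly the rows needed. The sketch below shows the arithmetic only; grid_layout is a hypothetical helper name, not a pandas or matplotlib function.

```python
import math

def grid_layout(n_plots, ncol):
    """Return (nrows, ncol) with just enough rows for n_plots panels.

    Hypothetical helper; the tuple is meant for a layout= argument.
    """
    if n_plots < 1 or ncol < 1:
        raise ValueError("n_plots and ncol must be positive")
    return math.ceil(n_plots / ncol), ncol

print(grid_layout(6, 2))
print(grid_layout(10, 4))
```

For the frame above, six columns are plotted (v6 is the grouping column), so something like df.boxplot(by='v6', layout=grid_layout(df.shape[1] - 1, ncol)) should request a 3 x 2 grid.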

plot line between points pandas

I would like to plot lines between two points and my points are defined in different columns.
#coordinates of the points
#point1(A[0],B[0])
#point2(C[0],D[0])
#line between point1 and point 2
#next line would be
#point3(A[1],B[1])
#point4(C[1],D[1])
#line between point3 and point 4
plot_result:
A B C D E F
0 0 4 7 1 5 1
1 2 5 8 3 3 1
2 3 4 9 5 6 1
3 4 5 4 7 9 4
4 6 5 2 1 2 7
5 1 4 3 0 4 7
I tried this code:
import numpy as np
import matplotlib.pyplot as plt
for i in range(0, len(plot_result.A), 1):
plt.plot(plot_result.A[i]:plot_result.B[i], plot_result.C[i]:plot_result.D[i], 'ro-')
plt.show()
but it is invalid syntax. I have no idea how to implement this.
The first two parameters of the method plot are x and y which can be single points or array-like objects. If you want to plot a line from the point (x1,y1) to the point (x2,y2) you have to do something like this:
for row in plot_result.values:  # if plot_result is a DataFrame
    x1 = row[0]  # A[i]
    y1 = row[1]  # B[i]
    x2 = row[2]  # C[i]
    y2 = row[3]  # D[i]
    plt.plot([x1,x2],[y1,y2])  # plot one line for every row in the DataFrame
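To see what each plot call receives, the pairing of coordinates can be tried without matplotlib. This is a small sketch using the first three rows of the table from the question; row_segments is a made-up helper name.

```python
def row_segments(rows):
    """Turn rows of (A, B, C, D, ...) into ([x1, x2], [y1, y2]) pairs.

    Hypothetical helper: each pair is what one plt.plot call would take.
    """
    segments = []
    for row in rows:
        x1, y1, x2, y2 = row[:4]  # columns A, B, C, D; E and F are ignored
        segments.append(([x1, x2], [y1, y2]))
    return segments

rows = [
    (0, 4, 7, 1, 5, 1),
    (2, 5, 8, 3, 3, 1),
    (3, 4, 9, 5, 6, 1),
]
for xs, ys in row_segments(rows):
    print(xs, ys)
```

With a DataFrame, plot_result[['A', 'B', 'C', 'D']].values yields rows of this form, and plt.plot(xs, ys, 'ro-') draws each segment with the marker style from the question.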

tilted axis 2D plot where x y axis make 60 degree rather than 90

I want to plot a distribution on a hexagonal lattice like the following.
I want to present this data as a 2D colormap or bar chart. Does anyone know how to do this? I am familiar with Octave, Python, gnuplot, Excel, and MATLAB.
  1 1 1 1 1 1 1 1
 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
 2 2 2 2 2 2 2 2 2
  1 1 1 1 1 1 1 1
Here is a solution using patch in MATLAB.
data = cellfun(@(x) textscan(x, '%f')', importdata('data.txt', sprintf('\n')));
rowLen = cellfun(@numel, data);
nPoints = sum(rowLen);
centerCells = arrayfun(@(l,r) [(-l+1:2:l-1)'*sin(pi/3) -r*1.5*ones(l,1)], ...
rowLen', 1:numel(rowLen), 'UniformOutput', false);
centers = vertcat(centerCells{:});
hx = linspace(0,2*pi,7)';
vertices = reshape(...
bsxfun(@plus, permute(sin([hx pi/2+hx]), [1 3 2]), ...
permute(centers, [3 1 2])), 7 * nPoints, 2);
faces = reshape(1:7*nPoints, 7, nPoints)';
colorData = vertcat(data{:});
patch('Vertices', vertices, 'Faces', faces, ...
'FaceColor', 'flat', 'FaceVertexCData', colorData);
axis equal
and this produces
Read the documentation if you need to change the color scheme.
