How to create a Gantt plot - python

How can I plot the following data with matplotlib? I want to visualize the span from column 2 to column 3 for each row, so that the result looks like a Gantt chart.
0 0 0.016 19.833
1 0 19.834 52.805
2 0 52.806 84.005
5 0 84.012 107.305
8 0 107.315 128.998
10 0 129.005 138.956
11 0 138.961 145.587
13 0 145.594 163.863
15 0 163.872 192.118
16 0 192.127 193.787
17 0 193.796 197.106
20 0 236.099 246.223
25 1 31.096 56.180
27 1 58.097 64.857
28 1 64.858 66.494
29 1 66.496 89.908
31 1 89.918 111.606
34 1 129.007 137.371
35 1 137.372 145.727
39 1 176.097 209.461
42 1 209.476 226.207
44 1 226.217 259.317
46 1 259.329 282.488
47 1 282.493 298.905
I need two colors, one for each value in column 1. Column 0 goes on the y-axis, and columns 2 and 3 go on the x-axis. One line should be plotted per row: column 2 is the start time and column 3 is the stop time.

If I have understood you correctly, you want to plot a horizontal line between the x-values in columns 2 and 3, at the y-value given in column 0. To plot a horizontal line at a given y-value between two x-values, you can use hlines. I believe the code below is a possible solution.
import numpy as np
import matplotlib.pyplot as plt
# Read data from file into variables
y, c, x1, x2 = np.loadtxt('data.txt', unpack=True)
# Map value to color
color_mapper = np.vectorize(lambda x: {0: 'red', 1: 'blue'}.get(x))
# Plot a line for every line of data in your file
plt.hlines(y, x1, x2, colors=color_mapper(c))
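To try the hlines approach without a data file, here is a minimal sketch that reads a few of the rows from the question out of an in-memory string. The names sample, palette, group, start and stop are made up for illustration, and linewidth=4 is just an arbitrary choice to make the lines look more like bars.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

# A few rows copied from the question: id, group, start, stop
sample = """0 0 0.016 19.833
1 0 19.834 52.805
2 0 52.806 84.005
25 1 31.096 56.180
27 1 58.097 64.857
28 1 64.858 66.494
"""

y, group, start, stop = np.loadtxt(io.StringIO(sample), unpack=True)

# Pick one color per row from the group column (0 -> red, 1 -> blue)
palette = np.array(["red", "blue"])
colors = palette[group.astype(int)].tolist()

fig, ax = plt.subplots()
# One horizontal line per row, from start to stop at height y
bars = ax.hlines(y, start, stop, colors=colors, linewidth=4)
ax.set_xlabel("time")
ax.set_ylabel("row id (column 0)")
# Call plt.show() here to display the figure interactively.
```

Replacing io.StringIO(sample) with a file name should give the same plot for the full data set.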

You can read the text file using numpy.loadtxt and then plot it using matplotlib. For example:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.loadtxt('file.txt', usecols=(2,3), unpack=True)
plt.plot(x,y)
You should see the matplotlib documentation for more options.

Related

How to create n subplots (box plots) automatically?

I need to show n (e.g. 5) box plots. How can I do it?
df =
col1 col2 col3 col4 col5 result
1 3 1 1 4 0
1 2 2 4 9 1
1 2 1 3 7 1
This is my current code, but it does not display the data inside the plots. Also, the plots are very thin if n is, for example, 10 (is it possible to wrap onto a new row automatically?).
n=5
columns = df.columns
i = 0
fig, axes = plt.subplots(1, n, figsize=(20,5))
for ax in axes:
    df.boxplot(by="result", column = [columns[i]], vert=False, grid=True)
    i = i + 1
display(fig)
This example is for Azure Databricks, but I appreciate just a matplotlib solution as well if it's applicable.
I am not sure I got what you are trying to do, but the following code will show you the plots. You can control the figure sizes by changing the values of (10,10)
Code:
df.boxplot(by="result",figsize=(10,10));
To draw the boxes horizontally and show the grid:
df.boxplot(by="result",figsize=(10,10),vert=False, grid=True);
I solved it myself as follows:
df.boxplot(by="result", column = columns[0:4], vert=False, grid=True, figsize=(30,10), layout = (3, 5))
If you want additional rows to be generated while keeping the number of columns constant, adjust the layout as follows:
In [41]: ncol = 2
In [42]: df
Out[42]:
v0 v1 v2 v3 v4 v5 v6
0 0 3 6 9 12 15 18
1 1 4 7 10 13 16 19
2 2 5 8 11 14 17 20
In [43]: df.boxplot(by='v6', layout=(df.shape[1] // ncol + 1, ncol)) # use floor division to determine how many rows are required
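For a plain matplotlib route that also wraps onto new rows, one option is to build the grid yourself and hand each axes to df.boxplot through its ax parameter (the loop in the question never passes ax, which is likely why its subplots stay empty). The sketch below reuses the sample frame from the question; ncols = 3 and the figure size are arbitrary assumptions.

```python
import math

import pandas as pd
import matplotlib.pyplot as plt

# Sample frame from the question
df = pd.DataFrame({
    "col1": [1, 1, 1],
    "col2": [3, 2, 2],
    "col3": [1, 2, 1],
    "col4": [1, 4, 3],
    "col5": [4, 9, 7],
    "result": [0, 1, 1],
})

feature_cols = [c for c in df.columns if c != "result"]
ncols = 3  # assumed number of plots per row
nrows = math.ceil(len(feature_cols) / ncols)  # rows needed to fit every column

# squeeze=False keeps axes two-dimensional even for a single row
fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
flat_axes = axes.ravel()

# Draw one grouped box plot per feature column on its own axes
for ax, col in zip(flat_axes, feature_cols):
    df.boxplot(column=col, by="result", grid=True, ax=ax)

# Hide the cells of the grid that were not needed
for ax in flat_axes[len(feature_cols):]:
    ax.set_visible(False)

fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
```

Extra keyword arguments such as vert=False from the question can be added to the df.boxplot call.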

Making a bar chart to represent the number of occurrences in a Pandas Series

I was wondering if anyone could help me with how to make a bar chart to show the frequencies of values in a Pandas Series.
I start with a Pandas DataFrame of shape (2000, 7), and from there I extract the last column. The column is shape (2000,).
The entries in the Series that I mentioned vary from 0 to 17, each with different frequencies, and I tried to plot them using a bar chart but faced some difficulties. Here is my code:
# First, I counted the number of occurrences.
count = np.zeros(max(data_val))
for i in range(count.shape[0]):
    for j in range(data_val.shape[0]):
        if (i == data_val[j]):
            count[i] = count[i] + 1
'''
This gives us
count = array([192., 105., ... 19.])
'''
temp = np.arange(0, 18, 1) # Array for the x-axis.
plt.bar(temp, count)
I am getting an error on the last line of code, saying that the objects cannot be broadcast to a single shape.
What I ultimately want is a bar chart where each bar corresponds to an integer value from 0 to 17, and the height of each bar (i.e. the y-axis) represents the frequencies.
Thank you.
UPDATE
I decided to post the fixed code using the suggestions that people were kind enough to give below, just in case anybody facing similar issues will be able to see my revised code in the future.
data = pd.read_csv("./data/train.csv") # Original data is a (2000, 7) DataFrame
# data contains 6 feature columns and 1 target column.
# Separate the design matrix from the target labels.
X = data.iloc[:, :-1]
y = data['target']
'''
The next line of code uses pandas.Series.value_counts() on y in order to count
the number of occurrences for each label, and then proceeds to sort these according to
index (i.e. label).
You can also use pandas.DataFrame.sort_values() instead if you're interested in sorting
according to the number of frequencies rather than labels.
'''
y.value_counts().sort_index().plot.bar(x='Target Value', y='Number of Occurrences')
There is no need for for loops if we use the methods that are built into the pandas library.
The specific methods that were mentioned in the answers are pandas.Series.value_counts(), pandas.Series.sort_index(), and pandas.Series.plot.bar().
I believe you need value_counts with Series.plot.bar:
df = pd.DataFrame({
'a':[4,5,4,5,5,4],
'b':[7,8,9,4,2,3],
'c':[1,3,5,7,1,0],
'd':[1,1,6,1,6,5],
})
print (df)
a b c d
0 4 7 1 1
1 5 8 3 1
2 4 9 5 6
3 5 4 7 1
4 5 2 1 6
5 4 3 0 5
df['d'].value_counts(sort=False).plot.bar()
If some values may be missing and their counts should be set to 0, add reindex:
df['d'].value_counts(sort=False).reindex(np.arange(18), fill_value=0).plot.bar()
Detail:
print (df['d'].value_counts(sort=False))
1 3
5 1
6 2
Name: d, dtype: int64
print (df['d'].value_counts(sort=False).reindex(np.arange(18), fill_value=0))
0 0
1 3
2 0
3 0
4 0
5 1
6 2
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
Name: d, dtype: int64
Here's an approach using Seaborn
import numpy as np
import pandas as pd
import seaborn as sns
s = pd.Series(np.random.choice(17, 10))
s
# 0 10
# 1 13
# 2 12
# 3 0
# 4 0
# 5 5
# 6 13
# 7 9
# 8 11
# 9 0
# dtype: int64
val, cnt = np.unique(s, return_counts=True)
val, cnt
# (array([ 0, 5, 9, 10, 11, 12, 13]), array([3, 1, 1, 1, 1, 1, 2]))
sns.barplot(x=val, y=cnt)  # recent seaborn versions require x and y as keyword arguments
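As for the broadcast error in the question: np.zeros(max(data_val)) has length 17 when the largest label is 17, while the x-axis array has 18 entries, so plt.bar cannot pair them up. A minimal NumPy-only sketch that avoids the mismatch uses np.bincount with minlength. Here data_val is random stand-in data, not the asker's column.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the target column: 2000 integer labels in the range 0..17
rng = np.random.default_rng(0)
data_val = rng.integers(0, 18, size=2000)

# bincount counts occurrences of each non-negative integer;
# minlength=18 guarantees one slot per label even if a label never occurs
count = np.bincount(data_val, minlength=18)
labels = np.arange(18)

fig, ax = plt.subplots()
ax.bar(labels, count)  # both arrays now have length 18
ax.set_xlabel("Target Value")
ax.set_ylabel("Number of Occurrences")
```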

plot line between points pandas

I would like to plot lines between two points and my points are defined in different columns.
#coordinates of the points
#point1(A[0],B[0])
#point2(C[0],D[0])
#line between point1 and point 2
#next line would be
#point3(A[1],B[1])
#point4(C[1],D[1])
#line between point3 and point 4
plot_result:
A B C D E F
0 0 4 7 1 5 1
1 2 5 8 3 3 1
2 3 4 9 5 6 1
3 4 5 4 7 9 4
4 6 5 2 1 2 7
5 1 4 3 0 4 7
I tried this code:
import numpy as np
import matplotlib.pyplot as plt
for i in range(0, len(plot_result.A), 1):
    plt.plot(plot_result.A[i]:plot_result.B[i], plot_result.C[i]:plot_result.D[i], 'ro-')
plt.show()
but it gives an invalid syntax error. I have no idea how to implement this.
The first two parameters of the method plot are x and y which can be single points or array-like objects. If you want to plot a line from the point (x1,y1) to the point (x2,y2) you have to do something like this:
for row in plot_result.values: # if plot_result is a DataFrame
    x1 = row[0] # A[i]
    y1 = row[1] # B[i]
    x2 = row[2] # C[i]
    y2 = row[3] # D[i]
    plt.plot([x1,x2],[y1,y2]) # plot one line for every row in the DataFrame.
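The same result can be had without an explicit loop, because plot treats each column of a 2D x/y input as a separate line. A minimal sketch, assuming the DataFrame from the question (only columns A to D are needed); xs and ys are illustrative names:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Columns A-D from the question's plot_result table
plot_result = pd.DataFrame({
    "A": [0, 2, 3, 4, 6, 1],
    "B": [4, 5, 4, 5, 5, 4],
    "C": [7, 8, 9, 4, 2, 3],
    "D": [1, 3, 5, 7, 1, 0],
})

# Shape (2, n_rows): first row holds the start points, second row the end points
xs = plot_result[["A", "C"]].to_numpy().T
ys = plot_result[["B", "D"]].to_numpy().T

fig, ax = plt.subplots()
# Each column of xs/ys becomes one line from (A, B) to (C, D)
lines = ax.plot(xs, ys, "ro-")
```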

plotting multiple graph from a csv file and output to a single pdf/svg

I have some csv data in the following format.
Ln Dr Tag Lab 0:01 0:02 0:03 0:04 0:05 0:06 0:07 0:08 0:09
L0 St vT 4R 0 0 0 0 0 0 0 0 0
L2 Tx st 4R 8 8 8 8 8 8 8 8 8
L2 Tx ss 4R 1 1 9 6 1 0 0 6 7
I want to plot a time series graph using the columns (Ln, Dr, Tag, Lab) as the keys and the 0:0n fields as the values.
I have the following code.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
plt.ylabel('time')
plt.xlabel('events')
plt.grid(True)
plt.xlim((0,150))
plt.ylim((0,200))
a=pd.read_csv('yourfile.txt',delim_whitespace=True)
for x in a.iterrows():
    x[1][4:].plot(label=str(x[1][0])+str(x[1][1])+str(x[1][2])+str(x[1][3]))
plt.legend()
plt.savefig('test.pdf')  # no fig object is defined above, so save through pyplot
I have only shown a subset of my data here. I have around 200 entries (200 rows) in my full data set. The above code plots all graphs in a single figure. I would prefer each row to be plotted in a separate graph.
Use subplot()
import matplotlib.pyplot as plt
fig = plt.figure()
plt.subplot(221) # 2 rows, 2 columns, plot 1
plt.plot([1,2,3])
plt.subplot(222) # 2 rows, 2 columns, plot 2
plt.plot([3,1,3])
plt.subplot(223) # 2 rows, 2 columns, plot 3
plt.plot([3,2,1])
plt.subplot(224) # 2 rows, 2 columns, plot 4
plt.plot([1,3,1])
plt.show()
fig.savefig('test.pdf')
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplot.html#matplotlib.pyplot.subplot
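With around 200 rows, a fixed subplot grid gets crowded, so another option is one figure per row, all written into a single multi-page PDF with matplotlib's PdfPages. This is only a sketch on the three sample rows from the question; it writes to an in-memory buffer for illustration, and the names key_cols, time_cols and buffer are assumptions.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Sample rows from the question
csv_text = """Ln Dr Tag Lab 0:01 0:02 0:03 0:04 0:05 0:06 0:07 0:08 0:09
L0 St vT 4R 0 0 0 0 0 0 0 0 0
L2 Tx st 4R 8 8 8 8 8 8 8 8 8
L2 Tx ss 4R 1 1 9 6 1 0 0 6 7
"""
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

key_cols = ["Ln", "Dr", "Tag", "Lab"]
time_cols = [c for c in df.columns if c not in key_cols]

buffer = io.BytesIO()  # swap for a path such as "test.pdf" to write to disk
with PdfPages(buffer) as pdf:
    for _, row in df.iterrows():
        fig, ax = plt.subplots()
        # Plot this row's values against the time column names
        ax.plot(time_cols, row[time_cols].astype(float).to_numpy())
        ax.set_title(" ".join(row[key_cols]))  # key columns identify the row
        ax.set_xlabel("time")
        ax.set_ylabel("value")
        pdf.savefig(fig)  # each call appends one page
        plt.close(fig)    # free the figure so 200 rows do not pile up in memory
    page_count = pdf.get_pagecount()
```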

How to draw bar in python

I want to draw a bar chart for the data below:
4 1406575305 4
4 -220936570 2
4 2127249516 2
5 -1047108451 4
5 767099153 2
5 1980251728 2
5 -2015783241 2
6 -402215764 2
7 927697904 2
7 -631487113 2
7 329714360 2
7 1905727440 2
8 1417432814 2
8 1906874956 2
8 -1959144411 2
9 859830686 2
9 -1575740934 2
9 -1492701645 2
9 -539934491 2
9 -756482330 2
10 1273377106 2
10 -540812264 2
10 318171673 2
The 1st column is for the x-axis and the 3rd column is for the y-axis. Multiple rows can share the same x-axis value. For example,
4 1406575305 4
4 -220936570 2
4 2127249516 2
This means three bars at x = 4, each labelled with its tag (the value in the middle column). The result should look like this sample bar chart:
http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I am using matplotlib.pyplot and np. Thanks..
I followed the tutorial you linked to, but it's a bit tricky to shift them by a nonuniform amount:
import numpy as np
import matplotlib.pyplot as plt
x, label, y = np.genfromtxt('tmp.txt', dtype=int, unpack=True)
ux, uidx, uinv = np.unique(x, return_index=True, return_inverse=True)
max_width = np.bincount(x).max()
bar_width = 1/(max_width + 0.5)
locs = x.astype(float)
shifted = []
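for i in range(max_width):
    where = np.setdiff1d(uidx + i, shifted)
    locs[where[where<len(locs)]] += i*bar_width
    shifted = np.concatenate([shifted, where])
# align='edge' puts each bar's left edge at locs (the default in current matplotlib is 'center')
plt.bar(locs, y, bar_width, align='edge')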
If you want you can label them with the second column instead of x:
plt.xticks(locs + bar_width/2, label, rotation=-90)
I'll leave doing both of them as an exercise to the reader (mainly because I have no idea how you want them to show up).
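One way to get the per-group offsets without the index bookkeeping is pandas' groupby(...).cumcount(), which numbers the rows within each x value. The sketch below is an alternative under that assumption, not the method above; it uses the first few rows of the question's data, labels each bar with its tag, and the column names x, tag and y are invented for illustration.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

# First rows of the question's data: x, tag, y
sample = """4 1406575305 4
4 -220936570 2
4 2127249516 2
5 -1047108451 4
5 767099153 2
6 -402215764 2
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=["x", "tag", "y"])

# Position of each row within its x group: 0, 1, 2, ...
rank = df.groupby("x").cumcount()
max_width = df.groupby("x").size().max()  # largest number of bars sharing one x
bar_width = 1 / (max_width + 0.5)         # leaves a small gap between groups

# Shift each bar to the right by its rank within the group
locs = (df["x"] + rank * bar_width).to_numpy()

fig, ax = plt.subplots()
ax.bar(locs, df["y"].to_numpy(), bar_width, align="edge")
ax.set_xticks(locs + bar_width / 2)  # tick at the center of each bar
ax.set_xticklabels(df["tag"].astype(str), rotation=-90)
```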
