Plot specific values on y axis instead of increasing scale from dataframe - python

When plotting 2 columns from a dataframe into a line plot, is it possible to, instead of a consistently increasing scale, have fixed values on your y axis (and keep the distances between the numbers on the axis constant)? For example, instead of 0, 100, 200, 300, ... to have 0, 21, 53, 124, 287, depending on the values from your dataset? So basically to have on the axis all your possible values fixed instead of an increasing scale?

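Yes, you can use ax.set_yticks(). This places the ticks at the values you pass in, but the axis itself stays linear, so the ticks are spaced according to their values rather than evenly.
Example:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[13, 1], [14, 1.5], [15, 1.8], [16, 2], [17, 2], [18, 3 ], [19, 3.6]], columns = ['A','B'])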
fig, ax = plt.subplots()
x = df['A']
y = df['B']
ax.plot(x, y, 'g-')
ax.set_yticks(y)
plt.show()
Or, if the values are very far apart from each other, you can use ax.set_yscale('log').
Example:
df = pd.DataFrame([[13, 1], [14, 1.5], [15, 1.8], [16, 2], [17, 2], [18, 3 ], [19, 3.6], [20, 300]], columns = ['A','B'])
fig, ax = plt.subplots()
x = df['A']
y = df['B']
ax.plot(x, y, 'g-')
ax.set_yscale('log', base=2)  # Matplotlib >= 3.3; older versions use basey=2
ax.yaxis.set_ticks(y)
ax.yaxis.set_ticklabels(y)
plt.show()

What you need to do is:
1. Get all distinct y values and sort them.
2. Set each point's y position on the plot according to its value's place in the ordered list.
3. Set the y tick labels to the distinct ordered values.
The code below does this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame([[13, 1], [14, 1.8], [16, 2], [15, 1.5], [17, 2], [18, 3 ],
[19, 200],[20, 3.6], ], columns = ['A','B'])
x = df['A']
y = df['B']
y_keys = np.sort(y.unique())
y_values = range(len(y_keys))
y_dict = dict(zip(y_keys,y_values))
fig, ax = plt.subplots()
ax.plot(x,[y_dict[k] for k in y],'o-')
ax.set_yticks(y_values)
ax.set_yticklabels(y_keys)
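The three steps can also be wrapped in a small helper that works on any sequence of numbers. This is a minimal sketch in plain Python; the name ordinal_positions is made up for illustration and is not part of matplotlib or pandas.

```python
def ordinal_positions(values):
    """Map each value to its rank among the distinct sorted values.

    Returns (positions, labels): positions[i] is the evenly spaced y
    coordinate for values[i], and labels[j] is the tick label at y == j.
    """
    labels = sorted(set(values))
    rank = {value: index for index, value in enumerate(labels)}
    return [rank[value] for value in values], labels


# Same B column as in the example above.
y = [1, 1.8, 2, 1.5, 2, 3, 200, 3.6]
positions, labels = ordinal_positions(y)
print(positions)  # [0, 2, 3, 1, 3, 4, 6, 5]
print(labels)     # [1, 1.5, 1.8, 2, 3, 3.6, 200]
```

With these, ax.plot(x, positions, 'o-'), ax.set_yticks(range(len(labels))) and ax.set_yticklabels(labels) should give the same kind of plot as the code above.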

Related

Bar graph df.plot() vs ax.bar() structure matplotlib

I am trying to graph a table as a bar graph.
I get my desired outcome using df.plot(kind='bar') structure. But for certain reasons, I now need to graph it using the ax.bar() structure.
I would like the x axis to show categorical labels, as the df.plot(kind='bar') structure does, rather than a continuous scale, but I need to learn how to do the same with the ax.bar() structure.
Make the index categorical by converting it to type 'str':
import pandas as pd
import matplotlib.pyplot as plt
data = {'SA': [11, 12, 13, 16, 17, 159, 209, 216],
'ET': [36, 45, 11, 15, 16, 4, 11, 10],
'UT': [11, 26, 10, 11, 16, 7, 2, 2],
'CT': [5, 0.3, 9, 5, 0.2, 0.2, 3, 4]}
df = pd.DataFrame(data)
df['SA'] = df['SA'].astype('str')
df.set_index('SA', inplace=True)
width = 3
fig, ax = plt.subplots(figsize=(12, 8))
p1 = ax.bar(df.index, df.ET, color='b', label='ET')
p2 = ax.bar(df.index, df.UT, bottom=df.ET, color='g', label='UT')
p3 = ax.bar(df.index, df.CT, bottom=df.ET+df.UT, color='r', label='CT')
plt.legend()
plt.show()
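With more than a few columns, writing each bottom= argument by hand gets error-prone. Here is a minimal sketch of computing the running bottoms in plain Python; stacked_bottoms is a hypothetical helper name, not a matplotlib function.

```python
def stacked_bottoms(columns):
    """Return, for each column, the height at which its bars should start."""
    running = [0.0] * len(columns[0])
    bottoms = []
    for column in columns:
        bottoms.append(list(running))
        # Add this column so the next one starts on top of it.
        running = [total + value for total, value in zip(running, column)]
    return bottoms


# ET, UT and CT from the example above.
et = [36, 45, 11, 15, 16, 4, 11, 10]
ut = [11, 26, 10, 11, 16, 7, 2, 2]
ct = [5, 0.3, 9, 5, 0.2, 0.2, 3, 4]
bottoms = stacked_bottoms([et, ut, ct])
print(bottoms[2][:3])  # [47.0, 71.0, 21.0]
```

Each ax.bar(df.index, column, bottom=bottom) call can then take its bottom from the matching entry of the returned list.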

How to create conditional coloring for matplotlib table values?

How do I add conditional coloring to this table?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':[16, 15, 14, 16],
'B': [3, -2, 5, 0],
'C': [200000, 3, 6, 800000],
'D': [51, -6, 3, 2]})
fig, ax = plt.subplots(figsize=(10,5))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText = df.values, colLabels = df.columns, loc='center')
plt.show()
How do I color the cells so that values in columns A and D greater than or equal to 15 are red and all others are green, and values in columns B and C greater than or equal to 5 are red and all others are green?
Generate a list of lists and pass it as cellColours. The outer list must contain one inner list per row of the data frame, and each inner list must contain one color string per column.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':[16, 15, 14, 16],
'B': [3, -2, 5, 0],
'C': [200000, 3, 6, 800000],
'D': [51, -6, 3, 2]})
colors = []
for _, row in df.iterrows():
    # Start with every cell in this row green, then flag the red ones.
    colors_in_column = ["g", "g", "g", "g"]
    if row["A"]>=15:
        colors_in_column[0] = "r"
    if row["B"]>=5:
        colors_in_column[1] = "r"
    if row["C"]>=5:
        colors_in_column[2] = "r"
    if row["D"]>=15:
        colors_in_column[3] = "r"
    colors.append(colors_in_column)
fig, ax = plt.subplots(figsize=(10,5))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText = df.values, colLabels = df.columns, loc='center', cellColours=colors)
plt.show()
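The per-column if statements can be folded into one rule when every column follows the same "red at or above a threshold" pattern. This is a sketch in plain Python under that assumption; cell_colours and its parameters are illustrative names, not part of matplotlib.

```python
def cell_colours(rows, thresholds, above="r", below="g"):
    """Colour each cell `above` if value >= its column threshold, else `below`."""
    return [
        [above if value >= limit else below for value, limit in zip(row, thresholds)]
        for row in rows
    ]


# Rows of the data frame above (columns A, B, C, D), one threshold per column.
rows = [
    [16, 3, 200000, 51],
    [15, -2, 3, -6],
    [14, 5, 6, 3],
    [16, 0, 800000, 2],
]
colors = cell_colours(rows, thresholds=[15, 5, 5, 15])
for row in colors:
    print(row)
```

df.values.tolist() yields rows in this shape, and the result can be passed to ax.table as cellColours=colors.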

how to plot a histogram by given points in python 3

I have 60 numbers divided into 8 intervals:
[[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8], [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
The number of numbers in each interval is divided by 6:
[0.5, 0.5, 2.33, 1.33, 2.33, 1.5, 1.0, 0.5]
How do I create a histogram in which the height of each bar corresponds to these values and the x axis is labeled with my interval boundaries?
F Blanchet's code produces a plot that doesn't really look like what you describe. I think you're looking for something where the x-ticks sit between the bars.
This code generates such a plot:
import matplotlib.pyplot as plt
# Include one more value for final x-tick.
intervals = list(range(534, 583, 6))
# Include one more bar height that == 0.
bar_height = [0.5, 0.5, 2.33, 1.33, 2.33, 1.5, 1.0, 0.5, 0]
plt.bar(intervals,
        bar_height,
        width = [6] * 8 + [0], # Set width of 0 bar to 0.
        align = "edge", # Align ticks at edge of bars.
        tick_label = intervals) # Make tick labels explicit.
You can use matplotlib:
import matplotlib.pyplot as plt
data = [[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8], [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
x = [element[0]+3 for element in data]
y = [element[2]/6 for element in data]
width = 6
plt.bar(x, y, width, color="blue")
plt.show()
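Both answers hard-code part of the bar geometry. As a sketch, the left edges, widths and heights can instead be derived from the interval list itself; bar_geometry is a hypothetical helper name and scale=6 is the divisor from the question.

```python
def bar_geometry(intervals, scale):
    """Split [left, right, count] triples into left edges, widths and heights."""
    lefts = [left for left, _, _ in intervals]
    widths = [right - left for left, right, _ in intervals]
    heights = [count / scale for _, _, count in intervals]
    return lefts, widths, heights


data = [[534, 540.0, 3], [540.0, 546.0, 3], [546.0, 552.0, 14], [552.0, 558.0, 8],
        [558.0, 564.0, 14], [564.0, 570.0, 9], [570.0, 576.0, 6], [576.0, 582.0, 3]]
lefts, widths, heights = bar_geometry(data, scale=6)
print(lefts)
print(heights)
```

plt.bar(lefts, heights, width=widths, align="edge") then draws each bar over its own interval, and plt.xticks(lefts + [data[-1][1]]) puts a tick on every boundary.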

How can I draw 3D plane using PCA In python?

X = np.array([[24,13,38],[8,3,17],[21,6,40],[1,14,-9],[9,3,21],[7,1,14],[8,7,11],[10,16,3],[1,3,2],
[15,2,30],[4,6,1],[12,10,18],[1,9,-4],[7,3,19],[5,1,13],[1,12,-6],[21,9,34],[8,8,7],
[1,18,-18],[15,8,25],[16,10,29],[7,0,17],[14,2,31],[3,7,0],[5,6,7]])
pca = PCA(n_components=1)
pca.fit(X)
a = pca.components_[0][0] # a
b = pca.components_[0][1] # b
c = pca.components_[0][2] # c
def average(values):
    if len(values) == 0:
        return None
    return sum(values, 0.0) / len(values)
x, y, z = X.T # the three columns of X
x_mean = average(x) # For an approximation
y_mean = average(y)
z_mean = average(z)
d = -(a * x_mean + b * y_mean + c * z_mean)
So the plane would be -0.375978766054x + 0.10612154283y - 0.920531469111z + 15.1366572005 = 0.
Actually, I'm not sure this is right.
I want to draw the plane for this data using the matplotlib library.
How can I code this?
Each principal component defines a vector in the feature space. PCA orders those vectors by the variance of the data along each direction, so the first vector captures the most variance and the last vector the least. Assuming the data are distributed around a plane, the third vector should be perpendicular to that plane. Here's the code:
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
X = np.array([[24,13,38],[8,3,17],[21,6,40],[1,14,-9],[9,3,21],[7,1,14],[8,7,11],[10,16,3],[1,3,2],
[15,2,30],[4,6,1],[12,10,18],[1,9,-4],[7,3,19],[5,1,13],[1,12,-6],[21,9,34],[8,8,7],
[1,18,-18],[15,8,25],[16,10,29],[7,0,17],[14,2,31],[3,7,0],[5,6,7]])
pca = PCA(n_components=3)
pca.fit(X)
eig_vec = pca.components_
print(pca.explained_variance_ratio_)
# [0.90946569 0.08816839 0.00236591]
# The last vector explains less than 0.3% of the variance,
# so it is the direction of minimum variance: the normal of the plane
normal = eig_vec[2, :] # (a, b, c)
centroid = np.mean(X, axis=0)
# Every point (x, y, z) on the plane should satisfy a*x + b*y + c*z + d = 0
# Taking centroid as a point on the plane
d = -centroid.dot(normal)
# Draw plane
xx, yy = np.meshgrid(np.arange(np.min(X[:, 0]), np.max(X[:, 0])), np.arange(np.min(X[:, 1]), np.max(X[:, 1])))
z = (-normal[0] * xx - normal[1] * yy - d) * 1. / normal[2]
# plot the surface (fig.gca(projection='3d') was removed in Matplotlib 3.6)
plt3d = plt.figure().add_subplot(projection='3d')
plt3d.plot_surface(xx, yy, z)
plt3d.scatter(*(X.T))
plt.show()
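As a cross-check that does not need scikit-learn, the same normal can be obtained from a singular value decomposition of the centered data. This is a sketch using only NumPy; the sign of the normal may differ from the one PCA returns, which does not change the plane.

```python
import numpy as np

X = np.array([[24, 13, 38], [8, 3, 17], [21, 6, 40], [1, 14, -9], [9, 3, 21],
              [7, 1, 14], [8, 7, 11], [10, 16, 3], [1, 3, 2], [15, 2, 30],
              [4, 6, 1], [12, 10, 18], [1, 9, -4], [7, 3, 19], [5, 1, 13],
              [1, 12, -6], [21, 9, 34], [8, 8, 7], [1, 18, -18], [15, 8, 25],
              [16, 10, 29], [7, 0, 17], [14, 2, 31], [3, 7, 0], [5, 6, 7]])

centroid = X.mean(axis=0)
# Rows of vt are the principal directions, ordered by decreasing variance.
_, singular_values, vt = np.linalg.svd(X - centroid, full_matrices=False)
normal = vt[-1]              # direction of least variance, unit length
d = -centroid.dot(normal)    # plane: normal . p + d = 0

# Signed distance of every point from the fitted plane.
distances = X.dot(normal) + d
variance_ratio = singular_values**2 / np.sum(singular_values**2)
print(variance_ratio)
print(np.abs(distances).max())
```

Small distances relative to the spread of the data suggest that the points really do lie close to a plane, which is the assumption the answer relies on.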
The first principal component doesn't define a plane, it defines a vector in three dimensions. Here's how to visualize it in 3D: the code starts out with yours, and then has the plotting steps:
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[24, 13, 38], [8, 3, 17], [21, 6, 40], [1, 14, -9], [9, 3, 21], [7, 1, 14],
[8, 7, 11], [10, 16, 3], [1, 3, 2], [15, 2, 30], [4, 6, 1], [12, 10, 18], [1, 9, -4],
[7, 3, 19], [5, 1, 13], [1, 12, -6], [21, 9, 34], [8, 8, 7], [1, 18, -18],
[15, 8, 25], [16, 10, 29], [7, 0, 17], [14, 2, 31], [3, 7, 0], [5, 6, 7]])
pca = PCA(n_components=1)
pca.fit(X)
## New code below
p = pca.components_
centroid = np.mean(X, 0)
segments = np.arange(-40, 40)[:, np.newaxis] * p
import matplotlib
matplotlib.use('TkAgg') # might not be necessary for you
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scatterplot = ax.scatter(*(X.T))
lineplot = ax.plot(*(centroid + segments).T, color="red")
plt.xlabel('x')
plt.ylabel('y')
plt.savefig('result.png', dpi=150)
(Note the above code was auto-formatted with yapf, which I highly recommend.)

Python Numpy arange bins should be shown in ascending order

I have created a series of bins using the Numpy 'arange' function:
bins = np.arange(0, df['eCPM'].max(), 0.1)
The output looks like this:
[1.8, 1.9) 145940.67 52.569295 1.842306
[1.9, 2) 150356.59 54.159954 1.932365
[10.6, 10.7) 150980.84 54.384815 10.626436
[13.3, 13.4) 152038.63 54.765842 13.373157
[2, 2.1) 171494.11 61.773901 2.033192
[2.1, 2.2) 178196.65 64.188223 2.141412
[2.2, 2.3) 186259.13 67.092410 2.264005
How can I get the bins [10.6, 10.7) and [13.3, 13.4) to go where they belong, so that all bins appear in ascending order?
I'm assuming the bins are read as strings, hence this issue. I tried adding a dtype (bins = np.arange(..., 0.1, dtype=float)) but had no luck.
[EDIT]
import numpy as np
import pandas
df = pandas.read_csv('path/to/file', skipfooter=1, engine='python')
bins = np.arange(0, df['eCPM'].max(), 0.1, dtype=float)
df['ecpm group'] = pandas.cut(df['eCPM'], bins, right=False, labels=None)
df =df[['ecpm group', 'Imps', 'Revenue']].groupby('ecpm group').sum()
You could sort the index in "human order" and then reindex:
import numpy as np
import pandas as pd
import re
def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    def atoi(text):
        return int(text) if text.isdigit() else text
    # Raw string so that \d is not treated as a string escape.
    return [atoi(c) for c in re.split(r'(\d+)', text)]
# df = pandas.read_csv('path/to/file', skipfooter=1, engine='python')
df = pd.DataFrame({'eCPM': np.random.randint(20, size=40)})
bins = np.arange(0, df['eCPM'].max()+1, 0.1, dtype=float)
df['ecpm group'] = pd.cut(df['eCPM'], bins, right=False, labels=None)
df = df.groupby('ecpm group').sum()
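# str() keeps this working if the labels are Interval objects rather than strings
df = df.reindex(index=sorted(df.index, key=lambda label: natural_keys(str(label))))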
print(df)
yields
eCPM
[0, 0.1) 0
[1, 1.1) 5
[2, 2.1) 4
[4, 4.1) 12
[6, 6.1) 24
[7, 7.1) 7
[8, 8.1) 16
[9, 9.1) 45
[10, 10.1) 40
[11, 11.1) 11
[12, 12.1) 12
[13, 13.1) 13
[15, 15.1) 15
[16, 16.1) 64
[17, 17.1) 34
[18, 18.1) 18
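An alternative sketch, assuming the index labels are strings of the form "[left, right)": sort by the numeric left edge instead of splitting the label into digit groups. The helper name left_edge is made up for illustration.

```python
def left_edge(label):
    """Return the numeric left edge of a label such as '[10.6, 10.7)'."""
    return float(label.strip("[]()").split(",")[0])


# Labels in the order shown in the question.
labels = ["[1.8, 1.9)", "[1.9, 2)", "[10.6, 10.7)", "[13.3, 13.4)",
          "[2, 2.1)", "[2.1, 2.2)", "[2.2, 2.3)"]
ordered = sorted(labels, key=left_edge)
print(ordered)
```

Under that assumption, df.reindex(index=sorted(df.index, key=left_edge)) could stand in for the natural_keys call.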
