How to plot only one half of a scatter matrix using pandas - python

I am using pandas scatter_matrix (I couldn't get PairGrid in seaborn to work) to plot all combinations of a set of columns in a pandas DataFrame. Each column has 1000 data points and there are nine columns.
I am using the following code:
pandas.plotting.scatter_matrix(df, alpha=0.2, figsize=(8,8))
I get the figure shown below:
This is nice. However, you'll notice that the plot is mirrored across the main diagonal. Is it possible to plot only the lower portion, as in the following mock-up I made using Paint:

This is probably not the cleanest way to do it, but it works:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# iris is assumed to be a DataFrame of numeric columns
axes = pd.plotting.scatter_matrix(iris, alpha=0.2, figsize=(8,8))
# hide every subplot above the main diagonal
for i in range(np.shape(axes)[0]):
    for j in range(np.shape(axes)[1]):
        if i < j:
            axes[i,j].set_visible(False)
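A variant of the same idea, as a minimal sketch: `np.triu_indices_from` with `k=1` yields the index pairs strictly above the diagonal, so the nested loops collapse into one. The helper name `hide_upper_triangle` and the random stand-in data are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd


def hide_upper_triangle(axes):
    """Hide every subplot strictly above the main diagonal of a 2-D axes grid."""
    for i, j in zip(*np.triu_indices_from(axes, k=1)):
        axes[i, j].set_visible(False)
    return axes


# Stand-in data (assumption): four numeric columns of random values
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))

axes = pd.plotting.scatter_matrix(df, alpha=0.2, figsize=(8, 8))
hide_upper_triangle(axes)
```

The tick labels survive because pandas draws them on the left column and the bottom row, and both stay visible.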

Related

Plot pandas all columns from and use their dataframe

I would like to have every column on my x-Axis and every value on my y-Axis.
With plotly and seaborn I could only find a way to plot the values against each other (column 1 on x vs. column 2 on y).
So for my shown example following would be columns:
"Import Files", "Defining Variables", "Simulate Cutting Down",...
I would like to have all their values on the y-Axis.
So what I basically want is
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('timings.csv')
df.T.plot()
plt.show()
but with scatter. Matplotlib, Seaborn or Plotly is fine by me.
This would be an example for a csv File, since I can't upload a file:
Import Files,Defining Variables,Copy All Cutters,Simulate Cutting Down,Calculalte Circle, Simulate Cutting Circle, Calculate Unbalance,Write to CSV,Total Time
0.015956878662109375,0.0009989738464355469,0.022938966751098633,0.1466083526611328,0.0009968280792236328,48.128061294555664,0.0,0.014995098114013672,48.33055639266968
0.015958786010742188,0.0,0.024958133697509766,0.14598894119262695,0.0,49.22848296165466,0.0,0.004987239837646484,49.42037606239319
0.015943288803100586,0.0,0.036900997161865234,0.14561033248901367,0.0,46.80884146690369,0.0,0.004009723663330078,47.011305809020996
I only used the data you provided; as mentioned by others in the comments, barplot is more suited for this data but here it is with scatter plot:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(16,5))
sns.scatterplot(data=df.melt(), x='variable', y ='value', ax=ax)
ax.set_xlabel('')
ax.set_ylabel('Time in seconds')
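To see why `df.melt()` makes this work, here is a small sketch of what it returns. The frame below uses three of the question's column names with rounded values and is only a stand-in for the real CSV.

```python
import pandas as pd

# Stand-in frame (assumption): three columns from the question, values rounded
df = pd.DataFrame({
    "Import Files": [0.0160, 0.0160],
    "Defining Variables": [0.0010, 0.0],
    "Total Time": [48.33, 49.42],
})

# melt() stacks every column into two long columns:
# 'variable' holds the former column name, 'value' holds the cell value
long_df = df.melt()
print(long_df)
```

Each original column name becomes a category on the x-axis, and every timing in that column becomes one point above it.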

Plot scatter graphs with matplotlib subplot

I am trying to plot a scatter diagram. It will take multiple arrays as input but plot into a single graph.
Here is my code:
import numpy as np
import os
import matplotlib.pyplot as plt
ax = plt.gca()
n_p=np.array([17.2,25.7,6.1,0.9,0.5,0.2])
n_d=np.array([1,2,3])
a_p=np.array([4.3,1.4,8.1,1.8,7.9,7.0])
a_d=np.array([12,13,14])
ax.scatter = ([n_d[0]/n_d[1]],[n_p[0]/n_p[1]])
ax.scatter = ([a_d[0]/a_d[1]],[a_p[0]/a_p[1]])
I will read the arrays from csv file, here I just put a simple example (for that I imported os). I want to plot the ratio of array element 2/ element 1 of n_p (as x-axis) and same with n_d (as y-axis). This will give a point in the graph. Similar operation will be followed by a_p and a_d array, and the point will be appended to the graph. There will be more data to append, but to understand the process, two is enough.
I tried to follow example from here.
If I use the color, I get syntax error.
If I do not use color, I get a blank plot.
Sorry, my coding experience is beginner so code is rather nasty.
Thanks in advance.
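Remove the = from the function call! Writing ax.scatter = (...) assigns a tuple to the name ax.scatter instead of calling the method, so nothing is drawn.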
import numpy as np
import os
import matplotlib.pyplot as plt
ax = plt.gca()
n_p=np.array([17.2,25.7,6.1,0.9,0.5,0.2])
n_d=np.array([1,2,3])
a_p=np.array([4.3,1.4,8.1,1.8,7.9,7.0])
a_d=np.array([12,13,14])
ax.scatter([n_d[0]/n_d[1]],[n_p[0]/n_p[1]])
ax.scatter([a_d[0]/a_d[1]],[a_p[0]/a_p[1]])
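Since more datasets will be appended later, one possible way to organize this is a loop over the array pairs. This is only a sketch: the dictionary `datasets`, its labels, and the chosen colors are hypothetical. Note that the color is passed as a keyword argument inside the call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt

n_p = np.array([17.2, 25.7, 6.1, 0.9, 0.5, 0.2])
n_d = np.array([1, 2, 3])
a_p = np.array([4.3, 1.4, 8.1, 1.8, 7.9, 7.0])
a_d = np.array([12, 13, 14])

# Hypothetical container: label -> (d array, p array, color)
datasets = {
    "n": (n_d, n_p, "tab:blue"),
    "a": (a_d, a_p, "tab:red"),
}

fig, ax = plt.subplots()
for label, (d, p, color) in datasets.items():
    # One point per dataset: the same ratios as in the code above
    ax.scatter(d[0] / d[1], p[0] / p[1], color=color, label=label)
ax.legend()
```

Call `plt.show()` (with an interactive backend) or `fig.savefig(...)` to view the result.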

Python - Pandas histogram width

I am doing a histogram plot of a bunch of data that goes from 0 to 1. When I plot I get this
As you can see, the histogram 'blocks' do not align with the y-axis.
Is there a way to make the histogram bins a constant width of 0.1? Or should I try a different package?
My code is quite simple:
import pandas as pd
import numpy as np
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
np.set_printoptions(precision=10,
                    threshold=10000,
                    linewidth=150, suppress=True)
E=pd.read_csv("FQCoherentSeparableBons5.csv")
E = E.iloc[0:,1:]  # .ix was removed from pandas; .iloc selects by position
E=np.array(E,float)
P0=E[:,0]
P0=pd.DataFrame(P0,columns=['P0'])
scatter_matrix(P0, alpha=0.2, figsize=(6, 6), diagonal='hist',color="red")
plt.suptitle('Distribucio p0')
plt.ylabel('Frequencia p0')
plt.show()
P.S. If you are wondering about the data, it is just a random distribution from 0 to 1.
You can pass additional arguments to the pandas histogram using the hist_kwds argument of the scatter_matrix function. If you want ten bins of width 0.1, then your scatter_matrix call should look like
scatter_matrix(P0, alpha=0.2, figsize=(6, 6), diagonal='hist', color="red",
               hist_kwds={'bins':[i*0.1 for i in range(11)]})
Additional arguments for the pandas histogram can be found in the documentation.
Here is a simple example. I've added a grid to the plot so that you can see the bins align correctly.
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
x = np.random.uniform(0,1,100)
scatter_matrix(pd.DataFrame(x), diagonal='hist',
               hist_kwds={'bins':[i*0.1 for i in range(11)]})
plt.xlabel('x')
plt.ylabel('frequency')
plt.grid()
plt.show()
By default, the number of bins in the histogram is 10, but the bins are spaced evenly between the minimum and maximum of your data, not between 0 and 1. So even though your data is distributed between 0 and 1, the bin edges will generally not fall on multiples of 0.1. For example, if you do not actually have a data point equal to 1, you will get a result similar to the one in your question.
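The effect can be checked directly with `np.histogram`, which computes bin edges the same way by default. This is a small sketch, and the sample values are made up for illustration.

```python
import numpy as np

# Made-up sample in [0, 1) whose minimum is not 0 and whose maximum is not 1
x = np.array([0.02, 0.15, 0.33, 0.48, 0.61, 0.87])

# Default: 10 equal-width bins spanning [x.min(), x.max()]
_, default_edges = np.histogram(x, bins=10)

# Explicit edges at 0.0, 0.1, ..., 1.0
_, fixed_edges = np.histogram(x, bins=np.arange(11) * 0.1)

# Alternative: keep bins=10 but pin the range to (0, 1)
_, ranged_edges = np.histogram(x, bins=10, range=(0, 1))

print(default_edges)
print(fixed_edges)
```

The default edges start at 0.02 and end at 0.87, so each bin is 0.085 wide. The other two calls both give edges on multiples of 0.1.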

Plot pandas dataframe with varying number of columns along imshow

I want to plot an image and a pandas bar plot side by side in an iPython notebook. This is part of a function so that the dataframe containing the values for the bar chart can vary with respect to number of columns.
The libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
Dataframe
faces = pd.DataFrame(...) # return values for 8 characteristics
This returns the bar chart I'm looking for and works for a varying number of columns.
faces.plot(kind='bar').set_xticklabels(result[0]['scores'].keys())
But I didn't find a way to plot it in a pyplot figure also containing the image. This is what I tried:
fig, (ax_l, ax_r) = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
ax_l.imshow( img )
ax_r=faces.plot(kind='bar').set_xticklabels(result[0]['scores'].keys())
The output I get is the image on the left and an empty plot area, with the correct plot below. There is
ax_r.bar(...)
but I couldn't find a way around having to define the columns to be plotted.
You just need to specify your axes object in your DataFrame.plot calls.
In other words: faces.plot(kind='bar', ax=ax_r)
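Put together, a minimal sketch of the whole figure could look like this. The random `img` array and the three-column `faces` frame are stand-ins (assumptions), since the original image and data are not shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-ins (assumptions): a random grayscale image and a one-row score table
img = np.random.rand(20, 20)
faces = pd.DataFrame([[0.1, 0.7, 0.2]],
                     columns=["anger", "happiness", "surprise"])

fig, (ax_l, ax_r) = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
ax_l.imshow(img)

# ax=ax_r makes pandas draw into the existing right-hand axes
# instead of opening a new figure
returned = faces.plot(kind="bar", ax=ax_r)
```

`DataFrame.plot` returns the axes it drew on, so `returned` is `ax_r` here. Any further calls such as `set_xticklabels` can be made on it without reassigning `ax_r`.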

Create a checkerboard plot with unbalanced rows and columns

I have a dataset similar to this format X = [[1,4,5], [34,70,1,5], [43,89,4,11], [22,76,4]] where the length of element lists are not equal.
I want to create a checkerboard plot of 4 rows and 4 columns where the color of each unit box corresponds to the value of the number, with a colorbar. In this dataset some small boxes will be missing (e.g. 4th column, first row).
How would I plot this in python using matplotlib?
Thanks
You can use the seaborn library or matplotlib to generate a heatmap. First, convert the data to a pandas DataFrame, which pads the shorter rows with NaN so that missing values are handled.
import pandas as pd
df = pd.DataFrame([[1,4,5],[34,70,1,5], [43,89,4,11],[22,76,4]])
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
sns.heatmap(df)
plt.show()
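For the matplotlib route mentioned above, a minimal sketch might look like the following. It relies on `imshow` leaving NaN cells unpainted; this is one possible approach, not the original answer's code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import matplotlib.pyplot as plt

X = [[1, 4, 5], [34, 70, 1, 5], [43, 89, 4, 11], [22, 76, 4]]

# Rows shorter than the longest one are padded with NaN
df = pd.DataFrame(X)

fig, ax = plt.subplots()
im = ax.imshow(df.to_numpy())  # NaN cells are left blank
fig.colorbar(im, ax=ax)        # colorbar maps colors back to values
```

Here the two missing boxes are the last column of the first and fourth rows.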
Result looks something like this.
