Plotting a bar plot for a large amount of data - python

I have 752 data points that I need to plot.
I plotted the data as a bar plot using the seaborn library in Python, but the resulting graph is very unclear and I cannot analyze anything from it. Is there any way to view this graph more clearly, so that all data points fit and the labels are legible?

The code I wrote is as follows:
import seaborn as sns
sns.set_style("whitegrid")
ax = sns.barplot(x="Events", y = "Count" , data = Unique_Complaints)

It is always difficult to visualise so many points. Nihal has rightly pointed out that it is best to use pandas and statistical analysis to extract information from your data. That said, IDEs like Spyder and PyCharm and packages like Bokeh allow interactive plots where you can zoom in on different parts of the plot. Here is an example with PyCharm:
Code:
# Import libraries
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Exponential decay function
x = np.arange(1,10, 0.1)
A = 7000
y = A*np.exp(-x)
# Plot the exponential function
sns.barplot(x = x, y = y)
plt.show()
Figure without magnification
Magnified figure
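As a rough illustration of the "use pandas to extract information first" suggestion, here is a minimal sketch that keeps only the largest categories and folds the rest into a single row before plotting. The DataFrame below is a made-up stand-in for Unique_Complaints (same column names, random counts), and top_n_with_other is a hypothetical helper, not something from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Unique_Complaints: 752 events with random counts.
rng = np.random.default_rng(0)
complaints = pd.DataFrame({
    "Events": [f"event_{i}" for i in range(752)],
    "Count": rng.integers(1, 500, size=752),
})


def top_n_with_other(df, n=20):
    """Keep the n rows with the largest Count and fold the rest into one 'Other' row."""
    top = df.nlargest(n, "Count")
    other_total = df["Count"].sum() - top["Count"].sum()
    other = pd.DataFrame({"Events": ["Other"], "Count": [other_total]})
    return pd.concat([top, other], ignore_index=True)


summary = top_n_with_other(complaints, n=20)
print(summary.head())
```

The 21-row summary can then be passed to sns.barplot(x="Events", y="Count", data=summary) in place of the full table. Whether 20 is a sensible cut-off depends on your data.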

To see a large amount of data, you can enlarge the figure with figure from matplotlib.pyplot, like this:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.pyplot import figure
figure(num=None, figsize=(20,18), dpi=80, facecolor='w', edgecolor='r')
sns.barplot(x="Events", y = "Count" , data = Unique_Complaints)
plt.show()
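I am using this to view a graph with 49 variables, and the result is:
My code is:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.pyplot import figure
figure(num=None, figsize=(20,18), dpi=256, facecolor='w', edgecolor='r')
plt.title("Missing Value Percentage")
# Pass x and y by keyword; recent seaborn versions no longer accept them positionally
sns.barplot(x=miss_val_per, y=df.columns)
plt.show()
The data I am using is: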
https://www.kaggle.com/sobhanmoosavi/us-accidents

Just swap x and y and try increasing the figure size.
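A minimal sketch of that idea, using plain matplotlib so it runs without seaborn: the bars are drawn horizontally and the figure height grows with the number of bars. The demo data, the helper name plot_horizontal_bars and the inches_per_bar value are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_horizontal_bars(df, label_col="Events", value_col="Count", inches_per_bar=0.25):
    """Draw one horizontal bar per row; the figure height scales with the row count."""
    height = max(4.0, inches_per_bar * len(df))
    fig, ax = plt.subplots(figsize=(10, height))
    ax.barh(df[label_col], df[value_col])
    ax.invert_yaxis()  # first row at the top, like a table
    ax.set_xlabel(value_col)
    ax.set_ylabel(label_col)
    fig.tight_layout()
    return fig, ax


# Made-up demo data with the same column names as in the question.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Events": [f"event_{i}" for i in range(60)],
    "Count": rng.integers(1, 500, size=60),
})
fig, ax = plot_horizontal_bars(demo)
```

With seaborn, the equivalent swap is sns.barplot(x="Count", y="Events", data=...) after creating a tall figure. Call plt.show() to display the result.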

Related

How to draw a 3D grid using matplotlib based on three columns of data?

I'm having trouble making a 3D plot. I want to build a 3D surface plot like the one below from three columns of data.
Expected graphic case
This is what I have so far:
Current picture case
But I still don't know how to give it a grid like the one in the first picture. Does anyone know how to achieve this? Part of the code and the full data are as follows.
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import os
import warnings
from mpl_toolkits.mplot3d import Axes3D
warnings.filterwarnings('ignore')
os.chdir(r"E:\SoftwareFile\stataFile")
matplotlib.use('TkAgg')
plt.figure(figsize=(10,6))
data = pd.read_stata(r"E:\SoftwareFile\stataFile\demo.dta")
ax = plt.axes(projection="3d")
ax.plot_trisurf(data["age"], data["weight"], data["pr_highbp"],
cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_title("Probability of Hypertension by Age and Weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of Hypertension", rotation=90)
ax.view_init(elev=30, azim=240)
plt.savefig("demo.png", dpi=1200)
Download all data
Sincerely appreciate your help

Remove the colormap and set the opacity to zero in the plot_trisurf call, like so:
ax.plot_trisurf(
data["age"],
data["weight"],
data["pr_highbp"],
color=None,
linewidth=1,
antialiased=True,
edgecolor="Black",
alpha=0,
)
That should result in:
You could also take a look at plot_wireframe(). For that I think you have to start with
x = data["age"].to_list()
y = data["weight"].to_list()
X, Y = np.meshgrid(x, y)
But I'm not sure how to create the z coordinate. It seems you may need interpolation from what I read.
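One possible way to get the z coordinate, sketched under assumptions: interpolate the scattered points onto a regular grid with scipy.interpolate.griddata and pass the result to plot_wireframe(). The data below is synthetic (the question's demo.dta is not available here), wireframe_from_columns is a hypothetical helper, and the grid size of 25 is arbitrary. Note that the grid is built from evenly spaced values with np.linspace rather than from the raw data columns.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata


def wireframe_from_columns(x, y, z, grid_size=25):
    """Interpolate scattered (x, y, z) points onto a regular grid and draw a wireframe."""
    xi = np.linspace(np.min(x), np.max(x), grid_size)
    yi = np.linspace(np.min(y), np.max(y), grid_size)
    X, Y = np.meshgrid(xi, yi)
    # Linear interpolation leaves NaN outside the convex hull of the data,
    # so those cells are filled with the nearest-neighbour value instead.
    Z = griddata((x, y), z, (X, Y), method="linear")
    Z_nearest = griddata((x, y), z, (X, Y), method="nearest")
    Z = np.where(np.isnan(Z), Z_nearest, Z)

    fig = plt.figure(figsize=(10, 6))
    ax = fig.add_subplot(projection="3d")
    ax.plot_wireframe(X, Y, Z, color="black", linewidth=0.5)
    return fig, ax, Z


# Synthetic stand-in for the age / weight / pr_highbp columns.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, 300)
weight = rng.uniform(40, 160, 300)
prob = 1 / (1 + np.exp(-(0.05 * (age - 50) + 0.03 * (weight - 100))))

fig, ax, Z = wireframe_from_columns(age, weight, prob)
```

Axis labels, ticks and view_init can then be set on ax exactly as in the question's code.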

Make width of seaborn facets proportional to the range of data along the x axis

I have used FacetGrid() from the seaborn module to break a line graph into segments with labels for each region as the title of each subplot. I saw the option in the documentation to have the x-axes be independent. However, I could not find anything related to having the plot sizes correspond to the size of each axis.
The code I used to generate this plot, along with the plot, are found below.
import matplotlib.pyplot as plt
import seaborn as sns
# Added during Edit 1.
sns.set()
graph = sns.FacetGrid(rmsf_crys, col = "Subunit", sharex = False)
graph.map(plt.plot, "Seq", "RMSF")
graph.set_titles(col_template = '{col_name}')
plt.show()
Plot resulting from the above code
Edit 1
Updated plot code using relplot() instead of calling FacetGrid() directly. The final result is the same graph.
import matplotlib.pyplot as plt
import seaborn as sns
# Forgot to include this in the original code snippet.
sns.set()
graph = sns.relplot(data = rmsf_crys, x = "Seq", y = "RMSF",
col = "Subunit", kind = "line",
facet_kws = dict(sharex=False))
graph.set_titles(col_template = '{col_name}')
plt.show()

Full support for this would need to live at the matplotlib layer, and I don't believe it's currently possible to have independent axes but shared transforms. (Someone with deeper knowledge of the matplotlib scale internals may prove me wrong).
But you can get pretty close by calculating the x range you'll need ahead of time and using that to parameterize the gridspec for the facets:
import numpy as np, seaborn as sns
tips = sns.load_dataset("tips")
xranges = tips.groupby("size")["total_bill"].agg(np.ptp)
xranges *= 1.1 # Account for default margins
sns.relplot(
data=tips, kind="line",
x="total_bill", y="tip",
col="size", col_order=xranges.index,
height=3, aspect=.65,
facet_kws=dict(sharex=False, gridspec_kws=dict(width_ratios=xranges))
)
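The same width_ratios idea can also be sketched with plain matplotlib subplots, which may be easier to adapt to a table shaped like rmsf_crys. The data here is invented (three subunits of different lengths with random RMSF values), and plot_proportional_facets is a hypothetical helper; it assumes every group spans a non-zero x range.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented stand-in for rmsf_crys: three subunits with different sequence lengths.
rng = np.random.default_rng(0)
lengths = {"A": 50, "B": 200, "C": 100}
rmsf_demo = pd.concat(
    [
        pd.DataFrame({"Subunit": name, "Seq": np.arange(1, n + 1), "RMSF": rng.random(n)})
        for name, n in lengths.items()
    ],
    ignore_index=True,
)


def plot_proportional_facets(df, col="Subunit", x="Seq", y="RMSF"):
    """One subplot per group, with widths proportional to each group's x range."""
    groups = list(df.groupby(col, sort=True))
    spans = [float(g[x].max() - g[x].min()) for _, g in groups]
    fig, axes = plt.subplots(
        1, len(groups), sharey=True, figsize=(12, 3),
        gridspec_kw={"width_ratios": spans},
    )
    axes = np.atleast_1d(axes)
    for ax, (name, g) in zip(axes, groups):
        ax.plot(g[x], g[y])
        ax.set_title(name)
        ax.set_xlabel(x)
    axes[0].set_ylabel(y)
    return fig, axes, spans


fig, axes, spans = plot_proportional_facets(rmsf_demo)
```

Because each axis keeps its own x limits, one unit of Seq takes up roughly the same width in every panel.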

How do I cluster values of y axis against x axis in scatterplot?

Let's say I have 2 arrays:
x = [1,2,3,4,5,6,7]
y = [1,2,2,2,3,4,5]
Their scatter plot looks like this:
What I want is for the x axis of the plot to have the ticks
0,4,8
so that the y values within each segment of x are drawn closer together.
I have seen similar behavior in bar plots, where it is called clustering. How do I do the same for a scatter plot, or is there another kind of plot I should be using?
I hope my question is clear.
All help is appreciated.

With your plot, try this before you display it:
plt.xticks([0,4,8])
or
import numpy as np
plt.xticks(np.arange(0, 8+1, step=4))
Then, to change the scale, you can try something like this (set the figure size before the figure is created, otherwise it has no effect on the current figure):
plt.rcParams["figure.figsize"] = (10,5)
plt.xticks([0,4,8])
I got this with my example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.rcParams["figure.figsize"] = (7,3)  # must be set before the figure is created
plt.plot(x, y, 'o', color='black')
plt.xticks([0,4,8])
plt.show()
output

I think what you are looking for is close to swarmplots and stripplots in Seaborn. However, Seaborn's swarmplot and stripplot are purely categorical on one of the axes, which means that they wouldn't preserve the relative x-axis order of your elements inside each category.
One way to do what you want would be to increase the space in your x-axis between categories ([0,4,8]) and modify your xticks accordingly.
Below is an example of this where I assign the data to 3 different categories: [-2,2), [2,6), [6,10). Each category is shifted dil_k away from its neighboring category.
import matplotlib.pyplot as plt
import numpy as np
#Generating data
x= np.random.choice(8,size=(100))
y= np.random.choice(8,size=(100))
dil_k=20
#Creating the spacing between categories
x[np.logical_and(x<6, x>=2)]+=dil_k
x[np.logical_and(x<10, x>=6)]+=2*dil_k
#Plotting
ax=plt.scatter(x,y)
#Modifying axes accordingly
plt.xticks([0,2,22,24,26,46,48,50],[0,2,2,4,6,6,8,10])
plt.show()
And the output gives:
Alternatively, if you don't care about keeping the order of your elements along the x-axis inside each category, then you can use swarmplot directly.
The code can be seen below:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
#Generating data
x= np.random.choice(8,size=(100))
y= np.random.choice(8,size=(100))
#Creating the spacing between categories
x[np.logical_and(x<2,x>=-2)]=0
x[np.logical_and(x<6, x>=2)]=4
x[np.logical_and(x<10, x>=6)]=8
#Plotting
sns.swarmplot(x=x,y=y)
plt.show()
And the output gives:
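A different way to get a similar effect while keeping the order of the points inside each category, offered only as a sketch: pull every x value towards the centre of its bin instead of pushing the bins apart. The helper cluster_around_centers and the shrink factor of 0.3 are assumptions; the arrays are the ones from the question.

```python
import numpy as np
import matplotlib.pyplot as plt


def cluster_around_centers(x, edges, centers, shrink=0.3):
    """Move each x towards the centre of its bin; shrink=1 leaves x unchanged."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # np.digitize returns i such that edges[i-1] <= x < edges[i]
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(centers) - 1)
    return centers[idx] + shrink * (x - centers[idx])


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 2, 2, 3, 4, 5]
x_clustered = cluster_around_centers(x, edges=[-2, 2, 6, 10], centers=[0, 4, 8])

fig, ax = plt.subplots()
ax.scatter(x_clustered, y)
ax.set_xticks([0, 4, 8])
```

The plotted x positions are no longer the true values, so this only makes sense if the ticks 0, 4 and 8 are meant as category labels.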

Dynamic spectrum using plotly

I want to plot time vs frequency as the x and y axes, with a third parameter shown as the intensity of the plot at each (time, frequency) point. [That is, instead of adding a third axis in a 3D visualisation, I want a 2D plot in which the amplitude of the third variable is encoded by the intensity (color) at (x, y).]
Can someone please suggest a plot type that does this? These plots are called dynamic spectra.
PS: I am plotting in Python offline. I have gone through https://plot.ly/python/, but I am still not sure which plot will serve my purpose.
Please suggest something that will help me accomplish the above :)

This code computes and visualizes the spectrogram with plotly. I tested it with this audio file: vignesh.wav
The code was tested in a Jupyter notebook using Python 3.6.
# Full example
import numpy as np
import matplotlib.pyplot as plt
# plotly offline
import plotly.offline as pyo
from plotly.offline import init_notebook_mode #to plot in jupyter notebook
import plotly.graph_objs as go
init_notebook_mode() # init plotly in jupyter notebook
from scipy.io import wavfile # scipy library to read wav files
AudioName = "vignesh.wav" # Audio File
fs, Audiodata = wavfile.read(AudioName)
Audiodata = Audiodata / (2.**15) # Normalized between [-1,1]
#Spectrogram
from scipy import signal
plt.figure()
N = 512 # Number of points in the FFT
w = signal.windows.blackman(N) # window functions live in scipy.signal.windows
freqs, bins, Pxx = signal.spectrogram(Audiodata, fs, window=w, nfft=N)
# Plot with plotly
trace = [go.Heatmap(
x= bins,
y= freqs,
z= 10*np.log10(Pxx),
colorscale='Jet',
)]
layout = go.Layout(
title = 'Spectrogram with plotly',
yaxis = dict(title = 'Frequency'), # y-axis label
xaxis = dict(title = 'Time'), # x-axis label
)
fig = go.Figure(data=trace, layout=layout)
pyo.iplot(fig, filename='Spectrogram')

I'd suggest the pcolormesh plot
import matplotlib.pyplot as mp
import numpy as np
# meshgrid your timevector to get it in the desired format
X, Y = np.meshgrid(timevector, range(num_of_frequency_bins))
fig1, ax1 = mp.subplots()
Plothandle = mp.pcolormesh(X, Y, frequencies, cmap=mp.cm.jet, antialiased=True, linewidth=0)
Here num_of_frequency_bins is the number of frequencies to display on your y-axis. For example, for 0 Hz to 1000 Hz at 10 Hz resolution, you would use range(0, 1000, 10) instead.
antialiased is just for looks, as is linewidth.
The jet colormap is usually not recommended because its gray-scale (lightness) progression is non-linear, but it is regularly used for frequency-domain plots, so I used it here. Matplotlib also has some nice colormaps with a linear gray-scale progression.
On the topic of plotly: if you just want a static image, you don't need plotly. If you want an interactive image where you can drag the axes around and so on, you should take a look at plotly.
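Since timevector and frequencies are not defined in the snippet above, here is a self-contained sketch of the same pcolormesh approach. It assumes a synthetic chirp as the input signal and uses scipy.signal.spectrogram to produce the time, frequency and power arrays; the sampling rate, nperseg and the viridis colormap are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

# Synthetic test signal: a chirp sweeping from 100 Hz to 3 kHz over 2 seconds.
fs = 8000
t = np.arange(0, 2.0, 1 / fs)
x = signal.chirp(t, f0=100, f1=3000, t1=2.0, method="linear")

# Sxx has shape (len(freqs), len(times)): one column of power values per time slice.
freqs, times, Sxx = signal.spectrogram(x, fs, nperseg=256)

fig, ax = plt.subplots()
# A small offset avoids log10(0); colour encodes power in dB at each (time, frequency).
mesh = ax.pcolormesh(times, freqs, 10 * np.log10(Sxx + 1e-12),
                     shading="auto", cmap="viridis")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Frequency (Hz)")
fig.colorbar(mesh, ax=ax, label="Power (dB)")
```

For your own data, replace times, freqs and Sxx with your time vector, frequency bins and the 2D intensity array, then call plt.show().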

Is it possible to plot a "checkerboard" type plot in python?

I have a data set that has two independent variables and one dependent variable. I thought the best way to represent the dataset is a checkerboard-type plot in which the color of each cell represents a range of values, like this:
I can't seem to find code that does this automatically.

You need to use a plotting package to do this. For example, with matplotlib:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
X = 100*np.random.rand(6,6)
fig, ax = plt.subplots()
i = ax.imshow(X, cmap=cm.jet, interpolation='nearest')
fig.colorbar(i)
plt.show()

For those who come across this years later, as I did: what the original poster wants is a heatmap.
The Matplotlib documentation has an example of this.
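For reference, a minimal heatmap sketch along those lines, with labelled rows and columns and the value written in each cell. The helper annotated_heatmap, the labels and the random data are all made up for illustration; this is not the code from the Matplotlib documentation.

```python
import numpy as np
import matplotlib.pyplot as plt


def annotated_heatmap(values, row_labels, col_labels, cmap="viridis"):
    """Draw a grid of coloured cells with tick labels and the value printed in each cell."""
    values = np.asarray(values)
    fig, ax = plt.subplots()
    image = ax.imshow(values, cmap=cmap, interpolation="nearest")
    ax.set_xticks(np.arange(len(col_labels)))
    ax.set_xticklabels(col_labels)
    ax.set_yticks(np.arange(len(row_labels)))
    ax.set_yticklabels(row_labels)
    # imshow puts column j on the x axis and row i on the y axis.
    for i in range(values.shape[0]):
        for j in range(values.shape[1]):
            ax.text(j, i, f"{values[i, j]:.0f}", ha="center", va="center", color="white")
    fig.colorbar(image, ax=ax)
    return fig, ax


# Made-up data: 4 levels of one variable against 5 levels of another.
rng = np.random.default_rng(0)
data = 100 * rng.random((4, 5))
fig, ax = annotated_heatmap(data, ["a", "b", "c", "d"], ["v", "w", "x", "y", "z"])
```

Call plt.show() to display it. If your data is in long form (one row per pair of independent variables), pivot it into a 2D table first.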
