How to test for Normality using my dataset

This image contains the data I'm using for an ANOVA test.
Using this data, how can I test certain groups or comparisons for normality in Python 3?

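You could use a Q-Q plot:
import pandas as pd
from statsmodels.graphics.gofplots import qqplot
from matplotlib import pyplot
df = pd.read_csv("your_data.csv")  # hypothetical file name: load your own dataset here
column = df["your_column"]  # hypothetical column name: the numeric values to check
# Q-Q plot against the normal distribution; line='s' adds a standardized reference line
qqplot(column, line='s')
pyplot.show()
If the data are approximately normal, the points lie close to the straight line. Points that bend away from the line at the upper and lower ends indicate heavier or lighter tails than a normal distribution.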
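As a formal complement to the Q-Q plot, a Shapiro-Wilk test (scipy.stats.shapiro) can be run on each group. One-way ANOVA assumes normality within each group (equivalently, of the residuals), so the sketch tests the groups separately rather than the pooled column. The group names, the data, and the helper shapiro_by_group are made up for illustration; replace them with the groups from your own dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three groups as you might compare in a one-way ANOVA
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(loc=10.0, scale=2.0, size=30),
    "B": rng.normal(loc=12.0, scale=2.0, size=30),
    "C": rng.exponential(scale=2.0, size=30),  # deliberately skewed group
}

def shapiro_by_group(groups, alpha=0.05):
    """Run a Shapiro-Wilk test on each group.

    Returns {name: (W, p, looks_normal)}; looks_normal is False when
    p < alpha, i.e. normality is rejected at significance level alpha.
    """
    results = {}
    for name, values in groups.items():
        w, p = stats.shapiro(values)
        results[name] = (w, p, p >= alpha)
    return results

for name, (w, p, ok) in shapiro_by_group(groups).items():
    print(f"{name}: W={w:.3f}, p={p:.3f}, consistent with normality at 5%: {ok}")
```

A non-significant p-value does not prove normality; with small groups the test has little power, so it is best read together with the Q-Q plot.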

Related

Plotly in Python: show mean and variance of selected data

I am generating histograms using go.Histogram as described here. I am getting what is expected:
What I want to do is to show some statistics of the selected data, as shown in the next image (the white box I added manually in Paint):
I have tried this, and within the function selection_fn I placed the add_annotation call described here. However, it does nothing, and no errors are raised either.
How can I do this?
Edit: I am using this code taken from this link
import plotly.graph_objects as go
import numpy as np
x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x)])
fig.show()
Obviously, I use a different dataset.

Reset colors back to default [duplicate]

I had a look at Kaggle's univariate-plotting-with-pandas tutorial. There's this line which generates a bar graph:
reviews['province'].value_counts().head(10).plot.bar()
I don't see any color scheme defined specifically.
I tried plotting it in a Jupyter notebook but saw only one color instead of the multiple colors shown on Kaggle.
I tried reading the documentation and online help but couldn't find any way to generate these colors with just the line above.
How do we do that? Is there a config option to enable these colors by default?

It seems like the multicoloured bars were the default behaviour in one of the former pandas versions and Kaggle must have used that one for their tutorial (you can read more here).
You can easily recreate the plot by defining a list of standard colours and then using it as an argument in bar.
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
'#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
reviews['province'].value_counts().head(10).plot.bar(color=colors)
Tested on pandas 0.24.1 and matplotlib 2.2.2.
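Instead of hard-coding the hex codes, the same list can be read from matplotlib's current default color cycle, plt.rcParams["axes.prop_cycle"]. In matplotlib 2.0 and later this cycle is the tab10 palette listed above, unless a style sheet or rcParams change it. In this sketch, counts is a made-up stand-in for reviews['province'].value_counts().head(10).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for reviews['province'].value_counts().head(10)
counts = pd.Series([50, 40, 30, 20, 10], index=["A", "B", "C", "D", "E"])

# Colors of matplotlib's current default property cycle
cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
# Repeat the cycle if there are more bars than colors
colors = [cycle[i % len(cycle)] for i in range(len(counts))]

# A list of colors passed to a Series bar plot colors each bar individually
ax = counts.plot.bar(color=colors)
```

Because the colors come from rcParams, the bars follow whatever style is active instead of a fixed list.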

In seaborn this is not a problem:
import seaborn as sns
sns.countplot(x='province', data=reviews)
With pandas/matplotlib alone it is possible by converting the values to a one-row DataFrame, so that each province becomes its own column and gets its own color (the bars are then drawn as one group, without spaces between them):
reviews['province'].value_counts().head(10).to_frame(0).T.plot.bar()
Or use a qualitative colormap:
import numpy as np
import matplotlib.pyplot as plt
N = 10
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Paired(np.arange(N)))
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Pastel1(np.arange(N)))

The colorful plot was produced with an earlier version of pandas (<= 0.23). Since then, pandas has made bar plots monochrome by default, because giving each bar a different color carries no information. If you still want a bar chart with the default colors of the "tab10" colormap in pandas >= 0.24, and hence want to recreate the previous behaviour, it would look like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
N = 13
df = pd.Series(np.random.randint(10,50,N), index=np.arange(1,N+1))
cmap = plt.cm.tab10
colors = cmap(np.arange(len(df)) % cmap.N)
df.plot.bar(color=colors)
plt.show()

How To Remove Extra Horizontal Line In Matplotlib Plot

I am plotting data from a .txt file using Matplotlib, and although the plot looks as expected, there is an odd horizontal line through it. This occurs across three different .txt data files I've tried. I plotted the data in Mathematica to make sure it is not an artifact of the data. I am trying to remove the line from my plot.
I've tried accessing some of the Matplotlib methods, like lines.remove(), with no luck. Below is the code I'm executing and the resulting plot.
import numpy as np
import matplotlib.pyplot as plt
neon = np.loadtxt("data/neon.txt")
neon_plot = plt.plot(neon)
plt.grid()
This is an example of the horizontal line going through my plots
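One possible cause, and it is only a guess because the layout of neon.txt isn't shown: if the file has two columns (x values and intensities), np.loadtxt returns a 2-D array, and plt.plot(neon) then draws each column as a separate line against the row index. The x column itself becomes an extra, nearly flat line on an intensity scale. The sketch below uses made-up two-column data to show the difference between plotting the whole array and plotting column 1 against column 0.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical two-column data standing in for data/neon.txt:
# column 0 = x values (e.g. wavelength), column 1 = measured intensity
x = np.linspace(580.0, 600.0, 200)
y = 1000.0 * np.exp(-((x - 590.0) ** 2) / 2.0)
data = np.column_stack((x, y))  # shape (200, 2), like np.loadtxt on a 2-column file

# Plotting the 2-D array draws EACH column against the row index -> 2 lines,
# one of which is the x column (roughly flat around 580-600 on a 0-1000 scale)
fig1, ax1 = plt.subplots()
lines_all = ax1.plot(data)

# Plotting intensity against x explicitly -> a single line
fig2, ax2 = plt.subplots()
lines_xy = ax2.plot(data[:, 0], data[:, 1])
ax2.grid()
```

If the file does have this layout, plt.plot(neon[:, 0], neon[:, 1]) in the original code would be the corresponding change.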

matplotlib.pyplot.plot() doesn't show the graph

I am learning Python and have a side project to learn to display data using the matplotlib.pyplot module. Here is an example that displays data using dates[] and prices[]. Does anyone know why we need lines 5 and 6? I am confused about why this step is needed to have the graph displayed.
from sklearn import linear_model
import matplotlib.pyplot as plt
import numpy
def showgraph(dates, prices):
    dates = numpy.reshape(dates, (len(dates), 1)) # line 5
    prices = numpy.reshape(prices, (len(prices), 1)) # line 6
    linear_mod = linear_model.LinearRegression()
    linear_mod.fit(dates, prices)
    plt.scatter(dates, prices, color='yellow')
    plt.plot(dates, linear_mod.predict(dates), color='green')
    plt.show()

Try the following in a Python session to check which backend is in use:
import matplotlib
import matplotlib.pyplot
print(matplotlib.get_backend())
If it prints 'agg', the backend is non-interactive: plt.show() will not open a window, but plt.savefig() still works.
To show the plot, you need to switch to an interactive backend such as TkAgg or Qt5Agg.
You can set the backend in the matplotlibrc file. To print that file's location, run:
import matplotlib
print(matplotlib.matplotlib_fname())
Alternatively, call matplotlib.use('TkAgg') before importing matplotlib.pyplot. More about matplotlibrc is in the matplotlib documentation on customizing Matplotlib.
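A small helper along these lines can fall back to saving a file when the active backend cannot open a window. The set of non-interactive backend names is a hand-picked list for illustration (not exhaustive), and show_or_save is a hypothetical helper, not a matplotlib API.

```python
import matplotlib
import matplotlib.pyplot as plt

# File-only backends: plt.show() has no window to open with these.
# Hand-picked for illustration, not an exhaustive list.
NON_INTERACTIVE = {"agg", "pdf", "ps", "svg", "pgf", "cairo", "template"}

def show_or_save(fig, path="figure.png"):
    """Show fig if the backend is interactive, otherwise save it to path.

    Returns the path written, or None if the figure was shown.
    """
    backend = matplotlib.get_backend().lower()
    if backend in NON_INTERACTIVE:
        fig.savefig(path)
        return path
    plt.show()
    return None
```

Usage: build the figure as usual (fig, ax = plt.subplots(); ax.plot(...)) and call show_or_save(fig) instead of plt.show().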

Lines 5 and 6 transform what I'm assuming are 1-D (row) vectors (I'm not sure how dates and prices are encoded before this transformation) into column vectors, so you now have arrays that look like this:
[[0],
 [1],
 [2],
 [3]]
This is the form that linear_model.LinearRegression.fit() expects for its input: a 2-D array with one row per sample and one column per feature. The reshaping is not necessary for plotting, assuming dates and prices are row vectors.
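The shape change can be checked with NumPy alone. This sketch uses made-up dates and prices; it shows that line 5 turns a 1-D sequence of shape (4,) into a (4, 1) column array, and that reshape(-1, 1) is an equivalent, commonly used spelling. scikit-learn's LinearRegression.fit raises a ValueError if X is passed as a 1-D array, which is why the reshape matters for fitting rather than for plotting.

```python
import numpy as np

dates = [0, 1, 2, 3]              # hypothetical 1-D input, as in the question
prices = [10.0, 11.5, 12.9, 14.2]

X = np.reshape(dates, (len(dates), 1))    # what "line 5" does
X_alt = np.asarray(dates).reshape(-1, 1)  # equivalent; -1 lets NumPy infer the row count

print(np.shape(dates))  # (4,)   1-D: no separate sample and feature axes
print(X.shape)          # (4, 1) 2-D: 4 samples x 1 feature
```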

My code is essentially the same as yours, and the graph still displays correctly without lines 5 and 6, so I think those lines are unnecessary for plotting. Whether fit() needs them depends on the shape of your input data: it expects a 2-D array with one row per sample.

Plotting two images side by side in python

I'd like to plot two images side by side in Python using matplotlib. However I don't want to create separate subplots. I want to plot two images in the same figure so that I can draw correspondences between the two images. See image below.
In MATLAB I believe this can be done using imshow([I1, I2]); however, matplotlib's imshow does not accept a list of images. Is there a way to do this in Python?

If you use numpy you can simply make one large array that represents the two images using the numpy concatenate function:
import numpy as np
import matplotlib.pyplot as plt
img_A = np.ones((10,10))
img_B = np.ones((10,10))
plot_image = np.concatenate((img_A, img_B), axis=1)
plt.imshow(plot_image)
plt.show()
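np.concatenate along axis=1 requires both images to have the same height. A sketch for the general case, assuming 2-D grayscale arrays: pad the shorter image with zeros, place the two side by side, and draw correspondence lines by shifting the second image's column coordinates by the width of the first. The helper side_by_side and the matched points are hypothetical; in practice the points would come from your own feature matching.

```python
import numpy as np
import matplotlib.pyplot as plt

def side_by_side(img_a, img_b):
    """Place two 2-D grayscale images next to each other, padding the shorter with zeros.

    Returns the combined array and the column offset at which img_b starts.
    """
    h = max(img_a.shape[0], img_b.shape[0])
    pad_a = np.zeros((h, img_a.shape[1]), dtype=img_a.dtype)
    pad_b = np.zeros((h, img_b.shape[1]), dtype=img_b.dtype)
    pad_a[:img_a.shape[0], :] = img_a
    pad_b[:img_b.shape[0], :] = img_b
    return np.hstack((pad_a, pad_b)), img_a.shape[1]

img_A = np.random.rand(10, 12)
img_B = np.random.rand(8, 10)
combined, offset = side_by_side(img_A, img_B)

# Hypothetical matched points as (row, col) in each image
pts_a = [(2, 3), (7, 9)]
pts_b = [(1, 4), (6, 2)]

fig, ax = plt.subplots()
ax.imshow(combined, cmap="gray")
for (ra, ca), (rb, cb) in zip(pts_a, pts_b):
    # imshow uses x = column, y = row; shift img_B's columns by the offset
    ax.plot([ca, cb + offset], [ra, rb], "r-")
```

Call plt.show() (or fig.savefig(...)) to display the result. Color images would need the padding arrays to include the channel axis.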
