How to pass a pandas DataFrame as a parameter to the matplotlib plot method - python

How can I pass a pandas DataFrame as a parameter to matplotlib's plot method?
For example:
import matplotlib.pyplot as plt
plt.plot(df1.as_matrix(['Score']),df1.as_matrix(['Score']))

It seems you need Series.values to convert each Series to a NumPy array:
plt.plot(df1['Score'].values, df1['Col'].values)
Or use DataFrame.plot:
df1.plot(x='Score', y='Col')
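A minimal sketch of both approaches side by side. The contents of df1 are made up for illustration; only the column names Score and Col come from the answer. Note that DataFrame.as_matrix, used in the question, was removed in pandas 1.0, so the sketch uses Series.to_numpy() (equivalent to .values here).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
df1 = pd.DataFrame({"Score": [1, 2, 3, 4], "Col": [10, 20, 15, 30]})

# Option 1: hand plain NumPy arrays to matplotlib.
fig, ax = plt.subplots()
lines = ax.plot(df1["Score"].to_numpy(), df1["Col"].to_numpy())

# Option 2: let pandas call matplotlib; it returns a matplotlib Axes.
ax2 = df1.plot(x="Score", y="Col")
```

Both options draw the same line; the second also labels the x axis and adds a legend from the column names.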

Related

pandas DataFrame.plot() method

I'm new to data science and trying out some Python libraries. I know it sounds a bit silly, but I'm confused by the code below, which I found in the pandas docs. I'm assuming that 'ts' is a pandas object, but how exactly can a pandas object use a matplotlib method here? What's the connection between pandas and matplotlib? Can someone explain that to me? Thank you.
In [3]: ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000))
In [4]: ts = ts.cumsum()
In [5]: ts.plot()
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa17967caf0>
Matplotlib is a library that makes it easy to generate plots in Python. Pandas is a library for working with labelled, tabular data (Series and DataFrame objects) in Python, built on top of NumPy.
According to the Pandas docs:
The plot method on Series and DataFrame is just a simple wrapper around plt.plot()
So the only connection between Pandas and Matplotlib is that Pandas uses Matplotlib to generate the plot for you.
If you want to see that plot, you have to add a couple of extra lines:
import matplotlib.pyplot as plt
plt.show()
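To make the connection concrete, here is a small sketch showing that the object returned by ts.plot() is an ordinary matplotlib Axes, so any matplotlib method can be called on it afterwards. The title and label strings are made up for illustration, and the Agg backend is chosen only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this in a notebook
import matplotlib.axes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(1000),
               index=pd.date_range("1/1/2000", periods=1000)).cumsum()

# pandas builds the plot with matplotlib and hands back the Axes it drew on.
ax = ts.plot()
is_mpl_axes = isinstance(ax, matplotlib.axes.Axes)

# From here on it is plain matplotlib.
ax.set_title("Cumulative sum")
ax.set_ylabel("value")
```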

Reset colors back to default [duplicate]

I had a look at Kaggle's univariate-plotting-with-pandas tutorial. It contains this line, which generates a bar chart:
reviews['province'].value_counts().head(10).plot.bar()
I don't see any color scheme defined explicitly.
I tried plotting it in a Jupyter notebook but got bars in a single color instead of the multiple colors shown on Kaggle.
I read the documentation and searched online but couldn't find a way to get these colors with just the line above.
How do we do that? Is there a config option that enables this behaviour by default?
It seems that multicoloured bars were the default behaviour in an earlier pandas version, and Kaggle must have used that version for their tutorial.
You can easily recreate the plot by defining a list of standard colours and then using it as an argument in bar.
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
          '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
reviews['province'].value_counts().head(10).plot.bar(color=colors)
Tested on pandas 0.24.1 and matplotlib 2.2.2.
This is not a problem in seaborn:
import seaborn as sns
sns.countplot(x='province', data=reviews)
With pandas plotting you can also get one colour per bar by converting the values to a one-row DataFrame, although the bars are then drawn without spaces between them:
reviews['province'].value_counts().head(10).to_frame(0).T.plot.bar()
Or use some qualitative colormap:
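import numpy as np
import matplotlib.pyplot as plt
N = 10
# Pick N colours from a qualitative colormap, one per bar
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Paired(np.arange(N)))
# The same idea with a different colormap
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Pastel1(np.arange(N)))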
The colorful plot was produced with an earlier version of pandas (<= 0.23). Since then, pandas has made bar plots monochrome, because the color of the bars carries no meaning. If you still want a bar chart with the default colors from the "tab10" colormap in pandas >= 0.24, and hence recreate the previous behaviour, it would look like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
N = 13
df = pd.Series(np.random.randint(10,50,N), index=np.arange(1,N+1))
cmap = plt.cm.tab10
colors = cmap(np.arange(len(df)) % cmap.N)
df.plot.bar(color=colors)
plt.show()

How do I make the pandas.scatter_matrix chart appear larger in Jupyter Notebook

I'm using Python 3.6.5 and Pandas 0.23.0 in Jupyter Notebook.
Some of my relevant imports:
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
My code to generate the scatter matrix:
scatter_matrix(df_obscured)
Some potentially important notes about my dataframe...I have an index field, I have a datetime64[ns] field, I have about 20 float64 fields that I'm looking at.
My problem:
My scatter matrix is super small. Maybe 2 to 3 hundred pixels wide. Most of the output looks like:
<matplotlib.axes._subplots.AxesSubplot object at 0x0000021AC2DDBFD0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000021AC3033DA0>,
How do I make the scatter matrix chart larger?
scatter_matrix takes a figsize parameter:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.scatter_matrix.html
Be aware that as with other matplotlib 'figsize' parameters, the size specified should be in inches, not in pixels.
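A minimal sketch of the figsize parameter in use. The DataFrame df_demo is a made-up stand-in for df_obscured, and the 12 x 12 inch size is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; not needed in Jupyter
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in for df_obscured: four float64 columns.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))

# figsize is (width, height) in inches.
axes = scatter_matrix(df_demo, figsize=(12, 12))

# The returned array of Axes all belong to one figure of the requested size.
fig = axes[0, 0].get_figure()
width_in, height_in = fig.get_size_inches()
```

Assigning the result to a variable, as done here with axes, also keeps Jupyter from printing the long list of AxesSubplot objects as the cell's output.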

Transform pandas dataframe into numpy array for fast plotting

I am writing a script to plot some data.
I am using Python 3.7.1 on Windows and have the following code to plot:
import pandas as pd
import matplotlib.pyplot as plt
files = ['path']
for i in range(len(files)):
    data = pd.read_csv(files[i], sep=';', skiprows=17, header=None, engine='python', decimal=",")
    # use the directory of the current file as the output folder
    c = files[i].split('\\')
    path = '\\'.join(c[:-1])
    x = data.loc[:, 0].values
    y = data.loc[:, 1].values
    c, data = None, None
    plt.ioff()  # turns off interactive plotting
    plt.plot(x, y)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.savefig(path + '\\ title123')  # saves image
I want to convert the pandas DataFrame columns into NumPy arrays of dtype float64.
Currently, my code produces arrays of dtype object. I cannot plot these because the code takes too long to run.
An example of what I am trying to achieve is:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(0,10,1000000)
y=np.sin(x)
plt.plot(x,y)
I will leave a link to the file.
https://drive.google.com/open?id=1kir-cGlk3bZSLmvD_tfnbGUaTYzvcW-3
Can anyone help me?
Kind Regards!
I just noticed that the problem was the decimal separator: ',' versus '.'. Sort of a math "language" conflict.
However, the for loop still runs extremely slowly when more than one file is loaded.
Kind regards to all!
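A hedged sketch of both points. It reads made-up semicolon-separated text with comma decimals (the string raw stands in for the real file, which is not reproduced here) and checks that the columns arrive as float64. For the slow loop, one possible cause, which is an assumption since the original loop was not profiled, is that every plt.plot call draws onto the same implicit figure, so each savefig re-renders the data of all earlier files. Creating a fresh figure per file and closing it afterwards avoids that.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; figures are only saved, never shown
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical file content: ';' as field separator, ',' as decimal separator.
raw = "0,0;1,5\n0,5;2,25\n1,0;3,0\n"

# decimal="," makes pandas parse "0,5" as the float 0.5 instead of a string.
data = pd.read_csv(io.StringIO(raw), sep=";", header=None, decimal=",")
x = data[0].to_numpy(dtype="float64")
y = data[1].to_numpy(dtype="float64")

# One figure per file, closed after saving, so plots do not accumulate.
fig, ax = plt.subplots()
ax.plot(x, y)
buf = io.BytesIO()  # stands in for the output path
fig.savefig(buf, format="png")
plt.close(fig)
```

In a loop over several files, the block from plt.subplots() to plt.close(fig) would go inside the loop body.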

Display list in Matplotlib

I'm trying to create a histogram from the values below. This is the structure they have after the JSON is parsed, and I now want to visualise it.
import json, requests
import matplotlib.pyplot as plt
import numpy as np
a = [[{"Name": "A"},{"Value":100}], [{"Name": "B"},{"Value":300}]]
Should I be converting to dictionary first? I want a histogram showing A:100, B:300.
Thanks
Take every pair and make a dictionary out of it. Then, make your histogram. Remember to use xticklabels.
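The suggestion above could be sketched as follows. Since the goal is one bar per name (A: 100, B: 300), this draws a bar chart rather than a histogram in the strict sense; the variable names values, merged and positions are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt

a = [[{"Name": "A"}, {"Value": 100}], [{"Name": "B"}, {"Value": 300}]]

# Merge each pair of one-key dicts, then map Name -> Value.
values = {}
for pair in a:
    merged = {k: v for d in pair for k, v in d.items()}
    values[merged["Name"]] = merged["Value"]

# One bar per name, with the names as x tick labels.
fig, ax = plt.subplots()
positions = list(range(len(values)))
bars = ax.bar(positions, list(values.values()))
ax.set_xticks(positions)
ax.set_xticklabels(list(values.keys()))
```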
