pandas DataFrame.plot() method - python

I'm new to data science and trying out some Python libraries. I know it sounds a bit silly, but I'm confused by the code below, which I found in the pandas docs. I'm assuming that 'ts' is a pandas object, but how exactly can a pandas object use a Matplotlib method here? What's the connection between pandas and Matplotlib? Can someone explain that to me? Thank you.
In [3]: ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000))
In [4]: ts = ts.cumsum()
In [5]: ts.plot()
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa17967caf0>

Matplotlib is a library that makes it easy to generate plots in Python. Pandas is a library for working with labeled, tabular data (Series and DataFrames) in Python.
According to the Pandas docs:
The plot method on Series and DataFrame is just a simple wrapper around plt.plot()
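So the only connection between Pandas and Matplotlib is that Pandas uses Matplotlib to generate the plot for you. That is also why ts.plot() returns a Matplotlib AxesSubplot object, as the Out[5] line shows: you can keep customizing the result with ordinary Matplotlib calls.
If you are running a plain Python script (a Jupyter notebook displays plots inline), you have to add a couple of extra lines to actually see the plot: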
import matplotlib.pyplot as plt
plt.show()
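A small sketch of that connection, assuming a recent pandas and Matplotlib: ts.plot() hands back the Matplotlib Axes it drew on, so Matplotlib methods such as set_title work on it directly. The last lines draw roughly the same line by calling Matplotlib yourself. The title text is just an illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()

# pandas draws the line with Matplotlib and returns the Matplotlib Axes it used
ax = ts.plot()
ax.set_title("Random walk")  # an ordinary Matplotlib Axes method
print(type(ax))

# Roughly the same picture, drawn with Matplotlib directly
fig2, ax2 = plt.subplots()
ax2.plot(ts.index, ts.values)
```

To see a window from a script, remove the matplotlib.use("Agg") line and call plt.show() at the end.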

Related

How to plot addresses (lat/long) from a CSV on a JSON map?

So I am trying to do something that seems relatively simple but is proving incredibly difficult. I have a .csv file with addresses and their corresponding latitude/longitude, and I just want to plot those on a California JSON map like this one in Python:
https://github.com/deldersveld/topojson/blob/master/countries/us-states/CA-06-california-counties.json
I've tried bubble maps, scatter maps, etc., but with no luck: I keep getting all kinds of errors :(. This is the closest I've got, but it uses a world map and can't zoom in effectively like the JSON map above. I am still learning Python, so please go easy on me.
import plotly.express as px
import pandas as pd
df = pd.read_csv(r"C:\Users\FT4\Desktop\FT Imported Data\Calimapdata.csv")
fig = px.scatter_geo(df,lat='Latitude',lon='Longitude', hover_name="lic_type", scope="usa")
fig.update_layout(title = 'World map', title_x=0.5)
fig.show()
If anyone could help me with this I would appreciate it. Thank you
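Your example link is just a geometry definition (a TopoJSON file with California's county boundaries). Are you talking about a choropleth?
If so, check out geopandas. You should be able to link your data to the county polygons by reading the file with geopandas, turning your Latitude/Longitude columns into point geometries, and then attaching each point to the polygon that contains it with a spatial join (sjoin). Once you have data tied to each geometry, you can plot it with geopandas's .plot method. Check out the user guide for more info.
Something along these lines should work:
import geopandas as gpd, pandas as pd
topojson_raw_url = (
    "https://raw.githubusercontent.com/deldersveld/topojson/"
    "master/countries/us-states/CA-06-california-counties.json"
)
# geopandas (via GDAL) detects the TopoJSON format itself; coordinates assumed to be longitude/latitude (WGS84)
counties = gpd.read_file(topojson_raw_url).set_crs("EPSG:4326", allow_override=True)
df = pd.read_csv(r"C:\Users\FT4\Desktop\FT Imported Data\Calimapdata.csv")
# sjoin only accepts GeoDataFrames, so turn the lat/long columns into points first
points = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]), crs="EPSG:4326")
# attach to each address the county polygon it falls in
merged = gpd.sjoin(points, counties, how="left")
# you could plot this directly with geopandas: counties as the base map, addresses on top
ax = counties.plot(color="white", edgecolor="gray")
merged.plot(ax=ax, column="lic_type", legend=True, markersize=5)
Alternatively, using @r-beginners' excellent answer to another question, you can plot with Plotly Express. A choropleth colors polygons rather than points, so this version colors each county by how many addresses fall inside it:
import plotly.express as px
# index_right (added by sjoin) holds the index of the county each address matched
counties["n_addresses"] = merged["index_right"].value_counts().reindex(counties.index, fill_value=0)
fig = px.choropleth(counties, geojson=counties.geometry, locations=counties.index, color="n_addresses")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()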
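To see what the spatial-join step does without downloading anything, here is a small self-contained sketch. The two rectangular "counties" (West/East) and the three addresses are made-up stand-ins, not real California data; only the column names Latitude, Longitude, and lic_type come from the question.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Two made-up rectangular "counties" standing in for the real county polygons
counties = gpd.GeoDataFrame(
    {"name": ["West", "East"]},
    geometry=[box(-124, 32, -119, 42), box(-119, 32, -114, 42)],
    crs="EPSG:4326",
)

# A plain DataFrame shaped like the CSV: one row per address
df = pd.DataFrame({
    "lic_type": ["A", "B", "A"],
    "Latitude": [37.77, 34.05, 33.83],
    "Longitude": [-122.42, -118.24, -116.55],
})

# sjoin only accepts GeoDataFrames, so build points (x = longitude, y = latitude)
points = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",
)

# Point-in-polygon join: each address gets the attributes of the polygon containing it
merged = gpd.sjoin(points, counties, how="left")
print(merged[["lic_type", "name"]])
```

From here, merged.groupby("name").size() gives the number of addresses per polygon, which is the kind of value a choropleth can color by.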

How do I make the pandas.scatter_matrix chart appear larger in Jupyter Notebook

I'm using Python 3.6.5 and Pandas 0.23.0 in Jupyter Notebook.
Some of my relevant imports:
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
My code to generate the scatter matrix
scatter_matrix(df_obscured)
Some potentially important notes about my DataFrame: I have an index field, a datetime64[ns] field, and about 20 float64 fields that I'm looking at.
My problem:
My scatter matrix is super small, maybe 200 to 300 pixels wide. Most of the output looks like:
<matplotlib.axes._subplots.AxesSubplot object at 0x0000021AC2DDBFD0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000021AC3033DA0>,
How do I make the scatter matrix chart larger?
scatter_matrix takes a figsize parameter:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.scatter_matrix.html
Be aware that as with other matplotlib 'figsize' parameters, the size specified should be in inches, not in pixels.
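A minimal sketch of passing figsize, assuming a hypothetical four-column stand-in for df_obscured. figsize is a (width, height) tuple in inches, and the pixel size is that multiplied by the figure's dpi.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in for df_obscured: 100 rows, 4 float columns
df_obscured = pd.DataFrame(np.random.randn(100, 4), columns=["a", "b", "c", "d"])

# (width, height) in inches, not pixels
axes = scatter_matrix(df_obscured, figsize=(12, 12))
fig = axes[0, 0].get_figure()
print(fig.get_size_inches())
```

In Jupyter, ending the call with a semicolon, as in scatter_matrix(df_obscured, figsize=(12, 12));, also hides the long list of AxesSubplot objects.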

Transform pandas dataframe into numpy array for fast plotting

I am writing a script to plot some data.
I am using Python 3.7.1 on Windows and have the following code to plot:
import pandas as pd
import matplotlib.pyplot as plt
files = ['path']
for i in range(len(files)):
    data = pd.read_csv(files[i], sep=';', skiprows=17, header=None, engine='python', decimal=",")
    c = files[0].split('\\')
    path = '\\'.join(c[:-1])
    x = data.loc[:, 0].values
    y = data.loc[:, 1].values
    c, data = None, None
plt.ioff()  # turns off interactive plotting
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.savefig(path + '\\title123')  # saves image
I want to transform the pandas DataFrame into a NumPy array of dtype float64.
Currently, my code produces an array of dtype object, and I cannot plot it because the code takes too long to run.
An example of what I am trying to achieve is:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(0,10,1000000)
y=np.sin(x)
plt.plot(x,y)
I will leave a link to the file.
https://drive.google.com/open?id=1kir-cGlk3bZSLmvD_tfnbGUaTYzvcW-3
Can anyone help me?
Kind Regards!
I just noticed that the problem was ',' versus '.': the file uses a comma as the decimal separator, a sort of math "language" conflict.
However, the for loop runs extremely slowly when more than one file is loaded.
Kind regards to all!
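A minimal sketch of the decimal-separator issue, using a tiny hypothetical stand-in for one of the semicolon-separated files. Without decimal=",", values like "1,5" are not parsed as numbers, so the column stays as text; with it, pandas returns float64 columns that convert straight to NumPy arrays. The default C parser supports both sep=";" and decimal=",", and pandas documents the python engine as the slower one, so dropping engine='python' may also help with the speed.

```python
import io
import pandas as pd

# Hypothetical file contents: ';' between columns, ',' as the decimal separator
raw = "0,0;1,5\n0,5;2,25\n1,0;3,75\n"

# Without decimal=",", "1,5" is not recognized as a number, so the columns stay text
wrong = pd.read_csv(io.StringIO(raw), sep=";", header=None)
print(wrong.dtypes)

# With decimal=",", the default C engine parses the values as float64
right = pd.read_csv(io.StringIO(raw), sep=";", header=None, decimal=",")
x = right[0].to_numpy(dtype="float64")
y = right[1].to_numpy(dtype="float64")
print(x, y)
```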

how to pass pandas dataframe as parameter to matplotlib library method plot

How do I pass a pandas DataFrame as a parameter to the Matplotlib plot method?
For example
import matplotlib.pyplot as plt
plt.plot(df1.as_matrix(['Score']),df1.as_matrix(['Score']))
It seems you need Series.values to convert a Series to a NumPy array:
plt.plot(df1['Score'].values, df1['Col'].values)
Or use DataFrame.plot:
df1.plot(x='Score', y='Col')
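Both approaches side by side, as a sketch with a small hypothetical df1 (only the column names Score and Col come from the answer). Note that DataFrame.as_matrix was removed in pandas 1.0. Series.to_numpy() is the newer spelling of .values.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data using the column names from the answer
df1 = pd.DataFrame({"Score": [1, 2, 3, 4], "Col": [10, 20, 15, 30]})

# 1) Hand NumPy arrays to Matplotlib yourself
fig, ax = plt.subplots()
line, = ax.plot(df1["Score"].to_numpy(), df1["Col"].to_numpy())

# 2) Let pandas call Matplotlib; it also labels the x axis with the column name
ax2 = df1.plot(x="Score", y="Col")
```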

How do I draw an area plot in ggplot with time series data?

I'm trying to plot a graph like this.
My data set looks like this. It has two columns. The first is the date and the second is the total number:
date volume
3/21/16 280
3/20/16 279
3/18/16 278
3/4/16 277
I am at a loss on how to make the graph from the link work with my data set. Thank you so much.
# Import required modules
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as pyplot
import ggplot
# Data
data = pd.read_csv("niagra-falls-escape.csv") # Read CSV
df = pd.DataFrame(data)
# Viz
ggplot(df, aes(x='date')) + \
geom_area()
There are a couple of issues here. First, aes, geom_area, etc. are classes in the ggplot module, so, as in the referenced post, you need to import them via from ggplot import * rather than import ggplot. For easier debugging and more maintainable code, I would recommend from ggplot import ggplot, aes, geom_area.
Then there are a couple of issues with your code. I think you need to specify that the date column holds datetime data. You can do this by adding the line df['date'] = pd.to_datetime(df['date']).
You will also need to specify the y axis of your plot (both ymin and ymax for an area plot). This can be done with: ggplot(df, aes(x='date', ymin='0', ymax='volume')) + geom_area(). Hope this helps.
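For comparison, here is a sketch of the same steps with plain pandas and Matplotlib instead of ggplot. It parses the dates, sorts them (the sample lists the newest date first), and fills from 0 up to volume, which mirrors the ymin/ymax mapping. It uses only the four sample rows from the question. The explicit format string assumes month/day/two-digit-year dates.

```python
import io
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The four sample rows from the question (the real data comes from the CSV)
raw = """date,volume
3/21/16,280
3/20/16,279
3/18/16,278
3/4/16,277
"""
df = pd.read_csv(io.StringIO(raw))

# Parse the strings as dates so the x axis is a real time axis, then sort ascending
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")
df = df.sort_values("date")

# Area plot: fill between ymin = 0 and ymax = volume
fig, ax = plt.subplots()
ax.fill_between(df["date"].to_numpy(), 0, df["volume"].to_numpy())
ax.set_xlabel("date")
ax.set_ylabel("volume")
```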
