How do I draw an area plot in ggplot with time series data? - python

I'm trying to post a graph like this.
My data set looks like this. It has two columns. The first is the date and the second is the total number:
date volume
3/21/16 280
3/20/16 279
3/18/16 278
3/4/16 277
I am at a loss on how to make the graph from the link work with my data set. Thank you so much.
# Import required modules
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as pyplot
import ggplot
# Data
data = pd.read_csv("niagra-falls-escape.csv") # Read CSV
df = pd.DataFrame(data)
# Viz
ggplot(df, aes(x='date')) + \
    geom_area()

There are a couple of issues here. First, aes, geom_area, etc. are classes defined in the ggplot module, so a bare import ggplot does not bring them into scope. That is why the referenced post uses from ggplot import * instead of import ggplot. For easier debugging and more maintainable code, I would recommend from ggplot import ggplot, aes, geom_area.
Second, I think you need to make sure the date column has a datetime type. You can do this by adding the line df['date'] = pd.to_datetime(df['date']).
Finally, you will also need to specify the y axis of your plot (both ymin and ymax for an area plot). This can be done with: ggplot(df, aes(x='date', ymin='0', ymax='volume')) + geom_area(). Hope this helps.
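For anyone who cannot get the ggplot port installed, here is a minimal sketch of the same three ingredients (parsed dates on x, 0 as the lower bound, volume as the upper bound). It uses matplotlib's fill_between as a stand-in for geom_area, which is a swapped-in technique rather than the ggplot call above. The inline sample rows and the in-memory PNG buffer are assumptions made so the sketch is self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Inline copy of the sample rows from the question (stands in for the CSV file).
raw = io.StringIO("date,volume\n3/21/16,280\n3/20/16,279\n3/18/16,278\n3/4/16,277\n")
df = pd.read_csv(raw)

# Parse the m/d/yy strings into real datetimes and sort chronologically.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")
df = df.sort_values("date").reset_index(drop=True)

# Area plot: fill between a lower bound of 0 and the volume column.
fig, ax = plt.subplots()
ax.fill_between(df["date"], 0, df["volume"], alpha=0.5)
ax.set_xlabel("date")
ax.set_ylabel("volume")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap

# Render to an in-memory buffer (assumption: swap in a file path as needed).
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

The explicit format string is optional, but it avoids any ambiguity about whether 3/4/16 means March 4 or April 3.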

Related

Is there a way to plot lines over a datashader plot in Bokeh (Python)?

I am working with relatively large datasets (approximately 10 x 20,000,000 data points), for which Datashader is a useful visualisation tool. To give more information in these visualisations, I would like to add lines showing averages and standard deviations on top of the datashaded figure. Does anyone know how this would be possible?
My current code:
from bokeh.plotting import figure
from bokeh.io import show
import holoviews as hv
import holoviews.operation.datashader as hd
hv.extension('bokeh')
x = 'xcol'
y= 'ycol'
data = dataframe
fig = figure(x_axis_label=x, y_axis_label=y)
points = hv.Points(data[[x, y]], label=('Title'))
hd.datashade(points, cmap='crest')
What I would like to do is for example add the following line to the figure generated with the code above:
fig.line([1,10,20], [0, 1000,2000], line_width=4)
Thanks in advance.

How to plot addresses (Lat/Long) from a csv on a JSON map?

So I am trying to do something which seems relatively simple but is proving incredibly difficult. I have a .csv file with addresses and their corresponding latitude/longitude, and I just want to plot those in Python on a California JSON map like this one:
https://github.com/deldersveld/topojson/blob/master/countries/us-states/CA-06-california-counties.json
I've tried bubble maps, scatter maps, etc., but with no luck; I keep getting all kinds of errors :(. This is the closest I've got, but it uses a world map and can't zoom in effectively like the JSON map above. I am still learning Python, so please go easy on me ><
import plotly.express as px
import pandas as pd
df = pd.read_csv(r"C:\Users\FT4\Desktop\FT Imported Data\Calimapdata.csv")
fig = px.scatter_geo(df,lat='Latitude',lon='Longitude', hover_name="lic_type", scope="usa")
fig.update_layout(title = 'World map', title_x=0.5)
fig.show()
If anyone could help me with this I would appreciate it. Thank you
Your example link is just a GeoJSON geometry definition. Are you talking about a choropleth?
If so, check out geopandas. You should be able to link your data to the polygons in the shape definition you linked to by reading it in and then joining on the shapes with sjoin. Once you have data tied to each geometry, you can plot with geopandas's .plot method. Check out the user guide for more info.
Something along these lines should work:
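import geopandas as gpd
import pandas as pd
geojson_raw_url = (
    "https://raw.githubusercontent.com/deldersveld/topojson/"
    "master/countries/us-states/CA-06-california-counties.json"
)
# let geopandas detect the file format; "GeoJSON" is not a valid engine value
gdf = gpd.read_file(geojson_raw_url)
df = pd.read_csv(r"C:\Users\FT4\Desktop\FT Imported Data\Calimapdata.csv")
# sjoin needs two GeoDataFrames, so turn the Latitude/Longitude columns into points
points = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]), crs=gdf.crs
)
merged = gpd.sjoin(gdf, points, how='right')
# you could plot this directly with geopandas
merged.plot("lic_type")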
Alternatively, using @r-beginners' excellent answer to another question, we can plot with plotly express:
fig = px.choropleth(merged, geojson=merged.geometry, locations=merged.index, color="lic_type")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

pandas DataFrame.plot() method

I'm new to data science and trying out some Python libraries. I know it sounds a bit silly, but I'm confused by the code below, which I found in the pandas docs. I'm assuming that 'ts' is a pandas object, but how exactly can a pandas object use a matplotlib method here? What's the connection between pandas and matplotlib? Can someone explain that to me? Thank you.
In [3]: ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000))
In [4]: ts = ts.cumsum()
In [5]: ts.plot()
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa17967caf0>
Matplotlib is a library that makes it easy to generate plots in Python. Pandas is a library for working with labeled, tabular data (Series and DataFrame objects) in Python.
According to the Pandas docs:
The plot method on Series and DataFrame is just a simple wrapper around plt.plot()
So the only connection between Pandas and Matplotlib is that Pandas uses Matplotlib to generate the plot for you.
If you want to see that plot, you have to add a couple of extra lines:
import matplotlib.pyplot as plt
plt.show()
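To make the connection concrete, here is a small sketch (assuming pandas is using its default matplotlib plotting backend) showing that the object returned by ts.plot() is an ordinary matplotlib Axes, so any matplotlib method can be called on it afterwards. The Agg backend line and the title and label strings are assumptions added so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.axes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same series as in the pandas docs example.
ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()

# pandas does the drawing through matplotlib and hands back the Axes it drew on.
ax = ts.plot()
print(type(ax))
print(isinstance(ax, matplotlib.axes.Axes))

# From here on it is plain matplotlib.
ax.set_title("Cumulative sum")
ax.set_ylabel("value")
fig = ax.get_figure()  # the matplotlib Figure that owns the Axes
```

In an interactive session you would replace the Agg line with your usual backend and call plt.show() at the end.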

Python plotly choropleth does not work with geoJSONs

I am trying to use plotly choropleth to draw a map that shows, say, a random variable num for each of the feature regions of Italy. However, it does not work. Below is the code that I use:
I have downloaded the GeoJson files for Italy from here.
import random
import pandas as pd
import plotly.express as px
import plotly.io as pio
import json
pio.renderers.default='browser'
with open('it-all.geo.json') as f:
    geojson = json.load(f)
n_provinces = len(geojson['features'])
province_names = [geojson['features'][k]['properties']['name'] for k in range(n_provinces)]
randomlist = []
for i in range(0,110):
    n = random.randint(1,30)
    randomlist.append(n)
datadata = pd.DataFrame({'province':province_names, 'num':randomlist})
fig = px.choropleth(datadata, geojson=geojson, color="num",
                    locations="province", featureidkey="properties.name",
                    color_continuous_scale="Viridis")
fig.show()
fig.show()
What I get is a mixed-shape map, as shown below. Can anyone please tell me what I am doing wrong? Thanks!
I tried doing the same thing with data from my country and had the same issues. I think that this data might not be readable by plotly. If you look at the website's demos for their maps, there are several JavaScript scripts running in order to create the maps. It's possible that they've put their GeoJSON into a custom format so that you have to use their JavaScript services in order to create a comprehensible map.
I later found a different set of data and was able to easily create a choropleth map with plotly using the exact same code that didn't work with the original data. Hopefully you found a different dataset that you could use. Oftentimes governments will provide open data about census areas, province/state borders, etc.
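Whichever dataset you end up with, one fragile spot in the question's code is the hard-coded 110 in the loop: if the file has a different number of features, the two columns no longer line up. A minimal sketch of building one value per feature directly from the GeoJSON follows. The helper name random_values_per_feature and the tiny sample FeatureCollection are hypothetical, and the sketch only assumes that each feature has a properties.name entry, as in the question.

```python
import random


def random_values_per_feature(geojson, low=1, high=30, seed=None):
    """Return one {'province', 'num'} row per feature in the GeoJSON dict.

    Hypothetical helper: the row count always matches the feature count,
    so nothing has to be hard-coded.
    """
    rng = random.Random(seed)
    rows = []
    for feature in geojson["features"]:
        rows.append({
            "province": feature["properties"]["name"],
            "num": rng.randint(low, high),
        })
    return rows


# Tiny made-up FeatureCollection standing in for the real file.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "North"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Centre"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "South"}, "geometry": None},
    ],
}

rows = random_values_per_feature(sample, seed=0)
print(len(rows) == len(sample["features"]))
```

The resulting rows can be passed to pd.DataFrame(rows) and then to px.choropleth with locations="province" and featureidkey="properties.name", exactly as in the question.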

Programmatically making and saving plots in (I)python without rendering them on the screen first

Here's a dummy script that makes three plots and saves them to PDF.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":np.random.normal(size = 100),
                   "B":np.random.chisquare(5, size = 100),
                   "C":np.random.gamma(5,size = 100)})
for i in df.columns:
    plt.hist(df[i])
    plt.savefig(i+".pdf", format = "pdf")
    plt.close()
I'm using spyder, which uses IPython. When I run this script, three windows pop at me and then go away. It works, but it's a little annoying.
How can I make the figures get saved to pdf without ever being rendered on my screen?
I'm looking for something like R's
pdf("path/to/plot/name.pdf")
commands
dev.off()
inasmuch as nothing gets rendered on the screen, but the pdf gets saved.
Aha. Partially based on the duplicate suggestion (which wasn't exactly a duplicate), this works:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":np.random.normal(size = 100),
                   "B":np.random.chisquare(5, size = 100),
                   "C":np.random.gamma(5,size = 100)})
import matplotlib
old_backend = matplotlib.get_backend()
matplotlib.use("pdf")
for i in df.columns:
    plt.hist(df[i])
    plt.savefig(i+".pdf", format = "pdf")
    plt.close()
matplotlib.use(old_backend)
Basically, set the backend to something like a pdf device, and then set it back to whatever you're accustomed to.
See this Stack Overflow answer, which cites this article. The answer also suggests plt.ioff(), but notes a concern that it could disable other functionality should you want it.
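Another option, not taken from the answers above, is to skip pyplot entirely and use matplotlib's object-oriented API. A Figure created directly is never registered with a GUI window, so nothing should pop up and there is no backend to switch back afterwards. This is a minimal sketch assuming a reasonably recent matplotlib, where Figure.savefig works without an explicitly attached canvas; the temporary output directory is an assumption for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from matplotlib.figure import Figure

df = pd.DataFrame({
    "A": np.random.normal(size=100),
    "B": np.random.chisquare(5, size=100),
    "C": np.random.gamma(5, size=100),
})

out_dir = tempfile.mkdtemp()  # assumption: replace with your own output folder
saved = []
for column in df.columns:
    fig = Figure()        # plain Figure, not managed by pyplot
    ax = fig.subplots()   # single Axes on that figure
    ax.hist(df[column])
    ax.set_title(column)
    path = os.path.join(out_dir, column + ".pdf")
    fig.savefig(path, format="pdf")
    saved.append(path)

print(saved)
```

Because pyplot never tracks these figures, there is also no need for plt.close() inside the loop.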
