I have a networkx graph created from edges such as these:
user_id,edges
11011,"[[340, 269], [269, 340]]"
80973,"[[398, 279]]"
608473,"[[69, 28]]"
2139671,"[[382, 27], [27, 285]]"
3945641,"[[120, 422], [422, 217], [217, 340], [340, 340]]"
5820642,"[[458, 442]]"
Example
Here the edges are a user's movements between clusters, identified by their cluster labels, e.g., [[340, 269], [269, 340]], which represents a user's movement from cluster 340 to cluster 269 and then back to cluster 340. These clusters have coordinates, stored in another file, in the form of latitude and longitude, such as these:
cluster_label,latitude,longitude
0,39.18193382,-77.51885109
1,39.18,-77.27
2,39.17917928,-76.6688633
3,39.1782,-77.2617
4,39.1765,-77.1927
Is it possible to link the edges of my graph to their respective cluster in physical space using the node/cluster's lat/long and not in the abstract space of a graph? If so, how might I go about doing so? I would like to graph this on a map using a package such as mplleaflet (like shown here: http://htmlpreview.github.io/?https://github.com/jwass/mplleaflet/master/examples/readme_example.html) or directly into QGIS/ArcMap.
EDIT
I'm attempting to convert my csv with cluster centroid coordinates into a dictionary, but I've run into several errors, mainly NetworkXError: Node 0 has no position and IndexError: too many indices for array. Below is how I'm trying to convert to a dict and then graph with mplleaflet.
import csv
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import time
import mplleaflet
g = nx.Graph()
# Set node positions as a dictionary
df = pd.read_csv('G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_centroids.csv', delimiter=',')
df.set_index('cluster_label', inplace=True)
dict_pos = df.to_dict(orient='index')
#print dict_pos
for row in csv.reader(open('G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_edges.csv', 'r')):
    if '[' in row[1]: #
        g.add_edges_from(eval(row[1]))
# Plotting with matplotlib
#nx.draw(g, with_labels=True, alpha=0.15, arrows=True, linewidths=0.01, edge_color='r', node_size=250, node_color='k')
#plt.show()
# Plotting with mplleaflet
fig, ax = plt.subplots()
nx.draw_networkx_nodes(g,pos=dict_pos,node_size=10)
nx.draw_networkx_edges(g,pos=dict_pos,edge_color='gray', alpha=.1)
nx.draw_networkx_labels(g,dict_pos, label_pos =10.3)
mplleaflet.display(fig=ax.figure)
Yes, this is quite easy. Try something along these lines.
Create a dictionary where the node (cluster_label) is the key and the longitude and latitude are saved as values in a list. I would use pd.read_csv() to read the csv and then df.to_dict() to create the dictionary. It should look like this, for example:
dic_pos = {u'0': [-77.51885109, 39.18193382],
           u'1': [-77.27, 39.18],
           u'2': [-76.6688633, 39.17917928],
           u'3': [-77.2617, 39.1782],
           .....
Then plotting the graph on a map is as easy as:
import matplotlib.pyplot as plt
import mplleaflet
import networkx as nx
fig, ax = plt.subplots()
# GG is the networkx graph; dic_pos maps each node to [longitude, latitude]
nx.draw_networkx_nodes(GG, pos=dic_pos, node_size=10, node_color='red', edgecolors='k', alpha=.5)
nx.draw_networkx_edges(GG, pos=dic_pos, edge_color='gray', alpha=.1)
nx.draw_networkx_labels(GG, pos=dic_pos)
mplleaflet.display(fig=ax.figure)
If it does not produce the expected result, try reversing latitude and longitude: the positions are (x, y) pairs, so longitude has to come first.
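As a minimal sketch of the dictionary step, the position lookup can also be built with the standard-library csv module instead of pandas. The helper name read_positions and the in-memory file are assumptions made for illustration; the rows are copied from the question. Keys are converted to int here on the assumption that the graph nodes are the integers found in the edge lists, since every node needs a matching key or networkx will report that it has no position.

```python
import csv
import io

# Stand-in for the centroid file; the rows are copied from the question.
centroid_csv = io.StringIO(
    "cluster_label,latitude,longitude\n"
    "0,39.18193382,-77.51885109\n"
    "1,39.18,-77.27\n"
    "2,39.17917928,-76.6688633\n"
)


def read_positions(handle):
    """Return {cluster_label: (longitude, latitude)} with integer keys."""
    positions = {}
    for row in csv.DictReader(handle):
        # x is longitude and y is latitude, so longitude goes first.
        positions[int(row["cluster_label"])] = (
            float(row["longitude"]),
            float(row["latitude"]),
        )
    return positions


dic_pos = read_positions(centroid_csv)
print(dic_pos[1])
```

With a real file, the same function could be called on open(path, newline='') in place of the StringIO object.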
I am trying to draw a bipartite graph of certain nodes. For a small number of nodes it looks perfectly fine:
Image for around 30 nodes
Unfortunately, this isn't the case for more nodes like this one:
Image for more nodes
My code for determining the position of each node looks something like this:
pos = {}
pos[SOURCE_STRING] = (0, width/2)
row = 0
for arr in left_side.keys():
    pos[str(arr).replace(" ","")]=(NODE_SIZE, row)
    row += NODE_SIZE
row = 0
for arr in right_side.keys():
    pos[str(arr).replace(" ","")]=(2*NODE_SIZE,row)
    row += NODE_SIZE
pos[SINK_STRING] = (3*NODE_SIZE, width/2)
return pos
And then I feed it to the DiGraph class:
G = nx.DiGraph()
G.add_nodes_from(nodes)
G.add_edges_from(edges, len=1)
nx.draw(G, pos=pos ,node_shape = "s", with_labels = True,node_size=NODE_SIZE)
This doesn't make much sense: the nodes should all be the same distance from each other, since NODE_SIZE is constant and doesn't change for the rest of the program.
Following this thread didn't help me either:
Bipartite graph in NetworkX
Can something be done about this?
Edit (following Paul Brodersen's advice to use netgraph):
Used this documentation: netgraph doc
I still got much the same results, such as:
netgraph try
I tried passing the edges with different positions and also played with the node size, with no success.
Code:
netgraph.Graph(edges, node_layout='bipartite', node_labels=True)
plt.show()
In your netgraph call, you are not changing the node size.
My suggestion with 30 nodes:
import numpy as np
import matplotlib.pyplot as plt
from netgraph import Graph
edges = np.vstack([np.random.randint(0, 15, 60),
np.random.randint(16, 30, 60)]).T
Graph(edges, node_layout='bipartite', node_size=0.5, node_labels=True, node_label_offset=0.1, edge_width=0.1)
plt.show()
With 100 nodes:
import numpy as np
import matplotlib.pyplot as plt
from netgraph import Graph
edges = np.vstack([np.random.randint(0, 50, 200),
np.random.randint(51, 100, 200)]).T
Graph(edges, node_layout='bipartite', node_size=0.5, node_labels=True, node_label_offset=0.1, edge_width=0.1)
plt.show()
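For the original networkx drawing, a likely reason the nodes crowd together is that node_size is handed to matplotlib as a marker area in points squared, while the positions are data coordinates that get squeezed into whatever height the axes have. The sketch below estimates how tall the axes would need to be for a given number of rows. The function name and the padding factor are assumptions for illustration, not part of networkx or netgraph.

```python
import math

POINTS_PER_INCH = 72  # matplotlib measures marker sizes in points


def min_figure_height(n_rows, node_size, padding=1.5):
    """Estimate the axes height (inches) needed for non-overlapping nodes.

    node_size is a marker area in points**2, so a square marker is
    roughly sqrt(node_size) points tall. padding is a hypothetical
    spacing factor, not a networkx parameter.
    """
    side_in_points = math.sqrt(node_size)
    return n_rows * side_in_points * padding / POINTS_PER_INCH


print(min_figure_height(30, 300))
print(min_figure_height(100, 300))
```

The result could be used as the second entry of figsize when creating the figure, or the node size could be reduced until the estimate fits the figure you already have.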
I'd like to know how to fill in a map of U.S. counties by value (i.e., a choropleth map), using Python 3 and Cartopy, and I haven't yet found anything online to guide me in that. That filled value could be, for instance, highest recorded tornado rating (with counties left blank for no recorded tornadoes), or even something arbitrary such as whether I've visited (=1) or lived (=2) in the county. I found a helpful MetPy example to get the county boundaries on a map:
https://unidata.github.io/MetPy/latest/examples/plots/US_Counties.html
What I envision is somehow setting a list (or dictionary?) of county names to a certain value, and then each value would be assigned to a particular fill color. This is my current script, which generates a nice blank county map of the CONUS/lower 48 (though I'd eventually also like to add Alaska/Hawaii insets).
import cartopy
import cartopy.crs as ccrs
import matplotlib as mpl
import matplotlib.pyplot as plt
from metpy.plots import USCOUNTIES
plot_type = 'png'
borders = cartopy.feature.BORDERS
states = cartopy.feature.NaturalEarthFeature(category='cultural', scale='10m', facecolor='none', name='admin_1_states_provinces_lakes')
oceans = cartopy.feature.OCEAN
lakes = cartopy.feature.LAKES
mpl.rcParams['figure.figsize'] = (12,10)
water_color = 'lightblue'
fig = plt.figure()
ax = plt.axes(projection=ccrs.LambertConformal(central_longitude=-97.5, central_latitude=38.5, standard_parallels=(38.5,38.5)))
ax.set_extent([-120, -74, 23, 50], ccrs.Geodetic())
ax.coastlines()
ax.add_feature(borders, linestyle='-')
ax.add_feature(states, linewidth=0.50, edgecolor='black')
ax.add_feature(oceans, facecolor=water_color)
ax.add_feature(lakes, facecolor=water_color, linewidth=0.50, edgecolor='black')
ax.add_feature(USCOUNTIES.with_scale('500k'), linewidth=0.10, edgecolor='black')
plt.savefig('./county_map.'+plot_type)
plt.close()
Any ideas or tips on how to assign values to counties and fill them accordingly?
So Cartopy's shapereader.Reader can give you access to all of the records in the shapefile, including their attributes. Putting this together with MetPy's get_test_data to get access to the underlying shapefile you can get what you want, assuming you have a dataset that maps e.g. FIPSCODE to EF rating:
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader
import matplotlib.pyplot as plt
import numpy as np
from metpy.cbook import get_test_data
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())
cmap = plt.get_cmap('magma')
norm = plt.Normalize(0, 5)
# Fake tornado dataset with a value for each county code
tor_data = dict()
# This will only work (i.e., have access to the shapefile's database of
# attributes) after it's been downloaded by using `USCOUNTIES` or by
# running get_test_data() for the .shx and .dbf files as well.
for rec in shpreader.Reader(get_test_data('us_counties_20m.shp',
                                          as_file_obj=False)).records():
    # Mimic getting data, but actually getting a random number
    # GEOID seems to be the FIPS code
    max_ef = tor_data.get(rec.attributes['GEOID'], np.random.randint(0, 5))
    # Normalize the data to [0, 1] and colormap manually
    color = tuple(cmap(norm(max_ef)))
    # Add the geometry to the plot, being sure to specify the coordinate system
    ax.add_geometries([rec.geometry], crs=ccrs.PlateCarree(), facecolor=color)
ax.set_extent((-125, -65, 25, 48))
That gives me:
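For the discrete case in the question (visited = 1, lived = 2, everything else blank), the colormap can be swapped for a plain lookup. This is a minimal sketch: the FIPS codes, the palette and the helper name county_facecolor are made up for illustration, and it assumes the GEOID attribute is the five-digit FIPS string, as guessed in the loop above.

```python
# Hypothetical data: county FIPS code (the GEOID attribute) -> category.
# 1 = visited, 2 = lived in; counties that are absent stay unfilled.
county_values = {"24031": 2, "51059": 1}

# Hypothetical palette: one matplotlib color name per category.
palette = {1: "lightgreen", 2: "darkgreen"}


def county_facecolor(geoid, values, colors, default="none"):
    """Look up the fill color for one county; 'none' leaves it blank."""
    return colors.get(values.get(geoid), default)


print(county_facecolor("24031", county_values, palette))
print(county_facecolor("01001", county_values, palette))
```

Inside the loop over the shapefile records, the returned value would be passed as facecolor to ax.add_geometries in place of the colormapped tuple.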
I'm not sure about passing in a dict, but you can pass in a list to facecolor.
ax.add_feature(USCOUNTIES.with_scale('500k'), linewidth=0.10, edgecolor='black', facecolor=["red", "blue", "green"])
If you know how many counties there are you can make a list that long by:
import matplotlib.cm as cm
import numpy as np
number_of_counties = 3000
color_scale = list(cm.rainbow(np.linspace(0, 1, number_of_counties)))
ax.add_feature(USCOUNTIES.with_scale('500k'), linewidth=.10, edgecolor="black", facecolor=color_scale)
Unfortunately, MetPy doesn't make it easy to extract the county names from USCOUNTIES. You can see where it is defined in your source code:
from metpy import plots
print(plots.__file__)
If you go inside the directory printed there is a file named cartopy_utils.py and inside the class definition for class MetPyMapFeature(Feature): you will see USCOUNTIES. You might have better luck than I did mapping county names to the geometric shapes.
EDIT: Also, I just used cm.rainbow as an example, you can choose from any color map https://matplotlib.org/stable/tutorials/colors/colormaps.html. Not sure if it even goes up to 3000, but you get the idea.
I have a dataframe that contains thousands of points with geolocation (longitude, latitude) for Washington D.C. The following is a snippet of it:
import pandas as pd
df = pd.DataFrame({'lat': [ 38.897221,38.888100,38.915390,38.895100,38.895100,38.901005,38.960491,38.996342,38.915310,38.936820], 'lng': [-77.031048,-76.898480,-77.021380,-77.036700,-77.036700 ,-76.990784,-76.862907,-77.028131,-77.010403,-77.184930]})
If you plot the points in the map you can see that some of them are clearly within some buildings:
import folium
wash_map = folium.Map(location=[38.8977, -77.0365], zoom_start=10)
for index,location_info in df.iterrows():
    folium.CircleMarker(
        location=[location_info["lat"], location_info["lng"]], radius=5,
        fill=True, fill_color='red',).add_to(wash_map)
wash_map.save('example_stack.html')
import webbrowser
import os
webbrowser.open('file://'+os.path.realpath('example_stack.html'), new=2)
My goal is to exclude all the points that are within buildings. For that, I first download bounding boxes for the city buildings and then try to exclude points within those polygons as follows:
import osmnx as ox
#brew install spatialindex this solves problems in mac
%matplotlib inline
ox.config(log_console=True)
ox.__version__
tags = {"building": True}
gdf = ox.geometries.geometries_from_point([38.8977, -77.0365], tags, dist=1000)
gdf.shape
For computational simplicity I have requested the shapes of all buildings within a radius of 1 km of the White House. In my own code I have tried bigger radii to make sure all the buildings are included.
In order to exclude points within the polygons I developed the following function (which also downloads the shapes):
def buildings(df,center_point,dist):
    import osmnx as ox
    #brew install spatialindex this solves problems in mac
    %matplotlib inline
    ox.config(log_console=True)
    ox.__version__
    tags = {"building": True}
    gdf = ox.geometries.geometries_from_point(center_point, tags,dist)
    from shapely.geometry import Point,Polygon
    # Next step is to put our coordinates in the correct shapely format: remember to run the map function before
    #df['within_building']=[]
    for point in range(len(df)):
        if gdf.geometry.contains(Point(df.lat[point],df.lng[point])).all()==False:
            df['within_building']=False
        else :
            df['within_building']=True
buildings(df,[38.8977, -77.0365],1000)
df['within_building'].all()==False
The function always returns that points are outside building shapes although you can clearly see in the map that some of them are within. I don't know how to plot the shapes over my map so I am not sure if my polygons are correct but for the coordinates they appear to be so. Any ideas?
The example points you provided don't seem to fall within those buildings' footprints. I don't know what your points' coordinate reference system is, so I guessed EPSG:4326. But to answer your question, here's how you would exclude them, resulting in gdf_points_not_in_bldgs:
import geopandas as gpd
import matplotlib.pyplot as plt
import osmnx as ox
import pandas as pd
# the coordinates you provided
df = pd.DataFrame({'lat': [38.897221,38.888100,38.915390,38.895100,38.895100,38.901005,38.960491,38.996342,38.915310,38.936820],
'lng': [-77.031048,-76.898480,-77.021380,-77.036700,-77.036700 ,-76.990784,-76.862907,-77.028131,-77.010403,-77.184930]})
# create GeoDataFrame of point geometries
geom = gpd.points_from_xy(df['lng'], df['lat'])
gdf_points = gpd.GeoDataFrame(geometry=geom, crs='epsg:4326')
# get building footprints
tags = {"building": True}
gdf_bldgs = ox.geometries_from_point([38.8977, -77.0365], tags, dist=1000)
gdf_bldgs.shape
# get all points that are not within a building footprint
mask = gdf_points.within(gdf_bldgs.unary_union)
gdf_points_not_in_bldgs = gdf_points[~mask]
print(gdf_points_not_in_bldgs.shape) # (10, 1)
# plot buildings and points
ax = gdf_bldgs.plot()
ax = gdf_points.plot(ax=ax, c='r')
plt.show()
# zoom in to see better
ax = gdf_bldgs.plot()
ax = gdf_points.plot(ax=ax, c='r')
ax.set_xlim(-77.04, -77.03)
ax.set_ylim(38.89, 38.90)
plt.show()
In geopandas I use this code to create a centroid column from the geometry column:
df["center"]=df.centroid
I want to force the calculated centroids to lie within their polygons.
I found something for this in R. Can I do it in Python?
Calculate Centroid WITHIN / INSIDE a SpatialPolygon
Representative points that always fall within their corresponding polygons can be obtained in geopandas with the representative_point() method. Here is demo code that creates and plots the polygons and their representative points.
import pandas as pd
import geopandas as gpd
from shapely.wkt import loads
# Create some test data
d = {'col1': [1,2],
'wkt': [
'POLYGON ((700000 5500000, 700000 5600000, 800000 5600000, 800000 5500000, 700000 5500000))',
"""POLYGON ((1441727.5096940901130438 6550163.0046194596216083,
1150685.2609429201111197 6669225.7427449300885201,
975398.4520359700545669 6603079.7771196700632572,
866257.6087542800232768 6401334.5819626096636057,
836491.9242229099618271 6106985.0349301798269153,
972091.1537546999752522 5835786.5758665995672345,
1547561.0546945100650191 5782869.8033663900569081,
1408654.5268814601004124 5600968.3978968998417258,
720736.4843787000281736 5663807.0652409195899963,
598366.4479719599476084 6001151.4899297598749399,
654590.5187534400029108 6341803.2128998702391982,
869564.9070355399744585 6784981.1825891500338912,
1451649.4045378800947219 6788288.4808704098686576,
1441727.5096940901130438 6550163.0046194596216083))"""
]
}
df = pd.DataFrame(data=d)
gdf = gpd.GeoDataFrame(df,
                       crs='EPSG:3857',
                       geometry=[loads(pgon) for pgon in df.wkt])
gdf4326 = gdf.to_crs(4326)  # coordinates in plain degrees
# create 2nd geometry column, for representative points
gdf4326["geometry2"] = gdf4326.representative_point()
# plot all layers of geometries
ax1 = gdf4326.plot(color="gray", alpha=0.5) # the polygons
gdf4326.set_geometry('geometry2').plot(zorder=10, color='red', ax=ax1) # the rep_points
The following code draws the cdf for datetime values:
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import numpy as np; np.random.seed(42)
import pandas as pd
objDate = dates.num2date(np.random.normal(735700, 300, 700))
ser = pd.Series(objDate)
ax = ser.hist(cumulative=True, density=1, bins=500, histtype='step')
plt.show()
How can I remove the vertical line at the right-most end of the graph? The approach mentioned here doesn't work, as replacing the ser.hist(...) line with:
ax = ser.hist(cumulative=True, density=1, bins=sorted(objDate)+[np.inf], histtype='step')
gives
TypeError: can't compare datetime.datetime to float
The CDF is actually drawn as a polygon, which in matplotlib is defined by a path. A path is in turn defined by vertices (where to go) and codes (how to get there). The docs say that we should not directly alter these attributes, but we can make a new polygon derived from the old one that suits our needs.
poly = ax.findobj(plt.Polygon)[0]
vertices = poly.get_path().vertices
# Keep everything above y == 0. You can define this mask however
# you need, if you want to be more careful in your selection.
keep = vertices[:, 1] > 0
# Construct new polygon from these "good" vertices
new_poly = plt.Polygon(vertices[keep], closed=False, fill=False,
edgecolor=poly.get_edgecolor(),
linewidth=poly.get_linewidth())
poly.set_visible(False)
ax.add_artist(new_poly)
plt.draw()
You should arrive at something like the figure below:
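A different route, which is not the polygon edit shown above, is to skip the histogram and compute the empirical CDF directly, so there is no closing segment to remove in the first place. The sketch below uses only the standard library. The ecdf helper and the sample dates are assumptions for illustration; sorting works on datetime objects as well as numbers.

```python
from datetime import datetime, timedelta


def ecdf(values):
    """Return sorted values and the fraction of samples <= each value."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


# Hypothetical sample dates, only to make the sketch runnable.
start = datetime(2015, 1, 1)
sample = [start + timedelta(days=d) for d in (40, 3, 17, 3, 25)]
xs, ys = ecdf(sample)
print(xs[0], ys[-1])
```

The two lists can then be drawn as a staircase, for example with plt.step(xs, ys, where='post'), which ends at 1 without dropping back to the axis.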