I have a pandas.core.frame.DataFrame with many attributes. I would like to convert the DF to a GDF and export it as GeoJSON. I have columns 'geometry.type' and 'geometry.coordinates' - both are pandas.core.series.Series. An example excerpt is below - note that geometry.coordinates contains a list
geometry.type: MultiLineString
geometry.coordinates: [[[-74.07224, 40.64417], [-74.07012, 40.64506], [-74.06953, 40.64547], [-74.03249, 40.68565], [-74.01335, 40.69824], [-74.0128, 40.69866], [-74.01265, 40.69907], [-74.01296, 40.70048]], [[-74.01296, 40.70048], [-74.01265, 40.69907], [-74.0128, 40.69866], [-74.01335, 40.69824], [-74.03249, 40.68565], [-74.06953, 40.64547], [-74.07012, 40.64506], [-74.07224, 40.64417]]]
I would like to combine the two into a proper geometry column in order to export the data as GeoJSON
Building on your previous question as well:
- pandas json_normalize() can be used to create a dataframe from the JSON source; this also expands out the nested dicts
- it's then a simple case of selecting the columns you want as properties (renamed here as well)
- geometry is built from geometry.coordinates
import urllib.request, json

import pandas as pd
import geopandas as gpd
import shapely.geometry

with urllib.request.urlopen(
    "https://transit.land/api/v2/rest/routes.geojson?operator_onestop_id=o-9q8y-sfmta&api_key=LsyqCJs5aYI6uyxvUz1d0VQQLYoDYdh4&l&"
) as url:
    data = json.loads(url.read())

df = pd.json_normalize(data["features"])

# use just the attributes that were properties in the input (which is almost GeoJSON)
gdf = gpd.GeoDataFrame(
    data=df.loc[:, [c for c in df.columns if c.startswith("properties.")]].pipe(
        lambda d: d.rename(columns={c: ".".join(c.split(".")[1:]) for c in d.columns})
    ),
    # build geometry from the coordinates
    geometry=df["geometry.coordinates"].apply(shapely.geometry.MultiLineString),
    crs="epsg:4326",
)
gdf
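To finish the export step the question asks about, GeoDataFrame.to_file() with the GeoJSON driver writes a file, while to_json() returns the GeoJSON string directly. A minimal self-contained sketch (the one-row frame and its geometry here are made up for illustration, standing in for the gdf built above):

```python
import json

import geopandas as gpd
import shapely.geometry

# hypothetical one-row GeoDataFrame standing in for the gdf built above
gdf = gpd.GeoDataFrame(
    {"name": ["demo route"]},
    geometry=[shapely.geometry.MultiLineString(
        [[(-74.07224, 40.64417), (-74.01296, 40.70048)]]
    )],
    crs="epsg:4326",
)

# to_json() serialises the frame as a GeoJSON FeatureCollection string;
# gdf.to_file("routes.geojson", driver="GeoJSON") would write it to disk instead
geojson = json.loads(gdf.to_json())
```

Non-geometry columns become the feature `properties`, so the rename-to-properties step above carries straight through to the exported file.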
Related
I have a dataframe in which the geometry column is MultiLineString. I need to convert it to LineString to use it in networkx. I used the following code from this question, which works with shapefiles. But I do not want to create a shapefile from my layer and then convert it; I would like to do it directly in my dataframe. I cannot get it to work.
import fiona
from shapely.geometry import shape, mapping

with fiona.open('hin_centreline.shp') as source:
    with fiona.open('line.shp', 'w', driver='ESRI Shapefile',
                    crs=source.crs, schema=source.schema) as output:
        for elem in source:
            reconstruct = shape(elem['geometry'])
            if elem['geometry']['type'] == 'MultiLineString':
                for line in reconstruct.geoms:
                    output.write({'geometry': mapping(line), 'properties': elem['properties']})
            elif elem['geometry']['type'] == 'LineString':
                output.write({'geometry': mapping(reconstruct), 'properties': elem['properties']})
I am having issues adding tooltips to my folium.features.GeoJson. I can't get columns to display from the dataframe when I select them.
feature = folium.features.GeoJson(
    df.geometry,
    name='Location',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=[df.acquired], aliases=["Time"], labels=True),
)
ax.add_child(feature)
For some reason when I run the code above it responds with
Name: acquired, Length: 100, dtype: object is not available in the data. Choose from: ().
I can't seem to link the data to my tooltip.
I have made your code an MWE by including some data. There are two key issues with your code:
- you need to pass properties, not just geometry, to folium.features.GeoJson(); hence df is passed instead of df.geometry
- folium.GeoJsonTooltip() takes a list of property (column) names, not an array of values; hence ["acquired"] is passed instead of an array of values from a dataframe column
There is also an implied issue: all dataframe columns need to contain values that can be serialised to JSON. Hence the conversion of acquired to string and the drop().
import geopandas as gpd
import pandas as pd
import shapely.wkt
import io
import folium
df = pd.read_csv(io.StringIO("""ref;lanes;highway;maxspeed;length;name;geometry
A3015;2;primary;40 mph;40.68;Rydon Lane;MULTILINESTRING ((-3.4851169 50.70864409999999, -3.4849879 50.7090007), (-3.4857269 50.70693379999999, -3.4853034 50.7081574), (-3.488620899999999 50.70365289999999, -3.4857269 50.70693379999999), (-3.4853034 50.7081574, -3.4851434 50.70856839999999), (-3.4851434 50.70856839999999, -3.4851169 50.70864409999999))
A379;3;primary;50 mph;177.963;Rydon Lane;MULTILINESTRING ((-3.4763853 50.70886769999999, -3.4786112 50.70811229999999), (-3.4746017 50.70944449999999, -3.4763853 50.70886769999999), (-3.470350900000001 50.71041779999999, -3.471219399999999 50.71028909999998), (-3.465049699999999 50.712158, -3.470350900000001 50.71041779999999), (-3.481215600000001 50.70762499999999, -3.4813909 50.70760109999999), (-3.4934747 50.70059599999998, -3.4930204 50.7007898), (-3.4930204 50.7007898, -3.4930048 50.7008015), (-3.4930048 50.7008015, -3.4919513 50.70168349999999), (-3.4919513 50.70168349999999, -3.49137 50.70213669999998), (-3.49137 50.70213669999998, -3.4911565 50.7023015), (-3.4911565 50.7023015, -3.4909108 50.70246919999999), (-3.4909108 50.70246919999999, -3.4902349 50.70291189999999), (-3.4902349 50.70291189999999, -3.4897693 50.70314579999999), (-3.4805021 50.7077218, -3.4806265 50.70770150000001), (-3.488620899999999 50.70365289999999, -3.4888806 50.70353719999999), (-3.4897693 50.70314579999999, -3.489176800000001 50.70340539999999), (-3.489176800000001 50.70340539999999, -3.4888806 50.70353719999999), (-3.4865751 50.70487679999999, -3.4882604 50.70375799999999), (-3.479841700000001 50.70784459999999, -3.4805021 50.7077218), (-3.4882604 50.70375799999999, -3.488620899999999 50.70365289999999), (-3.4806265 50.70770150000001, -3.481215600000001 50.70762499999999), (-3.4717096 50.71021009999998, -3.4746017 50.70944449999999), (-3.4786112 50.70811229999999, -3.479841700000001 50.70784459999999), (-3.471219399999999 50.71028909999998, -3.4717096 50.71021009999998))"""),
    sep=";")
df = gpd.GeoDataFrame(df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326")
df["acquired"] = pd.date_range("8-feb-2022", freq="1H", periods=len(df))
def style_function(x):
    return {"color": "blue", "weight": 3}
ax = folium.Map(
    location=[sum(df.total_bounds[[1, 3]]) / 2, sum(df.total_bounds[[0, 2]]) / 2],
    zoom_start=12,
)
# datetime is not JSON serializable...
df["tt"] = df["acquired"].dt.strftime("%Y-%b-%d %H:%M")
feature = folium.features.GeoJson(
    df.drop(columns="acquired"),
    name='Location',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=["tt"], aliases=["Time"], labels=True),
)
ax.add_child(feature)
I need to calculate a corrected moment (Mz,correct), given by the sum of a moment (Mz) and a force (Fx) multiplied by its arm (300.56), because I need to change the reference system and move everything onto the new one. This is the script I tried to write; Fx and Mz come from the same source file (.dat):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#-----------------------input file-------------------------------
filename = 'drag_time_series'  # source file name
# specify the path to the DAT file
df = pd.read_csv(rf"C:\Users\suemack528\Desktop\OneDrive - Università degli Studi di Padova\deme\unipd\magistrale\TESI\impalcato\drag\(unknown).dat",
                 header=1, delim_whitespace=True)
df = df.round(decimals=3)
#-----for "cycle" to obtain correct moment-----
for i in range(len('Mz')):
    Mz,correct[i]=df('Mz') + df('Fx')*300.56
I think that is not correct. How can I write this script better? I'm using Spyder.
error obtained
To get a column from a pandas dataframe you can use [] (instead of ()). For more information regarding the use of indexes and selecting data from a dataframe you can check the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#basics
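Building on that, the loop isn't needed at all: pandas arithmetic works on whole columns, and columns are selected with square brackets. A minimal sketch with made-up data standing in for the .dat file (the column names Fx and Mz come from the question; the values are invented):

```python
import io

import pandas as pd

# made-up whitespace-delimited data standing in for the .dat file
raw = io.StringIO("Fx Mz\n1.0 10.0\n2.0 20.0\n")
df = pd.read_csv(raw, sep=r"\s+")

# select columns with [], not (), and let pandas do the arithmetic column-wise
df["Mz_correct"] = df["Mz"] + df["Fx"] * 300.56
```

Storing the result as a new column (rather than a separate list) keeps it aligned with the rest of the dataframe for rounding, plotting, or export.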
I pull data from the census API using the census wrapper, and I would like to filter that data with a list of ZIP codes I compiled.
So I am trying to filter the data pulled from the census. I have a CSV file of the ZIPs I want to use and have already put it into a list. I have tried a few things, such as putting the census data in a dataframe and trying to filter the Zipcode column by my list, but I don't think my syntax is correct.
This is just the test data I pulled,
census_data = c.acs5.get(('NAME', 'B25034_010E'),
                         {'for': 'zip code tabulation area:*'})
census_pd = census_pd.rename(columns={"NAME": "Name", "zip code tabulation area": "Zipcode"})
censusfilter = census_pd['Zipcode'==ziplst]
So I tried this way, and I also tried a for loop where I take census_pd['Zipcode'] and an inner for loop to iterate over the list, with an if statement like zip1 == zip2 appending to a list.
My dependencies:
# Dependencies
import pandas as pd
import requests
import json
import pprint
import numpy as np
import matplotlib.pyplot as plt
import requests
from census import Census
import gmaps
from us import states
# Census & gmaps API Keys
from config import (api_key, gkey)
c = Census(api_key, year=2013)
# Configure gmaps
gmaps.configure(api_key=gkey)
As mentioned, I want to filter whatever data I pull from the census down to the specific ZIP codes I use.
It's not clear what your data looks like. I am guessing that you have a scalar column and you want to filter that column using a list. If that is the question, then you can use the built-in isin method to filter the dataframe.
import pandas as pd
data = {'col': [2, 3, 4], 'col2': [1, 2, 3], 'col3': ["asd", "ads", "asdf"]}
df = pd.DataFrame.from_dict(data)
random_list = ["asd", "ads"]
df_filtered = df[df["col3"].isin(random_list)]
The sample data isn't very clear, so below is how to filter a dataframe on a column using a list of values to filter by
import pandas as pd
from io import StringIO
# Example data
df = pd.read_csv(StringIO(
'''zip,some_column
"01234",A1
"01234",A2
"01235",A3
"01236",B1
'''), dtype = {"zip": str})
zips_list = ["01234", "01235"]
# using a join
zips_df = pd.DataFrame({"zip": zips_list})
df1 = df.merge(zips_df, how='inner', on='zip')
print(df1)
# using query
df2 = df.query('zip in @zips_list')
print(df2)
# using an index
df.set_index("zip", inplace=True)
df3=df.loc[zips_list]
print(df3)
Output in all cases:
zip some_column
0 01234 A1
1 01234 A2
2 01235 A3
I have a dataframe in rpy2 in python and I want to pull out columns from it. What is the rpy2 equivalent of this R code?
df[,c("colA", "colC")]
This works to get the first column:
mydf.rx(1)
but how can I pull a set of columns, e.g. the 1st, 3rd and 5th?
mydf.rx([1,3,5])
does not work. Neither does:
mydf.rx(rpy2.robjects.r.c([1,3,5]))
Alternatively, you can pass the R data frame into a Python pandas data frame and subset your resulting 1, 3, 5 columns:
#!/usr/bin/python
import rpy2
import rpy2.robjects as ro
import pandas as pd
import pandas.rpy.common as com
# SOURCE R SCRIPT INSIDE PYTHON
ro.r.source('C:\\Path\To\R script.R')
# DEFINE PYTHON DF AS R DF
pydf = com.load_data('rdf')
cols = pydf[[1,3,5]]
I think the answer is:
# cols to select
c = rpy2.robjects.IntVector((1,3))
# selection from df
mydf.rx(True, c)
The best possible way that I found is by doing this simple thing:
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
import rpy2.robjects as robjects
dataframe = robjects.r('data.frame')
df_rpy2 = dataframe([1,2,],[5,6])
df_pd = pd.DataFrame({'A': [1,2], 'B': [5,6]})
base = importr('base') #Creates an instance of R's base package
pandas2ri.activate() #Converts any pandas dataframe to R equivalent
base.colnames(df_pd) #Finds the column names of the dataframe df_pd
base.colnames(df_rpy2) #Finds the column names of the dataframe df_rpy2
The output is:
R object with classes: ('character',) mapped to:
<StrVector - Python:0x7fa3504d3048 / R:0x10f65ac0>
['X1L', 'X2L', 'X5L', 'X6L']
R object with classes: ('character',) mapped to:
<StrVector - Python:0x7fa352493548 / R:0x103b6e40>
['A', 'B']
This works for both the dataframes created using pandas & rpy2. Hope this helps!