I have a .csv file that contains taxi trips; one column, trip_coordinates, is stored as a string. For example, one trip's coordinates look like this (stored as a string!):
[[40.7457407, -73.9781134], [40.7464087, -73.9797169], [40.7457353, -73.9801966], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7539937, -73.9741902], [40.753367, -73.974648], [40.754351, -73.9769749], [40.7547351, -73.9778672], [40.7554134, -73.9794895], [40.7547828, -73.9799429], [40.7451552, -73.9826672], [40.7457757, -73.9822189], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7552761, -73.9752669], [40.755903, -73.9748081], [40.756526, -73.974356], [40.7565994, -73.9745281], [40.7572359, -73.9760484], [40.7578582, -73.975593], [40.7584878, -73.9751336], [40.7591136, -73.9746825], [40.7597325, -73.974231], [40.7603711, -73.9737664], [40.7609986, -73.9733102]]
Using those coordinates I was able to create a LINESTRING and store it back into the original .csv file in a column called route_linestring by doing the following:
import pandas as pd
from shapely.geometry import Point, LineString

def convert_to_lineString(batch):
    batch_trips = pd.read_csv('batch.csv')
    for index, row in batch_trips.iterrows():
        if row['selected_distance'] != -100:
            temp = row['trip_route'].split(',')
            pnts_array = []
            for item in range(0, len(temp)):
                if item % 2 == 0:
                    # string manipulation to extract points
                    x = temp[item].replace('[', '')
                    y = temp[item + 1].replace(']', '')
                    pnt = Point(float(x), float(y))
                    pnts_array.append(pnt)
            line = LineString(pnts_array)
            print('line:', line)
            batch_trips.at[index, 'route_linestring'] = line
    batch_trips.to_csv('batch.csv')

convert_to_lineString(1)
The above array of coordinates now looks like this:
LINESTRING (40.7457407 -73.9781134, 40.7464087 -73.9797169, 40.7457353 -73.9801966, 40.7463887 -73.9817513, 40.7508351 -73.9785736, 40.7509627 -73.9785244, 40.7521935 -73.9776193, 40.7546355 -73.9757004, 40.7539937 -73.9741902, 40.753367 -73.974648, 40.754351 -73.9769749, 40.7547351 -73.9778672, 40.7554134 -73.9794895, 40.7547828 -73.9799429, 40.7451552 -73.9826672, 40.7457757 -73.9822189, 40.7463887 -73.9817513, 40.7508351 -73.9785736, 40.7509627 -73.9785244, 40.7521935 -73.9776193, 40.7546355 -73.9757004, 40.7552761 -73.9752669, 40.755903 -73.9748081, 40.756526 -73.974356, 40.7565994 -73.9745281, 40.7572359 -73.9760484, 40.7578582 -73.975593, 40.7584878 -73.9751336, 40.7591136 -73.9746825, 40.7597325 -73.974231, 40.7603711 -73.9737664, 40.7609986 -73.9733102)
How can I save the route_linestring column to a separate shapefile as well as a separate .osm file?
I would approach this problem by reading the csv as a geopandas.GeoDataFrame and using a combination of json.loads and shapely.LineString to convert the string coordinates to a geometry. Then you can use .to_file to save the GeoDataFrame as a shapefile. Finally, I would use ogr2osm to create the osm file from the newly created shapefile.
example.csv:
label,trip_route
feature 1,"[[40.7457407, -73.9781134], [40.7464087, -73.9797169], [40.7457353, -73.9801966], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7539937, -73.9741902], [40.753367, -73.974648], [40.754351, -73.9769749], [40.7547351, -73.9778672], [40.7554134, -73.9794895], [40.7547828, -73.9799429], [40.7451552, -73.9826672], [40.7457757, -73.9822189], [40.7463887, -73.9817513], [40.7508351, -73.9785736], [40.7509627, -73.9785244], [40.7521935, -73.9776193], [40.7546355, -73.9757004], [40.7552761, -73.9752669], [40.755903, -73.9748081], [40.756526, -73.974356], [40.7565994, -73.9745281], [40.7572359, -73.9760484], [40.7578582, -73.975593], [40.7584878, -73.9751336], [40.7591136, -73.9746825], [40.7597325, -73.974231], [40.7603711, -73.9737664], [40.7609986, -73.9733102]]"
code:
import json
import geopandas as gpd
import ogr2osm
from shapely import LineString
# Load csv as GeoDataFrame
df = gpd.read_file('example.csv')
# Convert coordinate string to geometry
df.geometry = df.trip_route.apply(lambda x: LineString(json.loads(x)))
# Export to shapefile
df.to_file('example.shp')
# Use ogr2osm to convert shapefile to osm file
translation_object = ogr2osm.TranslationBase()
datasource = ogr2osm.OgrDatasource(translation_object)
datasource.open_datasource('example.shp')
osmdata = ogr2osm.OsmData(translation_object)
osmdata.process(datasource)
datawriter = ogr2osm.OsmDataWriter('example.osm')
osmdata.output(datawriter)
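The core conversion can also be checked in isolation; here is a minimal sketch using just the first two coordinate pairs from the question (no geopandas or ogr2osm needed):

```python
import json
from shapely.geometry import LineString

# Two coordinate pairs taken from the question's trip_route string
raw = "[[40.7457407, -73.9781134], [40.7464087, -73.9797169]]"
coords = json.loads(raw)    # -> list of [lat, lon] pairs
line = LineString(coords)   # shapely builds the geometry directly from the pairs
print(line.wkt)             # LINESTRING (40.7457407 -73.9781134, 40.7464087 -73.9797169)
```

This replaces all of the manual bracket-stripping in the question with a single json.loads call, since the stored string is already valid JSON.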
I have a pandas.core.frame.DataFrame with many attributes. I would like to convert the DF to a GDF and export it as a GeoJSON. I have columns 'geometry.type' and 'geometry.coordinates', both pandas.core.series.Series. An example excerpt is below; note that geometry.coordinates contains a list:
geometry.type: MultiLineString
geometry.coordinates: [[[-74.07224, 40.64417], [-74.07012, 40.64506], [-74.06953, 40.64547], [-74.03249, 40.68565], [-74.01335, 40.69824], [-74.0128, 40.69866], [-74.01265, 40.69907], [-74.01296, 40.70048]], [[-74.01296, 40.70048], [-74.01265, 40.69907], [-74.0128, 40.69866], [-74.01335, 40.69824], [-74.03249, 40.68565], [-74.06953, 40.64547], [-74.07012, 40.64506], [-74.07224, 40.64417]]]
I would like to combine the two into a proper geometry column in order to export the data as a GeoJSON.
Taking your previous question as well:
pandas json_normalize() can be used to create a dataframe from the JSON source. This also expands out the nested dicts.
It's then a simple case of selecting out the columns you want as properties (renamed as well).
Finally, build the geometry from geometry.coordinates.
import urllib.request, json
import pandas as pd
import geopandas as gpd
import shapely.geometry
with urllib.request.urlopen(
"https://transit.land/api/v2/rest/routes.geojson?operator_onestop_id=o-9q8y-sfmta&api_key=LsyqCJs5aYI6uyxvUz1d0VQQLYoDYdh4&l&"
) as url:
data = json.loads(url.read())
df = pd.json_normalize(data["features"])
# use just attributes that were properties in input that is almost geojson
gdf = gpd.GeoDataFrame(
data=df.loc[:, [c for c in df.columns if c.startswith("properties.")]].pipe(
lambda d: d.rename(columns={c: ".".join(c.split(".")[1:]) for c in d.columns})
),
    # build geometry from the coordinates
    geometry=df["geometry.coordinates"].apply(shapely.geometry.MultiLineString),
    crs="epsg:4326",
)
gdf
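The key step, building the geometry from the nested coordinate lists, also works without the network call; here is a minimal sketch with one toy row shaped like the excerpt in the question:

```python
import pandas as pd
import geopandas as gpd
import shapely.geometry

# One hypothetical row mirroring the "geometry.coordinates" column
df = pd.DataFrame({
    "geometry.type": ["MultiLineString"],
    "geometry.coordinates": [[
        [[-74.07224, 40.64417], [-74.07012, 40.64506]],
        [[-74.01296, 40.70048], [-74.01265, 40.69907]],
    ]],
})

# shapely.geometry.MultiLineString accepts the nested list directly
gdf = gpd.GeoDataFrame(
    df.drop(columns=["geometry.type", "geometry.coordinates"]),
    geometry=df["geometry.coordinates"].apply(shapely.geometry.MultiLineString),
    crs="epsg:4326",
)
print(gdf.geom_type.iloc[0])  # MultiLineString
```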
I am having issues adding tooltips to my folium.features.GeoJson. I can't get columns to display from the dataframe when I select them.
feature = folium.features.GeoJson(df.geometry,
                                  name='Location',
                                  style_function=style_function,
                                  tooltip=folium.GeoJsonTooltip(fields=[df.acquired], aliases=["Time"], labels=True))
ax.add_child(feature)
For some reason when I run the code above it responds with
Name: acquired, Length: 100, dtype: object is not available in the data. Choose from: ().
I can't seem to link the data to my tooltip.
I have made your code an MWE by including some data.
There are two key issues with your code:
You need to pass properties, not just geometry, to folium.features.GeoJson(); hence df is passed instead of df.geometry.
folium.GeoJsonTooltip() takes a list of properties (columns), not an array of values; hence ["acquired"] is passed instead of an array of values from a dataframe column.
There is also an implied issue with your code: all dataframe columns need to contain values that can be serialised to JSON; hence the conversion of acquired to a string and the drop().
import geopandas as gpd
import pandas as pd
import shapely.wkt
import io
import folium
df = pd.read_csv(io.StringIO("""ref;lanes;highway;maxspeed;length;name;geometry
A3015;2;primary;40 mph;40.68;Rydon Lane;MULTILINESTRING ((-3.4851169 50.70864409999999, -3.4849879 50.7090007), (-3.4857269 50.70693379999999, -3.4853034 50.7081574), (-3.488620899999999 50.70365289999999, -3.4857269 50.70693379999999), (-3.4853034 50.7081574, -3.4851434 50.70856839999999), (-3.4851434 50.70856839999999, -3.4851169 50.70864409999999))
A379;3;primary;50 mph;177.963;Rydon Lane;MULTILINESTRING ((-3.4763853 50.70886769999999, -3.4786112 50.70811229999999), (-3.4746017 50.70944449999999, -3.4763853 50.70886769999999), (-3.470350900000001 50.71041779999999, -3.471219399999999 50.71028909999998), (-3.465049699999999 50.712158, -3.470350900000001 50.71041779999999), (-3.481215600000001 50.70762499999999, -3.4813909 50.70760109999999), (-3.4934747 50.70059599999998, -3.4930204 50.7007898), (-3.4930204 50.7007898, -3.4930048 50.7008015), (-3.4930048 50.7008015, -3.4919513 50.70168349999999), (-3.4919513 50.70168349999999, -3.49137 50.70213669999998), (-3.49137 50.70213669999998, -3.4911565 50.7023015), (-3.4911565 50.7023015, -3.4909108 50.70246919999999), (-3.4909108 50.70246919999999, -3.4902349 50.70291189999999), (-3.4902349 50.70291189999999, -3.4897693 50.70314579999999), (-3.4805021 50.7077218, -3.4806265 50.70770150000001), (-3.488620899999999 50.70365289999999, -3.4888806 50.70353719999999), (-3.4897693 50.70314579999999, -3.489176800000001 50.70340539999999), (-3.489176800000001 50.70340539999999, -3.4888806 50.70353719999999), (-3.4865751 50.70487679999999, -3.4882604 50.70375799999999), (-3.479841700000001 50.70784459999999, -3.4805021 50.7077218), (-3.4882604 50.70375799999999, -3.488620899999999 50.70365289999999), (-3.4806265 50.70770150000001, -3.481215600000001 50.70762499999999), (-3.4717096 50.71021009999998, -3.4746017 50.70944449999999), (-3.4786112 50.70811229999999, -3.479841700000001 50.70784459999999), (-3.471219399999999 50.71028909999998, -3.4717096 50.71021009999998))"""),
sep=";")
df = gpd.GeoDataFrame(df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326")
df["acquired"] = pd.date_range("8-feb-2022", freq="1H", periods=len(df))
def style_function(x):
    return {"color": "blue", "weight": 3}
ax = folium.Map(
location=[sum(df.total_bounds[[1, 3]]) / 2, sum(df.total_bounds[[0, 2]]) / 2],
zoom_start=12,
)
# datetime is not JSON serializable...
df["tt"] = df["acquired"].dt.strftime("%Y-%b-%d %H:%M")
feature = folium.features.GeoJson(df.drop(columns="acquired"),
                                  name='Location',
                                  style_function=style_function,
                                  tooltip=folium.GeoJsonTooltip(fields=["tt"], aliases=["Time"], labels=True))
ax.add_child(feature)
I am working with OpenStreetMap data that I download through Overpass as GeoJSON and load into a dataframe.
While I am able to filter my data based on tags and subtags like so:
gdf_b = gdf_b.loc[(gdf_b['highway'] != 'service')]
I couldn't figure out the exact command to remove specific rows of a geodataframe that have a particular geometry type (like a point)
So I am looking for something like:
gdf_b = gdf_b.loc[(gdf_b['geometry'].type != 'Point')]
You could use apply and a lambda:
gdf_b = gdf_b[gdf_b['geometry'].apply(lambda x: x.type != 'Point')]
This works too:
gdf_b = gdf_b[gdf_b.geom_type != 'Point']
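Both approaches can be verified on a toy GeoDataFrame (hypothetical data, mixing a Point with LineStrings):

```python
import geopandas as gpd
from shapely.geometry import Point, LineString

# Hypothetical mixed-geometry frame standing in for the Overpass download
gdf_b = gpd.GeoDataFrame(
    {"highway": ["bus_stop", "residential", "service"]},
    geometry=[
        Point(0, 0),
        LineString([(0, 0), (1, 1)]),
        LineString([(1, 1), (2, 2)]),
    ],
)

# Drop every row whose geometry is a Point
gdf_b = gdf_b[gdf_b.geom_type != "Point"]
print(list(gdf_b.geom_type))  # ['LineString', 'LineString']
```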
I have some very noisy (astronomy) data in csv format. Its shape is (815900, 2): 815k points giving the mass of a disk at each time. The fluctuations are pretty noticeable when you look at it close up. For example, here is a snippet of the data where the first column is time in seconds and the second is mass in kg:
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028
So it looks like there is a 1.53E+028 data point of noise, and also probably the 2.19E+028 and 2.35E+028 points.
To fix this, I am trying to write a Python script that will read in the csv data, then apply a restriction so that if the mass is e.g. < 2.35E+028, it will remove the whole row and then create a new csv file with only the "good" data points:
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41242600,2.40936E+028
Following the top answer by n8henrie to this old question, I so far have:
import pandas as pd
import csv

# Here are the locations of my csv file of my original data and an EMPTY csv file that will contain my good, noiseless set of data
originaldata = '/Users/myname/anaconda2/originaldata.csv'
gooddata = '/Users/myname/anaconda2/gooddata.csv'

# I use pandas to read in the original data because then I can separate the columns of time as 'T' and mass as 'M'
originaldata = pd.read_csv('originaldata.csv', delimiter=',', header=None, names=['t', 'm'])

# Numerical values of the mass values
M = originaldata['m'].values

# Now to put a restriction in
for row in M:
    new_row = []
    for column in row:
        if column > 2.35E+028:
            new_row.append(column)
    csv.writer(open(newfile, 'a')).writerow(new_row)

print('\n\n')
print('After:')
print(open(newfile).read())
However, when I run this, I get this error:
TypeError: 'numpy.float64' object is not iterable
I know the first column (time) is dtype int64 and the second column (mass) is dtype float64... but as a beginner, I'm still not quite sure what this error means or where I'm going wrong. Any help at all would be appreciated. Thank you very much in advance.
You can select rows by a boolean operation. Example:
import pandas as pd
from io import StringIO
data = StringIO('''\
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028
''')
df = pd.read_csv(data,names=['t','m'])
good = df[df.m > 2.35e+28]
out = StringIO()
good.to_csv(out,index=False,header=False)
print(out.getvalue())
Output:
40023700,2.40896e+28
40145700,2.44487e+28
40267700,2.44487e+28
40389700,2.44478e+28
40755400,2.44496e+28
40877200,2.44489e+28
40999000,2.44489e+28
41242600,2.40936e+28
M = originaldata['m'].values returns a single column, so when you do for row in M:, each row is a single scalar value; the inner for column in row: loop then tries to iterate over that scalar, which is what raises the TypeError.
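The distinction can be seen in a minimal sketch (toy values standing in for the 815k-row dataset):

```python
import pandas as pd

df = pd.DataFrame({"t": [40023700, 40511600, 41242600],
                   "m": [2.40896e28, 1.535e28, 2.40936e28]})

M = df["m"].values       # 1-D array: each element is one numpy.float64 scalar
try:
    for column in M[0]:  # a scalar cannot be iterated over
        pass
except TypeError as exc:
    print(exc)           # 'numpy.float64' object is not iterable

# The vectorised alternative keeps both columns and filters in one step,
# then writes the surviving rows straight to the new csv file
good = df[df["m"] > 2.35e28]
good.to_csv("gooddata.csv", index=False, header=False)
print(len(good))         # 2 rows survive the cut
```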