Converting geographic coordinates from GEOSTAT to lat and lng - python

I've found an interesting datasource of European Population which I think could help me in achieving such a map:
The source document GEOSTAT_grid_POP_1K_2011_V2_0_1.csv looks like this:
| TOT_P | GRD_ID | CNTR_CODE | METHD_CL | YEAR | DATA_SRC | TOT_P_CON_DT |
|-------|---------------|-----------|----------|------|----------|--------------|
| 8 | 1kmN2689E4337 | DE | A | 2011 | DE | other |
| 7 | 1kmN2689E4341 | DE | A | 2011 | DE | other |
Geographic coordinates appear to be encoded in the GRD_ID column, as this document, Appendix1_WP1C_production-procedures-bottom-up.pdf, indicates:
Grid cell identification codes are based on grid cell’s lower left-hand corner coordinates truncated by grid
cell size (e.g. 1kmN4534E5066 is result from coordinates Y=4534672, X=5066332 and the cell size 1000)
I thought I could get lat and long by parsing the strings. For example in Python:
import re
string = "1kmN2691E4341"
lat = float(re.sub('.*N([0-9]+)[EW].*', '\\1', string))/100
lng = float(re.sub('.*[EW]([0-9]+)', '\\1', string))/100
print(lat, ",", lng)
Output: 26.91 , 43.41
but that makes no sense: it does not correspond to a location in Europe!
It may be that it refers to a geographic coordinate system I'm not aware of.

Thanks to Viktor's comment, I found out that the coordinate system used in my file was EPSG:3035
Based on pyproj, Python's interface to PROJ.4, I could achieve a convincing result with the following code:
#! /usr/bin/python
# coding: utf-8
import re
from pyproj import Proj, transform

string = "1kmN2326E3989"
# grid IDs encode the cell's lower-left corner in km; multiply by 1000 for metres
x1 = int(re.sub('.*[EW]([0-9]+)', '\\1', string)) * 1000
y1 = int(re.sub('.*N([0-9]+)[EW].*', '\\1', string)) * 1000
inProj = Proj(init='epsg:3035')   # ETRS89-extended / LAEA Europe
outProj = Proj(init='epsg:4326')  # WGS84
lng, lat = transform(inProj, outProj, x1, y1)
print(lat, lng)
Output : 43.9613760836 5.870517281
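The parsing step can also be isolated in a small helper so it is easy to test against the examples from the documentation (a sketch; the function name `parse_grd_id` is mine, not from any library):

```python
import re

def parse_grd_id(grd_id):
    """Return the (x, y) EPSG:3035 metre coordinates of the lower-left
    corner of a 1 km GEOSTAT cell such as '1kmN2326E3989'."""
    m = re.fullmatch(r'1kmN(\d+)E(\d+)', grd_id)
    if m is None:
        raise ValueError("unrecognised grid ID: %r" % grd_id)
    north, east = (int(g) * 1000 for g in m.groups())
    return east, north

print(parse_grd_id("1kmN2326E3989"))  # (3989000, 2326000)
```

With pyproj 2+, the resulting metre coordinates can be converted via `Transformer.from_crs("EPSG:3035", "EPSG:4326", always_xy=True).transform(x, y)`, which replaces the deprecated `Proj(init=...)`/`transform` API used above.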

Related

How to lookup data from one CSV in another CSV?

In the crq_data file I have cities and states from a user-uploaded *.csv file.
In the cityCoordinates.csv file I have a library of American cities and states along with their coordinates; I would like to use this as a sort of "lookup tool" to find coordinates for the uploaded .csv file and map them in Folium.
Right now it reads line by line, appending the coordinates one at a time, so the wait grows with the file. I would like it to run much faster so that if there are 6000 lines the user doesn't have to wait 6000 seconds.
Here is part of my code:
crq_file = askopenfilename(filetypes=[('CSV Files', '*.csv')])
crq_data = pd.read_csv(crq_file, encoding="utf8")
coords = pd.read_csv("cityCoordinates.csv")

for crq in range(len(crq_data)):
    task_city = crq_data.iloc[crq]["TaskCity"]
    task_state = crq_data.iloc[crq]["TaskState"]
    for coordinates in range(len(coords)):
        cityCoord = coords.iloc[coordinates]["City"]
        stateCoord = coords.iloc[coordinates]["State"]
        latCoord = coords.iloc[coordinates]["Latitude"]
        lngCoord = coords.iloc[coordinates]["Longitude"]
        if task_city == cityCoord and task_state == stateCoord:
            crq_data["CRQ Latitude"] = latCoord
            crq_data["CRQ Longitude"] = lngCoord
            print(cityCoord, stateCoord, latCoord, lngCoord)
This is an example of the current Terminal Output
Example of uploaded .csv file
I see this not as a problem of optimizing Pandas, but of finding a good data structure for fast lookups, and a good data structure for fast lookups is the dict. The dict takes memory, though; you'll need to evaluate that cost for yourself.
I mocked up what your cityCoordinates CSV could look like:
| City | State | Latitude | Longitude |
|----------|-------|------------|-------------|
| Portland | OR | 45°31′12″N | 122°40′55″W |
| Dallas | TX | 32°46′45″N | 96°48′32″W |
| Portland | ME | 43°39′36″N | 70°15′18″W |
import csv
import pprint

def cs_key(city_name: str, state_name: str) -> str:
    """Make a normalized City-State key."""
    return city_name.strip().lower() + "--" + state_name.strip().lower()

# A dict of { "city_name--state_name": (latitude, longitude), ... }
coords_lookup = {}
with open("cityCoordinates.csv", newline="") as f:
    reader = csv.DictReader(f)  # your coords file appears to have a header
    for row in reader:
        city = row["City"]
        state = row["State"]
        lat = row["Latitude"]
        lon = row["Longitude"]
        key = cs_key(city, state)
        coords_lookup[key] = (lat, lon)

pprint.pprint(coords_lookup, sort_dicts=False)
When I run that, I get:
{'portland--or': ('45°31′12″N', '122°40′55″W'),
 'dallas--tx': ('32°46′45″N', '96°48′32″W'),
 'portland--me': ('43°39′36″N', '70°15′18″W')}
Now, iterating the task data looks pretty much the same: we take a pair of City and State, make a normalized key out of them, then try to look up that key for known coordinates.
I mocked up some task data:
| TaskCity | TaskState |
|------------|-----------|
| Portland | OR |
| Fort Worth | TX |
| Dallas | TX |
| Boston | MA |
| Portland | ME |
and when I run this:
with open("crq_data.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        city = row["TaskCity"]
        state = row["TaskState"]
        key = cs_key(city, state)
        coords = coords_lookup.get(key, (None, None))
        if coords != (None, None):
            print(city, state, coords[0], coords[1])
I get:
Portland OR 45°31′12″N 122°40′55″W
Dallas TX 32°46′45″N 96°48′32″W
Portland ME 43°39′36″N 70°15′18″W
This solution is going to be much faster in principle because you're not running a quadratic cityCoordinates-rows x taskData-rows loop. In practice, Pandas also suffers when doing row iteration^1; I'm not sure whether the same holds for indexing (iloc), but in general Pandas is for manipulating columns of data, and is not well suited to row-oriented problems.
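If you do want to stay in Pandas, the same lookup can be expressed as a single vectorized merge instead of nested loops. A sketch using the mocked-up rows from above (column names follow the question's files):

```python
import pandas as pd

# Mocked-up coordinates library and task data from the question
coords = pd.DataFrame({
    "City": ["Portland", "Dallas", "Portland"],
    "State": ["OR", "TX", "ME"],
    "Latitude": ["45°31′12″N", "32°46′45″N", "43°39′36″N"],
    "Longitude": ["122°40′55″W", "96°48′32″W", "70°15′18″W"],
})
crq_data = pd.DataFrame({
    "TaskCity": ["Portland", "Fort Worth", "Dallas"],
    "TaskState": ["OR", "TX", "TX"],
})

# One hash-join over both tables; unmatched rows get NaN coordinates
merged = crq_data.merge(
    coords, how="left",
    left_on=["TaskCity", "TaskState"],
    right_on=["City", "State"],
).drop(columns=["City", "State"])
print(merged)
```

This keeps every uploaded row and attaches coordinates where city and state match, without any per-row Python loop.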

CSV file to an array to a table? (Python 3.10.4)

I'm new to Python and doing some project based learning.
I have a CSV file that I've put into an array but I'd like present it in PrettyTable
Here's what I have so far:
import csv
import numpy as np
with open('destiny.csv', 'r') as f:
    data = list(csv.reader(f, delimiter=";"))
data = np.array(data)
Output is this:
['Loud Lullaby,Aggressive,Moon,Kinetic,120,Legendary,hand_cannon']
['Pribina-D,Aggressive,Gunsmith,Kinetic,120,Legendary,hand_cannon']
['True Prophecy,Aggressive,World,Kinetic,120,Legendary,hand_cannon']
['Igneous Hammer,Aggressive,Trials,Solar,120,Legendary,hand_cannon']
But I'd like to get it into this:
from prettytable import PrettyTable
myTable = PrettyTable(['Gun Name', 'Archetype', 'Source', 'Element', 'Rounds Per Minute', 'Rarity', 'Weapon Type'])
myTable.add_row(['Loud Lullaby', 'Aggressive', 'Moon', 'Kinetic', '120', 'Legendary', 'Hand Cannon'])
myTable.add_row(["Pribina-D", "Aggressive", "Gunsmith", "Kinetic", "120", "Legendary", "Hand Cannon"])
myTable.add_row(["True Prophecy", "Aggressive", "World", "Kinetic", "120", "Legendary", "Hand Cannon"])
myTable.add_row(["Igneous Hammer", "Aggressive", "Trials", "Solar", "120", "Legendary", "Hand Cannon"])
So it can look like this:
Gun Name | Archetype | Source | Element | Rounds Per Minute | Rarity | Weapon Type |
+---------------------------------+--------------+---------------+---------+-------------------+-----------+-------------+
| Loud Lullaby | Aggressive | Moon | Kinetic | 120 | Legendary | Hand Cannon |
| Pribina-D | Aggressive | Gunsmith | Kinetic | 120 | Legendary | Hand Cannon |
| True Prophecy | Aggressive | World | Kinetic | 120 | Legendary | Hand Cannon |
| Igneous Hammer | Aggressive | Trials | Solar | 120 | Legendary | Hand Cannon |
Thoughts on the best way to get the data set incorporated into the table without having to copy and paste every line into myTable.add_row? Because there are hundreds of lines...
[Credit to vishwasrao99 at Kaggle for this CSV file]
I just combined your two pieces of script:
import csv
import numpy as np
from prettytable import PrettyTable
with open('destiny.csv', 'r') as f:
    data = list(csv.reader(f, delimiter=";"))
data = np.array(data)

columns = ['Gun Name', 'Archetype', 'Source', 'Element', 'Rounds Per Minute', 'Rarity', 'Weapon Type']
myTable = PrettyTable(columns)
for row in data:
    fields = row[0].split(",")  # avoid shadowing the built-in name `list`
    myTable.add_row(fields)
print(myTable)
Note that I used split(",") to split the strings in your numpy array at every comma, producing lists identical to what you were feeding in manually in your example.
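Incidentally, the manual split is only needed because the file was read with delimiter=";". Since the fields are actually comma-separated, reading with csv's default comma delimiter yields the row lists directly, and the numpy detour can be skipped entirely. A sketch using an in-memory sample in place of destiny.csv:

```python
import csv
import io

# Stand-in for open('destiny.csv'); the real file is comma-separated
sample = io.StringIO(
    "Loud Lullaby,Aggressive,Moon,Kinetic,120,Legendary,hand_cannon\n"
    "Pribina-D,Aggressive,Gunsmith,Kinetic,120,Legendary,hand_cannon\n"
)
rows = list(csv.reader(sample))  # default delimiter is ','
print(rows[0])  # ['Loud Lullaby', 'Aggressive', 'Moon', 'Kinetic', '120', 'Legendary', 'hand_cannon']
```

Each `rows` element can then be passed straight to myTable.add_row.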

GeoPandas Convert Geometry Column To Geometry Type

I currently have a geopandas dataframe that looks like this
| id | name  | ... | geometry                                    |
|----|-------|-----|---------------------------------------------|
| 1  | poly1 | ... | 0101000020E6100000A6D52A40F1E16690764A7D... |
| 2  | poly2 | ... | 0101000020E610000065H7D2A459A295J0A67AD2... |
And when getting ready to write it to postgis, I am getting the following error:
/python3.7/site-packages/geopandas/geodataframe.py:1321: UserWarning: Geometry column does not contain geometry.
warnings.warn("Geometry column does not contain geometry.")
Is there a way to convert this geometry column to a geometry type, so that errors can be avoided when appending to the existing table with a geometry-typed column? I've tried:
df['geometry'] = gpd.GeoSeries.to_wkt(df['geometry'])
but there are errors parsing the existing geometry column. Is there a correct way I am missing?
The syntax needs to be changed as below (note that `re` must be imported):
df['geometry'] = df.geometry.apply(lambda x: x.wkt).apply(lambda x: re.sub('"(.*)"', '\\1', x))

Pandas not displaying all columns when writing to CSV

I am attempting to export a dataset that looks like this:
+----------------+--------------+--------------+--------------+
| Province_State | Admin2 | 03/28/2020 | 03/29/2020 |
+----------------+--------------+--------------+--------------+
| South Dakota | Aurora | 1 | 2 |
| South Dakota | Beedle | 1 | 3 |
+----------------+--------------+--------------+--------------+
However the actual CSV file i am getting is like so:
+-----------------+--------------+--------------+
| Province_State | 03/28/2020 | 03/29/2020 |
+-----------------+--------------+--------------+
| South Dakota | 1 | 2 |
| South Dakota | 1 | 3 |
+-----------------+--------------+--------------+
Using this code (runnable by calling createCSV(); it pulls the data from the COVID government GitHub repository):
import csv           # csv reader
import pandas as pd  # csv parser
import requests      # retrieves the CSV from the gov data URL

def getFile():
    url = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
           'csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv')
    response = requests.get(url)
    print('Writing file...')
    open('us_deaths.csv', 'wb').write(response.content)

# takes raw data from the link, creates a CSV for each unique state and removes unneeded headings
def createCSV():
    getFile()
    # init data
    data = pd.read_csv('us_deaths.csv', delimiter=',')
    # drop extra columns
    data.drop(['UID'], axis=1, inplace=True)
    data.drop(['iso2'], axis=1, inplace=True)
    data.drop(['iso3'], axis=1, inplace=True)
    data.drop(['code3'], axis=1, inplace=True)
    data.drop(['FIPS'], axis=1, inplace=True)
    #data.drop(['Admin2'], axis=1, inplace=True)
    data.drop(['Country_Region'], axis=1, inplace=True)
    data.drop(['Lat'], axis=1, inplace=True)
    data.drop(['Long_'], axis=1, inplace=True)
    data.drop(['Combined_Key'], axis=1, inplace=True)
    #data.drop(['Province_State'], axis=1, inplace=True)
    data.to_csv('DEBUGDATA2.csv')

    # sets Province_State as primary key; searches by date and key to create
    # new CSVs in the root directory of the python app
    data = data.set_index('Province_State')
    data = data.iloc[:, 2:].rename(columns=pd.to_datetime, errors='ignore')
    for name, g in data.groupby(level='Province_State'):
        g[pd.date_range('03/23/2020', '03/29/20')] \
            .to_csv('{0}_confirmed_deaths.csv'.format(name))
The reason for the rename step is to convert the date columns (everything after the first two) to datetimes, so that I can select only 03/23/2020 and beyond. If anyone has a better method of doing this, I would love to know.
To verify that it works, it prints out all the field names, including Admin2 (county name), Province_State, and the rest of the dates.
However, as you can see in my CSV, Admin2 seems to have disappeared. I am not sure how to make this work; if anyone has any ideas, that'd be great!
Changed
data = data.set_index('Province_State')
to
data = data.set_index(['Province_State', 'Admin2'])
Needed to create a multi-key index to allow the Admin2 column to show. Any smoother tips on the date-range section are welcome.
Thanks for the help all!
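As for the date-range section: instead of renaming columns with rename(columns=pd.to_datetime, errors='ignore') and indexing with pd.date_range, the date columns can be selected by parsing the header labels directly. A sketch with mocked-up data (the column names mirror the real file's MM/DD/YYYY headers):

```python
import pandas as pd

# Mocked-up slice of the dataset: two id columns followed by date columns
data = pd.DataFrame({
    "Province_State": ["South Dakota", "South Dakota"],
    "Admin2": ["Aurora", "Beedle"],
    "03/22/2020": [0, 1],
    "03/28/2020": [1, 1],
    "03/29/2020": [2, 3],
})

id_cols = ["Province_State", "Admin2"]
cutoff = pd.Timestamp("03/23/2020")
# keep only date columns on or after the cutoff
date_cols = [c for c in data.columns
             if c not in id_cols and pd.to_datetime(c) >= cutoff]
subset = data[id_cols + date_cols]
print(subset.columns.tolist())
```

This keeps the original string column names, so no rename is needed before writing the per-state CSVs.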

Pairing two Pandas data frames with an ID value

I am trying to put together a usable set of data about glaciers. Our original data comes from an ArcGIS dataset, and latitude/longitude values were stored in a separate file, now detached from the CSV with all of our data. I am attempting to merge the latitude/longitude files with our data set. Here's a preview of what the files look like.
This is my main dataset file, glims (columns dropped for clarity)
| ANLYS_ID | GLAC_ID | AREA |
|----------|----------------|-------|
| 101215 | G286929E46788S | 2.401 |
| 101146 | G286929E46788S | 1.318 |
| 101162 | G286929E46788S | 0.061 |
This is the latitude-longitude file, coordinates
| lat | long | glacier_id |
|-------|---------|----------------|
| 1.187 | -70.166 | G001187E70166S |
| 2.050 | -70.629 | G002050E70629S |
| 3.299 | -54.407 | G002939E70509S |
The problem is, the coordinates data frame has one row for each glacier id with latitude longitude, whereas my glims data frame has multiple rows for each glacier id with varying data for each entry.
I need every single entry in my main data file to have a latitude-longitude value added to it, based on the matching glacier_id between the two data frames.
Here's what I've tried so far.
glims = pd.read_csv('glims_clean.csv')
coordinates = pd.read_csv('LatLong_GLIMS.csv')
df['que'] = np.where((coordinates['glacier_id'] == glims['GLAC_ID']))
error returns: 'int' object is not subscriptable
and:
glims.merge(coordinates, how='right', on=('glacier_id', 'GLAC_ID'))
error returns: 'int' object has no attribute 'merge'
I have no idea how to tackle this big of a merge. I am also afraid of making mistakes because it is nearly impossible to catch them, since the data carries no other identifying factors.
Any guidance would be awesome, thank you.
This should work
glims = glims.merge(coordinates, how='left', left_on='GLAC_ID', right_on='glacier_id')
This is a classic merging problem. One way to solve it is using straight loc and index-matching:
glims = glims.set_index('GLAC_ID')
glims.loc[:, 'lat'] = coord.set_index('glacier_id').lat
glims.loc[:, 'long'] = coord.set_index('glacier_id').long
glims = glims.reset_index()
You can also use pd.merge
pd.merge(glims,
         coord.rename(columns={'glacier_id': 'GLAC_ID'}),
         on='GLAC_ID')
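Put together with the sample rows from the question, the left merge behaves as a many-to-one join: each glacier's single coordinate row is repeated across all of its analysis rows. A runnable sketch (the second coordinates row is invented here so the IDs match; the real values come from your LatLong_GLIMS.csv):

```python
import pandas as pd

glims = pd.DataFrame({
    "ANLYS_ID": [101215, 101146, 101162],
    "GLAC_ID": ["G286929E46788S"] * 3,
    "AREA": [2.401, 1.318, 0.061],
})
coordinates = pd.DataFrame({
    "lat": [1.187, 46.788],          # 46.788 is a made-up matching value
    "long": [-70.166, -73.071],      # -73.071 likewise
    "glacier_id": ["G001187E70166S", "G286929E46788S"],
})

# left merge keeps every glims row and attaches lat/long where glacier_id matches
merged = glims.merge(coordinates, how="left",
                     left_on="GLAC_ID", right_on="glacier_id")
print(merged[["ANLYS_ID", "lat", "long"]])
```

Every one of the three analysis rows ends up carrying the same lat/long pair, which is exactly the "one coordinate row, many data rows" behaviour the question asks for.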
