I'm importing a CSV into a pandas data frame, and then I'm trying to create three new columns in that data frame from data retrieved from geopy.geocoders.GoogleV3() :
import pandas
from geopy.geocoders import GoogleV3
DATA = pandas.read_csv("file/to/csv")
geolocator = GoogleV3()
DATA.googleaddress, (DATA.latitude, DATA.longitude) = geolocator.geocode(DATA.address)
Problem is I keep getting this error:
Traceback (most recent call last):
File "C:/Path/To/GeoCoder.py", line 9, in <module>
DATA.googleaddress, (DATA.latitude, DATA.longitude) = geolocator.geocode(DATA.address)
TypeError: 'NoneType' object is not iterable
What does this error mean and how do I get around it?
Because geolocator.geocode expects a single address at a time, not a list (or a pandas Series). When it cannot produce a result it returns None, and unpacking None on the left-hand side is what raises 'NoneType' object is not iterable.
You could try:
locs = [geolocator.geocode(addr) for addr in DATA.address]
geo_info = pandas.DataFrame(
    [(loc.address, loc.latitude, loc.longitude) for loc in locs],
    columns=['googleaddress', 'latitude', 'longitude'])
All you would have to do is merge these DataFrames:
DATA.combine_first(geo_info)
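Since geocode returns None for any address it cannot resolve, you may also want to guard against missing results. A minimal sketch (assuming you want empty values for unresolved addresses):
locs = [geolocator.geocode(addr) for addr in DATA.address]
geo_info = pandas.DataFrame(
    [(loc.address, loc.latitude, loc.longitude) if loc is not None
     else (None, None, None)   # keep the row, leave the geo columns empty
     for loc in locs],
    columns=['googleaddress', 'latitude', 'longitude'])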
Note that it is considered bad form to use an all-uppercase variable name in Python.
Related
import json
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
biocPkgTools = importr('BiocPkgTools')
biocPkgList = biocPkgTools.biocPkgList()
biocPkgList = json.loads(ro.conversion.rpy2py(biocPkgList))
The dataframe looks great and I'm just trying to convert it to a json object with column names as keys but I receive this error:
Traceback (most recent call last):
File "/bioconductor/bioconductor.py", line 11, in <module>
json = json.loads(ro.conversion.rpy2py(biocPkgList))
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 339, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not DataFrame
Another step I've tried is converting it to a pandas dataframe and then to JSON, but that also gives an error. I appreciate any help I can get.
Pandas method:
import rpy2.robjects.numpy2ri as rpyn
import json
import pandas as pd
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
biocPkgTools = importr('BiocPkgTools')
biocPkgList = biocPkgTools.biocPkgList()
columns = list(biocPkgList.colnames)
biocPkgList_df = pd.DataFrame(biocPkgList)
biocPkgList_df = biocPkgList_df.T
biocPkgList_df.columns = columns
biocPkgList_json = biocPkgList_df.to_json(orient='records')
print(biocPkgList_json)
I get these R errors:
R[write to console]: Error: unimplemented type 'char' in 'eval'
R[write to console]: Error: cannot have attributes on a CHARSXP
R[write to console]: Fatal error: unable to initialize the JIT
To convert an R data frame to a JSON-formatted Python dict/list structure (which seems to be what you are attempting), you need to either:
(a) convert it to a JSON string in R and then parse that string in Python, or
(b) convert it to a pandas DataFrame and then convert that to JSON.
For solution (a), I would recommend using the rjson R package:
import json
from rpy2.robjects.packages import importr
bioc_pkg_tools = importr('BiocPkgTools')
rjson = importr('rjson')
bioc_pkg_data_frame = bioc_pkg_tools.biocPkgList()
r_json_string_vector = rjson.toJSON(bioc_pkg_data_frame)
py_json_string = r_json_string_vector[0]
py_json_structure = json.loads(py_json_string)
print(py_json_structure.keys())
# dict_keys(['Package', 'Version', 'Depends', 'Suggests', 'License', 'MD5sum', 'NeedsCompilation', 'Title', 'Description', 'biocViews', 'Author', 'Maintainer', 'git_url', 'git_branch', 'git_last_commit', 'git_last_commit_date', 'Date/Publication', 'source.ver', 'win.binary.ver', 'mac.binary.ver', 'vignettes', 'vignetteTitles', 'hasREADME', 'hasNEWS', 'hasINSTALL', 'hasLICENSE', 'Rfiles', 'dependencyCount', 'Imports', 'Enhances', 'dependsOnMe', 'VignetteBuilder', 'suggestsMe', 'LinkingTo', 'Archs', 'URL', 'SystemRequirements', 'BugReports', 'importsMe', 'PackageStatus', 'Video', 'linksToMe', 'License_restricts_use', 'organism', 'OS_type', 'License_is_FOSS'])
Now, as for (b) the code would be along these lines:
from rpy2.robjects import pandas2ri
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import localconverter, rpy2py
base = importr('base')
with localconverter(default_converter + pandas2ri.converter):
    pandas_dataframe = base.as_data_frame(bioc_pkg_data_frame)
py_json_string = pandas_dataframe.to_json()
py_json_structure = json.loads(py_json_string)
However, it does not work in this case (raising TypeError: 'NULLType' object is not iterable), because the R data frame contains lists (e.g. in the Depends column) and conversion of data frames with embedded lists is not yet supported by rpy2 (https://github.com/rpy2/rpy2/issues/773 and https://github.com/rpy2/rpy2/issues/860).
You can still extract a subset of the data frame that does not include the list columns:
import rpy2.robjects as ro

# R's class() function, used here to detect list columns
get_r_class = ro.baseenv['class']

list_columns = []
columns_to_keep = []
for i, column_name in enumerate(bioc_pkg_data_frame.names, start=1):
    # rx2 is the equivalent of `bioc_pkg_data_frame[[column_name]]` in R
    column = bioc_pkg_data_frame.rx2(column_name)
    r_class = get_r_class(column)[0]
    if r_class == 'list':
        list_columns.append(column_name)
    else:
        columns_to_keep.append(i)
# we will exclude these:
print(list_columns)
# Depends, Suggests, biocViews, Author, Maintainer, vignettes, vignetteTitles, Rfiles, Imports, Enhances, dependsOnMe, suggestsMe, LinkingTo, Archs, importsMe, linksToMe
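The snippet above does not yet build the bioc_pkg_data_frame_no_lists object used below; one way to do that (my assumption, using R-style single-bracket column selection via rx) is:
# rx() performs R's `[` indexing, which for a data.frame selects columns by position
bioc_pkg_data_frame_no_lists = bioc_pkg_data_frame.rx(ro.IntVector(columns_to_keep))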
And then get a pandas dataframe and JSON (string/structure) with:
with localconverter(default_converter + pandas2ri.converter):
    pandas_dataframe = base.as_data_frame(bioc_pkg_data_frame_no_lists)
py_json_string = pandas_dataframe.to_json()
py_json_structure = json.loads(py_json_string)
(or you could convert the lists to a concatenated string)
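For that last option, a minimal sketch (my assumption: collapsing each list column into a comma-separated character vector in R before converting):
import rpy2.robjects as ro

# define a small R helper that flattens list columns to strings
collapse_lists = ro.r("""
function(df) {
    list_cols <- sapply(df, is.list)
    df[list_cols] <- lapply(df[list_cols],
                            function(col) sapply(col, paste, collapse = ", "))
    df
}
""")

flattened = collapse_lists(bioc_pkg_data_frame)
with localconverter(default_converter + pandas2ri.converter):
    pandas_dataframe = base.as_data_frame(flattened)
py_json_structure = json.loads(pandas_dataframe.to_json())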
When I try to remove some elements which satisfy a particular condition, Python throws the following error:
TypeError Traceback (most recent call last)
<ipython-input-25-93addf38c9f9> in <module>()
4
5 df = pd.read_csv('fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv;
----> 6 df = filter(df,~('-02-29' in df['Date']))
7 '''tmax = []; tmin = []
8 for dates in df['Date']:
TypeError: 'int' object is not iterable
The following is the code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv');
df = filter(df,~('-02-29' in df['Date']))
What could I be doing wrong?
Sample data is attached as an image.
Use df.filter() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html)
Also please attach the csv so we can run it locally.
Another way to do this is to use one of pandas' string methods for Boolean indexing:
df = df[~ df['Date'].str.contains('-02-29')]
You will still have to make sure that all the dates are actually strings first.
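For example, a minimal sketch that coerces the column first (assuming the dates may not already be strings):
df['Date'] = df['Date'].astype(str)
df = df[~df['Date'].str.contains('-02-29')]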
Edit:
Seeing the picture of your data, maybe this is what you want (slashes instead of hyphens):
df = df[~ df['Date'].str.contains('/02/29')]
I have a list of part numbers that I want to use to extract a list of prices on a website.
However I'm getting the below error when running the code:
Traceback (most recent call last):
File "C:/Users/212677036/.PyCharmCE2019.1/config/scratches/scratch_1.py", line 13, in
data = {"partOptionFilter": {"PartNumber": PN(i), "AlternativeOemId": "17155"}}
TypeError: 'DataFrame' object is not callable
Process finished with exit code 1
import requests
import pandas as pd
df = pd.read_excel(r'C:\Users\212677036\Documents\Copy of MIC Parts Review - July 26 19.xlsx')
PN = pd.DataFrame(df, columns = ['Product code'])
#print(PN)
i = 0
Total_rows = PN.shape[0]
while i < Total_rows:
    data = {"partOptionFilter": {"PartNumber": PN(i), "AlternativeOemId": "17155"}}
    r = requests.post('https://www.partsfinder.com/Catalog/Service/GetPartOptions', json=data).json()
    print(r['Data']['PartOptions'][0]['YourPrice'])
    i = i + 1
You are calling PN(i). That is why it says
TypeError: 'DataFrame' object is not callable
The (i) syntax is a function call, which a DataFrame does not support.
I am not sure what your df looks like or what you want to extract, but you have to index the DataFrame like this:
PN[i]
or
PN.loc[i, 'columnname']
or
PN.iloc[i, 0]
or ... depending on your df
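For illustration, a minimal sketch of the original loop using positional indexing (assuming the part numbers live in the 'Product code' column from the question):
for i in range(len(PN)):
    part_number = PN.iloc[i, 0]  # or PN.loc[i, 'Product code']
    data = {"partOptionFilter": {"PartNumber": part_number,
                                 "AlternativeOemId": "17155"}}
    r = requests.post('https://www.partsfinder.com/Catalog/Service/GetPartOptions',
                      json=data).json()
    print(r['Data']['PartOptions'][0]['YourPrice'])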
I am new to Pandas. I am trying to make a data set with ZIP Code, Population in that ZIP Code, and Number of Counties in the ZIP Code.
I get the data from census website: https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt
I am trying with the following code, but it is not working. Could you help me figure out the correct code? I have a hunch that the error is related to the data frame or the data types, but I cannot work out the correct code to make it right. Please let me know your thoughts. Thank you in advance!
import pandas as pd
df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
zcta_pop = df.drop_duplicates(subset={'ZCTA5', 'ZPOP'}).drop(['STATE', 'COUNTY'], 1)
zcta_ct_county = df['ZCTA5'].value_counts()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]
Here is my error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 58, in merge copy=copy, indicator=indicator)
File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 473, in __init__ 'type {0}'.format(type(right)))
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>
SOLUTION
import pandas as pd
df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
zcta_pop = df.drop_duplicates(subset={'ZCTA5', 'ZPOP'}).drop(['STATE', 'COUNTY'], 1)
zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]
I think you need to add reset_index, because the output of value_counts is a Series and merge needs a DataFrame with 2 columns:
zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
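A quick illustration of the Series vs. DataFrame distinction (note: the default column names produced by reset_index vary between pandas versions, hence the explicit rename above):
counts = df['ZCTA5'].value_counts()
print(type(counts))                # <class 'pandas.core.series.Series'>
zcta_ct_county = counts.reset_index()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
print(type(zcta_ct_county))        # <class 'pandas.core.frame.DataFrame'>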
The code below starts by downloading data from Yahoo Finance into pnls, which it does successfully. Then I have code that adds columns to the loaded data in pnls; this part is unsuccessful. Finally, I want to save pnls to a csv file, which works if the column changes are not in the code.
The question that I need answered is: how do I successfully make multiple column changes before I save pnls to csv?
from pandas_datareader import data as dreader
import pandas as pd
from datetime import datetime
import numpy as np
# Symbols is a list of all the ticker symbols that I am downloading from yahoo finance
symbols = ['tvix','pall','pplt','sivr','jju','nib','jo','jjc','bal','jjg','ld','cdw','gaz','jjn','pgm',
'sgg','jjt','grn','oil','gld','corn','soyb','weat','uhn','uga','bunl','jgbl','bndx','aunz',
'cemb','emhy','vwob','ald','udn','fxa','fxb']
#This gets the data from yahoo finance
pnls = {i:dreader.DataReader(i,'yahoo','1985-01-01',datetime.today()) for i in symbols}
# These are the new columns that I want to add
pnls['PofChg'] = ((pnls.Close - pnls.Open) / (pnls.Open)) * 100
pnls['U_D_F'] = np.where(pnls.PofChg > 0 , 'Up', np.where(pnls.PofChg == 0, 'Flat', 'Down'));pnls
pnls['Up_Down'] = pnls.Close.diff()
pnls['Close%pd'] = pnls.Close.pct_change()*100
# This saves the current ticker to a csv file
for df_name in pnls:
    pnls.get(df_name).to_csv("{}_data.csv".format(df_name), index=True, header=True)
This is the error that I get when I run the code
Traceback (most recent call last):
File "YahooFinanceDataGetter.py", line 14, in <module>
pnls['PofChg'] = ((pnls.Close - pnls.Open) / (pnls.Open)) * 100
AttributeError: 'dict' object has no attribute 'Close'
Press any key to continue . . .
The line
pnls = {i:dreader.DataReader(i,'yahoo','1985-01-01',datetime.today()) for i in symbols}
builds a Python dict (using a dictionary comprehension). To check this, you can run the following:
assert type(pnls) == dict
This is what the error message is telling you: AttributeError: 'dict' object has no attribute 'Close'.
The DataFrames are actually the values of the dictionary. To apply transformations to them, you can iterate over the values() of the dictionary:
for df in pnls.values():
    df['PofChg'] = ((df.Close - df.Open) / df.Open) * 100
...
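Putting it all together, a minimal sketch of the loop with the column definitions from the question and the CSV export at the end:
for symbol, df in pnls.items():
    # add the derived columns to each ticker's DataFrame
    df['PofChg'] = ((df.Close - df.Open) / df.Open) * 100
    df['U_D_F'] = np.where(df.PofChg > 0, 'Up',
                           np.where(df.PofChg == 0, 'Flat', 'Down'))
    df['Up_Down'] = df.Close.diff()
    df['Close%pd'] = df.Close.pct_change() * 100
    # then write one csv per ticker
    df.to_csv("{}_data.csv".format(symbol), index=True, header=True)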