Getting a word from a set in a dataframe? - python

I have a dataframe column 'address' with values like this in each row:
3466B, Jerome Avenue, The Bronx, Bronx County, New York, 10467, United States, (40.881836199999995, -73.88176324294639)
Jackson Heights 74th Street - Roosevelt Avenue (7), 75th Street, Queens, Queens County, New York, 11372, United States, (40.74691655, -73.8914737373454)
I only need to keep the value Bronx / Queens / Manhattan / Staten Island from each row.
Is there any way to do this?
Thanks in advance.

One option, assuming the values are always in the same position, is to split on ', ' and take the third element with .split(', ')[2]:
"3466B, Jerome Avenue, The Bronx, Bronx County, New York, 10467, United States, (40.881836199999995, -73.88176324294639)".split(', ')[2]
If the source file is a CSV (comma-separated values) file, I would have a look at pandas, load it with pandas.read_csv('filename.csv'), and leverage all the nice features that pandas offers.
If the values are not always at the same position and you only need to check whether each value is in a set of values or not:
import pandas as pd
df = pd.DataFrame(["The Bronx", "Queens", "Man"])
df.isin(["Queens", "The Bronx"])
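The .split(', ')[2] approach can also be applied to the whole 'address' column at once with pandas' vectorised string methods. A minimal sketch, assuming the column is called 'address' as in the question and the borough is always the third comma-separated field; the step that strips a leading "The " (so "The Bronx" becomes "Bronx") is my own addition, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"address": [
    "3466B, Jerome Avenue, The Bronx, Bronx County, New York, 10467, United States, (40.881836199999995, -73.88176324294639)",
    "Jackson Heights 74th Street - Roosevelt Avenue (7), 75th Street, Queens, Queens County, New York, 11372, United States, (40.74691655, -73.8914737373454)",
]})

# Split every address on ", " and keep the third field (index 2).
df["borough"] = df["address"].str.split(", ").str[2]

# Optional: drop a leading "The " so "The Bronx" becomes "Bronx".
df["borough"] = df["borough"].str.replace(r"^The ", "", regex=True)

print(df["borough"].tolist())  # ['Bronx', 'Queens']
```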

You could add a column, let's call it 'district', and then populate it like this:
import pandas as pd
df = pd.DataFrame({'address':["3466B, Jerome Avenue, The Bronx, Bronx County, New York, 10467, United States, (40.881836199999995, -73.88176324294639)",
"Jackson Heights 74th Street - Roosevelt Avenue (7), 75th Street, Queens, Queens County, New York, 11372, United States, (40.74691655, -73.8914737373454)"]})
districts = ['Bronx','Queens','Manhattan', 'Staten Island']
df['district'] = ''
for district in districts:
    # Rows whose address contains this district name get it written into 'district'
    df.loc[df['address'].str.contains(district), 'district'] = district
print(df)
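As an alternative to the loop, Series.str.extract can pull the district name out in one vectorised call. This is a sketch under the same assumptions as the loop (same column name and district list); one behavioural difference is that the loop keeps the last district in list order that matches, while str.extract returns whichever name occurs first in the address, and rows with no match get NaN instead of ''.

```python
import re

import pandas as pd

df = pd.DataFrame({"address": [
    "3466B, Jerome Avenue, The Bronx, Bronx County, New York, 10467, United States, (40.881836199999995, -73.88176324294639)",
    "Jackson Heights 74th Street - Roosevelt Avenue (7), 75th Street, Queens, Queens County, New York, 11372, United States, (40.74691655, -73.8914737373454)",
]})

districts = ["Bronx", "Queens", "Manhattan", "Staten Island"]

# Build one alternation pattern such as "(Bronx|Queens|Manhattan|Staten Island)";
# re.escape guards against names that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(d) for d in districts) + ")"

# With a single capture group and expand=False, str.extract returns a Series
# holding the first match in each row (NaN where nothing matches).
df["district"] = df["address"].str.extract(pattern, expand=False)

print(df["district"].tolist())  # ['Bronx', 'Queens']
```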

Related

Make changes in the row data in excel panda

I want to change the data in each row.
Data:
Name   City                          Country
John   Toronto,Canada                Canada
Smith  Seattle,United States         United States
Raj    Greater Toronto Area,Canada   Canada
In the City column, everything from the "," onward should be deleted, so that only the name of the city remains.
Required output:
Name   City                   Country
John   Toronto                Canada
Smith  Seattle                United States
Raj    Greater Toronto Area   Canada
Use: df['City'] = df['City'].str.split(',').str[0]
Reproducible code:
# Import pandas library
import pandas as pd
# Initialize list of lists
data = [['tom', 'Toronto ,ON', 'Canada'], ["Raj", "Greater Toronto Area, Canada", "Canada"]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'City', 'Country'])
# Keep only the text before the first comma in City
df['City'] = df['City'].str.split(',').str[0]
# Print the DataFrame
print(df)
Output:
Name City Country
0 tom Toronto Canada
1 Raj Greater Toronto Area Canada
Reference: https://stackoverflow.com/questions/40705480/python-pandas-remove-everything-after-a-delimiter-in-a-string
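A quick check of how the split behaves on edge cases, as a sketch: a value with no comma is returned unchanged by .str.split(',').str[0], and chaining .str.strip() (my addition) removes the trailing space that an input like 'Toronto ,ON' would otherwise leave behind. The 'Ann' / 'Vancouver' row is hypothetical, added only to show the no-comma case.

```python
import pandas as pd

data = [["John", "Toronto,Canada", "Canada"],
        ["Smith", "Seattle,United States", "United States"],
        ["Raj", "Greater Toronto Area,Canada", "Canada"],
        ["tom", "Toronto ,ON", "Canada"],
        ["Ann", "Vancouver", "Canada"]]  # hypothetical row without a comma
df = pd.DataFrame(data, columns=["Name", "City", "Country"])

# Keep the text before the first comma; values without a comma come back unchanged.
# .str.strip() removes surrounding whitespace such as the space in "Toronto ,ON".
df["City"] = df["City"].str.split(",").str[0].str.strip()

print(df["City"].tolist())
```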

How to delete part of a string by matching conditions?

I have a lot of address information, such as:
123 1st Ave Apt501, Flushing, New York, 00000, USA
234 West 20th Street 1A, New York, New York, 11111, USA
345 North 100st Street Apt. 110, New York, New York, 22222, USA
I would like to get the street information. So I am wondering how I can delete the apartment information after "Ave" and "Street".
So, the addresses will be cleaned as:
123 1st Ave, Flushing, New York, 00000, USA
234 West 20th Street, New York, New York, 11111, USA
345 North 100st Street, New York, New York, 22222, USA
Or the data can be cleaned as:
123 1st Ave
234 West 20th Street
345 North 100st Street
This is the code I tried. However, it does not remove apartment information that doesn't include "Apt" (such as the "1A" in the second address).
import numpy as np

# df is assumed to have an 'address' column
conditions = [df.address.str.contains('Apt')]
# Everything from 'Apt' onward goes into a new 'apt' column
choices = [df.address.apply(lambda x: x[x.find('Apt'):])]
df['apt'] = np.select(conditions, choices, default='')
choices2 = [df.address.apply(lambda x: x[:x.find('Apt')])]
df['address'] = np.select(conditions, choices2, default=df.address)
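I think you should put all the addresses in a list and split each one at the comma, so that the street information is the element at index 0. Note that this keeps everything before the first comma, so apartment details such as "Apt501" still have to be removed separately.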
addresses = ['123 1st Ave, Flushing, New York, 00000, USA', '234 West 20th Street, New York, New York, 11111, USA',
'345 North 100st Street, New York, New York, 22222, USA']
for s in addresses:
    print(s.split(',')[0])
Output
123 1st Ave
234 West 20th Street
345 North 100st Street
To get the second option, I'd split at the comma first and then process the first item with a regular expression.
df['street'] = (df.address
                .str.split(',')  # split at ,
                .str[0]          # get the first element
                .str.replace(r'\s*(Apt[.\s]*|(?<=Street)\s+)\d+\w?$', '', regex=True)
                )
The regular expression matches
optional leading whitespace, then either
Apt followed by zero or more dots or whitespace, OR
whitespace directly after Street (checked with the lookbehind (?<=Street), so "Street" itself is kept)
one or more digits
an optional letter
and all that at the end of the string ($).
regex=True is needed because pandas 2.0 and later treat the pattern as a literal string by default. The pattern might need some tweaking but gives the right result for the example.
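For the first requested output (street plus the rest of the address), here is a sketch that removes the unit only where it sits directly before the first comma, so the city, state, ZIP and country are kept. The pattern is my own, tuned to the three sample addresses: optional whitespace, then "Apt" with optional dots/whitespace or whitespace right after "Street", then digits and an optional letter, followed by a comma (lookahead (?=,)); n=1 limits it to one replacement per address.

```python
import pandas as pd

df = pd.DataFrame({"address": [
    "123 1st Ave Apt501, Flushing, New York, 00000, USA",
    "234 West 20th Street 1A, New York, New York, 11111, USA",
    "345 North 100st Street Apt. 110, New York, New York, 22222, USA",
]})

# Unit number = "Apt..." or whitespace after "Street", then digits and an optional
# letter, immediately before a comma. Only the first occurrence is replaced.
pattern = r"\s*(Apt[.\s]*|(?<=Street)\s+)\d+\w?(?=,)"
df["address_clean"] = df["address"].str.replace(pattern, "", n=1, regex=True)

print(df["address_clean"].tolist())
```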

Pandas - Create a new column (Branch name) based on another column (City name)

I have the following Python Pandas Dataframe (8 rows):
City Name
New York
Long Beach
Jamestown
Chicago
Forrest Park
Berwyn
Las Vegas
Miami
I would like to add a new Column (Branch Name) based on City Name as below:
City Name      Branch Name
New York       New York
Long Beach     New York
Jamestown      New York
Chicago        Chicago
Forrest Park   Chicago
Berwyn         Chicago
Las Vegas      Las Vegas
Miami          Miami
How do I do that?
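You can use .map(). City names that are not in the dictionary come back as NaN from .map(); the .fillna() call then copies the original city name into those rows, so they are kept.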
df["Branch Name"] = df["City Name"].map({"Long Beach":"New York",
"Jamestown":"New York",
"Forrest Park":"Chicago",
"Berwyn":"Chicago",}, na_action='ignore')
df["Branch Name"] = df["Branch Name"].fillna(df["City Name"])
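A possible one-step alternative, sketched here: Series.replace with a dictionary leaves values that are not keys unchanged, so cities that are their own branch (New York, Chicago, Las Vegas, Miami) pass through without a separate fillna. The dictionary name branch_of is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"City Name": ["New York", "Long Beach", "Jamestown", "Chicago",
                                 "Forrest Park", "Berwyn", "Las Vegas", "Miami"]})

# Only cities that belong to a different branch need an entry.
branch_of = {"Long Beach": "New York", "Jamestown": "New York",
             "Forrest Park": "Chicago", "Berwyn": "Chicago"}

# Values that are not keys in branch_of are left as they are.
df["Branch Name"] = df["City Name"].replace(branch_of)

print(df)
```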

Loop and store coordinates

I have a copy of a dataframe that looks like this:
heatmap_df = test['coords'].copy()
heatmap_df
0 [(Manhattanville, Manhattan, Manhattan Communi...
1 [(Mainz, Rheinland-Pfalz, 55116, Deutschland, ...
2 [(Ithaca, Ithaca Town, Tompkins County, New Yo...
3 [(Starr Hill, Charlottesville, Virginia, 22903...
4 [(Neuchâtel, District de Neuchâtel, Neuchâtel,...
5 [(Newark, Licking County, Ohio, 43055, United ...
6 [(Mae, Cass County, Minnesota, United States o...
7 [(Columbus, Franklin County, Ohio, 43210, Unit...
8 [(Canaanville, Athens County, Ohio, 45701, Uni...
9 [(Arizona, United States of America, (34.39534...
10 [(Enschede, Overijssel, Nederland, (52.2233632...
11 [(Gent, Oost-Vlaanderen, Vlaanderen, België - ...
12 [(Reno, Washoe County, Nevada, 89557, United S...
13 [(Grenoble, Isère, Auvergne-Rhône-Alpes, Franc...
14 [(Columbus, Franklin County, Ohio, 43210, Unit...
Each row has this format with some coordinates:
heatmap_df[2]
[Location(Ithaca, Ithaca Town, Tompkins County, New York, 14853, United States of America, (42.44770298533052, -76.48085858627931, 0.0)),
Location(Chapel Hill, Orange County, North Carolina, 27515, United States of America, (35.916920469999994, -79.05664845999999, 0.0))]
I want to pull the latitude and longitudes from each row and store them as separate columns in the dataframe heatmap_df. I have this so far, but I suck at writing loops. My loop is not working recursively, it only prints out the last coordinates.
x = np.arange(start=0, stop=3, step=1)
for i in x:
    point_i = (heatmap_df[i][0].latitude, heatmap_df[i][0].longitude)
    i = i+1
point_i
(42.44770298533052, -76.48085858627931)
I am trying to make a heat map with all the coordinates using Folium. Can someone help please? Thank you
Python doesn't know what you are trying to do: it simply stores the tuple (heatmap_df[i][0].latitude, heatmap_df[i][0].longitude) in the variable point_i on every iteration, so the value is overwritten each time and only the last one survives. Instead, declare a list before the loop and append a [latitude, longitude] list to it on each iteration; the resulting list of lists can easily be turned into a DataFrame. Also, the loop in your example isn't recursive; it is ordinary iteration.
Try this:
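import numpy as np

x = np.arange(start=0, stop=3, step=1)  # row positions 0, 1, 2
points = []  # collects one [latitude, longitude] pair per row
for i in x:
    # heatmap_df[i][0] is the first Location object in row i
    points.append([heatmap_df[i][0].latitude, heatmap_df[i][0].longitude])
print(points)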
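To get separate latitude/longitude columns for every row (not just the first three), and a flat list of all points, a sketch could use Series.apply plus a list comprehension. Assumptions: heatmap_df is a Series whose cells are lists of objects with .latitude and .longitude attributes (as geopy's Location has); FakeLocation below is a hypothetical stand-in so the example runs without geopy, and the two-row Series is toy data built from the coordinates shown in the question. As far as I know, a list of [lat, lon] pairs like all_points is the input shape folium's folium.plugins.HeatMap accepts, but that part is not exercised here.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for geopy's Location: only .latitude and .longitude are used.
FakeLocation = namedtuple("FakeLocation", ["address", "latitude", "longitude"])

ithaca = FakeLocation("Ithaca, ...", 42.44770298533052, -76.48085858627931)
chapel_hill = FakeLocation("Chapel Hill, ...", 35.916920469999994, -79.05664845999999)

# Toy version of heatmap_df: each cell holds a list of location objects.
heatmap_df = pd.Series([[ithaca, chapel_hill], [chapel_hill]])

# One row per cell, using the first location in each list.
first_coords = pd.DataFrame({
    "latitude": heatmap_df.apply(lambda locs: locs[0].latitude),
    "longitude": heatmap_df.apply(lambda locs: locs[0].longitude),
})

# Every location in every cell, flattened into [latitude, longitude] pairs.
all_points = [[loc.latitude, loc.longitude] for locs in heatmap_df for loc in locs]

print(first_coords)
print(all_points)
```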

Filling out empty cells with lists of values

I have a data frame that looks like below:
City           State  Country
Chicago        IL     United States
Boston
San Diego      CA     United States
Los Angeles    CA     United States
San Francisco
Sacramento
Vancouver      BC     Canada
Toronto
And I have 3 lists of values that are ready to fill in the None cells:
city = ['Boston', 'San Francisco', 'Sacramento', 'Toronto']
state = ['MA', 'CA', 'CA', 'ON']
country = ['United States', 'United States', 'United States', 'Canada']
The elements of these lists correspond to each other by position: the first items across all 3 lists belong together, and so forth. How can I fill out the empty cells and produce a result like below?
City           State  Country
Chicago        IL     United States
Boston         MA     United States
San Diego      CA     United States
Los Angeles    CA     United States
San Francisco  CA     United States
Sacramento     CA     United States
Vancouver      BC     Canada
Toronto        ON     Canada
My code gives me an error and I'm stuck.
if df.loc[df['City'] == 'Boston']:
    'State' = 'MA'
Any solution is welcome. Thank you.
Create two mappings, one for <city : state>, and another for <city : country>.
city_map = dict(zip(city, state))
country_map = dict(zip(city, country))
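Then use each mapping to fill only the missing cells, so rows that already have values (such as Chicago) keep them. This assumes the empty cells are None/NaN; if they are empty strings, convert them to NaN first (for example with df.replace('', np.nan)):
df['State'] = df['State'].fillna(df['City'].map(city_map))
df['Country'] = df['Country'].fillna(df['City'].map(country_map))
Here .map() returns NaN for cities that are not in the dictionaries, and .fillna() only uses the mapped value where the original cell was empty. Keeping City as a regular column (rather than the index) means df['City'] stays available for the lookup.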
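A self-contained sketch of the fill-only-the-gaps approach using the sample data from the question, assuming the empty cells hold None. Series.fillna only touches missing cells, so existing values such as Chicago's IL are preserved; state_map is an illustrative name for the city-to-state dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Chicago", "Boston", "San Diego", "Los Angeles",
             "San Francisco", "Sacramento", "Vancouver", "Toronto"],
    "State": ["IL", None, "CA", "CA", None, None, "BC", None],
    "Country": ["United States", None, "United States", "United States",
                None, None, "Canada", None],
})

city = ["Boston", "San Francisco", "Sacramento", "Toronto"]
state = ["MA", "CA", "CA", "ON"]
country = ["United States", "United States", "United States", "Canada"]

# Lookup tables keyed by city name.
state_map = dict(zip(city, state))
country_map = dict(zip(city, country))

# map() gives NaN for cities not in the tables; fillna() only fills empty cells.
df["State"] = df["State"].fillna(df["City"].map(state_map))
df["Country"] = df["Country"].fillna(df["City"].map(country_map))

print(df)
```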
