for lat,lng,value in zip(location_saopaulo_df['geolocation_lat'], location_saopaulo_df['geolocation_lng'], location_saopaulo_df['municipality']):
    coordinates = (lat,lng)
    items = rg.search(coordinates)
    value = items[0]['admin2']
I am trying to iterate over three columns of the dataframe: take the latitude and longitude values from the first two, use them to look up the address, and then write the city name into the last column, which is currently empty (all NaN values).
However, my for loop never finishes. I would be grateful if you could tell me why it doesn't stop, or suggest a better way to do what I'm trying to do.
Thank you in advance.
If rg is reverse_geocoder, there is a better way to query several coordinates at once than looping. Try this:
res = rg.search(tuple(zip(location_saopaulo_df['geolocation_lat'],
location_saopaulo_df['geolocation_lng'])))
Then extract just the admin2 values, for example by constructing a dataframe from the result:
df_ = pd.DataFrame(res)
and see what it looks like. You may be able to perform a merge or index alignment to put it back into your original dataframe location_saopaulo_df
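As a rough sketch of the index-alignment step: the frame and the `res` list below are made-up stand-ins (no geocoder is called), and it assumes `rg.search` returns one dict per coordinate pair, in input order, with an `admin2` key.

```python
import pandas as pd

# Hypothetical sample data standing in for location_saopaulo_df.
location_saopaulo_df = pd.DataFrame({
    "geolocation_lat": [-23.55, -23.96],
    "geolocation_lng": [-46.63, -46.33],
})

# Made-up stand-in for the list of dicts that rg.search(...) is assumed
# to return: one dict per coordinate pair, in the same order as the input.
res = [
    {"name": "Sao Paulo", "admin1": "Sao Paulo", "admin2": "Sao Paulo", "cc": "BR"},
    {"name": "Santos", "admin1": "Sao Paulo", "admin2": "Santos", "cc": "BR"},
]

df_ = pd.DataFrame(res)
# Reuse the original index so the assignment lines up row by row.
df_.index = location_saopaulo_df.index
location_saopaulo_df["municipality"] = df_["admin2"]
print(location_saopaulo_df)
```

Copying the index matters when the original frame was filtered and no longer has a plain 0..n-1 index; otherwise the assignment would align on labels and leave NaN values.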
I have a dataframe that might look like this:
print(df_selection_names)
name
0 fatty red meat, like prime rib
0 grilled
I have another dataframe, df_everything, with columns called name, suggestion and a lot of other columns. I want to find all the rows in df_everything with a name value matching the name values from df_selection_names so that I can print the values for each name and suggestion pair, e.g., "suggestion1 is suggested for name1", "suggestion2 is suggested for name2", etc.
I've tried several ways to get cell values from a dataframe and searching for values within a row including
# number of items in df_selection_names = df_selection_names.shape[0]
# so, in other words, we are looping through all the items the user selected
for i in range(df_selection_names.shape[0]):
    # get the cell value using at() function
    # in 'name' column and i-1 row
    sel = df_selection_names.at[i, 'name']
    # this line finds the row 'sel' in df_everything
    row = df_everything[df_everything['name'] == sel]
but everything I tried gives me ValueErrors. This post leads me to think I may be way off, but I'm feeling pretty confused about everything at this point!
https://pandas.pydata.org/docs/reference/api/pandas.Series.isin.html?highlight=isin#pandas.Series.isin
df_everything[df_everything['name'].isin(df_selection_names["name"])]
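Here is a small self-contained sketch of the isin filter followed by the "is suggested for" messages. The contents of df_everything (the extra name and all suggestion values) are invented for illustration. One possible cause of the ValueErrors, judging only from the printed frame where both rows carry index label 0, is that `.at[0, 'name']` then returns more than one value; isin avoids the index lookup entirely.

```python
import pandas as pd

# Mirrors the printed frame from the question, including the repeated index 0.
df_selection_names = pd.DataFrame(
    {"name": ["fatty red meat, like prime rib", "grilled"]}, index=[0, 0]
)

# Hypothetical contents; only the column names come from the question.
df_everything = pd.DataFrame({
    "name": ["grilled", "salad", "fatty red meat, like prime rib"],
    "suggestion": ["Zinfandel", "Sauvignon Blanc", "Cabernet Sauvignon"],
})

# Keep only the rows whose name appears in the selection.
matches = df_everything[df_everything["name"].isin(df_selection_names["name"])]

# Build one message per matching (name, suggestion) pair.
messages = [
    f"{suggestion} is suggested for {name}"
    for name, suggestion in zip(matches["name"], matches["suggestion"])
]
for message in messages:
    print(message)
```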
I have a pandas dataframe in which some rows didn't pull in correctly, so their values were pushed over into the next column. As a result, I have a column that is mostly null but has a few values that belong in the previous column.
I need to replace the 12345 and 45678 values in the Approver column with the JJones values from the Need to Delete column.
I am not sure if a for loop, or a regular expression is the right way to go. I also came across the replace function, but I'm not sure how I would set that up in this scenario. Below is the code I have tried thus far (Q1Q2 is the df name):
for Q1Q2['Approver'] in Q1Q2:
    Replacement = Q1Q2.loc[Q1Q2['Need to Delete'].notnull()]
    Q1Q2.loc[Replacement] = Q1Q2['Approver']
Q1Q2.loc[Q1Q2['Need to Delete'].notnull(), ['Approver'] == Q1Q2['Need to Delete']]
If you could help me fix either attempt above, or point me in the right direction, it would be greatly appreciated. Thanks in advance!
You can use boolean indexing:
r=Q1Q2['Need to Delete'].notnull()
Q1Q2.loc[r,'Approver']=Q1Q2.loc[r,'Need to Delete']
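To see the boolean-indexing fix end to end, here is a minimal sketch on invented data; the ASmith and BLee rows are hypothetical, and dropping the helper column afterwards is an optional extra step the question only hints at through the column name.

```python
import numpy as np
import pandas as pd

# Made-up rows: two of them have the real approver shifted one column over.
Q1Q2 = pd.DataFrame({
    "Approver": ["ASmith", "12345", "BLee", "45678"],
    "Need to Delete": [np.nan, "JJones", np.nan, "JJones"],
})

# True for the rows where a value spilled into the extra column.
r = Q1Q2["Need to Delete"].notnull()

# Overwrite Approver only on those rows; other rows are left untouched.
Q1Q2.loc[r, "Approver"] = Q1Q2.loc[r, "Need to Delete"]

# Optional: remove the helper column once the values are moved.
Q1Q2 = Q1Q2.drop(columns="Need to Delete")
print(Q1Q2)
```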
I'm trying to make a data frame from the following code, but it is always empty and I'm not sure why. Any suggestions? Thanks!
step_size = 0.01
start = [100]
iter_list = list(range(10000))
for i in iter_list:
    start.append(start[i] - step_size)
iter_list2 = list(range(len(start)))
variable_step = pd.DataFrame()
for i in iter_list2:
    variable_step[i] = ((start[i]*step_size)/100)
It looks like you may have some sort of confusion about what a dataframe is. Your code doesn't seem to recognize that a DataFrame is a two-dimensional data structure, with rows and columns.
When you do variable_step[i] = ((start[i]*step_size)/100), you're creating a new column in variable_step with column label set to the current value of i, and initializing every element of that column to ((start[i]*step_size)/100), since ((start[i]*step_size)/100) is a scalar.
Creating a new column this way doesn't add more rows. It just adds more values to the existing rows - all 0 of them. Each new column you create has length 0, because you never create rows.
If you want me to tell you how to fix this, well, I can't, because I don't know what you were even trying to do.
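The behavior described above can be reproduced in a few lines. This is only a small demonstration with arbitrary scalar values, not a fix: assigning scalars to new column labels of an empty DataFrame adds columns but never rows.

```python
import pandas as pd

variable_step = pd.DataFrame()

# Each scalar assignment creates a new column, broadcast over the
# existing rows. There are no rows, so every column has length 0.
variable_step[0] = 1.0
variable_step[1] = 2.0

print(variable_step.shape)
print(variable_step.empty)
```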
You initialize an empty data frame here:
variable_step = pd.DataFrame()
Assuming your intention is to put the list into a data frame, you should use:
variable_step = pd.DataFrame(start) # or any other list you need
Also, in the last loop you assign values by index while the data frame is still empty.
Use .append() instead (note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat is the replacement).
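If the goal was a single column holding (start * step_size) / 100 for every step, which is a guess since the question doesn't say, a sketch without any loop could look like this. The names n_steps and the column label "variable_step" are my own choices.

```python
import numpy as np
import pandas as pd

step_size = 0.01
n_steps = 10000  # hypothetical name for the loop count in the question

# 100, 99.99, 99.98, ... computed in one vectorized step
# (n_steps + 1 values, matching the length of the original start list).
start = 100 - step_size * np.arange(n_steps + 1)

# One row per step, one column with the derived value.
variable_step = pd.DataFrame({"variable_step": start * step_size / 100})
print(variable_step.head())
```

Computing each value directly from its position also avoids the small floating-point drift that repeated subtraction accumulates.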
I want to create a loop which creates multiple csvs which have the same 9 columns in the beginning but differ iteratively in the last column.
[col1,col2,col3,col4,...,col9,col[i]]
I have a dataframe with a shape of (20000,209).
What I want is a loop that does not take too much computation power or resources but creates 200 CSVs which differ only in the last column. All columns exist in one dataframe. The columns to be added one at a time are df.columns[10:-1].
I thought of something like:
for col in df.columns[10:-1]:
    dfi = df[:9]
    dfi.concat(df[10])
    dfi.dropna()
    dfi.to_csv('dfi.csv')
Maybe it is also possible to use
dfi.to_csv('dfi.csv', sequence = [:9,i])
The i should display the number of the added column. Any idea how to make this happen easily? :)
Thanks a lot!
I'm not sure I fully understand what you want, but are you saying that each CSV should have just 10 columns: the first 9 in every file, plus one of the remaining 200 columns, with one CSV per remaining column?
If so I would go for something as simple as:
base_cols = list(range(9))
for i in range(9, 209):
    df.iloc[:, base_cols+[i]].to_csv('csv{}.csv'.format(i))
Which should work I think.
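A scaled-down, runnable variant of the same idea, selecting by column label instead of position. The tiny 5 x 12 frame, the col1..col12 names, and the temporary output directory are all invented for the example; the dropna() call follows the question's own attempt.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Small stand-in for the (20000, 209) frame: 5 rows, 12 columns.
df = pd.DataFrame(
    np.arange(60).reshape(5, 12),
    columns=[f"col{i}" for i in range(1, 13)],
)

base_cols = list(df.columns[:9])   # the 9 columns shared by every file
out_dir = Path(tempfile.mkdtemp())  # hypothetical output location

written = []
for col in df.columns[9:]:
    # Select the shared columns plus one extra column, then write it out.
    path = out_dir / f"{col}.csv"
    df[base_cols + [col]].dropna().to_csv(path, index=False)
    written.append(path)

print([p.name for p in written])
```

Each iteration only selects ten columns from the existing frame, so memory use stays close to one ten-column slice at a time.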
I have a dataset in a relational database format (linked by ID's over various .csv files).
I know that each data frame contains only one value of an ID, and I'd like to know the simplest way to extract values from that row.
What I'm doing now:
# the group has only one element
purchase_group = purchase_groups.get_group(user_id)
price = list(purchase_group['Column_name'])[0]
The third line bothers me because it seems ugly, but I'm not sure what the workaround is. The grouping (I guess) assumes that there might be multiple values and returns a <class 'pandas.core.frame.DataFrame'> object, while I'd like just a row returned.
If you want just the value and not a df/series then call values and index the first element [0] so just:
price = purchase_group['Column_name'].values[0]
will work.
If purchase_group has a single row, then purchase_group = purchase_group.squeeze() turns it into a series, so you can simply call purchase_group['Column_name'] to get your value.
Late to the party here, but purchase_group['Column_name'].item() is now available and is cleaner than some other solutions.
This method is intuitive; for example to get the first row (list from a list of lists) of values from the dataframe:
np.array(df)[0]
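For comparison, here are the approaches from the answers above side by side on a made-up purchases table (the user_id column and the prices are invented). All of them should yield the same scalar for a one-row group.

```python
import pandas as pd

# Hypothetical data: exactly one purchase row per user_id.
purchases = pd.DataFrame({
    "user_id": [1, 2, 3],
    "Column_name": [9.99, 4.5, 20.0],
})
purchase_groups = purchases.groupby("user_id")
purchase_group = purchase_groups.get_group(2)  # one-row DataFrame

via_values = purchase_group["Column_name"].values[0]
via_iloc = purchase_group["Column_name"].iloc[0]
# .item() raises ValueError if the series does not hold exactly one element.
via_item = purchase_group["Column_name"].item()
# squeeze() collapses the one-row frame into a Series keyed by column name.
via_squeeze = purchase_group.squeeze()["Column_name"]

print(via_values, via_iloc, via_item, via_squeeze)
```

Of these, .item() doubles as a sanity check on the "only one row" assumption, while .values[0] and .iloc[0] silently take the first row if there are several.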