Populate dataframe with variable value pandas - python

I really hope you can help.
I am working on the below, which finds the closest schools to a particular property.
Everything works except the final step: where I set the field 'Primary' in the raw_data dataframe, it always comes back NaN, even though the variable Primary_Name does get populated when I step through.
Any idea why this might be?
for index, row in raw_data.iterrows():
    start = (row['lat'], row['long'])
    for index, row in schools.iterrows():
        schoolloc = (row['Latitude'], row['Longitude'])
        schools.loc[index,'distance'] = geopy.distance.geodesic(start, schoolloc).km
    schools.dropna()
    primary = schools.where(schools['PhaseOfEducation (name)'] == 'Primary')
    Nearest_Primary = primary[primary['distance'] == min(primary['distance'])]
    Primary_Name = Nearest_Primary.iloc[0]['EstablishmentName']
    raw_data.loc[index,'primary'] = Primary_Name
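One likely cause, judging only from the code as posted: the inner loop reuses the names index and row, so by the time the last line runs, index holds the label of the last school rather than the label of the property. The write then lands on the wrong row of raw_data and the property rows stay NaN. Note also that schools.dropna() returns a new frame instead of changing schools, and that the question mentions 'Primary' while the code writes 'primary'. Below is a minimal sketch with distinct loop variables. The sample coordinates and school names are made up, and haversine_km is a hypothetical stand-in for geopy.distance.geodesic so the example runs without geopy.

```python
import math
import pandas as pd

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs.
    Hypothetical stand-in for geopy.distance.geodesic(a, b).km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Made-up sample data using the column names from the question.
raw_data = pd.DataFrame({"lat": [51.50, 53.48], "long": [-0.12, -2.24]})
schools = pd.DataFrame({
    "EstablishmentName": ["London Primary", "Manchester Primary", "London High"],
    "PhaseOfEducation (name)": ["Primary", "Primary", "Secondary"],
    "Latitude": [51.51, 53.47, 51.50],
    "Longitude": [-0.13, -2.25, -0.12],
})

# Boolean indexing drops non-primary rows; .where() would keep them as NaN rows.
primary = schools[schools["PhaseOfEducation (name)"] == "Primary"]

# prop_idx is never reused, so it still names the property row at the end.
for prop_idx, prop in raw_data.iterrows():
    start = (prop["lat"], prop["long"])
    distances = primary.apply(
        lambda s: haversine_km(start, (s["Latitude"], s["Longitude"])), axis=1
    )
    nearest = distances.idxmin()  # label of the closest primary school
    raw_data.loc[prop_idx, "primary"] = primary.loc[nearest, "EstablishmentName"]
```

With geopy installed, the haversine_km call could be swapped back for geopy.distance.geodesic(start, schoolloc).km as in the question.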

Related

How do I loop column names in a pandas dataframe?

I am new to Python and have never really used Pandas, so forgive me if this doesn't make sense. I am trying to create a df based on frontend data I am sending to a flask route. The data is looped through and appended for each row. My only problem is that I don't know how to get the df columns to reflect that. Here is my code to build the rows and the current output:
claims = csv_data["claims"]
setups = csv_data["setups"]
for setup in setups:
    setup = setups[0]
    offerings = setup["currentOfferings"]
    considered = setup["considerationSet"]
    reach_dict = setup["reach"]
    favorite_dict = setup["favorite"]
    summary_dict = setup["summaryMetrics"]
rows = []
for i, claim in enumerate(claims):
    row = []
    row.append(i + 1)
    row.append(claim)
    for setup in setups:
        setup = setups[0]
        row.append("X") if claim in setup["currentOfferings"] else row.append(float('nan'))
        row.append("X") if claim in setup["considerationSet"] else row.append(float('nan'))
        if claim in setup["currentOfferings"]:
            reach_score = reach_dict[claim]
            reach_percentage = "{:.0%}".format(reach_score)
            row.append(reach_percentage)
        else:
            row.append(float('nan'))
        if claim in setup["currentOfferings"]:
            favorite_score = favorite_dict[claim]
            fav_percentage = "{:.0%}".format(favorite_score)
            row.append(fav_percentage)
        else:
            row.append(float('nan'))
    rows.append(row)
I know that I can put columns = ["#", "Claims", "Setups", etc...] in the df, but that doesn't work because the rows loop through multiple setups, and the number of setups can change. If I don't specify the column names (as in the image), the columns are just numbered. Ideally it should loop through the data it receives in the route, starting with "#" and "Claims" as columns, and then for each setup "Setup 1", "Consideration Set 1", "Reach", "Favorite", "Setup 2", "Consideration Set 2", and so on.
I tried to create a similar type of loop for the columns:
my_columns = []
for i, row in enumerate(rows):
    col = []
    if row[0] != None:
        col.append("#")
    else:
        pass
    if row[1] != None:
        col.append("Claims")
    else:
        pass
    if row[2] != None:
        col.append("Setup")
    else:
        pass
    if row[3] != None:
        col.append("Consideration Set")
    else:
        pass
    if row[4] != None:
        col.append("Reach")
    else:
        pass
    if row[5] != None:
        col.append("Favorite")
    else:
        pass
    my_columns.append(col)
df = pd.DataFrame(
    rows,
    columns = my_columns
)
But this didn't work because I have the same issue: there is no loop over the setups, so 6 column names are passed for 10 data columns. I'm not sure if I am just not looping over the columns properly, or if I am making everything more complicated than it needs to be.
This is what I am trying to accomplish without having to explicitly name the columns because this is just sample data. There could end up being 3, 4, however many setups in the actual app.
what I would like the output to look like
I don't know if this is the most efficient way of doing something like this but I think this is what you want to achieve.
def create_columns(df):
    fixed_cols = ['#', 'Claims']  # leading columns that appear only once
    repeated_cols = 4  # here is the number of columns you need to repeat for every setup
    new_cols = list(fixed_cols)
    for i in range(len(df.columns) - len(fixed_cols)):
        idx = 1 + i // repeated_cols
        per_setup = [f'Setup_{idx}', f'Consideration_Set_{idx}', 'Reach', 'Favorite']
        new_cols.append(per_setup[i % repeated_cols])
    return new_cols
df.columns = create_columns(df)
If your data comes as a CSV file, try pd.read_csv() to create the dataframe.
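To check the naming against the row layout in the question (two leading values, then four values per setup), here is a small sketch that builds the names from the number of setups and passes them straight to the constructor. The helper name build_columns and the sample rows are made up for illustration; in the real route the count would presumably be len(setups).

```python
import pandas as pd

def build_columns(n_setups):
    """Return '#', 'Claims', then four column names for each setup."""
    columns = ["#", "Claims"]
    for idx in range(1, n_setups + 1):
        columns += [f"Setup {idx}", f"Consideration Set {idx}", "Reach", "Favorite"]
    return columns

nan = float("nan")
# Hypothetical rows for two setups: 2 + 4 * 2 = 10 values each.
rows = [
    [1, "claim a", "X", "X", "50%", "20%", "X", nan, "40%", "10%"],
    [2, "claim b", nan, "X", nan, nan, "X", "X", "30%", "5%"],
]

# pandas accepts the repeated "Reach" / "Favorite" labels as they are.
df = pd.DataFrame(rows, columns=build_columns(2))
```

Because "Reach" and "Favorite" repeat, df["Reach"] returns every column with that label; adding the setup number to those names as well would make each column individually addressable.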

How to create new columns of the last 5 sale prices in a dataframe

I have a pandas dataframe of sneaker sales, which looks like this:
I added columns last1, ..., last5 indicating the last 5 sale prices of the sneakers and made them all None. I'm trying to update the values of these new columns using the 'Sale Price' column. This is my attempt to do so,
for index, row in df.iterrows():
    if (index==0):
        continue
    for i in range(index-1, -1, -1):
        if df['Sneaker Name'][index] == df['Sneaker Name'][i]:
            df['last5'][index] = df['last4'][i]
            df['last4'][index] = df['last3'][i]
            df['last3'][index] = df['last2'][i]
            df['last2'][index] = df['last1'][i]
            df['last1'][index] = df['Sale Price'][i]
            continue
    if (index == 100):
        break
When I ran this, I got a warning,
A value is trying to be set on a copy of a slice from a DataFrame
and the result is also wrong.
Does anyone know what I did wrong?
Also, this is the expected output,
Use this instead of the for loop, if your rows are sorted:
df['last1'] = df['Sale Price'].shift(1)
df['last2'] = df['last1'].shift(1)
df['last3'] = df['last2'].shift(1)
df['last4'] = df['last3'].shift(1)
df['last5'] = df['last4'].shift(1)
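Two points may be worth adding. The warning comes from chained indexing such as df['last5'][index] = ..., which can write to a temporary copy; a single df.loc[index, 'last5'] = ... avoids that. Also, a plain shift carries prices from one sneaker into the rows of the next, whereas the loop in the question compares 'Sneaker Name'. A minimal sketch that shifts within each sneaker, assuming the rows are already in chronological order and using made-up sample data:

```python
import pandas as pd

# Made-up sales, oldest first, with the column names from the question.
df = pd.DataFrame({
    "Sneaker Name": ["A", "B", "A", "A", "B"],
    "Sale Price": [100, 200, 110, 120, 210],
})

# shift(k) inside each group looks back k sales of the same sneaker only.
for k in range(1, 6):
    df[f"last{k}"] = df.groupby("Sneaker Name")["Sale Price"].shift(k)
```

Rows without enough earlier sales of the same sneaker are left as NaN.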

How to build a dataframe from scratch while filling in missing data? (details included in question)

I have a dataframe which looks like the following (it is named relevantdata in the code; image below):
I want the dataframe to be transformed to the following format:
Essentially, I want the confirmed number for each Key for every date available in the dataframe. If a particular date is not available for a Key, that value should be zero.
Currently my code is as follows. A try/except block is used because some Keys don't have the whole range of dates, so a KeyError occurs the first time countrydata.at[date,'Confirmed'] is evaluated for a missing date; the except block then enters zero into the dictionary for that date:
relevantdata = pandas.read_csv('https://raw.githubusercontent.com/open-covid-19/data/master/output/data_minimal.csv')
dates = relevantdata['Date'].unique().tolist()
covidcountries = relevantdata['Key'].unique().tolist()
data = dict()
data['Country'] = covidcountries
confirmeddata = relevantdata[['Date','Key','Confirmed']]
for country in covidcountries:
    for date in dates:
        countrydata = confirmeddata.loc[lambda confirmeddata: confirmeddata['Key'] == country].set_index('Date')
        try:
            if (date in data.keys()) == False:
                data[date] = list()
                data[date].append(countrydata.at[date,'Confirmed'])
            else:
                data[date].append(countrydata.at[date,'Confirmed'])
        except:
            if (date in data.keys()) == False:
                data[date].append(0)
            else:
                data[date].append(0)
finaldf = pandas.DataFrame(data = data)
While the above code gets the dataframe into the format I require, it is far too slow, because it loops through every key and date. I want to know if there is a better and faster way of doing the same without a nested for loop. Thank you for all your help.
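A reshape like this is usually done with a pivot rather than a loop. The sketch below uses a tiny made-up frame with the same three columns instead of the real CSV, and assumes each (Key, Date) pair appears at most once; aggfunc="first" is only there to say how a repeated pair would be resolved.

```python
import pandas as pd

# Made-up stand-in for relevantdata[['Date', 'Key', 'Confirmed']].
relevantdata = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-02", "2020-03-01"],
    "Key": ["US", "US", "IN"],
    "Confirmed": [10, 20, 5],
})

# One row per Key, one column per Date; missing combinations become 0.
finaldf = relevantdata.pivot_table(
    index="Key", columns="Date", values="Confirmed",
    aggfunc="first", fill_value=0,
)
finaldf = finaldf.reset_index().rename(columns={"Key": "Country"})
finaldf.columns.name = None  # drop the leftover "Date" axis label
```

Unlike the loop, this orders the rows by Key and the columns by Date, so the order may differ from the order in which values first appear in the file.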

Must pass DataFrame with boolean values only after 71 iterations

I've checked other posts and haven't found a solution to my problem. I'm getting the error in the title after the code has been running fine for a while.
I'm simply trying to add a row to a holder dataframe that only appends rows that aren't similar to previously appended rows. You'll see that friend is checked against 'Target' and Target against 'Friend' in the query.
It iterates 71 times before giving me the error. 'cur' is the iterator, which is not included in this section of code. Here's the code:
same = df[(df['Source']==cur) & (df['StratDiff']==0)]
holder = pd.DataFrame(index=['pbp'],columns=['Source', 'Target', 'Friend', 'SS', 'TS', 'FS'])
holder.iloc[0:0]
i=1
for index, row in same.iterrows():
    Target = row['Target']
    stratcur = row['SourceStrategy']
    strattar = row['TargetStrategy']
    sametarget = df[(df['Source']==Target)]
    samejoin = pd.merge(same, sametarget, how='inner', left_on=['Target'],
                        right_on = ['Target'])
    for index, row in samejoin.iterrows():
        Friend = row['Target']
        stratfriend = row['TargetStrategy_x']
        #print(cur, Friend, Target)
        temp = holder[holder[(holder['Source']==cur) &
                             (holder['Target']==Friend) & (holder['Friend']==Target)]]
        if temp.isnull().values.any():
            holder.loc[i] = [cur,Target,Friend,stratcur,strattar,stratfriend]
            print(i, cur)
            i=i+1
I just want to update everyone. I was able to solve this. It took a while, but the problem was in the line where I query holder. It was too complex. I simplified it into multiple, simpler queries. It works fine now.
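The simplified queries themselves were not posted. A likely reading of the error is that holder[holder[mask]] indexes holder with a DataFrame of ordinary values rather than booleans, which pandas rejects as soon as the inner selection is non-empty; that would explain why it only failed once a matching row existed. The sketch below shows one way to split the check into a plain boolean mask. It assumes the intent was to skip a triple whose mirror (Target and Friend swapped) is already stored; has_mirror and the sample triples are made up.

```python
import pandas as pd

holder = pd.DataFrame(columns=["Source", "Target", "Friend"])

def has_mirror(holder, source, target, friend):
    """True if holder already holds this triple with Target and Friend swapped."""
    mask = (
        (holder["Source"] == source)
        & (holder["Target"] == friend)
        & (holder["Friend"] == target)
    )
    return bool(mask.any())

# Made-up triples; the second one mirrors the first and should be skipped.
triples = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "d")]
for source, target, friend in triples:
    if not has_mirror(holder, source, target, friend):
        holder.loc[len(holder)] = [source, target, friend]
```

Testing mask.any() on a boolean Series keeps the condition explicit and avoids indexing the frame twice.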

Google chart input data

I have a python script to build inputs for a Google chart. It correctly creates column headers and the correct number of rows, but repeats the data for the last row in every row. I tried explicitly setting the row indices rather than using a loop (which wouldn't work in practice, but should have worked in testing). It still gives me the same values for each entry. I also had it working when I had this code on the same page as the HTML user form.
end1 = number of rows in the data table
end2 = number of columns in the data table represented by a list of column headers
viewData = data stored in database
c = connections['default'].cursor()
c.execute("SELECT * FROM {0}.\"{1}\"".format(analysis_schema, viewName))
viewData=c.fetchall()
curDesc = c.description
end1 = len(viewData)
end2 = len(curDesc)
Creates column headers:
colOrder=[curDesc[2][0]]
if activityOrCommodity=="activity":
    tableDescription={curDesc[2][0] : ("string", "Activity")}
elif (activityOrCommodity == "commodity") or (activityOrCommodity == "aa_commodity"):
    tableDescription={curDesc[2][0] : ("string", "Commodity")}
for i in range(3,end2 ):
    attValue = curDesc[i][0]
    tableDescription[curDesc[i][0]]= ("number", attValue)
    colOrder.append(curDesc[i][0])
Creates row data:
data=[]
values = {}
for i in range(0,end1):
    for j in range(2, end2):
        if j == 2:
            values[curDesc[j][0]] = viewData[i][j].encode("utf-8")
        else:
            values[curDesc[j][0]] = viewData[i][j]
    data.append(values)
dataTable = gviz_api.DataTable(tableDescription)
dataTable.LoadData(data)
return dataTable.ToJSon(columns_order=colOrder)
An example javascript output:
var dt = new google.visualization.DataTable({cols:[{id:'activity',label:'Activity',type:'string'},{id:'size',label:'size',type:'number'},{id:'compositeutility',label:'compositeutility',type:'number'}],rows:[{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]}]}, 0.6);
It seems you're appending values to data, but values is never reset between iterations, so every entry in data refers to the same dictionary and ends up showing the last row.
I assume this is not intended? If so, just move the values = {} line inside the first for loop in your row-building code.
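The effect can be reproduced without the database or gviz_api. In this sketch the column names and rows are made up; the first loop shares one dictionary across all rows, the second creates a fresh one per row.

```python
columns = ["activity", "size"]          # hypothetical column names
view_data = [("A", 1.0), ("B", 2.0)]    # hypothetical rows from fetchall()

# One dict created outside the loop: each append stores the same object,
# so later rows overwrite what earlier entries appear to contain.
shared = {}
buggy = []
for record in view_data:
    for name, value in zip(columns, record):
        shared[name] = value
    buggy.append(shared)

# A new dict per row: each entry keeps its own values.
fixed = []
for record in view_data:
    values = {}
    for name, value in zip(columns, record):
        values[name] = value
    fixed.append(values)

print(buggy)  # both entries show the last row
print(fixed)  # one entry per row
```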
