I have a pandas DataFrame and I want to delete the rows that have a specific value in a column, but only within the first 10 rows.
Something like this:
del df[df.value == 5][:10]
but it's not working. Does anyone know the proper syntax?
Thanks.
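`del` on a DataFrame removes a column, not rows, which is why the attempt above fails. One way to drop matching rows only among the first 10 is sketched below. It assumes the column is named `value` and that the index labels are unique (the drop is done by label); the sample data is made up.

```python
import pandas as pd

# Made-up sample data; "value" is the column from the question
df = pd.DataFrame({"value": [5, 1, 5, 2, 5, 3, 4, 5, 6, 5, 5, 5]})

# Compare only the first 10 rows (by position) against the target value
head_mask = df["value"].iloc[:10] == 5

# Drop the labels of the matching rows; rows from position 10 onward are kept even if they equal 5
df = df.drop(index=head_mask[head_mask].index)
print(df["value"].tolist())  # [1, 2, 3, 4, 6, 5, 5]
```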
I have a DataFrame with a single column containing repeating values. I want to display a table of the unique values in that column along with their counts. I know it involves some sort of groupby, but I could not get it to work. Please help.
Try:
df.groupby("PdDistrict").size()
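Here `PdDistrict` is the name of the column being counted. A short sketch with made-up data, showing `groupby(...).size()` next to `Series.value_counts()` (same counts, sorted from most to least frequent) and one way to turn the result into a two-column table:

```python
import pandas as pd

# Made-up data; "PdDistrict" is the column name used in the answer
df = pd.DataFrame({"PdDistrict": ["NORTH", "SOUTH", "NORTH", "BAYVIEW", "NORTH", "SOUTH"]})

# Count rows per unique value; the result is sorted by the group key
counts = df.groupby("PdDistrict").size()

# Same counts, sorted by frequency (most common first)
counts_vc = df["PdDistrict"].value_counts()

# Two-column table: the unique value and its count
table = counts.rename("count").reset_index()
print(table)
```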
I am trying to get a list of the index labels for 500,000 values out of 2,000,000 entries in a pandas DataFrame. The values are in a specific column named "entity_id" (out of 1000+ columns). My solution so far is the following code:
index_names_list= []
for id in id_dataframe:
index_names_list.append(full_data[full_data['entity_id'] == id ].index.values)
However, this runs very slowly. Can anyone suggest a faster, more efficient way of doing it?
Try this:
full_data.loc[full_data['entity_id'].isin(id_dataframe),:].index.tolist()
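The loop is slow because each iteration scans the whole `entity_id` column again, while `Series.isin` tests membership in a single pass. A sketch on a tiny stand-in frame (the data is made up, and `id_dataframe` is assumed to be a list-like of ids). Note that this returns one flat list in row order, whereas the loop built one array per id; if that grouping matters, `groupby(...).groups` can rebuild it.

```python
import pandas as pd

# Tiny stand-in for full_data; only the "entity_id" column matters here
full_data = pd.DataFrame(
    {"entity_id": [10, 20, 30, 20, 40]},
    index=["a", "b", "c", "d", "e"],
)
id_dataframe = [20, 40]  # assumed: a list-like of the ids to look up

# One vectorized membership test over the column instead of one scan per id
mask = full_data["entity_id"].isin(id_dataframe)
index_names = full_data.loc[mask].index.tolist()
print(index_names)  # ['b', 'd', 'e']

# Optional: index labels grouped by id, closer to what the original loop built
per_id = full_data.loc[mask].groupby("entity_id").groups
print(list(per_id[20]))  # ['b', 'd']
```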
I have a DataFrame with four columns and want to generate a new DataFrame with only one column containing the maximum value of each row.
Using df2 = df1.max(axis=1) gave me the correct results, but the column is titled 0 and is not operable: I cannot check its data type or change its name, which is critical for further processing. Does anyone know what is going on here? Or better yet, is there a better way to generate this new DataFrame?
The result is a Series, not a DataFrame. For a one-column DataFrame, use Series.to_frame:
df2 = df1.max(axis=1).to_frame('maximum')
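A quick check with made-up data: `max(axis=1)` returns a Series, and `to_frame` turns it into a one-column DataFrame whose column name and dtype can be inspected or changed like any other.

```python
import pandas as pd

# Made-up four-column frame
df1 = pd.DataFrame({"a": [1, 7, 3], "b": [4, 2, 9], "c": [0, 5, 6], "d": [8, 1, 2]})

row_max = df1.max(axis=1)          # Series: one maximum per row
df2 = row_max.to_frame("maximum")  # one-column DataFrame named "maximum"

print(type(row_max).__name__, type(df2).__name__)  # Series DataFrame
print(df2["maximum"].dtype)                        # the dtype is now easy to inspect
df2 = df2.rename(columns={"maximum": "row_max"})   # renaming works as usual
print(df2["row_max"].tolist())                     # [8, 7, 9]
```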
I am trying to create a DataFrame from the result of a simple if statement, with no success. Could you show me the right method, please? This is what I have so far, but the value of discrep is not added to the DataFrame.
discrepancy_value=round(system_availability.iloc[0,0]-data_av.iloc[0,0],2)
discrep=[]
if discrepancy_value>1:
discrep=discrepancy_value
else:
discrep=r'Discrepancy is not significant'
discrepancy=pd.DataFrame()
discrepancy['Discrepancy']=discrep
The problem is that you are assigning a single value to a column of a DataFrame that has no rows. In that case the DataFrame needs a list (one element per row), not a scalar.
What you should be doing is:
discrep=[]
if discrepancy_value>1:
discrep.append(discrepancy_value)
else:
discrep.append(r'Discrepancy is not significant')
discrepancy=pd.DataFrame()
discrepancy['Discrepancy']=discrep
On one line:
discrepancy = pd.DataFrame({'Discrepancy': [discrepancy_value if discrepancy_value > 1 else r'Discrepancy is not significant']})
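The same logic wrapped in a small function so both branches can be checked (the name `make_discrepancy_frame` is hypothetical). The key point is that the value is wrapped in a list, which gives the new DataFrame exactly one row.

```python
import pandas as pd

def make_discrepancy_frame(discrepancy_value):
    """Hypothetical helper: one-row DataFrame with a 'Discrepancy' column."""
    # Keep the number if it exceeds the threshold of 1, otherwise store a message
    value = discrepancy_value if discrepancy_value > 1 else "Discrepancy is not significant"
    # Wrapping the value in a list gives the column exactly one row
    return pd.DataFrame({"Discrepancy": [value]})

print(make_discrepancy_frame(2.35))
print(make_discrepancy_frame(0.4))
```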
You are trying to set a column on an empty DataFrame with 0 rows. If the DataFrame already had rows, the following would assign the same value to every row:
discrepancy['Discrepancy']=discrep
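But because the DataFrame has no rows, the column is created empty and the value is not stored in any row.
You could append a new row with the column value instead. DataFrame.append returned a new DataFrame rather than modifying the original, and it was removed in pandas 2.0, so use pd.concat:
discrepancy = pd.concat([discrepancy, pd.DataFrame([{'Discrepancy': discrep}])], ignore_index=True)
Or add the row when you create the DataFrame: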
discrepancy=pd.DataFrame([{'Discrepancy': discrep}])
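A sketch of the empty-DataFrame behavior described in this answer, with a made-up value for `discrep`. It also shows `pd.concat`, which replaces `DataFrame.append` (removed in pandas 2.0) for adding a row to an existing frame.

```python
import pandas as pd

discrep = 2.35  # made-up example value

# Scalar assigned to a column of an empty DataFrame: the column exists, but with 0 rows
empty = pd.DataFrame()
empty["Discrepancy"] = discrep
print(list(empty.columns), len(empty))  # ['Discrepancy'] 0

# Adding a row: concatenate a one-row frame built from a list of row dicts
appended = pd.concat([empty, pd.DataFrame([{"Discrepancy": discrep}])], ignore_index=True)
print(appended["Discrepancy"].tolist())  # [2.35]

# Or build the row directly in the constructor
one_row = pd.DataFrame([{"Discrepancy": discrep}])
print(len(one_row))  # 1
```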
I have a DataFrame df with a column called "Num_of_employees", which has values like 50-100, 200-500, etc. A few values in my data are wrong. Wherever the employee number should be 1-10, the data has 10-Jan, and wherever it should be 11-50, the data has Nov-50. How can I fix this using pandas?
A clean way to do this kind of find-and-replace is to pass a dict to replace:
df.Num_of_employees = df.Num_of_employees.replace({"10-Jan": "1-10",
"Nov-50": "11-50"})
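A self-contained run of the dict-based replace on made-up values. `Series.replace` with a dict matches whole cell values exactly by default, so entries like `50-100` are left alone. This assumes the column holds strings; if the import turned those cells into actual dates, they would need different handling.

```python
import pandas as pd

# Made-up sample with the two mangled values from the question
df = pd.DataFrame({"Num_of_employees": ["50-100", "10-Jan", "200-500", "Nov-50", "10-Jan"]})

# Map each mangled value to the range it should have been; other values are untouched
fixes = {"10-Jan": "1-10", "Nov-50": "11-50"}
df["Num_of_employees"] = df["Num_of_employees"].replace(fixes)
print(df["Num_of_employees"].tolist())
# ['50-100', '1-10', '200-500', '11-50', '1-10']
```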