I have a dataframe that looks like this:
It looks a little weird because there is a blank space under 'product_id', but it is a dataframe. I tested it with this method.
if isinstance(prod_names, pd.DataFrame):
    print('DF')
The DF comes from a count function.
prod_names = pd.DataFrame(df.groupby('product_name')['product_id'].count().sort_values(ascending=False).head(20))
Now, I am trying to plot the results, like this.
pd.value_counts(prod_names['product_name']).plot.bar()
When I run that line of code, I get this error:
KeyError: 'product_name'
When I list the field names in the 'prod_names' dataframe
list(prod_names)
I see only: ['product_id']
For some reason, the 'product_name' field is missing. It may have something to do with the space under the 'product_id', but I'm not sure. Thoughts?
Most likely product_name has become the index; I don't see any other index in your output. In that case you cannot access it by column name. You can either call prod_names.reset_index(), which moves the product names into an ordinary column and gives the frame a new numeric index, or read them directly from prod_names.index. In the first case you can keep your plotting call and apply it to the result of reset_index(); in the second you can change it to something like pd.value_counts(prod_names.index).plot.bar()
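To make the point about the index concrete, here is a minimal sketch. The sample data is made up for illustration (it is not the asker's data); only the groupby line follows the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's df.
df = pd.DataFrame({
    "product_name": ["apple", "apple", "pear", "kiwi", "apple", "pear"],
    "product_id": [1, 1, 2, 3, 1, 2],
})

# Same construction as in the question: the group key ends up as the index.
prod_names = pd.DataFrame(
    df.groupby("product_name")["product_id"]
    .count()
    .sort_values(ascending=False)
    .head(20)
)

print(list(prod_names))       # only the counted column is a real column
print(prod_names.index.name)  # the group key lives in the index

# Option 1: move the index back into an ordinary column.
flat = prod_names.reset_index()
print(list(flat))

# Option 2: read the names straight from the index.
names = prod_names.index.tolist()
```

After reset_index(), flat['product_name'] works like any other column lookup.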
I am making a data frame by concatenating several data frames. The code is given below.
summary_FR =pd.concat([Chip_Cur_Summary_funct_mode2,Noise_Summary_funct_mode2,VCM_Summary_funct_mode2,Sens_Summary_funct_mode2,Vbias_Summary_funct_mode2,vcm_delta_Summary_funct_mode2,THD_FUN_M2,F_LOW_FUNC_Summary_mode2,OSC_FUNC_Summary_mode2,FOSC_FUNC_Summary_mode2,VREF_CP_FUNC_Summary_mode2,Summary_PSRR_1KHz_funct_mode2,Summary_PSRR_20Hzto20KHz_funct_mode2])
The image of the table is given below. You can see that the first column doesn't have a name. I need to name it Parameter and make it a unique index column.
I tried the below code to set the name as 'Parameter' and I failed.
summary_FR.columns = ["Parameters", "SPEC_MIN", "SPEC_TYP", "SPEC_MAX","min","mean","max","std","Units","Remarks"]
# summary_FR.set_index()
May I know where I went wrong? Can someone please help me?
It would help to share the error message, but that is probably an index column. You need to call reset_index on the concatenated dataframe, like this:
summary_FR =pd.concat([Chip_Cur_,...,Summar]).reset_index()
Then you can change the column names.
You can give a name for the index in the following way:
your_dataframe.index.name = 'Parameter'
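Both suggestions can be sketched side by side. The two small frames below are hypothetical placeholders for the asker's summary tables, assuming the parameter names sit in the index of each one.

```python
import pandas as pd

# Hypothetical summary frames; the parameter names are the index labels.
a = pd.DataFrame({"min": [1.0], "max": [2.0]}, index=["Chip_Cur"])
b = pd.DataFrame({"min": [0.1], "max": [0.3]}, index=["Noise"])

summary = pd.concat([a, b])

# Option 1: keep the labels as the index and just give the index a name.
named = summary.copy()
named.index.name = "Parameter"

# Option 2: turn the index into a regular column called "Parameter".
flat = summary.rename_axis("Parameter").reset_index()

print(named)
print(flat)
```

Option 1 matches the request for a named index column; option 2 is the route to take if the full list of column names is going to be assigned afterwards.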
I am new to Python and I want to access some rows for an already grouped dataframe (used groupby).
However, I am unable to select the row I want and would like your help.
The code I used for groupby is shown below:
language_conversion = house_ads.groupby(['date_served','language_preferred']).agg({'user_id':'nunique',
'converted':'sum'})
language_conversion
Result shows:
For example, I want to access the number of Spanish-speaking users who received house ads using:
language_conversion[('user_id','Spanish')]
This gives me KeyError('user_id','Spanish').
I get the same error when I try to create a new column.
Thanks for your help
Use this,
language_conversion.loc[(slice(None), 'Arabic'), 'user_id']
You can see the index entries (in this case tuples of length 2) using language_conversion.index
You should use this:
language_conversion.loc[(slice(None),'Spanish'), 'user_id']
slice(None) here selects every value in the date level of the index.
If you have one particular date in mind, just replace slice(None) with that date.
The error occurs because you put the column name before the index labels: with a tuple key, pandas looks for a column labelled ('user_id', 'Spanish'), which does not exist. Follow the link to learn more about indexing.
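As a rough illustration of the .loc pattern from both answers, here is a small sketch. The house_ads rows are invented for the example; the groupby call mirrors the one in the question. The xs call is an alternative that is not mentioned in the answers.

```python
import pandas as pd

# Hypothetical sample of the asker's house_ads data.
house_ads = pd.DataFrame({
    "date_served": ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"],
    "language_preferred": ["English", "Spanish", "Spanish", "Spanish", "English"],
    "user_id": ["u1", "u2", "u3", "u2", "u4"],
    "converted": [True, False, True, False, False],
})

language_conversion = house_ads.groupby(["date_served", "language_preferred"]).agg(
    {"user_id": "nunique", "converted": "sum"}
)

# Row selector first (all dates, Spanish only), column name second.
spanish_users = language_conversion.loc[(slice(None), "Spanish"), "user_id"]

# Alternative: a cross-section on the language level, which drops that level.
spanish_by_date = language_conversion.xs("Spanish", level="language_preferred")["user_id"]

print(spanish_users)
print(spanish_by_date)
```

The .loc form keeps both index levels in the result, while xs leaves only the dates, which can be handier for plotting or further arithmetic.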
Apologies for the fairly basic question.
Basically I have a large dataframe where I'm pulling out the top dates for the sum of certain values. Looks like this:
hv_toploss = hv.groupby(['END_VALID_DT']).sum()
hv_toploss=hv_toploss.sort_values('TOTALPL',ascending=False).iloc[:10]
hv_toploss['END_VALID_DT'] = pd.to_datetime(hv_toploss['END_VALID_DT'])
Now, END_VALID_DT becomes the index of hv_toploss, and I get a KeyError when running line 3. If I try to reindex, I get a multi-index error, and since these are the values I need, I can't just drop the index.
I will be calling these values in a line like:
PnlByDay = PnlByDay.loc[hv_toploss['END_VALID_DT']]
Any help here would be great. I'm still a novice using Python.
You can use the index directly instead of creating another column containing the index.
the_dates = hv_toploss.sort_values('TOTALPL',ascending=False).iloc[:10].index
PnlByDay.loc[PnlByDay.index.isin(the_dates)]
I don't know the structure of PnlByDay, so you may have to modify that part.
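A minimal sketch of this approach, assuming PnlByDay is indexed by date (the question does not show its structure, so that part and all the numbers are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for hv.
hv = pd.DataFrame({
    "END_VALID_DT": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03"],
    "TOTALPL": [5.0, 1.0, 10.0, 2.0],
})
# Convert before grouping, so the resulting index already holds datetimes.
hv["END_VALID_DT"] = pd.to_datetime(hv["END_VALID_DT"])

hv_toploss = hv.groupby("END_VALID_DT").sum()

# Take the top dates from the index; no extra column is needed.
the_dates = hv_toploss.sort_values("TOTALPL", ascending=False).iloc[:2].index

# Hypothetical PnlByDay, assumed to be indexed by date.
PnlByDay = pd.DataFrame(
    {"pnl": [1, 2, 3]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
selected = PnlByDay.loc[PnlByDay.index.isin(the_dates)]
print(selected)
```

Note that isin keeps the rows in PnlByDay's own order rather than in the ranking order of the_dates.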
Ok I got around this by just copying the index values into a new column and using that.
hv_toploss = hv.groupby(['END_VALID_DT']).sum()
hv_toploss['Scenario_Dates'] = hv_toploss.index
hv_toploss=hv_toploss.sort_values('TOTALPL',ascending=False).iloc[:10]
However, if anyone can advise on how to do this properly, please do.
I have a dataframe which looks like this
I tried to delete matchId for preprocessing, but no matter what I use to delete it, it outputs this error:
KeyError: "['matchId'] not found in axis"
What you attempted (which you should have mentioned in the question) is probably failing because you assume that matchId is a normal column. It is actually the index, so it cannot be accessed the way other columns can.
As suggested by anky_91, because of that, you should do
df = df.reset_index(drop=True)
if you want to discard the current index entirely. This replaces it with the default integer index. To turn the index into an ordinary column instead, remove drop=True from the statement above.
Your table will always have an index, however, so you cannot get rid of it completely.
You can, however, output it with
df.values
and this will ignore the indexes and show just the values as arrays.
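The difference between the two reset_index variants can be seen in a short sketch. The frame below is hypothetical; it only assumes, as the answer does, that matchId is the index.

```python
import pandas as pd

# Hypothetical frame in which matchId is the index, not a column.
df = pd.DataFrame(
    {"kills": [3, 0], "assists": [1, 2]},
    index=pd.Index([101, 102], name="matchId"),
)

# df.drop(columns=["matchId"]) would raise a KeyError here,
# because matchId is not among df.columns.

dropped = df.reset_index(drop=True)  # index discarded, replaced by 0..n-1
kept = df.reset_index()              # index moved into a regular column

print(dropped)
print(kept)
print(df.values)  # plain NumPy array of the data, without the index
```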
I have a project in which I need to be able to drop a line in a dataframe. However, whenever I try, I get an error.
I've tried changing the order of the things in df.drop. I've also tried changing the type of the file to csv without success. And now I can't change it anymore.
import pandas as pd
df = pd.read_csv('Partitions.csv', index_col = 0)
choice = int(input("Which do you want to delete?"))
df.drop([choice], inplace = True)
df.to_csv('Partitions.csv')
Partitions.csv:
,Composer,Title,
0,Beethoven, Fur Elise
1,Mozart,Symphony 2
I would like to be able to delete any line from the csv file but I always seem to get "Key Error: "['choice'] not found in axis"
I am assuming you want to drop a row by position (as with iloc), i.e. by its serial number. This can be done in a roundabout way.
df.drop(df.index[i], inplace=True)
Edit - Reason behind
pandas.DataFrame.drop works on labels by default, i.e. index labels or column names. It has no direct way to drop by position, so we pass the index label of the row we want to drop, which df.index[i] gives us for the i-th row from the top (counting from 0).
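The label versus position distinction is easier to see when the index labels are not 0, 1, 2. The sketch below uses made-up labels 10 and 20 and an in-memory CSV in place of Partitions.csv; both are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV with index labels that differ from row positions.
csv_text = "id,Composer,Title\n10,Beethoven,Fur Elise\n20,Mozart,Symphony 2\n"
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

i = 1  # position of the row to remove, counting from 0

# Drop by position: look up the label at position i, then drop that label.
by_position = df.drop(df.index[i])

# Drop by label: pass the index label itself.
by_label = df.drop(10)

print(by_position)
print(by_label)
```

With the default 0..n-1 index the two calls coincide, which is why df.drop([choice]) can work when choice is an integer that exists in the index.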