I have a data frame like the following.
I am trying to use the reshape function from the pandas package, and it keeps giving me the error
"the id variables need to uniquely identify each row".
Link to the data: https://pastebin.com/GzujhX3d
This is my code to reshape:
GG_long=pd.wide_to_long(data_GG,stubnames='time_',i=['Customer', 'date'], j='Cons')
The combination of 'Customer' and 'Date' uniquely identifies each row in my data, so I don't understand why it throws this error or how I can fix it. Any help is appreciated.
I identified the issue. The error was due to two things: first, the column names had ":" in them, and second, the format of the date. For some reason it doesn't accept dd-mm-yy, but it works with dd/mm/yy.
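For anyone hitting the same error, here is a minimal sketch of how the id columns could be checked for duplicates before reshaping. The small frame below is hypothetical stand-in data (the real data is behind the pastebin link), and the column names time_1 and time_2 are assumptions chosen so that the suffix after the separator is purely numeric.

```python
import pandas as pd

# Hypothetical miniature of the wide data; values are made up for illustration.
data_GG = pd.DataFrame({
    "Customer": ["A", "A", "B"],
    "date": ["01/01/20", "02/01/20", "01/01/20"],
    "time_1": [1.0, 2.0, 3.0],
    "time_2": [4.0, 5.0, 6.0],
})

# Rows whose (Customer, date) pair occurs more than once would trigger
# "the id variables need to uniquely identify each row".
dupes = data_GG[data_GG.duplicated(subset=["Customer", "date"], keep=False)]
print(dupes)  # empty here, so the ids are unique

# With stubnames="time" and sep="_", the part after the underscore
# becomes the new index level named by j.
GG_long = pd.wide_to_long(
    data_GG, stubnames="time", i=["Customer", "date"], j="Cons", sep="_"
)
print(GG_long)
```

If the suffixes are not plain digits, the suffix parameter of pd.wide_to_long takes a regular expression describing them.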
I am making a data frame by concatenating several data frames. The code is given below.
summary_FR =pd.concat([Chip_Cur_Summary_funct_mode2,Noise_Summary_funct_mode2,VCM_Summary_funct_mode2,Sens_Summary_funct_mode2,Vbias_Summary_funct_mode2,vcm_delta_Summary_funct_mode2,THD_FUN_M2,F_LOW_FUNC_Summary_mode2,OSC_FUNC_Summary_mode2,FOSC_FUNC_Summary_mode2,VREF_CP_FUNC_Summary_mode2,Summary_PSRR_1KHz_funct_mode2,Summary_PSRR_20Hzto20KHz_funct_mode2])
The image of the table is given below. You can see that the first column doesn't have a name. I need to name it Parameter and make it a unique index column.
I tried the code below to set the name to 'Parameter', but it failed.
summary_FR.columns = ["Parameters", "SPEC_MIN", "SPEC_TYP", "SPEC_MAX","min","mean","max","std","Units","Remarks"]
# summary_FR.set_index()
May I know where I went wrong? Can someone please help me?
It helps to share the error message, but that unnamed column is probably the index. You need to call reset_index on the concatenated dataframe, like
summary_FR =pd.concat([Chip_Cur_,...,Summar]).reset_index()
Then you can change the column names.
You can give a name for the index in the following way:
your_dataframe.index.name = 'Parameter'
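A short sketch of both suggestions side by side, using two tiny made-up frames in place of the real summaries (the row labels and values here are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-ins for two of the summary frames.
a = pd.DataFrame({"min": [1.0], "max": [2.0]}, index=["Chip_Cur"])
b = pd.DataFrame({"min": [3.0], "max": [4.0]}, index=["Noise"])

summary_FR = pd.concat([a, b])

# Option 1: keep the labels as the index and just give the index a name.
summary_FR.index.name = "Parameter"
print(summary_FR.index.is_unique)

# Option 2: turn the named index into an ordinary column.
flat = summary_FR.reset_index()
print(flat.columns.tolist())
```

Naming the index first means reset_index produces a column called Parameter rather than the default "index".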
I am new to Python and I want to access some rows for an already grouped dataframe (used groupby).
However, I am unable to select the row I want and would like your help.
The code I used for groupby is shown below:
language_conversion = house_ads.groupby(['date_served','language_preferred']).agg({'user_id':'nunique',
'converted':'sum'})
language_conversion
Result shows:
For example, I want to access the number of Spanish-speaking users who received house ads using:
language_conversion[('user_id','Spanish')]
This gives me KeyError('user_id','Spanish').
The same happens when I try to create a new column.
Thanks for your help.
Use this:
language_conversion.loc[(slice(None), 'Arabic'), 'user_id']
You can see the index entries (in this case tuples of length 2) using language_conversion.index.
You should use this:
language_conversion.loc[(slice(None),'Spanish'), 'user_id']
slice(None) here selects all rows in the date level of the index.
If you have one particular date in mind, just replace slice(None) with that specific date.
The error you are getting occurs because language_conversion[...] looks up columns, and ('user_id','Spanish') is not a column: 'Spanish' is part of the row index, so the rows have to be selected with .loc first. Follow the link to learn more about indexing.
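To make the selection concrete, here is a small self-contained sketch. The house_ads rows below are invented for illustration; only the column names come from the question. It also shows DataFrame.xs as an alternative way to pick one value from an index level.

```python
import pandas as pd

# Hypothetical sample rows; only the column names match the question.
house_ads = pd.DataFrame({
    "date_served": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"],
    "language_preferred": ["Spanish", "English", "Spanish", "Spanish"],
    "user_id": ["u1", "u2", "u3", "u4"],
    "converted": [True, False, True, False],
})

language_conversion = house_ads.groupby(
    ["date_served", "language_preferred"]
).agg({"user_id": "nunique", "converted": "sum"})

# Row selector first (all dates, language 'Spanish'), then the column.
spanish_users = language_conversion.loc[(slice(None), "Spanish"), "user_id"]
print(spanish_users)

# Alternative: take a cross-section on the language level, then pick the column.
spanish_xs = language_conversion.xs("Spanish", level="language_preferred")["user_id"]
print(spanish_xs)
```

The .loc version keeps both index levels in the result, while xs drops the language level and leaves only the dates.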
I have a dataframe that's the result of importing a csv and then performing a few operations and adding a column that's the difference between two other columns (column 10 - column 9 let's say). I am trying to sort the dataframe by the absolute value of that difference column, without changing its value or adding another column.
I have seen this syntax over and over all over the internet, with indications that it was a success (accepted answers, comments saying "thanks, that worked", etc.). However, I get the error you see below:
df.sort_values(by='Difference', ascending=False, inplace=True, key=abs)
Error:
TypeError: sort_values() got an unexpected keyword argument 'key'
I'm not sure why the syntax that I see working for other people is not working for me. I have a lot more going on in the code with other dataframes, so I don't think it's a pandas import problem.
I have moved on and just made a new column that is the absolute value of the difference column and sorted by that, and exclude that column from my export to worksheet, but I really would like to know how to get it to work the other way. Any help is appreciated.
I'm using Python 3
df.loc[(df.c - df.b).sort_values(ascending = False).index]
This sorts by the difference between "c" and "b" without creating a new column.
I hope this is what you were looking for.
key is an optional argument.
It accepts a Series as input; maybe you were working with a DataFrame.
Check this.
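For context, the key argument of sort_values was added in pandas 1.1.0, so an older installation raises exactly this TypeError. Below is a minimal sketch of a version-independent way to sort by absolute value without adding a column; the frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; 'Difference' mirrors the column described in the question.
df = pd.DataFrame({"b": [5, 1, 9], "c": [2, 8, 8]})
df["Difference"] = df["c"] - df["b"]  # -3, 7, -1

# Sort a temporary Series of absolute values, then reuse its index order.
# This does not rely on the `key` argument, so it also runs on older pandas.
order = df["Difference"].abs().sort_values(ascending=False).index
df_sorted = df.loc[order]
print(df_sorted)

# On pandas 1.1.0 or later, the one-liner from the question should work:
# df.sort_values(by="Difference", ascending=False, key=abs)
```

Printing pd.__version__ is a quick way to confirm which case applies.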
I am trying to calculate the mean of one column from an Excel file.
I delete all the null values and '-' entries in the column called 'TFD' and form a new dataframe by selecting three columns. I want to calculate the mean from the new dataframe with groupby, but I get the error "No numeric types to aggregate". I don't know why I get this error or how to fix it.
sheet=pd.read_excel(file)
sheet_copy=sheet
sheet_copy=sheet_copy[(~sheet_copy['TFD'].isin(['-']))&(~sheet_copy['TFD'].isnull())]
sheet_copy=sheet_copy[['Participant ID','Paragraph','TFD']]
means=sheet_copy['TFD'].groupby([sheet_copy['Participant ID'],sheet_copy['Paragraph']]).mean()
From your spreadsheet snippet above, it looks as though your Participant ID and Paragraph columns are stored as text, which leads me to believe they will be strings inside your dataframe. I believe this is precisely where your issue lies, given the exception "No numeric types to aggregate".
Following this, here are some good examples of groupby with mean from the pandas documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.mean.html
If I had the dataset to hand, I would have tried it out myself and provided a snippet of the code used.
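One more possibility worth checking, offered as an assumption since the spreadsheet is not available: because 'TFD' originally contained '-' entries, pandas may have read the whole column as object (string) dtype, and filtering those rows out does not change the dtype. A minimal sketch with made-up values that converts the column before aggregating:

```python
import pandas as pd

# Hypothetical rows standing in for the Excel sheet.
sheet = pd.DataFrame({
    "Participant ID": ["P1", "P1", "P1", "P2", "P2"],
    "Paragraph": [1, 1, 1, 1, 1],
    "TFD": ["100", "200", "-", "300", None],
})

sheet_copy = sheet[["Participant ID", "Paragraph", "TFD"]].copy()

# Turn anything non-numeric (such as '-') into NaN, then drop those rows.
sheet_copy["TFD"] = pd.to_numeric(sheet_copy["TFD"], errors="coerce")
sheet_copy = sheet_copy.dropna(subset=["TFD"])

# 'TFD' is now float64, so mean() has a numeric column to aggregate.
means = sheet_copy.groupby(["Participant ID", "Paragraph"])["TFD"].mean()
print(means)
```

Checking sheet_copy.dtypes before the groupby shows whether 'TFD' is still object.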
I have tried to write the following code, but I get the error message "ValueError: Cannot shift with no freq."
I have no idea how to fix it. I tried to google the error message but couldn't find any case similar to mine.
df is a pandas dataframe for which I want to create new columns showing the daily change. The code is shown below. How can I fix the code to avoid the ValueError?
for column_names in df:
    df[column_names+'%-daily'] = df[column_names].pct_change(freq=1).fillna(0)
The problem was that I had the date as the index. Since only weekdays were present, the delta became incorrect. It worked when I changed freq to periods.
for column_names in list(df.columns.values):
    df[column_names+'%-daily'] = df[column_names].pct_change(periods=1).fillna(0)
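A small runnable sketch of the periods-based version on a weekday-only date index. The price column and its values are hypothetical; the point is that periods=1 compares each row with the previous row regardless of the calendar gap over the weekend.

```python
import pandas as pd

# Hypothetical weekday-only index: Thursday, Friday, then Monday.
idx = pd.to_datetime(["2021-01-07", "2021-01-08", "2021-01-11"])
df = pd.DataFrame({"price": [100.0, 110.0, 99.0]}, index=idx)

# Snapshot the column names first, since the loop adds new columns.
for name in list(df.columns):
    # periods=1 means "one row back", not "one calendar day back".
    df[name + "%-daily"] = df[name].pct_change(periods=1).fillna(0)

print(df)
```

Monday's change is computed against Friday, which is usually what is wanted for trading-day data.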