Pandas reset index is not taking effect [duplicate] - python

This question already has answers here:
How to convert index of a pandas dataframe into a column
(9 answers)
Closed 1 year ago.
I'm not sure where I'm going astray, but I can't seem to reset the index on a DataFrame.
When I run test.head(), I get the output below:
As you can see, the dataframe is a slice, so the index is out of bounds.
What I'd like to do is reset the index for this DataFrame, so I ran test.reset_index(drop=True). This outputs the following:
That looks like a new index, but it isn't: when I run test.head() again, the index is still the same. Attempting to use apply with a lambda, or iterrows(), causes problems with the DataFrame.
How can I really reset the index?

reset_index by default does not modify the DataFrame; it returns a new DataFrame with the reset index. If you want to modify the original, use the inplace argument: df.reset_index(drop=True, inplace=True). Alternatively, assign the result of reset_index by doing df = df.reset_index(drop=True).
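A minimal sketch of the difference between the returned copy, reassignment and inplace=True. The frame test here is hypothetical sample data, sliced with iloc so that its index starts at 2 instead of 0:

```python
import pandas as pd

# Hypothetical data: slicing keeps the original labels 2, 3, 4
test = pd.DataFrame({"x": [10, 20, 30, 40, 50]}).iloc[2:]

# reset_index returns a NEW DataFrame; `test` itself is unchanged
result = test.reset_index(drop=True)
print(list(test.index))    # [2, 3, 4]
print(list(result.index))  # [0, 1, 2]

# Option 1: reassign the returned DataFrame
test = test.reset_index(drop=True)

# Option 2: modify in place (the call returns None, so do not assign it)
other = pd.DataFrame({"x": [10, 20, 30, 40, 50]}).iloc[2:]
other.reset_index(drop=True, inplace=True)
print(list(other.index))   # [0, 1, 2]
```

Reassignment is usually the clearer choice, since it makes the new object explicit and works the same way for chained method calls.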

BrenBarn's answer works.
The following, from this thread, also worked. That thread is less about troubleshooting and more a description of how to reset the index:
test = test.reset_index(drop=True)

As an extension of in code veritas's answer: instead of calling del at the end,
test = test.reset_index()
del test['index']
you can set drop=True:
test = test.reset_index(drop=True)
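A small sketch of why drop=True replaces the del step, using a hypothetical frame with index labels 5, 6, 7: without drop, the old unnamed index is kept as a new column called index.

```python
import pandas as pd

# Hypothetical frame whose index does not start at 0
test = pd.DataFrame({"x": [10, 20, 30]}, index=[5, 6, 7])

# Without drop, the old index is kept as a new column named "index"
with_col = test.reset_index()
print(list(with_col.columns))  # ['index', 'x']

# drop=True discards the old index, so no `del` is needed afterwards
dropped = test.reset_index(drop=True)
print(list(dropped.columns))   # ['x']
```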

I would add to in code veritas's answer:
If the index holds data you want to keep as a column, just call reset_index() without drop and skip the del. In my hypothetical example:
df_total_sales_customers = pd.DataFrame({'Sales': total_sales_customers['Sales'],
                                         'Customers': total_sales_customers['Customers']},
                                        index=total_sales_customers.index)
df_total_sales_customers = df_total_sales_customers.reset_index()
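A self-contained sketch of the same idea. total_sales_customers is not defined in the answer, so the data and the index name customer_id below are hypothetical stand-ins; when the index has a name, reset_index() turns it into a column with that name rather than index:

```python
import pandas as pd

# Hypothetical stand-in for total_sales_customers, with a named index
idx = pd.Index([101, 102, 103], name="customer_id")
df_total_sales_customers = pd.DataFrame(
    {"Sales": [250.0, 120.5, 80.0], "Customers": [5, 3, 2]},
    index=idx,
)

# The named index becomes an ordinary first column called "customer_id"
df_total_sales_customers = df_total_sales_customers.reset_index()
print(list(df_total_sales_customers.columns))  # ['customer_id', 'Sales', 'Customers']
```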

Related

Using Pandas function isin() [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 1 year ago.
Here is my problem. I have a DataFrame and I want to add a True/False column. This DataFrame contains the columns Référence, msn, description, ... I have another DataFrame containing a reference column called "AM" and other columns. The goal is to fill the new column with True when there is a match between the two tables on the reference field, and False otherwise.
Here is my Python code:
df["Avis BE"]=False
df[df["Référence"].isin(df1["AM"])]["Avis BE"]=True
I have this error message:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
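It's a warning rather than an error, but it matters: df[mask]["Avis BE"] = True sets the value on a temporary copy, so df itself is not updated. Select the rows and the column in a single .loc call instead: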
df.loc[:, "Avis BE"] = False
df.loc[df["Référence"].isin(df1["AM"]), "Avis BE"] = True
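A runnable sketch of the .loc approach. The contents of df and df1 are hypothetical sample data; only the column names come from the question:

```python
import pandas as pd

# Hypothetical data standing in for the two tables in the question
df = pd.DataFrame({"Référence": ["R1", "R2", "R3"], "msn": [1, 2, 3]})
df1 = pd.DataFrame({"AM": ["R2", "R3", "R9"]})

# Boolean mask: True where the reference also appears in df1["AM"]
mask = df["Référence"].isin(df1["AM"])

# One .loc call selects rows AND column, so df itself is modified
df["Avis BE"] = False
df.loc[mask, "Avis BE"] = True
print(df["Avis BE"].tolist())  # [False, True, True]
```

Because isin already returns booleans, df["Avis BE"] = df["Référence"].isin(df1["AM"]) produces the same column in one step.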
Also see the pandas documentation on indexing and setting values; it covers this issue and suggests better practices.

Pandas - Appending DataFrame [duplicate]

This question already has answers here:
df.append() is not appending to the DataFrame
(2 answers)
Closed 1 year ago.
When appending to a pandas DataFrame, the appended value doesn't get added to the DataFrame.
I am trying to make an empty DataFrame, and then be able to add more rows onto it, later in my code.
import pandas
df = pandas.DataFrame(columns=["A"])
df.append(pandas.DataFrame([[1]], columns=["A"]))
print(df)
Output:
Empty DataFrame
Columns: [A]
Index: []
Any ideas what I might be doing wrong?
According to the documentation, this should add a new row with value 1 under column A. However, as shown above, no row is appended.
As @HenryEcker mentioned, append returns a copy of the DataFrame with the new values; it does not modify df in place. Your code should be:
import pandas
df = pandas.DataFrame(columns=["A"])
df = df.append(pandas.DataFrame([1], columns=['A']))
print(df)
Output:
A
0 1
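Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the code above raises AttributeError on current versions. A sketch of the usual replacements, using hypothetical values: collect rows in a list and build the DataFrame once, and use pd.concat to add rows to an existing frame. Like append, pd.concat returns a new object, so the result must be reassigned.

```python
import pandas as pd

# Collect rows in a plain Python list, then build the DataFrame once
rows = []
rows.append({"A": 1})
rows.append({"A": 2})
df = pd.DataFrame(rows, columns=["A"])

# To add more rows to an existing DataFrame later, use pd.concat;
# it returns a NEW DataFrame, so reassign the result
more = pd.DataFrame({"A": [3]})
df = pd.concat([df, more], ignore_index=True)
print(df["A"].tolist())  # [1, 2, 3]
```

Building from a list is also cheaper than concatenating one row at a time, since each concat copies the whole frame.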

How to convert index and values to a proper dataframe with callable column names? Python Pandas [duplicate]

This question already has answers here:
Accessing a Pandas index like a regular column
(3 answers)
Closed 1 year ago.
I am working on a dataset where I used value_counts() to get two distinct columns: the index and the values. I can even rename the index and the values column. However, after renaming and assigning everything to a new DataFrame, I can't access the index column by the name I gave it.
pm_freq = df["payment_method"].value_counts()
pm_freq = pm_freq.head().to_frame()
pm_freq.index.name="Method"
pm_freq.rename(columns={"payment_method":"Frequency"},inplace=True)
pm_freq
Now I want to call it like this:
pm_freq["Method"]
But there's an error which states:
KeyError: 'Method'
Can someone please help me out with this?
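Method is the name of the index, not a column, so pm_freq["Method"] raises KeyError. Read it with pm_freq.index, or turn the index into a regular column with pm_freq = pm_freq.reset_index(). See also the comment here (it may be outdated):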
https://stackoverflow.com/a/18023468/15600610
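A sketch of both options on hypothetical payment data. Naming the index with rename_axis and the counts column explicitly (via to_frame or reset_index(name=...)) avoids depending on the default names, which changed in pandas 2.0: there the counts Series is named count rather than payment_method, so the question's rename call would silently do nothing.

```python
import pandas as pd

# Hypothetical data standing in for the question's df
df = pd.DataFrame(
    {"payment_method": ["card", "cash", "card", "paypal", "card", "cash"]}
)

counts = df["payment_method"].value_counts().head()

# Option 1: keep "Method" as the index name and read it from .index
by_index = counts.rename_axis("Method").to_frame("Frequency")
print(by_index.index.tolist())  # ['card', 'cash', 'paypal']

# Option 2: turn the index into a real column so pm_freq["Method"] works
pm_freq = counts.rename_axis("Method").reset_index(name="Frequency")
print(pm_freq["Method"].tolist())     # ['card', 'cash', 'paypal']
print(pm_freq["Frequency"].tolist())  # [3, 2, 1]
```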

Losing data on a Pandas DataFrame reindex [duplicate]

This question already has answers here:
Difference between df.reindex() and df.set_index() methods in pandas
(3 answers)
Closed 2 years ago.
I was losing data on a reindex. I just wanted to make an existing column the index.
So this works:
df_all_maa = df_all_maa.set_index("VERSION_SEQ")
Originally I was doing this:
df_all_maa = df_all_maa.reindex(df_all_maa["VERSION_SEQ"])
I think I was only getting values in the resulting DataFrame where the VERSION_SEQ value happened to match the default numeric index, but I would like to know what my original, incorrect code was actually doing.
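reindex is similar to loc, but it allows labels that don't exist. df_all_maa.reindex(df_all_maa["VERSION_SEQ"]) treats each VERSION_SEQ value as a label to look up in the existing default index (0, 1, 2, ...): where that label exists you get the data of the row at that label, and where it doesn't, reindex creates a row of NaN values. loc, by contrast, raises a KeyError for missing labels. That is why data only survived where VERSION_SEQ happened to match the default index, and even then it is the row stored at that label, not the row the VERSION_SEQ value came from.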
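A sketch contrasting reindex, loc and set_index on a small hypothetical frame (the real df_all_maa is not shown in the question):

```python
import pandas as pd

# Hypothetical frame with the default index 0, 1, 2
df = pd.DataFrame({"VERSION_SEQ": [2, 5, 0], "value": ["a", "b", "c"]})

# reindex LOOKS UP each VERSION_SEQ value as a label in the existing index:
# label 2 -> row at label 2, label 5 -> missing (NaN row), label 0 -> row at label 0
reindexed = df.reindex(df["VERSION_SEQ"])
print(reindexed["value"].tolist())  # ['c', nan, 'a']

# loc with the same labels raises KeyError because label 5 does not exist
try:
    df.loc[df["VERSION_SEQ"]]
except KeyError as exc:
    print("KeyError:", exc)

# set_index MOVES the column into the index and keeps every row's data
indexed = df.set_index("VERSION_SEQ")
print(indexed["value"].tolist())  # ['a', 'b', 'c']
print(indexed.index.tolist())     # [2, 5, 0]
```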

Assign a new value to repeated (or multiple) target elements in a pandas DataFrame [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 4 years ago.
I have a pandas dataframe:
import pandas as pd
df = pd.DataFrame({'AKey': [1, 9999, 1, 1, 9999, 2, 2, 2],
                   'AnotherKey': [1, 1, 1, 1, 2, 2, 2, 2]})
I want to assign a new value in a specific column to every element that has a specific value in that column.
Let's say I want to assign the new value 8888 to the elements with value 9999.
I tried the following:
df[df["AKey"]==9999]["AKey"]=8888
but it returns the following error:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
So I tried to use loc
df.loc[df["AKey"]==9999]["AKey"]=8888
which returned the same error.
I would appreciate some help and some explanation on the error as I really can't wrap my head around it.
You can use loc in this way:
df.loc[df["AKey"]==9999, "AKey"] = 8888
Producing the following output:
With your original code you are first slicing the dataframe with:
df.loc[df["AKey"]==9999]
Then you assign a value to the AKey column of that sliced DataFrame:
["AKey"]=8888
In other words, you were updating the slice, not the dataframe itself.
From the pandas documentation:
.loc[] is primarily label based, but may also be used with a boolean array.
Breaking down the code:
df.loc[df["AKey"]==9999, "AKey"]
df["AKey"]==9999 returns a boolean array identifying the rows, and the string "AKey" identifies the column that receives the new value, so rows and column are selected in a single operation on df itself, without creating an intermediate slice.
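A sketch showing that the chained form leaves df untouched while the single .loc call updates it. The warning is silenced only so the demonstration runs quietly; depending on the pandas version it is a SettingWithCopyWarning or a chained-assignment warning.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"AKey": [1, 9999, 1, 1, 9999, 2, 2, 2],
                   "AnotherKey": [1, 1, 1, 1, 2, 2, 2, 2]})

# Chained form: df.loc[mask] builds a NEW DataFrame (boolean selection
# copies), and ["AKey"] = 8888 then modifies only that temporary copy
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    df.loc[df["AKey"] == 9999]["AKey"] = 8888
changed_by_chain = int((df["AKey"] == 8888).sum())
print(changed_by_chain)  # 0: df was not changed

# Single .loc call: rows and column are selected together on df itself
df.loc[df["AKey"] == 9999, "AKey"] = 8888
print(df["AKey"].tolist())  # [1, 8888, 1, 1, 8888, 2, 2, 2]
```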
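OK, I found a solution. It works if I use logical indexing to also identify the column:
df.loc[df["AKey"]==9999& df["AKey"]]=8888
Note: this does not do what was intended. Because & binds more tightly than ==, the condition is evaluated as df["AKey"] == (9999 & df["AKey"]), which is True for every row of this example, and since no column is given, every column of every row is set to 8888. Use df.loc[df["AKey"] == 9999, "AKey"] = 8888 instead.
However, I would still appreciate help with the error I was receiving, as it is not fully clear to me why Python thought I was slicing instead of indexing.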
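A sketch of the operator-precedence pitfall in the self-answer, run on a copy of the question's data so the effect is visible:

```python
import pandas as pd

df = pd.DataFrame({"AKey": [1, 9999, 1, 1, 9999, 2, 2, 2],
                   "AnotherKey": [1, 1, 1, 1, 2, 2, 2, 2]})

# & binds more tightly than ==, so this is df["AKey"] == (9999 & df["AKey"])
mask = df["AKey"] == 9999 & df["AKey"]
print(mask.all())  # True: (9999 & x) equals x for every value here

# With no column label, .loc[mask] = 8888 overwrites EVERY column of those rows
broken = df.copy()
broken.loc[mask] = 8888
print(broken["AnotherKey"].tolist())  # [8888, 8888, 8888, 8888, 8888, 8888, 8888, 8888]
```

Wrapping comparisons in parentheses, for example (df["AKey"] == 9999), avoids this class of bug whenever conditions are combined with & or |.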
