This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 2 years ago.
I cannot figure this out. I want to change the "type" column in this dataset to 0/1 values.
url = "http://www.stats.ox.ac.uk/pub/PRNN/pima.tr"
Pima_training = pd.read_csv(url,sep = '\s+')
Pima_training["type"] = Pima_training["type"].apply(lambda x : 1 if x == 'Yes' else 0)
I get the following error:
A value is trying to be set on a copy of a slice from a DataFrame.
This is a warning, not an error, so it won't break your code. Pandas raises it when it detects chained assignment, i.e. when you use multiple indexing operations in a row and it is ambiguous whether you are modifying the original DataFrame or a copy of it. Other, more experienced programmers have explained it in depth in another SO thread, so feel free to give that a read for a fuller explanation.
In your particular example you don't need .apply at all (see this question for why not; in short, applying a Python function to a single column is inefficient because it loops over the rows internally). It makes more sense to use .replace instead and pass a dictionary:
Pima_training['type'] = Pima_training['type'].replace({"No":0,"Yes":1})
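If you want to keep exactly the behaviour of the original lambda (everything that is not 'Yes' becomes 0), a vectorised comparison is another option. A minimal sketch, assuming the column only ever contains 'Yes'/'No' strings:

# 1 where the value is 'Yes', 0 for anything else (same result as the lambda above)
Pima_training['type'] = (Pima_training['type'] == 'Yes').astype(int)

# Or with an explicit mapping; note that values missing from the dict become NaN
# Pima_training['type'] = Pima_training['type'].map({'Yes': 1, 'No': 0})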
This question already has an answer here:
Pandas, loc vs non loc for boolean indexing
(1 answer)
Closed 2 years ago.
I am learning pandas and want to know the best practice for filtering rows of a DataFrame by column values.
According to https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html, the recommendation is to use optimized pandas data access methods such as .loc
An example from https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html -
df.loc[df['shield'] > 6]
However, according to https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html#where, a construction like tips[tips['time'] == 'Dinner'] could be used.
Why is the recommended .loc omitted? Is there any difference?
With .loc you can also correctly set a value; without it, an assignment can trigger the "A value is trying to be set on a copy of a slice from a DataFrame" warning. For just getting data out of your DataFrame there might be performance differences, but I am not sure about that.
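To illustrate the difference (a minimal sketch with a made-up DataFrame, reusing the 'shield' column from the docs example):

import pandas as pd

df = pd.DataFrame({'shield': [2, 7, 9], 'name': ['a', 'b', 'c']})

# For reading, both forms select the same rows
df[df['shield'] > 6]
df.loc[df['shield'] > 6]

# For writing, a single .loc call is unambiguous and modifies the original DataFrame
df.loc[df['shield'] > 6, 'name'] = 'strong'

# whereas chained indexing like df[df['shield'] > 6]['name'] = 'strong'
# may only modify a temporary copy and raises SettingWithCopyWarning.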
This question already has answers here:
Removing index column in pandas when reading a csv
(9 answers)
Closed 3 years ago.
Here is my dataframe.
My csv file is:
date,open,high,low,close,volume,cap,Unnamed: 7
20190816,28600,28850,28150,28350,335508,6065213000000,
20190814,29550,29600,28800,28950,296026,6193563000000,
20190813,29400,29900,29400,29550,196955,6321927000000,
20190812,29450,30350,29400,29850,166580,6386109000000,
20190809,29500,30300,29450,29750,468338,6364715000000,
20190808,29000,30000,29000,29650,448959,6343321000000,
20190807,29800,29800,28950,29000,431524,6204260000000,
20190806,30900,30950,29650,29900,710348,6396806000000,
20190805,30300,31100,30300,30950,608970,6621443000000,
20190802,30400,30750,29900,30400,420984,6503776000000,
I don't know why this index (0~11) exists.
I want to remove it.
I searched and tried index_col=False, index_col=None, and to_csv with index=False, but the problem was not resolved.
How can I remove this index (0~11)?
Your valuable opinions and thoughts will be very much appreciated.
The only solution that fully matches what you want is to render the DataFrame as a string:
print(df.to_string(index=False))
Another option is the one below. The result is still a DataFrame, but the first column ('date') becomes the index, so in the printed output its header is simply shifted down one line:
print(df.set_index('date'))
You cannot remove the index of a Pandas DataFrame. It is not one of the columns of your DataFrame. And it is not coming from the csv file.
See: https://stackoverflow.com/a/20107825/4936825
You can hide this automatically generated index with df.style.hide_index(), or set one of the columns as the index with the set_index() method.
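For example (a minimal sketch; 'your_file.csv' stands in for whatever your csv file is actually called):

import pandas as pd

# Use the 'date' column as the index instead of the auto-generated 0, 1, 2, ... labels
df = pd.read_csv('your_file.csv', index_col='date')

# Or hide the index purely for display (older pandas; newer versions use df.style.hide(axis='index'))
# df.style.hide_index()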
This question already has answers here:
drop_duplicates not working in pandas?
(7 answers)
DataFrame.drop_duplicates and DataFrame.drop not removing rows
(2 answers)
Closed 3 years ago.
I've got a dataset with two columns: one with categorical values (State2), and another (State) that contains the same values, only in binary.
I used OneHotEncoding.
import pandas as pd
mydataset = pd.read_csv('fieldprotobackup.binetflow')
mydataset.drop_duplicates(['Proto2','Proto'], keep='first')
mydataset.to_csv('fieldprotobackup.binetflow', columns=['Proto2','Proto'], index=False)
Dataset
I'd like to remove all redundancies from the file. While researching, I found the command df.drop_duplicates, but it's not working for me.
You either need to add the inplace=True parameter, or you need to capture the returned dataframe:
mydataset.drop_duplicates(['Proto2','Proto'], keep='first', inplace=True)
or
no_duplicates = mydataset.drop_duplicates(['Proto2','Proto'], keep='first')
Always a good idea to check the documentation when something isn't working as expected.
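Putting it together, the snippet from the question might look like this (a sketch, assuming the file and column names from the question):

import pandas as pd

mydataset = pd.read_csv('fieldprotobackup.binetflow')

# drop_duplicates returns a new DataFrame unless inplace=True, so capture the result
mydataset = mydataset.drop_duplicates(['Proto2', 'Proto'], keep='first')

mydataset.to_csv('fieldprotobackup.binetflow', columns=['Proto2', 'Proto'], index=False)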
This question already has answers here:
Action with pandas SettingWithCopyWarning
(1 answer)
Confusion re: pandas copy of slice of dataframe warning
(1 answer)
Pandas: SettingWithCopyWarning, trying to understand how to write the code better, not just whether to ignore the warning
(2 answers)
Closed 5 years ago.
My attempt: df['uid'] = df.uid.astype(int)
Which works...! However, pandas doesn't like it:
A value is trying to be set on a copy of a slice from a DataFrame. Try
using .loc[row_indexer,col_indexer] = value instead
My question: what is the best-practice way to write this simple piece of code?
Research so far:
Pandas: change data type of columns
A value is trying to be set on a copy of a slice from a DataFrame Warning
Attempts:
df[df['uid']].astype(int)
...some of the links say to use iloc but I can't see how that can be used here...
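For reference, the .loc[row_indexer, col_indexer] form that the warning message suggests would look roughly like this here (a sketch; whether the warning actually goes away depends on how df was created, e.g. if df is itself a slice of another DataFrame it usually needs an explicit .copy() first):

# If df was produced by slicing another DataFrame, take an explicit copy first
# df = df.copy()

# The .loc[row_indexer, col_indexer] form the warning refers to
df.loc[:, 'uid'] = df['uid'].astype(int)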