How to drop all strings in a column using a wildcard? - python

I have some data that changes regularly, but the column headers need to stay consistent (so I can't drop the headers). I need to clear out the strings in a given column.
This is what I have now, but it only seems to work when I know the string's exact name, and only one string at a time:
df1= pd.read_csv(r'C:\Users\Test.csv')
df2 = df1.drop(df1[~(df1['column'] != 'String1')].index)

You can use the DataFrame.drop method, which removes rows with a given index label from a dataframe.
for i in df.index:
    if isinstance(df.loc[i, 'Aborted Reason'], str):
        df.drop(i, inplace=True)
Each df.drop call removes a row whose value in the relevant column is a string.
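A vectorized alternative, sketched on made-up toy data with the same 'Aborted Reason' column name: build a boolean mask with isinstance and keep only the non-string rows, which avoids dropping inside a loop.

```python
import pandas as pd

df = pd.DataFrame({
    "Aborted Reason": ["user cancel", None, 3.5, "timeout"],
    "Value": [1, 2, 3, 4],
})

# True where the cell holds a string, False otherwise.
mask = df["Aborted Reason"].map(lambda v: isinstance(v, str))

# Keep only the rows whose 'Aborted Reason' is not a string.
df = df[~mask]

print(df["Value"].tolist())  # [2, 3]
```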

Related

How to drop rows with empty string values in certain columns?

df.dropna(subset=[column_name], inplace=True)
I have a dataframe with some missing values in some column (column_name). For example, one value is the empty string, ''. But the code above doesn't drop the row with such empty values.
Isn't this the right way to do it?
The code does not drop those rows because the values are not NA; they are empty strings. For those you would have to do something like:
rows_to_drop = df[df[column_name]==''].index
df.drop(rows_to_drop, inplace=True)
Alternative:
Something like this would also work:
df = df.loc[df[column_name]!='',:]
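Both approaches on toy data (the column name "name" here is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"name": ["alice", "", "bob", ""], "score": [1, 2, 3, 4]})

# Option 1: find the offending rows, then drop them by index.
rows_to_drop = df[df["name"] == ""].index
dropped = df.drop(rows_to_drop)

# Option 2: keep only the non-empty rows with boolean indexing.
kept = df.loc[df["name"] != "", :]

print(dropped["score"].tolist())  # [1, 3]
print(kept["score"].tolist())     # [1, 3]
```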

Remove a row based on two empty columns in python pandas

I want to be able to remove rows that are empty in both columns NymexPlus and NymexMinus.
right now the code I have is
df.dropna(subset=['NymexPlus'], inplace=True)
The thing about this code is that it will also delete rows in the column NymexMinus which I don't want to happen.
Is there an if/and-style condition that only removes rows where both of the columns are empty?
Use a list as subset parameter and how='all':
df.dropna(subset=['NymexPlus', 'NymexMinus'], how='all', inplace=True)
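A quick check of the how='all' behavior on toy data (column names taken from the question): a row is dropped only when both NymexPlus and NymexMinus are missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "NymexPlus":  [1.0, np.nan, np.nan],
    "NymexMinus": [np.nan, 2.0, np.nan],
})

# how='all' drops a row only if every column in subset is NaN.
df.dropna(subset=["NymexPlus", "NymexMinus"], how="all", inplace=True)

print(len(df))  # 2: only the row missing both values is removed
```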

Is there a way to reverse the dropping method in pandas?

I'm aware that you can use
df1 = df1[df1['Computer Name'] != 'someNameToBeDropped']
to drop a given string as a row
What if I wanted to do it the other way around, say, dropping everything except what I have in a list of strings?
Is there a simple hack I haven't noticed?
Try this to get the rows whose column value is in the given list:
df = df[df[column].isin(list_of_strings)]
And to exclude what's in the list:
df = df[~df[column].isin(list_of_values)]
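Both directions on a small example (the column name comes from the question; the list contents are made up):

```python
import pandas as pd

df = pd.DataFrame({"Computer Name": ["pc1", "pc2", "pc3", "pc4"]})
keep_list = ["pc1", "pc3"]

kept = df[df["Computer Name"].isin(keep_list)]       # rows in the list
excluded = df[~df["Computer Name"].isin(keep_list)]  # rows not in the list

print(kept["Computer Name"].tolist())      # ['pc1', 'pc3']
print(excluded["Computer Name"].tolist())  # ['pc2', 'pc4']
```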

Can't drop columns with pandas if index_col = 0 is used while reading CSVs [duplicate]

I have the following code which imports a CSV file. There are 3 columns and I want to set the first two of them to variables. When I set the second column to the variable "efficiency" the index column is also tacked on. How can I get rid of the index column?
df = pd.DataFrame.from_csv('Efficiency_Data.csv', header=0, parse_dates=False)
energy = df.index
efficiency = df.Efficiency
print efficiency
I tried using
del df['index']
after I set
energy = df.index
which I found in another post but that results in "KeyError: 'index' "
When writing to and reading from a CSV file, include the arguments index=False and index_col=False, respectively. An example follows:
To write:
df.to_csv(filename, index=False)
and to read from the csv
df = pd.read_csv(filename, index_col=False)
This should prevent the issue so you don't need to fix it later.
df.reset_index(drop=True, inplace=True)
DataFrames and Series always have an index. Although it displays alongside the column(s), it is not a column, which is why del df['index'] did not work.
If you want to replace the index with simple sequential numbers, use df.reset_index().
To get a sense for why the index is there and how it is used, see e.g. 10 minutes to Pandas.
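For instance, on a toy frame, drop=True in reset_index discards the old index entirely instead of inserting it as a new column:

```python
import pandas as pd

df = pd.DataFrame({"Efficiency": [0.9, 0.8]}, index=[10, 20])

# drop=True throws the old index away rather than keeping it as a column.
df = df.reset_index(drop=True)

print(list(df.index))    # [0, 1]
print(list(df.columns))  # ['Efficiency']
```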
You can set one of the columns as the index, for example when it is an "id" column.
In this case the index will be replaced by the column you have chosen.
df.set_index('id', inplace=True)
If your problem is the same as mine, where you just want to reset the column headers to 0 through the number of columns, do
df = pd.DataFrame(df.values)
EDIT:
This is not a good idea if you have heterogeneous data types. Better to just use
df.columns = range(len(df.columns))
You can specify which column is the index in your CSV file by using the index_col parameter of from_csv.
If this doesn't solve your problem, please provide an example of your data.
One thing that I do is:
df = df.reset_index()
df = df.drop(['index'], axis=1)
To avoid creating the default index column, set index_col to False and keep the header as zero. Here is an example of how you can do it:
recording = pd.read_excel("file.xls",
                          sheet_name="sheet1",
                          header=0,
                          index_col=False)
The header=0 makes the first row of the sheet the column headers, so you can use those names to call columns later.
It works for me this way:
df = data.set_index("name of the column header to use as the index")

Replace values in column based on condition, then return dataframe

I'd like to replace some values in the first column of a dataframe with a dummy.
df[[0]].replace(["x"], ["dummy"])
The problem here is that the values in the first column are replaced, but not as part of the dataframe.
print(df)
yields the dataframe with the original data in column 1. I've tried
df[(df[[0]].replace(["x"], ["dummy"]))]
which doesn't work either.
replace returns a copy of the data by default, so you need to either assign the result back or use inplace=True; note that calling inplace=True on the df[[0]] slice modifies a copy rather than df, so apply it to the whole frame with a nested dict:
df.replace({0: {"x": "dummy"}}, inplace=True)
or assign the replaced column back:
df[0] = df[0].replace("x", "dummy")
see the docs
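A sketch of the self-assignment route on toy data (the integer column label 0 mirrors the question; the values are made up):

```python
import pandas as pd

df = pd.DataFrame({0: ["x", "y", "x"], 1: [1, 2, 3]})

# replace returns a new Series; assign it back so the change sticks.
df[0] = df[0].replace("x", "dummy")

print(df[0].tolist())  # ['dummy', 'y', 'dummy']
```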
