I'm trying to loop over columns to find a 0 in a specific cell (e.g. 'Users 0') in all columns of the df and replace the cell with null.
I tried running this:
for col in df.columns:
    df.loc[sa[col].str.contains('0'), col] = ''
But it gives me 'DataFrame' object has no attribute 'str'
This could be because your dataframe has multiple columns with the same name. I can recreate this error by doing the following:
import pandas as pd
df = pd.DataFrame([['0','1','2'],['2','4','5']],columns = ['a','b','b'])
for c in df.columns:
    print(df[c].str.replace("1",""))
The problem is that once you get to the repeated column name (in my example, when c == 'b'), then df[c] is actually a dataframe with 2 cols and .str is not available.
So if this is your issue, find the columns with the same name and give them unique names.
Also, as mentioned by @JonClements, it isn't necessary to loop over the columns at all. You can just do df = df.replace('.*0.*', '', regex=True)
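A minimal sketch of both suggestions, assuming a small made-up DataFrame with a repeated column label. The renaming scheme (appending _1, _2, ...) is only one illustrative choice, not something from the original answer:

```python
import pandas as pd

# Made-up sample data; two columns share the label "b"
df = pd.DataFrame(
    [["Users 0", "1", "2"], ["2", "4", "50"]],
    columns=["a", "b", "b"],
)

# With a repeated label, df["b"] is a DataFrame, so .str is unavailable
is_frame = isinstance(df["b"], pd.DataFrame)

# List the labels that occur more than once
dupes = df.columns[df.columns.duplicated()].tolist()

# Give every column a unique name by suffixing a counter to repeats
counts = {}
new_cols = []
for name in df.columns:
    n = counts.get(name, 0)
    new_cols.append(name if n == 0 else f"{name}_{n}")
    counts[name] = n + 1
df.columns = new_cols

# Blank out every cell containing a "0", without looping over columns
cleaned = df.replace(r".*0.*", "", regex=True)
print(cleaned)
```

Note that the pattern also blanks cells such as "50", because it matches any cell containing the character 0.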
csv with df
import pandas as pd
df = pd.read_csv('loves_1.csv')
in the column FuelPrices you'll see another df
df1 = pd.DataFrame(df['FuelPrices'][0])
df1
So, how do I extract the values of LastPriceChangeDateTime and CashPrice as a key:value pair into a new column of the main df, for DIESEL only (df['diesel_price_change'])?
Eventually, I want to append a dict of LastPriceChangeDateTime: CashPrice to that column every time the price changes.
I tried to loop with a bunch of parameters, but it seems like something is messed up:
for index, row in df.iterrows():
    dfnew = pd.DataFrame(df['FuelPrices'][index])
    dfnew['price_change'] = dfnew.apply(lambda row: {row['LastPriceChangeDateTime']: row['CashPrice']}, axis=1)
    df['diesel_price_change'][index] = dfnew.apply(lambda x: y['price_change'] for y in x if y['ProductName'] == 'DIESEL')
i receive "'int' object is not iterable"
Unfortunately, the only way I found is to loop through it, but I still hope to find a pandas solution for it.
for index, row in df.iterrows():
    for row in df['FuelPrices'][index]:
        if row['ProductName'] == 'DIESEL':
            df['diesel_price_change'][index] = {row['LastPriceChangeDateTime']:row['CashPrice']}
Can you try this (note that x[0] takes the first entry of each list, so it assumes DIESEL is listed first):
df['test_v1']=df['FuelPrices'].apply(lambda x: {x[0]['LastPriceChangeDateTime']:x[0]['CashPrice']})
If you are getting TypeError: string indices must be integers, the column was read from the CSV as strings, so parse it first:
import ast
df['FuelPrices']=df['FuelPrices'].apply(ast.literal_eval)
df['test_v1']=df['FuelPrices'].apply(lambda x: {x[0]['LastPriceChangeDateTime']:x[0]['CashPrice']})
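A sketch of a variant that picks the DIESEL entry by name instead of by position. The sample records and the helper name diesel_change are hypothetical; only the keys ProductName, LastPriceChangeDateTime and CashPrice come from the question:

```python
import ast

import pandas as pd

# Hypothetical records, stored as strings the way read_csv would return them
records_a = [
    {"ProductName": "UNLEADED", "LastPriceChangeDateTime": "2021-01-01T08:00", "CashPrice": 2.5},
    {"ProductName": "DIESEL", "LastPriceChangeDateTime": "2021-01-02T09:00", "CashPrice": 3.1},
]
records_b = [
    {"ProductName": "UNLEADED", "LastPriceChangeDateTime": "2021-01-03T10:00", "CashPrice": 2.7},
]
df = pd.DataFrame({"FuelPrices": [str(records_a), str(records_b)]})


def diesel_change(records):
    """Return {LastPriceChangeDateTime: CashPrice} for DIESEL, or None if absent."""
    for rec in records:
        if rec["ProductName"] == "DIESEL":
            return {rec["LastPriceChangeDateTime"]: rec["CashPrice"]}
    return None


# Parse the string cells into lists of dicts, then extract the DIESEL entry
df["FuelPrices"] = df["FuelPrices"].apply(ast.literal_eval)
df["diesel_price_change"] = df["FuelPrices"].apply(diesel_change)
print(df["diesel_price_change"])
```

Rows without a DIESEL entry end up with a missing value rather than raising, which may or may not be what you want.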
Can someone please help me with this. I want to call rows by name, so I used set_index on the 1st column in the dataframe to index the rows by name instead of using integers for indexing.
# Set 'Name' column as index on a Dataframe
df1 = df1.set_index("Name", inplace = True)
df1
Output:
AttributeError: 'NoneType' object has no attribute 'set_index'
Then I run the following code:
result = df1.loc["ABC4"]
result
Output:
AttributeError: 'NoneType' object has no attribute 'loc'
I don't usually run a second piece of code that depends on the first before fixing the error, but originally I ran them together in one Jupyter notebook cell. Now I see that both code cells have problems.
Please let me know where I went wrong. Thank you!
Maybe you should define your dataframe?
import pandas as pd
df1 = pd.DataFrame("here's your dataframe")
df1 = df1.set_index("Name")
or just
import pandas as pd
df1 = pd.DataFrame("here's your dataframe").set_index("Name")
df1
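Your variable "df1" is either not defined before you use it, or it has already been overwritten with None.
set_index(..., inplace=True) modifies the DataFrame in place and returns None, so assigning its result back to df1 replaces the DataFrame with None. Drop inplace=True and keep the assignment:
# Set 'Name' column as index on a Dataframe
df1 = df1.set_index("Name")
If df1 is already None, recreate it first by re-running the code that builds it.
The rest of the code should work afterwards.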
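For illustration, a small sketch of why the error message mentions NoneType. The sample data is made up, apart from the "Name" column and the "ABC4" label used in the question:

```python
import pandas as pd

# Made-up data with a "Name" column
df1 = pd.DataFrame({"Name": ["ABC1", "ABC4"], "Value": [10, 40]})

# With inplace=True the method returns None (done on a copy here,
# so df1 itself stays usable for the next step)
returned = df1.copy().set_index("Name", inplace=True)
print(returned)

# Without inplace, set_index returns a new DataFrame that can be assigned
df1 = df1.set_index("Name")
result = df1.loc["ABC4"]
print(result)
```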
I have a DataFrame called df. I want to iterate over the columns_to_encode list and get the value of df.column for each name, but I'm getting the following error (as expected). Any idea how I could do it?
columns_to_encode = ['column1','column2','column3']
for column in columns_to_encode:
    df.column
AttributeError: 'DataFrame' object has no attribute 'column'
Use bracket indexing instead. df.column looks up an attribute or column literally named 'column', not the value of the loop variable:
columns_to_encode = ['column1','column2','column3']
for column in columns_to_encode:
    df[column]
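A minimal sketch of the difference, using made-up data. The getattr line is shown only to illustrate that attribute access needs the name as a string; bracket indexing is the more usual choice:

```python
import pandas as pd

# Made-up data matching the column names in the question
df = pd.DataFrame({"column1": [1, 2], "column2": [3, 4], "column3": [5, 6]})
columns_to_encode = ["column1", "column2", "column3"]

# Bracket indexing uses the value held by the loop variable
selected = {column: df[column] for column in columns_to_encode}

# Attribute access by a name held in a variable requires getattr
same = getattr(df, "column1").equals(df["column1"])

for column, series in selected.items():
    print(column, series.tolist())
```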
I have a Pandas dataframe with a lot of columns looking like p_d_d_c0, p_d_d_c1, ... p_d_d_g1, p_d_d_g2, ....
df =
a b c p_d_d_c0 p_d_d_c1 p_d_d_c2 ... p_d_d_g0 p_d_d_g1 ...
All the columns that conform to the regex need to be selected and their datatypes changed from object to float. In particular, the columns that look like p_d_d_c* and p_d_d_g* are all object types, and I would like to change them to float. Is there a way to select columns in bulk using a regular expression and change them to float?
I tried the answer from here, but it takes a lot of time and memory as I have hundreds of these columns.
df.filter(regex=("p_d_d_.*"))
I also tried:
df.select(lambda col: col.startswith('p_d_d_g'), axis=1)
But, it gives an error:
AttributeError: 'DataFrame' object has no attribute 'select'
My Pandas version is 1.0.1
So, how to select columns in bulk and change their data types using regex?
Try this:
import pandas as pd
# sample dataframe
df = pd.DataFrame(data={"co1":[1,2,3,4], "co22":[4,3,2,1], "co3":[2,3,2,4], "abc":[5,4,3,2]})
# select all columns which have co in it
floatcols = [col for col in df.columns if "co" in col]
for floatcol in floatcols:
    df[floatcol] = df[floatcol].astype(float)
From the same link, and with some astype magic.
column_vals = df.columns.map(lambda x: x.startswith("p_d_d_"))
train_temp = df.loc(axis=1)[column_vals]
train_temp = train_temp.astype(float)
EDIT:
To modify the original dataframe, do something like this:
column_vals = [x for x in df.columns if x.startswith("p_d_d_")]
df[column_vals] = df[column_vals].astype(float)
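Since the question asks for a regular expression specifically, here is a sketch that combines DataFrame.filter(regex=...) with astype. The sample data and the exact pattern (p_d_d_ followed by c or g and digits) are assumptions based on the column names in the question:

```python
import pandas as pd

# Made-up data: numeric values stored as strings
df = pd.DataFrame(
    {
        "a": ["x", "y"],
        "p_d_d_c0": ["1.5", "2"],
        "p_d_d_g1": ["3", "4.25"],
        "p_d_x": ["7", "8"],  # should NOT be converted
    }
)

# filter(regex=...) matches column labels with re.search
cols = list(df.filter(regex=r"^p_d_d_[cg]\d+$").columns)

# Convert only the matching columns and write them back
df[cols] = df[cols].astype(float)
print(df.dtypes)
```

If some cells cannot be parsed as numbers, astype(float) raises; pd.to_numeric with errors="coerce" is one alternative in that case.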
I currently have a list of Pandas DataFrames. I'm trying to perform an operation on each list element (i.e. each DataFrame contained in the list) and then save that DataFrame to a CSV file.
I assigned a name attribute to each DataFrame, but I realized that in some cases the program throws an error AttributeError: 'DataFrame' object has no attribute 'name'.
Here's the code that I have.
# raw_og contains the file names for each CSV file.
# df_og is the list containing the DataFrame of each file.
for idx, file in enumerate(raw_og):
    df_og.append(pd.read_csv(os.path.join(data_og_dir, 'raw', file)))
    df_og[idx].name = file
# I'm basically checking if the DataFrame is in reverse-chronological order using the
# check_reverse function. If it is then I simply reverse the order and save the file.
for df in df_og:
    if (check_reverse(df)):
        df = df[::-1]
        df.to_csv(os.path.join(data_og_dir, 'raw_new', df.name), index=False)
    else:
        continue
The program is throwing an error in the second for loop where I used df.name.
This is especially strange because when I run print(df.name) it prints out the file name. Would anybody happen to know what I'm doing wrong?
Thank you.
The solution is to assign the values back into the existing DataFrame rather than creating a copy.
Creating a copy of df loses the name:
df = df[::-1] # creates a copy
Setting the values in place keeps the original object intact, along with its name:
df.iloc[:] = df.iloc[:, ::-1] # reversal maintaining the original object
Example code that reverses values along the column axis:
df = pd.DataFrame([[6,10]], columns=['a','b'])
df.name='t'
print(df.name)
print(df)
df.iloc[:] = df.iloc[:,::-1]
print(df)
print(df.name)
outputs:
t
   a   b
0  6  10
    a  b
0  10  6
t
A workaround is to set a columns.name and use it when needed.
Example:
df = pd.DataFrame()
df.columns.name = 'name'
print(df.columns.name)
name
I suspect it's the reversal that loses the custom .name attribute.
In [11]: df = pd.DataFrame()
In [12]: df.name = 'empty'
In [13]: df.name
Out[13]: 'empty'
In [14]: df[::-1].name
AttributeError: 'DataFrame' object has no attribute 'name'
You'll be better off storing a dict of dataframes rather than using .name:
df_og = {fn: pd.read_csv(os.path.join(data_og_dir, 'raw', fn)) for fn in raw_og}
Then you could iterate through this and reverse the values that need reversing...
for fn, df in df_og.items():
    if (check_reverse(df)):
        df = df[::-1]
        df.to_csv(os.path.join(data_og_dir, 'raw_new', fn), index=False)
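A self-contained sketch of the dict approach. The check_reverse below is a hypothetical stand-in (the original function is not shown in the question), the "date" column and sample frames are made up, and a temporary directory replaces data_og_dir:

```python
import os
import tempfile

import pandas as pd


def check_reverse(df):
    # Hypothetical stand-in: True when the "date" column is in descending order
    return df["date"].is_monotonic_decreasing


# Made-up frames keyed by file name, instead of a list plus a .name attribute
frames = {
    "a.csv": pd.DataFrame(
        {"date": ["2021-01-03", "2021-01-02", "2021-01-01"], "v": [3, 2, 1]}
    ),
    "b.csv": pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "v": [1, 2]}),
}

out_dir = tempfile.mkdtemp()  # stands in for the 'raw_new' directory
written = []
for fn, df in frames.items():
    if check_reverse(df):
        # The file name comes from the dict key, so the copy made by
        # the slice does not matter
        df[::-1].to_csv(os.path.join(out_dir, fn), index=False)
        written.append(fn)

print(written)
```

Because the file name lives in the dict key rather than on the DataFrame object, it survives any operation that returns a new DataFrame.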