python pandas assignment of missing value as a copy - python

I'm trying to fill the missing values for a group of products in my dataset with that group's mean (eventually I want to iterate over each category and fill its missing data):
df.loc[df.iCode == 160610,'oPrice'].fillna(value=df[df.iCode == 160610].oPrice.mean(), inplace=True)
It's not working (maybe the selection is being treated as a copy).
Thanks

df.loc[(df.iCode == 160610) & (df.oPrice.isna()),'oPrice'] = df.loc[df.iCode == 160610].oPrice.mean()
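Since the question mentions eventually doing this for every category, a minimal sketch of one way to avoid the loop is shown below. It assumes the categories are the distinct iCode values; the sample rows are hypothetical and only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "iCode": [160610, 160610, 160610, 200, 200],
    "oPrice": [10.0, np.nan, 20.0, 5.0, np.nan],
})

# Mean of oPrice within each iCode group, aligned to the original rows.
group_mean = df.groupby("iCode")["oPrice"].transform("mean")

# Assign the result back to the column instead of calling
# fillna(inplace=True) on a selection, which only changes a temporary copy.
df["oPrice"] = df["oPrice"].fillna(group_mean)
```

Each missing oPrice is replaced by the mean of the non-missing prices that share its iCode, and rows that already had a price are left unchanged.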

Related

comparing two columns of a row in python dataframe

I know that one can filter a dataframe down to the rows where a column contains a certain value with:
values = parsedData[parsedData['column'] == valueToCompare]
But is it possible to select rows by comparing two columns against two values, like:
values = parsedData[parsedData['column01'] == valueToCompare01 and parsedData['column02'] == valueToCompare02]
Thank you!
It is completely possible, but not with and: Python's and needs a single True/False value, so using it on two pandas Series raises a ValueError (the truth value of a Series is ambiguous). Use the element-wise operator & instead. Note that the ( ) around each comparison are required, because & binds more tightly than ==:
values = parsedData[(parsedData['column01'] == valueToCompare01) & (parsedData['column02'] == valueToCompare02)]
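To make the difference between and and & concrete, here is a small sketch. The data and the two comparison values are hypothetical; only the variable and column names come from the question.

```python
import pandas as pd

# Hypothetical data; the column names follow the question.
parsedData = pd.DataFrame({
    "column01": [1, 1, 2, 1],
    "column02": ["a", "b", "a", "a"],
})
valueToCompare01 = 1
valueToCompare02 = "a"

# `and` asks for one truth value of a whole Series, which pandas refuses.
try:
    parsedData[(parsedData["column01"] == valueToCompare01)
               and (parsedData["column02"] == valueToCompare02)]
    and_failed = False
except ValueError:
    and_failed = True

# `&` combines the two boolean Series element by element.
values = parsedData[
    (parsedData["column01"] == valueToCompare01)
    & (parsedData["column02"] == valueToCompare02)
]
```

Only the rows where both conditions hold remain in values; the same pattern works with | for "or" and ~ for "not".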

How can I create new dataframes using a for loop and the query method to filter an existing dataframe?

I want to create new dataframes using the query method and a for loop, but when I try,
this error appears: UndefinedVariableError: name 'i' is not defined.
I tried to do it with this code:
for sigla in sigla_estados:
    nome_estado_df = 'dataset_' + sigla
for i in range(28):
    nome_estado_df = consumo_alimentar.query("UF == @lista_estados[i]")
My list (lista_estados) has 27 items, so I tried to loop over all of them using range.
I can't figure out what the problem is; I'm a beginner.
From your code I suppose you want to create multiple dataframes, each one containing the rows of consumo_alimentar that belong to one specific state (the rows whose UF column matches a name in lista_estados).
I also assume that you have a list (sigla_estados) that contains the codes of the states in lista_estados, has the same length as lista_estados, and is ordered so that sigla_estados[x] is the code of lista_estados[x] for all x.
If my assumptions are right, this code could work:
nome_estado_df = {}  # maps each state code to its filtered dataframe
for i in range(len(lista_estados)):
    estado = lista_estados[i]
    sigla = sigla_estados[i]
    mask = consumo_alimentar['UF'] == estado
    nome_estado_df[sigla] = consumo_alimentar[mask]
With that code you'll get a dictionary of dataframes keyed by state code, which I think is more or less what you want. If you want to use the query method, this should also work (inside a query string, @ refers to a Python variable):
nome_estado_df = {}
for i in range(len(lista_estados)):
    estado = lista_estados[i]
    sigla = sigla_estados[i]
    query_str = "UF == @estado"
    nome_estado_df[sigla] = consumo_alimentar.query(query_str)
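As a self-contained illustration of the query approach, the sketch below pairs names and codes with zip instead of indexing by position. The sample rows, the valor column, and the dataframes_por_sigla name are hypothetical; the other names come from the question.

```python
import pandas as pd

# Hypothetical data; UF holds the state name, as assumed in the answer.
consumo_alimentar = pd.DataFrame({
    "UF": ["Acre", "Bahia", "Acre", "Amazonas"],
    "valor": [1, 2, 3, 4],
})
lista_estados = ["Acre", "Bahia", "Amazonas"]
sigla_estados = ["AC", "BA", "AM"]

# zip pairs each state name with its code, so no index arithmetic
# (and no risk of an off-by-one range) is involved.
dataframes_por_sigla = {}
for estado, sigla in zip(lista_estados, sigla_estados):
    # @estado refers to the local Python variable inside the query string.
    dataframes_por_sigla[sigla] = consumo_alimentar.query("UF == @estado")
```

A single state's rows can then be looked up with dataframes_por_sigla["AC"], which is usually easier to work with than building a separate variable name per state.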

Access different values in one data frame column?

df is loaded from a CSV file that contains different stats:
player_name,player_id,season,season_type,team
Giannis Antetokounmpo,antetgi01,2020,PO,MIL
I have tried this:
print(df.loc[(df["team"] == "LAL") & (df["team"] == "LAC") & (df["season_type"] == "
I am trying to filter on the "team" column for rows that also meet the "season_type" requirement, but there is no output.
What works currently:
print(df.loc[(df["team"] == "LAL") & (df["season_type"] == "PO")])
When I do this I get the correct output, but only for one specific team.
My question is: how can I do this for multiple teams?
Good question, this should work for you:
team_list = ["LAL", "LAC"]
df = df[df.team.isin(team_list) & (df.season_type == 'PO')]
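The first attempt in the question returns nothing because a single row can never have team equal to both "LAL" and "LAC" at once, so the combined condition is False everywhere. A small sketch of the isin approach follows; the player names and the "RS" season type are hypothetical placeholders, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical rows; "RS" is an assumed value for a non-playoff season type.
df = pd.DataFrame({
    "player_name": ["A", "B", "C", "D"],
    "team": ["LAL", "LAC", "MIL", "LAL"],
    "season_type": ["PO", "PO", "PO", "RS"],
})

team_list = ["LAL", "LAC"]

# isin is True when the team is any of the listed values; the second
# condition is parenthesized because & binds more tightly than ==.
filtered = df[df["team"].isin(team_list) & (df["season_type"] == "PO")]
```

Only the playoff rows for the listed teams are kept; adding another team is just a matter of extending team_list.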

Python pandas - boolean filtering. T/F vs. returning the table

I am doing an exercise and have a dataset of school information. I want to filter the data by school year so I have:
data['demographics'] = data['demographics'][data['demographics']['schoolyear'] == 20112012]
I don't really understand the data['demographics'] at the beginning of the assignment.
If I just have:
data['demographics'] = [data['demographics']['schoolyear'] == 20112012]
the code returns True or False and not the actual data of the table. How does adding data['demographics'] make Python realize that I want the data returned instead of T/F?
data['demographics']['schoolyear'] == 20112012 tells you, row by row, whether the school year matches.
So [data['demographics']['schoolyear'] == 20112012] just wraps that True/False result in a list.
Whereas
data['demographics'][data['demographics']['schoolyear'] == 20112012]
uses those True/False values to pull out the rows of data['demographics'] where the value is True.
That gives you the rows you want.
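The distinction can be seen in a short sketch. It assumes data is a dictionary of DataFrames, which the question does not state; the total column and all row values are hypothetical.

```python
import pandas as pd

# Assumption: data is a dict of DataFrames; the values are made up.
data = {
    "demographics": pd.DataFrame({
        "schoolyear": [20102011, 20112012, 20112012],
        "total": [100, 110, 120],
    })
}

# The comparison alone yields one True/False value per row.
mask = data["demographics"]["schoolyear"] == 20112012

# Square brackets around it only build a one-element Python list.
wrapped = [mask]

# Indexing the DataFrame with the mask keeps the rows marked True.
rows = data["demographics"][mask]
```

Here mask is a boolean Series, wrapped is a plain list that holds that Series, and rows is the filtered table.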
If data is itself a DataFrame, the first assignment should throw a ValueError: Length of values does not match length of index.
In that case data['demographics'] returns only the 'demographics' column of the dataframe, and [data['demographics']['schoolyear'] == 20112012] then filters it down to the rows where the school year is 20112012.
The error in the statement comes from assigning the filtered data back to
data['demographics'], because the filtered data has fewer elements than data['demographics'].
I recommend assigning the filtered data to a new variable, like this:
filteredData = data['demographics'][data['demographics']['schoolyear'] == 20112012]

Pandas For Loop, If String Is Present In ColumnA Then ColumnB Value = X

I'm pulling Json data from the Binance REST API, after formatting I'm left with the following...
I have a dataframe called Assets with 3 columns [Asset,Amount,Location],
['Asset'] holds ticker names for crypto assets e.g.(ETH,LTC,BNB).
However when all or part of that asset has been moved to 'Binance Earn' the strings are returned like this e.g.(LDETH,LDLTC,LDBNB).
['Amount'] can be ignored for now.
['Location'] is initially empty.
I'm trying to set the value of ['Location'] to 'Earn' if the string in ['Asset'] includes 'LD'.
This is how far I got, but I can't remember how to apply the change to only the current item; it's been ages since I've used pandas or for loops.
At the moment I can only apply it to the entire column rather than to the row being iterated.
for Row in Assets['Asset']:
    if Row.find('LD') == 0:
        print('Earn')
        Assets['Location'] = 'Earn' # <----How to apply this to current row only
    else:
        print('???')
        Assets['Location'] = '???' # <----How to apply this to current row only
The print statements work correctly, but currently the whole column gets populated with the same value (whichever was last) as you might expect.
So (LDETH,HOT,LDBTC) returns ('Earn','Earn','Earn') rather than the desired ('Earn','???','Earn')
Any help would be appreciated...
np.where() fits here (with numpy imported as np). If the Asset starts with LD, then return Earn, else return ???:
Assets['Location'] = np.where(Assets['Asset'].str.startswith('LD'), 'Earn', '???')
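You could also apply a lambda to the column to check whether 'LD' appears in each Asset value (this matches 'LD' anywhere in the string and leaves the other rows as None):
Assets['Location'] = Assets['Asset'].apply(lambda x: 'Earn' if 'LD' in x else None)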
One possible solution:
def get_loc(row):
    asset = row['Asset']
    if asset.find('LD') == 0:
        print('Earn')
        return 'Earn'
    print('???')
    return '???'
Assets['Location'] = Assets.apply(get_loc, axis=1)
Note, you should almost never iterate over a pandas dataframe or series.
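The np.where approach above can be checked against the example from the question. In this sketch the three Asset values are the ones the question lists; the Amount values are hypothetical.

```python
import numpy as np
import pandas as pd

# Asset values from the question; the Amount values are made up.
Assets = pd.DataFrame({
    "Asset": ["LDETH", "HOT", "LDBTC"],
    "Amount": [1.0, 2.0, 3.0],
})

# Vectorized: build one boolean Series for the whole column, then let
# np.where pick 'Earn' where it is True and '???' where it is False.
Assets["Location"] = np.where(
    Assets["Asset"].str.startswith("LD"), "Earn", "???"
)
```

Each row gets its own value, so the column ends up as ('Earn', '???', 'Earn') rather than being overwritten with whichever value came last.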
