what if index access fails in pandas df - python

I am trying to access a scalar value in a multi column dataframe via a lookup as follows:
targetDate = '2016-01-01'
df['revenue'][df['date']== targetDate].values[0]
Now, in my case there is nothing found in the dataframe for the targetDate. So I get the following index error:
IndexError: ('index 0 is out of bounds for axis 0 with size 0', 'occurred at index 69322')
Is there a pandas built-in way to gracefully result in np.nan in such cases? How would you handle such a situation?
I don't want my script to fail when nothing is found.

You can check if Series is empty and then add if-else:
targetDate = '2016-01-01'
a = df.loc[df['date']== targetDate, 'revenue']
print (a)
Series([], Name: revenue, dtype: int32)
if len(a) == 0:
    print ('empty')
else:
    first_val = a.values[0]
Similar solution with Series.empty:
if a.empty:
    first_val = np.nan
else:
    first_val = a.values[0]
print (first_val)

If you add head(1) before .values and remove the [0] subscript, you avoid the error, although it won't fill with a nan (it will just be an empty numpy array).
df['revenue'][df['date']== targetDate].head(1).values
But you could do something like this to get a nan instead of empty.
df['revenue'][df['date']== targetDate].append(pd.Series(np.nan)).head(1).values
Or do it as a try/except or an if/else as @jezrael does. There are lots of ways to do this, of course; it just depends on what is convenient for you.
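For example, a try/except version might look like this (a sketch with a stand-in frame, since the original df isn't shown):

```python
import numpy as np
import pandas as pd

# stand-in frame: no row matches the target date
df = pd.DataFrame({'date': ['2015-12-31'], 'revenue': [100]})
targetDate = '2016-01-01'

try:
    first_val = df.loc[df['date'] == targetDate, 'revenue'].values[0]
except IndexError:
    # nothing found for targetDate, fall back to NaN
    first_val = np.nan
```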

Related

Issue w/ pandas.index.get_loc() when match is found, TypeError: ("'>' not supported between instances of 'NoneType' and 'str'", 'occurred at index 1')

Below is the example to reproduce the error:
testx1df = pd.DataFrame()
testx1df['A'] = [100,200,300,400]
testx1df['B'] = [15,60,35,11]
testx1df['C'] = [11,45,22,9]
testx1df['D'] = [5,15,11,3]
testx1df['E'] = [1,6,4,0]
(testx1df[testx1df < 6].apply(lambda x: x.index.get_loc(x.first_valid_index(), method='ffill'), axis=1))
The desired output should be a list or array with the values [3, NaN, 4, 3] — the NaN is for the second row, where no value satisfies the criterion.
I checked the pandas references, and they say that for cases where you do not have an exact match you can change the "method" to 'ffill', 'bfill', or 'nearest' to pick the previous, next, or closest index. Based on this, if I indicated the method as 'ffill' it would give me an index of 4 instead of NaN. However, when I do so it does not work, and I get the error shown in the question title. For criteria higher than 6 it works fine, but it doesn't for less than 6 because the second row in the dataframe does not satisfy it.
Is there a way around this issue? Should it not work for my example (return a previous index of 3 or 4)?
One solution I thought of is to add a dummy column populated by zeros so that it has a place to "find" an index that satisfies the criteria, but this is a bit crude to me and I think there is a more efficient solution out there.
Please try this:
import numpy as np
ls = list(testx1df[testx1df<6].T.isna().sum())
ls = [np.nan if x==testx1df.shape[1] else x for x in ls]
print(ls)
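Another option (a sketch, not part of the original answers): compute the positions explicitly and handle the all-NaN row, where first_valid_index() returns None, before calling get_loc:

```python
import numpy as np
import pandas as pd

testx1df = pd.DataFrame({'A': [100, 200, 300, 400],
                         'B': [15, 60, 35, 11],
                         'C': [11, 45, 22, 9],
                         'D': [5, 15, 11, 3],
                         'E': [1, 6, 4, 0]})

def first_pos(row):
    # first_valid_index() returns None when the whole row is NaN,
    # which is what makes the bare get_loc call raise
    fvi = row.first_valid_index()
    return np.nan if fvi is None else row.index.get_loc(fvi)

result = testx1df[testx1df < 6].apply(first_pos, axis=1)
# result holds [3.0, NaN, 4.0, 3.0]
```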

How can I set nulls in my pandas dataframe column with a different value

I am trying to set some null values in my dataframe to a different value: 'Non-étiquettés'
page_data_fr['Lookup_FR_tag'].loc[page_data_fr['Lookup_FR_tag'].isnull()] = page_data_fr['Lookup_FR_tag'].loc[page_data_fr['Lookup_FR_tag'].isnull()].apply(lambda x: ['Non-étiquettés'])
However, setting the values using this method above results in this warning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
After reading the documentation and attempting to adjust my code, I try this...
dfc = page_data_fr.copy()
mask = dfc['Lookup_FR_tag'].loc[page_data_fr['Lookup_FR_tag'].isnull()]
dfc['Lookup_FR_tag'].loc[mask, 'Lookup_FR_tag'] = 'Non-étiquettés'
This still generates another error:
ValueError: Cannot mask with non-boolean array containing NA / NaN values
dfc['Lookup_FR_tag'].loc[page_data_fr['Lookup_FR_tag'].isnull()]
I have also tried doing it this way which is still no good:
arrOfNulls = []
counter = 0
for x in dfc['Lookup_FR_tag'].isnull():
    if x == True:
        arrOfNulls.append(counter)
        counter += 1
    counter += 1

for x in range(len(arrOfNulls)):
    page_data_fr['Lookup_FR_tag'][arrOfNulls[x]] = ['Non-étiquettés']
Any help would be greatly appreciated, I am not sure what I am doing wrong or if I am close..
You can do it like this (note the assignment back — replace returns a new Series rather than modifying in place):
import numpy as np
page_data_fr['Lookup_FR_tag'] = page_data_fr['Lookup_FR_tag'].replace(np.nan, "Non-étiquettés")
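fillna is arguably more direct here, since it only touches the nulls — a sketch with a hypothetical small frame standing in for page_data_fr:

```python
import pandas as pd

# hypothetical stand-in for page_data_fr
page_data_fr = pd.DataFrame({'Lookup_FR_tag': ['a', None, 'b', None]})

# fillna replaces only the null entries and avoids chained assignment entirely
page_data_fr['Lookup_FR_tag'] = page_data_fr['Lookup_FR_tag'].fillna('Non-étiquettés')
```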

Pandas - find occurrence within a subset

I'm stripping values from unformatted summary sheets in a for loop, and I need to dynamically find the index location of a string value after the occurrence of another specific string value. I used this question as my starting point. Example dataframe:
import pandas as pd
df = pd.DataFrame([['Small'],['Total',4],['Medium'],['Total',12],['Large'],['Total',7]])
>>> df
        0     1
0   Small   NaN
1   Total   4.0
2  Medium   NaN
3   Total  12.0
4   Large   NaN
5   Total   7.0
Say I want to find the 'Total' after 'Medium.' I can find the location of 'Medium' with the following:
MedInd = df[df.iloc[:,0]=='Medium'].first_valid_index()
>>> MedInd
2
After this, I run into issues placing a subset limitation on the query:
>>> MedTotal = df[df.iloc[MedInd:,0]=='Total'].first_valid_index()
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Still very new to programming and could use some direction with this error. Searching the error itself it seems like it's an issue of the ordering in which I should define the subset, but I've been unable to fix it thus far. Any assistance would be greatly appreciated.
EDIT:
So I ended up resolving this by moving the subset limitation to the front, outside the first_valid_index clause as follows (suggestion obtained from this reddit comment):
MedTotal = df.iloc[MedInd:][df.iloc[:,0]=='Total'].first_valid_index()
This does throw the following warning:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
But the output was as desired, which was just the index number for the value being sought.
I don't know if this will always produce desired results given the warning, so I'll continue to scan the answers for other solutions.
You may want to use shift:
df[df.iloc[:,0].shift().eq('Medium') & df.iloc[:,0].eq('Total')]
Output:
       0     1
3  Total  12.0
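If only the index number is wanted rather than the matching row, the same mask can be reduced with .index (a sketch using the sample frame from the question):

```python
import pandas as pd

df = pd.DataFrame([['Small'], ['Total', 4], ['Medium'], ['Total', 12], ['Large'], ['Total', 7]])
col = df.iloc[:, 0]

# rows whose previous row is 'Medium' and which are themselves 'Total'
idx = df[col.shift().eq('Medium') & col.eq('Total')].index[0]
# idx is 3
```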
This would work
def find_idx(df, first_str, second_str):
    first_idx = df[0].eq(first_str).idxmax()
    rest_of_df = df.iloc[first_idx:]
    return rest_of_df[0].eq(second_str).idxmax()

find_idx(df, 'Medium', 'Total')

IF AND statement only outputs ELSE statement Python

I have a function that generates different dataframes, the 3rd dataframe causes an error because it contains a final row of NaN values at the bottom.
I tried an if-else conditional statement to remove the row of NaN values, but everytime I do, it keeps outputting the NaN values.
ma = 1
year = 3
df
if ma > 0 and year == 3:
    df[0:-1]
else:
    df
I also tried a nested if statement, but that produced the same output of NaN values.
ma_path = "SMA"
year_path = "YEAR_3"
if ma_path == ["SMA"]:
    if year_path == ["YEAR_3"]:
        df[0:-1]
    else:
        df
I'm sure it's something simple that I've missed. Can anyone help? Thanks in advance.
df[0:-1] does not change the values that df currently contains. If you want to remove the last item of df, you need to assign the slice back to the name:
df = df[0:-1]
If df was an ordinary list, you could also remove items with pop.
df.pop()
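Note that the second snippet in the question has a second problem besides the missing assignment: ma_path == ["SMA"] compares a string to a list, which is always False, so that branch never runs. A sketch of both fixes together, using a hypothetical stand-in frame:

```python
import numpy as np
import pandas as pd

# hypothetical stand-in frame with a trailing all-NaN row
df = pd.DataFrame({'val': [1.0, 2.0, np.nan]})

ma_path = "SMA"
year_path = "YEAR_3"

# compare string to string (not to a list),
# and assign the slice back so df actually changes
if ma_path == "SMA" and year_path == "YEAR_3":
    df = df[0:-1]
```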

Python pandas check if dataframe is not empty

I have an if statement where it checks if the data frame is not empty. The way I do it is the following:
if dataframe.empty:
    pass
else:
    #do something
But really I need:
if dataframe is not empty:
    #do something
My question is - is there a method .not_empty() to achieve this? I also wanted to ask if the second version is better in terms of performance? Otherwise maybe it makes sense for me to leave it as it is i.e. the first version?
Just do
if not dataframe.empty:
    # insert code here
The reason this works is because dataframe.empty returns True if dataframe is empty. To invert this, we can use the negation operator not, which flips True to False and vice-versa.
.empty returns a boolean value
>>> df_empty.empty
True
So if not empty can be written as
if not df.empty:
    #Your code
Check pandas.DataFrame.empty — might help someone.
You can use the attribute dataframe.empty to check whether it's empty or not:
if not dataframe.empty:
    #do something
Or
if len(dataframe) != 0:
    #do something
Or
if len(dataframe.index) != 0:
    #do something
As already clearly explained by other commentators, you can negate a boolean expression in Python by simply prepending the not operator, hence:
if not df.empty:
    # do something
does the trick.
I only want to clarify the meaning of "empty" in this context, because it was a bit confusing for me at first.
According to the pandas documentation, DataFrame.empty is an attribute that returns True if any of the axes in the DataFrame are of length 0.
As a consequence, "empty" doesn't mean zero rows and zero columns, like someone might expect. A dataframe with zero rows (axis 0 is empty) but non-zero columns (axis 1 is not empty) is still considered empty:
> df = pd.DataFrame(columns=["A", "B", "C"])
> df.empty
True
Another interesting point highlighted in the documentation is that a DataFrame containing only NaNs is not considered empty.
> df = pd.DataFrame(columns=["A", "B", "C"], index=['a', 'b', 'c'])
> df
     A    B    C
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  NaN  NaN  NaN
> df.empty
False
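If an all-NaN frame should count as empty for your purposes, one workaround (a sketch, not from the documentation) is to drop all-NaN rows before the check:

```python
import pandas as pd

# a frame with rows and columns but only NaN values
df = pd.DataFrame(columns=["A", "B", "C"], index=['a', 'b', 'c'])

# drop rows that are entirely NaN, then test emptiness
effectively_empty = df.dropna(how='all').empty
# effectively_empty is True
```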
No doubt the use of empty is the most comprehensible option in this case (explicit is better than implicit).
However, the most efficient in terms of computation time is to use len:
if not len(df.index) == 0:
    # insert code here
Source : this answer.
Another way:
if dataframe.empty == False:
    #do something
