How to get the index of a value in a pandas series - python

What's the code to get the index of a value in a pandas Series?
import pandas as pd

animals = pd.Series(['bear', 'dog', 'mammoth', 'python'],
                    index=['canada', 'germany', 'iran', 'brazil'])
What's the code to extract the index of "mammoth"?

You can just use boolean indexing:
In [8]: animals == 'mammoth'
Out[8]:
canada False
germany False
iran True
brazil False
dtype: bool
In [9]: animals[animals == 'mammoth'].index
Out[9]: Index(['iran'], dtype='object')
Note, indexes aren't necessarily unique for pandas data structures.
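Since labels need not be unique, boolean indexing can return more than one label; a minimal sketch with a deliberately duplicated label:

```python
import pandas as pd

# A Series with a duplicated index label ('iran' appears twice)
animals = pd.Series(['bear', 'dog', 'mammoth', 'mammoth'],
                    index=['canada', 'germany', 'iran', 'iran'])

# Boolean indexing returns every matching label, duplicates included
matches = animals[animals == 'mammoth'].index
print(list(matches))  # ['iran', 'iran']
```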

You have two options:
1) If the value is guaranteed to be unique, or you only want the first occurrence, take the first matching index label:
animals[animals == 'mammoth'].index[0]  # index label of first occurrence
2) If you would like all index labels matching that value, as per @juanpa.arrivillaga's post:
animals[animals == 'mammoth'].index # retrieves indices of all matching values
You can also pick out the nth occurrence by treating the result above as a list:
animals[animals == 'mammoth'].index[1]  # index label of second occurrence
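Putting the pieces together, a small sketch that grabs the first matching label and guards against the value being absent (the guard is my addition, not part of the answers above):

```python
import pandas as pd

animals = pd.Series(['bear', 'dog', 'mammoth', 'python'],
                    index=['canada', 'germany', 'iran', 'brazil'])

# Labels of all matches
hits = animals.index[animals == 'mammoth']

# First occurrence, or None if the value does not appear at all
first = hits[0] if len(hits) else None
print(first)  # iran
```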

Python Return all Columns [duplicate]

Using Python Pandas I am trying to find the Country & Place with the maximum value.
This returns the maximum value:
data.groupby(['Country','Place'])['Value'].max()
But how do I get the corresponding Country and Place name?
Assuming df has a unique index, this gives the row with the maximum value:
In [34]: df.loc[df['Value'].idxmax()]
Out[34]:
Country US
Place Kansas
Value 894
Name: 7
Note that idxmax returns index labels. So if the DataFrame has duplicates in the index, the label may not uniquely identify the row, so df.loc may return more than one row.
Therefore, if df does not have a unique index, you must make the index unique before proceeding as above. Depending on the DataFrame, sometimes you can use stack or set_index to make the index unique. Or, you can simply reset the index (so the rows become renumbered, starting at 0):
df = df.reset_index()
df[df['Value']==df['Value'].max()]
This will return the entire row with max value
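As a runnable sketch of that reset-then-filter approach, with made-up Country/Place/Value data standing in for the question's frame:

```python
import pandas as pd

# Hypothetical stand-in for the question's data, with a non-unique index
df = pd.DataFrame({'Country': ['US', 'UK', 'Spain'],
                   'Place': ['Kansas', 'London', 'Manchester'],
                   'Value': [894, 778, 512]},
                  index=[7, 7, 3])

df = df.reset_index(drop=True)              # rows renumbered 0, 1, 2
top = df[df['Value'] == df['Value'].max()]  # entire row(s) with the max value
print(top)
```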
I think the easiest way to return a row with the maximum value is by getting its position. In current pandas, argmax() returns the integer position of the row with the largest value (use idxmax() if you want the index label instead):
position = df.Value.argmax()
Now the position can be used with iloc to get the features for that particular row:
df.iloc[df.Value.argmax(), 0:2]
The country and place form the index of the resulting Series; if you don't want them as the index, you can set as_index=False:
df.groupby(['country','place'], as_index=False)['value'].max()
Edit:
It seems that you want the place with the max value for every country. irow was removed from pandas, so with the current API the following will do what you want:
df.groupby("country").apply(lambda g: g.loc[g["value"].idxmax()])
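A self-contained sketch of that per-country maximum, on made-up data with lowercase column names as in the answer:

```python
import pandas as pd

# Hypothetical data; column names assumed lowercase as in the answer
df = pd.DataFrame({'country': ['US', 'US', 'UK'],
                   'place': ['Kansas', 'NewYork', 'London'],
                   'value': [894, 562, 778]})

# For every country, keep the row holding that country's maximum value
best = df.groupby('country').apply(lambda g: g.loc[g['value'].idxmax()])
print(best)
```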
Use the index attribute of the DataFrame. Note that not all rows are shown in the example.
In [14]: df = data.groupby(['Country','Place'])['Value'].max()
In [15]: df.index
Out[15]:
MultiIndex([('Spain', 'Manchester'),
            (   'UK',     'London'),
            (   'US',    'Mchigan'),
            (   'US',    'NewYork')])
In [16]: df.index[0]
Out[16]: ('Spain', 'Manchester')
In [17]: df.index[1]
Out[17]: ('UK', 'London')
You can also get the value by that index:
In [21]: for index in df.index:
    ...:     print(index, df[index])
    ...:
('Spain', 'Manchester') 512
('UK', 'London') 778
('US', 'Mchigan') 854
('US', 'NewYork') 562
Edit
Sorry for misunderstanding what you want; try the following:
In [52]: s = data.max()
In [53]: print('%s, %s, %s' % (s['Country'], s['Place'], s['Value']))
US, NewYork, 854
In order to print the Country and Place with maximum value, use the following line of code.
print(df[['Country', 'Place']][df.Value == df.Value.max()])
You can use:
print(df[df['Value']==df['Value'].max()])
Using DataFrame.nlargest.
The dedicated method for this is nlargest, which uses algorithm.SelectNFrame under the hood, a performant way of doing sort_values().head(n). Given the DataFrame:
x y a b
0 1 2 a x
1 2 4 b x
2 3 6 c y
3 4 1 a z
4 5 2 b z
5 6 3 c z
df.nlargest(1, 'y')
x y a b
2 3 6 c y
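To reproduce that example end to end, a sketch building the same toy frame:

```python
import pandas as pd

# The toy frame shown above
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [2, 4, 6, 1, 2, 3],
                   'a': ['a', 'b', 'c', 'a', 'b', 'c'],
                   'b': ['x', 'x', 'y', 'z', 'z', 'z']})

top = df.nlargest(1, 'y')  # the single row with the largest 'y'
print(top)                 # row 2: x=3, y=6, a='c', b='y'
```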
import pandas as pd
Here df is the DataFrame you create. Use the command:
df1=df[['Country','Place']][df.Value == df['Value'].max()]
This will display the country and place whose value is maximum.
My solution for finding the rows holding the maximum values in columns (.ix was removed from pandas; use .loc):
df.loc[df.idxmax()]
and likewise the minimum:
df.loc[df.idxmin()]
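A runnable sketch of that per-column-maximum idea on made-up numeric data, with .loc substituted for the removed .ix:

```python
import pandas as pd

# Hypothetical numeric frame
df = pd.DataFrame({'Value': [894, 778, 512],
                   'Other': [1, 9, 5]})

# df.idxmax() gives, per column, the index label of that column's maximum;
# .loc then pulls out the corresponding rows
print(df.loc[df.idxmax()])
```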
I'd recommend using nlargest for better performance and shorter code.
df[col_name].value_counts().nlargest(n=1)
I encountered a similar error while trying to import data using pandas. The first column of my dataset had spaces before the start of the words; I removed the spaces and it worked like a charm!

How to do slicing in pandas Series through elements instead of indices in case they are similar

I have pandas Series like:
s = pd.Series([1,9,3,4,5], index = [1,2,5,3,9])
How can I obtain, say, the element '3', given that I do not know the exact elements in advance? I need to write a function that gets, say, the first element of the Series.
s[2] is interpreted as 'index = 2' rather than 'second element' when the Series has an integer index.
When I do not specify an index, slicing works fine by position.
But how can I prioritize positional slicing when positions overlap with index labels?
Like this, using boolean indexing:
s[s==3]
Given:
s = pd.Series([1,9,3,4,5], index = [1,2,5,3,9])
Let's find elements 3 and 9 using:
s[s.isin([9,3])]
Output:
2 9
5 3
dtype: int64
Update per comment below
Use iloc for integer location:
s.iloc[2]
Output:
3
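A short sketch contrasting label-based and position-based access on that Series:

```python
import pandas as pd

s = pd.Series([1, 9, 3, 4, 5], index=[1, 2, 5, 3, 9])

print(s.loc[2])   # label-based: the element labelled 2 -> 9
print(s.iloc[2])  # position-based: the third element -> 3
```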

Compare pandas series of Dataframe as a Whole and not element wise

Problem:
Accessing the same row of a DataFrame in two ways, I would like to compare whether the results are the same as a whole.
Data:
DATA link for copy and paste: API_link_to_data='https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'
energyDF = pd.read_csv(API_link_to_data)
row3_LOC = energyDF.loc[[3],:]
row3_ILOC = energyDF.iloc[[3],:]
This code compares element-wise:
row3_LOC == row3_ILOC
and returns a frame of booleans.
What I would like to get is a single True, since row3_LOC and row3_ILOC are the same.
Thanks
If you check, both row3_LOC and row3_ILOC are in turn DataFrames:
print(type(row3_LOC))
print(type(row3_ILOC))
results in:
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
You can check if they are equal using row3_ILOC.equals(row3_LOC). Refer to the equals function.
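A minimal sketch of the equals() check on a made-up frame (the question's CSV is swapped for inline data):

```python
import pandas as pd

# Hypothetical stand-in for the energy data
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

row3_LOC = df.loc[[2], :]    # label-based selection of one row
row3_ILOC = df.iloc[[2], :]  # position-based selection of the same row

# equals() compares shape, values and dtypes, and treats NaNs as equal
print(row3_LOC.equals(row3_ILOC))  # True
```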
You can compare the two using all():
(row3_LOC == row3_ILOC).all().all()
The first all() collapses the comparison column by column, and the second collapses it to a single boolean, so as soon as one of the values doesn't match you get False. You may also be interested in .any(), which checks whether at least one value is True.
Fill the NaNs with 'NULL' first:
energyDF = energyDF.fillna('NULL')
energyDF.loc[[3],:] == (energyDF.iloc[[3],:])
Date Consumption Wind Solar Wind+Solar
3 True True True True True
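The fillna step matters because NaN never compares equal to itself, so an element-wise == is False wherever both frames hold NaN; a quick sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan]})

# NaN != NaN, so the comparison fails in the NaN slot
print((df == df).all().all())          # False

# After filling the NaNs, the same comparison succeeds
filled = df.fillna('NULL')
print((filled == filled).all().all())  # True
```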

If statement over pandas.series and append result to list

I'm trying to build a few lists from the results. Could you tell me why this result is empty?
I'm not looking for a solution with numpy; that's why I'll originally create >50 lists and later save them to CSV.
import pandas as pd

df1 = pd.DataFrame(data={"Country": ["USA", "Germany", "Russia", "Poland"],
                         "Capital": ["Washington", "Berlin", "Moscow", "Warsaw"],
                         "Region": ["America", "Europe", "Europe", "Europe"]})
America = []
if (df1['Region']=='America').all():
America.append(df1)
print(America)
Your expression df1['Region']=='America' gives a so-called boolean mask (docs on boolean indexing). A boolean mask is a pandas Series of True and False whose index is lined up with the index of df1.
It's easy to get your expected values once you get used to boolean indexing:
df1[df1['Region']=='America']
Country Capital Region
0 USA Washington America
If you are interested in keeping entire rows, don't bother manually building a python list; that would complicate your work immensely compared to sticking to pandas. You can store the rows in a new DataFrame:
# Use df.copy() here so that changing America won't change df1
America = df1[df1['Region']=='America'].copy()
Why if (df1['Region']=='America').all(): didn't work
The Series.all() method checks whether all values in the Series are True. What you need to do here is to check each row for your condition df1['Region']=='America', and keep only those rows that match this condition (if I understand you correctly).
I'm not sure what you want.
If you want to add the whole DataFrame to the list when 'America' appears in Region:
for region in df1.Region :
if region == 'America':
America.append(df1)
If you want to add the elements from each column that sit at the same index as 'America' in the 'Region' column:
count = 0
for region in df1.Region :
if region == 'America':
America.append(df1.Country[count])
America.append(df1.Capital[count])
count += 1
Does this answer the question?

select series from transposed pandas dataframe

With a dataframe, called mrna as follows:
id Cell_1 Cell_2 Cell_3
CDH3 8.006 5.183 10.144
ERBB2 9.799 12.355 8.571
...
How can I select the ERBB2 row as a pandas series (if I don't know its index)?
I tried:
mrna.iloc['ERBB2'], but that only takes an integer and doesn't map to a string label.
I also tried:
mrna_t = mrna.transpose()
mrna_t['ERBB2']
but I get KeyError: 'ERBB2'
Pass a boolean condition to generate a boolean mask, this mask is used against the index and will return just the rows where the condition is met:
In [116]:
df[df['id']=='ERBB2']
Out[116]:
id Cell_1 Cell_2 Cell_3
1 ERBB2 9.799 12.355 8.571
Output from boolean condition:
In [117]:
df['id']=='ERBB2'
Out[117]:
0 False
1 True
Name: id, dtype: bool
As for your error: mrna_t['ERBB2'] will attempt to look for a column with that name, which doesn't exist, hence the KeyError.
If it was your index then you could just do:
df.loc['ERBB2']
To select index values matching a passed label, it's worth checking the docs, including the sections on selection by position and by label.
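A compact sketch of the label-based route, with the table from the question typed in as a toy frame:

```python
import pandas as pd

# Toy reconstruction of the mrna frame from the question
mrna = pd.DataFrame({'id': ['CDH3', 'ERBB2'],
                     'Cell_1': [8.006, 9.799],
                     'Cell_2': [5.183, 12.355],
                     'Cell_3': [10.144, 8.571]})

# Make 'id' the index; label-based .loc then returns the row as a Series
row = mrna.set_index('id').loc['ERBB2']
print(row)
```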
Just figured it out. I set the 'id' column as the index and then transposed, which allowed me to index by "ERBB2":
mrna.set_index('id').T
