Get DataFrame selection's row positions - python

Instead of the indices, I'd like to obtain the row positions, so I can use the result later with df.iloc[row_positions].
This is the example:
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])
print(df[df['a'] >= 2].index)
# Int64Index([2, 7], dtype='int64')
# How do I convert the index list [2, 7] to [1, 2] (the row positions)?
# I managed to do this for a single index element, but how can I do it for the entire selection/index list?
df.index.get_loc(2)
Update
I could use a list comprehension to apply the selected result on the get_loc function, but perhaps there's some Pandas-built-in function.

You can use where from numpy:
import numpy as np
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])
np.where(df.a >= 2)
returns row indices:
(array([1, 2], dtype=int64),)
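A close relative, np.flatnonzero, returns the positions directly as a flat array rather than a tuple, which feeds straight into df.iloc (a small sketch of the same idea):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])

# flatnonzero returns the integer positions where the boolean mask is True
positions = np.flatnonzero(df['a'] >= 2)
print(positions)           # [1 2]
print(df.iloc[positions])  # the rows at positions 1 and 2
```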

@ssm's answer is what I would normally use. However, to answer your specific query of how to select multiple rows, try this:
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])
indices = df[df['a']>=2].index
print(df.loc[indices])
More information on the .loc label-based indexing scheme is in the pandas indexing documentation. (The older .ix indexer is deprecated.)
[EDIT to answer the specific query]
How do I convert the index list [2, 7] to [1, 2] (the row position)
df[df['a']>=2].reset_index().index
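As for a pandas built-in that does this in one call: Index.get_indexer is the vectorized counterpart of get_loc, mapping a whole list of labels to their integer positions at once (a sketch):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])

labels = df[df['a'] >= 2].index        # the selected labels: [2, 7]
positions = df.index.get_indexer(labels)
print(positions)  # [1 2]
```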


How to change values in one column based on whether conditions in columns A and B are met in Pandas/Python

I have a data frame. I want to change values in column C to null values based on whether conditions in columns A and B are met. To do this, I think I need to iterate over the rows of the dataframe, but I can't figure out how:
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
I tried something like this:
for row in df.iterrows():
    if df['A'] > 2 and df['B'] == 3:
        df['C'] == np.nan
but I just keep getting errors. Could someone please show me how to do this?
Note that if you build df as a plain dictionary, it is not a DataFrame. This is a DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
It is usually faster to use pandas/numpy arithmetic instead of regular Python loops.
df.loc[(df['A'].values > 2) & (df['B'].values == 3), 'C'] = np.nan
Or if you insist on your way of coding, the code (besides converting df to a real DataFrame) can be updated to:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
for i, row in df.iterrows():
    if row.loc['A'] > 2 and row.loc['B'] == 3:
        df.loc[i, 'C'] = np.nan
or
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
for i, row in df.iterrows():
    if df.loc[i, 'A'] > 2 and df.loc[i, 'B'] == 3:
        df.loc[i, 'C'] = np.nan
You can try
df.loc[(df["A"].values > 2) & (df["B"].values == 3), "C"] = None
Using pandas and numpy is way easier for you :D (In a numeric column, pandas stores None as NaN.)
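Another built-in option, sketched here as a variation on the same idea: Series.mask replaces the values where the condition is True with NaN by default, so no explicit np.nan is needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})

# mask sets C to NaN wherever the condition holds, leaving other rows untouched
df['C'] = df['C'].mask((df['A'] > 2) & (df['B'] == 3))
print(df)
```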

How can I convert a matrix into characters?

classes = ['A', 'B', 'C']
my_data = [
    [2, 1, 3],
    [1, 1, 2],
    [3, 3, 3],
    [3, 1, 3],
    [3, 1, 3],
    [3, 3, 2]
]
Here, A = 1, B=2, and C=3.
Suppose, I want to first find the maximum value in each row of the matrix my_data, and then I want to convert them into characters from classes.
Can I do it in python without using loops?
The following source code is not working for me:
def prediction_to_name(pred):
    return classes[np.argmax(pred)]
You need to iterate over your data; otherwise you would have to repeatedly cut and paste result.append(classes[np.argmax(my_data[n])]) for each n in range(len(my_data)), which is just manually typing out the loop.
import numpy as np
classes = ['A', 'B', 'C']
my_data = [[2, 1, 3],
           [1, 1, 2],
           [3, 3, 3],
           [3, 1, 3],
           [3, 1, 3],
           [3, 3, 2]]
classifiedData = [classes[np.argmax(row)] for row in my_data]
print(classifiedData) # ['C', 'C', 'A', 'A', 'A', 'A']
Your indexing of the data is one-based whereas python is zero-based, so just subtract one so that they are equivalent.
>>> [classes[max(row) - 1] for row in my_data]
['C', 'B', 'C', 'C', 'C', 'C']
How about this?
print(np.array(classes)[np.max(my_data, axis=1) - 1])
The result:
['C' 'B' 'C' 'C' 'C' 'C']

Sort 2-D list by last character's frequency in python

I want to sort a 2-D list
t = [[3, 3, 3, 'a'], [2, 2, 2, 'b'], [1, 1, 1, 'b']] by each sublist's last character's frequency, in reverse.
Since 'b' appears 2 times and 'a' appears 1 time, the sorted list should be
t_sorted = [[2, 2, 2, 'b'], [1, 1, 1, 'b'], [3, 3, 3, 'a']]
I wrote the code:
def mine(a):
    return t.count(a[-1])
t = [[3, 3, 3, 'a'], [2, 2, 2, 'b'], [1, 1, 1, 'b']]
print(sorted(t, key=mine, reverse=True))
but it is not working. What is the right way to do it without using Python's Counter?
This is because t doesn't have any 'a's or any 'b's. It has lists which include 'a's and 'b's.
Just check it manually:
>>> t = [[3, 3, 3, 'a'], [2, 2, 2, 'b'], [1, 1, 1, 'b']]
>>> t.count('a')
0
The least confusing way is to build a (proper) count of those last elements: take only the last element of each sublist and convert them to a Counter:
from collections import Counter
t = [[3, 3, 3, 'a'], [2, 2, 2, 'b'], [1, 1, 1, 'b']]
my_count = Counter(elem[-1] for elem in t)
Now we can use our Counter object to be our position:
print(sorted(t, key=lambda x: my_count[x[-1]], reverse=True))
Here's an (inferior) solution that doesn't use Counter:
def mine(a):
    return [x[-1] for x in t].count(a[-1])
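If Counter is off the table, you can still precompute the frequencies once with a plain dict instead of recounting inside the key function on every comparison (a sketch of the same idea; sorted is stable, so ties keep their original order):

```python
t = [[3, 3, 3, 'a'], [2, 2, 2, 'b'], [1, 1, 1, 'b']]

# build the frequency table once with an ordinary dict
counts = {}
for row in t:
    counts[row[-1]] = counts.get(row[-1], 0) + 1

t_sorted = sorted(t, key=lambda row: counts[row[-1]], reverse=True)
print(t_sorted)  # [[2, 2, 2, 'b'], [1, 1, 1, 'b'], [3, 3, 3, 'a']]
```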

Convert DataFrame into multi-dimensional array with the column names of DataFrame

Below is the DataFrame I want to action upon:
df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [2, 2, 3],
                   'C': [4, 5, 4]})
Each row of df creates a unique key. Objective is to create the following list of multi-dimensional arrays:
parameter = [[['A', 1],['B', 2], ['C', 4]],
[['A', 1],['B', 2], ['C', 5]],
[['A', 1],['B', 3], ['C', 4]]]
The problem is related to this question, where I have to iterate over the parameters, but instead of providing them to my function manually, I have to put all parameters from the df rows into a list.
You could use the following list comprehension, which zips the column names with the values of each row:
[list(map(list, zip(df.columns, row))) for row in df.values.tolist()]
[[['A', 1], ['B', 2], ['C', 4]],
 [['A', 1], ['B', 2], ['C', 5]],
 [['A', 1], ['B', 3], ['C', 4]]]

Selecting different rows from different GroupBy groups

As opposed to GroupBy.nth, which selects the same index for each group, I would like to take specific indices from each group. For example, if my GroupBy object consisted of four groups and I would like the 1st, 5th, 10th, and 15th from each respectively, then I would like to be able to pass x = [0, 4, 9, 14] and get those rows.
This is kind of a strange thing to want; is there a reason?
In any case, to do what you want, try this:
df = pd.DataFrame([['a', 1], ['a', 2],
                   ['b', 3], ['b', 4], ['b', 5],
                   ['c', 6], ['c', 7]],
                  columns=['group', 'value'])

def index_getter(which):
    def get(series):
        return series.iloc[which[series.name]]
    return get

which = {'a': 0, 'b': 2, 'c': 1}
df.groupby('group')['value'].apply(index_getter(which))
Which results in:
group
a    1
b    5
c    7
Name: value, dtype: int64
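A vectorized alternative (a sketch, assuming the same per-group targets as above): compare each row's within-group position from cumcount against the target mapped from its group, and keep the matches:

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['a', 2],
                   ['b', 3], ['b', 4], ['b', 5],
                   ['c', 6], ['c', 7]],
                  columns=['group', 'value'])

which = {'a': 0, 'b': 2, 'c': 1}

# position of each row within its own group: 0, 1, 0, 1, 2, 0, 1
pos = df.groupby('group').cumcount()

# keep rows whose within-group position equals the target for that group
result = df[pos == df['group'].map(which)]
print(result)
```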
