See if a value exists in a DataFrame - python

In Python to check if a value is in a list you can simply do the following:
>>> 9 in [1,2,3,6,9]
True
I would like to do the same for a Pandas DataFrame but unfortunately Pandas does not recognise that notation:
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3,4],[5,6,7,8]],columns=["a","b","c","d"])
>>> df
   a  b  c  d
0  1  2  3  4
1  5  6  7  8
>>> 7 in df
False
How would I achieve this with a Pandas DataFrame without iterating through each column/row or doing anything complicated?

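The in operator on a DataFrame looks at the column labels, so you have to check the underlying values (a NumPy array) instead:
7 in df.values
x in df checks whether x is one of the column labels, which is also what iterating over df yields:
for x in df:
    print(x, end=" ")
out: a b c d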
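A short sketch contrasting the membership checks discussed above. df.isin and the element-wise comparison df == 7 are standard pandas alternatives that the answer does not mention; both return a boolean frame, which .any().any() reduces to a single value.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "c", "d"])

# 'in' on a DataFrame tests the column labels, not the cell values
print(7 in df)            # False
print("c" in df)          # True

# 'in' on the underlying NumPy array tests the cell values
print(7 in df.values)     # True

# Element-wise alternatives: build a boolean frame, then reduce it to one bool
print(bool(df.isin([7]).any().any()))   # True
print(bool((df == 7).any().any()))      # True
```

df.isin also accepts several values at once (for example df.isin([7, 42])), which is handy when checking for any of a set of values.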

Related

How to convert the first column of a dataframe into its headers

I have dataframe df:
0
0 a
1 b
2 c
3 d
4 e
The output should be:
a b c d e
0
1
2
3
4
5
I want the column containing (a, b, c, d, e) to become the header of my dataframe.
Could anyone help?
If your dataframe is a pandas DataFrame named df, try solving it with pandas:
first convert the content of the initial df to a list, then create a new dataframe whose columns are defined by that list.
import pandas as pd
cols = df[0].tolist()  # df[0] is the content of the first column; 'cols' avoids shadowing the built-in list
dfSolved = pd.DataFrame(columns=cols)  # empty frame with a, b, c, d, e as headers
You could provide more details, such as the index and values of the expected output or the operation you want to do, so that we can give a solution specific to your case.
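A self-contained sketch of the list-to-headers approach, using a sample frame built to match the question (a single column labelled 0); the name df_headers is illustrative only.

```python
import pandas as pd

# Sample frame matching the question: one column labelled 0 holding a..e
df = pd.DataFrame({0: ["a", "b", "c", "d", "e"]})

# Take the first column's values and use them as the new column labels
cols = df[0].tolist()
df_headers = pd.DataFrame(columns=cols)

print(df_headers.columns.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(len(df_headers))              # 0, the new frame has headers but no rows
```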
Here is another solution:
import pandas as pd
import io
import numpy as np
data_string = """ columns_name
0 a
1 b
2 c
3 d
4 e
"""
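df = pd.read_csv(io.StringIO(data_string), sep=r'\s+')  # raw string avoids an invalid-escape warning
# Solution: a single row of NaN values, one per header, using the former column values as headers
df_result = pd.DataFrame(data=[[np.nan] * len(df)],
                         columns=df['columns_name'].tolist())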

Changing each list type element of a dataframe to second element of the list

I have a dataframe whose elements are lists. I want to use pandas or numpy to replace
each element (each list) of this dataframe with the second element of that list. How can I do that?
I wrote the below code to make a trial dataframe.
import pandas as pd
df = pd.DataFrame({'a':[[1,7],[0,5]],'b':[[3,1],[4,0]],'c':[[1,4],[2,0]]})
My df looks like:
a b c
0 [1,7] [3,1] [1,4]
1 [0,5] [4,0] [2,0]
Now I want df to be changed like shown below:
a b c
0 7 1 4
1 5 0 0
I tried the replace function, lambda functions, etc., but nothing worked.
I don't want to use loops or anything that is slow to run.
Here is one way to do it, using applymap:
df.applymap(lambda x: x[1])
a b c
0 7 1 4
1 5 0 0
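Since pandas 2.1, applymap is deprecated and the same element-wise operation is called DataFrame.map. The sketch below picks whichever exists, and also shows a vectorised NumPy alternative that avoids calling a Python function per cell; that alternative assumes every cell holds a list of the same length.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[1, 7], [0, 5]], 'b': [[3, 1], [4, 0]], 'c': [[1, 4], [2, 0]]})

# pandas >= 2.1 provides DataFrame.map (applymap is deprecated there);
# older versions only have applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
second = elementwise(lambda x: x[1])

# Vectorised alternative: stack the lists into a 3-D array of shape
# (rows, columns, list items) and slice out item 1 for every cell.
stacked = np.array(df.values.tolist())
second_np = pd.DataFrame(stacked[:, :, 1], index=df.index, columns=df.columns)

print(second)
print(second_np)
```

If the lists have different lengths, np.array cannot build a regular 3-D array, so the element-wise version is the safer choice.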

Iterating Conditions through Pandas .loc

I just wanted to ask the community whether there is a more efficient way to do this.
I have several columns in a data frame and I am using .loc to filter values in column A so that I can perform calculations on column B.
I can easily do something like...
filter_1 = df.loc[df['A'] == 1, 'B']
And then perform the mathematical calculation I need on column B.
But there are many conditions I must go through, so I was wondering if I could make a list of the conditions and then iterate them through the .loc function in fewer lines of code?
Would something like this work where I create a list, then iterate the conditions through a loop?
Thank you!
This example gets most of what I want. I just need it to show 6.4 and 7.0 in this example. How can I change the iteration so that it shows the results only for the unique values in column 'a'?
import pandas as pd
a = [1,2,1,2,1,2,1,2,1,2]
b = [5,1,3,5,7,20,9,5,8,4]
col = ['a', 'b']
list_1 = []
for i, j in zip(a,b):
    list_1.append([i,j])
df1 = pd.DataFrame(list_1, columns= col)
for i in a:
    aa = df1[df1['a'].isin([i])]
    aa1 = aa['b'].mean()
    print (aa1)
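For the general pattern in the question (feeding a list of conditions through .loc), a minimal sketch; the column names A and B, the sample data and the conditions list are hypothetical, and sum stands in for whatever calculation is needed on column B.

```python
import pandas as pd

# Hypothetical data and conditions, for illustration only
df = pd.DataFrame({"A": [1, 2, 1, 3, 2], "B": [10, 20, 30, 40, 50]})
conditions = [1, 2, 3]

results = {}
for value in conditions:
    # Boolean mask on A selects the rows; 'B' selects the column to compute on
    subset = df.loc[df["A"] == value, "B"]
    results[value] = int(subset.sum())

print(results)  # {1: 40, 2: 70, 3: 40}
```

When the conditions are simply "each distinct value of A", groupby does the same work without an explicit loop, as the answers below show.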
Solution using set:
unique_a = sorted(set(a))  # each value of a exactly once, in ascending order
for i in unique_a:
    aa = df1[df1['a'].isin([i])]
    aa1 = aa['b'].mean()
    print(aa1)
Solution using the pandas groupby and mean functions:
Is this what you are looking for?
import pandas as pd
a = [1,2,1,2,1,2,1,2,1,2]
b = [5,1,3,5,7,20,9,5,8,4]
df = pd.DataFrame({'a':a,'b':b})
print(df)
print(df.groupby('a').mean())
The results from this are:
Original Dataframe df:
a b
0 1 5
1 2 1
2 1 3
3 2 5
4 1 7
5 2 20
6 1 9
7 2 5
8 1 8
9 2 4
The mean of df['b'] for each value of df['a'] is:
b
a
1 6.4
2 7.0
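Selecting column b before calling mean gives a Series indexed by the unique values of a, which is convenient to loop over or to look up by key. A minimal sketch using the same data:

```python
import pandas as pd

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [5, 1, 3, 5, 7, 20, 9, 5, 8, 4]
df = pd.DataFrame({"a": a, "b": b})

# Series of per-group means, indexed by each unique value in column 'a'
means = df.groupby("a")["b"].mean()

# Iterate over (group key, mean) pairs
for key, value in means.items():
    print(key, value)

# Look up a single group's mean by its key
print(means.loc[2])
```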
Here you go:
df = df[(df['A'] > 1) & (df['A'] < 10)]
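The parentheses around each comparison are required because & binds more tightly than > and <. A small sketch with hypothetical data, also showing Series.between, which is inclusive of both ends by default:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 2, 5, 10, 12]})  # hypothetical sample data

# Strict bounds: each comparison is parenthesised before combining with &
strict = df[(df["A"] > 1) & (df["A"] < 10)]

# Series.between keeps both endpoints by default, so 10 survives here
inclusive = df[df["A"].between(1, 10)]

print(strict["A"].tolist())     # [2, 5]
print(inclusive["A"].tolist())  # [2, 5, 10]
```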


Query on Pandas DataFrame

I'm new to Python and I'm trying to get a subset of rows/columns from a DataFrame:
In [1]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
In [2]:
example=DataFrame(np.random.rand(6,5),columns=['a','b','c','d','e'])
In [3]:
example['a'] = [2, 4, 6, 8, 10, 12]  # a list, not a set: sets are unordered and pandas rejects them as column values
In [4]:
example
Out[4]:
a b c d e
0 2 0.225608 0.023888 0.535053 0.953350
1 4 0.803721 0.741708 0.256522 0.062574
2 6 0.354936 0.597274 0.801495 0.763515
3 8 0.204974 0.870951 0.220088 0.446273
4 10 0.673855 0.693210 0.494213 0.842049
5 12 0.516609 0.038669 0.972165 0.183945
In [5]:
example[['a','b','d','e']].query('a==10')
Out[5]:
a b d e
4 10 0.673855 0.494213 0.842049
In [6]:
example[['b','d','e']].query('a==10')
.....
UndefinedVariableError: name 'a' is not defined
The first query worked, but the second one raised an error. Do you know why this error shows up? Thank you very much.
In example[['b','d','e']] you only have a subset of example that doesn't include column a.
To get columns such as b, d and e from the row where a==10, swap the order of the query and the column selection: query first, which returns only the matching row (column a is still available at that point), and then select the columns from that row:
In[113]: example.query('a==10')[['b','c','d']]
Out[113]:
b c d
4 0.439672 0.181699 0.770421
When you create the second selection example[['b','d','e']], you effectively drop 'a' from the dataframe:
example[['b','d','e']]
b d e
0 0.910757 0.565006 0.284420
1 0.601034 0.697879 0.983803
2 0.516938 0.829621 0.471825
3 0.896217 0.663177 0.093502
4 0.277488 0.796543 0.643166
5 0.594420 0.759634 0.164800
So you're trying to access a column that doesn't exist. In other words, if you want to query a column from a dataframe, you need to include it in your selection before querying it.
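A minimal sketch of the fix with deterministic sample values (the numbers are made up for illustration), showing query-then-select alongside the equivalent .loc call that filters rows and selects columns in one step:

```python
import pandas as pd

# Hypothetical fixed values in place of np.random.rand, so the output is repeatable
example = pd.DataFrame({
    "a": [2, 4, 6, 8, 10, 12],
    "b": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "d": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    "e": [2.1, 2.2, 2.3, 2.4, 2.5, 2.6],
})

# Query first, while column 'a' still exists, then keep only b, d and e
via_query = example.query("a == 10")[["b", "d", "e"]]

# Equivalent with .loc: boolean mask on 'a' for the rows, list of labels for the columns
via_loc = example.loc[example["a"] == 10, ["b", "d", "e"]]

print(via_query)
print(via_loc)
```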
