Suppose I have a dataframe.
When comparing a column in the dataframe with a list, I want to label the rows whose value in that column appears in the list.
For example, compare the 'Name' column with my list, e.g. list = ['Y', 'B'].
Rows where 'Name' is Y or B should be labeled '0'.
How can I write code that applies this condition?
(The list is much shorter than the column.)
Use numpy.where with Series.isin:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Name':list('KYBBC')})
L = ['Y','B']
df['Label'] = np.where(df['Name'].isin(L), '0', '')
print (df)
  Name Label
0    K
1    Y     0
2    B     0
3    B     0
4    C
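The two steps in the answer above can be looked at separately. This is a minimal sketch on the same sample data; the name `wanted` is only an illustrative stand-in for the list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": list("KYBBC")})
wanted = ["Y", "B"]  # hypothetical name for the lookup list

# Step 1: isin returns one boolean per row (True where Name is in the list).
mask = df["Name"].isin(wanted)

# Step 2: np.where picks '0' where the mask is True and '' elsewhere.
df["Label"] = np.where(mask, "0", "")

print(mask.tolist())
print(df["Label"].tolist())
```

Keeping the mask in its own variable also makes it easy to reuse, for example `df[mask]` to select only the labeled rows.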
I am working on a graph problem and want to drop the rows where two nodes A and B are connected twice:
A to B
B to A.
Could you help me with that, please?
I have a dataframe data:
Column A  Column B
value 1   value 2
value 1   value 3
value 2   value 3
value 2   value 1
I want to extract a dataframe of all the cases where both of these rows are present:
Column A  Column B
value i   value j
value j   value i
In our example:
Column A  Column B
value 1   value 2
value 2   value 1
Thank you very much!
I tried looping and building lists, but it is slow and not very elegant:
`l=[]
indexes=[]
for i in data['aretes']:
    l.append([list(data[data['aretes']==i]['column A'])[0],list(data[data['aretes']==i]['column B'])[0]])
index = 0
for j in l:
    index+=1
    h=[j[1],j[0]]
    if h in l:
        indexes.append(index)`
If you want to extract all the rows in the dataframe that are duplicated, I would first build an id from the set of the two nodes, so that A to B and B to A are meant to get the same id:
df["id"] = df.apply(lambda x: str(set([x['a'],x['b']])),axis=1)
Then you can use the duplicated function to keep only the rows whose id appears more than once:
df[df.duplicated(["id"],keep=False)]
Results:
         a        b                      id
0  Value 1  Value 2  {'Value 1', 'Value 2'}
1  Value 2  Value 1  {'Value 1', 'Value 2'}
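One thing to be aware of with the id above: the string form of a Python set does not promise a fixed element order, so two equal sets can in principle print differently. A minimal sketch of a variant that sorts the pair explicitly is shown here; the sample rows are taken from the question's table and the variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["Value 1", "Value 1", "Value 2", "Value 2"],
    "b": ["Value 2", "Value 3", "Value 3", "Value 1"],
})

# Build an order-independent key: (A, B) and (B, A) both become the same sorted tuple.
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df["a"], df["b"])],
    index=df.index,
)

# keep=False marks every member of a duplicated group, not only the repeats.
both_directions = df[key.duplicated(keep=False)]
print(both_directions)
```

Tuples are hashable, so `duplicated` can compare them directly and no helper column has to be added to the dataframe.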
Convert each row to a frozenset (a plain set is not hashable, so duplicated cannot compare it), then keep the duplicated rows:
>>> df[df[['Column A', 'Column B']].agg(frozenset, axis=1).duplicated(keep=False)]
  Column A Column B
0  value 1  value 2
3  value 2  value 1
Caveat: if the same row, e.g. (value 1, value 2), appears twice, both copies will also be extracted. A solution with NetworkX is also possible.
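If rows repeated in the same direction should not count, one possible alternative is to merge the edge list with a copy of itself whose columns are swapped. This is only a sketch under that assumption; the extra fifth row and the names `edges`, `swapped` and `reciprocal` are illustrative additions.

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": ["value 1", "value 1", "value 2", "value 2", "value 1"],
    "Column B": ["value 2", "value 3", "value 3", "value 1", "value 3"],
})
# The last row repeats (value 1, value 3) in the same direction.

# Drop exact repeats so the merge below does not multiply rows.
edges = df.drop_duplicates()

# Swap the two column labels: each row (A, B) is now presented as (B, A).
swapped = edges.rename(columns={"Column A": "Column B", "Column B": "Column A"})

# An inner merge keeps a row only if its reversed counterpart exists.
reciprocal = edges.merge(swapped, on=["Column A", "Column B"])
print(reciprocal)
```

Here (value 1, value 3) is left out even though it appears twice, because (value 3, value 1) never occurs. Note that the merge returns a fresh 0-based index rather than the original row labels.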
I have a pandas DataFrame and want to select the column with the most unique values.
I already counted the unique values with nunique(). How can I now choose the column with the highest nunique()?
This is my code so far:
numeric_columns = df.select_dtypes(include = (int or float))
unique = []
for column in numeric_columns:
    unique.append(numeric_columns[column].nunique())
Later I need to filter all the columns of my dataframe based on this column (the one with the most unique values).
Use DataFrame.select_dtypes with np.number to keep the numeric columns, then call DataFrame.nunique and pick the column with the largest count with Series.idxmax:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[1,2,3,4],'b':[1,2,2,2], 'c':list('abcd')})
print (df)
   a  b  c
0  1  1  a
1  2  2  b
2  3  2  c
3  4  2  d
numeric = df.select_dtypes(include = np.number)
nu = numeric.nunique().idxmax()
print (nu)
a
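A side note on why np.number is used instead of the question's `(int or float)`: in Python, `int or float` evaluates to `int`, so float columns are never selected. A small sketch with an assumed sample frame that contains a float column illustrates the difference.

```python
import numpy as np
import pandas as pd

# Assumed sample data: 'b' is a float column with the most unique values.
df = pd.DataFrame({
    "a": [1, 2, 2, 2],
    "b": [0.1, 0.2, 0.3, 0.4],
    "c": list("abcd"),
})

# `or` returns its first truthy operand, so this is just `int`.
print((int or float) is int)

# Selecting with the question's expression keeps only the integer column.
int_only = df.select_dtypes(include=(int or float))
print(list(int_only.columns))

# np.number covers both integer and float columns.
numeric = df.select_dtypes(include=np.number)
most_unique = numeric.nunique().idxmax()
print(most_unique)
```

With np.number the float column 'b' is considered and wins; with the original expression it would have been silently ignored.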
The dictionary dict_set maps keys to dataframes.
I'm trying to extract rows from each dataframe in the dictionary by filtering on the value in column 'A'.
dict_set={}
dict_set['a']=pd.DataFrame({'A':[1,2,3],'B':[1,2,3]})
dict_set['b']=pd.DataFrame({'A':[1,4,5],'B':[1,5,6]})
df=pd.concat([dict_set[x][dict_set[x]['A']==1] for x in dict_set.keys()],axis=0)
The output is:
   A  B
0  1  1
0  1  1
But I want the output to be:
   A  B  x
0  1  1  a
0  1  1  b
Basically, I want the dictionary key x to appear as a column (say, column x) in the combined dataframe, so that df['x'] gives me the keys. Is there a simple way to do this?
Try this:
pd.concat([df.query("A == 1") for df in dict_set.values()], keys=dict_set.keys())\
.reset_index(level=0)\
.rename(columns={'level_0':'x'})
Output:
   x  A  B
0  a  1  1
0  b  1  1
Details:
Get the dataframes from the dictionary with a list comprehension and filter each one. Here I chose query, but boolean indexing with df[df['A'] == 1] works too. Then pass the dictionary keys to pd.concat through the keys parameter. Lastly, call reset_index with level=0 and rename the resulting level_0 column to x.
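As a possible alternative to the keys/reset_index route described above, the key can be attached to each filtered frame with DataFrame.assign before concatenating. This is a sketch on the question's data; `key` and `frame` are illustrative loop names.

```python
import pandas as pd

dict_set = {
    "a": pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}),
    "b": pd.DataFrame({"A": [1, 4, 5], "B": [1, 5, 6]}),
}

# Filter each frame, then add its dictionary key as a constant column 'x'.
df = pd.concat(
    [frame[frame["A"] == 1].assign(x=key) for key, frame in dict_set.items()],
    axis=0,
)
print(df)
```

This puts x as the last column, which matches the column order asked for in the question.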
I want something like this. Given this dataframe:
Index  Sentence
0      I
1      want
2      like
3      this
I want the index of each keyword:
Keyword  Index
want     1
this     3
I tried df.index("Keyword"), but it does not give a result for all the rows. It would be really helpful if someone could solve this.
Use isin with boolean indexing:
df = df[df['Sentence'].isin(['want', 'this'])]
print (df)
   Index Sentence
1      1     want
3      3     this
EDIT: If you need to compare against another column:
df = df[df['Sentence'].isin(df['Keyword'])]
#another DataFrame df2
#df = df[df['Sentence'].isin(df2['Keyword'])]
And if you need the index values:
idx = df.index[df['Sentence'].isin(df['Keyword'])]
#alternative
#idx = df[df['Sentence'].isin(df['Keyword'])].index
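To get a result shaped like the Keyword/Index table in the question, the filtered rows can be turned into a small two-column frame. This is a minimal sketch; the keyword list and the names `mask` and `result` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Sentence": ["I", "want", "like", "this"]})
keywords = ["want", "this"]  # assumed keyword list

# Boolean mask of the rows whose Sentence is one of the keywords.
mask = df["Sentence"].isin(keywords)

# Name the index, move it into a column, and relabel to match the question.
result = (
    df.loc[mask, "Sentence"]
    .rename_axis("Index")
    .reset_index()
    .rename(columns={"Sentence": "Keyword"})[["Keyword", "Index"]]
)
print(result)
```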
For a dataframe, how do I write a conditional statement that assigns a new value based on the existing value in a column?
If the value in the column is a string with length > 0, assign 0.
If the value in the column is None (NoneType), assign 1.
I am trying to build a counter of how many rows are missing a value, based on string length.
I can convert the series to a list and do the test there, but I would like to learn how to do this within the dataframe itself.
Dataframe.Series
df['old']   df['old'] (after)
String A    0
String B    0
String C    0
None        1
String D    0
String E    0
None        1
#So that I can sum the df['old'](after) to get counter value
Sum         2
Are you just trying to see how many None values you have?
You could just do this:
import pandas as pd
df = pd.DataFrame(['a', 'b', None, 'q'], columns=['old'])
df['old'].isnull().sum()
Out[37]:
1
If you want to convert the strings to 1 values and the None to 0, you can apply a lambda function:
import pandas as pd
x = pd.DataFrame(['S', 'X', 'Z', None, 'B'])
x[0] = x[0].apply(lambda x: 1 if x else 0)
Then, to count the values which are one, you could use sum:
x[0].sum()
For a fast vectorized solution, just use the isnull method and multiply by 1 to convert to an integer.
df = pd.DataFrame({'col' :['a','b',None, None, 'sdaf']})
df['count'] = df.col.isnull() * 1
output:
    col  count
0     a      0
1     b      0
2  None      1
3  None      1
4  sdaf      0
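Putting the vectorized approach together with the mapping asked for in the question (strings become 0, None becomes 1), a short sketch could look like this. The sample values and the column name `missing` are assumptions; `astype(int)` is used here as an equivalent of multiplying by 1.

```python
import pandas as pd

df = pd.DataFrame({"old": ["String A", "String B", None, "String D", None]})

# isna() is True for None/NaN; astype(int) turns True/False into 1/0.
df["missing"] = df["old"].isna().astype(int)

# Summing the flags gives the number of rows with a missing value.
total_missing = int(df["missing"].sum())
print(df)
print(total_missing)
```

Note that this only flags None/NaN; an empty string '' is not treated as missing by isna.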