I have a list of list and a dataframe df:
test_list=[[A,B,C],[A,B,D],[A,B,E],[F,G]]
and dataframe is
ID
B
C
D
E
The elements of each sublist represent a hierarchy. I want to create a new column in the dataframe whose value is each element's parent.
My final DataFrame should look like:
value parent
B A
C B
D B
E B
I have a very large dataset, and test_list is also very large.
As per my comments on using a dictionary, here's the code.
import pandas as pd

test_list = [["A","B","C"],["A","B","D"],["A","B","E"],["F","G"]]

# Map each element to its parent (the element just before it in its sublist).
# Named "parents" rather than "dict" to avoid shadowing the built-in.
parents = {}
for sublist in test_list:
    for n, elem in enumerate(sublist):
        if n != 0:
            parents[elem] = prev
        prev = elem

df = pd.DataFrame(list(parents.items()), columns=['element', 'parent'])
df.set_index('element', inplace=True)
print(df)
giving the following output.
parent
element
B A
C B
D B
E B
G F
You could use a dictionary. Here is a working example:
import pandas as pd

df = pd.DataFrame({'ID': ['B', 'C', 'D', 'E']})
test_list = [['A','B','C'],['A','B','D'],['A','B','E'],['F','G']]

# Each element's parent is the element immediately before it in a sublist
parent = {}
for element in test_list:
    for i in range(len(element) - 1):
        parent[element[i + 1]] = element[i]

df['parent'] = [parent[x] for x in df['ID']]
print(df)
ID parent
0 B A
1 C B
2 D B
3 E B
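Since the question mentions very large inputs, here is a sketch of the same dictionary idea that builds the child-to-parent pairs with zip and looks them up with Series.map instead of a Python list comprehension. This variant is not from either answer. It assumes, like the loops above, that each element has a single parent (a later sublist overwrites an earlier one), and IDs that have no parent come out as NaN instead of raising a KeyError.

```python
import pandas as pd

test_list = [['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['F', 'G']]
df = pd.DataFrame({'ID': ['B', 'C', 'D', 'E']})

# zip(sub, sub[1:]) yields consecutive (parent, child) pairs within each sublist
parents = {child: parent
           for sub in test_list
           for parent, child in zip(sub, sub[1:])}

# Series.map looks each ID up in the dict; IDs without a parent become NaN
df['parent'] = df['ID'].map(parents)
print(df)
```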
I have a data frame as follows:
col1 col2 col3
A    E    C
A         D
D    B
A    D    E
A         C
And list answer_key = ["A", "B", "C"].
I want to compare the values of each column to the list's values in sequence.
The score should follow this rule: no response = 0, correct answer = 5, incorrect answer = -5. I'd also like the total score.
This sounds like a homework question, so I will only provide a sketch to point you in the right direction. I am also assuming that you want to compare the contents of each column to your answer_key, and that the columns won't be added to dynamically.
# Create a list with your keys (you already did this)
answer_key = ["A", "B", "C"]
# Create three separate lists for each column (col1, col2, col3)
# Also use something as a default value for values that are empty
# Ex1: col2 = ['E', None, 'B']
# Ex2: col2 = ['E', '0', 'B'] - either of these methods could work
# (example values: the first three rows of the data frame, None for empty cells)
col1 = ['A', 'A', 'D']
col2 = ['E', None, 'B']
col3 = ['C', 'D', None]
# Create a dictionary to reference these lists
cols = {0: col1, 1: col2, 2: col3}
# Create a variable to store the entire score
score = 0
# Use nested loops to iterate through each column & each value
for i in range(3):
    # temporarily cache a list object for reference
    curList = cols.get(i)
    # Compare contents of the key and list
    for c in range(len(answer_key)):
        if curList[c] is None:
            # no response (or whatever value you use for null): score += 0
            continue
        elif answer_key[c] == curList[c]:
            score += 5
        else:
            score -= 5
Try:
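import numpy as np
import pandas as pd

answer_key = np.array(["A", "B", "C"])
# Per row: +5 for each match, -5 for each mismatch (NaN never matches),
# then add 5 back for each NaN so that unanswered cells score 0
df['score'] = df.apply(lambda x: np.where(x.to_numpy() == answer_key, 5, -5).sum() + 5 * x.isna().sum(), axis=1)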
OUTPUT:
col1 col2 col3 score
0 A E C 5
1 A NaN D 0
2 D B NaN 0
3 A D E -5
4 A NaN C 10
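A vectorized sketch of the same scoring rule without apply, which also computes the total score the question asks for. It rebuilds the example frame from the question (blank cells as NaN) and compares every row against answer_key at once through NumPy broadcasting; the names answered, correct and points are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'D', 'A', 'A'],
                   'col2': ['E', np.nan, 'B', 'D', np.nan],
                   'col3': ['C', 'D', np.nan, 'E', 'C']})
answer_key = np.array(['A', 'B', 'C'], dtype=object)

values = df.to_numpy()               # shape (rows, 3), object dtype
answered = df.notna().to_numpy()     # False where the cell is empty
correct = values == answer_key       # row-wise comparison via broadcasting

# +5 for a correct answer, -5 for a wrong one, 0 for no response
points = np.where(correct, 5, np.where(answered, -5, 0))
df['score'] = points.sum(axis=1)
total = int(df['score'].sum())
print(df)
print('total:', total)
```

For the example data the per-row scores match the apply version above, and the total is their sum.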
I have a dataframe that encodes the last value of row 'this' in row 'last'. I want to match the column 'this' in the table according to value in a list, e.g. ['b', 'c'] and then change the preceding row's 'this', as well as this row's 'last' to the value 'd' on such a match.
For example, I want to change this:
this last
a
b    a
a    b
c    a
a    c
Into this:
this last
d
b    d
d    b
c    d
a    c
This is straightforward if iterating, but too slow:
# assumes the default RangeIndex (labels 0..n-1)
for i, v in df['this'].items():
    if v in ['b', 'c']:
        df.loc[i - 1, 'this'] = 'd'
        df.loc[i, 'last'] = 'd'
I believe this can be done by assigning df.this.shift(-1) to column 'last', however I'm not sure how to do this when I'm matching values in the list ['b', 'c']. How can I do this without iterating?
df
this last
0 a NaN
1 b a
2 a b
3 c a
4 a c
You can use isin to get a boolean index of the rows whose this value is in the list (l1). Use it to set the corresponding last to d, then shift the boolean index up by one row to set the preceding rows' this to d.
l1 = ['b', 'c']
this_in_l1 = df['this'].isin(l1)
df.loc[this_in_l1, 'last'] = 'd'
df.loc[this_in_l1.shift(-1, fill_value=False), 'this'] = 'd'
df
this last
0 d NaN
1 b d
2 d b
3 c d
4 a c
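The same idea can also be written with Series.mask, which replaces values wherever a boolean condition is True. This is a minimal sketch rebuilding the example frame from the question; the point to note is that both masks are computed from the original this column before anything is overwritten.

```python
import pandas as pd

df = pd.DataFrame({'this': ['a', 'b', 'a', 'c', 'a'],
                   'last': [None, 'a', 'b', 'a', 'c']})
l1 = ['b', 'c']

hit = df['this'].isin(l1)                     # rows whose 'this' is in l1
before_hit = hit.shift(-1, fill_value=False)  # the row just before each hit

# Series.mask replaces values where the condition is True
df['last'] = df['last'].mask(hit, 'd')
df['this'] = df['this'].mask(before_hit, 'd')
print(df)
```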
I'm trying to convert every other (alternate) column name of a DataFrame with 6 columns to uppercase.
input :
df.columns[::2].str.upper()
Output :
Index(['FIRST_NAME', 'AGE_VALUE', 'MOB_#'], dtype='object')
Now I want to apply this to the DataFrame:
input : df.columns= df.columns[::2].str.upper()
ValueError: Length mismatch: Expected axis has 6 elements, new values have 3 elements
You can use rename with a dictionary that maps each alternate column name to its uppercase version:
df
a b c d e f
0 a b c d e f
column_names = dict(zip(df.columns[::2], df.columns[::2].str.upper()))
column_names
{'a': 'A', 'c': 'C', 'e': 'E'}
df = df.rename(columns=column_names)
df
A b C d E f
0 a b c d e f
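Alternatively, the Length mismatch error goes away if you assign a list with all six names, upper-casing only the even positions. A sketch using the same a..f example frame; because it works by position rather than by name, it also behaves predictably if two columns happen to share a name.

```python
import pandas as pd

df = pd.DataFrame([list('abcdef')], columns=list('abcdef'))

# Upper-case positions 0, 2, 4 and keep the rest, so the new list
# has the same length as df.columns
df.columns = [name.upper() if i % 2 == 0 else name
              for i, name in enumerate(df.columns)]
print(df)
```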
I have a DataFrame column (a Series) that contains a list of strings in each row. I'd like to create another Series holding the last string of each row's list.
So one row may have a list, e.g.
['a', 'b', 'c', 'd']
I'd like to create another pandas Series made up of the last element of each row, normally accessed with index -1 (in this case 'd'). The lists for each observation (i.e. row) are of varying length. How can this be done?
I believe you need indexing with .str; it works with all iterables:
df = pd.DataFrame({'col':[['a', 'b', 'c', 'd'],['a', 'b'],['a'], []]})
df['last'] = df['col'].str[-1]
print (df)
col last
0 [a, b, c, d] d
1 [a, b] b
2 [a] a
3 [] NaN
Strings are iterables too:
df = pd.DataFrame({'col':['abcd','ab','a', '']})
df['last'] = df['col'].str[-1]
print (df)
col last
0 abcd d
1 ab b
2 a a
3 NaN
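For comparison, a plain list comprehension does the same thing explicitly for list-valued cells. This is only a sketch; whether it is faster or slower than .str[-1] on a given dataset is something to measure rather than assume.

```python
import pandas as pd

df = pd.DataFrame({'col': [['a', 'b', 'c', 'd'], ['a', 'b'], ['a'], []]})

# Take the last item of each list, or None when the list is empty
df['last'] = [lst[-1] if len(lst) > 0 else None for lst in df['col']]
print(df)
```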
Why not turn the list column into a separate info DataFrame? You can then use the index to join it back.
Infodf=pd.DataFrame(df.col.values.tolist(),index=df.index)
Infodf
Out[494]:
0 1 2 3
0 a b c d
1 a b None None
2 a None None None
3 None None None None
I think I overlooked the question; both PiR and Jez provided valuable suggestions that helped me reach the final result:
Infodf.ffill(axis=1).iloc[:, -1]
After running my script, my algorithm returns the expected outcome as a list of lists like this: pred=[[b,c,d],[b,a,u],...[b,i,o]]
I already have a dataframe that needs those values added in a new matching column.
Once flattened, the list has exactly as many values as the frame has rows, and I just need to create a new column with all the values of the lists.
However, when I try to put the list into the column, I get the error:
ValueError: Length of values does not match length of index
Looking at the data, it puts each entire sub-list into one row instead of putting each entry in its own row.
EDIT:
All values in the list should be put in the column named pred
sent token pred
0 a b
0 b c
0 b d
1 a b
1 b a
1 c u
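Solution:
# Flatten pred into one list, skipping any None sub-lists
x = []
for sub in pred:
    if sub is not None:
        x += sub
df_new = df.copy()
# x is already flat, so it can be assigned directly
# (its length must equal the number of rows in df)
df_new["pred"] = x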
You can use itertools.chain to flatten the list of lists, drop any None entries with filter, and then slice the result to the length of your dataframe.
Data from #ak_slick.
import pandas as pd
from itertools import chain

df = pd.DataFrame({'sent': [0, 0, 0, 1, 1, 1],
                   'token': ['a', 'b', 'b', 'a', 'b', 'c']})
lst = [['b','c',None],['b',None,'u'], ['b','i','o']]

# flatten, drop the None entries, then keep only as many values as df has rows
df['pred'] = list(filter(None, chain.from_iterable(lst)))[:len(df.index)]
print(df)
sent token pred
0 0 a b
1 0 b c
2 0 b b
3 1 a u
4 1 b b
5 1 c i
import pandas as pd

# combine input lists
x = []
for _ in [['b','c','d'],['b','a','u'], ['b','i','o']]:
    x += _

# output into a single column
a = pd.Series(x)

# mock original dataframe
b = pd.DataFrame({'sent': [0, 0, 0, 1, 1, 1],
                  'token': ['a', 'b', 'b', 'a', 'b', 'c']})

# add column to existing dataframe
# assigning a Series aligns on the index, which avoids the mismatched-length
# error by ignoring anything longer than your original data frame
b['pred'] = a
print(b)
sent token pred
0 0 a b
1 0 b c
2 0 b d
3 1 a b
4 1 b a
5 1 c u
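Both answers above silently drop extra values (by slicing, or through index alignment). If a length mismatch means something went wrong upstream, a sketch that flattens with itertools.chain.from_iterable and checks the length before assigning may be safer. The two-sentence pred below is a hypothetical sample matching the table in the question's EDIT.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'sent': [0, 0, 0, 1, 1, 1],
                   'token': ['a', 'b', 'b', 'a', 'b', 'c']})
pred = [['b', 'c', 'd'], ['b', 'a', 'u']]  # hypothetical sample output

# Flatten the list of lists into one value per row
flat = list(chain.from_iterable(pred))

# Fail loudly instead of silently truncating or padding with NaN
if len(flat) != len(df):
    raise ValueError(f'{len(flat)} predictions for {len(df)} rows')

df['pred'] = flat
print(df)
```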