Excel vlookup in python dataframe - python

How can I do an Excel-style VLOOKUP in pandas? I'm a total beginner in Python. My first and second dataframes look like this:
data_01 = pd.DataFrame({'Tipe Car':['A', 'B', 'C', 'D'], 'Branch':['UD', 'UA', 'UK', 'UA'], 'Area':['1A', '1B', '1C', '1D']})
data_02 = pd.DataFrame({'Tipe Car':['A', 'B', 'E', 'F'], 'Branch':['UD', 'UA', 'UK', 'UA']})
and the expected output is:
data_03 = pd.DataFrame({'Tipe Car':['A', 'B', 'E', 'F'], 'Branch':['UD', 'UA', 'UK', 'UA'], 'Area':['1A', '1B', 'NaN', 'NaN']})

Use pandas.DataFrame.join
import pandas as pd
df1 = pd.DataFrame({'Tipe Car':['A', 'B', 'C', 'D'], 'Branch':['UD', 'UA', 'UK', 'UA'], 'Area':['1A', '1B', '1C', '1D']})
df2 = pd.DataFrame({'Tipe Car':['A', 'B', 'E', 'F'], 'Branch':['UD', 'UA', 'UK', 'UA']})
df1.set_index('Tipe Car').join(df2.set_index('Tipe Car'), how='right', lsuffix='_df1', rsuffix='_df2')
Output:
         Branch_df1 Area Branch_df2
Tipe Car
A                UD   1A         UD
B                UA   1B         UA
E               NaN  NaN         UK
F               NaN  NaN         UA
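If the goal is the exact three-column layout from the question, a left merge is another option. The sketch below assumes the lookup key is 'Tipe Car' alone and that each key appears only once in data_01 (duplicate keys would duplicate rows in the result). The name lookup is only illustrative.

```python
import pandas as pd

data_01 = pd.DataFrame({
    'Tipe Car': ['A', 'B', 'C', 'D'],
    'Branch': ['UD', 'UA', 'UK', 'UA'],
    'Area': ['1A', '1B', '1C', '1D'],
})
data_02 = pd.DataFrame({
    'Tipe Car': ['A', 'B', 'E', 'F'],
    'Branch': ['UD', 'UA', 'UK', 'UA'],
})

# Keep only the key and the column to look up, so no second 'Branch' column appears.
lookup = data_01[['Tipe Car', 'Area']]

# how='left' keeps every row of data_02; keys missing from lookup get NaN in 'Area'.
data_03 = data_02.merge(lookup, on='Tipe Car', how='left')
print(data_03)
```

Unlike the join above, this keeps 'Tipe Car' as a regular column and needs no suffixes.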

Related

splitting list in python by keyword

I have a list like the following:
lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
and my desired result is to split the list into sublists like this:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a','a'],['start','b','b','b','end'],['a','a','a','a'],['start','b','b','end']]
So 'start' and 'end' are keywords. Is there any way to do something like .split() on particular keywords when they match?
So far I have written a function that finds the indices of 'start' and 'end', i.e. starting_ind = [3, 9, 18] and ending_ind = [5, 13, 21]. However, if I do
temp = []
for i in range(len(starting_ind)):
    x = lst[starting_ind[i]: ending_ind[i]]
    temp += x
print(temp)
the result is incorrect.
This solution doesn't require you to calculate indices beforehand:
lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
       'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end', 'a', 'a', 'a']
result = []
sublist = []
for el in range(len(lst)):
    if lst[el] == 'start':
        result.append(sublist.copy())
        sublist.clear()
        sublist.append(lst[el])
    else:
        sublist.append(lst[el])
        if lst[el] == 'end':
            result.append(sublist.copy())
            sublist.clear()
    if el == len(lst) - 1:
        result.append(sublist)
print(result)
The result is:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a', 'a'], ['start', 'b', 'b', 'b', 'end'], ['a', 'a', 'a', 'a'], ['start', 'b', 'b', 'end'], ['a', 'a', 'a']]
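The same single-pass idea can be sketched as a reusable function. This is only an illustration: the name split_on_markers and its parameters are made up here, and it skips empty sublists, so a list that begins with 'start' or ends with 'end' does not produce a stray [].

```python
def split_on_markers(items, start='start', end='end'):
    """Split items into runs, keeping each start..end span as its own sublist."""
    result = []
    current = []
    for item in items:
        if item == start:
            # Flush whatever came before the marker, skipping empty runs.
            if current:
                result.append(current)
            current = [item]
        elif item == end:
            # Close the current span, including the end marker itself.
            current.append(item)
            result.append(current)
            current = []
        else:
            current.append(item)
    # Keep any trailing items after the last marker.
    if current:
        result.append(current)
    return result


lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
       'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
print(split_on_markers(lst))
```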
Here's a possible way to use regular expression to extract the patterns, please check if it's acceptable:
import re
lst = ['a','a','a', 'start','b','end', 'a','a','a', 'start','b','b','b','end', 'a','a','a','a', 'start','b','b','end']
result = []
for e in re.findall('a_[a_]+|start[_b]+_end', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print(result)
Output is as desired:
[['a', 'a', 'a'],
['start', 'b', 'end'],
['a', 'a', 'a'],
['start', 'b', 'b', 'b', 'end'],
['a', 'a', 'a', 'a'],
['start', 'b', 'b', 'end']]
A better way is this:
result = []
for e in re.split(r'(start[_b]+_end)', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print([x for x in result if x != ['']])
Same output
You can write it like this (the cut indices in ind are hardcoded for this particular list):
lst = ['a', 'a', 'a', 'start', 'b', 'end',
'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
temp = []
ind = [0, 3, 6, 9, 14, 18, 22]
for i in range(len(ind) - 1):
    x = lst[ind[i]: ind[i + 1]]
    temp.append(x)
print(temp)
and you will get:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a', 'a'], ['start', 'b', 'b', 'b', 'end'], ['a', 'a', 'a', 'a'], ['start', 'b', 'b', 'end']]
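The cut indices do not have to be typed by hand. A minimal sketch, assuming every 'start' opens a span and every 'end' closes one: collect each 'start' index and the index just past each 'end', then slice between consecutive cut points. The names cuts and chunks are only illustrative.

```python
lst = ['a', 'a', 'a', 'start', 'b', 'end',
       'a', 'a', 'a', 'start', 'b', 'b', 'b', 'end',
       'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']

# A set removes duplicates, e.g. when the list begins with 'start' or ends with 'end'.
cuts = {0, len(lst)}
for i, item in enumerate(lst):
    if item == 'start':
        cuts.add(i)        # a span begins here
    elif item == 'end':
        cuts.add(i + 1)    # the next run begins right after 'end'

ind = sorted(cuts)
# Slice between each pair of consecutive cut points.
chunks = [lst[a:b] for a, b in zip(ind, ind[1:])]
print(ind)
print(chunks)
```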
If you can be certain that your keywords will always appear in pairs, and in the right order (i.e. there will never be a 'start' without an 'end' that follows it, at some point in the list), this should work:
l = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
def get_sublist(l):
    try:
        return l[:l.index('end') + 1] if l.index('start') == 0 else l[:l.index('start')]
    except ValueError:
        return l
result = []
while l:
    sublist = get_sublist(l)
    result.append(sublist)
    l = l[len(sublist):]
print(result)
Gives the following result:
[['a', 'a', 'a'],
['start', 'b', 'end'],
['a', 'a', 'a'],
['start', 'b', 'b', 'b', 'end'],
['a', 'a', 'a', 'a'],
['start', 'b', 'b', 'end']]

Iterate over rows in pandas dataframe. If blanks exist before a specific column, move all column values over

I am attempting to iterate over all rows in a pandas dataframe and shift the leftmost column values in each row to the right until all the non-null values in that row are adjacent. How far the values move depends on the number of empty columns between the first empty value and the cutoff column.
In this case I am attempting to 'close the gap' so that the values in the leftmost columns end at column 'ddd', directly next to the cutoff column 'eee'. The matching 'abc' letters in each row should help to visualize the problem.
Column 'eee' and the columns to the right of 'eee' should not be touched or moved.
def moveOver():
    df = {
        'aaa': ['a', 'a', 'a', 'a', 'a', 'a'],
        'bbb': ['', 'b', 'b', 'b', '', 'b'],
        'ccc': ['', '', 'c', 'c', '', 'c'],
        'ddd': ['', '', '', 'd', '', ''],
        'eee': ['b', 'c', 'd', 'e', 'b', 'd'],
        'fff': ['c', 'd', 'e', 'f', 'c', 'e'],
        'ggg': ['d', 'e', 'f', 'g', 'd', 'f']
    }
In rows 1 and 5: 'a' would be moved over 3 columns to column 'ddd'.
In row 2: ['a', 'b'] would be moved over 2 columns to columns ['ccc', 'ddd'] respectively.
And so on.
finalOutput = {
'aaa': ['', '', '', 'a', '', ''],
'bbb': ['', '', 'a', 'b', '', 'a'],
'ccc': ['', 'a', 'b', 'c', '', 'b'],
'ddd': ['a', 'b', 'c', 'd', 'a', 'c'],
'eee': ['b', 'c', 'd', 'e', 'b', 'd'],
'fff': ['c', 'd', 'e', 'f', 'c', 'e'],
'ggg': ['d', 'e', 'f', 'g', 'd', 'f']
}
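You can do this (assuming df is a pandas DataFrame built from the dictionary above):
import numpy as np
from collections import Counter
keep_cols = df.columns[0:df.columns.get_loc('eee')]
df.loc[:, keep_cols] = [np.roll(v, Counter(v)['']) for v in df[keep_cols].values]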
print(df) gives:
  aaa bbb ccc ddd eee fff ggg
0               a   b   c   d
1           a   b   c   d   e
2       a   b   c   d   e   f
3   a   b   c   d   e   f   g
4               a   b   c   d
5       a   b   c   d   e   f
Explanation:
Only the columns to the left of 'eee' should be considered, so those column names are stored in keep_cols.
Each row then has to be shifted to the right by some amount. The shift itself is done with numpy's roll, and the amount is the number of blank values in that row, counted with Counter from collections. Because the blanks sit at the end of each row, rolling by that count wraps them around to the front.
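For reference, here is a self-contained sketch of the roll approach that can be run as is. It builds the DataFrame from the question's data and counts blanks with a boolean comparison instead of Counter; the name shifted is only illustrative. It assumes, as in the question, that the blanks in each row are all on the right of the non-blank values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'aaa': ['a', 'a', 'a', 'a', 'a', 'a'],
    'bbb': ['', 'b', 'b', 'b', '', 'b'],
    'ccc': ['', '', 'c', 'c', '', 'c'],
    'ddd': ['', '', '', 'd', '', ''],
    'eee': ['b', 'c', 'd', 'e', 'b', 'd'],
    'fff': ['c', 'd', 'e', 'f', 'c', 'e'],
    'ggg': ['d', 'e', 'f', 'g', 'd', 'f'],
})

# Columns strictly to the left of the cutoff column 'eee'.
keep_cols = list(df.columns[:df.columns.get_loc('eee')])

# Roll each row right by its number of blanks, so the blanks wrap to the front.
shifted = np.array([
    np.roll(row, (row == '').sum())
    for row in df[keep_cols].to_numpy()
])
df[keep_cols] = shifted
print(df)
```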

python join strings in list of lists

I have a list of lists of individual letters,
but I would like a list of lists that each contain a single string.
What I have:
[
['a', 'b', 'c', 'd', 'e'],
['b', 'c', 'd', 'e', 'a'],
['c', 'd', 'e', 'a', 'b'],
['d', 'e', 'a', 'b', 'c'],
['e', 'a', 'b', 'c', 'd']
]
What I need:
[
['a b c d e'],
['b c d e a'],
['c d e a b'],
['d e a b c'],
['e a b c d']
]
I think the code below will do what you want (the variable is named letters rather than list, to avoid shadowing the built-in).
letters = [
    ['a', 'b', 'c', 'd', 'e'],
    ['b', 'c', 'd', 'e', 'a'],
    ['c', 'd', 'e', 'a', 'b'],
    ['d', 'e', 'a', 'b', 'c'],
    ['e', 'a', 'b', 'c', 'd']
]
newList = [[' '.join(elem)] for elem in letters]
print(newList)
List comprehension:
result = [[' '.join(inner)] for inner in outer_list]
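You can also do it like this (the split(',') call only wraps the joined string in a list, so it assumes no element contains a comma):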
array = [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
string = [' '.join(i).split(',') for i in array]
print(string)
You can also do it with map (assuming lst is the list of lists):
list(map(lambda x: [' '.join(x)], lst))
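As a quick check, the list comprehension and the map version can be run side by side on the question's data. This is just a sketch; the names outer_list, joined and joined_with_map are illustrative.

```python
outer_list = [
    ['a', 'b', 'c', 'd', 'e'],
    ['b', 'c', 'd', 'e', 'a'],
    ['c', 'd', 'e', 'a', 'b'],
    ['d', 'e', 'a', 'b', 'c'],
    ['e', 'a', 'b', 'c', 'd'],
]

# Join each inner list into one space-separated string, wrapped in its own list.
joined = [[' '.join(inner)] for inner in outer_list]

# The same transformation expressed with map.
joined_with_map = list(map(lambda inner: [' '.join(inner)], outer_list))

print(joined)
print(joined == joined_with_map)
```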

Tokenize words and getting elements right before and after this word

My dataframe had a column of strings (col A). I tokenized it and now I have:
Input:
Col A
'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'
'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'
I want to get the 2 items right before and right after the word 'dog' and put them in a column B. Therefore, I want something like this:
Output:
Col B
'B', 'C', 'dog', 'C', 'C'
'B', 'B', 'dog', 'D', 'A'
How do I get that?
This works if every row contains exactly one 'dog':
import pandas as pd
df = pd.DataFrame({'Col A': ["'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'", "'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'"]})
def extract(l):
    l = [e.strip() for e in l]
    idx = l.index("'dog'")
    return l[(idx-2 if idx-2 >= 0 else 0):idx+3]
df['Col B'] = df['Col A'].str.split(',').apply(extract)
print(df)
Col A Col B
0 'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C' ['B', 'C', 'dog', 'C', 'C']
1 'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D' ['B', 'B', 'dog', 'D', 'A']
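If the column already holds real Python lists of tokens rather than one quoted string per row, the same windowing can be sketched without any string splitting. The function name window_around and its parameters are made up for this illustration; it uses the first occurrence of the target and returns an empty list when the target is missing.

```python
def window_around(tokens, target='dog', size=2):
    """Return up to `size` tokens on each side of the first `target`, plus the target."""
    try:
        idx = tokens.index(target)
    except ValueError:
        # The target word is not in this row.
        return []
    # max() keeps the slice start from going negative near the beginning of the list.
    return tokens[max(idx - size, 0): idx + size + 1]


rows = [
    ['A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'],
    ['A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'],
]
for row in rows:
    print(window_around(row))
```

With a DataFrame column of lists, the function could be applied with df['Col A'].apply(window_around).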

How to combine adjacent same elements of a list in python? [duplicate]

This is my list:
nab = ['b', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'a']
I want to combine the same elements which are adjacent into another list, and if they are not the same, just return the element itself.
The output that I am looking for is:
['b', 'a', 'b', 'a']
I mean:
two 'b' ---> 'b', one 'a' ---> 'a', three 'b' ---> 'b', four 'a' ---> 'a'
I want to know the length of the new list.
Thank you so much @tdelaney, I did it as below:
import itertools
nab = ['B', 'B', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'A', 'B', 'B', 'B', 'B', 'A']
U = []
key_func = lambda x: x[0]
for key, group in itertools.groupby(nab, key_func):
    U.append(list(group))
print(U)
print(len(U))
Output:
[['B', 'B'], ['A'], ['B', 'B'], ['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['A', 'A'], ['B', 'B'], ['A', 'A'], ['B'], ['A'], ['B', 'B', 'B', 'B'], ['A']]
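To get the flat list the question asks for, ['b', 'a', 'b', 'a'], only the group keys are needed rather than the groups themselves. A minimal sketch using the question's original list (the name collapsed is illustrative):

```python
import itertools

nab = ['b', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'a']

# groupby yields one (key, group) pair per run of equal adjacent elements,
# so collecting just the keys collapses each run to a single element.
collapsed = [key for key, _group in itertools.groupby(nab)]

print(collapsed)
print(len(collapsed))
```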
