Python: reconstruct after pos_tag - python

I have the following results after using pos_tag:
list = [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4')]
Now I need to reconstruct the text like this:
a b c d
I used:
[x[0] for x in list]
But that resulted in:
['a', 'b', 'c', 'd']

Use the join method with " " as the separator; it joins the words into a single string:
data = [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4')]
s= " ".join(x[0] for x in data)
print(s)
Output:
a b c d
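If the tags are needed as well, a common idiom is to unzip the pairs with zip(*...). The sketch below assumes the (word, tag) tuples have the same shape as the pos_tag result in the question; the names tagged, words, tags and sentence are illustrative only.

```python
# Hypothetical data shaped like the pos_tag output from the question.
tagged = [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4')]

# zip(*pairs) transposes the list of pairs into two tuples.
words, tags = zip(*tagged)

sentence = " ".join(words)
print(sentence)  # a b c d
```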

Related

Generate MultiIndex DataFrame at different length

I need to initialize a MultiIndex DataFrame from given data.
id = ['a','b','c']
days = [2,5,4]
Each id has its own duration in days: 'a' has days 1 and 2; 'b' has days 1 to 5; and 'c' has days 1 to 4. In other words, the number of days varies per id.
Within each day there are 4 periods, prd = [0,1,2,3].
What I expect is a MultiIndex with one entry for each id, each day, and each period:
MultiIndex([('a',1,0),
('a',1,1),
('a',1,2),
('a',1,3),
('a',2,0),
('a',2,1),
('a',2,2),
('a',2,3),
('b',1,0),
('b',1,1),
('b',1,2),
...
('b',5,1),
('b',5,2),
('b',5,3),
('c',1,0),
('c',1,1),
('c',1,2),
...
('c',4,1),
('c',4,2),
('c',4,3),
],
names=['id','day','prd']
)
I tried to handle this in plain Python.
Because the number of days differs per id, I generate two complete lists of ids and days with a loop and a list comprehension, then zip them together to get (id, day) pairs. Then I use itertools.product() to combine these with the periods. But what I get looks like:
[(('a',1),0),
(('a',1),1),
(('a',1),2),....]
With pd.MultiIndex.from_product() I get similar results: the first two levels are grouped together and the third is separate.
Since product doesn't help either way, the old-fashioned approach is to stretch prd into a long, complete list that matches the other two, and zip all three at once.
Is there a better way to generate the index from the start, without this long chain of loops, list comprehensions, zip, and product? Is there anything in pandas that handles this case, rather than native Python data structures?
Many thanks!
Create the combinations using a list comprehension with zip:
import pandas as pd
id = ['a','b','c']
prd = [0,1,2,3]
days = [2,5,4]
# For each id, walk its own day range, then every period.
result = [(idx, i, p) for d, idx in zip(days, id) for i in range(1, d+1) for p in prd]
print(pd.MultiIndex.from_tuples(result))
MultiIndex([('a', 1, 0),
('a', 1, 1),
('a', 1, 2),
('a', 1, 3),
('a', 2, 0),
('a', 2, 1),
('a', 2, 2),
('a', 2, 3),
('b', 1, 0),
('b', 1, 1),
('b', 1, 2),
('b', 1, 3),
('b', 2, 0),
('b', 2, 1),
('b', 2, 2),
('b', 2, 3),
('b', 3, 0),
('b', 3, 1),
('b', 3, 2),
('b', 3, 3),
('b', 4, 0),
('b', 4, 1),
('b', 4, 2),
('b', 4, 3),
('b', 5, 0),
('b', 5, 1),
('b', 5, 2),
('b', 5, 3),
('c', 1, 0),
('c', 1, 1),
('c', 1, 2),
('c', 1, 3),
('c', 2, 0),
('c', 2, 1),
('c', 2, 2),
('c', 2, 3),
('c', 3, 0),
('c', 3, 1),
('c', 3, 2),
('c', 3, 3),
('c', 4, 0),
('c', 4, 1),
('c', 4, 2),
('c', 4, 3)],
)
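The nested tuples from the question's itertools.product attempt can also be avoided by taking the product of only the day range and the periods for each id. This is a sketch in plain Python; the names ids and index_tuples are illustrative, and the resulting list could be passed to pd.MultiIndex.from_tuples as in the answer above.

```python
from itertools import product

ids = ['a', 'b', 'c']   # renamed from `id` to avoid shadowing the builtin
days = [2, 5, 4]
prd = [0, 1, 2, 3]

# For each id, combine its own day range with every period.
index_tuples = [
    (i, day, p)
    for i, n_days in zip(ids, days)
    for day, p in product(range(1, n_days + 1), prd)
]

print(len(index_tuples))   # 44, i.e. (2 + 5 + 4) * 4
print(index_tuples[:3])    # [('a', 1, 0), ('a', 1, 1), ('a', 1, 2)]
```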
You can use np.repeat and np.tile here; this works well when id, prd, and days are large.
Repeat each id value days * len(prd) times; np.multiply computes these counts.
Tile the prd values once for every day; np.sum gives the total number of days.
Use pd.MultiIndex.from_arrays to build the desired output.
import numpy as np
import pandas as pd
id = ['a','b','c']
prd = [0,1,2,3]
days = [2,5,4]
x = np.repeat(id, np.multiply(days, len(prd)))
y = np.concatenate([np.arange(1, i+1).repeat(len(prd)) for i in days])
z = np.tile(prd, np.sum(days))
pd.MultiIndex.from_arrays([x,y,z])
# Equivalent to
# pd.MultiIndex.from_tuples(np.c_[x,y,z].tolist())
# x y z
# | | |
# V V V
MultiIndex([('a', '1', '0'),
('a', '1', '1'),
('a', '1', '2'),
('a', '1', '3'),
('a', '2', '0'),
('a', '2', '1'),
('a', '2', '2'),
('a', '2', '3'),
('b', '1', '0'),
('b', '1', '1'),
('b', '1', '2'),
('b', '1', '3'),
('b', '2', '0'),
('b', '2', '1'),
('b', '2', '2'),
('b', '2', '3'),
('b', '3', '0'),
('b', '3', '1'),
('b', '3', '2'),
('b', '3', '3'),
('b', '4', '0'),
('b', '4', '1'),
('b', '4', '2'),
('b', '4', '3'),
('b', '5', '0'),
('b', '5', '1'),
('b', '5', '2'),
('b', '5', '3'),
('c', '1', '0'),
('c', '1', '1'),
('c', '1', '2'),
('c', '1', '3'),
('c', '2', '0'),
('c', '2', '1'),
('c', '2', '2'),
('c', '2', '3'),
('c', '3', '0'),
('c', '3', '1'),
('c', '3', '2'),
('c', '3', '3'),
('c', '4', '0'),
('c', '4', '1'),
('c', '4', '2'),
('c', '4', '3')],
)

Get every pair-wise combination of values between two lists

Let's say I have two lists:
a = ['A', 'B', 'C', 'D']
b = ['1', '2', '3', '4']
I know I can get every permutation of these two lists like so:
for r in itertools.product(a, b): print (r[0] + r[1])
But what I'm looking for is every pairwise combination stored in a tuple. So, for example, some combinations would be:
[(A, 1), (B, 2), (C, 3), (D, 4)]
[(A, 1), (B, 3), (C, 2), (D, 4)]
[(A, 1), (B, 4), (C, 3), (D, 2)]
[(A, 1), (B, 3), (C, 4), (D, 2)]
[(A, 1), (B, 2), (C, 4), (D, 3)]
So it would iterate through every possible combination in which no two letters share the same number. I'm at a loss for an efficient way to do this (particularly since I need to scale this to three lists in my actual example).
We can make permutations of one of the lists (here, the second one), and then zip each permutation together with the first list:
from functools import partial
from itertools import permutations
def pairwise_comb(xs, ys):
    # Each result is a lazy zip object pairing xs with one permutation of ys.
    return map(partial(zip, xs), permutations(ys))
Or, if every sub-element should be a list (above they are still lazy iterables that only take shape when you "materialize" them):
from functools import partial
from itertools import permutations
def pairwise_comb(xs, ys):
    return map(list, map(partial(zip, xs), permutations(ys)))
For the given sample input, we obtain:
>>> for el in pairwise_comb(a, b):
...     print(list(el))
...
[('A', '1'), ('B', '2'), ('C', '3'), ('D', '4')]
[('A', '1'), ('B', '2'), ('C', '4'), ('D', '3')]
[('A', '1'), ('B', '3'), ('C', '2'), ('D', '4')]
[('A', '1'), ('B', '3'), ('C', '4'), ('D', '2')]
[('A', '1'), ('B', '4'), ('C', '2'), ('D', '3')]
[('A', '1'), ('B', '4'), ('C', '3'), ('D', '2')]
[('A', '2'), ('B', '1'), ('C', '3'), ('D', '4')]
[('A', '2'), ('B', '1'), ('C', '4'), ('D', '3')]
[('A', '2'), ('B', '3'), ('C', '1'), ('D', '4')]
[('A', '2'), ('B', '3'), ('C', '4'), ('D', '1')]
[('A', '2'), ('B', '4'), ('C', '1'), ('D', '3')]
[('A', '2'), ('B', '4'), ('C', '3'), ('D', '1')]
[('A', '3'), ('B', '1'), ('C', '2'), ('D', '4')]
[('A', '3'), ('B', '1'), ('C', '4'), ('D', '2')]
[('A', '3'), ('B', '2'), ('C', '1'), ('D', '4')]
[('A', '3'), ('B', '2'), ('C', '4'), ('D', '1')]
[('A', '3'), ('B', '4'), ('C', '1'), ('D', '2')]
[('A', '3'), ('B', '4'), ('C', '2'), ('D', '1')]
[('A', '4'), ('B', '1'), ('C', '2'), ('D', '3')]
[('A', '4'), ('B', '1'), ('C', '3'), ('D', '2')]
[('A', '4'), ('B', '2'), ('C', '1'), ('D', '3')]
[('A', '4'), ('B', '2'), ('C', '3'), ('D', '1')]
[('A', '4'), ('B', '3'), ('C', '1'), ('D', '2')]
[('A', '4'), ('B', '3'), ('C', '2'), ('D', '1')]
This thus results in 24 possible ways to combine this, since the order of 'A', 'B', 'C' and 'D' remains fixed, and the 4 characters can be assigned in 4! ways, or 4! = 4×3×2×1 = 24.
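Since the question mentions scaling to three lists, one way to extend this idea is to keep the first list fixed and permute every other list. The helper name fixed_first_combinations is hypothetical, and this is only a sketch under the assumption that all lists have the same length.

```python
from itertools import permutations, product

def fixed_first_combinations(first, *others):
    """Yield lists of tuples: `first` stays in order, every other list is permuted."""
    for perms in product(*(permutations(other) for other in others)):
        yield list(zip(first, *perms))

a = ['A', 'B', 'C']
b = ['1', '2', '3']
c = ['X', 'Y', 'Z']

results = list(fixed_first_combinations(a, b, c))
print(len(results))  # 36, i.e. 3! * 3!
print(results[0])    # [('A', '1', 'X'), ('B', '2', 'Y'), ('C', '3', 'Z')]
```

With two lists this reduces to the 4! = 24 results shown above; each extra list of length n multiplies the count by n!.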
It may be a lot easier than you think. What about:
import itertools
a = ['A', 'B', 'C', 'D']
b = ['1', '2', '3', '4']
for aperm in itertools.permutations(a):
    for bperm in itertools.permutations(b):
        print(list(zip(aperm, bperm)))
First outputs:
[('A', '1'), ('B', '2'), ('C', '3'), ('D', '4')]
[('A', '1'), ('B', '2'), ('C', '4'), ('D', '3')]
[('A', '1'), ('B', '3'), ('C', '2'), ('D', '4')]
[('A', '1'), ('B', '3'), ('C', '4'), ('D', '2')]
[('A', '1'), ('B', '4'), ('C', '2'), ('D', '3')]
[('A', '1'), ('B', '4'), ('C', '3'), ('D', '2')]
[('A', '2'), ('B', '1'), ('C', '3'), ('D', '4')]
[('A', '2'), ('B', '1'), ('C', '4'), ('D', '3')]
[('A', '2'), ('B', '3'), ('C', '1'), ('D', '4')]
...
(There are 576 lines printed for these two 4-element lists)
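Note that permuting both lists yields each pairing many times over, just with the pairs listed in a different order. A quick check, treating each result as an unordered set of pairs, shows how many distinct pairings the 576 lines contain (the names all_results and distinct are illustrative):

```python
import itertools

a = ['A', 'B', 'C', 'D']
b = ['1', '2', '3', '4']

all_results = [
    tuple(zip(aperm, bperm))
    for aperm in itertools.permutations(a)
    for bperm in itertools.permutations(b)
]
# Ignore the order of the pairs inside each result.
distinct = {frozenset(result) for result in all_results}

print(len(all_results))  # 576
print(len(distinct))     # 24
```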
Edit: If you want to generalize this to more iterables, you could do something like:
import itertools
a = ['A', 'B', 'C', 'D']
b = ['1', '2', '3', '4']
gens = [itertools.permutations(lst) for lst in (a,b)]
for perms in itertools.product(*gens):
    print(list(zip(*perms)))
Which outputs the same thing, but could be easily extended, e.g.
import itertools
a = ['A', 'B', 'C', 'D']
b = ['1', '2', '3', '4']
c = ['W', 'X', 'Y', 'Z']
gens = [itertools.permutations(lst) for lst in (a,b,c)] # add c
for perms in itertools.product(*gens): # no change
    print(list(zip(*perms))) # ''
You can use recursion with yield for a no-import solution:
a = ['A', 'B', 'C', 'D']
b = ['1', '2', '3', '4']
def combinations(d, current = []):
    if len(current) == 4:
        yield current
    elif all(d):  # both remaining lists are non-empty
        for i in d[0]:
            # Drop the chosen letter and the first remaining number.
            _d0, _d1 = [c for c in d[0] if c != i], [c for c in d[1] if c != d[1][0]]
            yield from combinations([_d0, _d1], current+[[i, d[1][0]]])
for i in combinations([a, b]):
    print(i)
Output:
[['A', '1'], ['B', '2'], ['C', '3'], ['D', '4']]
[['A', '1'], ['B', '2'], ['D', '3'], ['C', '4']]
[['A', '1'], ['C', '2'], ['B', '3'], ['D', '4']]
[['A', '1'], ['C', '2'], ['D', '3'], ['B', '4']]
[['A', '1'], ['D', '2'], ['B', '3'], ['C', '4']]
[['A', '1'], ['D', '2'], ['C', '3'], ['B', '4']]
[['B', '1'], ['A', '2'], ['C', '3'], ['D', '4']]
[['B', '1'], ['A', '2'], ['D', '3'], ['C', '4']]
[['B', '1'], ['C', '2'], ['A', '3'], ['D', '4']]
[['B', '1'], ['C', '2'], ['D', '3'], ['A', '4']]
[['B', '1'], ['D', '2'], ['A', '3'], ['C', '4']]
[['B', '1'], ['D', '2'], ['C', '3'], ['A', '4']]
[['C', '1'], ['A', '2'], ['B', '3'], ['D', '4']]
[['C', '1'], ['A', '2'], ['D', '3'], ['B', '4']]
[['C', '1'], ['B', '2'], ['A', '3'], ['D', '4']]
[['C', '1'], ['B', '2'], ['D', '3'], ['A', '4']]
[['C', '1'], ['D', '2'], ['A', '3'], ['B', '4']]
[['C', '1'], ['D', '2'], ['B', '3'], ['A', '4']]
[['D', '1'], ['A', '2'], ['B', '3'], ['C', '4']]
[['D', '1'], ['A', '2'], ['C', '3'], ['B', '4']]
[['D', '1'], ['B', '2'], ['A', '3'], ['C', '4']]
[['D', '1'], ['B', '2'], ['C', '3'], ['A', '4']]
[['D', '1'], ['C', '2'], ['A', '3'], ['B', '4']]
[['D', '1'], ['C', '2'], ['B', '3'], ['A', '4']]
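The function above stops at a hard-coded length of 4. A variant that works for two lists of any equal length might look like the following sketch; the name pairings is hypothetical, and it assumes xs has at least as many items as ys.

```python
def pairings(xs, ys):
    """Yield every way to pair each item of ys, in order, with a distinct item of xs."""
    if not ys:
        yield []
        return
    for i, x in enumerate(xs):
        rest = xs[:i] + xs[i + 1:]          # xs without the chosen item
        for tail in pairings(rest, ys[1:]):
            yield [[x, ys[0]]] + tail

a = ['A', 'B', 'C', 'D']
b = ['1', '2', '3', '4']

results = list(pairings(a, b))
print(len(results))  # 24
print(results[0])    # [['A', '1'], ['B', '2'], ['C', '3'], ['D', '4']]
```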

How to print values from a tuple next to each other after permutation?

Task:
You are given a string "S".
Your task is to print all possible permutations of size K of the string in
lexicographic sorted order.
Input Format:
A single line containing the space separated string "S" and the integer
value "K".
Sample code explaining how permutations work (I used one of these later):
>>> from itertools import permutations
>>> print permutations(['1','2','3'])
<itertools.permutations object at 0x02A45210>
>>>
>>> print list(permutations(['1','2','3']))
[('1', '2', '3'), ('1', '3', '2'), ('2', '1', '3'), ('2', '3', '1'),
('3', '1', '2'), ('3', '2', '1')]
>>>
>>> print list(permutations(['1','2','3'],2))
[('1', '2'), ('1', '3'), ('2', '1'), ('2', '3'), ('3', '1'), ('3',
'2')]
>>>
>>> print list(permutations('abc',3))
[('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'),
('c', 'a', 'b'), ('c', 'b', 'a')]
Sample Input:
HACK 2
Sample Output:
One under another:
AC
AH
AK
CA
CH
CK
HA
HC
HK
KA
KC
KH
Explanation:
All possible size 2 permutations of the string "HACK" are printed in
lexicographic sorted order.
Here is my Code:
from itertools import permutations
A = input().split()
S = "".join(sorted(A[0].upper()))  # sorted, upper-case characters of the string
C = int(A[1])                      # permutation size
for i in permutations(S,C):
    print(i)
But the output is:
('A', 'C')
('A', 'H')
('A', 'K')
('C', 'A')
('C', 'H')
('C', 'K')
('H', 'A')
('H', 'C')
('H', 'K')
('K', 'A')
('K', 'C')
('K', 'H')
How can I print the elements of these tuples without parentheses and quotes, one under another, like this?
AC
AH
AK
Note that it has to work when the user types "hack 3" or "anything x", where x is the number of characters in each permutation.
You can use str.join() and print them as strings:
from itertools import permutations
a = list(permutations('hack',2))
# In a more Pythonic way you can do:
# a = permutations('hack', 2)
# Then you can use it directly in a for loop
for k in a:
    print("".join(k).upper(), end = " ")
Output:
HA HC HK AH AC AK CH CA CK KH KA KC
Change
for i in permutations(S,C):
    print(i)
to
for i in permutations(S,C):
    for j in range(C):
        print(i[j], end='')
    print()  # end the line after each permutation
I'm assuming you're using Python 3.
from itertools import permutations
A = input().split()
B = "".join(sorted(A[0].upper()))
C = int(A[1])
a = list(permutations(B,C))
for k in a:
    print("".join(k))
We got it! Thanks to @Chiheb Nexus!
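Putting the pieces together, the task can be wrapped in a small function. itertools.permutations emits tuples in sorted order when its input is sorted, so sorting the characters first is enough to get lexicographic output. The function name sorted_permutations is illustrative only.

```python
from itertools import permutations

def sorted_permutations(s, k):
    """Return all length-k permutations of s as upper-case strings, in sorted order."""
    chars = sorted(s.upper())
    return ["".join(p) for p in permutations(chars, k)]

result = sorted_permutations("hack", 2)
print("\n".join(result))  # one permutation per line, starting with AC, AH, AK
```

Reading "hack 3" or any other input then only requires splitting the line and passing int(k) as the second argument.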

Python Iteration Through Lists with Stepped Progression

Given the following lists:
letters = ('a', 'b', 'c', 'd', 'e', 'f', 'g')
numbers = ('1', '2', '3', '4')
How can I produce an iterated list that produces the following:
output = [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4'),
('e', '1'), ('f', '2'), ('g', '3'), ('a', '4'),
('b', '1'), ('c', '2'), ('d', '3'), ('e', '4'),
('f', '1'), ('g', '2')...]
I feel like I should be able to produce the desired output by using:
output = list(zip(letters, itertools.cycle(numbers)))
But this produces the following:
output = [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4'),
('e', '1'), ('f', '2'), ('g', '3')]
Any help would be greatly appreciated.
If you are looking for an infinite generator, you can use cycle with zip for both lists, in the form of zip(itertools.cycle(x), itertools.cycle(y)). That would supply you with the required generator:
>>> for x in zip(itertools.cycle(letters), itertools.cycle(numbers)):
...     print(x)
...
('a', '1')
('b', '2')
('c', '3')
('d', '4')
('e', '1')
('f', '2')
('g', '3')
('a', '4')
('b', '1')
('c', '2')
('d', '3')
...
If you want a finite list of elements, this should work:
import itertools
letters = ('a', 'b', 'c', 'd', 'e', 'f', 'g')
numbers = ('1', '2', '3', '4')
max_elems = 10
list(itertools.islice(zip(itertools.cycle(letters), itertools.cycle(numbers)), max_elems))
This results in:
[('a', '1'), ('b', '2'), ('c', '3'), ('d', '4'), ('e', '1'), ('f', '2'), ('g', '3'), ('a', '4'), ('b', '1'), ('c', '2')]
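If the goal is exactly one full cycle of pairings (the point at which the sequence starts repeating), the length is the least common multiple of the two lengths. A sketch, assuming that is the desired stopping point; the names period and pairs are illustrative:

```python
import itertools
import math

letters = ('a', 'b', 'c', 'd', 'e', 'f', 'g')
numbers = ('1', '2', '3', '4')

# lcm(7, 4) = 28 pairs before the pattern repeats.
period = len(letters) * len(numbers) // math.gcd(len(letters), len(numbers))

pairs = list(itertools.islice(
    zip(itertools.cycle(letters), itertools.cycle(numbers)), period))

print(len(pairs))   # 28
print(pairs[-1])    # ('g', '4')
```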

How to create a new list of tuples based on a values from original list of tuples?

My function is currently returning:
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
However, I need the final output to be:
[('a', 'h'), ('d', 'g')]
As you can see, if i[1] and i[2] match I need i[0] to be paired together.
I was trying to use a for loop but I can't think of how to write it, at this moment.
This seems to work:
from itertools import combinations
l = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
print([(a[0], b[0]) for a, b in combinations(l, 2) if a[1:] == b[1:]])
You can do this by sorting the list on the second and third elements and then using itertools.groupby. Then, for each group, take the first element of each tuple inside it. Example:
>>> import itertools
>>> a = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
>>> lst = []
>>> new_a = sorted(a, key=lambda i: (i[1], i[2]))
>>> for _, x in itertools.groupby(new_a, lambda i: (i[1], i[2])):
...     lst.append(tuple(y[0] for y in x))
...
>>> lst
[('a', 'h'), ('d', 'g')]
This can also be done in one line (though it is less readable):
>>> l = [tuple(y[0] for y in x) for _, x in itertools.groupby(sorted(a, key=lambda i: (i[1], i[2])), lambda i: (i[1], i[2]))]
>>> l
[('a', 'h'), ('d', 'g')]
Group on the second and third elements of each tuple, appending the first element to a list, then filter out the lists that have fewer than two elements:
from collections import defaultdict
l = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
d = defaultdict(list)
for a,b,c in l:
    d[b,c].append(a)
print([tuple(val) for val in d.values() if len(val)>1])
[('a', 'h'), ('d', 'g')]
To guarantee first match order use an OrderedDict:
from collections import OrderedDict
d = OrderedDict()
for a,b,c in l:
    d.setdefault((b,c),[]).append(a)
print([tuple(val) for val in d.values() if len(val)>1])
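On Python 3.7 and later, a plain dict also preserves insertion order, so the same grouping can be written without OrderedDict. A minimal sketch (the function name pair_by_tail is hypothetical):

```python
def pair_by_tail(rows):
    """Group first elements by the remaining elements, keeping groups of two or more."""
    groups = {}
    for first, *rest in rows:
        groups.setdefault(tuple(rest), []).append(first)
    return [tuple(members) for members in groups.values() if len(members) > 1]

rows = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
print(pair_by_tail(rows))  # [('a', 'h'), ('d', 'g')]
```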
I think this solution will preserve order (based on initial match location):
from itertools import groupby
from operator import itemgetter
from collections import defaultdict
x = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
groupings, seen_list=defaultdict(list), []
for key, value in groupby(x, itemgetter(1, 2)):
    if key not in seen_list:
        seen_list.append(key)
    groupings[key].extend(list(map(itemgetter(0),value)))
print([groupings[key] for key in seen_list])
If order is not important, you can drop seen_list and just print groupings.values():
x = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
groupings=defaultdict(list)
for key, value in groupby(x, itemgetter(1, 2)):
    groupings[key].extend(list(map(itemgetter(0),value)))
print(groupings.values())
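Keep in mind that itertools.groupby only groups consecutive items with equal keys, which is why the snippets above either sort first or merge the groups through a dictionary. A small demonstration on the same data (the names unsorted_keys and sorted_keys are illustrative):

```python
from itertools import groupby
from operator import itemgetter

x = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
key = itemgetter(1, 2)

# Without sorting, the ('b', 'c') key shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(x, key)]
sorted_keys = [k for k, _ in groupby(sorted(x, key=key), key)]

print(unsorted_keys)  # [('b', 'c'), ('e', 'f'), ('b', 'c')]
print(sorted_keys)    # [('b', 'c'), ('e', 'f')]
```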
Maybe not so Pythonic, but a bit simpler (Python 2 syntax):
>>> a = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'e', 'f'), ('h', 'b', 'c')]
>>> c = {}
>>> [c[j+k].append(i) if j+k in c else c.update({j+k:[i]}) for i,j,k in a]
>>> c = c.values()
>>> print c
[['d', 'g'], ['a', 'h']]
