Join elements of two lists in Python - python

Say I have two Python lists containing strings that may or may not be of the same length.
list1 = ['a','b']
list2 = ['c','d','e']
I want to get the following result:
l = ['a c','a d','a e','b c','b d','b e']
The final list should contain all possible combinations of elements from the two lists, with a space between them.
One method I've tried uses itertools:
import itertools
for p in itertools.permutations(, 2):
    print(zip(*p))
But unfortunately this was not what I needed, as it did not return any combinations at all.

First build all possible pairs from the two lists, then use a list comprehension to join each pair with a space:
list1 = ['a', 'b']
list2 = ['c', 'd', 'e']
com = [(x,y) for x in list1 for y in list2]
print([a + ' ' + b for (a, b) in com]) # ['a c', 'a d', 'a e', 'b c', 'b d', 'b e']

What you want is a cartesian product.
Code:
import itertools
list1 = ['a', 'b']
list2 = ['c', 'd', 'e']
l = ['%s %s' % (e[0], e[1]) for e in itertools.product(list1, list2)]
print(l)
result:
['a c', 'a d', 'a e', 'b c', 'b d', 'b e']
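A small variation on the product answer, assuming every element is already a string (str.join raises TypeError on non-strings): join each pair with ' '.join instead of % formatting, which also works unchanged if you later need the product of more than two lists.

```python
import itertools

list1 = ['a', 'b']
list2 = ['c', 'd', 'e']

# itertools.product yields ('a', 'c'), ('a', 'd'), ... in the same order as
# the nested loops; ' '.join turns each tuple into "x y".
l = [' '.join(pair) for pair in itertools.product(list1, list2)]
print(l)
```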

This is another possible method:
list1=['a','b']
list2=['c','d','e']
list3=[]
for i in list1:
    for j in list2:
        list3.append(i+" "+j)
print(list3)

One-liner solution: use a list comprehension and concatenate each pair of items with a space in between:
list1 = ['a','b']
list2 = ['c','d','e']
print([i + ' ' + j for i in list1 for j in list2])  # ['a c', 'a d', 'a e', 'b c', 'b d', 'b e']

Related

Replace parts of string that match items from one list with parts of string from another list one by one

For example, I have a string:
string1 = 'a b c d e f g h i j'
How can I make it look like this:
string1 = 'a_b c_d e_f g_h i_j'
by matching substrings of string1 against the items of list1 and replacing each match with the corresponding item of list2, in order (e.g. checking whether each item of list1 occurs in string1 and, if it does, replacing it with the item at the same position in list2)?
list1 = ['a b', 'c d', 'e f', 'g h', 'i j']
list2 = ['a_b', 'c_d', 'e_f', 'g_h', 'i_j']
You can pair up the lists with zip, then apply str.replace to each pair:
list1 = ['a b', 'c d', 'e f', 'g h', 'i j']
list2 = ['a_b', 'c_d', 'e_f', 'g_h', 'i_j']
string1 = 'a b c d e f g h i j'
for search_text, replace_text in zip(list1, list2):
    string1 = string1.replace(search_text, replace_text)
print(string1)  # a_b c_d e_f g_h i_j
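One caveat with chained str.replace calls is that a later search text can match text produced by an earlier replacement. If that matters, a single-pass alternative is to build one regular expression from list1 and let re.sub look up each match. The helper name replace_all below is hypothetical; it is a sketch, not the only way to do this.

```python
import re

def replace_all(text, search_items, replacements):
    """Replace each item of search_items with the item at the same position
    in replacements, in one left-to-right pass (hypothetical helper)."""
    mapping = dict(zip(search_items, replacements))
    if not mapping:
        return text  # an empty pattern would match everywhere
    # Longest keys first, so a key that is a prefix of a longer key cannot
    # shadow it in the alternation.
    pattern = re.compile('|'.join(
        re.escape(key) for key in sorted(mapping, key=len, reverse=True)))
    # re.sub never rescans text it has already substituted.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

list1 = ['a b', 'c d', 'e f', 'g h', 'i j']
list2 = ['a_b', 'c_d', 'e_f', 'g_h', 'i_j']
print(replace_all('a b c d e f g h i j', list1, list2))  # a_b c_d e_f g_h i_j

# Where the approaches differ: chained str.replace turns 'ab' into 'bb'
# and then 'cc', while the single pass gives 'bc'.
print(replace_all('ab', ['a', 'b'], ['b', 'c']))  # bc
```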

Selecting appropriate lines of a list using indexing

I want to extract lines in a list that contain carbons ('C').
The actual lines are:
propene_data = ['H -0.08677109049370 0.00000005322169 0.02324774260533\n', 'C -0.02236345244409 -0.00000001742911 1.09944502076327\n', 'C 1.14150994274008 0.00000000299501 1.72300489107368\n', 'H -0.95761218150040 -0.00000002374717 1.63257861279343\n', 'H 1.17043966864771 0.00000000845005 2.80466760537188\n', 'C 2.46626448549704 -0.00000000616665 1.02315746104893\n', 'H 3.28540550052797 0.00000001315434 1.73628424885091\n', 'H 2.55984407099540 -0.87855375749407 0.38655722260408\n', 'H 2.55984405602998 0.87855372701591 0.38655719488850\n']
I've tried to extract the carbon lines using the following code:
car1 = propene_data[1].split()
car2 = propene_data[2].split()
car3 = propene_data[5].split()
propene_carbons = car1 + car2 + car3
This gives:
propene_carbons = ['C', '-0.02236345244409', '-0.00000001742911', '1.09944502076327', 'C', '1.14150994274008', '0.00000000299501', '1.72300489107368', 'C', '2.46626448549704', '-0.00000000616665', '1.02315746104893']
It gives what I want, but I would like to know whether I could use indexing instead (in case the list is much longer). How do I use indexing in this case?
What you need here is startswith:
result = text.startswith('C')
Applied to every line with a list comprehension:
result = [i for i in propene_data if i.startswith('C')]
Output:
['C -0.02236345244409 -0.00000001742911 1.09944502076327\n',
'C 1.14150994274008 0.00000000299501 1.72300489107368\n',
'C 2.46626448549704 -0.00000000616665 1.02315746104893\n']
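You can also do it with NumPy:
import numpy as np
propene_array = np.array([i.split() for i in propene_data])
# Row indices whose first column (the element symbol) is 'C'
sub_array = np.where(propene_array[:, 0] == 'C')[0]
propene_carbon = []
for i in sub_array:
    # .tolist() converts NumPy string scalars back to plain Python str
    propene_carbon += propene_array[i].tolist()
Output: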
['C', '-0.02236345244409', '-0.00000001742911', '1.09944502076327', 'C',
'1.14150994274008', '0.00000000299501', '1.72300489107368', 'C',
'2.46626448549704', '-0.00000000616665', '1.02315746104893']
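Neither approach needs hard-coded indices. If you want the flat token list from the question without NumPy, one sketch is to split each line once and compare the first token exactly. Exact comparison, unlike startswith('C'), would not also pick up lines for a two-letter symbol such as 'Cl' (not present in this data, mentioned only as an illustration).

```python
propene_data = [
    'H -0.08677109049370 0.00000005322169 0.02324774260533\n',
    'C -0.02236345244409 -0.00000001742911 1.09944502076327\n',
    'C 1.14150994274008 0.00000000299501 1.72300489107368\n',
    'H -0.95761218150040 -0.00000002374717 1.63257861279343\n',
    'H 1.17043966864771 0.00000000845005 2.80466760537188\n',
    'C 2.46626448549704 -0.00000000616665 1.02315746104893\n',
    'H 3.28540550052797 0.00000001315434 1.73628424885091\n',
    'H 2.55984407099540 -0.87855375749407 0.38655722260408\n',
    'H 2.55984405602998 0.87855372701591 0.38655719488850\n',
]

# Split every line once into [symbol, x, y, z].
rows = [line.split() for line in propene_data]

# Keep rows whose symbol is exactly 'C' and flatten their tokens into one list.
propene_carbons = [token for row in rows if row and row[0] == 'C' for token in row]
print(propene_carbons)
```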

how to enumerate / zip as lambda

Is there a way to replace the for-loop in the groupList function with a lambda function, perhaps with map(), in Python 3?
def groupList(input_list, output_list=None):
    # Avoid a mutable default argument: a shared [] would keep growing across calls
    if output_list is None:
        output_list = []
    for i, (v, w) in enumerate(zip(input_list[:-2], input_list[2:])):
        output_list.append(f'{input_list[i]} {input_list[i+1]} {input_list[i+2]}')
    return output_list
print(groupList(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
(Output from the groupList function would be ['A B C', 'B C D', 'C D E', 'D E F', 'E F G'])
Solution 1:
def groupList(input_list):
    return [' '.join(input_list[i:i+3]) for i in range(len(input_list) - 2)]
Solution 2:
def groupList(input_list):
    return list(map(' '.join, (input_list[i:i+3] for i in range(len(input_list) - 2))))
Besides the previous solutions, a more efficient (but less concise) solution is to compute a full concatenation first and then slice it.
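from itertools import accumulate
def groupList(input_list):
    full_concat = ' '.join(input_list)
    # idx[k] is the offset in full_concat where input_list[k] starts
    idx = [0]
    idx.extend(accumulate(len(s) + 1 for s in input_list))
    # The window starting at item i ends just before the space preceding item i+3
    return [full_concat[idx[i]:idx[i+3]-1] for i in range(len(idx) - 3)]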
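Since the question asks specifically about a lambda with map(), here is a sketch of that exact form, plus a zip-based variant that pairs three staggered slices of the list (groupList_zip is a hypothetical name). Both return an empty list for inputs shorter than three items.

```python
def groupList(input_list):
    # map applies the lambda to each start index i; each window is the
    # slice input_list[i:i+3] joined with spaces.
    return list(map(lambda i: ' '.join(input_list[i:i + 3]),
                    range(len(input_list) - 2)))

def groupList_zip(input_list):
    # zip stops at the shortest slice, so only complete triples are produced.
    return [' '.join(t) for t in zip(input_list, input_list[1:], input_list[2:])]

print(groupList(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
print(groupList_zip(['A', 'B', 'C', 'D', 'E', 'F', 'G']))
```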

Moving elements to the end of a list

I want to solve two problems regarding sorting my list in python.
1) In my list, there is an element that starts with "noname" followed by a number, like "noname3" or "noname4" (each list contains only one noname+number element).
This noname element aggregates all the nonames, and the number after it shows how many nonames there are.
My question is: how can I move this noname+integer element to the end?
2) As you can see below, the sorted function sorts English first, then Korean. Is there any way I can sort Korean first, then English? Of course, with 'noname' at the end.
names = ['Z', 'C', 'A B', 'noname3', 'ㄴ', 'ㄱ', 'D A', 'A A' , 'ㄷ']
sorted(names)
# Output
['A A', 'A B', 'C', 'D A', 'Z', 'noname3', 'ㄱ', 'ㄴ', 'ㄷ']
# Desired Output
[ 'ㄱ', 'ㄴ', 'ㄷ', 'A A', 'A B', 'C', 'D A', 'Z', 'noname3']
Use a key function that sorts the noname items higher than the non-noname items.
sorted(names, key=lambda x: (x.startswith("noname"), x))
Without knowing exactly how Korean characters are alphabetized, here's my attempt (based on @kindall's start). Note that you can pass a custom function as the key parameter of sorted:
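def sorter(char):
    # Place English characters after Korean: code points above 122 ('z')
    # are shifted down, everything else is shifted up
    if ord(char[0]) > 122:
        return ord(char[0]) - 12000
    else:
        return ord(char[0]) + 12000
lst = ['Z', 'C', 'A B', 'noname3', 'ㄴ', 'ㄱ', 'D A', 'A A', 'ㄷ']
# sorter only looks at the first character, so add x itself as a tie-breaker
sorted(lst, key=lambda x: (x.startswith('noname'), sorter(x), x))
['ㄱ', 'ㄴ', 'ㄷ', 'A A', 'A B', 'C', 'D A', 'Z', 'noname3']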
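A minimal sketch of another key, assuming every entry is either pure ASCII (English) or Korean, and assuming Python 3.7+ for str.isascii: instead of shifting code points, sort on whether the whole string is ASCII, then on the string itself. Korean entries end up in Unicode code-point order, which matches the desired output here; whether that is the right collation for all Korean text is not checked. The function name korean_first_key is hypothetical.

```python
names = ['Z', 'C', 'A B', 'noname3', 'ㄴ', 'ㄱ', 'D A', 'A A', 'ㄷ']

def korean_first_key(s):
    # Tuples compare element by element (False sorts before True):
    # 1. noname entries go last
    # 2. non-ASCII (Korean) strings come before ASCII (English) strings
    # 3. ordinary code-point order within each group
    return (s.startswith('noname'), s.isascii(), s)

print(sorted(names, key=korean_first_key))
```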

How to split a string into characters in python

I have a string 'ABCDEFG'
I want to be able to list each character sequentially followed by the next one.
Example
A B
B C
C D
D E
E F
F G
G
Can you tell me an efficient way of doing this? Thanks
In Python, a string is already an iterable sequence of characters, so you don't need to split it; it's already "split". You just need to build your list of substrings.
It's not clear what form you want the result in. If you just want substrings, this works:
s = 'ABCDEFG'
[s[i:i+2] for i in range(len(s))]
#=> ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
If you want the pairs to themselves be lists instead of strings, just call list on each one:
[list(s[i:i+2]) for i in range(len(s))]
#=> [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'], ['E', 'F'], ['F', 'G'], ['G']]
And if you want strings after all, but with something like a space between the letters, join them back together after the list call:
[' '.join(list(s[i:i+2])) for i in range(len(s))]
#=> ['A B', 'B C', 'C D', 'D E', 'E F', 'F G', 'G']
You need to keep the last character, so use zip_longest from itertools (called izip_longest in Python 2):
>>> import itertools
>>> s = 'ABCDEFG'
>>> for c, cnext in itertools.zip_longest(s, s[1:], fillvalue=''):
...     print(c, cnext)
...
A B
B C
C D
D E
E F
F G
G
def doit(text):
    for i in range(len(text)):
        # Append the next character unless i is the last index
        print(text[i] + (text[i + 1] if i != len(text) - 1 else ''))
doit("ABCDEFG")
Which yields:
>>> doit("ABCDEFG")
AB
BC
CD
DE
EF
FG
G
There's an itertools pairwise recipe for exactly this use case:
import itertools
def pairwise(myStr):
    # tee makes two independent iterators; advance the second one by one item
    a, b = itertools.tee(myStr)
    next(b, None)
    for s1, s2 in zip(a, b):
        print(s1, s2)
Output:
In [121]: pairwise('ABCDEFG')
A B
B C
C D
D E
E F
F G
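On Python 3.10 and later, itertools.pairwise provides this directly. The sketch below falls back to the tee-based version on older interpreters, and uses zip_longest when the trailing 'G' from the question should be kept (pairwise alone stops at 'F G', as the output above shows).

```python
from itertools import zip_longest

try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # Fallback modeled on the tee-based recipe from the itertools docs.
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

s = 'ABCDEFG'

# Overlapping pairs ('A', 'B') ... ('F', 'G'); the lone trailing 'G' is dropped.
pairs = [' '.join(p) for p in pairwise(s)]

# Padding the shorter side with '' keeps the final 'G'; rstrip removes the
# trailing space that ' '.join leaves after it.
padded = [' '.join(p).rstrip() for p in zip_longest(s, s[1:], fillvalue='')]

print(pairs)
print(padded)
```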
Your problem is that you have a list of strings, not a string:
with open('ref.txt') as f:
    f1 = f.read().splitlines()
f.read() returns a string. You call splitlines() on it, getting a list of strings (one per line). If your input is actually 'ABCDEFG', this will of course be a list of one string, ['ABCDEFG'].
l = list(f1)
Since f1 is already a list, this just makes l a duplicate copy of that list.
print(l, f1, len(l))
And this just prints the list of lines, and the copy of the list of lines, and the number of lines.
So, first, what happens if you drop the splitlines()? Then f1 will be the string 'ABCDEFG', instead of a list with that one string. That's a good start. And you can drop the l part entirely, because f1 is already an iterable of its characters; list(f1) will just be a different iterable of the same characters.
So, now you want to print each letter with the next letter. One way to do that is by zipping 'ABCDEFG' and 'BCDEFG '. But how do you get that 'BCDEFG '? Simple; it's just f1[1:] + ' '.
So:
with open('ref.txt') as f:
    f1 = f.read().rstrip('\n')  # drop the file's trailing newline, if any
for left, right in zip(f1, f1[1:] + ' '):
    print(left, right)
Of course for something this simple, there are many other ways to do the same thing. You can iterate over range(len(f1)) and get 2-element slices, or you can use itertools.zip_longest, or you can write a general-purpose "overlapping adjacent groups of size N from any iterable" function out of itertools.tee and zip, etc.
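A possible sketch of the general-purpose "overlapping adjacent groups of size N" function mentioned above, built from itertools.tee and zip. The name windows is hypothetical, and itertools.islice is used to offset each copy of the iterator.

```python
from itertools import islice, tee

def windows(iterable, n):
    """Yield overlapping tuples of n adjacent items (hypothetical helper)."""
    # Make n independent iterators over the same input...
    iterators = tee(iterable, n)
    # ...and start the k-th one k positions later.
    staggered = [islice(it, k, None) for k, it in enumerate(iterators)]
    # zip stops at the shortest iterator, so only complete windows appear.
    return zip(*staggered)

print([''.join(w) for w in windows('ABCDEFG', 2)])
print([' '.join(w) for w in windows(['A', 'B', 'C', 'D', 'E'], 3)])
```

Because it only uses iterators, the same function works on a file object or any other iterable, not just on strings and lists.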
Since you want a space between the characters, you can use zip and a list comprehension:
>>> s = "ABCDEFG"
>>> l = [' '.join(i) for i in zip(s, s[1:])]
>>> l
['A B', 'B C', 'C D', 'D E', 'E F', 'F G']
>>> for i in l:
...     print(i)
...
A B
B C
C D
D E
E F
F G
If you don't want a space, just use a list comprehension (unlike the zip version, this keeps the final 'G'):
>>> [s[i:i+2] for i in range(len(s))]
['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
