I want to extract lines in a list that contain carbons ('C').
The actual lines are:
propene_data = ['H -0.08677109049370 0.00000005322169 0.02324774260533\n', 'C -0.02236345244409 -0.00000001742911 1.09944502076327\n', 'C 1.14150994274008 0.00000000299501 1.72300489107368\n', 'H -0.95761218150040 -0.00000002374717 1.63257861279343\n', 'H 1.17043966864771 0.00000000845005 2.80466760537188\n', 'C 2.46626448549704 -0.00000000616665 1.02315746104893\n', 'H 3.28540550052797 0.00000001315434 1.73628424885091\n', 'H 2.55984407099540 -0.87855375749407 0.38655722260408\n', 'H 2.55984405602998 0.87855372701591 0.38655719488850\n']
I've tried to extract the carbon lines using the following solution:
car1 = propene_data[1].split()
car2 = propene_data[2].split()
car3 = propene_data[5].split()
propene_carbons = car1 + car2 + car3
This solution gives:
propene_carbons = ['C', '-0.02236345244409', '-0.00000001742911', '1.09944502076327', 'C', '1.14150994274008', '0.00000000299501', '1.72300489107368', 'C', '2.46626448549704', '-0.00000000616665', '1.02315746104893']
It gives what I want, but I would like to know whether I could use indexing instead (in case the list is much longer). How do I use indexing in this case?
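What you need here is str.startswith, which tests a single string (text here stands for any one line):
result = text.startswith('C')
Applied to the whole list in a list comprehension: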
result = [i for i in propene_data if i.startswith('C')]
Output:
['C -0.02236345244409 -0.00000001742911 1.09944502076327\n',
'C 1.14150994274008 0.00000000299501 1.72300489107368\n',
'C 2.46626448549704 -0.00000000616665 1.02315746104893\n']
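If the goal is the flat list of tokens shown in the question rather than the whole lines, the filter can be combined with split. This is a minimal sketch, assuming a shortened three-line sample in the same format as the question's data; the names sample and carbon_tokens are only for illustration.

```python
# Shortened sample in the same format as propene_data from the question.
sample = [
    'H -0.08677109049370 0.00000005322169 0.02324774260533\n',
    'C -0.02236345244409 -0.00000001742911 1.09944502076327\n',
    'C 1.14150994274008 0.00000000299501 1.72300489107368\n',
]

# Keep only the carbon lines, then split each one and flatten the tokens.
carbon_tokens = [
    token
    for line in sample
    if line.startswith('C')
    for token in line.split()
]
print(carbon_tokens)
```

Note that startswith('C') would also match element symbols such as 'Cl' or 'Ca'; comparing line.split()[0] == 'C' is stricter if the data may contain them.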
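You can use this:
import numpy as np
propene_array = np.array([i.split() for i in propene_data])
# Row indices whose first column is 'C'
sub_array = np.where(propene_array[:, 0] == 'C')[0]
propene_carbon = []
for i in sub_array:
    # tolist() converts the NumPy row back to plain Python strings
    propene_carbon += propene_array[i].tolist()
Output: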
['C', '-0.02236345244409', '-0.00000001742911', '1.09944502076327', 'C',
'1.14150994274008', '0.00000000299501', '1.72300489107368', 'C',
'2.46626448549704', '-0.00000000616665', '1.02315746104893']
Say I have two Python lists containing strings that may or may not be of the same length.
list1 = ['a','b']
list2 = ['c','d','e']
I want to get the following result:
l = ['a c','a d','a e','b c','b d','b e']
The final list contains all possible combinations of items from the two lists, with a space between them.
One method I've tried is with itertools:
import itertools
for p in itertools.permutations(, 2):
    print(zip(*p))
But unfortunately this was not what I needed, as it did not return any combinations at all.
First make all possible combinations of the two lists, then use list comprehension to achieve the desired result:
list1 = ['a', 'b']
list2 = ['c', 'd', 'e']
com = [(x,y) for x in list1 for y in list2]
print([a + ' ' + b for (a, b) in com]) # ['a c', 'a d', 'a e', 'b c', 'b d', 'b e']
What you want is a cartesian product.
Code:
import itertools
list1 = ['a', 'b']
list2 = ['c', 'd', 'e']
l = ['%s %s' % (e[0], e[1]) for e in itertools.product(list1, list2)]
print(l)
result:
['a c', 'a d', 'a e', 'b c', 'b d', 'b e']
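The same Cartesian product can also be formatted with str.join, which avoids indexing into each tuple and extends to more than two lists. A small sketch; list3 is a hypothetical extra list that is not part of the question.

```python
import itertools

list1 = ['a', 'b']
list2 = ['c', 'd', 'e']

# product yields tuples such as ('a', 'c'); join turns each tuple into 'a c'.
combos = [' '.join(pair) for pair in itertools.product(list1, list2)]
print(combos)  # ['a c', 'a d', 'a e', 'b c', 'b d', 'b e']

# The same expression works unchanged for three or more lists.
list3 = ['x', 'y']  # hypothetical extra list, not from the question
triples = [' '.join(t) for t in itertools.product(list1, list2, list3)]
print(len(triples))  # 12
```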
This is another possible method:
list1=['a','b']
list2=['c','d','e']
list3=[]
for i in list1:
    for j in list2:
        list3.append(i + " " + j)
print(list3)
One-liner solution: use a list comprehension and join each pair of items with a space.
list1 = ['a','b']
list2 = ['c','d','e']
print([i + ' ' + j for i in list1 for j in list2])
I want to solve two problems with sorting my list in Python.
1) My list contains one element that starts with "noname" followed by a number, such as "noname3" or "noname4" (each list contains only one noname+number element).
This element aggregates all the unnamed entries, and the number after it shows how many of them there are.
My question is: how can I send this noname+integer element to the end?
2) As you can see below, the sorted function sorts English first, then Korean. Is there any way to sort Korean first, then English? With 'noname' at the end, of course.
names = ['Z', 'C', 'A B', 'noname3', 'ㄴ', 'ㄱ', 'D A', 'A A' , 'ㄷ']
sorted(names)
# Output
['A A', 'A B', 'C', 'D A', 'Z', 'noname3', 'ㄱ', 'ㄴ', 'ㄷ']
# Desired Output
[ 'ㄱ', 'ㄴ', 'ㄷ', 'A A', 'A B', 'C', 'D A', 'Z', 'noname3']
Use a key function that makes the noname item sort after all the other items.
sorted(names, key=lambda x: (x.startswith("noname"), x))
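To see why this key works, here is the same call run on the list from the question. Each key is a (bool, str) tuple, tuples compare element by element, and False sorts before True. The variable name result is only for illustration.

```python
names = ['Z', 'C', 'A B', 'noname3', 'ㄴ', 'ㄱ', 'D A', 'A A', 'ㄷ']

# The first tuple element is True only for the "noname" entry,
# so that entry is pushed behind everything else.
result = sorted(names, key=lambda x: (x.startswith("noname"), x))
print(result)
# ['A A', 'A B', 'C', 'D A', 'Z', 'ㄱ', 'ㄴ', 'ㄷ', 'noname3']
```

This handles the first part of the question only; English still sorts before Korean here.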
Without knowing exactly how Korean characters are alphabetized, here's my attempt (based on @kindall's start). Note that you can pass a custom function as the key parameter of sorted. This one looks only at the first character of each item:
def sorter(char):
    # Place English characters after Korean
    if ord(char[0]) > 122:
        return ord(char[0]) - 12000
    else:
        return ord(char[0]) + 12000
lst=['Z', 'C', 'A B', 'noname3', 'ㄴ', 'ㄱ', 'D A', 'A A' , 'ㄷ']
sorted(lst, key=lambda x: (x.startswith('noname'),sorter(x)))
['ㄱ', 'ㄴ', 'ㄷ', 'A B', 'A A', 'C', 'D A', 'Z', 'noname3']
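Because the key above looks only at the first character, 'A B' and 'A A' tie and keep their input order. One possible variation is a three-part tuple key that also compares the full string. This is a sketch under assumptions: every item is non-empty, any non-ASCII first character is treated as Korean, and plain code point order is acceptable within each group. The helper name korean_first_key is hypothetical.

```python
def korean_first_key(name):
    # Hypothetical helper: order by (is noname, starts with ASCII, full name).
    starts_with_ascii = ord(name[0]) < 128
    return (name.startswith('noname'), starts_with_ascii, name)

names = ['Z', 'C', 'A B', 'noname3', 'ㄴ', 'ㄱ', 'D A', 'A A', 'ㄷ']
result = sorted(names, key=korean_first_key)
print(result)
# ['ㄱ', 'ㄴ', 'ㄷ', 'A A', 'A B', 'C', 'D A', 'Z', 'noname3']
```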
I've got a list where each element is:
['a ',' b ',' c ',' d\n ']
I want to manipulate it so that each element just becomes:
['a','b','c','d']
I don't think the spaces matter, but for some reason I can't seem to remove the \n from the end of the 4th element. I've tried converting to string and removing it using:
str.split('\n')
No error is returned, but it doesn't do anything to the list, it still has the \n at the end.
I've also tried:
d.replace('\n','')
But this just returns an error.
This is clearly a simple problem but I'm a complete beginner to Python so any help would be appreciated, thank you.
Edit:
It seems I have a list of arrays (I think) so am I right in thinking that list[0], list[1] etc are their own arrays? Does that mean I can use a for loop for i in list to strip \n from each one?
>>> my_array = ['a ',' b ',' c ',' d\n ']
>>> my_array = [c.strip() for c in my_array]
>>> my_array
['a', 'b', 'c', 'd']
If you have a list of arrays, then you can do something along the lines of:
>>> list_of_arrays = [['a', 'b', 'c', 'd'], ['a ', ' b ', ' c ', ' d\n ']]
>>> new_list = [[c.strip() for c in array] for array in list_of_arrays]
>>> new_list
[['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']]
Try this -
arr = ['a ',' b ',' c ',' d\n ']
arr = [s.strip() for s in arr]
A very simple answer is to join your list, strip the newline character, and split to get a new list:
Newlist = ''.join(myList).strip().split()
Your Newlist is now:
['a', 'b', 'c', 'd']
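The join-and-split approach relies on whitespace separating the elements, so it can give a different result from stripping each element when an element contains an inner space. A small sketch with a made-up list (items is not from the question) to show the difference:

```python
# Hypothetical input where one element contains an inner space.
items = ['a b ', ' c\n ']

# Stripping each element keeps 'a b' as a single item.
stripped = [s.strip() for s in items]
print(stripped)  # ['a b', 'c']

# Joining and splitting breaks on every run of whitespace instead.
rejoined = ''.join(items).strip().split()
print(rejoined)  # ['a', 'b', 'c']
```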
I have a string 'ABCDEFG'
I want to be able to list each character sequentially followed by the next one.
Example
A B
B C
C D
D E
E F
F G
G
Can you tell me an efficient way of doing this? Thanks
In Python, a string is already seen as an enumerable list of characters, so you don't need to split it; it's already "split". You just need to build your list of substrings.
It's not clear what form you want the result in. If you just want substrings, this works:
s = 'ABCDEFG'
[s[i:i+2] for i in range(len(s))]
#=> ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
If you want the pairs to themselves be lists instead of strings, just call list on each one:
[list(s[i:i+2]) for i in range(len(s))]
#=> [['A', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'E'], ['E', 'F'], ['F', 'G'], ['G']]
And if you want strings after all, but with something like a space between the letters, join them back together after the list call:
[' '.join(list(s[i:i+2])) for i in range(len(s))]
#=> ['A B', 'B C', 'C D', 'D E', 'E F', 'F G', 'G']
You need to keep the last character, so use zip_longest from itertools (named izip_longest in Python 2):
>>> import itertools
>>> s = 'ABCDEFG'
>>> for c, cnext in itertools.zip_longest(s, s[1:], fillvalue=''):
...     print(c, cnext)
...
A B
B C
C D
D E
E F
F G
G
def doit(text):
    for i in range(len(text)):
        # Append the next character, or nothing for the last one
        print(text[i] + (text[i + 1] if i != len(text) - 1 else ''))
doit("ABCDEFG")
Which yields:
>>> doit("ABCDEFG")
AB
BC
CD
DE
EF
FG
G
There's an itertools pairwise recipe for exactly this use case:
import itertools
def pairwise(myStr):
    a, b = itertools.tee(myStr)
    next(b, None)
    for s1, s2 in zip(a, b):
        print(s1, s2)
Output:
In [121]: pairwise('ABCDEFG')
A B
B C
C D
D E
E F
F G
Your problem is that you have a list of strings, not a string:
with open('ref.txt') as f:
    f1 = f.read().splitlines()
f.read() returns a string. You call splitlines() on it, getting a list of strings (one per line). If your input is actually 'ABCDEFG', this will of course be a list of one string, ['ABCDEFG'].
l = list(f1)
Since f1 is already a list, this just makes l a duplicate copy of that list.
print l, f1, len(l)
And this just prints the list of lines, and the copy of the list of lines, and the number of lines.
So, first, what happens if you drop the splitlines()? Then f1 will be the string 'ABCDEFG', instead of a list with that one string. That's a good start. And you can drop the l part entirely, because f1 is already an iterable of its characters; list(f1) will just be a different iterable of the same characters.
So, now you want to print each letter with the next letter. One way to do that is by zipping 'ABCDEFG' and 'BCDEFG '. But how do you get that 'BCDEFG '? Simple; it's just f1[1:] + ' '.
So:
with open('ref.txt') as f:
    f1 = f.read()
for left, right in zip(f1, f1[1:] + ' '):
    print(left, right)
Of course for something this simple, there are many other ways to do the same thing. You can iterate over range(len(f1)) and get 2-element slices, or you can use itertools.zip_longest, or you can write a general-purpose "overlapping adjacent groups of size N from any iterable" function out of itertools.tee and zip, etc.
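The general-purpose version mentioned last could look roughly like this. It is a sketch built on itertools.tee and zip; the function name windows and its signature are made up for illustration and are not a standard library API.

```python
import itertools

def windows(iterable, n):
    """Yield overlapping tuples of n adjacent items (hypothetical helper)."""
    iterators = itertools.tee(iterable, n)
    # Advance the k-th copy by k items so the copies are staggered.
    for offset, it in enumerate(iterators):
        for _ in range(offset):
            next(it, None)
    return zip(*iterators)

pairs = [' '.join(w) for w in windows('ABCDEFG', 2)]
print(pairs)  # ['A B', 'B C', 'C D', 'D E', 'E F', 'F G']

triples = list(windows('ABCDE', 3))
print(triples)  # [('A', 'B', 'C'), ('B', 'C', 'D'), ('C', 'D', 'E')]
```

Like the zip-based answers, this stops at the last full group, so the trailing lone 'G' is not produced.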
As you want a space between the characters, you can use the zip function and a list comprehension:
>>> s = "ABCDEFG"
>>> l = [' '.join(i) for i in zip(s, s[1:])]
>>> l
['A B', 'B C', 'C D', 'D E', 'E F', 'F G']
>>> for i in l:
...     print(i)
...
A B
B C
C D
D E
E F
F G
If you don't want the space, just use a list comprehension with slicing:
>>> [s[i:i+2] for i in range(len(s))]
['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'G']
I have some lists:
list_line[0] = ['my name is A']
list_line[1] = ['my name is B']
list_line[2] = ['my name is C']
list_line[3] = ['my name is D']
And I make this work:
list_words_0 = list_line[0].split()
list_words_1 = list_line[1].split()
list_words_2 = list_line[2].split()
list_words_3 = list_line[3].split()
list_words = list_words_0 + list_words_1 + list_words_2 + list_words_3
print list_words
So, with the above steps, I could do this with a for loop:
for i in range(3):
    do something
but I don't know how to put i into the variable names list_words_0, list_words_1, list_words_2, ...
Can you help me generate the names of those lists in a for loop? Thank you so much!
I'd recommend using a dictionary. Loop i from 0 up to the number of list_words_ variables you want, build the string "list_words_" + str(i), and use it as a dictionary key to keep track of each list:
# assume you're inside the loop over i
example_dictionary["list_words_" + str(i)] = list_line[i]
NOTE: Treat this as pseudo code. It's kind of late so I may have small syntax errors.
However, your question is a bit unclear. It may be that you want to combine multiple lists or arrays into one list or array. There are many examples of this already on this site.
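A runnable version of the dictionary idea might look like this. It is a sketch that assumes list_line is a plain list of strings (the question's data is ambiguous on this point); the name words_by_name is made up for illustration.

```python
# Assumed shape of the data: one string per line.
list_line = ['my name is A', 'my name is B', 'my name is C', 'my name is D']

# One dictionary entry per line instead of one variable per line.
words_by_name = {}
for i, line in enumerate(list_line):
    words_by_name["list_words_" + str(i)] = line.split()

print(words_by_name["list_words_2"])  # ['my', 'name', 'is', 'C']
```

If the string names are not needed, a plain list of lists indexed by i would serve the same purpose with less ceremony.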
I am assuming your data is like this since you are trying to do a split later on:
list_line = ['my name is A',
'my name is B',
'my name is C',
'my name is D']
Splitting every line into its own list of words can be achieved by simply doing:
output_list = []
for alist in list_line:
    output_list.append(alist.split())
print(output_list)
which prints
[['my', 'name', 'is', 'A'],
['my', 'name', 'is', 'B'],
['my', 'name', 'is', 'C'],
['my', 'name', 'is', 'D']]
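The loop above produces one list per line. If a single flat list is wanted, as in the question's list_words, list.extend can be used in place of append. A minimal sketch under the same assumption about the shape of list_line:

```python
list_line = ['my name is A', 'my name is B', 'my name is C', 'my name is D']

# extend adds the words themselves rather than nesting each list of words.
list_words = []
for line in list_line:
    list_words.extend(line.split())
print(list_words)
# ['my', 'name', 'is', 'A', 'my', 'name', 'is', 'B',
#  'my', 'name', 'is', 'C', 'my', 'name', 'is', 'D']

# Equivalent nested list comprehension.
flat = [word for line in list_line for word in line.split()]
```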