How to split a string into characters [duplicate] - python

This question already has answers here:
How do I split a string into a list of characters?
(15 answers)
Closed 2 years ago.
Essentially I want to suck a line of text from a file, assign the characters to a list, and create a list of all the separate characters in a list -- a list of lists.
At the moment, I've tried this:
fO = open(filename, 'rU')
fL = fO.readlines()
That's all I've got. I don't quite know how to extract the single characters and assign them to a new list.
The line I get from the file will be something like:
fL = 'FHFF HHXH XXXX HFHX'
I want to turn it into this list, with each single character on its own:
['F', 'H', 'F', 'F', 'H', ...]

You can do this using list:
new_list = list(fL)
Be aware that any spaces in the line will be included in this list, to the best of my knowledge.
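For the list of lists the question asks about, the same idea can be applied once per line. This is only a sketch: the `lines` sample is made up for illustration and stands in for whatever readlines() returns, and it assumes spaces should be dropped.

```python
# Hypothetical sample standing in for the result of readlines().
lines = ['FHFF HHXH\n', 'XXXX HFHX\n']

# One inner list per line; drop the trailing newline and skip the spaces.
list_of_lists = [[c for c in line.rstrip('\n') if c != ' '] for line in lines]

print(list_of_lists)
```

If the spaces should be kept, list(line.rstrip('\n')) on its own is enough for each line.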

I seem to be a bit late, but...
a = 'hello'
print(list(a))
# ['h', 'e', 'l', 'l', 'o']

Strings are iterable (just like a list).
I'm interpreting that you really want something like:
fd = open(filename, 'r')  # the 'U' flag was removed in Python 3.11
chars = []
for line in fd:
    for c in line:
        chars.append(c)
or
fd = open(filename, 'r')
chars = []
for line in fd:
    chars.extend(line)
or
chars = []
with open(filename, 'r') as fd:
    list(map(chars.extend, fd))  # map() is lazy in Python 3; list() forces the calls
chars would contain all of the characters in the file.
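A quick way to check what ends up in chars, sketched with io.StringIO standing in for the open file (that stand-in is an assumption, not part of the original answer). Note that the newline characters are collected too.

```python
import io

# io.StringIO behaves like an open text file for the purposes of this sketch.
fd = io.StringIO('ab\ncd\n')

chars = []
for line in fd:
    chars.extend(line)  # a str is iterable, so extend() adds one item per character

print(chars)
```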

python >= 3.5
Version 3.5 onwards allows the use of PEP 448 - Additional Unpacking Generalizations:
>>> string = 'hello'
>>> [*string]
['h', 'e', 'l', 'l', 'o']
Because the unpacking is part of the language syntax, it avoids the name lookup and function call that list() requires, so it tends to be faster on short strings:
>>> from timeit import timeit
>>> timeit("list('hello')")
0.3042821969866054
>>> timeit("[*'hello']")
0.1582647830073256
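To repeat the comparison on your own machine, something along these lines should do. It is a rough sketch: the `runs` count is an arbitrary choice, and the numbers will differ between interpreters and hardware, so no output is shown.

```python
from timeit import timeit

# Fewer repetitions than timeit's default of one million, to keep the run short.
runs = 100_000
t_list = timeit("list('hello')", number=runs)
t_unpack = timeit("[*'hello']", number=runs)

print(f"list():    {t_list:.4f} s")
print(f"[*string]: {t_unpack:.4f} s")

# Both spellings build the same list; only the timing differs.
same = list('hello') == [*'hello']
```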

So to add the string hello to a list as individual characters, try this:
newlist = []
newlist[:0] = 'hello'
print(newlist)
# ['h', 'e', 'l', 'l', 'o']
However, it is easier to do this:
splitlist = list('hello')
print(splitlist)

with open(filename, 'r') as fO:  # the 'U' flag was removed in Python 3.11
    lst = list(fO.read())

Or use a list comprehension, which is supposed to be "computationally more efficient" when working with very large files/lists:
fd = open(filename, 'r')
chars = [c for line in fd for c in line if c != " "]  # compare values with !=, not identity with "is not"
fd.close()
Btw: The answer that was accepted does not account for the whitespaces...
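If tabs and newlines should go as well as spaces, str.isspace() covers all of them. A small sketch, with io.StringIO used as a stand-in for the file (my assumption, not part of the answer):

```python
import io

# Stand-in for an open file containing a space, a tab and newlines.
fd = io.StringIO('FH FF\nHX\tXH\n')

# str.isspace() is True for spaces, tabs and newlines alike.
chars = [c for line in fd for c in line if not c.isspace()]

print(chars)
```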

a = 'hello world'
list(map(lambda x: x, a))
# ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
An easy way is to use the map() function. In Python 3 it returns a lazy map object, so wrap it in list() to get the list.

In Python many things are iterable, including files and strings.
Iterating over a file handle yields the lines of that file, one at a time.
Iterating over a string yields the characters of that string, one at a time.
charsFromFile = []
filePath = r'path\to\your\file.txt'  # the r before the string lets us use backslashes
for line in open(filePath):
    for char in line:
        charsFromFile.append(char)
        # apply code on each character here
or if you want a one liner
#the [0] at the end is the line you want to grab.
#the [0] can be removed to grab all lines
[list(a) for a in list(open('test.py'))][0]
Edit: as agf mentions, you can use itertools.chain.from_iterable.
His method is better, unless you want the ability to specify which lines to grab:
list(itertools.chain.from_iterable(open(filename, 'r')))
This does, however, require familiarity with itertools (and an import itertools at the top), so it loses some readability.
If you only want to iterate over the chars, and don't care about storing a list, then I would use the nested for loops. This method is also the most readable.
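For comparison, here is a sketch of the flattened version next to the per-line version that still lets you pick a line. io.StringIO stands in for the file, and the variable names are only illustrative.

```python
import io
from itertools import chain

text = 'ab\ncd\n'

# chain.from_iterable flattens the iterable of lines into one stream of characters.
flat = list(chain.from_iterable(io.StringIO(text)))

# Keeping one list per line preserves the ability to index a specific line.
per_line = [list(line) for line in io.StringIO(text)]
first_line = per_line[0]

print(flat)
print(first_line)
```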

Because strings are (immutable) sequences, they can be unpacked just like lists:
with open(filename, 'r') as fd:
    multiLine = fd.read()
*lst, = multiLine
map(lambda x: x, multiLine) may look more efficient, but it returns a lazy map object instead of a list, so it still has to be converted:
with open(filename, 'r') as fd:
    multiLine = fd.read()
list(map(lambda x: x, multiLine))
Turning the map object into a list will take longer than the unpacking method.
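A short sketch showing that the spellings agree on the result, and that map() alone does not produce a list. The sample string is made up for illustration.

```python
text = 'ab\ncd'  # stand-in for the contents returned by fd.read()

*lst, = text                      # starred assignment always builds a list
via_list = list(text)             # the plain constructor
lazy = map(lambda x: x, text)     # a map object, nothing is computed yet
via_map = list(lazy)              # consuming the map object produces the list

print(lst)
print(type(lazy).__name__)
```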

Related

Remove "\n" with strip in Python?

I'm working on a text file, but because it also has spaces at the beginning of some lines, when I try to remove the \n characters using the strip method and a list comprehension, I get a list with empty elements (" ") and I don't know how to remove them.
I have a text and my code is:
with open(filename) as f:
    testo = f.readlines()
[e.strip() for e in testo]
but I get a list like this:
[' ', ' ', 'word1', 'word2', 'word3', ' ']
I wanted to know if I can work it out with the strip method, otherwise with another method.
You can use a generator to read all the lines and strip() the unwanted newlines.
From the generator you only use those elements that are "Truthy" - empty strings are considered False.
Advantage: you create only one list and get rid of empty strings:
Write file:
filename = "t.txt"
with open(filename, "w") as f:
    f.write("""
c
oo
l
te
xt
""")
Process file:
with open(filename) as f:
    # f.readlines() is not needed: f is an iterable in its own right
    testo = [x for x in (line.strip() for line in f) if x]
print(testo)  # ['c', 'oo', 'l', 'te', 'xt']
You could do something similar with:
testo = [line.strip() for line in f if line.strip()]
but that would execute strip() twice and would be slightly less efficient.
Output:
['c', 'oo', 'l', 'te', 'xt']
Docs:
strip()
truth value testing
A suggested alternative from Eli Korvigo is:
testo = list(filter(bool, map(str.strip, f)))
which is essentially the same: it replaces the explicit list comprehension over a generator expression with a map of str.strip on f (resulting in a lazy iterator) and applies a filter to that before feeding it into a list.
See the built-in functions documentation for filter, map and bool.
I like mine better though ;o)
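On Python 3.8 or later, an assignment expression is another way to call strip() only once per line. This is a sketch of that alternative rather than anything from the answers above; io.StringIO stands in for the file and the sample text is made up.

```python
import io

# Stand-in for the open file: blank lines, a whitespace-only line, padded words.
f = io.StringIO('\n  \nword1\n word2 \n\n')

# (s := line.strip()) strips once, binds the result to s, and the
# truthiness of s decides whether the line is kept.
testo = [s for line in f if (s := line.strip())]

print(testo)
```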
You are getting those empty strings because a few of the lines were just empty line breaks. Here's the code for weeding out these empty strings.
with open(filename) as f:
    testo = [e.strip() for e in f.readlines()]
final_list = list(filter(lambda x: x != '', testo))
print(final_list)
Without lambda and using map:
with open(filename) as f:
    final_list = list(filter(bool, map(str.strip, f)))
print(final_list)
Another solution is:
with open(filename) as f:
    testo = [x for x in f.read().splitlines() if x]
print(testo)
The source for the second solution is:
https://stackoverflow.com/a/15233379/2988776
For performance improvements, refer to Patrick's answer.
From the data you showed us, it looks like there is a line with just a space in it. With that in mind, you have to decide whether this is something you want or not.
In case you want it, then your code should look something like this:
with open(filename) as f:
    testo = f.readlines()
list(filter(None, (l.rstrip('\n') for l in testo)))
In case you don't want lines with just whitespace characters, you can do something like:
with open(filename) as f:
    testo = f.readlines()
[e.rstrip('\n') for e in testo if e.strip()]
In this case, we avoid stripping a line such as " a word with leading and trailing spaces " down to "a word with leading and trailing spaces", since in some cases that might change the semantics of the line :)
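The difference between the variants is easiest to see side by side. A small sketch, using a made-up `lines` list in place of f.readlines():

```python
# Hypothetical sample: a padded line, a space-only line, an empty line, a plain line.
lines = ['  padded word \n', ' \n', '\n', 'plain\n']

# Drop only truly empty lines; the space-only line survives.
keep_space_lines = list(filter(None, (l.rstrip('\n') for l in lines)))

# Drop whitespace-only lines too, but keep the padding of the others.
keep_padding = [l.rstrip('\n') for l in lines if l.strip()]

# Strip everything, which also removes the padding.
fully_stripped = [l.strip() for l in lines if l.strip()]

print(keep_space_lines)
print(keep_padding)
print(fully_stripped)
```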

get an element out of a string python

I am trying to use line.strip() and line.split() to get an element out of a file, but this always gives me a list of strings. Does line.split() always return a string? How can I get a list of elements instead of a list of 'elements'?
myfile = open('myfile.txt','r')
for line in myfile:
    line_strip = line.strip()
    myline = line_strip.split(' ')
    print(myline)
So my code gives me ['hello','hi']
I want to get a list out of the file that looks like [hello,hi]
[2.856,9.678,6.001] 6 Mary
[8.923,3.125,0.588] 7 Louis
[7.122,9.023,4,421] 16 Ariel
so when I try
list = []
list.append((mylist[0][0],mylist[0][1]))
I actually want a list = [(2.856,9.678),(8.923,3.123),(7.122,9.023)]
but it seems this mylist[0][0] refers to '[' in my file
my_string = 'hello'
my_list = list(my_string) # ['h', 'e', 'l', 'l', 'o']
my_new_string = ''.join(my_list) # 'hello'
I think you are looking for this
>>> print("[{}]".format(", ".join(data)))
[1, 2, 3]
To address your question, though
this always gives me a list of string,
Right. As str.split() should do.
does line.split() always return a string?
Assuming type(line) == str, then no, it returns a list of string elements from the split line.
how can I just get a list of elements instead of a list of 'elements'?
Your "elements" are strings. The ' marks are only Python's repr of a str type.
For example...
print('4') # 4
print(repr('4')) # '4'
line = "1,2,3"
data = line.split(",")
print(data) # ['1', '2', '3']
You can cast to a different data-type as you wish
print([float(x) for x in data]) # [1.0, 2.0, 3.0]
For what you posted, use a regex:
>>> s="[2.856,9.678,6.001] 6 Mary"
>>> import re
>>> [float(e) for e in re.search(r'\[([^\]]+)',s).group(1).split(',')]
[2.856, 9.678, 6.001]
For all the lines you posted (and this would be similar to a file) you might do:
>>> txt="""\
... [2.856,9.678,6.001] 6 Mary
... [8.923,3.125,0.588] 7 Louis
... [7.122,9.023,4,421] 16 Ariel"""
>>> for line in txt.splitlines():
...     print([float(e) for e in re.search(r'\[([^\]]+)',line).group(1).split(',')])
...
[2.856, 9.678, 6.001]
[8.923, 3.125, 0.588]
[7.122, 9.023, 4.0, 421.0]
You would need to add error handling to that (if the match fails, for instance), but this is the core of what you are looking for.
BTW: Don't use list as a variable name. You will overwrite the list function and have confusing errors in the future...
line.split() returns a list of strings.
For example:
my_string = 'hello hi'
my_string.split(' ') is equal to ['hello', 'hi']
To put a list of strings, like ['hello', 'hi'], back together, use join.
For example, ' '.join(['hello', 'hi']) is equal to 'hello hi'. The ' ' specifies to put a space between all the elements in the list that you are joining.
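To get the list of tuples the question asks for, the bracketed part of each line can be pulled out and converted. This is a sketch under the assumption that every line starts with a bracketed, comma-separated group of numbers; the name `pairs` is only illustrative, and the sample lines are the ones from the question.

```python
import re

lines = [
    '[2.856,9.678,6.001] 6 Mary',
    '[8.923,3.125,0.588] 7 Louis',
    '[7.122,9.023,4,421] 16 Ariel',
]

pairs = []
for line in lines:
    match = re.search(r'\[([^\]]+)\]', line)
    if match is None:
        continue  # skip lines without a bracketed group
    numbers = [float(x) for x in match.group(1).split(',')]
    pairs.append((numbers[0], numbers[1]))  # keep the first two values only

print(pairs)
```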

Adding each character in a line of input file to list, and adding each list to another list after each line

Basically what I am trying to do is read in each character from each line into a list, and after each line, add that list into another list (one list per line in input file, each list containing all the individual characters of each line)
This is what I have so far but it doesn't seem to be working and I can't figure out why.
allseq = []
with open("input.txt", "r") as ins:
    seq = []
    for line in ins:
        for ch in line:
            if ins != "\n":
                seq.append(ch)
            else:
                allseq.append(seq)
                seq[:] = []
print(allseq)
Strings in Python can be easily converted into literal lists of characters! Let's make a function.
def get_char_lists(file):
    with open(file) as f:
        return [list(line.strip()) for line in f.readlines()]
This opens a file for reading, reads all the lines, strips off extraneous whitespace, sticks a list of the characters into a list, and returns that last list.
Even though there is an easier way (see Pierce's answer), there are two problems with your original code. The second is important to understand.
allseq = []
with open("input.txt", "r") as ins:
    seq = []
    for line in ins:
        for ch in line:
            if ch != "\n":  # Use ch instead of ins here.
                seq.append(ch)
            else:
                allseq.append(seq)
                seq = []  # Don't clear the existing list, start a new one.
print(allseq)
Test file:
this is
some input
Output:
[['t', 'h', 'i', 's', ' ', 'i', 's'], ['s', 'o', 'm', 'e', ' ', 'i', 'n', 'p', 'u', 't']]
To clarify why the 2nd fix is needed, when you append an object to a list, a reference to the object is placed in the list. So if you later mutate that object the displayed content of the list changes, since it references the same object. seq[:] = [] mutates the original list to be empty.
>>> allseq = []
>>> seq = [1,2,3]
>>> allseq.append(seq)
>>> allseq # allseq contains seq
[[1, 2, 3]]
>>> seq[:] = [] # seq is mutated to be empty
>>> allseq # since allseq has a reference to seq, it changes too.
[[]]
>>> seq.append(1) # change seq again
>>> allseq # allseq's reference to seq displays the same thing.
[[1]]
>>> allseq.append(seq) # Add another reference to the same list
>>> allseq
[[1], [1]]
>>> seq[:]=[] # Clearing the list shows both references cleared.
>>> allseq
[[], []]
You can see that allseq contains the same references to seq with id():
>>> id(seq)
46805256
>>> id(allseq[0])
46805256
>>> id(allseq[1])
46805256
seq = [] creates a new list with a different ID, instead of mutating the same list.
If you, or anyone else, prefer a one liner, here it is (based on Pierce Darragh's excellent answer):
allseq = [list(line.strip()) for line in open("input.txt").readlines()]
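A variant of the one-liner that closes the file and keeps leading spaces by removing only the newline. It is a sketch: the temporary file is created here just so the example is self-contained, and its contents are the test file from the answer above.

```python
import os
import tempfile

# Write a throwaway input file so the sketch can run on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('this is\nsome input\n')
    path = tmp.name

# The with block closes the file; rstrip('\n') removes only the newline.
with open(path) as ins:
    allseq = [list(line.rstrip('\n')) for line in ins]

os.remove(path)
print(allseq)
```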

List Comprehension for removing duplicates of characters in a string [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How do you remove duplicates from a list whilst preserving order?
So the idea is that the program takes a string of characters and returns the same string with any duplicated character appearing only once, removing every later copy of a character.
So Iowa stays Iowa, but the word eventually would become evntualy.
Here is an inefficient method:
x = 'eventually'
newx = ''.join([c for i,c in enumerate(x) if c not in x[:i]])
I don't think that there is an efficient way to do it in a list comprehension.
Here it is as an O(n) (average case) generator expression. The others are all roughly O(n^2).
chars = set()
string = "aaaaa"
newstring = ''.join(chars.add(char) or char for char in string if char not in chars)
It works because set.add returns None, so the or will always cause the character to be yielded from the generator expression when the character isn't already in the set.
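Spelled out as an ordinary loop, the same idea looks roughly like this. The function name `dedupe` is made up for this sketch.

```python
def dedupe(text):
    """Return text with every repeated character removed, keeping first occurrences."""
    seen = set()
    out = []
    for char in text:
        if char not in seen:  # average O(1) membership test on a set
            seen.add(char)
            out.append(char)
    return ''.join(out)


print(dedupe('eventually'))
```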
Edit: Also see refaim's solutions. My solution is like his second one, but it uses the set in the opposite way.
My take on his OrderedDict solution:
''.join(OrderedDict((char, None) for char in word))
Without list comprehensions:
from collections import OrderedDict
word = 'eventually'
print(''.join(OrderedDict(zip(word, range(len(word)))).keys()))
With a generator expression (quick and dirty solution):
word = 'eventually'
uniq = set(word)
print(''.join(c for c in word if c in uniq and not uniq.discard(c)))
>>> s='eventually'
>>> "".join([c for i,c in enumerate(s) if i==s.find(c)])
'evntualy'
Note that using a list comprehension with join() is unnecessary when you can just use a generator expression. You should tell your teacher to update their question.
You could make a set from the string, then join it together again. This works since sets can only contain unique values. The order won't be the same though:
In [1]: myString = "mississippi"
In [2]: set(myString)
Out[2]: set(['i', 'm', 'p', 's'])
In [3]: print "".join(set(myString))
Out[3]: ipsm
In [4]: set("iowa")
Out[4]: set(['a', 'i', 'o', 'w'])
In [5]: set("eventually")
Out[5]: set(['a', 'e', 'l', 'n', 't', 'u', 'v', 'y'])
Edit: Just saw the "List Comprehension" in the title, so this probably isn't what you're looking for.
Create a set from the original string, and then sort by position of character in original string:
>>> s='eventually'
>>> ''.join(sorted(set(s), key=s.index))
'evntualy'
Taken from this question, I think this is the fastest way:
>>> def remove_dupes(str):
...     chars = set()
...     chars_add = chars.add
...     return ''.join(c for c in str if c not in chars and not chars_add(c))
...
>>> remove_dupes('hello')
'helo'
>>> remove_dupes('testing')
'tesing'
word = "eventually"
evntualy = ''.join(
    c
    for d in [dict(zip(word, word))]
    for c in word
    if d.pop(c, None) is not None)
Riffing off of agf's (clever) solution but without making a set outside of the generator expression:
evntualy = ''.join(s.add(c) or c for s in [set()] for c in word if c not in s)
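On Python 3.7 and later, plain dicts keep insertion order, so the OrderedDict approach from earlier in the thread can be written without an import. This is a sketch of that shorter spelling, not one of the original answers:

```python
word = 'eventually'

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
deduped = ''.join(dict.fromkeys(word))

print(deduped)
```

It is not a comprehension, so it may not satisfy the exercise as worded, but it is handy outside of it.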
