I'm working on a text file, but since it also has spaces at the beginning of some lines, when I try to remove the \n characters using the strip() method in a list comprehension, I get a list with empty elements (" ") and I don't know how to delete them.
I have a text file and my code is:
with open(filename) as f:
    testo = f.readlines()
    [e.strip() for e in testo]
but I get a list like this:
[' ', ' ', 'word1', 'word2', 'word3', ' ']
I wanted to know whether I can work it out with the strip() method, or otherwise with another method.
You can use a generator to read all the lines and strip() the unwanted newlines.
From the generator you only use those elements that are "Truthy" - empty strings are considered False.
Advantage: you create only one list and get rid of empty strings:
Write file:
filename = "t.txt"
with open(filename,"w") as f:
f.write("""
c
oo
l
te
xt
""")
Process file:
with open(filename) as f:
    # f.readlines() is not needed: f is an iterable in its own right
    testo = [x for x in (line.strip() for line in f) if x]

print(testo)  # ['c', 'oo', 'l', 'te', 'xt']
You could do it similarly:
testo = [line.strip() for line in f if line.strip()]
but that would execute strip() twice and would be slightly less efficient.
Output:
['c', 'oo', 'l', 'te', 'xt']
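On Python 3.8+, an assignment expression (the "walrus" operator) gives the same single-strip, single-pass behaviour without the nested generator; a minimal sketch, assuming the same t.txt as above:
with open(filename) as f:
    # bind the stripped line once and keep it only if it is non-empty
    testo = [stripped for line in f if (stripped := line.strip())]
print(testo)  # ['c', 'oo', 'l', 'te', 'xt']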
Docs:
strip()
truth value testing
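A quick interactive check of the truth value rule these solutions rely on:
>>> bool('')
False
>>> bool('word')
True
>>> bool(' '.strip())   # a line containing only a space strips down to '' and is falsy
False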
A suggested alternative from Eli Korvigo is:
testo = list(filter(bool, map(str.strip, f)))
which is essentially the same: it replaces the explicit list comprehension over a generator comprehension with a map of str.strip over f (resulting in a lazy iterator) and applies a filter to that before feeding it into a list.
See built-in functions for the docs of filter, map and bool.
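For example, on a small list of lines both spellings give the same result:
>>> lines = [' \n', 'c\n', 'oo\n', '\n']
>>> [x for x in (l.strip() for l in lines) if x]
['c', 'oo']
>>> list(filter(bool, map(str.strip, lines)))
['c', 'oo']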
I like mine better though ;o)
You are getting those empty strings because a few of the lines were just empty line breaks. Here's the code for weeding out those empty strings.
with open(filename) as f:
    testo = [e.strip() for e in f.readlines()]
    final_list = list(filter(lambda x: x != '', testo))
    print(final_list)
Without lambda and using map:
with open(filename) as f:
    final_list = list(filter(bool, map(str.strip, f)))
    print(final_list)
Another solution is:
with open(filename) as f:
    testo = [x for x in f.read().splitlines() if x]
    print(testo)
The source for the second solution is:
https://stackoverflow.com/a/15233379/2988776
For performance upgrades, refer to @Patrick's answer.
From the data you showed us, it looks like there is a line with just a space in it. With that in mind, you have to decide whether this is something you want or not.
In case you want it, then your code should look something like this:
with open(filename) as f:
    testo = f.readlines()
list(filter(None, (l.rstrip('\n') for l in testo)))
In case you don't want lines with just whitespace characters, you can do something like:
with open(filename) as f:
    testo = f.readlines()
[e.rstrip('\n') for e in testo if e.strip()]
In this case, we avoid stripping " a word with leading and trailing spaces " down to "a word with leading and trailing spaces", since in some cases that might change the semantics of the line :)
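A small illustration of the difference between the two approaches, using a made-up line with leading and trailing spaces:
>>> line = '  a word with spaces  \n'
>>> line.rstrip('\n')
'  a word with spaces  '
>>> line.strip()
'a word with spaces'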
I have a text file in alphabetical order that looks like this in Python:
At 210.001 \n Au 196.9665 \n B 10.81 \n Ba 137.34 \n
How do I make each line a list? To make it a list, the space between the letters and the numbers needs to become ",", and how do I do that as well?
You can use the following code:
with open('list.txt', 'r') as myfile:
    data = myfile.read()
print([i.strip().split() for i in data.split(' \\n') if len(i.strip()) > 0])
output:
[['At', '210.001'], ['Au', '196.9665'], ['B', '10.81'], ['Ba', '137.34']]
If you want to convert the 2nd element into a float, then change the code to:
def floatify_list_of_lists(nested_list):
    def floatify(s):
        try:
            return float(s)
        except ValueError:
            return s
    def floatify_list(lst):
        return [floatify(s) for s in lst]
    return [floatify_list(lst) for lst in nested_list]

with open('list.txt', 'r') as myfile:
    data = myfile.read()
print(floatify_list_of_lists([i.strip().split() for i in data.split(' \\n') if len(i.strip()) > 0]))
output:
[['At', 210.001], ['Au', 196.9665], ['B', 10.81], ['Ba', 137.34]]
If you really do need to have one string in each of the nested lists, then use:
with open('list.txt', 'r') as myfile:
    data = myfile.read()
print([[i.strip().replace(' ', ',')] for i in data.split(' \\n') if len(i.strip()) > 0])
output:
[['At,210.001'], ['Au,196.9665'], ['B,10.81'], ['Ba,137.34']]
Using replace() to replace the space with ,:
list.txt:
At 210.001
Au 196.9665
B 10.81
Ba 137.34
Hence:
logFile = "list.txt"
with open(logFile) as f:
    content = f.readlines()

# you may also want to remove empty lines
content = [l.strip() for l in content if l.strip()]

for line in content:
    print([line.replace(" ", ",")])  # for each line, replace the space with ,
OUTPUT:
['At,210.001']
['Au,196.9665']
['B,10.81']
['Ba,137.34']
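If the file really has one "element mass" pair per line, as in list.txt above, the csv module can also handle the comma-joining; a sketch, not part of the original answers, with out.csv as an assumed output file name:
import csv

# read list.txt (one "element mass" pair per line), write comma-separated pairs
with open('list.txt') as src, open('out.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)          # csv.writer joins fields with ',' by default
    for line in src:
        fields = line.split()         # split on any whitespace
        if fields:                    # skip blank lines
            writer.writerow(fields)   # e.g. writes "At,210.001"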
I want to find all the "phrases" in a list and remove them from the list, so that I have only words (without spaces) left. I'm making a hangman-type game and want the computer to choose a random word. I'm new to Python and coding, so I'm happy to hear other suggestions for my code as well.
import random

fhand = open('common_words.txt')
words = []
for line in fhand:
    line = line.strip()
    words.append(line)

for word in words:
    if ' ' in word:
        words.remove(word)
print(words)
Sets are more efficient than lists for membership tests. When constructed lazily like here, you can also get a significant performance boost:
# Load all words
words = set()
with open('common_words.txt') as file:
    for line in file:
        line = line.strip()
        if " " not in line:
            words.add(line)

# Can be written as a one-liner using the magic of Python
words = set(filter(lambda x: " " not in x, map(str.strip, open('common_words.txt'))))

# Get a random word (random.choice needs a sequence, so convert the set to a list)
import random
print(random.choice(list(words)))
Use str.split(). By default it splits on any whitespace, including spaces and newlines.
>>> 'some words\nsome more'.split()
['some', 'words', 'some', 'more']
>>> 'this is a sentence.'.split()
['this', 'is', 'a', 'sentence.']
>>> 'dfsonf 43 SDFd fe#2'.split()
['dfsonf', '43', 'SDFd', 'fe#2']
Read the file normally and make a list this way:
words = []
with open('filename.txt', 'r') as file:
    words = file.read().split()
That should be good.
with open('common_words.txt', 'r') as f:
    words = [word for word in filter(lambda x: len(x) > 0 and ' ' not in x, map(lambda x: x.strip(), f.readlines()))]
with is used because file objects are context managers. The strange list-like syntax is a list comprehension, so it builds a list from the expression inside the brackets. map is a function which takes in an iterable, applies a provided function to each item in the iterable, and places each transformed result into a new list*. filter is a function which takes in an iterable, tests each item against the provided predicate, and places each item that evaluates to True into a new list*. lambda is used to define a function (with a specific signature) in-line.
*: In Python 3 the actual return types are lazy iterators, which work like generators, so they can still be used with for loops.
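The same result can also be written as a plain list comprehension, which many find easier to read; a sketch equivalent to the filter/map version above:
with open('common_words.txt') as f:
    # strip each line once, keep only non-empty lines that contain no spaces
    words = [w for w in (line.strip() for line in f) if w and ' ' not in w]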
I am not sure if I understand you correctly, but I guess the split() method is what you are looking for, like:
with open('common_words.txt') as f:
    words_nested = [line.split() for line in f]
words = [word for sublist in words_nested for word in sublist]  # flatten nested list
As mentioned, the .split() method could be a solution.
Also, the NLTK module might be useful for future language processing tasks.
Hope this helps!
Essentially I want to suck a line of text from a file, assign the characters to a list, and create a list of all the separate characters in a list -- a list of lists.
At the moment, I've tried this:
fO = open(filename, 'rU')
fL = fO.readlines()
That's all I've got. I don't quite know how to extract the single characters and assign them to a new list.
The line I get from the file will be something like:
fL = 'FHFF HHXH XXXX HFHX'
I want to turn it into this list, with each single character on its own:
['F', 'H', 'F', 'F', 'H', ...]
You can do this using list:
new_list = list(fL)
Be aware that any spaces in the line will be included in this list, to the best of my knowledge.
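If you do want the spaces dropped, a comprehension over the string can filter them out; a minimal sketch using the example line from the question:
>>> fL = 'FHFF HHXH XXXX HFHX'
>>> [c for c in fL if c != ' ']
['F', 'H', 'F', 'F', 'H', 'H', 'X', 'H', 'X', 'X', 'X', 'X', 'H', 'F', 'H', 'X']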
I'm a bit late, it seems, but...
a = 'hello'
print(list(a))
# ['h', 'e', 'l', 'l', 'o']
Strings are iterable (just like a list).
I'm interpreting that you really want something like:
fd = open(filename, 'rU')
chars = []
for line in fd:
    for c in line:
        chars.append(c)
or
fd = open(filename, 'rU')
chars = []
for line in fd:
    chars.extend(line)
or
chars = []
with open(filename, 'rU') as fd:
    # in Python 3, map() is lazy, so wrap it in list() to actually run chars.extend
    list(map(chars.extend, fd))
chars would contain all of the characters in the file.
python >= 3.5
Version 3.5 onwards allows the use of PEP 448 - Extended Unpacking Generalizations:
>>> string = 'hello'
>>> [*string]
['h', 'e', 'l', 'l', 'o']
This is part of the language syntax, so it is faster than calling list:
>>> from timeit import timeit
>>> timeit("list('hello')")
0.3042821969866054
>>> timeit("[*'hello']")
0.1582647830073256
So to add the string hello to a list as individual characters, try this:
newlist = []
newlist[:0] = 'hello'
print (newlist)
['h','e','l','l','o']
However, it is easier to call list() on the string directly:
splitlist = list('hello')
print(splitlist)
fO = open(filename, 'rU')
lst = list(fO.read())
Or use a fancy list comprehension, which is supposed to be "computationally more efficient" when working with very, very large files/lists:
fd = open(filename, 'r')
chars = [c for line in fd for c in line if c != " "]
fd.close()
Btw: The answer that was accepted does not account for the whitespaces...
a = 'hello world'
map(lambda x: x, a)
# Python 2 result: ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
# Python 3: this returns a lazy map object, so wrap it in list() to get the list
An easy way is using the map() function.
In Python many things are iterable, including files and strings.
Iterating over a file handle gives you the lines of that file, one at a time.
Iterating over a string gives you its characters, one at a time.
charsFromFile = []
filePath = r'path\to\your\file.txt'  # the r before the string lets us use backslashes

for line in open(filePath):
    for char in line:
        charsFromFile.append(char)
        # apply code on each character here
or if you want a one liner
#the [0] at the end is the line you want to grab.
#the [0] can be removed to grab all lines
[list(a) for a in list(open('test.py'))][0]
Edit: as agf mentions you can use itertools.chain.from_iterable
His method is better, unless you want the ability to specify which lines to grab
list(itertools.chain.from_iterable(open(filename, 'rU')))
This does, however, require one to be familiar with itertools, and as a result loses some readability.
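A self-contained version of the itertools variant; a sketch assuming Python 3, where the 'rU' mode is deprecated and a context manager is preferred:
import itertools

# chain.from_iterable flattens the lines of the file into a single stream of characters
with open(filename) as fd:
    chars = list(itertools.chain.from_iterable(fd))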
If you only want to iterate over the chars, and don't care about storing a list, then I would use the nested for loops. This method is also the most readable.
Because strings are (immutable) sequences they can be unpacked similar to lists:
with open(filename, 'rU') as fd:
    multiLine = fd.read()
*lst, = multiLine
Running map(lambda x: x, multiLine) may look more efficient, but in fact it returns a map object instead of a list:
with open(filename, 'rU') as fd:
    multiLine = fd.read()
list(map(lambda x: x, multiLine))
Turning the map object into a list will take longer than the unpacking method.
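A rough way to check that claim yourself; a sketch using timeit (the absolute numbers will vary by machine, and the input string is made up):
from timeit import timeit

setup = "s = 'FHFF HHXH XXXX HFHX' * 100"
print(timeit("*lst, = s", setup=setup, number=100_000))
print(timeit("lst = list(map(lambda x: x, s))", setup=setup, number=100_000))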
Is it possible to write a string into a file without quotes and spaces (spaces for any type in the list)?
For example I have such list:
['blabla', 10, 'something']
How can I write it into the file so the line in the file becomes:
blabla,10,something
Now every time I write it into the file I get this:
'blabla', 10, 'something'
So then I need to replace ' and ' ' with an empty string. Maybe there is some trick, so I wouldn't need to replace it all the time?
This will work:
lst = ['blabla', 10, 'something']

# Open the file with a context manager
with open("/path/to/file", "a+") as myfile:
    # Convert all of the items in lst to strings (for str.join)
    lst = map(str, lst)
    # Join the items together with commas
    line = ",".join(lst)
    # Write to the file
    myfile.write(line)
Output in file:
blabla,10,something
Note however that the above code can be simplified:
lst = ['blabla', 10, 'something']
with open("/path/to/file", "a+") as myfile:
    myfile.write(",".join(map(str, lst)))
Also, you may want to add a newline to the end of the line you write to the file:
myfile.write(",".join(map(str, lst))+"\n")
This will cause each subsequent write to the file to be placed on its own line.
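Another option on Python 3 is to let print() do both the joining and the newline; a small sketch, assuming the same lst and file path as above:
lst = ['blabla', 10, 'something']
with open("/path/to/file", "a+") as myfile:
    # sep=',' joins the items (each converted with str()), and print() adds the trailing newline
    print(*lst, sep=',', file=myfile)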
Did you try something like this?
yourlist = ['blabla', 10, 'something']
open('yourfile', 'a+').write(', '.join([str(i) for i in yourlist]) + '\n')
Where
', '.join(...) takes a list of strings and glues them together with a separator string (', ')
and
[str(i) for i in yourlist] converts your list into a list of strings (in order to handle numbers)
Initialise an empty string j.
For each item in the list, concatenate it to j in a for loop, which introduces no spaces.
Printing str(j) will show it without the quotes.
j = ''
for item in list:
    j = j + str(item)
print(str(j))