Fill a list from a file - python

I have a file which contains words separated by commas like:
tom,harry,ant,qqqq
aa,ww,rr,gg,aa,hh,ss
I would like to split each element separated by a comma and fill a list like this:
array=['tom','harry','ant','qqqq','aa','ww','rr','gg','aa','hh','ss']
So far I tried with:
array=list()
for i in open(filename):
    element = i.split(',',len(i))
    array.append(element)
When I print the result, I run into two problems:
for i in array:
    print i
I obtain ['tom','harry','ant','qqqq\n'] and ['aa','ww','rr','gg','aa','hh','ss\n']
I would like to get rid of the \n and end up with a single flat list, as shown above.

with open('myFile.txt', 'r') as myFile:
    # strip() drops the final newline so the list does not end with an empty string
    array = myFile.read().strip().replace('\n', ',').split(',')
for i in array:
    print(i)
One liner:
with open('myFile.txt', 'r') as myFile: array = myFile.read().strip().replace('\n', ',').split(',')
You should also avoid variable names like list (a built-in) or array (a standard-library module): assigning to them shadows the original name, which is bad practice.

You can strip the line first to get rid of the \n, and use extend instead of append:
array = []
for i in open(filename):
    line = i.strip()
    element = line.split(',')
    array.extend(element)
extend adds each element of the split list to array, whereas append adds the list itself as a single element. The result is:
['tom','harry','ant','qqqq','aa','ww','rr','gg','aa','hh','ss']
Instead of:
[['tom','harry','ant','qqqq'], ['aa','ww','rr','gg','aa','hh','ss']]
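A minimal sketch of the difference, using the two sample lines from the question as hard-coded strings instead of reading a file:

```python
# Two lines as they come out of iterating over the file (newline included).
lines = ["tom,harry,ant,qqqq\n", "aa,ww,rr,gg,aa,hh,ss\n"]

nested = []
flat = []
for line in lines:
    parts = line.strip().split(",")
    nested.append(parts)  # adds the whole list as ONE element
    flat.extend(parts)    # adds each string of the list as its own element

print(nested)  # [['tom', 'harry', 'ant', 'qqqq'], ['aa', 'ww', 'rr', 'gg', 'aa', 'hh', 'ss']]
print(flat)    # ['tom', 'harry', 'ant', 'qqqq', 'aa', 'ww', 'rr', 'gg', 'aa', 'hh', 'ss']
```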

Since it looks like a comma-separated file, I recommend using the csv module.
import csv
with open('file') as f:
    csv_file = csv.reader(f)
    L = []
    for i in csv_file:
        L.append(i)
# flatten the list of rows into a single list of words
print([i for j in L for i in j])
Output:
['tom', 'harry', 'ant', 'qqqq', 'aa', 'ww', 'rr', 'gg', 'aa', 'hh', 'ss']
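If you don't need the intermediate list of rows, itertools.chain.from_iterable can flatten the reader's rows directly. A sketch, with io.StringIO standing in for the real file:

```python
import csv
import io
from itertools import chain

# io.StringIO stands in for open('file') so the example runs on its own.
f = io.StringIO("tom,harry,ant,qqqq\naa,ww,rr,gg,aa,hh,ss\n")

# csv.reader yields one list per row; chain.from_iterable concatenates the rows.
words = list(chain.from_iterable(csv.reader(f)))
print(words)
```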

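Iterating over a file yields lines that still end with a newline, so strip the newline explicitly.
Replace the following line:
element = i.split(',',len(i))
with:
element = i.rstrip().split(',',len(i)) # Remove trailing whitespace characters.
or
element = i.rstrip('\r\n').split(',',len(i)) # Remove CR / LF.
To end up with one flat list rather than a list of lists, also use array.extend(element) instead of array.append(element).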

You could use a regex:
>>> import re
>>> re.split(r"[,\n]+", open(filename).read().strip())
['tom', 'harry', 'ant', 'qqqq', 'aa', 'ww', 'rr', 'gg', 'aa', 'hh', 'ss']
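One caveat: if the file ends with a newline (most do), re.split leaves an empty string at the end of the result unless the text is stripped first. A sketch of a hypothetical `split_words` helper that strips the text and also drops empty strings produced by stray leading or trailing commas:

```python
import re

def split_words(text):
    """Split on any run of commas and/or newlines, dropping empty strings."""
    # strip() removes the trailing newline that would otherwise yield a final ''.
    return [w for w in re.split(r"[,\n]+", text.strip()) if w]

text = "tom,harry,ant,qqqq\naa,ww,rr,gg,aa,hh,ss\n"
print(split_words(text))
```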

Related

How to write two nested lists column-wise to a .txt file?

I have two lists of lists that look like this:
sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]
I want to save these pairs of lists column-wise in a .txt file, so that the two elements of each pair are separated by a space and each pair of sublists is separated by an empty line.
The desired output should look like this:
its O
a O
great O
show B_A

nice O
movie B_A

good O
series B_A
I have tried this:
filename = 'data.txt'
with open(filename, 'w') as f:
    for sen in sentences:
        for lab in labels:
            line = sen + ' ' + lab
            f.write(line)
I have the following error:
TypeError: can only concatenate list (not "str") to list
Update: Using the first answer, I am trying to define a function which takes two nested lists and the new file name as follows:
def create_txt(ls1, ls2,file_name):
    with open(file_name, 'w') as f:
        for sen, lab in zip(ls1,ls2):
            for i, j in zip(sen, lab):
                f.write(f'{i} {j}')
            f.write('\n')
    return file_name
But it returns the provided file name as a string:
create_txt(sentences, labels,'data_n.txt')
Output: 'data_n.txt'
What is the logical mistake I've made here?
Thanks in advance!
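You can use the csv module for this.
import csv
# newline="" lets the csv writer control line endings;
# lineterminator="\n" replaces its default "\r\n"
with open("file.txt", "w", newline="") as fp:
    writer = csv.writer(fp, delimiter=" ", lineterminator="\n")
    for sentence, label in zip(sentences, labels):
        for i, j in zip(sentence, label):
            writer.writerow([i, j])
        fp.write('\n')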
Without using any additional modules:
with open("file.txt", "w") as fp:
    for sentence, label in zip(sentences, labels):
        for i, j in zip(sentence, label):
            fp.write(f'{i} {j}\n')
        fp.write('\n')
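Regarding the update in the question: getting 'data_n.txt' back is expected, because the function ends with `return file_name`, so an interactive session echoes that value while the file itself is still written as a side effect. What does look off is that `f.write(f'{i} {j}')` writes no newline, so all pairs of a group end up on one line. A minimal corrected sketch that keeps the `create_txt` name and signature from the question (the temporary directory is only there to keep the example self-contained):

```python
import os
import tempfile

def create_txt(ls1, ls2, file_name):
    """Write ls1/ls2 pairs column-wise: one pair per line, a blank line between groups."""
    with open(file_name, "w") as f:
        for sen, lab in zip(ls1, ls2):
            for i, j in zip(sen, lab):
                f.write(f"{i} {j}\n")  # newline after every pair
            f.write("\n")              # empty line after every group
    return file_name  # returning the name is fine; the file is the real result

sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]

path = create_txt(sentences, labels, os.path.join(tempfile.mkdtemp(), "data_n.txt"))
with open(path) as f:
    print(f.read())
```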
Another working answer, slightly different, with some explanatory comments:
sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]
filename = "data.txt"
outputstring = ""
# Construct the output string with zip.
# First we're zipping the elements of the source lists,
# which gives a sequence of pairs like this:
# (sentences[0], labels[0]), (sentences[1], labels[1]), etc.
# Then we iterate over that sequence and zip up the contents of
# each pair of lists in the same way, and concatenate those strings
# with the outputstring, followed by a single newline character.
# After that, an extra newline is added to break up the groups.
for sentence, label in zip(sentences, labels):
    for i, j in zip(sentence, label):
        outputstring += i + " " + j + "\n"
    outputstring += "\n"
# This removes the extra whitespace at the end.
outputstring = outputstring.rstrip()
# Finally, you can just write the string to your output file.
with open(filename, "w") as f:
    f.write(outputstring)
And here is a second example without using zip:
sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]
filename = "data.txt"
outputstring = ""
# Check the length of each list of lists and make sure they're the same:
sentenceslen = len(sentences)
labelslen = len(labels)
if sentenceslen != labelslen:
    print("Malformed data!")
    raise SystemExit
# Iterate over both lists using their lengths to define the range of indices.
for i in range(sentenceslen):
    # Check the lengths of each pair of sublists and make sure they're the same:
    subsentenceslen = len(sentences[i])
    sublabelslen = len(labels[i])
    if subsentenceslen != sublabelslen:
        print("Malformed data!")
        raise SystemExit
    # Iterate over each pair of sublists using their lengths to define the range of indices:
    for j in range(subsentenceslen):
        # Construct the outputstring by using both indices to drill down to the right strings,
        # ending with newline:
        outputstring += sentences[i][j] + " " + labels[i][j] + "\n"
    # Break up groups with newline again:
    outputstring += "\n"
# Remove whitespace at end:
outputstring = outputstring.rstrip()
# Write outputstring to file:
with open(filename, "w") as f:
    f.write(outputstring)
I do not recommend actually using the code in the second example. It is unnecessarily complicated, but I include it for completeness and to illustrate how much effort the zip function above saves. Note, however, that zip does not complain if you feed it lists of different lengths: it stops at the end of the shortest input and silently ignores the remaining values of the longer one. So your script won't crash on mismatched data, but unless you check for it, you won't notice the missing pairs either.
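To make the silent truncation visible, here is a minimal sketch of a hypothetical `paired` helper that checks the lengths up front and then relies on zip for the actual pairing. On Python 3.10 and later, `zip(a, b, strict=True)` raises ValueError on a length mismatch by itself, so manual checks like these are mainly useful on older versions.

```python
def paired(sentences, labels):
    """Yield (word, label) pairs, raising ValueError instead of silently dropping items."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels have different lengths")
    for sen, lab in zip(sentences, labels):
        if len(sen) != len(lab):
            raise ValueError(f"length mismatch: {sen!r} vs {lab!r}")
        yield from zip(sen, lab)

# Plain zip stops at the shortest input without complaint:
print(list(zip(['its', 'a', 'great'], ['O', 'O'])))  # [('its', 'O'), ('a', 'O')]

# paired() checks lengths first (the checks run when the generator is iterated):
print(list(paired([['nice', 'movie']], [['O', 'B_A']])))  # [('nice', 'O'), ('movie', 'B_A')]
```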

deleting '\n' from a subprocess.Popen result in python [duplicate]

I have to take a large list of words in the form:
['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
and then, using the strip method, turn it into:
['this', 'is', 'a', 'list', 'of', 'words']
I thought that what I had written would work, but I keep getting an error saying:
"'list' object has no attribute 'strip'"
Here is the code that I tried:
strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())
You can either use a list comprehension:
my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
stripped = [s.strip() for s in my_list]
or alternatively use map():
stripped = list(map(str.strip, my_list))
In Python 2, map() directly returned a list, so you didn't need the call to list. In Python 3, the list comprehension is more concise and generally considered more idiomatic.
How about a list comprehension?
[x.strip() for x in lst]
You can use list comprehensions:
strip_list = [item.strip() for item in lines]
Or the map function (in Python 3, map returns an iterator, so wrap it in list() if you need a list):
# with a lambda
strip_list = list(map(lambda it: it.strip(), lines))
# without a lambda
strip_list = list(map(str.strip, lines))
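In Python 3, map returns a lazy iterator rather than a list, and that iterator can be consumed only once. A small sketch using the word list from the question:

```python
lines = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']

lazy = map(str.strip, lines)  # in Python 3 this is a map object, not a list
strip_list = list(lazy)       # consume the iterator into a list

print(strip_list)  # ['this', 'is', 'a', 'list', 'of', 'words']
print(list(lazy))  # []: the iterator is already exhausted
```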
This can be done using list comprehensions, as defined in PEP 202:
[w.strip() for w in ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']]
All the other answers, especially the list-comprehension ones, are great. But just to explain your error:
strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())
a is an element of your list (a string), not an index, so lines[a] does not work. What you could write is this:
[...]
for a in lines:
    strip_list.append(a.strip())
Another important point: you can create a list of 20 zeros in one step like this:
strip_list = [0] * 20
But you don't need that here: .append adds items to the end of the list, so pre-filled default values would simply stay in front of your stripped strings. Build the list item by item instead, appending each stripped string.
So your code should look like this:
strip_list = []
for a in lines:
    strip_list.append(a.strip())
But the best option is this one, which does exactly the same thing:
stripped = [line.strip() for line in lines]
If you need something more complicated than just .strip(), put that logic in a function and call it in the comprehension the same way. That's the most readable way to work with lists.
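For example, with a hypothetical `clean` helper that strips each word and also lower-cases it (the lower-casing just stands in for "something more complicated"):

```python
def clean(word):
    """Hypothetical per-item cleanup: strip surrounding whitespace, then lower-case."""
    return word.strip().lower()

lines = ['This\n', 'is\n', 'A\n', 'list\n']
stripped = [clean(line) for line in lines]
print(stripped)  # ['this', 'is', 'a', 'list']
```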
If you need to remove just trailing whitespace, you could use str.rstrip(), which should be slightly more efficient than str.strip():
>>> lst = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> [x.rstrip() for x in lst]
['this', 'is', 'a', 'list', 'of', 'words']
>>> list(map(str.rstrip, lst))
['this', 'is', 'a', 'list', 'of', 'words']
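The methods are not interchangeable, though: rstrip() keeps leading whitespace, and rstrip('\n') removes only the newline. A quick illustration with made-up strings:

```python
lst = ['  indented\n', 'trailing  \n']

print([x.strip() for x in lst])       # ['indented', 'trailing']
print([x.rstrip() for x in lst])      # ['  indented', 'trailing']
print([x.rstrip('\n') for x in lst])  # ['  indented', 'trailing  ']
```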
my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
print([l.strip() for l in my_list])
Output:
['this', 'is', 'a', 'list', 'of', 'words']

Python: loading txt file into 2d list of different types

I have a 2d list saved in a text file that looks like this (showing the first 2 entries):
('9b7dad', "text", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)
('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)
How should I load this into a list, so that, for example, list[0][0] == '9b7dad' and list[1][1] == 'text2'?
You could try this:
f = open(<your file path>)
result = [
    [g.replace("'", "")
     for g in l.strip('()\n').replace(' ', '').replace('"', '').split(',')]
    for l in f.readlines()]
f.close()
Given a text file with each line in the form you've shown:
('9b7dad', "text", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)
('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)
You can use pandas, which offers a more straightforward way to handle and manipulate different data types.
Import pandas and read in the file, here called 'stack.txt':
import pandas as pd
data = pd.read_csv('stack.txt', sep=",", header=None)
read_csv only splits on the commas, so the parentheses, quotes and leading spaces are still part of the fields. Process the columns first:
for i in range(len(data.columns)):
    if i == 0:
        data[i] = data[i].map(lambda x: str(x)[1:])    # drop the leading "("
        data[i] = data[i].map(lambda x: str(x)[1:-1])  # drop the surrounding quotes
    if i == 5:
        data[i] = data[i].map(lambda x: str(x)[:-1])   # drop the trailing ")"
        data[i] = data[i].astype(int)
    if 0 < i < 5:
        data[i] = data[i].map(lambda x: str(x)[2:-1])  # drop the leading space and quotes
Then get just the list of lists:
alist = data.values.tolist()
Print to check:
print(alist)
[['9b7dad', 'text', 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5],
['2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6]]
#!/usr/bin/env python
import sys
myList = []
for line in sys.stdin:
    elems = line.strip('()\n').replace(' ', '').split(',')
    elems = [x.strip('\'\"') for x in elems]
    myList.append(elems)
print(myList[0][0])
print(myList[1][1])
To use:
$ python ./load.py < someText.txt
9b7dad
text2
Use int(), float(), or str() to coerce fields in elems to certain types, as needed. Use a try..except block to catch malformed input.
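A minimal sketch of that advice, wrapped in a hypothetical `parse_line` function and assuming (as in the sample data) that only the last field should become an int. Like the script above, it removes all spaces, so it would also mangle fields that contain spaces:

```python
def parse_line(line):
    """Parse one "('a', 'b', ..., 5)" line into a list, converting the last field to int."""
    elems = line.strip('()\n').replace(' ', '').split(',')
    elems = [x.strip('\'"') for x in elems]  # remove single or double quotes
    try:
        elems[-1] = int(elems[-1])
    except ValueError:
        raise ValueError(f"malformed line: {line!r}") from None
    return elems

row = parse_line("('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)\n")
print(row)  # ['2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6]
```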
import ast
with open(file_name) as f:
    content = f.readlines()
content = [list(ast.literal_eval(x)) for x in content]
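For a self-contained check of the ast.literal_eval approach, here is a sketch that uses io.StringIO in place of the real file (the two sample lines are the ones from the question). Unlike eval, ast.literal_eval only accepts Python literals, so arbitrary code in the file cannot run, and the 5 and 6 come back as ints:

```python
import ast
import io

# io.StringIO stands in for open(file_name).
f = io.StringIO(
    "('9b7dad', \"text\", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)\n"
    "('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)\n"
)

# Each line is parsed as a tuple literal and converted to a list; blank lines are skipped.
content = [list(ast.literal_eval(line)) for line in f if line.strip()]

print(content[0][0])  # 9b7dad
print(content[1][1])  # text2
```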
How to read files:
In Python, how do I read a file line-by-line into a list?
More about literal_eval:
Convert string representation of list to list
Try this (converting a tuple to a list):
my_list = []
my_list.append(list(('9b7dad', "text", 'http://imgur.com/gallery/SPdGm27', '1', 'A', 5)))
my_list.append(list(('2b6ebj', 'text2', 'https://i.redd.it/lzft358csdi21.jpg', '1', 'B', 6)))
The result is a list of lists, i.e. a two-dimensional list. You can easily modify the code to read one line at a time in a for loop and append it to the list. If the file holds plain comma-separated values rather than tuples, consider using split(','), e.g.:
my_list = []
with open(filename, 'r') as my_file:
    for text in my_file:
        # drop the newline, then split the line into fields
        my_list.append(text.rstrip('\n').split(','))

How to split a section in a list?

I have a list called names; it contains other words too, but this is the result of names[0]:
Chen,David,M,334791530,,11Z,,16770712,,,,,,00015956753,
Chen,Peter,M,321564726,,11B,,19979810,,,,,,00012446698,
Chung,Rowan,M,32355988,,11T,,17890708,,,,,,00012127821,
Chung,Kyle,M,387638355,,10U,,19970317,,,,,,00015604870,
Fan,Mark,M,34217543,,10U,,19707713,,,,,,00015799079,
How do I split names[0] so that it comes out with just the last name, first name, and gender?
Here's the rest of my code:
file = open('CASS.txt', 'r')
f = file.readlines()
file.close()
for line in f:
    if line.find('ICS3M105')>=0:
        names = line.split()
        for name in names[0]:
            if name in range(0,1):
                print(names)
for line in f:
    names = line.split()
    print(names[0].split(',')[0:3])
with open('CASS.txt', 'r') as f:
    for line in f:
        name_last, name_first, gender = line.split(',')[0:3]
Or use the csv module, which may be more reliable for future tasks:
import csv
with open('CASS.txt', 'r') as f:
    for row in csv.reader(f):
        name_last, name_first, gender = row[0:3]
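A self-contained version of the csv approach, with io.StringIO standing in for CASS.txt and two of the sample rows from the question; the `people` list that collects the three fields is just for illustration:

```python
import csv
import io

# io.StringIO stands in for open('CASS.txt'); two sample rows from the question.
f = io.StringIO(
    "Chen,David,M,334791530,,11Z,,16770712,,,,,,00015956753,\n"
    "Fan,Mark,M,34217543,,10U,,19707713,,,,,,00015799079,\n"
)

people = []
for row in csv.reader(f):
    name_last, name_first, gender = row[0:3]  # keep only the first three fields
    people.append((name_last, name_first, gender))

print(people)  # [('Chen', 'David', 'M'), ('Fan', 'Mark', 'M')]
```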
>>> s = """Chen,David,M,334791530,,11Z,,16770712,,,,,,00015956753,
Chen,Peter,M,321564726,,11B,,19979810,,,,,,00012446698,
Chung,Rowan,M,32355988,,11T,,17890708,,,,,,00012127821,
Chung,Kyle,M,387638355,,10U,,19970317,,,,,,00015604870,
Fan,Mark,M,34217543,,10U,,19707713,,,,,,00015799079,"""
You can use a list comprehension to split each line on commas, then slice with [:3] to keep elements [0] to [2] (inclusive) of the split result.
>>> [i.split(',')[:3] for i in s.split('\n')]
[['Chen', 'David', 'M'],
['Chen', 'Peter', 'M'],
['Chung', 'Rowan', 'M'],
['Chung', 'Kyle', 'M'],
['Fan', 'Mark', 'M']]

Removing empty elements from an array in Python

with open("text.txt", 'r') as file:
    for line in file:
        line = line.rstrip('\n' + '').split(':')
        print(line)
I'm having trouble removing the empty lists from the series of lists being generated. I want to turn every line of text.txt into a list, so that I can access each element of each line individually.
The empty lists show up as [''], and as you can see from the rstrip('\n' + '') call, I've tried to strip them out explicitly. The empty elements used to contain newline characters, which were successfully removed using .rstrip('\n').
Edit:
I had mixed up some terminology; the question above is now updated. Essentially, I want to get rid of the empty lists.
Since I can't see your exact lines, it's hard to give you a solution that matches your requirements perfectly, but if you want to get all the elements of a list that are not empty strings, you can do this:
>>> l = ["ch", '', '', 'e', '', 'e', 'se']
>>> [var for var in l if var]
['ch', 'e', 'e', 'se']
You may also use filter with None or bool (in Python 3, filter returns an iterator, so wrap it in list()):
>>> list(filter(None, l))
['ch', 'e', 'e', 'se']
>>> list(filter(bool, l))
['ch', 'e', 'e', 'se']
If you want to get rid of the lists that contain only an empty string, then for your specific example you can do this:
with open("text.txt", 'r') as file:
    for line in file:
        line = line.rstrip('\n').split(':')
        # An empty line splits into [''], so skip it
        if line != ['']:
            print(line)
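Alternatively, you can skip empty lines before splitting, so the [''] lists are never created. A sketch with io.StringIO standing in for text.txt (the sample content is made up):

```python
import io

# io.StringIO stands in for open("text.txt"); the blank lines are what produce [''].
f = io.StringIO("a:b:c\n\nd:e\n\n")

rows = []
for line in f:
    line = line.rstrip('\n')
    if not line:  # an empty line would split into ['']
        continue
    rows.append(line.split(':'))

print(rows)  # [['a', 'b', 'c'], ['d', 'e']]
```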
