I have two lists of lists that look like this:
sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]
I want to save these paired lists column-wise in a text file, so that each element pair is separated by a space and each list pair is separated by an empty line.
The desired output should look like this:
its O
a O
great O
show B_A

nice O
movie B_A

good O
series B_A
I have tried this:
filename = 'data.txt'
with open(filename, 'w') as f:
    for sen in sentences:
        for lab in labels:
            line = sen + ' ' + lab
            f.write(line)
I have the following error:
TypeError: can only concatenate list (not "str") to list
Update: Using the first answer, I am trying to define a function which takes two nested lists and the new file name as follows:
def create_txt(ls1, ls2,file_name):
    with open(file_name, 'w') as f:
        for sen, lab in zip(ls1,ls2):
            for i, j in zip(sen, lab):
                f.write(f'{i} {j}')
            f.write('\n')
    return file_name
But it returns the provided file name as a string:
create_txt(sentences, labels,'data_n.txt')
Output: 'data_n.txt'
What is the logical mistake I've made here?
Thanks in advance!
You can use the csv module for this:
import csv
with open("file.txt", "w", newline="\n") as fp:
    writer = csv.writer(fp, delimiter=" ")
    for sentence, label in zip(sentences, labels):
        for i, j in zip(sentence, label):
            writer.writerow([i, j])
        fp.write('\n')
Without using any additional modules:
with open("file.txt", "w") as fp:
    for sentence, label in zip(sentences, labels):
        for i, j in zip(sentence, label):
            fp.write(f'{i} {j}\n')
        fp.write('\n')
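Regarding the function in the question's update: the call displays 'data_n.txt' simply because the function ends with return file_name, so that string is its return value; the file is still written. Below is a minimal sketch of the same loop wrapped in a function that returns nothing. The name write_pairs and the temporary directory are made up for illustration and are not from the original posts.

```python
import os
import tempfile

def write_pairs(sentences, labels, path):
    """Write one 'token label' pair per line, with a blank line after each group."""
    with open(path, "w") as fp:
        for sentence, label in zip(sentences, labels):
            for token, tag in zip(sentence, label):
                fp.write(f"{token} {tag}\n")
            fp.write("\n")

sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]

# Write into a temporary directory so the example leaves no files behind,
# then read the file back to inspect what was written.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_n.txt")
    write_pairs(sentences, labels, path)
    with open(path) as fp:
        written = fp.read()

print(written)
```

Note that each pair is written with its own trailing newline; without it, all pairs of a sentence would end up on a single line.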
Another working answer, slightly different, with some explanatory comments:
sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]
filename = "data.txt"
outputstring = ""
# Construct the output string with zip.
# First we're zipping the elements of the source lists,
# which gives a sequence of pairs like this:
# (sentences[0], labels[0]), (sentences[1], labels[1]), etc.
# Then we iterate over that sequence and zip up the contents of
# each pair of lists in the same way, and concatenate those strings
# with the outputstring, followed by a single newline character.
# After that, an extra newline is added to break up the groups.
for sentence, label in zip(sentences, labels):
    for i, j in zip(sentence, label):
        outputstring += i + " " + j + "\n"
    outputstring += "\n"
# This removes the extra whitespace at the end.
outputstring = outputstring.rstrip()
# Finally, you can just write the string to your output file.
with open(filename, "w") as f:
    f.write(outputstring)
And here is a second example without using zip:
sentences = [['its', 'a', 'great', 'show'], ['nice', 'movie'], ['good', 'series']]
labels = [['O', 'O', 'O', 'B_A'], ['O', 'B_A'], ['O', 'B_A']]
filename = "data.txt"
outputstring = ""
# Check the length of each list of lists and make sure they're the same:
sentenceslen = len(sentences)
labelslen = len(labels)
if sentenceslen != labelslen:
    print("Malformed data!")
    raise SystemExit
# Iterate over both lists using their lengths to define the range of indices.
for i in range(sentenceslen):
    # Check the lengths of each pair of sublists and make sure they're the same:
    subsentenceslen = len(sentences[i])
    sublabelslen = len(labels[i])
    if subsentenceslen != sublabelslen:
        print("Malformed data!")
        raise SystemExit
    # Iterate over each pair of sublists using their lengths to define the range of indices:
    for j in range(subsentenceslen):
        # Construct the outputstring by using both indices to drill down to the right strings,
        # ending with newline:
        outputstring += sentences[i][j] + " " + labels[i][j] + "\n"
    # Break up groups with newline again:
    outputstring += "\n"
# Remove whitespace at end:
outputstring = outputstring.rstrip()
# Write outputstring to file:
with open(filename, "w") as f:
    f.write(outputstring)
I do not recommend actually using the code in the second example. It is unnecessarily complicated, but I include it for completeness and to illustrate how the use of the zip function above saves effort. The zip function also does not care if you feed it lists of different length, so your script won't crash if you try that but don't check for it; it'll spit out pairs of values up to the length of the smaller list, and ignore values after that for the larger list.
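To illustrate the truncation behaviour described above, here is a small sketch with deliberately mismatched inputs. The sample data and the _MISSING sentinel are made up for the example; itertools.zip_longest is used here only as one possible way to detect a length mismatch without indexing.

```python
from itertools import zip_longest

words = ['its', 'a', 'great', 'show']
tags = ['O', 'O', 'B_A']  # deliberately one label short

# zip stops at the shorter input, so 'show' is silently dropped.
truncated = list(zip(words, tags))
print(truncated)

# zip_longest pads the shorter input instead, which makes the mismatch visible.
_MISSING = object()  # sentinel that cannot collide with real data
padded = list(zip_longest(words, tags, fillvalue=_MISSING))
has_mismatch = any(_MISSING in pair for pair in padded)
print(has_mismatch)
```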
I am trying to compute the total number of lists and total number of elements of a text file. The text file try1.txt consists of a list of lists structure like this:
[[], ['i', 'am', 'a', 'good', 'boy'], ['i', 'am', 'an', 'engineer']]
import ast
global inputList
inputList = []
path = "C:/Users/hp/Desktop/folder/"
def read_data():
    for file in ['try1.txt']:
        with open(path + file, 'r', encoding = 'utf-8') as infile:
            inputList.extend(ast.literal_eval(*infile.readlines()))
    print(len(inputList))
    print(sum(len(x) for x in inputList))
read_data()
The output for the above mentioned input list should be: 3 and 9.
I have tried this, but I get an error when there is an empty list. Is there any way to solve the issue? If not, I would like to display the output with the empty lists removed; in that case the output should be 2 and 9.
If I remove the empty list, I get 2 and 9 as output, but including the empty list causes a problem. The error I am getting:
ValueError: malformed node or string: <_ast.Subscript object at 0x0000020E99CC0088>
The problem is not the empty list! It's the LF at the end of the string.
This code works on Python 3.6:
import ast
v = ast.literal_eval("[[],['i', 'am', 'a', 'good', 'boy'],['i', 'am', 'an', 'engineer']]")
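If the error persists on an older version and upgrading Python is not an option, just remove the empty list before evaluating the expression:
exp = infile.read()
# Count the empty lists so they can be added back to the total.
empty_count = exp.count('[]')
exp = exp.replace('[],','')
# Pass the whole string; unpacking it with * would split it into characters.
inputList.extend(ast.literal_eval(exp))
# Parenthesize the sum so it is computed before % formatting is applied.
print('List Count:%d' % (len(inputList) + empty_count))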
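As a sketch of the counting step on its own, the example below parses an in-memory string standing in for the contents of try1.txt (the variable names are made up). It reads the text as a single string rather than unpacking readlines(), and strips surrounding whitespace before parsing.

```python
import ast

# Stand-in for the contents of try1.txt, including a trailing newline.
text = "[[], ['i', 'am', 'a', 'good', 'boy'], ['i', 'am', 'an', 'engineer']]\n"

# Parse the literal; strip() drops the trailing newline first.
input_list = ast.literal_eval(text.strip())

list_count = len(input_list)                                # all inner lists
element_count = sum(len(inner) for inner in input_list)     # all words
non_empty_count = sum(1 for inner in input_list if inner)   # skip empty lists

print(list_count, element_count, non_empty_count)  # 3 9 2
```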
I have a list of strings and variables. For example:
['oz_', A, 'ab'], where A is a list and I don't want anything to happen to it.
And I want to convert it to:
['o','z','_', A, 'a', 'b']
A is a list, so I don't want anything to change it. How can I do this?
You'll need to iterate over each element and turn it into a list if it's a string, but otherwise leave it as a variable and append it.
source = ['oz_', A, 'ab']
result = []
for name in source:
    if isinstance(name, str):
        result += name
    else:
        result.append(name)
Note: Use isinstance(name, basestring) on Python 2.x if you want to account for other string types such as unicode.
Updated now that we know A shall not be altered.
A = []
seq = ['oz_', A, 'ab']
res = []
for elt in seq:
    if isinstance(elt, str):
        for e in list(elt):
            res.append(e)
    else:
        res.append(elt)
print(res)
output:
['o', 'z', '_', [], 'a', 'b']
Obligatory one-liner:
>>> A = []
>>> seq = ['oz_', A, 'ab']
>>> [value for values in seq
... for value in (values if isinstance(values, str)
... else [values])]
['o', 'z', '_', [], 'a', 'b']
For converting a list of strings into a list of characters, I see two approaches:
Either use a list comprehension, containing literally each char for each of the strings:
>>> lst = ['oz_', 'A', 'ab']
>>> [char for string in lst for char in string]
['o', 'z', '_', 'A', 'a', 'b']
Or join the strings and turn the result into a list:
>>> list(''.join(lst))
['o', 'z', '_', 'A', 'a', 'b']
If A is meant to be a variable and you want to preserve it, things get more tricky. If A is a string, then that's just not possible, as A will get evaluated and is then indistinguishable from the other strings. If it is something else, then you will have to differentiate between the two types:
>>> joined = []
>>> for x in lst:
...     joined += x if isinstance(x, str) else [x] # +x extends, +[x] appends
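To make the distinction concrete, here is a small sketch of the same loop wrapped in a function and run once with A as a list and once with A as a string. The function name split_strings and the sample values of A are made up for illustration.

```python
def split_strings(items):
    """Explode str elements into characters; keep every other element intact."""
    joined = []
    for x in items:
        # A str on the right-hand side extends by character; [x] adds x as one item.
        joined += x if isinstance(x, str) else [x]
    return joined

A = [1, 2]
with_list = split_strings(['oz_', A, 'ab'])
print(with_list)           # ['o', 'z', '_', [1, 2], 'a', 'b']
print(with_list[3] is A)   # True: the same list object, not a copy

A = 'xy'
with_str = split_strings(['oz_', A, 'ab'])
print(with_str)            # ['o', 'z', '_', 'x', 'y', 'a', 'b']
```

In the second call A is split like any other string, because by the time the list is built nothing records that the value came from a variable.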
If all the elements of the list were strings, you could use itertools.chain.from_iterable(). It takes an iterable (a list, tuple, etc.) and yields the elements of each inner iterable in turn (here, the characters of each string); wrapping the result in list() collects them into a new list. Example -
In [4]: from itertools import chain
In [5]: lst = ['oz_', 'A', 'ab']
In [6]: list(chain.from_iterable(lst))
Out[6]: ['o', 'z', '_', 'A', 'a', 'b']
As given in the updated question -
A is a list, so I don't want anything to change it.
You can do this (similar to what @SuperBiasedMan suggests) (for Python 3.x) -
In [14]: lst = ['oz_', 'A', 'ab',[1,2,3]]
In [15]: ret = []
In [18]: for i in lst:
   ....:     if isinstance(i, str):
   ....:         ret.extend(i)
   ....:     else:
   ....:         ret.append(i)
   ....:
In [19]: ret
Out[19]: ['o', 'z', '_', 'A', 'a', 'b', [1, 2, 3]]
You can use basestring in Python 2.x to account for both unicode as well as normal strings.
Please also note that the above method does not check whether a particular object in the list came from a variable or not; it just breaks strings up into characters and keeps all other types as they are.
>>> [a for a in ''.join(['oz_', 'A', 'ab'])]
['o', 'z', '_', 'A', 'a', 'b']
You can use chain.from_iterable either way, you just need to wrap your non strings in a list:
from itertools import chain
out = list(chain.from_iterable([sub] if not isinstance(sub, str) else sub for sub in l))
I am doing a python project for my Intro to CSC class. We are given a .txt file that is basically 200,000 lines of single words. We have to read in the file line by line, and count how many times each letter in the alphabet appears as the first letter of a word. I have the count figured out and stored in a list. But now I need to print it in the format
"a:10,898 b:9,950 c:17,045 d:10,596 e:8,735
f:11,257 .... "
Another aspect is that it has to print 5 of the letter counts per line, as I did above.
This is what I am working with so far...
def main():
    file_name = open('dictionary.txt', 'r').readlines()
    counter = 0
    totals = [0]*26
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    for i in file_name:
        for n in range(0,26):
            if i.startswith(alphabet[n]):
                totals[n] = totals[n]+1
    print(totals)
main()
This code currently outputs
[10898, 9950, 17045, 10675, 7421, 7138, 5998, 6619, 6619, 7128, 1505, 1948, 5393, 10264, 4688, 6079, 15418, 890, 10790, 20542, 9463, 5615, 2924, 3911, 142, 658]
I would highly recommend using a dictionary to store the counts. It will greatly simplify your code, and make it much faster. I'll leave that as an exercise for you since this is clearly homework. (other hint: Counter is even better). In addition, right now your code is only correct for lowercase letters, not uppercase ones. You need to add additional logic to either treat uppercase letters as lowercase ones, or treat them independently. Right now you just ignore them.
Having said that, the following will get it done for your current format:
print(', '.join('{}:{}'.format(letter, count) for letter, count in zip(alphabet, totals)))
zip takes n iterables and yields tuples of n elements, with each element coming from one of the inputs. join concatenates a sequence of strings using the supplied separator. And format does string interpolation, filling in the placeholders in a string with the provided values according to their format specifiers.
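The line above produces a single comma-separated row. A sketch of one way to get closer to the requested layout (thousands separators, five entries per line) follows. The first five counts are taken from the desired output in the question; the remaining values are placeholders, and string.ascii_lowercase is used in place of the hand-written alphabet list.

```python
import string

# First five counts come from the question's example; the rest are placeholders.
totals = [10898, 9950, 17045, 10596, 8735] + [0] * 21

letters = string.ascii_lowercase

# '{:,}' formats an integer with comma thousands separators.
cells = ['{}:{:,}'.format(letter, count) for letter, count in zip(letters, totals)]

# Group the formatted cells five at a time and join each group with spaces.
rows = [' '.join(cells[i:i + 5]) for i in range(0, len(cells), 5)]
output = '\n'.join(rows)
print(output)
```

With 26 letters this gives five full rows and a final row containing only z.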
Python 3.4
The solution is to read each line of the file into the words variable below in a loop and use Counter:
from collections import Counter
import string
words = 'this is a test of functionality'
result = Counter(map(lambda x: x[0], words.split(' ')))
words = 'and this is also very cool'
result = result + Counter(map(lambda x: x[0], words.split(' ')))
counters = ['{letter}:{value}'.format(letter=x, value=result.get(x, 0)) for x in string.ascii_lowercase]
if you print counters:
['a:3', 'b:0', 'c:1', 'd:0', 'e:0', 'f:1', 'g:0', 'h:0', 'i:2', 'j:0', 'k:0', 'l:0', 'm:0', 'n:0', 'o:1', 'p:0', 'q:0', 'r:0', 's:0', 't:3', 'u:0', 'v:1', 'w:0', 'x:0', 'y:0', 'z:0']
I have a file which contains words separated by commas like:
tom,harry,ant,qqqq
aa,ww,rr,gg,aa,hh,ss
I would like to split each element separated by a comma and fill a list like this:
array=['tom','harry','ant','qqqq','aa','ww','rr','gg','aa','hh','ss']
So far I tried with:
array=list()
for i in open(filename):
    element = i.split(',',len(i))
    array.append(element)
When I print I obtain two problems:
for i in array:
    print i
I obtain ['tom','harry','ant','qqqq\n'] and ['qqqq','aa','ww','rr','gg','aa','hh','ss\n']
I would like to avoid the \n and to get a single flat list as shown above.
with open('myFile.txt', 'r') as myFile:
    array = myFile.read().replace('\n', ',').split(',')
for i in array:
    print i
One liner:
with open('myFile.txt', 'r') as myFile: array = myFile.read().replace('\n', ',').split(',')
You should also avoid using names like array, list etc when assigning values. It's bad practice.
If you have any other questions send me a pm!
You can strip the line first to avoid the \n, and use extend instead of append:
for i in open(filename):
    line = i.strip()
    element = line.split(',')
    array.extend(element)
Extend is used to add the elements to your array, instead of adding the array itself. The result would be:
['tom','harry','ant','qqqq','aa','ww','rr','gg','aa','hh','ss']
Instead of:
[['tom','harry','ant','qqqq'], ['aa','ww','rr','gg','aa','hh','ss']]
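The difference between the two methods can be seen side by side in this small sketch, which uses a hard-coded list of lines as a stand-in for the file (the names nested and flat are made up for the example):

```python
# Stand-in for the lines read from the file, newlines included.
lines = ['tom,harry,ant,qqqq\n', 'aa,ww,rr,gg,aa,hh,ss\n']

nested, flat = [], []
for line in lines:
    parts = line.strip().split(',')
    nested.append(parts)  # adds the list itself as a single element
    flat.extend(parts)    # adds each word individually

print(nested)
print(flat)
```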
Since it looks like a comma-separated file, I recommend using the csv module.
import csv
with open('file') as f:
    csv_file = csv.reader(f)
    L = []
    for i in csv_file:
        L.append(i)
print [i for j in L for i in j]
Output:
['tom', 'harry', 'ant', 'qqqq', 'aa', 'ww', 'rr', 'gg', 'aa', 'hh', 'ss']
Iterating over a file yields lines that include the trailing newline. Strip the newline explicitly by replacing the following line:
element = i.split(',',len(i))
with:
element = i.rstrip().split(',',len(i)) # Remove trailing whitespace characters.
or
element = i.rstrip('\r\n').split(',',len(i)) # Remove CR / LF.
You could use a regex:
>>> import re
>>> re.split(r"[,\n]+", open(filename).read())
['tom', 'harry', 'ant', 'qqqq', 'aa', 'ww', 'rr', 'gg', 'aa', 'hh', 'ss']
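One caveat with this pattern, shown as a sketch on an in-memory string standing in for the file contents: if the text ends with a newline, re.split returns an empty string as the last item, so it may be worth filtering empty entries out. The variable names here are made up for the example.

```python
import re

# Stand-in for the file contents, ending with a newline.
text = "tom,harry,ant,qqqq\naa,ww,rr,gg,aa,hh,ss\n"

raw = re.split(r"[,\n]+", text)
print(raw[-1] == '')  # True: the final newline leaves an empty trailing item

# Drop empty strings to get only the words.
words = [w for w in raw if w]
print(words)
```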