Splitting up a text file into sets of n characters - python

I have a long text file (4392 characters) containing a bunch of numbers, and I want to reformat it so that every 12 characters are on their own line. My strategy was to add the contents of the infile to a list, slice off 12 characters at a time in a while loop, append each slice to a new list, and write that list to an outfile. I am getting an error on out.writelines(l):
TypeError: writelines() argument must be a sequence of strings.
Here is my code:
l = []
outl=[]
with open('r6.txt', 'r') as f, \
        open('out.txt', 'w') as out:
    outl.append(f)
    a = 0
    b = 11
    while b <= 4392:
        l.append(outl[a:b])
        l.append('/n')
        out.writelines(l)
        a+=12
        b+=12
        l=[]

Well, you're appending the file object itself to the list, and then you're taking slices of that list and writing them. Each slice is therefore a list rather than a string, which is what writelines() complains about.
Just print outl to check. If you've got a file object among the items in the list, then you know :)
OR better yet:
with open('r6.txt', 'r') as f, \
        open('out.txt', 'w') as out:
    # read the whole file as one string and drop the existing line breaks
    text = f.read().replace('\n', '')
    a = 0
    while a < len(text):
        # write the next 12 characters followed by a newline
        out.write(text[a:a + 12] + '\n')
        a += 12

Hm, although other answers seem to be correct, I still think that the final solution can be, well, faster:
with open('r6.txt', 'r') as f, \
        open('out.txt', 'w') as out:
    # call anonymous lambda function returning f.read(12) until output is '', put output to part
    for part in iter(lambda: f.read(12), ''):
        # write this part and newline character
        out.write(part)
        out.write('\n')
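To see what the two-argument form of iter() does here, a minimal sketch follows. It uses io.StringIO as a stand-in for the opened file, and the helper name chunks is made up for illustration; it is not part of the answer above.

```python
import io

def chunks(stream, size):
    """Return an iterator over pieces of at most `size` characters from a text stream."""
    # iter(callable, sentinel) keeps calling the callable until it returns the sentinel;
    # read() returns '' once the end of the stream is reached.
    return iter(lambda: stream.read(size), '')

# io.StringIO stands in for the opened input file
demo = io.StringIO('abcdefghijklmnopqrstuvwxyz')
parts = list(chunks(demo, 12))
print(parts)  # ['abcdefghijkl', 'mnopqrstuvwx', 'yz']
```

Note that read(12) also counts any newline characters already present in the input, so this only gives clean 12-character lines if the input has no line breaks of its own.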

Vlad-ardelean is correct in saying you need to append f.readlines() to outl instead of the file f.
Also, you're calling writelines() once per chunk, but writelines() is intended for writing out a whole list of strings at once, not one-item lists. Perhaps a better way to approach the insertion of newline characters would be:
with open('r6.txt', 'r') as f, \
        open('out.txt', 'w') as out:
    # gets entire file as one string and removes line breaks
    outl = f.read().replace('\n', '')
    # range() works in Python 3; on Python 2, xrange() avoids building a list
    l = [outl[each:each+12] + '\n' for each in range(0, len(outl), 12)]
    out.writelines(l)
Sample input for r6:
abcdefeounv lernbtlttb
berolinervio
bnrtopimrtynprymnpobm,t
2497839085gh
b640h846j048nm5gh0m8-9
2g395gm4-59m46bn
2vb-9mb5-9046m-b946m-b946mb-96m-05n=570n;rlgbm'dfb
output:
abcdefeounv
lernbtlttbbe
rolinerviobn
rtopimrtynpr
ymnpobm,t249
7839085ghb64
0h846j048nm5
gh0m8-92g395
gm4-59m46bn2
vb-9mb5-9046
m-b946m-b946
mb-96m-05n=5
70n;rlgbm'df
b

Related

Combine two wordlist in one file Python

I have two wordlists, as per the examples below:
wordlist1.txt
aa
bb
cc
wordlist2.txt
11
22
33
I want to take every line from wordlist2.txt and put it after each line in wordlist1.txt and combine them in wordlist3.txt like this:
aa
11
bb
22
cc
33
.
.
Can you please help me with how to do it? Thanks!
Always try to include what you have tried.
However, this is a great place to start.
def read_file_to_list(filename):
    with open(filename) as file:
        lines = file.readlines()
        lines = [line.rstrip() for line in lines]
    return lines

wordlist1= read_file_to_list("wordlist1.txt")
wordlist2= read_file_to_list("wordlist2.txt")
with open("wordlist3.txt",'w',encoding = 'utf-8') as f:
    for x,y in zip(wordlist1,wordlist2):
        f.write(x+"\n")
        f.write(y+"\n")
Check the following question for more ideas and understanding: How to read a file line-by-line into a list?
Cheers
Open wordlist1.txt and wordlist2.txt for reading and wordlist3.txt for writing. Then it's as simple as:
with open('wordlist3.txt', 'w') as w3, open('wordlist1.txt') as w1, open('wordlist2.txt') as w2:
    for l1, l2 in zip(map(str.rstrip, w1), map(str.rstrip, w2)):
        print(f'{l1}\n{l2}', file=w3)
Instead of using .splitlines(), you can also iterate over the files directly. Here's the code:
wordlist1 = open("wordlist1.txt", "r")
wordlist2 = open("wordlist2.txt", "r")
wordlist3 = open("wordlist3.txt", "w")
for txt1,txt2 in zip(wordlist1, wordlist2):
    if not txt1.endswith("\n"):
        txt1+="\n"
    wordlist3.write(txt1)
    wordlist3.write(txt2)
wordlist1.close()
wordlist2.close()
wordlist3.close()
In the first block, we open the files. For the first two, we use "r", which stands for read, as we don't want to change anything in those files. We could omit it, as "r" is the default mode of the open function. For the third one, we use "w", which stands for write. If the file doesn't exist yet, it will be created.
Next, we use the zip function in the for loop. It creates an iterator of tuples built from all the iterables provided as arguments. In this loop, each tuple contains one line of wordlist1.txt and one line of wordlist2.txt. These tuples are directly unpacked into the variables txt1 and txt2.
Next, we use an if statement to check whether the line from wordlist1.txt ends with a newline. The last line might not, so this needs to be checked. We don't check the line from wordlist2.txt: it doesn't matter if its last line has no newline, because that line also ends up at the end of the resulting file.
Next, we write the text to wordlist3.txt. Each write() call adds its text after whatever was written before it. Note, however, that opening the file with "w" discards whatever the file contained beforehand.
Finally, we close the files. This is important, as otherwise buffered output might not be flushed to disk, and on some platforms other applications may be unable to use the file in the meantime.
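The zip-and-newline logic described above can be tried without touching the disk. This is only a sketch: io.StringIO objects stand in for the opened wordlist files, and the function name interleave_lines is invented for the example.

```python
import io

def interleave_lines(first, second):
    """Yield lines alternately from two iterables of lines, each ending in a newline."""
    # zip() stops as soon as the shorter input runs out
    for a, b in zip(first, second):
        for line in (a, b):
            # the last line of a file may lack its trailing newline
            yield line if line.endswith('\n') else line + '\n'

# io.StringIO objects stand in for wordlist1.txt and wordlist2.txt
w1 = io.StringIO('aa\nbb\ncc')
w2 = io.StringIO('11\n22\n33')
result = ''.join(interleave_lines(w1, w2))
print(result, end='')
```

With real files, wrapping the open() calls in a with statement closes them automatically, so the explicit close() calls are not needed.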
Try this:
with open('wordlist1.txt', 'r') as f1:
    f1_list = f1.read().splitlines()
with open('wordlist2.txt', 'r') as f2:
    f2_list = f2.read().splitlines()
# zip the lists of lines, not the (already closed) file objects
f3_list = [x for t in zip(f1_list, f2_list) for x in t]
with open('wordlist3.txt', 'w') as f3:
    f3.write("\n".join(f3_list))
with open('wordlist1.txt') as w1,\
        open('wordlist2.txt') as w2,\
        open('wordlist3.txt', 'w') as w3:
    for wordlist1, wordlist2 in zip(w1.readlines(), w2.readlines()):
        if wordlist1[-1] != '\n':
            wordlist1 += '\n'
        if wordlist2[-1] != '\n':
            wordlist2 += '\n'
        w3.write(wordlist1)
        w3.write(wordlist2)
Here you go :)
with open('wordlist1.txt', 'r') as f:
    file1 = f.readlines()
with open('wordlist2.txt', 'r') as f:
    file2 = f.readlines()
with open('wordlist3.txt', 'w') as f:
    for x in range(len(file1)):
        if not file1[x].endswith('\n'):
            file1[x] += '\n'
        f.write(file1[x])
        if not file2[x].endswith('\n'):
            file2[x] += '\n'
        f.write(file2[x])
Open wordlists 1 and 2 and pair up their lines, join each pair with a newline character, then join all the pairs together, again separated by newlines.
# paths (file names taken from the question)
wordlist1 = 'wordlist1.txt'
wordlist2 = 'wordlist2.txt'
wordlist3 = 'wordlist3.txt'
with open(wordlist1, 'r') as fd1, open(wordlist2, 'r') as fd2:
    out = '\n'.join(f'{l1}\n{l2}' for l1, l2 in zip(fd1.read().split(), fd2.read().split()))
with open(wordlist3, 'w') as fd:
    fd.write(out)

Counting specific characters in a file (Python)

I'd like to count specific things from a file, i.e. how many times "--undefined--" appears. Here is a piece of the file's content:
"jo:ns 76.434
pRE 75.417
zi: 75.178
dEnt --undefined--
ba --undefined--
I tried to use something like this. But it won't work:
with open("v3.txt", 'r') as infile:
    data = infile.readlines().decode("UTF-8")
    count = 0
    for i in data:
        if i.endswith("--undefined--"):
            count += 1
    print count
Do I have to implement, say, a dictionary of tuples to tackle this, or is there an easier solution?
EDIT:
The word in question appears only once in a line.
You can read all the data into one string, split the string into a list, and count occurrences of the substring in that list.
with open('afile.txt', 'r') as myfile:
    data = myfile.read().replace('\n', ' ')
data.split(' ').count("--undefined--")
or directly from the string:
data.count("--undefined--")
readlines() returns the list of lines, but they are not stripped (ie. they contain the newline character).
Either strip them first:
data = [line.strip() for line in data]
or check for --undefined--\n:
if line.endswith("--undefined--\n"):
Alternatively, consider string's .count() method:
file_contents.count("--undefined--")
Or don't limit yourself to .endswith(), use the in operator.
data = ''
count = 0
with open('v3.txt', 'r') as infile:
    data = infile.readlines()
    print(data)
    for line in data:
        if '--undefined--' in line:
            count += 1
count
When reading a file line by line, each line ends with the newline character:
>>> with open("blookcore/models.py") as f:
...     lines = f.readlines()
...
>>> lines[0]
'# -*- coding: utf-8 -*-\n'
>>>
so your endswith() test just can't work - you have to strip the line first:
if i.strip().endswith("--undefined--"):
    count += 1
Now, reading a whole file into memory is more often than not a bad idea: even if the file fits in memory, it still eats resources for no good reason. Python's file objects are iterable, so you can just loop over your file. And finally, you can specify which encoding should be used when opening the file (instead of decoding manually) using the codecs module (Python 2) or directly (Python 3):
# py3
with open("your/file.text", encoding="utf-8") as f:
    ...  # work with f here
# py2:
import codecs
with codecs.open("your/file.text", encoding="utf-8") as f:
    ...  # work with f here
then just use the builtin sum and a generator expression:
result = sum(line.strip().endswith("whatever") for line in f)
this relies on the fact that booleans are integers with values 0 (False) and 1 (True).
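As a small illustration of summing booleans, here is a sketch that uses io.StringIO in place of the opened file, with sample lines taken from the question:

```python
import io

# io.StringIO stands in for the opened file; the lines come from the question's sample
f = io.StringIO('jo:ns 76.434\npRE 75.417\ndEnt --undefined--\nba --undefined--\n')

# each endswith() result is a bool; sum() adds True as 1 and False as 0
result = sum(line.strip().endswith('--undefined--') for line in f)
print(result)  # 2
```

Because the generator expression consumes one line at a time, the whole file never has to be held in memory.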
Quoting Raymond Hettinger, "There must be a better way":
from collections import Counter
counter = Counter()
words = ('--undefined--', 'otherword', 'onemore')
with open("v3.txt", 'r') as f:
    lines = f.readlines()
    for line in lines:
        for word in words:
            if word in line:
                counter.update((word,)) # note the single element tuple
print(counter)

python delete specific line and re-assign the line number

I would like to delete a specific line and re-assign the line numbers,
e.g.:
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
What I want: if line 1 is the line that needs to be deleted, then
the output should be:
0,abc,def
1,mno,pqr
2,stu,vwx
What I have done so far:
f = open(file, 'r')
lines = f.readlines()
f.close()
f = open(file, 'w')
for line in lines:
    if line.rsplit(',')[0] != 'line#':
        f.write(line)
f.close()
The lines above can delete a specific line#, but I don't know how to rewrite the line number before the first ','.
Here is a function that will do the job.
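def removeLine(n, file):
    with open(file, "r+") as f:
        d = f.readlines()
        f.seek(0)
        for i in range(len(d)):
            if i > n:
                # renumber: replace only the leading number (first occurrence)
                f.write(d[i].replace(d[i].split(",")[0], str(i - 1), 1))
            elif i != n:
                f.write(d[i])
        # cut off what is left over from the old, longer content
        f.truncate()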
Where the parameters n and file are the line you wish to delete and the filepath respectively.
This is assuming the line numbers are written in the line as implied by your example input.
If the line number is not included at the beginning of each line, as some other answers have assumed, simply remove the first if branch (and turn the elif into a plain if):
if i > n:
    f.write(d[i].replace(d[i].split(",")[0],str(i -1)))
I noticed that your account wasn't created in the past few hours, so I figure that there's no harm in giving you the benefit of the doubt. You will really have more fun on StackOverflow if you spend the time to learn its culture.
I wrote a solution that fits your question's criteria on a file that's already written (you mentioned that you're opening a text file), so I assume it's a CSV.
I figured that I'd answer your question differently than the other solutions that implement the CSV reader library and use a temporary file.
import re
# raw string; matches the leading line number and its comma
numline_csv = re.compile(r"^\d+,")
# substitute your actual file opening here
so_31195910 = """
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
"""
so = so_31195910.splitlines()
# this could be an input or whatever you need
delete_line = 1
line_bank = []
for l in so:
    if l and not l.startswith(str(delete_line)+','):
        print(l)
        # split off the old number only; keep the rest of the line intact
        l = numline_csv.split(l, maxsplit=1)
        line_bank.append(l[1])
so = []
for i,l in enumerate(line_bank):
    so.append("%s,%s" % (i,l))
And the output:
>>> so
['0,abc,def', '1,mno,pqr', '2,stu,vwx']
In order to get a line number for each line, you should use the built-in enumerate function...
for line_index, line in enumerate(lines):
    # line_index is 0 for the first line, 1 for the 2nd line, etc.
    ...
In order to separate the first element of the string from the rest of the string, I suggest passing a value for maxsplit to the split method.
>>> '0,abc,def'.split(',')
['0', 'abc', 'def']
>>> '0,abc,def'.split(',',1)
['0', 'abc,def']
>>>
Once you have those two, it's just a matter of concatenating line_index to split(',',1)[1].
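Putting the two pieces together might look like the sketch below. The function name renumber and its parameters are made up for this example, and it works on a list of lines rather than on a file.

```python
def renumber(lines, delete_index):
    """Drop the line at delete_index and rewrite the leading number of the rest."""
    kept = [line for i, line in enumerate(lines) if i != delete_index]
    # split(',', 1) separates the old number from the rest of the line
    return ['%d,%s' % (new_index, line.split(',', 1)[1])
            for new_index, line in enumerate(kept)]

lines = ['0,abc,def', '1,ghi,jkl', '2,mno,pqr', '3,stu,vwx']
renumbered = renumber(lines, 1)
print(renumbered)  # ['0,abc,def', '1,mno,pqr', '2,stu,vwx']
```

If the lines come straight from readlines(), their trailing newlines end up in the second part of the split and are preserved.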

writing a list to a txt file in python

list consists of RANDOM strings inside it
#example
list = [1,2,3,4]
filename = ('output.txt')
outfile = open(filename, 'w')
outfile.writelines(list)
outfile.close()
my result in the file
1234
so now how do I make the program produce the result that I want which is:
1
2
3
4
myList = [1,2,3,4]
with open('path/to/output', 'w') as outfile:
    outfile.write('\n'.join(str(i) for i in myList))
By the way, the list that you have in your post contains ints, not strings.
Also, please NEVER name your variables list or dict or any other built-in type name, for that matter, since doing so shadows the built-in.
writelines() needs a sequence of strings, each with its own line separator already appended (it adds none itself), but your code is only giving it a list of integers. To make it work you'd need to use something like this:
some_list = [1,2,3,4]
filename = 'output.txt'
outfile = open(filename, 'w')
outfile.writelines([str(i)+'\n' for i in some_list])
outfile.close()
In Python file objects are context managers which means they can be used with a with statement so you could do the same thing a little more succinctly with the following which will close the file automatically. It also uses a generator expression (enclosed in parentheses instead of brackets) to do the string conversion since doing that avoids the need to build a temporary list just to pass to the function.
with open(filename, 'w') as outfile:
    outfile.writelines((str(i)+'\n' for i in some_list))
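To check what writelines() actually produces, here is a small sketch that writes into an io.StringIO object instead of a real file, so the result can be inspected directly:

```python
import io

some_list = [1, 2, 3, 4]

# io.StringIO stands in for the output file so the result can be inspected
outfile = io.StringIO()
# writelines() adds no separators itself, so each string carries its own '\n'
outfile.writelines(str(i) + '\n' for i in some_list)
written = outfile.getvalue()
print(repr(written))  # '1\n2\n3\n4\n'
```

When a generator expression is the only argument of a call, the extra pair of parentheses around it is optional.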

python line separated values in a text when converted to list, adds "\n" to the elements in the list

I was astonished that a thing this simple has been troubling me. Below is the code
list = []
f = open("log.txt", "rb") # log.txt file has line separated values,
for i in f.readlines():
    for value in i.split(" "):
        list.append(value)
print list
The output is
['xx00', '\n', 'xx01in', '\n', 'xx01na', '\n', 'xx01oz', '\n', 'xx01uk', '\n']
How can I get rid of the new line i.e. '\n'?
list = []
f = open("log.txt", "rb") # log.txt file has line separated values,
for i in f.readlines():
    for value in i.strip().split(" "):
        list.append(value)
print list
.strip() removes leading and trailing whitespace, including the trailing newline. To be explicit, you can use .strip('\n') or, in some cases, .strip('\r\n').
you can read more about .strip() here
edit
better way to do what you wanted:
with open("log.txt", 'rb') as f:
    mylist = [val for subl in [l.split(' ') for l in f.read().splitlines()] for val in subl]
For an answer which is much easier on the eyes, you can import itertools and use chain to flatten the list of lists, like @Jon Clements' example
so it would look like this:
from itertools import chain
with open("log.txt", 'rb') as f:
    mylist = list(chain.from_iterable(l.split(' ') for l in f.read().splitlines()))
If line-separated means that there is only one value per line, you don't need split() at all:
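with open('log.txt') as f:
    mylist = list(map(str.strip, f))
In Python 3, map() is lazy, so it is wrapped in list() inside the with block; the file is also opened in text mode, because str.strip does not accept the bytes objects that 'rb' mode yields there. In Python 2, map() already returns a list.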
with open("log.txt", "rb") as f:
    mylist = f.read().splitlines()
Also, don't use list as a variable name, as it shadows Python's built-in list type.
The correct way to do this is:
with open('log.txt') as fin:
    for line in fin:
        print(line.split())
By using split() without an argument, the '\n' characters stop being a problem, because split() and split(None) follow different rules: they split on any run of whitespace and discard empty strings.
Or, more concisely:
from itertools import chain
with open('log.txt') as fin:
    mylist = list(chain.from_iterable(line.split() for line in fin))
If you have a bunch of lines with space separated values, and you just want a list of all the values without caring about where the line breaks were (which appears to be the case from your example, since you're always appending to the same list regardless of what line you're on), then don't bother looping over lines. Just read the whole file as a single string and call split() with no arguments; it will split the string on any sequence of one or more whitespace characters, including both spaces and newlines, with the result that none of the values will contain any whitespace:
with open('log.txt', 'rb') as f:
    values = f.read().split()
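The difference between split(' ') and split() is easy to see on a literal string. The sketch below uses made-up sample text modelled on the values in the question (in text mode, i.e. str rather than bytes):

```python
# sample text modelled on the question: one value per line, each followed by a space
text = 'xx00 \nxx01in \nxx01na \n'

# splitting on a single space leaves the newline characters in the results
with_newlines = text.split(' ')
print(with_newlines)

# split() with no argument splits on any run of whitespace and drops empty strings
values = text.split()
print(values)  # ['xx00', 'xx01in', 'xx01na']
```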
