Hi, I have a small problem here.
I have a text file with numbers that looks like this:
2.131583
2.058964
6.866568
0.996470
6.424396
0.996004
6.421990
And with
fList = [s.strip() for s in open('out.txt').readlines()]
outStr = ''
for i in fList:
    outStr += (i + ',')
f = open('text_to_csv.csv', 'w')
f.write(outStr.strip())
f.close()
I am able to generate a CSV and all the data is stored in it, but all in one row.
I would like to have them in two columns.
Is there any easy addition that would make the CSV look like this?
2.131583 2.058964
6.866568 0.996470
6.424396 0.996004
A better way would be to use the csv module. You can write it like this:
import csv
with open('text_to_csv.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    for i in range(0, len(fList), 2):
        writer.writerow(fList[i:i+2])
Or, modifying your original approach to start a new line after every second value:
fList = [s.strip() for s in open('out.txt').readlines()]
outStr = ''
count = 0
for i in fList:
    outStr += (i + ',')
    if count % 2 == 1:  # replace 2 with however many columns you need
        outStr += '\r\n'  # use the line ending that is right for your system
    count += 1
f = open('text_to_csv.csv', 'w')
f.write(outStr.strip())
f.close()
Something like this:
with open('out.txt', 'r') as fList, open('text_to_csv.csv', 'w') as f:
    for i, line in enumerate(fList):
        f.write(line.strip())
        f.write('\t' if i % 2 == 0 else '\n')
If you are not interested in keeping the entries from the original file around and just want the output file, you can also do something like this:
fList = [s.strip() for s in open('out.txt').readlines()]
f = open('text_to_csv.csv', 'w')
for i in range(0, len(fList) - 1, 2):  # note: a final unpaired value is skipped
    f.write(fList[i] + "," + fList[i+1] + "\n")
f.close()
If you have a list (from reading the file) in memory, just reformat the list into what you want:
input='''\
2.131583
2.058964
6.866568
0.996470
6.424396
0.996004
6.421990'''
cols=2
data=input.split() # proxy for a file
print data
print '==='
for li in [data[i:i+cols] for i in range(0, len(data), cols)]:
    print li
Prints:
['2.131583', '2.058964', '6.866568', '0.996470', '6.424396', '0.996004', '6.421990']
===
['2.131583', '2.058964']
['6.866568', '0.996470']
['6.424396', '0.996004']
['6.421990']
Or, use an N-at-a-time file reading idiom:
import itertools
cols=2
with open('/tmp/nums.txt') as fin:
    for li in itertools.izip_longest(*[fin]*cols):
        print li
# prints
('2.131583\n', '2.058964\n')
('6.866568\n', '0.996470\n')
('6.424396\n', '0.996004\n')
('6.421990', None)
You can combine this into an iterator-in, iterator-out pattern if you want a kind of file filter:
import itertools
cols=2
with open('/tmp/nums.txt') as fin, open('/tmp/nout.txt', 'w') as fout:
    for li in itertools.izip_longest(*[fin]*cols):
        fout.write('\t'.join(e.strip() for e in li if e) + '\n')
The output file will now be:
2.131583 2.058964
6.866568 0.996470
6.424396 0.996004
6.421990
If you only want to write rows that contain a full set of numbers, i.e., drop the leftover numbers at the end of the file when fewer than cols values remain:
import itertools
cols=2
# last number '6.421990' not included since izip is used instead of izip_longest
with open('/tmp/nums.txt') as fin, open('/tmp/nout.txt', 'w') as fout:
    for li in itertools.izip(*[fin]*cols):
        fout.write('\t'.join(e.strip() for e in li) + '\n')
Then the output file is:
2.131583 2.058964
6.866568 0.996470
6.424396 0.996004
I'm not really sure what you mean, but I think your expected output is:
2.131583,2.058964,
6.866568,0.996470,
6.424396,0.996004,
6.421990
My code for this:
with open('out.txt', 'r') as fif, open('text_to_csv.csv', 'w') as fof:
    fList = ','.join([v.strip() if i % 2 else '\n' + v.strip()
                      for i, v in enumerate(fif.readlines())])[1:]
    fof.write(fList)
Interesting points:
If you want to get rid of a trailing "," at the very end of your file, just concatenate the list via the join() function:
flat_string = ','.join([item1, ...])
To get the leading line break on the odd items of the list, I enumerated it:
for index, value in enumerate([item1, ...])
The odd items are then found via the modulo operator: index % 2.
With an inline if (a conditional expression) you can check this on the fly.
Finally, I drop the redundant line break at the beginning of the string with [1:].
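To see how those pieces fit together, here is a small standalone sketch (using a hypothetical short list in place of the file lines) that walks through the enumerate / conditional expression / join / [1:] steps:
values = ['2.131583', '2.058964', '6.866568', '0.996470']  # stand-in for the stripped file lines
# prefix every odd item (even index) with a line break, then join everything with commas
parts = [v if i % 2 else '\n' + v for i, v in enumerate(values)]
flat_string = ','.join(parts)   # '\n2.131583,2.058964,\n6.866568,0.996470'
print(flat_string[1:])          # drop the redundant leading line break
# 2.131583,2.058964,
# 6.866568,0.996470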
Related
I have a .txt file which looks like:
# Explanatory text
# Explanatory text
# ID_1 ID_2
10310 34426
104510 4582343
1032410 5424233
12410 957422
In the file, the two IDs on the same row are separated by tabs, and the tab character is encoded as '\t'.
I'm trying to do some analysis using the numbers in the dataset, so I want to delete the first three rows. How can this be done in Python? I.e., I'd like to produce a new dataset that looks like:
10310 34426
104510 4582343
1032410 5424233
12410 957422
I've tried the following code but it didn't work:
f = open(filename,'r')
lines = f.readlines()[3:]
f.close()
It doesn't work because I get this format (a list, with \t and \n present), not the one I indicated I want above:
['10310\t34426\n', '104510\t4582343\n', '1032410\t5424233\n' ... ]
You can try something like this:
with open(filename, 'r') as fh:
    for curline in fh:
        # check if the current line
        # starts with "#"
        if curline.startswith("#"):
            ...
            ...
        else:
            ...
            ...
You can use Python's pandas to do this kind of task easily:
import pandas as pd
pd.read_csv(filename, header=None, skiprows=[0, 1, 2], sep='\t')
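The call above returns a DataFrame. If you also want the cleaned, comment-free data written back to disk, a rough sketch could look like this (the column names and the output file name are assumptions, not part of the original question):
import pandas as pd

# read the tab-separated file, skipping the three comment/header rows
df = pd.read_csv(filename, header=None, skiprows=[0, 1, 2], sep='\t',
                 names=['ID_1', 'ID_2'])  # hypothetical column names

# write it back out as a plain tab-separated file, without the pandas index
df.to_csv('ids_clean.txt', sep='\t', index=False, header=False)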
Ok, here is the solution:
with open('file.txt') as f:
    lines = f.readlines()
lines = lines[3:]
Remove Comments
This function removes all comment lines:
def remove_comments(lines):
    return [line for line in lines if not line.startswith("#")]
Remove n lines from the top
def remove_n_lines_from_top(lines, n):
    if n <= len(lines):
        return lines[n:]
    else:
        return lines
Here is the complete source:
with open('file.txt') as f:
    lines = f.readlines()

def remove_comments(lines):
    return [line for line in lines if not line.startswith("#")]

def remove_n_lines_from_top(lines, n):
    return lines[n if n <= len(lines) else 0:]

lines = remove_n_lines_from_top(lines, 3)

f = open("new_file.txt", "w+")  # save to new_file
f.writelines(lines)
f.close()
I tried doing what I can to solve this, but the movie titles just won't move up. The problem is in the 2nd block in the for loop. This is the function I wrote:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        headers = "No., Title\n"
        f.write(headers)
        i = 0
        for title in movie_titles:
            while i < len(movie_titles[0:]): i = i + 1; f.write(str(i) + '\n')
            f.write(', ' + "%s\n" % title.replace(',', '') + '\n')
    f.close()
Another answer has a more straightforward and pythonic method, but for your specific code, this would solve it:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        headers = "No., Title\n"
        f.write(headers)
        i = 0
        for title in movie_titles:
            i = i + 1
            f.write(str(i) + ', ' + "%s\n" % title.replace(',', ''))
Note that the final f.close() is not needed. The with command takes care of that.
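As a quick illustration of that point (the file name here is just an example), the file object is closed as soon as the with block is left:
with open('movies.csv', 'w') as f:
    f.write("No., Title\n")

print(f.closed)  # True -- the with statement already closed the file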
You can use enumerate() in the for loop to get the index. For example:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        f.write("No., Title\n")
        for i, title in enumerate(movie_titles, 1):
            f.write('{},{}\n'.format(i, title.replace(',', '')))
Note: to create CSV files, look at the csv module.
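For completeness, a minimal sketch of the same function written with csv.writer (this sketch assumes Python 3, where the file should be opened with newline=''):
import csv

def writeFile(filename, movie_titles):
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['No.', 'Title'])
        for i, title in enumerate(movie_titles, 1):
            writer.writerow([i, title])  # csv.writer quotes titles that contain commas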
You got your loops mixed up a bit. Your code enters the for loop and iterates over all movies, but during the first iteration it runs the whole while loop, and only after that has finished does the for loop continue.
I would suggest something like this:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        headers = "No., Title\n"
        f.write(headers)
        i = 0
        for i in range(len(movie_titles)):
            f.write(str(i+1) + ',')
            f.write("%s\n" % movie_titles[i].replace(',', ''))
    f.close()
The for loop iterates over all numbers from 0 to the length of the movie list minus 1.
Then the number is written; here you add 1 so that your numbering starts at 1.
After that you write the movie title. I assumed your movie_titles variable is a list, so you can index it with list[index]; in our case that index is i, and its highest value corresponds to the last element of the movie list.
Also, you had too many newlines: you only need one newline per line.
One could probably also write the numbers and the movie names separately, but then you would need to specify which row of the file you are writing to.
I have a big text file with a lot of parts. Every part has 4 lines, and the next part starts immediately after the previous one.
The first line of each part starts with #, the 2nd line is a sequence of characters, the 3rd line is a + and the 4th line is again a sequence of characters.
Small example:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACGCTTATCGATAAAATTTTGAATTTTGTAACTTGTTTTTGTAATTCTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCCCTGCACTGTACCCCCCAATCCCCCCTTTTCTTTTAAAAGTTAACCGATACCGTCGAGATCCGTTCACTAATCGAACGGATCTGTCTCTGTCTCTCTC
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5AEG1EF511F1?GFH3#BFADGD55F?#GFHFGGFCGG/GHGHHHHHHHDBG4E?FB?BGHHHHHHHHHHHHHHHHHFHHHHHHHHHGHGHGHHHHHFHHHHHGGGGHHHHGGGGHHHHHHHGHGHHHHHHFGHCFGGGHGGGGGGGGFGGEGBFGGGGGGGGGFGGGGFFB9/BFFFFFFFFFF/
I want to change the 2nd and the 4th line of each part and make a new file with a similar structure (4 lines for each part). In fact, I want to keep the first 65 characters of lines 2 and 4 and remove the rest. The expected output for the small example would look like this:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACG
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5A
I wrote the following code:
infile = open("file.fastq", "r")
new_line=[]
for line_number in len(infile.readlines()):
if line_number ==2 or line_number ==4:
new_line.append(infile[line_number])
with open('out_file.fastq', 'w') as f:
for item in new_line:
f.write("%s\n" % item)
but it does not produce what I want. How can I fix it to get the expected output?
This code will achieve what you want -
from itertools import islice

with open('bio.txt', 'r') as infile:
    while True:
        lines_gen = list(islice(infile, 4))
        if not lines_gen:
            break
        a, b, c, d = lines_gen
        b = b[0:65] + '\n'
        d = d[0:65] + '\n'
        with open('mod_bio.txt', 'a+') as f:
            f.write(a + b + c + d)
How does it work?
We first use islice to take 4 lines at a time, as you described.
Then we unpack those lines into the individual variables a, b, c, d and perform string slicing. Finally we join the strings and write them to a new file.
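If the islice idiom is unfamiliar, here is a tiny standalone demonstration (with toy data instead of the real file) of how it pulls four lines at a time from any iterator:
from itertools import islice

lines = iter(['@header1\n', 'SEQUENCE1\n', '+\n', 'QUALITY1\n',
              '@header2\n', 'SEQUENCE2\n', '+\n', 'QUALITY2\n'])

while True:
    record = list(islice(lines, 4))  # take the next 4 lines
    if not record:
        break                        # the iterator is exhausted
    print(record)
# ['@header1\n', 'SEQUENCE1\n', '+\n', 'QUALITY1\n']
# ['@header2\n', 'SEQUENCE2\n', '+\n', 'QUALITY2\n']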
I think some itertools.cycle could be nice here:
import itertools

with open("transformed.file.fastq", "w+") as output_file:
    with open("file.fastq", "r") as input_file:
        for i in itertools.cycle((1, 2, 3, 4)):
            line = input_file.readline().strip()
            if not line:
                break
            if i in (2, 4):
                line = line[:65]
            output_file.write("{}\n".format(line))
readlines() returns a list of the lines in your file. You don't need to prepare a new_line list; iterate directly over the index-value pairs of the list, and you can modify the values at the positions you want.
Modifying your code, try this:
infile = open("file.fastq", "r")
new_lines = infile.readlines()
for i, t in enumerate(new_lines):
if i == 1 or i == 3:
new_lines[i] = new_lines[i][:65]
with open('out_file.fastq', 'w') as f:
for item in new_lines:
f.write("%s" % item)
I have a folder with multiple ASCII-encoded txt files. I would like to open all of them, read all the lines, and write them back, removing any whitespace and changing/deleting the first digit of the 4th item in the list, all at the same time.
One file's content looks like this, as a list:
[' 0.200000\n', ' 0.000000\n', ' 0.000000\n', ' -0.200000\n', ' 3400000.100000\n', ' 5867999.900000\n']
At the end it should look like this:
['0.200000\n', '0.000000\n', '0.000000\n', '-0.200000\n', '400000.100000\n', '5867999.900000\n']
Without whitespace and without the first digit in the 4th item.
My code so far:
import glob, fileinput, os, shutil, string, tempfile, linecache, sys

pfad = "D:\\Test\\"
filelist = glob.glob(pfad + "*.tfw")
if not filelist:
    print "none tfw-file found"
    sys.exit("nothing to convert")
for fileName in fileinput.input(filelist, inplace=True):
    data_list = [''.join(s.split()) for s in data_list]
    data_list[4] = data_list[4][1:]
    print(data_list)
    sys.stdout.write(data_list)
I have managed to modify the files at the same time, but I still can't overwrite them with the new content. I receive the following error:
"data_list = [''.join(s.split()) for s in data_list]
NameError: name 'data_list' is not defined"
You want to str.lstrip the leading whitespace:
for fileName in filelist:
    with open(fileName, "r") as f:
        lines = [line.lstrip() for line in f]
        lines[4] = lines[4][1:]
Using with will close your files automatically; also note that ' 3400000.100000\n' is the fifth object in the list, not the fourth.
I have no idea what you are actually trying to do after you extract the lines, since you don't store the data anywhere as you iterate; you just rebind the names to new values on each iteration. If you want to write the data to a file, write as you iterate, using file.writelines on the list:
for fileName in filelist:
    with open(fileName, "r") as f, open("{}_new".format(fileName), "w") as out:
        lines = [line.lstrip() for line in f]
        lines[4] = lines[4][1:]
        out.writelines(lines)
If you want to replace the original file, use either approach from this answer:
from tempfile import NamedTemporaryFile
from shutil import move
import os

for fileName in filelist:
    with open(fileName) as f, NamedTemporaryFile("w", dir=".", delete=False) as temp:
        for ind, line in enumerate(f):
            if ind == 4:
                temp.write(line.lstrip()[1:])
            else:
                temp.write(line.lstrip())
    move(temp.name, fileName)
Actually, a list object is indexed. In your code, the first character of the 4th element (if we start counting at zero) is at data_list[4][0].
Using slicing, data_list[4][1:] will remove the first character of the 4th element.
Sample script (you can test it in an interactive session):
>>> # original list
>>> lst = [' 0.200000\n', ' 0.000000\n', ' 0.000000\n', ' -0.200000\n', ' 3400000.100000\n', ' 5867999.900000\n']
>>>
>>> # removes leading whitespaces from each string of the list
>>> lst = [ s.lstrip() for s in lst ]
>>>
>>> # removes the first character of the 4th string of the list
>>> lst[4] = lst[4][1:]
>>>
>>> # prints the modified list
>>> print(lst)
['0.200000\n', '0.000000\n', '0.000000\n', '-0.200000\n', '400000.100000\n', '5867999.900000\n']
Overwriting the file with the modified list:
Way 1: Closing and reopening in write mode:
for fileName in filelist:
    # open in read mode
    with open(fileName, 'r') as data_file:
        data_list = data_file.readlines()
    # list modification
    data_list = [s.lstrip() for s in data_list]
    data_list[4] = data_list[4][1:]
    # reopen the file in write mode, which deletes its contents
    with open(fileName, 'w') as data_file:
        # overwriting
        for line in data_list:
            data_file.write(line)
Way 2: Using file.truncate() so that the file won't be closed and reopened:
for fileName in filelist:
    # open in read/write mode
    with open(fileName, 'r+') as data_file:
        data_list = data_file.readlines()
        # list modification
        data_list = [s.lstrip() for s in data_list]
        data_list[4] = data_list[4][1:]
        # removes the file contents from the first character to the end
        data_file.truncate(0)
        # puts the cursor back at the start of the file
        data_file.seek(0)
        # overwriting
        for line in data_list:
            data_file.write(line)
This does what you want:
import io

# stand-in for one of the files
file0 = io.StringIO(''' 0.200000
 0.000000
 0.000000
 -0.200000
 3400000.100000
 5867999.900000
''')

def read_data(fle):
    out_str = ''
    for (i, line) in enumerate(fle.readlines()):
        if i != 4:
            out_str += '{}\n'.format(line.strip())
        else:
            out_str += '{}\n'.format(line.strip()[1:])
    return out_str

print(read_data(file0))
I am not entirely sure what you mean by "indexing characters". In Python, strings behave like lists of characters: you can address an individual character with string[5] or take a slice with string[5:-1]. Does that answer your question?
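A couple of concrete lines to illustrate that, using one of the values from the question:
s = ' 3400000.100000'
print(s[0])            # ' '  -- a single character picked by index
print(s.lstrip())      # '3400000.100000' -- leading whitespace removed
print(s.lstrip()[1:])  # '400000.100000'  -- slice that also drops the first digit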
I would like to write some code that reads a text file and tells me how many "." (full stops) it contains.
I have something like this, but I don't know what to do next:
f = open( "mustang.txt", "r" )
a = []
for line in f:
with open('mustang.txt') as f:
    s = sum(line.count(".") for line in f)
Assuming there is absolutely no danger of your file being so large it will cause your computer to run out of memory (for instance, in a production environment where users can select arbitrary files, you may not wish to use this method):
f = open("mustang.txt", "r")
count = f.read().count('.')
f.close()
print count
More properly:
with open("mustang.txt", "r") as f:
count = f.read().count('.')
print count
I'd do it like so:
with open('mustang.txt', 'r') as handle:
    count = handle.read().count('.')
If you don't want to load the whole file into memory at once, you can count the dots line by line:
with open('mustang.txt') as f:
    fullstops = 0
    for line in f:
        fullstops += line.count('.')
This will work:
with open('mustangused.txt') as inf:
    count = 0
    for line in inf:
        count += line.count('.')
print 'found %d periods in file.' % count
Even with a regular expression:
import re
with open('filename.txt', 'r') as f:
    c = re.findall(r'\.', f.read())  # one match per '.' character
    if c: print len(c)