Print lines in a file until a blank line - python

I have a file "testread.txt" with the data below: each section starts with a header line (A, BA, ...) and sections are separated by blank lines.
A
1
2
3
4

BA
5
6
7
8

CB
9
10
11

D
12
13
14
15
I want to read and extract the data section by section and write each section to a different file. E.g.:
1
2
3
4
Write it to File "a.txt"
5
6
7
8
Write it to File "b.txt"
9
10
11
Write it to File "c.txt"
and so on...

A (rough) solution can be obtained using:
collections.defaultdict to divide and store the items;
numpy.savetxt to save them to files.
import numpy as np
from collections import defaultdict

with open('testread.txt', 'r') as f:
    content = f.readlines()

# group the lines into sections, bumping the key at every blank line
d = defaultdict(list)
i = 0
for line in content:
    if line == '\n':
        i += 1
    else:
        d[i].append(line.strip())

# write each section to its own file, dropping the header line (A, BA, ...)
for k, v in d.items():
    np.savetxt('file{}.txt'.format(k), v[1:], delimiter=",", fmt='%s')
and you get:
file0.txt:
1
2
3
4
file1.txt:
5
6
7
8
file2.txt:
9
10
11
file3.txt:
12
13
14
15

The idea is to move on to the next output file whenever an empty line is encountered. The code below should do the trick.
files_list = ['a.txt', 'b.txt', 'c.txt']
fpr = open('testread.txt')
for f in files_list:
    with open(f, 'w') as fpw:
        for i, line in enumerate(fpr):
            if i == 0:  # skip the section's header line (A, BA, ...)
                continue
            if line.strip():
                fpw.write(line)
            else:  # blank line: section finished, move on to the next file
                break
fpr.close()
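For comparison, here is a minimal sketch of the same split using itertools.groupby, assuming the same blank-line-separated layout; the output names section0.txt, section1.txt, ... are made up for illustration:

from itertools import groupby

with open('testread.txt') as f:
    # group consecutive lines by whether they are blank, keep only the non-blank runs
    sections = (list(g) for blank, g in groupby(f, key=lambda l: not l.strip()) if not blank)
    for n, section in enumerate(sections):
        with open('section{}.txt'.format(n), 'w') as out:
            out.writelines(section[1:])  # drop the header line (A, BA, ...)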


Append data from a second file to the first file, line by line

My question looks exactly like this post: Append float data at the end of each line in a text file. But for me it is different. I have a .dat file containing over 500 lines.
For each line, I want to append the value of the corresponding line of a second file. That second file contains a single column of 0 and 1 values.
What I have:
File 1:        File 2:
1 2 3 4        0
1 2 3 4        1
1 2 3 4        0
What I want:
File 1:        File 2:
1 2 3 4 0      0
1 2 3 4 1      1
1 2 3 4 0      0
What I've already tried:

import re
import numpy as np

Y = np.loadtxt('breastcancerY')

def get_number(_):
    lines = []
    for line in Y:
        print('this is a line', line)
    return " " + str(line) + '\n'

with open("breastcancerX","r") as f:
    data = f.read()

out = re.sub('\n', get_number, data)

with open("output.txt","w") as f:
    f.write(out)
When I do that, all the appended values are 0; they don't correspond to my file. (The problem: get_number ignores the match it is given, loops over the whole of Y, and always returns the last element, so re.sub substitutes the same value on every line.)
EDIT 1:
Using this code:

# first read the two files into lists of lines
with open("breastcancerY","r") as f:
    dataY = f.readlines()
with open("breastcancerX","r") as f:
    dataX = f.readlines()
# then combine lines from the two files into one line
with open("output.dat","w") as f:
    for X, Y in zip(dataX, dataY):
        f.write(f"{X} {Y}")

the rows come out broken: X keeps its trailing newline, so the Y value lands on its own line.
# I don't understand what you want to do with this part
Y = np.loadtxt('breastcancerY')
def get_number(_):
    lines = []
    for line in Y:
        print('this is a line', line)
    return " " + str(line) + '\n'
# I don't understand what you want to do with this part

# first read the two files into lists of lines
with open("breastcancerY","r") as f:
    dataY = f.readlines()
with open("breastcancerX","r") as f:
    dataX = f.readlines()
# then combine lines from the two files into one line,
# stripping the newlines before joining
with open("output.txt","w") as f:
    for X, Y in zip(dataX, dataY):
        f.write(f"{X.strip()} {Y.strip()}\n")
Using zip, which pairs up the lines of the two files.
Code

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2, open('file3.txt', 'w') as f3:
    for line1, line2 in zip(f1, f2):
        # write the line from file1 without its newline, a space,
        # then the corresponding line from file2 (its newline included)
        f3.write(f'{line1.rstrip()} {line2}')
Files
file1.txt
1 2 3 4
1 2 3 4
1 2 3 4
file2.txt
0
1
0
Result: file3.txt
1 2 3 4 0
1 2 3 4 1
1 2 3 4 0

Python combine rows from different files into one data file

I have information distributed over multiple large CSV files.
I want to combine all the files into one new file, such that the first row of the first file is joined with the first row of each other file, and so on.
file1.csv
A,B
A,C
A,D
file2.csv
F,G
H,I
J,K
expected result:
output.csv
A,B,F,G
A,C,H,I
A,D,J,K
So, given a list ['file1.csv', 'file2.csv', ...], how do I go from here?
I tried loading each file into memory and combining them with np.column_stack, but my files are too large to fit in memory.
Not pretty code, but this should work.
I'm not using with open(filename) as myfile for the inputs; that could get a bit messy with 50 files, so these are opened and closed explicitly.
It opens each file and places the handle in a list. The first handle is taken as the master file; we then iterate through it line by line, each time reading one line from all the other open files, joining them with ',', and writing the result to the output file.
Note that if the other files have more lines, the extra lines won't be included. If any have fewer lines, readline() returns an empty string at end of file, so those rows simply come out short; see the zip_longest sketch after the sample output below for one way to handle this more gracefully.
Note also that you can use glob to create filelist if the names follow a logical pattern (thanks to N. Wouda, below).
filelist = ['book1.csv', 'book2.csv', 'book3.csv', 'book4.csv']

openfiles = []
for filename in filelist:
    openfiles.append(open(filename, 'rb'))

# Use the first file in the list as the master.
# All files must have the same number of lines (or more).
masterfile = openfiles.pop(0)
with open('output.csv', 'w') as outputfile:
    for line in masterfile:
        outputlist = [line.strip()]
        # pull the matching line from every other open file
        for openfile in openfiles:
            outputlist.append(openfile.readline().strip())
        outputfile.write(str.join(',', outputlist) + '\n')

masterfile.close()
for openfile in openfiles:
    openfile.close()
Input files (book1.csv through book4.csv, all identical in this demo)
a b c d e f
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
Output
a b c d e f a b c d e f a b c d e f a b c d e f
1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
7 8 9 10 11 12 7 8 9 10 11 12 7 8 9 10 11 12 7 8 9 10 11 12
13 14 15 16 17 18 13 14 15 16 17 18 13 14 15 16 17 18 13 14 15 16 17 18
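As a sketch of the graceful handling mentioned above (assuming Python 3 and the same filelist; the fill value '' is an arbitrary choice):

from itertools import zip_longest

filelist = ['book1.csv', 'book2.csv', 'book3.csv', 'book4.csv']
openfiles = [open(filename) for filename in filelist]
with open('output.csv', 'w') as outputfile:
    # zip_longest keeps going until the longest file is exhausted,
    # padding the shorter ones with '' instead of truncating
    for lines in zip_longest(*openfiles, fillvalue=''):
        outputfile.write(','.join(line.strip() for line in lines) + '\n')
for openfile in openfiles:
    openfile.close()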
Instead of completely reading the files into memory, you can iterate over them line by line.

from itertools import izip  # Python 2: like zip, but returns an iterator

with open('file1.csv') as f1, open('file2.csv') as f2, open('output.csv', 'w') as out:
    for f1line, f2line in izip(f1, f2):
        out.write('{},{}'.format(f1line.strip(), f2line))
Demo:
$ cat file1.csv
A,B
A,C
A,D
$ cat file2.csv
F,G
H,I
J,K
$ python2.7 merge.py
$ cat output.csv
A,B,F,G
A,C,H,I
A,D,J,K
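In Python 3 there is no itertools.izip; the built-in zip is already lazy, so the same approach becomes:

with open('file1.csv') as f1, open('file2.csv') as f2, open('output.csv', 'w') as out:
    # zip over the file handles reads both files lazily, line by line
    for f1line, f2line in zip(f1, f2):
        out.write('{},{}'.format(f1line.strip(), f2line))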

Converting coordinate representation to adjacency list representation

What is the most efficient way to convert this file:
10 3
10 5
12 6
12 19
19 12
19 14
19 10
to this:
10 3 5
12 6 19
19 12 14 10
The first column of the input file is numerically sorted in increasing order.
Solutions using Python, AWK, etc. are welcome.
from itertools import groupby

lines = [line.split() for line in open("In.txt")]
op_file = open("Out.txt", "w")
# rows are already sorted by the first column, so groupby collects each key's pairs
for key, grp in groupby(lines, key=lambda x: x[0]):
    print >> op_file, "{} {}".format(key, " ".join([i[1] for i in grp]))
op_file.close()
Output
10 3 5
12 6 19
19 12 14 10
Since you mentioned awk:
$ awk '{a[$1]=a[$1]" "$2}END{for (i in a){print i a[i]}}' input
19 12 14 10
10 3 5
12 6 19
pipe it to sort to have it, well, sorted (use sort -n if you need numeric rather than lexical order):
$ awk '...' input | sort
10 3 5
12 6 19
19 12 14 10
In Python 2:
import itertools, operator

with open(infilename) as infile:
    input = (line.split() for line in infile)
    output = itertools.groupby(input, operator.itemgetter(0))
    with open(outfilename, 'w') as outfile:
        for key, line in output:
            print >>outfile, key, ' '.join(val[1] for val in line)
This assumes that the input and output files are different: you could just write the output to standard out and leave it as the user's problem to save it.
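For Python 3, the same pipeline works with print() as a function (a sketch; infilename and outfilename as above):

import itertools, operator

with open(infilename) as infile, open(outfilename, 'w') as outfile:
    rows = (line.split() for line in infile)
    # consecutive rows sharing a first column are grouped together
    for key, grp in itertools.groupby(rows, operator.itemgetter(0)):
        print(key, ' '.join(row[1] for row in grp), file=outfile)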
Try out this code:

fp = open('/tmp/test.txt')
list_dict = {}
for line in fp.readlines():
    split_values = line.split()
    if split_values[0] in list_dict:
        list_dict[split_values[0]].extend(split_values[1:])
    else:
        # store the key together with its first values so the join
        # below prints the full row; sorted input is not required
        list_dict[split_values[0]] = split_values
fp.close()

for val in list_dict.values():
    print " ".join(val)

Count all +1's in the file python

I have the following data:
1 3 4 2 6 7 8 8 93 23 45 2 0 0 0 1
0 3 4 2 6 7 8 8 90 23 45 2 0 0 0 1
0 3 4 2 6 7 8 6 93 23 45 2 0 0 0 1
-1 3 4 2 6 7 8 8 21 23 45 2 0 0 0 1
-1 3 4 2 6 7 8 8 0 23 45 2 0 0 0 1
The above data is in a file. I want to count the number of 1's, 0's, and -1's, but only in the first column. I am reading the file from standard input, and the only way I could think of is this:
cnt = 0
cnt1 = 0
cnt2 = 0
for line in sys.stdin:
    (t1, <having 15 different variables as that many columns are in files>) = re.split("\s+", line.strip())
    if re.match("+1", t1):
        cnt = cnt + 1
    if re.match("-1", t1):
        cnt1 = cnt1 + 1
    if re.match("0", t1):
        cnt2 = cnt2 + 1
How can I make this better, especially the 15-variables part, since that is the only place where those variables would be used?
Use collections.Counter:

from collections import Counter

with open('abc.txt') as f:
    c = Counter(int(line.split(None, 1)[0]) for line in f)
print c

Output:
Counter({0: 2, -1: 2, 1: 1})
Here str.split(None, 1) splits on any whitespace, at most once:
>>> s = "1 3 4 2 6 7 8 8 93 23 45 2 0 0 0 1"
>>> s.split(None, 1)
['1', '3 4 2 6 7 8 8 93 23 45 2 0 0 0 1']
NumPy makes it even easier:
>>> import numpy as np
>>> from collections import Counter
>>> Counter(np.loadtxt('abc.txt', usecols=(0,), dtype=np.int))
Counter({0: 2, -1: 2, 1: 1})
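(A note for recent NumPy versions: np.int was deprecated and has since been removed; the built-in int does the same job here.)
>>> Counter(np.loadtxt('abc.txt', usecols=(0,), dtype=int))
Counter({0: 2, -1: 2, 1: 1})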
If you only want the first column, then only split off the first column, and use a dictionary to store the counts for each value.

count = dict()
for line in sys.stdin:
    (t1, rest) = line.split(' ', 1)
    try:
        count[t1] += 1
    except KeyError:
        count[t1] = 1

for item in count:
    print '%s occurs %i times' % (item, count[item])
Instead of using tuple unpacking, where you need a number of variables exactly equal to the number of parts returned by split(), you can just use the first element of those parts:
parts = re.split("\s+", line.strip())
t1 = parts[0]
or equivalently, simply
t1 = re.split("\s+", line.strip())[0]
import collections

def countFirstColum(fileName):
    res = collections.defaultdict(int)
    with open(fileName) as f:
        for line in f:
            key = line.split(" ")[0]
            res[key] += 1
    return res
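A possible usage sketch (the filename 'abc.txt' is assumed from the earlier answers):

counts = countFirstColum('abc.txt')
for value, n in counts.items():
    print '%s occurs %i times' % (value, n)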
Read the rows into a list of columns (f here is assumed to be an open file or sys.stdin):

rows = []
for line in f:
    column = line.strip().split(" ")
    rows.append(column)

Then you get a two-dimensional list. The first column:

for row in rows:
    print row[0]
output:
1
0
0
-1
-1
This is from a script of mine that reads from an input file (for standard input, which cannot be rewound, collect the lines into a list first):

dictionary = {}
# first pass: register every key with a zero count
for line in someInfile:
    f = line.strip('\n').split()
    dictionary[f[0]] = 0
someInfile.seek(0)  # rewind the file for the second pass
# second pass: count the occurrences of each key
for line in someInfile:
    f = line.strip('\n').split()
    dictionary[f[0]] += 1
print dictionary
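The two passes are unnecessary, though; a single-pass sketch using dict.get (same someInfile as above):

dictionary = {}
for line in someInfile:
    key = line.split()[0]
    # get() supplies 0 the first time a key is seen
    dictionary[key] = dictionary.get(key, 0) + 1
print dictionary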

csv writer is adding delimiters inside each word

I wrote some throwaway code which takes a list of ids, checks for duplicates, and writes out the deduplicated list. Nothing fancy, just a small part of what I am working on.
I get this weird output. It looks to me like the delimiter is adding spaces where it shouldn't. Is the delimiter applied between fields or within a line? Very confused.
r s 9 3 6 4 5 5 4
r s 9 3 1 1 1 7 1
r s 7 8 9 0 2 0 2 5
r s 7 6 5 2 3 3 1
r s 7 2 1 0 4 8
r s 6 9 8 3 2 6 7
r s 6 4 6 5 6 5 7
r s 6 2 9 2 4 2
r s 6 1 9 9 1 1 5 6
Code:

__author__ = 'prumac'
import csv

allsnps = []

def open_file():
    ifile = open('mirnaduplicates.csv', "rb")
    print "open file"
    return csv.reader(ifile)

def write_file():
    with open('mirnaduplicatesremoved.csv', 'w') as fp:
        a = csv.writer(fp, delimiter=' ')
        a.writerows(allsnps)

def checksnp(name):
    if name not in allsnps:
        allsnps.append(name)

def mymain():
    reader = open_file()
    for r in reader:
        checksnp(r[0])
    print len(allsnps)
    print allsnps
    write_file()

mymain()
.writerows() expects a list of rows, i.e. a list of lists. Instead, you are handing it a list of strings, and each string is treated as a sequence of characters, so every character becomes its own delimited column.
Put each string in a tuple or list:

a.writerows([val] for val in allsnps)

Note that you could do this all a little more efficiently:

with open('mirnaduplicates.csv', "rb") as ifile, \
     open('mirnaduplicatesremoved.csv', 'wb') as fp:
    reader = csv.reader(ifile)
    writer = csv.writer(fp, delimiter=' ')
    seen = set()
    seen_add = seen.add
    # set.add() returns None (falsy), so "not seen_add(...)" both records the
    # value and keeps the filter truthy; each id is written only the first time
    writer.writerows(row for row in reader if row[0] not in seen and not seen_add(row[0]))
