How to add a line above specific lines in Python

I have a file kind of like this:
===
1 2 3 4
===
2 3 4 5
===
3 4 5 6
and I am trying to write a program to turn the file into this:
p
===
1 2 3 4
p
===
2 3 4 5
p
===
3 4 5 6
Is there any way I could do this in Python?

You can use:
with open('my_file.txt') as fp:
    lines = fp.readlines()

# Replace every '===' line with 'p' followed by '==='
for i, l in enumerate(lines):
    if l == '===\n':
        lines[i] = 'p\n===\n'

with open('my_file.txt', 'w') as fp:
    fp.write(''.join(lines))
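If the file is too large to read into memory at once, a streaming variant works too; a minimal sketch, assuming the same '===' marker convention and writing to a new file (my_file_out.txt is an assumed output name):
# Stream the file line by line, emitting 'p' before every '===' marker.
with open('my_file.txt') as src, open('my_file_out.txt', 'w') as dst:
    for line in src:
        if line.rstrip('\n') == '===':
            dst.write('p\n')
        dst.write(line)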

Related

Multiplying a specific column in a txt file by a constant number

I have a txt file with 7 columns. I want to multiply the 3rd column by a constant number, keeping all the other columns the same, and then output the file containing all the columns. Can anyone help?
1 2 1
2 2 1
3 2 1
Multiplying column 3 by 14, the output should look like:
1 2 14
2 2 14
3 2 14
While you say your text file has 7 columns, your example only shows 3, so I have based my answer on the example.
The important part, the multiplication itself, is this line:
matrix[:,(target_col-1)] *= c_val
Here is the full Python code:
import numpy as np

# Constant value (used for multiplication)
c_val = 14
# Number of columns in the matrix
n_col = 3
# Column to be multiplied (i.e. the third column)
target_col = 3

# Import the text file containing the matrix
filename = 'data.txt'
matrix = np.loadtxt(filename, usecols=range(n_col))

# Multiply the target column (i.e. 3rd column) by c_val (i.e. 14)
matrix[:, target_col - 1] *= c_val

# Save the matrix to a new text file
with open('new_text_file.txt', 'wb') as f:
    np.savetxt(f, matrix, delimiter=' ', fmt='%d')
OUTPUT:
new_text_file.txt
1 2 14
2 2 14
3 2 14
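If the real file does have 7 columns, the same technique carries over unchanged; a minimal sketch, assuming whitespace-separated numeric data and the same filenames as above:
import numpy as np

# Same approach as above, generalized to the 7-column file described
# in the question; only the column count changes.
matrix = np.loadtxt('data.txt', usecols=range(7))
matrix[:, 2] *= 14  # third column, zero-based index 2
np.savetxt('new_text_file.txt', matrix, delimiter=' ', fmt='%d')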
This is a possible solution for C++17.
If you are sure about the format of the input file, you could reduce the code to the one below:
Just walk the input stream, multiply every 3rd number by a constant, and add a new line after every 9th number (you mentioned 7 numbers per line, but your example contained 9 numbers).
Notice that for real files you would use file streams instead of string streams.
#include <fmt/core.h>
#include <sstream>  // istringstream, ostringstream

void parse_iss(std::istringstream& iss, std::ostringstream& oss, int k) {
    for (int number_counter{ 0 }, number; iss >> number; ++number_counter) {
        oss << ((number_counter % 3 == 2) ? number * k : number);
        oss << ((number_counter % 9 == 8) ? "\n" : " ");
    }
}

int main() {
    std::istringstream iss{
        "1 2 1 2 2 1 3 2 1\n"
        "2 4 2 4 4 5 5 5 6\n"
    };
    std::ostringstream oss{};
    parse_iss(iss, oss, 14);
    fmt::print("{}", oss.str());
}

// Outputs:
//
// 1 2 14 2 2 14 3 2 14
// 2 4 28 4 4 70 5 5 84
Can be done as below:
MULTIPLIER = 14
input_file_name = "numbers_in.txt"
output_file_name = "numbers_out.txt"

with open(input_file_name, 'r') as f:
    lines = f.readlines()

with open(output_file_name, 'w+') as f:
    for line in lines:
        new_line = ""
        for i, x in enumerate(line.strip().split(" ")):
            if (i + 1) % 3 == 0:
                new_line += str(int(x) * MULTIPLIER) + " "
            else:
                new_line += x + " "
        f.write(new_line + "\n")
# numbers_in.txt:
# 1 2 1 2 2 1 3 2 1
# 1 3 1 3 3 1 4 3 1
# 1 4 1 4 4 1 5 4 1
# numbers_out.txt:
# 1 2 14 2 2 14 3 2 14
# 1 3 14 3 3 14 4 3 14
# 1 4 14 4 4 14 5 4 14

csv.writer.writerow is separating my data improperly [duplicate]

This question already has answers here:
Why does csvwriter.writerow() put a comma after each character?
(4 answers)
Closed 3 years ago.
I wrote code to delete certain columns (columns 0, 1, 2, 3, 4, 5, 6) from a bunch of .csv datasets.
import csv
import os

data_path = "C:/Users/hhs/dataset/PSP/Upper/"
save_path = "C:/Users/hhs/Refined/PSP/Upper/"

for filename in os.listdir(data_path):
    data_full_path = os.path.join(data_path, filename)
    save_full_path = os.path.join(save_path, filename)
    with open(data_full_path, "r") as source:
        rdr = csv.reader(source)
        with open(save_full_path, "w") as result:
            wtr = csv.writer(result)
            for line in rdr:
                wtr.writerow((line[7]))
One of the original datasets looks like this:
Normals:0 Normals:1 Normals:2 Points:0 Points:1 Points:2 area cp
-0.69498 0.62377 0.34311 28.829 3.4728 -0.947160 0.25877 -0.094391
-0.73130 0.54405 0.39395 30.082 4.9111 -0.785480 0.23499 -0.261690
-0.74539 0.49691 0.42782 31.210 6.4629 -0.626470 0.20982 -0.330730
-0.75245 0.48322 0.42985 32.359 8.0473 -0.455080 0.19428 -0.221340
-0.77195 0.46254 0.41825 33.546 9.7963 -0.270990 0.19849 -0.086641
-0.78905 0.45241 0.39759 34.737 11.6860 -0.079976 0.18456 -0.022418
-0.79771 0.45422 0.37858 35.915 13.5840 0.118160 0.17047 0.026102
-0.80090 0.45479 0.37198 37.092 15.4810 0.330220 0.15594 0.154880
-0.80260 0.45516 0.36904 38.268 17.3770 0.550100 0.14279 0.316590
-0.80504 0.45774 0.36178 39.444 19.2740 0.769020 0.12996 0.475640
-0.80747 0.46024 0.35383 40.620 21.1710 0.982050 0.11692 0.624090
The result does contain the last column's "cp" values, which is what I want.
However, the result looks very weird; every digit lands in a different column.
c p
- 0 . 0 9 4 3 9
- 0 . 2 6 1 6 9
- 0 . 3 3 0 7 3
- 0 . 2 2 1 3 4
- 0 . 0 8 6 6 4
- 0 . 0 2 2 4 1
0 . 0 2 6 1 0 2
0 . 1 5 4 8 8
0 . 3 1 6 5 9
0 . 4 7 5 6 4
0 . 6 2 4 0 9
.
.
.
Why does the result look like this?
Fix two issues in the second loop:
1. Add newline='' and a delimiter; see CSV file written with Python has blank lines between each row.
2. Change (line[7]) to [line[7]]; see Why does csvwriter.writerow() put a comma after each character? Parentheses alone do not create a tuple, so writerow() receives a bare string and iterates over it character by character.
with open(save_full_path, "w", newline='') as result:
    wtr = csv.writer(result, delimiter=',')
    for line in rdr:
        wtr.writerow([line[7]])
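A quick illustration of the string-vs-list difference (hypothetical values, printed to stdout):
import csv
import sys

w = csv.writer(sys.stdout)
w.writerow(("cp"))   # ("cp") is just the string "cp" -> written as c,p
w.writerow(["cp"])   # one-element list -> written as cp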

select some lines as a group then give each line same label in same group in txt file

I want to extract the lines of a txt file that sit between two patterns, treating each block as a separate group, and give every line in a group the same number as a label (the last column holds the group label). For example, given this data:
==g1
a 1 2 3 4
b 2 1 2 3
~~
==g2
c 2...
d 1...
...
I want to get output like
a 1 2 3 4 1
b 2 1 2 3 1
c 2 ... 2
d 1 ... 2
...
I am using Python 3.7. This is my attempt:
with open('input.txt') as infile, open('output.txt', 'w') as outfile:
    copy = False
    for line in infile:
        # ng as the new label in each group
        ng = 0
        if line.strip() == "start":
            copy = True
            continue
        elif line.startswith('with'):
            copy = False
            continue
        elif copy:
            ng = ng + 1
            line = line.rstrip('\n') + '\t' + str(ng) + '\n'
            outfile.write(line)
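A minimal working sketch of the intended logic, assuming groups start at lines beginning with == (e.g. ==g1) and end at ~~, as in the example:
with open('input.txt') as infile, open('output.txt', 'w') as outfile:
    ng = 0          # current group label, incremented at every '==' header
    copy = False
    for line in infile:
        if line.startswith('=='):    # group header such as '==g1'
            ng += 1
            copy = True
        elif line.startswith('~~'):  # group terminator
            copy = False
        elif copy:
            # append the group label as a new last column
            outfile.write(line.rstrip('\n') + '\t' + str(ng) + '\n')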

Reading and Rearranging data in Python

I have a very large (10GB) data file of the form:
A B C D
1 2 3 4
2 2 3 4
3 2 3 4
4 2 3 4
5 2 3 4
1 2 3 4
2 2 3 4
3 2 3 4
4 2 3 4
5 2 3 4
1 2 3 4
2 2 3 4
3 2 3 4
4 2 3 4
5 2 3 4
I would like to read just the B column of the file and rearrange it in the form
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
It takes a very long time to read the data and rearrange it. Could someone give me an efficient way to do this in Python?
This is the MATLAB code I used for processing the data:
fid = fopen('hpts.out', 'r');  % Open text file
InputText = textscan(fid, '%s', 1, 'delimiter', '\n');  % Read header lines
HeaderLines = InputText{1}
A = textscan(fid, '%n %n %n %n %n', 'HeaderLines', 1);
t = A{1};
vz = A{4};
L = 1;
for j = 1:1:5000
    for i = 1:1:14999
        V1(j,i) = vz(L);
        L = L + 1;
    end
end
imagesc(V1);
You can use Python for this, but I think this is exactly the sort of job where a shell script is better, since it's a lot shorter and easier:
$ tail -n+2 input_file | awk '{print $2}' | tr '\n' ' ' | fmt -w 10
tail removes the first (header) line;
awk gets the second column;
tr puts it on a single line;
and fmt makes lines a maximum of 10 characters.
Since this is a streaming operation, it should not take a lot of memory, and performance is mostly limited by disk I/O (although shell pipes also introduce some overhead).
Example:
$ tail -n+2 input_file | awk '{print $2}' | tr '\n' ' ' | fmt -w 10
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
This streaming approach should perform well (note: izip_longest is Python 2; on Python 3 use itertools.zip_longest):
from itertools import izip_longest

with open('yourfile', 'r') as fin, open('newfile', 'w') as fout:
    # discard header row
    next(fin)
    # make generator for second column
    col2values = (line.split()[1] for line in fin)
    # zip into groups of five;
    # fillvalue is used to make a partial last row look good
    for row in izip_longest(*[col2values] * 5, fillvalue=''):
        fout.write(' '.join(row) + '\n')
Don't read the whole file at one time! Read the file line by line, skipping the header and starting a new output row after every 5th value:
def read_data():
    with open("filename.txt", 'r') as f:
        next(f)  # skip the header row
        for line in f:
            yield line.split()[1]

with open('file_to_save.txt', 'w') as f:
    for i, data in enumerate(read_data()):
        f.write(data)
        f.write('\n' if (i + 1) % 5 == 0 else ' ')
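For a 10 GB file it can also help to delegate the parsing to pandas and read in bounded chunks; a sketch, assuming the whitespace-delimited layout and the 'yourfile'/'newfile' names from above:
import pandas as pd

with open('newfile', 'w') as fout:
    # Read the whitespace-delimited file in bounded chunks, keeping only
    # column B; chunksize is a multiple of 5, so output rows never
    # straddle a chunk boundary.
    for chunk in pd.read_csv('yourfile', sep=r'\s+', usecols=['B'],
                             chunksize=1_000_000):
        values = chunk['B'].astype(str).tolist()
        for i in range(0, len(values), 5):
            fout.write(' '.join(values[i:i + 5]) + '\n')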

csv writer is adding delimiters within each word

I wrote some throwaway code which takes a list of ids, checks for duplicates, and writes out the list of ids. Nothing fancy, just a small part of what I am working on.
I get this weird output. It looks to me like the delimiter is adding spaces where it shouldn't. Is the delimiter applied between words or between lines? Very confused.
r s 9 3 6 4 5 5 4
r s 9 3 1 1 1 7 1
r s 7 8 9 0 2 0 2 5
r s 7 6 5 2 3 3 1
r s 7 2 1 0 4 8
r s 6 9 8 3 2 6 7
r s 6 4 6 5 6 5 7
r s 6 2 9 2 4 2
r s 6 1 9 9 1 1 5 6
Code:
__author__ = 'prumac'
import csv

allsnps = []

def open_file():
    ifile = open('mirnaduplicates.csv', "rb")
    print "open file"
    return csv.reader(ifile)

def write_file():
    with open('mirnaduplicatesremoved.csv', 'w') as fp:
        a = csv.writer(fp, delimiter=' ')
        a.writerows(allsnps)

def checksnp(name):
    if name in allsnps:
        pass
    else:
        allsnps.append(name)

def mymain():
    reader = open_file()
    for r in reader:
        checksnp(r[0])
    print len(allsnps)
    print allsnps
    write_file()

mymain()
.writerows() expects a list of lists. Instead, you are handing it a list of strings, and these are treated as sequences of characters.
Put each string in a tuple or list:
a.writerows([val] for val in allsnps)
Note that you could do this all a little more efficiently:
with open('mirnaduplicates.csv', "rb") as ifile, \
     open('mirnaduplicatesremoved.csv', 'wb') as fp:
    reader = csv.reader(ifile)
    writer = csv.writer(fp, delimiter=' ')
    seen = set()
    seen_add = seen.add
    writer.writerows(row for row in reader
                     if row[0] not in seen and not seen_add(row[0]))
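Note that the snippet above targets Python 2 (binary-mode files, print statements). On Python 3, csv files should be opened in text mode with newline=''; a sketch of the same deduplicating pass under that assumption:
import csv

# Python 3 variant of the dedup pass above (same filenames assumed).
with open('mirnaduplicates.csv', newline='') as ifile, \
     open('mirnaduplicatesremoved.csv', 'w', newline='') as fp:
    reader = csv.reader(ifile)
    writer = csv.writer(fp, delimiter=' ')
    seen = set()
    writer.writerows(row for row in reader
                     if row[0] not in seen and not seen.add(row[0]))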
