Split raw data in python

I want to split the raw data from a txt file using Python. The raw data looks like this:
-0.156 200
-0.157 300
-0.158 400
-0.156 201
-0.157 305
-0.158 403
-0.156 199
-0.157 308
-0.158 401
I expect to split the file into several txt files, like this:
-0.156 200
-0.157 300
-0.158 400
-0.156 201
-0.157 305
-0.158 403
-0.156 199
-0.157 308
-0.158 401
Would you please help me?

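This splits the data into files with three entries in each file:
READ_FROM = 'my_file.txt'
ENTRIES_PER_FILE = 3
with open(READ_FROM) as f:
    data = f.read().splitlines()
i = 1  # position of the current entry within the current output file
n = 1  # number of the current output file
for line in data:
    # 'a' appends, so delete old new_file_*.txt files before rerunning
    with open(f'new_file_{n}.txt', 'a') as g:
        g.write(line + '\n')
    i += 1
    if i > ENTRIES_PER_FILE:
        i = 1
        n += 1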
new_file_1.txt
-0.156 200
-0.157 300
-0.158 400
new_file_2.txt
-0.156 201
-0.157 305
-0.158 403
new_file_3.txt
-0.156 199
-0.157 308
-0.158 401

If your input has blank lines between entries and you want blank lines between entries in the new files, this will work:
with open('path/to/file.txt', 'r') as file:
    lines = file.readlines()
# drop blank lines and surrounding whitespace
cleaned_lines = [line.strip() for line in lines if len(line.strip()) > 0]
num_lines_per_file = 3
for num in range(0, len(cleaned_lines), num_lines_per_file):
    # files are named after the index of their first entry: 0.txt, 3.txt, 6.txt, ...
    with open(f'{num}.txt', 'w') as out_file:
        for line in cleaned_lines[num:num + num_lines_per_file]:
            out_file.write(f'{line}\n\n')
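A variant of the two approaches above, as a minimal sketch: it reads the input once, skips blank lines, and opens each output file exactly once in 'w' mode, so rerunning it does not append to old results. The function name split_file, the out_prefix parameter and the returned list of file names are assumptions made for illustration and are not part of the original answers.

```python
from itertools import islice


def split_file(input_path, lines_per_file=3, out_prefix="new_file_"):
    """Write every `lines_per_file` non-blank lines to its own numbered file.

    Returns the list of file names written (hypothetical helper).
    """
    written = []
    with open(input_path) as src:
        # Lazily drop blank lines so they do not count as entries.
        entries = (line.rstrip("\n") for line in src if line.strip())
        n = 1
        while True:
            # Take the next group of entries; an empty group means we are done.
            chunk = list(islice(entries, lines_per_file))
            if not chunk:
                break
            out_name = f"{out_prefix}{n}.txt"
            with open(out_name, "w") as dst:
                dst.write("\n".join(chunk) + "\n")
            written.append(out_name)
            n += 1
    return written
```

Calling split_file('my_file.txt') on the data from the question should produce new_file_1.txt, new_file_2.txt and new_file_3.txt with three lines each.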

Related

Compare a file line by line and for those lines that meet the given requirement, print them

I have a txt file with content:
577 181 619 216
603 175 630 202
651 180 681 202
661 152 676 179
604 176 630 204
605 177 632 202
I want to read each line of this file and compare the lines with one another. If, for example, line i - line j <= 3, I want to remove the near-duplicate and output only one of those lines.
For the content above I want the output to be:
577 181 619 216
603 175 630 202
651 180 681 202
661 152 676 179
In this case the second line, 603 175 630 202, meets the condition together with lines 5 and 6, so those two lines are removed and only line 2 is written to the output, as shown above.
f1 = open("result.txt", "r")
f2 = open("final.txt", "w")
for line1 in f1:
    for line2 in f1:
        if each number line2 - line1 <= 3:
            # remove one of those lines and write the remaining line to the new file
            # f2.write(lines)
f1.close()
f2.close()
For example, look at lines 2, 5 and 6: the difference between corresponding numbers in each pair of lines is no more than 3. For lines 2 and 5, the first elements are 603 and 604 (a difference of 1), the second elements are 175 and 176 (a difference of 1), the third elements are 630 and 630 (a difference of 0) and the fourth elements are 202 and 204 (a difference of 2). All of these meet the given condition, so for the 2nd and 5th lines only one line is enough.

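For starters, you need to convert the lines into numbers.
With that, find the absolute differences between each pair of lines.
# f1 and f2 are the open files from the question
lines = f1.readlines()  # read once: a file object cannot be iterated in two nested loops
for i, line1 in enumerate(lines):
    nums1 = list(map(int, line1.strip().split()))
    for j, line2 in enumerate(lines):
        if j <= i:  # skip repeating + equal lines
            continue
        nums2 = list(map(int, line2.strip().split()))
        diffs = [abs(nums2[x] - nums1[x]) for x in range(len(nums1))]
        print(f'Diff between {nums1} and {nums2}: {diffs}')  # for debugging
        # check all the differences, something like this...
        # (as written, line1 is written once per non-matching line2;
        # collect the lines to keep first if each should appear only once)
        if not all(d <= 3 for d in diffs):
            f2.write(line1)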
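To turn the pairwise check above into the output the question asks for, one possible sketch is to keep a row only when no previously kept row is within the tolerance in every column. The helper names drop_near_duplicates and filter_file and the tolerance parameter are assumptions for illustration. The sketch also assumes that the first row of each near-duplicate group is the one to keep, which matches the desired output in the question.

```python
def drop_near_duplicates(rows, tolerance=3):
    """Keep a row only if no already-kept row is within `tolerance` in every column."""
    kept = []
    for row in rows:
        is_duplicate = any(
            len(row) == len(other)
            and all(abs(a - b) <= tolerance for a, b in zip(row, other))
            for other in kept
        )
        if not is_duplicate:
            kept.append(row)
    return kept


def filter_file(input_path="result.txt", output_path="final.txt", tolerance=3):
    """Read whitespace-separated integer rows, drop near-duplicates, write the rest."""
    with open(input_path) as src:
        rows = [[int(tok) for tok in line.split()] for line in src if line.strip()]
    with open(output_path, "w") as dst:
        for row in drop_near_duplicates(rows, tolerance):
            dst.write(" ".join(map(str, row)) + "\n")
```

Because each row is compared only against rows that were kept, every surviving row is written exactly once.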

how do i print 6 values per line using the while loop statement in python

Here's my code.
I need some help figuring out the print function.
x = 0
while x < 999:
    if x % 40 == 0:
        print(format(x, '7d'), end='')
    x = x + 1
The print call I tried is not working for me.
My code won't print the values the way I would like it to. I want it to print 6 values per line.
I want it to print like this:
40 80 120 160 200 240
280 320 360 400 440 480
But instead it prints everything on one straight line. Please help.

In Python, there is usually a shorter way to do something:
arr = [format(x, '7d') for x in range(40, 999, 40)]
print('\n'.join(''.join(arr[i:i+6]) for i in range(0, len(arr), 6)))
Which outputs:
40 80 120 160 200 240
280 320 360 400 440 480
520 560 600 640 680 720
760 800 840 880 920 960

I would use additional counter variables:
x = c = 0
l = ''  # resulting text
while x < 999:
    if x % 40 == 0:
        l += format(x, '7d')
        c += 1  # how many values have been added so far
        if c % 6 == 0:
            l += '\n'  # start a new line after every 6th value
    x += 1
print(l)
The output:
0 40 80 120 160 200
240 280 320 360 400 440
480 520 560 600 640 680
720 760 800 840 880 920
960

Every 6th time you print a number, call print() to start a new line, as follows:
x = 1
counter = 0
while x < 999:
    if x % 40 == 0:
        counter += 1
        print(format(x, '7d'), end='')
        if counter == 6:
            print()  # end the current line
            counter = 0
    x = x + 1
The output looks like this:
40 80 120 160 200 240
280 320 360 400 440 480
520 560 600 640 680 720
760 800 840 880 920 960

How to extract specific rows from an input text file and print them in python?

I have a text file containing transition lines of FeII emission. The column headers are n_high, n_low, wavelength and intensity, where n_high and n_low are the upper and lower levels of each transition. The transitions run 2 --> 1, ..., 371 --> 1, then 3 --> 2, ..., 371 --> 2, and so on until the last chunk, 371 --> 370.
The input file looks like this:
#n_hi n_lo WL(A) logI
2 1 259811.86 1.158
3 1 149730.41 -2.054
4 1 115894.98 -2.134
5 1 102320.80 -2.389
6 1 53387.13 0.256
7 1 41138.69 -0.277
8 1 35226.70 -1.585
9 1 32068.36 -1.741
10 1 12566.77 2.323
.
.
.
.
369 1 1069.66 1.461
370 1 1065.75 -7.901
371 1 1065.64 -8.011
3 2 353390.47 0.759
4 2 209224.17 -2.390
5 2 168797.89 -2.607
.
.
.
370 369 291200.84 -10.337
371 369 283465.88 -10.436
371 370 10672868.00 -12.012
There are in total 68635 rows.
I'd like to select only those transitions whose wavelength lies within a given range, say [x1,x2], and write the entire row to another file.
So far I have sketched an algorithm for this:
for n_low from 1 to 370:
    for n_hi from n_low+1 to 371:
        if x2 <= wavelength <= x1:
            print this row to file
        else:
            exit
I'd like to implement this in Python.

If you want to use only standard Python, something like the function below should work (assuming the data is tab-separated):
def filter_wavelength(x1, x2, input_path, output_path):
    with open(output_path, 'w') as output_file:
        with open(input_path) as input_file:
            for line in input_file:
                try:
                    tokens = line.split('\t')
                    wave_length = float(tokens[2])
                    if x1 <= wave_length <= x2:
                        output_file.write(line)
                except Exception as e:  # e.g. the header line or a malformed row
                    print(str(e))
Call it like so:
filter_wavelength(1, 2, 'path/to/input', 'path/to/output')

You can use the powerful pandas library.
I use io.StringIO to simulate a file containing the data; with a real file, pass the filename instead of f.
data = '''2 1 259811.86 1.158
3 1 149730.41 -2.054
4 1 115894.98 -2.134
5 1 102320.80 -2.389
6 1 53387.13 0.256
7 1 41138.69 -0.277
8 1 35226.70 -1.585
9 1 32068.36 -1.741
10 1 12566.77 2.323
369 1 1069.66 1.461
370 1 1065.75 -7.901
371 1 1065.64 -8.011
3 2 353390.47 0.759
4 2 209224.17 -2.390
5 2 168797.89 -2.607
370 369 291200.84 -10.337
371 369 283465.88 -10.436
371 370 10672868.00 -12.012'''
import pandas as pd
# simulate file
import io
f = io.StringIO(data)
# use filename instead of `f`
# it reads data from file using spaces as separators
# and add headers 'n_hi','n_lo', 'WL(A)', 'logI'
df = pd.read_csv(f, names=['n_hi','n_lo', 'WL(A)', 'logI'], sep=r'\s+')
#print(df)
# get rows which have 1000 < WL < 25000
selected = df[ df['WL(A)'].between(1000, 25000) ]
print(selected)
selected.to_csv('result.csv', sep=' ', header=False)

You don't need to look at n_hi and n_lo if your only concern is WL(A). Try this:
def extract_wave_lengths(x1, x2, input_file, output_file):
    # x1 is the upper bound and x2 the lower bound, as in the question's pseudocode
    with open(input_file, 'r') as ifile, open(output_file, 'w') as ofile:
        next(ifile)  # Skip header
        for line in ifile:
            parts = line.split()
            wave_length = float(parts[2])
            if x2 <= wave_length <= x1:
                ofile.write(line)
You can then call it this way:
extract_wave_lengths(100000, 5000, "/path/to/input/file", "/path/to/output/file")

extracting/manipulating tab-delimited data with string and integers in python

I have a tab-delimited file with three columns (Name Nr1 Nr2) like the following:
ABC 201 215
DEF 301 320
GHI 350 375
I would like to transform this file into the following format:
ABC 201 201 # start from the value in the second column and continue line by line up to the value in the third column, as follows
ABC 202 202
ABC 203 203
......and so on up to the third column value
ABC 215 215
DEF 301 301
....and so on up to the third column value
DEF 320 320
GHI 350 350
GHI 351 351
GHI 352 352
....
GHI 375 375
Is that possible in Python?
I would really appreciate your help with this.
Thanks in advance.

Using the method from "How do I read a file line-by-line into a list?", you can read each line of the file into a sequence:
lines = tuple(open(filename, 'r'))
As shown in "splitting a string based on tab in the file", you can then split each line on the tab delimiter:
import re
line_array = re.split(r'\t+', lines[0])
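Building on the reading and splitting steps above, the expansion itself could be sketched as follows. The function name expand_ranges is hypothetical, and the sketch assumes every non-blank line has exactly three tab-separated fields (Name, Nr1, Nr2) with Nr1 <= Nr2.

```python
def expand_ranges(lines):
    """Yield one 'name<TAB>n<TAB>n' row for every n from Nr1 to Nr2 inclusive."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        name, start, end = line.rstrip("\n").split("\t")
        for n in range(int(start), int(end) + 1):
            yield f"{name}\t{n}\t{n}"


# Small in-memory example; with a real file, pass the open file object instead.
sample = ["ABC\t201\t203\n", "DEF\t301\t302\n"]
for row in expand_ranges(sample):
    print(row)
```

To write the result to a file, iterate over expand_ranges(input_file) and write each row followed by a newline.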

Extract rows based on values from text file using Python

I have a list of records in file A that I want to extract according to the numbers in file B. Given the values 4 and 5, every row in file A whose 4th column is 4 or 5 should be extracted. How can I do this using Python? The code below only extracts rows based on an index that has the value 4.
with open("B.txt", "rt") as f:
    classes = [int(line) for line in f.readlines()]
with open("A.txt", "rt") as f:
    lines = [line for index, line in enumerate(f.readlines()) if classes[index]== 4]
    lines_all= "".join(lines)
with open("C.txt", "w") as f:
    f.write(lines_all)
A.txt
hg17_ct_ER_ER_1003 36 42 1
hg17_ct_ER_ER_1003 109 129 2
hg17_ct_ER_ER_1003 110 130 2
hg17_ct_ER_ER_1003 129 149 2
hg17_ct_ER_ER_1003 130 150 2
hg17_ct_ER_ER_1003 157 163 3
hg17_ct_ER_ER_1003 157 165 3
hg17_ct_ER_ER_1003 179 185 4
hg17_ct_ER_ER_1003 197 217 5
hg17_ct_ER_ER_1003 220 226 6
B.txt
4
5
Desired output
hg17_ct_ER_ER_1003 179 185 4
hg17_ct_ER_ER_1003 197 217 5

Create a set of the numbers from file B, then compare the last element of each row in file A to the elements in the set:
import csv
with open("a.txt") as f, open("b.txt") as f2:
    st = set(line.rstrip() for line in f2)
    r = csv.reader(f, delimiter=" ")
    data = [row for row in r if row[-1] in st]
print(data)
[['hg17_ct_ER_ER_1003', '179', '185', '4'], ['hg17_ct_ER_ER_1003', '197', '217', '5']]
Set delimiter= to whatever your file uses, or omit it if your file is comma-separated.
Or:
with open("a.txt") as f, open("b.txt") as f2:
    st = set(line.rstrip() for line in f2)
    data = [line.rstrip() for line in f if line.rsplit(None, 1)[1] in st]
print(data)
['hg17_ct_ER_ER_1003 179 185 4', 'hg17_ct_ER_ER_1003 197 217 5']

with open("B.txt", "r") as target_file:
    target = [i.strip() for i in target_file]
with open("A.txt", "r") as data_file:
    # filter() is lazy in Python 3, so join the result while the file is still open
    r = filter(lambda x: x.strip().rsplit(None, 1)[1] in target, data_file)
    print("".join(r))
The output:
hg17_ct_ER_ER_1003 179 185 4
hg17_ct_ER_ER_1003 197 217 5
As suggested by @Padraic, I changed split()[-1] to rsplit(None, 1)[1].
