How to add line numbers to multiline string - python

I have a multiline string like the following:
txt = """
some text

on several

lines
"""
How can I print this text such that each line starts with a line number?

I usually use a regex substitution with a function attribute:
import re

def repl(m):
    repl.cnt += 1
    return f'{repl.cnt:03d}: '

repl.cnt = 0
print(re.sub(r'(?m)^', repl, txt))
Prints:
001:
002: some text
003:
004: on several
005:
006: lines
007:
This also makes it easy to number only the lines that contain text:
def repl(m):
    if m.group(0).strip():
        repl.cnt += 1
        return f'{repl.cnt:03d}: {m.group(0)}'
    else:
        return '(blank)'

repl.cnt = 0
print(re.sub(r'(?m)^.*$', repl, txt))
Prints:
(blank)
001: some text
(blank)
002: on several
(blank)
003: lines
(blank)
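
If keeping the counter on a function attribute feels like hidden state, it can live inside a helper instead. This is only a minimal sketch: the name number_lines and its fmt parameter are made up for illustration, and itertools.count stands in for repl.cnt.

```python
import itertools
import re


def number_lines(text, fmt="{:03d}: "):
    """Prefix every line start in `text` with a running number."""
    counter = itertools.count(1)  # fresh counter on every call
    # (?m) makes ^ match at the start of each line, as in the answer above
    return re.sub(r"(?m)^", lambda m: fmt.format(next(counter)), text)


print(number_lines("some text\non several\nlines"))
```

Because the counter is created inside the function, calling it twice starts from 1 both times, with no need to reset anything by hand.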

This can be done with a combination of split("\n"), "\n".join(), enumerate and a list comprehension:
def insert_line_numbers(txt):
    return "\n".join([f"{n+1:03d} {line}" for n, line in enumerate(txt.split("\n"))])

print(insert_line_numbers(txt))
It produces the output:
001
002 some text
003
004 on several
005
006 lines
007
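
A possible variant of the same idea, sketched under a few assumptions: it uses splitlines() instead of split("\n"), starts enumerate at 1, and pads the numbers to the width of the largest one rather than a fixed three digits. The name number_lines_padded and the sep parameter are hypothetical.

```python
def number_lines_padded(text, sep=" "):
    """Number lines from 1, zero-padding to the width of the largest number."""
    lines = text.splitlines()
    width = len(str(len(lines)))  # e.g. 3 digits for 100..999 lines
    return "\n".join(
        f"{n:0{width}d}{sep}{line}" for n, line in enumerate(lines, start=1)
    )


print(number_lines_padded("some text\non several\nlines"))
```

Note that splitlines() does not produce a trailing empty entry when the text ends with a newline, so this numbers one line fewer than the split("\n") version for such input.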

I did it like this: split the text into lines, keep a counter, and use format to print the line number followed by the line. The two extra placeholders hold the "." and the space after it.
count = 1
txt = '''Text
on
several
lines'''
txt = txt.splitlines()
for t in txt:
    print("{}{}{}{}".format(count, ".", " ", t))
    count += 1
Output
1. Text
2. on
3. several
4. lines

for n, i in enumerate(txt.rstrip().split('\n')):
    print(n, i)
0
1 some text
2
3 on several
4
5 lines

Related

How to take first five values on a note document and put into a list (Python)

I have a note document (.txt) that has numeric values. There is one number per line and there are about 20 lines. How can I grab the first 5 numbers from the file and put them into a list in Python? Simple question, but I can't seem to find out how to do it. Also, is there a way to check how many lines are present in the file, i.e. get the length of the file as a number of rows?
Something along these lines should work (header=None keeps the first number from being read as a column name):
import pandas as pd
pd.read_csv("sample.txt", header=None, nrows=5)[0].tolist()
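import re
with open("data.txt", "r") as f:
    content = f.read()
# Take the first five numbers found anywhere in the text
numbers = re.findall(r"\-?\d+\.?\d*", content)[:5]
numbers = [float(x) for x in numbers]
print(numbers)
# splitlines() does not count an extra line when the file ends with a newline
number_of_lines = len(content.splitlines())
print(number_of_lines)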
Input:
abc 1) def, abc
2, ghi
jkl +3
mno 5.555 pq.r
tuv -15.0
0.0 wxyz
the end
Output:
[1.0, 2.0, 3.0, 5.555, -15.0]
7
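
If the file really holds exactly one number per line, as the question describes, the values can also be read lazily without loading the whole file. The following is a rough sketch under that assumption; the function name is hypothetical, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io
from itertools import islice


def first_numbers_and_line_count(lines, n=5):
    """Return the first `n` values as floats and the total number of lines."""
    lines = iter(lines)
    first = [float(line) for line in islice(lines, n)]
    # Count the remaining lines without storing them
    total = len(first) + sum(1 for _ in lines)
    return first, total


# io.StringIO stands in for open("sample.txt") in this sketch
sample = io.StringIO("\n".join(str(i) for i in range(1, 21)) + "\n")
print(first_numbers_and_line_count(sample))
```

With a real file, the call would look like `with open("sample.txt") as f: first_numbers_and_line_count(f)`. A blank or non-numeric line among the first five raises ValueError, which may or may not be what you want.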

Python print .psl format without quotes and commas

I am working on a Linux system using Python 3 with a file in .psl format, which is common in genetics. This is a tab-separated file in which some cells contain comma-separated values. A small example file with some of the features of a .psl is below.
input.psl
1 2 3 x read1 8,9, 2001,2002,
1 2 3 mt read2 8,9,10 3001,3002,3003
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
I need to filter this file to extract only regions of interest. Here, I extract only rows with a value of 9 in the fourth column.
import csv
def read_psl_transcripts():
    psl_transcripts = []
    with open("input.psl") as input_psl:
        csv_reader = csv.reader(input_psl, delimiter='\t')
        for line in input_psl:
            #Extract only rows matching chromosome of interest
            if '9' == line[3]:
                psl_transcripts.append(line)
    return psl_transcripts
I then need to be able to print or write these selected lines in a tab-delimited format matching the format of the input file, with no additional quotes or commas added. I can't seem to get this part right: additional brackets, quotes and commas are always added. Below is an attempt using print().
outF = open("output.psl", "w")
for line in read_psl_transcripts():
    print(str(line).strip('"\''), sep='\t')
Any help is much appreciated. Below is the desired output.
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
You might be able to solve your problem with a simple awk statement.
awk '$4 == 9' input.psl > output.psl
But with Python you could solve it like this:
with open("input.psl") as infile, open("output.psl", "w") as outfile:
    for line in infile:
        splitted_line = line.split()
        # Keep only rows whose fourth column is 9
        if splitted_line[3] == '9':
            out_line = '\t'.join(splitted_line)
            outfile.write(out_line + "\n")
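
Since the question already imports csv, the same filter can be sketched with csv.reader and csv.writer. This is an illustration rather than a drop-in fix: the function name filter_psl and its column and value parameters are assumptions, and io.StringIO objects stand in for the real files.

```python
import csv
import io


def filter_psl(infile, outfile, column=3, value="9"):
    """Copy tab-separated rows whose `column` equals `value`."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Skip short or empty rows instead of raising IndexError
        if len(row) > column and row[column] == value:
            writer.writerow(row)


# io.StringIO objects stand in for the real input and output files
src = io.StringIO(
    "1\t2\t3\tmt\tread2\t8,9,10\t3001,3002,3003\n"
    "1\t2\t3\t9\tread3\t8,9,10,11\t4001,4002,4003,4004\n"
)
dst = io.StringIO()
filter_psl(src, dst)
print(dst.getvalue(), end="")
```

With a tab delimiter, the writer's default minimal quoting leaves comma-containing cells such as 8,9,10,11 unquoted. For real files, the csv documentation recommends opening them with newline="".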

Separate lines in Python

I have a .txt file with 3 columns. The first one is just numbers. The second one is a number that goes from 0 to 7. The last one is a sentence. I want to keep them in different lists so that I can match them by their numbers. I want to write a function for this. How can I separate them into different lists without mixing them up?
The example of .txt:
1234 0 my name is
6789 2 I am coming
2346 1 are you new?
1234 2 Who are you?
1234 1 how's going on?
And I have to keep them like this:
----1----
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
----2----
2346 1 are you new?
----3-----
6789 2 I am coming
What I've tried so far:
inputfile=open('input.txt','r').read()
m_id=[]
p_id=[]
packet_mes=[]
input_file=inputfile.split(" ")
print(input_file)
input_file=line.split()
m_id=[int(x) for x in input_file if x.isdigit()]
p_id=[x for x in input_file if not x.isdigit()]
With your current approach, you are reading the entire file as a string and splitting it on spaces (you'd much rather split on newlines instead, because each record is on its own line). Furthermore, you're not separating your data into columns properly.
You have 3 columns. You can split each line into 3 parts using str.split(None, 2). The None means splitting on runs of whitespace, and the 2 limits the number of splits so the sentence stays in one piece. Each group is stored as a key-list pair inside a dictionary. Here I use an OrderedDict in case you need to maintain order; on Python 3.7+ a plain dictionary (o = {}) preserves insertion order as well, while on older versions it gives the same grouping but no guaranteed order.
from collections import OrderedDict
o = OrderedDict()
with open('input.txt', 'r') as f:
    for line in f:
        i, j, k = line.strip().split(None, 2)
        o.setdefault(i, []).append([int(i), int(j), k])
print(dict(o))
{'1234': [[1234, 0, 'my name is'],
          [1234, 2, 'Who are you?'],
          [1234, 1, "how's going on?"]],
 '6789': [[6789, 2, 'I am coming']],
 '2346': [[2346, 1, 'are you new?']]}
Always use the with...as context manager when working with file I/O - it makes for clean code. Also, note that for larger files, iterating over each line is more memory efficient.
Maybe you want something like this:
import re
# Collect data from input file
h = {}
with open('input.txt', 'r') as f:
    for line in f:
        res = re.match(r"^(\d+)\s+(\d+)\s+(.*)$", line)
        if res:
            if res.group(1) not in h:
                h[res.group(1)] = []
            h[res.group(1)].append((res.group(2), res.group(3)))
# Output result
for i, x in enumerate(sorted(h.keys())):
    print("-------- %s -----------" % (i+1))
    for y in sorted(h[x]):
        print("%s %s %s" % (x, y[0], y[1]))
The result is as follows (add more newlines if you like):
-------- 1 -----------
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
-------- 2 -----------
2346 1 are you new?
-------- 3 -----------
6789 2 I am coming
It's based on regexes (the re module in Python), which are a good tool when you want to match simple line-based patterns.
Here it relies on whitespace as the column separator, but it can just as easily be adapted for fixed-width columns.
The results are collected in a dictionary of lists, each list containing tuples (pairs) of position and text.
The program defers sorting the items until output.
It's quite ugly code, but it's quite easy to understand.
raw = []
with open("input.txt", "r") as file:
    for x in file:
        raw.append(x.strip().split(None, 2))

raw = sorted(raw)
title = raw[0][0]
refined = []
cluster = []
for x in raw:
    if x[0] == title:
        cluster.append(x)
    else:
        # A new first-column value starts a new cluster
        refined.append(cluster)
        cluster = []
        title = x[0]
        cluster.append(x)
refined.append(cluster)

# Number the groups from 1, as in the desired output
for number, group in enumerate(refined, 1):
    print("-"*10+str(number)+"-"*10)
    for line in group:
        print(*line)
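
The manual clustering loop above does by hand roughly what itertools.groupby does on sorted input. Here is a minimal sketch of that alternative, assuming every non-blank line has the three columns described in the question; the function name group_rows is made up, and the sample list stands in for the lines of input.txt.

```python
from itertools import groupby


def group_rows(lines):
    """Group 'id seq text' lines by id, ordered by id and then by seq."""
    rows = [line.strip().split(None, 2) for line in lines if line.strip()]
    # Sort numerically so that the groups and their members come out in order
    rows.sort(key=lambda r: (int(r[0]), int(r[1])))
    return [list(g) for _, g in groupby(rows, key=lambda r: r[0])]


sample = [
    "1234 0 my name is",
    "6789 2 I am coming",
    "2346 1 are you new?",
    "1234 2 Who are you?",
    "1234 1 how's going on?",
]
for number, group in enumerate(group_rows(sample), start=1):
    print(f"----{number}----")
    for row in group:
        print(*row)
```

groupby only merges adjacent items with equal keys, which is why the rows are sorted first.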

Replacing a string in a file in python

What my text is
$TITLE = XXXX YYYY
1 $SUBTITLE= XXXX YYYY ANSA
2 $LABEL = first label
3 $DISPLACEMENTS
4 $MAGNITUDE-PHASE OUTPUT
5 $SUBCASE ID = 30411
What I want
$TITLE = XXXX YYYY
1 $SUBTITLE= XXXX YYYY ANSA
2 $LABEL = new label
3 $DISPLACEMENTS
4 $MAGNITUDE-PHASE OUTPUT
5 $SUBCASE ID = 30411
The code I am using
import re
fo=open("test5.txt", "r+")
num_lines = sum(1 for line in open('test5.txt'))
count=1
while (count <= num_lines):
    line1=fo.readline()
    j= line1[17 : 72]
    j1=re.findall('\d+', j)
    k=map(int,j1)
    if (k==[30411]):
        count1=count-4
        line2=fo.readlines()[count1]
        r1=line2[10:72]
        r11=str(r1)
        r2="new label"
        r22=str(r2)
        newdata = line2.replace(r11,r22)
        f1 = open("output7.txt",'a')
        lines=f1.writelines(newdata)
    else:
        f1 = open("output7.txt",'a')
        lines=f1.writelines(line1)
    count=count+1
The problem is in writing the lines. Once 30411 is found, the code has to go 3 lines back and change the label to the new one. The new output text should have all the lines the same as before except the label line, but it is not being written properly. Can anyone help?
Apart from many blood-curdling but noncritical problems, you are calling readlines() in the middle of an iteration using readline(), causing you to read lines not from the beginning of the file but from the current position of the fo handle, i.e. after the line containing 30411.
You need to open the input file again with a separate handle or (better) store the last 4 lines in memory instead of rereading the one you need to change.
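
One way to read the "store the lines in memory" suggestion is to load all lines into a list, find the subcase line, and rewrite the line three positions above it. The sketch below assumes the simplified layout shown in the question (a "$LABEL = ..." line exactly three lines before "$SUBCASE ID = ..."); the function name replace_label and its offset parameter are hypothetical, and a real fixed-width file may need column slicing instead of splitting on "=".

```python
def replace_label(lines, subcase_id, new_label, offset=3):
    """Return a copy of `lines` in which the $LABEL line `offset` lines
    above the matching $SUBCASE ID line carries `new_label`."""
    result = list(lines)
    for idx, line in enumerate(result):
        if "$SUBCASE ID" in line and line.split("=")[-1].strip() == str(subcase_id):
            target = idx - offset
            if target >= 0 and "$LABEL" in result[target]:
                # Keep everything before "=" and swap only the label text
                head = result[target].split("=", 1)[0]
                result[target] = f"{head}= {new_label}\n"
    return result


sample = [
    "$TITLE = XXXX YYYY\n",
    "1 $SUBTITLE= XXXX YYYY ANSA\n",
    "2 $LABEL = first label\n",
    "3 $DISPLACEMENTS\n",
    "4 $MAGNITUDE-PHASE OUTPUT\n",
    "5 $SUBCASE ID = 30411\n",
]
print("".join(replace_label(sample, 30411, "new label")), end="")
```

With real files, the list would come from `open("test5.txt").readlines()`, and the result could be written once with `writelines` on the output file, instead of reopening the output for every line.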

Python regular expression for r.findall

I am using findall to separate text.
I started with the pattern r'(.*?)(\$.*?\$)' in re.findall, but it doesn't give me the data after the last piece of text found. I miss the '6\n\n'.
How do I get the last piece of text?
Here is my python code:
#!/usr/bin/env python
import re
allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''
for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData, flags=re.DOTALL):
    print(repr(record))
The output I get for this is:
('\n1\n2\n3 here Some text in here \n', '$file1.txt$', '')
('\n4 Some text in here and more ', '$file2.txt$', '')
('\n5 Some text ', '$file3.txt$', '')
(' here \n', '$file3.txt$', '')
('', '', '\n6\n')
('', '', '')
('', '', '')
I really would like this output:
('\n1\n2\n3 here Some text in here \n', '$file1.txt$')
('\n4 Some text in here and more ', '$file2.txt$')
('\n5 Some text ', '$file3.txt$')
(' here \n', '$file3.txt$')
('\n6\n', '', )
Background info in case you need to see the larger picture.
In case you are interested, I'm rewriting this in Python. I have the rest of the code under control; I am just getting too much stuff out of findall.
https://discussions.apple.com/message/21202021#21202021
If I understand correctly from that Apple link you want to do something like:
import re
allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''
def read_file(m):
    # m.group(1) is the filename captured between the two $ signs
    with open(m.group(1)) as f:
        return f.read()
# Sloppy matching :D
# print(re.sub(r"\$(.*?)\$", read_file, allData))
# More precise.
print(re.sub(r"\$(file\d+?\.txt)\$", read_file, allData))
EDIT: As Oscar suggests, make the match more precise.
That is, take the filename between the $s and read that file for the data, which is what the code above does.
Example output:
1
2
3 here Some text in here
I'am file1.txt
4 Some text in here and more
I'am file2.txt
5 Some text
I'am file3.txt
here
I'am file3.txt
6
Files:
==> file1.txt <==
I'am file1.txt
==> file2.txt <==
I'am file2.txt
==> file3.txt <==
I'am file3.txt
To achieve the output you want you need to restrict your pattern to 2 capture groups. (If you use 3 capture groups, you will have 3 elements in every "record").
You could make the second group optional, that should do the job:
r'([^$]*)(\$.*?\$)?'
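
A small sketch of how that two-group pattern behaves with findall. The helper name split_records and the sample string are made up for illustration; the filtering step is an addition, needed because findall also reports an empty match at the end of the string.

```python
import re

pattern = re.compile(r"([^$]*)(\$.*?\$)?")


def split_records(text):
    """Return (text, token) pairs; token is '' when no $...$ follows."""
    # findall also reports a final empty match at the end of the string,
    # so tuples in which both parts are empty are dropped.
    return [rec for rec in pattern.findall(text) if any(rec)]


print(split_records("a $file1.txt$ b $file2.txt$ c"))
# [('a ', '$file1.txt$'), (' b ', '$file2.txt$'), (' c', '')]
```

Because [^$] also matches newlines, no re.DOTALL flag is needed for the leading text.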
Here's one way to solve your substitution problem with findall.
def readfile(name):
    with open(name) as f:
        return f.read()

r = re.compile(r"\$(.+?)\$|(\$|[^$]+)")
print("".join(readfile(filename) if filename else text
              for filename, text in r.findall(allData)))
This one partly solves your problem:
import re
allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6
'''
for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData.strip(), flags=re.DOTALL):
    print([x for x in record if x])
producing output
['1\n2\n3 here Some text in here \n', '$file1.txt$']
['\n4 Some text in here and more ', '$file2.txt$']
['\n5 Some text ', '$file3.txt$']
[' here \n', '$file3.txt$']
['\n6']
[]
Avoid the last empty list with
for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData.strip(), flags=re.DOTALL):
    if [x for x in record if x]:
        print([x for x in record if x])
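
A different technique from the findall answers above, offered only as a sketch: re.split with a capturing group returns the delimiters along with the text between them, so nothing after the last $...$ token is lost. The function name split_on_tokens is hypothetical.

```python
import re


def split_on_tokens(text):
    """Split `text` into plain text and $...$ tokens, keeping both."""
    # A capturing group in the pattern makes re.split return the
    # delimiters as well, so text and tokens alternate in the result.
    return [part for part in re.split(r"(\$.*?\$)", text) if part]


print(split_on_tokens("1\n$file1.txt$\n6\n"))
# ['1\n', '$file1.txt$', '\n6\n']
```

This yields a flat list rather than (text, token) pairs, so pairing them up, if needed, is a separate step.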
