I want to read particular lines from a text file, e.g. all the contents between the two "This contents information" lines.
I have written a script to perform the task, but it's not a good method. Is there a better way to do it?
wanted_lines = [4, 5, 6, 7]
count = 1
with open('test.txt', 'r') as infile:
    for line in infile:
        line = line.strip()
        if count in wanted_lines:
            print(line)
        count += 1
You can compare each line to the sentinel, start outputting once it matches, and stop outputting once it matches again:
with open('test.txt') as infile:
    for output in False, True:
        for line in map(str.rstrip, infile):
            if line == 'This contents information':
                break
            if output:
                print(line)
Demo: https://replit.com/#blhsing/TroubledMysteriousMonitors
You could consider reading the entire text file into a string, and then using a regular expression to extract the contents you want:
import re

with open('test.txt', 'r') as file:
    data = file.read()

contents = re.search(r'^This contents information\n(.*?)\nThis contents information\b', data, flags=re.M|re.S).group(1)
print(contents)
This prints:
City:LK
Country:LL
Postcode:123
You can use split, with "This contents information" as the delimiter.
In the example above, the file will be split into 3 sections, of which we only need to grab the second one (index=1). You can then use .strip() to remove unwanted space.
Code:
with open('test.txt', 'r') as infile:
    text = infile.read()
required_info = text.split("This contents information")[1].strip()
print(required_info)
Output:
City:LK
Country: LL
Postcode:123
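If the marker might be missing, or the file might contain more than one delimited block, the split approach can be wrapped in a small helper. This is only a sketch: sections_between is a hypothetical name, and it assumes the markers delimit blocks in pairs (open, close, open, close, ...).

```python
def sections_between(text, marker):
    """Return the stripped text found between successive pairs of `marker`."""
    parts = text.split(marker)
    # parts[0] precedes the first marker; odd indexes sit between an opening
    # marker and its closing marker. The final part is never included: it is
    # either trailing text or a block that was never closed.
    last = len(parts) - 1
    return [parts[i].strip() for i in range(1, last, 2)]


sample = (
    "header\n"
    "This contents information\n"
    "City:LK\nCountry:LL\nPostcode:123\n"
    "This contents information\n"
    "footer\n"
)
for section in sections_between(sample, "This contents information"):
    print(section)
```

With no marker, or only one, the helper returns an empty list instead of raising IndexError the way indexing with [1] would.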
Instead of prewriting the line numbers, just have a conditional statement that checks for the data you want.
printline = False  # nothing is printed until the start marker is seen
with open('test.txt', 'r') as infile:
    for line in infile:
        line = line.strip()
        if line == "text to look for":
            printline = True
        elif line == "text to end content":
            printline = False
        elif printline:
            print(line)
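The same flag idea can be packaged as a generator so that it is reusable and easy to test. This is a sketch under assumptions: lines_between is a hypothetical name, and the extra checks on the flag are there so it also works when the start and end markers are the same text, as in the question.

```python
def lines_between(lines, start, end):
    """Yield stripped lines that come after a `start` line and before the next `end` line."""
    printing = False
    for line in lines:
        line = line.strip()
        if not printing and line == start:
            printing = True       # opening marker: start yielding from the next line
        elif printing and line == end:
            printing = False      # closing marker: stop yielding
        elif printing:
            yield line


sample = [
    "intro\n",
    "This contents information\n",
    "City:LK\n",
    "Postcode:123\n",
    "This contents information\n",
    "outro\n",
]
marker = "This contents information"
for wanted in lines_between(sample, marker, marker):
    print(wanted)
```

Because it accepts any iterable of lines, an open file object can be passed in directly in place of the sample list.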
I think the best method would be to use regex.
import re

text = ""
with open('test.txt', 'r') as infile:
    text = infile.read()

# Don't forget to replace the marker below with the text that surrounds what you want to find.
# This contents information(.*?)\nThis contents information
# this regex finds everything between these two markers
# example: 'test 123asda test' -> test(.*?)test => ' 123asda '
regex = re.compile(r'This contents information(.*?)\nThis contents information', re.DOTALL)
matches = [m.groups()[0] for m in regex.finditer(text)]
for m in matches:
    print(f'{m.strip()}')
import re

with open("file.txt", "r") as f:
    data = f.readlines()
string = "".join(data)  # join the lines into one string
ls = re.split(r"(\n*?)This contents information\n", string)  # split the string at each marker line
for i in range(len(ls)):  # print each piece of the split
    print(ls[i])
I have a data file and I want to delete the first 3 characters of each word in each line.
Here is the example of my file:
input
"13X5106,18C2295,17C1462,17X4893,14X4215,16C3729,14C1026,END"
"17C2308,14C1030,15C904,20C1602,17C1017,18C1030,END"
"13C2369,20C1505,18X4245,15C1224,14C1031,12C885,17C936,END"
"11C3080,13C4123,16C1180,14C1141,15C932,18C1467,END"
output
"5106,2295,1462,4893,4215,3729,1026,END"
"2308,1030,904,1602,1017,1030,END"
"2369,1505,4245,1224,1031,885,936,END"
"3080,4123,1180,1141,932,1467,END"
I tried to code but the output is not shown the way I want.
file1 = open(r'D:\pythonProject\block1.txt','r')
data = file1.read()
remove_char = [sub[3:] for sub in data]
print(remove_char)
If you use file1.readlines(), you then need to split each line on commas. The one catch is the last item of each line: slicing "END\n" with [3:] leaves only the end-of-line character, so every list ends with a stray "\n". That is easy to get rid of, as shown below:
Code:
file1 = open(r'D:\pythonProject\block1.txt','r')
remove_char = [[s[3:] for s in sub.split(',')] for sub in file1.readlines()]
for the_list in remove_char:
    print(the_list[0:-1])
Output:
['5106', '2295', '1462', '4893', '4215', '3729', '1026']
['2308', '1030', '904', '1602', '1017', '1030']
['2369', '1505', '4245', '1224', '1031', '885', '936']
['3080', '4123', '1180', '1141', '932', '1467']
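A side note on the file path: in an ordinary string literal, \b is the backspace escape, so a Windows path that contains \block1.txt is not the string it appears to be. The sketch below illustrates this with a shortened, hypothetical path rather than the one from the question; a raw string (r'...') or doubled backslashes avoids the problem.

```python
plain = 'D:\block1.txt'     # '\b' is read as a single backspace character
raw = r'D:\block1.txt'      # raw string: the backslash is kept literally
doubled = 'D:\\block1.txt'  # escaping the backslash gives the same result

print(repr(plain))
print(repr(raw))
print(raw == doubled)
```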
I read the file with f.readlines and got rid of the " on each line.
Then each word is split by , and processed as word[3:].
with open("...", "r") as f:
    lines = f.readlines()
lines = map(lambda x: x.replace('"',"").strip("\n").split(","), lines)
res = []
for line in lines:
    new_line = []
    for word in line:
        if word != "END":
            word = word[3:]
        new_line.append(word)
    res.append(",".join(new_line))
res = "\n".join(res)
print(res)
# Output
"""
5106,2295,1462,4893,4215,3729,1026,END
2308,1030,904,1602,1017,1030,END
2369,1505,4245,1224,1031,885,936,END
3080,4123,1180,1141,932,1467,END
"""
You can try this to print each line in a for loop:
file1 = open(r'D:\pythonProject\block1.txt')
data = file1.readlines()
for sub in data:
    # eval() turns the quoted line into a plain string; only use it on files you trust
    line = [j[3:] for i in [eval(sub)] for j in i.split(',')[:-1]]+[eval(sub)[-3:]]
    remove_char = f'"{chr(44).join(line)}"'
    print(remove_char)
Or generator expression:
remove_char = '\n'.join('"'+chr(44).join(j[3:] for i in [eval(s)]
                        for j in i.split(chr(44))[:-1])+','+chr(44).join([eval(s)[-3:]])+'"'
                        for s in open(r'D:\pythonProject\block1.txt').readlines())
print(remove_char)
Output:
"5106,2295,1462,4893,4215,3729,1026,END"
"2308,1030,904,1602,1017,1030,END"
"2369,1505,4245,1224,1031,885,936,END"
"3080,4123,1180,1141,932,1467,END"
Here is a quick solution using a list comprehension:
data = ["13X5106", "18C2295"] # this is a sample list of strings
print([code[3:] for code in data if code != "END"])
This prints the list with the first three characters removed from every string, skipping the "END" string:
['5106', '2295']
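To get from that list comprehension back to the quoted, comma-separated lines shown in the question, the same slicing can be applied per line. This is a sketch: drop_prefix is a hypothetical helper, and it assumes the double quotes are literally present in the file and that "END" should be kept unchanged.

```python
def drop_prefix(line, n=3, keep="END"):
    """Remove the first `n` characters of every comma-separated code in `line`."""
    body = line.strip().strip('"')  # drop the newline and the surrounding quotes
    codes = [code if code == keep else code[n:] for code in body.split(",")]
    return '"' + ",".join(codes) + '"'


sample_lines = [
    '"13X5106,18C2295,17C1462,17X4893,14X4215,16C3729,14C1026,END"\n',
    '"17C2308,14C1030,15C904,20C1602,17C1017,18C1030,END"\n',
]
for sample_line in sample_lines:
    print(drop_prefix(sample_line))
```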
Hello, I have lines like the ones below in a file.
I want to join everything from Text:0 up to 8978 into a single string, and do the same for the other part, i.e. from Text:1 up to 8978.
Text:0
6786993cc89 70hgsksgoop 869368
7897909086h fhsi799hjdkdh 099h
Gsjdh768hhsj dg9978hhjh98 8978
Text:1
8786993cc89 70hgsksgoop 869368
7897909086h fhsi799hjdkdh 099h
Gsjdh768hhsj dg9978hhjh98 8978
I am getting output as
6
7
G
8
7
G
But I want only the first character of string one and of string two, i.e.
6
8
Code is :
file = open('tem.txt', 'r')
lines = file.readlines()
print(lines)
for line in lines:
    line = line.strip()
    linex = line.replace(' ', '')
    print(linex)
    print(linex[0])
I'm not sure exactly what you need, so:
#1. If you only need to print the first character (6), I think your code is right.
#2. If you need to print the first part of the string (before the space), this may help:
line="6786993cc8970hgsksgoop869368 7897909086hfhsi799hjdkdh099h Gsjdh768hhsjdg9978hhjh988978"
print(line[0])
print(line.split(' ')[0])
EDIT
To read a file....
file = open('file.txt', 'r')
Lines = file.readlines()
file.close()
for line in Lines:
    print(line.split(' ')[0])
New EDIT
First you need to merge the lines of each block into one string; after that you can take its first character. Please try this:
file = open('tem.txt', 'r')
lines = file.readlines()
file.close()
linesArray = []
lineTemp = ""
for line in lines:
    if 'Text' in line:
        # a header line closes the block collected so far
        if lineTemp:
            linesArray.append(lineTemp)
            lineTemp = ""
    else:
        lineTemp += line.strip()
linesArray.append(lineTemp)
for newline in linesArray:
    print(newline.split(' ')[0][0])
This should work only if you want to view the first character. Essentially, this code will read your text file, convert multiple lines in the text file to one single string and print out the required first character.
with open(r'tem.txt', 'r') as f:
    data = f.readlines()
line = ''.join(data)
print(line[0])
EDITED RESPONSE
Try using regex. Hope this helps.
import re

pattern = re.compile(r'(Text:[0-9]+\s)+')
with open(r'tem.txt', 'r') as f:
    data = f.readlines()
data = [i for i in data if len(i.strip())>0]
line = ' '.join([i.strip() for i in data if len(i)>0]).strip()
occurences = re.findall(pattern, line)
for i in occurences:
    match_i = re.search(i, line)
    start = match_i.end()
    print(line[start])
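If the goal is to keep each block as a single string, and not only to print its first character, the lines can be grouped under their header. The following is a sketch under assumptions: join_blocks is a hypothetical name, headers are taken to be lines starting with "Text:", and spaces inside the data lines are removed as in the question's own code.

```python
def join_blocks(lines, header_prefix="Text:"):
    """Map each header line to the following lines joined into one string."""
    blocks = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                       # ignore blank lines
        if line.startswith(header_prefix):
            current = line                 # start a new block
            blocks[current] = ""
        elif current is not None:
            blocks[current] += line.replace(" ", "")
    return blocks


sample = [
    "Text:0\n",
    "6786993cc89 70hgsksgoop 869368\n",
    "7897909086h fhsi799hjdkdh 099h\n",
    "Gsjdh768hhsj dg9978hhjh98 8978\n",
    "Text:1\n",
    "8786993cc89 70hgsksgoop 869368\n",
    "7897909086h fhsi799hjdkdh 099h\n",
    "Gsjdh768hhsj dg9978hhjh98 8978\n",
]
blocks = join_blocks(sample)
for header, joined in blocks.items():
    print(header, joined[0])
```

Passing an open file object instead of the sample list works the same way, since the helper only iterates over lines.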
I've learned that we can easily remove blank lines from a file, or strip the blanks from a single string, but how do I remove all blanks at the end of each line in a file?
One way would be to process the file line by line, like:
stripped = []
with open(file) as f:
    for line in f:
        stripped.append(line.strip())  # store the stripped line
Is this the only way to complete the task?
Possibly the ugliest implementation possible, but here's what I just scratched up :0
def strip_str(string):
    last_ind = 0
    split_string = string.split(' ')
    for ind, word in enumerate(split_string):
        if word == '\n':
            return ''.join([split_string[0]] + [ ' {} '.format(x) for x in split_string[1:last_ind]])
        last_ind += 1
Don't know if these count as different ways of accomplishing the task. The first is really just a variation on what you have. The second does the whole file at once, rather than line-by-line.
Map that calls the 'rstrip' method on each line of the file.
import operator

with open(filename) as f:
    # basically the same as (line.rstrip() for line in f)
    for line in map(operator.methodcaller('rstrip'), f):
        print(line)  # do something with the line
Read the whole file and use re.sub():
import re

with open(filename) as f:
    text = f.read()
text = re.sub(r"\s+(?=\n)", "", text)
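One thing worth knowing about the pattern above: \s also matches "\n", so runs of blank lines are swallowed along with the trailing spaces, and whitespace at the very end of the text (with no newline after it) is left alone. If blank lines should survive, a variant restricted to spaces and tabs can be used instead. The sample text below is made up for illustration.

```python
import re

text = "alpha  \n\nbeta \t\ngamma  "

# Pattern from the answer: "\s+" can consume a newline, so the blank line disappears.
collapsed = re.sub(r"\s+(?=\n)", "", text)

# Variant: only spaces and tabs before each line end (re.M lets "$" match there).
trimmed = re.sub(r"[ \t]+$", "", text, flags=re.M)

print(repr(collapsed))
print(repr(trimmed))
```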
If you just want to remove spaces, another solution would be...
line.replace(" ", "")
This is good for removing white space, but note that it removes every space in the line, not only the ones at the end.
I would like to print the total number of empty lines using Python. I have been trying with:
f = open('file.txt','r')
for line in f:
    if (line.split()) == 0:
but I am not able to get the proper output.
I have also tried the line below. It prints the value as 0, and I'm not sure what is wrong with the code.
print "\nblank lines are",(sum(line.isspace() for line in fname))
It prints:
blank lines are 0
There are 7 lines in the file.
There are 46 characters in the file.
There are 8 words in the file.
Since the empty string is a falsy value, you may use .strip():
for line in f:
    if not line.strip():
        ....
The above also treats lines containing only whitespace as empty.
If you want only completely empty lines, you may want to use this instead:
if line in ['\r\n', '\n']:
    ...
Please use a context manager (with statement) to open files:
with open('file.txt') as f:
    print(sum(line.isspace() for line in f))
line.isspace() returns True (== 1) if line doesn't have any non-whitespace characters, and False (== 0) otherwise. Therefore, sum(line.isspace() for line in f) returns the number of lines that are considered empty.
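One possible reason the question's snippet printed 0 (an assumption, since the full script is not shown) is that fname had already been read to the end by the line, word or character counts, which leaves nothing for a second pass. The sketch below reproduces that effect with io.StringIO standing in for an open file; count_blank_lines is a hypothetical helper.

```python
import io


def count_blank_lines(lines):
    """Count lines that are empty or contain only whitespace."""
    return sum(1 for line in lines if not line.strip())


# io.StringIO behaves like an open text file for iteration purposes.
fake_file = io.StringIO("one\n\n  \ntwo\n\nthree")

first = count_blank_lines(fake_file)   # counts the three blank lines
second = count_blank_lines(fake_file)  # iterator already exhausted, so 0
fake_file.seek(0)                      # rewind before reading again
third = count_blank_lines(fake_file)

print(first, second, third)
```

With a real file, either call seek(0) between passes or compute all the counts in a single loop.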
line.split() always returns a list. Both
if line.split() == []:
and
if not line.split():
would work.
FILE_NAME = 'file.txt'
empty_line_count = 0
with open(FILE_NAME, 'r') as fh:
    for line in fh:
        # split() breaks the line into a list of words. For a blank
        # (or whitespace-only) line it returns an empty list, which
        # the comparison with [] detects.
        if line.split() == []:
            empty_line_count += 1
print('Empty Line Count : ', empty_line_count)