I've created an empty text file, and saved some stuff to it. This is what I saved:
Saish ddd TestUser ForTestUse
There is a space before these words. Anyway, I want to know how to read only one word from the text file using Python. This is the code I used:
#Uncommenting the line below the line does literally nothing.
import time
#import mmap, re
print("Loading Data...")
time.sleep(2)
with open("User_Data.txt") as f:
    lines = f.read() ##Assume the sample file has 3 lines
    first = lines.split(None, 1)[0]
print(first)
print("Type user number 1 - 4 for using different user.")
ans = input('Is the name above correct?(y/1 - 4) ')
if ans == 'y':
    print("Ok! You will be called", first)
elif ans == '1':
    print("You are already registered to", first)
elif ans == '2':
    print('Switching to accounts...')
    time.sleep(0.5)
    with open("User_Data.txt") as f:
        lines = f.read() ##Assume the sample file has 3 lines
        second = lines.split(None, 2)[2]
    print(second)
    #Fix the password issue! Very important as this is SECURITY!!!
When I run the code, my output is:
Loading Data...
Saish
Type user number 1 - 4 for using different user.
Is the name above correct?(y/1 - 4) 2
Switching to accounts...
TestUser ForTestUse
As you can see, it displays both "TestUser" and "ForTestUse", while I only want it to display "TestUser".
When you give a limit to split(), all the items from that limit to the end are combined. So if you do
lines = 'Saish ddd TestUser ForTestUse'
split = lines.split(None, 2)
the result is
['Saish', 'ddd', 'TestUser ForTestUse']
If you just want the third word, don't give a limit to split().
second = lines.split()[2]
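A quick way to see the difference, using the sample line from the question (the leading space is kept on purpose):

```python
# Sample content from the question, including the leading space.
lines = " Saish ddd TestUser ForTestUse"

# With maxsplit=2, split() stops after two splits and the rest stays in one piece.
limited = lines.split(None, 2)

# Without a limit, every whitespace-separated word becomes its own item.
words = lines.split()

print(limited)   # ['Saish', 'ddd', 'TestUser ForTestUse']
print(words[2])  # TestUser
```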
You can call split() directly, without passing None:
lines.split()[2]
I understand you're passing (None, 2) because you want to get None if there is no value at index 2.
A simple way to check whether the index is available in the list:
Python 2
2 in zip(*enumerate(lines.split()))[0]
Python 3
2 in list(zip(*enumerate(lines.split())))[0]
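If the goal is only to avoid an IndexError, a length check is probably easier to read than the zip/enumerate test. This is a minimal sketch: word_at is a hypothetical helper name, and returning None for a missing word is an assumption about what the asker wants.

```python
def word_at(text, index):
    """Return the word at position `index`, or None if the text has fewer words."""
    words = text.split()
    if 0 <= index < len(words):
        return words[index]
    return None  # assumed fallback for a missing word

print(word_at("Saish ddd TestUser ForTestUse", 2))  # TestUser
print(word_at("Saish ddd", 2))                      # None
```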
Program workflow:
Open "asigra_backup.txt" file and read each line
Search for the exact string: "Errors: " + {any value ranging from 1 - 100}. e.g "Errors: 12"
When a match is found, open a separate .txt file in write&append mode
Write the match found. Example: "Errors: 4"
In addition to the write above, append the next 4 lines below the match found in step 3, as that is additional log information
What I've done:
Tested a regular expression that matches my sample data on regex101.com
Used list comprehension to find all matches in my test file
Where I need help (please):
Figuring out how to append additional 4 lines of log information below each match string found
CURRENT CODE:
result = [line.split("\n")[0] for line in open('asigra_backup.txt') if re.match('^Errors:\s([1-9]|[1-9][0-9]|100)',line)]
print(result)
CURRENT OUTPUT:
['Errors: 1', 'Errors: 128']
DESIRED OUTPUT:
Errors: 1
Pasta
Fish
Dog
Doctonr
Errors: 128
Lemon
Seasoned
Rhinon
Goat
SAMPLE .TXT FILE
Errors: 1
Pasta
Fish
Dog
Doctonr
Errors: 128
Lemon
Seasoned
Rhinon
Goat
Errors: 0
Rhinon
Cat
Dog
Fish
For those wanting additional clarification, as it may help the next person, this was my final solution:
import re
import sys
from itertools import chain

def errors_to_file(self):
    """
    Opens file containing Asigra backup logs, "asigra_backup.txt", and returns a list of all errors within the log.
    Uses a regular expression match conditional on each line within the asigra backup log file. Error number range is 1 - 100.
    Formats errors log by appending a space every 10th element in the errors log list
    Writes formatted error log to a file in current directory: "asigra_errors.txt"
    """
    # "asigra_backup.txt" contains log information from the performed backup.
    with open('asigra_backup.txt', "r") as f:
        lines0 = [line.rstrip() for line in f]
    # empty list that is appended with errors found in the log
    lines = []
    for i, line in enumerate(lines0):
        if re.match(r'^Errors:\s([1-9]|[1-9][0-9]|100)', line):
            lines.extend(lines0[i:i+9])
    if len(lines) == 0:
        print("No errors found")
        print("Gracefully exiting")
        sys.exit(1)
    k = ''
    N = 9
    formatted_errors = list(chain(*[lines[i : i+N] + [k]
                                    if len(lines[i : i+N]) == N
                                    else lines[i : i+N]
                                    for i in range(0, len(lines), N)]))
    with open("asigra_errors.txt", "w") as e:
        for i, line in enumerate(formatted_errors):
            e.write(f"{line}\n")
Huge thank you to those that answered my question.
Using a better regex with re.findall can make this easier. The following regex detects each Errors: line with a value from 1 to 100, together with the 4 lines that follow it.
import re
with open('asigra_backup.txt', 'r') as f:
    regex_matches = re.findall(r'(?:[\r\n]+|^)((Errors:\s*([1-9][0-9]?|100))(?:[\r\n\s\t]+.*){4})', f.read())
with open('separate.txt', 'a') as out:
    out.write('\n' + '\n'.join([i[0] for i in regex_matches]))
To access the error lines or the error numbers, you can use the following:
error_rows = [i[1] for i in regex_matches]
error_numbers = [i[2] for i in regex_matches]
print(error_rows)
print(error_numbers)
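For comparison, the "match plus the next 4 lines" step can also be sketched line by line, without a multi-line regex. The function name errors_with_context and the extra parameter are assumptions made for illustration. The pattern is the one from the question, so, as in the question's own output, "Errors: 128" also matches because re.match only anchors at the start of the line.

```python
import re

# Pattern taken from the question; it is not anchored at the end of the line.
ERROR_RE = re.compile(r'Errors:\s([1-9]|[1-9][0-9]|100)')

def errors_with_context(lines, extra=4):
    """Return each matching line followed by the next `extra` lines."""
    stripped = [line.rstrip("\n") for line in lines]
    result = []
    for i, line in enumerate(stripped):
        if ERROR_RE.match(line):
            # Slice from the match itself up to `extra` lines after it.
            result.extend(stripped[i:i + extra + 1])
    return result

# Sample data copied from the question.
sample = """Errors: 1
Pasta
Fish
Dog
Doctonr
Errors: 128
Lemon
Seasoned
Rhinon
Goat
Errors: 0
Rhinon
Cat
Dog
Fish
""".splitlines(keepends=True)

print("\n".join(errors_with_context(sample)))
```

To write the result instead of printing it, the returned list can be joined with newlines and written to a file opened in append mode.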
I wrote some code which prints the output as requested. The code works when an Errors: 1 line is added as the last line. See the text I have parsed:
data_to_parse = """
Errors: 56
Pasta
Fish
Dog
Doctonr
Errors: 0
Lemon
Seasoned
Rhinon
Goat
Errors: 45
Rhinon
Cat
Dog
Fish
Errors: 34
Rhinon
Cat
Dog
Fish1
Errors: 1
"""
See the code which gives the desired output without using regex. Indices have been used to get desired data.
lines = data_to_parse.splitlines()
errors_indices = []
i = 0
k = 0
for line in lines: # the positions of the Errors: lines are saved in the list errors_indices.
    if 'Errors:' in line:
        errors_indices.append(i)
    i = i+1
#counter = False
while k < len(errors_indices):
    counter = False # It is needed to find the indices when Errors: 0 is hit.
    for j in range(errors_indices[k-1], errors_indices[k]):
        if 'Errors:' in lines[j]:
            lines2 = lines[j].split(':')
            lines2_val = lines2[1].strip()
            if int(lines2_val) != 0:
                print(lines[j])
            if int(lines2_val) == 0:
                counter = True
        elif 'Errors:' not in lines[j] and counter == False:
            print(lines[j])
    k=k+1
I have run the code a few times to check that it works properly. It looks like it gives the requested output. See the output when the code is run:
I'm new to Python and I'm trying to output the length of a list as a single integer, eg:
l1 = ['a', 'b', 'c']
len(l1) = 3
However, it is printing on cmdline with 1s down the page, eg:
1
1
1
1
1
1
etc
How can I get it to just output the number rather than a list of 1s?
(Here's the code:)
def Q3():
    from datetime import datetime, timedelta
    inputauth = open("auth.log", "r")
    authStrings = inputauth.readlines()
    failedPass = 'Failed password for'
    for line in authStrings:
        time = line[7:15]
        dateHour = line[0:9]
        countAttack1 = []
        if time in line and failedPass in line:
            if dateHour == 'Feb  3 08':
                countAttack1.append(time)
                length1 = len(countAttack1)
                print(length1)
Ideally, I'd like it to output the number in a print so that I could format it, for example:
print("Attack 1: " + length1)
I think you are looping, and the ifs are inside the loop. If so, just print the length outside the loop.
Please share the complete code for a better answer.
Well, as Syed Abdul Wahab said, the problem is that the list is recreated on every iteration. The print therefore reports "1", because that is the actual length of the list at that point.
The other problem, the repeated printing, is similar: you are printing on every pass through the loop.
The solution is then simple: initialize the list before the loop, and report the length after the loop.
def Q3():
    from datetime import datetime, timedelta
    inputauth = open("auth.log", "r")
    authStrings = inputauth.readlines()
    failedPass = 'Failed password for'
    countAttack1 = [] # after this line the countAttack will be empty
    for line in authStrings:
        time = line[7:15]
        dateHour = line[0:9]
        if time in line and failedPass in line:
            if dateHour == 'Feb  3 08':
                countAttack1.append(time)
    length1 = len(countAttack1)
    print("Attack 1: " + str(length1))
I'd also like to take a bit of time to point you to string formatting. While the documentation is complex, it will make printing much easier. The print above becomes:
print("Attack 1: {0}".format(length1))
Further analysis of the code reveals a peculiarity: you check whether time is in the line string. However, just a few lines above you create time from a slice of line, so it will always be inside line. (This holds even when line is shorter than expected: slicing does not raise an error, and the shorter or empty slice is still a substring of line.) So that if statement should be simplified to:
if failedPass in line:
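Putting these points together, the count can also be computed without building a list at all. The sketch below is illustrative only: count_failed is a hypothetical helper, the sample log lines are made up, and it assumes the usual syslog layout in which a single-digit day is padded with an extra space.

```python
def count_failed(lines, date_hour, marker="Failed password for"):
    """Count lines that start with `date_hour` and contain `marker`."""
    return sum(1 for line in lines
               if line.startswith(date_hour) and marker in line)

# Made-up sample lines in syslog layout (not taken from a real auth.log).
sample = [
    "Feb  3 08:35:12 host sshd[101]: Failed password for root from 10.0.0.1",
    "Feb  3 08:36:40 host sshd[102]: Accepted password for alice from 10.0.0.2",
    "Feb  3 09:01:05 host sshd[103]: Failed password for root from 10.0.0.1",
    "Feb  3 08:59:59 host sshd[104]: Failed password for bob from 10.0.0.3",
]

length1 = count_failed(sample, "Feb  3 08")
print("Attack 1: {0}".format(length1))  # Attack 1: 2
```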
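Here is a function that prints the length once, after the loop:
def print_length(lines):
    failedPass = 'Failed password for'
    countAttack1 = []
    for line in lines:
        time = line[7:15]
        dateHour = line[0:9]
        if failedPass in line and dateHour == 'Feb  3 08':
            countAttack1.append(time)
    length1 = len(countAttack1)
    print(length1)
print_length(open("auth.log", "r").readlines())
>>>Print length of the List.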
So the question basically gives me 19 DNA sequences and wants me to make a basic text table. The first column has to be the sequence ID, the second column the length of the sequence, the third the number of "A"s, the 4th "G"s, the 5th "C"s, the 6th "T"s, the 7th %GC, and the 8th whether or not it has "TGA" in the sequence. Then I get all these values and write a table to "dna_stats.txt".
Here is my code:
fh = open("dna.fasta","r")
Acount = 0
Ccount = 0
Gcount = 0
Tcount = 0
seq=0
alllines = fh.readlines()
for line in alllines:
    if line.startswith(">"):
        seq+=1
        continue
    Acount+=line.count("A")
    Ccount+=line.count("C")
    Gcount+=line.count("G")
    Tcount+=line.count("T")
    genomeSize=Acount+Gcount+Ccount+Tcount
    percentGC=(Gcount+Ccount)*100.00/genomeSize
    print "sequence", seq
    print "Length of Sequence",len(line)
    print Acount,Ccount,Gcount,Tcount
    print "Percent of GC","%.2f"%(percentGC)
    if "TGA" in line:
        print "Yes"
    else:
        print "No"
    fh2 = open("dna_stats.txt","w")
    for line in alllines:
        splitlines = line.split()
        lenstr=str(len(line))
        seqstr = str(seq)
        fh2.write(seqstr+"\t"+lenstr+"\n")
I found that you have to convert the variables into strings. I have all of the values calculated correctly when I print them out in the terminal. However, I keep getting only 19 for the first column, when it should go 1,2,3,4,5,etc. to represent all of the sequences. I tried it with the other variables and it just got the total amounts of the whole file. I started trying to make the table but have not finished it.
So my biggest issue is that I don't know how to get the values for the variables for each specific line.
I am new to python and programming in general so any tips or tricks or anything at all will really help.
I am using python version 2.7
Well, your biggest issue:
for line in alllines: #1
    ...
    fh2 = open("dna_stats.txt","w")
    for line in alllines: #2
        ....
Indentation matters. This says "for every line (#1), open a file and then loop over every line again(#2)..."
De-indent those things.
This puts the info in a dictionary as you go and allows for DNA sequences to go over multiple lines:
from __future__ import division # ensure things like 1/2 is 0.5 rather than 0
from collections import defaultdict
fh = open("dna.fasta","r")
alllines = fh.readlines()
fh2 = open("dna_stats.txt","w")
seq=0
data = dict()
for line in alllines:
    if line.startswith(">"):
        seq+=1
        data[seq]=defaultdict(int) #default value will be zero if key is not present hence we can do +=1 without originally initializing to zero
        data[seq]['seq']=seq
        previous_line_end = "" #TGA might be split across lines
        continue
    data[seq]['Acount']+=line.count("A")
    data[seq]['Ccount']+=line.count("C")
    data[seq]['Gcount']+=line.count("G")
    data[seq]['Tcount']+=line.count("T")
    #the counts are already cumulative, so assign the total rather than adding to it
    data[seq]['genomeSize']=data[seq]['Acount']+data[seq]['Gcount']+data[seq]['Ccount']+data[seq]['Tcount']
    line_over = previous_line_end + line[:3]
    data[seq]['hasTGA']= data[seq]['hasTGA'] or ("TGA" in line) or ("TGA" in line_over)
    previous_line_end = str.strip(line[-4:]) #save previous_line_end for next line removing new line character.
for seq in data.keys():
    data[seq]['percentGC']=(data[seq]['Gcount']+data[seq]['Ccount'])*100.00/data[seq]['genomeSize']
    #columns in the order asked for: id, length, A, G, C, T, %GC, has TGA; one row per line
    s = '%(seq)d, %(genomeSize)d, %(Acount)d, %(Gcount)d, %(Ccount)d, %(Tcount)d, %(percentGC).2f, %(hasTGA)s\n'
    fh2.write(s % data[seq])
fh.close()
fh2.close()
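For readers on Python 3, the same idea can be sketched by first joining each record into one string, which removes the need to track a TGA that is split across lines. The function names fasta_stats and write_table, the tab-separated layout, and the upper-casing of the bases are all assumptions made for this illustration.

```python
def fasta_stats(lines):
    """Return one row per sequence: id, length, A, G, C, T, %GC, has TGA."""
    sequences = []  # one string per record, joined across lines
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            sequences.append("")            # a header starts a new record
        elif line and sequences:
            sequences[-1] += line.upper()   # assumption: treat bases case-insensitively

    rows = []
    for seq_id, seq in enumerate(sequences, start=1):
        a, g, c, t = (seq.count(base) for base in "AGCT")
        total = a + g + c + t
        percent_gc = (g + c) * 100.0 / total if total else 0.0
        rows.append((seq_id, len(seq), a, g, c, t,
                     "%.2f" % percent_gc, "Yes" if "TGA" in seq else "No"))
    return rows

def write_table(rows, path):
    """Write the rows as a tab-separated table, one sequence per line."""
    with open(path, "w") as out:
        for row in rows:
            out.write("\t".join(str(value) for value in row) + "\n")

# Tiny made-up input; the first record spans two lines.
sample = [">seq1\n", "ATGA\n", "CC\n", ">seq2\n", "GGTT\n"]
for row in fasta_stats(sample):
    print(row)
```

With a real file, the rows could be produced with fasta_stats(open("dna.fasta")) and saved with write_table(rows, "dna_stats.txt").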
I have 15 lines in a log file and I want to read, for example, the 4th and 10th lines through Python and display them in the output, saying that this string was found:
abc
def
aaa
aaa
aasd
dsfsfs
dssfsd
sdfsds
sfdsf
ssddfs
sdsf
f
dsf
s
d
Please suggest, with code, how to achieve this in Python.
To elaborate on this example: the first string (String A) is unique and can be found easily in the log file. The next one, String B, comes within 40 lines of the first, but it also occurs in lots of other places in the log file. So I need to look for String B only within the 40 lines after String A, and print that these strings were found.
Also, I can't use Python's with statement, as it gives me errors like 'with' will become a reserved keyword in Python 2.6. I am using Python 2.5.
You can use this:
fp = open("file")
for i, line in enumerate(fp):
    if i == 3:
        print line
    elif i == 9:
        print line
        break
fp.close()
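from __future__ import with_statement  # needed for "with" on Python 2.5

def bar(start, end, search_term):
    with open("foo.txt") as fil:
        # slice with start:end and strip the newlines before comparing
        lines = [line.strip() for line in fil.readlines()[start:end]]
    if search_term in lines:
        print search_term + " was found"

>>> bar(4, 10, "dsfsfs")
dsfsfs was found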
#list of random characters
from random import randint
a = list(chr(randint(0,100)) for x in xrange(100))
#look for this
lookfor = 'b'
for element in xrange(100):
    if lookfor==a[element]:
        print a[element],'on',element
#b on 33
#b on 34
is one easy to read and simple way to do it. Can you give part of your log file as an example? There are other ways that may work better :).
after edits by author:
The easiest thing you can do then is:
looking_for = 'findthis'
i = 1
for line in open('filename.txt','r'):
    if looking_for == line.rstrip('\n'): # strip the newline before comparing
        print i, line
    i+=1
It's efficient and easy :)
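None of the snippets above covers the second part of the question, finding String B only within the 40 lines that follow String A. A minimal sketch of that search is shown below. The function name find_b_after_a, the window parameter and the sample log are assumptions made for illustration. The function works on a list of lines and does not use the with statement.

```python
def find_b_after_a(lines, string_a, string_b, window=40):
    """Return (index_a, index_b) as zero-based line numbers, or None."""
    for i, line in enumerate(lines):
        if string_a in line:
            # Look only at the `window` lines that follow string A.
            for j in range(i + 1, min(i + 1 + window, len(lines))):
                if string_b in lines[j]:
                    return i, j
            return None  # A is unique, so there is nothing more to search
    return None

# Made-up log content; STRING_A and STRING_B stand in for the real strings.
log = ["abc", "def", "aaa", "STRING_A", "aasd", "dsfsfs", "STRING_B", "sdfsds"]
print(find_b_after_a(log, "STRING_A", "STRING_B"))  # (3, 6)
```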
I am trying to write a Python program that reads each line from an infile. This infile is a list of dates. I want to test each line with a function isValid(), which returns true if the date is valid, and false if it is not. If the date is valid, it is written into an output file. If it is not, invalid is written into the output file. I have the function, and all I want to know is the best way to test each line with the function. I know this should be done with a loop, I'm just uncertain how to set up the loop to test each line in the file one-by-one.
Edit: I now have a program that basically works, but I am getting strange results in the output file. Hopefully those with Python 3 experience can explain why.
def main():
    datefile = input("Enter filename: ")
    t = open(datefile, "r")
    c = t.readlines()
    ofile = input("Enter filename: ")
    o = open(ofile, "w")
    for line in c:
        b = line.split("/")
        e = b[0]
        f = b[1]
        g = b[2]
        text = str(e) + " " + str(f) + ", " + str(g)
        text2 = "The date " + text + " is invalid"
        if isValid(e,f,g) == True:
            o.write(text)
        else:
            o.write(text2)

def isValid(m, d, y):
    if m == 1 or m == 3 or m == 5 or m == 7 or m == 8 or m == 10 or m == 12:
        if d is range(1, 31):
            return True
    elif m == 2:
        if d is range(1,28):
            return True
    elif m == 4 or m == 6 or m == 9 or m == 11:
        if d is range(1,30):
            return True
    else:
        return False
This is the output I'm getting.
The date 5 19, 1998
is invalidThe date 7 21, 1984
is invalidThe date 12 7, 1862
is invalidThe date 13 4, 2000
is invalidThe date 11 40, 1460
is invalidThe date 5 7, 1970
is invalidThe date 8 31, 2001
is invalidThe date 6 26, 1800
is invalidThe date 3 32, 400
is invalidThe date 1 1, 1111
is invalid
In the most recent versions of Python you can use the context management features that are implicit for files:
results = list()
with open(some_file) as f:
    for line in f:
        if isValid(line, date):
            results.append(line)
... or even more tersely with a list comprehension:
with open(some_file) as f:
    results = [line for line in f if isValid(line, date)]
For progressively older versions of Python you might need to explicitly open and close the file (still iterating implicitly with for line in file:), or iterate more explicitly with f.readline() or f.readlines() (plural), depending on whether you want to iterate line-by-line or "slurp" in the entire file, with the memory overhead that implies.
Also note that you may wish to strip the trailing newlines off these file contents (perhaps by calling line.rstrip('\n') --- or possibly just line.strip() if you want to eliminate all leading and trailing whitespace from each line).
(Edit based on additional comment to previous answer):
The function signature isValid(m,d,y) suggests that you're passing a date to this function (month, day, year), but that doesn't make sense given that you must also, somehow, pass in the data to be validated (a line of text, a string, etc.).
To help you further you'll have to provide more information (preferably the source, or a relevant portion of the source, of this "isValid()" function).
In my initial answer I was assuming that your "isValid()" function was merely scanning for any valid date in its single argument. I've modified my code examples to show how one might pass a specific date, as a single argument, to a function which uses this calling signature: "isValid(somedata, some_date)."
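Judging by the code and output posted in the question, three things seem to go wrong: the values passed to isValid() are strings, so they never equal the integer months; d is range(...) tests object identity rather than membership, so it is always False; and the year still carries the line's trailing newline while nothing adds a newline after "is invalid". The sketch below shows one way to address all three. The names is_valid, format_line and DAYS_IN_MONTH are hypothetical, and leap years are ignored, as they are in the question.

```python
# Days per month; February is fixed at 28 because leap years are ignored here.
DAYS_IN_MONTH = {1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
                 7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}

def is_valid(m, d, y):
    """Return True if month and day form a valid date (the year is not checked)."""
    return m in DAYS_IN_MONTH and 1 <= d <= DAYS_IN_MONTH[m]

def format_line(line):
    """Turn an 'm/d/y' line into the text to write to the output file."""
    # strip() removes the trailing newline; int() makes the comparisons numeric.
    m, d, y = (int(part) for part in line.strip().split("/"))
    text = "{0} {1}, {2}".format(m, d, y)
    if is_valid(m, d, y):
        return text + "\n"
    return "The date " + text + " is invalid\n"

for raw in ["5/19/1998\n", "13/4/2000\n", "8/31/2001\n"]:
    print(format_line(raw), end="")
```

In the question's main(), each o.write(...) call could then become o.write(format_line(line)).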
with open(fname) as f:
    for line in f.readlines():
        test(line)