Open file and read to find first instance of number. Python - python

Im stuck on a problem for an assignment, I need to write a program that opens a file on my computer, and scans that file for the first instance of a number. Once it is found it will return
The first number in , filenm is x
otherwise it will say there is no number in filenm.
My code so far is below:
When i run it no matter what it always says theres no number :(
filenm = raw_input("Enter a file name: ")
datain=open(filenm,"r")
try:
c=datain.read(1)
result = []
for line in datain:
c=datain.read(1)
while int(c) >= 0:
c = datain.read(1)
result.append(c)
except:
pass
if len(result) > 0:
print "The first number is",(" ".join(result))+" . "
else:
print "There is no number in" , filenm + "."

That's all you need:
import re
with open("filename") as f:
for line in f:
s=re.search(r'\d+',line)
if s:
print(s.group())
break

open the file;
read it in a loop char-by-char;
check if the char is digit, print whatever you want;
it means there are no numbers in the file, if end-of-file is reached, print "no numbers"
Use <string>.isdigit() method to check if the given string (a single character in your case) is a digit.

I don't recommend mixing iterating through a file
for line in datain:
with using the read method (or any similar one)
c=datain.read(1)
Just stick with one or the other. Personally, I would go with iterating here.

readlines() method returns a list of all the lines in the file. You can then iterate trough the list of characters in each line:
filenm = raw_input("Enter a file name: ")
datain=open(filenm,"r")
try:
result = []
for line in datain.readlines():
print 'line: ' + line
for each in line:
try:
# attempt casting a number to int
int(each)
# if it worked it add it to the result list
result.append(each)
except:
pass
except:
pass
print result
if len(result) > 0:
print "The first number is",(" ".join(result[0]))+". "
else:
print "There is no number in" , filenm + "."
This will only work with the first number character it finds, not sure if you actually need to extract multi digit numbers.

My thoughts:
1) As others noted, don't mask the exception. It would be better to let it be thrown - at least that way you find out what went wrong, if something went wrong.
2) You do want to read the file a line at a time, using for line in file:. The reason for this is that the numbers you want to read are basically "words" (things separated by spaces), and there isn't a built-in way to read the file a word at a time. file.read(1) reads a single byte (i.e. character, for an ASCII file), and then you have to put them together into words yourself, which is tedious. It's much easier to tell Python to split a line into words. At least, I'm assuming here that you're not supposed to pick out the "20" from "spam ham20eggs 10 lobster thermidor".
.readlines() is somewhat redundant; it's a convenience for making a list of the lines in the file - but we don't need that list; we just need the lines one at a time. There is a function defined called .xreadlines() which does that, but it's deprecated - because we can just use for line in file:. Seriously - just keep it simple.
3) int in Python will not return a negative value if the input is non-numeric. It will throw an exception. Your code does not handle that exception properly, because it would break out of the loop. There is no way to tell Python "keep going from where you threw the exception" - and there shouldn't be, because how is the rest of the code supposed to account for what happened?

Actually your code isn't too far off. There are a number of problems. One big one is that the try/except hides errors from you which might have help you figure things out yourself. Another was that you're reading the file with a mixture of a line at a time (and ignoring its contents entirely) as well as a character at a time.
There also seems to be a misunderstand on your part about what the int() function does when given a non-numeric character string, what it does is raise an exception rather than returning something less than 0. While you could enclose a call to it it in a try/except with the except being specifically for ValueError, in this case however it would be easier to just check ahead of time to see if the character is a digit since all you want to do is continue doing that until one that isn't is seen.
So here's one way your code could be revised that would address the above issues:
import string
filenm = raw_input("Enter a file name: ")
datain = open(filenm,"r")
# find first sequence of one or more digits in file
result = []
while True:
c = datain.read(1)
while c in string.digits: # digit?
result.append(c)
c = datain.read(1)
if c == "" or len(result) > 0: # end-of-file or a number has been found
break # get out of loop
if len(result) > 0:
print "The first number is'", "".join(result) + "'."
else:
print "There is no number in'", filenm + "'."
close(datain)

Related

IndexError: list index out of range. (trying to find and substitute elements of one txt file with another txt file

beginning python programmer here. I am currently stuck with writing a small python script that would open a txt source file, find a specific number in that source file with a regular expression (107.5 in this case) and ultimately replace that 107.5 with a new number. the new number comes from a second txt file which contains 30 numbers. Each time a number has been replaced, the script uses the next number for its replacement. Although the command prompt does seem to print a successfull find and replace, "an IndexError: list index out of range" occurs after the 30th loop...
My hunge is that I somehow have to limit my loop with something like "for i in range x". However I am not sure which list this should be and how I can incorporate that loop limitation in my current code. Any help is much appreciated!
nTemplate = [" "]
output = open(r'C:\Users\Sammy\Downloads\output.txt','rw+')
count = 0
for line in templateImport:
priceValue = re.compile(r'107.5')
if priceValue.sub(pllines[count], line) != None:
priceValue.sub(pllines[count], line)
nTemplate.append(line)
count = count + 1
print('found a match. replaced ' + '107.5 ' + 'with ' + pllines[count] )
print(nTemplate)
else:
nTemplate.append(line)
The IndexError is raised because you are incrementing count in each iteration of the loop, but haven't added an upper limit based on how many values the pllines list actually contains. You should break out of the loop when it reaches len(pllines) in order to avoid the error.
Another issue which you may not have noticed is with your usage of the re.sub() method. It returns a new string with the appropriate replacements, and does not modify the original.
If the pattern doesn't exist in the string, it'll return the original itself. So your nTemplate list probably never had any of the replaced strings appended to it. Unless you need to do some other actions if the pattern was found in the line, you can do away with the if condition (as I have in the example below).
Since the priceValue object is the same for all lines, it can be moved outside the loop.
The following code should work:
nTemplate = [" "]
output = open(r'C:\Users\Sammy\Downloads\output.txt','rw+')
count = 0
priceValue = re.compile(r'107.5')
for line in templateImport:
if count == len(pllines):
break
nTemplate.append(priceValue.sub(pllines[count], line))
count = count + 1
print(nTemplate)

Breaking single line data to multi-line data

Working on project where i am given raw log data and need to parse it out to a readable state, know enought with python to break off all the undeed part and just left with raw data that needs to be split and formated, but cant figure out a way to break it apart if they put multiple records on the same line, which does not always happen.
This is the string value i am getting so far.
*190205*12,6000,0000000,12,6000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000*190206*01,2050,0100550,01,4999,0000000,,
I need to break it out apart so that each line starts with the number value, but since i can assume there will only be 1 or two of them i cant think of a way to do it, and the number of other comma seperate values after it vary so i cant go by length. this is what i am looking to get to use will further operations with data from the above example.
*190205*12,6000,0000000,12,6000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000
*190206*01,2050,0100550,01,4999,0000000,,
txt = "*190205*12,6000,0000000,12,6000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000*190206*01,2050,0100550,01,4999,0000000,,"
output = list()
i = 0
x = txt.split("*")
while i < len(x):
if len(x[i]) == 0:
i += 1
continue
print ("*{0}*{1}".format(x[i],x[i+1]))
output.append("*{0}*{1}".format(x[i],x[i+1]))
i += 2
Use split to tokezine the words between *
Print two constitutive tokens
You can use regex:
([*][0-9]*[*])
You can catch the header part with this and then split according to it.
Same answer as #mujiga but I though a dict might better for further operations
txt = "*190205*12,6000,0000000,12,6000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000*190206*01,2050,0100550,01,4999,0000000,,"
datadict=dict()
i=0
x=txt.split("*")
while i < len(x):
if len(x[i]) == 0:
i += 1
continue
datadict[x[i]]=x[i+1]
i += 2
Adding on to #Ali Nuri Seker's suggestion to use regex, here' a simple one lacking lookarounds (which might actually hurt it in this case)
>>> import re
>>> string = '''*190205*12,6000,0000000,12,6000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000*190206*01,2050,0100550,01,4999,0000000,,'''
>>> print(re.sub(r'([\*][0-9,]+[\*]+[0-9,]+)', r'\n\1', string))
#Output
*190205*12,6000,0000000,12,6000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000,13,2590,0000000,13,7000,0000000,13,7000,0000000
*190206*01,2050,0100550,01,4999,0000000,,

Get a value from a string in python

Program Details:
I am writing a program for python that will need to look through a text file for the line:
Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.
Problem:
Then after the program has found that line, it will then store the line into an array and get the value 19.612545, from f = 19.612545.
Question:
I so far have been able to store the line into an array after I have found it. However I am having trouble as to what to use after I have stored the string to search through the string, and then extract the information from variable f. Does anyone have any suggestions or tips on how to possibly accomplish this?
Depending upon how you want to go at it, CosmicComputer is right to refer you to Regular Expressions. If your syntax is this simple, you could always do something like:
line = 'Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.'
splitByComma=line.split(',')
fValue = splitByComma[1].replace('f= ', '').strip()
print(fValue)
Results in 19.612545 being printed (still a string though).
Split your line by commas, grab the 2nd chunk, and break out the f value. Error checking and conversions left up to you!
Using regular expressions here is maddness. Just use string.find as follows: (where string is the name of the variable the holds your string)
index = string.find('f=')
index = index + 2 //skip over = and space
string = string[index:] //cuts things that you don't need
string = string.split(',') //splits the remaining string delimited by comma
your_value = string[0] //extracts the first field
I know its ugly, but its nothing compared with RE.

Brain going to explode. Need to search external file and return muliple results

I'm not going to lie. I'm trying to do an assignment and I'm being beaten by it.
I need to have python prompt the user to enter a room number, then lookup that room number in a supplied .txt file which has csv [comma-separated values], and then show multiple results if there are any.
I was able to get python to return the first result ok, but then it stops. I got around the csv thing by using a hash command and .split (I would rather read it as a csv although I couldn't get it to work.) I had to edit the external file so instad of the data being seperated by commas it was seperated by semicolons, which is not ideal as I am not supposed to be messing with the supplied file.
Anyhow...
My external file looks like this:
roombookings.txt
6-3-07;L1;MSW001;1
6-3-07;L2;MSP201;1
6-3-07;L3;WEB201;1
6-3-07;L4;WEB101;1
6-3-07;L5;WEB101;1
7-3-07;L1;MSW001;2
7-3-07;L2;MSP201;2
7-3-07;L3;WEB201;2
7-3-07;L4;WEB101;2
7-3-07;L5;WEB101;2
8-3-07;L1;WEB101;1
8-3-07;L2;MSP201;3
Here's what my code looks like:
roomNumber = (input("Enter the room number: "))
def find_details(id2find):
rb_text = open('roombookings.txt', 'r')
each_line = rb_text.readline()
while each_line != '':
s = {}
(s['Date'], s['Room'], s['Course'], s['Stage']) = each_line.split(";")
if id2find == (s['Room']):
rb_text.close()
return(s)
each_line = rb_text.readline()
rb_text.close()
room = find_details(roomNumber)
if room:
print("Date: " + room['Date'])
print("Room: " + room['Room'])
print("Course: " + room['Course'])
print("Stage: " + room['Stage'])
If i run the program, I get prompted for a room number. If I enter, say, "L1"
I get:
Date: 6-3-07
Room: L1
Course: MSW001
Stage: 1
I should get 3 positive matches. I guess my loop is broken? Please help me save my sanity!
Edit. I've tried the solutions here but keeps either crashing the program (I guess I'm not closing the file properly?) or giving me errors. I've seriously been trying to sort this for 2 days and keep in mind I'm at a VERY basic level. I've read multiple textbooks and done many Google searches but it's all still beyond me, I'm afraid. I appreciate the assistance though.
Your code does "return(s)" the first time the "id2find" argument is exactly equal to the room.
If you want multiple matches, you could create an empty list before entering the loop, append every match to the list WITHOUT returning, return the list, and then use a for-loop to print out each match.
First. For iterating over lines in the file use next:
for line in rb_text:
# do something
Second. Your function returns after first match. How can it match more then one record? Maybe you need something like:
def find_details(id2find):
rb_text = open('roombookings.txt', 'r')
for line in rb_text:
s = {}
(s['Date'], s['Room'], s['Course'], s['Stage']) = line.split(";")
if id2find == (s['Room']):
yield s
rb_text.close()
And then:
for room in find_details(roomNumber):
print("Date: " + room['Date'])
print("Room: " + room['Room'])
print("Course: " + room['Course'])
print("Stage: " + room['Stage'])
And yes, you better use some CSV parser.
Your problem is the return(s) in find_details(). As soon as you have found an entry, you are leaving the function. You do not even close the file then. One solution is to use an empty list at the beginning, e.g results = [], and then append all entries which matches your requirements (results.append(s)).

Workarounds when a string is too long for a .join. OverflowError occurs

I'm working through some python problems on pythonchallenge.com to teach myself python and I've hit a roadblock, since the string I am to be using is too large for python to handle. I receive this error:
my-macbook:python owner1$ python singleoccurrence.py
Traceback (most recent call last):
File "singleoccurrence.py", line 32, in <module>
myString = myString.join(line)
OverflowError: join() result is too long for a Python string
What alternatives do I have for this issue? My code looks like such...
#open file testdata.txt
#for each character, check if already exists in array of checked characters
#if so, skip.
#if not, character.count
#if count > 1, repeat recursively with first character stripped off of page.
# if count = 1, add to valid character array.
#when string = 0, print valid character array.
valid = []
checked = []
myString = ""
def recursiveCount(bigString):
if len(bigString) == 0:
print "YAY!"
return valid
myChar = bigString[0]
if myChar in checked:
return recursiveCount(bigString[1:])
if bigString.count(myChar) > 1:
checked.append(myChar)
return recursiveCount(bigString[1:])
checked.append(myChar)
valid.append(myChar)
return recursiveCount(bigString[1:])
fileIN = open("testdata.txt", "r")
line = fileIN.readline()
while line:
line = line.strip()
myString = myString.join(line)
line = fileIN.readline()
myString = recursiveCount(myString)
print "\n"
print myString
string.join doesn't do what you think. join is used to combine a list of words into a single string with the given seperator. Ie:
>>> ",".join(('foo', 'bar', 'baz'))
'foo,bar,baz'
The code snippet you posted will attempt to insert myString between every character in the variable line. You can see how that will get big quickly :-). Are you trying to read the entire file into a single string, myString? If so, the way you want to concatenate the strings is like this:
myString = myString + line
While I'm here... since you're learning Python here are some other suggestions.
There are easier ways to read an entire file into a variable. For instance:
fileIN = open("testdata.txt", "r")
myString = fileIN.read()
(This won't have the exact behaviour of your existing strip() code, but may in fact do what you want.)
Also, I would never recommend practical Python code use recursion to iterate over a string. Your code will make a function call (and a stack entry) for every character in the string. Also I'm not sure Python will be very smart about all the uses of bigString[1:]: it may well create a second string in memory that's a copy of the original without the first character. The simplest way to process every character in a string is:
for mychar in bigString:
... do your stuff ...
Finally, you are using the list named "checked" to see if you've ever seen a particular character before. But the membership test on lists ("if myChar in checked") is slow. In Python you're better off using a dictionary:
checked = {}
...
if not checked.has_key(myChar):
checked[myChar] = True
...
This exercise you're doing is a great way to learn several Python idioms.

Categories

Resources