What I am trying to do:
I am trying to use open() in Python, and this is the script I am trying to run. I want to give a restaurant name as input and have the reviews saved to a file (reviews.txt).
Script (in short, it goes to a page and scrapes the reviews):
from bs4 import BeautifulSoup
from urllib import urlopen

queries = 0
while queries < 201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)
    soup = BeautifulSoup(page)
    reviews = soup.findAll('p', attrs={'itemprop':'description'})
    authors = soup.findAll('span', attrs={'itemprop':'author'})
    flag = True
    indexOf = 1
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if(endOf+1 == len(dirtyEntry)):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
        f = open("reviews.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close()
    queries = queries + 40
Problem:
The script uses append mode 'a'. According to the documentation, 'w' is the write mode, which overwrites the file. When I change it to 'w', nothing happens.
f=open("reviews.txt", "w") #does not work!
Actual Question:
EDIT: Let me clear up the confusion.
I just want ONE reviews.txt file with all the reviews. Every time I run the script, I want it to overwrite the existing reviews.txt with new reviews according to my input.
Thank you,
If I understand properly what behavior you want, then this should be the right code:
with open("reviews.txt", "w") as f:
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if(endOf+1 == len(dirtyEntry)):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
        f.write(cleanEntry)
        f.write("\n")
This opens the file for writing only once and writes all the entries to it. If the open call is nested inside the for loop instead, the file is reopened for each review in 'w' mode and is therefore overwritten by the next review.
The with statement ensures that the file is closed when execution leaves the block. It also makes the code easier to read.
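To make the difference concrete, here is a minimal sketch of the two placements of open(). The function names and the sample review strings are made up for illustration; only the open/write pattern comes from the answer above.

```python
import os
import tempfile


def write_reopening_each_time(path, reviews):
    """Anti-pattern: mode 'w' truncates the file on every open."""
    for review in reviews:
        with open(path, "w") as f:
            f.write(review + "\n")


def write_opening_once(path, reviews):
    """Open once in 'w' mode, then write every review."""
    with open(path, "w") as f:
        for review in reviews:
            f.write(review + "\n")


def read_text(path):
    with open(path) as f:
        return f.read()


# Hypothetical sample data, not scraped from anywhere.
sample_reviews = ["Great park.", "Too crowded.", "Nice benches."]
demo_path = os.path.join(tempfile.mkdtemp(), "reviews.txt")

write_reopening_each_time(demo_path, sample_reviews)
only_last = read_text(demo_path)   # just the final review survives

write_opening_once(demo_path, sample_reviews)
all_reviews = read_text(demo_path)  # every review is kept
```

In the original script the same reasoning probably applies one level up: if the open call stays inside the while loop over result pages, each page would truncate the file again, so the with block likely needs to wrap that loop as well.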
I'd also suggest dropping the parentheses around the condition in the if statement, so instead of
if(endOf+1 == len(dirtyEntry)):
it's better to use just
if endOf + 1 == len(dirtyEntry):
If you want to write every record to a different new file, you must give each file a different name; otherwise you always overwrite your old data with new data and are left with only the latest record.
You could increment your filename like this:
# at the beginning, above the loop:
i = 1
# inside the loop, once per record:
f = open("reviews_{0}.txt".format(i), "a")
f.write(cleanEntry)
f.write("\n")
f.close()
i += 1
UPDATE
According to your recent update, I see that this is not what you want. To achieve what you want, you just need to move f=open("reviews.txt", "w") and f.close() outside of the for loop. That way, you won't be opening it multiple times inside a loop, every time overwriting your previous entries:
f = open("reviews.txt", "w")
for review in reviews:
    # ... other code here ... #
    f.write(cleanEntry)
    f.write("\n")
f.close()
But, I encourage you to use with open("reviews.txt", "w") as described in Alexey's answer.
Related
How to - let a function run only on the first startup?
I have tried creating a value-adding mechanism (adding 1 to a variable after startup) but I failed.
result = _winreg.QueryValueEx(key, "MachineGuid")
ID = str(result)
licence_path = 'C:\\Program Files\\Common Files\\System\\read.txt'
oon = 0
def first_time_open_only():
    file = open(licence_path, 'w')
    file.write(ID[2:38])
    file.close()
    onn = 1 + onn
first_time_open_only()
with open(licence_path) as f:
    contents = f.read()
    if contents == str:
        pass
    else:
        root.destroy()
One way to solve this is to persist a flag between runs, for example in a pickle file or a database, so that each run can tell whether the function has already been executed. The code below is a simple example in which the function runs only once.
On the first run of the program, Flag.pkl does not exist, so the flag is set to 0 and the function runs. On the second run, the flag is loaded with the value 1 and the function is not executed.
import pickle
import os.path

def runOnce():
    print("first time of execution")
    flag = 1
    with open('./Flag.pkl', 'wb') as f:
        pickle.dump(flag, f)

if os.path.isfile('./Flag.pkl'):
    with open('./Flag.pkl','rb') as f:
        flag = pickle.load(f)
else:
    flag = 0

if flag == 0:
    runOnce()
else:
    print("This function has been executed before!")
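Since the flag only ever takes one value, the existence of a file can carry the same information. The following is a sketch of that variation, not the answer's pickle approach: run_once, the marker filename, and the lambda are hypothetical names chosen for illustration.

```python
import os
import tempfile


def run_once(marker_path, action):
    """Call action() only if marker_path does not exist yet.

    Returns True when action ran, False when it was skipped.
    """
    if os.path.exists(marker_path):
        return False
    action()
    # Create the empty marker only after the action has finished.
    with open(marker_path, "w"):
        pass
    return True


calls = []
marker = os.path.join(tempfile.mkdtemp(), "first_run.done")

first = run_once(marker, lambda: calls.append("ran"))   # action runs
second = run_once(marker, lambda: calls.append("ran"))  # action skipped
```

Deleting the marker file resets the behavior, which can be handy while testing.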
I am new to Python and have found one box still running a very old version that doesn't support the with statement. The admins are not interested in upgrading it because it will be replaced, but there is no ETA on the replacement and I need to get my script working.
On another box with a later version the script works fine, but I need to get it working on this older box.
Extract from the working script on the later version:
with open("ping.log","r") as reader:
    while True:
        line = reader.readline()
        if len(line)==0:
            break
        status = line[34]
        name = line[16:30]
        if (status=="d"):
            html+= '<tr>\n<td>\n<font color="red">'+ name+'</font><br>\n</td>\n</tr>\n'
        else:
            html+= '<tr>\n<td>\n<font color="green">'+ name+'</font><br>\n</td></tr>\n'
It basically opens the file, reads the contents, and looks at a specific position in each line to see whether the device is up or down, then makes the name red or green accordingly.
Now I know you can use something like
fileh = open(file, 'w')
try:
    # Do things with fileh here
finally:
    fileh.close()
Need help with the part between try and finally
The "as reader" part of the with statement needs to be changed to a plain assignment to the file handle variable you want to use, reader, and that basically fixes it.
To wit,
reader = open("ping.log", 'r') # not 'w' if you are reading
try:
    # Do things with reader here
    # Basically copy-paste the stuff which was inside the with statement
finally:
    reader.close()
You were actually on the right path:
fileh = open("ping.log", 'r') # you don't need the as reader part
try:
    for line in fileh: # you can just iterate over the file, no need for while True
        if len(line)==0:
            break
        status = line[34]
        name = line[16:30]
        if (status == "d"):
            html+= '<tr>\n<td>\n<font color="red">'+ name+'</font><br>\n</td>\n</tr>\n'
        else:
            html+= '<tr>\n<td>\n<font color="green">'+ name+'</font><br>\n</td></tr>\n'
except:
    print("ERROR")
finally:
    fileh.close()
You could even introduce a variable color and set it from the status, so the line is easier to change later:
...
color = "red" if (status == "d") else "green"
html+= '<tr>\n<td>\n<font color="' + color + '">'+ name + '</font><br>\n</td>\n</tr>\n'
...
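Putting the pieces together, here is a self-contained sketch of the read loop without the with statement. Because an interpreter old enough to lack with may also lack conditional expressions and a combined try/except/finally, the sketch sticks to a plain if/else and a bare try/finally. The function name build_rows, the simplified row markup, the strip() on the name, and the sample log layout (16-character timestamp, 14-character name, status starting at index 34) are assumptions for illustration; only the slice positions come from the question.

```python
import os
import tempfile


def build_rows(path):
    """Return one HTML table row per usable line of a ping log."""
    rows = []
    reader = open(path, "r")
    try:
        for line in reader:
            if len(line) < 35:
                # Too short to contain the status column; skip it.
                continue
            name = line[16:30].strip()
            if line[34] == "d":
                color = "red"
            else:
                color = "green"
            rows.append('<tr><td><font color="%s">%s</font></td></tr>'
                        % (color, name))
    finally:
        # Runs whether or not the loop raised, like leaving a with block.
        reader.close()
    return "\n".join(rows)


# Hypothetical log lines in the assumed fixed-width layout.
log_path = os.path.join(tempfile.mkdtemp(), "ping.log")
writer = open(log_path, "w")
try:
    writer.write("%-16s%-14s is %s\n" % ("2024-01-01 10:00", "fw-sydney-01", "down"))
    writer.write("%-16s%-14s is %s\n" % ("2024-01-01 10:00", "fw-tokyo-01", "up"))
finally:
    writer.close()

html = build_rows(log_path)
```

Collecting the rows in a list and joining them once at the end avoids repeated string concatenation, but the += form from the question works just as well for small logs.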
EDIT: As suggested, I updated for try, except, finally
EDIT 2:
For your second problem, as I understand it, you want to write or append to the file dash.aspx?
Then you wouldn't have to iterate over it:
...
writer = open("dash.aspx", "w")  # check whether you want 'a' (append) or 'w' (overwrite)
# no need to iterate over the lines; you already have the file and can write to it directly
# INFO: this puts everything on one line; for new lines you could add "\n", e.g. writer.write(htmlBeg + "\n")
writer.write(htmlBeg)
writer.write("<tr>")
writer.write(htmlTBeg)
writer.write('<th style="text-align:left"><h3>APAC Region Firewalls</h3></th>')
writer.write(html)
writer.write(htmlTEnd)
# close the file at the end
writer.close()
...
I have a list of filenames: files = ["untitled.txt", "example.txt", "alphabet.txt"]
I also have a function to create a new file:
import sys
import time

def create_file(file):
    """Creates a new file."""
    with open(file, 'w') as nf:
        is_first_line = True
        while True:
            line = input("Line? (Type 'q' to quit.) ")
            if line == "q":
                # Detects if the user wants to quit.
                time.sleep(5)
                sys.exit()
            else:
                line = line + "\n"
                if is_first_line == False:
                    nf.write(line)
                else:
                    nf.write(line)
                    is_first_line = False
I want the list to update itself after the file is created. However, I realized that if I just call files.append() on it, the list is only updated for the duration of the program. Does anybody know how to do this? Is this possible in Python?
"Is this possible in Python?" -> This has nothing to do with limitations of the language you chose to solve your problem. What you want here is persistence. You could just store the list of files in a text file. Instead of hardcoding the list in your code your program would then read the content every time it is run.
This code could get you started:
with open("files.txt") as infile:
    files = [f.strip() for f in infile.readlines()]
print(f"files: {files}")
# here do some stuff and create file 'new_file'
new_file = 'a_new_file.txt'
files.append(new_file)
###
with open("files.txt", "w") as outfile:
    outfile.write("\n".join(files))
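The starter code above assumes files.txt already exists. A slightly more defensive sketch of the same idea might look like this; load_names, add_name, and the temporary path are hypothetical names for illustration, and a missing list file is treated as an empty list.

```python
import os
import tempfile


def load_names(list_path):
    """Return the stored filenames, or an empty list on the first run."""
    if not os.path.exists(list_path):
        return []
    with open(list_path) as infile:
        return [line.strip() for line in infile if line.strip()]


def add_name(list_path, new_name):
    """Store new_name in the list file unless it is already there."""
    names = load_names(list_path)
    if new_name not in names:
        names.append(new_name)
        with open(list_path, "w") as outfile:
            outfile.write("\n".join(names) + "\n")
    return names


list_path = os.path.join(tempfile.mkdtemp(), "files.txt")
add_name(list_path, "untitled.txt")
add_name(list_path, "example.txt")
add_name(list_path, "untitled.txt")  # duplicate, ignored

# A later run of the program would start from this stored list.
reloaded = load_names(list_path)
```

Calling add_name right after create_file finishes would keep the stored list in step with the files that actually exist.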
I've been working on a python script that will scrape certain webpages.
The beginning of the script looks like this:
# -*- coding: UTF-8 -*-
import urllib2
import re

database = ''
contents = open('contents.html', 'r')
for line in contents:
    entry = ''
    f = re.search('(?<=a href=")(.+?)(?=\.htm)', line)
    if f:
        entry = f.group(0)
    page = urllib2.urlopen('https://indo-european.info/pokorny-etymological-dictionary/' + entry + '.htm').read()
    m = re.search('English meaning( )+\s+(.+?)</font>', page)
    if m:
        title = m.group(2)
    else:
        title = 'N/A'
This accesses each page and grabs a title from it. Then I have a number of blocks of code that test whether certain text is present in each page, here is an example of one:
abg = re.findall(r'\babg\b', page)  # raw string, so \b is a word boundary
if len(abg) == 0:
    abg = 'N'
else:
    abg = 'Y'
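A note on patterns that contain \b: in a plain (non-raw) string literal, Python turns \b into a backspace character before the regex engine ever sees it, so the pattern only acts as a word boundary when it is written with the r prefix. The sample text below is made up for illustration.

```python
import re

text = "the abg marker appears here"

# Plain literal: '\b' is a backspace character, so nothing matches.
plain_matches = re.findall('\babg\b', text)

# Raw literal: the regex engine receives \b and treats it as a word boundary.
raw_matches = re.findall(r'\babg\b', text)
```

With the plain literal the test would report 'N' for every page, regardless of its content.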
Then, finally, still in the for loop, I add this information to the variable database:
database += '\n' + str('<F>') + str(entry) + '<TITLE="' + str(title) + '"><FQ="N"><SQ="N"><ABG="' + str(abg) + '"></F>'
Note that I have used str() for each variable because I was getting a "can't concatenate strings and lists" error for some reason.
Once the for loop is completed, I write the database variable to a file:
f = open('database.txt', 'wb')
f.write(database)
f.close()
When I run this in the command line, it times out or never completes running. Any ideas as to what might be causing the issue?
EDIT: I fixed it. It seems the program was getting slowed down by the fact that I was having the database variable store the result of each line's iteration through the loop. All I had to do to fix the issue was change the write function to happen during the for loop.
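The fix described in the edit, writing each entry as soon as it is ready instead of accumulating one large string, could be sketched as follows. This is written for Python 3 rather than the Python 2 of the question; format_entry, write_entries, and the sample records are hypothetical, and an in-memory buffer stands in for database.txt.

```python
import io


def format_entry(entry, title, abg):
    """Build one database line in the format used in the question."""
    return '<F>%s<TITLE="%s"><FQ="N"><SQ="N"><ABG="%s"></F>' % (entry, title, abg)


def write_entries(records, out):
    """Write one formatted line per record as soon as it is available."""
    count = 0
    for entry, title, abg in records:
        out.write(format_entry(entry, title, abg) + "\n")
        count += 1
    return count


# Made-up records standing in for the scraped pages.
records = [("abc", "to shine", "Y"), ("def", "N/A", "N")]
buffer = io.StringIO()
written = write_entries(records, buffer)
```

Passing a real file opened with open('database.txt', 'w') in place of the buffer gives the same output on disk, and progress is visible in the file while the loop is still running.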
I am fairly new to Python (just started learning in the last two weeks) and am trying to write a script to parse a csv file to extract some of the fields into a List:
from string import Template
import csv
import string
site1 = 'D1'
site2 = 'D2'
site3 = 'D5'
site4 = 'K0'
site5 = 'K1'
site6 = 'K2'
site7 = '0'
site8 = '0'
site9 = '0'
lbl = 1
portField = 'y'
sw = 5
swpt = 6
cd = 0
pt = 0
natList = []
with open(name=r'C:\Users\dtruman\Documents\PROJECTS\SCRIPTING - NATAERO DEPLOYER\NATAERO DEPLOYER V1\nataero_deploy.csv') as rcvr:
    for line in rcvr:
        fields = line.split(',')
        Site = fields[0]
        siteList = [site1,site2,site3,site4,site5,site6,site7,site8,site9]
        while Site in siteList == True:
            Label = fields[lbl]
            Switch = fields[sw]
            if portField == 'y':
                Switchport = fields[swpt]
                natList.append([Switch,Switchport,Label])
            else:
                Card = fields[cd]
                Port = fields[pt]
                natList.append([Switch,Card,Port,Label])
print natList
Even if I strip the else branch away and break into my code right after the if clause, I can verify that "Switchport" (the first statement in the if clause) is successfully being populated with a str from my csv file, as are "Switch" and "Label". However, "natList" is not being appended with the fields parsed from each line of my csv for some reason. Python returns no errors; it just does not append to "natList" at all.
This is actually going to be a function (once I get the code itself to work), but for now, I am simply setting the function parameters as global variables for the sake of being able to run it in an iPython console without having to call the function.
The "lbl", "sw", "swpt", "cd", and "pt" refer to column#'s in my csv (the finished function will allow user to enter values for these variables).
I assume I am running into some issue with "natList" scope-- but I have tried moving the "natList = []" statement to various places in my code to no avail.
I can run the above in a console, and then run "natList.append([Switch,Switchport,Label])" separately and it works for some reason....?
Thanks for any assistance!
The while condition needs parentheses. Either write while (Site in siteList) == True: or, much cleaner, as Padraic suggested, while Site in siteList:.
Without the parentheses, Python treats Site in siteList == True as a chained comparison, (Site in siteList) and (siteList == True). A list is never equal to True, so the condition is always false and the loop body never runs.
Change
while Site in siteList == True:
to
if Site in siteList:
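The effect of the missing parentheses can be checked in isolation. In this small sketch the list contents and variable names are made up; the point is only how Python groups the expression.

```python
site_list = ["D1", "D2", "K0"]
site = "D1"

# What the question wrote: a chained comparison.
chained = site in site_list == True

# How Python evaluates that chain.
expanded = (site in site_list) and (site_list == True)

# The two suggested fixes.
grouped = (site in site_list) == True
plain = site in site_list
```

Here chained and expanded are both False even though the site is in the list, while grouped and plain are True.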
You might want to look into the csv module, which attempts to make reading and writing csv files simpler, e.g.:
import csv
with open('<file>') as fp:
    ...
    reader = csv.reader(fp)
    # iterate over the reader (parsed rows), not over the raw file object
    if portfield == 'y':
        natlist = [[row[i] for i in [sw, swpt, lbl]] for row in reader if row[0] in sitelist]
    else:
        natlist = [[row[i] for i in [sw, cd, pt, lbl]] for row in reader if row[0] in sitelist]
print natlist
Or alternatively use a csv.DictReader, which takes the first row as the field names and then returns dictionaries:
import csv
with open('<file>') as fp:
    ...
    reader = csv.DictReader(fp)
    if portfield == 'y':
        fields = ['Switch', 'card/port', 'Label']
    else:
        fields = ['Switch', '??', '??', 'Label']
    # iterate over the reader (dictionaries), not over the raw file object
    natlist = [[row[f] for f in fields] for row in reader if row['Building/Site'] in sitelist]
print natlist
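As a self-contained check of the csv.reader variant, here is a sketch in Python 3 that reads from an in-memory string instead of a file. The sample rows and their column layout are invented; only the column indices (lbl = 1, sw = 5, swpt = 6) and the site codes are taken from the question.

```python
import csv
import io

# Hypothetical rows: site in column 0, label in 1, switch in 5, switchport in 6.
sample = io.StringIO(
    "D1,lab-01,x,x,x,sw-a,gi0/1\n"
    "ZZ,lab-02,x,x,x,sw-b,gi0/2\n"
    "K0,lab-03,x,x,x,sw-c,gi0/3\n"
)
site_list = {"D1", "D2", "D5", "K0", "K1", "K2"}
lbl, sw, swpt = 1, 5, 6

# Keep only rows whose first field is a known site.
nat_list = [
    [row[sw], row[swpt], row[lbl]]
    for row in csv.reader(sample)
    if row and row[0] in site_list
]
```

The row for site ZZ is filtered out, and each remaining entry holds the switch, switchport, and label in that order.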