Check text file for arguments in Python [closed]

I have two Python scripts:
Script 1: Checks elements on a webpage and writes them to a file.
Script 2: Reads from this file and uses the contents in an if statement. This is the part I'm unsure about.
The text file has at least 500 items, each on its own line, and I want to check whether these items are still there when I revisit the site.
def read_input_file(self):
    inFile = open("page_items.txt", "r")
    if inFile == current_content:
        do.stuff
What would be the best way to go about this?

Use the first script to scrape the site again and save it in a set. Then use .issubset to check if everything in 'inFile' is contained within the current_site?
current_site = set(scraped_items)
if set(inFile).issubset(current_site):
    do.stuff
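A minimal sketch of that idea, assuming page_items.txt is the file written by the first script and scraped_items is the list produced by re-scraping the live page (the newlines need stripping before the comparison, otherwise nothing will match):

# Read the previously saved items into a set, dropping trailing newlines
with open("page_items.txt", "r") as in_file:
    saved_items = {line.rstrip("\n") for line in in_file if line.strip()}

# scraped_items is assumed to come from the first script's fresh scrape
current_site = set(scraped_items)

missing = saved_items - current_site
if saved_items.issubset(current_site):
    print("All saved items are still on the page")
else:
    print("Missing items:", missing)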

It turned out that sets were not really what I was looking for after all, mainly because the scraped contents needed to survive a reboot, so the text file was the only option I could think of.
I did find a solution, however: instead of scraping the current_site and matching that against the infile, I now start with the infile and search for each line on the current_site, using Selenium.
Here is what I came up with; it's not very clean, but maybe it's useful to somebody in the future:
import time
import linecache
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException

# driver is the Selenium WebDriver created earlier in the script
for i in range(1, 201):  # linecache line numbers start at 1
    scraped_content = linecache.getline('scraped.txt', i).rstrip()
    search_path = "//*[contains(text(),'" + scraped_content + "')]"
    scroll_down = driver.find_element_by_tag_name('a')
    scroll_down.send_keys(Keys.PAGE_DOWN)
    scroll_to_element = None
    while not scroll_to_element:
        try:
            scroll_to_element = driver.find_element_by_xpath(search_path)
            time.sleep(1)
        except NoSuchElementException:
            print("Searching for Content:", scraped_content)
            break
    if scroll_to_element is not None:
        print(scraped_content, "Found!")


Problem importing libraries when using inside a definition [closed]

I have this code, which uses BeautifulSoup (bs4):
for file_name, news_table in news_tables.items():
    # Iterate through all tr tags in 'news_table'
    #if sp != None:
    if news_table is not None:
        for x in news_table:
            if x is not None:
                for x in news_table.find_all('tr'):
                    # occasionally x (below) may be None when the html table is poorly formatted;
                    # skip it in try/except instead of throwing an error and exiting
                    # may also use an if statement here to check if x is None first
                    #if x is not None:
                    try:
                        # read the text from each tr tag into text
                        # get text from a only
                        text = x.a.get_text()
                        # split text in the td tag into a list
                        date_scrape = x.td.text.split()
                        # if the length of 'date_scrape' is 1, load 'time' as the only element
                        if len(date_scrape) == 1:
                            time = date_scrape[0]
                        # else load 'date' as the 1st element and 'time' as the second
                        else:
                            date = date_scrape[0]
                            time = date_scrape[1]
                        # Extract the ticker from the file name, get the string up to the 1st '_'
                        ticker = file_name.split('_')[0]
                        # Append ticker, date, time and headline as a list to the 'parsed_news' list
                        parsed_news.append([ticker, date, time, text])
                    except Exception as e:
                        print(e)
            else:
                pass
    else:
        pass
That works, but when I insert this code inside the definition that I use to show the cmd output in a Tkinter UI, the libraries are not imported and the script stops (in the UI).
This is the definition I use for the UI:
import subprocess

def test():
    # stream the output of a subprocess line by line (ping is just an example command)
    print("Thread: start")
    p = subprocess.Popen("ping -c 4 stackoverflow.com".split(),
                         stdout=subprocess.PIPE, bufsize=1, text=True)
    while p.poll() is None:
        msg = p.stdout.readline().strip()  # read a line from the process output
        if msg:
            print(msg)
I've tried lots of things, but I can't find a way to invoke the BeautifulSoup find_all() and get_text(); the libraries don't recognize them.
How can I change this so the libraries recognize find_all and get_text? (At the moment they are not recognized, and I think that is why the program stops when it reaches that point.) I don't know where those functions (find_all and get_text) come from; I think they are from BeautifulSoup, but they might not be.
The first block of code is inside the definition (second block) that is used to visualize the cmd output in the user interface with Tkinter.
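For what it's worth, find_all() and get_text() are BeautifulSoup methods (they belong to bs4's BeautifulSoup/Tag objects, not to Tkinter or subprocess), so the import from bs4 has to be visible wherever that code actually runs. A minimal, self-contained sketch using a hypothetical HTML snippet in place of the real news table:

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the scraped news table
html = """
<table id="news-table">
  <tr><td>Jan-01-24 09:30AM</td><td><a href="#">Example headline</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
news_table = soup.find(id="news-table")

for row in news_table.find_all("tr"):    # find_all comes from bs4
    print(row.a.get_text())              # get_text() on the <a> tag, like x.a.get_text() above
    print(row.td.text.split())           # the date/time cell, like x.td.text.split() above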

How to check if a specific text exists in a website and save the link - python? [closed]

I have this website: https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=1
I need code to check different zoneid values [1 to 3000] and check whether the word "H10" exists on that page (like this one: https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=0160).
If the word "H10" exists, I want all the links that contain that word to be saved.
Thank you.
You can use this example to iterate over different zones and check if there are links with H10:
import requests
from bs4 import BeautifulSoup

url = "https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid={}"

for zoneid in range(159, 165):  # <--- adjust pages here, e.g. (1, 3001)
    u = url.format(zoneid)
    print("Checking {}".format(u))

    soup = BeautifulSoup(requests.get(u).content, "html.parser")
    h10_links = soup.select('a:-soup-contains("H10")')

    for link in h10_links:
        print(link["href"])
Prints:
Checking https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=159
Checking https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=160
https://www.ville.levis.qc.ca/fileadmin/documents/pdf/permis/classes_usages_zonage_vdl.pdf
Checking https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=161
Checking https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=162
Checking https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=163
Checking https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid=164
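The example above only prints the matching links. Since the question also asks for the links to be saved, a possible extension of the same approach (the output filename h10_links.txt is just an example) could collect them per zone and write them to a file at the end:

import requests
from bs4 import BeautifulSoup

url = "https://geo1.ville.levis.qc.ca/grilleusage/default.aspx?zoneid={}"
saved_links = []

for zoneid in range(1, 3001):  # all zones from 1 to 3000
    u = url.format(zoneid)
    soup = BeautifulSoup(requests.get(u).content, "html.parser")
    for link in soup.select('a:-soup-contains("H10")'):
        # remember both the zone page and the matching link
        saved_links.append((u, link["href"]))

# write the results to a plain text file, one "zone link" pair per line
with open("h10_links.txt", "w") as f:
    for zone_url, href in saved_links:
        f.write("{} {}\n".format(zone_url, href))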

Looping a function with its input being a URL [closed]

So I am trying to get into Python, and I am using other examples that I find online to understand certain functions better.
I found a post online that shared a way to check prices on an item through CamelCamelCamel.
They had it set to request a specific URL, so I decided to change it to user input instead.
How can I simply loop this function?
It runs fine once, as far as I know, but after the initial run I get 'Process finished with exit code 0', which isn't necessarily a problem.
For the script to behave how I would like it to, it would be nice if there were a break on, say, 'quit' or something, but after it processes the URL that was given, I would like it to ask for a new URL.
I'm sure there's a way to check for a specific URL, i.e. this should only work for CamelCamelCamel, so as to limit it to that domain only.
I'm more familiar with Batch, and have kind of gotten away with using Batch to run my Python files to work around what I don't understand.
Personally, if I could, I would just mark the function as 'top:' and put a goto top at the bottom of the script.
from bs4 import BeautifulSoup
import requests

print("Enter CamelCamelCamel Link: ")
plink = input("")

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(plink, headers=headers)
data = r.text
soup = BeautifulSoup(data, 'html.parser')

table_data = soup.select('table.product_pane tbody tr td')
hprice = table_data[1].string
hdate = table_data[2].string
lprice = table_data[7].string
ldate = table_data[8].string

print('High price-', hprice)
print("[H-Date]", hdate)
print('---------------')
print('Low price-', lprice)
print("[L-Date]", ldate)
Also, how could I find the difference between the date I obtain from either hdate or ldate and today/now? The dates I parsed are strings, and I got: TypeError: unsupported operand type(s) for +=: 'int' and 'str'.
This is really just for learning; any example works, it doesn't have to be that site specifically.
In Python, you have access to several different types of looping control structures, including:
while statements
while condition:  # will execute until condition is no longer True (or until break is called)
    <statements to execute while looping>
for statements
for i in range(10):  # will execute 10 times (or until break is called)
    <statements to execute while looping>
Each one has its strengths and weaknesses, and the documentation at Python.org is very thorough but easy to assimilate.
https://docs.python.org/3/tutorial/controlflow.html
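To tie that back to the question, here is one way to keep asking for new links until the user types 'quit', restrict the input to the camelcamelcamel.com domain, and compute how far a parsed date lies from today. This is only a sketch: the check_prices helper is just the code from the question wrapped in a function, and the date format string is an assumption (CamelCamelCamel's actual table format may differ).

from datetime import datetime
from urllib.parse import urlparse

from bs4 import BeautifulSoup
import requests

def check_prices(plink):
    headers = {'User-Agent': 'Mozilla/5.0'}
    r = requests.get(plink, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    table_data = soup.select('table.product_pane tbody tr td')
    print('High price-', table_data[1].string, '[H-Date]', table_data[2].string)
    print('Low price-', table_data[7].string, '[L-Date]', table_data[8].string)

while True:  # keep looping until the user types 'quit'
    plink = input("Enter CamelCamelCamel Link (or 'quit' to stop): ").strip()
    if plink.lower() == 'quit':
        break
    # only accept links on the camelcamelcamel.com domain
    if not urlparse(plink).netloc.endswith('camelcamelcamel.com'):
        print("Please enter a camelcamelcamel.com link.")
        continue
    check_prices(plink)

# Date difference: parse the date string into a datetime first, then subtract.
# The format below ('Jan 01, 2020') is only an assumed example.
hdate = "Jan 01, 2020"
parsed = datetime.strptime(hdate, "%b %d, %Y")
print("Days since that date:", (datetime.now() - parsed).days)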

Python: Writing lists to .csv file [closed]

I’m teaching myself programming, using Python as my initial weapon of choice.
I have learnt a few basics and decided to set myself the challenge of asking the user for a list of names, adding the names to a list and then finally writing the names to a .csv file.
Below is my code. It works.
My question is what would you do differently, i.e. how could this code be improved for readability and efficiency. Would you approach the situation differently, structure it differently, call different functions? I am interested in, and would appreciate a great deal, the feedback from more experienced programmers.
In particular, I find certain parts clunky; such as having to specify to the user the required format for data entry. If I were to simply request the data (name age location) without the commas however, then each record, when written to .csv, would simply end up as one record per cell (Excel) – this is not the desired result.
#Requesting user input.
guestNames = input("Please enter the names of your guests, one at a time.\n"\
                   "Once you have finished entering the information, please type the word \"Done\".\n"\
                   "Please enter your names in the following format (Name, Age, Location). ").capitalize()
guestList.append(guestNames)

while guestNames.lower() != "done".lower() :
    guestNames = input("Please enter the name of your " + guestNumber[number] + " guest: ").capitalize()
    guestList.append(guestNames)
    number += 1

#Sorting the list.
guestList.sort()
guestList.remove("Done")

#Creating .csv file.
guestFile = open("guestList.csv","w")
guestFile.close()

#Writing to file.
for entries in guestList :
    guestFile = open("guestList.csv","a")
    guestFile.write(entries)
    guestFile.write("\n")
    guestFile.close()
Let me try to write down your requirements:
Parse the input string according to its structure and save the results into a list
Format the result into a CSV-format string
Write the string to a CSV file
First of all, I would highly recommend reading a Python string operations and formatting tutorial such as the Google Developers tutorial. Once you understand the basic operations, have a look at the official documentation to see the string processing methods available in Python.
Your logic is right, but there are two wasteful spots:
while guestNames.lower() != "done".lower() :
It's not necessary to call lower() on "done", since it is already lower-case.
for entries in guestList :
    guestFile = open("guestList.csv","a")
Here you open and close guestList.csv on every iteration of the loop, which is unnecessary and costly. You could open the file at the beginning, write all the lines inside the for loop, and close it at the end.
This is a sample using the same logic and different input format:
print('some notification at the beginning')
while True:
    guestNames = input("Please enter the name of your " + guestNumber[number] + " guest: ").capitalize()
    if guestNames == 'Done':
        # Jump out of the loop if the user says done
        break
    else:
        # Assume the user input 'name age location'; replace all spaces with commas
        guestList.append(guestNames.replace(' ', ','))
        number += 1

guestList.sort()

# the with keyword will close guestFile at the end
with open("guestList.csv", "w") as guestFile:
    guestFile.write('your headers\n')
    for entries in guestList:
        guestFile.write('%s\n' % entries)
Be aware that there are many ways to fulfil your requirements, with different logic and methodologies.
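One of those other ways worth knowing about is the csv module in the standard library, which handles quoting and delimiters for you. A minimal sketch, assuming each guest entry is collected as a (name, age, location) tuple rather than a pre-joined comma string (the sample data is made up):

import csv

# Hypothetical data standing in for the collected guest entries
guest_list = [("Alice", "30", "London"), ("Bob", "25", "Paris")]

with open("guestList.csv", "w", newline="") as guest_file:
    writer = csv.writer(guest_file)
    writer.writerow(["Name", "Age", "Location"])  # header row
    writer.writerows(sorted(guest_list))          # one row per guest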

Is there a better way to format this Python/Django code as valid PEP8? [closed]

I have code written both ways and I see flaws in both of them. Is there another way to write this or is one approach more "correct" than the other?
def functionOne(subscriber):
    try:
        results = MyModelObject.objects.filter(
            project__id=1,
            status=MyModelObject.STATUS.accepted,
            subscriber=subscriber).values_list(
                'project_id',
                flat=True).order_by('-created_on')
    except:
        pass

def functionOne(subscriber):
    try:
        results = MyModelObject.objects.filter(
            project__id=1,
            status=MyModelObject.STATUS.accepted,
            subscriber=subscriber)
        results = results.values_list('project_id', flat=True)
        results = results.order_by('-created_on')
    except:
        pass
This is valid code, but it isn't meant to be correct code; I ripped out a similar chunk of code to give an example of the objects.filter section. Please don't waste time commenting on the other parts of the code. I put the try/except in there to force an indent and push certain elements onto new lines (80 columns).
I would do this:
def functionOne(subscriber):
    try:
        results = MyModelObject.objects.filter(
            project__id=1,
            status=MyModelObject.STATUS.accepted,
            subscriber=subscriber
        ).values_list(
            'project_id',
            flat=True
        ).order_by(
            '-created_on'
        )
    except:
        pass
Use indentation to make the hierarchy more readable. However, this code isn't particularly nice. Using code like this directly in views should be considered an anti-pattern. Model Managers might be a better option for such recurring code.
You might want to read http://dabapps.com/blog/higher-level-query-api-django-orm/
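For illustration, a minimal sketch of the Model Manager approach mentioned above. The model and field names are taken from the snippet; the manager class and method name are hypothetical:

from django.db import models

class MyModelObjectManager(models.Manager):
    def accepted_project_ids(self, subscriber):
        # Encapsulate the recurring query behind a descriptive method name
        return (self.filter(project__id=1,
                            status=self.model.STATUS.accepted,
                            subscriber=subscriber)
                    .values_list('project_id', flat=True)
                    .order_by('-created_on'))

class MyModelObject(models.Model):
    # ... existing fields and STATUS choices ...
    objects = MyModelObjectManager()

# In a view:
# results = MyModelObject.objects.accepted_project_ids(subscriber)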
