Here is the goal: a parser that gathers information from several domains and organizes it in one place.
I am a newbie with Python; I chose this language for the job because of its learning curve, among other things.
I am doing the parsing with the BeautifulSoup library, and that works like a charm. The routine is triggered via crontab on CentOS 6 with Python 2.7.
However, one of my parsing scripts sent me a log with a memory error, which caused the .py file to quit without completing its job. After Googling around, I found out that parsing some very long HTML pages in Python could make my server run out of memory. The advice was to close, decompose, and even garbage collect everything the script is no longer using.
I implemented all three, and there are no more memory errors in the crontab task. However, every time the script runs, I receive an email from crontab with the log of the parsing, which means that something went wrong. Checking the database, all the information was recorded correctly and the script completed the entire task; still, some error must have occurred, or crontab would not email me a log.
In fact, when I run the script directly in a terminal on the server, the same thing happens: the script won't finish. Unless I press Ctrl+C, it stays frozen on the screen. Again, looking at the database, all the tasks were completed without an error.
I tried working only with gc, only with close(), and only with release(). Any of these three would freeze the screen or generate an error log (although without an explicit error in it).
Here is a simplified version of what I am doing, for better understanding:
class GrabCategories():
    def __init__(self):
        target = 'http://provider-site.com/info.html'
        try:
            page = urllib2.urlopen(target)
            if page.getcode() == 404:
                print 'Page not found', target
                return False
            soup = BeautifulSoup(page.read())
            page.close() #not using this anymore, may I close it?
        except:
            print 'Could not open', target
            return
        content = soup.find('div', {'id': 'box-content'})
        soup.decompose() #not using this anymore, may I decompose it?
        c=0
        for link in content.findAll('a'):
            #define some vars
            try:
                catPage = urllib2.urlopen(link['a'])
                if catPage.getcode() == 404:
                    print 'Page not found', catPage
                    return False
                catSoup = BeautifulSoup(catPage.read())
                catPage.close() #not using this anymore, may I close it?
            except:
                print 'Could no open', target
                continue
            #do some things with the page content etc
            catSoup.decompose() #not using this anymore, may I decompose it?
            if(c%10):
                gc.collect()
            c=c+1
I'm trying to do everything I would have previously used Bash for in Python, in order to finally learn the language. However, this problem has me stumped, and I haven't been able to find any solutions fitting my use-case.
There is an element of trying to run before I can walk here, though, so I'm looking for some direction.
Here's the issue:
I have a Python script that starts a separate program that creates and writes to a log file.
I want to watch that log file, and print out "Successful Run" if the script detects the "Success" string in the log, and "Failed Run" if the "Failed" string is found instead. The underlying process generally takes about 10 seconds to get to the stage where it'll write "Success" or "Failed" to the log file. The two strings will never appear in the log at the same time. It's either a success or a failure. It can't be both.
I've been attempting to do this with a while loop, so I can keep watching the log file until the string appears and then exit when it does. I have got it working for just one string, but I'm unsure how to accommodate the other string.
Here's the code I'm running.
log_path = "test.log"
success = "Success"
failure = "Failed"
with open(log_path) as log:
    while success != True:
        if success in log.read():
            print("Process Successfully Completed")
            sys.exit()
Thanks to the pointers above from alaniwi and David, I've actually managed to get it to work, using the following code. So I must have been quite close originally.
I've wrapped it all in a while True, put the log.read() into a variable, and added an elif. Definitely interested in any pointers on whether this is the most Pythonic way to do it though? So please critique if need be.
while True:
    with open(log_path) as log:
        read_log = log.read()
    if success in read_log:
        print("Process Successfully Completed")
        sys.exit()
    elif failure in read_log:  # the variable defined earlier is "failure", not "fail"
        print("Failed")
        sys.exit()
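The loop above re-reads the file as fast as it can and never gives up if neither string appears. A minimal sketch of the same idea with a pause between checks and a timeout is shown below. The function name wait_for_result and the timeout and poll_interval parameters are made up for illustration, and the default values are guesses based on the "about 10 seconds" mentioned in the question.

```python
import time


def wait_for_result(log_path, success="Success", failure="Failed",
                    timeout=30.0, poll_interval=0.5):
    """Return True on success, False on failure, None if neither appears in time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(log_path) as log:
                contents = log.read()
        except FileNotFoundError:
            contents = ""  # the other program may not have created the log yet
        if success in contents:
            return True
        if failure in contents:
            return False
        time.sleep(poll_interval)  # avoid spinning the CPU between checks
    return None
```

The caller can then decide what to print or which exit code to use, for example printing "Failed Run" when the result is False and treating None as a timeout.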
I wrote a Python script to manage my account on a webpage automatically.
Code Description:
The script has a while loop, and at the end of the loop it waits 12 hours before starting again.
Before the while loop starts, it logs in to my account, and on entering the while loop it checks whether I'm still logged in. If not, it logs in to my account again.
Problem:
After re-entering the while loop (the first time everything goes fine), the script only works when print("name is:") and print(name) are at the very beginning of the loop. I tested it several times. Maybe it is just a bug or glitch that, by bad luck, only showed up when the print statements weren't there, but I am very confused about how those print statements fixed my issue. I would like to know what is, or could be, causing the issue and how to solve it properly.
Some side info:
The webpage saves the login state in session cookies with a lifetime of ~6 hours, so when the script re-enters the loop I am definitely no longer logged in. If I reduce the wait time to 30 minutes instead of 12 hours, the script also works without the print statements.
General notes:
The script is running through nohup on my Raspberry Pi 3
Python version is 3.7.3
Code related notes:
I'm using the post method from requests to log in to my account
To check whether I'm still logged in, I'm using beautifulSoup4
The following code is abbreviated and in a very basic shape.
"account" is an instance of a self-made class. When instantiated, it logs itself in with the arguments, if given
This is the core code:
import time
import requests
from account import Account # custom-made class
from bs4 import BeautifulSoup

# login credentials
name = "lol" # I replaced them with placeholders
pw = "lol"
account = Account(name, pw) # instantiating an account class, which logs itself in with the given arguments

while True: # script loop
    print("name is:") # Without both of these print statements,
    print(name) # the code won't work
    if not account.stillAlive(): # if not signed in anymore ...
        account.login(name, pw) # ... sign in again
    account.doStuff() # Do the automating stuff
    time.sleep(43200) # Wait 12 hours, before entering the while loop again
This is the doStuff() method from the Account class:
def doStuff(self):
    html = requests.get("example.com").text # Note: example.com is for demonstration purposes only
    crawler = BeautifulSoup(html, "lxml")
    lol = crawler.find("input", attrs={"name": "Submit2"}).get("value")
    # ...
Error message:
So, if I'm executing the program without the print statements, I'm getting this error:
lol = crawler.find("input", attrs={"name": "Submit2"}).get("value")
AttributeError: 'NoneType' object has no attribute 'get'
This does not occur when executing with the print statements. With the print statements, the code runs fine.
My guess
My guess is that Python's memory management is deleting the name variable. When entering the script loop for the first time, I'm already logged in, so it skips the account.login(name, pw) part. Since this is the only part where name is used afterwards, maybe Python interprets it as dead code after too much time has passed without the line being executed, sees no reason to keep the name/pw variables, and deletes them. Still, I'm just an amateur and I don't have any expertise in this area.
Side notes:
This is the first question I'm submitting; if I forgot something or did something wrong, please tell me.
I already searched for this problem, but I didn't find anything similar. Maybe I just searched badly, but I have been searching for a few hours now. If so, I apologize. (I had to wait 12 hours for every test, and since I tried it several times, you can tell I had some time available to search.)
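The AttributeError above means that crawler.find(...) returned None, that is, the fetched page did not contain an input named Submit2. One possible reason, which is an assumption and not something the post confirms, is that the request returned a login page because the session had expired. A minimal sketch of failing with a clearer message is shown below. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages; find_input_value and require_input_value are hypothetical helper names.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Remembers the value attribute of the first <input> with a given name."""

    def __init__(self, input_name):
        super().__init__()
        self.input_name = input_name
        self.value = None  # stays None when no matching <input> is seen

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "input" and self.value is None
                and attributes.get("name") == self.input_name):
            self.value = attributes.get("value")


def find_input_value(html, input_name):
    """Return the input's value, or None if the page has no such input."""
    finder = InputValueFinder(input_name)
    finder.feed(html)
    return finder.value


def require_input_value(html, input_name):
    """Like find_input_value, but raise a descriptive error instead of returning None."""
    value = find_input_value(html, input_name)
    if value is None:
        raise LookupError(
            "No <input name=%r> in the page; was a login or error page returned?"
            % input_name
        )
    return value
```

With BeautifulSoup the same check amounts to storing the result of crawler.find(...) in a variable and testing it against None before calling .get("value"). Logging the returned HTML at that point would show which page the site actually sent.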
I am relatively new to Python, and programming as a whole. I am progressively getting the hang of it, however I have been stumped as of late in regards to one of my latest projects. I have a set of Atlas Scientific EZO circuits w/ their corresponding sensors hooked up to my Raspberry Pi 3. I can run the i2c script fine, and the majority of the code makes sense to me. However, I would like to pull data from the sensors and log it with a time stamp in a CSV file, taking data points in timed intervals. I am not quite sure how to pull the data from the sensor, and put it into a CSV. Making CSVs in Python is fairly simple, as is filling them with data, but I cannot seem to understand how I would make the data that goes into the CSV the same as what is displayed in the terminal when one runs the Poll function. Attached is the i2c sample code from Atlas' website. I have annotated it a bit more so as to help me understand it better.
I have already attempted to make sense of the poll function, but am confused in regards to the self.file_write and self.file_read methods used throughout the code. I do believe they would be of use in this instance but I am generally stumped in terms of implementation. Below you will find a link to the Python script (i2c.py) written by Atlas Scientific
https://github.com/AtlasScientific/Raspberry-Pi-sample-code/blob/master/i2c.py
I'm guessing by "the polling function" you are referring to this section of the code:
# continuous polling command automatically polls the board
elif user_cmd.upper().startswith("POLL"):
    delaytime = float(string.split(user_cmd, ',')[1])
    # check for polling time being too short, change it to the minimum timeout if too short
    if delaytime < AtlasI2C.long_timeout:
        print("Polling time is shorter than timeout, setting polling time to %0.2f" % AtlasI2C.long_timeout)
        delaytime = AtlasI2C.long_timeout
    # get the information of the board you're polling
    info = string.split(device.query("I"), ",")[1]
    print("Polling %s sensor every %0.2f seconds, press ctrl-c to stop polling" % (info, delaytime))
    try:
        while True:
            print(device.query("R"))
            time.sleep(delaytime - AtlasI2C.long_timeout)
    except KeyboardInterrupt: # catches the ctrl-c command, which breaks the loop above
        print("Continuous polling stopped")
If this is the case, then it looks like you can recycle most of this code for your use. The string you are seeing in your console comes from device.query("R"); instead of printing it, grab the return value and write it to your CSV.
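A minimal sketch of that idea, assuming Python 3, is shown below. The function log_readings and its parameters are made up for illustration; read_sensor stands for any callable that returns the reading, such as a small wrapper around device.query("R").

```python
import csv
import time
from datetime import datetime


def log_readings(read_sensor, csv_path, interval_seconds, count):
    """Append `count` timestamped readings to a CSV file, one row per reading."""
    with open(csv_path, "a", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for index in range(count):
            reading = read_sensor()  # e.g. the string returned by device.query("R")
            timestamp = datetime.now().isoformat(timespec="seconds")
            writer.writerow([timestamp, reading])
            csv_file.flush()  # keep the file current if the script is interrupted
            if index < count - 1:
                time.sleep(interval_seconds)
```

A call might look like log_readings(lambda: device.query("R"), "readings.csv", 5.0, 10), where the file name, the interval and the count are placeholders.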
I think you should add a method to the AtlasI2C class that writes the data to a file.
Just add this method below __init__() in the AtlasI2C class:
def update_file(self, new_data):
    with open(self.csv_file, 'a') as data_file:
        try:
            data = "{}\n".format(str(new_data))
            data_file.write(data)
        except Exception as e:
            print(e)
Then add a statement with the CSV file name to the AtlasI2C __init__:
self.csv_file = "my_filename.csv" # replace my_filename with your own name
and then, under line 51 (char_list = list(map(lambda x: chr(ord(x) & ~0x80), list(response[1:])))), add this line:
self.update_file(''.join(char_list))
Hope it helps.
Cheers,
Fenrir
I have a python project called the "Remote Dongle Reader". There are about 200 machines that have a "Dongle" attached, and a corresponding .exe called "Dongle Manager". Running the Dongle Manager spits out a "Scan" .txt file with information from the dongle.
I am trying to write a script, which runs from a central location, which has administrative domain access to the entire network. It will read a list of hostnames, go through each one, and bring back all the files. Once it brings back all the files, it will compile to a csv.
I have it working on my lab/test servers, but on production systems it does not work. I am wondering if this is some sort of login issue, since people may be actively using the system. The process needs to launch silently and do everything in the background. However, since I am connecting as the administrator user, I wonder if there is a clash.
I am not sure what's going on, other than that the application works up until the point where I expect the file to be there. The "Dongle Manager" process starts, but it doesn't appear to be writing the scan out on any machine that is not logged in as administrator (the account I am running from).
Below is the snippet of the WMI section of the code. This was a very quick script, so I apologize for any non-Pythonic statements.
c = wmi.WMI(ip, user=username, password=password)
process_startup = c.Win32_ProcessStartup.new()
process_startup.ShowWindow = SW_SHOWNORMAL
cmd = r'C:\Program Files\Avid\Utilities\DongleManager\DongleManager.exe'
process_id, result = c.Win32_Process.Create(CommandLine=cmd,
                                            ProcessStartupInformation=process_startup)
if result == 0:
    print("Process started successfully: %d" % process_id)
else:
    print("Problem creating process: %d" % result)
while not os.path.exists(("A:/"+scan_folder)):
    time.sleep(1)
    counter += 1
    if counter > 20:
        failed.append(hostname)
        print("A:/"+scan_folder+" does not exist")
        return
time.sleep(4)
scan_list = os.listdir("A:/"+scan_folder)
scan_list.sort(key=lambda x: os.stat(os.path.join("A:/"+scan_folder, x)).st_mtime, reverse=True)
if not scan_list:  # "scan_list is []" is always False, so test for emptiness instead
    failed.append(hostname)
    return
recursive_overwrite("A:/"+scan_folder+"/"+scan_list[0],
                    "C:\\AvidTemp\\Dongles\\"+hostname+".txt")
Assuming I get a connection (computer on), it usually fails at the point where it either waits for the folder to be created or expects something in the listing of scan_folder. Either way, something is stopping the scan from being created, even though the process is starting.
Edit: I am mounting the share as A:/ elsewhere in the code.
The problem is that you've requested to show the application window but there is no logged on desktop to display it. WMI examples frequently use SW_SHOWWINDOW but that's usually the wrong choice because with WMI you are typically trying to run something in the background. In that case, SW_HIDE (or nothing) is the better choice.
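A minimal sketch of that change is shown below, reusing the two WMI calls from the question's own snippet. The helper name start_hidden is made up for illustration, and connection is assumed to be the object returned by wmi.WMI(...). SW_HIDE has the value 0 in the Win32 ShowWindow constants.

```python
SW_HIDE = 0  # Win32 ShowWindow constant: do not display a window


def start_hidden(connection, command_line):
    """Start a process through WMI without asking for a visible window."""
    startup = connection.Win32_ProcessStartup.new()
    startup.ShowWindow = SW_HIDE
    process_id, result = connection.Win32_Process.Create(
        CommandLine=command_line,
        ProcessStartupInformation=startup,
    )
    return process_id, result
```

As in the question's code, a result of 0 means the process was created; any other value is the error code returned by Win32_Process.Create.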
I have a script runReports.py that is executed every night. Suppose for some reason the script takes too long to execute, I want to be able to stop it from terminal by issuing a command like ./runReports.py stop.
I tried to implement this by having a temporary file created when the stop command is issued.
The script checks for existence of this file before running each report.
If the file is there the script stops executing, else it continues.
But I am not able to find a way to make the issuer of the stop command aware that the script has stopped successfully. Something along the following lines:
$ ./runReports.py stop
Stopping runReports...
runReports.py stopped successfully.
How to achieve this?
For example, if your script runs in a loop, you can catch a signal (http://en.wikipedia.org/wiki/Unix_signal) and terminate the process:
import signal

class SimpleReport(BaseReport):

    def __init__(self):
        ...
        self.is_running = True

    def _signal_handler(self, signum, frame):
        # only flip the flag here; the loop in run() notices it and exits
        self.is_running = False

    def run(self):
        signal.signal(signal.SIGUSR1, self._signal_handler) # set signal handler
        ...
        while self.is_running:
            print("Preparing report")
        print("Exiting ...")
To terminate the process, just call kill -SIGUSR1 procId
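The kill command above needs the process ID, and the same signal can also be sent from Python. A minimal sketch is shown below, assuming the running script records its PID in a file at startup. The helper names and the PID file are hypothetical; nothing in the question says such a file exists.

```python
import os
import signal


def write_pid_file(pid_file):
    """Called by the long-running script at startup to record its own PID."""
    with open(pid_file, "w") as handle:
        handle.write(str(os.getpid()))


def read_pid_file(pid_file):
    """Called by the stop command to find the running script."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def request_stop(pid):
    """Python equivalent of `kill -SIGUSR1 pid` (Unix only)."""
    os.kill(pid, signal.SIGUSR1)
```

The stop command would then call request_stop(read_pid_file(path)) with whatever path the two sides agree on.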
You want to achieve inter-process communication. You should first explore the different ways to do that: System V IPC (in memory, very versatile, possibly baffling API), sockets, including Unix domain sockets (in memory, more limited, clean API), or the file system (persistent on disk, almost architecture independent), and then choose yours.
As you are asking about files, there are still two ways to communicate using files: either using file content (feature rich, harder to implement), or simply file presence. But the problem with using files is that if a program terminates because of an error, it may not be able to write its ended status to the disk.
IMHO, you should clearly define what your requirements are before choosing file-system-based communication (testing for the end of a program is not really what it is best at), unless you also need architecture independence.
To directly answer your question: if you use file system communication, the only reliable way to know whether a program has ended is to browse the list of currently active processes, and the simplest way is IMHO to use ps -e in a subprocess.
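As a sketch of the "is it still running?" check, the code below swaps ps -e for os.kill(pid, 0), which asks the operating system about a single PID without delivering a signal. This is a different technique from the one named above, it is Unix only, and it assumes the PID of the script is known. The function names and the timeout values are made up for illustration.

```python
import os
import time


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_until_stopped(pid, timeout=10.0, poll_interval=0.2):
    """Poll until the process is gone; return False if it is still there at the timeout."""
    deadline = time.monotonic() + timeout
    while pid_is_running(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

The stop command could print "Stopping runReports...", ask the script to stop, and print the success line only once wait_until_stopped returns True. A process that has exited but has not yet been reaped by its parent still counts as existing for this check.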
Instead of having a temporary file, you could have a permanent file (config.txt) that has some tags in it, and check whether it contains the tag 'running = True'.
Achieving this is quite simple: if your code has a loop in it (I imagine it does), just write a function/method that checks this file and branches on the result.
def continue_running():
    with open("config.txt") as f:
        for line in f:
            # strip() drops the trailing newline; partition() tolerates lines without " = "
            tag, _, condition = line.strip().partition(" = ")
            if tag == "running" and condition == "True":
                return True
    return False
In your script you will do this:
while True: # or your terminal condition
    if continue_running():
        # your regular code goes here
    else:
        break
So all you have to do to stop the loop in the script is change the 'running' to anything but "True".