Run a Python script as a service with Twisted - Python

I would like to run this script as an automated service that runs every minute, every day, with Twisted (I first tried to daemonize it, but it seemed too difficult and I didn't find good tutorials; I already tried crontab, but that's not what I'm looking for).
Has anyone done this with Twisted? I can't find a tutorial for my kind of script (getting data from a DB table and putting it into another table of the same DB). I also have to keep the logs in a file, but that won't be the most difficult part.
from twisted.enterprise import adbapi
from twisted.internet import task
import logging
from datetime import datetime
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks

"""
Test DB: this file does the database connection and basic operations.
"""

log = logging.getLogger("Test DB")

dbpool = adbapi.ConnectionPool("MySQLdb", db="xxxx", user="guza",
                               passwd="vQsx7gbblal8aiICbTKP", host="192.168.15.01")

class MetersCount():

    def getTime(self):
        log.info("Get current time from system.")
        time = str(datetime.now()).split('.')[0]
        return time

    def getTotalMeters(self):
        log.info("Select operation in database.")
        getMetersQuery = """ SELECT count(met_id) as totalMeters FROM meters WHERE DATE(met_last_heard) = DATE(NOW()) """
        return dbpool.runQuery(getMetersQuery).addCallback(self.getResult)

    def getResult(self, result):
        # General-purpose callback to receive a result from a Deferred.
        print("Receive Result : ")
        print(result)
        return result

    def insertMetersCount(self, meters_count):
        log.info("Insert operation in database.")
        insertMetersQuery = """ INSERT INTO meter_count (mec_datetime, mec_count) VALUES (NOW(), %s)"""
        return dbpool.runQuery(insertMetersQuery, [meters_count])

    def checkDB(self):
        d = self.getTotalMeters()
        d.addCallback(self.insertMetersCount)
        return d

a = MetersCount()
a.checkDB()
reactor.run()

If you want to run a function once a minute, have a look at LoopingCall. It takes a function, and runs it at intervals unless told to stop.
You would use it something like this (which I haven't tested):
from twisted.internet.task import LoopingCall
looper = LoopingCall(a.checkDB)
looper.start(60)
The documentation is at the link.
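For completeness, here is a rough sketch (untested) of how the whole thing might be wired together so the reactor keeps the query running once a minute; the 60-second interval and the errback handling are assumptions, not part of the original question:

from twisted.internet import reactor
from twisted.internet.task import LoopingCall

a = MetersCount()

def tick():
    # checkDB() returns a Deferred; add an errback so a failed
    # query gets logged instead of silently dropped.
    d = a.checkDB()
    d.addErrback(lambda failure: log.error(failure))
    return d

looper = LoopingCall(tick)
looper.start(60)   # run immediately, then every 60 seconds
reactor.run()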

Related

Python schedule with commandline

I have a problem: I want to automate a script. In past projects I've used the Python scheduler for this, but for this project I'm unsure how to handle it.
The problem is that the code works with login details that live outside the code and are entered on the command line when launching the script, e.g.:
python scriptname.py email@youremail.com password
How can I automate this with the Python scheduler?
The code that is in 'scriptname.py' is:
# LinkedBot.py
import argparse, os, time
import urlparse, random
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

def getPeopleLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'profile/view?id=' in url:
                links.append(url)
    return links

def getJobLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if '/jobs' in url:
                links.append(url)
    return links

def getID(url):
    pUrl = urlparse.urlparse(url)
    return urlparse.parse_qs(pUrl.query)['id'][0]

def ViewBot(browser):
    visited = {}
    pList = []
    count = 0
    while True:
        # sleep to make sure everything loads; add random to make us look human.
        time.sleep(random.uniform(3.5, 6.9))
        page = BeautifulSoup(browser.page_source)
        people = getPeopleLinks(page)
        if people:
            for person in people:
                ID = getID(person)
                if ID not in visited:
                    pList.append(person)
                    visited[ID] = 1
        if pList:  # if there are people to look at, look at them
            person = pList.pop()
            browser.get(person)
            count += 1
        else:  # otherwise find people via the job pages
            jobs = getJobLinks(page)
            if jobs:
                job = random.choice(jobs)
                root = 'http://www.linkedin.com'
                roots = 'https://www.linkedin.com'
                if root not in job or roots not in job:
                    job = 'https://www.linkedin.com' + job
                browser.get(job)
            else:
                print "I'm Lost Exiting"
                break
        # Output (make option for this)
        print "[+] " + browser.title + " Visited! \n(" \
            + str(count) + "/" + str(len(pList)) + ") Visited/Queue)"

def Main():
    parser = argparse.ArgumentParser()
    parser.add_argument("email", help="linkedin email")
    parser.add_argument("password", help="linkedin password")
    args = parser.parse_args()

    browser = webdriver.Firefox()
    browser.get("https://linkedin.com/uas/login")
    emailElement = browser.find_element_by_id("session_key-login")
    emailElement.send_keys(args.email)
    passElement = browser.find_element_by_id("session_password-login")
    passElement.send_keys(args.password)
    passElement.submit()
I'm running this on OSX.
I can see at least two different ways of automating the triggering of your script. Since you mention that your script is started this way:
python scriptname.py email@youremail.com password
it means that you start it from a shell. As you want to have it scheduled, a crontab sounds like a perfect answer (see https://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/ for example).
If you really want to use the Python scheduler, you can use subprocess.
In your file using the Python scheduler:
import subprocess
subprocess.call("python scriptname.py email@youremail.com password", shell=True)
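As a side note (a sketch, not part of the original answer), a slightly safer variant avoids shell=True and passes the credentials as a list of arguments, so they are not re-parsed by the shell; the literal credential values are placeholders:

import subprocess

email = "email@youremail.com"   # placeholder
password = "password"           # placeholder

# Passing a list avoids shell quoting/injection issues.
subprocess.call(["python", "scriptname.py", email, password])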
See also: What is the best way to call a Python script from another Python script?
About the code itself
LinkedIn REST API
Have you tried using LinkedIn's REST API instead of retrieving heavy pages, filling in a form and sending it back?
Your code is prone to break whenever LinkedIn changes some elements on their page, whereas the API is a contract between LinkedIn and its users.
Check here https://developer.linkedin.com/docs/rest-api and there https://developer.linkedin.com/docs/guide/v2/concepts/methods
Credentials
So that you don't have to pass your credentials on the command line (especially your password, which would be readable in clear text in your shell history), you should either
use a config file (with your API key) and read it with ConfigParser (or anything else, depending on the format of your config file: JSON, Python, etc.), as sketched below,
or set them in environment variables.
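A minimal sketch of the config-file approach (the file name and the section/option names are assumptions for illustration; the module is called configparser on Python 3):

# config.ini (keep it out of version control):
# [linkedin]
# email = email@youremail.com
# password = secret

import ConfigParser

config = ConfigParser.ConfigParser()
config.read("config.ini")
email = config.get("linkedin", "email")
password = config.get("linkedin", "password")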
For the scheduling
Using Cron
Moreover, for the scheduling part, you can use cron.
Using Celery
If you're looking for a 100% Python solution, you can use the excellent Celery project. Check its periodic tasks.
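Purely as an illustration (not from the original answer), a Celery beat schedule for running the bot once a day might look roughly like this; the broker URL, module/task names and the schedule are assumptions, and the beat_schedule key assumes Celery 4+:

from celery import Celery
from celery.schedules import crontab

app = Celery('tasks', broker='redis://localhost:6379/0')  # broker URL is an assumption

@app.task
def run_linkedbot():
    # Call the bot's entry point here, reading credentials from a
    # config file or environment variables rather than sys.argv.
    pass

app.conf.beat_schedule = {
    'run-linkedbot-daily': {
        'task': 'tasks.run_linkedbot',          # hypothetical module name 'tasks'
        'schedule': crontab(hour=8, minute=0),  # every day at 08:00
    },
}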
You can pass the args to the Python scheduler via sched.scheduler.enter():
scheduler.enter(delay, priority, action, argument=(), kwargs={})
Schedule an event for delay more time units. Other than the relative time, the other arguments, the effect and the return value are the same as those for enterabs().
Changed in version 3.3: argument parameter is optional.
New in version 3.3: kwargs parameter was added.
>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(a='default'):
... print("From print_time", time.time(), a)
...
>>> def print_some_times():
... print(time.time())
... s.enter(10, 1, print_time)
... s.enter(5, 2, print_time, argument=('positional',))
... s.enter(5, 1, print_time, kwargs={'a': 'keyword'})
... s.run()
... print(time.time())
...
>>> print_some_times()
930343690.257
From print_time 930343695.274 positional
From print_time 930343695.275 keyword
From print_time 930343700.273 default
930343700.276
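Applied to the question, a rough, untested sketch: the scheduler runs the script via subprocess and re-arms itself each time, so it keeps firing once a day. The daily interval and the literal credential values are placeholders:

import sched
import subprocess
import time

s = sched.scheduler(time.time, time.sleep)
ONE_DAY = 24 * 60 * 60

def run_script(email, password):
    subprocess.call(["python", "scriptname.py", email, password])
    # Re-schedule ourselves so the script keeps running once a day.
    s.enter(ONE_DAY, 1, run_script, argument=(email, password))

s.enter(0, 1, run_script, argument=("email@youremail.com", "password"))
s.run()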

How can I leverage Luigi for OpenStack tasks

I want to use Luigi to manage workflows in OpenStack. I am new to Luigi. For a start, I just want to authenticate myself to OpenStack and then fetch the image list, flavor list, etc. using Luigi. Any help will be appreciated.
I am not good with Python, but I tried the code below. I am also not able to list images. Error: glanceclient.exc.HTTPNotFound: The resource could not be found. (HTTP 404)
import luigi
import os_client_config
import glanceclient.v2.client as glclient
from luigi.mock import MockFile
import sys
import os

def get_credentials():
    d = {}
    d['username'] = 'X'
    d['password'] = 'X'
    d['auth_url'] = 'X'
    d['tenant_name'] = 'X'
    d['endpoint'] = 'X'
    return d

class LookupOpenstack(luigi.Task):
    d = []

    def requires(self):
        pass

    def output(self):
        gc = glclient.Client(**get_credentials())
        images = gc.images.list()
        print("images", images)
        for i in images:
            print(i)
        return MockFile("images", mirror_on_stderr=True)

    def run(self):
        pass

if __name__ == '__main__':
    luigi.run(["--local-scheduler"], LookupOpenstack())
The general approach to this is just to write Python code that performs the tasks you want using the OpenStack API (https://docs.openstack.org/user-guide/sdk.html). It looks like the error you are getting is addressed on the OpenStack site: https://ask.openstack.org/en/question/90071/glanceclientexchttpnotfound-the-resource-could-not-be-found-http-404/
You would then wrap this code in Luigi tasks as appropriate. There's nothing special about doing this with OpenStack, except that you must define the output() of your Luigi tasks to match up with an output that indicates the task is done. Right now the work is being done in the output() method; it should be in the run() method, and output() should just describe what to look for to indicate that run() is complete, so the task doesn't run() again when required by another task if it is already done.
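A rough, untested sketch of that split (the target file name is an assumption, and get_credentials() is the helper from the question): the work moves into run(), and output() merely names the file that proves the work happened.

import luigi
import glanceclient.v2.client as glclient

class LookupOpenstack(luigi.Task):

    def output(self):
        # Only describes where the result will be; no work happens here.
        return luigi.LocalTarget("images.txt")

    def run(self):
        # The actual work: fetch the image list and write it out.
        gc = glclient.Client(**get_credentials())
        with self.output().open("w") as f:
            for image in gc.images.list():
                f.write("{}\n".format(image))

if __name__ == "__main__":
    luigi.build([LookupOpenstack()], local_scheduler=True)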
It's really impossible to say more without understanding more details of your workflow.

How to emit GObject signal once?

First, I have less than one year of experience with Python. I normally have problems with data usage with my ISP, in that I discover too late that I have overused my data allocation. So, to avoid this problem, I thought of creating a script that runs as soon as I tether my phone to my PC. The script scrapes data from my ISP's website and returns my data bundle balance.
I found that pyudev is useful in such a case, where I would like to constantly monitor my USB ports' activity. Part of the script is shown below. The problem is that identify_phone is called three times (in my experience), which in turn calls the scraping method three times, whereas I would like it to be called once. I tried using the global value but it does not work as expected. I have searched online but resources on pyudev are scarce.
Any help would be appreciated (including any revision of my code :P).
import glib
import re
import subprocess
import requests
import bs4
import datetime
import sys
from selenium import webdriver
from pyudev import Context, Monitor

def identify_phone(observer, device):
    global last_updated
    current_time = datetime.datetime.now()
    time_diff = current_time - last_updated
    if time_diff.seconds < 10:  # 300:
        pass
    else:
        # print('\Checking USB ports...')
        try:
            # check if the specific usb device (phone) is connected
            tout = subprocess.check_output("lsusb | grep 1234:1234", shell=True)
        except subprocess.CalledProcessError:
            tout = None
        if tout is not None:
            device_found = True
        else:
            device_found = False
        last_updated = datetime.datetime.now()
        get_network_data()  # scrapes ISP website

try:
    device_found = False
    last_updated = datetime.datetime.now()
    try:
        from pyudev.glib import MonitorObserver
    except ImportError:
        from pyudev.glib import GUDevMonitorObserver as MonitorObserver
    context = Context()
    monitor = Monitor.from_netlink(context)
    monitor.filter_by(subsystem='usb')
    observer = MonitorObserver(monitor)
    observer.connect('device-added', identify_phone)
    monitor.start()
    glib.MainLoop().run()
except KeyboardInterrupt:
    print('\nShutdown requested.\nExiting gracefully...')
    sys.exit(0)

Why does Python run the code outside of __main__ every time?

I'm just wondering about the behaviour of Python and how it really works. I have a script that runs and collects all followers and friends of an account.
This is the code:
#!/usr/bin/env python
import pymongo
import tweepy
from pymongo import MongoClient
from sweepy.get_config import get_config

config = get_config()

consumer_key = config.get('PROCESS_TWITTER_CONSUMER_KEY')
consumer_secret = config.get('PROCESS_TWITTER_CONSUMER_SECRET')
access_token = config.get('PROCESS_TWITTER_ACCESS_TOKEN')
access_token_secret = config.get('PROCESS_TWITTER_ACCESS_TOKEN_SECRET')

MONGO_URL = config.get('MONGO_URL')
MONGO_PORT = config.get('MONGO_PORT')
MONGO_USERNAME = config.get('MONGO_USERNAME')
MONGO_PASSWORD = config.get('MONGO_PASSWORD')

client = MongoClient(MONGO_URL, int(MONGO_PORT))

print 'Establishing Tweepy connection'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, retry_count=3)

db = client.tweets
db.authenticate(MONGO_USERNAME, MONGO_PASSWORD)
raw_tweets = db.raw_tweets
users = db.users

def is_user_in_db(screen_name):
    return get_user_from_db(screen_name) is None

def get_user_from_db(screen_name):
    return users.find_one({'screen_name' : screen_name})

def get_user_from_twitter(user_id):
    return api.get_user(user_id)

def get_followers(screen_name):
    users = []
    for i, page in enumerate(tweepy.Cursor(api.followers, id=screen_name, count=200).pages()):
        print 'Getting page {} for followers'.format(i)
        users += page
    return users

def get_friends(screen_name):
    users = []
    for i, page in enumerate(tweepy.Cursor(api.friends, id=screen_name, count=200).pages()):
        print 'Getting page {} for friends'.format(i)
        users += page
    return users

def get_followers_ids(screen_name):
    ids = []
    for i, page in enumerate(tweepy.Cursor(api.followers_ids, id=screen_name, count=5000).pages()):
        print 'Getting page {} for followers ids'.format(i)
        ids += page
    return ids

def get_friends_ids(screen_name):
    ids = []
    for i, page in enumerate(tweepy.Cursor(api.friends_ids, id=screen_name, count=5000).pages()):
        print 'Getting page {} for friends ids'.format(i)
        ids += page
    return ids

def process_user(user):
    screen_name = user['screen_name']
    print 'Processing user : {}'.format(screen_name)
    if is_user_in_db(screen_name):
        user['followers_ids'] = get_followers_ids(screen_name)
        user['friends_ids'] = get_friends_ids(screen_name)
        users.insert_one(user)
    else:
        print '{} exists!'.format(screen_name)
    print 'End processing user : {}'.format(screen_name)

if __name__ == "__main__":
    for doc in raw_tweets.find({'processed' : {'$exists': False}}):
        print 'Start processing'
        try:
            process_user(doc['user'])
        except KeyError:
            pass
        try:
            process_user(doc['retweeted_status']['user'])
        except KeyError:
            pass
        raw_tweets.update_one({'_id': doc['_id']}, {'$set':{'processed':True}})
What I keep getting from the log is
Rate limit reached. Sleeping for: 889
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Rate limit reached. Sleeping for: 891
I'm confused because 'Establishing Tweepy connection' is outside of __main__ and shouldn't be running over and over again. Is this just how Python behaves, or is there a bug in my code?
When you run or import a Python script, every top-level statement in it is executed (when imported, this only happens the first time the module is imported, or when you do reload(module)). A few normally present kinds of statement are worth noting:
The execution of a function definition means that the function is being defined (the body of the function is not executed).
The execution of an import statement will import the module.
The execution of a class definition implies that its body is executed; mostly it will contain function definitions, so it's mostly defining methods.
The execution of an if statement means that the controlling expression is first evaluated, and depending on the result the body may be executed.
The execution of an assignment means that the right-hand-side expression is evaluated, with possible side effects.
This is why one normally doesn't put code directly at the top level of a Python script: it will be executed on import. If the file should work as both a script and a module, the code that should run only when it is run as a script should be enclosed in an if __name__ == '__main__' block.
Unless you need global variables, your script would be a bunch of function and class definitions followed by:
if __name__ == "__main__":
    code_to_be_executed_iff_run_as_a_script()
else:
    code_to_be_executed_iff_imported()
If you need global variables, you will sometimes have to take special care to avoid side effects when running/importing the module.
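Applied to the script in the question, a minimal sketch (untested, heavily simplified) of one way to do this is to move the connection setup into a function, so that merely importing the module no longer prints 'Establishing Tweepy connection' or opens connections:

import tweepy
from pymongo import MongoClient
from sweepy.get_config import get_config

def make_api_and_db():
    # All side effects (printing, connecting) now live inside a function.
    config = get_config()
    print 'Establishing Tweepy connection'
    auth = tweepy.OAuthHandler(config.get('PROCESS_TWITTER_CONSUMER_KEY'),
                               config.get('PROCESS_TWITTER_CONSUMER_SECRET'))
    auth.set_access_token(config.get('PROCESS_TWITTER_ACCESS_TOKEN'),
                          config.get('PROCESS_TWITTER_ACCESS_TOKEN_SECRET'))
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
    client = MongoClient(config.get('MONGO_URL'), int(config.get('MONGO_PORT')))
    return api, client.tweets

if __name__ == "__main__":
    api, db = make_api_and_db()
    # ... processing loop goes here ...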
If you want code that runs only when imported, it would go in the else clause of the normal __main__ guard:
if __name__ == '__main__':
    print("Run as a script")
else:
    print("Imported as a module")
That's exactly the reason why there's
if __name__ == "__main__":
Before this condition you should have function and class definitions, and after it, the code you would like to run.
The reason for this is that the __name__ variable is different when your file is imported (as every Python file is also an importable module) versus when it is run, e.g. with python myfile.py.
Create a file, e.g. myfile.py:
# content of myfile.py
print(__name__)
When you run it, it will print __main__:
$ python myfile.py
__main__
But during import it carries the name of the imported module (myfile):
$ python
>>> import myfile
myfile

Gearman + SQLAlchemy - keep losing MySQL thread

I have a Python script that sets up several Gearman workers. They call into some methods on SQLAlchemy models I have that are also used by a Pylons app.
Everything works fine for an hour or two, then the MySQL thread gets lost and all queries fail. I cannot figure out why the thread is getting lost (I get the same results on 3 different servers) when I am setting such a low value for pool_recycle. Also, why wouldn't a new connection be created?
Any ideas of things to investigate?
import gearman
import json
import ConfigParser
import sys
from sqlalchemy import create_engine

class JSONDataEncoder(gearman.DataEncoder):
    @classmethod
    def encode(cls, encodable_object):
        return json.dumps(encodable_object)
    @classmethod
    def decode(cls, decodable_string):
        return json.loads(decodable_string)

# get the ini path and load the gearman server ips:ports
try:
    ini_file = sys.argv[1]
    lib_path = sys.argv[2]
except Exception:
    raise Exception("ini file path or anypy lib path not set")

# get the config
config = ConfigParser.ConfigParser()
config.read(ini_file)
sqlachemy_url = config.get('app:main', 'sqlalchemy.url')
gearman_servers = config.get('app:main', 'gearman.mysql_servers').split(",")

# add anypy include path
sys.path.append(lib_path)
from mypylonsapp.model.user import User, init_model
from mypylonsapp.model.gearman import task_rates

# sqlalchemy setup, recycle connection every hour
engine = create_engine(sqlachemy_url, pool_recycle=3600)
init_model(engine)

# Gearman worker setup
gm_worker = gearman.GearmanWorker(gearman_servers)
gm_worker.data_encoder = JSONDataEncoder()

# register the workers
gm_worker.register_task('login', User.login_gearman_worker)
gm_worker.register_task('rates', task_rates)

# work
gm_worker.work()
I've seen this across the board for Ruby, PHP, and Python regardless of the DB library used. I couldn't find how to fix this the "right" way, which is to use mysql_ping, but there is a SQLAlchemy solution, as explained better here: http://groups.google.com/group/sqlalchemy/browse_thread/thread/9412808e695168ea/c31f5c967c135be0
As someone in that thread points out, setting the recycle option to True is equivalent to setting it to 1. A better solution might be to find your MySQL connection timeout value and set the recycle threshold to 80% of it.
You can get that value from a live server by looking up this variable: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_connect_timeout
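A rough sketch of that idea (untested; the variable name comes from the page linked above, the 80% factor is the rule of thumb just mentioned, and Engine.execute() assumes the 0.5-era SQLAlchemy this answer links to, not 1.4+/2.0):

from sqlalchemy import create_engine

# Temporary engine just to ask MySQL for its timeout value.
probe = create_engine(sqlachemy_url)
timeout = probe.execute("SHOW VARIABLES LIKE 'connect_timeout'").fetchone()[1]

# Recycle pooled connections well before MySQL would drop them.
engine = create_engine(sqlachemy_url, pool_recycle=int(int(timeout) * 0.8))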
Edit:
It took me a bit to find the authoritative documentation on using pool_recycle:
http://www.sqlalchemy.org/docs/05/reference/sqlalchemy/connections.html?highlight=pool_recycle
