Why does Python run the code outside of __main__ every time? - python

I'm wondering about this behaviour of Python and how it really works. I have a script to run and collect all followers and friends of an account.
This is the code.
#!/usr/bin/env python
import pymongo
import tweepy
from pymongo import MongoClient
from sweepy.get_config import get_config

config = get_config()
consumer_key = config.get('PROCESS_TWITTER_CONSUMER_KEY')
consumer_secret = config.get('PROCESS_TWITTER_CONSUMER_SECRET')
access_token = config.get('PROCESS_TWITTER_ACCESS_TOKEN')
access_token_secret = config.get('PROCESS_TWITTER_ACCESS_TOKEN_SECRET')
MONGO_URL = config.get('MONGO_URL')
MONGO_PORT = config.get('MONGO_PORT')
MONGO_USERNAME = config.get('MONGO_USERNAME')
MONGO_PASSWORD = config.get('MONGO_PASSWORD')

client = MongoClient(MONGO_URL, int(MONGO_PORT))

print 'Establishing Tweepy connection'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, retry_count=3)

db = client.tweets
db.authenticate(MONGO_USERNAME, MONGO_PASSWORD)
raw_tweets = db.raw_tweets
users = db.users

def is_user_in_db(screen_name):
    return get_user_from_db(screen_name) is None

def get_user_from_db(screen_name):
    return users.find_one({'screen_name': screen_name})

def get_user_from_twitter(user_id):
    return api.get_user(user_id)

def get_followers(screen_name):
    users = []
    for i, page in enumerate(tweepy.Cursor(api.followers, id=screen_name, count=200).pages()):
        print 'Getting page {} for followers'.format(i)
        users += page
    return users

def get_friends(screen_name):
    users = []
    for i, page in enumerate(tweepy.Cursor(api.friends, id=screen_name, count=200).pages()):
        print 'Getting page {} for friends'.format(i)
        users += page
    return users

def get_followers_ids(screen_name):
    ids = []
    for i, page in enumerate(tweepy.Cursor(api.followers_ids, id=screen_name, count=5000).pages()):
        print 'Getting page {} for followers ids'.format(i)
        ids += page
    return ids

def get_friends_ids(screen_name):
    ids = []
    for i, page in enumerate(tweepy.Cursor(api.friends_ids, id=screen_name, count=5000).pages()):
        print 'Getting page {} for friends ids'.format(i)
        ids += page
    return ids

def process_user(user):
    screen_name = user['screen_name']
    print 'Processing user : {}'.format(screen_name)
    if is_user_in_db(screen_name):
        user['followers_ids'] = get_followers_ids(screen_name)
        user['friends_ids'] = get_friends_ids(screen_name)
        users.insert_one(user)
    else:
        print '{} exists!'.format(screen_name)
    print 'End processing user : {}'.format(screen_name)

if __name__ == "__main__":
    for doc in raw_tweets.find({'processed': {'$exists': False}}):
        print 'Start processing'
        try:
            process_user(doc['user'])
        except KeyError:
            pass
        try:
            process_user(doc['retweeted_status']['user'])
        except KeyError:
            pass
        raw_tweets.update_one({'_id': doc['_id']}, {'$set': {'processed': True}})
What I keep getting from the log is
Rate limit reached. Sleeping for: 889
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Rate limit reached. Sleeping for: 891
I'm wondering because 'Establishing Tweepy connection' is printed outside of __main__, so it shouldn't be running over and over again. Why does Python behave like that, or is there a bug in my code?

When you run or import a Python script, every statement in it is executed (when imported, this only happens the first time the module is imported, or when you do reload(module)). A few normally present statement types are worth noting:
The execution of a function definition means that the function is being defined (not that the body of the function is executed).
The execution of an import statement imports the module.
The execution of a class definition implies that its body is executed; mostly it will contain function definitions, so it is mostly defining methods.
The execution of an if statement means that the controlling expression is first evaluated, and depending on the result the body may be executed.
The execution of an assignment means that the right-hand-side expression is evaluated, with possible side effects.
This is why one normally doesn't put code directly at the top level of a Python script - it will be executed. If the file should work both as a script and as a module, the code that should run only when it is executed as a script should be enclosed in an if __name__ == '__main__' statement.
Unless you need global variables, your script would be a bunch of function and class definitions followed by:
if __name__ == "__main__":
    code_to_be_executed_iff_run_as_a_script()
else:
    code_to_be_executed_iff_imported()
If you need global variables, you sometimes have to take special care to avoid side effects when running/importing the module.
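This run-once-per-import behavior is easy to observe. The sketch below creates two throwaway modules in a temporary directory (the names demo_state and demo_mod are made up for illustration): demo_state holds a counter, and demo_mod's top level increments it, so the counter shows exactly how often that top level ran.

```python
import importlib
import os
import sys
import tempfile

# Write two throwaway modules into a temp dir and make them importable.
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
with open(os.path.join(tmpdir, "demo_state.py"), "w") as f:
    f.write("count = 0\n")
with open(os.path.join(tmpdir, "demo_mod.py"), "w") as f:
    f.write("import demo_state\ndemo_state.count += 1\n")

import demo_state
import demo_mod                 # first import: top level executes
assert demo_state.count == 1
import demo_mod                 # already cached in sys.modules: nothing runs
assert demo_state.count == 1
importlib.reload(demo_mod)      # reload re-executes the top level
assert demo_state.count == 2
```

The second import is a pure cache lookup in sys.modules; only the explicit reload runs the module body again.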

If you want code that runs only when imported, it would go in the else clause of the normal __main__ guard:
if __name__ == '__main__':
    print("Run as a script")
else:
    print("Imported as a module")

That's exactly the reason why there's
if __name__ == "__main__":
Before this condition you should have function and class definitions, and after it the code you would like to run.
The reason for this is that the __name__ variable is different when your file is imported (as every Python file is also an importable module) and when it is run directly, e.g. python myfile.py.
Create file e.g. myfile.py:
# content of myfile.py
print(__name__)
When you run it, it will print __main__.
$ python myfile.py
__main__
But during import it carries the name of the imported module (myfile).
$ python
>>> import myfile
myfile

Related

Python mock launches whole program instead of substituting input to specific method

I have a program, like:
module "Main":
import SymbolCalculator as sc

# Defining constants:
TEXT_INTRO = sc.TEXT_INTRO
TEXT_INVITE = "Please print any sentence below:\n"
sentence = ""

# Printing introduction to the program:
print TEXT_INTRO

def getting_result():
    # Getting input string from console
    sentence = sc.get_input_from_prompt(TEXT_INVITE)
    # Forming result list via methods defined in SymbolCalculator module
    return sc.characters_calculator(sentence)

result_list = getting_result()
# Computing summary via method defined in SymbolCalculator module
sc.printing_summary(sentence, result_list)
# Printing tuples with characters and their occurrences raw-by-raw
sc.printing_list(result_list)
raw_input("Please press any button to quit the program.")
print 'Bye!!!'
And I'm trying to create a simple unit test with mocked raw_input (updated):
from unittest import TestCase, main
from mock import patch
from Ex_41_42_SymbolCalculatorMain import getting_result

class Ex_4a1_SymbolCalculatorUnitTestWMock(TestCase):

    # @patch('Ex_41_42_SymbolCalculator.get_input_from_prompt', return_value='aabc')
    def test_valid_input(self):
        with patch('__builtin__.raw_input', return_value='aaabbc') as _raw_input:
            self.assertEqual(getting_result(), [('a', 3), ('b', 2), ('c', 1)])
            _raw_input.assert_called_once_with('Please print any sentence below:\n')

    @patch('Ex_41_42_SymbolCalculator.get_input_from_prompt', return_value='')
    def test_empty_input(self, mock):
        self.assertEqual(getting_result(), [])

if __name__ == "__main__":
    main()
I also tried decorating the tested method itself, like:
...
@patch('Ex_41_42_SymbolCalculator.get_input_from_prompt', return_value='aabc')
...
My problem is that when I launch the test, the whole "Main" module runs at the moment getting_result is called. It starts from the very beginning, asks me for input at the command prompt, etc. So not only the test but the whole regular program runs.
I expected only the getting_result method to be called, supplied with the mocked return_value.
Please advise.
When you import a module, all the code in the module is run. It doesn't matter that you used from Ex_41_42_SymbolCalculatorMain import getting_result instead of import Ex_41_42_SymbolCalculatorMain; you're still importing the module. There's no way to just "get" one function without executing the rest of the code in the module.
Instead, you should put that code in a function, and then call it from within an if __name__ == "__main__" block, like this:
def getting_result():
    # Getting input string from console
    sentence = sc.get_input_from_prompt(TEXT_INVITE)
    # Forming result list via methods defined in SymbolCalculator module
    return sc.characters_calculator(sentence)

def do_stuff():
    print TEXT_INTRO
    result_list = getting_result()
    # Computing summary via method defined in SymbolCalculator module
    sc.printing_summary(sentence, result_list)
    # Printing tuples with characters and their occurrences raw-by-raw
    sc.printing_list(result_list)
    raw_input("Please press any button to quit the program.")
    print 'Bye!!!'

if __name__ == "__main__":
    do_stuff()
Then do_stuff() will only be run if you execute that file directly, not if you import it. This will allow you to import the module without running the stuff in do_stuff. You can learn more about the __main__ business by searching this site for zillions of questions about it (such as this one).
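For reference, the same patching pattern in Python 3 (where raw_input became the built-in input) can be sketched as follows. The getting_result here is a self-contained stand-in for the real module's function, not the asker's actual code:

```python
from collections import Counter
from unittest import mock

def getting_result():
    # Stand-in: read a sentence and return (char, count) tuples,
    # sorted by decreasing occurrence, then alphabetically.
    sentence = input("Please print any sentence below:\n")
    return sorted(Counter(sentence).items(), key=lambda kv: (-kv[1], kv[0]))

# Patch the built-in input() only for the duration of the call.
with mock.patch("builtins.input", return_value="aaabbc") as fake_input:
    result = getting_result()

assert result == [("a", 3), ("b", 2), ("c", 1)]
fake_input.assert_called_once_with("Please print any sentence below:\n")
```

Because the patch target is the built-in, no module-level code runs at patch time; only the call inside the with block sees the fake input.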

executescript in python not passing flow file to next processor

I am trying to run an ExecuteScript processor in Apache NiFi using Python, but I am having a problem passing the flow file to the next processor in my data flow.
If I run the standalone flow file creation and writing snippet, it works and I can read the flow file in the next processor. But when I try to enrich it, it simply does not pass the flow file on. In fact no error is generated, and I have no clue how to proceed. I am a bit new to Python and NiFi and would appreciate your help with this particular issue.
Below is the code I am using; as you can see it is very simple. I just want to create a flow file and write some string to it using some logic. But no luck so far.
import urllib2
import json
import datetime
import csv
import time
import sys
import traceback
from org.apache.nifi.processor.io import OutputStreamCallback
from org.python.core.util import StringUtil

class WriteContentCallback(OutputStreamCallback):
    def __init__(self, content):
        self.content_text = content

    def process(self, outputStream):
        try:
            outputStream.write(StringUtil.toBytes(self.content_text))
        except:
            traceback.print_exc(file=sys.stdout)
            raise

page_id = "dsssssss"
access_token = "sdfsdfsf%sdfsdf"

def scrapeFacebookPageFeedStatus(page_id, access_token):
    flowFile = session.create()
    flowFile = session.write(flowFile, WriteContentCallback("Hello there this is my data"))
    flowFile = session.write()
    session.transfer(flowFile, REL_SUCCESS)
    print "\nDone!\n%s Statuses Processed in %s" % \
        (num_processed, datetime.datetime.now() - scrape_starttime)

if __name__ == '__main__':
    scrapeFacebookPageFeedStatus(page_id, access_token)
I believe the problem is the check for __main__:
if __name__ == '__main__':
    scrapeFacebookPageFeedStatus(page_id, access_token)
In NiFi's embedded Jython interpreter the script is not run as __main__; __builtin__ was the actual module name in my experiment, so the guard never fires and the function is never called. You could either remove that check, or add a different one if you want to preserve your separate testing path.
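The effect is easy to reproduce outside NiFi: when a host application executes a script with its own globals dict, __name__ is whatever the host set it to. In this sketch the name "not_main" is just an illustrative value standing in for whatever the embedded interpreter uses:

```python
script = """
calls = []

def do_work():
    calls.append("ran")

if __name__ == '__main__':
    do_work()          # never reached when the host sets a different name
do_work()              # an unconditional call always runs
"""

# Simulate an embedded interpreter that does not use '__main__'.
env = {"__name__": "not_main"}
exec(compile(script, "<embedded>", "exec"), env)
assert env["calls"] == ["ran"]   # only the unconditional call executed
```

This is why removing the guard (or calling the function unconditionally) makes the script body actually run inside the host.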

python gettext: specify locale in _()

I am looking for a way to set the language on the fly when requesting a translation for a string with gettext. I'll explain why:
I have a multithreaded bot that responds to users by text on multiple servers, and thus needs to reply in different languages.
The documentation of gettext states that, to change the locale while running, you should do the following:
import gettext # first, import gettext
lang1 = gettext.translation('myapplication', languages=['en']) # Load every translations
lang2 = gettext.translation('myapplication', languages=['fr'])
lang3 = gettext.translation('myapplication', languages=['de'])
# start by using language1
lang1.install()
# ... time goes by, user selects language 2
lang2.install()
# ... more time goes by, user selects language 3
lang3.install()
But this does not apply in my case, as the bot is multithreaded.
Imagine the two following snippets running at the same time:
import time
import gettext
lang1 = gettext.translation('myapplication', languages=['fr'])
lang1.install()
message(_("Loading a dummy task")) # This should be in french, and it will
time.sleep(10)
message(_("Finished loading")) # This should be in french too, but it wont :'(
and
import time
import gettext
lang = gettext.translation('myapplication', languages=['en'])
time.sleep(3) # Not requested on the same time
lang.install()
message(_("Loading a dummy task")) # This should be in english, and it will
time.sleep(10)
message(_("Finished loading")) # This should be in english too, and it will
You can see that messages are sometimes translated in the wrong locale.
But if I could do something like _("string", lang="FR"), the problem would disappear!
Have I missed something, or am I using the wrong module for the task?
I'm using Python 3.
While the above solutions seem to work, they don’t play well with the conventional _() function that aliases gettext(). But I wanted to keep that function, because it’s used to extract translation strings from the source (see docs or e.g. this blog).
Because my module runs in a multi-process and multi-threaded environment, using the application’s built-in namespace or a module’s global namespace wouldn’t work because _() would be a shared resource and subject to race conditions if multiple threads install different translations.
So, first I wrote a short helper function that returns a translation closure:
import gettext

def get_translator(lang: str = "en"):
    trans = gettext.translation("foo", localedir="/path/to/locale", languages=(lang,))
    return trans.gettext
And then, in functions that use translated strings, I assign that translation closure to the name _, making it the desired _() function in the local scope of my function without polluting a globally shared namespace:
def some_function(...):
    _ = get_translator()  # Pass whatever language is needed.
    log.info(_("A translated log message!"))
(Extra brownie points for wrapping the get_translator() function into a memoizing cache to avoid creating the same closures too many times.)
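One way to earn those brownie points is functools.lru_cache. A minimal sketch, where the domain "foo" and the localedir are placeholders, and fallback=True is added here so the sketch also runs on a machine without compiled .mo files:

```python
import functools
import gettext

@functools.lru_cache(maxsize=None)
def get_translator(lang: str = "en"):
    # fallback=True returns a NullTranslations (identity mapping) when no
    # catalog is found, so this works even without .mo files installed.
    trans = gettext.translation("foo", localedir="/path/to/locale",
                                languages=(lang,), fallback=True)
    return trans.gettext

def some_function():
    _ = get_translator("fr")      # cached: same closure on every call
    return _("A translated log message!")
```

Repeated calls with the same language return the identical closure object, so no translation catalog is re-read per request.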
You can just create translation objects for each language directly from .mo files:
from babel.support import Translations

def gettext(msg, lang):
    return get_translator(lang).gettext(msg)

def get_translator(lang):
    with open(f"path_to_{lang}_mo_file", "rb") as fp:
        return Translations(fp=fp, domain="name_of_your_domain")
And a dict cache for them can be easily thrown in there too.
I took a moment to whip up a script that uses all the locales available on the system and tries to print a well-known message in them. Note that "all locales" includes mere encoding changes (which are negated by Python anyway), and plenty of translations are incomplete, so do use the fallback.
Obviously, you will also have to make appropriate changes to your use of xgettext (or equivalent) for your real code to identify the translating function.
#!/usr/bin/env python3
import gettext
import os

def all_languages():
    rv = []
    for lang in os.listdir(gettext._default_localedir):
        base = lang.split('_')[0].split('.')[0].split('@')[0]
        if 2 <= len(base) <= 3 and all(c.islower() for c in base):
            if base != 'all':
                rv.append(lang)
    rv.sort()
    rv.append('C.UTF-8')
    rv.append('C')
    return rv

class Domain:
    def __init__(self, domain):
        self._domain = domain
        self._translations = {}

    def _get_translation(self, lang):
        try:
            return self._translations[lang]
        except KeyError:
            # The fact that `fallback=True` is not the default is a serious design flaw.
            rv = self._translations[lang] = gettext.translation(self._domain, languages=[lang], fallback=True)
            return rv

    def get(self, lang, msg):
        return self._get_translation(lang).gettext(msg)

def print_messages(domain, msg):
    domain = Domain(domain)
    for lang in all_languages():
        print(lang, ':', domain.get(lang, msg))

def main():
    print_messages('libc', 'No such file or directory')

if __name__ == '__main__':
    main()
The following example uses the translation directly, as shown in o11c's answer to allow the use of threads:
import gettext
import threading
import time

def translation_function(quit_flag, language):
    lang = gettext.translation('simple', localedir='locale', languages=[language])
    while not quit_flag.is_set():
        print(lang.gettext("Running translator"), ": %s" % language)
        time.sleep(1.0)

if __name__ == '__main__':
    thread_list = list()
    quit_flag = threading.Event()
    try:
        for lang in ['en', 'fr', 'de']:
            t = threading.Thread(target=translation_function, args=(quit_flag, lang,))
            t.daemon = True
            t.start()
            thread_list.append(t)
        while True:
            time.sleep(1.0)
    except KeyboardInterrupt:
        quit_flag.set()
        for t in thread_list:
            t.join()
Output:
Running translator : en
Traducteur en cours d’exécution : fr
Laufenden Übersetzer : de
Running translator : en
Traducteur en cours d’exécution : fr
Laufenden Übersetzer : de
I would have posted this answer if I had known more about gettext. I am leaving my previous answer for folks who really want to continue using _().
The following simple example shows how to use a separate process for each translator:
import gettext
import multiprocessing
import time

def translation_function(language):
    try:
        lang = gettext.translation('simple', localedir='locale', languages=[language])
        lang.install()
        while True:
            print(_("Running translator"), ": %s" % language)
            time.sleep(1.0)
    except KeyboardInterrupt:
        pass

if __name__ == '__main__':
    thread_list = list()
    try:
        for lang in ['en', 'fr', 'de']:
            t = multiprocessing.Process(target=translation_function, args=(lang,))
            t.daemon = True
            t.start()
            thread_list.append(t)
        while True:
            time.sleep(1.0)
    except KeyboardInterrupt:
        for t in thread_list:
            t.join()
The output looks like this:
Running translator : en
Traducteur en cours d’exécution : fr
Laufenden Übersetzer : de
Running translator : en
Traducteur en cours d’exécution : fr
Laufenden Übersetzer : de
When I tried this using threads, I only got an English translation. You could create individual threads in each process to handle connections. You probably do not want to create a new process for each connection.

Windows API hooking python to msvbvm60.dll (rtcMsgBox)

I want to intercept the API calls of a process to know when it calls the rtcMsgBox API of the msvbvm60 DLL.
I have tried it with this code, but it does not seem to work:
from winappdbg import Debug, EventHandler
import sys
import os

class MyEventHandler( EventHandler ):

    # Add the APIs you want to hook
    apiHooks = {
        'msvbvm60.dll': [('rtcMsgBox', 7)],
        'kernel32.dll': [('CreateFileW', 7)],
    }

    # The pre_ functions are called upon entering the API
    def pre_CreateFileW(self, event, ra, lpFileName, dwDesiredAccess,
                        dwShareMode, lpSecurityAttributes, dwCreationDisposition,
                        dwFlagsAndAttributes, hTemplateFile):
        fname = event.get_process().peek_string(lpFileName, fUnicode=True)
        print "CreateFileW: %s" % (fname)

    # The post_ functions are called upon exiting the API
    def post_CreateFileW(self, event, retval):
        if retval:
            print 'Succeeded (handle value: %x)' % (retval)
        else:
            print 'Failed!'

if __name__ == "__main__":
    if len(sys.argv) < 2 or not os.path.isfile(sys.argv[1]):
        print sys.argv[1]
        print "\nUsage: %s <File to monitor> [arg1, arg2, ...]\n" % sys.argv[0]
        sys.exit()

    # Instance a Debug object, passing it the MyEventHandler instance
    debug = Debug( MyEventHandler() )
    try:
        # Start a new process for debugging
        p = debug.execv(sys.argv[1:], bFollow=True)
        # Wait for the debugged process to finish
        debug.loop()
    # Stop the debugger
    finally:
        debug.stop()
It works with the CreateFileW API of kernel32.dll but not with rtcMsgBox of msvbvm60.dll. Why? What am I doing wrong?
Thanks

Not getting all InfoMessage Events with Python and win32com

I am currently trying to get the percentage complete messages that are returned by the InfoMessage event from ADO (and a SQL server) when running the BACKUP command. (See my previous question for more details).
I have managed to connect to the SQL server and issue it SQL commands, and even get events back. However, when I execute the BACKUP command, the cmd.Execute method blocks until the backup is complete.
But during this time I will get a single InfoMessage event call (which will have a message like "1 Percent Complete") and after that I won't receive any more events.
I have tried this using a stored procedure, where the stored procedure prints 3 messages, and even here I will get the first message and nothing else.
I suspect that I need to call pythoncom.PumpWaitingMessages(), but because the cmd.Execute() call blocks I never get anything of any use.
Can anyone work out how to get more than just a single InfoMessage event?
Below is the code that I'm currently using:
import win32com
import pythoncom
import adodbapi
import time
import win32gui
from win32com.client import gencache

gencache.EnsureModule('{2A75196C-D9EB-4129-B803-931327F72D5C}', 0, 2, 8)

defaultNamedOptArg = pythoncom.Empty
defaultNamedNotOptArg = pythoncom.Empty
defaultUnnamedArg = pythoncom.Empty

global connected
connected = False

class events():
    def OnInfoMessage(self, pError, adStatus, pConnection):
        print 'Info Message'
        a = pError.QueryInterface(pythoncom.IID_IDispatch)
        a = win32com.client.Dispatch(a)
        print a.Description
        print a.Number
        print a.Source
        #print 'B', adStatus
        c = pConnection.QueryInterface(pythoncom.IID_IDispatch)
        c = win32com.client.Dispatch(c)
        print c.Errors.Count
        print c.Errors.Item(0).Description
        return 1

    def OnCommitTransComplete(self, pError=defaultNamedNotOptArg, adStatus=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg): pass

    def OnWillExecute(self, Source=defaultNamedNotOptArg, CursorType=defaultNamedNotOptArg, LockType=defaultNamedNotOptArg, Options=defaultNamedNotOptArg
                      , adStatus=defaultNamedNotOptArg, pCommand=defaultNamedNotOptArg, pRecordset=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg):
        print 'Execute Event'
        return Source

    def OnDisconnect(self, adStatus=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg):
        print 'Disconnected'

    def OnExecuteComplete(self, RecordsAffected=defaultNamedNotOptArg, pError=defaultNamedNotOptArg, adStatus=defaultNamedNotOptArg, pCommand=defaultNamedNotOptArg
                          , pRecordset=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg):
        print 'Execute complete'

    def OnWillConnect(self, ConnectionString=defaultNamedNotOptArg, UserID=defaultNamedNotOptArg, Password=defaultNamedNotOptArg, Options=defaultNamedNotOptArg
                      , adStatus=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg):
        print 'About to connect'

    def OnConnectComplete(self, pError=defaultNamedNotOptArg, adStatus=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg):
        print 'Connected'
        global connected
        connected = True

    def OnBeginTransComplete(self, TransactionLevel=defaultNamedNotOptArg, pError=defaultNamedNotOptArg, adStatus=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg): pass

    def OnRollbackTransComplete(self, pError=defaultNamedNotOptArg, adStatus=defaultNamedNotOptArg, pConnection=defaultNamedNotOptArg): pass

if __name__ == '__main__':
    pythoncom.CoInitialize()
    conn = win32com.client.DispatchWithEvents("ADODB.Connection", events)
    conn.ConnectionString = 'Data Source=HPDX2250RAAZ\\SQLEXPRESS; Provider=SQLOLEDB; Integrated Security=SSPI'
    conn.CommandTimeout = 30
    conn.CursorLocation = 2
    conn.Open(pythoncom.Empty, pythoncom.Empty, pythoncom.Empty, 0x10)
    while not connected:
        #pythoncom.PumpWaitingMessages()
        win32gui.PumpWaitingMessages()
        time.sleep(0.1)
    conn.BeginTrans()
    conn.Errors.Clear()
    cmd = win32com.client.Dispatch("ADODB.Command")
    cmd.ActiveConnection = conn
    cmd.CommandTimeout = 30  #v2.1 Simons
    cmd.CommandText = "EXECUTE [test].[dbo].[Test] "
    print 'Execute'
    cmd.Execute()
    pythoncom.PumpWaitingMessages()
    print 'Called'
    print ''
    print conn.Errors.Count
    conn.RollbackTrans()
    conn.Close()
I was having the same issue; the problem (if you are experiencing the same one) is that the messages are being held back by the SQL Server engine itself. To get around this you need to tell SQL Server not to wait until the end of processing to send the messages, but to send them as they occur.
Try this on for size:
SET @message = 'My message...'
RAISERROR (@message, 10, 1) WITH NOWAIT
This should send the message, and your front end should pick these up as the system goes along.
Hope this helps
I found a workaround that is compatible with pymssql and other drivers. I use the SQL from "Is there a SQL script that I can use to determine the progress of a SQL Server backup or restore process?" plus a background thread that runs that query every X seconds. For notification I use http://pydispatcher.sourceforge.net/ to send back the progress.
# This is a rough extract from my actual code. It probably won't work as is, but it outlines the idea
import dispatch  # Decoupled send of messages, identical to django signals

def monitorBackup(self):
    return self.selectSql(SQL_MONITOR)

def backup(sql):
    con = self.getCon()  # Get new connection, we are in another thread!
    con.execute_query("HERE THE BACKUP SQL")

result = threading.Thread(target=partial(backup, sql))
result.start()

while result.isAlive():
    time.sleep(5)  # with the monitor SQL result, it is possible to get an estimated time to complete and adjust this...
    rows = self.monitorBackup()
    if len(rows) > 0:
        percentage = rows[0].Percent
        self.send(
            msg="%d %%" % percentage,
            action="progress",
            progress=percentage
        )
