Python-ldap search: Size Limit Exceeded

I'm using the python-ldap library to connect to our LDAP server and run queries. The issue I'm running into is that despite setting a size limit on the search, I keep getting SIZELIMIT_EXCEEDED errors on any query that would return too many results. I know that the query itself is working because I will get a result if the query returns a small subset of users. Even if I set the size limit to something absurd, like 1, I'll still get a SIZELIMIT_EXCEEDED on those bigger queries. I've pasted a generic version of my query below. Any ideas as to what I'm doing wrong here?
result = self.ldap.search_ext_s(self.base, self.scope, '(personFirstMiddle=<value>*)', sizelimit=5)

When the LDAP client requests a size limit, that is called a 'client-requested' size limit, and it cannot override the size limit set by the server. The server may impose a limit for the server as a whole, for a particular authorization identity, or for other reasons; whichever the case, the client cannot exceed it. To retrieve more entries than the server allows in a single response, the search has to be issued in multiple parts using the simple paged results control or the virtual list view control.

Here's a Python 3 implementation that I came up with after heavily editing what I found here and in the official documentation. At the time of writing, it works with the pip3 package python-ldap, version 3.2.0.
import ldap
from ldap.controls import SimplePagedResultsControl

def get_list_of_ldap_users():
    hostname = "google.com"
    username = "username_here"
    password = "password_here"
    base = "dc=google,dc=com"

    print(f"Connecting to the LDAP server at '{hostname}'...")
    connect = ldap.initialize(f"ldap://{hostname}")
    connect.set_option(ldap.OPT_REFERRALS, 0)
    connect.simple_bind_s(username, password)

    search_flt = "(personFirstMiddle=<value>*)"  # get all users with a specific middle name
    page_size = 500  # how many users to fetch per page; must not exceed the server's maximum setting (often 1000)
    searchreq_attrlist = ["cn", "sn", "name", "userPrincipalName"]  # change these to the attributes you care about

    req_ctrl = SimplePagedResultsControl(criticality=True, size=page_size, cookie='')
    msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt,
                               attrlist=searchreq_attrlist, serverctrls=[req_ctrl])

    total_results = []
    pages = 0
    while True:  # loop over all of the pages, reusing the same request control; otherwise the search will fail
        pages += 1
        rtype, rdata, rmsgid, serverctrls = connect.result3(msgid)
        for user in rdata:
            total_results.append(user)

        pctrls = [c for c in serverctrls if c.controlType == SimplePagedResultsControl.controlType]
        if pctrls and pctrls[0].cookie:
            # Copy the cookie from the response control to the request control and ask for the next page
            req_ctrl.cookie = pctrls[0].cookie
            msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt,
                                       attrlist=searchreq_attrlist, serverctrls=[req_ctrl])
        else:
            break

    return total_results
This will return a list of all matching users, but you can edit it as required to return exactly what you want without hitting the SIZELIMIT_EXCEEDED issue :)
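For example, a minimal usage sketch (assuming the connection details above have been filled in for your server):

users = get_list_of_ldap_users()
print(f"Fetched {len(users)} entries")
for dn, attrs in users[:5]:      # each result is a (dn, attributes) tuple
    if dn:                       # skip referral entries, which have a dn of None
        print(dn, attrs.get("cn"))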

Related

cx_Oracle SessionPool root of all Flask problems

I created a web service in Flask over uwsgi. I thought I would follow good practice and create a SessionPool with 20 connections to be safe. On each call to a web service endpoint, I acquire a connection from the pool, and at the end I release it.
When using Locust to swarm test the API, I was getting hundreds of failures, nearly 100% on some of the longer responses (30 MB JSON response). Smaller payloads were much better, but with intermittent failures.
The minute I switched back to bad practice and created a brand new connection and cursor within the method itself, all my problems vanished. 100% success on 1000s of stress test calls.
My errors were varied: TNS Bad Packet, incorrect number of connections from pool, request cancelled by user... you name it, it was there.
So it seems I can't use Oracle connection pooling with Flask, nor a single connection at the Flask application level (that also generated errors, not sure why, which is why I switched to connection pooling).
Any advice on creating scalable apps using cx_Oracle in Flask?
My original code was:
import cx_Oracle

pool = cx_Oracle.SessionPool("user", "password", "myserver.company.net:1521/myservice",
                             min=10, max=10, increment=0,
                             getmode=cx_Oracle.SPOOL_ATTRVAL_WAIT, encoding="UTF-8")

def read_products_search(search=None):
    """
    This function responds to a request for /api/products
    with the complete list of matching products
    :return: json string of list of products
    """
    conn_ariel = pool.acquire()
    cursor_ariel = conn_ariel.cursor()
    search = search.lower()
    print("product search term is: ", search)

    # Create the list of products from our data
    sql = """
        SELECT DRUG_PRODUCT_ID, PREFERRED_TRADE_NAME, PRODUCT_LINE, PRODUCT_TYPE, FLAG_PASSIVE, PRODUCT_NUMBER
        FROM DIM_DRUG_PRODUCT
        WHERE lower(PREFERRED_TRADE_NAME) LIKE '%' || :search1 || '%' or lower(PRODUCT_LINE) LIKE '%' || :search2 || '%' or lower(PRODUCT_NUMBER) LIKE '%' || :search3 || '%'
        ORDER BY PREFERRED_TRADE_NAME ASC
    """
    cursor_ariel.execute(sql, {"search1": search, "search2": search, "search3": search})

    products = []
    for row in cursor_ariel.fetchall():
        r = reg(cursor_ariel, row, False)  # reg() is a row-mapping helper defined elsewhere in the app (not shown)
        product = {
            "drug_product_id": r.DRUG_PRODUCT_ID,
            "preferred_trade_name": r.PREFERRED_TRADE_NAME,
            "product_line": r.PRODUCT_LINE,
            "product_type": r.PRODUCT_TYPE,
            "flag_passive": r.FLAG_PASSIVE,
            "product_number": r.PRODUCT_NUMBER
        }
        # logging.info("Adding Product: %r", product)
        products.append(product)

    if len(products) == 0:
        products = None
    pool.release(conn_ariel)
    return products
When you create the pool, use threaded=True.
See How to use Python Flask with Oracle Database.
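For reference, a minimal sketch of the pool creation with threaded=True, reusing the connect string from the question (credentials are placeholders):

import cx_Oracle

# threaded=True tells the Oracle client libraries to operate in thread-safe mode,
# which matters when the pool is shared across Flask/uwsgi worker threads.
pool = cx_Oracle.SessionPool("user", "password", "myserver.company.net:1521/myservice",
                             min=10, max=10, increment=0,
                             getmode=cx_Oracle.SPOOL_ATTRVAL_WAIT,
                             encoding="UTF-8", threaded=True)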

How can I refresh the token with social-auth-app-django?

I use Python Social Auth - Django to log in my users.
My backend is Microsoft, so I can use Microsoft Graph but I don't think that it is relevant.
Python Social Auth deals with authentication but now I want to call the API and for that, I need a valid access token.
Following the documented use cases, I can get to this:
social = request.user.social_auth.get(provider='azuread-oauth2')
response = self.get_json('https://graph.microsoft.com/v1.0/me',
                         headers={'Authorization': social.extra_data['token_type'] + ' '
                                  + social.extra_data['access_token']})
But the access token is only valid for 3600 seconds, so I need to refresh it. I guess I could do it manually, but there must be a better solution.
How can I get an access_token refreshed?
.get_access_token(strategy) refreshes the token automatically if it's expired. You can use it like this:
from social_django.utils import load_strategy
#...
social = request.user.social_auth.get(provider='google-oauth2')
access_token = social.get_access_token(load_strategy())
Using load_strategy() from social.apps.django_app.utils:
social = request.user.social_auth.get(provider='azuread-oauth2')
strategy = load_strategy()
social.refresh_token(strategy)
Now the updated access_token can be retrieved from social.extra_data['access_token'].
The best approach is probably to check whether it needs to be refreshed before using it (customized for AzureAD OAuth2):
import time
from social_django.utils import load_strategy

def get_azuread_oauth2_token(user):
    social = user.social_auth.get(provider='azuread-oauth2')
    if social.extra_data['expires_on'] <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
This is based on the method get_auth_token from AzureADOAuth2. I don't think this method is accessible outside the pipeline; please answer this question if there is any way to do it.
Updates
Update 1 - 20/01/2017
Following an issue requesting an extra-data parameter with the time of the access token refresh, it is now possible to check whether the access_token needs to be updated in every backend.
In future versions (>0.2.1 for the social-auth-core) there will be a new field in extra data:
'auth_time': int(time.time())
And so this works:
def get_token(user, provider):
    social = user.social_auth.get(provider=provider)
    if (social.extra_data['auth_time'] + social.extra_data['expires']) <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
Note: According to the OAuth 2 RFC, all responses should (it's a RECOMMENDED parameter) provide an expires_in, but for most backends (including azuread-oauth2) this value is saved as expires. Be careful to understand how your backend behaves!
An issue on this exists and I will update the answer with the relevant info when it is available.
Update 2 - 17/02/17
Additionally, there is a method in UserMixin called access_token_expired (code) that can be used to check whether the token is still valid (note: this method doesn't account for race conditions, as pointed out in this answer by @SCasey).
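A minimal sketch of using that check, assuming the same azuread-oauth2 provider and a request.user as in the snippets above:

from social_django.utils import load_strategy

social = request.user.social_auth.get(provider='azuread-oauth2')
if social.access_token_expired():          # True once the stored token has passed its expiry
    social.refresh_token(load_strategy())
access_token = social.extra_data['access_token']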
Update 3 - 31/05/17
In Python Social Auth - Core v1.3.0 get_access_token(self, strategy) was introduced in storage.py.
So now:
from social_django.utils import load_strategy
social = request.user.social_auth.get(provider='azuread-oauth2')
response = self.get_json('https://graph.microsoft.com/v1.0/me',
                         headers={'Authorization': '%s %s' % (social.extra_data['token_type'],
                                                              social.get_access_token(load_strategy()))})
Thanks @damio for pointing it out.
@NBajanca's update is almost correct for version 1.0.1.
extra_data['expires_in']
is now
extra_data['expires']
So the code is:
def get_token(user, provider):
    social = user.social_auth.get(provider=provider)
    if (social.extra_data['auth_time'] + social.extra_data['expires']) <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
I'd also recommend subtracting an arbitrary amount of time from that calculation, so that we don't run into a race condition where we check the token 0.01s before expiry and then get an error because the request is sent after expiry. I like a 10-second buffer just to be safe, but it's probably overkill:
def get_token(user, provider):
    social = user.social_auth.get(provider=provider)
    if (social.extra_data['auth_time'] + social.extra_data['expires'] - 10) <= int(time.time()):
        strategy = load_strategy()
        social.refresh_token(strategy)
    return social.extra_data['access_token']
EDIT
@NBajanca points out that expires_in is technically correct per the OAuth2 docs. It seems that for some backends, this may work. The code above using expires is what works with provider="google-oauth2" as of v1.0.1.

How do I connect dbus and policykit to my function in python?

I am making a Python application that has a method which needs root privileges. From https://www.freedesktop.org/software/polkit/docs/0.105/polkit-apps.html I found "Example 2. Accessing the Authority via D-Bus"; the Python version of that code is below. I executed it and thought I'd be able to get root privileges after entering my password, but I'm still getting "permission denied" in my app. This is the function I'm trying to connect:
import dbus
bus = dbus.SystemBus()
proxy = bus.get_object('org.freedesktop.PolicyKit1', '/org/freedesktop/PolicyKit1/Authority')
authority = dbus.Interface(proxy, dbus_interface='org.freedesktop.PolicyKit1.Authority')
system_bus_name = bus.get_unique_name()
subject = ('system-bus-name', {'name' : system_bus_name})
action_id = 'org.freedesktop.policykit.exec'
details = {}
flags = 1 # AllowUserInteraction flag
cancellation_id = '' # No cancellation id
result = authority.CheckAuthorization(subject, action_id, details, flags, cancellation_id)
print result
In the Python code you quoted, does result indicate success or failure? If it fails, you need to narrow down the error, first of all by finding out what the return values of bus, proxy, authority and system_bus_name are. If it succeeds, you need to check how you are using the result.
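For what it's worth, a rough sketch of inspecting that result, reusing the variables from the question's snippet; the polkit CheckAuthorization call returns a structure of is_authorized, is_challenge and a details dictionary:

is_authorized, is_challenge, details = result
if is_authorized:
    print("Authorized: the privileged operation may proceed")
elif is_challenge:
    print("Authorization requires authentication (an agent should prompt the user)")
else:
    print("Not authorized: %s" % (details,))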

Google Contacts API: Temporary internal error, when uploading contact photos in parallel

I need to change the contact photo for a large number of contacts, using the python client for the Google Contacts API 3.0
gdata==2.0.18
The code I'm running is:
import gdata.client
import gdata.contacts.client
import gdata.contacts.data

client = gdata.contacts.client.ContactsClient(source=MY_APP_NAME)
GDClientAuth(client, MY_AUTH)

def _get_valid_contact(contact_id):
    contact = client.GetContact(contact_id)
    if contact.GetPhotoLink() is None:
        # Generate a proper photo link for this contact
        link = gdata.contacts.data.ContactLink()
        link.etag = '*'
        link.href = generate_photo_url(contact)
        link.rel = 'http://schemas.google.com/contacts/2008/rel#photo'
        link.type = 'image/*'
        contact.link.append(link)
    return contact

def upload_photo(contact_id, image_path, image_type, image_size):
    contact = _get_valid_contact(contact_id)
    try:
        client.ChangePhoto(media=image_path,
                           contact_entry_or_url=contact,
                           content_type=image_type,
                           content_length=image_size)
    except gdata.client.RequestError as req:
        if req.status == 412:
            # handle etag mismatches, etc...
            pass
Given a list of valid Google contact ids, if I run the upload_photo method sequentially for each of them, everything goes smoothly, and all the contacts get their photo changed:
for contact_id in CONTACT_ID_LIST:
    upload_photo(contact_id, '/path/to/image', 'image/png', 1234)
However, if I try to upload the photos in parallel (using at least 4 threads), some of them randomly fail with a 500, "A temporary internal problem has occurred. Try again later", as the response to the client.ChangePhoto call. I can retry those photos later, though, and they eventually get updated:
from multiprocessing.pool import ThreadPool

pool = ThreadPool(4)
for contact_id in CONTACT_ID_LIST:
    pool.apply_async(func=upload_photo,
                     args=(contact_id, '/path/to/image', 'image/png', 1234))
The more threads I use, the more frequently the error happens.
The only similar issue I could find is http://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=2507, and it was solved some time ago.
The issue I'm facing now might be different, as it happens randomly and only when running the updates in parallel, so there may be a race condition somewhere at the Google Contacts API end.
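To illustrate the "retry later" workaround mentioned above, a rough sketch that retries the transient 500 responses with a short pause (the retry count and delay are arbitrary choices, and it assumes the failures surface as gdata.client.RequestError with status 500):

import time

def change_photo_with_retry(contact, image_path, image_type, image_size, retries=3, delay=5):
    for attempt in range(retries):
        try:
            client.ChangePhoto(media=image_path,
                               contact_entry_or_url=contact,
                               content_type=image_type,
                               content_length=image_size)
            return
        except gdata.client.RequestError as req:
            # Retry only the transient "temporary internal problem" responses
            if req.status == 500 and attempt < retries - 1:
                time.sleep(delay)
            else:
                raise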

Let python sleep 60 secs after it has crawled every 20 pages

I am trying to collect retweet data from the Chinese microblog Sina Weibo; you can see the code below. However, I keep running into the problem of the IP request limit.
To work around this, I have to add time.sleep() calls to the code. You can see I added the line 'time.sleep(10) # to suppress the ip request limit', so Python sleeps 10 secs after crawling each page of retweets (one page contains 200 retweets).
However, that is still not sufficient to deal with the IP limit.
So I am planning to make Python sleep 60 secs, more systematically, after it has crawled every 20 pages. Your ideas will be appreciated.
import csv
import time

ids = [3388154704688495, 3388154704688494, 3388154704688492]
addressForSavingData = "C:/Python27/weibo/Weibo_repost/repostOwsSave1.csv"
file = open(addressForSavingData, 'wb')  # save to csv file

for id in ids:
    if api.rate_limit_status().remaining_hits >= 205:
        for object in api.counts(ids=id):
            repost_count = object.__getattribute__('rt')
            print id, repost_count
            pages = repost_count / 200 + 2  # why should it be 2? cuz python starts from 0
            for page in range(1, pages):
                time.sleep(10)  # to suppress the ip request limit
                for object in api.repost_timeline(id=id, count=200, page=page):  # get the repost_timeline of a weibo
                    """1.1 reposts"""
                    mid = object.__getattribute__("id")
                    text = object.__getattribute__("text").encode('gb18030')  # add encode here
                    """1.2 reposts.user"""
                    user = object.__getattribute__("user")  # for object in user
                    user_id = user.id
                    """2.1 retweeted_status"""
                    rts = object.__getattribute__("retweeted_status")
                    rts_mid = rts.id  # the id of weibo
                    """2.2 retweeted_status.user"""
                    rtsuser_id = rts.user[u'id']
                    try:
                        w = csv.writer(file, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
                        w.writerow((mid,
                                    user_id, rts_mid,
                                    rtsuser_id, text))  # write it out
                    except:  # Exception of UnicodeEncodeError
                        pass
    elif api.rate_limit_status().remaining_hits < 205:
        sleep_time = api.rate_limit_status().reset_time_in_seconds  # time.time()
        print sleep_time, api.rate_limit_status().reset_time
        time.sleep(sleep_time + 2)

file.close()
Can you not just pace the script instead?
I suggest making your script sleep in between each request instead of making all the requests at the same time, say spread over a minute. This way you will also avoid any flooding bans, and it is considered good behaviour.
Pacing your requests may also allow you to do things more quickly if the server does not time you out for sending too many requests.
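A rough sketch of that kind of pacing (fetch_page is a stand-in for the api.repost_timeline call in the question, and the per-minute budget is an arbitrary example):

import time

REQUESTS_PER_MINUTE = 20
DELAY = 60.0 / REQUESTS_PER_MINUTE   # seconds between requests

for page in range(1, pages):
    fetch_page(page)                 # placeholder for api.repost_timeline(id=id, count=200, page=page)
    time.sleep(DELAY)                # spread the requests evenly instead of bursting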
If there is a per-IP limit, sometimes there are no great and easy solutions. For example, if the server runs Apache, http://opensource.adnovum.ch/mod_qos/ limits bandwidth and connections; specifically it limits:
The maximum number of concurrent requests
Limitation of the bandwidth such as the maximum allowed number of requests per second to an URL or the maximum/minimum of downloaded kbytes per second.
Limits the number of request events per second
Generic request line and header filter to deny unauthorized operations.
Request body data limitation and filtering
The maximum number of allowed connections from a single IP source address, or dynamic keep-alive control.
You may want to start with those. You could send referrer URLs with your requests and make only single connections, not multiple connections (see the sketch below).
You could also refer to this question
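As a rough illustration of the single-connection-plus-referrer suggestion, here is how it might look with the requests library (which is not used in the question's code; the URLs and header value are placeholders):

import requests

session = requests.Session()                                 # reuses one TCP connection via keep-alive
session.headers.update({"Referer": "https://weibo.com/"})    # example referrer header

response = session.get("https://api.example.com/statuses", timeout=10)  # placeholder endpoint
print(response.status_code)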
I figured out a solution:
First, initialise a counter, e.g. 0:
i = 0
Second, inside the for page loop, add the following code:
for page in range(1, 300):
    i += 1
    if (i % 25 == 0):
        print i, "find i which could be exactly divided by 25"
        time.sleep(60)  # pause for 60 secs after every 25 pages, to stay under the IP request limit
