I have an LDIF file as below, and I want to extract only the dn and changetype from it using the LDIF parser of the python-ldap package.
dn: cn=abc, cn=def, cn="dc=grid,dc=mycompany,dc=com", cn=tree, cn=config
changetype: add
objectClass: top
cn: abc
description: myserver
I have written parser code as below:
from ldif import LDIFParser, LDIFRecordList
parser = LDIFRecordList(open("cluster1.ldif", "r"))
parser.parse()
for dn, entry in parser.all_records:
    print(dn)
    print(entry)
but this reads everything yet skips the changetype key, and I am not sure what is causing that. Is there a better way to parse the LDIF file?
Adding output of requested commands in comment:
python -c 'import sys; import ldap; print("\n".join([sys.version, ldap.__version__]))'
2.7.5 (default, May 31 2018, 09:41:32)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
2.4.15
Update
You appear to be using a very old version of the python-ldap module. Version 2.4.15 was released over seven years ago. The current release is 3.3.1, which is what you get if you pip install python-ldap.
Python 2 itself went end-of-life in January 2020, and version 2.7.5 was released back in 2013.
The software you're working with is very old and has bugs that were fixed in more recent versions. You should upgrade.
As I mentioned in comments, I'm not able to reproduce the behavior you've described. If I put your sample LDIF content into cluster1.ldif, I see:
>>> from ldif import LDIFParser, LDIFRecordList
>>> parser = LDIFRecordList(open("cluster1.ldif", "r"))
>>> parser.parse()
>>> for dn, entry in parser.all_records:
...     print(dn)
...     print(entry)
...
cn=abc, cn=def, cn="dc=grid,dc=mycompany,dc=com", cn=tree, cn=config
{'changetype': [b'add'], 'objectClass': [b'top '], 'cn': [b'abc'], 'description': [b'myserver']}
>>>
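If you only want the DN and the changetype, a minimal sketch of filtering what LDIFRecordList already returns might look like this (it assumes every record carries a changetype attribute and that the values are byte strings, as in the output above):
from ldif import LDIFRecordList

parser = LDIFRecordList(open("cluster1.ldif", "r"))
parser.parse()

for dn, entry in parser.all_records:
    # each value is a list of byte strings, e.g. [b'add']
    changetype = entry.get('changetype', [b'unknown'])[0].decode('utf-8')
    print(dn)
    print(changetype)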
You asked about an alternative to using LDIFRecordList. You can of course write your own handler by subclassing LDIFParser, but the underlying LDIF parsing is still going to be the same. That would look something like:
from ldif import LDIFParser

class MyParser(LDIFParser):
    def __init__(self, *args, **kwargs):
        self.records = []
        super().__init__(*args, **kwargs)

    def handle(self, dn, entry):
        self.records.append((dn, entry))

parser = MyParser(open("cluster1.ldif", "r"))
parser.parse()

for dn, entry in parser.records:
    print(dn)
    print(entry)
...but that's really just re-implementing LDIFRecordList, so I don't think you obtain any benefit from doing this.
I'm using Python 3.9.6 and python-ldap 3.3.1. If you continue to see different behavior, would you update your question to include the output of python -c 'import sys; import ldap; print("\n".join([sys.version, ldap.__version__]))'?
If I understand right, you want something like this:
#!/usr/bin/python
from ldif import LDIFParser
from sys import argv

class myParser(LDIFParser):
    def handle(self, dn, entry):
        print("DN:", dn)
        print("CT:", entry['changetype'], "\n")

parser = myParser(open(argv[1]))
parser.parse()
The LDIFParser.handle() method is invoked for every DN in your LDIF, and in it you have direct access to the DN (the dn variable) and the attribute-value pairs (the entry variable), the latter being a dictionary mapping attribute names to lists of values as byte strings.
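Since the values come back as lists of byte strings, you may want to decode them before printing; a minimal sketch of that inside handle() (assuming UTF-8 and that a changetype attribute is present in every record):
class myDecodingParser(LDIFParser):
    def handle(self, dn, entry):
        print("DN:", dn)
        # entry['changetype'] is a list of byte strings, e.g. [b'add']
        print("CT:", entry['changetype'][0].decode('utf-8'))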
Does that help?
Related
I am trying to use the Kaggle api. I have downloaded kaggle using pip and moved kaggle.json to ~/.kaggle, but I haven't been able to run kaggle on Command Prompt. It was not recognized. I suspect it is because I have not accomplished the step "ensure python binaries are on your path", but honestly I am not sure what it means. Here is the error message when I try to download a dataset:
>>> sys.version
'3.9.1 (tags/v3.9.1:1e5d33e, Dec 7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)]'
>>> import kaggle
>>> kaggle datasets list -s demographics
File "<stdin>", line 1
kaggle datasets list -s demographics
^
SyntaxError: invalid syntax
kaggle is a Python module, but it should also install a script with the same name, kaggle, which you can run in a console/terminal/PowerShell/cmd.exe as
kaggle datasets list -s demographics
but this is NOT code that you can run in the Python shell or in a Python script.
If you find this kaggle script and open it in an editor, you can see that it imports main from kaggle.cli and runs main().
And that can be used in your own script as
import sys
from kaggle.cli import main
sys.argv += ['datasets', 'list', '-s', 'demographics']
main()
But this method sends the results directly to the screen/console, so you would need to assign your own class to sys.stdout to catch this text in a variable.
Something like this:
import sys
import kaggle.cli

class Catcher():
    def __init__(self):
        self.text = ''

    def write(self, text):
        self.text += text

    def close(self):
        pass

catcher = Catcher()

old_stdout = sys.stdout  # keep old stdout
sys.stdout = catcher     # assign new class

sys.argv += ['datasets', 'list', '-s', 'demographics']
result = kaggle.cli.main()

sys.stdout = old_stdout  # assign back old stdout (needed for print() to work correctly)

print(catcher.text)
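If you prefer, the standard library can do the sys.stdout swap for you with contextlib.redirect_stdout; a minimal sketch of the same idea:
import io
import sys
from contextlib import redirect_stdout

import kaggle.cli

buffer = io.StringIO()
sys.argv += ['datasets', 'list', '-s', 'demographics']

with redirect_stdout(buffer):
    kaggle.cli.main()

print(buffer.getvalue())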
Digging into the source code of the kaggle script, I see you can do the same using
import kaggle.api
kaggle.api.dataset_list_cli(search='demographics')
but this also sends everything directly to the screen/console.
EDIT:
You can get the result as a list of special objects which you can later use in a for-loop:
import kaggle.api
result = kaggle.api.dataset_list(search='demographics')
for item in result:
    print('title:', item.title)
    print('size:', item.size)
    print('last updated:', item.lastUpdated)
    print('download count:', item.downloadCount)
    print('vote count:', item.voteCount)
    print('usability rating:', item.usabilityRating)
    print('---')
I want to pass an unlimited number of options to a click CLI, and I don't know the option names either. I'm getting around this issue by using an option named conf. It accepts a string that is assumed to represent a JSON object.
What I've done:
@click.command()
@click.option('--conf', type=str)
def dummy(conf):
    click.echo('dummy param {}'.format(conf))
How I use it:
>python main.py dummy --conf='{"foo": "bar", "fizz": "buzz"}'
What I want to do:
@click.command()
# some magic stuff
def dummy(**kwargs):
    click.echo('dummy param {}'.format(kwargs))
How I want to use it:
>python main.py dummy --foo=bar --fizz=buzz
You can hook the parser and make it aware of each option given from the command line like:
Custom Command Class:
import click

class AcceptAllCommand(click.Command):

    def make_parser(self, ctx):
        """Hook 'make_parser' and allow the opt dict to find any option"""
        parser = super(AcceptAllCommand, self).make_parser(ctx)
        command = self

        class AcceptAllDict(dict):

            def __contains__(self, item):
                """If the parser does not know this option, add it"""
                if not super(AcceptAllDict, self).__contains__(item):
                    # create an option name
                    name = item.lstrip('-')

                    # add the option to our command
                    click.option(item)(command)

                    # get the option instance from the command
                    option = command.params[-1]

                    # add the option instance to the parser
                    parser.add_option(
                        [item], name.replace('-', '_'), obj=option)
                return True

        # set the parser options to our dict
        parser._short_opt = AcceptAllDict(parser._short_opt)
        parser._long_opt = AcceptAllDict(parser._long_opt)

        return parser
Using the Custom Class:
To use the custom class, just pass the class to the click.command() decorator like:
@click.command(cls=AcceptAllCommand)
def my_command(**kwargs):
    ...
How does this work?
This works because click is a well-designed OO framework. The @click.command() decorator usually instantiates a click.Command object, but allows this behavior to be overridden with the cls parameter. So it is a relatively easy matter to inherit from click.Command in our own class and override the desired methods.
In this case, we override make_parser() and replace the option dicts with a dict class of our own. In our dict we override the __contains__() magic method, and in it we populate the parser with an option matching the name that is being looked for, if that name does not already exist.
Test Code:
@click.command(cls=AcceptAllCommand)
def dummy(**kwargs):
    click.echo('dummy param: {}'.format(kwargs))

if __name__ == "__main__":
    import sys
    import time

    cmd = '--foo=bar --fizz=buzz'
    print('Click Version: {}'.format(click.__version__))
    print('Python Version: {}'.format(sys.version))
    print('-----------')
    print('> ' + cmd)
    time.sleep(0.1)
    dummy(cmd.split())
Results:
Click Version: 6.7
Python Version: 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)]
-----------
> --foo=bar --fizz=buzz
dummy param: {'foo': 'bar', 'fizz': 'buzz'}
In C++, I can print debug output like this:
printf(
    "FILE: %s, FUNC: %s, LINE: %d, LOG: %s\n",
    __FILE__,
    __FUNCTION__,
    __LINE__,
    logmessage
);
How can I do something similar in Python?
There is a module named inspect which provides this information.
Example usage:
import inspect

def PrintFrame():
    callerframerecord = inspect.stack()[1]  # 0 represents this line
                                            # 1 represents line at caller
    frame = callerframerecord[0]
    info = inspect.getframeinfo(frame)
    print(info.filename)   # __FILE__     -> Test.py
    print(info.function)   # __FUNCTION__ -> Main
    print(info.lineno)     # __LINE__     -> 13

def Main():
    PrintFrame()           # for this line

Main()
However, please remember that there is an easier way to obtain the name of the currently executing file:
print(__file__)
For example
import inspect

frame = inspect.currentframe()

# __FILE__
fileName = frame.f_code.co_filename

# __LINE__
lineNo = frame.f_lineno
There's more here http://docs.python.org/library/inspect.html
Building on geowar's answer:
import sys

class __LINE__(object):
    def __repr__(self):
        try:
            raise Exception
        except Exception:
            return str(sys.exc_info()[2].tb_frame.f_back.f_lineno)

__LINE__ = __LINE__()
If you normally want to use __LINE__ in e.g. print (or any other time an implicit str() or repr() is taken), the above will allow you to omit the ()s.
(Obvious extension to add a __call__ left as an exercise to the reader.)
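One possible sketch of that __call__ extension, using the same exception trick, so __LINE__() works as well:
import sys

class __LINE__(object):
    def __repr__(self):
        try:
            raise Exception
        except Exception:
            return str(sys.exc_info()[2].tb_frame.f_back.f_lineno)

    def __call__(self):
        try:
            raise Exception
        except Exception:
            return sys.exc_info()[2].tb_frame.f_back.f_lineno

__LINE__ = __LINE__()

print(__LINE__)    # implicit repr()
print(__LINE__())  # explicit call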
You can refer to my answer:
https://stackoverflow.com/a/45973480/1591700
import sys
print(sys._getframe().f_lineno)
You can also make it a lambda function, as in the sketch below.
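For example (a sketch; sys._getframe(1) looks one frame up, at the caller):
import sys

__LINE__ = lambda: sys._getframe(1).f_lineno

print(__LINE__())  # prints the line number of this call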
I was also interested in a __LINE__ command in python.
My starting point was https://stackoverflow.com/a/6811020 and I extended it with a metaclass object. With this modification it has the same behavior as in C++.
import inspect
class Meta(type):
    def __repr__(self):
        # Inspiration: https://stackoverflow.com/a/6811020
        callerframerecord = inspect.stack()[1]  # 0 represents this line
                                                # 1 represents line at caller
        frame = callerframerecord[0]
        info = inspect.getframeinfo(frame)
        # print(info.filename)  # __FILE__ -> Test.py
        # print(info.function)  # __FUNCTION__ -> Main
        # print(info.lineno)    # __LINE__ -> 13
        return str(info.lineno)

class __LINE__(metaclass=Meta):
    pass

print(__LINE__)  # prints, for example, 18
Wow, a 7-year-old question :)
Anyway, taking Tugrul's answer and writing it as a debug-type method, it can look something like:
def debug(message):
    import sys
    import inspect

    callerframerecord = inspect.stack()[1]
    frame = callerframerecord[0]
    info = inspect.getframeinfo(frame)
    print(info.filename, 'func=%s' % info.function, 'line=%s:' % info.lineno, message)

def somefunc():
    debug('inside some func')

debug('this')
debug('is a')
debug('test message')

somefunc()
Output:
/tmp/test2.py func=<module> line=12: this
/tmp/test2.py func=<module> line=13: is a
/tmp/test2.py func=<module> line=14: test message
/tmp/test2.py func=somefunc line=10: inside some func
import sys
import inspect
.
.
.
def __LINE__():
    try:
        raise Exception
    except Exception:
        return sys.exc_info()[2].tb_frame.f_back.f_lineno

def __FILE__():
    return inspect.currentframe().f_code.co_filename
.
.
.
print("file: '%s', line: %d" % (__FILE__(), __LINE__()))
Here is a tool to answer this old yet new question!
I recommend using icecream!
Do you ever use print() or log() to debug your code? Of course, you
do. IceCream, or ic for short, makes print debugging a little sweeter.
ic() is like print(), but better:
It prints both expressions/variable names and their values.
It's 40% faster to type.
Data structures are pretty printed.
Output is syntax highlighted.
It optionally includes program context: filename, line number, and parent function.
For example, I created a module icecream_test.py, and put the following code inside it.
from icecream import ic
ic.configureOutput(includeContext=True)

def foo(i):
    return i + 333

ic(foo(123))
Prints
ic| icecream_test.py:6 in <module>- foo(123): 456
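And if all you want is the __FILE__/__LINE__-style context, ic() called with no arguments prints just the filename, line number, and enclosing function (the exact output format shown here is approximate):
from icecream import ic

def bar():
    ic()  # prints something like: ic| icecream_test.py:4 in bar()

bar()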
To get the line number in Python without importing the whole sys module...
First, import the _getframe function:
from sys import _getframe
Then call the _getframe function and use its f_lineno attribute whenever you want to know the line number:
print(_getframe().f_lineno) # prints the line number
From the interpreter:
>>> from sys import _getframe
... _getframe().f_lineno # 2
Word of caution from the official Python Docs:
CPython implementation detail: This function should be used for internal and specialized purposes only. It is not guaranteed to exist in all implementations of Python.
In other words: Only use this code for personal testing / debugging reasons.
See the official Python documentation on sys._getframe for more information on the sys module and the _getframe() function.
Based on Mohammad Shahid's answer (above).
Is there an SVN crawler that can walk through an SVN repo and spit out all existing branches or tags?
Preferably in Perl or Python ...
SVN tags and branches are just directories, usually following a particular naming convention. You can easily get them in Perl like:
my @branches = `svn ls YourRepoBaseURL/branches`;
chomp @branches;  # remove newlines
chop @branches;   # remove trailing /

my @tags = `svn ls YourRepoBaseURL/tags`;
chomp @tags;
chop @tags;
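Since you mentioned Python as well, roughly the same thing by shelling out to the svn client might look like this (a sketch; it assumes svn is on your PATH and the usual branches/ and tags/ layout):
import subprocess

def svn_ls(url):
    # 'svn ls' prints one entry per line, directories with a trailing '/'
    output = subprocess.check_output(['svn', 'ls', url], universal_newlines=True)
    return [line.strip().rstrip('/') for line in output.splitlines() if line.strip()]

branches = svn_ls('YourRepoBaseURL/branches')
tags = svn_ls('YourRepoBaseURL/tags')

print(branches)
print(tags)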
Here is a little snippet to print information about files in an SVN repository in Python:
# svncrawler.py
import os
import sys
import pysvn

svn_client = pysvn.Client()

for file_status in svn_client.status(sys.argv[1]):
    print(u'SVN File %s %s' % (file_status, file_status.text_status))
Call it like this:
python svncrawler.py my_repository
It should be easy to modify it to just print the tags and branches.
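For example, a sketch of that modification (the repository URL and the conventional branches/ and tags/ layout are assumptions):
import pysvn

client = pysvn.Client()
base_url = 'http://svn.example.com/repos/myproject'  # hypothetical URL

for kind in ('branches', 'tags'):
    # client.list() yields tuples whose first element has a .path attribute
    for entry, _info in client.list(base_url + '/' + kind + '/'):
        print(kind, entry.path)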
Thanks for all the help. Here is what I came up with in Python:
# -*- coding: utf-8 -*-
import os
import sys
import pysvn

svnclient = pysvn.Client()
projects = svnclient.list(sys.argv[1])

for project_path, project_info in projects:
    try:
        project_branches = svnclient.list(project_path.path + '/branches/')
        if len(project_branches) > 2:
            for branch, info in project_branches:
                print(branch.path)
    except:
        pass
I currently have multiple Django sites running from one Apache server through WSGI, and each site has its own virtualenv with possibly slight Python and Django version differences. For each site, I want to display the Python and Django version it is using, as well as the path it's pulling the Python binaries from.
For each Django site, I can do:
import sys
sys.version
but I'm not sure if it's showing the Python that the site is using or the system's Python. Any help on that?
To find out which Python is being used, log the value of sys.executable. It should contain the path to the interpreter being used.
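For example, a minimal sketch you could drop into any code that runs inside the site (a view, wsgi.py, or the top-level urls.py):
import logging
import sys

import django

logger = logging.getLogger(__name__)

logger.info("Python executable: %s", sys.executable)
logger.info("Python version: %s", sys.version)
logger.info("Django version: %s", django.get_version())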
but I'm not sure if it's showing the Python that the site is using or the system's Python. Any help on that?
Nope.
However, if you want to know something about your Django app, do this.
Use logging. Write to sys.stderr, which is usually routed to the Apache error log by mod_wsgi. Or look inside the request for the wsgi.errors file.
Update your top-level urls.py to write a startup message, including sys.version, to your log (a sketch follows).
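A minimal sketch of that startup message (the exact wording and placement are up to you; mod_wsgi usually routes sys.stderr to Apache's error log):
# at the top of the project's urls.py
import sys

import django

sys.stderr.write("Site starting: Python %s at %s, Django %s\n"
                 % (sys.version.split()[0], sys.executable, django.get_version()))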
This is a bit more than what you asked for, but we have a similar situation where different directories are (possibly) using different Python/Django code. We use the following to build an easy-to-read list (at least for me) of Python & Django versions plus all modules and where they were loaded from. It has sometimes helped save what little there is left of my hair. Modify to taste.
import sys, re, os
import django

def ModuleList():
    ret = []

    # The double call to dirname() is because this file is in our utils/
    # directory, which is one level down from the top of the project.
    dir_project = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
    project_name = os.path.basename(dir_project)

    for k, v in sys.modules.items():
        x = str(v)
        if 'built-in' in x:
            ret.append((k, 'built-in'))
            continue

        m = re.search(r"^.*?'(?P<module>.*?)' from '(?P<file>.*?)'.*$", x)
        if m:
            d = m.groupdict()
            f = d['file']

            # You can skip all of the re.sub()'s if you want raw paths
            f = re.sub(r'/usr/.*?/lib/python[.0-9]*/site-packages/django/', 'system django >> ', f)
            f = re.sub(r'/usr/.*?/lib/python[.0-9]*/site-packages/', 'site-packages >> ', f)
            f = re.sub(r'/usr/.*?/lib/python[.0-9]*/', 'system python >> ', f)
            f = re.sub(dir_project + '.*python/', 'local python >> ', f)
            f = re.sub(dir_project + '.*django/', 'local django >> ', f)
            f = re.sub(dir_project + r'(/\.\./)?', project_name + ' >> ', f)

            ret.append((d['module'], f))

    ret.sort(key=lambda pair: pair[0].lower())
    ret.insert(0, ('Python version', sys.version))
    ret.insert(0, ('Django version', django.get_version()))

    return ret
# ModuleList
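A possible way to use it, e.g. from a management command or a debug view (a small usage sketch):
for name, origin in ModuleList():
    print('%-30s %s' % (name, origin))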