how to use facet in pysolr

how to use facet in pysolr - python

I have successful built a python search app with pysolr. So far I have used two fields: id and title. now i want to push two different versions of titles; the original and the title after removing the stopwords. any ideas? the following code works:
def BuildSolrIndex(solr, trandata):
tmp = []
for i, dat in enumerate(trandata):
if all(d is not None and len(d) > 0 for d in dat):
d = {}
d["id"] = dat[0]
d["title"] = dat[1]
tmp.append(d)
solr.add(tmp)
solr.optimize()
return solr
but this one does not:
def BuildSolrIndex(solr, trandata):
tmp = []
for i, dat in enumerate(trandata):
if all(d is not None and len(d) > 0 for d in dat):
d = {}
d["id"] = dat[0]
d["title_org"] = dat[1]
d["title_new"] = CleanUpTitle(dat[1])
tmp.append(d)
solr.add(tmp)
solr.optimize()
return solr
Any ideas?
EDIT:
bellow is the exception:
Traceback (most recent call last):
...
solr = BuildSolrIndex(solr, trandata)
File "...", line 56, in BuildSolrIndex
solr.add(tmp)
File "build/bdist.linux-x86_64/egg/pysolr.py", line 779, in add
File "build/bdist.linux-x86_64/egg/pysolr.py", line 387, in _update
File "build/bdist.linux-x86_64/egg/pysolr.py", line 321, in _send_request
pysolr.SolrError: [Reason: None]
<response><lst name="responseHeader"><int name="status">400</int><int name="QTime">8</int></lst><lst name="error"><str name="msg">ERROR: [doc=...] unknown field 'title_new'</str><int name="code">400</int></lst></response>

This looks like an issue with your Solr schema.xml, as the exception indicates that "title_new" is not recognized as a valid field. This answer may be of assistance to you: https://stackoverflow.com/a/14400137/1675729
Check to make sure your schema.xml contains a "title_new" field, and that you've restarted the Solr services if necessary. If this doesn't solve your problem, come on back!

Related

How to fill a column of type 'multi-select' in notion-py(Notion API)?

I am trying to create a telegram-bot that will create notes in notion, for this I use:
notion-py
pyTelegramBotAPI
then I connected my notion by adding token_v2, and then receiving data about the note that I want to save in notion, at the end I save a note on notion like this:
def make_notion_row():
collection_view = client.get_collection_view(list_url[temporary_category]) #take collection
print(temporary_category)
print(temporary_name)
print(temporary_link)
print(temporary_subcategory)
print(temporary_tag)
row = collection_view.collection.add_row() #make row
row.ssylka = temporary_link #this is link
row.nazvanie_zametki = temporary_name #this is name
if temporary_category == 0: #this is category, where do I want to save the note
row.stil = temporary_subcategory #this is subcategory
tags = temporary_tag.split(',') #temporary_tags is text that has many tags separated by commas. I want to get these tags as an array
for tag_one in tags:
**add_new_multi_select_value("Теги", tag_one): #"Теги" is "Tag column" in russian. in this situation, tag_one takes on the following values: ['my_hero_academia','midoria']**
else:
row.kategoria = temporary_subcategory
this script works, but the problem is filling in the Tags column which is of type multi-select.
Since in the readme 'notion-py', nothing was said about filling in the 'multi-select', therefore
I used the bkiac function:https://github.com/jamalex/notion-py/issues/51
here is the slightly modified by me function:
art_tags = ['ryuko_matoi', 'kill_la_kill']
def add_new_multi_select_value(prop, value, style=None):
global temporary_prop_schema
if style is None:
style = choice(art_tags)
collection_schema = collection_view.collection.get(["schema"])
prop_schema = next(
(v for k, v in collection_schema.items() if v["name"] == prop), None
)
if not prop_schema:
raise ValueError(
f'"{prop}" property does not exist on the collection!'
)
if prop_schema["type"] != "multi_select":
raise ValueError(f'"{prop}" is not a multi select property!')
dupe = next(
(o for o in prop_schema["options"] if o["value"] == value), None
)
if dupe:
raise ValueError(f'"{value}" already exists in the schema!')
temporary_prop_schema = prop_schema
prop_schema["options"].append(
{"id": str(uuid1()), "value": value, "style": style}
)
collection.set("schema", collection_schema)`
But it turned out that this function does not work, and gives the following error:
add_new_multi_select_value("Теги","my_hero_academia)
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
add_new_multi_select_value("Теги","my_hero_academia)
File "C:\Users\laere\OneDrive\Documents\Programming\Other\notion-bot\program\notionbot\test.py", line 53, in add_new_multi_select_value
collection.set("schema", collection_schema)
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\records.py", line 115, in set
self._client.submit_transaction(
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 290, in submit_transaction
self.post("submitTransaction", data)
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 260, in post
raise HTTPError(
requests.exceptions.HTTPError: Unsaved transactions: Not allowed to edit column: schema
this is my table image: link
this is my telegram chatting to bot: link
Honestly, I don’t know how to solve this problem, the question is how to fill a column of type 'multi-select'?

I solved this problem using this command
row.set_property("Категория", temporary_subcategory)
and do not be afraid if there is an error "options ..." this can be solved by adding settings for the 'multi-select' field.

Unexpected space (not sure what the type of character this space is) while parsing csv file in python

I am iterating through a list of urls from a csv file trying to locate their sitemaps, however, I am getting a weird leading space issue that's causing an error to occur when requests processes each url. I'm trying to figure out what's causing this space to be generated and what type of space it is. I believe something funky is happening with strip() because I can get this to run fine when copying and pasting a url into requests. I am just not sure what type of space this is and what's causing it to occur.
Wondering if anyone else is having or had this issue?
So far I have tried to solve using the following methods:
replace()
"".join(split())
regex
Here is my code:
with open('links.csv') as f:
for line in f:
strdomain = line.strip()
if strdomain:
domain = strdomain
fix_domain = domain.replace('https://', '').replace('www', '').replace('/', '').replace('.', '').replace(' ', '')
ofile = fix_domain + '.txt' # args.ofile
domain_rem = domain
map = find_sitemap.get_sitemap(domain_rem+"sitemap.xml")
url_info = find_sitemap.parse_sitemap(map)
print("Found {0} urls".format(len(url_info)))
new_urls = []
for u in url_info:
new_urls.append(u)
print(u)
links.csv look like the following with just one column:
https://site1.com/
https://site2.com/
https://site3.com/
I printed domain and strdomain and even added the word "this" next to the variable domain so you can see the space being produced clearly:
Here is the error I receive in full when running (you will notice there is no leading space within the url after I've copied and pasted from the terminal into here however I provide an image of my terminal below so you can see it):
Traceback (most recent call last):
File "/Users/natehurwitz/PROJECTS/axis/axis/apps/axisDataFinder/map_website.py", line 358, in <module>
main()
File "/Users/natehurwitz/PROJECTS/axis/axis/apps/axisDataFinder/map_website.py", line 318, in main
map = find_sitemap.get_sitemap(domain_rem+"sitemap.xml")
File "/Users/natehurwitz/PROJECTS/axis/axis/apps/axisDataFinder/find_sitemap.py", line 5, in get_sitemap
get_url = requests.get(url)
File "/Users/natehurwitz/Library/Caches/pypoetry/virtualenvs/axis-eSvach19-py3.9/lib/python3.9/site-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/Users/natehurwitz/Library/Caches/pypoetry/virtualenvs/axis-eSvach19-py3.9/lib/python3.9/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/natehurwitz/Library/Caches/pypoetry/virtualenvs/axis-eSvach19-py3.9/lib/python3.9/site-packages/requests/sessions.py", line 522, in request
resp = self.send(prep, **send_kwargs)
File "/Users/natehurwitz/Library/Caches/pypoetry/virtualenvs/axis-eSvach19-py3.9/lib/python3.9/site-packages/requests/sessions.py", line 636, in send
adapter = self.get_adapter(url=request.url)
File "/Users/natehurwitz/Library/Caches/pypoetry/virtualenvs/axis-eSvach19-py3.9/lib/python3.9/site-packages/requests/sessions.py", line 727, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'https://blkgrn.com/sitemap.xml'
Here is where you can see the leading space that occurs
Here is the code for "find_sitemap.py":
from bs4 import BeautifulSoup
import requests
def get_sitemap(url):
get_url = requests.get(url)
if get_url.status_code == 200:
return get_url.text
else:
print ('Unable to fetch sitemap: %s.' % url)
def process_sitemap(s):
soup = BeautifulSoup(s, "lxml")
result = []
for loc in soup.findAll('loc'):
item = {}
item['loc'] = loc.text
item['tag'] = loc.parent.name
if loc.parent.lastmod is not None:
item['lastmod'] = loc.parent.lastmod.text
if loc.parent.changeFreq is not None:
item['changeFreq'] = loc.parent.changeFreq.text
if loc.parent.priority is not None:
item['priority'] = loc.parent.priority.text
result.append(item)
return result
def is_sub_sitemap(s):
if s['loc'].endswith('.xml') and s['tag'] == 'sitemap':
return True
else:
return False
def parse_sitemap(s):
sitemap = process_sitemap(s)
result = []
while sitemap:
candidate = sitemap.pop()
if is_sub_sitemap(candidate):
sub_sitemap = get_sitemap(candidate['loc'])
for i in process_sitemap(sub_sitemap):
sitemap.append(i)
else:
result.append(candidate)
return result

How to get email address of group member from LDAP using Python

I am trying to get the email addresses of AD group members of a particular LDAP group using python.
I have following code. The Print m statement writes something like below.
Output:
CN=Admin_abc20,OU=Admin ID's,OU=TEST1,DC=other_example,DC=example,DC=com
CN=leterd,OU=Employees,OU=BACD,DC=na,DC=example,DC=com
CN=mytest37,OU=Employees,OU=SUNPH,DC=na,DC=example,DC=com
CN=Doe Mestre\, John,OU=Partners & Contractors,OU=TEST1,DC=other_example,DC=example,DC=com
CN=Robin\, Mark [ABCD],OU=Partners & Contractors,OU=JJCUS,DC=na,DC=example,DC=com
CN=San Irdondo\, Paul [TEST1 Non-ABC],OU=Partners & Contractors,OU=TEST1,DC=other_example,DC=example,DC=com
My Code:
def get_group_members(group_name, ad_conn, basedn=AD_USER_BASEDN):
members = []
ad_filter = AD_GROUP_FILTER.replace('My_Group_Name', group_name)
result = ad_conn.search_s(basedn, ldap.SCOPE_SUBTREE, ad_filter)
if result:
if len(result[0]) >= 2 and 'member' in result[0][1]:
members_tmp = result[0][1]['member']
for m in members_tmp:
print m
#email = ad_conn.search_s(m, ldap.SCOPE_SUBTREE,'(objectClass=*)',['mail'])
#print email
Now when I remove comment from last 2 lines of my code to get the email address of persons, I get following error, please note that I have changed by company's ldap identifiers to example/test.
Can you please help me with this? I am a newbie to python.
Traceback (most recent call last):
File "/app/abc/python/Test_new.py", line 81, in <module>
group_members = get_group_members(group_name, ad_conn)
File "/app/abc/python/Test_new.py", line 58, in get_group_members
email = ad_conn.search_s(m, ldap.SCOPE_SUBTREE,'(objectClass=*)', ['mail'])
File "/usr/lib64/python2.6/site-packages/ldap/ldapobject.py", line 516, in search_s
return self.search_ext_s(base,scope,filterstr,attrlist,attrsonly,None,None,timeout=self.timeout)
File "/usr/lib64/python2.6/site-packages/ldap/ldapobject.py", line 510, in search_ext_s
return self.result(msgid,all=1,timeout=timeout)[1]
File "/usr/lib64/python2.6/site-packages/ldap/ldapobject.py", line 436, in result
res_type,res_data,res_msgid = self.result2(msgid,all,timeout)
File "/usr/lib64/python2.6/site-packages/ldap/ldapobject.py", line 440, in result2
res_type, res_data, res_msgid, srv_ctrls = self.result3(msgid,all,timeout)
File "/usr/lib64/python2.6/site-packages/ldap/ldapobject.py", line 446, in result3
ldap_result = self._ldap_call(self._l.result3,msgid,all,timeout)
File "/usr/lib64/python2.6/site-packages/ldap/ldapobject.py", line 96, in _ldap_call
result = func(*args,**kwargs)
ldap.REFERRAL: {'info': 'Referral:\nldap://ab.example.com/CN=Radfde3,OU=Partners%20&%20Contractors,OU=JANBE,DC=eu,DC=example,DC=com', 'desc': 'Referral'}

I don't know much about Python but I think your problem is with the LDAP filter. Try this for the last 2 lines of code:
email = ad_conn.search_s(m, ldap.SCOPE_SUBTREE,'(&(objectClass=person)(mail=*))')
print email
I hope this helps!

Exception in making a REST call

I am currently working on a REST call using Flask Framework. and I have run into some errors along the way, which I am unable to figure out as to why they occur (still trying though). The error is as shown below:
[2016-09-20 18:53:26,486] ERROR in app: Exception on /Recommend [GET]
Traceback (most recent call last):
File "/anaconda/lib/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
response = self.full_dispatch_request()
File "/anaconda/lib/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/anaconda/lib/python2.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/anaconda/lib/python2.7/site-packages/flask_api/app.py", line 97, in handle_user_exception
for typecheck, handler in chain(blueprint_handlers.items(), app_handlers.items()):
AttributeError: 'tuple' object has no attribute 'items'
127.0.0.1 - - [20/Sep/2016 18:53:26] "GET /Recommend HTTP/1.1" 500 -
here is the code that I have built:
from flask import request, jsonify
from flask_api import FlaskAPI
from flask_cors import CORS
from sets import Set
from collections import defaultdict
import itertools
import copy
app = FlaskAPI(__name__)
CORS(app)
content = None
class Apriori:
def __init__(self):
self.no_of_transactions = None
self.min_support = 0.5
self.min_confidence = 0.75
self.transactions = {}
self.set_of_items = set()
self.frequencies = {}
self.frequent_itemsets_of_order_n = {}
self.association_rules = {}
def createPowerSet(self,s):
powerset = set()
for i in xrange(2**len(s)):
subset = tuple([x for j,x in enumerate(s) if (i >> j) & 1])
if len(subset) == 0:
pass
elif len(subset) == 1:
powerset.add(subset[0])
else:
powerset.add(subset)
return powerset
def createFrequentItemSets(self,set_of_items,len):
frequent_itemsets = set(itertools.combinations(set_of_items, len))
for i in list(frequent_itemsets):
tempset = set(i)
self.frequencies[i] = 0
for k, v in self.transactions.iteritems():
if tempset.issubset(set(v)):
self.frequencies[i] += 1
if float(self.frequencies[i])/self.no_of_transactions < self.min_support:
frequent_itemsets.discard(i)
return frequent_itemsets
def mineAssociationRules(self,frequent_itemset):
s = set(frequent_itemset)
subs = list(self.createPowerSet(s))
for each in subs:
if sorted(tuple(set(each))) == sorted(tuple(s)):
continue
if len(set(each))==1:
antecedent = list(set(each))[0]
elif len(set(each))>1:
antecedent = tuple(set(each))
if len(s.difference(set(each)))==1:
consequent = list(s.difference(set(each)))[0]
elif len(s.difference(set(each)))>1:
consequent = tuple(s.difference(set(each)))
AuC = tuple(s)
if float(self.frequencies[AuC])/self.frequencies[antecedent] >= self.min_confidence:
if antecedent in self.association_rules:
pass
else:
if type(antecedent) is tuple:
antecedent = (",").join(antecedent)
if type(consequent) is tuple:
consequent = (",").join(consequent)
self.association_rules[antecedent] = consequent
def implement(self,transactions):
#for i in range(0,self.no_of_transactions):
for i in range(0,len(transactions)):
self.transactions["T"+str(i)] = defaultdict(list)
self.transactions["T"+str(i)] = transactions[i].split(',')
self.set_of_items = self.set_of_items.union(Set(self.transactions["T"+str(i)]))
for i in list(self.set_of_items):
self.frequencies[i] = 0
for k, v in self.transactions.iteritems():
if i in v:
self.frequencies[i] = self.frequencies[i] + 1
if float(self.frequencies[i])/self.no_of_transactions < self.min_support:
self.set_of_items.discard(i)
self.frequent_itemsets_of_order_n[1] = self.set_of_items
l = 1
reps = copy.deepcopy(self.set_of_items)
while True:
l += 1
result = self.createFrequentItemSets(self.set_of_items, l)
if len(result) == 0:
break
self.frequent_itemsets_of_order_n[l] = result
reps = copy.deepcopy(self.frequent_itemsets_of_order_n[l])
l = l-1
while l>2:
for each in self.frequent_itemsets_of_order_n[l]:
self.mineAssociationRules(each)
l = l-1
#app.route('/Recommend')
def FindAssociations():
transactions = ["A,C,D,F,G","A,B,C,D,F","C,D,E","A,D,F","A,C,D,E,F","B,C,D,E,F,G"]
apr = Apriori()
apr.implement(transactions)
return jsonify(rules=apr.association_rules)
if __name__ == "__main__":
app.run(port=5000)
I did run some sample code found on the web, and built the above script based on those scripts. They worked out well. The class I built was based on another python program that I had built earlier which worked well. Should I have imported the class from another script instead of building it here?

There are a number of errors in your Apriori which need to fixed. There are a few attempts to divide by self.no_of_transactions (e.g. line 87) which is initialised to None and never changed. Dividing by None raises an exception:
>>> 1/None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'
This exception is then handled in Flask-API's handle_user_exception() method, which also appears to have a bug as shown in the exception it raises.
The way to fix it is to correct your code so that it does not divide by None.

Key Error on python & App Engine

I am developing an application on App Engine and Python. This app is meant to create routes to several points in town. To create this routes, I invoke a request to an Arcgis service. Once that is done, I need to check the status of the request and get a JSON with the results. I check these results with the following method:
def store_route(job_id, token):
import requests, json
#Process stops result and store it as json in stops_response
stops_url = "https://logistics.arcgis.com/arcgis/rest/services/World/VehicleRoutingProblem/GPServer/SolveVehicleRoutingProblem/jobs/"
stops_url = stops_url+str(job_id)+"/results/out_stops?token="+str(token)+"&f=json"
stops_r = requests.get(stops_url)
stops_response = json.loads(stops_r.text)
#Process routes result and store it as json in routes_response
routes_url = "https://logistics.arcgis.com/arcgis/rest/services/World/VehicleRoutingProblem/GPServer/SolveVehicleRoutingProblem/jobs/"
routes_url = routes_url+str(job_id)+"/results/out_routes?token="+str(token)+"&f=json"
routes_r = requests.get(routes_url)
routes_response = json.loads(routes_r.text)
from routing.models import ArcGisJob, DeliveryRoute
#Process each route from response
processed_routes = []
for route_info in routes_response['value']['features']:
print route_info
route_name = route_info['attributes']['Name']
coordinates = route_info['geometry']['paths']
coordinates_json = {"coordinates": coordinates}
#Process stops from each route
stops = []
for route_stops in stops_response['value']['features']:
if route_name == route_stops['attributes']['RouteName']:
stops.append({"Name": route_stops['attributes']['Name'],
"Sequence": route_stops['attributes']['Sequence']})
stops_json = {"content": stops}
#Create new Delivery Route object
processed_routes.append(DeliveryRoute(name=route_name,route_coordinates=coordinates_json, stops=stops_json))
#insert a new Job table entry with all processed routes
new_job = ArcGisJob(job_id=str(job_id), routes=processed_routes)
new_job.put()
As you can see, what my code does is practically visit the JSON returned by the service and parse it for the content that interest me. The problem is I get the following output:
{u'attributes': {
u'Name': u'ruta_4855443348258816',
...
u'StartTime': 1427356800000},
u'geometry': {u'paths': [[[-100.37766063699996, 25.67669987000005],
...
[-100.37716999999998, 25.67715000000004],
[-100.37766063699996, 25.67669987000005]]]}}
ERROR 2015-03-26 19:02:58,405 handlers.py:73] 'geometry'
Traceback (most recent call last):
File "/Users/Vercetti/Dropbox/Logyt/Quaker Routing/logytrouting/routing/handlers.py", line 68, in get
arc_gis.store_route(job_id, token)
File "/Users/Vercetti/Dropbox/Logyt/Quaker Routing/logytrouting/libs/arc_gis.py", line 150, in store_route
coordinates = route_info['geometry']['paths']
KeyError: 'geometry'
ERROR 2015-03-26 19:02:58,412 BaseRequestHandler.py:51] Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/Users/Vercetti/Dropbox/Logyt/Quaker Routing/logytrouting/routing/handlers.py", line 68, in get
arc_gis.store_route(job_id, token)
File "/Users/Vercetti/Dropbox/Logyt/Quaker Routing/logytrouting/libs/arc_gis.py", line 150, in store_route
coordinates = route_info['geometry']['paths']
KeyError: 'geometry'
The actual JSON returned has a lot more of info, but i just wrote a little portion of it so you can see that there IS a 'geometry' key. Any idea why I get this error??

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

how to use facet in pysolr - python

Related

How to fill a column of type 'multi-select' in notion-py(Notion API)?

Unexpected space (not sure what the type of character this space is) while parsing csv file in python

How to get email address of group member from LDAP using Python

Exception in making a REST call

Key Error on python & App Engine

Categories

Resources