Python: Access data from Solr using Pysolr

I am using a simple Python script to fetch example data from Solr using Pysolr. First I created my core using the following:
[user@user solr-7.1.0]$ ./bin/solr create -c json_db
WARNING: Using _default configset. Data driven schema functionality is enabled by default, which is
NOT RECOMMENDED for production use.
To turn it off:
curl http://localhost:8983/solr/json_db/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
Created new core 'json_db'
[user@user solr-7.1.0]$ ./bin/post -c json_db example/exampledocs/*.json
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/json_db/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/json_db/update...
Time spent: 0:00:00.398
After creating the core I ran a simple Python script to fetch the data:
from pysolr import Solr
conn = Solr('http://localhost:8983/solr/json_db/')
results = conn.search('*:*')
I am getting this error:
Traceback (most recent call last):
File "/home/user/PycharmProjects/APP/application/solr_test.py", line 4, in <module>
results = conn.search({'*:*'})
File "/home/user/PycharmProjects/APP/venv/lib/python3.5/site-packages/pysolr.py", line 723, in search
response = self._select(params, handler=search_handler)
File "/home/user/PycharmProjects/APP/venv/lib/python3.5/site-packages/pysolr.py", line 421, in _select
return self._send_request('get', path)
File "/home/user/PycharmProjects/APP/venv/lib/python3.5/site-packages/pysolr.py", line 396, in _send_request
raise SolrError(error_message % (resp.status_code, solr_message))
pysolr.SolrError: Solr responded with an error (HTTP 404): [Reason: Error 404 Not Found]
But when I run the same query directly against Solr, I get results.
Can somebody tell me what I am doing wrong here? Thanks.

You can just run the script below to fetch the results without using the pysolr library.
#!/usr/bin/python
import json
import pprint
from urllib.request import urlopen

url = 'give the url here'
wt = "wt=json"  # response writer requested in the URL: wt=json or wt=python
connection = urlopen(url)
if wt == "wt=json":
    response = json.load(connection)
else:
    # wt=python responses are Python literals, so evaluate them
    response = eval(connection.read())
print("Number of hits: " + str(response['response']['numFound']))
pprint.pprint(response['response']['docs'])
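If you would rather keep using pysolr, note two likely culprits: your traceback shows the query being passed as a set ({'*:*'}) rather than a string, and the core URL ends with a trailing slash, which can make pysolr build a bad select path. A minimal sketch, assuming Solr is running on the default port with the json_db core from the question:

from pysolr import Solr

# No trailing slash: pysolr appends the select handler path itself,
# and a doubled slash can come back from Solr as a 404.
conn = Solr('http://localhost:8983/solr/json_db')

# Pass the query as a plain string, not a set.
results = conn.search('*:*')
for doc in results:
    print(doc)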

Related

Flink Python Datastream API Kafka Consumer

I'm new to PyFlink. I'm trying to write a Python program to read data from a Kafka topic and print the data to stdout. I followed the link Flink Python Datastream API Kafka Producer Sink Serializaion. But I keep seeing a NoSuchMethodError due to a version mismatch. I have added the flink-sql-kafka-connector available at https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.13.0/flink-sql-connector-kafka_2.11-1.13.0.jar. Can someone help me with a proper example to do this? Following is my code:
import json
import os

from pyflink.common import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer
from pyflink.common.typeinfo import Types


def my_map(obj):
    json_obj = json.loads(json.loads(obj))
    return json.dumps(json_obj["name"])


def kafkaread():
    env = StreamExecutionEnvironment.get_execution_environment()
    env.add_jars("file:///automation/flink/flink-sql-connector-kafka_2.11-1.10.1.jar")
    deserialization_schema = SimpleStringSchema()
    kafkaSource = FlinkKafkaConsumer(
        topics='test',
        deserialization_schema=deserialization_schema,
        properties={'bootstrap.servers': '10.234.175.22:9092', 'group.id': 'test'}
    )
    ds = env.add_source(kafkaSource).print()
    env.execute('kafkaread')


if __name__ == '__main__':
    kafkaread()
But Python doesn't recognise the jar file and throws the following error:
Traceback (most recent call last):
File "flinkKafka.py", line 31, in <module>
kafkaread()
File "flinkKafka.py", line 20, in kafkaread
kafkaSource = FlinkKafkaConsumer(
File "/automation/flink/venv/lib/python3.8/site-packages/pyflink/datastream/connectors.py", line 186, in __init__
j_flink_kafka_consumer = _get_kafka_consumer(topics, properties, deserialization_schema,
File "/automation/flink/venv/lib/python3.8/site-packages/pyflink/datastream/connectors.py", line 336, in _get_kafka_consumer
j_flink_kafka_consumer = j_consumer_clz(topics,
File "/automation/flink/venv/lib/python3.8/site-packages/pyflink/util/exceptions.py", line 185, in wrapped_call
raise TypeError(
TypeError: Could not found the Java class 'org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer'. The Java dependencies could be specified via command line argument '--jarfile' or the config option 'pipeline.jars'
What is the correct location to add the jar file?
I see that you downloaded flink-sql-connector-kafka_2.11-1.13.0.jar, but the code loads flink-sql-connector-kafka_2.11-1.10.1.jar. Maybe have a check: you just need to fix the path to the flink-sql-connector jar.
You should add the jar file of flink-sql-connector-kafka that matches your PyFlink and Scala versions. If the versions are right, check that the path you pass to the add_jars function actually points to the jar.
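As a minimal sketch of that fix, assuming the 1.13.0 jar you downloaded sits in /automation/flink (adjust the path to wherever you actually saved it):

from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Reference the jar that was actually downloaded (1.13.0), not 1.10.1.
env.add_jars("file:///automation/flink/flink-sql-connector-kafka_2.11-1.13.0.jar")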

Missing request - Pact python

I'm new to Pact and I've understood the concept, but I'm having a hard time understanding and implementing the code.
Here I'm trying to write a simple pact for get_users from reqres.in.
I believe the first (pact ... block does the mock-provider part, and I compare the result against the pact.json file.
import os
import requests
import pytest
from pact import Consumer, Provider, Format
import unittest
import json

pact = Consumer('Consumer').has_pact_with(Provider('Provider'), port=1234, host_name='localhost')
pact.start_service()

CURR_FILE_PATH = os.path.dirname(os.path.abspath(__file__))
PACT_DIR = os.path.join(CURR_FILE_PATH, '')
PACT_FILE = os.path.join(PACT_DIR, 'pact.json')


# defining class
class GetUsers(unittest.TestCase):
    def test_get_board(self):
        with open(os.path.join(PACT_DIR, PACT_FILE), 'rb') as path_file:
            pact_file_json = json.load(path_file)
        print('pact_json')
        (pact
         .given('Request to send message')
         .upon_receiving('a request for response or send message')
         .with_request(method='GET', path='/api/users?page=2')
         .will_respond_with(status=200, body=pact_file_json))
        with pact:
            result = requests.get('http://reqres.in/api/users?page=2')
            print('actual response')
            self.assertEqual(pact_file_json, result.json())
        pact.verify()


ge = GetUsers()
print(ge.test_get_board())
However, when I run the code, I get the following error. It says the data do not match, but I verified the data in other code.
Traceback (most recent call last):
File "C:\Users\Desktop\Pact\contract_test.py", line 41, in <module>
print(ge.test_get_board())
File "C:\Users\Desktop\Pact\contract_test.py", line 34, in test_get_board
print(data)
File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pact\pact.py", line 370, in __exit__
self.verify()
File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pact\pact.py", line 269, in verify
assert resp.status_code == 200, resp.text
AssertionError: Actual interactions do not match expected interactions for mock MockService.
It also reports missing requests:
Missing requests:
GET https://reqres.in/api/users?page=2
I'm not sure what you're attempting here. Why are you opening a pact.json file in this context? The consumer Pact package will automatically serialise any contracts if successful, and you don't seem to be reading the file in anywhere.
The second problem seems to be that some code somewhere is issuing the query string page=2, but the code you've shown doesn't make use of any query strings.
Could you have another mock service running somewhere?
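One thing worth checking, as a sketch: inside the with pact: block, the request should go to the mock service started on port 1234 above, not to the real reqres.in provider; otherwise the mock never sees the interaction and reports it as missing:

with pact:
    # Hit the Pact mock service (started on localhost:1234 above),
    # not the real provider, so the interaction gets recorded.
    result = requests.get('http://localhost:1234/api/users?page=2')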
Examples
https://docs.pactflow.io/docs/examples/python/consumer/readme
https://github.com/pact-foundation/pact-python/blob/master/examples/e2e/tests/consumer/test_user_consumer.py

Bad Request at Authorization Code Flow Spotify

I am trying to build a script that creates a playlist on a user's Spotify profile. To learn spotipy I decided to try the examples they have on the documentation page.
The code I run is:
import sys
import spotipy
import spotipy.util as util
token = util.prompt_for_user_token('idxxx',
                                   'user-library-read',
                                   client_id='axxx',
                                   client_secret='Bxxx',
                                   redirect_uri='http://localhost')

scope = 'user-library-read'

if len(sys.argv) > 1:
    username = sys.argv[1]
else:
    print("Usage: %s username" % (sys.argv[0],))
    sys.exit()

token = util.prompt_for_user_token(username, scope)

if token:
    sp = spotipy.Spotify(auth=token)
    results = sp.current_user_saved_tracks()
    for item in results['items']:
        track = item['track']
        print(track['name'] + ' - ' + track['artists'][0]['name'])
else:
    print("Can't get token for", username)
The problem occurs when I run the code: I get redirected to my redirect URI, and after I paste it back into the terminal I get this:
Traceback (most recent call last):
File "spot01.py", line 9, in <module>
redirect_uri='http://localhost')
File "/home/user/.local/lib/python3.6/site-packages/spotipy/util.py", line 92, in prompt_for_user_token
token = sp_oauth.get_access_token(code, as_dict=False)
File "/home/user/.local/lib/python3.6/site-packages/spotipy/oauth2.py", line 434, in get_access_token
raise SpotifyOauthError(response.reason)
spotipy.oauth2.SpotifyOauthError: Bad Request
I tried to access oauth2.py from the file manager and the terminal, but it says that the repository does not exist. I also tried to install spotipy through their GitHub page, where the necessary files exist, but still nothing.
Any ideas?
Thanks a lot.
I solved the problem by downloading the required files from here: https://github.com/plamere/spotipy/tree/master/spotipy
Then I changed some things inside each .py file and ran all the code I wanted from there. There must be a more elegant solution, but this one worked for me.
Before running any command, I typed this set of commands in the terminal:
$ bash
$ export SPOTIPY_CLIENT_ID='xxx'
$ export SPOTIPY_CLIENT_SECRET='xxx'
$ export SPOTIPY_REDIRECT_URI='http://localhost/'
where xxx is where you put your credentials, and the redirect URI is your localhost or your GitHub profile link.
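With those variables exported, here is a minimal sketch of the token prompt (the username below is a placeholder; spotipy's prompt_for_user_token can pick the credentials up from the environment instead of hard-coded arguments):

import spotipy
import spotipy.util as util

# SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI are
# read from the environment, so no credentials appear in the script.
token = util.prompt_for_user_token('your_username', 'user-library-read')
sp = spotipy.Spotify(auth=token)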

Running Python script with scrapy import from Node child process

I'm attempting to get a simple scraper up and running to gather data, and I would like to use Python Scrapy. The rest of the app will run on Node.js/Express, so I would like to call this script on demand when I need fresh data.
The Python code runs fine locally through PyCharm, but I am seeing issues when it is run as a script:

1. Through Node, when I run the server locally and hit /name, it fails with "no module named 'scrapy'".
2. When I run the server through the Anaconda prompt, this works fine and scrapy is imported with no error.

I have installed scrapy via conda at the location the Express server is being run, for both 1 and 2.
From what I've read this may have to do with Scrapy's need for the Twisted reactor, but as I'm new to Python it's not clear to me what the Anaconda terminal is doing differently, and what I would need to do on the Node side in order to use Scrapy properly.
Node.js:
const express = require('express');
const app = express();

app.get('/name', callName);

function callName(req, res) {
    console.log("test");
    var spawn = require('child_process').spawn;
    const pyProg = spawn('python', ['pythonscript.py']);

    pyProg.stdout.on('data', function(data) {
        console.log(data.toString());
        res.write(data);
        res.end('end');
    });
}

// Print URL for accessing server
console.log('Server running at http://127.0.0.1:8000/');
app.listen(process.env.PORT || 8000, () => console.log("Listening on " + (process.env.PORT || 8000)));
Python script:
try:
    import sys
    import scrapy
    data = "python starting"
    print(data)
    sys.stdout.flush()
except Exception as exception:
    print(exception, False)
    # exception.message does not exist in Python 3; use str(exception)
    print(exception.__class__.__name__ + ": " + str(exception))
Update:
When running import scrapy from the Anaconda interpreter (the other interpreter from the comments resulted in "no module found"), I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "\Anaconda3\lib\site-packages\scrapy\__init__.py", line 34, in <module>
from scrapy.spiders import Spider
File "\Anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 10, in <module>
from scrapy.http import Request
File "\Anaconda3\lib\site-packages\scrapy\http\__init__.py", line 11, in <module>
from scrapy.http.request.form import FormRequest
File "\Anaconda3\lib\site-packages\scrapy\http\request\form.py", line 11, in <module>
import lxml.html
File "\Anaconda3\lib\site-packages\lxml\html\__init__.py", line 54, in <module>
from .. import etree
ImportError: DLL load failed: The specified module could not be found.
So this looks to be not just interpreter related, but perhaps something additional to do with the environment variables Anaconda sets up for its terminal?
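One way to narrow this down, as a sketch: have the spawned script report which interpreter and module search path it actually runs under, then compare that output between the Node child process and the Anaconda prompt; any difference points at the environment rather than the code:

import sys

# Compare this output between the Node-spawned run and the Anaconda
# prompt; a different executable or sys.path would explain the failure.
print(sys.executable)
print(sys.path)
sys.stdout.flush()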

netCDF4 - Python error

Can anyone tell me what I did wrong? I am using Python from conda, and the files are from http://meop40.troja.mff.cuni.cz:11180/gw.projekt/data.stratopauza/netcdf.profily/
Why does it tell me that the file doesn't exist?
>>> import netCDF4
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> url = 'http://meop40.troja.mff.cuni.cz:11180/gw.projekt/data.stratopauza/netcdf.profily/atmPrf_C001.2010.227.00.03.G04_2013.3520_nc'
>>> nc = netCDF4.Dataset(url)
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <!DOCTYPE^ HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"><html><head><title>404 Not Found</title></head><body><h1>Not Found</h1><p>The requested URL /gw.projekt/data.stratopauza/netcdf.profily/atmPrf_C001.2010.227.00.03.G04_2013.3520_nc.dds was not found on this server.</p><hr><address>Apache/2.4.12 (Ubuntu) Server at meop40.troja.mff.cuni.cz Port 11180</address></body></html>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "netCDF4\_netCDF4.pyx", line 1811, in netCDF4._netCDF4.Dataset.__init__ (netCDF4\_netCDF4.c:12626)
IOError: NetCDF: file not found
netCDF4.Dataset() can only access remote NetCDF files that are served by an OPeNDAP service, which can return metadata about the file. The error message returned is incorrect and misleading.
There is a brief tutorial which mentions this and gives basic information at: http://unidata.github.io/netcdf4-python/#section1
I downloaded the file and had no problem opening it. You should use the method from the answer to your previous question: https://stackoverflow.com/a/44622713/1211981
Update:
Go to:
http://meop40.troja.mff.cuni.cz:11180/gw.projekt/data.stratopauza/netcdf.profily/
Click one or more of the links and save the files to the folder where you will run your script. Then change your script or Python commands to:
>>> url = 'atmPrf_C001.2010.227.00.03.G04_2013.3520_nc'
>>> nc = netCDF4.Dataset(url)
netCDF4.Dataset() will take either a URL or a local file name and work the same way. In this case it will recognize the file as NetCDF/OPeNDAP compatible.
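An end-to-end sketch of that workflow, using the file name from the question (urllib.request here is just one way to fetch the file over plain HTTP first):

import urllib.request
import netCDF4

url = ('http://meop40.troja.mff.cuni.cz:11180/gw.projekt/data.stratopauza/'
       'netcdf.profily/atmPrf_C001.2010.227.00.03.G04_2013.3520_nc')
local_name = 'atmPrf_C001.2010.227.00.03.G04_2013.3520_nc'

# The server is a plain HTTP file listing, not an OPeNDAP endpoint, so
# download the raw file and open the local copy instead.
urllib.request.urlretrieve(url, local_name)
nc = netCDF4.Dataset(local_name)
print(nc.variables.keys())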
