Creating named graph with rdflib in Virtuoso - python

The following code works for me when using Apache Jena Fuseki 4.3.2 (docker image secoresearch/fuseki:4.3.2) with rdflib 6.1.1:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore
FUSEKI_QUERY = 'http://localhost:3030/ds/sparql'
FUSEKI_UPDATE = 'http://localhost:3030/ds/update'
store = SPARQLUpdateStore(query_endpoint=FUSEKI_QUERY,
                          update_endpoint=FUSEKI_UPDATE,
                          method='POST',
                          autocommit=False)
graph = Graph(store=store, identifier=GRAPH_NAME)  # GRAPH_NAME: IRI of the target named graph
graph.parse('./dump.ttl') # file containing 1000 example triples
store.commit()
But when I change to OpenLink Virtuoso 07.20.3233 (docker image tenforce/virtuoso:latest), I get the following error:
urllib.error.HTTPError: HTTP Error 500: SPARQL Request Failed
With some trial and error, I got the following to work for Virtuoso:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore
VIRTUOSO_QUERY = 'http://localhost:8890/sparql'
VIRTUOSO_UPDATE = 'http://localhost:8890/sparql'
store = SPARQLUpdateStore(query_endpoint=VIRTUOSO_QUERY,
                          update_endpoint=VIRTUOSO_UPDATE,
                          method='POST',
                          autocommit=False)
intermediate_graph = Graph()
intermediate_graph.parse('./dump.ttl')
graph = Graph(store=store, identifier=GRAPH_NAME)
for triple in intermediate_graph:
    graph.add(triple)
    store.commit()  # Have to commit after every add here
If I don't commit after every add, but only once after the loop, I get the same error as above. At the moment, I don't see any helpful HTTP or server log entries that might point me to the problem.
So my question is: does anyone have an idea why this error occurs and what the solution might be? I guess it has something to do with the way my Virtuoso instance is configured?
Update 03.06.2022:
I noticed that I was using an old version of Virtuoso with the docker image tenforce/virtuoso:latest (07.20.3233), so I switched to openlink/virtuoso-opensource-7:latest (07.20.3234). With that, my code for Virtuoso does not work anymore (same error as stated above).
Also, as TallTed correctly identified in his comment, I use /sparql for both query and update. I can do that because I gave the user SPARQL the SPARQL_UPDATE role in the Virtuoso Conductor interface. It is kind of a workaround for now, since I didn't get basic auth to work over rdflib. Could that have something to do with the problem, since I don't use /sparql-auth?
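For completeness, here is a sketch of how basic auth could be wired up against /sparql-auth. This assumes the rdflib version in use forwards an auth tuple to its SPARQL connector, and note that Virtuoso's /sparql-auth may require digest rather than basic auth; the endpoint and credentials below are placeholders.
VIRTUOSO_AUTH = 'http://localhost:8890/sparql-auth'
store = SPARQLUpdateStore(query_endpoint=VIRTUOSO_AUTH,
                          update_endpoint=VIRTUOSO_AUTH,
                          method='POST',
                          autocommit=False,
                          auth=('dba', 'dba'))  # placeholder credentials; assumes auth tuple support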

Related

Can't Schedule Query in BigQuery Via Python SDK

I'll preface this by saying I'm fairly new to BigQuery. I'm running into an issue when trying to schedule a query using the Python SDK. I used the example from the documentation page and modified it a bit, but I keep hitting errors.
Note that my query does use scripting to set some variables, and it's using a MERGE statement to update one of my tables. I'm not sure if that makes a huge difference.
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

def create_scheduled_query(dataset_id, project, name, schedule, service_account, query):
    parent = transfer_client.common_project_path(project)
    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=dataset_id,
        display_name=name,
        data_source_id="scheduled_query",
        params={
            "query": query
        },
        schedule=schedule,
    )
    transfer_config = transfer_client.create_transfer_config(
        bigquery_datatransfer.CreateTransferConfigRequest(
            parent=parent,
            transfer_config=transfer_config,
            service_account_name=service_account,
        )
    )
    print("Created scheduled query '{}'".format(transfer_config.name))
I was able to successfully create a query with the function above. However, the query errors out with the following message:
Error code 9 : Dataset specified in the query ('') is not consistent with Destination dataset '{my_dataset_name}'.
I've tried passing in "" as the dataset_id parameter, but I get the following error from the Python SDK:
google.api_core.exceptions.InvalidArgument: 400 Cannot create a transfer with parent projects/{my_project_name} without location info when destination dataset is not specified.
Interestingly enough I was able to successfully create this scheduled query in the GUI; the same query executed without issue.
I saw that the GUI showed the scheduled query's "Resource name" referenced a transferConfig, so I used the following command to see what that transferConfig looked like, to see if I could apply the same parameters using my Python script:
bq show --format=prettyjson --transfer_config {my_transfer_config}
Which gave me the following output:
{
    "dataSourceId": "scheduled_query",
    "datasetRegion": "us",
    "destinationDatasetId": "",
    "displayName": "test_scheduled_query",
    "emailPreferences": {},
    "name": "{REDACTED_TRANSFER_CONFIG_ID}",
    "nextRunTime": "2021-06-18T00:35:00Z",
    "params": {
        "query": ....
So it looks like the GUI was able to use "" for destinationDatasetId, but for whatever reason the Python SDK won't let me use that value.
Any help would be appreciated, since I prefer to avoid the GUI whenever possible.
UPDATE:
This does appear to be related to the scripting I used in my query. I removed the scripts from the query and it's working. I'm going to leave this open because I feel like this should be possible using the SDK since the query with scripting works in the console without issue.
This same thing also threw me for a loop, but I managed to figure out what was wrong. The problem is with the
parent = transfer_client.common_project_path(project)
line that is given in the example query. By default, this returns something of the form projects/{project_id}. However, the CreateTransferConfigRequest documentation says of the parent parameter:
The BigQuery project id where the transfer configuration should be created. Must be in the format projects/{project_id}/locations/{location_id} or projects/{project_id}. If specified location and location of the destination bigquery dataset do not match - the request will fail.
Sure enough, if you use the projects/{project_id}/locations/{location_id} format instead, it resolves the error and allows you to pass a null destination_dataset_id.
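For illustration, a minimal sketch of building that parent string by hand (the "us" location id is an assumption; use the region that your query's destination tables live in):
# Assumed location id; it must match the region of the tables the query writes to.
location_id = "us"
parent = f"projects/{project}/locations/{location_id}"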
I had the exact same issue. The fix is below.
The following method returns projects/{project_id}:
parent = transfer_client.common_project_path(project_id)
Instead, use the following method, which returns projects/{project}/locations/{location}:
parent = transfer_client.common_location_path(project_id, "EU")
With the above change, I was able to schedule a script in BigQuery.
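Putting the pieces together, here is a rough sketch of the create call with a location-based parent and no destination dataset (variable names follow the question's function; the "us" location is an assumption):
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# The parent carries the location, so destination_dataset_id can be omitted
# when the query itself (scripting + MERGE) decides where the data lands.
parent = transfer_client.common_location_path(project, "us")  # "us" is an assumption

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name=name,
    data_source_id="scheduled_query",
    params={"query": query},
    schedule=schedule,
)

transfer_config = transfer_client.create_transfer_config(
    bigquery_datatransfer.CreateTransferConfigRequest(
        parent=parent,
        transfer_config=transfer_config,
        service_account_name=service_account,
    )
)
print("Created scheduled query '{}'".format(transfer_config.name))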

How to use `not` condition in the gitlab api issue query

I am trying to read the titles of open issues that don't have the label resolved. For that I am referring to the API documentation (https://docs.gitlab.com/ee/api/issues.html), which mentions NOT, but I wasn't able to get NOT to work.
The following Python script is what I have tried so far to read the list of issues; I can't figure out how to use NOT to filter out issues that have the resolved label.
import gitlab
# private token or personal token authentication
gl = gitlab.Gitlab('https://example.com', private_token='XXXYYYZZZ')
# make an API request to create the gl.user object. This is mandatory if you
# use the username/password authentication.
gl.auth()
# list all the issues
issues = gl.issues.list(all=True,scope='all',state='opened',assignee_username='username')
for issue in issues:
    print(issue.title)
From the GitLab issues API documentation, not is of type Hash. It's a special type documented here.
For example, to exclude the labels Category:DAST and devops::secure, and to exclude the milestone 13.11, you would use the following parameters:
not[labels]=Category:DAST,devops::secure
not[milestone]=13.11
API example: https://gitlab.com/api/v4/issues?scope=all&state=opened&assignee_username=derekferguson&not[labels]=Category:DAST,devops::secure&not[milestone]=13.11
Using the gitlab Python module, you would need to pass these extra parameters as additional keyword arguments:
import gitlab

gl = gitlab.Gitlab('https://gitlab.com')

extra_params = {
    'not[labels]': "Category:DAST,devops::secure",
    "not[milestone]": "13.11"
}

issues = gl.issues.list(all=True, scope='all', state='opened',
                        assignee_username='derekferguson', **extra_params)

for issue in issues:
    print(issue.title)

Need Script for connecting cassandra with password and execute CQL using python

I need a Python script that connects to CassDB nodes with a password and executes CQL.
I tried the script below:
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

ap = PlainTextAuthProvider(username='##', password='##')
cass_contact_points = ['cassdb01.p01.eng.sjc01.com', 'cassdb02.p01.eng.sjc01.com']
cluster = Cluster(cass_contact_points, auth_provider=ap, port=50126)
session = cluster.connect('##')
I'm getting the below error:
File "C:\python35\lib\site-packages\cassandra\cluster.py", line 2792, in _reconnect_internal
raise NoHostAvailable("Unable to connect to any servers", errors)cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'10.44.67.92': OperationTimedOut('errors=None, last_host=None'), '10.44.67.91': OperationTimedOut('errors=None, last_host=None')})
I see two potential problems.
First, unless you're in an upgrade scenario or you've had past problems with protocol versions, I would not specify a protocol version at all; the driver should negotiate that value. Setting it to 2 will fail with Cassandra 3.x, for example.
Second, I don't think the driver can properly parse out ports from the endpoints, i.e. when they're embedded in the contact points like this:
node_ips = ['cassdb01.p01.eng.sjc01.com:50126', 'cassdb02.p01.eng.sjc01.com:50126']
When I pass the port along with my endpoints, I get a similar failure. So I would try changing that line to this:
node_ips = ['cassdb01.p01.eng.sjc01.com', 'cassdb02.p01.eng.sjc01.com']
Try starting with that. I have some other thoughts, but let's get those two obvious settings out of the way.
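Putting both suggestions together, a sketch of what the connection code would look like (hostnames, credentials, and port are the placeholders from the question):
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

# Contact points are hostnames only; the non-default port is passed to Cluster.
node_ips = ['cassdb01.p01.eng.sjc01.com', 'cassdb02.p01.eng.sjc01.com']
ap = PlainTextAuthProvider(username='##', password='##')

# No protocol_version specified -- let the driver negotiate it.
cluster = Cluster(node_ips, auth_provider=ap, port=50126)
session = cluster.connect()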

Error Message when trying to delete features in Feature Service Python API 1.7 for ArcGIS

I am trying to delete features using the delete_features method on the FeatureLayer Object and I keep getting the following error: "This SqlTransaction has completed; it is no longer usable."
The code is below. The error seems to be triggered by the last line, where="OBJECTID >= 0", but I'm not 100% sure that this is the problem. Unfortunately I'm not very good at programming.
gis = arcgis.GIS("http://gfcgis.maps.arcgis.com", "UserName", "Password")
feature_layer_item = gis.content.search(FeatureLayer, item_type = 'Feature Service')[0]
flayers = feature_layer_item.layers
flayer = flayers[0]
flayer.delete_features(where="OBJECTID >= 0", rollback_on_failure=True)
Any help would be greatly appreciated.
Michael
This sounds like a zombie transaction. Check with your DBA whether there's a query, most likely a stored procedure, being called when your code runs. This message usually shows up when the application code tries to commit on the DB after the SP has already committed; that is the SQL transaction which has already completed.
Come to find out, it was a simple syntax error causing the issue. I didn't put quotations around 'True' for the rollback_on_failure parameter.

pyspeedtest cannot find test server

I'm trying to use pyspeedtest to get the upload/download speed of my connection, but I keep getting the following error, which I couldn't resolve:
import pyspeedtest
st = pyspeedtest.SpeedTest()
st.download()
Exception: Cannot find a test server
Any suggestions/insights would be welcome!
It actually does work if you change the url in the pyspeedtest.py file from www.speedtest.net to c.speedtest.net on line 186 in v1.2.7 of the script.
Edit: added an example of how to get it to work
You can edit the pyspeedtest.py script (located at /usr/local/lib/python2.7/dist-packages/pyspeedtest.py on my Raspberry Pi 3) using vi, e.g.:
sudo vi /usr/local/lib/python2.7/dist-packages/pyspeedtest.py
Go to line 186 and change the following line:
connection = self.connect('www.speedtest.net')
to:
connection = self.connect('c.speedtest.net')
Then run pyspeedtest using the wrapper in /usr/local/bin:
/usr/local/bin/pyspeedtest
Using server: speedtest.wilkes.net
Ping: 41 ms
Download speed: 46.06 Mbps
Upload speed: 11.58 Mbps
Or use the python interpreter:
>>> import pyspeedtest
>>> st = pyspeedtest.SpeedTest()
>>> st.ping()
41.70024394989014
>>> st.download()
44821310.72337018
>>> st.upload()
14433296.732646577
The project hasn't been updated since mid-2016, and the last change was "updated user-agent to prevent SpeedTest block"... And if you skim the code, there are comments like this (https://github.com/fopina/pyspeedtest/blob/master/pyspeedtest.py#L188):
# really contribute to speedtest.net OS statistics
# maybe they won't block us again...
And there have been bugs posted to GitHub about the project not working, with no response.
So, my guess is: This project probably violates SpeedTest.net's terms of service, so they blocked it. The author tried to get around the block, they blocked it again, and the author gave up. In the intervening two years, any other servers it used as backups either blocked it, or shut down (e.g., speedtest.serv.pt, mentioned in the docs, no longer exists).
There is a pull request from another user that might fix it, although it appears to be failing the CI test. If you want to try it yourself, you can.
But otherwise, you can't use this library, and there's no way anyone can help you use it; it just doesn't work. You'll have to find another way to do the same thing.
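For what it's worth, one maintained alternative at the time of writing is the speedtest-cli package (pip install speedtest-cli); a minimal sketch, not a drop-in replacement for pyspeedtest:
import speedtest  # provided by the speedtest-cli package

st = speedtest.Speedtest()
st.get_best_server()            # pick a nearby test server
download_bps = st.download()    # results are in bits per second
upload_bps = st.upload()
print(download_bps, upload_bps)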
