I'm using the neo4j Python driver to run batched data loads on a local Neo4j database. I have the following packages:
neo4j==4.4.1
neo4j-driver==4.4.1
I am calling the apoc.periodic.iterate procedure. The call returns a Result object containing a small dictionary of metadata about the load. It looks like this:
{'batches': 1,
'total': 9,
'timeTaken': 0,
'committedOperations': 9,
...}
When the load is very small, I can extract this object from the Result and save it. When it is larger, however, I cannot work with the Result object. I am able to print the address of the object, but if I try to run any method on it, extract data from it in any way, or return it from the function, my code hangs forever. The return data should always be the same size, though, because it's just a bit of metadata about the load.
from neo4j import GraphDatabase
driver = GraphDatabase.driver(URI, auth=(user_name, password))
address = "test.csv"
cql = '''
CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM '%s' AS row
WITH
row.field1 AS field1,
toFloat(row.field2) AS field2
RETURN *",
"MERGE (tx: Object {field1: field1})
SET
tx.field1 = field1,
tx.field2 = field2;
",
{batchSize: 10000, parallel: true, retries: 3})
''' % address
with driver.session() as session:
    result = session.run(cql)
    print(result)
    log_data = result.data()[0]  # this line hangs forever with large loads
Well, the problem disappears when switching from an auto-commit transaction to a transaction function, which consumes the result while the transaction is still open. So instead of the approach above, I did:
def transaction(tx):
    return dict(tx.run(cql).single())

with driver.session() as session:
    log_data = session.write_transaction(transaction)
https://neo4j.com/docs/driver-manual/1.7/sessions-transactions/
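The pattern above can be sketched in isolation. This is a minimal, illustrative version (the query string and function names here are placeholders, not the exact load from the question): the summary record is fetched with single() while the transaction is still open, then converted to a plain dict that stays usable after the session closes.

```python
# Placeholder query; substitute your apoc.periodic.iterate call.
cql = "CALL apoc.periodic.iterate('...', '...', {batchSize: 10000})"

def run_load(tx):
    # single() fetches the one summary record before the transaction
    # closes; dict() turns it into a plain dictionary that is safe to
    # use after the session ends.
    record = tx.run(cql).single()
    return dict(record)

# Usage, assuming `driver` is an open neo4j GraphDatabase driver:
# with driver.session() as session:
#     log_data = session.write_transaction(run_load)
```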
I am trying to find a document in the database by id, but I get None. What am I doing wrong?
python:
card = mongo.db['grl'].find_one({'id': 448510476})
or:
card = mongo.db['grl'].find_one({'id': '448510476'})
document:
{"_id":{"$oid":"5f25b1d787fc4c34a7d9aabe"},
"id":{"$numberInt":"448510476"},"first_name":"Arc","last_name":"Fl"}
I'm not sure how you are initializing your database, but note that the document stores id as a NumberInt, so you need to query with an integer rather than a string. Try this:
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.database  # select the database named "database"

# find one document in the collection named "collection"
card = db.collection.find_one({"id": 448510476})
print(card)
I want to iterate over documents in MongoDB using a callback function in PyMongo, but I am getting an error on the forEach line:
from pymongo import MongoClient
import pandas as pd
client = MongoClient('localhost', 27017)
db = client['testing']
collection_currency = db['testcol']
getdata=[]
cursor=collection_currency.find().forEach((data)=>{getdata=data})
df=pd.DataFrame(cursor)
df.to_csv("data.csv",index=False)
I'm getting this error
cursor=collection_currency.find().forEach((data)=>{getdata=data})
^
SyntaxError: invalid syntax
Make the following changes and it should work. The problem is that you are trying to use Mongo shell (JavaScript) syntax in Python.
query = {}
cursor = collection_currency.find(query)
df = pd.DataFrame(list(cursor))
To load only a segment of the data, use skip and limit:
n_documents = 1000
skip_documents = 1000
cursor = collection_currency.find(query).skip(skip_documents).limit(n_documents)
df = pd.DataFrame(list(cursor))
To write data from the collection to a CSV file, use mongoexport instead.
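As a side note, the shell's cursor.forEach(callback) maps to a plain Python for loop (or comprehension) over the cursor. A small sketch, with a hard-coded list of dicts standing in for the cursor (a real find() cursor yields documents of the same shape):

```python
# Stand-in documents; collection_currency.find() would yield dicts
# shaped just like these.
docs = [
    {"pair": "EURUSD", "rate": 1.09},
    {"pair": "GBPUSD", "rate": 1.27},
]

def callback(doc):
    # whatever the shell callback did with each document
    return doc["rate"]

# Python equivalent of cursor.forEach(callback):
rates = [callback(doc) for doc in docs]
print(rates)  # [1.09, 1.27]
```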
I'm trying to add a field in to an existing document with pymongo.
here is my code:
from pymongo import MongoClient
client = MongoClient()
db = client['profiles']
collection = db['collection']
def createFields():
    collection.update({'_id': '547f21f450c19fca35de53cd'}, {'$set': {'new_field': 1}})

createFields()
when I enter the following in to the mongoDB interpreter
>use profiles
>db.collection.find()
I can see that there have not been any fields added to the specified document.
An _id field is most commonly of type ObjectId(), which is a twelve-byte value, rather than the 24-character hexadecimal string you are providing here. You must use the correct type in order to match the document.
You should prefix collection with db in your script. Just try this; it works fine for me.
In Mongo CLI
> use profiles
> db.collection.insertOne({'_id': '547f21f450c19fca35de53cd'})
> db.collection.find()
{ "_id" : "547f21f450c19fca35de53cd" }
Python Script -> test.py
from pymongo import MongoClient
client = MongoClient()
db = client['profiles']
collection = db['collection']
def createFields():
    db.collection.update_one({'_id': '547f21f450c19fca35de53cd'}, {'$set': {'new_field': -881}})

createFields()
Command Line
> python test.py
Mongo CLI
> use profiles
> db.collection.find()
> { "_id" : "547f21f450c19fca35de53cd", "new_field" : -881 }
For PyMongo, use update_one or update_many; the old update method is deprecated and was removed in PyMongo 4.0.
I have a Celery project connected to a MySQL database. One of the tables is defined like this:
class MyQueues(Base):
    __tablename__ = 'accepted_queues'

    id = sa.Column(sa.Integer, primary_key=True)
    customer = sa.Column(sa.String(length=50), nullable=False)
    accepted = sa.Column(sa.Boolean, default=True, nullable=False)
    denied = sa.Column(sa.Boolean, default=True, nullable=False)
Also, in the settings I have
THREADS = 4
And I am stuck in a function in code.py:
def load_accepted_queues(session, mode=None):
    # make query
    pool = session.query(MyQueues.customer, MyQueues.accepted, MyQueues.denied)

    # filter conditions
    if mode == 'XXX':
        pool = pool.filter_by(accepted=1)
    elif mode == 'YYY':
        pool = pool.filter_by(denied=1)
    elif mode is None:
        pool = pool.filter(
            sa.or_(MyQueues.accepted == 1, MyQueues.denied == 1)
        )

    # generate a dictionary with data
    l = {}
    for i in pool:  # <---------- line 90 in the error
        l.update({i.customer: {'customer': i.customer, 'accepted': i.accepted, 'denied': i.denied}})
When running this I get an error:
[20130626 115343] Traceback (most recent call last):
File "/home/me/code/processing/helpers.py", line 129, in wrapper
ret_value = func(session, *args, **kwargs)
File "/home/me/code/processing/test.py", line 90, in load_accepted_queues
for i in pool: #generate a dictionary with data
File "/home/me/envs/me/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2341, in instances
fetch = cursor.fetchall()
File "/home/me/envs/me/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3205, in fetchall
l = self.process_rows(self._fetchall_impl())
File "/home/me/envs/me/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3174, in _fetchall_impl
self._non_result()
File "/home/me/envs/me/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3179, in _non_result
"This result object does not return rows. "
ResourceClosedError: This result object does not return rows. It has been closed automatically
So mainly it is the part
ResourceClosedError: This result object does not return rows. It has been closed automatically
and sometimes also this error:
DBAPIError: (Error) (, AssertionError('Result length not requested
length:\nExpected=1. Actual=0. Position: 21. Data Length: 21',))
'SELECT accepted_queues.customer AS accepted_queues_customer,
accepted_queues.accepted AS accepted_queues_accepted,
accepted_queues.denied AS accepted_queues_denied \nFROM
accepted_queues \nWHERE accepted_queues.accepted = %s OR
accepted_queues.denied = %s' (1, 1)
I cannot reproduce the error reliably, as it normally happens only when processing a lot of data. I tried changing THREADS = 4 to 1 and the errors disappeared. That is not a solution, though, as I need to keep the number of threads at 4.
Also, I am confused about whether I should use
for i in pool:  # <---------- line 90 in the error
or
for i in pool.all():  # <---------- line 90 in the error
and could not find a proper explanation of the difference.
Altogether: any advice on how to avoid these problems?
Yes: you absolutely cannot use a Session (or any objects associated with that Session), or a Connection, in more than one thread simultaneously, especially with MySQL-Python, whose DBAPI connections are very thread-unsafe. You must organize your application so that each thread deals with its own dedicated MySQL-Python connection (and therefore its own SQLAlchemy Connection/Session and the objects associated with that Session), with no leakage to any other thread.
Edit: alternatively, you can make use of mutexes to limit access to the Session/Connection/DBAPI connection to just one of those threads at a time, though this is less common because the high degree of locking needed tends to defeat the purpose of using multiple threads in the first place.
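One common way to get that per-thread isolation is scoped_session, which hands each thread its own Session (and therefore its own DBAPI connection). A runnable sketch, using an in-memory SQLite database as a stand-in for the MySQL URL:

```python
import threading

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

# scoped_session returns a thread-local registry: each thread that
# calls Session() gets its own private Session object.
engine = create_engine("sqlite://")  # stand-in for the MySQL URL
Session = scoped_session(sessionmaker(bind=engine))

results = []

def worker():
    session = Session()  # this thread's own Session
    value = session.execute(text("SELECT 1")).scalar()
    results.append(value)
    Session.remove()     # dispose of this thread's Session when done

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [1, 1, 1, 1]
```

Each worker touches only its own Session, so no connection is ever shared across threads, which is exactly the constraint described above.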
I got the same error while calling a SQL Server stored procedure through SQLAlchemy.
In my case, adding SET NOCOUNT ON to the stored procedure fixed the problem.
ALTER PROCEDURE your_procedure_name
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for your procedure here
SELECT *
FROM your_table_name;
END;
Check out this article for more details
I was using an INSERT statement. Adding
RETURNING id
at the end of the query worked for me, as per this issue.
That said, it's a pretty strange fix; it may be something addressed in later versions of SQLAlchemy. I am using 1.4.39.
This error occurred for me when I interpolated a Python variable into an UPDATE statement and ran it with pandas' pd.read_sql(). pd.read_sql() expects a statement that returns rows, which an UPDATE does not.
Solution:
I simply used mycursor.execute() instead of pd.read_sql(), using import mysql.connector together with from sqlalchemy import create_engine.
Before:
pd.read_sql("UPDATE table SET column = 1 WHERE column = '%s'" % variable, dbConnection)
After:
mycursor.execute("UPDATE table SET column = 1 WHERE column = '%s'" % variable)
Full code:
import mysql.connector
from sqlalchemy import create_engine
import pandas as pd
# Database connection setup
sqlEngine = create_engine('mysql+pymysql://root:root@localhost/db name')
dbConnection = sqlEngine.connect()

db = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="root",
    database="db name")

mycursor = db.cursor()

variable = "Alex"
mycursor.execute("UPDATE table SET column = 1 WHERE column = '%s'" % variable)
For me, I got this error when I forgot to pass the table class to the select() query: query = select().where(Assessment.created_by == assessment.created_by). I only had to fix it by adding the class of the table I want to get entries from, like so:
query = select(Assessment).where(
Assessment.created_by == assessment.created_by)