Problem with updating NULL values in Salesforce using Python

I am trying to upsert user data to Salesforce using a Python PATCH request. I have a dataset in the form of a DataFrame that contains several null values. While trying to upsert the data to Salesforce, it throws this error:
{'message': 'Deserializing the instance of date from VALUE_STRING, value nan, cannot be executed or the request is missing a required field at [line:1, column:27]', 'errorCode': 'JSON_PARSER_ERROR'}
To resolve the error I have tried to replace the values with None and also with "null", as shown in the code below. Still, I receive much the same error.
df_1.fillna(value=None, method=None, inplace=True)
df_1 = df_1.replace(np.NaN, "null")
The error then is:
{'message': 'Deserializing the instance of date from VALUE_STRING, value null, cannot be executed or the request is missing a required field at [line:1, column:27]', 'errorCode': 'JSON_PARSER_ERROR'}
Any possible leads would be immensely helpful.

You'll need to find a way to inspect the final JSON generated just before it's sent out.
You can use Workbench to experiment (there's "Utilities -> REST Explorer" after you log in); over the normal REST API the update is a straightforward operation:
PATCH {your instance url}/services/data/v55.0/sobjects/Account/0017000000Lg8Wh
(put your Account Id) with a body like this (either form works):
{
    "BillingCity": null,
    "BillingCountry": ""
}
should clear the fields. A "null" won't work, it counts as a string.
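For example, here is a minimal sketch of the same PATCH sent from Python with requests (the instance URL and access token are placeholders); the key point is that Python's None serializes to a JSON null, while the string "null" stays a string:

import json
import requests

# placeholders: substitute your own instance URL, record Id and access token
instance_url = "https://yourInstance.my.salesforce.com"
record_url = instance_url + "/services/data/v55.0/sobjects/Account/0017000000Lg8Wh"
headers = {
    "Authorization": "Bearer <access token>",
    "Content-Type": "application/json",
}

# None becomes a JSON null, "" also clears the field; the string "null" would not
body = {"BillingCity": None, "BillingCountry": ""}

response = requests.patch(record_url, headers=headers, data=json.dumps(body))
print(response.status_code)  # 204 means the update succeeded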
====
If you're using "Bulk API 2.0" (https://trailhead.salesforce.com/content/learn/modules/api_basics/api_basics_bulk, I think you'd notice it's different, asynchronous dance of intialise job, upload data, start processing, periodically check "is it done yet"...) for JSON format null should work too, for XML you need special tag and if your format is CSV - it's supposed to be #N/A

Try
df.replace({np.nan: None}, inplace=True)
This is the equivalent of submitting a null or empty-string value to Salesforce.
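As a sketch of how that can fit into the question's upsert flow (the sample DataFrame and field names here are stand-ins for the question's df_1):

import numpy as np
import pandas as pd

# a stand-in for the question's DataFrame
df_1 = pd.DataFrame([{"LastName": "Smith", "Birthdate": np.nan}])

# NaN is not valid JSON; None serializes to a JSON null, which clears the field
df_1 = df_1.replace({np.nan: None})

# each row becomes a dict that can be serialized with json.dumps and sent as the PATCH body
records = df_1.to_dict(orient="records")
print(records)  # [{'LastName': 'Smith', 'Birthdate': None}]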

Related

Set a default in a schema property in Python for sending data to BigQuery

I have this piece of code in Python:
Schema=[bigquery.SchemaField("data_extracao","TIMESTAMP", mode="NULLABLE")]
In it I define the schema that I will need to send to BigQuery, but I want to define a default value for a column within the schema. Is this possible? I did some research and found this:
Schema=[bigquery.SchemaField("data_extracao","TIMESTAMP", mode="NULLABLE", default=0)]
but this is also not working; I received the error "__init__() got an unexpected keyword argument 'default'".

How to get data that has a specific child key using Pyrebase

I'm using Pyrebase to access my Firebase database. My database is currently structured like so:
- users
  - 12345
      name: "Kevin"
      company: "Nike"
Where 12345 is the user's id, and the company is the company that the user belongs to. I'm currently trying to get all the users that belong to Nike. According to the Pyrebase docs, doing something like this should work:
db.child("users").order_by_child("company").equal_to("Nike").get().val()
but I'm getting the error "error" : "orderBy must be a valid JSON encoded path". Does anyone know why this might be the case?
There is something wrong with the Pyrebase library. Here's a link to the problem.
The solution is to add these lines of code in your app.
# Temporarily replace the quote function
def noquote(s):
    return s

pyrebase.pyrebase.quote = noquote
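A sketch of how the workaround fits into a full script (the config values here are placeholders):

import pyrebase

def noquote(s):
    return s

# apply the workaround before building any queries
pyrebase.pyrebase.quote = noquote

config = {
    "apiKey": "<api key>",
    "authDomain": "<project>.firebaseapp.com",
    "databaseURL": "https://<project>.firebaseio.com",
    "storageBucket": "<project>.appspot.com",
}
firebase = pyrebase.initialize_app(config)
db = firebase.database()

nike_users = db.child("users").order_by_child("company").equal_to("Nike").get().val()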
I managed to fix this problem, since I'm also using the REST API to connect with my Firebase Realtime Database. I'll demonstrate where the error lies with examples:
When I don't wrap the orderBy value (child, key, etc.) and the other query parameters in double quotes, Retrofit (which I'm using) gives me an error/bad request.
Here's the error/bad request url:
https://yourfirebaseprojecturl.com/Users.json?orderBy=username&startAt=lifeofkevin
See, both the orderBy value and the startAt value, in this case username and lifeofkevin, are not wrapped in double quotes, like "username" and "lifeofkevin", so it returns orderBy must be a valid JSON encoded path.
To make it work, I need to wrap orderBy and the other query parameters in double quotes, so that Firebase returns the data you want to work with.
Here's the second example, the correct one:
https://yourfirebaseprojecturl.com/Users.json?orderBy="username"&startAt="gang"
Now notice the difference? The values of orderBy and startAt are both wrapped in double quotes, so now the request returns the data you want to work with.
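For illustration, here's a hedged sketch of building such a request from Python with requests (the project URL is a placeholder); requests will URL-encode the inner quotes, which the Firebase REST API still accepts:

import requests

base_url = "https://yourfirebaseprojecturl.com/Users.json"

# note the extra double quotes inside the parameter values
params = {
    'orderBy': '"username"',
    'startAt': '"gang"',
}

response = requests.get(base_url, params=params)
print(response.json())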

How do I set continuation tokens for Cosmos DB queries sent by document_client objects in Python?

I have an API that retrieves documents based on keywords that appear in document fields. I would like to paginate results so that I can return documents to a client sending a request, as well as allowing them to request more documents if they want. The query itself only takes a second or so in the browser when I am in the Azure Data Explorer, but it takes about a minute when I query using the Python DocumentDB library.
Looking at the Microsoft Cosmos DB REST API, it appears that there are two headers, x-ms-continuation and x-ms-max-item-count, that are used.
It doesn't appear that putting these as entries in the options dictionary of document_client.QueryDocuments() does the trick.
In the GitHub repository, the Read() method references the options parameter:
headers = base.GetHeaders(self,
                          initial_headers,
                          'get',
                          path,
                          id,
                          type,
                          options)

# Read will use ReadEndpoint since it uses GET operation
url_connection = self._global_endpoint_manager.ReadEndpoint

result, self.last_response_headers = self.__Get(url_connection,
                                                path,
                                                headers)
Looking in base.py, where GetHeaders is defined, I saw these two blocks of code:
if options.get('continuation'):
    headers[http_constants.HttpHeaders.Continuation] = (
        options['continuation'])

if options.get('maxItemCount'):
    headers[http_constants.HttpHeaders.PageSize] = options['maxItemCount']
These would appear to correspond to the two headers above. However, when I set them as options in the query ({'continuation': True, 'maxItemCount': 10}), nothing changes.
The final query looks like:
client.QueryDocuments(collection_link, query, {'continuation': True, 'maxItemCount': 10})
I have also tried using a string instead of an int for maxItemCount.
What am I doing incorrectly here?
Edit: The headers are the same as the two from the documentation above, from http_constants.py:
# Our custom DocDB headers
Continuation = 'x-ms-continuation'
PageSize = 'x-ms-max-item-count'
The way the continuation token works is that when you query documents and there are more documents available matching that query, the service returns a marker (or token) that you need to include in your next query. That tells the service to fetch the documents from that marker rather than from the beginning.
So in your code, the very first query will have no continuation parameter (or null). When you get the result, you should check whether or not a token was returned by the service. If no token is returned, there's no more data available. However, if a token is returned, you should include it in your query options for the second query.
It turns out that the query results needed to be handled from the results object itself, and the method _fetch_function(options) should be called:
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
results_1 = q._fetch_function({'maxItemCount': 10})

# the continuation token is a string representing a JSON object
token = results_1[1]['x-ms-continuation']
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})
The data is contained in results_[n][0], and the header information returned from the call is in results_[n][1].
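Putting that together, here's a hedged sketch of paging through all results with the same _fetch_function calls shown above (it relies on a private method, so it may break between SDK versions):

q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})

options = {'maxItemCount': 10}
while True:
    docs, response_headers = q._fetch_function(options)
    for doc in docs:
        print(doc['id'])  # process this page of documents
    token = response_headers.get('x-ms-continuation')
    if not token:
        break  # no token means there are no more pages
    options = {'maxItemCount': 10, 'continuation': token}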
You can also get the results in pages using fetch_next_block().
Note that, with this approach, the user's code does not have to expose the continuation token:
q = db_source._client.QueryDocuments(collection_link, query, {'maxItemCount': 10, 'continuation': True})
results = q.fetch_next_block()
ref: https://github.com/Azure/azure-documentdb-python/issues/98

How can one make Salesforce Bulk API calls via simple_salesforce?

I'm using the module simple-salesforce, and I'm not seeing anything in the docs about making bulk API calls. Anybody know how to do this?
https://github.com/simple-salesforce/simple-salesforce
The code does have some comments. There's also the readthedocs page, but even that looks like it could use some help.
Good stuff first, explanation below.
Code example (written assuming you're running the whole block of code at once):
from simple_salesforce import Salesforce
sf = Salesforce(<credentials>)
# query
accounts = sf.bulk.Account.query('SELECT Id, Name FROM Account LIMIT 5')
# returns a list of dictionaries similar to: [{'Name': 'Something totally new!!!', 'attributes': {'url': '/services/data/v38.0/sobjects/Account/object_id_1', 'type': 'Account'}, 'Id': 'object_id_1'}]
# assuming you've pulled data, modify it to use in the next statement
accounts[0]['Name'] = accounts[0]['Name'] + ' - Edited'
# update
result = sf.bulk.Account.update(accounts)
# result would look like [{'errors': [], 'success': True, 'created': False, 'id': 'object_id_1'}]
# insert
new_accounts = [{'Name': 'New Bulk Account - 1', 'BillingState': 'GA'}]
new_accounts = sf.bulk.Account.insert(new_accounts)
# new_accounts would look like [{'errors': [], 'success': True, 'created': True, 'id': 'object_id_2'}]
# upsert
accounts[0]['Name'] = accounts[0]['Name'].replace(' - Edited', '')
accounts.append({'Name': 'Bulk Test Account'})
# 'Id' is the column to "join" on. this uses the object's id column
upserted_accounts = sf.bulk.Account.upsert(accounts, 'Id')
# upserted_accounts would look like [{'errors': [], 'success': True, 'created': False, 'id': 'object_id_1'}, {'errors': [], 'success': True, 'created': True, 'id': 'object_id_3'}]
# how i assume hard_delete would work (i never managed to run hard_delete due to insufficient permissions in my org)
# get last element from the response.
# *NOTE* This ASSUMES the last element in the results of the upsert is the new Account.
# This is a naive assumption
new_accounts.append(upserted_accounts[-1])
sf.bulk.Account.hard_delete(new_accounts)
Using simple_salesforce, you can access the bulk api by doing
<your Salesforce object>.bulk.<Name of the Object>.<operation to perform>(<appropriate parameter, based on your operation>)
<your Salesforce object> is the object you get back from constructing simple_salesforce.Salesforce(<credentials>)
<credentials> is your username, password, security_token, and sandbox (bool, if you're connecting to a sandbox) or session_id (these are the two ways that I know of)
<Name of the Object> is just Account or Opportunity or whatever object you're trying to manipulate
<operation to perform> is one of the below:
query
insert
update
upsert
hard_delete (my account did not have appropriate permissions to test this operation. any mention is pure speculation)
<appropriate parameter> is dependent on which operation you wish to perform
query - a string that contains a SOQL query
insert - a list of dictionaries. remember to have a key for all fields required by your org when creating a new record
update - a list of dictionaries. you'll obviously need a valid Object Id per dictionary
upsert - a list of dictionaries and a string representing the "external id" column. The "external id" can be the Salesforce Object 'Id' or any other column; choose wisely. If any dictionary does not have a key that is the same as the "external id", a new record will be created.
What's returned: depends on the operation.
query returns a list of dictionaries with your results. In addition to the columns in your query, each dictionary has an 'attributes' key. This contains a 'url' key, which looks like it can be used for API requests for the specific object, and a 'type' key, which is the type of the object returned.
insert/update/upsert returns a list of dictionaries. each dictionary is like {'errors': [], 'success': True, 'created': False, 'id': 'id of object would be here'}
Thanks to #ATMA's question for showing how to use query. With that question and the source code, I was able to figure out insert, update, and upsert.
I ran into this same problem a few weeks ago. Sadly, there isn't a way to do it with simple-salesforce. My research through the source didn't seem to have any way to do it or to hack it to make it work.
I looked into a number of other Python based Bulk API Tools. These included Salesforce-bulk 1.0.7 (https://pypi.python.org/pypi/salesforce-bulk/1.0.7), Salesforce-bulkipy 1.0 (https://pypi.python.org/pypi/salesforce-bulkipy), and Salesforce_bulk_api (https://github.com/safarijv/salesforce-bulk-api).
I ran into some issues getting Salesforce-bulk 1.0.7 and Salesforce-bulkipy 1.0 configured on my system, but Salesforce_bulk_api worked pretty well. It uses simple-salesforce as the authentication mechanism but handles the creation of the bulk jobs and uploading the records for you.
A word of caution: simple-salesforce and the bulk APIs work differently. simple-salesforce works via REST, so you only create JSON strings, which are readily compatible with Python dicts. The bulk APIs work with CSV files that are uploaded to Salesforce. Creating those CSVs can be a bit dangerous, since the order of the field names in the header must correspond to the order of the data elements in the file. It isn't a huge deal, but you need to be more careful when creating your CSV rows so that the order matches between the header and data rows.
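One way to keep the header and data rows in sync, as a small sketch, is to let csv.DictWriter derive both from the same field list:

import csv

records = [
    {'Name': 'Acme', 'BillingState': 'GA'},
    {'Name': 'Globex', 'BillingState': 'CA'},
]

fieldnames = list(records[0].keys())  # single source of truth for column order
with open('accounts.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # values are written in fieldnames order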

ObjectID generated by server on pymongo

I am using pymongo (python module for mongodb).
I want the ObjectId to be created automatically by the server; however, it seems to be created by pymongo itself when we don't specify it.
The problem this raises is that I use the ObjectId to sort by time (by just sorting on the _id field). However, it seems that it uses the time set on each client machine, so we cannot truly rely on it.
Any idea on how to solve this problem?
If you call save and pass it a document without an _id field, you can force the server to add the _id instead of the client by setting the (enigmatically-named) manipulate option to False:
coll.save({'foo': 'bar'}, manipulate=False)
I'm not a Python user, but I'm afraid there's no way to have _id generated by the server. For performance reasons _id is always generated by the driver; that way, when you insert a document, you don't need to do another query to get the _id back.
Here's a possible way to do it by generating an int sequence _id, just like the IDENTITY column in SQL Server. To do this, you need to keep a record in a dedicated collection; for example, in my project there's a seed collection, which has only one record:
{_id: ObjectId("..."), seqNo: 1 }
The trick is, you have to use findAndModify to keep the find and modify in the same "transaction".
var idSeed = db.seed.findAndModify({
    query: {},
    sort: {seqNo: 1},
    update: { $inc: { seqNo: 1 } },
    new: false
});
var id = idSeed.seqNo;
This way all your instances will get a unique sequence number, and you can use it to sort the records.
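Since the question is about pymongo, here's a rough Python equivalent of the same pattern; find_one_and_update is the driver-side counterpart of findAndModify, and the database and collection names here are placeholders:

from pymongo import MongoClient, ReturnDocument

client = MongoClient()
db = client['mydb']  # database name is a placeholder

# atomically read and increment the counter; assumes the seed document already exists
seed = db.seed.find_one_and_update(
    {},
    {'$inc': {'seqNo': 1}},
    sort=[('seqNo', 1)],
    return_document=ReturnDocument.BEFORE,
)
next_id = seed['seqNo']

db.mycollection.insert_one({'_id': next_id, 'foo': 'bar'})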
