Missing Data while Reading from Salesforce using Python

I am trying to bulk-read data from Salesforce using Python and write the results to a JSON file. However, the file doesn't seem to contain all the data; it has some records but not everything.
I confirmed that a record Id exists in Salesforce but not in the JSON file. If I narrow the WHERE condition so it closely brackets the missing record's modified date, the record does show up in the JSON file. I suspect there is some kind of size limit on the response here, but I can't find anything documented.
Has anyone come across this kind of issue? TIA.
MissingSFData.py
...
sf_object = 'Account'
sf_conn = SalesforceOauthHook(self.sf_conn_id_client, self.sf_conn_id_user).sign_in()
bulk_query = 'select Id,IsDeleted from Account WHERE ModifiedDate >= 2021-06-17T23:10:00+00:00 AND ModifiedDate < 2021-06-21T23:15:00+00:00'
query_results = getattr(sf_conn.bulk, sf_object).query(bulk_query)  # bulk.py slightly different from default
...
SalesforceOauthHook.py
import requests

from simple_salesforce.api import Salesforce  # api.py slightly different from default
from airflow.hooks.base_hook import BaseHook

class SalesforceOauthHook(BaseHook):
    ...
    def sign_in(self):
        ...
        url = "https://{}.my.salesforce.com/services/oauth2/token".format(instance)
        payload = "&".join([
            "client_id={}".format(client_id),
            "client_secret={}".format(client_secret),
            "grant_type=password",  # no trailing '&' here; join() already adds the separators
            "username={}".format(username),
            "password={}".format(password)
        ])
        headers = {
            'content-type': "application/x-www-form-urlencoded"
        }
        response = requests.request("POST", url, data=payload, headers=headers)
        credentials = response.json()
        sf = Salesforce(instance_url=credentials["instance_url"],
                        session_id=credentials["access_token"],
                        version="47.0")
        return sf

I found that my issue is with bulk.py in simple_salesforce: it was only reading the first batch of results. Here is the solution for reading multiple batches.
https://github.com/simple-salesforce/simple-salesforce/issues/280
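If it helps anyone else, here is a minimal sketch of collecting every batch, assuming a simple_salesforce release where the bulk query supports the lazy_operation flag (added around the fix for the issue above; check your version):
# lazy_operation=True makes the bulk query yield one list of records per
# batch, so we collect them all instead of stopping after the first.
all_records = []
for batch in sf_conn.bulk.Account.query(bulk_query, lazy_operation=True):
    all_records.extend(batch)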

Related

insert variable into data request dictionary python

I am trying to build a simple record player with the Spotify API, and I would like to save the playlist IDs in variables so they are easier to change or add to in the future.
import json
import requests

spotify_user_id = "...."
sgt_peppers_id = "6QaVfG1pHYl1z15ZxkvVDW"

class GetSongs:
    def __init__(self):
        self.user_id = spotify_user_id
        self.spotify_token = ""
        self.sgt_peppers_id = sgt_peppers_id

    def find_songs(self):
        query = "https://api.spotify.com/v1/me/player/play?device_id=......"
        headers = {"Content.Type": "application/json",
                   "Authorization": "Bearer {}".format(self.spotify_token)}
        data = '{"context_uri":"spotify:album:6QaVfG1pHYl1z15ZxkvVDW"}'
        response = requests.put(query, headers=headers, data=data)
I would like to be able to have it like this:
data = '{"context_uri":f"spotify:album:{sgt_peppers_id}"}'
but sadly it doesn't work, and all the other methods for inserting variables into strings don't work either. Hope somebody has the answer to this. Thank you in advance!
The Spotify API is expecting the request body to be json, which you're currently building by hand. But, it looks like you're using a misspelled header: Content.Type instead of Content-Type (dot instead of dash).
Luckily, the python requests library can encode python objects into json for you and add the Content-Type headers automatically. It can also add the parameters to the url for you, so you don't have to create the ?query=string manually.
# We can add this to the string as a variable in the `json={...}` arg below
album_uri = "6QaVfG1pHYl1z15ZxkvVDW"

response = requests.put(
    "https://api.spotify.com/v1/me/player/play",  # url without the `?`
    params={"device_id": "..."},                  # the params -- ?device_id=...
    headers={"Authorization": f"Bearer {self.spotify_token}"},
    json={"context_uri": f"spotify:album:{album_uri}"},
)
Let the requests library do the work for you!
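As an optional follow-up (just a sketch), raise_for_status() will surface HTTP errors early, and Spotify's JSON error body is useful while debugging:
# Fail fast on non-2xx responses and show Spotify's error payload.
try:
    response.raise_for_status()
except requests.HTTPError:
    print(response.status_code, response.json())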

foursquare api data pull from databricks

I am using the following command to pull data from the Foursquare API, which is working fine. How can I write the JSON output as a table in Databricks? I can't use the show/display functions on the data output.
import json, requests

url = 'https://api.foursquare.com/v2/venues/explore'
params = dict(
    client_id='CLIENT_ID',
    client_secret='CLIENT_SECRET',
    v='20180323',
    ll='40.7243,-74.0018',
    query='coffee',
    limit=1
)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
You could read and write the data received as follows (note that spark.read.json expects a path or an RDD of JSON strings rather than a raw string, so the response text is parallelized first):
df = spark.read.json(sc.parallelize([resp.text]))
location = 'dbfs:/tmp/test.json'
df.write.json(location)
and then create a table using the file created :
spark.sql(f'''
CREATE TABLE IF NOT EXISTS foursquare
USING JSON
LOCATION "{location}"
''')
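Once the table is registered, the usual table functions work again; a small sketch (display() assumes you are in a Databricks notebook):
# Query the new table and render it.
result = spark.sql("SELECT * FROM foursquare")
result.show(truncate=False)  # plain Spark
# display(result)            # inside a Databricks notebook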

Using API to create a new query on Redash

I managed to import queries into another account. I used the POST endpoint given by Redash, but it really only applies to modifying/replacing an existing query: https://github.com/getredash/redash/blob/5aa620d1ec7af09c8a1b590fc2a2adf4b6b78faa/redash/handlers/queries.py#L178
So what should I do if I want to import a new query, one that doesn't exist on my account yet? I'm looking at https://github.com/getredash/redash/blob/5aa620d1ec7af09c8a1b590fc2a2adf4b6b78faa/redash/handlers/queries.py#L84
The following is the function I made to create new queries when the query_id doesn't exist.
Here, url = path, api = the user API key, f = filename, and query_id = the query_id of the file on my local desktop.
import json
import requests

def new_query(url, api, f, query_id):
    headers = {'Authorization': 'Key {}'.format(api), 'Content-Type': 'application/json'}
    path = "{}/api/queries".format(url)
    query_content = get_query_content(f)
    query_info = {'query': query_content}
    print(json.dumps(query_info))
    response = requests.post(path, headers=headers, data=json.dumps(query_info))
    print(response.status_code)
I am getting response.status_code 500. Is there anything wrong with my code? How should I fix it?
For future reference :-) here's a python POST that creates a new query:
payload = {
    "query": query,               ## the select query
    "name": "new query name",
    "data_source_id": 1,          ## can be determined from the /api/data_sources end point
    "schedule": None,
    "options": {"parameters": []}
}
res = requests.post(redash_url + '/api/queries',
                    headers={'Authorization': 'Key YOUR KEY'},
                    json=payload)
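Assuming the call succeeds, the response body carries the new query's metadata, so (as a sketch) you can read back the id Redash assigned:
# Grab the id of the freshly created query for later updates.
new_query_id = res.json().get('id')
print(res.status_code, new_query_id)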
(solution found thanks to an offline discussion with #JohnDenver)
TL;DR:
...
query_info = {'query':query_content,'data_source_id':<find this number>}
...
Verbose:
I had a similar problem. I checked the Redash source code, and it looks for a data_source_id. I added the data_source_id to my data payload, which worked.
You can find the appropriate data_source_id by looking at the response from a 'get query' call:
import json
import requests

def find_data_source_id(url, query_number, api):
    path = "{}/api/queries/{}".format(url, query_number)
    headers = {'Authorization': 'Key {}'.format(api), 'Content-Type': 'application/json'}
    response = requests.get(path, headers=headers)
    return json.loads(response.text)['data_source_id']
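Putting the two together (hypothetical wiring, reusing url, api, headers and query_content from the snippets above):
# Look up the data_source_id from any existing query, then include it
# when creating the new one. 12345 here is a placeholder query number.
data_source_id = find_data_source_id(url, 12345, api)
query_info = {'query': query_content, 'data_source_id': data_source_id}
response = requests.post("{}/api/queries".format(url), headers=headers,
                         data=json.dumps(query_info))
print(response.status_code)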
The official Redash API documentation is so lame: it doesn't give any examples for the documented "Common Endpoints", and I had no idea how I was supposed to use the API key.
Instead, check out this saviour: https://github.com/damienzeng73/redash-api-client .

API gives only the headers in Python but not the data

I am trying to access an API from this website. (https://www.eia.gov/opendata/qb.php?category=717234)
I am able to call the API, but I am getting only headers. Not sure if I am doing it correctly or if any additions are needed.
Code:
import json  # needed for json.loads below
import urllib
import requests
import urllib.request

locu_api = 'WebAPI'

def locu_search(query):
    api_key = locu_api
    url = 'https://api.eia.gov/category?api_key=' + api_key
    locality = query.replace(' ', '%20')
    response = urllib.request.urlopen(url).read()
    json_obj = str(response, 'utf-8')
    data = json.loads(json_obj)
When I try to print the results to see what's in data:
data
I am getting only the headers in the JSON output. Can anyone help me figure out how to extract the data instead of just the headers?
Avi!
Look, the data you posted seems to be an application/json response. I tried to reorganize your snippet a little bit so you could reuse it for other purposes later.
import requests

API_KEY = "insert_it_here"

def get_categories_data(api_key, category_id):
    """
    Makes a request to the gov API and returns its JSON response
    as a python dict.
    """
    host = "https://api.eia.gov"  # no trailing slash; it is added when building the url
    endpoint = "category"
    url = f"{host}/{endpoint}"
    qry_string_params = {"api_key": api_key, "category_id": category_id}
    response = requests.post(url, params=qry_string_params)
    return response.json()

print(get_categories_data(api_key=API_KEY, category_id="717234"))
As far as I can tell, the response contains some categories and their names. If that's not what you were expecting, maybe there's another endpoint that you should look for. I'm sure this snippet can help you if that's the case.
Side note: isn't your API key supposed to be private? Not sure if you should share that.
Update:
Thanks to Brad Solomon, I've changed the snippet to pass query string arguments to the requests.post function by using the params parameter which will take care of the URL encoding, if necessary.
You haven't presented all of the data. But what I see here is first a dict that associates category_id (a number) with a variable name. For example category_id 717252 is associated with variable name 'Import quantity'. Next I see a dict that associates category_id with a description, but you haven't presented the whole of that dict so 717252 does not appear. And after that I would expect to see a third dict, here entirely missing, associating a category_id with a value, something like {'category_id': 717252, 'value': 123.456}.
I think you are just unaccustomed to the way some APIs aggressively decompose their data into key/value pairs. Look more closely at the data. Can't help any further without being able to see the data for myself.
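To make the shape concrete, here is a tiny illustration (the values are entirely made up) of rejoining such decomposed key/value pairs by category_id:
# Hypothetical fragments of the kind the API returns, keyed by category_id.
names = {717252: "Import quantity"}
values = {717252: 123.456}
combined = {cid: (names[cid], values.get(cid)) for cid in names}
print(combined)  # {717252: ('Import quantity', 123.456)}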

Issues while inserting data in cloudant DB

I am working on a project wherein I am supposed to get some user input through a web application and send that data to Cloudant DB. I am using Python for the use case. Below is the sample code:
import requests
import json

dict_key = {}
key = frozenset(dict_key.items())

doc = {
    {
        "_ID": "1",
        "COORD1": "1,1",
        "COORD2": "1,2",
        "COORD3": "2,1",
        "COORD4": "2,2",
        "AREA": "1",
        "ONAME": "abc",
        "STYPE": "black",
        "CROPNAME": "paddy",
        "CROPPHASE": "initial",
        "CROPSTARTDATE": "01-01-2017",
        "CROPTYPE": "temp",
        "CROPTITLE": "rice",
        "HREADYDATE": "06-03-2017",
        "CROPPRICE": "1000",
        "WATERRQ": "1000",
        "WATERSRC": "borewell"
    }
}
auth = ('uid', 'pwd')
headers = {'Content-type': 'application/json'}
post_url = "server_IP".format(auth[0])
req = requests.put(post_url, auth=auth, headers=headers, data=json.dumps(doc))
#req = requests.get(post_url, auth=auth)
print(json.dumps(req.json(), indent=1))
When I am running the code, I am getting the below error:
"WATERSRC":"borewell"
TypeError: unhashable type: 'dict'
I searched a bit and found the below Stack Overflow link as a prospective resolution:
TypeError: unhashable type: 'dict'
It says that "To use a dict as a key you need to turn it into something that may be hashed first. If the dict you wish to use as key consists of only immutable values, you can create a hashable representation of it like this:
key = frozenset(dict_key.items())"
I have the below queries:
1) I have tried using it in my code above, but I am not sure if I have used it correctly.
2) To put the data in the Cloudant DB, I am using the Python module "requests". In the code, I am using the below line to put the data in the DB:
req = requests.put(post_url, auth=auth,headers=headers, data=json.dumps(doc))
But I am getting below error:
"reason": "Only GET,HEAD,POST allowed"
I searched on that as well, and I found the following IBM Bluemix document about it:
https://console.ng.bluemix.net/docs/services/Cloudant/basics/index.html#cloudant-basics
Going by that document, I would say I am using the right option, but maybe I am wrong.
If you are adding a document to the database and you know the _id, then you need to do an HTTP POST. Here's some slightly modified code:
import requests
import json

doc = {
    "_id": "2",
    "COORD1": "1,1",
    "COORD2": "1,2",
    "COORD3": "2,1",
    "COORD4": "2,2",
    "AREA": "1",
    "ONAME": "abc",
    "STYPE": "black",
    "CROPNAME": "paddy",
    "CROPPHASE": "initial",
    "CROPSTARTDATE": "01-01-2017",
    "CROPTYPE": "temp",
    "CROPTITLE": "rice",
    "HREADYDATE": "06-03-2017",
    "CROPPRICE": "1000",
    "WATERRQ": "1000",
    "WATERSRC": "borewell"
}
auth = ('admin', 'admin')
headers = {'Content-type': 'application/json'}
post_url = 'http://localhost:5984/mydb'
req = requests.post(post_url, auth=auth, headers=headers, data=json.dumps(doc))
print(json.dumps(req.json(), indent=1))
Notice that:
- the _id field is supplied in the doc and is lower case
- the request call is a POST, not a PUT
- the post_url contains the name of the database being written to - in this case mydb
N.B. in the above example I am writing to a local CouchDB, but replacing the URL with your Cloudant URL and adding the correct credentials should get this working for you.
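For completeness, CouchDB and Cloudant also accept a PUT as long as the document id is part of the URL; a minimal sketch reusing the doc, auth and headers from above:
# Alternative: PUT the document to /<db>/<_id> instead of POSTing to the db.
put_url = "{}/{}".format(post_url, doc["_id"])
req = requests.put(put_url, auth=auth, headers=headers, data=json.dumps(doc))
print(json.dumps(req.json(), indent=1))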
