I am trying to parse a JSON response from the Google Analytics API into a Python dataframe, and then ETL it into MS SQL Server using Python.
The query succeeds and returns an object called feed:
import json, gdata
data_query = gdata.analytics.client.DataFeedQuery({
'ids': 'ga:67981229',
'dimensions': 'ga:userType,ga:sessionCount,ga:source', ##ga:source,ga:medium
'metrics': 'ga:pageviews',
##'filters': 'ga:pagePath==/my_url_comes_here/',
##'segment':'',
'start-date': '2015-01-01',
'end-date': '2015-01-03',
'prettyprint': 'true',
'output':'json',
})
feed = my_client.GetDataFeed(data_query)
However, when I try to parse the data using the code below, it doesn't work and I get the following errors:
response = json.parse(feed) ## I also tried json.load(feed) and json.loads(feed)
data = json.parse(feed)
Traceback (most recent call last):
File "", line 1, in
data = json.parse(feed)
AttributeError: 'module' object has no attribute 'parse'
data = json.loads(feed)
Traceback (most recent call last):
File "", line 1, in
data = json.loads(feed)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
data = json.load(feed)
Traceback (most recent call last):
File "", line 1, in
data = json.load(feed)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 286, in load
return loads(fp.read(),
AttributeError: 'DataFeed' object has no attribute 'read'
I have already imported json, as seen at the top. My end objective is to ETL this data into MS SQL Server, so any help on an effective way to do that with a JSON Python object would help a lot. Thanks!
Instead of manually parsing the JSON response into a dataframe, you could try the pandas library, which has built-in methods to query the Google Analytics API. Once your Google Analytics metrics are in a dataframe, you can insert the records into SQL Server using the dataframe's to_sql method.
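As a side note on the three errors in the question: the json module has no parse function, json.loads expects a string, and json.load expects a file-like object with a read() method. The sketch below reproduces that with a made-up FakeFeed class standing in for the DataFeed object (the class is hypothetical and is not part of gdata):

```python
import json


class FakeFeed(object):
    """Hypothetical stand-in for an API result object that is not a string."""


def describe_failures(obj):
    """Report what happens for each of the three attempts from the question."""
    results = {"parse": "present" if hasattr(json, "parse") else "missing"}
    for name in ("loads", "load"):
        try:
            getattr(json, name)(obj)
            results[name] = "ok"
        except (TypeError, AttributeError) as exc:
            # loads rejects non-string input; load fails on the missing read()
            results[name] = type(exc).__name__
    return results


failures = describe_failures(FakeFeed())
print(failures)
```

So none of the three calls can work on a feed object directly; the data has to be pulled out of the object (or requested as a raw string) first.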
Related
I'm trying to post data to a machine learning API using Elasticsearch. What format do the JSON docs need to be in?
I've attempted to send data with the JSON docs separated by newlines in a txt file. I've also tried converting back and forth to JSON using dump and load, to no avail. The documentation states that the documents can be separated by whitespace, but no matter what I try it won't accept them.
https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-post-data.html
Here is an example of a json doc saved as file_name.json:
[{"myid": "id1", "client": "client1", "submit_date": 1514764857},
{"my_id": "id2", "client": "client_2", "submit_date": 1514764857}]
Here is the basic code needed to post data:
import json
import os

from elasticsearch import Elasticsearch
from elasticsearch.client.xpack import MlClient

# elastic_connection() and job_id are defined elsewhere (not shown here)
es = elastic_connection()
es_ml = MlClient(es)

def post_training_data(directory='Training Data', file_name='file_name.json'):
    with open(os.path.join(directory, file_name), mode='r') as train_file:
        train_data = json.load(train_file)
    es_ml.post_data(job_id=job_id, body=train_data)

post_training_data()
This is the specific error I am getting with this:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "..\train_model.py", line 218, in post_training_data
self.es_ml.post_data(job_id=self.job_id, body=train_data)
File "..\inc_anamoly\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "..\inc_anamoly\lib\site-packages\elasticsearch\client\xpack\ml.py", line 81, in post_data
body=self._bulk_body(body))
AttributeError: 'MlClient' object has no attribute '_bulk_body'
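This turned out to be a bug in the elasticsearch-py client: as the traceback shows, MlClient.post_data calls self._bulk_body, which MlClient does not have. I reported the issue: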
https://github.com/elastic/elasticsearch-py/issues/959
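On the format question itself, here is a minimal sketch of turning a list of documents into whitespace-separated JSON, one document per line with no enclosing list. The to_ndjson helper is a made-up name, the sample documents are adapted from the question (with the key spelled my_id in both), and whether a given client version accepts such a string as the body is an assumption I have not verified:

```python
import json


def to_ndjson(docs):
    """Serialize each document on its own line, without an enclosing list."""
    return "\n".join(json.dumps(doc) for doc in docs) + "\n"


docs = [
    {"my_id": "id1", "client": "client1", "submit_date": 1514764857},
    {"my_id": "id2", "client": "client_2", "submit_date": 1514764857},
]
body = to_ndjson(docs)
print(body)
```

Each line of body is a complete JSON object on its own, which matches the "separated by whitespace" wording in the documentation quoted above.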
I'm pretty new to Python and I'm trying to connect to Smartsheet through its API.
I ran "pip install smartsheet-python-sdk" and it installed smartsheet; I can find it under "lib".
This is code I found that is supposed to work (I replaced my token with a placeholder):
# Import.
import smartsheet
# Instantiate smartsheet and specify access token value.
smartsheet = smartsheet.Smartsheet('Token_here')
# Get all columns.
action = smartsheet.Sheets.get_columns('Template for Bram', include_all=True)
columns = action.data
# For each column, print Id and Title.
for col in columns:
    print(col.id)
    print(col.title)
    print('')
It shows this error:
Traceback (most recent call last):
File "C:\Users\bram\Desktop\smartsheet.py", line 2, in <module>
import smartsheet
File "C:\Users\bram\Desktop\smartsheet.py", line 5, in <module>
smartsheet = smartsheet.Smartsheet('token_here')
AttributeError: 'module' object has no attribute 'Smartsheet'
Now I'm not sure what my next step is. I think I have followed all of the appropriate steps. When I run import smartsheet by itself it won't error out.
What am I doing wrong?
Thank you
Update***
After using the code from the github page and implementing my token and sheet id I get this error:
Traceback (most recent call last):
File "C:\Users\bvanhout\Desktop\test23.py", line 58, in <module>
sheet = ss.Sheets.get_sheet(sheet_id)
File "C:\Python27\lib\site-packages\smartsheet\sheets.py", line 460, in get_sheet
response = self._base.request(prepped_request, expected, _op)
File "C:\Python27\lib\site-packages\smartsheet\smartsheet.py", line 178, in request
res = self.request_with_retry(prepped_request, operation)
File "C:\Python27\lib\site-packages\smartsheet\smartsheet.py", line 242, in request_with_retry
return self._request(prepped_request, operation)
File "C:\Python27\lib\site-packages\smartsheet\smartsheet.py", line 210, in _request
raise UnexpectedRequestError(rex.request, rex.response)
UnexpectedRequestError: (<PreparedRequest [GET]>, None)
# TODO: Update this with the ID of your sheet to update
sheet_id = 48568543424234
I printed ss and ss.Sheets, and neither reflects the actual token or sheet_id:
>>> print (ss.Sheets)
<smartsheet.sheets.Sheets object at 0x0000000003874438>
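I suspect the problem is that you are using a local variable with the same name as the module ('smartsheet'). The traceback also shows that your script itself is saved as smartsheet.py, so import smartsheet may be picking up your own file instead of the installed package; renaming the script is worth trying too.
Please take a look at the sample here: https://github.com/smartsheet-samples/python-read-write-sheet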
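To illustrate the name clash in isolation, here is a small sketch that uses the standard json module rather than smartsheet, so it runs without the SDK or a token. Once a name that referred to a module is rebound to something else, attribute lookups on that name no longer reach the module:

```python
import json as json_module


def rebinding_demo():
    """Show what happens when a module's name is reused for another value."""
    json = json_module              # the name refers to the module
    json = json.loads('{"a": 1}')   # the same name now refers to a dict
    try:
        json.loads('{}')            # a dict has no attribute 'loads'
    except AttributeError as exc:
        return str(exc)
    return None


message = rebinding_demo()
print(message)
```

Using a different name for the client object (for example ss, as in the linked sample) avoids this.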
I wrote a simple Python script to put a JSON file into Elasticsearch. I want to store it based on the id field I am extracting from the JSON file.
But when I try to insert it into Elasticsearch, it raises TypeError: expected string or buffer.
Here is the code I am working on:
#! /usr/bin/python
import requests
from elasticsearch import Elasticsearch
import json
es = Elasticsearch([{'host':'localhost','port':9200}])
r = requests.get('http://127.0.0.1:9200')
i = 1
if r.status_code == 200:
    with open('d1.json') as json_data:
        d = json.load(json_data)
        for i in d['tc'][0]['i']['f']['h']:
            if i['name'] == 'm':
                m = i['value']
                dope=str(m)
                print dope
                print type(dope)
                #print type(md5)
                es.index(index='lab', doc_type='report',id=dope,body=json.loads(json_data))
Error Log:
44d88612fea8a8f36de82e1278abb02f
<type 'str'>
Traceback (most recent call last):
File "elastic_insert.py", line 22, in <module>
es.index(index='labs', doc_type='report',id=dope,body=json.loads(json_data))
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
Any suggestions on how to solve this error? I even tried converting m to int, but that gave another error:
int(m)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '44d88612fea8a8f36de82e1278abb02f'
P.S: ElasticSearch service is up and running.
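The problem is not related to the id. The problem is with json_data: it is a file object, not a string, so json.loads rejects it. json.load is the function that reads from a file object. Note that the file has already been read once by json.load(json_data) into d, so calling json.load on it again would find nothing left to read; passing the already parsed d as the body (body=d) avoids re-reading the file.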
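A minimal sketch of that behaviour, using an in-memory io.StringIO stream as a stand-in for the opened d1.json file (the sample content is made up). It is written for Python 3, where the exception messages differ slightly from the Python 2.7 traceback above:

```python
import io
import json

# Stand-in for open('d1.json'); the content is invented for illustration.
stream = io.StringIO('{"name": "m", "value": "abc123"}')

parsed = json.load(stream)  # load() reads from a file-like object

try:
    json.loads(stream)      # loads() wants a string, not a stream
    loads_error = None
except TypeError as exc:
    loads_error = type(exc).__name__

try:
    json.load(stream)       # the stream is exhausted, nothing left to parse
    second_load_error = None
except ValueError as exc:
    second_load_error = type(exc).__name__

body = parsed               # reuse the parsed data as the document body
print(loads_error, second_load_error, body)
```

The point is to parse the file once and pass the resulting dictionary on, rather than handing the file object to the indexing call.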
I'm trying to take the JSON from a twitter get_user query and turn it into a Python object that I can extract data from (twitter handle, location, screen name, etc.)
Here is what I created. I am not sure why it doesn't work.
api = tweepy.API(auth,parser=tweepy.parsers.JSONParser())
user = api.search_users('google.com')
t_dict = json.loads(user)
pprint(t_dict)
Error:
Traceback (most recent call last):
File "Get_User_By_URL.py", line 23, in <module>
t_dict = json.loads(user)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
api.search_users already returns a Python object. It isn't a JSON string that needs to be parsed. According to the tweepy documentation, search_users actually returns a list of users. So the following is possible:
for user in api.search_users('google.com'):
    print user.screen_name
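If the same code sometimes receives raw JSON text and sometimes already parsed data, a small guard can decode only when needed. This is a sketch: ensure_parsed is a made-up helper name, and the sample user dictionary is invented rather than taken from a real API response:

```python
import json


def ensure_parsed(value):
    """Decode JSON text; return anything else unchanged."""
    if isinstance(value, (str, bytes, bytearray)):
        return json.loads(value)
    return value


already_parsed = [{"screen_name": "example_user"}]
from_text = ensure_parsed('[{"screen_name": "example_user"}]')
print(ensure_parsed(already_parsed) == from_text)
```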
I'm new to Python and would like to use urllib to download tweets. I'm following a tutorial's instructions but get the same error every time. This is what I run:
import urllib
import json
response = urllib.urlopen("https://twitter.com/search?q=Microsoft&src=tyah")
print json.load(response)
But every time I get the error:
Traceback (most recent call last):
File "C:\Python27\print.py", line 4, in <module>
print json.load(response)
File "C:\Python27\Lib\json\__init__.py", line 278, in load
**kw)
File "C:\Python27\Lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Python27\Lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\Lib\json\decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
As noted in comments, the answer is: nothing is wrong with your code, per se.
The problem is that when json.load looks at response, it does not find JSON in there - it is finding HTML.
You need to pass a file-like object containing JSON into the json.load function, or it will raise the exception you see here.
To get JSON from Twitter, you need to call a URL that gives a JSON response. I can tell you now that none of the web interface URLs do this directly. You should use the Twitter API.
However, purely for the sake of demonstration, if you deconstruct the page at the URL you are calling now, you will find that to load the tweet data, the page makes the following request:
https://twitter.com/i/search/timeline?q=Microsoft&src=tyah&composed_count=0&include_available_features=1&include_entities=1
And this URL does return JSON in response, which would work just fine with your current code.
Of course, I'm pretty sure doing so violates some sort of Twitter TOS, so if you do this there are all sorts of potential negative repercussions to consider. Plus it's just not good sportsmanship. :)
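To see the difference between the two kinds of response without making any network calls, here is a small sketch that tries to decode two sample strings. Both strings are invented for illustration, and try_parse_json is a made-up helper name:

```python
import json


def try_parse_json(text):
    """Return the decoded value, or None when the text is not JSON."""
    try:
        return json.loads(text)
    except ValueError:
        # Raised for HTML or any other non-JSON content
        return None


html_page = "<!DOCTYPE html><html><body>Search results</body></html>"
json_payload = '{"results": []}'

print(try_parse_json(html_page))
print(try_parse_json(json_payload))
```

Printing the first few characters of a response before decoding it is a quick way to tell which of the two you actually received.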