I'm currently trying to pull GA data using Python; I've gotten as far as retrieving a list of DataPoint objects, and I can see inside them using .list, but I can't access their values directly.
For example, I've got this
>>> print(data.list)
[[[u'Android Browser'], [80]], [[u'Chrome'], [127]], [[u'Firefox'], [78]], [[u'Internet Explorer'], [564]], [[u'Mozilla'], [2]], [[u'Mozilla Compatible Agent'], [7]], [[u'Opera'], [2]], [[u'Safari'], [175]]]
But when I try to do this
data[0]
I get this
<googleanalytics.data.DataPoint object at 0x00D06DB0>
which is just a black box to me; I can't get inside it to split up the content for actual use.
I got one lucky guess: the first of the pair of attributes is called 'title'.
"data[0].title" gives me this
'ga:browser=Android Browser'
which I can use. I just need that second attribute name. Does anybody know it?
Thanks a lot!
There's a page in the documentation explaining each field.
http://code.google.com/apis/analytics/docs/gdata/gdataReferenceDataFeed.html#dataResponse
I figured it out: I was able to crack open the object using the inspect module, and that told me that the attributes were accessible using the same names I used in the query. Convenient language, this Python.
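For anyone facing a similar black-box object, here is a minimal sketch of that kind of inspection. FakeDataPoint is a hypothetical stand-in (the real googleanalytics DataPoint class may look different), and the attribute name pageviews is only an assumed example of a metric named after the query.

```python
import inspect


class FakeDataPoint:
    """Hypothetical stand-in; not the real googleanalytics class."""

    def __init__(self, title, pageviews):
        self.title = title
        self.pageviews = pageviews


def public_attributes(obj):
    """Return the non-callable attributes whose names lack a leading underscore."""
    return {
        name: value
        for name, value in inspect.getmembers(obj)
        if not name.startswith("_") and not inspect.isroutine(value)
    }


point = FakeDataPoint("ga:browser=Android Browser", 80)
attrs = public_attributes(point)
print(attrs)
```

Calling dir(point) or vars(point) in the interpreter gives a similar, quicker overview.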
I tried to replace spaces in a variable in Python, but it gives me this error:
AttributeError: 'HTTPHeaders' object has no attribute 'replace'
This is my code:
for req in driver.requests:
    print(req.headers)
    d = req.headers
    x = d.replace("\n", "")
If you check out the HTTPHeaders class, you'll see that it has a __repr__ function and that it's an HTTPMessage object, not a string.
Depending on what exactly you want to achieve (which is still not clear to me: for which header do you want to replace spaces?), you can go about this in two ways: use the methods on the HTTPMessage object (documented here), or use the string version of it by calling repr on the response. I recommend the first approach, as it is much cleaner.
I'll give an example in which I remove spaces for all canary values in all of the requests:
for req in driver.requests:
    canary = req.headers.get("canary")
    if canary is not None:  # get() returns None when the header is absent
        canary = canary.replace(" ", "")
P.S. Your question is nowhere near clear enough as it stands. Only after I asked multiple times and you linked your other question did it become clear that you are using seleniumwire, for example. Ideally, the code you provide can be run by anyone with the installed packages and reproduces the issue you have. But, all right, the comments made it clearer.
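To illustrate the first approach without a browser session, here is a small sketch using the standard library's http.client.HTTPMessage directly. The header names and values are made up for illustration, and strip_header_spaces is a hypothetical helper, not part of seleniumwire.

```python
from http.client import HTTPMessage


def strip_header_spaces(headers, name):
    """Return the named header with spaces removed, or None if it is absent."""
    value = headers.get(name)  # header lookup is case-insensitive
    return None if value is None else value.replace(" ", "")


# Hypothetical headers standing in for req.headers.
headers = HTTPMessage()
headers["canary"] = "a b c"
headers["Accept"] = "text/html"

cleaned = strip_header_spaces(headers, "canary")
missing = strip_header_spaces(headers, "x-not-sent")

# The same idea applied to every header at once.
all_cleaned = {name: value.replace(" ", "") for name, value in headers.items()}
```

The point is that replace is called on each header value (a str), never on the headers object itself.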
I have a document reference that I am retrieving from a query on my Firestore database. I want to use the DocumentReference as a query parameter for another query. However, when I do that, it says
TypeError: sequence item 1: expected str instance, DocumentReference found
This makes sense, because I am trying to pass a DocumentReference in my update statement:
db.collection("Teams").document(team).update("Dictionary here") # team is a DocumentReference
Is there a way to get the document name from a DocumentReference? Before you mark this as a duplicate: I tried looking at the docs here and the question here, but the docs were confusing and the question had no answer.
Any help is appreciated, Thank You in advance!
Yes, split the .refPath. The document "name" is always the last element after the split; something like lodash _.last() can work, or any other technique that picks out the last element of the array.
Note, btw, the refPath is the full path to the document. This is extremely useful (as in: I use it a lot) when you find documents via collectionGroup() - it allows you to parse to find parent document(s)/collection(s) a particular document came from.
Also note: there is a pseudo-field __name__ available (really an alias of documentID()). In spite of its name(s), it returns the FULL PATH (i.e. refPath) to the document, NOT the documentID by itself.
I think I figured it out: by doing team.path.split("/")[1] I could get the document name. This might not work for all Firestore databases (with subcollections, for example), so if anyone has a better solution, please go ahead. Thanks!
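A slightly more general sketch is to take the last path segment rather than index 1, which also covers documents inside subcollections. The paths below are made-up examples and document_id_from_path is a hypothetical helper. As far as I know, the Python client's DocumentReference also exposes the same value directly as team.id, which would avoid the string handling altogether.

```python
def document_id_from_path(path):
    """Return the last segment of a slash-separated document path."""
    return path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical paths, as team.path might return them.
top_level = document_id_from_path("Teams/red")
nested = document_id_from_path("Leagues/west/Teams/red")
```

Either value can then be passed to db.collection("Teams").document(...) as a plain string.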
I currently want to scrape some data from an amazon page and I'm kind of stuck.
For example, let's take this page.
https://www.amazon.com/NIKE-Hyperfre3sh-Athletic-Sneakers-Shoes/dp/B01KWIUHAM/ref=sr_1_1_sspa?ie=UTF8&qid=1546731934&sr=8-1-spons&keywords=nike+shoes&psc=1
I wanted to scrape every variant of shoe size and color. That data can be found opening the source code and searching for 'variationValues'.
There we can see a sort of dictionary containing all the sizes and colors and, below that, in 'asinToDimensionIndexMap', every product code with numbers indicating the variant from the variationValues 'dictionary'.
For example, in asinToDimensionIndexMap we can see
"B01KWIUH5M":[0,0]
which means that the product code B01KWIUH5M is associated with the size '8M US' (position 0 in the size_name section of variationValues) and the color 'Teal' (same idea as before).
I want to scrape both the variationValues and the asinToDimensionIndexMap, so I can associate the IndexMap numbers with the variationValues entries.
Another person in the site (thanks for the help btw) suggested doing it this way.
import json
import re

script = response.xpath('//script/text()').extract_first()
# capture everything between {}
data = re.findall(r'(\{.+?\})', script)
d = json.loads(data[0])
d['products'][0]
I can sort of understand the first part. We get everything that's a 'script' as a string and then get everything between {}. The issue is what happens after that. My knowledge of json is not that great and reading some stuff about it didn't help that much.
Is there a way to get, from that data, 2 dictionaries or lists with the variationValues and asinToDimensionIndexMap (maybe using some regular expressions in the middle to get some data out of a big string)? Or could you explain a little bit what happens with the json part?
Thanks for the help!
EDIT: Added photo of variationValues and asinToDimensionIndexMap
I think you are close, Manuel!
The following code will turn your scraped source into easy-to-select boxes:
import json
d = json.loads(data[0])
JSON is a universal text format for storing structured data. In other words, json.loads parses a JSON string into native objects (dicts and lists in Python), regardless of the platform that produced it.
https://www.w3schools.com/js/js_json_intro.asp
I'm assuming that where you may be finding things a challenge is when you get errors accessing a particular "box" inside your JSON object.
Your code format looks correct, but your access within "each box" may look different.
E.g. if your 'asinToDimensionIndexMap' object is nested within a smaller box in the larger 'products' object, then you might access it like this (after running the code above):
d['products'][0]['asinToDimensionIndexMap']
I've hacked and slashed a little bit so you can better understand the structure of your particular JSON file. Take a look at the link below. On the right-hand side, you will see which boxes are within one another, which is precisely what you need to know to access what you need.
JSON Object Viewer
For example, the following would yield "companyCompliancePolicies_feature_div":
import json
d = json.loads(data[0])
d['updateDivLists']['full'][0]['divToUpdate']
The person helping you before outlined a general case for you, but you'll need to go in and look at the structure this way to truly find what you're looking for.
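If you would rather explore the structure from Python than from a viewer, a small sketch like the following lists the access path of every nested key. The sample document is a tiny made-up fragment modelled on the example above, and key_paths is a hypothetical helper.

```python
import json


def key_paths(node, prefix="d"):
    """Return the subscript expression for every dict key nested in node."""
    paths = []
    if isinstance(node, dict):
        for key, child in node.items():
            path = f"{prefix}[{key!r}]"
            paths.append(path)
            paths.extend(key_paths(child, path))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            paths.extend(key_paths(child, f"{prefix}[{index}]"))
    return paths


# Hypothetical fragment of the parsed page data.
sample = json.loads(
    '{"updateDivLists": {"full": [{"divToUpdate": "companyCompliancePolicies_feature_div"}]}}'
)
paths = key_paths(sample)
print("\n".join(paths))
```

Each printed line is an expression you can paste back into the interpreter to reach that "box".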
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
asinVariationValues = re.findall(r'asinVariationValues\" : ({.*?}})', ' '.join(script))[0]
dimensionValuesData = re.findall(r'dimensionValuesData\" : (\[.*\])', ' '.join(script))[0]
asinToDimensionIndexMap = re.findall(r'asinToDimensionIndexMap\" : ({.*})', ' '.join(script))[0]
dimensionValuesDisplayData = re.findall(r'dimensionValuesDisplayData\" : ({.*})', ' '.join(script))[0]
Now you can easily parse each of them with json.loads and combine them as you wish.
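As a rough sketch of that last step, the snippet below pulls both objects out of a script string and labels each product code with its size and color. SAMPLE_SCRIPT is a made-up, shortened stand-in for the real page source (the second ASIN is invented), and it assumes that the index list follows the key order of variationValues. It uses json.JSONDecoder.raw_decode rather than a brace-matching regex, so nested braces are not a concern.

```python
import json
import re

# Hypothetical, shortened stand-in for the page's <script> text.
SAMPLE_SCRIPT = '''
var dataToReturn = {
  "variationValues" : {"size_name":["8M US","9M US"],"color_name":["Teal","Black"]},
  "asinToDimensionIndexMap" : {"B01KWIUH5M":[0,0],"B01EXAMPLE1":[1,1]}
};
'''


def extract_json_value(text, key):
    """Parse the JSON value that follows `"key" :` in text."""
    match = re.search(r'"%s"\s*:\s*' % re.escape(key), text)
    if match is None:
        raise KeyError(key)
    value, _end = json.JSONDecoder().raw_decode(text, match.end())
    return value


def label_variants(text):
    """Map each product code to its named variation values."""
    values = extract_json_value(text, "variationValues")
    index_map = extract_json_value(text, "asinToDimensionIndexMap")
    dimensions = list(values)  # assumed to match the order of the index lists
    return {
        asin: {dim: values[dim][i] for dim, i in zip(dimensions, indexes)}
        for asin, indexes in index_map.items()
    }


variants = label_variants(SAMPLE_SCRIPT)
```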
After executing one of the example SyntaxNet scripts (like parse.sh), I receive output in text CoNLL format. My goal is to take some features and pass them on to the next network. One possible choice is to parse the text output into a Python object with something like nltk.corpus.reader.ConllCorpusReader. But what interests me is:
Is it possible, with some code modification, to get from SyntaxNet not text but a Python object representing the parse results?
I've found that in parser_eval.py, on lines 133-138, SyntaxNet already fetches the text version of the results.
while True:
    tf_eval_epochs, tf_eval_metrics, tf_documents = sess.run([
        parser.evaluation['epochs'],
        parser.evaluation['eval_metrics'],
        parser.evaluation['documents'],
    ])
But I cannot find which object this text is generated from, or how.
There are many ways to do it and, as far as I know, all of them involve parsing the output of SyntaxNet and loading it into NLTK objects. I wrote a simple post on my blog with an example:
http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/
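For a dependency-free illustration of the parse-the-text route, here is a minimal sketch that turns CoNLL text into Python objects. It assumes the usual ten tab-separated CoNLL-X columns with blank lines between sentences. The sample sentence is made up, and Token and parse_conll are hypothetical names, not part of SyntaxNet or NLTK.

```python
from collections import namedtuple

# The ten CoNLL-X columns, in order.
Token = namedtuple(
    "Token", "id form lemma cpostag postag feats head deprel phead pdeprel"
)


def parse_conll(text):
    """Split CoNLL text into sentences, each a list of Token tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        token = Token(*line.split("\t"))
        current.append(token._replace(id=int(token.id), head=int(token.head)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical parser output for a two-word sentence.
SAMPLE = (
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tsleeps\t_\tVERB\tVBZ\t_\t0\tROOT\t_\t_\n"
)
sentences = parse_conll(SAMPLE)
```

From here, features such as the head index or dependency label of each token can be fed into the next network.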
I’m trying to write a response from a Solr server to a CSV file. I’m pretty new to python and have been given code to modify. Originally the code looked like this ...
for doc in response.results:
status = json.loads(doc['status'])
The script runs and prints the correct information, but it only ever prints one result (the last one). I think this is because the loop keeps overwriting the variable 'status' until it has worked through the response.
After some reading I decided to store the information in a list. That way I could print the information on separate lines. I created an empty list and changed the code as below -
for doc in response.results:
    list.append = json.loads(doc['status'])
I got this response back after trying to run the code -
`AttributeError: 'list' object attribute 'append' is read-only`.
Where am I going wrong? Is a list not the best approach?
>>> list.append
<method 'append' of 'list' objects>
You're trying to modify the append method of the built-in list class!
Just do
docstats = []
for doc in response.results:
    docstats.append(json.loads(doc['status']))
or equivalently:
docstats = [json.loads(doc['status']) for doc in response.results]
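Since the end goal is a CSV file, here is a minimal sketch of the whole path from the response to CSV rows. The results list is a made-up stand-in for response.results, and the field names id and state are assumptions about what each status contains.

```python
import csv
import io
import json

# Hypothetical stand-in for response.results: each doc carries a JSON string.
results = [
    {"status": '{"id": 1, "state": "ok"}'},
    {"status": '{"id": 2, "state": "failed"}'},
]

# One parsed status per document, instead of overwriting a single variable.
docstats = [json.loads(doc["status"]) for doc in results]

# Swap the buffer for open("out.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "state"])
writer.writeheader()
writer.writerows(docstats)
csv_text = buffer.getvalue()
```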
I'm not sure what you are trying to do.
I guess you haven't created a list variable. list is Python's built-in class for lists, so if there's no variable to mask it, that is what you'll access. And you tried to modify one of its properties, which is not allowed (it's not like Ruby, where you can monkey-patch anything).
Is this what you want? :
l = []
for doc in response.results:
    l.append(json.loads(doc['status']))
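Try calling append instead of assigning to it (this assumes your list variable is named list, as in your snippet):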
list.append(json.loads(doc['status']))