I am trying to use the Google Docs API to auto-generate a document based on values from a JSON file.
I have created a template Google Doc which is structured like this:
I went to the shop and bought the following item: {{foodtype}}.
It cost {{price}}.
The JSON looks like this:
{
    "food": "apples",
    "price": "10"
}
So using the Google Docs replaceAllText method I can use a Python script to update the doc like this:
def update_doc(food_json, docs):
    requests = [
        {
            'replaceAllText': {
                'containsText': {
                    'text': '{{foodtype}}',
                    'matchCase': True
                },
                'replaceText': food_json['food'],
            }
        },
    ]
    doc_id = 'xxxxxxxxxxxxx'
    result = docs.documents().batchUpdate(
        documentId=doc_id, body={'requests': requests}).execute()
This works perfectly. The {{foodtype}} tag is replaced in the document with the string from the JSON.
My problem is that I cannot do this when I want to add multiple updates to a document. Sometimes there might be 5, 10 or even 50 items to add so manually adding tags to a template is not possible.
For example if my JSON looked like this:
{
{basket:
{'food':'apples','price':'10'},
{'food':'bread', 'price':'15'}
{'food':'bananas', 'price': '5'}
}
etc etc etc
}
I want to be able to write a for loop to iterate through the JSON and write a report for every item in the JSON e.g.:
I went to the shop and bought the following item: apple.
It cost 10 cents.
I went to the shop and bought the following item: bread.
It cost 15 cents.
I went to the shop and bought the following item: bananas.
It cost 5 cents.
The code would look something like this:
requests = []
for f in food_json['basket']:
    requests.append({
        'replaceAllText': {
            'containsText': {
                'text': '{{foodtype}}',
                'matchCase': True
            },
            'replaceText': f['food'],
        }
    })
    requests.append({
        'replaceAllText': {
            'containsText': {
                'text': '{{price}}',
                'matchCase': True
            },
            'replaceText': f['price'],
        }
    })
# *Push the update to the Google Doc*
However this fails because on the first iteration of the for loop, the text matching the {{foodtype}} tag is replaced and then there is nothing to replace on the next iteration.
I'm stuck as to how to proceed.
I have previously used MS Word and the awesome python-docx-template library to publish documents in this way. It was possible to add Jinja tags in the Word template and then use a for loop directly in the document to achieve what I wanted e.g. just put something like this in a template:
{% for b in basket %}
I went to the shop and bought:
{{ b.food }}
The price was:
{{ b.price }}
{% endfor %}
This worked perfectly, but I cannot see how to reproduce the same result with the Google Docs API and Python. Any suggestions?
I believe your situation and goal are as follows.
In your situation, there are several occurrences of the same text {{foodtype}} in a document, and the number of occurrences is the same as the number of food values.
You want to replace the 1st {{foodtype}} with the 1st value of food, the 2nd with the 2nd value, and so on.
You want to achieve this using the googleapis client library for Python.
If my understanding is correct, then unfortunately, at the current stage, when replaceAllText of the Docs API is used, every occurrence of {{foodtype}} is replaced with the 1st value of food in one batch request. I think this is the reason for your issue.
Flow:
In order to achieve your goal using the Docs API with Python, I would like to propose the following workaround.
Search for each {{foodtype}} and {{price}} in the text of the Google Document. In this case, I used the documents.get method.
Append a running count to each occurrence of {{foodtype}} and {{price}}, producing {{foodtype1}}, {{price1}}, {{foodtype2}}, {{price2}} and so on.
Replace each numbered placeholder with the corresponding value using a replaceAllText request.
In this flow, only one batchUpdate call is made. The sample script is as follows.
Sample script:
from googleapiclient.discovery import build

# Please set the values you want to replace.
sample = {'basket': [
    {'food': 'apples', 'price': '10'},
    {'food': 'bread', 'price': '15'},
    {'food': 'bananas', 'price': '5'}
]}
documentId = '###'  # Please set your document ID.

# `creds` is assumed to be an authorized credentials object.
docs = build('docs', 'v1', credentials=creds)
obj = docs.documents().get(documentId=documentId, fields='body').execute()
content = obj.get('body').get('content')
foodCount = 0
priceCount = 0
requests = []
for c in content:
    if 'paragraph' in c:
        p = c.get('paragraph')
        for e in p.get('elements'):
            textRun = e.get('textRun')
            if textRun:
                text = textRun.get('content')
                if '{{foodtype}}' in text:
                    foodCount += 1
                    # Replace the numbered placeholder with the n-th food value.
                    requests.append({
                        "replaceAllText": {
                            "replaceText": sample['basket'][foodCount - 1]['food'],
                            "containsText": {
                                "text": '{{foodtype' + str(foodCount) + '}}',
                                "matchCase": True
                            }
                        }
                    })
                    # Insert the count just before the closing '}}' so that
                    # '{{foodtype}}' becomes '{{foodtype1}}', '{{foodtype2}}', ...
                    requests.append({
                        "insertText": {
                            "location": {
                                "index": e['startIndex'] + text.find('{{foodtype}}') + len('{{foodtype')
                            },
                            "text": str(foodCount)
                        }
                    })
                if '{{price}}' in text:
                    priceCount += 1
                    requests.append({
                        "replaceAllText": {
                            "replaceText": sample['basket'][priceCount - 1]['price'],
                            "containsText": {
                                "text": '{{price' + str(priceCount) + '}}',
                                "matchCase": True
                            }
                        }
                    })
                    requests.append({
                        "insertText": {
                            "location": {
                                "index": e['startIndex'] + text.find('{{price}}') + len('{{price')
                            },
                            "text": str(priceCount)
                        }
                    })
if requests:
    # Apply the edits from the end of the document backward so the
    # insertText requests do not invalidate the earlier indices.
    requests.reverse()
    docs.documents().batchUpdate(documentId=documentId, body={'requests': requests}).execute()
Note:
In your question, the following values are shown as a sample value.
{
{basket:
{'food':'apples','price':'10'},
{'food':'bread', 'price':'15'}
{'food':'bananas', 'price': '5'}
}
etc etc etc
}
But I thought that this might not be valid JSON, so in my sample script I used the following sample value. Please modify each value for your actual situation.
sample = {'basket': [
{'food': 'apples', 'price': '10'},
{'food': 'bread', 'price': '15'},
{'food': 'bananas', 'price': '5'}
]}
In this sample script, as an example of the workaround, only the paragraphs of the Google Document are searched. If you want to search text in tables, for example, please modify the script accordingly.
This sample script supposes that you have already been able to get and put values for a Google Document using the Docs API. Please be careful about this.
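A note on the requests.reverse() call at the end of the script: each insertText shifts every index that comes after it, so the numbered placeholders are inserted from the end of the document backward to keep the earlier startIndex values valid. The same indexing idea can be illustrated on a plain Python string (a toy sketch, not the Docs API):

```python
text = "aa {{foodtype}} bb {{foodtype}} cc"

# Find every insertion point (just before the closing '}}') up front.
points = []
start = 0
while True:
    i = text.find('{{foodtype}}', start)
    if i == -1:
        break
    points.append(i + len('{{foodtype'))
    start = i + 1

# Insert the counts back-to-front so the earlier offsets stay valid,
# mirroring what requests.reverse() achieves for the batchUpdate call.
for n, pos in reversed(list(enumerate(points, start=1))):
    text = text[:pos] + str(n) + text[pos:]

print(text)  # aa {{foodtype1}} bb {{foodtype2}} cc
```

Inserting front-to-back instead would shift the second placeholder's offset and corrupt it.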
References:
Method: documents.get
Method: documents.batchUpdate
ReplaceAllTextRequest
I've been using the standard Python ElasticSearch client to make single requests in the following format:
es.search(index='my_index', q=query, size=5, search_type='dfs_query_then_fetch')
I now want to make queries in batch for multiple strings q.
I've seen this question explaining how to use the msearch() functionality to do queries in batch. However, msearch requires the full json-formatted request body for each request. I'm not sure which parameters in the query API correspond to just the q parameter from search(), or size, or search_type, which seem to be API shortcuts specific to the single-example search().
How can I use msearch but specify q, size, and search_type?
I read through the API and figured out how to batch simple search queries:
from typing import List
from elasticsearch import Elasticsearch
import json

def msearch(
    es: Elasticsearch,
    max_hits: int,
    query_strings: List[str],
    index: str
):
    search_arr = []
    for q in query_strings:
        # Header line naming the target index, then the query body.
        search_arr.append({'index': index})
        search_arr.append({
            "query": {
                "query_string": {
                    "query": q
                }
            },
            'size': max_hits
        })
    # msearch expects newline-delimited JSON.
    request = '\n'.join([json.dumps(x) for x in search_arr])
    resp = es.msearch(body=request)
    return resp
msearch(es, query_strings=['query 1', 'query 2'], max_hits=1, index='my_index')
EDIT: For my use case I made one more improvement: I didn't want to return the entire document in the result; I just needed the document ID and its score.
So the final search request object looked like this, including the '_source': False part:
search_arr.append(
    {
        # Queries `q` using Lucene syntax.
        "query": {
            "query_string": {
                "query": q
            },
        },
        # Don't return the full profile string, etc. with the result.
        # We just want the ID and the score.
        '_source': False,
        # Only return `max_hits` documents.
        'size': max_hits
    }
)
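Once the batched response comes back, each query's hits live under resp['responses'][i]['hits']['hits']. A small sketch of pulling out just the IDs and scores (the response here is mocked in the standard Elasticsearch shape, so it runs without a cluster):

```python
def extract_ids_and_scores(msearch_response):
    """Return a list per query of (doc_id, score) pairs from an msearch response."""
    results = []
    for single in msearch_response['responses']:
        hits = single.get('hits', {}).get('hits', [])
        results.append([(h['_id'], h['_score']) for h in hits])
    return results

# Mocked response shaped like Elasticsearch's msearch output:
mock = {'responses': [
    {'hits': {'hits': [{'_id': 'doc1', '_score': 1.2}]}},
    {'hits': {'hits': [{'_id': 'doc7', '_score': 0.8},
                       {'_id': 'doc9', '_score': 0.5}]}},
]}
print(extract_ids_and_scores(mock))
```

The i-th inner list lines up with the i-th query string passed to msearch().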
I would first like to say that I am very new to coding and know only the basics.
I was tasked with scraping data from Facebook and running a sentiment analysis on it. I got the data using scraping-bot.io and I have it in a JSON file with the following format:
{
"owner_url": "https://www.facebook.com/########",
"url": "https://www.facebook.com/post",
"name": "Page name",
"date": "date",
"post_text": "Post title",
"media_url": "media url attached",
"likes": ###,
"shares": ###,
"num_comments": ###,
"scrape_time": "date",
"comments": [
{
"author_name": "Name",
"text": "Comment text",
"created": "Date"
},
The posts are in Spanish, so I looked for a library to run the analysis with. I settled on https://pypi.org/project/sentiment-analysis-spanish/ (not sure if it's the best one, so I'm open to suggestions on that front as well).
Ideally I would like to be able to open the JSON file, run the sentiment analysis on "text", and then save that data into the same or a new file to visualize in another program.
This is what I have so far:
import json
from sentiment_analysis_spanish import sentiment_analysis

sentiment = sentiment_analysis.SentimentAnalysisSpanish()

f = open('C:/Users/vnare/Documents/WebScraping.json', encoding='utf-8-sig')
data = json.load(f)

for i in range(len('text')):
    print(sentiment.sentiment(i))
Currently it gives me the following error AttributeError: 'int' object has no attribute 'lower'
But I'm sure there's far more that I'm doing wrong there.
I appreciate any help provided
AttributeError: 'int' object has no attribute 'lower' means that an integer cannot be lower-cased: somewhere in your code, you are trying to call the lower() string method on an integer.
If you take a look at the documentation for the sentiment analysis library you provided, you will see that print(sentiment.sentiment("something")) will evaluate the sentiment of "something" and give you a score between 0 and 1.
My guess is that when you call sentiment.sentiment("some text") it will use lower() to convert whatever text is passed through to all lowercase. This would be fine if you were passing a string, but you are currently passing an integer!
By using for i in range(), you are indicating that you would like to take a range of numbers from 0 to the end number. This means that your i will always be an integer!
You need to instead loop through your JSON data to access the key/value pairs. "text" cannot be accessed directly as you've done above, but from within the JSON data, it can be!
https://www.geeksforgeeks.org/json-with-python/
The important thing to look at is the format of the JSON data that you are trying to access. First, you need to access a dictionary key named "comments". However, what is inside of 'comments'?
[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]
It's actually another dictionary of key-value pairs inside of a list. Given that list indices start at 0 and there is only one list element (the dictionary) in your example, we need to next use the index 0 to access the dictionary inside. Now, we will look for the key 'text' as you were initially.
When you are learning python, I highly recommend using a lot of print statements when trying to debug! This helps you see what your program sees so you know where the errors are.
import json
from sentiment_analysis_spanish import sentiment_analysis
sentiment = sentiment_analysis.SentimentAnalysisSpanish()
f = open('WebScraping.json', encoding='utf-8-sig')
data = json.load(f)
print(data)
comments = data['comments']
print(comments)
text = comments[0]['text']
print(text)
sentimentScore = sentiment.sentiment(text)
print(sentimentScore)
When you run this, the output will show you what is inside 'data', what is inside 'comments', what is inside 'text', and what the sentiment score is.
{'owner_url': 'https://www.facebook.com/########', 'url': 'https://www.facebook.com/post', 'name': 'Page name', 'date': 'date', 'post_text': 'Post title', 'media_url': 'media url attached', 'likes': 234, 'shares': 500, 'num_comments': 100, 'scrape_time': 'date', 'comments': [{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]}
[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]
Comment text
0.49789225920557484
This is what helped me see that inside of 'comments' was a dictionary within a list.
Now that you understand how it works, here is a more efficient way to run the code without all the extra prints! You can see I am now implementing the for loop you used earlier, as there may be multiple comments in a real-life scenario.
import json
from sentiment_analysis_spanish import sentiment_analysis

sentiment = sentiment_analysis.SentimentAnalysisSpanish()

f = open('WebScraping.json', encoding='utf-8-sig')
data = json.load(f)

comments = data['comments']
for i in range(len(comments)):
    comment = comments[i]['text']
    sentimentScore = sentiment.sentiment(comment)
    print(f"The sentiment score of this comment is {sentimentScore}.")
    print(f"The comment was: '{comment}'.")
This results in the following output.
The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 1'.
The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 2'.
This is the file that I used for reference.
{
"owner_url": "https://www.facebook.com/########",
"url": "https://www.facebook.com/post",
"name": "Page name",
"date": "date",
"post_text": "Post title",
"media_url": "media url attached",
"likes": 234,
"shares": 500,
"num_comments": 100,
"scrape_time": "date",
"comments": [
{
"author_name": "Name",
"text": "Comment 1",
"created": "Date"
},
{
"author_name": "Name",
"text": "Comment 2",
"created": "Date"
}
]
}
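To cover the last part of the question, saving the scores back out for another program, a minimal sketch (the sentiment_score field name is my own choice, and a fixed placeholder score stands in for sentiment.sentiment() so the sketch runs without the library):

```python
import json

# In the real script this would come from json.load(f) as above.
data = {'comments': [
    {'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'},
]}

for comment in data['comments']:
    # sentiment.sentiment(comment['text']) would be called here; a fixed
    # placeholder score keeps the sketch runnable without the library.
    comment['sentiment_score'] = 0.5

# Write the enriched data to a new file for the visualization tool.
with open('WebScrapingScored.json', 'w', encoding='utf-8') as out:
    json.dump(data, out, ensure_ascii=False, indent=2)
```

Writing to a new file rather than overwriting the scraped data keeps the original intact if the analysis needs to be rerun.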
I have some Python code that retrieves the rows in a Notion database using the notion-client library. The code does manage to retrieve all the rows, but the order is wrong. I looked at the Sort object from the API reference, but was unable to figure out how to use it to return the rows in the exact order in which they're displayed on notion.so. Here's the snippet in question:
from notion_client import Client

notion = Client(auth=NOTION_API_TOKEN)
result = notion.databases.query(database_id='...')
for row in result['results']:
    title = row['properties']['NAME_OF_PROPERTY']['title']
    if len(title) == 0:
        print('')
    else:
        print(title[0]['plain_text'])
What am I missing?
The Notion API does not support views in the current version, so it is not necessarily going to match the order you have it in unless you have applied a sort or filter that you can also apply via the API.
This works, as shown in their documentation:
const response = await notion.databases.query({
  database_id: databaseId,
  filter: {
    or: [
      {
        property: 'In stock',
        checkbox: {
          equals: true,
        },
      },
      {
        property: 'Cost of next trip',
        number: {
          greater_than_or_equal_to: 2,
        },
      },
    ],
  },
  sorts: [
    {
      property: 'Last ordered',
      direction: 'ascending',
    },
  ],
});
Use the sorts argument to notion.databases.query(). This argument is a list of sort specifications, which are dictionaries.
result = notion.databases.query(
    database_id='df4dfb3f-f36f-462d-ad6e-1ef29f1867eb',
    sorts=[{"property": "NAME_OF_PROPERTY", "direction": "ascending"}]
)
You can put multiple sort specifications in the list; the later ones are used to break ties between rows that are equal in the preceding properties.
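For example, with two sort specifications the second breaks ties left by the first (the property names here are placeholders for your own database's columns):

```python
# Hypothetical property names; replace with your database's columns.
sorts = [
    {"property": "Category", "direction": "ascending"},
    {"property": "NAME_OF_PROPERTY", "direction": "descending"},
]

# The list is passed as-is to the query call:
# result = notion.databases.query(database_id='...', sorts=sorts)
```

Rows are first grouped by Category; within each Category, they are ordered by the second property in descending order.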
I would appreciate your help with the best way to transform a result into JSON as below.
We have a result like the one below, containing information on employees and companies. In the result we are somehow getting a prefix like T., but not for all of the properties.
[
  {
    "T.id": "Employee_11",
    "T.category": "Employee",
    "node_id": ["11"]
  },
  {
    "T.id": "Company_12",
    "T.category": "Company",
    "node_id": ["12"],
    "employeecount": 800
  },
  {
    "T.id": "id~Employee_11_to_Company_12",
    "T.category": "WorksIn"
  },
  {
    "T.id": "Employee_13",
    "T.category": "Employee",
    "node_id": ["13"]
  },
  {
    "T.id": "Parent_Company_14",
    "T.category": "ParentCompany",
    "node_id": ["14"],
    "employeecount": 900,
    "childcompany": "Company_12"
  },
  {
    "T.id": "id~Employee_13_to_Parent_Company_14",
    "T.category": "Contractorin"
  }
]
We need to transform this result into a different structure, grouping by category: if the category is Employee, Company, or ParentCompany, the record should go under the node_properties object; otherwise it belongs in edge_properties. Apart from the common properties (property_id, property_category, and node), different properties need to be added when the category is Company or ParentCompany. There is also some further logic to derive the from and to properties of each edge object based on the 'to' part of its ID. The expected response is:
"node_properties":[
{
"property_id":"Employee_11",
"property_category":"Employee",
"node":{node_id: "11"}
},
{
"property_id":"Company_12",
"property_category":"Company",
"node":{node_id: "12"},
"employeecount":800
},
{
"property_id":"Employee_13",
"property_category":"Employee",
"node":{node_id: "13"}
},
{
"property_id":"Company_14",
"property_category":"ParentCompany",
"node":{node_id: "14"},
"employeecount":900,
"childcompany":"Company_12"
}
],
"edge_properties":[
{
"from":"Employee_11",
"to":"Company_12",
"property_id":"Employee_11_to_Company_12",
},
{
"from":"Employee_13",
"to":"Parent_Company_14",
"property_id":"Employee_13_to_Parent_Company_14",
}
]
In Java we would use an enhanced for loop, a switch, etc. How can we write the code in Python to get the structure above from the initial result structure? (I am new to Python.) Thank you in advance.
Regards
Here is a method that I quickly made; you can adjust it to your requirements. You can use a regex or your own function to get the IDs for the edge_properties and then assign them to an object the way I did for the nodes. I am not sure of your full requirements, but if the list you gave covers all the categories, then this will be sufficient.
def transform(input_list):
    node_properties = []
    edge_properties = []
    for input_obj in input_list:
        new_obj = {}
        if input_obj['T.category'] in ('Employee', 'Company', 'ParentCompany'):
            new_obj['property_id'] = input_obj['T.id']
            new_obj['property_category'] = input_obj['T.category']
            new_obj['node'] = {'node_id': input_obj['node_id'][0]}
            if 'employeecount' in input_obj:
                new_obj['employeecount'] = input_obj['employeecount']
            if 'childcompany' in input_obj:
                new_obj['childcompany'] = input_obj['childcompany']
            node_properties.append(new_obj)
        else:  # You can use elif instead if there are other outliers in your data.
            # Use a regex or a string split here to extract the from/to IDs
            # and add the values like above.
            edge_properties.append(new_obj)
    return [node_properties, edge_properties]
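The from/to extraction that the comment in the else branch alludes to can be sketched like this, assuming edge IDs always follow the id~<from>_to_<to> pattern from the sample:

```python
def parse_edge_id(edge_id):
    # 'id~Employee_13_to_Parent_Company_14' -> drop the 'id~' prefix,
    # then split once on '_to_' to recover the two endpoint IDs.
    property_id = edge_id.split('~', 1)[-1]
    source, target = property_id.split('_to_', 1)
    return {'from': source, 'to': target, 'property_id': property_id}

print(parse_edge_id('id~Employee_13_to_Parent_Company_14'))
# {'from': 'Employee_13', 'to': 'Parent_Company_14', 'property_id': 'Employee_13_to_Parent_Company_14'}
```

Splitting only once on '_to_' keeps IDs that themselves contain underscores, such as Parent_Company_14, intact.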
I am having trouble updating document in MongoDB that involves adding to list and updating some fields, using Pymongo.
To summarize, I would like to:
Add a value to a list.
Update some fields.
Using a single update statement.
I have tried 2 methods, but neither works:
key = {'username': 'user1'}

user_detail = {
    'name': {'first': 'Marie', 'last': 'Bender'},
    'items': {'$addtoset': {'cars': 'BMW'}}
}

user_detail2 = {
    'name': {'first': 'Marie', 'last': 'Bender'},
    '$addtoset': {'items.cars': 'BMW'}
}

mongo_collection.update(key, user_detail, upsert=True)
mongo_collection.update(key, user_detail2, upsert=True)
error message: dollar ($) prefixed field '$addToSet' in '$addToSet' is not valid for storage.
My intended outcome:
Before:
{
    'username': 'user1',
    'item': {'cars': ['Merc', 'Ferrari'], 'house': 1}
}
Intended After:
{
    'username': 'user1',
    'name': {'first': 'Marie', 'last': 'Bender'},
    'item': {'cars': ['Merc', 'Ferrari', 'BMW'], 'house': 1}
}
Your second attempt is closer, but you need to use the $set operator to set the value of name (and note that the operator name is case-sensitive: $addToSet, not $addtoset):
user_detail2 = {
    '$set': {'name': {'first': 'Marie', 'last': 'Bender'}},
    '$addToSet': {'items.cars': 'BMW'}
}
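Putting both operations together, a sketch of the full call (field names are taken from the question; note that update() is deprecated in newer PyMongo releases, so update_one() is shown instead):

```python
key = {'username': 'user1'}
update_spec = {
    # $set overwrites (or creates) the `name` field.
    '$set': {'name': {'first': 'Marie', 'last': 'Bender'}},
    # $addToSet appends 'BMW' only if it is not already in the array.
    '$addToSet': {'items.cars': 'BMW'},
}

# With upsert=True the document is created if no document matches `key`:
# mongo_collection.update_one(key, update_spec, upsert=True)
```

Both modifications happen atomically in the single update statement, which is what the question asked for.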