Google Docs API programmatically adding a table of content

I have a Python script that does some analysis and outputs the results as text (paragraphs) to a Google Doc. I know how to insert text and update paragraph and text styles through batchUpdate:
doc_service.documents().batchUpdate(documentId=<ID>,body={'requests': <my_request>}).execute()
where, for instance, "my_request" takes the form of something like:
request = [
    {
        "insertText": {
            "location": {
                "index": <index_position>,
                "segmentId": <id>
            },
            "text": <text>
        }
    },
    {
        "updateParagraphStyle": {
            "paragraphStyle": {
                "namedStyleType": <paragraph_type>
            },
            "range": {
                "segmentId": <id>,
                "startIndex": <index_position>,
                "endIndex": <index_position>
            },
            "fields": "namedStyleType"
        }
    },
]
Once the script is done updating the document, it would be fantastic if a table of contents could be added at the top.
I am very new to the Google Docs API and not entirely sure how to do that. I know I should use "TableOfContents" as a StructuralElement. I also know that it currently does not update automatically after each modification to the document, which is why I would like to create it AFTER the document has finished updating and place it at the top.
How can I do this with Python? It is unclear to me where to put "TableOfContents" in my request.
Thank you so very much!

After your comment I understood better what you want to do, but I came across these two Issue Tracker posts:
Add the ability to generate and update the TOC of a doc.
Getting a link to heading paragraph.
These are well-known feature requests that unfortunately haven't been implemented yet. You can hit the ☆ next to the issue number at the top left of each issue page; it lets Google know that more people are encountering this, so it is more likely to be looked at sooner.
Therefore, it is currently not possible to insert or update a table of contents programmatically.
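The heading paragraphs themselves can still be written with the insertText and updateParagraphStyle requests shown in the question. Below is a minimal sketch of a helper that builds that request pair for one paragraph; the function name, its parameters and the HEADING_1 default are assumptions made for illustration, and it does not create a table of contents.

```python
def build_heading_requests(text, index, named_style="HEADING_1", segment_id=None):
    """Build an insertText + updateParagraphStyle request pair for one paragraph.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # A paragraph ends at a newline, so make sure the inserted text has one.
    if not text.endswith("\n"):
        text += "\n"

    # Docs API indexes count UTF-16 code units, not Python code points.
    length = len(text.encode("utf-16-le")) // 2

    location = {"index": index}
    text_range = {"startIndex": index, "endIndex": index + length}
    if segment_id is not None:
        # segmentId is only needed for headers, footers and footnotes.
        location["segmentId"] = segment_id
        text_range["segmentId"] = segment_id

    return [
        {"insertText": {"location": location, "text": text}},
        {
            "updateParagraphStyle": {
                "paragraphStyle": {"namedStyleType": named_style},
                "range": text_range,
                "fields": "namedStyleType",
            }
        },
    ]


requests = build_heading_requests("Results", index=1)
print(requests)
```

The returned list can be passed as the 'requests' value of the batchUpdate body shown in the question.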

Related

Set context from custom XBRL file

I'm able to read a custom XBRL file. The problem is that the parsed object has the amounts for the initial period (last December) and not for the latest accounting period.
from xbrl import XBRLParser, GAAP, GAAPSerializer
# xbrl comes from python-xbrl package
xbrl_parser = XBRLParser()
with open('filename.xbrl') as file:
    xbrl = xbrl_parser.parse(file)
custom_obj = xbrl_parser.parseCustom(xbrl)
print(custom_obj.cashandcashequivalents)
This prints the cash for 2021/12, not for 2022/06 as expected.
Current output: 100545101000
Expected: 81518021000
I think those numbers are the ones on lines 9970 and 9972 of the XBRL file.
These are the lines:
9970: <ifrs-full:CashAndCashEquivalents decimals="-3" contextRef="CierreTrimestreActual" unitRef="CLP">81518021000</ifrs-full:CashAndCashEquivalents>
9972: <ifrs-full:CashAndCashEquivalents decimals="-3" contextRef="SaldoActualInicio" unitRef="CLP">100545101000</ifrs-full:CashAndCashEquivalents>
How can I set the context/contextRef so the custom_obj has the numbers of the latest periods?
XBRL file: https://www.cmfchile.cl/institucional/inc/inf_financiera/ifrs/safec_ifrs_verarchivo.php?auth=&send=&rut=70016160&mm=06&aa=2022&archivo=70016160_202206_C.zip&desc_archivo=Estados%20financieros%20(XBRL)&tipo_archivo=XBRL

I've never used python-xbrl, but from a quick look at the source code it looks very basic and makes lots of unwarranted assumptions about the structure of the document. It doesn't appear to have any support for XBRL Dimensions, which the report you're using makes use of.
The module isn't built on a proper model of the XBRL data which would give you easy access to each fact's properties such as the period, and allow you to easily filter down to just the facts that you want.
I don't think the module will allow you to do what you want. Looking at this code it just iterates over all the facts, and sticks them onto properties on an object, so whichever fact it hits last in the document will be the one that you get, and given that order isn't important in XBRL files, it's just going to be pot luck which one you get.
I'd strongly recommend switching to a better XBRL library. Arelle is probably the most widely used, although you could also use my own pxp.
As an example, either tool can be used to convert the XBRL to JSON format, and will give you facts like this:
"f126928": {
    "value": "81518021000",
    "decimals": -3,
    "dimensions": {
        "concept": "ifrs-full:CashAndCashEquivalents",
        "entity": "scheme:70016160-9",
        "period": "2022-07-01T00:00:00",
        "unit": "iso4217:CLP"
    }
},
"f126930": {
    "value": "100545101000",
    "decimals": -3,
    "dimensions": {
        "concept": "ifrs-full:CashAndCashEquivalents",
        "entity": "scheme:70016160-9",
        "period": "2022-01-01T00:00:00",
        "unit": "iso4217:CLP"
    }
},
With this, you can then sort the facts by period, and then select the most recent one. Of course, you can do the same directly via the Python interfaces in these tools, rather than going via JSON.
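As a rough sketch of that last step, the snippet below picks the most recent fact for one concept using only the standard library. It assumes the two facts shown above have been loaded into a dict keyed by fact ID, and that each period is a single instant timestamp (duration periods would need different parsing); the function name is hypothetical.

```python
import json
from datetime import datetime

# The two facts from the JSON excerpt above, wrapped in an object.
facts_json = """
{
  "f126928": {
    "value": "81518021000",
    "decimals": -3,
    "dimensions": {
      "concept": "ifrs-full:CashAndCashEquivalents",
      "entity": "scheme:70016160-9",
      "period": "2022-07-01T00:00:00",
      "unit": "iso4217:CLP"
    }
  },
  "f126930": {
    "value": "100545101000",
    "decimals": -3,
    "dimensions": {
      "concept": "ifrs-full:CashAndCashEquivalents",
      "entity": "scheme:70016160-9",
      "period": "2022-01-01T00:00:00",
      "unit": "iso4217:CLP"
    }
  }
}
"""


def latest_fact(facts, concept):
    """Return the fact for `concept` with the most recent instant period."""
    matching = [
        fact for fact in facts.values()
        if fact["dimensions"]["concept"] == concept
    ]
    if not matching:
        return None
    # Assumes every period is one ISO 8601 timestamp (an instant).
    return max(
        matching,
        key=lambda fact: datetime.fromisoformat(fact["dimensions"]["period"]),
    )


facts = json.loads(facts_json)
cash = latest_fact(facts, "ifrs-full:CashAndCashEquivalents")
print(cash["value"])  # prints 81518021000
```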

Read from Google spreadsheet and get html formatting

I am learning the Google Sheets API for Python. Following the Google quickstart guide here: https://developers.google.com/sheets/api/quickstart/python I was able to get the code working.
The issue I am running into is that it prints the content of the spreadsheet as plain text to my command prompt. I tested against a spreadsheet with different formatting, such as bullets, numbered lists, bold, underline, etc., and it basically just prints out what it can.
Is it possible to read from the sheet and get an HTML version of the content? For bold, for example, it should be
<b>the word</b>
I am also trying to gather, in HTML, the information about what is a bullet, a numbered list, a tab, etc.
I plan to read a spreadsheet and then display the exact same content on an HTML page, but as it stands the formatting would all be lost.

Explanation:
To get the text format of a cell, use the spreadsheets.get method.
The format is in the response's CellData (at sheets.data.rowData.values), in both the userEnteredFormat and effectiveFormat fields.
However, CellData, and consequently CellFormat, is a structured object rather than HTML:
{
    "numberFormat": {
        object (NumberFormat)
    },
    "backgroundColor": {
        object (Color)
    },
    "backgroundColorStyle": {
        object (ColorStyle)
    },
    "borders": {
        object (Borders)
    },
    "padding": {
        object (Padding)
    },
    "horizontalAlignment": enum (HorizontalAlign),
    "verticalAlignment": enum (VerticalAlign),
    "wrapStrategy": enum (WrapStrategy),
    "textDirection": enum (TextDirection),
    "textFormat": {
        object (TextFormat)
    },
    "hyperlinkDisplayType": enum (HyperlinkDisplayType),
    "textRotation": {
        object (TextRotation)
    }
}
So you need to manually build your HTML from code.
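A minimal sketch of that manual step is shown below. It assumes a single CellData dict as returned by spreadsheets.get when grid data is included, and it only maps the boolean flags of TextFormat (bold, italic, underline, strikethrough) onto HTML tags; the function name is hypothetical, and formatting that applies to only part of a cell's text is not handled.

```python
from html import escape

# TextFormat flags and the HTML tag each one is mapped to in this sketch.
TAG_FOR_FLAG = (
    ("bold", "b"),
    ("italic", "i"),
    ("underline", "u"),
    ("strikethrough", "s"),
)


def cell_to_html(cell):
    """Wrap a cell's displayed text in tags based on its effective text format."""
    text = escape(cell.get("formattedValue", ""))
    text_format = cell.get("effectiveFormat", {}).get("textFormat", {})
    for flag, tag in TAG_FOR_FLAG:
        if text_format.get(flag):
            text = f"<{tag}>{text}</{tag}>"
    return text


# Hand-written example cell, not real API output.
example_cell = {
    "formattedValue": "the word",
    "effectiveFormat": {"textFormat": {"bold": True}},
}
print(cell_to_html(example_cell))  # prints <b>the word</b>
```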

Listing the Document ID from TinyDB

I am trying to list the contents of db.json from this GitHub repository (https://github.com/syntaxsmurf/todo/blob/main/main.py), at line 40 of main.py.
Specifically these lines:
for item in db:
    table.add_row(item["task"], item["completed_by"], item["status"])  # need to find the right command for pulling data out of TinyDB into these example strings
As you can see, I can pull out and list the items whose names I defined myself, for example with item["task"].
Here is an example entry from db.json if you don't want to look at GitHub:
{
    "_default": {
        "1": {
            "completed_by": "Today",
            "status": "Pending",
            "task": "Shopping"
        }
    }
}
What I am missing is how to pull out the default generated ID "1" and list it. I want to use it so that the user can remove the entry later.
Thank you I hope the question makes sense!

From Reddit user azzal07: item.doc_id is the correct way to do this.
for item in db:
    table.add_row(str(item.doc_id), item["task"], item["completed_by"], item["status"])
The str() call is there for the Rich table: add_row does not seem to work when it is given an int.
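To show where that ID comes from, here is a small sketch that reads the db.json layout from the question with the standard json module instead of TinyDB. It is only an illustration of the file format (the function name is hypothetical); in the actual script, item.doc_id as shown above is the way to go. Note that the keys in the file are strings, whereas doc_id is an int.

```python
import json

# Same shape as the db.json entry shown in the question.
db_json = """
{
  "_default": {
    "1": {
      "completed_by": "Today",
      "status": "Pending",
      "task": "Shopping"
    }
  }
}
"""


def rows_with_ids(raw, table_name="_default"):
    """Return (id, task, completed_by, status) tuples for one table."""
    table = json.loads(raw).get(table_name, {})
    return [
        (doc_id, doc["task"], doc["completed_by"], doc["status"])
        for doc_id, doc in table.items()
    ]


rows = rows_with_ids(db_json)
print(rows)  # prints [('1', 'Shopping', 'Today', 'Pending')]
```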

Scrapy to crawl LD JSON data

I have done some research but can't find any information on whether it is possible to crawl something like JSON Schema data from a URL. An example I just found, as I was looking at the product anyway, is:
https://www.reevoo.com/p/panasonic-nn-e271wmbpq
<script class="microdata-snippet" type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "Product",
"name": "PANASONIC NN-E271WMBPQ",
"image": "https://images.reevoo.com/products/3530/3530797/550x550.jpg?fingerprint=73ed91807dac7eb8f899757a348c735446d0a1fe&gravity=Center"
,"category": {
"@type": "Thing",
"name": "Microwave",
"url": "https://www.reevoo.com/browse/product_type/microwaves"
}
,"description": "Auto weight programs will automatically calculate the cooking time, once the weight has been entered. Acrylic lining makes cleaning easy, simply wipe after use. Child lock provides extra security to prevent little fingers interfering with the programming of the oven. \nAll our compact microwave ovens are packed with flexible features to make everyday cooking simple. Auto weight programs will automatically calculate the cooking time, once the weight has been entered. Acrylic lining makes cleaning easy, simply wipe after use. Child lock provides extra security to prevent little fingers interfering with the programming of the oven."
,"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "8.7",
"ratingCount": 636,
"worstRating": "1",
"bestRating": "10"
}
}
</script>
So would it be possible to extract, say, the rating data?
Thanks in advance,

import json
Then, in your spider code:
microdata_content = response.xpath('//script[@type="application/ld+json"]/text()').extract_first()
microdata = json.loads(microdata_content)
ratingValue = microdata["aggregateRating"]["ratingValue"]
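The same idea can be tried without running a crawl. The sketch below swaps Scrapy's response.xpath for the standard library's html.parser so that it is self-contained, and it also guards against pages with no ld+json block or no aggregateRating; the class and function names are hypothetical, and the sample HTML is a shortened copy of the snippet from the question.

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def extract_rating(html_text):
    """Return (ratingValue, ratingCount) from the first block that has a rating."""
    parser = LdJsonExtractor()
    parser.feed(html_text)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        rating = data.get("aggregateRating") if isinstance(data, dict) else None
        if rating:
            return rating.get("ratingValue"), rating.get("ratingCount")
    return None


sample_html = """
<html><head>
<script class="microdata-snippet" type="application/ld+json">
{"@context": "http://schema.org/", "@type": "Product",
 "name": "PANASONIC NN-E271WMBPQ",
 "aggregateRating": {"@type": "AggregateRating",
                     "ratingValue": "8.7", "ratingCount": 636}}
</script>
</head></html>
"""

print(extract_rating(sample_html))  # prints ('8.7', 636)
```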

From JSON to JSON-LD without changing the source

There are 'duplicates' to my question but they don't answer my question.
Considering the following JSON-LD example as described in paragraph 6.13 - Named Graphs from http://www.w3.org/TR/json-ld/:
{
    "@context": {
        "generatedAt": {
            "@id": "http://www.w3.org/ns/prov#generatedAtTime",
            "@type": "http://www.w3.org/2001/XMLSchema#date"
        },
        "Person": "http://xmlns.com/foaf/0.1/Person",
        "name": "http://xmlns.com/foaf/0.1/name",
        "knows": "http://xmlns.com/foaf/0.1/knows"
    },
    "@id": "http://example.org/graphs/73",
    "generatedAt": "2012-04-09",
    "@graph":
    [
        {
            "@id": "http://manu.sporny.org/about#manu",
            "@type": "Person",
            "name": "Manu Sporny",
            "knows": "http://greggkellogg.net/foaf#me"
        },
        {
            "@id": "http://greggkellogg.net/foaf#me",
            "@type": "Person",
            "name": "Gregg Kellogg",
            "knows": "http://manu.sporny.org/about#manu"
        }
    ]
}
Question:
What if you start with only the JSON part without the semantic layer:
[{
"name": "Manu Sporny",
"knows": "http://greggkellogg.net/foaf#me"
},
{
"name": "Gregg Kellogg",
"knows": "http://manu.sporny.org/about#manu"
}]
and you link the @context from a separate file or location using an HTTP Link header or rdflib parsing, then you are still left without the @id and @type in the rest of the document. Injecting those missing key-value pairs into the JSON string is not a clean option. The idea is to go from JSON to JSON-LD without changing the original JSON part.
The way I see it, to define a triple subject one has to use an @id to map to an IRI. It's very unlikely that JSON data has the @id key-values. So does this mean no JSON file can be parsed as JSON-LD without adding the keys first? I wonder how they do it.
Does someone have an idea to point me in the right direction?
Thank you.

No, unfortunately that's not possible. There are, however, libraries and tools that have been created for exactly that purpose. JSON-LD Macros is such a library. It allows declarative transformations of JSON objects to make them usable as JSON-LD. So, effectively, all you need is a very thin layer on top of an off-the-shelf JSON-LD processor.
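To illustrate what such a thin layer might look like, here is a hand-rolled sketch in plain Python. It is not the JSON-LD Macros API: the function name and the name-to-IRI lookup table are assumptions invented for this example, with the IRIs and context taken from the question. The source records are copied, so the original JSON stays untouched.

```python
import json

# Context copied from the JSON-LD example in the question.
CONTEXT = {
    "generatedAt": {
        "@id": "http://www.w3.org/ns/prov#generatedAtTime",
        "@type": "http://www.w3.org/2001/XMLSchema#date",
    },
    "Person": "http://xmlns.com/foaf/0.1/Person",
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": "http://xmlns.com/foaf/0.1/knows",
}

# Assumption: some external rule maps each record to its IRI.
# Here it is a hypothetical lookup by name.
ID_BY_NAME = {
    "Manu Sporny": "http://manu.sporny.org/about#manu",
    "Gregg Kellogg": "http://greggkellogg.net/foaf#me",
}


def to_json_ld(records, graph_id, generated_at):
    """Build a JSON-LD document around copies of the plain JSON records."""
    nodes = []
    for record in records:
        node = {"@id": ID_BY_NAME[record["name"]], "@type": "Person"}
        node.update(record)  # copies the keys; `record` itself is not modified
        nodes.append(node)
    return {
        "@context": CONTEXT,
        "@id": graph_id,
        "generatedAt": generated_at,
        "@graph": nodes,
    }


plain = [
    {"name": "Manu Sporny", "knows": "http://greggkellogg.net/foaf#me"},
    {"name": "Gregg Kellogg", "knows": "http://manu.sporny.org/about#manu"},
]
document = to_json_ld(plain, "http://example.org/graphs/73", "2012-04-09")
print(json.dumps(document, indent=2))
```

The resulting dict can then be handed to a regular JSON-LD processor.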
