I am learning the Google Sheets API for Python. Following the Google quickstart guide here: https://developers.google.com/sheets/api/quickstart/python I was able to get the code working.
The issue I am running into is that it prints the spreadsheet content to my command prompt as plain text. I tested against a spreadsheet with varied formatting, such as bullets, numbered lists, bold, and underline, and it just prints whatever it can.
Is it possible to read from the sheet and get an HTML version of the content? For example, bold text should come back as
<b>the word</b>
I am also trying to capture, in HTML, which parts are bullets, numbered lists, tabs, etc.
I plan to read a spreadsheet and then display the same information on an HTML page, but as it stands all the formatting would be lost.
Explanation:
To get information about the text format of a cell, use the spreadsheets.get method.
The formatting is in the response's CellData (at sheets.data.rowData.values), in both the userEnteredFormat and effectiveFormat fields.
However, the CellFormat inside CellData is an object with this structure:
{
  "numberFormat": {
    object (NumberFormat)
  },
  "backgroundColor": {
    object (Color)
  },
  "backgroundColorStyle": {
    object (ColorStyle)
  },
  "borders": {
    object (Borders)
  },
  "padding": {
    object (Padding)
  },
  "horizontalAlignment": enum (HorizontalAlign),
  "verticalAlignment": enum (VerticalAlign),
  "wrapStrategy": enum (WrapStrategy),
  "textDirection": enum (TextDirection),
  "textFormat": {
    object (TextFormat)
  },
  "hyperlinkDisplayType": enum (HyperlinkDisplayType),
  "textRotation": {
    object (TextRotation)
  }
}
So you need to build the HTML yourself from that data in code.
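As a minimal sketch of what that could look like, assuming a CellData dict of the shape above and handling only bold, italic, and underline from effectiveFormat.textFormat:

```python
def cell_to_html(cell: dict) -> str:
    """Convert a CellData-style dict into a simple HTML snippet."""
    text = cell.get("formattedValue", "")
    fmt = cell.get("effectiveFormat", {}).get("textFormat", {})
    if fmt.get("bold"):
        text = f"<b>{text}</b>"
    if fmt.get("italic"):
        text = f"<i>{text}</i>"
    if fmt.get("underline"):
        text = f"<u>{text}</u>"
    return text

cell = {
    "formattedValue": "the word",
    "effectiveFormat": {"textFormat": {"bold": True}},
}
print(cell_to_html(cell))  # <b>the word</b>
```

Note that mixed formatting within a single cell is reported in textFormatRuns rather than a single textFormat, so a complete converter would also need to walk the runs.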
Related
I am using the octokit/request-action action in my GitHub workflow to obtain information about a certain GitHub release via its tag name:
- name: Get Release Info
  uses: octokit/request-action@v2.x
  id: release
  with:
    route: GET /repos/{org_repo}/releases/tags/{tag}
The response JSON structure contains information about all of the assets attached to the release. I need to use this information to build a list of download URLs for all the release artifacts and pass that into a Python script that builds a Discord notification.
Using the GitHub workflow YAML, how do I "query" the JSON data (similar to JSONata) to obtain a list of all the browser_download_url values inside the assets array? The data returned looks like this (trimmed):
{
  "url": "https://api.github.com/repos/octocat/Hello-World/releases/1",
  "id": 1,
  "assets": [
    {
      "url": "https://api.github.com/repos/octocat/Hello-World/releases/assets/1",
      "browser_download_url": "https://github.com/octocat/Hello-World/releases/download/v1.0.0/example1.zip",
      "id": 1
    },
    {
      "url": "https://api.github.com/repos/octocat/Hello-World/releases/assets/2",
      "browser_download_url": "https://github.com/octocat/Hello-World/releases/download/v1.0.0/example2.zip",
      "id": 2
    }
  ]
}
The end result I want is a way to pass the two download URLs above to my script like so (using a separate step in my workflow):
python discord_notification.py "https://github.com/octocat/Hello-World/releases/download/v1.0.0/example1.zip" "https://github.com/octocat/Hello-World/releases/download/v1.0.0/example2.zip"
(Exact syntax can vary; the above snippet is just an example)
It's possible that what I want just can't be achieved in the workflow YAML itself. If that's the case, I'd be OK with a solution that passes all or part of the response JSON to the Python script and uses Python itself to parse it. I just don't know if Bash adds a layer of complexity that makes passing a multi-line response string as a parameter difficult.
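For the fallback approach (handing the JSON to Python and parsing it there), a minimal sketch might look like this. It assumes the raw step output reaches the script somehow, e.g. via an environment variable like RELEASE_JSON=${{ steps.release.outputs.data }}; here it is inlined as a string:

```python
import json

# Hypothetical input: the (trimmed) JSON returned by the release step.
data = '''{
  "assets": [
    {"browser_download_url": "https://github.com/octocat/Hello-World/releases/download/v1.0.0/example1.zip"},
    {"browser_download_url": "https://github.com/octocat/Hello-World/releases/download/v1.0.0/example2.zip"}
  ]
}'''

# Collect every browser_download_url from the assets array.
urls = [asset["browser_download_url"] for asset in json.loads(data)["assets"]]
print(urls)
```

Parsing in Python sidesteps any Bash quoting issues with multi-line strings, since the JSON never has to round-trip through shell arguments.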
In DLP, I'm creating a regular custom dictionary detector that points to a dictionary text file stored in Cloud Storage. Below is how I've defined the custom info type. I believe it follows the instructions at https://cloud.google.com/dlp/docs/creating-custom-infotypes-dictionary#examples.
Yet it errors with "Protocol message Dictionary has no "cloudStoragePath" field." The text file definitely exists in my Cloud Storage bucket, and I have proper credentials.
Can you tell me if I have the syntax wrong? Thank you.
custom_info_types = [
    {
        "info_type": {"name": "TAXES"},
        "likelihood": google.cloud.dlp_v2.Likelihood.POSSIBLE,
        "dictionary": {
            "cloudStoragePath": {
                "path": "gs://mybucket/myfile.txt"
            },
        },
    }
]
Python doesn't use camel casing; it uses snake case. See
https://cloud.google.com/dlp/docs/samples/dlp-deidentify-masking#dlp_deidentify_masking-python
So that key should be cloud_storage_path instead of cloudStoragePath.
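The corrected definition would look like this (the likelihood enum from the question is left as a comment so the snippet stays self-contained without the google-cloud-dlp library installed):

```python
custom_info_types = [
    {
        "info_type": {"name": "TAXES"},
        # "likelihood": google.cloud.dlp_v2.Likelihood.POSSIBLE,  # as in the question
        "dictionary": {
            "cloud_storage_path": {  # snake_case, not cloudStoragePath
                "path": "gs://mybucket/myfile.txt"
            },
        },
    }
]
print(custom_info_types[0]["dictionary"])
```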
I have a bunch of JSON files, and suppose each have the following structure:
{
  "fields": {
    "name": "Bob",
    "key": "bob"
  },
  "results": {
    "bob": { ... }
  }
}
where, for some unfortunate reason, while the structure of the JSON is fairly consistent, there is one dynamic key under "results". Defining the schema for the "fields" part is fairly straightforward to me.
So, for several JSON files, the final schema might be:
fieldSchema = StructField(...)
resultSchema = StructField("results", StructType([StructField("bob", ...)]))
finalSchema = StructType([fieldSchema, resultSchema])
The problem is this line: StructField("bob", ...)
Obviously, bob is not the key I'm looking for. Ideally, the name for this StructField would be some kind of wildcard character, regex pattern, or, worst case, a dynamic field based on other fields.
I'm a newbie to Spark and have been scouring the documentation and historical Stack Overflow posts, but I've been unable to find anything.
Long story short, I want to be able to cast a wide net with the name parameter in StructField so it encompasses a variety of different keys, similar to a regex pattern.
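StructField names cannot be patterns, but one workaround (a sketch, not a Spark feature; it assumes exactly one dynamic key under "results" and that the JSON can be preprocessed before Spark reads it) is to hoist the dynamic key into fixed field names:

```python
import json

def normalize(record: dict) -> dict:
    # Move the single dynamic key under "results" into fixed names
    # so a static schema can be applied afterwards.
    (key, value), = record["results"].items()  # assumes exactly one key
    return {"fields": record["fields"], "result_key": key, "result": value}

raw = '{"fields": {"name": "Bob", "key": "bob"}, "results": {"bob": {"score": 1}}}'
normalized = normalize(json.loads(raw))
print(normalized)
```

Alternatively, Spark's MapType(StringType(), ...) can model a struct whose keys vary from file to file, at the cost of losing per-field typing inside "results".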
I have a Python script which does some analysis and outputs the results as text (paragraphs) in a Google Doc. I know how to insert text and update paragraph and text styles through batchUpdate:
doc_service.documents().batchUpdate(documentId=<ID>, body={'requests': <my_request>}).execute()
where, for instance, "my_request" takes a form like:
request = [
    {
        "insertText": {
            "location": {
                "index": <index_position>,
                "segmentId": <id>
            },
            "text": <text>
        }
    },
    {
        "updateParagraphStyle": {
            "paragraphStyle": {
                "namedStyleType": <paragraph_type>
            },
            "range": {
                "segmentId": <id>,
                "startIndex": <index_position>,
                "endIndex": <index_position>
            },
            "fields": "namedStyleType"
        }
    },
]
However, once the script is done updating the document, it would be fantastic if a table of contents could be added at the top.
I am very new to the Google Docs API and not entirely sure how to do that. I know I should use "TableOfContents" as a StructuralElement. I also know this option currently does not update automatically after each modification to the document (which is why I would like to create it AFTER the document has finished updating and place it at the top).
How do I do this with Python? I am unclear where to put "TableOfContents" in my request.
Thank you so very much!
After your comment, I was able to better understand what you are trying to do, but I came across these two Issue Tracker posts:
Add the ability to generate and update the TOC of a doc.
Getting a link to a heading paragraph.
These are well-known feature requests that unfortunately haven't been implemented yet. You can hit the ☆ next to the issue number in the top left of each page; it lets Google know more people are affected, making it more likely to be addressed sooner.
Therefore, it's not possible to insert/update a table of contents programmatically.
There are 'duplicates' of my question, but they don't answer it.
Consider the following JSON-LD example, as described in section 6.13, Named Graphs, of http://www.w3.org/TR/json-ld/:
{
  "@context": {
    "generatedAt": {
      "@id": "http://www.w3.org/ns/prov#generatedAtTime",
      "@type": "http://www.w3.org/2001/XMLSchema#date"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person",
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": "http://xmlns.com/foaf/0.1/knows"
  },
  "@id": "http://example.org/graphs/73",
  "generatedAt": "2012-04-09",
  "@graph": [
    {
      "@id": "http://manu.sporny.org/about#manu",
      "@type": "Person",
      "name": "Manu Sporny",
      "knows": "http://greggkellogg.net/foaf#me"
    },
    {
      "@id": "http://greggkellogg.net/foaf#me",
      "@type": "Person",
      "name": "Gregg Kellogg",
      "knows": "http://manu.sporny.org/about#manu"
    }
  ]
}
Question:
What if you start with only the JSON part, without the semantic layer:
[
  {
    "name": "Manu Sporny",
    "knows": "http://greggkellogg.net/foaf#me"
  },
  {
    "name": "Gregg Kellogg",
    "knows": "http://manu.sporny.org/about#manu"
  }
]
and you link the @context from a separate file or location using an HTTP Link header or rdflib parsing, then you are still left without the @id and @type in the rest of the document. Injecting those missing key-value pairs into the JSON string is not a clean option. The idea is to go from JSON to JSON-LD without changing the original JSON part.
The way I see it, to define a triple subject, one has to use an @id to map to an IRI. It's very unlikely that plain JSON data has @id key-values. So does this mean JSON files cannot be parsed as JSON-LD without adding those keys first? I wonder how they do it.
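For illustration, a context document served separately (e.g. referenced via an HTTP Link header) might look like the fragment below; note that it can map the plain JSON terms to IRIs, but it cannot supply per-node @id values, which is exactly the gap described above:

```json
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": {
      "@id": "http://xmlns.com/foaf/0.1/knows",
      "@type": "@id"
    }
  }
}
```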
Does someone have an idea to point me in the right direction?
Thank you.
No, unfortunately that's not possible. There exist, however, libraries and tools created exactly for this reason. JSON-LD Macros is one such library: it allows declarative transformations of JSON objects to make them usable as JSON-LD. So, effectively, all you need is a very thin layer on top of an off-the-shelf JSON-LD processor.