Can't upload any data to BigQuery using tabledata().insertAll() - Python

This is the portion of my code that I'm having issues with.
table_data_insert_all_request_body = {
    "kind": "bigquery#tableDataInsertAllRequest",
    "skipInvalidRows": True,
    "ignoreUnknownValues": True,
    "templateSuffix": 'suffix',
    "rows": [
        {
            "json": {
                ("one"): ("two"),
                ("three"): ("four")
            }
        }
    ]
}
request = service.tabledata().insertAll(projectId=projectId, datasetId=datasetId, tableId=tableId, body=table_data_insert_all_request_body)
response = request.execute()
If I print response, I get the response:
{u'kind': u'bigquery#tableDataInsertAllResponse'}
I can access the project, dataset and even the table, but I can't update the values in the table. What do I need to do differently? Obviously I don't want to enter just two values, but I can't get anything to upload. Once I can get something to upload, I'll be able to get rows working.

Even though it's tough to tell without looking at your schema, I am pretty sure your JSON data is not correct.
Here is what I use.
Bodyfields = {
    "kind": "bigquery#tableDataInsertAllRequest",
    "rows": [
        {
            "json": {
                'col_name_1': 'row 1 value 1',
                'col_name_2': 'row 1 value 2'
            }
        },
        {
            "json": {
                'col_name_1': 'row 2 value 1',
                'col_name_2': 'row 2 value 2'
            }
        }
    ]
}
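Not part of the original answer, but worth checking: insertAll() returns HTTP 200 even when rows are dropped, and with "skipInvalidRows": True per-row failures only show up under the "insertErrors" key of the response. A minimal sketch (pure Python, no API call) of inspecting the response dict:

```python
# Hypothetical helper: a response without "insertErrors" means every row
# was accepted; otherwise each entry names the failing row and why.
def failed_rows(response):
    """Return (row_index, messages) pairs for rows that were rejected."""
    return [
        (entry["index"], [err.get("message", "") for err in entry.get("errors", [])])
        for entry in response.get("insertErrors", [])
    ]

print(failed_rows({"kind": "bigquery#tableDataInsertAllResponse"}))  # []
print(failed_rows({"insertErrors": [{"index": 0, "errors": [{"message": "no such field"}]}]}))
```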


How to flatten dict in a DataFrame & concatenate all resultant rows

I am using Github's GraphQL API to fetch some issue details.
I used Python Requests to fetch the data locally.
This is what output.json looks like:
{
  "data": {
    "viewer": {
      "login": "some_user"
    },
    "repository": {
      "issues": {
        "edges": [
          {
            "node": {
              "id": "I_kwDOHQ63-s5auKbD",
              "title": "test issue 1",
              "number": 146,
              "createdAt": "2023-01-06T06:39:54Z",
              "closedAt": null,
              "state": "OPEN",
              "updatedAt": "2023-01-06T06:42:00Z",
              "comments": {
                "edges": [
                  {
                    "node": {
                      "id": "IC_kwDOHQ63-s5R2XCV",
                      "body": "comment 01"
                    }
                  },
                  {
                    "node": {
                      "id": "IC_kwDOHQ63-s5R2XC9",
                      "body": "comment 02"
                    }
                  }
                ]
              },
              "labels": {
                "edges": []
              }
            },
            "cursor": "Y3Vyc29yOnYyOpHOWrimww=="
          },
          {
            "node": {
              "id": "I_kwDOHQ63-s5auKm8",
              "title": "test issue 2",
              "number": 147,
              "createdAt": "2023-01-06T06:40:34Z",
              "closedAt": null,
              "state": "OPEN",
              "updatedAt": "2023-01-06T06:40:34Z",
              "comments": {
                "edges": []
              },
              "labels": {
                "edges": [
                  {
                    "node": {
                      "name": "food"
                    }
                  },
                  {
                    "node": {
                      "name": "healthy"
                    }
                  }
                ]
              }
            },
            "cursor": "Y3Vyc29yOnYyOpHOWripvA=="
          }
        ]
      }
    }
  }
}
The json was put inside a list using
result = response.json()["data"]["repository"]["issues"]["edges"]
And then this list was put inside a DataFrame
import pandas as pd
df = pd.DataFrame(result, columns=['node', 'cursor'])
df
These are the contents of the data frame
| id | title | number | createdAt | closedAt | state | updatedAt | comments | labels |
|---|---|---|---|---|---|---|---|---|
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | None | OPEN | 2023-01-06T06:42:00Z | {'edges': [{'node': {'id': 'IC_kwDOHQ63-s5R2XCV', 'body': 'comment 01'}}, {'node': {'id': 'IC_kwDOHQ63-s5R2XC9', 'body': 'comment 02'}}]} | {'edges': []} |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | None | OPEN | 2023-01-06T06:40:34Z | {'edges': []} | {'edges': [{'node': {'name': 'food'}}, {'node': {'name': 'healthy'}}]} |
I would like to split/explode the comments and labels columns.
The values in these columns are nested dictionaries.
I would like there to be as many rows for a single issue as there are comments & labels.
I would like to flatten out the data frame.
So this involves split/explode and concat.
There are several Stack Overflow answers that delve into this topic, and I have tried the code from several of them.
I cannot paste the links to those questions, because Stack Overflow marks my question as spam due to too many links.
But these are the steps I have tried
df3 = df2['comments'].apply(pd.Series)
Drill down further
df4 = df3['edges'].apply(pd.Series)
df4
Drill down further
df5 = df4['node'].apply(pd.Series)
df5
The last statement above gives me KeyError: 'node'.
I understand this is because node is not a key in the DataFrame.
But how else can I split this dictionary and concatenate all columns back to my issues row?
This is how I would like the output to look:
| id | title | number | createdAt | closedAt | state | updatedAt | comments | labels |
|---|---|---|---|---|---|---|---|---|
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | None | OPEN | 2023-01-06T06:42:00Z | comment 01 | Null |
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | None | OPEN | 2023-01-06T06:42:00Z | comment 02 | Null |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | None | OPEN | 2023-01-06T06:40:34Z | Null | food |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | None | OPEN | 2023-01-06T06:40:34Z | Null | healthy |
If dct is your dictionary from the question you can try:
df = pd.DataFrame(d['node'] for d in dct['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df.to_markdown(index=False))
Prints:
| id | title | number | createdAt | closedAt | state | updatedAt | comments | labels |
|---|---|---|---|---|---|---|---|---|
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | | OPEN | 2023-01-06T06:42:00Z | comment 01 | nan |
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | | OPEN | 2023-01-06T06:42:00Z | comment 02 | nan |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | | OPEN | 2023-01-06T06:40:34Z | nan | food |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | | OPEN | 2023-01-06T06:40:34Z | nan | healthy |
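The chain above leans on two pandas behaviors worth seeing in isolation: the `.str` accessor indexes element-wise into dicts and lists (not just strings), and `explode()` turns an empty list into a single NaN row, which is where the `nan` cells come from. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up sample mirroring one "comments" cell per issue: a dict with an
# "edges" list for issue 1, an empty "edges" list for issue 2.
s = pd.Series([{"edges": [{"node": {"body": "comment 01"}}]}, {"edges": []}])

s = s.str["edges"]             # .str[] indexes into each dict element-wise
s = s.explode()                # one row per list element; [] becomes NaN
s = s.str["node"].str["body"]  # keep drilling into the remaining dicts
print(s.tolist())              # ['comment 01', nan]
```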
@andrej-kesely has answered my question.
I have selected his response as the answer for this question.
I am now posting a consolidated script that includes my poor code and Andrej's great code.
In this script I want to fetch details from Github's GraphQL API server
and put them inside pandas.
The primary source for this script is this gist.
And a major chunk of the remaining code is an answer by @andrej-kesely.
Now onto the consolidated script.
First import the necessary packages and set headers
import requests
import json
import pandas as pd
headers = {"Authorization": "token <your_github_personal_access_token>"}
Now define the query that will fetch data from Github.
In my particular case, I am fetching issue details from a particular repo;
it can be something else for you.
query = """
{
  viewer {
    login
  }
  repository(name: "your_github_repo", owner: "your_github_user_name") {
    issues(states: OPEN, last: 2) {
      edges {
        node {
          id
          title
          number
          createdAt
          closedAt
          state
          updatedAt
          comments(first: 10) {
            edges {
              node {
                id
                body
              }
            }
          }
          labels(orderBy: {field: NAME, direction: ASC}, first: 10) {
            edges {
              node {
                name
              }
            }
          }
        }
        cursor
      }
    }
  }
}
"""
Execute the query and save the response
def run_query(query):
    request = requests.post('https://api.github.com/graphql', json={'query': query}, headers=headers)
    if request.status_code == 200:
        return request.json()
    else:
        raise Exception("Query failed to run by returning code of {}. {}".format(request.status_code, query))

result = run_query(query)
And now for the trickiest part.
In my query response there are several nested dictionaries.
I would like to split them; more details in my question above.
This magic code from @andrej-kesely does that for you.
df = pd.DataFrame(d['node'] for d in result['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df)
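As an alternative sketch (not part of the accepted answer), `pd.json_normalize` can flatten the comment edges directly. The `record_path`/`meta` names below assume the same response shape as the query above, with made-up values; note that it drops issues whose `edges` list is empty, unlike the `explode` approach:

```python
import pandas as pd

# Hypothetical single-issue sample in the same shape as the GraphQL response.
edges = [{"node": {"id": "I_1", "title": "test issue",
                   "comments": {"edges": [{"node": {"body": "c1"}}]}}}]

flat = pd.json_normalize(
    edges,
    record_path=["node", "comments", "edges"],  # one row per comment
    meta=[["node", "id"], ["node", "title"]],   # carry issue fields along
)
print(flat.columns.tolist())  # ['node.body', 'node.id', 'node.title']
```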

Can't Access Data in List of Dictionaries

I'm trying to access some data within nested ordered dictionaries. This dictionary was created by using the XMLTODICT module. Obviously I would like to create my own dictionaries but this one is out of my control.
I've tried to access them numerous ways.
Example:
Using a for loop:
I can access the first level using v["name"], which gives me Child_Policy and Parent_Policy.
When I do v["class"]["name"] I would expect to get "Test1", but that's not the case.
I've also tried v[("class", )] variations as well, with no luck.
Any input would be much appreciated
The data below is retrieved from a device via XML and converted to dictionary with XMLTODICT.
[
  {
    "#xmlns": "http://cisco.com/ns/yang/Cisco-IOS-XE-policy",
    "name": "Child_Policy",
    "class": [
      {
        "name": "Test1",
        "action-list": {
          "action-type": "bandwidth",
          "bandwidth": {
            "percent": "30"
          }
        }
      },
      {
        "name": "Test2",
        "action-list": {
          "action-type": "bandwidth",
          "bandwidth": {
            "percent": "30"
          }
        }
      }
    ]
  },
  {
    "#xmlns": "http://cisco.com/ns/yang/Cisco-IOS-XE-policy",
    "name": "Parent_Policy",
    "class": {
      "name": "class-default",
      "action-list": [
        {
          "action-type": "shape",
          "shape": {
            "average": {
              "bit-rate": "10000000"
            }
          }
        },
        {
          "action-type": "service-policy",
          "service-policy": "Child_Policy"
        }
      ]
    }
  }
]
My expected result is to retrieve values from the nested dictionary and produce an output similar to this:
Queue_1: Test1
Action_1: bandwidth
Allocation_1: 40
Queue_2: Test2
Action_2: bandwidth
Allocation_2: 10
I have no issue formatting the output; just getting the values is the issue.
EDIT: I had some time tonight, so I changed the code to be dynamic:
int = 0
int_2 = 0
for v in policy_dict.values():
    print("\n")
    print("{:15} {:<35}".format("Policy: ", v[0]["name"]))
    print("_______")
    for i in v:
        int_2 = int_2 + 1
        try:
            print("\n")
            print("{:15} {:<35}".format("Queue_%s: " % int_2, v[0]["class"][int]["name"]))
            print("{:15} {:<35}".format("Action_%s: " % int_2, v[0]["class"][int]["action-list"]["action-type"]))
            print("{:15} {:<35}".format("Allocation_%s: " % int_2, v[0]["class"][int]["action-list"]["bandwidth"]["percent"]))
            int = int + 1
        except KeyError:
            break
According to the sample you posted you can try to retrieve values like:
v[0]["class"][0]["name"]
This outputs:
Test1
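The underlying gotcha is that xmltodict produces a list when an element repeats and a plain dict when it occurs once, which is why "class" indexes differently for the two policies. A small sketch (sample data abridged from the question) that normalizes both shapes before iterating:

```python
# Sample abridged from the question: "class" is a list under Child_Policy
# but a single dict under Parent_Policy.
policies = [
    {"name": "Child_Policy", "class": [{"name": "Test1"}, {"name": "Test2"}]},
    {"name": "Parent_Policy", "class": {"name": "class-default"}},
]

def as_list(value):
    """xmltodict yields a dict for one child and a list for many; normalize."""
    return value if isinstance(value, list) else [value]

for policy in policies:
    for cls in as_list(policy["class"]):
        print(policy["name"], cls["name"])
```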

Add a Protected Range to an existing NamedRange

I have an existing worksheet with an existing NamedRange for it and I would like to call the batch_update method of the API to protect that range from being edited by anyone other than the user that makes the batch_update call.
I have seen an example on how to add protected ranges via a new range definition, but not from an existing NamedRange.
I know I need to send the addProtectedRangeResponse request. Can I define the request body with a Sheetname!NamedRange notation?
this_range = worksheet_name + "!" + nrange
batch_update_spreadsheet_request_body = {
    'requests': [
        {
            "addProtectedRange": {
                "protectedRange": {
                    "range": {
                        "name": this_range,
                    },
                    "description": "Protecting xyz",
                    "warningOnly": False
                }
            }
        }
    ],
}
EDIT: Given @Tanaike's feedback, I adapted the call to something like:
body = {
    "requests": [
        {
            "addProtectedRange": {
                "protectedRange": {
                    "namedRangeId": namedRangeId,
                    "description": "Protecting via gsheets_manager",
                    "warningOnly": False,
                    "requestingUserCanEdit": False
                }
            }
        }
    ]
}
res2 = service.spreadsheets().batchUpdate(spreadsheetId=ssId, body=body).execute()
print(res2)
But although it lists the new protections, it still lists 5 different users (all of them) as editors. If I try to manually edit the protection added by my gsheets_manager script, it complains that the range is invalid.
Interestingly, it seems to ignore the requestingUserCanEdit flag, according to the returned message:
{u'spreadsheetId': u'NNNNNNNNNNNNNNNNNNNNNNNNNNNN', u'replies': [{u'addProtectedRange': {u'protectedRange': {u'requestingUserCanEdit': True, u'description': u'Protecting via gsheets_manager', u'namedRangeId': u'1793914032', u'editors': {}, u'protectedRangeId': 2012740267, u'range': {u'endColumnIndex': 1, u'sheetId': 1196959832, u'startColumnIndex': 0}}}}]}
Any ideas?
How about using namedRangeId for your situation? The flow of the sample script is as follows.
Retrieve namedRangeId using spreadsheets().get of Sheets API.
Set a protected range using namedRangeId using spreadsheets().batchUpdate of Sheets API.
Sample script:
nrange = "### name ###"
ssId = "### spreadsheetId ###"
res1 = service.spreadsheets().get(spreadsheetId=ssId, fields="namedRanges").execute()
namedRangeId = ""
for e in res1['namedRanges']:
    if e['name'] == nrange:
        namedRangeId = e['namedRangeId']
        break
body = {
    "requests": [
        {
            "addProtectedRange": {
                "protectedRange": {
                    "namedRangeId": namedRangeId,
                    "description": "Protecting xyz",
                    "warningOnly": False
                }
            }
        }
    ]
}
res2 = service.spreadsheets().batchUpdate(spreadsheetId=ssId, body=body).execute()
print(res2)
Note:
This script supposes that Sheets API can be used for your environment.
This is a simple sample script. So please modify it to your situation.
References:
ProtectedRange
Named and Protected Ranges
If this was not what you want, I'm sorry.
Edit:
In my above answer, I modified your script using your settings. If you want to protect the named range, please modify body as follows.
Modified body
body = {
    "requests": [
        {
            "addProtectedRange": {
                "protectedRange": {
                    "namedRangeId": namedRangeId,
                    "description": "Protecting xyz",
                    "warningOnly": False,
                    "editors": {"users": ["### your email address ###"]},  # Added
                }
            }
        }
    ]
}
With this, the named range can be modified only by you. I use such settings myself and can confirm they work in my environment. But if this doesn't work in your situation, I'm sorry.
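The namedRangeId lookup from the sample script can also be factored into a small helper; this is just a sketch against the shape of a spreadsheets().get(fields="namedRanges") response, with made-up values:

```python
# Hypothetical response shape from spreadsheets().get(fields="namedRanges").
meta = {"namedRanges": [{"name": "sales", "namedRangeId": "1793914032"}]}

def find_named_range_id(meta, name):
    """Return the namedRangeId for a named range, or None if absent."""
    for nr in meta.get("namedRanges", []):
        if nr["name"] == name:
            return nr["namedRangeId"]
    return None

print(find_named_range_id(meta, "sales"))  # 1793914032
```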

Why are my index fields still shown as "analyzed" even after I index them as "not_analyzed"?

I have a lot of data (JSON format) in Amazon SQS. I basically have a simple Python script which pulls data from the SQS queue and then indexes it in ES. My problem is that even though I have specified "not_analyzed" in my script, I still see my index field as "analyzed" in the index settings of the Kibana 4 dashboard.
Here is my python code :
doc = {
    "settings": {
        "number_of_shards": 1
    },
    "mappings": {
        "type_name": {
            "dynamic_templates": [
                {
                    "strings": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            ]
        }
    }
}
import json
import requests
from elasticsearch import Elasticsearch

es = Elasticsearch()
h = {"Content-type": "application/json"}
res = requests.request("POST", "http://localhost:9200/" + index_name + "/", headers=h, data=json.dumps(doc))
post = es.index(index=index_name, doc_type='server', id=1, body=json.dumps(new_list))
print "------------------------------"
print "Data Pushed Successfully to ES"
I am not sure what's wrong here?
The doc_type you're using when indexing (= server) doesn't match the one you have in your index mappings (= type_name).
So if you index your documents like this instead, it will work
post = es.index(index=index_name , doc_type='type_name' , id =1 , body=json.dumps(new_list))
                                             ^
                                             |
                                         change this

Iterating a JSON response from the Google API using Python

I took a nearby location from the Google API and got JSON back; when I try to split the JSON I receive an error.
{
u'status':u'OK',
u'next_page_token':u'CoQC9AAAAHepdLFIvAUuqz6y6WWasKBmq5aAOYr0Bbu97upCMy4EI1Ea5t-6S6iObZdZ5_RIB7ywocdG-lF9ian5JRuTQVGL7MwbBa_uN3EfS7XzjmlVx-IKsauiEiO-Wu3r25zk9SL3yc5d_vDGvN3VQJkA7bBiDWhkloJ4RFngjBsGVWVQOnj5glrbwVVrw9Nu6DNi70C2Wdqqy_65b_jFjJiJYTAwrlfoyl7GGpxk5Gng7QgSFdtTJII9zdfkxcj3osUzklRetjraDtgfaQgxr0KA_H5btbuXz3UT6r-dyqdj2qd1tr_0oAvFkGB9t0qFbUYSe7bDETEAwdDv7MSmmXeYHQUSEMCBruHU5pb8X4EoPbPw9ncaFLgqTTICkQyGYY-boaJ1_3X3SaeT',
u'html_attributions':[
u'Listings by Indiacom Yellow Pages'],
u'results':[
{
u'name':u'Institute for Financial Management and Research',
u'reference':u'CpQBgwAAAL5Gg4T18LzUpNTEzvKWeAH0jLBuTyC_rmxOycL3KndgQ05WVKovVhiIYhnnqeOxcX1tcWesIi0vSVwugaskyy2UnJ_BrTD5ZblXzD7nLxP9L-FOQLetRgbpA6DlNzHM6Nmcu3jtJiBAOyMQJOmgL9cot7c4y18o_3E1cJrzPJfg5hK6trq2u2lvJnD2ZxJ6IxIQC2IuHwQILkrbtUd3ke5GDBoU1sZLoPY-_kARc7lEoq2naKHtwSk',
u'geometry':{
u'location':{
u'lat':13.062882,
u'lng':80.238669
}
},
u'place_id':u'ChIJKzE7o2ZmUjoRLaCtNPjba3U',
u'vicinity':u'24, Kothari Road, Nungambakkam, Chennai',
u'photos':[
{
u'photo_reference':u'CnRoAAAApH-YJpJFjPYltZYhYTs_tIVFA7vve-LMii8XbUydZJLMXbzDNkxuCuGCk9W-nFjgUrj-JoRqJLRuurGvt1oz94osENNc8bZGLBI4Joj1w-dQSyiwqqzqDdna-u0TRkJ_8S91fF3uerww341951YB2hIQX7gFjIn5tWkkEcGwErJ9oBoU0CdKRd6b2pL3Bcp09hCYvleEfaQ',
u'width':816,
u'html_attributions':[
u'From a Google User'
],
u'height':459
}
],
u'scope':u'GOOGLE',
u'id':u'2e9a63cf7368e0f90e2a20711ac56853b7c34462',
u'types':[
u'school',
u'establishment'
],
u'icon': u'http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png'
},
{
u'rating':4.2,
u'name':u'Sri Sankara Senior Secondary School',
u'reference':u'CoQBdgAAAJ-Uc78EbPnLX6adzheZMWrS9sOJ9vWTQsqZOlQza-r3qozDUrl4XxWPRdHD9K_BVP0t_FhEwQt4w42X0z01uQr7dtq5cZ7ioa9zBVIQpwOkSQhxjbjQjX05YxVqGPB9MCfEikHpFKSKIaz5mPrLDgklbhQ8clD4fm9BiWNmE_mJEhD35R4GgbVNu4J-x0Lfaw3BGhRPQEXErZf3jJJkLbHs2HWVRvP2Xg',
u'geometry':{
u'location':{
u'lat':13.009931,
u'lng':80.260746
}
},
u'place_id':u'ChIJh_fXcelnUjoRd4vKDQfY_DM',
u'vicinity':u'9/21 Vasantha Press Road, Vasanta Press Road, Adyar, Landmarks are Malar Hospital/Theosophical society, Chennai',
u'photos':[
{
u'photo_reference':u'CnRwAAAAIrFQSUJn7JB5_GgDfEPBldHptKmARqhV-6HR5fUT-MjB6ScO7ZYz1jamqoGvTqXlbEZZjxC67BvOllBHTiRIQwKyBXoI9DhleBmrCgMTrorjeDkvIDY_8ZC0pOFZOZGGH2XdfLrH1irsWZUEa0IjFRIQaATxA2BymP1KED4vxNZfnxoUTwD5Y-4-8ZPnPrhuKofUVSztcoQ',
u'width':297,
u'html_attributions':[
],
u'height':297
}
],
u'scope':u'GOOGLE',
u'id':u'a8dc412bac3ea790260d2c7d6fe08271ae883a4e',
u'types':[
u'school',
u'establishment'
],
u'icon': u'http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png'
},
{
u'rating':4.5,
u'name':u'Chettinad Vidyashram',
u'reference':u'CnRoAAAAbIG1-6ecTuOcqw5hCenhtbHlAmP-nfdw_W1vEv94fXvIyCzhHSQMn95VEtKCgbLeME6qd30uGhxmLxFwXItcls-SlC7fgXwGl2JINCLTjB1RYpYC--Gr6hS-9cT7Xq2f46-dAqnpF5n2sRa1cNJJSBIQvVLDztqmh2BmqkJER9MLZxoU3gbS1TpgVj8h5Uo71QKTTyj1CdQ',
u'geometry':{
u'location':{
u'lat':13.017083,
u'lng':80.269503
}
},
u'place_id':u'ChIJR3w9SdxnUjoRs2vfnH-ERNA',
u'vicinity':u'Rajah Annamalaipuram, Chennai',
u'photos':[
{
u'photo_reference':u'CnRoAAAAeUHwPDKO87eeGP7Fzm7aKE3VcQd6gFebbjo2FhYRHdulLZW-XdepstzETly74Id6NMOF5lqm4BHZ56C1CRnsxmdqaxJ-rcJR2Cpq2VfJaixZmBG3C-0TTNmMuPuGsjKAldr6rWCWdDVMg8FAnWhgyRIQXYPX89XdA5fl7e5RUecRWhoU-SExDqUr-GRaYVLkb8Iq_1mf-R8',
u'width':968,
u'html_attributions':[
u'From a Google User'
],
u'height':968
}
],
u'scope':u'GOOGLE',
u'id':u'f3b774d4c11a4bd20585669d9c4ae57fc12e5652',
u'types':[
u'school',
u'establishment'
],
u'icon': u'http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png'
},
Here is my Python code:
res = json.dumps(response)
for result in response[status][results]:
    print result['status']
As the JSON was big, I included only half of the JSON data.
The error I get is: global name 'status' is not defined (at line 1431).
How do I split this JSON?
When I print type(response), I get:
type tuple
If you get a JSON string, you want to load it into a native structure with loads and then iterate over that native structure. Looking at the JSON string you have, it also seems the individual results don't have a status field. You could do something like this:
res = json.loads(response)
print res['status']
for result in res['results']:
    print result
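Putting the answer together with the structure from the question, a self-contained sketch (sample data abridged from the response above; Python 3 syntax):

```python
import json

# "status" lives at the top level of the response, not on each result;
# the sample payload below is abridged from the question.
payload = json.dumps({
    "status": "OK",
    "results": [
        {"name": "Institute for Financial Management and Research",
         "geometry": {"location": {"lat": 13.062882, "lng": 80.238669}}},
    ],
})

response = json.loads(payload)  # parse the string into native dicts/lists
print(response['status'])       # OK
for result in response['results']:
    loc = result['geometry']['location']
    print(result['name'], loc['lat'], loc['lng'])
```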
