From SQL database to Google Sheet via Python gspread

I am trying to populate an already created Google Sheet from my SQL table using Python and gspread.
I can update the sheet one row at a time using a for loop, but I have a lot of data to add and want to write a column at a time, or more if possible.
Any suggestions? Here's what I've been using, and I get this error: Object of type 'Row' is not JSON serializable
#!/usr/bin/python3
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import dbconnect
#credentials for google
gc = gspread.authorize(credentials)
worksheet = gc.open('NAMEOFWS').sheet1
cell_list = worksheet.range('A2:A86')
#connect to database using dbconnect and grab cursor
query = "select loc from table"
cursor.execute(query)
results = cursor.fetchall()
cell_values = results
for i, val in enumerate(cell_values):
    cell_list[i].value = val
worksheet.update_cells(cell_list)

I am not sure how to do this with gspread, but you can modify your code very easily to use pygsheets, which lets you update a whole column at once. Also, I am not sure what your data looks like, so the code below may need to be altered, or you may need to adjust your data set a little. Hope this helps.
import pygsheets
gc = pygsheets.authorize(service_file = 'client_secret2.json')
# Open spreadsheet and select worksheet
sh = gc.open('Api_Test')
wks = sh.sheet1
# Update a column
notes = [1, 2, 3, 4]  # the data set to write into the column
wks.update_col(4, notes, 1)  # 4 is the column number, notes is the data set, 1 skips the first row (I used a header)
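For completeness, the original gspread approach should also work once each Row returned by fetchall() is unpacked to a scalar before it is assigned to a cell. This is a minimal sketch reusing the variables from the question, assuming the query returns single-column rows:
# Sketch based on the question's code: gspread cannot JSON-serialize
# Row objects, so take the scalar 'loc' value out of each Row first.
cell_list = worksheet.range('A2:A86')
for i, row in enumerate(results):
    cell_list[i].value = row[0]
worksheet.update_cells(cell_list)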

Related

Syncing Data from Google Sheet to Postgres RDS

I have the data from a Google Sheet in a data frame and I am using pandas df.to_sql to import the data into Postgres RDS.
import gspread
import pandas as pd
import sqlalchemy as sa
from oauth2client.service_account import ServiceAccountCredentials as sac
def gsheet2df(spreadsheet_name, sheet_num):
    scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
    credentials_path = 'billing-342104-8b351a7a2813.json'
    credentials = sac.from_json_keyfile_name(credentials_path, scope)
    client = gspread.authorize(credentials)
    sheet = client.open(spreadsheet_name).get_worksheet(sheet_num).get_all_records()
    df = pd.DataFrame.from_dict(sheet)
    print(df)
    return df
def write2db(ed):
    connection_string = "postgresql+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, str(PORT), DATABASE)
    engine = sa.create_engine(connection_string)
    connection = engine.connect()
    ed.to_sql('user_data', con=engine, if_exists='append', index='user_id')
But there are two use cases that are not yet handled, and I have not found much on them either.
When I import the data into the DB, the column names are the sheet column names. I want two extra columns added for every row: one for the time at which the row was last updated, and one flagging whether it was deleted.
I have imported the data once, but now I want to sync it again and update the DB based on the changes in the sheet, without wiping out the complete DB.
Any suggestions on how to achieve this?
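For the first point, here is a minimal sketch of a modified write2db; the column names updated_at and is_deleted are placeholders, not from the original post, so rename them to match your schema:
import datetime as dt
def write2db(ed):
    connection_string = "postgresql+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, str(PORT), DATABASE)
    engine = sa.create_engine(connection_string)
    # Hypothetical extra columns stamped onto every row before the write.
    ed['updated_at'] = dt.datetime.utcnow()  # time at which this sync wrote the row
    ed['is_deleted'] = False                 # flip to True for rows removed from the sheet
    ed.to_sql('user_data', con=engine, if_exists='append', index=False)
For the second point, a common pattern is to load the sheet into a staging table with to_sql and then merge it into user_data with an INSERT ... ON CONFLICT ... DO UPDATE statement, so existing rows are updated in place instead of the whole table being rewritten.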

Skip forbidden rows from a BigQuery query, using Python

I need to download a relatively small table from BigQuery and store it (after some parsing) in a pandas dataframe.
Here is the relevant sample of my code:
import pandas as pd
from google.cloud import bigquery
client = bigquery.Client(project="project_id")
job_config = bigquery.QueryJobConfig(allow_large_results=True)
query_job = client.query("my sql string", job_config=job_config)
result = query_job.result()
rows = [dict(row) for row in result]
pdf = pd.DataFrame.from_dict(rows)
My problem: after a few thousand rows have been parsed, one of them is too big and I get an exception: google.api_core.exceptions.Forbidden.
So, after a few iterations, I tried to transform my loop into something like this:
rows = list()
for _ in range(result.total_rows):
    try:
        rows.append(dict(next(result)))
    except google.api_core.exceptions.Forbidden:
        pass
BUT it doesn't work, since result is a bigquery.table.RowIterator and, despite its name, it is not an iterator... it's an iterable.
So... what do I do now? Is there a way to either:
ask for the next row inside a try/except scope, or
tell BigQuery to skip bad rows?
Did you try paging through query results?
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
query = """
SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total_people DESC
"""
query_job = client.query(query) # Make an API request.
query_job.result() # Wait for the query to complete.
# Get the destination table for the query results.
#
# All queries write to a destination table. If a destination table is not
# specified, BigQuery populates it with a reference to a temporary
# anonymous table after the query completes.
destination = query_job.destination
# Get the schema (and other properties) for the destination table.
#
# A schema is useful for converting from BigQuery types to Python types.
destination = client.get_table(destination)
# Download rows.
#
# The client library automatically handles pagination.
print("The query data:")
rows = client.list_rows(destination, max_results=20)
for row in rows:
    print("name={}, count={}".format(row["name"], row["total_people"]))
You can also try to filter out big rows in your query:
WHERE LENGTH(some_field) < 123
or
WHERE LENGTH(CAST(some_field AS BYTES)) < 123
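If you still need to skip individual rows rather than filter them out, one workaround (a hedged sketch, not part of the original answer) is to read the destination table in fixed-size pages and drop any page that raises Forbidden, instead of aborting the whole download:
import google.api_core.exceptions
page_size = 500  # assumption: tune this to your row sizes
rows = []
table = client.get_table(query_job.destination)
for start in range(0, table.num_rows, page_size):
    try:
        page = client.list_rows(table, start_index=start, max_results=page_size)
        rows.extend(dict(row) for row in page)
    except google.api_core.exceptions.Forbidden:
        # the offending row lives somewhere in this page; skip the whole page
        continue
You lose the other rows in a skipped page, so a smaller page_size narrows the loss at the cost of more API calls.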

Get the last modified date of tables using the BigQuery tables GET API

I am trying to get the list of tables and their last_modified_date using the BigQuery REST API.
In the BigQuery API explorer I get all the fields correctly, but when I use the API from Python code it returns 'None' for the modified date.
This is the code written for the same in Python:
from google.cloud import bigquery
client = bigquery.Client(project='temp')
datasets = list(client.list_datasets())
for dataset in datasets:
    print dataset.dataset_id
for dataset in datasets:
    for table in dataset.list_tables():
        print table.table_id
        print table.created
        print table.modified
In this code I get the created date correctly, but the modified date is 'None' for all the tables.
Not quite sure which version of the API you are using, but I suspect the latest versions no longer have the method dataset.list_tables().
Still, this is one way of getting the last modified field; see if this works for you (or gives you some idea of how to get this data):
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json('/key.json')
dataset_list = list(client.list_datasets())
for dataset_item in dataset_list:
    dataset = client.get_dataset(dataset_item.reference)
    tables_list = list(client.list_tables(dataset))
    for table_item in tables_list:
        table = client.get_table(table_item.reference)
        print "Table {} last modified: {}".format(
            table.table_id, table.modified)
If you want to get the last modified time from only one table:
from google.cloud import bigquery
def get_last_bq_update(project, dataset, table_name):
    client = bigquery.Client.from_service_account_json('/key.json')
    table_id = f"{project}.{dataset}.{table_name}"
    table = client.get_table(table_id)
    print(table.modified)
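Another option, not from the original answer, so treat it as a hedged sketch: query the dataset's __TABLES__ metadata view, which exposes a last_modified_time column in milliseconds since the epoch:
from google.cloud import bigquery
client = bigquery.Client()
# 'my_dataset' is a placeholder; replace it with the dataset you want to inspect.
query = """
    SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified
    FROM `my_dataset.__TABLES__`
"""
for row in client.query(query).result():
    print(row["table_id"], row["last_modified"])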

How do I insert my Python dictionary into my SQL Server database table?

I have a dictionary with 3 keys which correspond to field names in a SQL Server table. The values of these keys come from an Excel file, and I store this dictionary in a dataframe which I now need to insert into a SQL table. This can all be seen in the code below:
import pandas as pd
import pymssql
df=[]
fp = "file path"
data = pd.read_excel(fp,sheetname ="CRM View" )
row_date = data.loc[3, ]
row_sita = "ABZPD"
row_event = data.iloc[12, :]
df = pd.DataFrame({'date': row_date,
                   'sita': row_sita,
                   'event': row_event
                   }, index=None)
df = df[4:]
df = df.fillna("")
print(df)
My question is: how do I insert this dictionary into a SQL table now?
Also, as a side note, this code is part of a loop which needs to go through several Excel files one by one, insert the data into the dictionary, then into SQL, then delete the data in the dictionary and start again with the next Excel file.
You could try something like this:
import MySQLdb
# connect
conn = MySQLdb.connect("127.0.0.1", "username", "password", "database_name")
x = conn.cursor()
# write
x.execute('INSERT into table (row_date, sita, event) values ("%d", "%d", "%d")' % (row_date, sita, event))
# close
conn.commit()
conn.close()
You might have to change it a little based on your SQL restrictions, but it should give you a good start anyway.
For the pandas dataframe, you can use the built-in pandas method to_sql to store it in the DB. The following is the way to use it:
import urllib
import sqlalchemy as sa
params = urllib.quote_plus(
    "DRIVER={};SERVER={};DATABASE={};Trusted_Connection=True;".format(
        "{SQL Server}", "<db_server_url>", "<db_name>"))
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = sa.create_engine(conn_str)
df.to_sql(<table_name>, engine, schema=<schema_name>, if_exists="append", index=False)
For this method you will need to install the sqlalchemy package:
pip install sqlalchemy
You will also need the pyodbc package and to set up the MS SQL DSN on the machine.
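Since the question already imports pymssql, a parameterized insert is another option. This is a hedged sketch (the server credentials and the table/column names are placeholders) that writes the dataframe rows without building the SQL string by hand:
import pymssql
# Placeholders below: adapt server, credentials, table and column names.
conn = pymssql.connect(server="server", user="user", password="password", database="dbname")
cursor = conn.cursor()
# Turn the three dataframe columns into a list of plain tuples.
records = list(df[['date', 'sita', 'event']].itertuples(index=False, name=None))
cursor.executemany(
    "INSERT INTO my_table (row_date, sita, event) VALUES (%s, %s, %s)",
    records)
conn.commit()
conn.close()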

BigQuery insert dates into 'DATE' type field using Python Google Cloud library

I'm using Python 2.7 and the Google Cloud client library for Python (v0.27.0) to insert data into a BigQuery table (using table.insert_data()).
One of the fields in my table has type 'DATE'.
In my Python script I've formatted the date data as 'YYYY-MM-DD', but unfortunately the Google Cloud library returns an 'Invalid date:' error for that field.
I've tried formatting the date field in many ways (e.g. 'YYYYMMDD', a timestamp, etc.), but no luck so far...
Unfortunately the API docs (https://googlecloudplatform.github.io/google-cloud-python/latest/) don't mention anything about the required date format/type/object in Python.
This is my code:
from google.cloud import bigquery
import pandas as pd
import json
from pprint import pprint
from collections import OrderedDict
# Using a pandas dataframe 'df' as input
# Converting date field to YYYY-MM-DD format
df['DATE_VALUE_LOCAL'] = df['DATE_VALUE_LOCAL'].apply(lambda x: x.strftime('%Y-%m-%d'))
# Converting pandas dataframe to json
json_data = df.to_json(orient='records',date_format='iso')
# Instantiates a client
bigquery_client = bigquery.Client(project="xxx")
# The name for the new dataset
dataset_name = 'dataset_name'
table_name = 'table_name'
def stream_data(dataset_name, table_name, json_data):
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)
    data = json.loads(json_data, object_pairs_hook=OrderedDict)
    # Reload the table to get the schema.
    table.reload()
    errors = table.insert_data(data)
    if not errors:
        print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
    else:
        print('Errors:')
        pprint(errors)
stream_data(dataset_name, table_name, json_data)
What is the required Python date format/type/object to insert my dates into a BigQuery DATE field?
I just simulated your code here and everything worked fine. Here's what I've simulated:
import pandas as pd
import json
import os
from collections import OrderedDict
from google.cloud.bigquery import Client
d = {'ed': ['3', '5'],
     'date': ['2017-10-11', '2017-11-12']}
df = pd.DataFrame(data=d)
json_data = df.to_json(orient='records', date_format='iso')
json_data = json.loads(json_data, object_pairs_hook=OrderedDict)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/key.json'
bc = Client()
ds = bc.dataset('dataset name')
table = ds.table('table I just created')
table = bc.get_table(table)
bc.create_rows(table, json_data)
It's using version 0.28.0, but these are still the same methods as in previous versions.
You probably have a mistake in some step that converts the date to a format BQ cannot identify. Try using this script as a reference to see where the mistake might be happening in your code.
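As a side note (an assumption on my part, not something claimed in the original answer), recent versions of the client also accept native datetime.date objects for DATE columns when streaming with insert_rows, which avoids manual string formatting. A minimal sketch:
import datetime
from google.cloud import bigquery
client = bigquery.Client()
# Placeholder table ID; its schema is assumed to contain a DATE column
# named DATE_VALUE_LOCAL.
table = client.get_table("project.dataset.table")
errors = client.insert_rows(table, [{"DATE_VALUE_LOCAL": datetime.date(2017, 10, 11)}])
print(errors)  # an empty list means the row was streamed successfully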
