Want to create python API and integrated with swagger/postman - python

Requirement: 1. I want to create python API which will help to insert data in big query table and this API will host in swagger/postman, from there user can provide input data so that it will get reflected in big query table.
Can anyone help me to find out suitable solution with code
import sqlite3 as sql
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('path/to/file.json')
project_id = 'project_id'
client = bigquery.Client(credentials= credentials,project=project_id)
def add_data(group_name, user_name):
try:
# Connecting to database
con = sql.connect('shot_database.db')
# Getting cursor
c = con.cursor()
# Adding data
job_config.use_legacy_sql = True
query_job = client.query("""
INSERT INTO `table_name` (group, user)
VALUES (%s, %s)""",job_config = job_config)
results = query_job.result() # Wait for the job to complete.
# Applying changes
con.commit()
except:
print("An error has occured")

The code you provided is a mix of SQLite and BigQuery, but it likes that you're trying to use BigQuery to insert data into a table. To insert data into a BigQuery table using Python, you can use the insert_data() method of the Client class. Here's I am adding an example of how you can use this method to insert data into a table called "mytable" in a dataset called "mydataset":
# Define the data you want to insert
data = [
{
"group": group_name,
"user": user_name
}
]
# Insert the data
table_id = "mydataset.mytable"
errors = client.insert_data(table_id, data)
if errors == []:
print("Data inserted successfully")
else:
print("Errors occurred while inserting data:")
print(json.dumps(errors, indent=2))
Then, You can create an API using Flask or Django and call the add_data method which you have defined to insert data into big query table.

Related

fastapi snowflake connection only pulling 1 record

I am trying to read data from snowflake database using FASTAPI. I was able to create the connection which is able to pull data from snowflake.
The issue which I am facing right now is that I am only getting 1 record (instead of 10 records).
I suspect I am not using correct keyword while returning the data. appreciate any help.
Here is my code :-
from fastapi import FastAPI
import snowflake.connector as sf
import configparser
username='username_value'
password='password_value'
account= 'account_value'
warehouse= 'test_wh'
database= 'test_db'
ctx=sf.connect(user=username,password=password,account=account,warehouse=warehouse,database=database)
app = FastAPI()
#app.get('/test API')
async def fetchdata():
cursor = ctx.cursor()
cursor.execute("USE WAREHOUSE test_WH ")
cursor.execute("USE DATABASE test_db")
cursor.execute("USE SCHEMA test_schema")
sql = cursor.execute ("SELECT DISTINCT ID,NAME,AGE,CITY FROM TEST_TABLE WHERE AGE > 60")
for data in sql:
return data
You use return in your inner for-loop. This will return the first row encountered.
If you want to return all rows as a list, you can probably do (I'm not familiar with the snowflake connector):
return list(data)
instead of the for-loop, or sql.fetchall().

Python: tweepy/psycopg2 not inserting data into tables

I'm streaming Twitter data from the API into a Postgres database by modeling this script. Using those exact methods, I'm able to stream the data successfully into the two tables (one containing user_id/user_name, and the other containing data). I've been able to make minor changes to extract a few other bits of information, but using these methods I'm only collecting retweets given a keyword list and I would like to collect all tweets given the list. Based on the way the original script is collecting/storing retweet user_ids and user_names, I changed the code tried to stream into a new table without making any references to retweets. Unfortunately, the result of this were two empty tables. The code ran fine otherwise, and was printing statements to the terminal, there was just no data. Why would this be? Below is my code:
import psycopg2
import tweepy
import json
import numpy as np
# Importing postgres credentials
import postgres_credentials
# Importing twitter credentials
import twitter_credentials
# Accesing twitter from the App created in my account
def autorize_twitter_api():
"""
This function gets the consumer key, consumer secret key, access token
and access token secret given by the app created in your Twitter account
and authenticate them with Tweepy.
"""
# Get access and costumer key and tokens
auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
return auth
def create_tweets_table(term_to_search):
"""
This function open a connection with an already created database and creates a new table to
store tweets related to a subject specified by the user
"""
# Connect to Twitter Database created in Postgres
conn_twitter = psycopg2.connect(dbname=postgres_credentials.dbname, user=postgres_credentials.user, password=postgres_credentials.password, host=postgres_credentials.host,
port=postgres_credentials.port)
# Create a cursor to perform database operations
cursor_twitter = conn_twitter.cursor()
# with the cursor now, create two tables, users twitter and the corresponding table according to the selected topic
cursor_twitter.execute("CREATE TABLE IF NOT EXISTS test_twitter_users (user_id VARCHAR PRIMARY KEY, user_name VARCHAR);")
query_create = "CREATE TABLE IF NOT EXISTS %s (id SERIAL, created_at_utc timestamp, tweet text NOT NULL, user_id VARCHAR, user_name VARCHAR, PRIMARY KEY(id), FOREIGN KEY(user_id) REFERENCES twitter_users(user_id));" % (
"test_tweet_text")
cursor_twitter.execute(query_create)
# Commit changes
conn_twitter.commit()
# Close cursor and the connection
cursor_twitter.close()
conn_twitter.close()
return
def store_tweets_in_table(term_to_search, created_at_utc, tweet, user_id, user_name):
"""
This function open a connection with an already created database and inserts into corresponding table
tweets related to the selected topic
"""
# Connect to Twitter Database created in Postgres
conn_twitter = psycopg2.connect(dbname=postgres_credentials.dbname, user=postgres_credentials.user, password=postgres_credentials.password, host=postgres_credentials.host,
port=postgres_credentials.port)
# Create a cursor to perform database operations
cursor_twitter = conn_twitter.cursor()
# with the cursor now, insert tweet into table
cursor_twitter.execute(
"INSERT INTO test_twitter_users (user_id, user_name) VALUES (%s, %s) ON CONFLICT(user_id) DO NOTHING;",
(user_id, user_name))
cursor_twitter.execute(
"INSERT INTO %s (created_at_utc, tweet, user_id, user_name) VALUES (%%s, %%s, %%s, %%s);" % (
'test_tweet_text'),
(created_at_utc, tweet, user_id, user_name))
# Commit changes
conn_twitter.commit()
# Close cursor and the connection
cursor_twitter.close()
conn_twitter.close()
return
class MyStreamListener(tweepy.StreamListener):
'''
def on_status(self, status):
print(status.text)
'''
def on_data(self, raw_data):
try:
global term_to_search
data = json.loads(raw_data)
# Obtain all the variables to store in each column
user_id = data['user']['id']
user_name = data['user']['name']
created_at_utc = data['created_at']
tweet = data['text']
# Store them in the corresponding table in the database
store_tweets_in_table(term_to_search, created_at_utc, tweet, user_id, user_name)
except Exception as e:
print(e)
def on_error(self, status_code):
if status_code == 420:
# returning False in on_error disconnects the stream
return False
########################################################################
while True:
if __name__ == "__main__":
# Creates the table for storing the tweets
term_to_search = ["donald trump","trump"]
create_tweets_table(term_to_search)
# Connect to the streaming twitter API
api = tweepy.API(wait_on_rate_limit_notify=True)
# Stream the tweets
try:
streamer = tweepy.Stream(auth=autorize_twitter_api(), listener=MyStreamListener(api=api),tweet_mode='extended')
streamer.filter(track=term_to_search)
except:
continue
What happen if you print the values in this function? do you have values there?
def on_data(self, raw_data):
try:
global term_to_search
data = json.loads(raw_data)
# Obtain all the variables to store in each column
user_id = data['user']['id']
user_name = data['user']['name']
created_at_utc = data['created_at']
tweet = data['text']
# Store them in the corresponding table in the database
store_tweets_in_table(term_to_search, created_at_utc, tweet, user_id, user_name)
except Exception as e:
print(e)
When you print the sql statements, can you see the inserts without data?
I discovered the issue - I was creating two new tables, but inserting data into two different tables.

BigQuery Python Client: Creating a Table from Query with a Table Description

I'm using the python client to create tables via SQL as explained in the docs (https://cloud.google.com/bigquery/docs/tables) like so:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table('your_table_id')
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
sql,
# Location must match that of the dataset(s) referenced in the query
# and of the destination table.
location='US',
job_config=job_config) # API request - starts the query
query_job.result() # Waits for the query to finish
print('Query results loaded to table {}'.format(table_ref.path))
This works well except that the client function for creating a table via SQL query uses a job_config object, and job_config receives a table_ref, not a table object.
I found this doc for creating tables with description here: https://google-cloud-python.readthedocs.io/en/stable/bigquery/usage.html, But this is for tables NOT created from queries.
Any ideas on how to create a table from query while specifying a description for that table?
Since you want to do more than only save the SELECT result to a new table the best way for you is not use a destination table in your job_config variable rather use a CREATE command
So you need to do 2 things:
Remove the following 2 lines from your code
table_ref = client.dataset(dataset_id).table('your_table_id')
job_config.destination = table_ref
Replace your SQL with this
#standardSQL
CREATE TABLE dataset_id.your_table_id
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS(
description = 'this table was created via agent #123'
) AS
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;

How to run a BigQuery query in Python

This is the query that I have been running in BigQuery that I want to run in my python script. How would I change this/ what do I have to add for it to run in Python.
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
From what I have been researching it is saying that I cant save this query as a permanent table using Python. Is that true? and if it is true is it possible to still export a temporary table?
You need to use the BigQuery Python client lib, then something like this should get you up and running:
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition= 'WRITE_TRUNCATE'
job.begin()
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html
See the current BigQuery Python client tutorial.
Here is another way using a JSON file for the service account:
>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
... print(row)
This is a good usage guide:
https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html
To simply run and write a query:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
sql,
# Location must match that of the dataset(s) referenced in the query
# and of the destination table.
location="US",
job_config=job_config,
) # API request - starts the query
query_job.result() # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))
I personally prefer querying using pandas:
# BQ authentication
import pydata_google_auth
SCOPES = [
'https://www.googleapis.com/auth/cloud-platform',
'https://www.googleapis.com/auth/drive',
]
credentials = pydata_google_auth.get_user_credentials(
SCOPES,
# Set auth_local_webserver to True to have a slightly more convienient
# authorization flow. Note, this doesn't work if you're running from a
# notebook on a remote sever, such as over SSH or with Google Colab.
auth_local_webserver=True,
)
query = "SELECT * FROM my_table"
data = pd.read_gbq(query, project_id = MY_PROJECT_ID, credentials=credentials, dialect = 'standard')
The pythonbq package is very simple to use and a great place to start. It uses python-gbq.
To get started you would need to generate a BQ json key for external app access. You can generate your key here.
Your code would look something like:
from pythonbq import pythonbq
myProject=pythonbq(
bq_key_path='path/to/bq/key.json',
project_id='myGoogleProjectID'
)
SQL_CODE="""
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
"""
output=myProject.query(sql=SQL_CODE)

Python - How to parse and save JSON to MYSQL database

As the title indicates, how does one use python to elegantly access an API and parse and save the JSON contents onto a relational database (MYSQL) for later access?
Here, I saved the data onto a pandas object. But how do I create a mysql database, save the json contents onto it, and access the contents for later use?
# Libraries
import json, requests
import pandas as pd
from pandas.io.json import json_normalize
# Set URL
url = 'https://api-v2.themuse.com/jobs'
# For loop to
for i in range(100):
data = json.loads(requests.get(
url=url,
params={'page': i}
).text)['results']
data_norm = pd.read_json(json.dumps(data))
You create your Mysql table on your server using something like Mysql Workbench CE. then in python you do this. I wasnt sure if you want to use data in for loop or data_norm so for ease of use, here some functions. insertDb() can be put in your for loop, since data will be overwriten by itself in every iteration.
import MySQLdb
def dbconnect():
try:
db = MySQLdb.connect(
host='localhost',
user='root',
passwd='password',
db='nameofdb'
)
except Exception as e:
sys.exit("Can't connect to database")
return db
def insertDb():
try:
db = dbconnect()
cursor = db.cursor()
cursor.execute("""
INSERT INTO nameoftable(nameofcolumn) \
VALUES (%s) """, (data))
cursor.close()
except Exception as e:
print e
If this is merely for storage for processing later, kind of like a cache, a varchar field is enough. If however you need to retrieve some structured jdata, JSON field is what you need.

Categories

Resources