I need to run BigQuery from Python, but the Google BigQuery module doesn't seem to exist.
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()
Does anyone know how to set up the connection?
It looks like you do not have the BigQuery module installed; you can install it with:
pip install --upgrade google-cloud-bigquery
Ref - Installing the client library
As per the documentation, you need to install the client library for BigQuery.
One thing you need to correct is to set up credentials to connect BigQuery with Python. You will also need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the location of your credentials file.
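For example, a minimal sketch (the key-file path below is a placeholder, not something from the question):
import os
from google.cloud import bigquery

# Hypothetical path to a service-account key file downloaded from the GCP console.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

# The client picks the credentials up from the environment variable.
client = bigquery.Client(project='PROJECT_ID')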
If your problem is the connection to BigQuery:
client = bigquery.Client() creates the connection using your default credentials. Default credentials can be set in the terminal with gcloud auth login. More on that here: https://cloud.google.com/sdk/gcloud/reference/auth/login
If your problem is installing the library, consider running pip install --upgrade google-cloud-bigquery in the terminal. The library docs can be found here: https://googleapis.dev/python/bigquery/latest/index.html
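Also note that the snippet in the question uses the older run_async_query API, which newer releases of google-cloud-bigquery no longer provide. A rough sketch of the equivalent with the current client API, keeping the placeholder names from the question, would be:
from google.cloud import bigquery

client = bigquery.Client(project='PROJECT_ID')

# Destination table in project.dataset.table form (placeholder names).
table_id = 'PROJECT_ID.dataset.table'

job_config = bigquery.QueryJobConfig(
    destination=table_id,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# client.query() starts the job; result() waits for it to finish.
query_job = client.query("SELECT ....", job_config=job_config)
query_job.result()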
I used to connect to an Elasticsearch 7 self-managed cluster using the following code.
from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch(['hostname'], timeout=1000, http_auth=('user_name', 'password'), use_ssl=True, verify_certs=True, connection_class=RequestsHttpConnection, scheme="https", port=9200)
After updating Elasticsearch to 8.5, most of the parameters became invalid. I need help figuring out the correct way to connect to the cluster in Elasticsearch 8.5.
In Elasticsearch 8.X, there have been significant changes in the Elasticsearch API.
Now, in Elasticsearch 8.X, the scheme and port need to be included explicitly as part of the hostname, scheme://hostname:port, e.g. https://localhost:9200.
The http_auth parameter should be replaced with basic_auth. You can have a look at all the additional available options here. So the new snippet to connect would be something like:
es = Elasticsearch(['https://hostname:port'], timeout=1000, basic_auth=('user_name', 'password'), verify_certs=True)
There are significant changes in the requests/responses while doing querying as well, so I would suggest giving this a read.
If you are migrating from 7.X to Elasticsearch 8.X, there is also a short-term workaround that requires no code changes: set the ELASTIC_CLIENT_APIVERSIONING=1 environment variable in your Python application.
Enable compatibility mode and upgrade Elasticsearch
Upgrade your Elasticsearch client to 7.16:
$ python -m pip install --upgrade 'elasticsearch>=7.16,<8'
If you have an existing application, enable compatibility mode by setting the ELASTIC_CLIENT_APIVERSIONING=1 environment variable. This will instruct the Elasticsearch server to accept and respond with 7.x-compatible requests and responses.
https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/migration.html#migration-compat-mode
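For example, from the shell, before starting the existing (unchanged) 7.x-style application:
$ export ELASTIC_CLIENT_APIVERSIONING=1
$ python my_app.py   # my_app.py is a placeholder for your existing application
This is only a stop-gap while you are still on the 7.16 client; once you move to the 8.X client, use the basic_auth style shown above.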
Situation
I have an existing Python app in Google Colab that calls the Twitter API and sends the response to Cloud Storage.
I'm trying to automate the Twitter API call in GCP, and am wondering how to install the requests library for the API call, and os for authentication.
I tried doing the following library installs in a Cloud Function:
import requests
import os
Result
That produced the following error message:
Deployment failure: Function failed on loading user code.
Do I need to install those libraries in a Cloud Function? I'm trying to understand this within the context of my Colab Python app, but it's not clear to me whether the library installs are necessary.
Thank you for any input.
When you create your Cloud Function source code, there are two files:
main.py
requirements.txt
Add packages to requirements.txt as below:
#Function dependencies, for example:
requests==2.20.0
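For completeness, a minimal sketch of what the accompanying main.py could look like (the function name, environment variable and endpoint are illustrative, not from the question). Note that os is part of the Python standard library, so it never needs to be installed or listed in requirements.txt:
import os        # standard library, no install needed
import requests  # installed via requirements.txt

def call_twitter(request):
    # Hypothetical: read a bearer token from an environment variable set on the function.
    token = os.environ.get('TWITTER_BEARER_TOKEN')
    resp = requests.get(
        'https://api.twitter.com/2/tweets/search/recent',
        params={'query': 'python'},
        headers={'Authorization': f'Bearer {token}'},
    )
    return resp.text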
Creating a new Python environment for your project might help and would be a good start for any project.
It is easy to create:
## for unix-based systems
## create a python environment
python3 -m venv venv
## activate your environment
## in linux-based systems
. ./venv/bin/activate
If you are using Google Colab, add "!" before these commands and they should work fine.
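Once the environment is activated, install the libraries your project needs into it, e.g.:
## install project dependencies into the active environment
python -m pip install requests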
I have been learning how to use Apache Airflow for the last couple of months and wanted to see if anybody has experience with transferring CSV files from S3 to a MySQL database in AWS (RDS), or from my local drive to MySQL.
I managed to send everything to an S3 bucket using airflow.hooks.S3_hook, and it works great. I used boto3 to do this.
Now I want to push this file to a MySQL database I created in RDS, but I have no idea how to do it. Do I need to use the MySQL hook, add my credentials there, and then write a Python function?
Also, it doesn't have to be S3 to MySQL; I can also try from my local drive to MySQL if that's easier.
Any help would be amazing!
Airflow has S3ToMySqlOperator which can be imported via:
from airflow.providers.mysql.transfers.s3_to_mysql import S3ToMySqlOperator
Note that you will need to install the MySQL provider.
For Airflow 1.10 series (backport version):
pip install apache-airflow-backport-providers-mysql
For Airflow >=2.0 (regular version currently in Beta):
pip install apache-airflow-providers-mysql
Example usage:
S3ToMySqlOperator(
    s3_source_key='myfile.csv',
    mysql_table='myfile_table',
    mysql_duplicate_key_handling='IGNORE',
    mysql_extra_options="""
        FIELDS TERMINATED BY ','
        IGNORE 1 LINES
    """,
    task_id='transfer_task',
    aws_conn_id='aws_conn',
    mysql_conn_id='mysql_conn',
    dag=dag,
)
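The aws_conn and mysql_conn IDs above have to exist as Airflow connections. If you have not created them yet, one way to do it (an Airflow 2.x CLI sketch; the host, credentials and schema are placeholders) is:
airflow connections add 'mysql_conn' --conn-uri 'mysql://user:password@your-rds-endpoint:3306/your_database'
airflow connections add 'aws_conn' --conn-type 'aws'
You can also create the same connections through the Airflow web UI under Admin > Connections.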
Were you able to resolve the MySQLdb._exceptions.OperationalError: (2068, 'LOAD DATA LOCAL INFILE file request rejected due to restrictions on access') issue?
I want to use pandas to pull from BigQuery in Datalab. However, something doesn't work. I'm using Python 2.7.
query = "select * from myTable"
df_train = pd.read_gbq(project_id='my_project', query=query, dialect='standard'
I get this error:
ImportError: pandas requires google-cloud-python for Google BigQuery support: cannot import name make_exception
How can I use pandas?
The docs say you need pandas-gbq to get this done, and that you need to have some auth happening somewhere. Install pandas-gbq with pip install pandas-gbq or conda install pandas-gbq to address the error you pasted, and also ensure auth is happening either by passing a private_key arg to read_gbq or by setting the defaults as described in the docs.
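A rough sketch putting both together (the key-file path is a placeholder; older pandas-gbq versions accept it via the private_key argument mentioned above):
import pandas as pd

query = "select * from myTable"
df_train = pd.read_gbq(
    query,
    project_id='my_project',
    dialect='standard',
    private_key='/path/to/service-account-key.json',  # hypothetical key file; omit to use default credentials
)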
I am basically trying to access Crunchbase data through their REST API using Python. There is a package available on GitHub that gives me the following documentation. How do I get this "package"?
The CrunchBase API provides a RESTful interface to the data found on CrunchBase. The response is in JSON format.
Register
Follow the steps below to start using the CrunchBase API:
Sign Up
Login & get API key
Browse the documentation.
Setup
pip install git+git://github.com/anglinb/python-crunchbase
Up & Running
Import Crunchbase, then initialize the Crunchbase object with your API key.
The README's git+git://github.com/anglinb/python-crunchbase URL is missing the https; install it with:
pip install git+https://github.com/anglinb/python-crunchbase.git
Update: make sure you have git installed on your system.
Add this to your requirements.txt file:
git+https://github.com/user_name/project_name.git
=========
Ideally, requirements.txt (or reqs.txt) will exist in your project's root folder. This file is where all the Python libraries' names are stored, along with precise version numbers.
Here is a great deal of information, with easy examples, related to this topic:
https://pip.readthedocs.io/en/1.1/requirements.html