bigquery pandas df on datalab - python

I want to use pandas to pull data from BigQuery in Datalab, but it doesn't work. I'm using Python 2.7:
import pandas as pd

query = "select * from myTable"
df_train = pd.read_gbq(project_id='my_project', query=query, dialect='standard')
I get this error:
ImportError: pandas requires google-cloud-python for Google BigQuery support: cannot import name make_exception
How can I use pandas?

The docs say you need the pandas-gbq package for this, and that authentication has to be set up somewhere. Install it with pip install pandas-gbq or conda install pandas-gbq to address the error you pasted, and make sure authentication happens either by passing a private_key argument to read_gbq or by configuring default credentials as described in the docs.
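For example, a minimal sketch of the authenticated call, assuming a service-account key file (the key path and project ID below are placeholders):
# pip install pandas-gbq
import pandas as pd

query = "SELECT * FROM myTable"

# private_key can be a path to a service-account JSON key file (or its contents);
# omit it if default credentials are already configured.
df_train = pd.read_gbq(
    query,
    project_id='my_project',
    dialect='standard',
    private_key='/path/to/service_account_key.json',
)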

Related

Python client for elasticsearch 8.5

I used to connect to an Elasticsearch 7 self-managed cluster using the following code:
from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch(['hostname'], timeout=1000, http_auth=('user_name', 'password'), use_ssl=True, verify_certs=True, connection_class=RequestsHttpConnection, scheme="https", port=9200)
After updating Elasticsearch to 8.5, most of these parameters became invalid. I need help figuring out the correct way to connect to an Elasticsearch 8.5 cluster.
In Elasticsearch 8.X, there have been significant changes in the Elasticsearch API.
Now, in Elasticsearch 8.X, the scheme and port need to be included explicitly as part of the hostname, scheme://hostname:port, e.g. https://localhost:9200.
The http_auth parameter should be replaced with basic_auth. You can have a look at all the additional available options here. So the new snippet to connect would be something like:
es = Elasticsearch(['https://hostname:port'], timeout=1000, basic_auth=('user_name', 'password'), verify_certs=True)
There are significant changes in the requests/responses when querying as well, so I would suggest giving this a read.
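For example, with the 8.x client a basic search passes the query as a keyword argument rather than a body dict (the index and field names below are just placeholders):
# Assumes the `es` client created above; "my-index" and "field_name" are placeholders.
response = es.search(index="my-index", query={"match": {"field_name": "some value"}})
for hit in response["hits"]["hits"]:
    print(hit["_source"])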
If you are migrating from 7.X to 8.X, there is also a short-term workaround that does not require any code changes: set the ELASTIC_CLIENT_APIVERSIONING=1 environment variable in your Python application.
Enable compatibility mode and upgrade Elasticsearch
Upgrade your Elasticsearch client to 7.16:
$ python -m pip install --upgrade 'elasticsearch>=7.16,<8'
If you have an existing application, enable compatibility mode by setting the ELASTIC_CLIENT_APIVERSIONING=1 environment variable. This will instruct the Elasticsearch server to accept and respond with 7.x-compatible requests and responses.
https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/migration.html#migration-compat-mode
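One way to enable it from Python itself (a sketch, assuming the variable is set before the elasticsearch package is imported; the host and credentials are placeholders):
import os

# Must be set before the client library is imported/used so that
# compatibility headers are sent with every request.
os.environ["ELASTIC_CLIENT_APIVERSIONING"] = "1"

from elasticsearch import Elasticsearch

# Existing 7.x-style connection code can stay unchanged.
es = Elasticsearch(["https://hostname:9200"], http_auth=("user_name", "password"), verify_certs=True)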

Data Pipeline using SQL and Python

I need to create a data pipeline using Python. I want to connect to MySQL from Python, read the tables into dataframes, perform pre-processing, and then load the data back into the MySQL DB. I was able to connect to the MySQL DB using mysql connector and then pre-process the dataframes. However, I'm not able to load these dataframes from Python back into MySQL. Error: ValueError: unknown type str96 python.
Please help me with methods to complete this task.
I'm new to programming. Any help will be greatly appreciated. Thanks!
It is a bug and has been fixed in pandas version 1.1.3. Upgrade the pandas package:
pip3 install --upgrade pandas
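For the load step itself, a common approach (a sketch, assuming SQLAlchemy and mysql-connector-python are installed; the connection string and table name are placeholders) is DataFrame.to_sql:
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string - replace user, password, host and database.
engine = create_engine("mysql+mysqlconnector://user:password@localhost:3306/mydb")

# Stand-in for the pre-processed dataframe from the question.
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# if_exists can be 'fail', 'replace' or 'append'.
df.to_sql("processed_table", con=engine, if_exists="replace", index=False)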

S3 file to Mysql AWS via Airflow

I've been learning how to use Apache Airflow for the last couple of months and wanted to see if anybody has experience with transferring CSV files from S3 to a MySQL database in AWS (RDS), or from my local drive to MySQL.
I managed to send everything to an S3 bucket to store them in the cloud using airflow.hooks.S3_hook and it works great. I used boto3 to do this.
Now I want to push this file to a MySQL database I created in RDS, but I have no idea how to do it. Do I need to use the MySQL hook and add my credentials there and then write a python function?
Also, it doesn't have to be S3 to MySQL; I can also try from my local drive to MySQL if that's easier.
Any help would be amazing!
Airflow has S3ToMySqlOperator which can be imported via:
from airflow.providers.mysql.transfers.s3_to_mysql import S3ToMySqlOperator
Note that you will need to install the MySQL provider.
For Airflow 1.10 series (backport version):
pip install apache-airflow-backport-providers-mysql
For Airflow >=2.0 (regular version currently in Beta):
pip install apache-airflow-providers-mysql
Example usage:
S3ToMySqlOperator(
    s3_source_key='myfile.csv',
    mysql_table='myfile_table',
    mysql_duplicate_key_handling='IGNORE',
    mysql_extra_options="""
        FIELDS TERMINATED BY ','
        IGNORE 1 LINES
    """,
    task_id='transfer_task',
    aws_conn_id='aws_conn',
    mysql_conn_id='mysql_conn',
    dag=dag,
)
Were you able to resolve the MySQLdb._exceptions.OperationalError: (2068, 'LOAD DATA LOCAL INFILE file request rejected due to restrictions on access') issue?

Connect google cloud function to an oracle database

Does anyone know how to connect a Google Cloud Function (Python) to an Oracle database? I tried importing the cx_Oracle library in a Cloud Function, but it shows an error:
Function load error: DPI-1047: Oracle Client library cannot be loaded: libclntsh.so: cannot open shared object file
Following is the main.py code:
import cx_Oracle

def import_data(request):
    request_json = request.get_json()
    if request_json and 'message' in request_json:
        con = cx_Oracle.connect("username", "password", "host:port/SID")
        print(con.version)
        con.close()
Following is requirements.txt:
# Function dependencies, for example:
# package>=version
cx_Oracle==6.0b1
It seems Google Cloud Functions does not support shared libraries (in other words, it only supports "pure Python" libraries), and cx_Oracle depends on them. Sadly I haven't been able to find a pure-Python Oracle library, so for now this is not supported.
Your best bet is to use App Engine Flexible, as it is the closest equivalent service that allows non-pure Python libraries. cx_Oracle should work with it.

BigQuery on Python

I need to run BigQuery from Python, but the Google BigQuery module can't be found:
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()
Do you guys know how to do the connection?
Looks like you do not have the BigQuery module installed; you can install it with:
pip install --upgrade google-cloud-bigquery
Ref - Installing the client library
As per the documentation, you need to install the client library for BigQuery.
One thing that you need to correct is to set up credentials to connect BigQuery with Python. You will also need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the location of your credentials file.
If your problem is on the connection to BigQuery:
client = bigquery.Client() creates the connection using your default credentials. Default credentials can be set in the terminal using gcloud auth login. More on that here: https://cloud.google.com/sdk/gcloud/reference/auth/login
If your problem is installing the library, consider running pip install --upgrade google-cloud-bigquery in the terminal -- the library docs can be found here: https://googleapis.dev/python/bigquery/latest/index.html
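For reference, the run_async_query pattern in the question comes from an older client; in current versions of google-cloud-bigquery the same job would look roughly like this (a sketch; the project, dataset, and table names are placeholders):
from google.cloud import bigquery

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key,
# or that default credentials were set up with `gcloud auth login`.
client = bigquery.Client(project='PROJECT_ID')

# Placeholder destination table ID; replace with your own project.dataset.table.
job_config = bigquery.QueryJobConfig(
    destination='PROJECT_ID.dataset.table',
    write_disposition='WRITE_TRUNCATE',
)
query_job = client.query("SELECT * FROM `PROJECT_ID.dataset.source_table`", job_config=job_config)
query_job.result()  # waits for the query to finish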
