Accessing BigQuery from Cloud DataLab using pandas - python

I have a Jupyter notebook that accesses BigQuery using pandas as the vehicle:
import pandas as pd
df = pd.io.gbq.read_gbq(query, project_id='xxxxxxx-xxxx')
This works fine from my local machine! (great, in fact!)
But when I load the same notebook to Cloud DataLab I get:
DistributionNotFound: google-api-python-client
That is rather disappointing! I assumed the module would be installed along with pandas, but apparently Google is not including it?
For a number of reasons, we would much prefer not to change the code between what we develop on our local machines and what runs in Cloud DataLab; in this case we heavily parameterize the data access...
OK, I ran:
!pip install --upgrade google-api-python-client
Now when I run the notebook I get an auth prompt that I cannot resolve since DataLab is on a remote machine:
Your browser has been opened to visit:
>>> Browser string>>>>
If your browser is on a different machine then exit and re-run this
application with the command-line parameter
--noauth_local_webserver
Is there an obvious answer to this? I don't see one.
I used the code suggested below by @Anthonios Partheniou from within the same notebook (executing it in a cell) after updating google-api-python-client in the notebook,
and I got the following traceback:
TypeError Traceback (most recent call last)
<ipython-input-3-038366843e56> in <module>()
5 scope='https://www.googleapis.com/auth/bigquery',
6 redirect_uri='urn:ietf:wg:oauth:2.0:oob')
----> 7 storage = Storage('bigquery_credentials.dat')
8 authorize_url = flow.step1_get_authorize_url()
9 print 'Go to the following link in your browser: ' + authorize_url
/usr/local/lib/python2.7/dist-packages/oauth2client/file.pyc in __init__(self, filename)
37
38 def __init__(self, filename):
---> 39 super(Storage, self).__init__(lock=threading.Lock())
40 self._filename = filename
41
TypeError: object.__init__() takes no parameters
He mentions that the notebook needs to be executed from the same folder as the credentials file, yet the only way I know of to execute a Datalab notebook is via the repo. Is there another way?
While the new Datalab module for Jupyter is a possible alternative, the ability to use the full pandas BigQuery interface unchanged on both local and Datalab instances would be hugely helpful! So I'm crossing my fingers for a solution!
Installed pip packages:
GCPDataLab 0.1.0
GCPData 0.1.0
wheel 0.29.0
tensorflow 0.6.0
protobuf 3.0.0a3
oauth2client 1.4.12
futures 3.0.3
pexpect 4.0.1
terminado 0.6
pyasn1 0.1.9
jsonschema 2.5.1
mistune 0.7.2
statsmodels 0.6.1
path.py 8.1.2
ipython 4.1.2
nose 1.3.7
MarkupSafe 0.23
py-dateutil 2.2
pyparsing 2.1.1
pickleshare 0.6
pandas 0.18.0
singledispatch 3.4.0.3
PyYAML 3.11
nbformat 4.0.1
certifi 2016.2.28
notebook 4.0.2
cycler 0.10.0
scipy 0.17.0
ipython-genutils 0.1.0
pyasn1-modules 0.0.8
functools32 3.2.3-2
ipykernel 4.3.1
pandocfilters 1.2.4
decorator 4.0.9
jupyter-core 4.1.0
rsa 3.4.2
mock 1.3.0
httplib2 0.9.2
pytz 2016.3
sympy 0.7.6
numpy 1.11.0
seaborn 0.6.0
pbr 1.8.1
backports.ssl-match-hostname 3.5.0.1
ggplot 0.6.5
simplegeneric 0.8.1
ptyprocess 0.5.1
funcsigs 0.4
scikit-learn 0.16.1
traitlets 4.2.1
jupyter-client 4.2.2
nbconvert 4.1.0
matplotlib 1.5.1
patsy 0.4.1
tornado 4.3
python-dateutil 2.5.2
Jinja2 2.8
backports-abc 0.4
brewer2mpl 1.4.1
Pygments 2.1.3

Google BigQuery authentication in pandas is normally straightforward, except when the pandas code is executed on a remote server, for example when running pandas on Datalab in the cloud. In that case, use the following code to create the credentials file that pandas needs to access Google BigQuery from Google Datalab.
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.file import Storage

# Out-of-band ("oob") redirect: Google displays the verification code in the
# browser instead of redirecting to a local web server, so this works even
# though the notebook kernel runs on a remote machine.
flow = OAuth2WebServerFlow(client_id='<Client ID from Google API Console>',
                           client_secret='<Client secret from Google API Console>',
                           scope='https://www.googleapis.com/auth/bigquery',
                           redirect_uri='urn:ietf:wg:oauth:2.0:oob')
# pandas looks for this file to reuse the saved credentials.
storage = Storage('bigquery_credentials.dat')
authorize_url = flow.step1_get_authorize_url()
print('Go to the following link in your browser: ' + authorize_url)
# Python 2 (as in the Datalab traceback); on Python 3 use input() instead.
code = raw_input('Enter verification code: ')
# Exchange the one-time code for credentials and write them to the file.
credentials = flow.step2_exchange(code)
storage.put(credentials)
Once you complete the process, I don't expect you will see the error again (as long as the notebook is in the same folder as the newly created 'bigquery_credentials.dat' file).
You also need to install the google-api-python-client Python package, as pandas requires it for Google BigQuery support. You can run either of the following in a notebook to install it.
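As far as I can tell, 'bigquery_credentials.dat' is opened as a relative path, so it is resolved against the kernel's current working directory, which in Jupyter is normally the folder of the running notebook. The sketch below is a hypothetical helper (find_credentials is not part of pandas or oauth2client) that you could run in a cell to see where the file is expected and whether it exists; it assumes the default filename used above and runs on Python 2 or 3.

```python
import os


def find_credentials(filename="bigquery_credentials.dat", directory=None):
    """Return (absolute_path, exists) for the credentials file.

    Hypothetical helper. Defaults to the current working directory, which is
    where a relative path such as Storage('bigquery_credentials.dat') resolves.
    """
    directory = os.getcwd() if directory is None else directory
    path = os.path.abspath(os.path.join(directory, filename))
    return path, os.path.isfile(path)


if __name__ == "__main__":
    path, exists = find_credentials()
    print("Looking for: " + path)
    print("Found" if exists else "Not found; run the OAuth flow first")
```

If the reported path is not the folder you expected, the kernel is running somewhere else, and the credentials file would need to be created (or copied) there.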
Either
!pip install google-api-python-client --no-deps
!pip install uritemplate --no-deps
!pip install simplejson --no-deps
or
%%bash
pip install google-api-python-client --no-deps
pip install uritemplate --no-deps
pip install simplejson --no-deps
The --no-deps option is needed so that you don't accidentally update a Python package that Datalab installs by default (which could break other parts of Datalab).
Note: With pandas 0.19.0 (not released yet), it will be much easier to use pandas in Google Cloud Datalab. See Pull Request #13608
Note: You also have the option to use the (new) Google Datalab module inside Jupyter (that way the code will also work in Google Datalab in the cloud). See the following related Stack Overflow answer:
How do I use gcp package from outside of google datalabs?

Related

Python executable created with PyInstaller isn't working

I tried to compile a Python project into a single executable with PyInstaller. Does anyone have an idea why it works when I run the Python project, but fails when I run the executable?
My dependencies are listed below (package, installed version, latest available version):
Pillow 9.3.0 9.3.0
altgraph 0.17.3 0.17.3
certifi 2022.9.24 2022.9.24
charset-normalizer 2.1.1 3.0.1
contourpy 1.0.6 1.0.6
cycler 0.11.0 0.11.0
fonttools 4.38.0 4.38.0
idna 3.4 3.4
kiwisolver 1.4.4 1.4.4
matplotlib 3.6.2 3.6.2
numpy 1.23.5 1.23.5
nvidia-cublas-cu11 11.10.3.66 11.11.3.6
nvidia-cuda-nvrtc-cu11 11.7.99 11.8.89
nvidia-cuda-runtime-cu11 11.7.99 11.8.89
nvidia-cudnn-cu11 8.5.0.96 8.6.0.163
opencv-python 4.6.0.66 4.6.0.66
packaging 21.3 21.3
pip 21.3.1 22.3.1
pyinstaller 5.6.2 5.6.2
pyinstaller-hooks-contrib 2022.13 2022.13
pyparsing 3.0.9 3.0.9
python-dateutil 2.8.2 2.8.2
requests 2.28.1 2.28.1
setuptools 60.2.0 65.6.3
six 1.16.0 1.16.0
torch 1.13.0 1.13.0
torchvision 0.14.0 0.14.0
typing-extensions 4.4.0 4.4.0
urllib3 1.26.13 1.26.13
wheel 0.38.4 0.38.4
I ran it and got these errors:
[19507] WARNING: file already exists but should not: /tmp/_MEIF8RVEq/torch/_C.cpython-39-x86_64-linux-gnu.so
[19507] WARNING: file already exists but should not: /tmp/_MEIF8RVEq/torch/_C_flatbuffer.cpython-39-x86_64-linux-gnu.so
torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
torch/_jit_internal.py:839: UserWarning: Unable to retrieve source for @torch.jit._overload function: <function _DenseLayer.forward at 0x7fc0c005b8b0>.
warnings.warn(
torch/_jit_internal.py:839: UserWarning: Unable to retrieve source for @torch.jit._overload function: <function _DenseLayer.forward at 0x7fc0c006fb80>.
warnings.warn(
torch/serialization.py:834: UserWarning: Couldn't retrieve source code for container of type RRDBNet. It won't be checked for correctness upon loading.
warnings.warn("Couldn't retrieve source code for container of "
torch/serialization.py:868: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
torch/serialization.py:868: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
torch/serialization.py:834: UserWarning: Couldn't retrieve source code for container of type RRDB. It won't be checked for correctness upon loading.
warnings.warn("Couldn't retrieve source code for container of "
torch/serialization.py:834: UserWarning: Couldn't retrieve source code for container of type ResidualDenseBlock. It won't be checked for correctness upon loading.
warnings.warn("Couldn't retrieve source code for container of "
torch/serialization.py:868: SourceChangeWarning: source code of class 'torch.nn.modules.activation.LeakyReLU' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
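One difference between running from source and running a one-file executable is where relative file paths point: the bundle unpacks itself into a temporary folder (the /tmp/_MEI... directory in the log) and PyInstaller records that folder in sys._MEIPASS. If the script loads a model file with torch.load (the torch/serialization.py warnings suggest it does), a common pattern is to resolve that file through a helper like the one below. This is a sketch, not a diagnosis of this specific failure; resource_path and weights.pth are hypothetical names, and the file itself still has to be bundled, for example with PyInstaller's --add-data option.

```python
import os
import sys


def resource_path(relative_path):
    """Resolve a data file both from source and from a PyInstaller bundle.

    Hypothetical helper. A one-file PyInstaller executable extracts itself
    to a temporary directory and stores that location in sys._MEIPASS.
    When running from source the attribute does not exist, so this falls
    back to the current working directory.
    """
    base_dir = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base_dir, relative_path)


if __name__ == "__main__":
    # "weights.pth" is a placeholder for whatever file the script loads.
    print(resource_path("weights.pth"))
```

With this, the same torch.load(resource_path("weights.pth")) call would work both from the project folder and from the extracted bundle, provided the file was added to the build.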

Gdata Python Module - HTTP(s) Proxy via Environment Variable

I am using gcalcli 4.3.0
https://github.com/insanum/gcalcli
using this docker image:
https://hub.docker.com/r/devopstestlab/gcalcli
I would like to use Fiddler (an HTTP(S) proxy) to see what gcalcli sends to and receives from Google.
According to their documentation, I can use environment variables to set up a proxy:
http_proxy
https_proxy
proxy-username or proxy_username
proxy-password or proxy_password
But this doesn't work at all.
The proxy settings are ignored and the regular connection route is used
(no proxy error or anything similar).
By contrast, running the command (all inside the Docker container)
python -m pip install --upgrade pip
with the same proxy environment variables respects those values and works exactly as expected.
pip list output:
Package Version
------------------------ ---------
cachetools 5.0.0
certifi 2021.10.8
charset-normalizer 2.0.12
gcalcli 4.3.0
google-api-core 2.5.0
google-api-python-client 2.39.0
google-auth 2.6.0
google-auth-httplib2 0.1.0
googleapis-common-protos 1.55.0
httplib2 0.20.4
idna 3.3
oauth2client 4.1.3
parsedatetime 2.6
pip 19.3.1
protobuf 4.0.0rc2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyparsing 3.0.7
python-dateutil 2.8.2
requests 2.27.1
rsa 4.8
setuptools 41.6.0
six 1.16.0
uritemplate 4.1.1
urllib3 1.26.8
vobject 0.9.6.1
wheel 0.33.6
Upgrading all packages with pip seems to have no effect.
Since I know very little about Python and its ecosystem, here is what I found while researching; none of it offers a solution:
Can't connect to google directory-api using proxy with httplib
https://code.google.com/archive/p/httplib2/issues/38
httplib2.ServerNotFoundError: Unable to find the server at accounts.google.com
https://groups.google.com/g/google-api-python-client/c/tXd-xd5rEqg
https://groups.google.com/g/httplib2-commit/c/t77mjFQOdVE
Google API + proxy + httplib2
Setting HTTP-proxy for Google Analytics Reporting API
Python3 BigQuery or Google Cloud Python through HTTP Proxy
The docker image is using Python 3.8.0
Is there any way, or any instructions, to make the HTTPS_PROXY/https_proxy environment variables work?
Thank you for the answers.
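google-api-python-client sends its requests through httplib2 rather than through pip's own HTTP stack, so pip honoring the variables does not prove the client sees them the same way. A first sanity check is whether the variables are visible to Python inside the container at all (for example when they are passed with docker run -e). The sketch below uses only the standard library; proxy_report is a hypothetical helper, and urllib's view of the variables is only an approximation of what httplib2 (which has its own proxy_info_from_environment() function) will do.

```python
import os
import urllib.request

# Variable names to show verbatim; proxy-username/proxy_password are not
# *_proxy variables, so urllib ignores them either way.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY",
              "no_proxy", "NO_PROXY")


def proxy_report():
    """Return (raw proxy variables, proxies derived by the stdlib)."""
    raw = {name: os.environ[name] for name in PROXY_VARS if name in os.environ}
    # getproxies_environment() scans every *_proxy variable, case-insensitively,
    # preferring the lowercase spelling when both are set.
    derived = urllib.request.getproxies_environment()
    return raw, derived


if __name__ == "__main__":
    raw, derived = proxy_report()
    print("Raw environment:", raw)
    print("Proxies seen by urllib:", derived)
```

Run it with the same environment that gcalcli gets. If "https" is missing from the second line, the variable never reached the process; if it is present, the problem is more likely in how the HTTP client inside gcalcli is constructed.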

I keep getting ModuleNotFoundError where is the issue?

I'm trying to install and use the google-images-download repo, both through pip install google-images-download and through pip install git+https://github.com/Joeclinton1/google-images-download.git
I've also tried installing it as superuser. In PyCharm, when I view the packages I do see it, but when I try this code:
from google_images_download import google_images_download

# instantiate the class
response = google_images_download.googleimagesdownload()
arguments = {"keywords": "aeroplane, school bus, dog in front of house",
             "limit": 10, "print_urls": False}
paths = response.download(arguments)
# print complete paths to the downloaded images
print(paths)
it consistently gives this error:
Traceback (most recent call last):
File "/Users/*x*/Desktop/SchoolPython/PythonUVA/Webscrape.py", line 1, in <module>
from google_images_download import google_images_download
ModuleNotFoundError: No module named 'google_images_download'
I think it might not be looking in the right file path or library, but every other repo I tried previously did work.
Any help is greatly appreciated.
Edit: versions
(3.9UVA) MacBook-Pro-van-Flavia:Webscrape.py flavia$ which pip
/Users/flavia/PycharmProjects/3.9UVA/bin/pip
(3.9UVA) MacBook-Pro-van-Flavia:Webscrape.py flavia$ which python
/Users/flavia/PycharmProjects/3.9UVA/bin/python
(3.9UVA) MacBook-Pro-van-Flavia:Webscrape.py flavia$ pip list
Package Version
---------------------- -----------
async-generator 1.10
attrs 21.4.0
certifi 2022.5.18.1
cffi 1.15.0
charset-normalizer 2.0.12
cryptography 37.0.2
google-images-download 2.8.0
h11 0.13.0
idna 3.3
outcome 1.1.0
Pillow 9.1.1
pip 21.3.1
pycparser 2.21
pyOpenSSL 22.0.0
PySocks 1.7.1
requests 2.27.1
selenium 4.2.0
setuptools 60.2.0
sniffio 1.2.0
sortedcontainers 2.4.0
trio 0.20.0
trio-websocket 0.9.2
urllib3 1.26.9
wheel 0.37.1
wsproto 1.1.0
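A ModuleNotFoundError while pip list shows the package often means the script is being run by a different interpreter than the one pip installed into. A small diagnostic sketch (locate_module is a hypothetical helper, not a library function) that prints which interpreter is running and where, if anywhere, it would import the package from:

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Which interpreter is actually executing this script?
    print("Interpreter:", sys.executable)
    print("google_images_download:", locate_module("google_images_download"))
```

Run it from PyCharm and compare the Interpreter line with the which python output above (/Users/flavia/PycharmProjects/3.9UVA/bin/python). If they differ, the PyCharm run configuration is pointing at another interpreter; if they match and the module is still None, the installed distribution may not contain a package by that import name.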

Why can't the import be resolved?

I've seen several answers to this question, but none of the solutions have worked for my particular situation. I'm trying to get started building an API with Flask. When I try to import Flask-RESTful, I get an error in VS Code. For context, I am using Windows 11. Here are the first two lines of my .py file:
from flask import Flask
from flask_restful import Resource, Api, reqparse
The error I get reads as:
Import "flask_restful" could not be resolved Pylance(reportMissingImports)
To add more context, I've made sure the interpreter path is set by using Ctrl+Shift+P to open the Command Palette and selecting the correct (and only) Python interpreter for the project inside my virtual environment. When I run pip list, I get this output:
(api) C:\Users\<Username>\OneDrive\Documents\PythonProjects\api>pip list
Package Version
----------------------- ---------
aiohttp 3.8.1
aiosignal 1.2.0
alembic 1.8.0
aniso8601 9.0.1
anyio 3.6.1
async-timeout 4.0.2
attrs 21.4.0
bleach 5.0.1
certifi 2022.6.15
charset-normalizer 2.1.0
click 8.1.3
click-log 0.4.0
colorama 0.4.5
deprecation 2.1.0
docutils 0.19
dotty-dict 1.3.0
Flask 2.1.2
Flask-Migrate 3.1.0
Flask-RESTful 0.3.9
Flask-SQLAlchemy 2.5.1
flask-swagger 0.2.14
frozenlist 1.3.0
gitdb 4.0.9
GitPython 3.1.27
gotrue 0.5.0
greenlet 1.1.2
h11 0.12.0
httpcore 0.14.7
httpx 0.21.3
idna 3.3
importlib-metadata 4.12.0
invoke 1.7.1
itsdangerous 2.1.2
Jinja2 3.1.2
keyring 23.6.0
Mako 1.2.1
MarkupSafe 2.1.1
multidict 6.0.2
packaging 21.3
pip 22.0.4
pkginfo 1.8.3
postgrest-py 0.10.2
psycopg2 2.9.3
pydantic 1.9.1
Pygments 2.12.0
pyparsing 3.0.9
python-dateutil 2.8.2
python-gitlab 3.6.0
python-semantic-release 7.28.1
pytz 2022.1
pywin32-ctypes 0.2.0
PyYAML 6.0
readme-renderer 35.0
realtime 0.0.4
requests 2.28.1
requests-toolbelt 0.9.1
rfc3986 1.5.0
semver 2.13.0
setuptools 58.1.0
setuptools-scm 7.0.4
six 1.16.0
smmap 5.0.0
sniffio 1.2.0
SQLAlchemy 1.4.39
storage3 0.3.4
supabase 0.5.8
supabase-client 0.2.4
tomli 2.0.1
tomlkit 0.10.2
tqdm 4.64.0
twine 3.8.0
typing_extensions 4.3.0
urllib3 1.26.10
webencodings 0.5.1
websockets 9.1
Werkzeug 2.1.2
wheel 0.37.1
yarl 1.7.2
zipp 3.8.0
Why would the import from flask work, but not the one from flask_restful? I can see both in the Lib\site-packages folder in my project directory, and the output of pip list outside the virtual environment is different, which suggests to me that there isn't an issue with the paths or directories.
EDIT: I forgot to mention that when I run the code using Ctrl + Alt + N, I get this output:
Traceback (most recent call last):
File "c:\Users\<Username>\OneDrive\Documents\PythonProjects\api\api.py", line 3, in <module>
from flask_restful import Resource, Api, reqparse
ModuleNotFoundError: No module named 'flask_restful'
Again, no errors with importing flask, only with flask_restful.
Any help with this will be greatly appreciated! Thank you in advance for your time. I'm happy to provide more info if needed. Thanks.
EDIT: I have updated pip and attempted to simply run the program inside the command prompt. This is what I got. I'm still getting the import error inside VS Code, though. I am going to see if using a different version of Python makes a difference. Thanks everyone for all of your help so far, I appreciate it!
EDIT: Okay, it seems like the issue is a little closer to being solved. So, I updated pip. I retried setting the interpreter path and, as some of you mentioned, it turns out that I'd been doing it wrong. I had to do Ctrl + Shift + P >> Python: Select Interpreter >> Enter interpreter path and select the correct path that way. I did this by going into the project directory, opening the Scripts folder, and selecting python.exe.
That solved the issue with Pylance. I no longer see an error in the editor when working on the project. However, the interpreter does not show in the bottom-right corner of the window. That may just be a bug; I assume I can look through the issues on GitHub or open a new one some other time.
When I run the code with Ctrl + Alt + N, I get a ModuleNotFoundError for flask_restful again. But when I run set flask_app=api.py followed by flask run in the terminal, the page in the browser has changed from a white background to a black one and displays the message it is intended to display (a simple "Hello, World" as a test).
Should I just keep going until I run into another issue? I also tried python -m api and that worked as well. Should I just ignore the VS Code output window? Also, sorry about the late replies. I appreciate everyone's help and patience.
Use Ctrl+Shift+P, search for and select Python: Select Interpreter (or click directly on the Python version displayed in the lower-right corner), and select the correct interpreter.
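Ctrl + Alt + N is, as far as I know, the default shortcut of the Code Runner extension rather than of the Python extension, and it may launch whichever python it is configured with (often the first one on PATH) instead of the interpreter chosen through Python: Select Interpreter. That would be consistent with Pylance, flask run, and python -m api all working while Ctrl + Alt + N fails. To check, a sketch like the following (in_virtualenv is a hypothetical helper) can be run both with Ctrl + Alt + N and from the activated terminal, comparing the two outputs:

```python
import sys


def in_virtualenv():
    """True when running inside a venv or virtualenv environment."""
    # venv points sys.prefix at the environment while sys.base_prefix keeps
    # the base installation; older virtualenv versions set sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

If the Ctrl + Alt + N run reports a different interpreter (or no virtual environment), the output window error can be ignored for this project, or the extension can be configured to use the environment's python.exe.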

ipywidgets.embed missing dependencies? Key error when run in venv

I am writing a script that asks the Google API for the latitudes and longitudes of a list of addresses read from a CSV file and outputs an HTML file with the Google Maps widget embedded. I also hoped to run PyInstaller to turn it into a .exe.
Running the code in my original conda environment works fine; however, the .exe that PyInstaller creates is massive for such a small script (over 300 MB). So I created a new virtual environment, installed what I believe to be the bare minimum of packages, and rewrote the code to use as few packages as possible, which brought the currently working portion down considerably, to just over 10 MB. (No numpy or pandas for me... ah well.)
The code again works fine up until the final step:
from ipywidgets.embed import embed_minimal_html
embed_minimal_html("exporttest.html", None)
The above line should take any widgets, in particular the figure created from
fig = gmaps.figure(layout=figure_layout)
markers = gmaps.marker_layer(coordinates)
fig.add_layer(markers)
fig
Running the current modified version in my original conda environment, with all of my usual packages installed, works as expected without errors. In the virtual environment, however, those lines raise the following KeyError:
KeyError Traceback (most recent call last)
c:\programdata\anaconda3\envs\synod_environ\lib\sre_parse.py in
parse_template(source, pattern)
1020 try:
-> 1021 this = chr(ESCAPES[this][1])
1022 except KeyError:
KeyError: '\\u'
During handling of the above exception, another exception occurred:
error Traceback (most recent call last)
<ipython-input-5-3359941239ab> in <module>
1 from ipywidgets.embed import embed_minimal_html
2
----> 3 embed_minimal_html("exporttest.html", None)
...
error: bad escape \u at position 0
(For clarification, the KeyError shows two backslashes before the u; I had some trouble getting this to post correctly.)
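The KeyError '\\u' followed by "bad escape \u at position 0" comes from the parser for re.sub replacement templates (sre_parse.parse_template in the traceback). Since Python 3.7, an unknown escape of an ASCII letter such as \u in a string replacement raises an error, so any text containing literal \u sequences (JSON, for instance) that is passed as the repl string triggers it. Whether ipywidgets.embed hits exactly this path is an assumption on my part, but the traceback matches. The sketch below reproduces the message and shows the usual workaround of passing a function as repl, whose return value is inserted verbatim; PLACEHOLDER and the helper names are hypothetical.

```python
import re

# The characters backslash, 'u', '0', '0', 'e', '9', as found in JSON text.
replacement = "\\u00e9"


def substitute_template(text, repl):
    """re.sub with a string repl: backslash escapes in repl are interpreted."""
    return re.sub("PLACEHOLDER", repl, text)


def substitute_literal(text, repl):
    """re.sub with a function repl: the returned string is inserted verbatim."""
    return re.sub("PLACEHOLDER", lambda match: repl, text)


if __name__ == "__main__":
    try:
        substitute_template("<script>PLACEHOLDER</script>", replacement)
    except re.error as exc:
        print("re.error:", exc)  # bad escape \u at position 0 (Python 3.7+)
    print(substitute_literal("<script>PLACEHOLDER</script>", replacement))
```

This also suggests why a version change can make the error appear or disappear: it depends on whether the installed library builds its HTML with a string replacement that can contain such escapes.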
As the code runs correctly in the one environment but not the other, I can only assume that I'm missing a package somewhere that ipywidgets requires, but running pip check doesn't notify me of anything missing.
pip list returns the following packages:
altgraph 0.16.1
backcall 0.1.0
bleach 3.0.2
certifi 2018.10.15
chardet 3.0.4
colorama 0.4.0
decorator 4.3.0
defusedxml 0.5.0
entrypoints 0.2.3
future 0.17.1
geojson 2.4.1
gmaps 0.8.2
idna 2.7
ipykernel 5.1.0
ipython 7.1.1
ipython-genutils 0.2.0
ipywidgets 7.4.2
jedi 0.13.1
Jinja2 2.10
jsonschema 2.6.0
jupyter 1.0.0
jupyter-client 5.2.3
jupyter-console 6.0.0
jupyter-core 4.4.0
macholib 1.11
MarkupSafe 1.0
mistune 0.8.4
nbconvert 5.4.0
nbformat 4.4.0
notebook 5.7.0
pandocfilters 1.4.2
parso 0.3.1
pefile 2018.8.8
pickleshare 0.7.5
pip 10.0.1
prometheus-client 0.4.2
prompt-toolkit 2.0.7
Pygments 2.2.0
PyInstaller 3.4
python-dateutil 2.7.5
pywin32-ctypes 0.2.0
pywinpty 0.5.4
pyzmq 17.1.2
qtconsole 4.4.2
requests 2.20.0
Send2Trash 1.5.0
setuptools 40.4.3
six 1.11.0
terminado 0.8.1
testpath 0.4.2
tornado 5.1.1
traitlets 4.3.2
urllib3 1.24
wcwidth 0.1.7
webencodings 0.5.1
wheel 0.32.2
widgetsnbextension 3.4.2
wincertstore 0.2
Any thoughts on how to further identify what went wrong, which package might be missing, or how to fix the issue? Or alternative ways to save the gmaps output?
Fiddling with it and comparing one environment to the other, I found that my virtual environment had ipywidgets 7.4.2 while the base environment had ipywidgets 7.2.1. Downgrading ipywidgets fixed the issue I was having.
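The fix was found by comparing package versions between the two environments. A rough sketch that automates that comparison from saved pip list or pip freeze output (the function names are hypothetical, and other line formats, such as name @ URL entries from pip freeze, are not handled properly):

```python
import re


def normalize(name):
    """Normalize a project name as PEP 503 does (case and runs of -, _, .)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pip_list(text):
    """Parse `pip list` or `pip freeze` output into {normalized name: version}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two header lines printed by `pip list`.
        if not line or line.startswith(("Package ", "---")):
            continue
        if "==" in line:                  # pip freeze style: name==version
            name, version = line.split("==", 1)
        else:                             # pip list style: name   version
            parts = line.split()
            if len(parts) < 2:
                continue
            name, version = parts[0], parts[1]
        packages[normalize(name.strip())] = version.strip()
    return packages


def version_differences(env_a, env_b):
    """Return {name: (version_in_a, version_in_b)} for every package whose
    version differs; a package missing from one side shows up as None."""
    a, b = parse_pip_list(env_a), parse_pip_list(env_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

With each environment activated, save the output (for example pip list > venv.txt), read both files, and pass the two strings in; for the environments in this question, ipywidgets would show up as ('7.4.2', '7.2.1').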
