Not able to create database with PyHive with HDFS on docker - python

I have Hive running on Docker in a dev setup inspired by this article, like this:
version: '2'
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    container_name: namenode
    volumes:
      - ./hdfs/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=hive
    env_file:
      - ./hive/hadoop-hive.env
    ports:
      - "50070:50070"
    networks:
      - shared
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    container_name: datanode
    volumes:
      - ./hdfs/datanode:/hadoop/dfs/data
    env_file:
      - ./hive/hadoop-hive.env
    environment:
      SERVICE_PRECONDITION: "namenode:50070"
    depends_on:
      - namenode
    ports:
      - "50075:50075"
    networks:
      - shared
  hive-server:
    image: bde2020/hive:2.3.2-postgresql-metastore
    container_name: hive-server
    env_file:
      - ./hive/hadoop-hive.env
    environment:
      HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
      SERVICE_PRECONDITION: "hive-metastore:9083"
    depends_on:
      - hive-metastore
    ports:
      - "10000:10000"
    networks:
      - shared
  hive-metastore:
    image: bde2020/hive:2.3.2-postgresql-metastore
    container_name: hive-metastore
    env_file:
      - ./hive/hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    environment:
      SERVICE_PRECONDITION: "namenode:50070 datanode:50075 hive-metastore-postgresql:5432"
    depends_on:
      - hive-metastore-postgresql
    ports:
      - "9083:9083"
    networks:
      - shared
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.3.0
    container_name: hive-metastore-postgresql
    volumes:
      - ./metastore-postgresql/postgresql/data:/var/lib/postgresql/data
    depends_on:
      - datanode
    networks:
      - shared
networks:
  shared:
    name: hive-playground
    driver: bridge
Now I am trying to create a database. But instead of doing it manually on the Docker container directly, as described in the article, I am attempting to do it through a Python script.
However, when I try to create a database and then show the databases, nothing happens. My Python script looks like this:
from pyhive import hive
conn = hive.Connection(host="localhost", port=10000, username="hive")
cursor = conn.cursor()
print(cursor.execute("SHOW DATABASES"))
print(cursor.execute("CREATE DATABASE IF NOT EXISTS userdb"))
conn.close()
But this only ever prints:
None
None
Even when executed twice. I assume I have gotten something very fundamental wrong, but I cannot figure out what.
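For what it's worth, under the Python DB-API (which PyHive implements) execute() does not return the result set; rows have to be fetched explicitly, so the None output on its own does not prove the statements failed. A minimal sketch of the fetch pattern, assuming the same connection settings as above:

from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, username="hive")
cursor = conn.cursor()

# DDL statements produce no rows; execute() itself always returns None.
cursor.execute("CREATE DATABASE IF NOT EXISTS userdb")

# For queries, fetch the rows after executing.
cursor.execute("SHOW DATABASES")
print(cursor.fetchall())  # e.g. [('default',), ('userdb',)]

conn.close()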

Related

Is it possible to add new packages and run docker non-dev file?

I have been working on Superset (cloned the code from the Superset GitHub). I created a new file named db_access.py under the folder superset/superset/ and imported this file in the existing app.py. The newly created file now has the following imports:
db_access.py
import boto3
import sys
import hvac
import pymysql
import psycopg2
import os
import io
import logging
from datetime import datetime, timedelta
import traceback
from timestream_reader import Timestream_reader
import json
from pathlib import Path
from zenpy import Zenpy
app.py
import logging
import os
import json
from flask import Flask
from flask import request,jsonify
from superset.initialization import SupersetAppInitializer
import sys
from superset.db_access import Db_access
I have included hvac in requirements.txt, and when I compose the dev Docker file it works without any issue.
But when I try to compose the non-dev Docker file with the above code, it shows an error like "hvac module not found" (I tried pip install hvac and included it in the requirements file too). Are there any other places I need to customize for new imports in the non-dev setup?
docker-compose-non-dev.yml
x-superset-image: &superset-image apache/superset:${TAG:-latest-dev}
x-superset-depends-on: &superset-depends-on
  - db
  - redis
x-superset-volumes: &superset-volumes
  # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
  - ./docker:/app/docker
  - superset_home:/app/superset_home
  - ./superset:/app/superset
  - ./superset-frontend:/app/superset-frontend

version: "3.7"
services:
  redis:
    image: redis:latest
    container_name: superset_cache
    restart: unless-stopped
    volumes:
      - redis:/data
  db:
    env_file: docker/.env-non-dev
    image: postgres:10
    container_name: superset_db
    restart: unless-stopped
    volumes:
      - db_home:/var/lib/postgresql/data
  superset:
    env_file: docker/.env-non-dev
    image: *superset-image
    container_name: superset_app
    command: ["/app/docker/docker-bootstrap.sh", "app-gunicorn"]
    user: "root"
    restart: unless-stopped
    ports:
      - 8088:8088
    depends_on: *superset-depends-on
    volumes: *superset-volumes
  superset-init:
    image: *superset-image
    container_name: superset_init
    command: ["/app/docker/docker-init.sh"]
    env_file: docker/.env-non-dev
    depends_on: *superset-depends-on
    user: "root"
    volumes: *superset-volumes
    healthcheck:
      disable: true
  superset-worker:
    image: *superset-image
    container_name: superset_worker
    command: ["/app/docker/docker-bootstrap.sh", "worker"]
    env_file: docker/.env-non-dev
    restart: unless-stopped
    depends_on: *superset-depends-on
    user: "root"
    volumes: *superset-volumes
    healthcheck:
      test: ["CMD-SHELL", "celery inspect ping -A superset.tasks.celery_app:app -d celery@$$HOSTNAME"]
  superset-worker-beat:
    image: *superset-image
    container_name: superset_worker_beat
    command: ["/app/docker/docker-bootstrap.sh", "beat"]
    env_file: docker/.env-non-dev
    restart: unless-stopped
    depends_on: *superset-depends-on
    user: "root"
    volumes: *superset-volumes
    healthcheck:
      disable: true
volumes:
  superset_home:
    external: false
  db_home:
    external: false
  redis:
    external: false
docker-compose.yml (dev)
x-superset-image: &superset-image apache/superset:${TAG:-latest-dev}
x-superset-user: &superset-user root
x-superset-depends-on: &superset-depends-on
  - db
  - redis
x-superset-volumes: &superset-volumes
  # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
  - ./docker:/app/docker
  - ./superset:/app/superset
  - ./superset-frontend:/app/superset-frontend
  - superset_home:/app/superset_home
  - ./tests:/app/tests

version: "3.7"
services:
  redis:
    image: redis:latest
    container_name: superset_cache
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    volumes:
      - redis:/data
  db:
    env_file: docker/.env
    image: postgres:14
    container_name: superset_db
    restart: unless-stopped
    ports:
      - "127.0.0.1:5432:5432"
    volumes:
      - db_home:/var/lib/postgresql/data
  superset:
    env_file: docker/.env
    image: *superset-image
    container_name: superset_app
    command: ["/app/docker/docker-bootstrap.sh", "app"]
    restart: unless-stopped
    ports:
      - 8088:8088
    user: *superset-user
    depends_on: *superset-depends-on
    volumes: *superset-volumes
    environment:
      CYPRESS_CONFIG: "${CYPRESS_CONFIG}"
  superset-websocket:
    container_name: superset_websocket
    build: ./superset-websocket
    image: superset-websocket
    ports:
      - 8080:8080
    depends_on:
      - redis
    # Mount everything in superset-websocket into container and
    # then exclude node_modules and dist with bogus volume mount.
    # This is necessary because host and container need to have
    # their own, separate versions of these files. .dockerignore
    # does not seem to work when starting the service through
    # docker-compose.
    #
    # For example, node_modules may contain libs with native bindings.
    # Those bindings need to be compiled for each OS and the container
    # OS is not necessarily the same as host OS.
    volumes:
      - ./superset-websocket:/home/superset-websocket
      - /home/superset-websocket/node_modules
      - /home/superset-websocket/dist
    environment:
      - PORT=8080
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_SSL=false
  superset-init:
    image: *superset-image
    container_name: superset_init
    command: ["/app/docker/docker-init.sh"]
    env_file: docker/.env
    depends_on: *superset-depends-on
    user: *superset-user
    volumes: *superset-volumes
    environment:
      CYPRESS_CONFIG: "${CYPRESS_CONFIG}"
    healthcheck:
      disable: true
  superset-node:
    image: node:16
    container_name: superset_node
    command: ["/app/docker/docker-frontend.sh"]
    env_file: docker/.env
    depends_on: *superset-depends-on
    volumes: *superset-volumes
  superset-worker:
    image: *superset-image
    container_name: superset_worker
    command: ["/app/docker/docker-bootstrap.sh", "worker"]
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    user: *superset-user
    volumes: *superset-volumes
    healthcheck:
      test: ["CMD-SHELL", "celery inspect ping -A superset.tasks.celery_app:app -d celery@$$HOSTNAME"]
    # Bump memory limit if processing selenium / thumbnails on superset-worker
    # mem_limit: 2038m
    # mem_reservation: 128M
  superset-worker-beat:
    image: *superset-image
    container_name: superset_worker_beat
    command: ["/app/docker/docker-bootstrap.sh", "beat"]
    env_file: docker/.env
    restart: unless-stopped
    depends_on: *superset-depends-on
    user: *superset-user
    volumes: *superset-volumes
    healthcheck:
      disable: true
  superset-tests-worker:
    image: *superset-image
    container_name: superset_tests_worker
    command: ["/app/docker/docker-bootstrap.sh", "worker"]
    env_file: docker/.env
    environment:
      DATABASE_HOST: localhost
      DATABASE_DB: test
      REDIS_CELERY_DB: 2
      REDIS_RESULTS_DB: 3
      REDIS_HOST: localhost
    network_mode: host
    depends_on: *superset-depends-on
    user: *superset-user
    volumes: *superset-volumes
    healthcheck:
      test: ["CMD-SHELL", "celery inspect ping -A superset.tasks.celery_app:app -d celery@$$HOSTNAME"]
volumes:
  superset_home:
    external: false
  db_home:
    external: false
  redis:
    external: false
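As a hypothetical way to narrow this down, a small import smoke test run inside the container (for example with docker-compose exec superset python check_imports.py; the file name and module list here are illustrative) can distinguish a missing pip dependency from a module that is not on the PYTHONPATH:

import importlib

# Illustrative module list: one pip dependency, one project-local module.
for name in ("hvac", "superset.db_access"):
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ModuleNotFoundError as exc:
        print(f"{name}: NOT FOUND ({exc})")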

Reaching elasticsearch from jupyter notebook

I am running Elasticsearch from docker-compose and I want to connect Python to it and test it. The docker-compose file has the following form:
version: '3.2'
services:
  elasticsearch:
    container_name: elasticsearch
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./elasticsearch/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: elasticsearch
        target: /usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
      ELASTIC_PASSWORD: changeme
    links:
      - kibana
    networks:
      - elk
  logstash:
    container_name: logstash
    build:
      context: logstash/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./logstash/config/logstash.yml
        target: /usr/share/logstash/config/logstash.yml
        read_only: true
      - type: bind
        source: ./logstash/pipeline
        target: /usr/share/logstash/pipeline
        read_only: true
    ports:
      - "5000:5000"
      - "9600:9600"
    expose:
      - "5044"
    networks:
      - elk
    depends_on:
      - elasticsearch
  kibana:
    container_name: kibana
    build:
      context: kibana/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./kibana/config/kibana.yml
        target: /usr/share/kibana/config/kibana.yml
        read_only: true
    ports:
      - "5601:5601"
    networks:
      - elk
  app:
    container_name: app
    build: ./app
    volumes:
      - ./app/:/usr/src/app
      - /usr/src/app/node_modules/ # make node_modules empty in container
    command: npm start
    ports:
      - "3000:3000"
    networks:
      - elk
  nginx:
    container_name: nginx
    build: ./nginx
    volumes:
      - ./nginx/config:/etc/nginx/conf.d
      - ./nginx/log:/var/log/nginx
    ports:
      - "80:80"
      - "443:443"
    links:
      - app:app
    depends_on:
      - app
    networks:
      - elk
  filebeat:
    container_name: filebeat
    build: ./filebeat
    entrypoint: "filebeat -e -strict.perms=false"
    volumes:
      - ./filebeat/config/filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./nginx/log:/var/log/nginx
    networks:
      - elk
    depends_on:
      - app
      - nginx
      - logstash
      - elasticsearch
      - kibana
    links:
      - logstash
networks:
  elk:
    driver: bridge
volumes:
  elasticsearch:
In a Jupyter notebook I am using this very simple code to connect to Elasticsearch and test it:
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://elasticsearch:9200/'])
if not es.ping():
    raise ValueError("Connection failed")
However, the result is ValueError: Connection failed. Is there a problem reaching localhost from outside Docker?
I have also tried replacing elasticsearch with localhost in es = Elasticsearch(['http://elasticsearch:9200/']), but that failed as well.
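For context, the service name elasticsearch only resolves inside the elk Docker network, so a notebook running on the host has to go through the published port instead. A minimal sketch, assuming Jupyter runs on the host and the 9200:9200 mapping above (the http_auth line is an assumption, only needed if X-Pack security is enabled):

from elasticsearch import Elasticsearch

# From the host, use the published port, not the compose service name.
es = Elasticsearch(
    ["http://localhost:9200"],
    http_auth=("elastic", "changeme"),  # matches ELASTIC_PASSWORD above
)

if not es.ping():
    raise ValueError("Connection failed")
print(es.info())

Conversely, if the notebook itself runs in a container, the http://elasticsearch:9200 form works only once that container is attached to the same elk network.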

Cannot connect to redis container

I am trying to connect a Redis container to a Python app container using an environment variable. I passed the password as an environment variable, but it does not connect: if I hard-code the password it works fine; otherwise it raises redis.exceptions.ConnectionError.
version: "3.7"
services:
nginx_app:
image: nginx:latest
depends_on:
- flask_app
volumes:
- ./default.conf:/etc/nginx/conf.d/default.conf
ports:
- 8090:80
networks:
- my_project_network
flask_app:
build:
context: .
dockerfile: Dockerfile
expose:
- 5000
environment:
- PASSWORD=pass123a
depends_on:
- redis_app
networks:
- my_project_network
redis_app:
image: redis:latest
command: redis-server --requirepass ${PASSWORD} --appendonly yes
environment:
- PASSWORD=pass123a
volumes:
- ./redis-vol:/data
expose:
- 6379
networks:
- my_project_network
networks:
my_project_network:
index.py
from flask import Flask
from redis import Redis
import os

app = Flask(__name__)
redis = Redis(host='redis_app', port=6379, password=os.getenv('PASSWORD'))

@app.route('/')
def hello():
    redis.incr('hits')
    return 'Hello World! I have been seen %s times.' % redis.get('hits')

if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=True)
Update your docker-compose.yaml. The environment key is a list of strings, and docker-compose interpolates ${ENV} at parse time, taking the value of ENV from the shell or from a .env file next to the compose file, not from the service's environment section. So define the password in a .env file:
PASSWORD=pass123a
so that this line actually receives a value:
command: redis-server --requirepass ${PASSWORD} --appendonly yes
You can verify the environment variable inside your container with:
docker-compose run --rm flask_app printenv | grep PASSWORD
That should return:
PASSWORD=pass123a
docker-compose example for environment variables: Here
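To separate an authentication failure from a plain connectivity failure, a small diagnostic like the following can help (a sketch, assuming the compose file above; redis-py raises AuthenticationError, a ConnectionError subclass, on a password mismatch):

import os
from redis import Redis
from redis.exceptions import AuthenticationError, ConnectionError

r = Redis(host="redis_app", port=6379, password=os.getenv("PASSWORD"))
try:
    r.ping()
    print("Redis reachable and authenticated")
except AuthenticationError:
    # The server was started with a different (possibly empty) --requirepass value.
    print("Password mismatch between flask_app and redis-server")
except ConnectionError as exc:
    print(f"Cannot reach redis_app:6379: {exc}")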
It looks like you have missed passing the environment variable to your Redis container.
Try this:
version: "3.7"
services:
nginx_app:
image: nginx:latest
#LOCAL IMAGE
depends_on:
- flask_app
volumes:
- ./default.conf:/etc/nginx/conf.d/default.conf
ports:
- 8082:80
networks:
- my_project_network
flask_app:
build:
context: .
dockerfile: Dockerfile
expose:
- 5000
environment:
- PASSWORD=pass123a
depends_on:
- redis_app
networks:
- my_project_network
redis_app:
image: redis:latest
command: redis-server --requirepass ${PASSWORD} --appendonly yes
environment:
- PASSWORD=pass123a
volumes:
- ./redis-vol:/data
expose:
- 6379
networks:
- my_project_network
networks:
my_project_network:

Connecting MYSQL container with Python container through volume and pymysql

I need to connect a MySQL container to a Python container so I can read the database from MySQL on the Python container. I am using pymysql for this right now.
I made the containers with docker-compose; I will include the files down here.
Here is the dockerfile.yml for MySQL ('my-mysql' is my custom MySQL image):
version: "3.3"
services:
app-name:
build:
context: .
image: my-mysql
container_name: my-mysql
ports:
- '3308:3308'
environment:
MYSQL_ROOT_PASSWORD: 'root-password'
MYSQL_DATABASE: 'bags'
MYSQL_USER: 'user'
MYSQL_PASSWORD: 'password'
restart: unless-stopped
expose:
- '3308'
volumes:
data:
external:
name: data
Here is the dockerfile.yml for Python ('dataoverdracht' is my custom Python image):
version: "3.7"
services:
app-name:
build:
context: .
image: dataoverdracht:1.0.0
container_name: dataoverdracht
ports:
- 92:8080
environment:
- TARGET=$TARGET
restart: unless-stopped
volumes:
- .:/data
volumes:
data:
external:
name: data
This is the Python code I wrote for the container to connect to MySQL:
import dash_html_components as html
import pymysql
import pandas as pd
connection = pymysql.connect("172.17.0.2","user","password","bags" )
cur = connection.cursor()
cur.execute("select * from filledbags LIMIT 10")
rows = cur.fetchall()
df = pd.DataFrame(rows)
print(df)
I have files like the Dockerfiles and the requirements.txt, but I can promise you there is nothing wrong with them; the only thing that might be relevant is the EXPOSE in the Python Dockerfile, which is EXPOSE 8080.
When I try to connect them I get the error
pymysql.err.OperationalError: (1045, "Access denied for user 'user'@'172.18.0.2' (using password: YES)")
Can somebody explain to me what is wrong with my code? By the way: I did check different questions on this website, and I read some about changing my.cnf, but I looked for the bind-address and it was not included in this file for me, so that seemed to be no solution.
You need to create a network, for example my_network:
$ docker network create --driver=bridge my_network
Then use it in docker-compose files.
first_file.yml
version: "3.3"
services:
app-name:
build:
context: .
image: my-mysql
container_name: my-mysql
ports:
- "3308:3308"
environment:
MYSQL_ROOT_PASSWORD: "root-password"
MYSQL_DATABASE: "bags"
MYSQL_USER: "user"
MYSQL_PASSWORD: "password"
restart: unless-stopped
expose:
- "3308"
networks:
- my_network
networks:
my_network:
external: true
volumes:
data:
external:
name: data
second_file.yml
version: "3.7"
services:
app-name:
build:
context: .
image: dataoverdracht:1.0.0
container_name: dataoverdracht
ports:
- 92:8080
environment:
- TARGET=$TARGET
restart: unless-stopped
volumes:
- .:/data
networks:
- my_network
networks:
my_network:
external: true
volumes:
data:
external:
name: data
Then update your Python code to connect by the container name my-mysql instead of the hard-coded IP 172.17.0.2; on a user-defined network, containers resolve each other by container name.
import dash_html_components as html
import pymysql
import pandas as pd

# Connect by container name; Docker's embedded DNS resolves it on my_network.
connection = pymysql.connect(host="my-mysql", user="user", password="password", database="bags")
cur = connection.cursor()
cur.execute("select * from filledbags LIMIT 10")
rows = cur.fetchall()
df = pd.DataFrame(rows)
print(df)
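One more caveat: even with the shared network in place, MySQL takes a while to initialize on first start, so the Python container can try to connect before the server is ready. A retry loop is a common workaround; a sketch, assuming the connection settings above:

import time
import pymysql

def connect_with_retry(retries=10, delay=3):
    # MySQL needs time to initialize on first start; retry instead of failing fast.
    for attempt in range(retries):
        try:
            return pymysql.connect(host="my-mysql", user="user",
                                   password="password", database="bags")
        except pymysql.err.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

connection = connect_with_retry()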

Setting up docker-compose.yml to run celery worker and celery beat for a django project with redis as broker

I have set up a Django project using django-cookiecutter. The project scaffolding is excellent. I also opted to use Docker along with it. Now I am struggling with getting Celery v4.0.x working in the whole setup.
This is my docker-compose.yml
version: '2'

volumes:
  postgres_data_dev: {}
  postgres_backup_dev: {}

services:
  postgres:
    build: ./compose/postgres
    volumes:
      - postgres_data_dev:/var/lib/postgresql/data
      - postgres_backup_dev:/backups
    environment:
      - POSTGRES_USER=application
  django:
    build:
      context: .
      dockerfile: ./compose/django/development/Dockerfile
    depends_on:
      - postgres
    environment:
      - POSTGRES_USER=application
      - USE_DOCKER=yes
    volumes:
      - .:/app
      - /tmp/
    links:
      - postgres
      - redis
    expose:
      - "8000"
    env_file:
      - ./dev.env
    restart:
      - "on-failure"
  nginx:
    build:
      context: .
      dockerfile: ./compose/nginx/development/Dockerfile
    depends_on:
      - django
    ports:
      - "0.0.0.0:80:80"
    links:
      - django
    volumes_from:
      - django
  redis:
    image: redis:latest
    hostname: redis
  celeryworker:
    build:
      context: .
      dockerfile: ./compose/django/development/Dockerfile
    env_file: ./dev.env
    depends_on:
      - postgres
      - redis
    command: celery -A application.taskapp worker -l INFO
    restart: "on-failure"
  celerybeat:
    build:
      context: .
      dockerfile: ./compose/django/development/Dockerfile
    env_file: ./dev.env
    depends_on:
      - postgres
      - redis
    command: celery -A application.taskapp beat -l INFO
Quite honestly, I feel there is some tiny issue with the config for the celerybeat/celeryworker services. It would be nice if someone could point it out.
Update:
When I execute the command to run the containers, I get an error saying that application could not be found.
Update:
This is the new compose file, which ironed out a few errors from my previous compose file. Somewhere along the way of getting it all working, I also came across a thread where someone mentioned that the ordering of the services mattered as well. So in the new version, django is placed first.
version: '2'

volumes:
  postgres_data_dev: {}
  postgres_backup_dev: {}

services:
  django: &django
    build:
      context: .
      dockerfile: ./compose/django/development/Dockerfile
    depends_on:
      - postgres
    volumes:
      - .:/app
      - /tmp/
    links:
      - postgres
      - redis
    environment:
      - POSTGRES_USER=application
      - USE_DOCKER=yes
    expose:
      - "8000"
    env_file:
      - ./dev.env
  postgres:
    build: ./compose/postgres
    volumes:
      - postgres_data_dev:/var/lib/postgresql/data
      - postgres_backup_dev:/backups
    environment:
      - POSTGRES_USER=application
    ports:
      - "5432:5432"
  redis:
    image: redis:latest
    hostname: redis
    ports:
      - "0.0.0.0:6379:6379"
    env_file:
      - ./dev.env
  nginx:
    build:
      context: .
      dockerfile: ./compose/nginx/development/Dockerfile
    depends_on:
      - django
    ports:
      - "0.0.0.0:80:80"
    links:
      - django
    volumes_from:
      - django
  celeryworker:
    <<: *django
    depends_on:
      - redis
      - postgres
    command: "celery -A application.taskapp worker --loglevel INFO --uid taskmaster"
I am using the same tech stack. This works fine for me.
docker-compose.yml
redis:
  image: redis
  container_name: redis
  command: ["redis-server", "--port", "${REDIS_PORT}", "--appendonly", "yes", "--maxmemory", "1gb", "--maxmemory-policy", "allkeys-lru"]
  ports:
    - "${REDIS_PORT}:${REDIS_PORT}"
  volumes:
    - .:/redis.conf
  networks:
    - pipeline-net
celery-worker:
  build:
    context: ./app
  container_name: celery-worker
  entrypoint: celery
  command: -A celery_app.celery worker --loglevel=info
  volumes:
    - .:/var/www/app/worker
  links:
    - redis
  depends_on:
    - redis
  networks:
    - pipeline-net
celery-beat:
  build:
    context: ./app
  container_name: celery-beat
  entrypoint: celery
  command: -A celery_app.celery beat --loglevel=info
  volumes:
    - .:/var/www/app/beat
  links:
    - celery-worker
    - redis
  depends_on:
    - celery-worker
    - redis
  networks:
    - pipeline-net
flower:
  image: mher/flower
  container_name: flower
  environment:
    - CELERY_BROKER_URL=redis://redis:6379
    - FLOWER_PORT=8888
  ports:
    - 8888:8888
  links:
    - redis
    - celery-worker
    - celery-beat
  depends_on:
    - redis
    - celery-worker
    - celery-beat
  networks:
    - pipeline-net
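For completeness, the -A celery_app.celery argument in the worker and beat commands above points at a module attribute; a minimal sketch of what such a module might look like (the broker URL assumes the redis service name from the compose file, and the task is purely illustrative):

# celery_app.py
from celery import Celery

# "celery" here is the attribute that `-A celery_app.celery` resolves.
celery = Celery(
    "app",
    broker="redis://redis:6379/0",
    backend="redis://redis:6379/0",
)

@celery.task
def ping():
    return "pong"

The earlier "application could not be found" error is likely the same kind of lookup failing: the application.taskapp module must be importable from the worker's working directory.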
