Im new to pyflink. Im tryig to write a python program to read data from kafka topic and prints data to stdout. I followed the link Flink Python Datastream API Kafka Producer Sink Serializaion. But i keep seeing NoSuchMethodError due to version mismatch. I have added the flink-sql-kafka-connector available at https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.13.0/flink-sql-connector-kafka_2.11-1.13.0.jar. Can someone help me in with a proper example to do this? Following is my code
import json
import os
from pyflink.common import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer
from pyflink.common.typeinfo import Types
def my_map(obj):
json_obj = json.loads(json.loads(obj))
return json.dumps(json_obj["name"])
def kafkaread():
env = StreamExecutionEnvironment.get_execution_environment()
env.add_jars("file:///automation/flink/flink-sql-connector-kafka_2.11-1.10.1.jar")
deserialization_schema = SimpleStringSchema()
kafkaSource = FlinkKafkaConsumer(
topics='test',
deserialization_schema=deserialization_schema,
properties={'bootstrap.servers': '10.234.175.22:9092', 'group.id': 'test'}
)
ds = env.add_source(kafkaSource).print()
env.execute('kafkaread')
if __name__ == '__main__':
kafkaread()
But python doesnt recognise the jar file and throws the following error.
Traceback (most recent call last):
File "flinkKafka.py", line 31, in <module>
kafkaread()
File "flinkKafka.py", line 20, in kafkaread
kafkaSource = FlinkKafkaConsumer(
File "/automation/flink/venv/lib/python3.8/site-packages/pyflink/datastream/connectors.py", line 186, in __init__
j_flink_kafka_consumer = _get_kafka_consumer(topics, properties, deserialization_schema,
File "/automation/flink/venv/lib/python3.8/site-packages/pyflink/datastream/connectors.py", line 336, in _get_kafka_consumer
j_flink_kafka_consumer = j_consumer_clz(topics,
File "/automation/flink/venv/lib/python3.8/site-packages/pyflink/util/exceptions.py", line 185, in wrapped_call
raise TypeError(
TypeError: Could not found the Java class 'org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer'. The Java dependencies could be specified via command line argument '--jarfile' or the config option 'pipeline.jars'
What is the correct location to add the jar file?
I see that you downloaded flink-sql-connector-kafka_2.11-1.13.0.jar, but the code loades flink-sql-connector-kafka_2.11-1.10.1.jar.
May be you can have a check
just need to check the path to flink-sql-connector jar
You should add jar file of flink-sql-connector-kafka, it depends on your pyflink and scala version. If versions are true, check your path in add_jars function if the jar package is here.
Related
I'm new to pact and I've understood the concept but having hard time understanding and implementing the code.
Here I'm trying to do a simple pact for get_users from reqres.in.
I believe the first (pact ... code does the mock provider part and I compare that using the pact.json file.
import os
import requests
import pytest
from pact import Consumer, Provider, Format
import unittest
import json
pact = Consumer('Consumer').has_pact_with(Provider('Provider'), port=1234, host_name='localhost')
pact.start_service()
CURR_FILE_PATH = os.path.dirname(os.path.abspath(__file__))
PACT_DIR = os.path.join(CURR_FILE_PATH, '')
PACT_FILE = os.path.join(PACT_DIR, 'pact.json')
#defining class
class GetUsers(unittest.TestCase):
def test_get_board(self):
with open(os.path.join(PACT_DIR, PACT_FILE), 'rb') as path_file:
pact_file_json = json.load(path_file)
print('pact_json')
(pact
.given('Request to send message')
.upon_receiving('a request for response or send message')
.with_request(method = 'GET', path = '/api/users?page=2')
.will_respond_with(status = 200, body = pact_file_json))
with pact:
result = requests.get('http://reqres.in/api/users?page=2')
print('actual response')
self.assertEqual(pact_file_json, result.json())
pact.verify()
ge = GetUsers()
print(ge.test_get_board())
however, when I run the code, I get the following error
the data do not match but I verified it in another code.
Traceback (most recent call last):
File "C:\Users\Desktop\Pact\contract_test.py", line 41, in <module>
print(ge.test_get_board())
File "C:\Users\Desktop\Pact\contract_test.py", line 34, in test_get_board
print(data)
File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pact\pact.py", line 370, in __exit__
self.verify()
File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pact\pact.py", line 269, in verify
assert resp.status_code == 200, resp.text
AssertionError: Actual interactions do not match expected interactions for mock MockService.
says missing requests
Missing requests:
GET https://reqres.in/api/users?page=2
I'm not sure what you're trying to attempt here - why are you opening up a pact.json file in this context? The consumer pact package will automatically serialise any contracts if successful, and you don't seem to be reading the file in anywhere.
The second problem seems to be that some code somewhere, is issuing (or requesting?) the query string page=2 but the code you've shown doesn't make use of any query strings.
Could you have another mock service running somewhere?
Examples
https://docs.pactflow.io/docs/examples/python/consumer/readme
https://github.com/pact-foundation/pact-python/blob/master/examples/e2e/tests/consumer/test_user_consumer.py
I tried to make a Twitter bot from https://github.com/joaquinlpereyra/twitterImgBot
The installations went well and successfully but the problem came when I tried to execute it.
Below is the code for my config in the config.py.
Keys and Tokens removed with the API key replaced with random numbers because that's the problem I'm trying to address.
import os
import configparser
import tweepy
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
# read configs from file
config = configparser.ConfigParser()
config.read(dname + '/settings')
twitter_config = config['Twitter']
api_key = twitter_config['2931928391273123']
secret_key = twitter_config['REMOVED']
token = twitter_config['REMOVED']
secret_token = twitter_config['REMOVED']
The error I'm getting when executing it in cmd is:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\Administrator>cd C:\Documents and Settings\Administrat
or\Desktop\twitterImgBot-master
C:\Documents and Settings\Administrator\Desktop\twitterImgBot- master>python twitterbot.py
Traceback (most recent call last):
File "twitterbot.py", line 3, in <module>
from settings import config
File "C:\Documents and Settings\Administrator\Desktop\twitterImgBot-master\settings\config.py", line 13, in <module>
api_key = twitter_config['2931928391273123']
File "C:\Python34\lib\configparser.py", line 1203, in __getitem__
raise KeyError(key)
KeyError: '2931928391273123'
C:\Documents and Settings\Administrator\Desktop\twitterImgBot-master>
I looked up all the questions here that has something to do with KeyError but none of them have something to do with the Twitter's API key
Again, the API key which is 2931928391273123 is just a random number to substitute the original one for obvious reasons. I did double check it to make sure I had the correct keys/tokens before resorting to here.
This is also a first for me, so I'm hoping someone can lend me a helping hand! I would have posted the 2 screenshots of the Python file and the cmd but I'm only limited to one link.
Thanks in advance!
During my pandas/Google Analytics API setup, I basically did everything as described in this link:
http://blog.yhathq.com/posts/pandas-google-analytics.html
The client_secrets.json is in the pandas/io folder. When i now try to execute a statement of the form
>>>from pandas.io import ga
>>>df = ga.read_ga(metrics, dimensions, start_date)
the following error occurs:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "\Anaconda\lib\site-packages\pandas\io\ga.py", line 110, in read_ga
reader = GAnalytics(**reader_kwds)
File "\Anaconda\lib\site-packages\pandas\io\ga.py", line 179, in __init__
self._service = self._init_service(secrets)
File "\Anaconda\lib\site-packages\pandas\io\ga.py", line 191, in _init_service
http = self.authenticate(secrets)
File "\Anaconda\lib\site-packages\pandas\io\ga.py", line 151, in authenticate
return auth.authenticate(flow, self.token_store)
File "\Anaconda\lib\site-packages\pandas\io\auth.py", line 108, in authenticate
credentials = tools.run(flow, storage)
AttributeError: 'module' object has no attribute 'run'
According to the yhat link, my browser should open for authentication.
Note: I did not not create the Client ID for "installed application", since I did not have this choice in the menu when creating the ID. Instead, i chose "other". This shouldn't be the cause of the error, though.
Second Note: I recently updated my pandas to 0.17.1. When importing pandas.io.ga, i got the message that the .ga module is deprecated. Furthermore, i manually installed the gflags module, because it was needed when I tried to import .io.ga the first time.
Either file a ticket with the owners of Pandas to change (currently) line 108 of pandas/io/auth.py from run() to run_flow(), or make the fix yourself and file a PR. (Yes, it would've been nice if Google had just made run_flow() and alias of run(), but as you can imagine, this is not how this change evolved, so we have to live with it.)
For other developers running into this error: If you have the latest version (as of Feb 2016) of the Google APIs Client Library for Python, just rename your call from tools.run() to tools.run_flow(), and you should be good-to-go. More about this change in a PSA (public service announcement) blogpost I wrote back in mid-2015 but update periodically to stay current.
The fastest way to upgrade your Client Library is with:
pip install -U google-api-python-client # or pip3 for 3.x
I'm trying to launch AWS EMR cluster using boto library, everything works well.
Because of that I need to install required python libraries, tried to add bootstrap action step using boto.emr.bootstrap_action
But It gives error below;
Traceback (most recent call last):
File "run_on_emr_cluster.py", line 46, in <module>
steps=[step])
File "/usr/local/lib/python2.7/dist-packages/boto/emr/connection.py", line 552, in run_jobflow
bootstrap_action_args = [self._build_bootstrap_action_args(bootstrap_action) for bootstrap_action in bootstrap_actions]
File "/usr/local/lib/python2.7/dist-packages/boto/emr/connection.py", line 623, in _build_bootstrap_action_args
bootstrap_action_params['ScriptBootstrapAction.Path'] = bootstrap_action.path AttributeError: 'str' object has no attribute 'path'
Code below;
from boto.emr.connection import EmrConnection
conn = EmrConnection('...', '...')
from boto.emr.step import StreamingStep
step = StreamingStep(name='mapper1',
mapper='s3://xxx/mapper1.py',
reducer='s3://xxx/reducer1.py',
input='s3://xxx/input/',
output='s3://xxx/output/')
from boto.emr.bootstrap_action import BootstrapAction
bootstrap_action = BootstrapAction(name='install related packages',path="s3://xxx/bootstrap.sh", bootstrap_action_args=None)
job = conn.run_jobflow(name='emr_test',
log_uri='s3://xxx/logs',
master_instance_type='m1.small',
slave_instance_type='m1.small',
num_instances=1,
action_on_failure='TERMINATE_JOB_FLOW',
keep_alive=False,
bootstrap_actions='[bootstrap_action]',
steps=[step])
What's the proper way of passing bootstrap arguments?
You are passing the bootstrap_actions argument as a literal string rather than as a list containing the BootstrapAction object you just created. Try this:
job = conn.run_jobflow(name='emr_test',
log_uri='s3://xxx/logs',
master_instance_type='m1.small',
slave_instance_type='m1.small',
num_instances=1,
action_on_failure='TERMINATE_JOB_FLOW',
keep_alive=False,
bootstrap_actions=[bootstrap_action],
steps=[step])
Notice that the ``bootstrap_action` argument is different here.
I'm trying to automate the addition of new portgroups to ESXi hosts using pysphere. I'm using the following code fragment:
from pysphere import MORTypes
from pysphere import VIServer, VIProperty
from pysphere.resources import VimService_services as VI
s = VIServer()
s.connect(vcenter, user, password)
host_system = s.get_hosts().keys()[17]
prop = VIProperty(s, host_system)
propname = prop.configManager._obj.get_element_networkSystem()
vswitch = prop.configManager.networkSystem.networkInfo.vswitch[0]
network_system = VIMor(propname, MORTypes.HostServiceSystem)
def add_port_group(name, vlan_id, vswitch, network_system):
request = VI.AddPortGroupRequestMsg()
_this = request.new__this(network_system)
_this.set_attribute_type(network_system.get_attribute_type())
request.set_element__this(_this)
portgrp = request.new_portgrp()
portgrp.set_element_name(name)
portgrp.set_element_vlanId(vlan_id)
portgrp.set_element_vswitchName(vswitch)
portgrp.set_element_policy(portgrp.new_policy())
request.set_element_portgrp(portgrp)
s._proxy.AddPortGroup(request)
However, when I attempt to run it, I get the following error:
>>> add_port_group(name, vlan_id, vswitch, network_system)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 12, in add_port_group
File "/usr/lib/python2.6/site-packages/pysphere-0.1.8- py2.6.egg/pysphere/resources/VimService_services.py", line 4344, in AddPortGroup
response = self.binding.Receive(AddPortGroupResponseMsg.typecode)
File "/usr/lib/python2.6/site-packages/pysphere-0.1.8- py2.6.egg/pysphere/ZSI/client.py", line 545, in Receive
return _Binding.Receive(self, replytype, **kw)
File "/usr/lib/python2.6/site-packages/pysphere-0.1.8- py2.6.egg/pysphere/ZSI/client.py", line 464, in Receive
raise FaultException(msg)
pysphere.ZSI.FaultException: The object has already been deleted or has not been completely created
I've attempted to swap in different values for "vswitch" and "network_system", but I haven't had any success. Has anyone attempted to do something similar with pysphere successfully?
I can accomplish what I need through Powershell, which demonstrates that it isn't a vmware issue, but I don't want to use Powershell in this particular case.
I tried your code on one of our vSpheres.
It seems you are passing the object to set_element_vswitchName rather than the name. Maybe this will help:
vswitch = prop.configManager.networkSystem.networkInfo.vswitch[0].name