Boto3/Jenkins client throwing an error while running the code - python

I run a daily Glue script on one of our AWS machines, scheduled with Jenkins.
For the last 15 days I have been getting the error below. (The daily job had been running for almost 6 months, and the failures started suddenly about 15 days ago.)
The Jenkins console output looks like this:
Started by timer
Building in workspace /var/lib/jenkins/workspace/build_name_xyz
[build_name_xyz] $ /bin/sh -xe /tmp/jenkins8188702635955396537.sh
+ /usr/bin/python3 /var/lib/jenkins/path_to_script/glue_crawler.py
Traceback (most recent call last):
File "/var/lib/jenkins/path_to_script/glue_crawler.py", line 10, in <module>
response = glue_client.update_crawler(Name = crawler_name,Targets = {'S3Targets': [{'Path':update_path}]})
File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidInputException: An error occurred (InvalidInputException) when calling the UpdateCrawler operation: Cannot update Crawler while running. Please stop crawl or wait until it completes to update.
Build step 'Execute shell' marked build as failure
Finished: FAILURE
So I went ahead and looked at line 10 of this file:
/var/lib/jenkins/path_to_script/glue_crawler.py
It looks something like this:
import boto3
import datetime
glue_client = boto3.client('glue', region_name='region_name')
crawler_name = 'xyz_abc'
today = (datetime.datetime.now()).strftime("%Y_%m_%d")
update_path = 's3://path-to-respective-aws-s3-bucket/%s' % (today)
response = glue_client.update_crawler(Name = crawler_name,Targets = {'S3Targets': [{'Path':update_path}]})
response_crawler = glue_client.start_crawler(
    Name=crawler_name
)
print(response_crawler)
The script fails at line 10 with the error above. I don't understand what exactly is going wrong on line 10 that makes Jenkins fail the build with the red ball, so I'm asking for help here. I tried googling this, but I couldn't find anything.
FYI: if I run the same build again a little later from the Jenkins UI (by clicking 'Build Now'), the job runs absolutely fine.
I'm not sure what exactly is wrong here; any help is highly appreciated.
Thanks in advance!

The error is self-explanatory:
Cannot update Crawler while running. Please stop crawl or wait until it completes to update.
So something started the crawler at approximately the same time, and Glue does not allow a crawler's properties to be updated while it is running. Check whether any other task also starts the crawler named xyz_abc. In addition, make sure in the AWS Console that the crawler is configured to run on demand rather than on a schedule.
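If the overlap cannot be removed entirely, one option is to have the script wait until the crawler is idle before updating it. The following is only a minimal sketch: the helper names wait_until_ready and update_and_start, the polling interval, and the timeout are assumptions made for illustration. It relies on the Glue get_crawler call, whose response reports the crawler state as READY, RUNNING, or STOPPING. String formatting uses % so that it also runs on the Python 3.5 shown in the traceback.

```python
import time


def wait_until_ready(glue_client, crawler_name, poll_seconds=30, timeout_seconds=1800):
    """Block until the crawler reports state READY, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # get_crawler returns {'Crawler': {'State': 'READY' | 'RUNNING' | 'STOPPING', ...}}
        state = glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "Crawler %s still in state %s after %s seconds"
                % (crawler_name, state, timeout_seconds)
            )
        time.sleep(poll_seconds)


def update_and_start(glue_client, crawler_name, update_path, **wait_kwargs):
    """Wait for the crawler to be idle, then retarget it and start a crawl."""
    wait_until_ready(glue_client, crawler_name, **wait_kwargs)
    glue_client.update_crawler(
        Name=crawler_name,
        Targets={"S3Targets": [{"Path": update_path}]},
    )
    return glue_client.start_crawler(Name=crawler_name)


# Usage in the script from the question (not executed here):
#   import boto3
#   glue_client = boto3.client('glue', region_name='region_name')
#   print(update_and_start(glue_client, crawler_name, update_path))
```

This only narrows the window: if another job starts the crawler between the state check and the update, the same InvalidInputException can still occur, so finding the competing trigger remains the real fix.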

Related

Instabot API for Python raises error after running code for the 2nd time

I am currently working with the Instabot API for python and I ran across the following issue:
I wrote a small program:
from instabot import Bot
bot = Bot()
bot.login(username = "[my username]", password = "[my password]")
bot.follow("lego")
This worked fine the first time I ran it. However, when I ran the program a second time, this time to follow another account, it raised an error (KeyError: 'ds_user').
The error can be fixed by deleting the config folder inside the project folder. Unfortunately, that isn't a sustainable solution, as it makes working on the code really arduous. I would therefore like to know whether there is a way to run the program multiple times without having to delete the config folder over and over again.
I am receiving the following traceback (code is running in an anaconda environment called "Instagram Automation"):
Traceback (most recent call last):
File "e:/Programme/OneDrive/Dokumente/Projekte/Instagram Automation/main.py", line 4, in <module>
bot.login(username = "[my username]", password = "[my password]")
File "E:\Programme\Anaconda\envs\Instagram Automation\lib\site-packages\instabot\bot\bot.py", line 443, in login
if self.api.login(**args) is False:
File "E:\Programme\Anaconda\envs\Instagram Automation\lib\site-packages\instabot\api\api.py", line 240, in login
self.load_uuid_and_cookie(load_cookie=use_cookie, load_uuid=use_uuid)
File "E:\Programme\Anaconda\envs\Instagram Automation\lib\site-packages\instabot\api\api.py", line 199, in load_uuid_and_cookie
return load_uuid_and_cookie(self, load_uuid=load_uuid, load_cookie=load_cookie)
File "E:\Programme\Anaconda\envs\Instagram Automation\lib\site-packages\instabot\api\api_login.py", line 352, in load_uuid_and_cookie
cookie_username = self.cookie_dict["ds_user"]
KeyError: 'ds_user'
As far as I can see, the only way to fight the symptoms on your side is to delete the JSON file in the config folder before every run, e.g.:
import os
if os.path.isfile("path/to/config/file.json"):
    os.remove("path/to/config/file.json")
import instabot
# rest of your code goes here
The developers of instabot should fix the source of the problem, for example by using self.cookie_dict.get("ds_user", "some default value") instead of self.cookie_dict["ds_user"].
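To illustrate the difference between the two lookups, here is a small standalone sketch. The cookie_dict contents are made up for the example and are not instabot's real cookie data.

```python
# Hypothetical cookie data with no "ds_user" entry (not instabot's real cookies).
cookie_dict = {"sessionid": "abc123"}

# Subscript access raises when the key is absent, as in the traceback.
try:
    cookie_username = cookie_dict["ds_user"]
except KeyError as exc:
    print("KeyError:", exc)

# dict.get returns the supplied default instead of raising.
cookie_username = cookie_dict.get("ds_user", "some default value")
print(cookie_username)
```

Whether a default value is actually safe at that point in instabot depends on how cookie_username is used afterwards, so this shows the language mechanism only.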

Can't initialize ANT+ Node with Python OpenANT library

I'm totally new to Python and also to ANT+ technology. I wonder whether this is some basic problem, but I've been struggling with it for a couple of days already, browsing through forums with no luck.
So I'm trying to use the Python OpenANT library (https://github.com/Tigge/openant) to access my ANT dongle, which is plugged into a USB port (Windows 10 Pro). My goal is to access my Garmin through it and get some data from it. However, I'm stuck at the very beginning, trying to initialize the ANT Node. My code is this:
from ant.easy.node import Node
node=Node()
To this I get the exception:
File "C:/Users/Edgars/Desktop/untitled-5.py", line 2, in <module>
pass
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\ant\easy\node.py", line 56, in __init__
self.ant = Ant()
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\ant\base\ant.py", line 68, in __init__
self._driver.open()
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\ant\base\driver.py", line 193, in open
cfg = dev.get_active_configuration()
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyusb-1.1.0-py3.8.egg\usb\core.py", line 909, in get_active_configuration
return self._ctx.get_active_configuration(self)
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyusb-1.1.0-py3.8.egg\usb\core.py", line 113, in wrapper
return f(self, *args, **kwargs)
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyusb-1.1.0-py3.8.egg\usb\core.py", line 250, in get_active_configuration
bConfigurationValue=self.backend.get_configuration(self.handle)
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyusb-1.1.0-py3.8.egg\usb\backend\libusb0.py", line 519, in get_configuration
ret = self.ctrl_transfer(
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyusb-1.1.0-py3.8.egg\usb\backend\libusb0.py", line 601, in ctrl_transfer
return _check(_lib.usb_control_msg(
File "C:\Users\Edgars\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyusb-1.1.0-py3.8.egg\usb\backend\libusb0.py", line 447, in _check
raise USBError(errmsg, ret)
usb.core.USBError: [Errno None] b'libusb0-dll:err [control_msg] sending control message failed, win error: A device which does not exist was specified.\r\n\n'
I have closed the Garmin Agent, so no other program is using my ANT dongle at the same time. Every time I run my code I hear the sound that plays when a USB device is detached by selecting "Eject" from the drop-down menu. The sound plays at the same moment the exception appears, so I guess the USB device does get accessed at some point.
Before the exception I get this printout:
Driver available: [<class 'ant.base.driver.SerialDriver'>, <class 'ant.base.driver.USB2Driver'>, <class 'ant.base.driver.USB3Driver'>]
- Using: <class 'ant.base.driver.USB3Driver'>
Could not check if kernel driver was active, not implemented in usb backend
I have seen other users' threads where the printout says Using ... USB1Driver or Using ... USB2Driver, and they don't get this message. I've installed various Python libraries trying to get even this far, and now I'm worried that they may be getting in each other's way. Can anybody help me with this? It's really frustrating that a program of only two lines can get so complicated. :D
!!!EDIT!!!
OK, I found the problem: in the "driver.py" file there's a line dev.reset() which disconnects my USB dongle right before the library tries to access it. I have no idea why that line is there. I commented it out, and now I no longer get the above-mentioned error. However, now I get continuous timeouts.
So my code has evolved to this (although the same timeouts also happen with my initial two-line program):
from ant.easy.node import Node
from ant.easy.channel import Channel
from ant.base.message import Message
import threading
NETWORK_KEY=[0xb9,0xa5,0x21,0xfb,0xbd,0x72,0xc3,0x45]
def on_data(data):
    print("Data received")
    print(data)
def back_thread(node):
    node.set_network_key(0x00,NETWORK_KEY)
    channel=node.new_channel(Channel.Type.BIDIRECTIONAL_RECEIVE)
    channel.on_broadcast_data=on_data
    channel.on_burst_data=on_data
    channel.set_period(16070)
    channel.set_search_timeout(20)
    channel.set_rf_freq(57)
    channel.set_id(0,120,0)
    try:
        channel.open()
        node.start()
    finally:
        node.stop()
        print("ANT Node Shutdown Complete")
node=Node()
x=threading.Thread(target=back_thread,args=(node,))
x.start()
Now I get this error line printed out forever:
<class 'usb.core.USBError'>, (None, b'libusb0-dll:err [_usb_reap_async] timeout error\n')
When the Garmin Agent is active, I get the error "ANT resource already in use" instead of the timeout, so I'm certain that my code is accessing the ANT dongle. However, now (with the Garmin Agent closed) I have no idea how to get rid of the timeout and establish a simple handshake with my Garmin device.
OK, now I've figured out that my Garmin Forerunner 310XT can't act as a data source and thus cannot be accessed using the ANT+ protocol. Instead, I should use the ANT-FS (file sharing) protocol. Keeping my head down and trying it out...
I posted a PR with some changes that I made to get Tigge’s openant library to work. Basically, I put a pause after the reset line that you mentioned above and bypassed the use of udev_rules as it doesn’t apply in Windows. You can use libusb but installation is a bit different. I’ve added Windows installation instructions to the readme in the PR with details on what worked for me.
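The "pause after the reset" idea can be sketched roughly as follows. This is not the code from the PR: the helper name reset_and_settle and the 2-second default are assumptions for illustration, and dev stands for any object with a reset() method, such as a pyusb usb.core.Device.

```python
import time


def reset_and_settle(dev, pause_seconds=2.0):
    """Reset a USB device, then wait so it can re-enumerate before first use.

    `dev` is any object with a reset() method, e.g. a pyusb usb.core.Device.
    The 2-second default is a guess, not a value taken from the PR.
    """
    dev.reset()
    time.sleep(pause_seconds)
    return dev
```

The right pause length probably depends on the dongle and the driver, so it may need some experimenting.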

Python Requests_html: giving me Timeout Error

I'm trying to scrape headlines from medium.com using a library called requests_html.
The code works well on other people's PCs but not on mine.
Here's the original code:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://medium.com/@daranept27')
r.html.render()
x = r.html.find('a.eg.bv')
[print(elem.text) for elem in x]
It gives me pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.
Here's the full error:
Traceback (most recent call last):
File "C:\Users\intel\Desktop\hackerrank.py", line 5, in <module>
r.html.render()
File "C:\Users\intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests_html.py", line 598, in render
content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
File "C:\Users\intel\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "C:\Users\intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests_html.py", line 512, in _async_render
await page.goto(url, options={'timeout': int(timeout * 1000)})
File "C:\Users\intel\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\page.py", line 885, in goto
raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.
[Finished in 13.0s with exit code 1]
[shell_cmd: python -u "C:\Users\intel\Desktop\hackerrank.py"]
[dir: C:\Users\intel\Desktop]
[path: C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\nodejs\;C:\Python38;C:\Users\intel\AppData\Local\Programs\Python\Python38\Scripts\;C:\Users\intel\AppData\Local\Programs\Python\Python38\;C:\MinGW\bin;C:\Users\intel\AppData\Local\Programs\Microsoft VS Code\bin]
I saw a comment on one of my posts, and other people's answers too, saying to re-run it and then it will work. I don't understand why...
The error you are getting suggests that you are not getting a response from the server in a timely manner.
I ran your code on my machine (Ubuntu 18.04) successfully and got the following results:
Seven Days -Between Life And Death
Have you ever encountered a fake friend? If So, Try These Simple Tips To Overcome it.
Does Anybody Ever Wonder Why He’s My Everything?
Ladies, Why Should You Treat Your Face Like The Coloring Books?
Listen, Girl, Aren’t You Curious How The Last Line Could Be This Hurtful?
The girl name “Rich”
She Lost Her Beloved Mother, But Why She Asserted that Loss Was Not Just A Loss sometimes?
You Used To Try This Lonely. Have You Ever Imagine The flavor You Tried To Eat it with Your Lover?
If You Have Siblings, You Won’t Comprehend this. Have You Ever Wonder How A Child Feels Like? This Is How It Perceives.
Is It Okay To Help A Stranger?
The Nightmare Was Always Considered A Bad Omen, But It Turned Incredible Differently.
If You’re A Woman Or Girl Who Loves To Wear Lipstick, Read This Poetry.
She Wants To Spread This Poetry For Every Girl Or Woman That Was Born Just Like The Way She Was.
Check your internet connection.
Alternatively, I'd suggest running IDLE in administrator mode and re-running your code from there.
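If the connection is simply slow, another thing to try is giving the render step more time and retrying it a few times. The traceback shows that render() passes a timeout value (8 seconds by default, hence "8000 ms") on to pyppeteer. The sketch below builds on that; the helper name render_with_retry, the 20-second timeout, and the retry count are assumptions for illustration.

```python
import time


def render_with_retry(html, timeout=20.0, attempts=3, delay_seconds=2.0):
    """Call html.render(timeout=...) up to `attempts` times.

    `html` is expected to be the r.html object from requests_html. Any
    exception triggers a retry here; narrowing that to
    pyppeteer.errors.TimeoutError would be a reasonable refinement.
    Returns the number of tries it took.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            html.render(timeout=timeout)
            return attempt
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


# Usage with the code from the question (not executed here):
#   r = session.get('https://medium.com/@daranept27')
#   render_with_retry(r.html)
```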

Cloud Firestore update method crashes after a few updates

I am using Firebase's Cloud Firestore to update entries in my database. When I run the code, it works fine for around 10 seconds and then crashes with the following error:
File "C:\Users\mypc\AppData\Local\Programs\Python\Python35-32\lib\site-packages\google\api_core\grpc_helpers.py", line 78, in next
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded
Here is my implementation that causes the issue:
existing = db.collection(u'pages').where(u'season', u'==', u'summer').get()
for x in existing:
    obj = x.to_dict()
    doc_ref = db.collection(u'pages').document(u'%s' % (obj['uid'],))
    doc_ref.update({u'year': "2019"})
As you can see, it's quite a simple function, and I have no idea why it crashes after working for the first 10 seconds. I am on a paid plan, so exceeding a limit can't be the problem. My guess is that either I am doing something very wrong and my code causes the error, or it's simply a bug.

Odd TypeError from the airflow scheduler -- has usage of @once for scheduler interval changed in v1.9?

I have a super simple test DAG that looks like this:
from datetime import datetime
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator
DAG = DAG(
    dag_id='scheduler_test_dag',
    start_date=datetime(2017, 9, 9, 4, 0, 0, 0), #..EC2 time. Equal to 11pm hora México
    max_active_runs=1,
    schedule_interval='@once' #externally triggered
)
def ticker_function():
    with open('/tmp/ticker', 'a') as outfile:
        outfile.write('{}\n'.format(datetime.now()))
time_ticker = PythonOperator(
    task_id='time_ticker',
    python_callable=ticker_function,
    dag=DAG
)
Since upgrading to apache-airflow v1.9 this DAG is hung and won't run. Digging into the scheduler logs I found the error trace:
[2018-02-12 17:03:06,259] {jobs.py:1754} INFO - DAG(s) dict_keys(['scheduler_test_dag']) retrieved from /home/ubuntu/airflow/dags/scheduler_test_dag.py
[2018-02-12 17:03:06,315] {jobs.py:1386} INFO - Processing scheduler_test_dag
[2018-02-12 17:03:06,320] {jobs.py:379} ERROR - Got an exception! Propagating...
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 371, in helper
pickle_dags)
File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", line 50, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 1792, in process_file
self._process_dags(dagbag, dags, ti_keys_to_schedule)
File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 1388, in _process_dags
dag_run = self.create_dag_run(dag)
File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", line 50, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 807, in create_dag_run
if next_start <= now:
TypeError: unorderable types: NoneType() <= datetime.datetime()
Where is this error coming from? The only thing I can think of is that the usage of schedule_interval='@once' has changed, which is the one thing this DAG has in common with one other broken DAG on my server since the v1.9 upgrade. Otherwise it's the most basic DAG ever; it doesn't seem like there should be a problem. Previously I was using the basic pip install before switching to the apache-airflow repo.
Here's a screenshot of the Web UI. Everything seems to be working all right, except the top and bottom DAGs, which have their schedule interval set to @once and are hung indefinitely:
Any thoughts?
Have you set catchup to True in your airflow.cfg? If so, this is fixed in master. Disable catchup for this DAG and it should start working.
We had the same issue and fixed it by setting catchup=False on the DAG object. It should be fixed in master as well by now.
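For reference, a rough sketch of what that could look like for the DAG in the question. The arguments are kept in a plain dict (the name dag_kwargs is made up for this example) so the snippet can be read without Airflow installed. It assumes the catchup argument of the DAG constructor, which overrides the catchup_by_default setting from airflow.cfg for this one DAG.

```python
from datetime import datetime

# Keyword arguments for airflow.models.DAG, collected in a plain dict.
dag_kwargs = {
    "dag_id": "scheduler_test_dag",
    "start_date": datetime(2017, 9, 9, 4, 0, 0, 0),
    "max_active_runs": 1,
    "schedule_interval": "@once",
    "catchup": False,  # per-DAG override of catchup_by_default
}

# With Airflow available, the DAG would then be built as:
#   from airflow.models import DAG
#   dag = DAG(**dag_kwargs)
```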
This seems to be related to this issue: https://issues.apache.org/jira/browse/AIRFLOW-1977
It is important to mention that this is only an issue as long as the task has not run at least once. For instance, if you configure the task with
schedule_interval=None
and manually run it once, it will work as expected. After that you can set @once as the schedule_interval and the scheduler won't complain again (of course, manually triggering a task defeats the purpose of running it automatically once...). I did not test whether the same happens with the other presets, but this could fix the same issue with, for example, @daily.
