Github Actions not accessing download from Newspaper3k - python

I've been trying to use GitHub Actions to run a Python script. Everything runs fine except one function that uses the Newspaper3k package. The article appears to download correctly (article.html is populated), but article.parse() does not work. The same code works on my local machine but not on GitHub. Could this be related to file locations that differ on the GitHub runner? It's a private repository, in case that makes a difference.
My workflow YAML is as follows:
build:
  runs-on: ubuntu-latest
  steps:
    - name: checkout repo content
      uses: actions/checkout@v3 # checkout the repository content to github runner.
    - name: setup python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10' #install the python needed
        cache: 'pip'
    - name: install python packages
      run: |
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: execute py script # run file
      env:
        WORDPRESS_USER: ${{ secrets.WORDPRESS_USER }}
        WORDPRESS_PASSWORD: ${{ secrets.WORDPRESS_PASSWORD }}
      run: |
        python main.py
The function in question is provided below:
def generate_article_summary(supplied_links):
    summary_list = ""
    for news_article in supplied_links[:5]:
        try:
            url = news_article
            article = Article(url, config=config)
            article.download()
            article.parse()
            article.nlp()
        except:
            summary_list = summary_list + "\n"
            pass
        summary_list = summary_list + "\n" + article.summary
    return summary_list
Any help would be much appreciated.
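One thing that stands out in the function above is the bare except: any exception raised by download(), parse(), or nlp() is swallowed, so the Actions log never shows why the call failed. A minimal sketch of a variant that logs the traceback instead, assuming a hypothetical summarize callable that wraps the Article download/parse/nlp steps (collect_summaries and summarize are illustrative names, not part of Newspaper3k):

```python
import logging
from typing import Callable, Iterable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("summaries")


def collect_summaries(links: Iterable[str],
                      summarize: Callable[[str], str],
                      limit: int = 5) -> str:
    """Join the summaries of the first `limit` links, one per line.

    `summarize` is a hypothetical callable: given a URL it returns the
    summary text, or raises if download/parse/nlp fails.
    """
    summaries = []
    for url in list(links)[:limit]:
        try:
            summaries.append(summarize(url))
        except Exception:
            # log.exception writes the full traceback to the job log,
            # so the real cause is visible instead of silently skipped.
            log.exception("Could not summarize %s", url)
            summaries.append("")
    return "\n".join(summaries)
```

In main.py, summarize could be a small function that builds Article(url, config=config), calls download(), parse() and nlp(), and returns article.summary. The traceback printed in the workflow log should then point at the step that actually fails on the runner.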

Related

Github Actions python build error: "AttributeError: __enter__"

I have a webscraper that runs locally, and puts the data in a mysql database hosted on filess.io. Wanted to set up a schedule on github actions to run it consistently, but the build fails here:
try:
    with connect(
        host=DB_HOST,
        user=DB_USER,
        password=DB_PASSWORD,
        database=DB_DATABASE,
        port=DB_PORT
    ) as connection:
        print(connection)
With this error:
Run python main.py
Traceback (most recent call last):
  File "myscript.py", line 66, in <module>
    with connect(
AttributeError: __enter__
Error: Process completed with exit code 1.
I have secrets set up in github, and the values are pulled into the code in this earlier section, with no errors:
try:
    DB_HOST = os.environ["DB_HOST"]
    DB_USER = os.environ["DB_USER"]
    DB_PASSWORD = os.environ["DB_PASSWORD"]
    DB_DATABASE = os.environ["DB_DATABASE"]
    DB_PORT = os.environ["DB_PORT"]
This code works perfectly on my local machine, with the secrets saved in a .env file. I have double- and triple-checked that my secrets are set in GitHub. Am I missing something?
I tried running locally (worked fine) and logging the GitHub secrets to verify they were stored correctly (the values were masked in the log, so that didn't help). I looked up the __enter__ error: it means the object given to the with statement has no __enter__ method, but I can't figure out why that would be the case here.
Main point of confusion: it works locally, which leads me to believe the problem is in my GitHub setup. Any ideas what's going on?
EDIT: adding github actions workflow code below:
name: Manual workflow
on: [workflow_dispatch]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: checkout repo content
        uses: actions/checkout@v2 # checkout the repository content to github runner
      - name: setup python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9' # install the python version needed
      - name: install python packages
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: execute py script # run main.py
        env:
          DB_HOST: ${{ secrets.DB_HOST }}
          DB_USER: ${{ secrets.DB_USER }}
          DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
          DB_DATABASE: ${{ secrets.DB_DATABASE }}
          DB_PORT: ${{ secrets.DB_PORT }}
        run: python main.py
After much digging, I found that my requirements.txt listed mysql-connector, which is deprecated. My local system had mysql-connector-python installed and was using that instead. I'm not sure how the wrong package ended up in requirements.txt. Listing mysql-connector-python in requirements.txt fixed this particular bug.
Thanks to @Azeem for your debugging help!
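The error message itself follows from how the with statement works: it looks up __enter__ and __exit__ on the object returned by connect() and fails if they are missing. A minimal sketch that reproduces this with a stand-in class (LegacyConnection is hypothetical, not the real connector class) and shows contextlib.closing as a workaround for objects that only offer close(). Python 3.11 and later report this case as a TypeError rather than AttributeError: __enter__, so the sketch catches both:

```python
from contextlib import closing


class LegacyConnection:
    """Hypothetical stand-in for a connection without __enter__/__exit__."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def try_with(obj):
    """Return the exception class name raised by `with obj:`, or None."""
    try:
        with obj:
            pass
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None


conn = LegacyConnection()
error_name = try_with(conn)  # "AttributeError" or "TypeError", by version

# closing() wraps any object that has close() in a context manager.
with closing(LegacyConnection()) as wrapped:
    still_open = not wrapped.closed
```

This is only an illustration of the failure mode; installing the maintained package, as described above, remains the actual fix.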

GitHub Action to execute a Python script that creates a file, then commit and push this file

My repo contains a main.py that generates an HTML map and saves the results in a CSV file. I want the action to:
execute the Python script (this seems to be OK)
add, commit and push the generated file to the main branch, so that it is available in the page associated with the repo.
name: refresh map
on:
  schedule:
    - cron: "30 11 * * *" #runs at 11:30 UTC everyday
jobs:
  getdataandrefreshmap:
    runs-on: ubuntu-latest
    steps:
      - name: checkout repo content
        uses: actions/checkout@v3 # checkout the repository content to github runner.
      - name: setup python
        uses: actions/setup-python@v4
        with:
          python-version: 3.8 #install the python needed
      - name: Install dependencies
        run: |
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: execute py script
        uses: actions/checkout@v3
        run: |
          python main.py
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "crongenerated"
          git push
The workflow fails when I include the second uses: actions/checkout@v3 and the git commands.
Thanks in advance for your help.
If you want to run a script, you don't need an additional checkout step for that. There is a difference between steps that use an action (uses) and steps that execute shell commands directly (run). You can read more about it here.
In your configuration file, you mix the two in the last step, and a single step cannot have both uses and run. The repo from the first step is still checked out, so you can just use the following workflow:
name: refresh map
on:
  schedule:
    - cron: "30 11 * * *" #runs at 11:30 UTC everyday
jobs:
  getdataandrefreshmap:
    runs-on: ubuntu-latest
    steps:
      - name: checkout repo content
        uses: actions/checkout@v3 # checkout the repository content to github runner.
      - name: setup python
        uses: actions/setup-python@v4
        with:
          python-version: 3.8 #install the python needed
      - name: Install dependencies
        run: |
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: execute py script
        run: |
          python main.py
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "crongenerated"
          git push
I tested it with a dummy repo and everything worked.
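One caveat with the last step: git commit exits with a non-zero status when there is nothing to commit, which would fail the job on days when main.py produces identical output. A hedged sketch of guarding against that from Python, assuming git is on the PATH and the repository is already checked out (commit_if_changed is an illustrative helper, not part of any library; the run parameter exists only so the function can be exercised without a real repository):

```python
import subprocess


def commit_if_changed(message, run=subprocess.run):
    """Add, commit and push only when the working tree has changes.

    Returns True if a commit was pushed, False if there was nothing to do.
    """
    # --porcelain prints one line per changed file, nothing when clean.
    status = run(["git", "status", "--porcelain"],
                 capture_output=True, text=True, check=True)
    if not status.stdout.strip():
        return False
    run(["git", "add", "."], check=True)
    run(["git", "commit", "-m", message], check=True)
    run(["git", "push"], check=True)
    return True
```

Calling commit_if_changed("crongenerated") at the end of main.py would replace the git add, git commit and git push lines in the workflow; the two git config lines would still need to run beforehand.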

Node JS runtime but requires python

I am using https://www.npmjs.com/package/youtube-dl-exec in a simple JS function on AWS Lambda (Node 14).
The code is simple and gathers info for the given URL (as supported by YTDL). I have tested it with jest, and it works well on my local machine, where Python 2.7 is installed.
My package.json dependencies look like this:
"dependencies": {
  "youtube-dl": "^3.5.0",
  "youtube-dl-exec": "^1.2.0"
},
"devDependencies": {
  "jest": "^26.6.3"
}
I am using a GitHub Action to deploy the code on push to master, with this main.yml file:
name: Deploy to AWS lambda
on: [push]
jobs:
  deploy_source:
    name: build and deploy lambda
    strategy:
      matrix:
        node-version: [14.x]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v1
        with:
          node-version: ${{ matrix.node-version }}
      - name: npm install and build
        run: |
          npm ci
          npm run build --if-present
        env:
          CI: true
      - uses: actions/setup-python@v2
        with:
          python-version: '3.x' # Version range or exact version of a Python version to use, using SemVer's version range syntax
          architecture: 'x64' # optional x64 or x86. Defaults to x64 if not specified
      - name: zip
        uses: montudor/action-zip@v0.1.0
        with:
          args: zip -qq -r ./bundle.zip ./
      - name: default deploy
        uses: appleboy/lambda-action@master
        with:
          aws_access_key_id: ${{ secrets.AWS_EEEEE_ID }}
          aws_secret_access_key: ${{ secrets.AWS_EEEEE_KEY }}
          aws_region: us-EEEEE
          function_name: DownloadEEEEE
          zip_file: bundle.zip
I am getting the following error:
INFO Error: Command failed with exit code 127: /var/task/node_modules/youtube-dl-exec/bin/youtube-dl https://www.EXQEEEE.com/p/XCCRXqXInEEZ4W4 --dump-json --no-warnings --no-call-home --no-check-certificate --prefer-free-formats --youtube-skip-dash-manifest
/usr/bin/env: python: No such file or directory
    at makeError (/var/task/node_modules/execa/lib/error.js:59:11)
    at handlePromise (/var/task/node_modules/execa/index.js:114:26)
    at processTicksAndRejections (internal/process/task_queues.js:93:5) {
  shortMessage: 'Command failed with exit code 127: /var/task/node_modules/youtube-dl-exec/bin/youtube-dl https://www.instagram.com/p/CCRq_InFZ44 --dump-json --no-warnings --no-call-home --no-check-certificate --prefer-free-formats --youtube-skip-dash-manifest',
  command: '/var/task/node_modules/youtube-dl-exec/bin/youtube-dl https://www.EXQEEEE.com/p/XCCRXqXInEEZ4W4 --dump-json --no-warnings --no-call-home --no-check-certificate --prefer-free-formats --youtube-skip-dash-manifest',
  exitCode: 127,
  signal: undefined,
  signalDescription: undefined,
  stdout: '',
  stderr: '/usr/bin/env: python: No such file or directory',
  failed: true,
  timedOut: false,
  isCanceled: false,
  killed: false
}
I have tried adding a lambda layer, adding python in the main.yml file, and also installing through dependency but perhaps I am doing something wrong so that the library is not able to find python at /usr/bin/env.
How do I make python be available in that path?
Should I not use ubuntu-latest on lambda config (main.yml) since it doesn't come packed with python by default?
Any help would be appreciated.
Note: I have obfuscated the URLs for privacy purposes.
The new nodejs10.x Lambda runtime no longer includes Python, and therefore youtube-dl does not work.
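The message /usr/bin/env: python: No such file or directory means that no executable named python was found on the PATH of the process that launched youtube-dl. The setup-python step in main.yml configures the GitHub runner only; it does not change what is installed in the Lambda execution environment. A small sketch of the same PATH lookup using shutil.which, which can be run on a machine or image that does have some Python to see which interpreter names resolve (first_on_path is an illustrative name):

```python
import shutil
from typing import Iterable, Optional


def first_on_path(names: Iterable[str]) -> Optional[str]:
    """Return the full path of the first name found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


# `#!/usr/bin/env python` needs the exact name "python" to resolve.
python_path = first_on_path(["python"])
any_python = first_on_path(["python", "python3", "python2"])
```

If python_path is None while any_python is not, the interpreter exists only under a versioned name, and a script whose shebang asks for plain python would fail with exit code 127 in the same way.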

How to access environment secrets from a Github workflow?

I am trying to publish a Python package to PyPI from a GitHub workflow, but the authentication fails for "Test PyPI". I successfully published to Test PyPI from the command line, so my API token must be correct. I also checked for leading and trailing spaces in the secret value (i.e., on GitHub).
As the last commits show, I tried a few things without success.
I first tried to inline simple bash commands into the workflow as follows, but I have not been able to get my secrets into environment variables. Nothing showed up in the logs when I printed these variables.
- name: Publish on Test PyPI
  env:
    TWINE_USERNAME: __token__
    TWINE_PASSWORD: ${{ secrets.PYPI_TEST_TOKEN }}
    TWINE_REPOSITORY_URL: "https://test.pypi.org/legacy/"
  run: |
    echo "$TWINE_PASSWORD"
    pip install twine
    twine check dist/*
    twine upload dist/*
I also tried to use a dedicated GitHub Action as follows, but it does not work either. I guess the problem comes from the secrets not being available in my workflow. What puzzled me is that my workflow uses another token/secret just fine! Though, if I put it in an environment variable, nothing is printed out. I also recreated my secrets under different names (PYPI_TEST_TOKEN and TEST_PYPI_API_TOKEN) but to no avail.
- name: Publish to Test PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    user: __token__
    password: ${{ secrets.TEST_PYPI_API_TOKEN }}
    repository_url: https://test.pypi.org/legacy/
I guess I'm missing something obvious (as usual). Any help is highly appreciated.
I eventually figured it out. My mistake was that I defined my secrets within an environment and, by default, workflows do not run in any specific environment. For this to happen, I have to explicitly name the environment in the job description as follows:
jobs:
  publish:
    environment: CI # <--- /!\ Here is the link to the environment
    needs: build
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags/v')
    steps:
      - uses: actions/checkout@v2
      # Some more steps here ...
      - name: Publish to Test PyPI
        env:
          TWINE_USERNAME: "__token__"
          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_API_TOKEN }}
          TWINE_REPOSITORY_URL: "https://test.pypi.org/legacy/"
        run: |
          echo KEY: '${TWINE_PASSWORD}'
          twine check dist/*
          twine upload --verbose --skip-existing dist/*
The documentation mentions it actually.
Thanks to those who commented for pointing me in the right direction.
I struggled with this problem too. Since I am working with multiple environments that all share identically named secrets with different values, the following solution worked for me. Isolated pieces are described here and there, but it wasn't obvious how to piece them together.
First, I let the environment be selected when the workflow_dispatch event is triggered:
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        description: Select the environment
        required: true
        options:
          - TEST
          - UAT
I then reference it in jobs context:
jobs:
  run-portal-tests:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}
Finally, I pass the secrets to the step that needs them:
- name: Run tests
  env:
    ENDPOINT: ${{ secrets.ENDPOINT }}
    TEST_USER: ${{ secrets.TEST_USER }}
    TEST_USER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}
    CLIENT_ID: ${{ secrets.CLIENT_ID }}
    CLIENT_SECRET: ${{ secrets.CLIENT_SECRET }}
  run: python3 main.py
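When a job does not declare the environment that holds a secret, the ${{ secrets.NAME }} expression evaluates to an empty string, so the script sees an empty variable rather than an error. A minimal sketch of how main.py could fail fast in that case without printing any secret values (require_env is a hypothetical helper; the variable names are the ones from the step above):

```python
import os
from typing import Dict, Iterable, Mapping


def require_env(names: Iterable[str],
                environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the named variables, raising if any is missing or empty."""
    names = list(names)
    missing = [name for name in names if not environ.get(name)]
    if missing:
        # Report names only; never echo the values themselves.
        raise RuntimeError("Missing or empty variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

At the top of main.py this might be called as settings = require_env(["ENDPOINT", "TEST_USER", "TEST_USER_PASSWORD", "CLIENT_ID", "CLIENT_SECRET"]). A missing environment declaration then shows up as an explicit error naming the affected variables, instead of an authentication failure later on.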

FileNotFoundError: Github Actions Workflow fails when creating directory or file during test

I am using the GitHub Python application workflow for CI. My application creates a folder to store temporary files. It works perfectly when testing on localhost, but it fails to create a new directory in GitHub Actions. I get the error below:
@classmethod
def save_files(cls, files: list) -> str:
    """
    saves a list of files in the "files"
    folder in app
    :param files: list of FileStorage objects
    :return: directory name where files saved
    """
    folder = time.strftime("%Y%m%d-%H%M%S")
    folder_path = Path(__file__).parent / "files" / folder
    os.mkdir(folder_path)
E   FileNotFoundError: [Errno 2] No such file or directory: /home/runner/work/DocumentAnalysisTool/DocumentAnalysisTool/app/files/20200430-235749
Here is my workflow pythonapp.yml file:
name: Python application
on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.8
        uses: actions/setup-python@v1
        with:
          python-version: 3.8
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Lint with flake8
        run: |
          pip install flake8
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: Test with pytest
        run: |
          pip install pytest
          pytest
Thank you in advance
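A likely explanation, offered as an assumption since the repository layout is not shown: os.mkdir creates only the last path component, and Git does not track empty directories, so an empty app/files folder that exists locally would be absent from the checkout on the runner. A minimal sketch that creates the parent folders as needed (make_run_folder and its base argument are illustrative names):

```python
import time
from pathlib import Path


def make_run_folder(base: Path) -> Path:
    """Create base/files/<timestamp>, including missing parents."""
    folder = time.strftime("%Y%m%d-%H%M%S")
    folder_path = base / "files" / folder
    # parents=True creates "files" if absent; exist_ok=True tolerates
    # two calls within the same second.
    folder_path.mkdir(parents=True, exist_ok=True)
    return folder_path
```

In save_files, base would be Path(__file__).parent. An alternative that keeps os.mkdir unchanged would be to commit a placeholder file inside app/files so the directory exists after checkout.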
