Problem passing markdown template via gitlab projects api in a python script - python

I am trying to use the gitlab projects api to edit multiple project MR templates. The problem is, that it only sends the first line of the markdown template.
While messing around with the script, I was toying with converting it to html when I found that it sent the whole template when converted to html.
I am probably missing something super simple but for the life of me, I cant figure out why it would be able to send the entire template in html but only send the first line of it natively in markdown.
I have been searching for a solution for a bit now so I apologize if my googlefu missed an obvious answer here.
Here is the script...
#! /usr/bin/env python3
import argparse
import requests
gitlab_addr = "https://gitlab.com/api/v4"
# Insert your project IDs into the array below.
project_IDs = [xxxx, yyyy, zzzz]
# Insert your MR template info below.
with open('/.gitlab/merge_request_templates/DefaultMRTemplate.md', 'r') as file:
MR_template = file.read()
#print(MR_template)
def getArgs():
parser = argparse.ArgumentParser(
description='This tool updates the default template for a single '
'or multiple program\'s MRs. \n\nYou will need to edit '
'the script to input your MR template and projects IDs.'
'\nYou will also need to pass in your API Token via '
' command line.\n\nYou want to see "200 OK" on the '
' command line as confirmation.',
formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument("token", type=str,
help="API Token. Create one at User Settings / Access Tokens")
return parser.parse_args()
def ChangeTemplate():
token = getArgs().token
headers = {"PRIVATE-TOKEN": token, }
for x in project_IDs:
addr = f"{gitlab_addr}/projects/{x}/?merge_requests_template={MR_template}"
response = requests.put(addr, headers=headers)
# You want to see "200 OK" on the command line.
print(response.status_code, response.reason)
def main():
ChangeTemplate()
if __name__ == '__main__':
main()
Here is a sample template...
See guidance here: https://example.com/Gitlab+MR+Guide
## Description
%% Put a description here %%
%% Add an issue link here %%
## Tests
%% Include test listing here %%
## Checklists
**Author Checklist**
- [ ] A: Did you fill out the description, add an issue link (in title or desc) and fill out the test section?
- [ ] A: Add a peer to the MR
**Assignee 1 Checklist:**
- [ ] P: Verify the description field is filled out, issue link is included (in title or desc) and the test section is filled out
- [ ] P: Add a code owner to the MR
**Assignee 2 (Code Owner) Checklist:**
- [ ] O: Verify the description field is filled out, issue link is included (in title or desc) and the test section is filled out
- [ ] O: Verify unit test coverage is at least 40% line coverage with a goal of 90%
problem output...
See guidance here: https://example.com/Gitlab MR Guide

Your data needs to be properly encoded in the request. Trying to format the literal contents of the file into the query string won't work here.
Use the data keyword argument to requests.put, which will pass the data in the request body (or use params to set query params). requests will handle the proper encoding of the data.
addr = f"{gitlab_addr}/projects/{x}/"
payload = {'merge_requests_template': MR_template}
response = requests.put(addr, headers=headers, data=payload)
# or params=payload to use query string

Related

I am looking to create an API endpoint route that returns txt in a json format -Python

I'm new to developing and my question(s) involves creating an API endpoint in our route. The api will be used for a POST from a Vuetify UI. Data will come from our MongoDB. We will be getting a .txt file for our shell script but it will have to POST as a JSON. I think these are the steps for converting the text file:
1)create a list for the lines of the .txt
2)add each line to the list
3) join the list elements into a string
4)create a dictionary with the file/file content and convert it to JSON
This is my current code for the steps:
import json
something.txt: an example of the shell script ###
f = open("something.txt")
create a list to put the lines of the file in
file_output = []
add each line of the file to the list
for line in f:
file_output.append(line)
mashes all of the list elements together into one string
fileoutput2 = ''.join(file_output)
print(fileoutput2)
create a dict with file and file content and then convert to JSON
json_object = {"file": fileoutput2}
json_response = json.dumps(json_object)
print(json_response)
{"file": "Hello\n\nSomething\n\nGoodbye"}
I have the following code for my baseline below that I execute on my button press in the UI
#bp_customer.route('/install-setup/<string:customer_id>', methods=['POST'])
def install_setup(customer_id):
cust = Customer()
customer = cust.get_customer(customer_id)
### example of a series of lines with newline character between them.
script_string = "Beginning\nof\nscript\n"
json_object = {"file": script_string}
json_response = json.dumps(json_object)
get the install shell script content
replace the values (somebody has already done this)
attempt to return the below example json_response
return make_response(jsonify(json_response), 200)
my current Vuetify button press code is here: so I just have to ammend it to a POST and the new route once this is established
onClickScript() {
console.log("clicked");
axios
.get("https://sword-gc-eadsusl5rq-uc.a.run.app/install-setup/")
.then((resp) => {
console.log("resp: ", resp.data);
this.scriptData = resp.data;
});
},
I'm having a hard time combining these 2 concepts in the correct way. Any input as to whether I'm on the right path? Insight from anyone who's much more experienced than me?
You're on the right path, but needlessly complicating things a bit. For example, the first bit could be just:
import json
with open("something.txt") as f:
json_response = json.dumps({'file': f.read()})
print(json_response)
And since you're looking to pass everything through jsonify anyway, even this would suffice:
with open("something.txt") as f:
data = {'file': f.read()}
Where you can pass data directly through jsonify. The rest of it isn't sufficiently complete to offer any concrete comments, but the basic idea is OK.
If you have a working whole, you could go to https://codereview.stackexchange.com/ to ask for some reviews, you should limit questions on StackOverflow to actual questions about getting something to work.

Using python and suds, data not read by server side because element is not defined as an array

I am a very inexperienced programmer with no formal education. Details will be extremely helpful in any responses.
I have made several basic python scripts to call SOAP APIs, but I am running into an issue with a specific API function that has an embedded array.
Here is a sample excerpt from a working XML format to show nested data:
<bomData xsi:type="urn:inputBOM" SOAP-ENC:arrayType="urn:bomItem[]">
<bomItem>
<item_partnum></item_partnum>
<item_partrev></item_partrev>
<item_serial></item_serial>
<item_lotnum></item_lotnum>
<item_sublotnum></item_sublotnum>
<item_qty></item_qty>
</bomItem>
<bomItem>
<item_partnum></item_partnum>
<item_partrev></item_partrev>
<item_serial></item_serial>
<item_lotnum></item_lotnum>
<item_sublotnum></item_sublotnum>
<item_qty></item_qty>
</bomItem>
</bomData>
I have tried 3 different things to get this to work to no avail.
I can generate the near exact XML from my script, but a key attribute missing is the 'SOAP-ENC:arrayType="urn:bomItem[]"' in the above XML example.
Option 1 was using MessagePlugin, but I get an error because my section is like the 3 element and it always injects into the first element. I have tried body[2], but this throws an error.
Option 2 I am trying to create the object(?). I read a lot of stack overflow, but I might be missing something for this.
Option 3 looked simple enough, but also failed. I tried setting the values in the JSON directly. I got these examples by an XML sample to JSON.
I have also done a several other minor things to try to get it working, but not worth mentioning. Although, if there is a way to somehow do the following, then I'm all ears:
bomItem[]: bomData = {"bomItem"[{...,...,...}]}
Here is a sample of my script:
# for python 3
# using pip install suds-py3
from suds.client import Client
from suds.plugin import MessagePlugin
# Config
#option 1: trying to set it as an array using plugin
class MyPlugin(MessagePlugin):
def marshalled(self, context):
body = context.envelope.getChild('Body')
bomItem = body[0]
bomItem.set('SOAP-ENC:arrayType', 'urn:bomItem[]')
URL = "http://localhost/application/soap?wsdl"
client = Client(URL, plugins=[MyPlugin()])
transact_info = {
"username":"",
"transaction":"",
"workorder":"",
"serial":"",
"trans_qty":"",
"seqnum":"",
"opcode":"",
"warehouseloc":"",
"warehousebin":"",
"machine_id":"",
"comment":"",
"defect_code":""
}
#WIP - trying to get bomData below working first
inputData = {
"dataItem":[
{
"fieldname": "",
"fielddata": ""
}
]
}
#option 2: trying to create the element here and define as an array
#inputbom = client.factory.create('ns3:inputBOM')
#inputbom._type = "SOAP-ENC:arrayType"
#inputbom.value = "urn:bomItem[]"
bomData = {
#Option 3: trying to set the time and array type in JSON
#"#xsi:type":"urn:inputBOM",
#"#SOAP-ENC:arrayType":"urn:bomItem[]",
"bomItem":[
{
"item_partnum":"",
"item_partrev":"",
"item_serial":"",
"item_lotnum":"",
"item_sublotnum":"",
"item_qty":""
},
{
"item_partnum":"",
"item_partrev":"",
"item_serial":"",
"item_lotnum":"",
"item_sublotnum":"",
"item_qty":""
}
]
}
try:
response = client.service.transactUnit(transact_info,inputData,bomData)
print("RESPONSE: ")
print(response)
#print(client)
#print(envelope)
except Exception as e:
#handle error here
print(e)
I appreciate any help and hope it is easy to solve.
I have found the answer I was looking for. At least a working solution.
In any case, option 1 worked out. I read up on it at the following link:
https://suds-py3.readthedocs.io/en/latest/
You can review at the '!MessagePlugin' section.
I found a solution to get message plugin working from the following post:
unmarshalling Error: For input string: ""
A user posted an example how to crawl through the XML structure and modify it.
Here is my modified example to get my script working:
#Using MessagePlugin to modify elements before sending to server
class MyPlugin(MessagePlugin):
# created method that could be reused to modify sections with similar
# structure/requirements
def addArrayType(self, dataType, arrayType, transactUnit):
# this is the code that is key to crawling through the XML - I get
# the child of each parent element until I am at the right level for
# modification
data = transactUnit.getChild(dataType)
if data:
data.set('SOAP-ENC:arrayType', arrayType)
def marshalled(self, context):
# Alter the envelope so that the xsd namespace is allowed
context.envelope.nsprefixes['xsd'] = 'http://www.w3.org/2001/XMLSchema'
body = context.envelope.getChild('Body')
transactUnit = body.getChild("transactUnit")
if transactUnit:
self.addArrayType('inputData', 'urn:dataItem[]', transactUnit)
self.addArrayType('bomData', 'urn:bomItem[]', transactUnit)

Google Document Ai giving different outputs for the same file

I was using Document OCR API to extract text from a pdf file, but part of it is not accurate. I found that the reason may be due to the existence of some Chinese characters.
The following is a made-up example in which I cropped part of the region that the extracted text is wrong and add some Chinese characters to reproduce the problem.
When I use the website version, I cannot get the Chinese characters but the remaining characters are correct.
When I use Python to extract the text, I can get the Chinese characters correctly but part of the remaining characters are wrong.
The actual string that I got.
Are the versions of Document AI in the website and API different? How can I get all the characters correctly?
Update:
When I print the detected_languages (don't know why for lines = page.lines, the detected_languages for both lines are empty list, need to change to page.blocks or page.paragraphs first) after printing the text, I get the following output.
Code:
from google.cloud import documentai_v1beta3 as documentai
project_id= 'secret-medium-xxxxxx'
location = 'us' # Format is 'us' or 'eu'
processor_id = 'abcdefg123456' # Create processor in Cloud Console
opts = {}
if location == "eu":
opts = {"api_endpoint": "eu-documentai.googleapis.com"}
client = documentai.DocumentProcessorServiceClient(client_options=opts)
def get_text(doc_element: dict, document: dict):
"""
Document AI identifies form fields by their offsets
in document text. This function converts offsets
to text snippets.
"""
response = ""
# If a text segment spans several lines, it will
# be stored in different text segments.
for segment in doc_element.text_anchor.text_segments:
start_index = (
int(segment.start_index)
if segment in doc_element.text_anchor.text_segments
else 0
)
end_index = int(segment.end_index)
response += document.text[start_index:end_index]
return response
def get_lines_of_text(file_path: str, location: str = location, processor_id: str = processor_id, project_id: str = project_id):
# You must set the api_endpoint if you use a location other than 'us', e.g.:
# opts = {}
# if location == "eu":
# opts = {"api_endpoint": "eu-documentai.googleapis.com"}
# The full resource name of the processor, e.g.:
# projects/project-id/locations/location/processor/processor-id
# You must create new processors in the Cloud Console first
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
# Read the file into memory
with open(file_path, "rb") as image:
image_content = image.read()
document = {"content": image_content, "mime_type": "application/pdf"}
# Configure the process request
request = {"name": name, "raw_document": document}
result = client.process_document(request=request)
document = result.document
document_pages = document.pages
response_text = []
# For a full list of Document object attributes, please reference this page: https://googleapis.dev/python/documentai/latest/_modules/google/cloud/documentai_v1beta3/types/document.html#Document
# Read the text recognition output from the processor
print("The document contains the following paragraphs:")
for page in document_pages:
lines = page.blocks
for line in lines:
block_text = get_text(line.layout, document)
confidence = line.layout.confidence
response_text.append((block_text[:-1] if block_text[-1:] == '\n' else block_text, confidence))
print(f"Text: {block_text}")
print("Detected Language", line.detected_languages)
return response_text
if __name__ == '__main__':
print(get_lines_of_text('/pdf path'))
It seems the language code is wrong, will this affect the result?
Posting this Community Wiki for better visibility.
One of features of DocumentAI is OCR - Optical Character Recognition which allows recognizing text from various files.
OP in this scenario received difference outputs using Try it function and Client Libraries - Python.
Why are there discrepancies between Try it and Python library?
It's hard to say as both methods use the same API documentai_v1beta3. It might be related to some files modifications when pdf is uploading to Try it Demo, different endpoints, language alphabet recognition or some random stuff.
When you are using Python Client you also get accuracy % of text identification. Below examples from my testes:
However, OP's identification is about 0,73 so it might get wrong results and in this situation is a visible issue. I guess it cannot be anyhow improved using code. Maybe if there would be different quality of PDF (in shown OPs example there are some dots which might affect identification).

Run pythod code with Django and produce output on web

I know this could be a repeated question for many of you but I have not been able to find a proper answer for this yet. I am a beginner to Django and Python. I have a python code which runs and produce output on cli at present but I want the same program to run its output on web.
I read that for web django is best suitable framework and for this purpose I started to study django. I see in every tutorial people have discussed apps, views urls etc but not seen an example which integrate a python code with django.
All I am looking for to understand how can I integrate my python script with Django and where do I place my code in Django project or app. Should I import it within views? if yes, then how to present my output to web.
Here is the sample code I am running, it basically opens two files and run some regex to extract the desired information.
import re
def vipPoolFileOpen(): # function opens vip and pool config file and store them to vip_config and pool_config variables
with open("pool_config.txt",'rb') as pool_config:
pool_config = pool_config.read()
pool_config = pool_config.split('ltm')
with open("vip_config.txt",'rb') as vip_config:
vip_config = vip_config.read()
vip_config = vip_config.split('ltm')
return vip_config,pool_config
def findWidth(vip_config): # function to find the maximum length of vip in entire file, this will be used to adjust column space
colWidth=0
for item in vip_config:
i=0
if colWidth<len(item):
while i<len(vip_config)-1:
if len(item)>=len(vip_config[i+1]):
colWidth=len(item)
i=i+1
else:
i+=1
continue
return colWidth
def regexFunction():
vip_config, pool_config = vipPoolFileOpen()
findWidth(vip_config)
for vip in vip_config:
regVip = re.compile(r'pool (.+)\r')
poolByVip = regVip.findall(vip) # poolByVip holds pool name from the vip_config file
for poolblock in pool_config:
regPool = re.compile(r'pool (.+) {')
poolByConfig = regPool.findall(poolblock)
if poolByVip == poolByConfig:
print vip + poolblock
break
elif poolByVip == ['none']:
print vip
break
else:
continue
Yes, you should present your output to the web via view. You need to write a view function (or a class view) in views.py and provide an url where you want to have it in urls.py
If you rewrite your function to return desired result instead of printing it, tou can do the following:
write this in views.py
from django.http import HttpResponse
from wherever_you_have_it import regexFunction
def bar(request):
result = regexFunction() # result should be a string
return HttpResponse(result)
and in urls.py:
from .views import bar
urlpatterns = [
url(r'^foo$', bar),
]
Providing of course you have created your Django app at the first place.
Your result should be displayed as plain text on address localhost:8000/foo - but you need to:
python menage.py runserver
In your terminal first
And of course feel free to look at:
https://github.com/Ergaro/CheckMyChords
to see how a simple django app looks like

Extracting only the body of email using Python

I am using Python to read the Enron email dataset. I have the emails in text files. I would like to read the text files and extract only the "Body" section of each email. I am not concerned about any other FROM, TO, BCC, attachments, DATE, etc. I only want the BODY section and would like to store it in a list. I tried to use the get_payload() function, but it still prints everything. How do I skip the other content and use only the Body section?
import email.parser
from email.parser import Parser
# Code to extract a particular section from raw emails.
parser = Parser()
text1 = open("path of the file", "r").read()
msg = email.message_from_string(text1)
email = parser.parsestr(text1)
if msg.is_multipart():
for payload in msg.get_payload():
print payload.get_payload()
else:
print msg.get_payload()
One file may contain multiple emails. Sample emails.
docID: 1
segmentNumber: 0
Body: I just checked with Carolyn on your invoicing for the conference. She
verified the 85K was processed.
##########################################################
docID: 2
segmentNumber: 0
Body: null
##########################################################
docID: 3
segmentNumber: 0
Body: In regard to the costs for the GAM conference, Karen told me the $ 6,695.97
figure was inclusive of all the items for the conference. However, after
speaking with Shweta, I found out this is not the case. The CDs are not
included in this figure.
The CD cost will be $2,011.50 + the cost of postage/handling (which is
currently being tabulated).
##########################################################
docID: 3
segmentNumber: 1
Body:
This is the original quote for this project and it did not include the
postage. As soon as I have the details from the vendor, I'll forward those to
you.
Please call me if you have any questions.
Assuming all your files has the format specified in your example, this might work:
email_body_list = [ email.split('Body:')[-1] for email in file_content.split('##########################################################')]

Categories

Resources