I'm accessing the content of a file using GitLab's File Repository API, and all seems to be going fine until I hit one specific file (any text file).
{:file_name=>"requirements.txt", :file_path=>"requirements.txt", :size=>30, :encoding=>"base64", :content=>"ImZsYXNrIgoidXdzZ2kiCiJnb2dvc2VjcmV0cyIK", :ref=>"master", :blob_id=>"7f551dddd4fd8931419450fb357bad83d2efbe6a", :commit_id=>"40e161fcb323be28259712a6cf5da8fddfda80e1", :last_commit_id=>"40e161fcb323be28259712a6cf5da8fddfda80e1"}
I have never seen colons before keys, and have never seen '=>' in a JSON object. All other JSON objects returned have been fine. The request being made to the API isn't wrong, because a 200 response is being returned.
What is this?
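For what it's worth, the content field itself decodes fine once pulled out; only the envelope format is strange. A quick check (Python used here just for illustration):

import base64

content = "ImZsYXNrIgoidXdzZ2kiCiJnb2dvc2VjcmV0cyIK"
print(base64.b64decode(content).decode())
# "flask"
# "uwsgi"
# "gogosecrets"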
This appears to be an open bug in Gitlab. See https://gitlab.com/gitlab-org/gitlab-ee/issues/2298.
What is the current bug behavior?
Response:
{:file_path=>"Readme.txt", :branch=>"master"}
What is the expected correct behavior?
Response:
{"file_name": "Readme.txt", "branch": "master"}
The issue is still open on GitLab (issue 2298).
I encountered the same problem.
I can confirm that only .txt files are affected. Files with extensions unknown to Git are not affected (e.g. .myFile, .unknownExtension, ...).
This can be detected with the Content-Type header, which is set to text/plain when GitLab sends the response as a Ruby hash.
Here is a simple function to parse such a Ruby hash in PHP.
function ruby_hash_decode($ruby)
{
    // Only rewrite strings that look like a Ruby hash, e.g. {:key=>"value", ...}
    if (substr($ruby, 0, 2) === "{:") {
        $ruby = str_replace('{:', '{"', $ruby);
        $ruby = str_replace('", :', '", "', $ruby);
        $ruby = str_replace(', :', ', "', $ruby);
        $ruby = str_replace('=>"', '":"', $ruby);
        $ruby = str_replace('=>', '":', $ruby);
        $ruby = \json_decode($ruby);
    }
    return $ruby;
}
Make a Ruby-hash-to-Python-dict (or JSON) converter. Note: the solution below is a quick and dirty one I made in two minutes. You can do much better if you use a regex (a sketch follows below).
s = '{:file_path=>"Readme.txt", :branch=>"master"}'
s = s[1:-1]
s = s.split(',')
s = [elem.strip() for elem in s]
s = [subs[1:] for subs in s]
s = [subs.replace('"', '') for subs in s]
my_dict = {elem.split('=>')[0]: elem.split('=>')[1] for elem in s}
my_dict looks like this now: {'file_path': 'Readme.txt', 'branch': 'master'}
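As a rough illustration of the regex route, here is a hedged sketch (my addition, not part of the original answer); it only handles flat hashes with symbol keys and simple values, like the ones GitLab returns here:

import json
import re

def ruby_hash_to_dict(ruby):
    # Convert a flat Ruby hash literal such as
    # {:file_path=>"Readme.txt", :branch=>"master"} into a Python dict.
    as_json = re.sub(r':(\w+)\s*=>', r'"\1": ', ruby)  # :key=> becomes "key":
    return json.loads(as_json)

print(ruby_hash_to_dict('{:file_path=>"Readme.txt", :branch=>"master"}'))
# {'file_path': 'Readme.txt', 'branch': 'master'}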
EDIT 3:
So the problem likely lies in the setup and configuration of my Lambda Layer dependencies (see the path check sketched at the end of this edit). I have a /bin directory containing 3 files:
lambdazip.sh
pdftk
libgcj.so.10
pdftk is a PDF library, and libgcj is a dependency of PDFtk.
lambdazip.sh seems to set and modify PATH variables.
I have tried uploading all 3 as 1 lambda layer.
I have tried uploading all 3 as 3 separate lambda layers.
I have not tried customizing the .zip file names; I know the Lambda Layer sometimes wants the .zip file to have a specific name depending on the language.
I have not tried customizing the "compatible architectures" and "compatible runtime" Lambda Layer settings.
EDIT 2:
I tried renaming the Lambda Layer zip to Python.zip because I heard that sometimes you need a specific naming convention for the Lambda Layer to work correctly. This also failed and produced the same error.
EDIT:
I have tried pulling the .py files out of the /surveys directory, so when they are zipped they are in the root folder, but I still receive the same error: Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'surveys'
Which files do I need to zip? Do I need to move certain files to the root?
I learned that accidentally zipping the directory itself is a common cause of this error.
The common solution is to zip the contents of the directory instead (see the sketch below).
Unfortunately, this did not work for me.
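For reference, a minimal sketch (my illustration, with hypothetical paths) of zipping the contents of a directory so that lambda_function.py ends up at the root of the archive:

import os
import zipfile

def zip_directory_contents(src_dir, zip_path):
    # Zip the files *inside* src_dir (not src_dir itself), so that
    # lambda_function.py sits at the root of the resulting archive.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # arcname is the path relative to src_dir, which keeps
                # sub-packages like surveys/ intact but drops the Archive/ prefix.
                zf.write(full_path, arcname=os.path.relpath(full_path, src_dir))

zip_directory_contents("Archive", "function.zip")  # hypothetical paths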
I have a Lambda Function, and the code I have uploaded is a zipped folder of my /Archive directory.
From what I understand, many of the people who run into this "[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function':" have issues because of their Lambda Handler.
My Lambda handler is: lambda_function.lambda_handler so this doesn't appear to be my issue.
Another common problem I've noticed on Stack Overflow appears to be with how people are compressing and zipping the files they upload to the Lambda Function.
Do I need to move my lambda_function.py? Sometimes this CloudWatch error occurs because the lambda_function.py is not in the ROOT directory.
Does my survey directory need to move?
I think the folders & directories I have here may be causing my issue.
Do I need to zip the directories individually?
Can I resolve this error by Zipping the entire project?
For more information, I also have a Lambda Layer for PDF Toolkit, called pyPDFtk in the codebase. In that Lambda layer is a zipped /bin with binaries inside.
If there is anything I can alter/change within my code or AWS configuration, please let me know, and I can return new CloudWatch error logs for you.
lambda_function.py
"""
cl_boost-pdfgen manages form to
pdf merge and mail
"""
import json, base64
import os, sys
from string import Template
from boost import PageCalc, AwsWrapper
from boost.tools import helper
from boost.surveys import ALLOWED_SURVEYS
os.environ['LAMBDA_TASK_ROOT'] = os.environ['LAMBDA_TASK_ROOT'] if 'LAMBDA_TASK_ROOT' in os.environ else '/usr/local'
os.environ['PDFTK_PATH'] = os.environ['LAMBDA_TASK_ROOT'] + '/bin/pdftk'
os.environ['LD_LIBRARY_PATH'] = os.environ['LAMBDA_TASK_ROOT'] + '/bin'
# must import after setting env vars for pdftk
import pypdftk
# Constants
BUCKET_NAME = os.environ['BUCKET_NAME'] if 'BUCKET_NAME' in os.environ else 'cl-boost-us-east-1-np'
RAW_MESSAGE = Template(b'''From: ${from}
To: ${to}
Subject: MySteadyMind Survey results for ${subjname}
MIME-Version: 1.0
Content-type: Multipart/Mixed; boundary = "NextPart"
--NextPart
Content-Type: multipart/alternative; boundary="AlternativeBoundaryString"
--AlternativeBoundaryString
Content-Type: text/plain;charset="utf-8"
Content-Transfer-Encoding: quoted-printable
See attachment for MySteadyMind report on ${subjname}
--AlternativeBoundaryString
Content-Type: text/html;charset="utf-8"
Content-Transfer-Encoding: quoted-printable
<html>
<body>=0D
<p>See attachment for MySteadyMind Report on </b> ${subjname} </b>.</p>=0D
</body>=0D
</html>=0D
--AlternativeBoundaryString--
--NextPart
Content-type: application / pdf
Content-Type: application/pdf;name="${filename}"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;filename="${filename}"
${pdfdata}
--NextPart--''')
EMAIL_TAKER = True
#DEFAULT_EMAIL = os.environ['DEFAULT_EMAIL'] if 'DEFAULT_EMAIL' in os.environ else 'support@mysteadymind.com'
DEFAULT_EMAIL = os.environ['DEFAULT_EMAIL'] if 'DEFAULT_EMAIL' in os.environ else 'marshall@volociti.com'
SUBJECT = 'Evaluation for %s'
NAME_PATH = ['Entry', 'Name']
#EXTRA_EMAILS = os.environ['EXTRA_EMAILS'].split(",") if 'EXTRA_EMAILS' in os.environ else ['seth@mysteadymind.com']
EXTRA_EMAILS = os.environ['EXTRA_EMAILS'].split(",") if 'EXTRA_EMAILS' in os.environ else ['marshall@volociti.com']
# Lambda response
def respond(err, res=None):
    """
    parameters are expected to either be
    None or valid JSON ready to go.
    :param err:
    :param res:
    :return:
    """
    return {
        'statusCode': '400' if err else '200',
        'body': err if err else res,
        'headers': {
            'Content-Type': 'application/json',
        },
    }
def check_basic_auth(headers):
    """
    pull out the auth header and validate.
    :param headers:
    :return:

    # Retrieve values from env
    vid = os.environ['uid']
    vpw = os.environ['pwd']
    encoded = "Basic " + base64.b64encode("%s:%s" % (vid, vpw))
    # compare
    return headers['Authorization'] == encoded
    """
    return True
def lambda_handler(event, context):
    """
    receive JSON, produce PDF, save to S3,
    email via SES.... bring it!
    """
    err = None
    rsp = None
    # Have to None out addresses for future lambda runs to not cause issues with appending.
    ADDRESSES = None
    ADDRESSES = {'from': "marshall@volociti.com",
                 'to': [DEFAULT_EMAIL] + EXTRA_EMAILS}
    """
    ADDRESSES = {'from': "support@mysteadymind.com",
                 'to': [DEFAULT_EMAIL] + EXTRA_EMAILS}
    """
    # check auth
    if not check_basic_auth(event['headers']):
        print("Failed to authenticate")
        return False
    # get data
    data = json.loads(event['body'])
    # make sure it's legit
    if (data['Form']['InternalName'] not in ALLOWED_SURVEYS):
        return False
    # read in template and prep survey type and score it
    pcalc = PageCalc(data, data['Form']['InternalName'])
    pcalc.score()
    pcalc.flat['Name'] = data['Section']['FirstName'] + \
        " " + data['Section']['LastName']
    # output pdf to temp space
    # baseName = str(data['Entry']['Number']) + "-" + pcalc.survey + "-" + \
    #     data['Section']['LastName'].replace(' ','') + ".pdf"
    baseName = str(data['Entry']['Number']) + "-MySteadyMind-" + \
        data['Section']['LastName'].replace(' ','') + ".pdf"
    filename = "/tmp/" + baseName
    pypdftk.fill_form(pcalc.pdf_path, pcalc.flat, out_file=filename)
    # -- Post Processing after PDF Generation -- #
    # fetch the client wrapper(s)
    aws = AwsWrapper()
    # get PDF data and prep for email
    try:
        # save the pdf to S3
        print("save %s to S3" % filename)
        aws.save_file(BUCKET_NAME, pcalc.survey, filename)
        # read in the pdf file data and
        # base64 encode it
        buff = open(filename, "rb")
        pdfdata = base64.b64encode(buff.read())
        buff.close()
        ADDRESSES['to'].append(data['Section']['Email']) if EMAIL_TAKER else None
        # gather data needed for email body
        data = {"from": ADDRESSES['from'],
                "to": ', '.join(ADDRESSES['to']),
                "subjname": pcalc.flat["Name"],
                "filename": baseName,
                "pdfdata": pdfdata
                }
        print("sending email via SES to: %s" % ', '.join(ADDRESSES['to']))
        # build MMM email and send via SES
        response = aws.send_raw_mail(ADDRESSES['from'],
                                     ADDRESSES['to'],
                                     RAW_MESSAGE.substitute(data))
        # send JSON response
        rsp = '{"Code": 200, "Message": "%s"}' % response
    except Exception as ex:
        # error trap for all occasions
        errmsg = "Exception Caught: %s" % ex
        # notify local log
        print(errmsg)
        # and lambda response
        err = '{"Code":500, "Message":"%s"}' % errmsg
    # done
    return respond(err, rsp)


if __name__ == '__main__':
    # use this option to manually generate from raw csv of cognitoforms
    if len(sys.argv) > 1:
        import csv
        with open(sys.argv[1], 'rU') as csvfile:
            csvreader = csv.DictReader(csvfile, delimiter=',',)
            for row in csvreader:
                jsondata = helper.create_json(row)
                fakeevent = {'body': json.dumps(jsondata), "headers": []}
                lambda_handler(fakeevent, None)
    # use this option to manually generate from raw json webhook response from cognito in a dir called generated/
    else:
        rng = range(3, 4)
        print(rng)
        print("Attempting to parse files: " + str(rng))
        for i in rng:
            try:
                print('./generated/queue/' + str(i) + '.json')
                f = open('./generated/queue/' + str(i) + '.json', 'r')
                jsondata = f.read().replace('\n', '')
                f.close()
                # jdata = json.loads(jsondata)
                fakeevent = {'body': jsondata, "headers": []}
                lambda_handler(fakeevent, None)
            except:
                print("error. file not found: " + str(i))
lambdazip.sh
#!/bin/bash
PYTHON_PATH=$VIRTUAL_ENV/..
BASE_PATH=$PWD
# add the pypdftk package from the virtualenv's site-packages, skipping .pyc files
cd $VIRTUAL_ENV/lib/python3.9/site-packages/
zip -x "*pyc" -r9 $BASE_PATH/dist/cl-boost.zip pypdftk*
cd $BASE_PATH
# add the bin/ directory (pdftk binary and libgcj)
zip -r $BASE_PATH/dist/cl-boost.zip bin
cd $BASE_PATH
# add the boost package(s) and the pdf_surveys directory, skipping .pyc files
zip -x "*pyc" -r9 $BASE_PATH/dist/cl-boost.zip boost* pdf_surveys
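One more hedged sanity check (mine, not part of the original build script): listing the archive before uploading makes it obvious whether lambda_function.py sits at the root or is buried inside a folder:

import zipfile

# Hypothetical path: the artifact produced by lambdazip.sh above.
with zipfile.ZipFile("dist/cl-boost.zip") as zf:
    for name in zf.namelist():
        print(name)
# For the handler "lambda_function.lambda_handler" to resolve, lambda_function.py
# must appear at the top level of whichever zip is uploaded as the function code.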
I tried to replicate the issue, but it all works as expected. My setup was (Lambda with Python 3.9):
It seems to me that either your directory structure is not what you posted in the question, or your real code differs from what you present here on SO.
Considering Marcin could not reproduce the error: this error might be connected to multiple installations and their libraries. If the wrong one is on your path variables, the correct one cannot be found.
My code was written in Python 2.7, a deprecated language for AWS Lambda.
While updating the code to Python 3.9, I learned that I also need to update my dependencies, binaries & AWS Lambda Layers.
The binary that I am using for Python PDF Toolkit (PDFtk) is also outdated and incompatible.
So I needed to launch a new EC2 instance of CentOS 6 to produce the binary.
This has come with its own problems, but those details are best for another thread.
Mainly, CentOS 6 is EOL and the instructions for producing the binary no longer work.
I needed to change the base URL in the repo files under /etc/yum.repos.d/ so they point at vault.centos.org.
Someone I know wanted a program that lists the Google Drive IDs of all the files in a certain folder. So I looked at the Google website, learned a bit about how to do it in Python, and came up with a program. The problem is that there seems to be a limit, but one I can't quite understand.
If I use this code:
credentials = get_credentials()
http = credentials.authorize(httplib2.Http())
service = discovery.build('drive', 'v2', http=http)
par = {'maxResults':1000}
results = service.files().list(**par).execute()
items = results.get('items', [])
folders = {}
if not items:
    print('No files found.')
else:
    print("There is " + str(len(items)) + " items")
I took get_credentials() from a sample on the Google developer website. The thing is, I know for sure that the Drive I am accessing has more than 1000 files, yet the program tells me "There is 460 items". Is there any logic that would make it stop at 460?
I don't do Python, but I think you're missing the 'nextPage' logic. See, (at least in Java) the list returns one 'page' of results at a time, so there has to be another loop that retrieves the next page (as long as one is available), like:
void list(String prnId) {
    String qryClause = "'me' in owners and '" + prnId + "' in parents";
    Drive.Files.List qry = mGOOSvc.files().list()
        .setQ(qryClause)
        .setFields("items, nextPageToken");
    String npTok = null;
    do {
        FileList gLst = qry.execute();
        if (gLst != null) {
            // BINGO - file list, but only one page worth
            npTok = gLst.getNextPageToken(); // GET POINTER TO THE NEXT PAGE
            qry.setPageToken(npTok);
        }
    } while (npTok != null && npTok.length() > 0);
}
See the 'do while()' loop and the 'BINGO' comment? Try to implement it in your Python code (and don't forget to ask for 'nextPageToken' in the fields so you actually get it).
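In Python, a hedged sketch of the same loop, building on the setup code from the question (parameter names per the Drive v2 files.list reference; treat it as an outline rather than tested code):

items = []
page_token = None
while True:
    par = {'maxResults': 1000}
    if page_token:
        par['pageToken'] = page_token
    results = service.files().list(**par).execute()
    items.extend(results.get('items', []))
    page_token = results.get('nextPageToken')
    if not page_token:  # no more pages left
        break
print("There are " + str(len(items)) + " items")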
Also, there is a great tool, TryIt! on the bottom of this page, where you can test all kinds of stuff.
Good Luck
I have the same issue, I keep receiving 460 results even if I set maxResults to 1000...
According to the documentation: "maxResults: Maximum number of files to return. Acceptable values are 0 to 1000, inclusive. (Default: 100)"
https://developers.google.com/drive/v2/reference/files/list
I'm trying to set the mac address of the eth0 interface of a system on my cobbler server using the xmlrpcapi.
I can set simple fields like "comment", but I can't seem to set mac address, probably because I don't know the path to refer to. So this works:
handle = server.get_system_handle(system, token)
server.modify_system(handle, 'comment', 'my comment', token)
server.save_system(handle, token)
But if I want to set interfaces['eth0']['mac_address'], what property name do I use?
Found an example in the documentation that happens to show the creation of a new system:
server.modify_system(handle, 'modify_interface', {
    'macaddress-eth0': args.mac
}, token)
I'm still not sure of a generic way to determine what the path is to the various properties, though; I just got lucky with this example.
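One hedged idea (my assumption, not something I have confirmed against the Cobbler docs): dump an existing system over the same XML-RPC connection and inspect the keys it exposes, since the read side returns a plain dict that includes the interfaces sub-dictionary:

# Hedged sketch: assumes `server`, `system` and `token` are set up as in the question,
# and that a read-side get_system call is available on this Cobbler version.
data = server.get_system(system)
for key, value in sorted(data.items()):
    print(key, '=', value)
# NIC settings show up nested under data['interfaces']['eth0'], even though
# writes go through 'modify_interface' with flattened keys like 'macaddress-eth0'.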
I actually had to work around this same issue when developing the prov utility we use internally at Wolfram. I'm not sure why Cobbler's data representation isn't bidirectional. I effectively do the following:
system_name = '(something)'  # The name of the system.
system_data = {}  # The desired final state of the system data here.
# Pull out the interfaces dictionary.
if 'interfaces' in system_data:
    interfaces = system_data.pop('interfaces')
else:
    interfaces = {}
# Apply the non-interfaces data.
cobbler_server.xapi_object_edit('systems', system_name, 'edit', system_data, self.token)
# Apply interface-specific data.
handle = cobbler_server.get_system_handle(system_name, self.token)
ninterfaces = {}
for iname, ival in interfaces.items():
    for k, v in ival.items():
        if k in ['dns_name', 'ip_address', 'mac_address']:
            if v:
                ninterfaces[k.replace('_', '') + '-' + iname] = v
        else:
            ninterfaces[k + '-' + iname] = v
cobbler_server.modify_system(
    handle,
    'modify_interface',
    ninterfaces,
    self.token
)
cobbler_server.save_system(handle, self.token)
This isn't my code, it is a module I found on the internet which performs (or is supposed to perform) the task I want.
print '{'
for page in range(1, 4):
    rand = random.random()
    id = str(long(rand * 1000000000000000000))
    query_params = {'q': 'a',
                    'include_entities': 'true', 'lang': 'en',
                    'show_user': 'true',
                    'rpp': '100', 'page': page,
                    'result_type': 'mixed',
                    'max_id': id}
    r = requests.get('http://search.twitter.com/search.json',
                     params=query_params)
    tweets = json.loads(r.text)['results']
    for tweet in tweets:
        if tweet.get('text'):
            print tweet
print '}'
print
The Python shell seems to indicate that the error is on line 1. I know very little Python, so I have no idea why it isn't working.
This snippet is written for Python 2.x, but you are running it under Python 3.x, where print is now a proper function. Replace print SomeExpr with print(SomeExpr) to solve this.
Here's a detailed description of this difference (along with other changes in 3.x).
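As a small hedged illustration (mine, not from the original answer) of what that change looks like for the statements in the snippet; note that long() would also have to become int() under Python 3, and the old search.twitter.com endpoint itself no longer exists:

# Python 2 statement syntax used by the snippet:
#     print '{'
#     print tweet
#     print
# Python 3 function-call syntax:
print('{')
print({'text': 'example tweet'})  # hypothetical value, just to exercise the call
print('}')
print()  # a bare `print` becomes `print()`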
I've been trying to unpickle some dictionaries from the database. I've reverted to using the marshal module, but was still wondering why pickle is having such a difficult time unserializing some data. Here is a command line python session showing essentially what I am trying to do:
>>> a = {'service': 'amazon', 'protocol': 'stream', 'key': 'lajdfoau09424jojf.flv'}
>>> import pickle; import base64
>>> pickled = base64.b64encode(pickle.dumps(a))
>>> pickled
'KGRwMApTJ3Byb3RvY29sJwpwMQpTJ3N0cmVhbScKcDIKc1Mna2V5JwpwMwpTJ2xhamRmb2F1MDk0MjRqb2pmLmZsdicKcDQKc1Mnc2VydmljZScKcDUKUydhbWF6b24nCnA2CnMu'
>>> unpickled = pickle.loads(base64.b64decode(pickled))
>>> unpickled
{'protocol': 'stream', 'service': 'amazon', 'key': 'lajdfoau09424jojf.flv'}
>>> unpickled['service']
'amazon'
This all works fine, but when I try this inside a factory method for a class, the pickle.loads part seems to error out. The strings I am trying to load are pickled the same way as above. I've even tried copying the exact string pickled in the command-line session above and just unpickling that, but with no success. Here is the code for this latter attempt:
class Resource:
    _service = 'unknown'
    _protocol = 'unknown'
    _key = 'unknown'

    '''
    Factory method that creates an appropriate instance of one of Resource’s subclasses based on
    the type of data provided (the data being a serialized dictionary with at least the keys 'service',
    'protocol', and 'key').
    @param resource_data (string) -- the data used to create the new Resource instance.
    '''
    @staticmethod
    def resource_factory(resource_data):
        # Unpack the raw resource data and then create the appropriate Resource instance and return.
        resource_data = "KGRwMApTJ3Byb3RvY29sJwpwMQpTJ3N0cmVhbScKcDIKc1Mna2V5JwpwMwpTJ2xhamRmb2F1MDk0MjRqb2pmLmZsdicKcDQKc1Mnc2VydmljZScKcDUKUydhbWF6b24nCnA2CnMu"  # hack to just see if we can unpickle this string
        logging.debug("Creating resource: " + resource_data)
        unencoded = base64.b64decode(resource_data)
        logging.debug("Unencoded is: " + unencoded)
        unpacked = pickle.loads(unencoded)
        logging.debug("Unpacked: " + unpacked)
        service = unpacked['service']
        protocol = unpacked['protocol']
        key = unpacked['key']
        if (service == 'amazon'):
            return AmazonResource(service=service, protocol=protocol, key=key)
        elif (service == 'fs'):
            return FSResource(service=service, protocol=protocol, key=key)
Your code works. How are you testing it?
import logging
import base64
import pickle

class Resource:
    @staticmethod
    def resource_factory(resource_data):
        resource_data = "KGRwMApTJ3Byb3RvY29sJwpwMQpTJ3N0cmVhbScKcDIKc1Mna2V5JwpwMwpTJ2xhamRmb2F1MDk0MjRqb2pmLmZsdicKcDQKc1Mnc2VydmljZScKcDUKUydhbWF6b24nCnA2CnMu"  # hack to just see if we can unpickle this string
        # logging.debug("Creating resource: " + resource_data)
        unencoded = base64.b64decode(resource_data)
        # logging.debug("Unencoded is: " + unencoded)
        unpacked = pickle.loads(unencoded)
        logging.debug("Unpacked: " + repr(unpacked))
        service = unpacked['service']
        protocol = unpacked['protocol']
        key = unpacked['key']

logging.basicConfig(level=logging.DEBUG)
Resource.resource_factory('')
yields
# DEBUG:root:Unpacked: {'protocol': 'stream', 'service': 'amazon', 'key': 'lajdfoau09424jojf.flv'}
I was able to solve this after making some simplifications and then debugging in Django. The main issue was that there were some errors in the Resource class itself that were preventing the correct completion of the resource_factory method. First, I was trying to concatenate a string and a dictionary, which was throwing an error. I also had some errors elsewhere in the class where I was referring to the instance variables _service, _protocol, and _key without the '_' (typos).
In any case, the interesting thing was that when I used this code within Django's custom field infrastructure, the errors would get caught and I did not see any actual message indicating the problem. The debug statements suggested it was a problem with loads, but actually it was a problem with the debug statement itself and some code that came later. When I tried to implement this behavior using model properties instead of custom model fields for the data I was saving, the errors actually got printed out correctly and I was able to quickly debug.
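To illustrate the failure mode (my example, not the original traceback): concatenating a dict onto a string raises a TypeError, so the debug line itself blows up even though pickle.loads succeeded; wrapping the value in repr(), as in the answer above, avoids it:

import logging

logging.basicConfig(level=logging.DEBUG)
unpacked = {'service': 'amazon'}
# logging.debug("Unpacked: " + unpacked)   # TypeError: can only concatenate str (not "dict") to str
logging.debug("Unpacked: " + repr(unpacked))  # works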