Unknown error has occurred in Cloud Functions - python

First, this looks like the following thread, but it is not: An unknown error has occurred in Cloud Function: GCP Python
I have deployed Cloud Functions a couple of times before and they are still working fine. Nevertheless, since last week, following the same procedure I can deploy correctly, but when testing the function I get the error "An unknown error has occurred in Cloud Functions. The attempted action failed. Please try again, send feedback".
Outside of Cloud Functions, the script works perfectly and writes to Cloud Storage.
My Cloud Function is a zip containing a Python script that loads a CSV into Cloud Storage.
The CSV weighs 160 kB and the Python script 5 kB, so I allocated 128 MiB of memory.
The execution time is 38 seconds, almost half of the default timeout.
It is configured to allow only traffic within the project.
Environment variables are not the problem.
It's triggered by Pub/Sub, and what I want is to schedule it once I can make it work (a sketch of this setup is included below).
I'm quite puzzled. I'm so out of ideas right now that I've started to think everything works fine and it is the Google testing method that fails... Nevertheless, when I publish to the Pub/Sub topic from Cloud Scheduler, it produces an error log without much info. Has anyone had the same problem, by any chance?
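For context, a minimal sketch of the kind of function described above (the bucket name, object name, and temp file are hypothetical, and error handling is omitted):

    # main.py -- sketch of the setup described above
    from google.cloud import storage

    def handle_pubsub(event, context):
        # Pub/Sub-triggered entry point: build a CSV and upload it to Cloud Storage.
        # ... processing that produces /tmp/output.csv goes here ...
        client = storage.Client()
        bucket = client.bucket("my-bucket")           # hypothetical bucket name
        blob = bucket.blob("output.csv")              # hypothetical object name
        blob.upload_from_filename("/tmp/output.csv")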
Thanks

Answer from my past self:
Finally "solved". I'm processing a 160 kB CSV in the Cloud Function; on my computer the execution takes 38 seconds. For some reason, in the Cloud Function I need 512 MB of allocated memory and a timeout larger than 60 seconds.
Answer from my more recent self:
Don't test a Cloud Function using the test button, because sometimes it takes longer than the maximum available timeout to finish, and you'll get errors.
If you want to test it more easily:
Write prints after milestones in your code to check how the script is evolving.
Use the logs interface. The prints will be displayed there ;)
Also, logs show valuable info (sometimes even readable).
Also, if you're writing output somewhere, for example to buckets, check them after the Cloud Function has finished; you might get a surprise.
To sum up, don't believe blindly in the testing button.
Answer from my present self (already regretting the prints thing):
There are nice Python libraries for logging; don't print stuff for that (if you have the time).
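For instance, a minimal sketch using the standard logging module (Cloud Functions forwards stdout/stderr and standard logging output to Cloud Logging, so the entries show up in the logs interface; the entry point and messages here are placeholders):

    import logging

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    def handle_pubsub(event, context):
        logger.info("Function started")
        # ... load and process the CSV ...
        logger.info("CSV processed, uploading to Cloud Storage")
        # ... upload ...
        logger.info("Upload finished")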

Related

Azure resource for long running process in python

I am trying to figure out the best way to run a Python process that typically takes 10-30 minutes (at most an hour or so) on my local machine. The process will be manually triggered, and may not be triggered for hours or days.
I am a bit confused, because I have read official MS docs stating that one should avoid long-running processes in Function Apps (https://learn.microsoft.com/en-us/azure/azure-functions/performance-reliability#avoid-long-running-functions), but at the same time the functionTimeout for the Premium and Dedicated plans can be unlimited.
I am hesitant to use a standard web app with an API since it seems overkill to have it running 24/7.
Are there any ideal resources for this?
You can use consumption-based Azure Durable Functions; they can run for hours or even days.
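For example, an orchestrator in the Python Durable Functions programming model looks roughly like this (a sketch; the activity name is hypothetical and the function.json bindings are omitted):

    # Orchestrator function (sketch) -- the long-running work lives in an
    # activity function, here hypothetically named "RunLongProcess".
    import azure.durable_functions as df

    def orchestrator_function(context: df.DurableOrchestrationContext):
        result = yield context.call_activity("RunLongProcess", "some-input")
        return result

    main = df.Orchestrator.create(orchestrator_function)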

Google Calendar API:Calendar usage limits exceeded

My work is migrating calendar data; we're using the Google Calendar API.
The number of target records is 280,000.
The methods we use are as follows.
・Calendar API v3 Events: insert
https://developers.google.com/google-apps/calendar/v3/reference/events/insert
・batch request
https://developers.google.com/google-apps/calendar/batch
・exponential backoff (a sketch of this is shown below)
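For reference, the exponential-backoff part around events.insert is typically implemented along these lines (a sketch; the service object, calendar ID, and the set of status codes to retry on are assumptions):

    import random
    import time

    from googleapiclient.errors import HttpError

    def insert_with_backoff(service, calendar_id, event_body, max_retries=5):
        # Retry the insert with exponentially growing waits plus jitter.
        for attempt in range(max_retries):
            try:
                return service.events().insert(calendarId=calendar_id,
                                               body=event_body).execute()
            except HttpError as err:
                if err.resp.status in (403, 429):   # rate-limit style errors
                    time.sleep((2 ** attempt) + random.random())
                else:
                    raise
        raise RuntimeError("insert failed after %d retries" % max_retries)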
We have run this test many times, and it ran without problems before.
However, it currently returns the error "Calendar usage limits exceeded"; it cannot be executed even once, and this state has lasted for about a week.
I understand that the cause is as follows.
https://support.google.com/a/answer/2905486?hl=en
QPD (quota per day) had been changed to 2,000,000 by support.
Therefore, I think the quota problem should be solved.
However, I am currently in a state where I cannot execute the API even once when I run the program.
I want to resolve this situation.
I think it is probably necessary for Google to lift the restriction on their side.
Can I borrow your wisdom?

Azure python SDK - AzureHttpError: The condition specified using HTTP conditional header(s) is not met

I'm trying to download a large VHD file (30GB) from Azure Blob Storage using the following code:
blob_service.get_blob_to_path('vhds', '20161206092429.vhd', '20161206092429.vhd')
where the first parameter is the container name, the second the blob name, and the third the local file/path where it will be saved. This 30GB download was working normally, but all of a sudden I started receiving this error:
AzureHttpError: The condition specified using HTTP conditional header(s) is not met.
ConditionNotMet
The condition specified using HTTP conditional header(s) is not met.
RequestId:88b6ac24-0001-0001-5ec0-4f490d000000
Time:2016-12-06T12:57:13.5389237Z
The download now runs OK for some random amount of time: sometimes a really short time, sometimes longer, even up to 9 or 10 GB of the full 30 GB download.
According to these questions:
Azure Blob: "The condition specified using HTTP conditional header(s) is not met"
304: The condition specified using HTTP conditional header(s) is not met
It seems to be a race condition, but that doesn't help much to solve the issue without diving into the SDK code. Any suggestions on what could be causing this, given that the download was working previously? Maybe an outage on the Azure cloud?
As a VHD changes, its related ETag will change. Once this happens, a file-copy operation will no longer be valid. I believe this is what you're seeing via your call to blob_service.get_blob_to_path(), since your VHD is being used with a running VM. And even if the VM is idle, a running OS is never really idle: there are always some background operations, which likely write to disk.
Not that it will guarantee a successful file-copy operation, but you'd need to shut down the VM before initiating a copy.
Alternatively, you can make a snapshot of the VHD and then do the copy via the snapshot instead of the original VHD (which would then let you continue to use your VHD during the copy operation).
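A rough sketch of the snapshot approach with the legacy azure-storage SDK (the account credentials are placeholders, and the exact method names and snapshot attribute should be checked against the SDK version you're using):

    from azure.storage.blob import BlockBlobService

    blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')

    # Take a point-in-time snapshot of the VHD so its ETag can no longer change under us.
    snapshot = blob_service.snapshot_blob('vhds', '20161206092429.vhd')

    # Download from the snapshot instead of the live, still-changing blob.
    blob_service.get_blob_to_path('vhds', '20161206092429.vhd',
                                  '20161206092429.vhd',
                                  snapshot=snapshot.snapshot)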
If you're creating your blob service with a sas_token, it may only have been set to last for an hour. If that's the case, you can change the expiry time of the token on creation to a later point in time.

R10 Boot Timeout Error - Conceptual

So I'm getting the very common
"Web process failed to bind to $PORT within 60 seconds of launch"
But none of the solutions I've tried have worked, so my question is much more conceptual.
What is supposed to be doing the binding? It is my understanding that I do not need to write code specifically to bind the worker dyno to $PORT, but rather that this failure is caused primarily by computationally intensive processes.
I don't have any really great code snippets to show here, but I've included the link to the github repo for the project I'm working on.
https://github.com/therightnee/RainbowReader_MKII
There is a long start-up time when the RSS feeds are first parsed, but I've never seen it go past 30 seconds. Even so, currently when you go to the page it should just render a template. Initially, in this setup, there is no data processing being done. Testing locally, everything runs great, and even with the data parsing it doesn't take more than a minute in any test case.
This leads me to believe that I need to be setting or using the $PORT variable somewhere, but I don't know how.
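For what it's worth, "binding" here means the web process has to start an HTTP server listening on the port Heroku passes in via $PORT before the boot timeout expires. A minimal sketch, using Flask as an example framework (your app's module and routes will differ):

    import os
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "ok"

    if __name__ == "__main__":
        # Heroku tells the web process which port to listen on via $PORT;
        # the process must bind to it (on all interfaces) within the boot window.
        port = int(os.environ.get("PORT", 5000))
        app.run(host="0.0.0.0", port=port)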
Thanks!

Profiling a long-running Python Server

I have a long-running twisted server.
In a large system test, at one particular point several minutes into the test, when some clients enter a particular state and a particular outside event happens, then this server takes several minutes of 100% CPU and does its work very slowly. I'd like to know what it is doing.
How do you get a profile for a particular span of time in a long-running server?
I could easily send the server start and stop messages via HTTP if there were a way to enable or inject the profiler at runtime.
Given the choice, I'd like stack-based/call-graph profiling but even leaf sampling might give insight.
The yappi profiler can be started and stopped at runtime.
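For instance, you could wire start/stop into your HTTP handlers along these lines (a sketch; the handler wiring and output path are assumptions):

    import yappi

    def start_profiling():
        yappi.set_clock_type("cpu")   # or "wall" for wall-clock time
        yappi.start()

    def stop_profiling(path="profile.out"):
        yappi.stop()
        # Save in pstats format so the standard pstats module can read it.
        yappi.get_func_stats().save(path, type="pstat")
        yappi.clear_stats()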
There are two interesting tools that try to solve this specific problem, where you might not necessarily have instrumented your code for profiling in advance but want to profile production code in a pinch.
pyflame will attach to an existing process using the ptrace(2) syscall and create "flame graphs" of the process. It's written in C++.
py-spy works by reading the process memory instead and figuring out the Python call stack. It provides a flame graph as well as a "top-like" interface showing which functions are taking the most time. It's written in Rust and Python.
Not a very Pythonic answer, but maybe running strace on the process gives some insight (assuming you are on Linux or similar).
Using strictly Python: for this kind of thing I trace all calls, store the results in a ring buffer, and use a signal (maybe you could do that via your HTTP message) to dump that ring buffer. Of course, tracing slows everything down, but in your scenario you could switch the tracing on via an HTTP message as well, so it is only enabled while the trouble is active.
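A minimal sketch of that idea with sys.settrace, a deque as the ring buffer, and SIGUSR1 as the dump trigger (the buffer size and signal choice are arbitrary):

    import collections
    import signal
    import sys
    import time

    ring = collections.deque(maxlen=10000)   # keep only the most recent calls

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            ring.append((time.time(), code.co_filename, code.co_name, frame.f_lineno))
        return None   # no per-line tracing, record calls only

    def dump_ring(signum, frame):
        for ts, filename, func, lineno in list(ring):
            print("%.3f %s:%s:%d" % (ts, filename, func, lineno), file=sys.stderr)

    signal.signal(signal.SIGUSR1, dump_ring)

    sys.settrace(tracer)    # switch tracing on (e.g. from your HTTP handler)
    # ... later: sys.settrace(None) to switch it off again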
Pyliveupdate is a tool designed for exactly this purpose: profiling long-running programs without restarting them. It allows you to dynamically select specific functions to profile, or to stop profiling, without instrumenting your code ahead of time; it instruments the code dynamically to do the profiling.
Pyliveupdate has three key features:
Profile specific Python functions' call time (by function name or module name).
Add/remove profiling without restarting the program.
Show profiling results with call summaries and flame graphs.
Check out a demo here: https://asciinema.org/a/304465.
