I found the same issue here Tasks fail when running an agent inside Notebook 2: Remote Agent · Issue #1204 · allegroai/clearml
Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for the reply. This happens after I enqueue
the task in the queue where the colab agent is registered. Please see None to have a look at the outputs
Hi @<1686547344096497664:profile|ContemplativeArcticwolf43>
In the 2nd 'Getting Started' tutorial,
Could you send a link to the specific notebook?
. But whenever a task is picked, it fails for the following
You mean after the Task.init
call?
# Copyright 2023 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Custom kernel launcher app to customize socket options."""
from ipykernel import kernelapp
import zmq
# We want to set the high water mark on *all* sockets to 0, as we don't want
# the backend dropping any messages. We want to set this before any calls to
# bind or connect.
#
# In principle we should override `init_sockets`, but it's hard to set options
# on the `zmq.Context` there without rewriting the entire method. Instead we
# settle for only setting this on `iopub`, as that's the most important for our
# use case.
class ColabKernelApp(kernelapp.IPKernelApp):
def init_iopub(self, context):
context.setsockopt(zmq.RCVHWM, 0)
context.setsockopt(zmq.SNDHWM, 0)
return super().init_iopub(context)
if __name__ == '__main__':
ColabKernelApp.launch_instance()
Good news is that I've tried again running from a fresh colab instance (run initial task, clone, run agent, send cloned task to agent) and it seems to be working now
It actually started executing your code, but it did not capture it correctly:
/root/.clearml/venvs-builds/3.10/bin/python -u /root/.clearml/venvs-builds/3.10/code/colab_kernel_launcher.py
Which I assume means the actual Task had bad code.
What do you have under the Task execution tab in the UI (the one you were launching, i.e. enqueueing )
Oh!
I see this is using the colab as remote agent (i.e. to launch jobs on it),
[ColabKernelApp] CRITICAL | Bad config encountered during initialization: The 'kernel_class' trait of <main.ColabKernelApp object at 0x7fa41b29e5c0> instance must be a type, but 'google.colab._kernel.Kernel' could not be imported
Can you send the full log?
the task is a clone of one previously executed on google colab too