For your second question, those are generated using custom tooling that relies on the build system being set up, which is guaranteed by the docker image used. So I don’t think this is a case of supporting a specific env setup or build tool, but just of allowing a custom script for the env setup / code build step.
WDYT?
AgitatedDove14 can I specify a script to be run after pip install packages is done? I see that it’s possible in docker mode.
I’m not sure, but it seems like you get different kinds of flexibility depending on whether you enqueue the task yourself or rely on execute_remotely. Ideally, it would be great if I could get the benefit of the auto-scanning provided by execute_remotely as well as more flexibility.
Is it possible to set that at task enqueueing SuccessfulKoala55 ?
I think it’s great to let users build their own UI-connected apps, I’d use that for sure!
That won’t work 😕
The docker shell script runs too early in the process.
I want to inject a bash command after the repo has been cloned (and maybe even after the venv has been installed).
TimelyPenguin76 After creating the venv (so I don’t have to do it myself). Once an env is there, I need to run a script while the env is activated from the root of the repo.
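To make the ask concrete, here is a rough sketch of the step I have in mind, written as plain Python rather than anything the agent provides. The function names, the venv path and the script path are all hypothetical, and it assumes a POSIX system with bash and a standard venv layout (bin/activate).

```python
import shlex
import subprocess
from pathlib import Path


def build_post_install_command(venv_dir, script):
    """Return a bash invocation that activates the venv, then runs the script.

    Hypothetical helper for illustration; not part of clearml-agent.
    """
    activate = Path(venv_dir) / "bin" / "activate"
    inner = f"source {shlex.quote(str(activate))} && bash {shlex.quote(str(script))}"
    return ["bash", "-c", inner]


def run_post_install(repo_root, venv_dir, script):
    """Run the script from the repo root with the venv active; raise on failure."""
    cmd = build_post_install_command(venv_dir, script)
    # cwd=repo_root makes relative paths in the script resolve against the repo.
    return subprocess.run(cmd, cwd=repo_root, check=True)
```

The point is only the ordering: the script runs after the venv exists, with the venv activated, and with the repo root as the working directory.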
Here’s another place where /root/ is hardcoded: https://github.com/allegroai/clearml-agent/blob/b196ab57931f3c67efcb561df0c8a2fe7c0e76f9/clearml_agent/commands/worker.py#L3338-L3341
AgitatedDove14 this works: pip install git+ssh://git@github.com/user/repo.git
SuccessfulKoala55 I tried to make a docker image by combining one of our dockerfiles with this https://github.com/allegroai/clearml-agent/blob/master/docker/agent/Dockerfile . I modified the entrypoint to also be a combination of both.
Right now I’m not seeing that error, but the process seems to exit (as completed) after the docker run. I’m wondering if my Dockerfile is not properly set up and it’s exiting before the daemon is started.
This is exactly what I was looking for. I thought once you call execute_remotely the task is sent and it’s too late to change anything.
Fixed it by adding this code block. Makes sense.
    if clone:
        task = Task.clone(self)
    else:
        task = self
        # check if the server supports enqueueing aborted/stopped Tasks
        if Session.check_min_api_server_version('2.13'):
            self.mark_stopped(force=True)
        else:
            self.reset()
AgitatedDove14 when I try this I get:
clearml.backend_interface.session.SendError: Action failed <400/110: tasks.enqueue/v1.0 (Invalid task status (Invalid status change): current_status=in_progress, new_status=queued)> (queue=e78d2fdf2d5140b6b5c6678338c532bb, task=95082c9174a04044b25253d724362ec1)
AgitatedDove14 wouldn’t the above command task.execute_remotely(queue_name=None, clone=False, exit_process=False) fail because clone==False and exit_process==False is not supported? Task enqueuing itself must exit the process afterwards.
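Spelling out the constraint as I understand it, in a standalone sketch. This is not clearml’s code: check_enqueue_args is a hypothetical name, and the only thing taken from the discussion is the rule that a task enqueuing itself (clone=False) has to exit the process afterwards.

```python
def check_enqueue_args(clone, exit_process):
    """Toy model of the argument rule discussed above (hypothetical helper).

    Returns which task would be enqueued: a "clone" or the running task ("self").
    """
    if not clone and not exit_process:
        # Enqueuing the running task itself while keeping the local process
        # alive would leave the same task both running and queued.
        raise ValueError(
            "clone==False and exit_process==False is not supported. "
            "Task enqueuing itself must exit the process afterwards."
        )
    return "clone" if clone else "self"
```

So of the four combinations, only clone=False with exit_process=False is rejected; the other three either enqueue a copy or stop the local run.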
I thought it worked earlier 😮
btw, AgitatedDove14 I launch the agent daemon outside docker (with --docker); that’s the way it is supposed to work, right?
$ clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
And then the worker itself will run the docker run command for me and start another non-daemon agent inside.
I guess the failure happens when it tries to switch to docker, because the same experiment works with agents not started with the --docker flag.
Being able to create and remove queues as well as list their contents.
My docker image will have all required apt packages, so no need.
I do expect it to pip install though, which doesn’t need root access I think.
I’m wondering, would an older version of the agent work well with a newer server version and vice-versa?
I tried with and without. I’m having the issue where if I run the task from the queue it will complete as soon as it goes into docker but if I run the same docker run it works.
I don’t think it’s that, because it works if I run the same command manually.
AgitatedDove14 no I mean I can do:
` docker run -t --gpus "device=1" -dit -e APP_ENV=kprod -e CLEARML_WORKER_ID=ada:gpu1 -e CLEARML_DOCKER_IMAGE=922531023312.dkr.ecr.us-west-2.amazonaws.com/jym-coach:202108080511.7e8d6d1 -v /home/smjahad/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.kjx6r9oo.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.l8cguj81:/root/.ssh -v /home/smjahad/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/smjahad/.clearml/pip-cache:/root/.cache/pip -v /home/smjah...
AgitatedDove14 should I try running the above command with privileged user?
ugh, sudo actually makes it fail explicitly because
` error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
- Make sure you pushed the requested commit:
(repository='git@github.com:salimmj/clearml-demo.git', branch='main', commit_id='f76f3affd28d5558928d7ffd9a6797890ffdd708', tag='', docker_cmd='nvidia/cuda:11.4.0-runtime-ubuntu20.04', entry_point='mnist.py', working_dir='.') - Check if remote-wo...