SuccessfulKoala55 I tried to make a docker image by combining one of our Dockerfiles with this https://github.com/allegroai/clearml-agent/blob/master/docker/agent/Dockerfile . I modified the entrypoint to also be a combination of both.
Right now I’m not seeing that error, but the process seems to exit (as completed) after the docker run. I’m wondering if my Dockerfile is not properly set up and it’s exiting before the daemon is started.
I don’t think that’s it, because it works if I run the same command manually.
AgitatedDove14 no I mean I can do:
` docker run -t --gpus "device=1" -dit -e APP_ENV=kprod -e CLEARML_WORKER_ID=ada:gpu1 -e CLEARML_DOCKER_IMAGE=922531023312.dkr.ecr.us-west-2.amazonaws.com/jym-coach:202108080511.7e8d6d1 -v /home/smjahad/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.kjx6r9oo.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.l8cguj81:/root/.ssh -v /home/smjahad/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/smjahad/.clearml/pip-cache:/root/.cache/pip -v /home/smjah...
ugh, sudo actually makes it fail explicitly because
` error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
- Make sure you pushed the requested commit:
(repository='git@github.com:salimmj/clearml-demo.git', branch='main', commit_id='f76f3affd28d5558928d7ffd9a6797890ffdd708', tag='', docker_cmd='nvidia/cuda:11.4.0-runtime-ubuntu20.04', entry_point='mnist.py', working_dir='.') - Check if remote-wo...
btw, AgitatedDove14 I launch the agent daemon outside docker (with --docker), that’s the way it is supposed to work, right?
$ clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
And then the worker itself will run the docker run command for me and start another non-daemon agent inside.
I guess the failure happens when it tries to switch to docker, because the same experiment works with agents not started with the --docker flag.
I think it’s great to let users build their own UI-connected apps, I’d use that for sure!
I tried with and without. I’m having the issue where if I run the task from the queue it will complete as soon as it goes into docker but if I run the same docker run it works.
I think it works, I’m fixing something else that came up.
In fact, if there is a good python API to list/duplicate/edit/run experiments by ID, it seems straightforward to do that from Airflow (or any other job scheduler). I’m just wondering if there is some built-in scheduler.
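For what it’s worth, here is a minimal sketch of the "duplicate and run by ID" flow using the clearml Python package, which any scheduler (Airflow included) could call. It assumes clearml is installed and configured against your server; the helper name clone_and_enqueue, the parameter name Args/dataset, and the way overrides are passed are illustrative assumptions, not something taken from this thread.

```python
from typing import Dict, Optional


def clone_and_enqueue(
    source_task_id: str,
    queue_name: str,
    overrides: Optional[Dict[str, str]] = None,
    new_name: Optional[str] = None,
) -> str:
    """Clone an existing experiment by ID, edit parameters, and enqueue it.

    Returns the ID of the newly created task.
    """
    # Imported lazily so this module can be loaded on machines
    # where clearml is not installed (e.g. when only reading the DAG).
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    cloned = Task.clone(source_task=source, name=new_name)

    # Hypothetical parameter names, e.g. {"Args/dataset": "large_dataset"}.
    for name, value in (overrides or {}).items():
        cloned.set_parameter(name, value)

    # An agent listening on this queue picks the task up and runs it.
    Task.enqueue(task=cloned, queue_name=queue_name)
    return cloned.id
```

A scheduler job would then call something like clone_and_enqueue("<task id>", "manual_jobs", {"Args/dataset": "large_dataset"}), where the queue name is one the agent is listening on.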
$ git remote -v
fork git@github.com:salimmj/somerepo.git (fetch)
fork git@github.com:salimmj/somerepo.git (push)
origin git@github.com:mainuser/somerepo.git (fetch)
origin git@github.com:mainuser/somerepo.git (push)
I want to keep the above setup; the remote branch that will track my local one will be on fork, so it needs to pull from there. Currently it recognizes origin, so it doesn’t work because the agent then can’t find the commit.
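To make the fork-versus-origin point concrete, here is a small sketch that parses git remote -v output and picks the fetch URL of a preferred remote, falling back to origin. It only illustrates the selection logic; the function names are made up for this example, and how the resulting URL would be handed to the task or the agent is left out on purpose.

```python
from typing import Dict

# Same shape as the `git remote -v` output above.
SAMPLE_REMOTES = """\
fork git@github.com:salimmj/somerepo.git (fetch)
fork git@github.com:salimmj/somerepo.git (push)
origin git@github.com:mainuser/somerepo.git (fetch)
origin git@github.com:mainuser/somerepo.git (push)
"""


def parse_remotes(remote_v_output: str) -> Dict[str, Dict[str, str]]:
    """Map remote name -> {"fetch": url, "push": url}."""
    remotes: Dict[str, Dict[str, str]] = {}
    for line in remote_v_output.splitlines():
        parts = line.split()  # handles both tabs and spaces
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def fetch_url(remote_v_output: str, preferred: str = "fork",
              fallback: str = "origin") -> str:
    """Return the fetch URL of `preferred`, or of `fallback` if it is missing."""
    remotes = parse_remotes(remote_v_output)
    for name in (preferred, fallback):
        if "fetch" in remotes.get(name, {}):
            return remotes[name]["fetch"]
    raise LookupError(f"neither {preferred!r} nor {fallback!r} has a fetch URL")
```

With the sample above, fetch_url(SAMPLE_REMOTES) returns the fork URL, which is where the commit actually lives.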
TimelyPenguin76 After creating the venv (so I don’t have to do it myself). Once an env is there, I need to run a script while the env is activated from the root of the repo.
It recognizes the main repo, but I want it to push and pull from another one (my own forked repo). AgitatedDove14
The private_package can be installed by doing pip install git+ssh://git@github.com/user/private_package.git but the agent is trying to do pip install private_package, which won’t work.
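One possible workaround, sketched below, is to rewrite the requirement list so that private packages are replaced by their git+ssh URL before pip sees them. This is an assumption about how one might pre-process requirements, not a description of what the agent does; the function names and the package-to-user mapping are hypothetical.

```python
from typing import Dict, Iterable, List


def git_ssh_requirement(user: str, package: str, host: str = "github.com") -> str:
    """Build a pip-installable VCS URL for a private repository."""
    return f"git+ssh://git@{host}/{user}/{package}.git"


def rewrite_requirements(lines: Iterable[str], private: Dict[str, str]) -> List[str]:
    """Replace bare private package names with their git+ssh URL.

    `private` maps package name -> GitHub user or organisation.
    Lines that do not name a private package are kept unchanged.
    """
    rewritten = []
    for line in lines:
        name = line.strip()
        if name in private:
            rewritten.append(git_ssh_requirement(private[name], name))
        else:
            rewritten.append(name)
    return rewritten
```

For example, rewrite_requirements(["numpy", "private_package"], {"private_package": "user"}) keeps numpy as-is and turns the second entry into the URL from the message above.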
AgitatedDove14 should I try running the above command as a privileged user?
I don’t mean a serving endpoint, just the equivalent of “cloning an experiment” and running it on a different (larger) dataset.
You’re saying there’s a built-in scheduler? SuccessfulKoala55
If so where can I find it?
Here’s another place where /root/ is hardcoded: https://github.com/allegroai/clearml-agent/blob/b196ab57931f3c67efcb561df0c8a2fe7c0e76f9/clearml_agent/commands/worker.py#L3338-L3341
AgitatedDove14 can I specify a script to be run after pip install packages is done? I see that it’s possible in docker mode.
EagerOtter28 I’m running into a similar situation as you.
I think you could use --standalone-mode and do the cloning yourself in the docker bash script that you can configure in the agent config.
Issue seems fixed now, thanks! Is the fact that clearml-agent needs to be installed from system python mentioned anywhere in the docs? If not, I suggest it gets added.
Thank you so much for helping.
I do expect it to pip install though, which doesn’t need root access I think.
For your second question, those are generated using custom tooling; it relies on the build system being set up, which is guaranteed by the docker image used. So I don’t think this is a case of supporting a specific env setup or build tool, but just of allowing a custom script for the env setup step / building code.
WDYT?