AgitatedDove14 no I mean I can do:
```
docker run -t --gpus "device=1" -dit -e APP_ENV=kprod -e CLEARML_WORKER_ID=ada:gpu1 -e CLEARML_DOCKER_IMAGE=922531023312.dkr.ecr.us-west-2.amazonaws.com/jym-coach:202108080511.7e8d6d1 -v /home/smjahad/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.kjx6r9oo.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.l8cguj81:/root/.ssh -v /home/smjahad/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/smjahad/.clearml/pip-cache:/root/.cache/pip -v /home/smjah...
```
ugh, sudo actually makes it fail explicitly because
```
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
- Make sure you pushed the requested commit:
(repository='git@github.com:salimmj/clearml-demo.git', branch='main', commit_id='f76f3affd28d5558928d7ffd9a6797890ffdd708', tag='', docker_cmd='nvidia/cuda:11.4.0-runtime-ubuntu20.04', entry_point='mnist.py', working_dir='.')
- Check if remote-wo...
```
SuccessfulKoala55 I tried to make a docker image by combining one of our Dockerfiles with this: https://github.com/allegroai/clearml-agent/blob/master/docker/agent/Dockerfile . I modified the entrypoint to be a combination of both as well.
Right now I'm not seeing that error, but the process seems to exit (as completed) right after the docker run. I'm wondering if my Dockerfile is not properly set up and it's exiting before the daemon is started.
I tried with and without. I'm having the issue where if I run the task from the queue it completes as soon as it goes into docker, but if I run the same docker run command manually it works.
Great find! So a pip upgrade should fix it hopefully.
Being able to create and remove queues as well as list their contents.
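Something like this already seems doable through the APIClient as a stopgap (just a sketch; I'm not 100% sure of the exact response shapes, and the queue name is a placeholder):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

# create a queue, list all queues, then remove it again
queue_id = client.queues.create(name="my_new_queue").id
for q in client.queues.get_all():
    print(q.name, q.id)
client.queues.delete(queue=queue_id)
```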
AgitatedDove14 when I try this I get:
```
clearml.backend_interface.session.SendError: Action failed <400/110: tasks.enqueue/v1.0 (Invalid task status (Invalid status change): current_status=in_progress, new_status=queued)> (queue=e78d2fdf2d5140b6b5c6678338c532bb, task=95082c9174a04044b25253d724362ec1)
```
btw, AgitatedDove14 I launch the agent daemon outside docker (with --docker), that's the way it is supposed to work, right?
```
$ clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
```
And then the worker itself will run the docker run command for me and start another non-daemon agent inside.
I guess the failure happens when it tries to switch to docker, because the same experiment works with agents not started with the --docker flag.
Is there a way to make it use ssh+git instead of git+git? Maybe add a force_ssh_pip_install option to the agent config?
I am already forcing ssh auth
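To be concrete, this is the setting I mean by "forcing ssh auth" (the standard clearml.conf key, as I understand it):
```
agent {
    # rewrite https:// git URLs to ssh:// so the ssh credentials are used for cloning
    force_git_ssh_protocol: true
}
```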
Fixed it by adding this code block. Makes sense.
```python
if clone:
    task = Task.clone(self)
else:
    task = self
    # check if the server supports enqueueing aborted/stopped Tasks
    if Session.check_min_api_server_version('2.13'):
        self.mark_stopped(force=True)
    else:
        self.reset()
```
It's not that, I think, because it works if I run the same command manually.
```
$ git remote -v
fork    git@github.com:salimmj/somerepo.git (fetch)
fork    git@github.com:salimmj/somerepo.git (push)
origin  git@github.com:mainuser/somerepo.git (fetch)
origin  git@github.com:mainuser/somerepo.git (push)
```
I want to keep the above setup: the remote branch that will track my local one will be on fork, so it needs to pull from there. Currently it recognizes origin, so it doesn't work because the agent then can't find the commit.
AgitatedDove14 wouldn't the above command task.execute_remotely(queue_name=None, clone=False, exit_process=False) fail because of:
```
clone==False and exit_process==False is not supported. Task enqueuing itself must exit the process afterwards.
```
I thought it worked earlier 😮
I already have that set to true and want that behavior. The issue is with the "committed" change set. When I push code to GitHub I push to my fork and pull from the main/master repo (all changes go through PRs from fork to main).
Now when I use execute_remotely, whatever code does the git discovery considers the repo I pull from to be the repo to use. But these changes haven't necessarily been merged into main. The correct behavior would be to use the forked repo, as in the workaround sketched below.
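One workaround sketch that comes to mind (plain git, nothing ClearML-specific; remote names are from my setup above): swap the remote names locally so whatever discovers the repo picks up the fork as origin:
```
# hypothetical workaround: make the fork the remote named "origin"
git remote rename origin upstream
git remote rename fork origin
```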
The private_package can be installed by doing pip install git+ssh://git@github.com/user/private_package.git, but the agent is trying to do pip install private_package, which won't work.
This is exactly what I was looking for. I thought once you call execute_remotely the task is sent and it's too late to change anything.
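For anyone following along, the pattern that ended up working for me looks roughly like this (project/queue names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="demo", task_name="mnist")

# ... make whatever local changes/connections you need, then hand off ...

# clone=False enqueues this very task; exit_process=True ends the local run
# once the task is queued
task.execute_remotely(queue_name="manual_jobs", clone=False, exit_process=True)
```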
EagerOtter28 I'm running into a similar situation to yours.
I think you could use --standalone-mode and do the cloning yourself in the docker bash script that you can configure in the agent config.
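Something along these lines in clearml.conf, if I read the config right (the clone URL and path here are placeholders):
```
agent {
    # bash commands executed inside the container before the task starts
    docker_init_bash_script: [
        "mkdir -p /workspace",
        "git clone git@github.com:user/somerepo.git /workspace/somerepo",
    ]
}
```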
I'm wondering, would an older version of the agent work well with a newer server version, and vice versa?
If venv works inside containers, that's even better. We actually have custom containers that build on master merges. I wonder if using our own containers, which should have most of the deps, will work better than a simpler container.
Well, this doesn't work: `pip install -e`
AgitatedDove14 should I try running the above command with a privileged user?
For your second question: those are generated using custom tooling; it relies on the build system being set up, which is guaranteed by the docker image used. So I don't think this is a case of supporting a specific env setup or build tool, but just of allowing a custom script for the env-setup / code-building step.
WDYT?
```
...
more-itertools==8.6.0
-e git+git@github.com:user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
msgpack==1.0.2
msgpack-numpy==0.4.7.1
...
```
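For comparison, pip does accept the ssh:// VCS form, so presumably that line would need to become something like this (same commit pin, standard pip syntax):
```
git+ssh://git@github.com/user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
```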
I think it's great to let users build their own UI-connected apps, I'd use that for sure!
It is indeed auto-populated by init.
The commit is valid for sure.