
For your second question: those are generated using custom tooling, which relies on the build system being set up, and that is guaranteed by the Docker image we use. So I don’t think this is a case of supporting a specific env setup or build tool, just of allowing a custom script for the env setup / code build step.
WDYT?
So once the repo is cloned and the venv is created and activated, I want to execute this script from the repo: tools/setup_dependencies.sh
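A minimal sketch of what such a post-setup hook could do, assuming the agent knows the repo root and the venv directory. `run_post_setup_script` is a hypothetical helper, not a clearml-agent API; it emulates `source <venv>/bin/activate` by setting `VIRTUAL_ENV` and putting the venv's `bin/` first on `PATH`, then runs the script from the repo root.

```python
import os
import subprocess
from pathlib import Path


def run_post_setup_script(repo_root, venv_dir, script_rel="tools/setup_dependencies.sh"):
    """Hypothetical hook: run a repo script as if the venv were activated."""
    repo_root = Path(repo_root)
    venv_dir = Path(venv_dir)
    env = os.environ.copy()
    # Emulate `source <venv>/bin/activate`: the venv's bin/ comes first on PATH.
    env["VIRTUAL_ENV"] = str(venv_dir)
    env["PATH"] = str(venv_dir / "bin") + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)
    # Run from the repository root so relative paths inside the script resolve there.
    return subprocess.run(
        ["bash", str(repo_root / script_rel)],
        cwd=repo_root,
        env=env,
        check=True,  # fail the task loudly if the setup script fails
        capture_output=True,
        text=True,
    )
```

Running it after the requirements are installed means `python` and `pip` inside the script resolve to the venv's interpreter.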
I am already forcing SSH auth.
My Docker image will have all the required apt packages, so there’s no need.
Is there a way to make it use ssh+git instead of git+git? Maybe add a force_ssh_pip_install to the agent config?
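A sketch of the rewrite such a `force_ssh_pip_install` option might perform on each requirement line. Both the option and the `force_ssh` helper are hypothetical; the sketch assumes standard pip VCS URLs (`git+git://`, `git+https://`) and does not handle explicit ports.

```python
import re

# Matches pip VCS URLs such as git+git://host/path or git+https://[user@]host/path.
_NON_SSH_GIT_URL = re.compile(
    r"^git\+(?:git|https?)://(?:[^@/]+@)?(?P<host>[^/]+)/(?P<path>.+)$"
)


def force_ssh(requirement: str) -> str:
    """Hypothetical helper: rewrite a git+git / git+https requirement to git+ssh."""
    match = _NON_SSH_GIT_URL.match(requirement.strip())
    if match is None:
        # Not a non-SSH git URL (e.g. a plain PyPI pin): leave it untouched.
        return requirement
    # Keep the host and repo path (plus any @ref / #egg= suffix), switch to SSH.
    return f"git+ssh://git@{match.group('host')}/{match.group('path')}"
```

For example, `git+https://github.com/user/repo.git@v1.0#egg=repo` would become `git+ssh://git@github.com/user/repo.git@v1.0#egg=repo`, which is the form that works with the SSH keys already mounted for cloning.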
AgitatedDove14, can I specify a script to run after pip has finished installing packages? I see that it’s possible in Docker mode.
Our code is tightly integrated with Protocol Buffers, which need to be recompiled every now and then. We have a script to do that; if it isn’t run, some imports fail.
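One way a setup script could decide whether a recompile is needed is to compare timestamps of each `.proto` file against its generated module. This is a hypothetical sketch (`stale_protos` is not part of any tool mentioned here), assuming `protoc --python_out` was run with the proto root as the include path, so `pkg/foo.proto` produces `pkg/foo_pb2.py` under the output root.

```python
from pathlib import Path


def stale_protos(proto_root, out_root=None):
    """Return .proto files whose generated *_pb2.py is missing or older."""
    proto_root = Path(proto_root)
    out_root = Path(out_root) if out_root else proto_root
    stale = []
    for proto in sorted(proto_root.rglob("*.proto")):
        rel = proto.relative_to(proto_root)
        # protoc's Python plugin names the output <stem>_pb2.py, mirroring the path.
        generated = out_root / rel.with_name(rel.stem + "_pb2.py")
        if not generated.exists() or generated.stat().st_mtime < proto.stat().st_mtime:
            stale.append(proto)
    return stale
```

If the returned list is non-empty, the hook would run the existing recompile script before the task imports any generated modules.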
TimelyPenguin76: after creating the venv (so I don’t have to do it myself). Once the env is there, I need to run a script from the root of the repo while the env is activated.
I hadn’t enabled that line when the failure happened.
If you were to add this, where would you put it? I can use a modified version of clearml-agent
I do expect it to pip install, though, which doesn’t require root access, I think.
I think it’s great to let users build their own UI-connected apps. I’d use that for sure!
AgitatedDove14 this works: pip install git+ssh://git@github.com/user/repo.git
The commit is valid for sure.
I think it works, I’m fixing something else that came up.
Great find! So upgrading pip should hopefully fix it.
That won’t work 😕
The docker shell script runs too early in the process.
I want to inject a bash command after the repo has been cloned (and maybe even after the venv has been set up).
This is exactly what I was looking for. I thought that once you call execute_remotely, the task is sent and it’s too late to change anything.
It recognizes the main repo, but I want it to push and pull from another one (my own forked repo). AgitatedDove14
I tried with and without. The issue is that if I run the task from the queue, it completes as soon as it goes into Docker, but if I run the same docker run command myself, it works.
AgitatedDove14 when I try this, I get:

    clearml.backend_interface.session.SendError: Action failed <400/110: tasks.enqueue/v1.0 (Invalid task status (Invalid status change): current_status=in_progress, new_status=queued)> (queue=e78d2fdf2d5140b6b5c6678338c532bb, task=95082c9174a04044b25253d724362ec1)
btw, AgitatedDove14, I launch the agent daemon outside Docker (with --docker); that’s the way it is supposed to work, right?
$ clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
And then the worker itself will run the docker run command for me and start another non-daemon agent inside the container.
I guess the failure happens when it tries to switch to Docker, because the same experiment works with agents not started with the --docker flag.
AgitatedDove14 no, I mean I can do:

    docker run -t --gpus "device=1" -dit -e APP_ENV=kprod -e CLEARML_WORKER_ID=ada:gpu1 -e CLEARML_DOCKER_IMAGE=922531023312.dkr.ecr.us-west-2.amazonaws.com/jym-coach:202108080511.7e8d6d1 -v /home/smjahad/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.kjx6r9oo.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.l8cguj81:/root/.ssh -v /home/smjahad/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/smjahad/.clearml/pip-cache:/root/.cache/pip -v /home/smjah...
Fixed it by adding this code block. Makes sense.

    if clone:
        task = Task.clone(self)
    else:
        task = self
        # check if the server supports enqueueing aborted/stopped Tasks
        if Session.check_min_api_server_version('2.13'):
            self.mark_stopped(force=True)
        else:
            self.reset()
    $ git remote -v
    fork    git@github.com:salimmj/somerepo.git (fetch)
    fork    git@github.com:salimmj/somerepo.git (push)
    origin  git@github.com:mainuser/somerepo.git (fetch)
    origin  git@github.com:mainuser/somerepo.git (push)
I want to keep the above setup: the remote branch that tracks my local branch will be on fork, so the agent needs to pull from there. Currently it picks up origin, so it doesn’t work, because the agent then can’t find the commit.
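A sketch of resolving the repo URL from the branch’s upstream instead of assuming origin. The helpers are hypothetical (this is not how clearml-agent currently detects the repo); they parse `git remote -v` output like the one above and take the upstream ref, which `git rev-parse --abbrev-ref --symbolic-full-name @{u}` prints as e.g. `fork/main`.

```python
def parse_remotes(output: str) -> dict:
    """Parse `git remote -v` output into {name: {"fetch": url, "push": url}}."""
    remotes: dict = {}
    for line in output.splitlines():
        parts = line.split()
        # Each line looks like: <name> <url> (fetch|push)
        if len(parts) == 3 and parts[2] in ("(fetch)", "(push)"):
            name, url, kind = parts
            remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def fetch_url_for_upstream(upstream: str, remotes: dict) -> str:
    """Return the fetch URL of the remote that the current branch tracks."""
    # Assumes the remote name itself contains no "/" (e.g. "fork/main" -> "fork").
    remote_name = upstream.split("/", 1)[0]
    return remotes[remote_name]["fetch"]
```

With the remotes above and an upstream of `fork/main`, this yields `git@github.com:salimmj/somerepo.git`, the repo that actually contains the pushed commit.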
The private_package can be installed by doing pip install git+ssh://git@github.com/user/private_package.git, but the agent is trying to do pip install private_package, which won’t work.
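A sketch of substituting private packages with their SSH VCS URLs before pip runs. `pin_private_packages` and the name-to-URL mapping are hypothetical; names are compared after PEP 503 normalization so `private_package` and `private-package` match. Note that any version pin on the replaced line is dropped; a commit or tag would have to be appended to the URL instead.

```python
import re

# Leading distribution name of a requirement line, e.g. "private_package" in "private_package==0.1.0".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    # PEP 503 normalization: runs of -, _ and . become "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def pin_private_packages(requirements, private_urls):
    """Replace requirement lines for private packages with their VCS URLs."""
    lookup = {_normalize(name): url for name, url in private_urls.items()}
    out = []
    for line in requirements:
        match = _NAME.match(line.strip())
        key = _normalize(match.group(1)) if match else None
        # Lines without a mapped name (public packages, URLs) pass through unchanged.
        out.append(lookup.get(key, line))
    return out
```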
Ugh, sudo actually makes it fail explicitly because:

    error: Could not fetch origin
    Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
    - Make sure you pushed the requested commit:
    (repository='git@github.com:salimmj/clearml-demo.git', branch='main', commit_id='f76f3affd28d5558928d7ffd9a6797890ffdd708', tag='', docker_cmd='nvidia/cuda:11.4.0-runtime-ubuntu20.04', entry_point='mnist.py', working_dir='.')
    - Check if remote-wo...
Here’s another place where /root/ is hardcoded: https://github.com/allegroai/clearml-agent/blob/b196ab57931f3c67efcb561df0c8a2fe7c0e76f9/clearml_agent/commands/worker.py#L3338-L3341
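The hardcoded /root/ paths appear on the container side of the -v mounts in the docker run command earlier in the thread (/root/.gitconfig, /root/clearml.conf, /root/.ssh, /root/.cache/pip). A sketch of building those mounts from a configurable container home instead; `mount_args` and `container_home` are hypothetical names, not clearml-agent options.

```python
import posixpath


def mount_args(host_paths, container_home="/root"):
    """Build docker -v args; keys are paths relative to the container user's home."""
    args = []
    for rel, host in host_paths.items():
        # Join inside the container's (POSIX) filesystem rather than hardcoding /root.
        args += ["-v", f"{host}:{posixpath.join(container_home, rel)}"]
    return args
```

Keeping `/root` as the default preserves the current behavior, while images that run as a non-root user could pass their own home directory.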