
Here’s another place where /root/ is hardcoded: https://github.com/allegroai/clearml-agent/blob/b196ab57931f3c67efcb561df0c8a2fe7c0e76f9/clearml_agent/commands/worker.py#L3338-L3341
TimelyPenguin76 After creating the venv (so I don’t have to do it myself). Once an env is there, I need to run a script from the root of the repo while the env is activated.
ugh, sudo actually makes it fail explicitly because:
```
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
- Make sure you pushed the requested commit:
(repository='git@github.com:salimmj/clearml-demo.git', branch='main', commit_id='f76f3affd28d5558928d7ffd9a6797890ffdd708', tag='', docker_cmd='nvidia/cuda:11.4.0-runtime-ubuntu20.04', entry_point='mnist.py', working_dir='.')
- Check if remote-wo...
```
AgitatedDove14 can I specify a script to be run after the pip packages are installed? I see that it’s possible in docker mode.
My docker image will have all required apt packages, so no need.
If venv works inside containers that’s even better. We actually have custom containers that build on master merges. I wonder if using our own containers, which should have most of the deps, will work better than a simpler container.
AgitatedDove14 should I try running the above command with a privileged user?
SuccessfulKoala55 I tried to make a docker image by combining one of our Dockerfiles with this one: https://github.com/allegroai/clearml-agent/blob/master/docker/agent/Dockerfile . I modified the entrypoint to also be a combination of both.
Right now I’m not seeing that error, but the process seems to exit (as completed) right after the docker run. I’m wondering if my Dockerfile is not properly set up and it’s exiting before the daemon is started.
Is there a way to make it use ssh+git instead of git+git? Maybe add a force_ssh_pip_install option to the agent config?
Our code is tightly integrated with protobufs, which need to be re-compiled every now and then. We have a script to do that. If that’s not done, some imports end up failing.
I’m wondering, would an older version of the agent work well with a newer server version and vice-versa?
So when the repo is cloned and the venv is created and activated, I want to execute this from the repo: tools/setup_dependencies.sh
It doesn’t install it automatically; I think I need to specify it somewhere (see the above error). Or am I misunderstanding?
I think it’s great to let users build their own UI-connected apps, I’d use that for sure!
I tried with and without. I’m having the issue where, if I run the task from the queue, it completes as soon as it goes into docker, but if I run the same docker run command manually, it works.
AgitatedDove14 no I mean I can do:
```
docker run -t --gpus "device=1" -dit -e APP_ENV=kprod -e CLEARML_WORKER_ID=ada:gpu1 -e CLEARML_DOCKER_IMAGE=922531023312.dkr.ecr.us-west-2.amazonaws.com/jym-coach:202108080511.7e8d6d1 -v /home/smjahad/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.kjx6r9oo.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.l8cguj81:/root/.ssh -v /home/smjahad/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/smjahad/.clearml/pip-cache:/root/.cache/pip -v /home/smjah...
```
I don’t mean a serving endpoint, just the equivalent of “cloning an experiment” and running it on a different (larger) dataset.
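Something like this is the rough sketch I have in mind (the parameter name General/dataset_path, the task ID and the queue name are placeholders I made up):
```python
from clearml import Task

# grab the original experiment by ID and clone it
original = Task.get_task(task_id="<original_task_id>")
cloned = Task.clone(source_task=original, name="same model, larger dataset")

# point the clone at the larger dataset (hypothetical parameter name)
cloned.set_parameter("General/dataset_path", "s3://bucket/path/to/larger_dataset")

# hand the clone to an agent queue for execution
Task.enqueue(cloned, queue_name="default")
```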
Fixed it by adding this code block. Makes sense.
```
if clone:
    task = Task.clone(self)
else:
    task = self
    # check if the server supports enqueueing aborted/stopped Tasks
    if Session.check_min_api_server_version('2.13'):
        self.mark_stopped(force=True)
    else:
        self.reset()
```
If you were to add this, where would you put it? I can use a modified version of clearml-agent
Well this doesn’t work: pip install -e
Great find! So a pip upgrade should fix it hopefully.
I think it works, I’m fixing something else that came up.
```
$ git remote -v
fork    git@github.com:salimmj/somerepo.git (fetch)
fork    git@github.com:salimmj/somerepo.git (push)
origin  git@github.com:mainuser/somerepo.git (fetch)
origin  git@github.com:mainuser/somerepo.git (push)
```
I want to keep the above setup; the remote branch that tracks my local one will be on fork, so it needs to pull from there. Currently it recognizes origin, so it doesn’t work because the agent can’t find the commit.
```
...
more-itertools==8.6.0
-e git+git@github.com:user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
msgpack==1.0.2
msgpack-numpy==0.4.7.1
...
```
You’re saying there’s a built-in scheduler? SuccessfulKoala55 If so, where can I find it?
AgitatedDove14 when I try this I get:
```
clearml.backend_interface.session.SendError: Action failed <400/110: tasks.enqueue/v1.0 (Invalid task status (Invalid status change): current_status=in_progress, new_status=queued)> (queue=e78d2fdf2d5140b6b5c6678338c532bb, task=95082c9174a04044b25253d724362ec1)
```
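For reference, a client-side sketch of the same idea as the fix above (task ID and queue name are placeholders; it assumes the server is new enough to enqueue stopped tasks):
```python
from clearml import Task

# a task still marked in_progress can't be enqueued directly,
# so force it into a stopped state first
task = Task.get_task(task_id="<task_id>")
task.mark_stopped(force=True)

# now the enqueue call goes through
Task.enqueue(task, queue_name="default")
```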
AgitatedDove14 this works: pip install git+ssh://git@github.com/user/repo.git
The private_package can be installed by doing pip install git+ssh://git@github.com/user/private_package.git but the agent is trying to do pip install private_package, which won’t work.
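One thing I might try, though I’m not sure it’s the intended mechanism (treat it as an assumption): injecting the ssh URL into the requirements before Task.init, so the agent installs the package from the URL instead of by name.
```python
from clearml import Task

# Assumption: add_requirements passes this line through to the task's
# "installed packages" as-is, so the agent resolves it over ssh instead
# of attempting a plain `pip install private_package`.
Task.add_requirements("git+ssh://git@github.com/user/private_package.git")

task = Task.init(project_name="my_project", task_name="mnist")
```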
I am already forcing ssh auth
In fact, if there is a good python API to list/duplicate/edit/run experiments by ID, it seems straightforward to do that from Airflow (or any other job scheduler). I’m just wondering if there is some built-in scheduler.
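To make that concrete, this is the kind of sketch I have in mind, using SDK calls I believe already exist (project name, task ID and queue name are placeholders):
```python
from clearml import Task

# list: find experiments in a project and print their IDs
for t in Task.get_tasks(project_name="my_project"):
    print(t.id, t.name, t.get_status())

# duplicate + run by ID: clone a template task and enqueue the clone
template = Task.get_task(task_id="<template_task_id>")
run = Task.clone(source_task=template, name="scheduled run")
Task.enqueue(run, queue_name="default")

# an external scheduler (e.g. Airflow) could then block until it finishes
run.wait_for_status()
```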