I can ssh into the agent and:
    source /trains-agent-venv/bin/activate
    (trains_agent_venv) pip show pyjwt
    Version: 1.7.1
And so in the UI, in the Workers & Queues tab, I randomly see one of the two experiments for the worker that is running both experiments
Is it because I did not specify --gpu 0
that the agent, by default, pulls one experiment per available GPU?
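(For reference, a hedged aside: the agent daemon can be pinned to a single GPU with the --gpus flag, e.g. trains-agent daemon --gpus 0 --queue default; the queue name here is illustrative.)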
I'll try to pass these values using the env vars
Both are repos for Python modules (one is the experiment itself and the other is a dependency of the experiment)
When an experiment on trains-agent-1 is finished, I randomly see either no experiment or the long experiment, and when two experiments are running, I randomly see one of the two experiments
Interesting! Something like that would be cool, yes! I just realized that custom plugins in Mattermost are written in Go; it could be a good hackday for me to learn Go
This is consistent: each time I send a new task to the default queue, if trains-agent-1 has only one task running (the long one), it will pick another one. If I add one more experiment to the queue at that point (with trains-agent-1 running two experiments at the same time), that experiment will stay in the queue (trains-agent-2 and trains-agent-3 will not pick it because they are also running experiments)
I think it comes from the web UI of version 1.2.0 of clearml-server, because I didn't change anything else
I just moved one experiment to another project; after moving it I am taken to the new project, where the layout is then reset
(Just to know if I should wait a bit or go with the first solution)
Hi PompousParrot44, you could have a Controller task running in the services queue that periodically schedules the task you want to run
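A minimal sketch of such a controller, assuming the standard clearml Task API (the project name, task names, and the one-hour interval are illustrative):

    import time
    from clearml import Task

    # Controller task running in the services queue: periodically clones a
    # template task and enqueues the clone for an agent to execute.
    controller = Task.init(project_name='examples', task_name='periodic scheduler',
                           task_type=Task.TaskTypes.controller)
    template = Task.get_task(project_name='examples', task_name='training template')

    while True:
        cloned = Task.clone(source_task=template, name='scheduled run')
        Task.enqueue(cloned, queue_name='default')
        time.sleep(60 * 60)  # illustrative: schedule once an hour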
Are you planning to add a server-backup service task in the near future?
The task I cloned from is not the one I thought
AgitatedDove14 I see at https://github.com/allegroai/clearml-session/blob/main/clearml_session/interactive_session_task.py#L21 that a key pair is hardcoded in the repo. Is it being used to ssh into the instance?
You are right, thanks! I was trying to move /opt/trains/data to an external disk, mounted at /data
I still don't see why you would change the type of the cloned Task; I'm assuming the original Task had the correct type, no?
Because it is easier for me to create a training task out of the controller task by cloning it (so that parameters are prefilled and I can set the parent task id)
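Roughly this, as a sketch against the clearml Task.clone API (the task and queue names are illustrative):

    from clearml import Task

    controller = Task.current_task()
    # The clone keeps the controller's parameters prefilled; parent links it back.
    training = Task.clone(source_task=controller, name='training task',
                          parent=controller.id)
    Task.enqueue(training, queue_name='default')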
I did change the replica setting on the same index yes, I reverted it back from 1 to 0 afterwards
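For reference, that replica change is a single call to the Elasticsearch index settings API; a hedged sketch (the host and index name are placeholders):

    import requests

    # Set number_of_replicas back to 0 on one index (placeholder name).
    requests.put(
        'http://localhost:9200/my-index/_settings',
        json={'index': {'number_of_replicas': 0}},
    )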
Hi AgitatedDove14 , Here is the full log.
Both Python versions (local and remote) are 3.6. Locally (macOS), I get: pytorch3d== (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0, 0.4.0, 0.5.0)
Remotely (Ubuntu), I get (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0)
So I guess it's not really related to clearml-agent; rather, pip cannot find a proper wheel for Ubuntu for the latest versions of pytorch3d, right? If yes, is there a way to build the wheel on the remote machine...
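(A hedged pointer only: when no matching wheel exists, pip can usually build from source, e.g. pip install 'git+https://github.com/facebookresearch/pytorch3d.git', assuming the remote machine has a compiler and the CUDA toolchain available.)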
Yeah, again I am trying to understand what I can do with what I have. I would like to export the location of the environment the agent installs into as an environment variable, so that an app I am using inside the Task can use the Python packages installed by the agent, and I can control those packages easily through clearml
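A minimal sketch of what I mean, using only the standard library (the AGENT_VENV variable name is hypothetical):

    import os
    import sys

    # Inside a task executed by the agent, sys.prefix points at the virtualenv
    # the agent created, so it can be exported for child processes to reuse.
    os.environ['AGENT_VENV'] = sys.prefix  # hypothetical variable name
    print('agent venv:', sys.prefix)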
SuccessfulKoala55 I am using ES 7.6.2
I would also like to avoid any copy of these artifacts on S3 (to avoid double costs, since some folders might be big)
what would be the name of these vars?
AppetizingMouse58 Yes and yes
Could be, but not sure -> from 0.16.2 to 0.16.3
and the agent says agent.cudnn_version = 0