I'm assuming you mean for the clients, right?
Different question. How can I pass the PYTHONPATH env variable to a task run by the agent (so Python can find classes inside my subdirectories)?
Hi HelpfulHare30
By default the working directory will be added to the Python path. This means that if I have under Execution:
Working Dir: "."
Script: "src/script.py"
the root of the git repo will be added to the Python path.
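A minimal sketch of the mechanism, independent of ClearML: a script living in `src/` can only import a package located at the repository root when that root is on the Python path. The layout (`repo`, `mypkg`, `src/script.py`) and the helper name are made up for illustration, and `PYTHONPATH` is set by hand here to stand in for what the agent does with the working directory.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_with_root_on_path(repo_root: Path) -> str:
    """Run src/script.py with the repo root added to PYTHONPATH and return its stdout."""
    env = dict(os.environ)
    # Prepend the repo root, keeping any PYTHONPATH that was already set.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = str(repo_root) + (os.pathsep + existing if existing else "")
    result = subprocess.run(
        [sys.executable, str(repo_root / "src" / "script.py")],
        cwd=str(repo_root),
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Build a throwaway "repository" (hypothetical layout) to demonstrate the import.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "repo"
    (root / "mypkg").mkdir(parents=True)
    (root / "src").mkdir()
    (root / "mypkg" / "__init__.py").write_text("VALUE = 'found mypkg'\n")
    (root / "src" / "script.py").write_text("import mypkg\nprint(mypkg.VALUE)\n")
    output = run_script_with_root_on_path(root)
    print(output)
```

Without the `PYTHONPATH` entry, Python would only put the script's own folder (`src/`) on the path and the `import mypkg` line would fail.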
BTW: next RC you could add a flag to the agent to always add the git repo
I'm trying to achieve a workflow similar to the one
You mean running everything on a single machine (manually)?
ExcitedFish86
How do I set the config for this agent? Some options can be set through env vars, but not all of them.
Hmm okay, if you are running an agent inside a container and you want it to spin up "sibling" containers, you need to set the following:
Mount the docker socket to the container running the agent itself (as you did), basically adding "--privileged -v /var/run/docker.sock:/var/run/docker.sock"
Allow the host to mount cache and configuration from the host into the siblin...
Oh, if this is the case you can probably do:
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")
while True:
    # pull the next task from the queue; wait and retry if the queue is empty
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    # mark the task as started before running it
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = ta...
ExcitedFish86 this is a general "dummy agent" that pulls tasks and executes them (no env created, no code cloned, as you suggested)
How does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multiple machine support. In order to overcome it, I posted a dummy agent that just runs the Tasks.
(Notice...
That depends on the HPO algorithm. Basically the Tasks will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on the previous set of parameters and runs, like how hyper-band works (optuna/hpbandster)
Makes sense?
Try this one: HyperParameterOptimizer.start_locally(...)
https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer#start_locally
My bad, you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
RobustRat47 I think you have to use the latest clearml package for that (1.6.0)
Hi FunnyTurkey96
Any chance you can try to run with the latest from GitHub (I just tested your code and it seemed to work on my machine):
pip install git+
Yep... they are pushing "heavy" users away from these instances. Nothing really you can do, maybe switch to Azure/GCP, but it might be the same there
RoughTiger69
Move the files locally (i.e. based on the example, move folder b into folder a). Create a new version with two parents ('a' and 'b'), then sync the local root folder ('a' in your case). Only the metadata should change (because the referenced files are already in one of the datasets).
wdyt?
Notice both need to be str
btw, if you need the entire folder just use StorageManager.upload_folder
I cannot reproduce, tested with the same matplotlib version and python against the community server
to fix it, I excluded this var entirely from the docker-compose
Makes sense.
the path to the JSON file
Yep, that's what I did and things seem to work... Let me check again if I missed anything
Oh, then this should just work:
cp -R --link b a/
You can create the same links from Python as well
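As a rough Python counterpart of `cp -R --link b a/`, the sketch below recreates the directory tree and hard-links each file instead of copying it. It uses only the standard library (`shutil.copytree` with `os.link` as the copy function); the helper name `link_tree` and the throwaway folders are illustrative, and hard links assume both folders live on the same filesystem.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_tree(src: Path, dst: Path) -> None:
    """Recreate the directory tree of src under dst, hard-linking every file."""
    # copytree creates the directories and calls copy_function(src_file, dst_file)
    # for each file; os.link makes a hard link instead of copying the bytes.
    shutil.copytree(src, dst, copy_function=os.link)


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a"
    b = Path(tmp) / "b"
    a.mkdir()
    (b / "sub").mkdir(parents=True)
    (b / "sub" / "file.txt").write_text("hello")

    # Same spirit as `cp -R --link b a/`: the result is a/b/sub/file.txt
    link_tree(b, a / "b")

    linked = a / "b" / "sub" / "file.txt"
    content = linked.read_text()
    same_file = os.path.samefile(b / "sub" / "file.txt", linked)
    print(content, same_file)
```

Because the files are linked rather than copied, no extra disk space is used for their content.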
Hi PanickyFish98
It verifies it has access to it when actually creating the Task, maybe it should be a warning?!
fyi: you can also change the value from the UI (under Execution output) or have a default one set in the clearml.conf used by the agent
Notice: dataset_rgb.list_files() will list the content of the dataset, not the local files,
e.g.: /folder/myfile.ext and not /home/user/cache/folder/myfile.ext
So basically I think you are just not passing actual files, you should probably do:
for local_file in Path(folder_rgb).rglob('*'): ...
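To make the difference concrete, here is a small standard-library sketch of walking a local folder with `Path.rglob` and keeping only real files. The helper name `local_files` and the temporary folder are stand-ins; in the discussion above `folder_rgb` would be the local copy of the dataset.

```python
import tempfile
from pathlib import Path


def local_files(folder: str) -> list:
    """Return every regular file under folder (recursively), sorted, as Path objects."""
    # rglob('*') also yields directories, so filter them out.
    return sorted(p for p in Path(folder).rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as folder_rgb:
    (Path(folder_rgb) / "folder").mkdir()
    (Path(folder_rgb) / "folder" / "myfile.ext").write_text("data")

    found = local_files(folder_rgb)
    # Paths relative to the folder root look like the dataset listing,
    # while the entries in `found` are the actual local file paths.
    relative = [p.relative_to(folder_rgb).as_posix() for p in found]
    print(relative)
```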
So in a simple "all-or-nothing"
Actually this is the only solution unless preemption is supported, i.e. abort running Task to free-up an agent...
There is no "magic" solution for complex multi-node scheduling, even SLURM will essentially do the same ...
The problem is not really for the agents to wait (this is easily solved by additional high priority queue) the problem is will you have a "free" agent... you see my point ?
Hi ExcitedFish86
Good question, how do you "connect" the 3 nodes? (i.e. what is the framework you are using?)
Hi StickyWhale51
I think this issue is due to some internal race condition. Anyhow, I think we have an RC out solving it, can you try with:
pip install clearml==1.2.0rc2
Great! btw: final v1.2.0 should be out after the weekend
In both cases, if I get the element from the list, I am not able to get when the task started. Where is that info stored?
If you are using client.tasks.get_all(...) it should be under the started field
Specifically you can probably also do:
queried_tasks = Task.query_tasks(additional_return_fields=['started'])
print(queried_tasks[0]['id'], queried_tasks[0]['started'])
so I guess this could be one reason to start thinking about upgrading ....
Wait you mean the clearml-server ? (there is no reason not to upgrade the python package)
FreshKangaroo33 you can:
from datetime import datetime
from time import time
Task.query_tasks(..., task_filter=dict(started=['<{}'.format(datetime.utcfromtimestamp(time())), ]))
I think this should work
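A small sketch of building that filter value on its own, without calling ClearML. The helper name `started_before` is hypothetical, and whether the server accepts this exact timestamp format is taken from the reply above rather than verified here. It avoids `datetime.utcfromtimestamp`, which is deprecated since Python 3.12, while producing the same string shape.

```python
from datetime import datetime, timezone


def started_before(moment: datetime) -> dict:
    """Build a task_filter dict matching tasks whose 'started' time is earlier than moment."""
    # Convert to naive UTC so the string has the same shape as utcfromtimestamp() output.
    naive_utc = moment.astimezone(timezone.utc).replace(tzinfo=None)
    return dict(started=["<{}".format(naive_utc)])


task_filter = started_before(datetime(2022, 1, 31, 12, 0, tzinfo=timezone.utc))
print(task_filter)
```

The resulting dict would then be passed as `task_filter=...` in the `Task.query_tasks` call shown above; for "now", pass `datetime.now(timezone.utc)`.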
you can also just create a venv and run the tests there (with the latest python package) ?
What is the proper way to change a clearml.conf ?
Inside a container you can mount an external clearml.conf, or override everything with OS environment variables
https://clear.ml/docs/latest/docs/configs/env_vars#server-connection