
I want to be able to install the venv in multiple servers and start the "simple" agents in each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
ExcitedFish86 Oh if this is the case:
in your clearml.conf:
agent.package_manager.type: conda
agent.package_manager.conda_env_as_base_docker: true
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L60
https://git...
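For reference, a minimal sketch of how that part of the agent's clearml.conf might look (the comments and formatting here are illustrative, not from the original message):

agent {
    package_manager: {
        # use conda (instead of pip) to restore the Task's environment
        type: conda,
        # reuse a pre-existing conda env as the base environment inside the docker
        conda_env_as_base_docker: true,
    }
}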
, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
A bit on how clearml-agent works (and actually on how clearml itself works).
When running manually (i.e. not executed by an agent), Task.init (and similarly task.connect etc.) will log data on the Task itself (i.e. will send arguments/parameters to the server). This includes logging the argparser for example (and any other part of the automagic or manual connect).
When run...
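As an illustration of the argparser automagic (the project/task names and the argument here are made up, not from the original thread):

from argparse import ArgumentParser
from clearml import Task

parser = ArgumentParser()
parser.add_argument('--lr', type=float, default=0.001)

# Task.init hooks the argparser, so the parsed arguments are logged on the Task
# and can later be overridden when an agent re-executes the Task remotely
task = Task.init(project_name='examples', task_name='argparse logging')
args = parser.parse_args()
print(args.lr)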
I still see things being installed when the experiment starts. Why does that happen?
This only means no new venv is created; it basically means packages are installed into the "default" python env (usually whatever is preset inside the docker)
Make sense ?
Why would you skip the entire python env setup ? Did you turn on venvs cache ? (basically caching the entire venv, even if running inside a container)
the hack doesn't work if conda is not installed
Of course conda needs to be installed, it is using a pre-existing conda env, no?! what am I missing
Ideally it would just pull an experiment from a dedicated HPO queue and run it inplace
And the assumption is the code is also there ?
That depends on the HPO algorithm; basically the Tasks will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on the previous set of parameters and runs, like how hyper-band works (optuna/hpbandster)
Make sense ?
ExcitedFish86 this is a general "dummy agent" that pulls tasks and executes them (no env created, no code cloned, as you suggested)
how does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multiple machine support; in order to overcome it, I posted a dummy agent that just runs the Tasks.
(Notice...
I'm trying to achieve a workflow similar to the one
You mean running everything on a single machine (manually)?
ExcitedFish86 that said, if running in docker mode you can actually pass it on a per-Task basis with:
-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/venv/bin/python
as an additional docker container argument on the Task "Execution" tab itself.
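If you prefer to set that from code rather than the UI, something along these lines should work (the task id, image and venv path are placeholders, and the exact set_base_docker keyword names can vary between clearml versions):

from clearml import Task

# clone the template Task and point the agent at a pre-existing venv inside the container
template = Task.get_task(task_id='<template_task_id>')
cloned = Task.clone(source_task=template)
cloned.set_base_docker(
    docker_image='nvidia/cuda:11.8.0-runtime-ubuntu22.04',
    docker_arguments='-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/venv/bin/python',
)
Task.enqueue(cloned, queue_name='default')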
Try this one:
HyperParameterOptimizer.start_locally(...)
https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer#start_locally
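Roughly, a minimal usage sketch (the base task id, parameter range and metric names are placeholders, and the exact constructor arguments may vary with the clearml version):

from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

# the optimizer itself is just a controller Task
Task.init(project_name='HPO', task_name='optimizer', task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id='<base_task_id>',  # the experiment to clone and mutate
    hyper_parameters=[UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1)],
    objective_metric_title='validation',
    objective_metric_series='loss',
    objective_metric_sign='min',
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=2,
)
# start_locally runs the optimization loop and the cloned experiments on this machine,
# so no agent/queue is needed
optimizer.start_locally()
optimizer.wait()
optimizer.stop()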
Oh if this is the case you can probably do
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")
while True:
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = task_id
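    # hedged continuation sketch (the original message is cut off here, and the
    # script name below is a placeholder): launch the task's code with
    # CLEARML_TASK_ID set so the process attaches to the Task, then mark it done
    p = subprocess.Popen(['python', 'my_training_script.py'], env=env)
    p.wait()
    client.tasks.completed(task=task_id)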
we run in containers without venv, in the main section, and then delete it or use it for similar experiments
If this is the case then the idea is that the venv creation is actually cached, you can turn it on here (uncomment the line)
https://github.com/allegroai/clearml-agent/blob/51eb0a713cc78bd35ca15ed9440ddc92ffe7f37c/docs/clearml.conf#L116
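For reference, the relevant block in the agent's clearml.conf looks roughly like this; uncommenting the path line is what enables the cache (values illustrative):

agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum free space (GB) required to keep caching
        free_space_threshold_gb: 2.0
        # uncomment this line to turn the venv cache on
        path: ~/.clearml/venvs-cache
    },
}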
in which I can just spawn an ad-hoc worker
Can you elaborate on what you would do with it? Like an OS environment variable to disable the entire setup itself? Will it clone the code base?
Hi @<1523702932069945344:profile|CheerfulGorilla72>
the agent Always inherits from the docker's system-installed environment
If you have a custom venv inside the docker that is Not activated by default, you can set the agent to use it with:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL
it's not implemented right,
I think we forgot to add it as an argument (the query models call supports it, but it is not passed to the call)
If there is new issue will let you know in the new thread
Thanks! I would really like to understand what is the correct configuration
Yes, it's a bit confusing, the gist of it is that we wanted to have the ability to have diff configurations for diff buckets
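In practice that means the sdk.aws.s3 section of clearml.conf holds a default credential plus per-bucket overrides, along these lines (bucket names, keys and regions are placeholders):

sdk {
    aws {
        s3 {
            # default credentials, used for any bucket not listed below
            key: "default_access_key"
            secret: "default_secret"
            region: "us-east-1"
            credentials: [
                {
                    # override applied only to this specific bucket
                    bucket: "my-eu-bucket"
                    key: "eu_access_key"
                    secret: "eu_secret"
                    region: "eu-west-1"
                },
            ]
        }
    }
}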
Hi TastyOwl44
So this depends on your code itself, but usually you need a CPU machine to run the ClearML server (or use the free community server), then a machine to run the pipeline controller (usually the same machine running the clearml-server, as the pipeline control code is basically controller-only and does not execute the Task itself), and lastly you need machines with GPUs running the clearml-agent
(these GPU machines are the ones actually doing the training, inference, etc.)
Make ...
Hi SmugOx94
Hmm are you creating the environment manually, or is it done by Task.init ?
(Basically Task.init will store the entire conda environment, and if the agent is working with the conda package manager it will use it to restore it)
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L50
Hi SmallDeer34
The clearml-agent has its own clearml.conf file, there you should put the S3 credentials and they will be passed to any Task the agent executes:
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L234
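A minimal sketch of that section in the agent's clearml.conf (key, secret and region are placeholders):

sdk {
    aws {
        s3 {
            # credentials the agent passes on to the Tasks it executes
            key: "agent_access_key"
            secret: "agent_secret"
            region: "us-east-1"
        }
    }
}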
Hmm so you are saying you have to be logged out to make the link work? (I mean pressing the link will log you in and then you get access)
Just curious, if
is a value I can set, where is it used?
It is used when Creating a dataset from inside the cluster (i.e. when launching using the clearml k8s glue),
it will have No effect on what users have on their local machines
i.e. they can always point to a diff server.
That said, when users create their initial clearml.conf and copy paste the info from the web UI, this value (or it might be another one, I'll double check later) will set the initial configuration the c...
... these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume this should not be like that, right?
Helper functions are not "components", they are actually files that will be accessible when running the component itself.
am I missing something ?
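For context, a hedged sketch of how a helper function is attached to a decorator-based pipeline component (the function and argument values are made up; the relevant bit is helper_functions):

from clearml.automation.controller import PipelineDecorator

def normalize(values):
    # plain helper, not a pipeline step; it is packaged together with the component below
    total = sum(values)
    return [v / total for v in values]

@PipelineDecorator.component(return_values=['scores'], helper_functions=[normalize])
def score_step(values):
    # inside the component's standalone execution the helper is available to call
    return normalize(values)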
YummyMoth34
It tried to upload all events and then killed the experiment
Could you send a log?
Also, what's the train package version ?
Correct (basically pip freeze results)
Make sure you have the S3 credentials in your agent's clearml.conf :
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L210
Hmm there was this one:
https://github.com/allegroai/clearml/commit/f3d42d0a531db13b1bacbf0977de6480fedce7f6
Basically it was always caching steps (hence the skip); you can install from the main branch to verify this is the issue. An RC is due in a few days (it was already supposed to be out but got a bit delayed)