Task.completed(ignore_errors=False)
What are you getting?
execution_queue
is not relevant anymore
Correct
total_max_jobs
is determined by how many machines I launch the script on
Actually this is the number of concurrent subprocesses that are launched on your machine. Notice that local execution means all experiments are launched on the machine that started the HPO process.
Maybe to clarify, I was looking for something with the more classic Ask-and-Tell interface
so the way to connect "ask" in the model, is to just...
you need to set
CLEARML_DEFAULT_BASE_SERVE_URL:
So it knows how to access itself
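For example, a rough sketch of setting it before launching the serving service (the host/port here are placeholders, adjust to your own deployment):
export CLEARML_DEFAULT_BASE_SERVE_URL=http://<serving-host>:8080/serve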
this is very odd, can you post the log?
I see TrickyFox41, try the following:
--args overrides="param=value"
Notice this will change the Args/overrides argument that will be parsed by hydra to override its params
Hi @<1523715429694967808:profile|ThickCrow29> , thank you for pinging!
We fixed the issue (hopefully) can you verify with the latest RC? 1.14.0rc0 ?
worker nodes are bare metal and they are not in k8s yet
By default the agent will use 10022 as an initial starting port for running the sshd that will be mapped into the container. This has nothing to do with the Host machine's sshd. (I'm assuming agent running in docker mode)
Hover over the border (I would suggest to use the full screen, i.e. maximize)
My main issue with this approach is that it breaks the workflow into an "a-sync" set of tasks:
This is kind of the way you depicted it, meaning, there is an initial dataset, an "offline process" (i.e. external labeling), and then an ingest process.
I was wondering if the “waiting” operator can actually be a part of the pipeline.
This way it will look more clear what is the workflow we are executing.
Hmm, so pipeline is "aborted", then the trigger relaunches the pipeline, and the pipeli...
But in credentials creation it still shows 8008. Are there any other places in docker-compose.yml where the port should be changed from 8008 to 8011?
I think there is a way to "tell" it what to put there, not sure:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#configuration-files
IrritableOwl63 in the profile page, look at the bottom right corner
Notice you have to configure the shared drive for Docker, as the volume mount doesn't work without it. https://stackoverflow.com/a/61850413
Hi BurlyRaccoon64
What do you mean by "custom_build_script"? Not sure I found it in "clearml.conf"
https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf
is the "installed packages" part editable? good to know
Of course it is, when you clone a Task everything is Editable 🙂
Isn't it a bit risky manually changing a package version?
worst case it will crash quickly, and you reset/edit/enqueue 🙂
(Should work though)
Yes I think the writer.add_figure
somehow crops the image
Hi WackyRabbit7 ,
Regarding git credentials, see here in the trains.conf https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L18
Trains assumes one of two (almost three) possible setups
Your code/script is in a git repository. Then when executing manually, all the git references incl. uncommitted changes are stored. Then when executing with the trains-agent, it will clone the code based on these references, apply the uncommitted changes, and run your code. To do that the ...
How do you run the
clearml-agent
in docker mode
clearml-agent daemon --docker
See here:
https://clear.ml/docs/latest/docs/clearml_agent#docker-mode
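For example, a minimal sketch (the queue name and default image here are just placeholders):
clearml-agent daemon --queue default --docker nvidia/cuda:11.7.1-runtime-ubuntu22.04
The image after --docker is optional; if you omit it, the agent falls back to its configured default container.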
You can however pass a specific Task ID and it will reuse it "reuse_last_task_id=aabb11", would that help?
Hmm I'm sorry it might be "continue_last_task", can you try:
Task.init(..., continue_last_task="aabb11")
Hi @<1523715429694967808:profile|ThickCrow29>
Is there a way to specify a callback upon an abort action from the user
You mean abort of the entire pipeline?
so it would be better just to use the original code files and the same conda env, if possible…
Hmm you can actually run your code in "agent mode" assuming you have everything else setup.
This basically means you set a few environment variables prior to launching the code:
Basically:
export CLEARML_TASK_ID=<The_task_id_to_run>
export CLEARML_LOG_TASK_TO_BACKEND=1
export CLEARML_SIMULATE_REMOTE_TASK=1
python my_script_here.py
Hi OutrageousSheep60
Is there a way to instantiate a
clearml-task
while providing it a
Dockerfile
that it needs to build prior to executing the task?
Currently not really, as at the end the agent does need to pull a container.
But you can achieve basically the same by adding the "dockerfile" script as --docker_bash_setup_script
Notice of course that this is an actual bash script, not a Dockerfile, so there is no need for the "RUN" prefix.
wdyt?
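For example, something along these lines (project/script/image names are placeholders, and setup.sh would contain the commands you would otherwise put in the Dockerfile's RUN lines):
clearml-task --project examples --name train-in-docker \
  --script train.py \
  --docker python:3.9 \
  --docker_bash_setup_script setup.sh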
For example, could you test if this one works:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
Something like the TYPE_STRING that Triton accepts.
I saw the github issue, this is so odd, look at the triton python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/init.py
and since the update the docs seem to be a bit off but I think I got it
Working on a whole new site 😉
JitteryCoyote63 the agent.cuda_version
(or CUDA_VERSION env) tells the agent which pytorch wheel to download. The CUDNN library can be included inside any wheel and it will work as long as cuda / cudart exist on the system; for example, pytorch wheels include the cudnn they use. agent.cudnn_version
should actually be deprecated, as it is not actually used.
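For example, to steer which pytorch wheel the agent resolves, you could set the env before launching it (the version value and queue name are just examples):
export CUDA_VERSION=11.7
clearml-agent daemon --queue default --docker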
For future reference, dependency order:
1. Nvidia Drivers
2. CUDA library and CUDA-runtime libraries (libcuda.so / libcudart.so)
3. CUDN...
Would you have an example of this in your code blogs to demonstrate this utilisation?
Yes! I definitely think this is important, and hopefully we will see something there 🙂 (or at least in the docs)
Does this mean that I need to create multiple ssh keys? 1 key for each user?
I think so
Use .git-credentials
This might also support multiple user/repo
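For example, the standard git credential-store setup would look roughly like this (usernames/tokens/hosts are placeholders):
git config --global credential.helper store
# ~/.git-credentials, one entry per line:
https://<user1>:<token1>@github.com
https://<user2>:<token2>@gitlab.example.com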