BTW: why use the CLI? The idea of ClearML is that it becomes part of the code, even during development: you add "Task.init(...)" at the beginning of the code, which creates the Tasks and logs them as part of development. Executing them then essentially means cloning and enqueuing in the UI. Of course, you can also automate this directly from the code.
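A minimal sketch of automating the clone-and-enqueue step from code, assuming the clearml package is installed and configured. The helper name clone_and_enqueue and its task_cls parameter are hypothetical and only there so the sketch can be exercised without a server. Task.clone and Task.enqueue are the SDK calls it relies on.

```python
def clone_and_enqueue(source_task_id, queue_name, new_name=None, task_cls=None):
    """Programmatic equivalent of 'clone' followed by 'enqueue' in the UI."""
    if task_cls is None:
        # Imported lazily: the real SDK needs a configured clearml.conf.
        from clearml import Task
        task_cls = Task
    # Create a draft copy of an existing (already logged) Task.
    cloned = task_cls.clone(source_task=source_task_id, name=new_name)
    # Put the copy on a queue so an agent listening there can pick it up.
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

The source Task is the one that "Task.init(...)" created while the script was being developed. The task ID and queue name passed in are whatever exists on your own server.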
Hi MammothGoat53
Do you mean working with RestAPI directly?
https://clear.ml/docs/latest/docs/references/api/events
Check on which queue the HPO puts the Tasks, and if the agent is listening to these queues
and you have clearml v0.17.2 installed on the "system" packages level, and 0.17.5rc6 installed inside the pyenv venv ?
JitteryCoyote63 I meant to store the parent ID as another "hyper-parameter" (under its own section name) not the data itself.
Makes sense ?
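A minimal sketch of that suggestion, assuming "task" is a clearml Task object (for example the one returned by Task.init). The section name "Lineage", the key "parent_id", and both helper names are hypothetical. The sketch relies on Task.connect(..., name=...) placing the keys under that section, and on Task.get_parameters() returning them as "Section/key".

```python
def record_parent_id(task, parent_task_id, section="Lineage"):
    """Store only the parent's ID (not its data) as a hyper-parameter."""
    # The 'name' argument puts the dictionary under its own section.
    task.connect({"parent_id": parent_task_id}, name=section)


def read_parent_id(task, section="Lineage"):
    """Read the stored parent ID back; returns None if it was never set."""
    return task.get_parameters().get(f"{section}/parent_id")
```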
Task status change to "completed" is set after all artifacts upload is completed.
JitteryCoyote63 that seems like the correct behavior for your scenario
JitteryCoyote63 look for the latest RC (1.7.3rc1), it should have the fix (output_uri=False)
I think this is the temp requirements file it creates, not your requirements file. If you attach a log here with the "installed packages" section, maybe we could help debug it.
SparklingHedgehong28 this is actually quite cool! Still not sure why not just use the built in autoscaler https://github.com/allegroai/clearml/tree/master/examples/services/aws-autoscaler , but it is a really cool usage of ASG
E.g. I might need to have different N-numbers for the local and remote (ClearML) storage.
Hmm yes, that makes sense
That'd be a great solution, thanks! I'll create a PR shortly
Thank you!
Thanks BitterStarfish58 !
IrritableJellyfish76 point taken, suggestions on improving the interface ?
With remote_execution it is command="[...]", but on local it is command='train' like it is supposed to be.
I'm not sure I follow, could you expand ?
What do you mean? every Model has a unique ID, what do you consider a version?
How do I reproduce it? When I use add_step with the wrong parameter it throws an exception before the pipeline even starts ...
just got the pipeline to run
Nice!
using the default queue okay?
Using the default queue is fine. The other queue is the "services" queue; by default the "trains-server" runs an agent that pulls jobs from it.
In "services" mode, an agent pulls jobs one right after the other (without waiting for the previous job to finish), as opposed to a regular queue (any other), where the trains-agent pulls a job only after the previous one has completed.
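To make the difference concrete, here is a toy model of the two pulling behaviours. It is not the agent's implementation: the function name start_times is made up for illustration, and it ignores polling intervals and startup overhead.

```python
def start_times(durations, services_mode):
    """Return the time at which each queued job starts on a single agent."""
    starts = []
    clock = 0.0
    for duration in durations:
        starts.append(clock)
        if not services_mode:
            # Regular queue: the next job is pulled only after this one ends.
            clock += duration
        # Services mode: the next job is pulled right away, so the clock
        # does not advance by the job's duration.
    return starts
```

For three jobs lasting 3, 2 and 5 time units, the regular queue starts them at 0, 3 and 5, while services mode starts all three at 0.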
(This code sample should work on your setup with your installed packages without a problem)
give me a minute to test
No worries, glad to hear it worked out
Hi CleanWhale17, at least for the moment, the code, although open ( https://github.com/allegroai/trains-web ), has no external theme/customization interface.
That said, we do have some thoughts on it. What did you have in mind?
I mean if I enter my host machine ssh password it works. But we will disable password auth in the future, so it's not an option
To clarify, it should not allow users to SSH into the host machine (if you can do that, it means you own it); it only allows users to SSH into the container the host machine spins up. Make sense?
ScantChimpanzee51 what's the use case for the full path without specific artifact?
Set the following: CLEARML_AGENT_DISABLE_SSH_MOUNT=1 clearml-agent daemon ...
The issue is that it automatically mounts the host's .ssh into the container, so that you have credentials if you use SSH to clone git. In your case it also mounts the configuration, hence the failure to log in.
I will make sure we add it to the configuration file, so it is more visible
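If the agent is started from a script rather than a shell, the same variable can be set on the child process. A minimal sketch: the helper name agent_daemon_invocation is hypothetical, and the "--queue" and "--docker" arguments are assumptions about how the daemon is launched, not taken from the command above.

```python
import os


def agent_daemon_invocation(queue_name, base_env=None):
    """Build the command and environment for an agent without the .ssh mount."""
    # Copy the environment so the calling process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_AGENT_DISABLE_SSH_MOUNT"] = "1"
    # Assumed arguments: adjust to match your own daemon command line.
    command = ["clearml-agent", "daemon", "--queue", queue_name, "--docker"]
    return command, env
```

The result can be passed to subprocess.run(command, env=env, check=True) on the machine that hosts the agent.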
Btw it seems the docker runs in network=host
Yes, this is so that if you have multiple agents running on the same machine, they can find a new open port
I can telnet the port from my mac:
Okay this seems like it is working
not available
PompousHawk82 unfortunately this is kind of binary, either you have full tracking of load/save operations or you do not.
This warning message will disappear in the next version as we will be able to log multiple models under the same Task :)
Wait, so the pipeline step only runs if the pre execute callback returns True? It'll stop if it doesn't run?
Only if you have a callback function and that callback returns False will the step be skipped (otherwise it is processed)
Another question, in the parents sequence in pipe.add_step, we have to pass in the name of the step right?
Correct, the step name is a unique identifier for the pipeline
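A minimal sketch tying the two answers together, assuming clearml's PipelineController. The project, task and step names, the parameter key "General/run_step", and the controller_cls argument are all hypothetical; the last one exists only so the sketch can be tried without a server. The callback takes (pipeline, node, parameters), and returning False skips the step.

```python
def skip_unless_enabled(pipeline, node, parameters):
    """Pre-execute callback: return False to skip the step, True to run it."""
    # 'General/run_step' is a made-up parameter key for illustration.
    return str(parameters.get("General/run_step", "true")).lower() != "false"


def build_pipeline(controller_cls=None):
    """Define two steps; the second names the first as its parent."""
    if controller_cls is None:
        # Imported lazily: the real controller needs a configured server.
        from clearml import PipelineController
        controller_cls = PipelineController
    pipe = controller_cls(name="example pipeline", project="examples", version="1.0.0")
    pipe.add_step(
        name="stage_data",
        base_task_project="examples",
        base_task_name="data step",
    )
    pipe.add_step(
        name="stage_train",
        parents=["stage_data"],  # parents are referenced by step name
        base_task_project="examples",
        base_task_name="train step",
        pre_execute_callback=skip_unless_enabled,
    )
    return pipe
```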
how would I access the artifact of a previous step within the pre ...
AstonishingRabbit13 so is it working now ?