ExcitedFish86 this is a general "dummy agent" that pulls Tasks and executes them (no env created, no code cloned, as you suggested)
How does this work with HPO?
The HPO clones Tasks, changes their arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multi-machine support. To overcome that, I posted a dummy agent that just runs the Tasks.
(Notice...
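The clone / change arguments / enqueue flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual HPO implementation: the helper names (build_param_grid, launch_trials), the parameter names, and the queue name are hypothetical, and real-time metric monitoring is left out. It assumes the clearml package's Task.clone, Task.update_parameters, and Task.enqueue calls.

```python
from itertools import product


def build_param_grid(search_space):
    """Expand {"name": [values, ...]} into a list of flat parameter dicts."""
    names = sorted(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def launch_trials(base_task_id, search_space, queue_name):
    """Clone the base Task once per parameter combination and enqueue each clone."""
    # Imported lazily so the grid helper above works without clearml installed.
    from clearml import Task

    launched = []
    for params in build_param_grid(search_space):
        trial = Task.clone(source_task=base_task_id, name=f"trial {params}")
        trial.update_parameters(params)  # override the cloned arguments
        Task.enqueue(trial, queue_name=queue_name)  # any agent on this queue can run it
        launched.append(trial.id)
    return launched
```

A call such as launch_trials("<base task id>", {"General/lr": [0.1, 0.01]}, "default") would then leave the actual execution to whichever agent (dummy or regular) is listening on that queue.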
WackyRabbit7 basically, starting with v1.1, if you run code without any configuration file you will get an error (in contrast to previous versions, where it defaulted to the demo-server)
Hi CleanPigeon16
You need to be able to access the machine running the agent, usually the default port will be 10022.
If you need further debug messages, add --debug at the beginning of the clearml-session command:
clearml-session --debug ...
To get all the debug print, please upgrade to clearml-session==0.3.3
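Before digging into the debug output, it may be worth confirming that the agent machine is reachable on that port at all. A minimal sketch using only the standard library; the can_reach helper and the host name in the usage note are hypothetical, and 10022 is simply the default port mentioned above:

```python
import socket


def can_reach(host, port=10022, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, can_reach("agent-machine.local") returning False would point to a network or firewall issue rather than a clearml-session problem.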
Since I can't use the torchrun command (from my tests, clearml won't use it on the clearml-agent), I went with the
@<1556450111259676672:profile|PlainSeaurchin97> did you check this example?
I want the task of human tagging a model to be "just another step in the pipeline"
That makes total sense.
Quick question, would you prefer the pipeline controller to "wait" for the tagging and then continue, or would it make more sense to create a trigger on the tagging ?
Hi JitteryCoyote63
I think there is a GitHub issue (a request for it). This is not trivial to build (basically you need the agent to first temporarily pull the git repo, apply changes, build the docker, remove the temp build, and restart with the new image)
Any specific reason for not pushing a docker, or using the extra docker bash script on the Task itself?
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility :)
Please do :)
Hi GreasyPenguin14
It looks like you are trying to delete a Task that does not exist
Any chance the cleanup service is misconfigured (i.e. accessing the incorrect server) ?
you should have a gpu argument there, set it to true
Sure thing, let me know ... :)
Hi GiganticTurtle0
Sure, OutputModel can be manually connected:
model = OutputModel(task=Task.current_task())
model.update_weights(weights_filename='localfile.pkl')
file and redirect the public url to k8 dns url?
Yes! that would work, Nice!
You can add it into the extra_docker_shell_script
it will be executed in any pod the clearml-glue will spin (obviously this needs to be configured on the pod running the clearml k8s glue)
https://github.com/allegroai/clearml-agent/blob/ba2db4e727b90e595df2b13f458d9580659bf12e/docs/clearml.conf#L152
TrickyRaccoon92 actually Click is on the to do list as well ...
I see, when you run it manually (i.e. not via an agent) what do you have under the configuration tab in the UI (meaning do you see both argparser arguments there)?
Hi JitteryCoyote63 , is there a callback for that?
it's in the docker image, doesn't the git clone command run in the container
Then this should have worked.
Did you pass in the configuration: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L25
okay let's PR this fix ?
PompousParrot44
Check out the task.execute_remotely()
You can call it right after the task init, and it will enqueue your running Task, and leave the process (if you want).
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/task.py#L1437
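A minimal sketch of that pattern, assuming the current clearml package (the linked source is from the older trains package). The project name, task name, and queue name are placeholders:

```python
def main(queue_name="default"):
    # Imported inside the function so the sketch loads without clearml installed.
    from clearml import Task

    task = Task.init(project_name="examples", task_name="remote run")

    # Enqueue this Task and leave the local process. When an agent later
    # executes the Task, this call does nothing and the code below runs.
    task.execute_remotely(queue_name=queue_name, exit_process=True)

    print("running on the agent")
```

Everything after the execute_remotely() call runs only on the agent side, so the heavy part of the script can live there.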
Notice there is no need to upgrade the server, only the ClearML python package
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when it comes to multiple model serving. As well as the ability to quickly update models from the clearml model repository (just tag + publish and the end...
Hi @<1533257411639382016:profile|RobustRat47>
sorry for the delay,
Hi when we try and sign up a user with github.
wait, where are you getting this link?
When I passed specific arguments (for example --steps) it ignored them...
script.py test blah1 blah2 blah3 42
Is this how it is intended to be used ?
Yey!
Out of curiosity, what's the workflow with snowflake?
Apparently the error comes when I try to access from get_model_and_features the pipeline component load_model. If it is not set as pipeline component and only as helper function (provided it is declared before the components that calls it (I already understood that and fixed, different from the code I sent above).
ShallowGoldfish8 so now I'm a bit confused, are you saying that now it works as expected ?
Hi UnevenDolphin73
Does ClearML somehow remove any loggers from logging module? We suddenly noticed that we have some handlers missing when running in ClearML
I believe it adds a logger; it should not remove any loggers.
What's the clearml version you are using ?
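One way to check which handlers actually go missing is to snapshot them before and after the Task is initialized and compare. A small sketch using only the standard logging module; the snapshot_handlers and missing_handlers helpers are hypothetical names, not part of ClearML:

```python
import logging


def snapshot_handlers():
    """Map each known logger name to the list of handlers attached to it."""
    loggers = {"root": logging.getLogger()}
    for name, obj in logging.root.manager.loggerDict.items():
        if isinstance(obj, logging.Logger):  # skip PlaceHolder entries
            loggers[name] = obj
    return {name: list(lg.handlers) for name, lg in loggers.items()}


def missing_handlers(before, after):
    """Return {logger name: handlers present in `before` but gone in `after`}."""
    missing = {}
    for name, handlers in before.items():
        gone = [h for h in handlers if h not in after.get(name, [])]
        if gone:
            missing[name] = gone
    return missing
```

Calling snapshot_handlers() right before and right after Task.init(), then passing both results to missing_handlers(), should show whether any handler was removed and from which logger.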
HighOtter69 I was able to change the color individually without an issue. What's your clearml-server ? are you using the community server ?
that is because my own machine has 10.2 (not the docker, the machine the agent is on)
No that has nothing to do with it, the CUDA is inside the container. I'm referring to this image https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the docker , it points to cuda version 10.2
Am I missing something ?