yes, that makes sense to me.
What is your specific use case, meaning when/how do you stop / launch the HPO?
Would it make sense to continue from a previous execution and just provide the Task ID? Wdyt?
TrickySheep9
Is there a way to see a roadmap on such things?
Hmm I think we have some internal one, I have to admit these things change priority all the time (so it is hard to put an actual date on them).
Generally speaking, pipelines with functions should be out in a week or so, TaskScheduler + Task Triggers should be out at about the same time.
UI for creating pipelines directly from the web app is in the works, but I do not have a specific ETA on that.
Hi RattyBluewhale45
What's the clearml agent version? And could you verify with the latest RC?
Lastly, how are you running the agent, docker mode? What's the base container?
To auto upload the model you have to tell clearml to upload it somewhere, usually by passing output_uri to Task.init or setting the default_output_uri in the clearml.conf
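A minimal sketch of the output_uri option, assuming the clearml package is installed and credentials are already configured. The project name, task name and destination below are placeholders, not values from this thread:

```python
def init_task_with_upload(destination):
    """Start a task whose models/artifacts are uploaded to `destination`.

    `destination` is a storage URI such as "s3://my-bucket/models"
    (hypothetical bucket). Passing True instead is expected to upload
    to the ClearML files server.
    """
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml import Task

    return Task.init(
        project_name="examples",        # placeholder project
        task_name="auto upload demo",   # placeholder task name
        output_uri=destination,
    )
```

Calling init_task_with_upload("s3://my-bucket/models") before the training code saves a model should be enough for the saved model file to be uploaded there; the alternative is to leave the code untouched and set default_output_uri in clearml.conf.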
My clearml-server crashed for some reason
😞 No worries
Okay, progress.
What are you getting when running the following from the git repo folder: git ls-remote --get-url origin
Yes they are supposed to be routed there by pytorch dist
(and the TB logs are on the master only anyhow)
Interesting!
I would also add that the Task name is not unique, and you can use it to describe the "process / goal etc", which would make it pretty obvious to search / review from the UI.
Regarding models and branches, I would use the Task tags (you can have as many as you like) to tag the specific model type (or dev branch if the alg is diff), this means you can also easily filter based on the Tags in the UI.
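A rough sketch of that tagging idea, assuming the clearml package is installed. The "model:" / "branch:" prefixes and the project name are just one possible convention made up for this example:

```python
def build_tags(model_type, branch=None):
    """Build the tag list for a run (the prefix convention is hypothetical)."""
    tags = [f"model:{model_type}"]
    if branch:
        tags.append(f"branch:{branch}")
    return tags


def start_tagged_task(goal, model_type, branch=None):
    """Create a task named after its goal and tagged by model type / branch."""
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml import Task

    return Task.init(
        project_name="examples",  # placeholder project
        task_name=goal,           # task names do not need to be unique
        tags=build_tags(model_type, branch),
    )
```

For example, start_tagged_task("baseline on new data split", "resnet50", branch="dev-attn") would give a task that can be filtered by either tag in the UI.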
can you use the Web UI to compare the artifacts from two separate subprojects?
Yes comp...
Hi RobustRat47
My guess is it's something from the converting PyTorch code to TorchScript. I'm getting this error when trying the
I think you are correct, see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it
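A minimal sketch of that conversion step, assuming PyTorch is installed and the model is a regular torch.nn.Module that torch.jit.script can compile (models with unsupported Python constructs may need torch.jit.trace or code changes instead). The function name and default file name are placeholders:

```python
def export_torchscript(model, path="serving_model.pt"):
    """Compile `model` to TorchScript and save it to `path`."""
    # Imported lazily so the sketch can be loaded without torch installed.
    import torch

    model.eval()                        # switch to inference mode before export
    scripted = torch.jit.script(model)  # compile the module to TorchScript
    scripted.save(path)                 # write the serialized TorchScript file
    return path
```

The saved file is what would then be registered with the serving setup, rather than the regular state_dict checkpoint.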
Hmm. What's the Hydra version you have?
The second run prints out the same (non) "random" numbers as the first run
ClearML sets the initial random seed for you, basically trying to help with reproducibility. That said, inside the function you can always do: import random; import time; random.seed(time.time())
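To make the effect concrete, here is a small standard-library sketch. The seed value is arbitrary and only stands in for whatever initial seed the framework sets:

```python
import random
import time


def draw(seed, n=3):
    """Seed the global RNG, then return n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


# With a fixed seed, every run (or call) produces the same sequence.
first = draw(1337)
second = draw(1337)

# Reseeding from the clock gives a different starting point on each run.
random.seed(time.time())
fresh = [random.random() for _ in range(3)]
```

Here first and second are identical lists, which matches the "same numbers on the second run" behaviour, while fresh depends on when the code runs.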
Thank you! 😊
Hi DisgustedDove53
When you say "deployment" there are a lot of ways to interpret that 🙂 What exactly are you looking for?
Hi SharpDove45
what
suggested about how it fails on bad/missing credentials
Yes, this is correct, since you specifically set the hosts, worst case you will end up with wrong credentials 🙂
(you can find it in the pipeline component page)
This seems more complicated than I thought... I think you are correct, and it fails to load the entire module, let me check what I can do
GrievingTurkey78 yes, you are correct on both.
Will the sweep functionality work?
Yes it should, that said, it will not use the trains-agent
so you are limited to the machine running the sweep.
If you want to do HPO on multi-node, check out this example 🙂
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
You can set torch to be installed last:
post_packages: ["horovod", "torch"]
Which will make sure the "trains-agent" version (the one you specified in the "installed packages") will be installed last.
Hi HarebrainedBear62
What's the type of data ?
Is this from the pipeline logic? Or a component?
Hi GrievingTurkey78
Could you provide some more details on your use case, and what's expected?
WackyRabbit7 hmmm seems like there is a non-regular character inside the diff.
Let me check something
Interesting question, should work and looks like an interesting combination, I'm curious what you come up with.
btw: grafana itself can already provide a lot of alerts for drift etc, this is basically their histogram delta feature
Sure, in that case, wait until tomorrow, when the github repo is fully synced
Ohh, the controller task itself holds the artifacts ?
Hi ContemplativeArcticwolf43
In the 2nd 'Getting Started' tutorial,
Could you send a link to the specific notebook?
. But whenever a task is picked, it fails for the following
You mean after the Task.init call?
Hi ReassuredTiger98
Basically assuming Linux, init.d will do the trick
https://unix.stackexchange.com/questions/20357/how-can-i-make-a-script-in-etc-init-d-start-at-boot
Is that normal or a possible bug?
This sounds like the xgboost internal format, it makes sense to me for it to be joblib (which is like pickle only faster and safer)
Let me see if we can also add the model object to the callback...