And maybe adding idle time spent without a job to the API is not that bad an idea 😉
yes, adding that to the feature list 🙂
What if I write the last active state in an instance tag? This could be a solution…
I love this hack, yes this should just work.
BTW: if your lambda is a for loop that is constantly checking, there is no need to actually store the "last idle timestamp check" as a tag, no?
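If you do go with the tag approach, here is a minimal sketch of what the check could write with boto3; the instance id and tag key are hypothetical, nothing ClearML-specific:

```python
import time

import boto3

# Hypothetical instance id and tag key, purely for illustration
INSTANCE_ID = "i-0123456789abcdef0"

ec2 = boto3.client("ec2")
# Stamp the instance with the last time it was seen doing actual work
ec2.create_tags(
    Resources=[INSTANCE_ID],
    Tags=[{"Key": "clearml-last-active", "Value": str(int(time.time()))}],
)
```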
How can I make a task that does a helm install or kubectl create deployment.yaml?
The task that it launches should have your code that actually does the helm deployments and other things. Think of the Task as a way to launch a script that does something, and that script can then just interact with the cluster. The queue itself (i.e. clearml-agent) will not directly deploy helm charts, it will only deploy jobs (i.e. pods).
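For example, a minimal sketch of such a script (the project name, release name and chart path are assumptions): the Task is only the launcher, and the script it runs shells out to helm like any other process:

```python
import subprocess

from clearml import Task

# Register this run so an agent can pick it up from the queue like any other job
task = Task.init(project_name="deployments", task_name="helm install my-chart")

# The actual work: call helm (or kubectl) directly against the cluster.
# Assumes helm and a valid kubeconfig are available on the agent machine.
subprocess.run(
    ["helm", "upgrade", "--install", "my-release", "./my-chart"],
    check=True,
)
```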
Any recommendations or working combinations of AMIs?
I would take the deep learning AMIs from Nvidia on AWS; I think they work on both CPU and GPU machines.
In terms of dockers: python dockers for CPU and nvidia runtime ones for GPU
[https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86d[…]d2c01646d599352f6ddd9893420eb815a06c3b90619f8?context=explore](https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86db7f6b1b286d2c01646d599352f6ddd98...
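If it helps, a minimal sketch of pinning the container per task; the image names are just examples along the lines above, not a definitive recommendation:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="train")

# GPU queue: an nvidia runtime image; CPU queue: a plain python image
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04")
# task.set_base_docker("python:3.11.2-bullseye")  # CPU-only alternative
```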
@<1539780258050347008:profile|CheerfulKoala77> make sure the AMI id matches the zone of the EC2 machine
@<1569858449813016576:profile|JumpyRaven4> fyi clearml-serving was synced 🤞
Hi GrievingTurkey78
the artifacts are downloaded to the cache folder (and by default the last 100 accessed artifacts are maintained there).
node executes the task, will all the info be erased, or does this have to be done explicitly?
Are you referring to the trains-agent running a docker?
By default the cache is persistent between executions (i.e. saving time on multiple downloads between experiments)
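As an illustration, a minimal sketch of what that means in practice (project, task and artifact names are placeholders): repeated get_local_copy() calls for the same artifact reuse the locally cached copy instead of re-downloading it:

```python
from clearml import Task

# Fetch an artifact from a previously executed task
source_task = Task.get_task(project_name="examples", task_name="my experiment")
local_path = source_task.artifacts["training_data"].get_local_copy()

# The returned path points inside the local cache folder, so a second call
# (even from another experiment) reuses the already-downloaded file
print(local_path)
```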
because a step can be constructed with multiple sub-components, but not all of them might be added to the UI graph
Just to make sure I fully understand: when we decorate with @sub_node we want that to also appear in the UI graph (and have its own Task / metrics etc.)
correct?
or shall I call the Task.init even from the agent
WorriedParrot51 I think something is lost here.
Task.init() is always called, even when the agent is executing the code. The difference is in what happens inside the Task.init() call. When the codebase itself is executed by the trains-agent, it signals through OS environment variables to Task.init() that instead of creating a new task, it should use the already created one. From this point on, all data flows from the trains-server back into the codebase.
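In other words, the same Task.init() line works in both modes. A minimal sketch of what that looks like from the code's point of view (project/task names are just placeholders):

```python
from clearml import Task

# Locally this registers a new task; under the agent it attaches to the task
# the agent already created (its id is passed through the process environment),
# so no duplicate task is created.
task = Task.init(project_name="examples", task_name="my experiment")

if Task.running_locally():
    print("running locally: a fresh task was created on the server")
else:
    print("running via an agent: reusing the pre-created task")
```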
can I mount the s3 bucket as file system on place where
you need to mount it where the file server is storing its files, correct (notice: not the DBs, just the file server)
PreciousParrot26 I think this is really a matter of the CI process having very limited resources. Just to be clear, you are correct and the steps themselves are not executed inside the CI environment, but it seems that even running the pipeline logic is somehow "too much" for the limited resources... Makes sense?
Hi BrightGoat74
So merging general purpose plotly plots is very hard (i.e. putting both on the same graph)
But if you report using logger.report_scatter(...) the UI will merge the ROC curves into the same graph, wdyt?
https://clear.ml/docs/latest/docs/guides/reporting/scatter_hist_confusion_mat_reporting#2d-scatter-plots
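For example, a minimal sketch of reporting two ROC curves under the same plot title so the UI merges them (the values below are made up for illustration):

```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="roc comparison")
logger = task.get_logger()

fpr = np.linspace(0.0, 1.0, 50)
for name, tpr in [("model A", np.sqrt(fpr)), ("model B", fpr ** 0.25)]:
    # Series that share the same title end up on the same plot in the UI
    logger.report_scatter2d(
        title="ROC",
        series=name,
        iteration=0,
        scatter=np.stack([fpr, tpr], axis=1),
        xaxis="False Positive Rate",
        yaxis="True Positive Rate",
    )
```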
I'm getting: hydra_core == 1.1.1
What's the setup you have? python version, OS, Conda yes/no?
If this is the case, then we do not change the matplotlib backend
Also, I've attempted converting the mpl image to PIL and using report_image to push the image, to no avail.
What are you getting? An error / exception?
pipe.start_locally() will run the DAG compute part on the same machine, whereas pipe.start() will start it on a remote worker (if it is not already running on a remote worker)
basically "pipe.start()" executed via an agent will start the compute (no overhead)
does that help?
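A minimal sketch of the difference (project, task and queue names are assumptions):

```python
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="step_1",
    base_task_project="examples",
    base_task_name="step one",  # assumes this template task already exists
)

# Run the pipeline DAG logic on this machine (steps still go to their queues)
pipe.start_locally()

# ...or enqueue the pipeline logic itself, so an agent listening on the
# "services" queue runs it with no local overhead:
# pipe.start(queue="services")
```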
Well, I do not think you set your PyTorch Lightning to use cuda:
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
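Following the warning's own suggestion, a minimal sketch (the model and datamodule objects are assumed to exist in your code):

```python
from lightning.pytorch import Trainer

# Explicitly tell Lightning to use the GPU instead of silently falling back to CPU
trainer = Trainer(accelerator="gpu", devices=1, max_epochs=10)
# trainer.fit(model, datamodule=datamodule)  # hypothetical model / datamodule
```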
Dynamic GPU option only available with Enterprise version right?
Correct 🙂
I figured out the problem...
Nice!
Unfortunately, the hyperparameters in the configuration object seem to take precedence over the hyperparameters in the Hyperparameters section
Hmm what do you mean by that? How did you construct the code itself? (you should be able to "prioritize" one over the other)
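For reference, a minimal sketch of the two mechanisms being discussed (names and values are placeholders): values connected with task.connect() land in the Hyperparameters section, while connect_configuration() populates the Configuration objects section:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="hparam precedence")

# Goes to the Hyperparameters section; when an agent runs a clone, the values
# edited in the UI are fed back into this dict
params = {"lr": 0.001, "batch_size": 32}
params = task.connect(params)

# Goes to the Configuration objects section as a separate config blob
model_cfg = task.connect_configuration({"depth": 18, "width": 64}, name="model")
```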
LazyTurkey38, ohh I think you are correct 😞
it should be:
# patch the Task and actually send it for execution
if Task.running_locally():
    # this will verify all auto repo detection and python is done.
    task.close()
    # so that we can edit the task
    task.reset()
    # update the repo
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
wdyt?
Could it be these packages (i.e. numpy etc.) are not installed as system packages in the docker (i.e. inside a venv, inside the docker)?
What should have happened is the experiments should have been pending (i.e. in a queue)
(Not sure why they are not).
You can manually send them for execution: right click on an experiment in the table, select enqueue and select the default queue (this will be the one the trains-agent pulls from, by default)
No, TB (Tensorboard) is not enabled.
That explains it 🙂 did you manage to get it working ?
Hmm good point, it should probably return the clearml python version. Is this what you mean?
logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200)
Thanks GorgeousMole24
That is a very good point! Passing it along to the product guys
Correct, but do notice that (1) task names are not unique and you can change them after the Task was executed, and (2) when you clone the Task you can actually rename it; when an agent is running the Task, the init function is basically ignored, because the Task already exists. Makes sense?
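To make that concrete, a minimal sketch of cloning, renaming and enqueuing (project, task and queue names are assumptions):

```python
from clearml import Task

# Look up the original by name (remember: names are not unique identifiers)
original = Task.get_task(project_name="examples", task_name="my experiment")

# The clone gets its own id and can be given any name you like
cloned = Task.clone(source_task=original, name="my experiment - tuned copy")

# When an agent picks this up, the Task.init() inside the code is effectively
# a no-op because the Task already exists on the server
Task.enqueue(cloned, queue_name="default")
```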