Using this, is it possible to add to the requirements of a task with task_overrides?
Correct, but you will be replacing (not adding) requirements
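For example (a minimal sketch; the "script.requirements.pip" field path and all names are assumptions, and the override replaces the stored requirements wholesale):
from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="1.0")

# task_overrides replaces the stored field value, so provide the full
# requirements list, not a delta (field path is an assumption here)
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    task_overrides={"script.requirements.pip": "numpy==1.23.0\npandas>=1.4"},
)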
I think it was just pushed. For nested calls you have to use the new decorator argument, helper_functions
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L2392
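Roughly like this (a sketch; the function names are made up):
from clearml.automation.controller import PipelineDecorator

def normalize(x):
    # nested helper called from inside the component
    return x / 255.0

# helper_functions packs the nested helpers into the component's
# standalone script, so they exist when the step runs remotely
@PipelineDecorator.component(return_values=["result"], helper_functions=[normalize])
def preprocess(x):
    return normalize(x)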
In theory task.tags.remove(tag) might also work, but I'm not sure if it will automatically be updated on the backend
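A safer bet is probably going through the API, something like this sketch (the task ID and tag name are placeholders):
from clearml import Task

task = Task.get_task(task_id="<task-id>")
# get_tags/set_tags go through the API, so the backend is updated,
# unlike mutating task.tags in place
tags = [t for t in task.get_tags() if t != "my-tag"]
task.set_tags(tags)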
The downstream stages are rankN scripts; they are waiting for the IP address of the first stage.
Is this like a multi-node training, rather than a pipeline ?
TrickyRaccoon92 I didn't know that 🙂
Where did you try to add it? Did you report a plotly figure, or is it with report_???
DilapidatedDucks58 You might be able to; check the links, they might be embedded into the docker image, so you can map a different png file from the host 🙂
BTW: what would you change the icons to?
if I run my own ClearML self-hosted server?
Then you have everything on your end; it will not communicate with the SaaS offering, meaning no limits whatsoever.
(That said some of the cloud auto-scaling and compute features are not part of the open source)
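For example, you point your local clearml.conf at the self-hosted server (these are the default ports of the open-source server; adjust the host to yours):
api {
    # default ports of a self-hosted ClearML server
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
}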
what does it mean to run the steps locally?
start_locally: means the pipeline code itself (the logic that runs / controls the DAG) runs on the local machine (i.e. no agent), but this control logic creates/clones Tasks and enqueues them; for those Tasks you still need an agent to execute them
run_pipeline_steps_locally=True: means the Tasks the pipeline creates, instead of being enqueued and run by an agent, will be launched on the same local machine (think debugging, other...
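For example (a sketch, assuming the usual PipelineController flow and placeholder names):
from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="1.0")
# ... add steps here ...

# pipeline logic runs locally, step Tasks are enqueued for agents:
pipe.start_locally()

# or run the step Tasks on this machine too (debugging):
pipe.start_locally(run_pipeline_steps_locally=True)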
ZanyPig66 what do you mean by "git integration"? What would be the two ways of calling the function, where one works and the other does not?
Task.debug_simulate_remote_task simulates the Task being executed by the agent (basically the same behaviour, only local). The argument it gets is the Task ID (string).
The way to see how it works is to run the code once (no debug_simulate call), get the Task ID we created, then rerun with debug_simulate_remote_task passing the previous Task ID.
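Something like this sketch ("<previous-task-id>" is the ID printed by the first run; project/task names are placeholders):
from clearml import Task

# first run (no simulation): create the Task and note its ID
task = Task.init(project_name="examples", task_name="debug example")
print(task.id)

# second run: simulate the agent executing that exact Task
# (call it before Task.init)
Task.debug_simulate_remote_task(task_id="<previous-task-id>")
task = Task.init(project_name="examples", task_name="debug example")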
Make sense ?
Another issue might be that I'm on Ubuntu; some of the packages might have been for Windows, hence the different versions not existing
Usually this is not the case; the version numbers match (implementation-wise it might be a different file, but it is almost always a matching version)
And can you see your Prometheus in your Grafana?
Great!
BTW: you can take some inspiration from here:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
Or from the full pipeline:
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
No worries, I would love for us to come up with a nice solution 🙂
Hi FierceHamster54
This is already supported; unfortunately the open-source version only supports static allocation (i.e. you can spin up multiple agents and connect each one to a specific set of GPUs). The dynamic option (where a single agent allocates jobs across multiple GPUs / slices) is only part of the enterprise edition
(there is the hidden assumption there that if you spent so much on a DGX you are probably not a small team 🙂 )
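Static allocation looks roughly like this: one agent per GPU, each listening on its own queue (queue names are just examples):
# sketch: pin each agent to a specific GPU
clearml-agent daemon --detached --gpus 0 --queue gpu0
clearml-agent daemon --detached --gpus 1 --queue gpu1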
but is there any other way to get env vars / any value or secret from the host to the docker of a task?
If this is docker, the -e/--env argument would do the same: -e VAR=somevalue
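Or from code, a sketch (assuming a clearml version where set_base_docker accepts docker_arguments; image and names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="docker env example")
# append -e VAR=somevalue to the docker arguments used when the agent
# launches this task's container
task.set_base_docker(docker_image="python:3.9", docker_arguments="-e VAR=somevalue")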
this
from fastai.callbacks.tensorboard import LearnerTensorboardWriter
doesn't exist anymore in fastai2
Hmm we should definitely update the example to fastai2 API
maybe the fastai bindings in the clearml package are outdated
Are you getting any scalars reported to clearml?
they also appear to be relying on the tensorboard callback, which seems not to work with distributed training
Yes, that is correct; usually the way it works is that all nodes report back to the "master...
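A common pattern (just a sketch, assuming a torch-style RANK environment variable): only the master creates the Task, so everything is reported to a single place:
import os
from clearml import Task

# only rank 0 creates/reports to the Task
if int(os.environ.get("RANK", "0")) == 0:
    task = Task.init(project_name="examples", task_name="distributed training")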
Hi @<1523701079223570432:profile|ReassuredOwl55> let me try to add some color here:
Basically we have two parts: (1) the pipeline logic, i.e. the code that drives the DAG, and (2) the pipeline components, e.g. model verification
The pipeline logic (1), i.e. the code that creates the DAG, the Tasks, and enqueues them, will be running in the git actions context; this is the automation code. The pipeline components themselves (2), e.g. model verification, training, etc., are running using the clearml agents...
Can you tell me what the serving example is, in terms of the explanation above, and what the Triton serving engine is?
Great idea!
This line actually creates the control Task (2):
clearml-serving triton --project "serving" --name "serving example"
This line configures the control Task (the idea is that you can do that even when the control Task is already running, but in this case it is still in draft mode).
Notice the actual model serving configuration is already stored on the crea...
it certainly does not use tensorboard python lib
Hmm, yes I assume this is why the automagic is not working 🙂
Does it have a pythonic interface for the metrics?
Hmm, what does your preprocessing code look like?
What do you see in the console when you start the trains-agent? It should detect the CUDA version.
Out of interest, is there a reason these are read-only?
Yes, we should probably change that... they are designed to be pre-populated, but there should not be any reason you could not remove them
The code for these tasks is on github right?
Correct
Actually that is less interesting, as it is quite straight forward
We should probably change it so it is more human-readable 🙂
Once a model is saved and published, it should be downloadable, right?
Well, that depends on whether you configured ClearML to auto-upload it (by default it will just log the "local location").
To auto-upload, add output_uri=True to Task.init (or specify a destination with output_uri="s3://bucket/")
You can also configure it as default here:
https://github.com/allegroai/clearml/blob/65f1c0baa124efb05fb7894a5386f0dd52c0536b/docs/clearml.conf#L163
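For example (a sketch; the project/task names and the bucket are placeholders):
from clearml import Task

# auto-upload model snapshots to the default files server:
task = Task.init(project_name="examples", task_name="train", output_uri=True)

# or point them at your own bucket instead:
# task = Task.init(project_name="examples", task_name="train", output_uri="s3://bucket/")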
AttractiveCockroach17 I verified this is an issue with hyperparameters with "." or section names with "."; thank you for noticing!
I will make sure I pass it along; it should be part of the next version (ETA: one week) 🙂
Hi AstonishingSwan80, what do you mean by "ec2 API"?