can I mix steps with Task and Function?
Hmm, interesting question. In theory you should be able to; I have to admit I have not tried it yet, but it should work.
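Something along these lines should work (a quick sketch, untested; project/task names and paths are placeholders):
```python
from clearml import PipelineController

# hypothetical function step
def preprocess(raw_path):
    print("cleaning", raw_path)
    return raw_path + ".cleaned"

pipe = PipelineController(name="mixed-pipeline", project="examples", version="1.0.0")

# a function-based step
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    function_kwargs=dict(raw_path="/data/raw"),
    function_return=["cleaned_path"],
)

# a Task-based step, cloned from an existing template Task
pipe.add_step(
    name="train",
    parents=["preprocess"],
    base_task_project="examples",
    base_task_name="training template",
)

pipe.start()
```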
where are the trained models stored ...
MongoDB stores the URL links; the upload itself is controlled via the "output_uri" argument to the Task.
If None is provided, Trains logs the locally stored model (i.e. a link to wherever you stored it); if you provide one, Trains will automatically upload the model (into a new subfolder) and store the link to that subfolder.
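For example (the bucket path is a placeholder):
```python
from trains import Task

# with output_uri set, model snapshots are uploaded and the link is stored
task = Task.init(
    project_name="examples",
    task_name="training",
    output_uri="s3://my-bucket/models",  # hypothetical destination
)
```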
how can I enable TensorBoard and have the graphs stored in Trains?
Basically if you call Task.init, all your...
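A minimal sketch of the idea, assuming PyTorch's SummaryWriter:
```python
from trains import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tensorboard demo")

# Trains patches TensorBoard automatically, so these scalars
# also show up in the experiment's SCALARS section in the web UI
writer = SummaryWriter(log_dir="./runs")
for step in range(100):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```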
Hi FranticWhale40
You mean the download just fails on the remote serving node because it takes too long to download the model?
(basically not a serving issue per-se but a download issue)
AdventurousRabbit79 are you passing cache_executed_step=False to the PipelineController ?
https://github.com/allegroai/clearml/blob/332ceab3eadef4997e897d171957975a247a6dc1/clearml/automation/controller.py#L129
Could you send a usage example ?
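(For reference, this is roughly how I'd expect the flag to be passed, assuming the per-step cache_executed_step argument of add_step:)
```python
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="training template",
    cache_executed_step=False,  # force re-execution instead of reusing a cached step
)
```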
my pipeline controller always updates to the latest git commit id
This will only happen if the Task the pipeline creates has no specific commit ID, and instead just uses the latest from the git repo. Is this the case ?
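If you want to pin it, a sketch (this assumes the Task.set_script API; the task id and commit hash are placeholders):
```python
from clearml import Task

# clone the template and pin the commit so the pipeline step
# does not silently move to the latest one
task = Task.clone(source_task="<template_task_id>", name="pinned step")
task.set_script(commit="abc1234def5678")
```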
Hi GiddyTurkey39,
When you say trains agent, are you referring to the trains agent command ...
I mean running the trains-agent daemon on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in virtual environment, or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
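To spell out the daemon part (the queue name is a placeholder):
```
# run a worker that pulls jobs from the "default" queue and executes them in docker
trains-agent daemon --queue default --docker
```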
This points to the wrong cu117 / driver - could that be?
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically you should have the docker service log in before running the agent; the agent then uses docker to run the image from ECR.
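i.e. something along these lines before starting the agent (region and account id are placeholders):
```
# log docker into ECR so the agent can pull the image
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
```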
Make sense ?
Just to make sure, the first two steps are working ?
Maybe it has to do with the fact that the "training" step specifies a docker image, could you try to remove it and check?
BTW: A few pointers
The return_values argument is used to specify multiple returned objects stored individually, not the type of the object; if there is a single object, no need to specify it.
The parents argument is optional; the pipeline optimizes component execution based on their inputs, for example in your code, all pipeline comp...
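To illustrate the return_values point, a sketch (hypothetical step):
```python
from clearml.automation.controller import PipelineDecorator

# two returned objects -> two names in return_values;
# a single returned object would need no return_values at all
@PipelineDecorator.component(return_values=["train_set", "test_set"])
def split_data(dataset_path):
    train_set = dataset_path + "/train"
    test_set = dataset_path + "/test"
    return train_set, test_set
```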
so would that be "tags" "parents" ?
Shouldn't this be a real value and not a template
you mean the value being pulled to the pod that failed?
If you are using the latest RC:
```
pip install clearml==0.17.5rc5
```
you can pass True and it will use the "files_server" as configured in your clearml.conf
I used the http link as a filler to point to the files_server.
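e.g. (a minimal sketch):
```python
from clearml import Task

# passing True uses the files_server from clearml.conf as the upload destination
task = Task.init(project_name="examples", task_name="upload demo", output_uri=True)
```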
Make sense ?
Hi GiddyTurkey39
Are you referring to an already executed Task or the current running one?
(Also, what is the use case here? Is it because the "installed packages" are inaccurate?)
I cannot reproduce, tested with the same matplotlib version and python against the community server
FrothyShark37 what was different in your script ?
It is not possible to specify the full output destination, right?
Correct 🙂
then when we triggered an inference deploy it failed
How would you control it? Is it based on a Task? Like a property "match python version"?
I'm so glad you mentioned the cron job, it would have taken us hours to figure out
Hi RoughTiger69
unfortunately, the model was serialized with a different module structure - it was originally placed in a (root) module called model ...
Is this like a pickle issue?
Unfortunately, this doesn't work inside clear.ml since there is some mechanism that overrides the import mechanism using import_bind.__patched_import3
What error are you getting? (meaning why isn't it working)
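If it does turn out to be the module relocation, a generic workaround (not ClearML-specific; module and file names below are hypothetical) is to alias the old root module before unpickling:
```python
import pickle
import sys

import my_package.model as model_module  # hypothetical new location of the code

# the pickle references classes by their original path, e.g. "model.MyNet";
# aliasing the old root module name lets that lookup succeed again
sys.modules["model"] = model_module

with open("model.pkl", "rb") as f:  # hypothetical snapshot
    net = pickle.load(f)
```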
Oh right, I missed the fact that the helper functions are also decorated; yes, it makes sense we add the tags as well.
Regarding nested pipelines, I think my main question is: are they independent, or are we generating everything from the same code base?
If you spin two agents on the same GPU, they are not aware of one another ... So this is expected behavior ...
Make sense ?
you mean in the enterprise
Enterprise, with the smarter GPU scheduler. This is an inherent problem of sharing resources; there is no perfect solution. You either have fairness, but then you get idle GPUs, or you have races, where you can get starvation.
It's relatively new and it is great; from the usage aspect it is exactly like a user/pass, only the password is the PAT, which really makes life easier.
Hi PerplexedGoat65
it appears, in a practical sense, this means to mount the second drive, and then bind them in ClearML's configuration
Yes, the entire data folder (the reason is, if you lose it, you lose all the server storage / artifacts).
Also, thinking about Docker and slower access speed for Docker mounts and such,
If the host OS is linux, you have nothing to worry about, speed will be the same.
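e.g. a rough sketch (paths are placeholders):
```
# assume the second drive is already mounted at /mnt/data2
sudo rsync -a /opt/clearml/ /mnt/data2/clearml/
sudo mount --bind /mnt/data2/clearml /opt/clearml
```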