I'm not sure I'm the right person to answer that, but yes my understanding is that this is a Scale/Enterprise tier feature, at least for the time being.
Hi JuicyFox94
You pointed to exactly the issue 🙂
In your trains.conf
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L94
Hi SmarmyDolphin68
I see this in between my training epochs, what could be causing this?
This is basically saying we are saving a second model on the same Task, and even though both are logged, only the last is stored on the Task itself.
This will change, as in the next version a Task will be able to hold references to multiple models in the artifactory 🙂
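A minimal sketch of what that could look like, assuming a newer clearml version where the task exposes all logged models (the task ID is a placeholder):
from clearml import Task

task = Task.get_task(task_id="<your-task-id>")  # placeholder ID
# list every output model logged on this task, not just the last one
for model in task.models["output"]:
    print(model.name, model.url)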
Seems like everything is in order. Can you curl to the API/web/files server?
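For reference, a rough Python equivalent of that check, assuming a default local deployment (the ports 8080 / 8008 / 8081 and the debug.ping endpoint are my assumptions, adjust to your setup):
import requests

# ping the web, api and files servers and print the HTTP status codes
for name, url in [("web", "http://localhost:8080"),
                  ("api", "http://localhost:8008/debug.ping"),
                  ("files", "http://localhost:8081")]:
    print(name, requests.get(url, timeout=5).status_code)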
Would be cool to let it get untracked as well, especially if we want that as an option
How would you decide what should be tracked?
DistressedGoat23
We are running a hyperparameter tuning (using some cv) which might take a long time and might be even aborted unexpectedly due to machine resources.
We therefore want to see the progress
On the HPO Task itself (not the individual experiments, but the one controlling them all) there is the global progress of the optimization metric. Is this what you are looking for? Am I missing something?
I reached over 1M API calls in about one week using clearml-serving
Oh, that makes sense now 🙂
If I remember correctly, adding an additional model to a single clearml-serving instance should not actually change the number of API calls; they are mostly affected by the number of clearml-serving instances / containers and not by the number of models.
We are planning an RC later this week, I'll make sure this fix is part of it
(just using local server not connected to Internet), am I right?
You can if you host your own git server. Or, if your code is a single file / jupyter notebook, then the entire code is stored on the Task.
btw: what is the exact setup, how come there is no git repo?
mostly out of curiosity, what is the motivation behind introducing this as an environment variable knob rather than a flag with some default in Task.init?
DepressedChimpanzee34 we will deprecate the demo server (not exactly sure when) as we have the free community one that gives better service and stores the data. It was originally set for easy on-boarding and testing, but I think that now the user experience might be better with using the community free tier.
Make sense? btw: what ...
Hi FrothyShark37
is the task scheduler only accessible through the SDK?
Yes, in the open source version this is strictly code based. I know the enterprise tier has a UI for it, but in terms of features I believe it is equivalent.
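For example, a minimal code-based scheduler could look roughly like this (the task ID, queue name and schedule are placeholders):
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<task-id-to-clone-and-run>",  # placeholder
    queue="default",
    hour=3, minute=0,  # run daily at 03:00
)
scheduler.start()  # blocks; start_remotely() would run it on an agent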
Sure thing, I'll fix the "create_draft" docstring to suggest it
Dynamic GPU option only available with Enterprise version right?
Correct 🙂
should I only do mongodb?
No, you should do all 3 DBs: ELK, Mongo, Redis
ItchyJellyfish73
Unfortunately this needs backend support, and it is only available in the enterprise version. What is your use case for it? (It was designed to allow out-of-the-box bare-metal multi-GPU dynamic allocation; think a DGX with 8 GPUs where, instead of spinning down agents when you want to change the queue->num-gpu mapping, you can do it on the fly.)
Hmm, let me check, there is a chance the level is dropped when manually reporting (it might be saved for internal critical reports). Regardless, I can't see any reason we could not allow controlling it.
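For reference, by "manually reporting" I mean something like this explicit text report with a level (a sketch; the message is just an example):
import logging
from clearml import Task

logger = Task.current_task().get_logger()
logger.report_text("some debug details", level=logging.DEBUG)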
because it should have detected it...
Did you see "Repository and package analysis timed out ..."
Hi WittyOwl57
I think what happens is it auto-logs the joblib load/save calls; these calls track models used/created by the code, and attach them to the model repository entries representing these models.
I'm assuming there are multiple load/save calls, and there are multiple model instances pointing to the same local file "file:///tmp/..." . The warning basically says it is re-registering existing models.
Make sense?
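If the repeated re-registration is just noise for you, a possible workaround (assuming your clearml version supports the joblib key) is to disable joblib auto-logging at init:
from clearml import Task

task = Task.init(
    project_name="examples",      # placeholder names
    task_name="my-experiment",
    auto_connect_frameworks={"joblib": False},  # other frameworks stay on
)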
I can install clearml and clearml-agent and run the worker inside a docker
Oh I see, you should install it inside a docker, then mount the docker socket so it can spin sibling containers, and lastly make sure the mounts are correct with this env variable:
None
Hi @<1692345677285167104:profile|ThoughtfulKitten41>
Is it possible to trigger a pipeline run via API?
Yes! A pipeline is, in the end, a Task; you can take the pipeline ID, clone it, and enqueue it:
from clearml import Task

pipeline_task = Task.clone(source_task="pipeline_id_here")
Task.enqueue(pipeline_task, queue_name="services")
You can also monitor the pipeline with the same Task interface.
wdyt?
Hi GreasyPenguin14
Quick question, any reason not to use a 2D scatter ? or a histogram (or any other non time-series plot)?
HighOtter69
By default, if you are continuing an experiment it will start from the last iteration of the previous run. You can reset it with: task.set_initial_iteration(0)
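Putting it together, a rough sketch (using continue_last_task to resume the run is my assumption about your setup):
from clearml import Task

task = Task.init(project_name="examples", task_name="my-experiment",
                 continue_last_task=True)  # resume the previous run
task.set_initial_iteration(0)  # start reporting from iteration 0 again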
Sure thing, let me know ... 🙂
when you clone the Task, it might be before it is done syncing git / packages.
Also, since you are using 0.16 you have to have a section name (e.g. Args or General).
How will Task B use the parameters? (argparse / connect dict?)
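If it's the connect-dict route, a minimal sketch (parameter names are placeholders) would be:
from clearml import Task

task = Task.init(project_name="examples", task_name="task_b")
params = {"learning_rate": 0.001, "batch_size": 32}
params = task.connect(params)  # appears under the General section in the UI
print(params["learning_rate"])  # value may be overridden when run by an agent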