They could, but the problem is that by the time you set them, they have already been read into the variables.
Maybe we should make it lazy-loaded; that would also speed up the import.
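Roughly what I mean by lazy loading (just a generic illustration, the names here are made up):
import os

class LazySettings:
    # values are read only when first accessed, not at import time
    def __init__(self):
        self._cache = {}

    def get(self, key, default=None):
        # anything set in os.environ after import is still picked up,
        # because nothing is read until this call
        if key not in self._cache:
            self._cache[key] = os.environ.get(key, default)
        return self._cache[key]

settings = LazySettings()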
Now, will these 10 experiments have different names? How will I know they are part of the 'mnist1' HPO case?
Yes (they will have the specific HP name/value combination).
FYI names are not unique so in theory you could have multiple experiments with the same name.
If you look under the Configuration tab, you will find all the configuration arguments for the experiment. You can also add specific arguments to the experiment table (click the cogwheel at the top right corner, and select...
So net-net, does this mean it's behaving as expected?
It is as expected.
If no "Installed Packages" are listed, then it cannot pull a cached venv (because requirements.txt is not a full env, and it never analyzed it)).
It does however create a venv cache based on it (after installing it)
The Clone of this Task (i.e. right click on the UI clone experiment, enqueue it, Will use the cached copy becuase the full packages are listed in the "Installed Packages" section of the Task.
Make sense?
You can run md5 on the file as stored in the remote storage (NFS or S3).
S3 is implementation specific (i.e. MinIO, Weka, Wasabi etc. might not support it), and I'm actually not sure regarding NFS. I mean you can run it, but it means you are actually reading the data; that said, NFS by definition is, I'm assuming, relatively fast access.
wdyt?
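For reference, a rough sketch of doing the checksum yourself (nothing ClearML-specific; the path, bucket and key below are placeholders):
import hashlib

def md5_of_path(path, chunk_size=8 * 1024 * 1024):
    # works for an NFS mount, where the file is visible as a regular path;
    # note this reads the whole file, i.e. you pay for the data access
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def md5_of_s3_object(bucket, key):
    # for S3-compatible storage; this downloads the object content in order to hash it
    import boto3
    digest = hashlib.md5()
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    for chunk in body.iter_chunks(chunk_size=8 * 1024 * 1024):
        digest.update(chunk)
    return digest.hexdigest()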
I think this is the discussion you are after:
https://clearml.slack.com/archives/C01H5VAUZ8R/p1612452197004900?thread_ts=1612273112.002400&cid=C01H5VAUZ8R
SmugOx94 Yes, we just introduced it 🙂 with 0.16.3
Discussion was here (I'll make sure to update the issue that the version is out)
https://github.com/allegroai/trains/issues/222
In your trains.conf
add the following line:
sdk.development.store_code_diff_from_remote = true
It will store the diff from the remote HEAD instead of the local one.
Basically the idea is that you create the pipeline once (say in debug), then once you see it is running, you have a Task of your pipeline in the system (with any custom logic you added). With a Task in the system you can always clone/modify and launch externally (i.e. from code/UI). Make sense?
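For example, launching from code would look roughly like this (the task ID, parameter name and queue name are placeholders):
from clearml import Task

template = Task.get_task(task_id="PIPELINE_TASK_ID")             # the pipeline Task you created once
cloned = Task.clone(source_task=template, name="pipeline re-run") # make a fresh copy
cloned.set_parameter("Args/epochs", 20)                           # hypothetical parameter override
Task.enqueue(cloned, queue_name="default")                        # an agent listening on the queue picks it up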
I think it should be treated as failed.
I'm not sure where I stand on the default behavior, but it could easily be an argument for the pipeline controller.
JitteryCoyote63 I think this only holds for the conda distribution.
(Actually quite interesting, I wonder what happens if you already installed cudatoolkit...)
Once you have a bit of experience with it, please suggest a path forward; it would be great to integrate.
Hmm, I would recommend passing it as an artifact, or returning its value from the decorated pipeline function. Wdyt?
[Assuming the above is what you are seeing]
What I "think" is happening is that the Pipeline creates it's own Task. When the pipeline completes, it closes it's own Task, basically making any later calls to Tasl.current_task() return None, because there is no active Task. I think this is the reason that when you are calling process_results(...) you end up with None.
For a quick fix, you can do:
pipeline = Pipeline(...)
MedianPredictionCollector.process_results(pipeline._task)
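And regarding the artifact / return-value suggestion, something along these lines is what I mean (a rough sketch; the names and the 1.0 * trick to resolve the step result are just for illustration):
from clearml import Task, PipelineDecorator

@PipelineDecorator.component(return_values=["score"])
def train_step():
    return 0.9   # stand-in for the real computation

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="1.0")
def my_pipeline():
    score = 1.0 * train_step()   # using the result waits for the step and resolves the value
    # option 1: store it as an artifact on the pipeline Task (still the active Task here)
    Task.current_task().upload_artifact(name="score", artifact_object=score)
    # option 2: return it from the decorated pipeline function
    return score

if __name__ == "__main__":
    PipelineDecorator.run_locally()   # debug locally; drop this to run on agents
    print(my_pipeline())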
Maybe we should...
Glad to hear that! 🙂
Suppose a new model version 2 is trained but it does not fulfill our target metrics. Is it possible to just save the model to the model repo and not serve it, if model version 1 is already being served?
Sure, just do not "publish" the model. It will be stored in the model repository, fully accessible, but clearml-serving will not serve it 🙂
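As a rough sketch of that flow (the metric check and the weights file name here are placeholders):
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="train model v2")
output_model = OutputModel(task=task, name="my-model")
output_model.update_weights(weights_filename="model_v2.pt")   # stored in the model repository

passed_target_metrics = False   # replace with your own metric check
if passed_target_metrics:
    output_model.publish()   # only a published model is picked up for serving
# if not published, the model still sits in the repo, fully accessible,
# and the already-served version 1 keeps running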
Hi MassiveBat21
CLEARML_AGENT_GIT_USER is actually a git personal token.
The easiest is to have a read-only user/token for all the projects.
Another option is to use the ClearML vault (unfortunately not part of the open source) to automatically apply these configurations on a per-user basis.
wdyt?
Does clearml resolve the CUDA Version from driver or conda?
Actually it starts with the default CUDA based on the host driver, but when it installs the conda env it takes it from the "installed packages" (i.e. the one you used to execute the code in the first place)
Regarding the link, I could not find the exact version but this is close enough I guess:
None
Hi @<1600299043865497600:profile|MagnificentSeaurchin90>
Any chance you can provide more info on the error?
If I want to compare two experiments, the scalar plots do not load (loading forever).
I'm assuming the issue is the Plots tab? Or is it the Scalars? What do you have in the Plots? Can you send an image of the single experiment?
Maybe the only thing to worry about is making sure the IP address is stable, so if k8s replaces the node, you do not have to reconfigure the clients 🙂
I think this is the only mount you need:
Data persisted in every Kubernetes volume by ClearML will be accessible in /tmp/clearml-kind folder on the host.
SuccessfulKoala55 is this correct ?
Only those components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
Actually the way it works currently (and we might change it if there is a better way), every time you call PipelineDecorator.component
a new component is stored on the Pipeline Task, which is later translated into the DAG graph and table (the next version will have a very nice UI to display / edit them).
The idea is first to have a representation of the p...
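For example, a minimal decorated pipeline would look roughly like this (project and step names are made up):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def load_data():
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["total"])
def process(data):
    return sum(data)

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="1.0")
def demo_pipeline():
    # every decorated component that is actually called here becomes a step
    # on the pipeline Task, which is what the DAG plot / table is built from
    data = load_data()
    total = process(data)
    return total

if __name__ == "__main__":
    PipelineDecorator.run_locally()   # run the whole DAG in the local process for debugging
    demo_pipeline()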
The quickest workaround would be, in your final code, just do something like:
my_params_for_hpo = {'key': omegaconf.key}
task.connect(my_params_for_hpo, name='hpo_params')
call_training_with_value(my_params_for_hpo['key'])
This will initialize my_params_for_hpo
with the values from OmegaConf, and allow you to override them in the hyperparameter section (task.connect is two-way: in manual mode it stores the data on the Task, in agent mode it takes the values from the Task and puts them ba...
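A slightly fuller sketch of the same workaround (the config values and the training stub are made up):
from clearml import Task
from omegaconf import OmegaConf

def train(learning_rate):
    # placeholder for the actual training entry point
    print("training with lr =", learning_rate)

task = Task.init(project_name="examples", task_name="hpo-ready training")
cfg = OmegaConf.create({"learning_rate": 0.001})   # stand-in for your loaded OmegaConf/Hydra config

# expose just the values you want HPO to control
my_params_for_hpo = {"learning_rate": cfg.learning_rate}
task.connect(my_params_for_hpo, name="hpo_params")   # manual run: stores them; agent run: overrides them

train(my_params_for_hpo["learning_rate"])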
Well done man!
Hi @<1610808279263350784:profile|FriendlyShrimp96>
Is there a way to get a list of variants given a metric, or even just a full list of metrics and variants for a given task id?
Try this
None
from clearml.backend_api.session.client import APIClient
c = APIClient()
metrics = c.events.get_task_metrics(tasks=["TASK_ID_HERE"], event_type="training_debug_image")
print(metrics)
I think API ...
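Another option that might help (a sketch; it assumes Task.get_reported_scalars, which returns a nested dict of metrics and their variants):
from clearml import Task

task = Task.get_task(task_id="TASK_ID_HERE")
scalars = task.get_reported_scalars()   # {metric: {variant: {"x": [...], "y": [...]}}}
for metric, variants in scalars.items():
    print(metric, "->", list(variants.keys()))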