OddShrimp85 you can see the full configuration at the top of the Task log. What do you have there? Also, what is the clearml Python version?
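A quick way to check the installed version (a minimal sketch, nothing thread-specific assumed):
```
import clearml

# Prints the installed clearml package version
print(clearml.__version__)
```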
Hi SmallDeer34
The clearml-agent has its own clearml.conf file; there you should put the S3 credentials and they will be passed to any Task the agent executes:
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L234
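For reference, the S3 credentials section of that clearml.conf looks roughly like this (a sketch based on the linked template; the key/secret/region values are placeholders):
```
sdk {
    aws {
        s3 {
            # Default credentials, used for any S3 bucket (placeholders)
            key: "my-access-key"
            secret: "my-secret-key"
            region: "us-east-1"
        }
    }
}
```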
Then in theory (since the backend is Python-based) you just need to find a base Docker image to build it on.
I think the reason is that the "original" task is already the right type. I'll make sure we fix it, and always set the system tag
it overwrites the previous run?
It will overwrite the previous run if:
- it is under 72h from the last execution, and
- no artifact/model was created.
You can control it with `reuse_last_task_id=False` passed to `Task.init` (see the sketch below).
The Task name itself is not unique in the system; think of it as a short description.
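For example (a minimal sketch; the project/task names are placeholders):
```
from clearml import Task

# Always create a new task instead of overwriting the previous run
task = Task.init(
    project_name="examples",
    task_name="my-experiment",
    reuse_last_task_id=False,
)
```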
Make sense?
Hi @<1544853695869489152:profile|NonchalantOx99>
I would assume the clearml-server configuration / access key is misconfigured in your copy of example.env
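For context, the relevant entries usually look like this (a sketch assuming the standard ClearML variable names; the hosts and keys are placeholders, not real values):
```
CLEARML_WEB_HOST=https://app.clear.ml
CLEARML_API_HOST=https://api.clear.ml
CLEARML_FILES_HOST=https://files.clear.ml
CLEARML_API_ACCESS_KEY=<your-access-key>
CLEARML_API_SECRET_KEY=<your-secret-key>
```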
Nesting in the UI is not possible, I think?
Yes, but the next version will have nested projects, that's something 🙂
I mean that it is possible to start the subtask while the main task is still active.
You cannot call another Task.init while a main one is running.
But you can call Task.create and log into it; that said, auto-logging is not supported on the newly created Task.
Maybe the easiest solution is just to do the "sub-tasks" and close them. That means the main Task i...
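Something along these lines (a minimal sketch; the names/values are placeholders, and remember nothing is auto-logged on the created Task):
```
from clearml import Task

# The "main" task, created as usual
main_task = Task.init(project_name="examples", task_name="main")

# A "sub-task" created manually; only explicit reporting works here
sub_task = Task.create(project_name="examples", task_name="sub-step")
sub_task.mark_started()
sub_task.get_logger().report_scalar("loss", "train", value=0.1, iteration=0)
sub_task.mark_completed()
```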
The first pipeline step is calling init
GiddyPeacock64 Is this enough to track all the steps?
I guess my main question is: is every step in the pipeline an actual Task/Job, or is it a single small function?
Kubeflow is great for simple DAGs, but when you need to build more complex logic it is usually a bit limited
(for example, the visibility into what's going on inside each step is missing, so you cannot make a decision based on that).
WDYT?
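For reference, with ClearML every pipeline step can be a full Task, e.g. via the decorator interface (a minimal sketch; names and values are placeholders):
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["result"])
def step_one(x):
    # Runs as its own Task, so logs/metrics/artifacts of each step are visible
    return x + 1

@PipelineDecorator.pipeline(name="demo-pipeline", project="examples", version="1.0")
def pipeline_logic():
    print(step_one(41))

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally; drop this to run via agents
    pipeline_logic()
```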
Hi CheerfulGorilla72
see
Notice all posts on that channel are @ channel 🙂
Regarding the limit interface, let me check; I think this is being worked on (i.e. a nice interface that should be pushed in the next few days). Let me get back to you on this one.
How will imposing an instance limit prevent or allow the --order-fairness feature, for example, which exists when running the clearml-agent version compared to the k8s_glue_example version?
A bit of background on how the glue works:
It pulls jobs from the clearml queue, then it prepares a k8s job and launches it...
but can it NOT use /tmp for this? I'm merging about 100GB
You mean to configure your temp folder for when squashing?
You can hack the following:
```
import tempfile

# Redirect Python's default temp folder before the squash
tempfile.tempdir = "/my/new/temp"

# ... run the Dataset squash here ...

# Restore the default behavior afterwards
tempfile.tempdir = None
```
But regardless, I think this is worth a GitHub issue with a feature request, to set the temp folder.
Generally speaking, the agent will convert the repo URL to the auth scheme it is configured with: ssh->http if using user/pass, and http->ssh if using SSH.
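The relevant clearml.conf settings on the agent side look roughly like this (a sketch; the user/password values are placeholders):
```
agent {
    # Git credentials: with these set, the agent converts ssh:// links to https://
    git_user: "my-git-user"
    git_pass: "my-git-password"

    # Or force the other direction: convert https:// links to ssh://
    force_git_ssh_protocol: true
}
```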
Notice the configuration parameters:
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/services/monitoring/slack_alerts.py#L160
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/services/monitoring/slack_alerts.py#L162
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/services/monitoring/slack_alerts.py#L156
Curious what advantage it would be to use the StorageManager

Basically, if you set the clearml cache folder to the EFS, users can always do:
```
from clearml import StorageManager

# (the remote URL from the original message was elided)
local_file = StorageManager.get_local_copy("...")
```
where local_file is stored on the persistent cache (EFS), and the cache is automatically cleaned based on the last accessed file.
but DS, in order for models to be uploaded, you still have to set output_uri=True in the Task.init()
No, if you set the default_output_uri, there is no need to pass output_uri=True in the Task.init() 🙂
It is basically setting it for you, make sense?
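Side by side (a minimal sketch; project/task names are placeholders, and the conf key follows the standard clearml.conf layout):
```
from clearml import Task

# Option 1: explicit, per-task upload destination
task = Task.init(project_name="examples", task_name="train", output_uri=True)

# Option 2: set it once in clearml.conf, e.g.
#   sdk.development.default_output_uri: "s3://my-bucket/models"
# and then a plain Task.init() uploads models with no output_uri argument
```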
Or is this a feature of hyperdatasets and I just mixed them up?
Ohh yes, this is it. Hyper Datasets are part of the UI (i.e. there is a Tab with the HyperDataset query); Dataset usage is currently listed on the Task. Make sense?
named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
This is deprecated... it was a test to use a package that can update pip venvs, but it was never stable; we will remove it in the next version.
Yes, I guess. Since pipelines are designed to be executed remotely, it may be pointless to enable an output_uri parameter in the PipelineDecorator.componen...
Hi SmoggyGoat53
What do you mean by "feature store" ? (These days the definition is quite broad, hence my question)
Can you post the actual line here? Seems like we can fix it to also support this scenario (if we could test it).
SubstantialElk6
Regarding cloning the executed Task:
In the pip requirements syntax, "@" is a hint that tells pip where to find the package if it is not preinstalled.
Usually when you find the @ /tmp/folder, it means the package was preinstalled (usually preinstalled in the docker image).
What is the exact scenario that caused it to appear? (This was always the case, before v1 as well.)
For example, the zipp package is installed from PyPI by default and not from a local temp file.
Your fix b...
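To illustrate the "@" direct-reference syntax in pip requirements (a hypothetical example; the local path is a placeholder):
```
# Regular PyPI requirement: pip resolves and downloads it
zipp==3.8.0

# Direct reference: "@" tells pip exactly where to find the package
zipp @ file:///tmp/build/zipp-3.8.0-py3-none-any.whl
```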
UnevenOstrich23
but interesting that auto-reload config does not work as I expected.
Unfortunately the trains-agent does not support auto-reloading the config file yet. If you think this would be a great feature, please feel free to open a GitHub feature request issue 🙂
EnviousPanda91 'connect' will log the object properties; the automagic logging is controlled in the Task.init call. Specifically, which framework produces metrics that are not logged? Your sample code manually reports some scalars/values, do you see these as well?
481.2130692792125 seconds
This is very slow.
It makes no sense, it cannot be the network (this is basically an HTTP POST, and I'm assuming both machines are on the same LAN, correct?)
My guess is the filesystem on the clearml-server... Are you having any other performance issues?
(I'm thinking HD degradation, which could lead to slow write speeds, which would affect Elastic/Mongo as well)
Could it be that clone has to be False? (I assume the reasoning is the cloning feature)
BTW: trains-agent is leaner and does not need plotly. And you can use the APIClient to basically query the entire system, would that be a better solution? See https://github.com/allegroai/trains-agent/blob/master/examples/archive_experiments.py
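A minimal APIClient sketch (assuming the trains-era import path; the filter is a placeholder, see the linked script for a fuller example):
```
from trains.backend_api.session.client import APIClient

client = APIClient()

# Query all archived tasks in the system
tasks = client.tasks.get_all(system_tags=["archived"])
for t in tasks:
    print(t.id, t.name, t.status)
```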
Hi SteadyFox10, this one will get all the last metric scalars:
```
train_logger.get_last_scalar_metrics()
```
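For context, a minimal sketch assuming the method is available on the Task object (as in recent clearml versions; the task id is a placeholder):
```
from clearml import Task

task = Task.get_task(task_id="<task-id>")
# Nested dict: {title: {series: {"last": ..., "min": ..., "max": ...}}}
print(task.get_last_scalar_metrics())
```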
Hi ElegantKangaroo44 ,
This is basically the average number of experiments running, the number of projects, and the number of users. I think that's about it, nothing like Google Analytics stuff. It is mainly aimed at giving some idea of how large the usage is. Sounds reasonable?