It all depends on how we store the metadata on the performance. You could actually retrieve it from, say, the val metric and deduce the epoch based on that
I'd prefer to use config_dict, I think it's cleaner
I'm definitely with you
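A minimal sketch of the "deduce the epoch from the val metric" idea, assuming the metric is a per-epoch validation loss where lower is better. The list `val_loss` and its values are made up for illustration and are not taken from any real run.

```python
# Hypothetical per-epoch validation losses, as they might be read back
# from wherever the metric was stored.
val_loss = [0.92, 0.71, 0.64, 0.66, 0.70]

# The best epoch is the index of the smallest loss
# (epochs are assumed to be numbered from 0).
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
best_value = val_loss[best_epoch]

print(best_epoch, best_value)
```

For a metric where higher is better (accuracy, for example), `max` would replace `min`.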
Good news:
new best_model is saved, add a tag best,
Already supported (you just can't see the tag, but it is there :))
My question is, what do you think would be the easiest interface to tell (post/pre) store, tag/mark this model as best so far (btw, obviously if we know it's not good, why do we bother to store it in the first place...)
JitteryCoyote63 okay... but let me explain a bit so you get a better intuition for next time 🙂
The Task.init call, when running remotely, assumes the Task object already exists in the backend, so it ignores whatever was in the code and uses the data stored on the trains-server, similar to what's happening with Task.connect and the argparser.
This gives you the option of adding/changing the "output_uri" for any Task regardless of the code. In the Execution tab, change the "Output Destina...
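To make the precedence described above concrete, here is a toy model of it in plain Python. This is not the trains/clearml implementation; `effective_output_uri` and its arguments are hypothetical names used only to illustrate "remote runs use the value stored on the server, local runs use the value in the code".

```python
def effective_output_uri(code_value, backend_value, running_remotely):
    """Toy model of which output_uri wins (hypothetical helper)."""
    if running_remotely:
        # Remote run: the Task already exists in the backend, so the
        # value passed in the code is ignored.
        return backend_value
    # Local run: the value passed in the code is used.
    return code_value


# Code says one bucket, the UI was edited to point at another (made-up URIs).
print(effective_output_uri("s3://code-bucket", "s3://ui-bucket", running_remotely=True))
print(effective_output_uri("s3://code-bucket", "s3://ui-bucket", running_remotely=False))
```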
JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think it's on a branch already by SuccessfulKoala55 ) to fix the bug ElegantKangaroo44 found. should be RC next week
JitteryCoyote63
I agree that its name is not search-engine friendly,
LOL 😄
It was an internal joke the guys decided to call it "trains" cause you know it trains...
It was unstoppable, we should probably do a line of merchandise with AI 🚆 😉
Anyhow, this one definitely backfired...
UpsetBlackbird87 pipeline.start()
will launch the pipeline itself on a remote machine (a machine running the services agent).
This is why your pipeline is "stuck": it is not actually running.
When you call start_locally() the pipeline logic itself is running on your machine and the nodes are running on the workers.
Makes sense ?
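A rough sketch of the two launch modes, assuming a recent clearml SDK where `PipelineController` exposes both `start()` and `start_locally()`. The project, pipeline, step and task names are placeholders, and the queue name "services" is an assumption about how the services agent is set up.

```python
def run_pipeline(remote=True):
    """Launch a one-step pipeline either remotely or with local logic."""
    # Imported lazily so this sketch can be imported without clearml installed.
    from clearml import PipelineController

    pipe = PipelineController(
        name="example pipeline",   # placeholder name
        project="examples",        # placeholder project
        version="1.0.0",
    )
    pipe.add_step(
        name="stage_train",
        base_task_project="examples",   # placeholder: project of an existing Task
        base_task_name="train model",   # placeholder: name of an existing Task
    )

    if remote:
        # The pipeline logic is enqueued and waits for an agent on this queue;
        # with no agent listening, it will look "stuck".
        pipe.start(queue="services")
    else:
        # The pipeline logic runs in this process; the steps are still
        # enqueued for the workers.
        pipe.start_locally(run_pipeline_steps_locally=False)
```

Calling `run_pipeline(remote=False)` is the quicker way to check the pipeline logic when no services agent is available.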
docstring ?
Usually the preferred way is StorageManager
https://clear.ml/docs/latest/docs/references/sdk/storage
https://clear.ml/docs/latest/docs/integrations/storage
Hi NuttyCamel41
. I do that because I do not know how to get the pickle file into the docker container
What would the pickle file do?
and load the MinMaxScaler within the script, as the sklearn dependency is missing
what do you mean by that? are you getting an error when loading your model ?
My current experience is there is only print out in the console but no training graph
Yes Nvidia TLT needs to actually use tensorboard for clearml to catch it and display it.
I think that in the latest version they added that. TimelyPenguin76 might know more
DisturbedWorm66 it does, I think there is an example here:
https://github.com/allegroai/nvidia-clearml-integration/tree/main/tlt
Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/attrs_1604765588209/work'
Seems like pip failed creating a folder
Could it be you are out of space ?
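A quick way to check the "out of space" theory from Python, using only the standard library. It inspects the system temp directory, which is an assumption about where pip was building; the helper name `free_space_gib` is made up for this sketch.

```python
import shutil
import tempfile


def free_space_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


tmp_dir = tempfile.gettempdir()
print(f"{tmp_dir}: {free_space_gib(tmp_dir):.1f} GiB free")
```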
UnevenDolphin73 if the repo does not include a poetry file it will revert to pip
poetry stores git related data in ... you get an internal package we have with its version, but no git reference, i.e. internal_module==1.2.3 instead of internal_module @H4dr1en
This seems like a bug with poetry (and I think I have run into this one), worth reporting it, no?
Local changes are applied before installing requirements, right?
correct
LOL EnormousWorm79 you should have a "do not show again" option, no?
EnormousWorm79 you mean to get the DAG graph of the Dataset (like you see in the plots section)?
Hi UpsetBlackbird87
This is an Optuna decision on how many concurrent tests to run simultaneously.
You limited it to 100, but remember Optuna does a Bayesian optimization process, where it decides on the best set of arguments based on the performance of the previous set, this means it will first try X trials, then decide on the next batch.
That said, you can add a pruner to Optuna specifying how it should start
https://optuna.readthedocs.io/en/v1.4.0/reference/pruners.html#optuna.pruners.Median...
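For intuition only, here is a plain-Python sketch of a median-based pruning rule with a startup phase. It is a generic illustration, not Optuna's code; `should_prune`, `n_startup_trials` as used here, and the sample values are assumptions made for the example.

```python
from statistics import median


def should_prune(current_value, finished_values_at_step, n_startup_trials=5):
    """Return True if a running trial should be stopped early (lower is better)."""
    # Startup phase: too little history to judge, so never prune.
    if len(finished_values_at_step) < n_startup_trials:
        return False
    # Prune when the trial is worse than the median of earlier trials
    # at the same step.
    return current_value > median(finished_values_at_step)


# Made-up intermediate losses of earlier trials at the same step.
history = [0.9, 0.7, 0.8, 0.6, 0.75]

print(should_prune(0.95, history))      # worse than the median
print(should_prune(0.65, history))      # better than the median
print(should_prune(0.95, history[:3]))  # still in the startup phase
```

The startup count is the knob that controls "how it should start": until that many trials have finished, nothing is pruned.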
Hi ThickCrow29
clearml.automation.auto_scaler.AutoScaler which runs smoothly (kudos!!).
NICE!
The only thing I am missing is the in the clearml dashboard/orchestration --> Is there a way to make it
hmm kind of needs backend support for that 😞
For now, I can just see the log of the ClearML task to monitor what’s happening
Or is this restricted to pro users?
Yeah the GCP and AWS autoscalers dashboards are paid tier feature. But...
My bad, there is a mixture in terms.
"configuration object" is just a dictionary (or plain text) stored on the Task itself.
It has no file representation (well, you could get it dumped to a file, but it is actually stored as a blob of text on the Task itself, at the backend side)
clearml-agent repo please 🙂
UnevenDolphin73
we'd like the remote task to be able to spawn new tasks,
Why is this an issue? this should work out of the box ?
Hi RoughTiger69
unfortunately, the model was serialized with a different module structure - it was originally placed in a (root) module called model ....
Is this like a pickle issue?
Unfortunately, this doesn’t work inside clear.ml since there is some mechanism that overrides the import mechanism using import_bind.__patched_import3
What error are you getting? (meaning why isn't it working)
What's the python, torch, clearml version?
Any chance this can be reproducible ?
What's the full error trace/stack you are getting?
Can you try to debug it to where exactly it fails here?
https://github.com/allegroai/clearml/blob/86586fbf35d6bdfbf96b6ee3e0068eac3e6c0979/clearml/binding/import_bind.py#L48
RoughTiger69 wdyt?
it is a pickle issue
‘package model doesn’t exist’
Sounds like it, why do you think clearml has anything there?
BTW: import_bind.__patched_import3 is just there so that packages clearml autoconnects with are patched even if they are imported after Task.init was called.
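If it is indeed a pickle issue, the standard-library sketch below reproduces it and shows one possible workaround: pickle stores the module path of a class, so loading fails once that module no longer exists, and a custom `Unpickler.find_class` can remap the old name. The module names `old_root_model` and `relocated_model` and the class `Net` are hypothetical stand-ins for the real layout.

```python
import io
import pickle
import sys
import types

OLD_NAME = "old_root_model"   # stand-in for the original root module
NEW_NAME = "relocated_model"  # stand-in for where the class lives now

# Create a throwaway module with a class, mimicking the original layout.
old_module = types.ModuleType(OLD_NAME)
exec(
    "class Net:\n"
    "    def __init__(self, width):\n"
    "        self.width = width\n",
    old_module.__dict__,
)
sys.modules[OLD_NAME] = old_module
payload = pickle.dumps(old_module.Net(8))

# Simulate the refactor: the module is now only reachable under a new name.
sys.modules[NEW_NAME] = sys.modules.pop(OLD_NAME)

# A plain load fails because the pickle still points at the old module.
try:
    pickle.loads(payload)
    plain_load_failed = False
except ModuleNotFoundError:
    plain_load_failed = True


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects lookups from the old module to the new one."""

    def find_class(self, module, name):
        if module == OLD_NAME:
            module = NEW_NAME
        return super().find_class(module, name)


restored = RenamingUnpickler(io.BytesIO(payload)).load()
print(plain_load_failed, restored.width)
```

The same remapping applies to any pickle-based format, as long as the loader lets you supply the unpickler.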
hi ElegantCoyote26
but I can't see any documentation or examples about the updates done in version 1.0.0
So actually the docs are only for 1.0... https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving
Hi there, are there any plans to add better documentation/example
Yes, this is work in progress, the first Item on the list is custom model serving example (kind of like this one https://github.com/allegroai/clearml-serving/tree/main/examples/pipeline )
about...
I am actually saving a dictionary that contains the model as a value (+ training datasets)
How are you specifically doing that? pickle?
You might be able to write a script to override the links ... wdyt?
Honestly, this is all related to issue #340.
makes total sense.
But actually this is different from #340. The feature is to store the data on the Task, which means each Task in your "pipeline" will upload a new copy of the data. No?
I'd suggest some task.detach() method for remote execution maybe
That is a good idea, in theory it can also be used in local execution