it will constantly try to resend logs
Notice that this happens in the background; in theory you will just get stderr messages when it fails to send, but the training should continue.
which, to my understanding, has to be given before a call to an argparser,
SmarmySeaurchin8 You can call argparse before Task.init, no worries; it will catch the arguments, and trains-agent will be able to override them :)
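For example, a minimal sketch (project/task names are made up; in older versions the import was from trains instead of clearml):

from argparse import ArgumentParser
from clearml import Task

# argparse can run before Task.init; the arguments are still captured
parser = ArgumentParser()
parser.add_argument('--lr', type=float, default=0.001)
args = parser.parse_args()

# when executed by the agent, values edited in the UI override the defaults
task = Task.init(project_name='examples', task_name='argparse before init')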
Yes including this. (There was a fix to an issue with trains-agent
and disabling frameworks, it is already part of 0.16.3 )
I see it's a plotly plot, even though I report a matplotlib one
ClearML tries to convert matplotlib figures into plotly objects so they are interactive; if it fails, it falls back to a static image, as in matplotlib.
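For example, a minimal sketch (project/task names are made up):

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name='examples', task_name='matplotlib demo')

plt.plot([1, 2, 3], [4, 6, 5])
plt.title('loss curve')
# ClearML intercepts plt.show(); the figure is converted to plotly,
# or reported as a static image if the conversion fails
plt.show()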
Legit; if you have a cached_file (i.e. it exists and is accessible), you can return it to the caller.
Hi SubstantialElk6, I'll start at the end: you can run your code directly on the remote GPU machine 🙂
See the clearml-task documentation on how to create a task from existing code and launch it:
https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md
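For example, something along these lines (repo URL, script path, and queue name are placeholders):

clearml-task --project examples --name remote-run --repo https://github.com/user/repo.git --script train.py --queue default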
That said, the idea is that you add the Task.init
call when you are writing/coding the code itself, then later when you want to run it remotely you already have everything defined in the UI.
Make sense?
It reflects what is stored by Keras, so if Keras stores the best model, that is what you get. BTW, if you pass output_uri=True it will automatically upload the models.
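For example (project/task names are made up):

from clearml import Task

# output_uri=True uploads the model checkpoints (e.g. the ones Keras stores)
# to the default files server, instead of only logging their local paths
task = Task.init(project_name='examples', task_name='keras training', output_uri=True)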
Maybe worth updating the main README.md on GitHub... if someone tries to follow the instructions there, it breaks.
Hmm, I thought we already did. Yes, you are absolutely correct, I'll make sure we do.
JitteryCoyote63, just making sure, does refreshing fix the issue?
We should probably change it so it is more human readable 🙂
Firstly, thank you for your efforts and your support.
Thanks SmugOx94 !
Are you running trains-agent
in docker mode? The aforementioned scripts are executed before the experiment is cloned; they are meant to be part of the docker setup, not a per-experiment script.
You could try to edit the experiment and have:
Working Directory: "."
(that means the root of the repository)
Script Path: "experiments_that_uses_library/train.py"
This will make sure you can do "import l...
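In other words, assuming a repository layout along these lines (the library folder name is a placeholder, as it is truncated in the original message):

repository root                     <- Working Directory: "."
├── <library package>/              <- becomes importable from train.py
└── experiments_that_uses_library/
    └── train.py                    <- Script Path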
I managed to do it by using logger.report_scalar, thanks!
Sure, but for future reference, where (in the ignite callbacks) did you add the report_scalar call?
Could be nice to write some automation
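For future reference, a minimal sketch (handler wiring and metric names are illustrative, not from the original code):

from clearml import Logger

def log_training_loss(engine):
    # report a scalar from inside an ignite event handler
    Logger.current_logger().report_scalar(
        title='loss',
        series='train',
        value=engine.state.output,
        iteration=engine.state.iteration,
    )

# e.g. trainer.add_event_handler(Events.ITERATION_COMPLETED, log_training_loss)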
Hi RoundMosquito25
This is a bit old but probably a good start:
https://clear.ml/blog/stacking-up-against-the-competition/
tl;dr
ClearML advantages (at least a few I can think of):
- Scales way better
- Enables out-of-the-box experiment orchestration (i.e. remote execution etc.)
- Data management
- Nicer UI
- Full REST API
- Full MLOps platform
- Model serving
- Query-able model repository
- Probably more 🙂
UnevenDolphin73 FYI: clearml-data is documented, unfortunately only on GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Sure:
Dataset.create(..., use_current_task=True)
This will basically attach/make the main Task the Dataset itself (a Dataset is a type of Task, with logic built on top of it).
wdyt ?
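For example, a minimal sketch (project/dataset names are made up), assuming the code already called Task.init:

from clearml import Task, Dataset

task = Task.init(project_name='examples', task_name='dataset build')

# use_current_task=True makes the current Task the Dataset task itself,
# instead of creating a separate one
dataset = Dataset.create(
    dataset_name='my_dataset',
    dataset_project='examples',
    use_current_task=True,
)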
These are being repeated as well for a single task (this is training a t5_model with transformers):
Seems like someone is storing lots of files with torch.save
that ClearML automatically logs.
You can disable the autolog:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
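Spelled out, a minimal sketch (project/task names are made up):

from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='t5 training',
    # disable only the PyTorch binding (torch.save / torch.load);
    # other frameworks keep their automatic logging
    auto_connect_frameworks={'pytorch': False},
)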
Hi JitteryCoyote63, is there a callback for that?
SoreDragonfly16 could you reproduce the issue?
What's your OS? Which trains version?
OddShrimp85 you can see the full configuration at the top of the Task log. What do you have there? Also what is the clearml python version?
Hi ReassuredTiger98
However, the clearml-agent also stops working then.
you mean the clearml-agent daemon (the one that spun up the container) is crashing as well?
SubstantialElk6 (2) yes definitely will be fixed
Regarding (1), what do you mean by "via the code"? Do you mean as a Task docker cmd?
These are both specific cases of the glue, and yes both need to be fixed.
I think (1) is actually a feature; nonetheless, we should support it.
FriendlySquid61, could you verify, specifically on (2)?
Thanks SubstantialElk6 !
I believe an initial fix was pushed 😉 A full one (merging the Task --env with the k8s template) will be added soon.
Do we have it on the git issue?
GiddyTurkey39 I think I need some more details, what exactly is the scenario here?
Specifically for this one, this is the auto-generated docstring from the actual code, so a PR would go to:
https://github.com/allegroai/clearml/blob/e53a76b713910adaf87578c69e86f8154d4ab4c1/clearml/logger.py#L152
Thanks JitteryCoyote63 let me double check if there is a reason for that (there might be one, not sure)
WickedGoat98, if this is the case, you can check this example. Same idea, only "manual":
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
Also, SoreDragonfly16, could you test whether the issue exists with trains==0.16.2rc0?
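For reference, the release candidate installs with a pinned pip install:

pip install trains==0.16.2rc0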