Can you reproduce this behavior outside of Lightning, or in a toy example? (Because I could not.)
Hi ConvolutedSealion94
Yes this seems like the correct curl
How did you spin up the clearml-serving containers? Is it with docker-compose or with the helm chart? (I remember there are some pitfalls with the helm chart, and I would actually start with the local docker-compose to debug it.)
EnviousStarfish54 what's your matplotlib version?
PompousBeetle71, these are CUDA versions; I'm looking for the NVIDIA driver version, for example 440.xx or 418.xx.
The reason is that we set an OS environment variable for the driver, and I remember that old drivers did not support it. Basically they do not support NVIDIA_VISIBLE_DEVICES=all, so I'm trying to see if that's the case; if so, we could add a fix.
However, when 'extra' is a positional argument it is transformed to 'str'
Hmm... okay let me check something
Wait, are you saying it is disappearing? Meaning when you cloned the Pipeline (i.e. in draft mode) the configuration was there, and then it disappeared?
Not really sure that's easily done... I mean you could query the data, but I'm not sure how you would import it. Btw, why would you move from pro to self-hosted?
Can you run the entire thing on your own machine (just making sure it doesn't give this odd error)?
JitteryCoyote63 Great to hear 🙂
BTW:
Would it be possible to extend Task.init with a force_reuse that would enforce reusing these tasks?
You can pass continue_last_task=True; I think it should be equivalent to what you suggest.
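If it helps, a minimal sketch (project and task names here are just placeholders):
from clearml import Task
# continue reporting into the previously created task instead of starting a fresh one
task = Task.init(project_name="examples", task_name="my training", continue_last_task=True)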
The cloning is done in another task, which has the argv parameters I want the cloned task to inherit from
JitteryCoyote63 What do you mean by that?
Hmmm, make sure the task doing the cloning is using 0.16.1 and above, because with 0.16 we added sections, and compatibility is per version. Meaning if you have tasks generated with trains 0.16 you need trains 0.16 to clone them from code (so you can properly control the arguments).
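For reference, a rough sketch of cloning and overriding arguments from code (the task id, parameter name and queue are placeholders):
from clearml import Task
template = Task.get_task(task_id="<template_task_id>")
cloned = Task.clone(source_task=template, name="clone with new args")
# with 0.16+ hyperparameters live under sections, e.g. "Args/..."
cloned.set_parameter("Args/batch_size", 64)
Task.enqueue(cloned, queue_name="default")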
https://stackoverflow.com/questions/60860121/plotly-how-to-make-an-annotated-confusion-matrix-using-a-heatmap
MagnificentSeaurchin79 see plotly example here:
https://allegro.ai/clearml/docs/docs/examples/reporting/plotly_reporting.html
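Roughly, the reporting side looks like this (the figure itself is just a stand-in for the annotated heatmap from the SO answer):
import plotly.graph_objects as go
from clearml import Task
task = Task.init(project_name="examples", task_name="confusion matrix")
# placeholder confusion matrix as a plotly heatmap
fig = go.Figure(data=go.Heatmap(z=[[10, 2], [3, 15]], x=["cat", "dog"], y=["cat", "dog"]))
task.get_logger().report_plotly(title="Confusion matrix", series="validation", iteration=0, figure=fig)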
So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.
Yeah, I think for some reason it fails to detect that this is actually a Jupyter notebook (not really sure why). Thank you for double checking on the container!!
Hi ShinyPuppy47
getting this error pretty sporadically
What do you mean by "sporadically"? This should be consistent: either there is access to the clearml.conf file or not, no?!
What is your setup? Is this coming from the agent or manual execution?
@<1523701868901961728:profile|ReassuredTiger98> how did you install the nightly locally?
Can you also provide the full log?
@<1542316991337992192:profile|AverageMoth57> it sounds like you should use SSH authentication for the agent, just set force_git_ssh_protocol: true
And make sure you have the SSH keys on the agent's machine
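i.e. in the agent's clearml.conf, something like this (section layout from memory, double check against your file):
agent {
    force_git_ssh_protocol: true
}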
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything.
Sure man 🙂 no rush, I appreciate the gesture regardless of the outcome
Many thanks!
What do you have under the "installed packages"?
I'm Jax, not Manoj! lol.
I know 🙂 I just mentioned that this issue is being actively discussed
How is this different from argparser btw?
Not different, just a dedicated section 🙂 Maybe we should do that automatically; the only "downside" is you will have to name the Dataset when getting it (so it will have an entry name in the Dataset section), wdyt?
This is very odd ... let me check something
BTW, the trains-agent will not delete the venv until the next run, so you can check exactly what's missing there
Feels like we've been over this
LOL, I think I can't wrap my head around the use case 🙂
When running locally, this is "out of the box", as we can init and close before and after each model.
I finally got it! Task.init should be dubbed "init main task"; automagic kicks in only when it is the only one existing. Your remote execution is "linear", Task after Task, in theory a good candidate for a pipeline.
Basically option (2), the main task is being "replaced" (which loca...
and of course: task.set_parameters_as_dict(params)
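For example, something along these lines (the parameter names are just placeholders):
from clearml import Task
task = Task.init(project_name="examples", task_name="sub model")
params = {"lr": 0.001, "batch_size": 32}
task.set_parameters_as_dict(params)  # store the dict as the task's hyperparameters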
Hi TroubledJellyfish71
What do you have listed in the Task's execution "installed packages" section (of the original Task)?
How did it end up with an http link of pytorch ?
Usually it would be torch==1.11 ...
EDIT:
I'm assuming the original Task was executed on a Mac M1; what are you getting when calling pip freeze?
And where is the agent running? (And is it venv or docker mode?)
This seems more complicated that I thought... I think you are correct, and it fails to load the entire module, let me check what I can do
I wonder if I can extend this to reporting grad_norm per layer.
Oh, that makes sense. Technically I assume so; is this an HF logger option? Notice ClearML is already integrated with HF on the HF side; do they report that when the TB logger is used?
Hi @<1523701066867150848:profile|JitteryCoyote63>
I found a memory leak in Logger.report_matplotlib_figure
Are you sure this is the Logger's fault and not a Matplotlib leak? I'm trying to think how we could create such a mem leak
wdyt?
StorageManager is what you need if you want to download/upload files to any server (this is a utility class that takes care of the download/upload and adds caching); StorageHelper is used internally
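A quick sketch (the URLs are just placeholders):
from clearml import StorageManager
# download a cached local copy of a remote object
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/file.zip")
# upload a local file to remote storage
StorageManager.upload_file(local_file="model.onnx", remote_url="s3://my-bucket/models/model.onnx")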