Yes, that was my assumption as well. Honestly it could be several causes, now that I see that matplotlib itself is also leaking 😄
And after the update, the loss graph appears
As a quick fix, can you test with auto refresh? (See the top-right button with the pause sign in your video.)
That doesn’t work unfortunately
ok, so there is no way to cache it and detect when the ref changes?
yes, in setup.py I have: ..., install_requires=[ "my-private-dep @ git+", ... ], ...
I call task._update_requirements(my_reqs) regardless of whether I am on the local machine or in the ClearML agent, so the "installed packages" section is always updated to the list my_reqs that I pass to the function, in this case ["."]
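As an aside, the pieces above fit together like this (a sketch, no ClearML server needed; the repo URL is a hypothetical placeholder for the elided one, and only `task._update_requirements` and the `["."]` value come from the messages above):

```python
# A private git dependency in setup.py's install_requires is a PEP 508
# "direct reference": "<distribution name> @ <vcs URL>".
# The URL below is a made-up placeholder, not the real one.
private_dep = "my-private-dep @ git+https://github.com/example/my-private-dep.git"

# Split it the way an installer would: name before " @ ", VCS URL after.
name, _, url = private_dep.partition(" @ ")

# The list passed to task._update_requirements(): installing "." makes the
# agent run `pip install .`, which pulls install_requires from setup.py.
my_reqs = ["."]

print(name)  # -> my-private-dep
```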
This one doesn’t have _to_dict
unfortunately
yes that makes sense, I will do that. Thanks!
Hi CostlyOstrich36, I mean inserting temporary access keys
They are, but this doesn't work - I guess it's because temporary IAM credentials include an extra session token that should be passed as well, and there is no such option in the web UI, right?
I get the following error:
automatically promote models to be served from within clearml
Yes!
I am confused now, because I see that in the master branch the clearml.conf file has the following section:
# Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false
So it states that IAM role using metadata service should be supported, right?
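For reference, this is the change I would expect to make; a sketch only, assuming the setting lives under the sdk.aws.s3 section as in the default clearml.conf (the exact path may differ by version):

```
# clearml.conf sketch: with the credentials chain enabled, Boto3 resolves
# credentials itself - environment variables, the shared credentials file,
# or the instance metadata service - which also covers temporary STS
# credentials that carry a session token.
sdk {
    aws {
        s3 {
            use_credentials_chain: true
        }
    }
}
```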
btw SuccessfulKoala55 the parameter is not documented in https://allegro.ai/clearml/docs/docs/references/clearml_ref.html#sdk-development-worker
AnxiousSeal95 The main reason for me to not use clearml-serving triton is the lack of documentation tbh 😄 I am not sure how to make my pytorch model run there
Awesome, thanks!
This is what I get with mprof
on the snippet above (I killed the program after the bar reached 100%, otherwise it hangs trying to upload all the figures)
Well, no luck - using matplotlib.use('agg') in my training codebase doesn't solve the memory leak
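For what it's worth, this is the figure-lifecycle hygiene I'm testing alongside the backend switch (a minimal sketch; plot_loss is a made-up helper, not from my codebase): select Agg before importing pyplot, and explicitly close each figure after saving, since pyplot keeps a reference to every open figure.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; must be set before pyplot is imported
import matplotlib.pyplot as plt


def plot_loss(losses):
    fig, ax = plt.subplots()
    ax.plot(losses)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    # pyplot holds a reference to every open figure; without close(),
    # figures accumulate across training steps and memory keeps growing.
    plt.close(fig)
    return buf.getvalue()


for step in range(5):
    png_bytes = plot_loss([1.0, 0.5, 0.25])

# No figures should remain registered with pyplot after the loop.
print(len(plt.get_fignums()))  # -> 0
```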
Some context: I am trying to log an HTML file and I would like it to be easily accessible for preview
Or even better: would it be possible to have support for HTML files as artifacts?
I created a snapshot of both disks
Ok, I am asking because I often see the autoscaler starting more instances than the number of experiments in the queues, so I guess I just need to increase the max_spin_up_time_min
Yes, it did spin two instances for the same task
Here is what happens with polling_interval_time_min=1 when I add one task to the queue. The instance takes ~5 min to start and connect. During this timeframe, the autoscaler starts two new instances, then spins them down. So it acts as if max_spin_up_time_min=10 is not taken into account
Why would that solve the issue? max_spin_up_time_min should be the parameter defining how long to wait after starting an instance, not polling_interval_time_min, right?
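To make the timing question concrete, here is how I read the two knobs (parameter names and values are from the discussion above; the surrounding file layout is an assumption based on the autoscaler example config, not copied from it):

```
# AWS autoscaler timing settings (sketch):
polling_interval_time_min: 1    # how often the autoscaler polls the queues
                                # and reconciles instances against tasks
max_spin_up_time_min: 10        # how long to wait for a freshly started
                                # instance to register as a worker before
                                # deciding it failed and launching another
```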
I will try with that and keep you updated