Ok, so there is no way to cache it and detect when the ref changes?
yes, in setup.py I have:
    ...,
    install_requires=[
        "my-private-dep @ git+ ",
        ...
    ],
    ...
I call task._update_requirements(my_reqs) regardless of whether I am on the local machine or in the clearml agent, so the "installed packages" section is always updated to the list my_reqs that I pass to the function, in this case ["."].
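For reference, a minimal sketch of that call (project/task names are just placeholders; _update_requirements is a private API, as used above):

    from clearml import Task

    task = Task.init(project_name="my-project", task_name="my-experiment")

    # Overwrite the "installed packages" section with the requirements I choose,
    # both when running locally and when running inside the clearml agent.
    my_reqs = ["."]  # "." -> install the local package itself, so setup.py (and its private git dep) is used
    task._update_requirements(my_reqs)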
This one doesn’t have _to_dict, unfortunately
yes that makes sense, I will do that. Thanks!
Hi CostlyOstrich36, I mean inserting temporary access keys
They are, but this doesn’t work - I guess it’s because temporary IAM credentials come with an extra session token that should be passed as well, and there is no such option in the web UI, right?
I get the following error:
automatically promote models to be served from within clearml
Yes!
I am confused now because I see that in the master branch, the clearml.conf file has the following section:
    # Or enable credentials chain to let Boto3 pick the right credentials.
    # This includes picking credentials from environment variables,
    # credential file and IAM role using metadata service.
    # Refer to the latest Boto3 docs
    use_credentials_chain: false
So it states that IAM role using metadata service should be supported, right?
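If I understand the default config correctly, enabling it should look something like this (placement under sdk.aws.s3 is my assumption based on that section):

    sdk {
        aws {
            s3 {
                # Let Boto3 resolve credentials on its own: environment variables,
                # the shared credentials file, or the IAM role via the metadata service.
                use_credentials_chain: true
            }
        }
    }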
btw SuccessfulKoala55 the parameter is not documented in https://allegro.ai/clearml/docs/docs/references/clearml_ref.html#sdk-development-worker
Awesome, thanks!
This is what I get with mprof on this snippet above (I killed the program after the bar reaches 100%, otherwise it hangs trying to upload all the figures)
Well, no luck - using matplotlib.use('agg') in my training codebase doesn't solve the memory leak
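Just to be explicit about what I mean by that, a minimal sketch of forcing the Agg backend (it has to be selected before pyplot is imported, and each figure still needs an explicit plt.close to release memory):

    import matplotlib
    matplotlib.use("agg")  # must happen before the first "import matplotlib.pyplot"
    import matplotlib.pyplot as plt

    for step in range(1000):
        fig, ax = plt.subplots()
        ax.plot(range(10))
        # ... the figure gets reported / logged here ...
        plt.close(fig)  # release the figure; without this, memory keeps growing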
Some context: I am trying to log an HTML file and I would like it to be easily accessible for preview
Or even better: would it be possible to have support for HTML files as artifacts?
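For context, the two ways I can think of to push the HTML file today (report.html is just a placeholder path):

    from clearml import Task

    task = Task.init(project_name="my-project", task_name="html-report")

    # 1) report it as a media/debug sample, which the UI can usually preview inline
    task.get_logger().report_media(
        title="report", series="html", iteration=0, local_path="report.html"
    )

    # 2) upload it as a plain artifact (downloadable, but no inline preview)
    task.upload_artifact(name="report_html", artifact_object="report.html")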
Ok, I am asking because I often see the autoscaler starting more instances than the number of experiments in the queues, so I guess I just need to increase the max_spin_up_time_min
Yes, it did spin two instances for the same task
Here is what happens with polling_interval_time_min=1 when I add one task to the queue. The instance takes ~5 mins to start and connect. During this timeframe, the autoscaler starts two new instances, then spins them down. So it acts as if max_spin_up_time_min=10 is not taken into account.
Why would it solve the issue? max_spin_up_time_min should be the param defining how long to wait after starting an instance, not polling_interval_time_min, right?
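(For reference, this is roughly how I read the two parameters; the dict layout below is my assumption based on the aws_autoscaler example script:)

    # assumed layout, following the aws_autoscaler example
    hyper_params = {
        "polling_interval_time_min": 1,   # how often the autoscaler polls the queues
        "max_spin_up_time_min": 10,       # how long to wait for a freshly started instance to register
    }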
I will try with that and keep you updated
Thanks! Corrected both, now it's building
btw, I tried with alpine instead of ubuntu:18.04, and got:
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
df20fa9351a1: Pulling fs layer
df20fa9351a1: Verifying Checksum
df20fa9351a1: Download complete
df20fa9351a1: Pull complete
Digest: sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting containe...
Ok, so what worked for me in the end was:
    config = task.connect_configuration(read_yaml(conf_path))
    cfg = OmegaConf.create(config._to_dict())
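Put together with its imports, that flow looks roughly like this (read_yaml is just a small YAML-loading helper, conf_path is a placeholder, and _to_dict is the private accessor from the snippet above):

    import yaml
    from clearml import Task
    from omegaconf import OmegaConf

    def read_yaml(path):
        # plain YAML -> dict helper
        with open(path) as f:
            return yaml.safe_load(f)

    conf_path = "config.yaml"
    task = Task.init(project_name="my-project", task_name="my-experiment")

    # connect_configuration logs the dict (and lets the agent override it);
    # the OmegaConf object is then rebuilt from the returned configuration
    config = task.connect_configuration(read_yaml(conf_path))
    cfg = OmegaConf.create(config._to_dict())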
Are you planning to add a server-backup service task in the near future?
Ok, I won't have time to venture into checking the different database components; the first option (shutting down the server) sounds like the easiest one for me. I would then run the script manually once a month or so.
both are repos for Python modules (the experiment itself and a dependency of the experiment)
(Just to know if I should wait a bit or go with the first solution)