Oh, so you mean CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3 ??
Oh ... maybe the bottleneck is augmentation on the CPU !
But is it normal that the agent doesn't detect the GPU count and type properly ?
You will need to change more than just REQUESTS_CA_BUNDLE to use a custom certificate. Not all Python libraries honor REQUESTS_CA_BUNDLE
You also need to add your certificate to your OS
In conda we have to export SSL_CERT_FILE=~/ca-bundle.crt
etc ...
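To make the point above concrete, here is a minimal sketch that points several common CA-bundle variables at the same file. The helper name `ca_bundle_env` is made up for illustration; REQUESTS_CA_BUNDLE and SSL_CERT_FILE are the variables mentioned above, and CURL_CA_BUNDLE (read by curl) is added as an assumption about what else may be in play. It does not replace adding the certificate to the OS trust store.

```python
import os
from pathlib import Path

# Variables that common tools consult for a CA bundle:
# REQUESTS_CA_BUNDLE (requests), SSL_CERT_FILE (OpenSSL-based code,
# including Python's default ssl context), CURL_CA_BUNDLE (curl).
CA_BUNDLE_VARS = ("REQUESTS_CA_BUNDLE", "SSL_CERT_FILE", "CURL_CA_BUNDLE")


def ca_bundle_env(bundle_path, base_env=None):
    """Return a copy of the environment with every CA variable set to bundle_path."""
    env = dict(os.environ if base_env is None else base_env)
    resolved = str(Path(bundle_path).expanduser())  # expand a leading "~"
    for name in CA_BUNDLE_VARS:
        env[name] = resolved
    return env


if __name__ == "__main__":
    # Print export lines that could be pasted into a shell profile.
    env = ca_bundle_env("~/ca-bundle.crt")
    for name in CA_BUNDLE_VARS:
        print(f"export {name}={env[name]}")
```

The returned dict can also be passed as `env=` to `subprocess.run` when launching a tool that should see the custom bundle.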
Awesome. I will try that !
I also found this, based on your suggestion that ClearML uses the Azure SDK underneath: None
Just not sure under which conditions from_config is actually called ...
@<1523701087100473344:profile|SuccessfulKoala55> It's working !! Thank you very much !!! Clearml is awesome !!!!
@<1558986867771183104:profile|ShakyKangaroo32> If you just want something to run on a regular schedule, have you considered TaskScheduler: None
Most people probably won't even know what that does
I am not familiar with the autoscaler ... are you using the paid version of ClearML ?
What command do you use to run clearml-agent ?
This looks like the agent running inside your docker container did not have any username/password to do the git clone, so the default behavior is to wait for keyboard input, which looks like hanging ....
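A quick way to tell "hanging" apart from "waiting for credentials" is to run git with prompting disabled, so it fails fast instead of waiting for keyboard input. The sketch below is a generic check, not something clearml-agent does itself; the function names are made up, while GIT_TERMINAL_PROMPT is a standard git environment variable.

```python
import os
import subprocess


def non_interactive_git_env(base_env=None):
    """Copy the environment and tell git never to prompt on the terminal."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_TERMINAL_PROMPT"] = "0"  # git errors out instead of asking for credentials
    return env


def can_reach_repo(repo_url, timeout_s=30):
    """Return True if `git ls-remote` succeeds without any interactive prompt."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", repo_url],
            env=non_interactive_git_env(),
            stdin=subprocess.DEVNULL,  # nothing to read from, so no silent wait
            capture_output=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Timed out, or git is not installed in this container.
        return False
    return result.returncode == 0
```

Running `can_reach_repo(...)` inside the same container as the agent should return False quickly if credentials are missing. Note that for SSH remotes the prompts come from ssh itself, so this variable alone would not cover them.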
So the question is really: how do we know when there is a new ClearML version so that the sysadmin can update ?
Maybe follow the GitHub releases ?
On-prem: user management is not "live", as you need to reboot and passwords are hardcoded ... No permission distinction, as everyone is admin ...
I don't think there is a "kill task" code. In principle, on Linux, the ClearML agent is the parent process that launches the training process. When a parent process is terminated, its child processes will, in most cases, be taken down with it (for example when the whole process group or container is stopped), including your training process.
There may be some way to resume a task from the ClearML agent when it restarts, but I don't think that is the default behavior
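To illustrate the parent/child point, here is a generic POSIX sketch of a launcher that starts a child in its own process group and later signals the whole group. It is only an assumption about how such a launcher could behave, not how clearml-agent is actually implemented; the function names are made up and the sleeping child stands in for a training process.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(cmd):
    """Start cmd as the leader of a new session / process group (POSIX only)."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop_group(proc, sig=signal.SIGTERM, timeout_s=10):
    """Send sig to the whole process group of proc and return its exit code."""
    os.killpg(os.getpgid(proc.pid), sig)
    return proc.wait(timeout=timeout_s)


if __name__ == "__main__":
    # Stand-in for a training process: a Python child that just sleeps.
    child = start_in_own_group([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_group(child))
```

Anything the child itself spawned in the same group receives the signal too, which is why the training process goes down with it.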
Should I raise a GitHub issue ?
@<1523703436166565888:profile|DeterminedCrab71> Thanks for the suggestion. But no effect.
We already have client_max_body_size 0; in the server section
I tried to set both the http and server sections to 100M but nothing changes.
Do you think gzip could be related ?
You may want to share your config (with credentials redacted) and the full docker compose startup log ?
@<1523701070390366208:profile|CostlyOstrich36> Is there a way to tell ClearML not to try to detect the installed packages ?
Thanks for all the pointers ! I will have a good play around
For local agents running on-prem, we use a Service Principal, or each user logs in to authenticate with Azure, and then we mount ~/.azure into the container
We are using this: WebApp: 2.2.0-690 • Server: 2.2.0-690 • API: 2.33
Not sure ... providing the Zscaler certificate seems to allow clearml to talk to our ClearML server, hosted in Azure: Task init worked. But then it failed to connect to the storage account (also in Azure) ...
and in train.py, I have task.add_requirements("requirements.txt")
so it's not supposed to say "illegal output destination ..." ?
You could try to build your own docker image with CMake and even dlib installed manually
Then run clearml-agent inside your container, without docker mode.
Or simply create a new venv on your local PC, then install your package with pip install from the repo URL and see if your file is deployed properly in that venv
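The venv check suggested above can be scripted roughly like this. It is a minimal sketch using only the standard library plus pip inside the new venv; the function names are made up, and the repo URL and package name you pass in are your own.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(venv_dir):
    """Path to the Python interpreter inside a venv (POSIX or Windows layout)."""
    sub = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(venv_dir) / sub / exe


def pip_install_command(venv_dir, repo_url):
    """Command that installs a package straight from a git repo URL."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", f"git+{repo_url}"]


def list_installed_files_command(venv_dir, package_name):
    """Command that lists the files pip installed for the package (`pip show -f`)."""
    return [str(venv_python(venv_dir)), "-m", "pip", "show", "-f", package_name]


def check_install(venv_dir, repo_url, package_name):
    """Create a fresh venv, install from the repo, and print the installed files."""
    venv.create(venv_dir, with_pip=True)
    subprocess.run(pip_install_command(venv_dir, repo_url), check=True)
    subprocess.run(list_installed_files_command(venv_dir, package_name), check=True)
```

Calling `check_install("check-venv", <your repo URL>, <your package name>)` prints the installed file list, so you can see whether the missing file made it into the venv.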