I called task.wait_for_status() to make sure the task is done
This is the issue. I will make sure wait_for_status() calls reload() at the end, so when the function returns you have the updated object
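In the meantime, a minimal workaround sketch (the task id is a placeholder): call reload() yourself after waiting.
```
from clearml import Task

task = Task.get_task(task_id="<task-id>")  # placeholder id
task.wait_for_status()  # block until the task reaches a final/stable status
task.reload()           # workaround: refresh the local object from the backend
print(task.get_status())
```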
in clearml.conf we could have:
```
azure.storage {
    max_connections = 10
    # containers: [
    #     {
    #         account_name: "clearml"
    #         account_key: "secret"
    #         # container_name:
    #     }
    # ]
}
```
Then in the AzureContainerConfigurations class:
```
class AzureContainerConfigurations(object):
    def __init__(self, container_configs=None, max_connections=None):
        ...

    @classmethod
    def from_config(cls, configuration):
        ...
```
Hi @<1523715429694967808:profile|ThickCrow29>
clearml.automation.auto_scaler.AutoScaler which runs smoothly (kudos!!).
NICE!
The only thing I am missing is the autoscaler view in the ClearML dashboard/orchestration --> Is there a way to make it show up there?
hmm, kind of needs backend support for that
For now, I can just see the log of the ClearML task to monitor what's happening
Or is this restricted to Pro users?
Yeah, the GCP and AWS autoscaler dashboards are a paid-tier feature. But...
And if I create a Pro account for myself?
Then you have the UI and implementation of both AWS & GCP autoscalers, am I missing something?
The "Optimizer task" will continue to run as long as there are sub-Tasks it launched.
Is anything else running/pending ?
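If you want to check programmatically, something like this should list the optimizer's still-active child tasks (a sketch; "<optimizer_task_id>" is a placeholder):
```
from clearml import Task

# list sub-Tasks the optimizer launched that are still queued or running
pending = Task.query_tasks(
    task_filter={"parent": "<optimizer_task_id>", "status": ["queued", "in_progress"]}
)
print("still active:", pending)
```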
Okay we have located the issue, thanks guys! We will push a patch release hopefully later today
Sure, in that case, wait until tomorrow, when the github repo is fully synced
@<1569858449813016576:profile|JumpyRaven4> fyi clearml-serving was synced
, but what I really want to achieve is to share this code:
You mean to share the code between them? Unless this is a "preinstalled" package in the container, each endpoint has its own separate set of modules / files
(this is on purpose, so you could actually change them, i.e. imagine different versions of the same common.py file)
Okay, let's PR this fix?
As I also noticed that uploads are sometimes slow, and I see here max_connections=2
Makes sense to me, please go ahead and add that as well (basically the same thing on _AzureBlobServiceStorageDriver.upload_object, and an additional variable on the AzureContainerConfigurations class).
Could you PR a tested draft? We will be able to take it from there
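A rough sketch of the idea (illustrative only, not the actual ClearML internals; it assumes the driver wraps the legacy azure-storage-blob 2.x API, which is where the max_connections=2 default comes from):
```
from azure.storage.blob import BlockBlobService  # azure-storage-blob 2.x

def upload_with_max_connections(account_name, account_key, container_name,
                                blob_name, file_path, max_connections=10):
    service = BlockBlobService(account_name=account_name, account_key=account_key)
    # max_connections controls upload parallelism (the SDK default is 2)
    service.create_blob_from_path(container_name, blob_name, file_path,
                                  max_connections=max_connections)
```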
@<1545216070686609408:profile|EnthusiasticCow4>
Is there currently a way to bind the same GPU to multiple queues? I believe the agent complains last time I tried (which was a bit ago)
You can run multiple agents on the same GPU:
CLEARML_WORKER_NAME=host-gpu0a clearml-agent daemon --gpus 0
CLEARML_WORKER_NAME=host-gpu0b clearml-agent daemon --gpus 0
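If the intent is one queue per agent, you can also point each one at its own queue with --queue (queue names below are just examples):
CLEARML_WORKER_NAME=host-gpu0a clearml-agent daemon --queue gpu_queue_a --gpus 0
CLEARML_WORKER_NAME=host-gpu0b clearml-agent daemon --queue gpu_queue_b --gpus 0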
How does it work with k8s?
You need to install the clearml k8s glue, and then on the Task request the container. Notice you need to preconfigure the glue with the correct Job YAML template
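On the Task side, requesting a specific container can be done from code, e.g. (image and queue name are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="k8s job")
# request the container image the k8s glue should use for this Task
task.set_base_docker("nvcr.io/nvidia/pytorch:23.03-py3")
task.execute_remotely(queue_name="k8s_queue")
```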
@<1523706266315132928:profile|DefiantHippopotamus88> seems like you are missing the ports
CLEARML_WEB_HOST="
"
CLEARML_API_HOST="
"
CLEARML_FILES_HOST="
"
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
ReassuredTiger98 regarding the agent error, can you see the package some_package in the "Installed Packages" section in the UI? Was it installed? Are you using pip or conda as the package manager in the agent (check the clearml.conf)? Is the agent running in docker mode?
I assume this is the issue: None
Yeah, this is odd, I noticed it as well. Let me ask the guys to take a look
-- I've been running my script from VSCode for the first time,
In the initial Task (the one created when running inside VSCode) do you have all the packages listed in the "Installed Packages" section ?
JitteryCoyote63 There is a basic elastic license that should always be there. If for some reason it was deleted/expired then the following command should fix it:
curl -XPOST 'http://localhost:9200/_xpack/license/start_basic'
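To double-check it took effect, you should be able to query the license afterwards (same endpoint, assuming the _xpack license API is available):
curl -XGET 'http://localhost:9200/_xpack/license'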
Would it suffice to provide the git credentials ...
That should be enough, basically this is where they should be:
https://github.com/allegroai/clearml-agent/blob/0462af6a3d3ef6f2bc54fd08f0eb88f53a70724c/docs/clearml.conf#L18
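For example, in the agent's clearml.conf (values are placeholders):
```
agent {
    # credentials the agent uses to clone private git repositories
    git_user: "my-git-username"
    git_pass: "my-personal-access-token"
}
```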
SoggyBeetle95 is this secret a per-Task secret, or is it for the agent itself (i.e. for all Tasks the agent will spin)?
seems like pip 20.1.1 has the issue, but >= 22.2.2 does not.
Notice we changed the value there, it now has two versions, one for Python < 3.10 and one for Python >= 3.10
The main reason is that pip changed its resolving algorithm, and the new one can break its own dependencies (i.e. pip freeze > requirements.txt -> pip install might not actually work)
None
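The setting lives under agent.package_manager in clearml.conf; roughly something like this (the exact version bounds here are illustrative, see the linked default conf for the real values):
```
agent {
    package_manager {
        # pin pip per python version (illustrative bounds)
        pip_version: ["<21 ; python_version < '3.10'", "<23 ; python_version >= '3.10'"]
    }
}
```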
Can you see it on the console ?
I update my-private-dep to 1.8.0
Not sure how this is connected with the venv, could you expand ?
JitteryCoyote63 next week is the next Trains release with the upgrade to ES 7, do you want to wait or sort out a solution for this one?
(BTW: I think that you can mount a license file or delete one, and it should be okay, I'll ask the backend guys regardless)
JitteryCoyote63 nice hack
how come it is not automatically logged as console output ?