Well, it seems that we have a similar issue: https://github.com/allegroai/clearml-agent/issues/86
We are not able to reference this orphan worker: it does not show up with `ps -ef | grep clearml-agent`, but it still appears with `clearml-agent list`, and we are not able to stop it with `clearml-agent daemon --stop clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0`. We are getting:
`Could not find a running clearml-agent instance with worker_name=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 wo...`
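If it helps, here is a rough sketch of force-removing such a stale worker entry through the server API - assuming the `workers.unregister` endpoint accepts the worker id (an assumption on my side, not verified):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()  # uses the credentials from clearml.conf / env vars

# List all worker entries the server still knows about (stale ones included)
for worker in client.workers.get_all():
    print(worker.id)

# Assumption: unregistering by worker id drops the stale entry
client.workers.unregister(
    worker="clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0"
)
```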
Not sure I understand - we are running the daemon in detached mode:
`clearml-agent daemon --queue <execution_queue_to_pull_from> --detached`
Sorry - I'm a Helm newbie. When running `helm search repo clearml --versions`, I can't see version 3.6.2 - the highest is 3.5.0.
This is the repo that we used to get the Helm chart: `helm repo add allegroai`
What am I missing?
Still trying to understand what this default worker is.
I've removed clearml.conf and reinstalled clearml-agent, then running `clearml-agent list` gets the following error:
` Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing ~/clearml.conf or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own clearml-server, or create a free account at and run clearml-agent init`
Then returning the ...
Is this running from the same Linux user with which you checked the git SSH clone on that machine?
yes
The only thing that could account for this issue is that somehow the agent is not getting the right info from the ~/.ssh folder
maybe -
Question - if we change the clearml.conf, do we need to stop and start the daemon?
agree -
We understand now that the worker is the default worker that is installed after running `pip install clearml-agent`.
Is it possible to remove it? All tasks that use this worker don't have the correct credentials.
So running the command `clearml-agent -d list` returns the following: https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
Thx for investigating - what is the use case for such behavior? How would you use the user properties as part of an experiment?
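For what it's worth, a minimal sketch of setting user properties from within an experiment (the property names here are made up):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="user properties sketch")

# User properties appear in their own section in the UI and are editable
# there; "dataset_version" / "owner" are hypothetical property names
task.set_user_properties(dataset_version="v2", owner="data-team")
print(task.get_user_properties())
```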
"ClearML key/secret provided to the agent" - when is this provided? Is this during the build?
BTW - is the CLEARML_HOST_IP relevant for the clearml-agent?
I can see that we can create a worker with these environment variables, e.g. `CLEARML_WORKER_NAME=MY-WORKER CLEARML_WORKER_ID=MY-WORKER:0 CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --detached`
My mistake - it doesn't use it to create a dedicated IP
Btw - after updating clearml.conf, do I need to restart the agent?
I can't see the additional tab under https://clearml.slack.com/archives/CTK20V944/p1658199530781499?thread_ts=1658166689.168039&cid=CTK20V944 , and I reran the task and got the same error
yes - the agent is running with --docker
Great - where do I define the volume mount?
Should I build a base image that runs on the server and then use it as the base image in the container?
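On the volume mount question - a minimal sketch, assuming a recent clearml SDK where `Task.set_base_docker` accepts `docker_arguments`; agent-wide defaults can also go under `agent.default_docker.arguments` in the agent's clearml.conf:
```
from clearml import Task

task = Task.init(project_name="examples", task_name="docker mount sketch")

# Per-task docker settings, picked up when the agent runs with --docker;
# "-v /host/data:/data" is a made-up mount - adjust to your paths
task.set_base_docker(
    docker_image="python:3.9",
    docker_arguments="-v /host/data:/data",
)
```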
upgrading to 1.12.1 didn't help
I think the issue is that when I create the dataset, I used `use_current_task=True`. If I change it to `use_current_task=False`, then it finalizes.
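For reference, a minimal sketch of the pattern that finalizes cleanly (project/dataset names are placeholders):
```
from clearml import Dataset

# Creating the dataset in its own task, so finalize() is not blocked by
# the currently running task
ds = Dataset.create(
    dataset_project="examples",
    dataset_name="my_dataset",
    use_current_task=False,
)
ds.add_files(path="data/")
ds.upload()
ds.finalize()
```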
This also may help with the configuration for GCS
https://clearml.slack.com/archives/CTK20V944/p1635957916292500?thread_ts=1635781244.237800&cid=CTK20V944
Feeling that we are nearly there ....
One more question - is there a way to configure ClearML to store all the artifacts, the plots, etc. in a bucket, instead of manually uploading/downloading the artifacts from within the client's code? Specifying the `output_uri` in `Task.init` saves the checkpoints; what about the rest of the outputs?
https://clear.ml/docs/latest/docs/faq#git-and-storage
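If it helps, a minimal sketch of pointing a task at a bucket (the GCS path is a placeholder) - as far as I understand, models/artifacts/checkpoints are uploaded to `output_uri`, while plots and scalars are stored by the ClearML server backend:
```
from clearml import Task

# Bucket path is a placeholder - use your own
task = Task.init(
    project_name="examples",
    task_name="bucket output sketch",
    output_uri="gs://my-bucket/clearml",
)
```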
Hi, thx for your response.
Yes - we are using the above repo.
We would like easier/cheaper access to the artifacts etc. that are output from the experiments
Hi SuccessfulKoala55
Thx again for your help
In the case of Google Colab, the values can be provided as environment variables
We still need to run the code in a Colab environment (or a remote client)
Do you have any example of setting the environment variables?
For a general environment variable there is an example: `export MPLBACKEND=TkAgg`
But what would it be for the clearml.conf values?
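For reference, a rough sketch of what this could look like in Colab (all values are placeholders) - these variables replace the matching clearml.conf entries when set before `Task.init`:
```
import os

# All values are placeholders - use your own server URLs and credentials
os.environ["CLEARML_API_HOST"] = "https://api.your-server"
os.environ["CLEARML_WEB_HOST"] = "https://app.your-server"
os.environ["CLEARML_FILES_HOST"] = "https://files.your-server"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"

from clearml import Task  # import after the variables are set

task = Task.init(project_name="examples", task_name="colab sketch")
```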
For retrieving, we can use `config_obj.get('sdk.google')`, but how would the setting work? We did ...
But this is not on the pods, is it? We're talking about the Python code running from Colab or locally...?
Correct - but where is the clearml.conf file?
Just for the record - I guess there is an option to use os.environ
https://github.com/allegroai/clearml/blob/ca7909f0349b255f7edca0500878a8e08f3b1c99/clearml/automation/auto_scaler.py#L152-L157
Hi, you will have to configure the credentials there (in a local clearml.conf, or using environment variables)
This is the part that confuses me - is there a way to configure clearml.conf from the values.yaml? I would like GKE to load the cluster with the correct credentials without logging into the pods and manually updating the clearml.conf file
Are we supposed to use the "Extra Configurations" from the https://clear.ml/docs/latest/assets/images/ClearML_Server_Diagram-7ea19db8e22a7737f062cce207befe38.png ?
https://docs.google.com/drawings/d/11f-AWVmIq7P0e8bP5OnMUz0hguXm2T_Xqq7iNMA-ANA/edit?usp=sharing
Thx CostlyOstrich36 for your input -
So I guess that if we always work with https://clear.ml/docs/latest/docs/fundamentals/hyperparameters (even if there is only 1 parameter), we will consistently log our parameters.
Do you have a suggested different workflow?
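For what it's worth, a minimal sketch of the connect pattern (parameter names are made up):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="hyperparameters sketch")

# Connecting a dict logs it under the task's hyperparameters, even when
# there is only one parameter; edits in the UI are fed back on remote runs
params = {"learning_rate": 0.001}
params = task.connect(params)
```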
Strange - I ran `clearml-agent daemon --stop`, and after 10 min I ran `clearml-agent list` and I still see a worker