
Hi SweetBadger76,
Well - apparently I was mistaken.
I still have a ghost worker that I'm not able to remove (I had 2 workers on the same queue - that caused my confusion).
I can see it in the UI and when I run `clearml-agent list`.
And although I'm stopping the worker specifically with `clearml-agent daemon --stop <worker_id>`,
I'm getting: `Could not find a running clearml-agent instance with worker_name=<worker_id> worker_id=<worker_id>`
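For reference, one way that can clear such a ghost entry is to unregister it server-side. A minimal sketch, assuming the server exposes the `workers.unregister` endpoint through the bundled APIClient; `<worker_id>` is a placeholder for the ID shown by `clearml-agent list`:
```python
# Hedged sketch: drop a stale worker entry server-side.
# Assumes the server exposes workers.get_all / workers.unregister;
# "<worker_id>" is a placeholder for the ID from `clearml-agent list`.
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    print(worker.id)                             # list registered workers
client.workers.unregister(worker="<worker_id>")  # remove the ghost entry
```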
Hi SmugDolphin23
Do you have a timeline for fixing this? https://clearml.slack.com/archives/CTK20V944/p1661260956007059?thread_ts=1661256295.774349&cid=CTK20V944
Here is the screenshot - we deleted all the workers, except for the one that we couldn't.
```
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal
```
Updated the clearml.conf with an empty worker_id/name, ran `clearml-agent daemon --stop`, then `top | grep clearml` and killed the PIDs, then ran `clearml-agent list` - still both of the workers are listed.
Hi SuccessfulKoala55
Thx again for your help
In the case of Google Colab, the values can be provided as environment variables.
We still need to run the code in a colab environment (or remote client)
Do you have any example for setting the environment variables?
For a general environment variable there is an example: `export MPLBACKEND=TkAgg`
But what would it be for the clearml.conf?
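For reference, a minimal sketch of setting the standard ClearML environment variables in a Colab cell before initializing clearml; the key values are placeholders to copy from the web UI:
```python
# Hedged sketch: configure clearml via environment variables in Colab.
# The hosts below are the defaults for the hosted service; the keys are
# placeholders - copy real ones from the ClearML web UI.
import os

os.environ["CLEARML_API_HOST"] = "https://api.clear.ml"
os.environ["CLEARML_WEB_HOST"] = "https://app.clear.ml"
os.environ["CLEARML_FILES_HOST"] = "https://files.clear.ml"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"

from clearml import Task

task = Task.init(project_name="examples", task_name="colab demo")
```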
For retrieving we can use `config_obj.get('sdk.google')`, but how would the setting work? We did ...
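As a sketch, assuming `config_obj` behaves like the pyhocon `ConfigTree` that backs clearml.conf - `put()` only changes the in-memory tree, persistent values belong in clearml.conf itself:
```python
# Hedged sketch: get/put on a pyhocon ConfigTree, the format behind
# clearml.conf. put() only modifies the in-memory configuration.
from pyhocon import ConfigFactory

conf = ConfigFactory.parse_string('sdk { google.storage { project: "dev" } }')
print(conf.get("sdk.google.storage.project"))    # -> dev
conf.put("sdk.google.storage.project", "prod")   # in-memory override only
print(conf.get("sdk.google.storage.project"))    # -> prod
```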
Sorry - I'm a Helm newbie.
When running `helm search repo clearml --versions`
I can't see version 3.6.2 - the highest is 3.5.0.
This is the repo that we used to get the helm chart: `helm repo add allegroai`
What am I missing?
Hi,
Thx for your response,
Yes - we are using the above repo.
We would like to have easy/cheaper access to the artifacts etc. that the experiments will output.
I'm checking the possibility that our firewall is blocking traffic between the clearml-agent machine and the local computer running the session.
BTW - is CLEARML_HOST_IP relevant for the clearml-agent?
I can see that we can create a worker with these environment variables, e.g. `CLEARML_WORKER_NAME=MY-WORKER CLEARML_WORKER_ID=MY-WORKER:0 CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --detached`
My mistake - it doesn't use it to create a dedicated IP.
Thx for your reply
Great - Thx TimelyPenguin76 for your input
not sure I understand
Running `clearml-agent list`
I get
```
workers:
- company:
    id: d1bd92...1e52b
    name: clearml
  id: clearml-server-...wdh:0
  ip: x.x.x.x
...
```
Well, it seems that we have a similar issue: https://github.com/allegroai/clearml-agent/issues/86
Currently we are just creating a new worker on a separate queue.
Is the current relationship only available via the `_get_parents()` method?
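For reference, a minimal sketch assuming this is about task lineage: the public `Task.parent` property returns the parent task's ID, so the private `_get_parents()` helper is not strictly needed ("TASK_ID" is a placeholder):
```python
# Hedged sketch: walk task lineage via the public Task.parent property.
# "TASK_ID" is a placeholder for a real task ID.
from clearml import Task

task = Task.get_task(task_id="TASK_ID")
if task.parent:                                  # parent task ID, or None
    parent = Task.get_task(task_id=task.parent)
    print(parent.name)
```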
yes - the agent is running with --docker
Great - where do I define the volume mount?
Should I build a base image that runs on the server and then use it as the base image in the container?
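One hedged option, assuming the goal is mounting a host directory into the container the agent starts: `Task.set_base_docker` accepts extra docker arguments per task (image and paths below are placeholders):
```python
# Hedged sketch: request a volume mount for the agent-run container by
# passing extra docker arguments on the task. Image/paths are placeholders.
from clearml import Task

task = Task.init(project_name="examples", task_name="docker mount demo")
task.set_base_docker(
    docker_image="python:3.9",
    docker_arguments="-v /host/data:/data",
)
```
A default mount for all tasks can also be set on the agent's side, in the `agent.default_docker.arguments` section of its clearml.conf.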
using the helm charts
https://github.com/allegroai/clearml-helm-charts
This does not work, since all the files are stored as a single ZIP file (which, if unzipped, contains all the data), but we would like to have access to the raw files in their original format.
I found the task in the UI, and in the UNCOMMITTED CHANGES section of the execution tab it says No changes logged.
Any other suggestions?
We have assets in a GCP bucket.
The dataset is created and then the assets are linked to the dataset via the `add_external_files` method.
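For reference, a minimal sketch of this flow with placeholder names; `add_external_files` registers links to the bucket objects rather than uploading copies, so the raw files stay in the bucket in their original format:
```python
# Hedged sketch: link GCS assets into a dataset without copying them.
# Project/dataset/bucket names are placeholders.
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="gcs-assets")
ds.add_external_files(source_url="gs://my-bucket/assets/")
ds.upload()     # uploads only the dataset state, not the linked files
ds.finalize()
```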
Add the google.storage parameters to the conf settings:
```
sdk {
    google.storage {
        credentials = [
            {
                bucket: "clearml-storage"
                project: "dev"
                credentials_json: /path/to/SA/creds/user.json
            },
        ]
    }
}
```
Btw - after updating clearml.conf, do I need to restart the agent?
I can't see the additional tab under https://clearml.slack.com/archives/CTK20V944/p1658199530781499?thread_ts=1658166689.168039&cid=CTK20V944, and I reran the task and got the same error.
Well - that will convert it to a binary pickle format, not parquet.
Since the artifact will be accessed from other platforms, we want to use parquet.
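For reference, a minimal sketch of one way to keep the artifact as plain parquet: serialize the DataFrame yourself and upload the resulting file (paths/names are placeholders; `to_parquet` needs pyarrow or fastparquet installed):
```python
# Hedged sketch: upload a DataFrame as a plain parquet file instead of
# letting upload_artifact pickle it. Paths/names are placeholders.
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="parquet artifact")
df = pd.DataFrame({"a": [1, 2, 3]})
df.to_parquet("/tmp/data.parquet")           # requires pyarrow/fastparquet
task.upload_artifact(name="data", artifact_object="/tmp/data.parquet")
```
Recent clearml versions also expose an `extension_name` argument on `upload_artifact` (e.g. `extension_name=".parquet"`) that serializes DataFrames directly, if available in your version.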
Hi AgitatedDove14
OK - the issue was the firewall rules that we had.
Now both the jupyter lab and vscode servers are up.
But now there is an issue with the "Setting up connection to remote session" step.
After
```
Environment setup completed successfully
Starting Task Execution:
ClearML results page:
```
there is a WARNING:
```
clearml - WARNING - Could not retrieve remote configuration named 'SSH'...
```