I think I have a lead.
Looking at the list of workers from clearml-agent list, e.g. https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
is there a way to find the worker_name?
in the above example the worker_id is clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 but I'm not able to stop this worker using the command
clearml-agent daemon --stop
since this orphan worker has no corresponding clearml.conf
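For context, a minimal sketch (my own, not from the docs) of listing the workers the server knows about through the Python APIClient, assuming a clearml.conf with valid server credentials; the printed ids should match what clearml-agent list shows:
from clearml.backend_api.session.client import APIClient

client = APIClient()
# workers.get_all returns every worker the server currently knows about,
# including orphaned ones that no longer have a local clearml.conf
for worker in client.workers.get_all():
    print(worker.id)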
I'm looking for the bucket URI
I think my workflow needs to change:
get the data into the bucket, then create the Dataset using add_external_files, and then consume the data locally or stream it. And then I can use link_entries.
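A rough sketch of that workflow, assuming the data already sits in the bucket (the bucket URI and names below are placeholders):
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")  # placeholder names
# register objects that already live in the bucket instead of uploading them
ds.add_external_files(source_url="s3://my-bucket/path/to/data/")  # placeholder URI
ds.upload()
ds.finalize()

# later, consume the data locally
local_path = Dataset.get(dataset_project="my_project", dataset_name="my_dataset").get_local_copy()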
We reinstalled the clearml-agent:
clearml-agent --version
CLEARML-AGENT version 1.2.3
Running top | grep clearml we can see the agent running.
Running clearml-agent list we can see 2 workers.
Before running clearml-agent daemon --stop we updated the clearml.conf and set the worker_id and worker_name to the relevant name/id that we can see from clearml-agent list
and we get
Could not find a running clearml-agent instance with worker_name=<clearml_worker_na...
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
This is my current solution:
[ds for ds in dataset.list_datasets() if ds['project'].split('/')[0] == <PROJECT_NAME>]
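Expanded, the same filter looks roughly like this (the project name is a placeholder, and this assumes list_datasets() returns dicts with a 'project' field, as in the one-liner above):
from clearml import Dataset

project_name = "MY_PROJECT"  # placeholder
datasets = [
    ds for ds in Dataset.list_datasets()
    if ds["project"].split("/")[0] == project_name
]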
I'm checking the possibility that our firewall between the clearml-agent machine and the local computer running the session is the problem
Hi SuccessfulKoala55
I've run the daemon via docker:
CLEARML_WORKER_ID=XXXX clearml-agent daemon --queue MY_QUEUE --docker --detached
and then run the session via docker:
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
  --packages "clearml" "tensorflow>=2.2" "keras" \
  --queue MY_QUEUE \
  --verbose
However I'm still getting the same error
clearml_agent: ERROR: Can not run task without repository or literal script in script.diff
exactly - (that is how I used it in my initial code) - but if you have to convert it back to the original data type then something is broken...
Updated the clearml.conf with empty worker_id/name and ran
clearml-agent daemon --stop
then top | grep clearml, killed the PIDs, and ran
clearml-agent list
still both of the workers are listed
Thx for investigating - what is the use case for such behavior?
How would you use the user properties as part of an experiment?
Just for the record - I guess there is an option to use os.environ
https://github.com/allegroai/clearml/blob/ca7909f0349b255f7edca0500878a8e08f3b1c99/clearml/automation/auto_scaler.py#L152-L157
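Something along these lines is what I mean by the os.environ option (the variable names here are placeholders, not necessarily the ones the autoscaler reads):
import os

# placeholder variable names - fall back to empty strings if not set
access_key = os.environ.get("CLOUD_ACCESS_KEY", "")
secret_key = os.environ.get("CLOUD_SECRET_KEY", "")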
Are we supposed to use the "Extra Configurations" from the https://clear.ml/docs/latest/assets/images/ClearML_Server_Diagram-7ea19db8e22a7737f062cce207befe38.png ?
https://docs.google.com/drawings/d/11f-AWVmIq7P0e8bP5OnMUz0hguXm2T_Xqq7iNMA-ANA/edit?usp=sharing
I've updated the configuration and now I'm able to see sub-projects that I didn't see before.
As far as I can see, each dataset is a separate sub-project - is that correct?
Hi SuccessfulKoala55
Is this section only relevant to AWS or also to GCP?
Hi SmugDolphin23
Do you have a timeline for fixing this? https://clearml.slack.com/archives/CTK20V944/p1661260956007059?thread_ts=1661256295.774349&cid=CTK20V944
Thx - it worked!
BTW - maybe it's worthwhile to add a note in the ClearML Agent daemon documentation that whenever you update the clearml.conf you need to run
clearml-agent daemon --stop
and then recreate all the daemons with clearml-agent daemon ....
We need to convert it to a DataFrame since
Displaying metadata in the UI is only supported for pandas Dataframes for now. Skipping!
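So the conversion ends up as something like this sketch (placeholder values and names, and assuming set_metadata accepts the DataFrame directly on a dataset that is not yet finalized):
import pandas as pd
from clearml import Dataset

metadata_df = pd.DataFrame({"split": ["train", "val"], "rows": [1000, 200]})  # placeholder values

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")  # placeholder names
ds.set_metadata(metadata_df)  # passing a plain dict is presumably what triggered the "Skipping!" warning above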
upgrading to 1.12.1 didn't help
I think the issue is that when I create the dataset - I used
use_current_task=True,
If I change it to
use_current_task=False,
then it finalizes
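For reference, a minimal sketch of the difference (names and paths are placeholders):
from clearml import Dataset

ds = Dataset.create(
    dataset_name="my_dataset",      # placeholder
    dataset_project="my_project",   # placeholder
    use_current_task=False,         # with True the dataset never finalized for us
)
ds.add_files("data/")               # placeholder path
ds.upload()
ds.finalize()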
AgitatedDove14 -
I also tried https://github.com/allegroai/clearml-session
running the session within docker but got the same error
clearml-session --docker
--git-credentials
(there is a typo in the git flag: --git-credentilas -> --git-credentials)
and still got the same error
clearml_agent: ERROR: Can not run task without repository or literal script in script.diff
Thx CostlyOstrich36 for your input -
So I guess that if we try to always work with https://clear.ml/docs/latest/docs/fundamentals/hyperparameters (even if there is only 1 parameter), we will consistently log our parameters.
Do you have a suggested different workflow?
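Something like this is what I have in mind, i.e. always routing parameters through the task so they get logged consistently (names and the single parameter are placeholders):
from clearml import Task

task = Task.init(project_name="my_project", task_name="my_experiment")  # placeholder names

params = {"learning_rate": 0.001}   # placeholder - even a single parameter
params = task.connect(params)       # logged under the task's hyperparameters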
Hi,
Thx for your response,
Yes - we are using the above repo.
We would like to have easy/cheaper access to the artifacts etc. that will be output from the experiments
will do
A workaround that worked for me is to explicitly complete the task; it seems like the flush has some bug:
task = Task.get_task('...')
task.close()
task.mark_completed()
ds.is_final()
True
clearml-3.5.0