
Well, it seems that we have a similar issue: https://github.com/allegroai/clearml-agent/issues/86
We are not able to reference this orphan worker: it does not show up with ps -ef | grep clearml-agent, but it still appears with clearml-agent list, and we are not able to stop it with clearml-agent daemon --stop clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0, getting:
Could not find a running clearml-agent instance with worker_name=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 wo...
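In case it helps, here is a minimal sketch of cleaning up such an orphan worker through the ClearML Python API client, assuming the server exposes the standard workers.get_all / workers.unregister endpoints (the worker id below is the one from this thread):
```python
# Sketch only: list registered workers and unregister the stale one.
from clearml.backend_api.session.client import APIClient

client = APIClient()  # uses the credentials from ~/clearml.conf

for worker in client.workers.get_all():
    print(worker.id)

# Unregister the orphan worker by id so it no longer shows in clearml-agent list
client.workers.unregister(worker="clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0")
```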
Thx CostlyOstrich36 for your reply.
I can't see the reference to parquet. We are currently using the above functionality, but the pd.DataFrame is only saved as csv compressed with gz.
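For reference, a minimal sketch of what we are doing, with hypothetical project/task names and paths: passing the DataFrame object directly is what produces the csv.gz, while writing parquet ourselves and uploading the file path keeps the format:
```python
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="df-artifact")  # hypothetical names

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Passing the DataFrame object lets ClearML serialize it (stored as compressed csv)
task.upload_artifact(name="table_csv", artifact_object=df)

# Uploading a locally written parquet file keeps it byte-for-byte as parquet
df.to_parquet("/tmp/table.parquet")  # hypothetical path
task.upload_artifact(name="table_parquet", artifact_object="/tmp/table.parquet")
```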
SmugDolphin23 Where can I check the latest RC? I was not able to find it in the clearml GitHub repo.
Thx CostlyOstrich36 for your input.
So I guess that if we always work with https://clear.ml/docs/latest/docs/fundamentals/hyperparameters (even if there is only 1 parameter), we will consistently log our parameters.
Do you have a suggested different workflow?
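For example, a minimal sketch of that workflow with a hypothetical single parameter; task.connect() logs the dict under the task's hyperparameters and returns any values overridden from the UI or a remote run:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="hparam-logging")  # hypothetical names

# Even a single parameter goes through connect(), so it is always logged
params = {"learning_rate": 0.01}
params = task.connect(params)

print(params["learning_rate"])
```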
Hi,
You may want to consider doing the visualization while creating the Dataset - see https://github.com/thepycoder/asteroid_example/blob/main/get_data.py#L34 , which logs the head() of the dataframe.
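Something along these lines (a sketch with hypothetical project/file names), attaching the head() of the dataframe as a table on the dataset task, similar to the linked example:
```python
import pandas as pd
from clearml import Dataset

dataset = Dataset.create(dataset_project="examples", dataset_name="asteroids")  # hypothetical names
df = pd.read_csv("data/train.csv")  # hypothetical file

dataset.add_files("data/train.csv")
# Log a preview table so it is visible on the dataset's task in the UI
dataset.get_logger().report_table(title="head", series="head", table_plot=df.head())

dataset.upload()
dataset.finalize()
```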
Hi SuccessfulKoala55
Is this section only relevant to AWS or also to GCP?
Possibly - thinking more of https://github.com/pytorch/data/blob/main/examples/vision/caltech256.py - using a ClearML dataset as the root path.
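i.e. something like this sketch, assuming torchvision's ImageFolder and an image-folder layout (project/dataset names are hypothetical):
```python
from clearml import Dataset
from torchvision.datasets import ImageFolder

# Fetch (or reuse the cached copy of) the dataset and use its local path as root
local_root = Dataset.get(dataset_project="examples", dataset_name="caltech256").get_local_copy()
train_set = ImageFolder(root=local_root)
```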
Updated the clearml.conf with an empty worker_id/name, ran clearml-agent daemon --stop, then top | grep clearml, killed the PIDs, and ran clearml-agent list - still both of the workers are listed.
I found the task in the UI, and in the UNCOMMITTED CHANGES execution section there is:
No changes logged
Any other suggestions?
Exactly - (that is how I used it in my initial code) - but if you have to convert it back to the original data type, then something is broken...
Hi SweetBadger76,
Further investigation showed that the worker was created with a dedicated CLEARML_HOST_IP, so running clearml-agent daemon --stop didn't kill it (but it did appear in the clearml-agent list). But once we added the CLEARML_HOST_IP:
CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --stop
it finally killed it.
Hi AgitatedDove14,
OK - the issue was the firewall rules that we had. Now both the Jupyter Lab and VS Code servers are up. But now there is an issue with the "Setting up connection to remote session" step. After:
Environment setup completed successfully
Starting Task Execution:
ClearML results page:
there is a WARNING:
clearml - WARNING - Could not retrieve remote configuration named 'SSH'...
shape -> tuple([int], [int])
I decided to use
._task.upload_artifact(name='metadata', artifact_object=metadata)
where metadata is a dict:
metadata = {**metadata, **{"name": f"{Path(file_tmp_path).name}", "shape": f"{df.shape}"}}
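Roughly like this sketch (file_tmp_path and df come from earlier in our pipeline; project/task names and the path are hypothetical). The f-string turns df.shape into a string, which is why it has to be converted back on retrieval; keeping the raw tuple would help, though dict artifacts are serialized, so a tuple may still come back as a list:
```python
from pathlib import Path
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="metadata-artifact")  # hypothetical names

file_tmp_path = "/tmp/data.csv.gz"  # hypothetical path
df = pd.read_csv(file_tmp_path)

metadata = {
    "name": Path(file_tmp_path).name,
    "shape": f"{df.shape}",  # stored as the string "(100, 4)", not a tuple
}
task.upload_artifact(name="metadata", artifact_object=metadata)
```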
I'm checking the possibility that it's our firewall between the clearml-agent machine and the local computer running the session.
Still trying to understand what this default worker is.
I've removed clearml.conf and reinstalled clearml-agent; then running clearml-agent list gets the following error:
Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing ~/clearml.conf or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own clearml-server, or create a free account at and run clearml-agent init
Then returning the
...
BTW - is the CLEARML_HOST_IP relevant for the clearml-agent?
I can see that we can create a worker with this environment variable, e.g.
CLEARML_WORKER_NAME=MY-WORKER CLEARML_WORKER_ID=MY-WORKER:0 CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --detached
My mistake - it doesn't use it to create a dedicated IP.
Are you running the clearml-session from your machine? (i.e. not from inside a docker)?
Correct - running it locally, not inside docker. Should I try to run it within a docker?
Can you send the full clearml-session console output?
See above.
Hi SuccessfulKoala55,
I've run the daemon via docker:
CLEARML_WORKER_ID=XXXX clearml-agent daemon --queue MY_QUEUE --docker --detached
and then run the session via docker:
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --packages "clearml" "tensorflow>=2.2" "keras" --queue MY_QUEUE --verbose
However, I'm still getting the same error:
clearml_agent: ERROR: Can not run task without repository or literal script in script.diff
upgrading to 1.12.1 didn't help
I think the issue is that when I create the dataset I used use_current_task=True. If I change it to use_current_task=False, then it finalizes.
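For reference, a sketch of the working version, with hypothetical names and paths; with use_current_task=False the dataset gets its own task instead of binding to the currently running one:
```python
from clearml import Dataset

dataset = Dataset.create(
    dataset_project="examples",   # hypothetical names
    dataset_name="my-dataset",
    use_current_task=False,       # a separate dataset task is created
)
dataset.add_files("data/")        # hypothetical path
dataset.upload()
dataset.finalize()
```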
Hi SweetBadger76 - am I misunderstanding how this test worker runs?