ShinyRabbit94

6 Questions, 34 Answers

Active since 10 January 2023

Last activity 8 months ago

Reputation

Badges 1

34 × Eureka!

0 Votes

16 Answers

959 Views

Hello ! When Running

Hello ! When running Dataset.get the wrong file_server api is being used. This is the content of my clearml.conf api { # Notice: 'host' is the api server (de...

clearml

3 years ago

0 Votes

4 Answers

1K Views

Hello, I Have A Server With Several Gpus That I Wish To Use To Automatically Train Models. Clearml Seems Like The Perfect Tool For My Use Case But I Am Confused As To How I Can Communicate With The Agent Daemon Running On The Gpu Server. Do I Necessarily

Hello, I have a server with several GPUs that I wish to use to automatically train models. ClearML seems like the perfect tool for my use case but I am confu...

mlops

3 years ago

0 Votes

16 Answers

1K Views

Hello, I Am Trying To Run The

Hello, I am trying to run the clearml-agent in docker mode. I use this command to start it : sudo clearml-agent daemon --cpu-only --queue training_queue --do...

clearml

3 years ago

0 Votes

7 Answers

1K Views

Hello! The Agent-Services Present In Clearml Server'S Docker-Compose Is Only For Cleanup Tasks, Right ? For Training I Would Need To Run Another Instance Of Clearml-Agent Alongside The Docker-Compose ?

Hello! The agent-services present in ClearML server's docker-compose is only for cleanup tasks, right ? For training I would need to run another instance of ...

clearml

3 years ago

0 Votes

10 Answers

1K Views

Hello, I Am Running My Own Instance Of The Clearml-Server. All Works As Expected, But Sometimes My Training Tasks Get Stuck For 40+ Minutes (While Usually Taking About 5 Minutes) With The Following Log :

Hello, I am running my own instance of the clearml-server. All works as expected, but sometimes my training tasks get stuck for 40+ minutes (while usually ta...

clearml

2 years ago

0 Votes

21 Answers

1K Views

Hello! When I Delete Tasks, Models Or Datasets From The Fileserver'S Ui, The Associated Artifacts (In

Hello! When I delete Tasks, Models or Datasets from the fileserver's UI, the associated artifacts (in /opt/clearml/data/fileserver ) are not deleted. Any ide...

clearml

2 years ago

0 Hello, I Am Trying To Run The

I noticed the logs start as follows:
/usr/bin/python3.9
/usr/bin/python3.9: No module named pip
/usr/local/bin/python3.8

3 years ago

0 Hello, I Am Trying To Run The

even though when starting the worker I see this:
agent.python_binary = /opt/venv/bin/python3

3 years ago

0 Hello, I Am Trying To Run The

(in the logs)

3 years ago

0 Hello! The Agent-Services Present In Clearml Server'S Docker-Compose Is Only For Cleanup Tasks, Right ? For Training I Would Need To Run Another Instance Of Clearml-Agent Alongside The Docker-Compose ?

Maybe it is some sort of misunderstanding on my side? I thought
Task.enqueue(task, queue_name="training_queue")
is what starts the execution of the task. Do I need another function?

3 years ago

Thank you! Is there a way to test the agent on a machine without GPU ?
When running this little script, I can see my agent installing the requirements, but it does not seem to ever start running the task.
task = Task.create(project_name="train", task_name="train", requirements_file="./requirements.txt", repo="")
task.set_script(entry_point="./test.py")
Task.enqueue(task, queue_name="training_queue")
The logs are as follows:
` Starting Task ...

3 years ago

It seems the agent does not like working with scripts located inside a git repository. I moved the requirements and the script into a folder without a .git and it works now, thank you!

3 years ago

0 Hello, I Am Running My Own Instance Of The Clearml-Server. All Works As Expected, But Sometimes My Training Tasks Get Stuck For 40+ Minutes (While Usually Taking About 5 Minutes) With The Following Log :

Okay, thanks! I'll try this; if it does not work, I'll just deactivate the automatic detection feature.

2 years ago

Is there a way to make it synchronous ?

2 years ago

I don't really know. I just detected it automatically from the start, so I haven't looked into it yet.

2 years ago

0 Hello ! When Running

I am very confused now. I tried switching to my local machine and changing the clearml.conf.
It only partly worked:
Dataset.list_datasets() returns the correct list (from the remote server).
But Dataset.get(dataset_id="ce2abe847e004ac282cc435bfa9c4bd5")
gives me :
2021-12-20 13:46:39,404 - clearml.storage - ERROR - Could not download ` , err: Failed getting object localhost:8081/annotation_dataset/annotation.ce2abe847e004ac282cc435bfa9c4bd5/artifacts/state/state.json (404): <!DO...

3 years ago

0 Hello, I Am Trying To Run The

I tried to fix the python binary in the config as well:
agent.python_binary = /opt/venv/bin/python3
where /opt/venv/bin/python3 is the output of which python run inside a docker container using my image.
In the clearml-agent logs I see this:
/root/.clearml/venvs-builds/3.8/bin/python -u /root/.clearml/venvs-builds/3.8/code/train.py
So I don't know if it's using the same python version or not.
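One way to settle which interpreter actually runs the script is to print it from inside the script itself. A minimal sketch using only the standard library; the helper name interpreter_report is hypothetical and is not part of clearml or clearml-agent:

```python
import platform
import sys


def interpreter_report():
    """Return the path, version and prefix of the interpreter running this process."""
    return {
        "executable": sys.executable,          # full path of the running python binary
        "version": platform.python_version(),  # version string of that interpreter
        "prefix": sys.prefix,                  # root of the venv or installation in use
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

Placed at the top of train.py, the printed path can be compared with the agent.python_binary value shown when the worker starts.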

3 years ago

0 Hello, I Am Trying To Run The

and agent.python_binary is empty.

3 years ago

0 Hello, I Have A Server With Several Gpus That I Wish To Use To Automatically Train Models. Clearml Seems Like The Perfect Tool For My Use Case But I Am Confused As To How I Can Communicate With The Agent Daemon Running On The Gpu Server. Do I Necessarily

Okay, thanks!

3 years ago

0 Hello, I Am Trying To Run The

yes, it's set to true in the logs as well

3 years ago

Okay, thank you!
If I plan to use S3 for external file storage, do I still need Elasticsearch and Mongo?

3 years ago

0 Hello ! When Running

I did, I copy pasted the config from within the docker

3 years ago

0 Hello ! When Running

here is the command I am using :
sudo docker run -it -v /home/ubuntu/app/:/app/ -v /home/ubuntu/folder/clearml.conf:/root/clearml.conf --network "clearml_backend" my_image bash

3 years ago

0 Hello ! When Running

Ah! That's it, thank you very much! I did not know this was an issue. I thought the dataset was only linked to the fileserver and not to the specific URL used to upload it.
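To illustrate the point: the failing URL reported earlier in this thread embeds the host that was used at upload time (localhost:8081), so the path alone does not identify where the file lives. A small standard-library sketch, assuming an http scheme (the log omits it) and a hypothetical host name files.example.internal; rehost is an illustrative helper, not a ClearML function, and it does not change what the server has stored:

```python
from urllib.parse import urlsplit, urlunsplit


def rehost(url: str, new_netloc: str) -> str:
    """Return url with its host:port replaced by new_netloc; the path is unchanged."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=new_netloc))


# URL as registered at upload time (scheme assumed, path taken from the error message).
registered = (
    "http://localhost:8081/annotation_dataset/"
    "annotation.ce2abe847e004ac282cc435bfa9c4bd5/artifacts/state/state.json"
)

# The host baked into the registered URL.
print(urlsplit(registered).netloc)

# The same object path addressed through a different (hypothetical) host.
print(rehost(registered, "files.example.internal:8081"))
```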

3 years ago

0 Hello ! When Running

I was looking at the code of the Dataset class, but I could not find where the files_server is retrieved.

3 years ago

0 Hello, I Am Trying To Run The

Python 3.8.12

3 years ago

0 Hello ! When Running

of course, I am checking using the env command

3 years ago

0 Hello, I Am Trying To Run The

Sorry for the late reply. It is indeed a venv. I thought it would not be an issue since the PYTHONPATH and the PATH are both set to prioritize the venv. I'll try to create a more classic image.

3 years ago

0 Hello ! When Running

What is the proper way to change a clearml.conf ?

3 years ago

The fileserver is remote, but the bandwidth is not an issue.
Is the automatic artifact storage of clearml async ? (meaning even if the task is finished it could still be uploading associated artifacts ?)

2 years ago

0 Hello, I Am Trying To Run The

The logs continue like this :
` Summary - installed python packages:
pip:

attrs==20.3.0
backports.entry-points-selectable==1.1.1
certifi==2021.10.8
chardet==4.0.0
clearml==1.1.4
Cython==0.29.26
distlib==0.3.4
filelock==3.4.0
furl==2.1.3
future==0.18.2
idna==2.10
jsonschema==3.2.0
numpy==1.21.5
orderedmultidict==1.0.1
pathlib2==2.3.6
Pillow==8.4.0
platformdirs==2.4.0
psutil==5.8.0
pyhocon==0.3.59
PyJWT==2.0.1
pyparsing==2.4.7
pyrsistent==0.18.0
pyt...

3 years ago

0 Hello ! When Running

there is nothing in the env

3 years ago

0 Hello, I Am Trying To Run The

If it helps, I tried changing the python version to 3.9 (which is also installed in my image). The change is reflected in the agent's config (the lines that appear when starting the worker) but it's still using 3.8 when executing the script.

3 years ago

0 Hello! When I Delete Tasks, Models Or Datasets From The Fileserver'S Ui, The Associated Artifacts (In

Thanks! Version: 1.1.1-135 • 1.1.1 • 2.14

2 years ago

0 Hello! When I Delete Tasks, Models Or Datasets From The Fileserver'S Ui, The Associated Artifacts (In

CostlyOstrich36 Yes, I am getting the exact same error as Malcolm (thanks for the link!) except I can see the URLs of my artifacts instead of undefined.
SuccessfulKoala55 I am running a self-hosted server. I installed it about 3 months ago, so I would assume my current version is v1.1.1. How can I check for sure?

2 years ago

0 Hello! When I Delete Tasks, Models Or Datasets From The Fileserver'S Ui, The Associated Artifacts (In

I updated my clearml-server, but the issue is still present

2 years ago
