I also tried to set the Python binary in the config: `agent.python_binary = /opt/venv/bin/python3`, where `/opt/venv/bin/python3` is the output of `which python` run inside a Docker container using my image.
In the `clearml-agent` logs I see this: `/root/.clearml/venvs-builds/3.8/bin/python -u /root/.clearml/venvs-builds/3.8/code/train.py`
So I don't know if it's using the same Python version or not.
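One way to settle which interpreter actually runs the task is to log it from inside the script itself. A minimal sketch using only the standard library; the function name `describe_interpreter` is hypothetical, and the idea is to call it at the top of `train.py` so the result shows up in the task logs:

```python
import sys


def describe_interpreter() -> dict:
    """Return the path and version of the interpreter running this code."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print(f"python binary: {info['executable']}")
    print(f"python version: {info['version']}")
```

If the printed binary is under `/root/.clearml/venvs-builds/`, the agent built its own environment instead of running `/opt/venv/bin/python3` directly.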
I am very confused now. I tried switching to my local machine and changing the clearml.conf.
It only partly worked: `Dataset.list_datasets()` returns the correct list (from the remote server).
But `Dataset.get(dataset_id="ce2abe847e004ac282cc435bfa9c4bd5")` gives me:
2021-12-20 13:46:39,404 - clearml.storage - ERROR - Could not download
` , err: Failed getting object localhost:8081/annotation_dataset/annotation.ce2abe847e004ac282cc435bfa9c4bd5/artifacts/state/state.json (404): <!DO...
Ah! That's it, thank you very much! I did not know this was an issue. I thought the dataset was only linked to the fileserver and not to the specific URL used to upload it.
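To illustrate the problem: the dataset keeps the URL it was uploaded with, so a `localhost` address only resolves on the machine that hosts the fileserver. A small sketch for spotting such URLs before trying to download from another machine; `is_local_url` and the example host names are hypothetical and not part of ClearML:

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_url(url: str) -> bool:
    """Return True if the URL's host only resolves on the local machine."""
    # urlparse only fills in the host when a scheme is present,
    # so add a default scheme to bare "host:port/path" strings.
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname
    return host in LOCAL_HOSTS


if __name__ == "__main__":
    print(is_local_url("localhost:8081/some_project/state.json"))
    print(is_local_url("http://files.example.com:8081/some_project/state.json"))
```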
I was looking at the code of the `Dataset` class, but I could not find where the `files_server` is retrieved.
Here is the command I am using: `sudo docker run -it -v /home/ubuntu/app/:/app/ -v /home/ubuntu/folder/clearml.conf:/root/clearml.conf --network "clearml_backend" my_image bash`
Thank you! Is there a way to test the agent on a machine without a GPU?
When running this little script, I can see my agent installing the requirements, but it never seems to start running the task:
    from clearml import Task

    task = Task.create(
        project_name="train",
        task_name="train",
        requirements_file="./requirements.txt",
        repo="",
    )
    task.set_script(entry_point="./test.py")
    Task.enqueue(task, queue_name="training_queue")
The logs are as follows:
` Starting Task ...
It seems the agent does not like working with scripts located inside a git repository. I moved the requirements and the script to a folder without a `.git`, and it works now, thank you!
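To check in advance whether a script sits inside a git repository, one can walk up its parent directories looking for `.git`. A minimal sketch with the standard library; `find_git_root` is a hypothetical helper and says nothing about how the agent itself detects repositories:

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: str) -> Optional[Path]:
    """Return the closest parent directory containing `.git`, or None."""
    current = Path(start).resolve()
    # Start the search from the containing folder when given a file.
    if current.is_file():
        current = current.parent
    for directory in (current, *current.parents):
        if (directory / ".git").exists():
            return directory
    return None


if __name__ == "__main__":
    root = find_git_root("./test.py")
    print(f"git repository root: {root}" if root else "not inside a git repository")
```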
Maybe it is a misunderstanding on my side? I thought `Task.enqueue(task, queue_name="training_queue")` is what starts the execution of the task. Do I need another function?
The fileserver is remote, but the bandwidth is not an issue.
Is the automatic artifact storage of ClearML asynchronous? (Meaning that even if the task is finished, it could still be uploading associated artifacts?)
I don't really know. I just detected it automatically from the start, so I haven't looked into it yet.
Okay, thanks! I'll try this; if it does not work, I'll just deactivate the automatic detection feature.
Is there a way to make it synchronous?
Sorry for the late reply. It is indeed a venv. I thought it would not be an issue since the `PYTHONPATH` and the `PATH` are both set to prioritize the venv. I'll try to create a more classic image.
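A quick check that can be run inside the image to confirm that the interpreter found first on the `PATH` really is the venv's one. This is only a sketch, assuming a standard `venv`/`virtualenv` layout; `in_virtualenv` is a hypothetical name:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the venv while sys.base_prefix
    # points at the Python installation it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print(f"prefix: {sys.prefix}")
    print(f"running inside a venv: {in_virtualenv()}")
```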
I noticed the logs start as follows: `/usr/bin/python3.9 /usr/bin/python3.9: No module named pip /usr/local/bin/python3.8`
even though when starting the worker I see this: `agent.python_binary = /opt/venv/bin/python3`
The logs continue like this:
` Summary - installed python packages:
pip:
- attrs==20.3.0
- backports.entry-points-selectable==1.1.1
- certifi==2021.10.8
- chardet==4.0.0
- clearml==1.1.4
- Cython==0.29.26
- distlib==0.3.4
- filelock==3.4.0
- furl==2.1.3
- future==0.18.2
- idna==2.10
- jsonschema==3.2.0
- numpy==1.21.5
- orderedmultidict==1.0.1
- pathlib2==2.3.6
- Pillow==8.4.0
- platformdirs==2.4.0
- psutil==5.8.0
- pyhocon==0.3.59
- PyJWT==2.0.1
- pyparsing==2.4.7
- pyrsistent==0.18.0
- pyt...
Okay, thank you!
If I plan to use S3 for external file storage, do I still need Elasticsearch and Mongo?
CostlyOstrich36 Yes, I am getting the exact same error as Malcolm (thanks for the link!) except I can see the URLs of my artifacts instead of `undefined`.
SuccessfulKoala55 I am running a self-hosted server. I installed it about 3 months ago, so I would assume my current version is `v1.1.1`. How can I check for sure?
I updated my clearml-server, but the issue is still present.
Thanks! Version: 1.1.1-135 • 1.1.1 • 2.14
For example, to create a dataset, I use this:
    from clearml import Dataset

    ds = Dataset.create(dataset_project='XX', dataset_name='XX')
    ds.add_files(path='/tmp/tmpbk2g6c3h')
    ds.upload()
    ds.finalize()
I can provide a screenshot, but I'd need to hide the URLs 😅 and if I do so it would look just like Malcolm's screenshot.
The URLs are correct, I can use them to download the dataset zip.