If it helps, I tried changing the Python version to 3.9 (which is also installed in my image). The change is reflected in the agent's config (the lines that appear when starting the worker), but it's still using 3.8 when executing the script.
It seems the agent does not like working with scripts located inside a git repository. I moved the requirements and the script to a folder without a .git and it works now, thank you!
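For anyone hitting the same thing, a quick way to check whether a script sits inside a git working tree is sketched below. This is only a standard-library check based on looking for a `.git` entry in the parent folders; `find_git_root` is a name made up for this example, and it is not a claim about how the agent itself detects repositories.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: str) -> Optional[Path]:
    """Return the closest directory at or above `start` that contains a .git entry."""
    current = Path(start).resolve()
    if current.is_file():
        # Start the search from the folder that holds the script
        current = current.parent
    for candidate in (current, *current.parents):
        # .git is a directory in a normal clone and a file in worktrees/submodules
        if (candidate / ".git").exists():
            return candidate
    return None
```

Calling `find_git_root("./test.py")` returns the repository root if there is one, and `None` if the script is outside any git working tree.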
I tried to set the Python binary in the config as well: `agent.python_binary = /opt/venv/bin/python3`, where `/opt/venv/bin/python3` is the output of `which python` run inside a Docker container using my image.
In the clearml-agent logs I see this: `/root/.clearml/venvs-builds/3.8/bin/python -u /root/.clearml/venvs-builds/3.8/code/train.py`, so I don't know if it's using the same Python version or not.
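One way to remove the guesswork would be to have the script itself report which interpreter is running it. The sketch below uses only the standard library; `interpreter_info` is a hypothetical helper name, and the idea is simply to put it at the top of the training script so the answer shows up in the task logs.

```python
import platform
import sys


def interpreter_info() -> dict:
    """Collect details about the interpreter running this script."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        "prefix": sys.prefix,
        # prefix and base_prefix differ when running inside a venv
        "in_venv": sys.prefix != sys.base_prefix,
    }


print(interpreter_info())
```

If the printed `executable` points into a freshly built venv rather than `/opt/venv/bin/python3`, the script is not running on the interpreter set in the config.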
Sorry for the late reply. It is indeed a venv. I thought it would not be an issue since the PYTHONPATH and the PATH are both set to prioritize the venv. I'll try to create a more classic image.
Thank you! Is there a way to test the agent on a machine without a GPU?
When running this little script, I can see my agent installing the requirements, but it never seems to start running the task.
    from clearml import Task
    task = Task.create(
        project_name="train",
        task_name="train",
        requirements_file="./requirements.txt",
        repo="",
    )
    task.set_script(entry_point="./test.py")
    Task.enqueue(task, queue_name="training_queue")
The logs are as follows:
Starting Task ...
The logs continue like this:
Summary - installed python packages:
pip:
- attrs==20.3.0
- backports.entry-points-selectable==1.1.1
- certifi==2021.10.8
- chardet==4.0.0
- clearml==1.1.4
- Cython==0.29.26
- distlib==0.3.4
- filelock==3.4.0
- furl==2.1.3
- future==0.18.2
- idna==2.10
- jsonschema==3.2.0
- numpy==1.21.5
- orderedmultidict==1.0.1
- pathlib2==2.3.6
- Pillow==8.4.0
- platformdirs==2.4.0
- psutil==5.8.0
- pyhocon==0.3.59
- PyJWT==2.0.1
- pyparsing==2.4.7
- pyrsistent==0.18.0
- pyt...
Thanks! Version: 1.1.1-135 • 1.1.1 • 2.14
I updated my clearml-server, but the issue is still present.
Is there a way to make it synchronous?
Okay, thanks! I'll try this; if it does not work, I'll just deactivate the automatic detection feature.
I can provide a screenshot, but I'd need to hide the URLs 😅 and if I do so it would look just like Malcolm's screenshot.
Here is the command I am using:
    sudo docker run -it -v /home/ubuntu/app/:/app/ -v /home/ubuntu/folder/clearml.conf:/root/clearml.conf --network "clearml_backend" my_image bash
Ah! That's it, thank you very much! I did not know this was an issue. I thought the dataset was only linked to the fileserver and not to the specific URL used to upload it.
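To spot this kind of mismatch early, one could compare the URL a file was registered under with the address the current machine uses to reach the fileserver. The sketch below is plain `urllib`, the URLs in the usage note are made up, and `same_endpoint` is a hypothetical helper; it says nothing about how ClearML itself resolves dataset URLs.

```python
from urllib.parse import urlsplit


def same_endpoint(url_a: str, url_b: str) -> bool:
    """Return True if both URLs share the same scheme, host and port."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)
```

For example, `same_endpoint("http://localhost:8081/proj/ds.zip", "http://fileserver:8081")` is `False`: both addresses may reach the same server, but a URL stored with one host will not match the other.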
I noticed the logs start as follows:
    /usr/bin/python3.9
    /usr/bin/python3.9: No module named pip
    /usr/local/bin/python3.8
even though when starting the worker I see this: `agent.python_binary = /opt/venv/bin/python3`
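The `No module named pip` line is what `python -m pip` prints when the interpreter has no pip installed. A small standard-library sketch to check each candidate interpreter in the image for pip follows; `has_pip` is a name invented here, and the paths in the usage note are simply the ones from the logs above.

```python
import subprocess


def has_pip(python_binary: str) -> bool:
    """Return True if `<python_binary> -m pip --version` exits successfully."""
    try:
        result = subprocess.run(
            [python_binary, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary is missing, not executable, or did not return in time
        return False
    return result.returncode == 0
```

Running it inside the container for `/usr/bin/python3.9`, `/usr/local/bin/python3.8` and `/opt/venv/bin/python3` shows which of them pip can actually be run from.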
The fileserver is remote, but the bandwidth is not an issue.
Is the automatic artifact storage of ClearML async? (Meaning that even if the task is finished, it could still be uploading the associated artifacts?)
The URLs are correct, I can use them to download the dataset zip.
For example, to create a dataset, I use this:
    from clearml import Dataset
    ds = Dataset.create(dataset_project='XX', dataset_name='XX')
    ds.add_files(path='/tmp/tmpbk2g6c3h')
    ds.upload()
    ds.finalize()
Maybe it is some sort of misunderstanding on my side? I thought `Task.enqueue(task, queue_name="training_queue")` is what starts the execution of the task. Do I need another function?
I was looking at the code of the Dataset class, but I could not find where the files_server is retrieved.
I don't really know. I just detected it automatically from the start, so I haven't looked into it yet.
