Hi SubstantialElk6, can you update your ClearML agent to the latest version (0.17.2rc4) with `pip install clearml-agent==0.17.2rc4` and try again?
After the task is cloned, it is in a draft state. In this state every field is editable, so you can just double-click the BASE DOCKER IMAGE section and change it to your image. If you just delete the value from this section, the ClearML agent will use the docker image you configured in the clearml.conf file (dockerrepo/mydocker:custom).
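To make that precedence concrete, here is a toy model of the rule just described: a non-empty BASE DOCKER IMAGE on the task wins, and an empty one falls back to the clearml.conf default. The function name and the `CONF_DEFAULT_IMAGE` constant are hypothetical illustrations, not the agent's actual source code.

```python
from typing import Optional

# Hypothetical stand-in for agent.default_docker.image in clearml.conf
CONF_DEFAULT_IMAGE = "dockerrepo/mydocker:custom"


def effective_docker_image(task_base_docker: Optional[str],
                           conf_default: str = CONF_DEFAULT_IMAGE) -> str:
    """Toy model of the precedence rule: a non-empty BASE DOCKER IMAGE on the
    task wins; an empty (deleted) value falls back to the clearml.conf default."""
    if task_base_docker and task_base_docker.strip():
        return task_base_docker.strip()
    return conf_default


print(effective_docker_image(""))  # dockerrepo/mydocker:custom
print(effective_docker_image("nvidia/cuda:10.1-runtime-ubuntu18.04"))  # nvidia/cuda:10.1-runtime-ubuntu18.04
```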
Note that the latest code has been pushed to the GitLab repo.
And how do I pass in new env parameters?
If you don't set a value for BASE DOCKER IMAGE in the task, the agent uses the default; if you do set BASE DOCKER IMAGE, add the env vars to it too:
dockerrepo/mydocker:custom --env GIT_SSL_NO_VERIFY=true
where `task` is the value returned from your `Task.init` call:
task = Task.init(project_name="<YOUR PROJECT NAME>", task_name="<YOUR TASK NAME>")
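Put together, the `Task.init` call and `task.set_base_docker(...)` from this thread could look like the sketch below. The wrapper function name is hypothetical; `Task.init` and `set_base_docker` are the calls quoted in the thread. `clearml` is imported inside the function only so the sketch can be loaded without the package installed.

```python
def register_with_docker(project_name: str, task_name: str, base_docker: str):
    """Create the task for this run and pin its BASE DOCKER IMAGE."""
    # Imported inside the function so the sketch can be loaded without clearml installed
    from clearml import Task

    task = Task.init(project_name=project_name, task_name=task_name)
    # Same string you would type into BASE DOCKER IMAGE in the UI:
    # the image, optionally followed by docker arguments such as --env
    task.set_base_docker(base_docker)
    return task
```

Usage, with the values from this thread: `register_with_docker("MNIST", "Pytorch Standard", "dockerrepo/mydocker:custom --env GIT_SSL_NO_VERIFY=true")`.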
When you run `git diff` in your terminal in this git repo, do you get the requirements changes too, or only the same diff as in this log?
Applying uncommitted changes Executing: ('git', 'apply', '--unidiff-zero'): b"<stdin>:11: trailing whitespace.\n task = Task.init(project_name='MNIST', \n<stdin>:12: trailing whitespace.\n task_name='Pytorch Standard', \nwarning: 2 lines add whitespace errors.\n"
Hi,
It did, nvidia/cuda:10.1-runtime-ubuntu18.04.
So if I need to set this every time, what is the following config for? And how do I pass in new env parameters?
` default_docker: {
    # default docker image to use when running in docker mode
    image: "dockerrepo/mydocker:custom"
    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `
I can help you with that 🙂
task.set_base_docker("dockerrepo/mydocker:custom --env GIT_SSL_NO_VERIFY=true")
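If you have several variables, the string passed to `set_base_docker` can be assembled programmatically. This is a minimal sketch; the helper name is hypothetical, and quoting each `KEY=VALUE` with `shlex.quote` assumes the string is later split shell-style into separate docker arguments.

```python
import shlex
from typing import Mapping


def base_docker_with_env(image: str, env: Mapping[str, str]) -> str:
    """Build the string for set_base_docker(): the image, then one --env flag per variable."""
    parts = [image]
    for key, value in env.items():
        # Quote each KEY=VALUE in case a value contains spaces; this assumes the
        # string is later split shell-style into separate docker arguments
        parts += ["--env", shlex.quote(f"{key}={value}")]
    return " ".join(parts)


docker_cmd = base_docker_with_env("dockerrepo/mydocker:custom", {"GIT_SSL_NO_VERIFY": "true"})
print(docker_cmd)  # dockerrepo/mydocker:custom --env GIT_SSL_NO_VERIFY=true
```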
OK, I think I missed something along the way then.
You must have some diffs, because of this:
Applying uncommitted changes Executing: ('git', 'apply', '--unidiff-zero'): b"<stdin>:11: trailing whitespace.\n task = Task.init(project_name='MNIST', \n<stdin>:12: trailing whitespace.\n task_name='Pytorch Standard', \nwarning: 2 lines add whitespace errors.\n"
Can you re-run this task from your local machine? You shouldn't have anything under UNCOMMITTED CHANGES this time (as we just saw with the empty `git diff` from bash). But first, please verify that `torch` is listed in the repo's `requirements.txt` file.
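A quick way to do that check is to parse the file and look for a line whose package name is exactly `torch`; a `torchvision` line is a different package name and does not count. This is a minimal sketch with a hypothetical helper name; it only reads the file and says nothing about what the agent ends up installing.

```python
import re
from typing import Dict, Optional


def parse_requirements(text: str) -> Dict[str, Optional[str]]:
    """Map each requirement's normalized name to its '==' pin (None if not pinned)."""
    pkgs: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank lines, comments and pip options such as -r / -e
        m = re.match(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*([^\s;]+))?", line)
        if m:
            # PEP 503 normalization: case-insensitive, runs of -_. become "-"
            name = re.sub(r"[-_.]+", "-", m.group(1)).lower()
            pkgs[name] = m.group(2)
    return pkgs


reqs = parse_requirements("numpy\ntorch==1.7.1\ntorchvision==0.8.2\n")
print(reqs.get("torch"))  # 1.7.1
```

In practice you would pass it `open("requirements.txt").read()` from the repo root.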
In the task you cloned, do you have torch as part of the requirements?
You need to run it, but not actually execute it locally. You can hand the execution to the ClearML agent with `task.execute_remotely(queue_name='YOUR QUEUE NAME', exit_process=True)`.
With this, the task won't actually run on your local machine; it will just be registered in the ClearML app and then run by the ClearML agent listening to 'YOUR QUEUE NAME'.
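In a script, the `execute_remotely` hand-off typically sits right after `Task.init`, before any GPU work. The sketch below reuses the project and task names from the logs in this thread; the `main` wrapper and the placement of the training code are illustrative assumptions.

```python
def main(queue_name: str = "YOUR QUEUE NAME") -> None:
    # Imported inside main() so the sketch can be loaded without clearml installed
    from clearml import Task

    task = Task.init(project_name="MNIST", task_name="Pytorch Standard")
    # Run locally: registers the task, enqueues it on queue_name and, with
    # exit_process=True, terminates this local process right here.
    # Run by the agent: the call is ignored and execution continues below.
    task.execute_remotely(queue_name=queue_name, exit_process=True)

    # ... GPU training code goes here; it only runs on the agent's machine ...
```

Everything above the `execute_remotely` line still runs on your laptop, so keep it light.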
OK, that worked. So every time I change the code, I have to rerun the experiment on my own machine, which doesn't have any GPUs? That kind of defeats the purpose of using the ClearML Agent.
They should be copied; I just want to verify that they are.
If so, can you send the logs of the failed task?
Next step: figure out if I can do all that in Python code instead of the UI.
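The clone, edit BASE DOCKER IMAGE, enqueue flow from the UI can be sketched with the SDK's `Task.get_task`, `Task.clone`, `set_base_docker` and `Task.enqueue`. The wrapper name and the " (clone)" suffix are hypothetical. As the checkout log in this thread suggests, a clone carries over the source task's recorded commit, so pushing new code does not by itself change what the clone checks out.

```python
def clone_and_enqueue(source_task_id: str, queue_name: str, base_docker: str = ""):
    """Clone an existing task, optionally override its docker image, and enqueue it."""
    from clearml import Task  # imported lazily so the sketch loads without clearml

    source = Task.get_task(task_id=source_task_id)
    # The clone starts in draft state, so its fields can still be edited
    cloned = Task.clone(source_task=source, name=source.name + " (clone)")
    if base_docker:
        # e.g. "dockerrepo/mydocker:custom --env GIT_SSL_NO_VERIFY=true"
        cloned.set_base_docker(base_docker)
    # Hand the draft to whichever agent listens on queue_name
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned
```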
Thanks. That's easy to miss, as it's not very apparent in the main docs. How should I pass in env variables with Task?
Are you referring to the docker image? The same as before, with `task.set_base_docker("dockerrepo/mydocker:custom --env GIT_SSL_NO_VERIFY=true")`.
Hi SubstantialElk6 , does the task have a docker image too (you can check it in the UI)?
According to this part:
Applying uncommitted changes Executing: ('git', 'apply', '--unidiff-zero'): b"<stdin>:11: trailing whitespace.\n task = Task.init(project_name='MNIST', \n<stdin>:12: trailing whitespace.\n task_name='Pytorch Standard', \nwarning: 2 lines add whitespace errors.\n"
I don't see the requirements change, so let's try without the cache. Can you clear it (the ClearML cache dir is located at `~/.clearml`)?
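If you prefer to script the cleanup on the machine running the agent, a narrower option than wiping all of `~/.clearml` is to delete only the cached git checkouts (`vcs-cache`, the folder shown in the agent log). This is a sketch under that assumption; the function name is hypothetical, and it is probably safest to run it while the agent is idle.

```python
import shutil
from pathlib import Path
from typing import Iterable


def clear_clearml_cache(clearml_dir: Path = Path.home() / ".clearml",
                        subdirs: Iterable[str] = ("vcs-cache",)) -> int:
    """Delete selected cache subfolders under clearml_dir; return how many were removed.

    Defaults to only the cached git checkouts (vcs-cache); pass more names to wipe more.
    """
    removed = 0
    for name in subdirs:
        target = Path(clearml_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed += 1
    return removed
```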
Sorry, I don't quite understand this. The task itself was submitted when I ran the code on the client. I suppose the dependency requirements would be copied over when the experiment is cloned?
Hi, the problem is the same.
I noticed that it's not checking out the latest version in GitLab. This latest version would contain the requirements.txt.
Using cached repository in "/root/.clearml/vcs-cache/pytorchmnist.f220373e7227ec760b28c7f4cd99b534/pytorchmnist" warning: redirecting to
Note: checking out 'cfb833bcc70f3e10d3b6a96cfad3225ed682382b'.
But I'm guessing this block below applied the diff... does it include the requirements.txt though?
HEAD is now at cfb833b Upload New File type: git url:
branch: HEAD commit: cfb833bcc70f3e10d3b6a96cfad3225ed682382b root: /root/.clearml/venvs-builds/3.6/task_repository/pytorchmnist Applying uncommitted changes Executing: ('git', 'apply', '--unidiff-zero'): b"<stdin>:11: trailing whitespace.\n task = Task.init(project_name='MNIST', \n<stdin>:12: trailing whitespace.\n task_name='Pytorch Standard', \nwarning: 2 lines add whitespace errors.\n"
Yes, as listed in the snippet. The torch library is torchvision.
Running `git diff` in my terminal in this repo gave nothing, nothing at all.
Thanks, gonna try that out. But I hit another snag: strangely, the agent is not creating the right venv. This is what the agent created:
` pip:
- asn1crypto==0.24.0
- attrs==20.3.0
- certifi==2020.12.5
- chardet==4.0.0
- cryptography==2.1.4
- Cython==0.29.22
- furl==2.1.0
- future==0.18.2
- humanfriendly==9.1
- idna==2.6
- importlib-metadata==3.7.0
- jsonschema==3.2.0
- keyring==10.6.0
- keyrings.alt==3.0
- orderedmultidict==1.0.1
- pathlib2==2.3.5
- psutil==5.8.0
- pycrypto==2.6.1
- pygobject==3.26.1
- pyhocon==0.3.57
- PyJWT==1.7.1
- pyparsing==2.4.7
- pyrsistent==0.17.3
- python-dateutil==2.8.1
- pyxdg==0.25
- PyYAML==5.3.1
- requests==2.25.1
- requests-file==1.5.1
- SecretStorage==2.3.1
- six==1.11.0
- tqdm==4.54.1
- typing==3.7.4.3
- typing-extensions==3.7.4.3
- urllib3==1.26.3
- virtualenv==16.7.10
- zipp==3.4.0
But this is my requirements.txt
attrs==20.3.0
boto3==1.17.17
botocore==1.20.17
certifi==2020.12.5
chardet==4.0.0
clearml==0.17.4
furl==2.1.0
future==0.18.2
humanfriendly==9.1
idna==2.10
jmespath==0.10.0
jsonschema==3.2.0
numpy
orderedmultidict==1.0.1
pathlib2==2.3.5
Pillow==8.1.0
psutil==5.8.0
PyJWT==2.0.1
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
PyYAML==5.4.1
requests==2.25.1
requests-file==1.5.1
s3transfer==0.3.4
six==1.15.0
torch==1.7.1
torchvision==0.8.2
typing-extensions==3.7.4.3
urllib3==1.26.3
In particular, I am getting an error as follows:
Environment setup completed successfully
Starting Task Execution:
Traceback (most recent call last):
File "pytorch_mnist.py", line 8, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
DONE: Running task '3a90802d1dfa4ec09fbccba0beffbaa8', exit status 1 `
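Comparing the two lists above by hand is error-prone, so here is a small sketch that reports which packages from `requirements.txt` are absent from the agent's package list. The helper names are hypothetical, and version differences (for example `six==1.11.0` vs `six==1.15.0` above) are deliberately ignored. Fed the two lists above, it would report `torch` and `torchvision` among the missing packages, matching the `ModuleNotFoundError`.

```python
import re
from typing import Iterable, Set


def _names(lines: Iterable[str]) -> Set[str]:
    """Normalized package names from 'pkg==ver' or '- pkg==ver' style lines."""
    names = set()
    for raw in lines:
        line = raw.strip()
        if line.startswith("- "):
            line = line[2:].strip()  # list entry as printed in the agent's pip section
        elif not line or line.startswith(("-", "#")) or line.endswith(":"):
            continue  # blanks, pip options, comments, headers such as "pip:"
        m = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if m:
            names.add(re.sub(r"[-_.]+", "-", m.group(0)).lower())
    return names


def missing_from_env(requirements: str, installed: str) -> Set[str]:
    """Names listed in requirements.txt but absent from the agent's package list."""
    return _names(requirements.splitlines()) - _names(installed.splitlines())


print(sorted(missing_from_env("numpy\ntorch==1.7.1\nsix==1.15.0\n",
                              "pip:\n- six==1.11.0\n- tqdm==4.54.1\n")))
# ['numpy', 'torch']
```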
So according to it, you are using the repo requirements, and you have torch there?