Hi guys,
I'm having trouble understanding how to set up requirements.txt and run a task remotely with clearml-agent.
I set the requirements and want to execute the task remotely. While the task is in draft mode, the Installed Packages section is correct and matches my requirements.txt, but once I enqueue it and the agent starts installing packages, the list changes and the clearml module is not found.
Here is the code I use to initialize the task:
import os
import sys
os.environ['PYTHONPATH'] = os.getcwd()
sys.path.append(os.getcwd())
from clearml import Task, OutputModel, StorageManager
from dataclasses import asdict
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping, LearningRateMonitor
from src.net import Classifier
from src.data import ImageDataModule
from src.utils import download_data, override_config, read_json, read_yaml, get_classes_from_s3_folder
from config.train_config import TrainingConfig
from src.schema import Report
import torch
from rich import print
#region SETUP CLEARML
cwd = os.getcwd()
conf = TrainingConfig()
conf_copy = TrainingConfig()
report = Report()
req_path = os.path.join(cwd,'docker/requirements.train.txt')
Task.force_requirements_env_freeze(False, req_path)
# Task.add_requirements(req_path)
task = Task.init(
    project_name='Template/Image-Classifier',
    task_name='Training',
    task_type=Task.TaskTypes.training,
    auto_connect_frameworks=False,
    output_uri=conf.OUTPUT_URI,
)
Task.current_task().set_script(
repository='
',
    branch='master',
    working_dir='.',
    entry_point='src/train.py'
)
Task.current_task().set_base_docker(docker_image='nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu20.04')
Task.current_task().execute_remotely()
This produces a task in Draft mode, right? The Installed Packages section shows:
# Python 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
-f
torch==1.12.1+cu113
torchvision==0.13.1+cu113
clearml
boto3==1.24.66
torchmetrics==0.10.3
torchtext==0.13.1
pytorch-lightning==1.8.3.post1
timm==0.6.11
opencv-python-headless==4.6.0.66
albumentations==1.2.1
pandas
matplotlib
ipython
tqdm
plotly
nbformat
rich
pyhocon==0.3.59
But when I enqueue the task, I get this error:
Successfully installed Cython-0.29.33
Adding venv into cache: /mnt/hdd_2/clearml-cache/venvs-builds/3.8
Running task id [1dae62e287b64908922e960395fda45c]:
[.]$ /mnt/hdd_2/clearml-cache/venvs-builds/3.8/bin/python -u src/train.py
Summary - installed python packages:
pip:
- attrs==22.2.0
- certifi==2022.12.7
- charset-normalizer==3.0.1
- Cython==0.29.33
- distlib==0.3.6
- filelock==3.9.0
- furl==2.1.3
- idna==3.4
- importlib-resources==5.10.2
- jsonschema==4.17.3
- orderedmultidict==1.0.1
- pathlib2==2.3.7.post1
- pkgutil_resolve_name==1.3.10
- platformdirs==3.0.0
- psutil==5.9.4
- PyJWT==2.6.0
- pyparsing==3.0.9
- pyrsistent==0.19.3
- python-dateutil==2.8.2
- PyYAML==6.0
- requests==2.28.2
- six==1.16.0
- urllib3==1.26.14
- virtualenv==20.19.0
- zipp==3.12.1
Environment setup completed successfully
Starting Task Execution:
Traceback (most recent call last):
File "src/train.py", line 6, in <module>
from clearml import Task, OutputModel, StorageManager
ModuleNotFoundError: No module named 'clearml'
After that, the Installed Packages section under Execution changes to:
attrs==22.2.0
certifi==2022.12.7
charset-normalizer==3.0.1
Cython==0.29.33
distlib==0.3.6
filelock==3.9.0
furl==2.1.3
idna==3.4
importlib-resources==5.10.2
jsonschema==4.17.3
orderedmultidict==1.0.1
pathlib2==2.3.7.post1
pkgutil_resolve_name==1.3.10
platformdirs==3.0.0
psutil==5.9.4
PyJWT==2.6.0
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
PyYAML==6.0
requests==2.28.2
six==1.16.0
urllib3==1.26.14
virtualenv==20.19.0
zipp==3.12.1
which is different from my requirements.txt.
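To make the difference concrete, here is a minimal sketch that compares the two lists by package name. The helper `package_names` is hypothetical (it is not part of ClearML), and the inputs are only a shortened sample of the lists above; it just shows that none of the requested packages, including clearml, appear in what the agent installed.

```python
import re


def package_names(lines):
    """Return the set of normalized package names from requirement-style lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Drop the "- " bullet prefix used in the agent's summary log.
        if line.startswith("- "):
            line = line[2:].strip()
        # Skip blanks, comments, and pip options such as "-f".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # Keep only the name, before any version specifier, extras, or marker.
        name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0]
        names.add(name.lower().replace("_", "-"))
    return names


# Shortened sample of the Installed Packages shown in draft mode.
requested = package_names([
    "-f", "torch==1.12.1+cu113", "clearml", "boto3==1.24.66",
    "pytorch-lightning==1.8.3.post1", "rich",
])

# Shortened sample of the agent's "installed python packages" summary.
installed = package_names([
    "- attrs==22.2.0", "- Cython==0.29.33", "- PyYAML==6.0",
    "- requests==2.28.2",
])

# Packages I asked for that the agent did not install.
missing = sorted(requested - installed)
print(missing)
```

With the full lists the result is the same: every package from my requirements file is missing from the agent's environment.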
Thanks in advance!