@<1523701087100473344:profile|SuccessfulKoala55> Sorry for the delayed reply. I have attached the logs. The issue only happens when I do ML training with PyTorch; training with other frameworks, like TensorFlow and sklearn, works fine.
@<1523701087100473344:profile|SuccessfulKoala55> Yes, this is the end of the logs and nothing happens after it. I am using this command to launch the agent: clearml-agent daemon --detached --gpu 0 --queue A40
@<1523701087100473344:profile|SuccessfulKoala55> It works once I allow traffic to download.pytorch.org through the proxy. 🙂
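For anyone hitting the same thing, here is a rough sketch of how one might check from the agent machine whether a host is reachable through the proxy before starting a task. The function names and the default timeout are made up for illustration and are not part of ClearML. It assumes the proxy is either passed in explicitly or picked up from the usual `HTTP_PROXY` / `HTTPS_PROXY` environment variables.

```python
import urllib.error
import urllib.request


def make_opener(proxy_url=None):
    """Build a urllib opener that goes through a proxy.

    With proxy_url=None, ProxyHandler reads the proxy settings from the
    environment (HTTP_PROXY / HTTPS_PROXY), if any are set.
    """
    if proxy_url:
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    else:
        handler = urllib.request.ProxyHandler()
    return urllib.request.build_opener(handler)


def is_reachable(url, opener, timeout=10):
    """Return True if any HTTP response comes back for url, False otherwise."""
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means something answered the request.
        # For https:// URLs that is normally the remote server itself; for
        # plain http:// URLs it could also be the proxy refusing the request.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused tunnel, timeout, and similar.
        return False
```

Calling `is_reachable("https://download.pytorch.org", make_opener())` on the agent host should return `False` while the proxy blocks that domain, and `True` once the traffic is allowed.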
Yes, the machine is connected to an on-prem ClearML server.
Hi @<1562973095227035648:profile|ThoughtfulOctopus83> , what version of ClearML server are you running? Also, what versions of clearml & clearml-agent?
Also use --foreground, without the --detached option, to debug it.
@<1523701087100473344:profile|SuccessfulKoala55> After enabling debug mode, the logs are below. Just to let you know, this agent does not have internet access and pip packages are installed via a proxy, which I can see working, but for PyTorch it seems to be going to the internet. "DEBUG:urllib3.connectionpool: http://api.clearml.domain.com:80 "GET /v2.5/tasks.started HTTP/1.1" 200 353
Executing task id [d3807deae2644e00824e774ff8997eaa]:
repository =
branch =
version_num =
tag =
docker_cmd =
entry_point = pytorch.py
working_dir = .
DEBUG:clearml_agent.commands.worker:Searching for python3.7
DEBUG:clearml_agent.commands.worker:Searching for python3
DEBUG:clearml_agent.commands.worker:Searching for python
WARNING:clearml_agent.commands.worker:Python executable with version '3.7' requested by the Task, not found in path, using '/usr/bin/python3' (v3.10.6) instead
NoneType: None
created virtual environment CPython3.10.6.final.0-64 in 134ms
creator CPython3Posix(dest=/home/adminvj/.clearml/venvs-builds/3.10, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/adminvj/.local/share/virtualenv)
added seed packages: pip==23.1, setuptools==67.6.1, wheel==0.40.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
INFO:clearml_agent.commands.worker:found literal script in script.diff
DEBUG:clearml_agent.commands.worker:selected execution directory: /home/adminvj/.clearml/venvs-builds/3.10/code
Looking in indexes: https://artifacts.domain.com/repository/pypi/simple
Ignoring pip: markers 'python_version < "3.10"' don't match your environment
Collecting pip<22.3
Using cached https://artifacts.domain.com/repository/pypi/packages/pip/22.2.2/pip-22.2.2-py3-none-any.whl (2.0 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 23.1
Uninstalling pip-23.1:
Successfully uninstalled pip-23.1
Successfully installed pip-22.2.2
Looking in indexes: https://artifacts.domain.com/repository/pypi/simple
Collecting Cython
Using cached https://artifacts.domain.com/repository/pypi/packages/cython/0.29.34/Cython-0.29.34-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.34
INFO:clearml_agent.commands.worker:Found task requirements section, trying to install
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
1684151695770 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
1684151735969 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
1684151776174 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
1684151816358 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443"
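A quick way to read the tail of this log is to pull out the connection attempts and look at the spacing between them. The sketch below is only an illustration: it assumes the leading number on each line is a Unix timestamp in milliseconds, and the helper name `retry_gaps_ms` is invented here.

```python
import re

# The four timestamped lines from the end of the log above.
LOG = """\
1684151695770 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
1684151735969 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
1684151776174 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
1684151816358 worker:0 DEBUG DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): download.pytorch.org:443
"""

# Leading timestamp (assumed to be epoch milliseconds), then host and port.
ATTEMPT = re.compile(r"^(\d+)\s.*Starting new HTTPS? connection \(\d+\): ([^\s:]+):(\d+)")


def retry_gaps_ms(log_text):
    """Map each host to the gaps (in ms) between consecutive connection attempts."""
    stamps = {}
    for line in log_text.splitlines():
        match = ATTEMPT.match(line)
        if match:
            stamps.setdefault(match.group(2), []).append(int(match.group(1)))
    return {
        host: [later - earlier for earlier, later in zip(times, times[1:])]
        for host, times in stamps.items()
    }


gaps = retry_gaps_ms(LOG)
print(gaps)  # {'download.pytorch.org': [40199, 40205, 40184]}
```

The attempts are roughly 40 seconds apart and all target download.pytorch.org, which looks like the same blocked connection being retried after each timeout rather than the agent moving on to another source.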
@<1523701087100473344:profile|SuccessfulKoala55> Any idea why it goes to the internet (download.pytorch.org) only when I run training with the PyTorch framework?
@<1562973095227035648:profile|ThoughtfulOctopus83> is this the end of the log? Nothing after it? How exactly are you launching the agent?
@<1562973095227035648:profile|ThoughtfulOctopus83> I assume this machine is also connected to the clearml server?
Also, can you try sending a GET request to the server using curl? Something like curl
None and sharing the result?