This is the snippet that works for me. Please be aware that I use a custom Task.init call at the start of my script (repo/main_scripts/train.py), so if you don't do that you need to set add_task_init_call to True.
import logging
import os

import clearml

try:
    # `git remote -v` prints two lines per remote (fetch and push)
    repo = os.popen('git remote -v').read().strip().split('\n')
    if len(repo) > 2:
        raise RuntimeError('More than one git repository found')
    # Keep only the URL from "<name>\t<url> (fetch)"
    repo = repo[0].split('\t')[-1].split(' ')[0]
    branch = os.popen('git rev-parse --abbrev-ref HEAD').read().strip()
    commit = os.popen('git rev-parse HEAD').read().strip()
except Exception as e:
    logging.error(f'Error getting git repository: {e}')
    raise RuntimeError('Error getting git repository') from e
clearml.Task.force_requirements_env_freeze(True)
# Create a task in ClearML
task = clearml.Task.create(
    project_name=parameters['experiment']['project_name'],
    task_name=parameters['experiment']['experiment_name'],
    task_type=task_type,
    repo=repo,
    branch=branch,
    commit=commit,
    packages=True,
    script=script,
    add_task_init_call=False,
)
task.set_tags(parameters['experiment']['tags'])
task.set_script(repository=repo, branch=branch, entry_point=script)
task.connect_configuration(parameters)
Ah yeah, I also encountered this. It was actually one of the reasons we did not fully incorporate docker Agents into our workflow. If you find a solution, I would also be very curious. Maybe @<1523701070390366208:profile|CostlyOstrich36> has a solution.
Do you get any errors from the container? Do you have any limitations? Can you share the docker run command (the first output in the log)?
Hi,
I think the repo has to be the git location, not your local path, so something like git@gitlab.com/repo_name or git@github.com:Project-MONAI/VISTA.git
You can run git remote -v in the command line to find what your current repo is.
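If you want to do that lookup from Python instead of by eye, here is a minimal sketch that parses the text printed by git remote -v. The helper name parse_git_remotes and the sample output are just for illustration (they are assumptions, not part of ClearML or git), and it assumes the usual "<name> <url> (fetch|push)" line layout.

```python
from typing import Dict


def parse_git_remotes(remote_output: str) -> Dict[str, str]:
    """Map each remote name to its fetch URL, given `git remote -v` text.

    Hypothetical helper for illustration; assumes each line looks like
    "<name><whitespace><url> (fetch)" or "<name><whitespace><url> (push)".
    """
    remotes: Dict[str, str] = {}
    for line in remote_output.strip().splitlines():
        parts = line.split()
        # Keep only well-formed fetch lines; push lines repeat the same remote.
        if len(parts) == 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


# Sample text shaped like real `git remote -v` output (made up for this sketch).
sample_output = (
    "origin\tgit@github.com:Project-MONAI/VISTA.git (fetch)\n"
    "origin\tgit@github.com:Project-MONAI/VISTA.git (push)\n"
)

remotes = parse_git_remotes(sample_output)
print(remotes["origin"])
```

The value for origin is what you would pass as repo. Keying by remote name also makes it easy to notice when there is more than one remote configured.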
Thanks. I tried that too but it doesn't make a difference. Still hangs after building the container and installing the clearml-agent package. Any idea what the next step is supposed to be?
Thanks again. This works for me if I don't use the docker parameter in the create call. However, when I have a docker queue and pass the docker image name, it builds the container successfully but doesn't do anything else. We want to use the image as the task environment and run the script from the pulled repo in that environment. Is the expectation that the entry point script will be invoked from the Docker image itself?
Hi @<1835851157679902720:profile|CumbersomeBluewhale64> , can you try running this one?
from clearml import Task
# create a task
task = Task.init(project_name="examples", task_name="my task")
task.set_base_docker("python:3.9-slim")
# only create the task, we will actually execute it later
task.execute_remotely(queue_name=<placeholder>, clone=False)
print("hello world")
Don't forget to change the queue name.
Hi @<1744891825086271488:profile|RoundElephant20> . I get the same behavior when I push your task to the docker queue. The worker tries to run the task in the python container. The docker run command hangs after installing the clearml-agent package. This is all in my local WSL environment. The agent was launched with the following command: clearml-agent daemon --queue default --docker.