TimelyPenguin76 Thanks for your suggestion! I've considered using it, but from what I understand it doesn't offer the same functionality.
For example, creating the task within the script means it already identifies the branch/commit being used and also includes uncommitted changes. I also have auxiliary functions that guarantee the experiments go to the correct ClearML projects according to the script being used. It would also require significantly changing the command line: the one I use for test runs on my dev machine would differ from the one used to run it on the training machine.
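A minimal sketch of the kind of auxiliary routing function described above, assuming the script is launched with `python -m <package>.<module>` and that projects are named after the package path. `PROJECT_ROOT`, `project_for_module`, `current_module_name` and `init_task` are hypothetical names; only `Task.init` is ClearML's own API, and it is what records the repository, branch, commit and uncommitted diff.

```python
import sys
from pathlib import Path

# Hypothetical root project under which all experiments are grouped
PROJECT_ROOT = "research"


def project_for_module(module_name: str, root: str = PROJECT_ROOT) -> str:
    """Map a dotted module path (as run with `python -m`) to a nested project name.

    "detection.train.run" -> "research/detection/train": the last component
    (the script itself) is dropped and "/" separates nested projects.
    """
    parts = module_name.split(".")
    package_parts = parts[:-1] if len(parts) > 1 else parts
    return "/".join([root, *package_parts])


def current_module_name() -> str:
    """Dotted name of the running script, or the file stem if not run with -m."""
    spec = getattr(sys.modules["__main__"], "__spec__", None)
    if spec is not None:
        return spec.name
    return Path(sys.argv[0]).stem


def init_task(task_name: str):
    """Create the ClearML task in the project derived from the running script."""
    from clearml import Task  # imported lazily so the helpers work without ClearML

    # Task.init captures git repo, branch, commit and uncommitted changes
    return Task.init(project_name=project_for_module(current_module_name()),
                     task_name=task_name)
```

Calling `init_task("my_task")` at the top of each entry script keeps the same command line on both machines, which is the property the message above is trying to preserve.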
SteepDeer88 which clearml version are you using?
And how did you run this script? Just from the CLI? PyCharm? Which OS?
Hi SteepDeer88,
You can use https://clear.ml/docs/latest/docs/apps/clearml_task for this. What do you think?
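Part of the friction mentioned in the reply is that `clearml-task` takes script arguments as `name=value` pairs after `--args` instead of `--name value`. A hedged sketch of a converter, assuming every local option is a `--name value` pair (no bare switches, no `--name=value` forms); `local_args_to_pairs` and `to_clearml_task_command` are hypothetical helpers, and only the `--project`, `--name`, `--script`, `--queue` and `--args` options of `clearml-task` are used.

```python
from typing import List


def local_args_to_pairs(argv: List[str]) -> List[str]:
    """Convert ["--arg1", "v1", "--arg2", "v2"] into ["arg1=v1", "arg2=v2"]."""
    if len(argv) % 2 != 0:
        raise ValueError("expected --name value pairs")
    pairs = []
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"unexpected token {flag!r}")
        pairs.append(f"{flag[2:]}={value}")
    return pairs


def to_clearml_task_command(project: str, name: str, script: str,
                            queue: str, argv: List[str]) -> List[str]:
    """Build a clearml-task command line equivalent to a local run."""
    cmd = ["clearml-task", "--project", project, "--name", name,
           "--script", script, "--queue", queue]
    pairs = local_args_to_pairs(argv)
    if pairs:
        cmd += ["--args", *pairs]
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)`; note that `--script` expects a file path, so a `python -m` module path would have to be mapped to its file first.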
Hi SweetBadger76, I have not been able to deal with this issue yet. I am getting all sorts of weird behaviours, which are likely due to some misconfiguration of my ClearML agents or of the experiments I am trying to run. The latest one is that the ClearML agents are ignoring my --docker flag and running everything on the host machine in a Python environment. On this, can you clarify something for me: if I clone an experiment, will the configs in the experiment overwrite the ones from the agent? For example, if the experiment I am cloning has no docker image and parameters set, will that make the agent ignore the ones I set in clearml.conf?
Hi SteepDeer88,
"For example, if the experiment I am cloning has no docker image and parameters set, will that make the agent ignore the ones I set in clearml.conf?"
No, the experiment should run in docker mode if the agent was run with --docker.
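The answer above hinges on how the agent itself is launched. A sketch of building that launch command from Python, assuming `clearml-agent` is installed on the training machine; `agent_daemon_command` is a hypothetical helper and the image name in the usage line is a placeholder. As far as we know, the optional image after `--docker` (or `agent.default_docker.image` in clearml.conf) is what the agent falls back to when the experiment has no image of its own.

```python
from typing import List, Optional


def agent_daemon_command(queue: str, docker_image: Optional[str] = None,
                         detached: bool = False) -> List[str]:
    """Build a `clearml-agent daemon` command that runs tasks in docker mode."""
    cmd = ["clearml-agent", "daemon"]
    if detached:
        # Run the daemon in the background
        cmd.append("--detached")
    cmd += ["--queue", queue, "--docker"]
    if docker_image:
        # Optional default image for tasks that do not specify one
        cmd.append(docker_image)
    return cmd
```

For example, `subprocess.run(agent_daemon_command("default", "ubuntu:20.04", detached=True), check=True)`; if tasks still run on the host, it is worth confirming that the running daemon was actually started with `--docker`.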
TimelyPenguin76 SweetBadger76 thanks for the support!
I ran the script in the terminal (PowerShell) using a command similar to python -m <path>.<to>.<module> --arg1 <arg1_value> --arg2 <arg2_value> ...
I ran it on Windows. The ClearML server is running on Ubuntu.
I will create a minimal program that reproduces the error and get back to you (I will also test on both WSL and Ubuntu to get a better idea of whether it is OS-specific).
Hey SteepDeer88, did you manage to get rid of that issue, or do you still need support with it?
Update on this one: I noticed I had different versions of clearml on my dev machine and the training machine (host and container). Updating both to the latest version, 1.4.1, caused a different error (related to the other question I posted in the channel): it tries to install the packages from my dev machine (Windows) in the docker container used on the training machine (Ubuntu container, Ubuntu host). The main issue I am trying to get around now is that I use pycocotools, which has a Windows-specific package (pycocotools-windows).
This happens even though I am setting the env var CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/usr/bin/python.
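Since the update above traces part of the problem to `clearml` versions drifting between machines, a small check run on the dev machine, the host and inside the container can catch that early. A sketch using `importlib.metadata` from the standard library (Python 3.8+); `EXPECTED` and `version_mismatches` are hypothetical names, and 1.4.1 is simply the version mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional, Tuple

# Versions every machine (and the container) is expected to share; adjust as needed
EXPECTED = {"clearml": "1.4.1"}


def version_mismatches(expected: Dict[str, str],
                       lookup: Callable[[str], str] = version
                       ) -> List[Tuple[str, str, Optional[str]]]:
    """Return (package, expected, installed) for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = []
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = lookup(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches.append((package, wanted, installed))
    return mismatches
```

Printing `version_mismatches(EXPECTED)` in each environment should give an empty list when everything is aligned; `clearml-agent` can be added to `EXPECTED` the same way.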
Hi SteepDeer88,
I wrote this script to try to reproduce the error. I am passing 50+ parameters to it and so far everything works fine. Could you please give me some more details about your issue so that we can reproduce it?
from clearml import Task
import argparse

'''
COMMAND LINE:
python -m my_script --project_name my_project --task_name my_task --execute_remotely true --remote_queue default --param_1 parameter --param_2 parameter <...etc>
'''


def str2bool(value):
    # argparse passes strings, so "false" would otherwise be truthy
    return value.strip().lower() in ("true", "1", "yes")


parser = argparse.ArgumentParser()
parser.add_argument("--project_name")
parser.add_argument("--task_name")
parser.add_argument("--execute_remotely", type=str2bool, default=False)
parser.add_argument("--remote_queue")
# add 50 generic arguments: --param_1 ... --param_50
for i in range(1, 51):
    arg_name = f"--param_{i}"  # avoid shadowing the built-in str
    parser.add_argument(arg_name)
args = parser.parse_args()

task = Task.init(project_name=args.project_name,
                 task_name=args.task_name,
                 output_uri=True,
                 reuse_last_task_id=False)

if args.execute_remotely:
    # enqueue this task on the remote queue and stop local execution
    task.execute_remotely(queue_name=args.remote_queue,
                          clone=False,
                          exit_process=True)
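To check the argument handling on its own (for example on the Windows dev machine and again inside the container), the parser from the script above can be moved into a function and fed a generated argument list, with no ClearML server involved. A sketch; `str2bool`, `build_parser` and `sample_argv` are hypothetical names that mirror the script.

```python
import argparse
from typing import List


def str2bool(value: str) -> bool:
    # argparse hands over strings; "false" must not be treated as truthy
    return value.strip().lower() in ("true", "1", "yes")


def build_parser(n_params: int = 50) -> argparse.ArgumentParser:
    """Same options as the reproduction script, minus the ClearML calls."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--project_name")
    parser.add_argument("--task_name")
    parser.add_argument("--execute_remotely", type=str2bool, default=False)
    parser.add_argument("--remote_queue")
    for i in range(1, n_params + 1):
        parser.add_argument(f"--param_{i}")
    return parser


def sample_argv(n_params: int = 50, remote: str = "false") -> List[str]:
    """Generate a command line like the one in the script's docstring."""
    argv = ["--project_name", "my_project", "--task_name", "my_task",
            "--execute_remotely", remote, "--remote_queue", "default"]
    for i in range(1, n_params + 1):
        argv += [f"--param_{i}", f"value_{i}"]
    return argv
```

For example, `build_parser().parse_args(sample_argv())` should yield `execute_remotely=False` and `param_50="value_50"`, which rules out argument parsing as the cause before looking at the agent side.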