Thanks TimelyPenguin76 and AgitatedDove14! I would like to delete artifacts/models related to the old archived experiments, but they are stored on S3. Would that be possible?
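Something like this rough sketch is what I have in mind (unverified: it assumes Task.get_tasks accepts a system_tags filter and that Task.delete() also removes the artifacts/models from their storage; the S3 credentials in clearml.conf would need delete permissions):
```
# Unverified sketch: delete archived tasks together with their
# artifacts/models (requires S3 credentials with delete permission
# configured in clearml.conf).
from clearml import Task

archived = Task.get_tasks(
    project_name="my-project",
    task_filter={"system_tags": ["archived"]},  # assumed filter key
)
for task in archived:
    task.delete(
        delete_artifacts_and_models=True,
        skip_models_used_by_other_tasks=True,
        raise_on_error=False,
    )
```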
Hi AgitatedDove14, sorry, somehow this message got lost 🙂
clearml version is the latest at the time, 1.7.1
Yes, I always see the "model uploaded completed" for such stuck tasks.
I am using Python 3.8.10
yes but they are in plain text and I would like to avoid that
How exactly is the clearml-agent killing the task?
The task I cloned from is not the one I thought
mmmh good point actually, I didn't think about it
AgitatedDove14 I see that the default is sample_frequency_per_sec=2., but in the UI I see that there isn't such a resolution (i.e. it logs every ~120 iterations, corresponding to ~30 secs). What is the difference with report_frequency_sec=30.?
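To make my question concrete, this is how I understand the two parameters (a rough sketch, assuming clearml's ResourceMonitor exposes them as constructor arguments; Task.init normally starts one for you):
```
# Rough sketch, unverified: my understanding is that the monitor
# *samples* machine stats sample_frequency_per_sec times per second,
# but only *reports* an aggregated point every report_frequency_sec
# seconds, which would explain the ~30s resolution in the UI.
from clearml import Task
from clearml.utilities.resource_monitor import ResourceMonitor

task = Task.init(project_name="my-project", task_name="monitor-test",
                 auto_resource_monitoring=False)  # avoid a second monitor
monitor = ResourceMonitor(
    task,
    sample_frequency_per_sec=2.,  # sampling rate
    report_frequency_sec=30.,     # reporting rate
)
```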
Nice, thanks!
and just run the same code I run in production
I found it, the filter actually has to be an iterable:
```
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(type=["training"]))
```
Note: I can verify that post_packages is picked up correctly by the trains-agent, since in the experiment log I see:
```
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.post_packages.0 = PyJWT==1.7.1
```
The rest of the configuration is set with env variables
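For completeness, the equivalent clearml.conf block would look like this (a sketch reconstructed from the log excerpt above):
```
agent.package_manager {
    type: pip
    pip_version: "==20.2.3"
    system_site_packages: true
    force_upgrade: false
    post_packages: ["PyJWT==1.7.1"]
}
```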
Oof, now I cannot start the second controller in the services queue on the same second machine; it fails with:
```
Processing /tmp/build/80754af9/cffi_1605538068321/work
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/cffi_1605538068321/work'
clearml_agent: ERROR: Could not install task requirements!
Command '['/home/machine/.clearml/venvs-builds.1.3/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r'...
```
Alright SuccessfulKoala55 I was able to make it work by downgrading clearml-agent to 0.17.2
Actually I think I am approaching the problem from the wrong angle
Can I simply set agent.python_binary = path/to/conda/python3.6?
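i.e., something along these lines in the agent's clearml.conf (a sketch; the conda path is just an example taken from my setup):
```
# clearml.conf on the agent machine (sketch, example path)
agent {
    python_binary: "/home/machine/miniconda3/envs/py36/bin/python3.6"
}
```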
Ok, deleting installed packages list worked for the first task
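In case it helps, clearing the list can probably also be scripted (an unverified sketch, assuming Task.set_packages accepts an empty list so the agent falls back to the repository's requirements.txt):
```
# Unverified sketch: reset the recorded "installed packages" so the
# agent re-resolves requirements on the next run.
from clearml import Task

task = Task.get_task(task_id="<task-id>")  # placeholder id
task.set_packages([])  # assumes an empty list clears the stored packages
```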
in clearml.conf:
```
agent.package_manager.system_site_packages = true
agent.package_manager.pip_version = "==20.2.3"
```
Since it fails on the first machine (clearml-server), I try to run it on another, on-prem machine (also used as an agent)
After I started clearml-session
I execute the clearml-agent this way:
```
/home/machine/miniconda3/envs/py36/bin/python3 /home/machine/miniconda3/envs/py36/bin/clearml-agent daemon --services-mode --cpu-only --queue services --create-queue --log-level DEBUG --detached
```
might be worth documenting 🙂
Hi AgitatedDove14, thanks for the answer! I will try adding multiprocessing_context='forkserver' to the DataLoader. In the issue you linked, nirraviv mentioned that forkserver was slower and shared a link to another issue https://github.com/pytorch/pytorch/issues/15849#issuecomment-573921048 where someone implemented a fast variant of the DataLoader to overcome the speed problem.
Did you experience any performance drop using forkserver? If yes, did you test the variant suggested i...
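For reference, this is what I'm going to try (standard PyTorch DataLoader usage; the dataset here is just dummy data for illustration):
```
# Pass the multiprocessing context explicitly so DataLoader workers
# are created via forkserver instead of the default fork.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3))  # dummy data for illustration
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    multiprocessing_context="forkserver",
)
```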
Hi SuccessfulKoala55, it's not really that it's wrong, rather that I don't understand it: the docker image with the args after it
here is the function used to create the task:
```
# imports needed by the signature (added for completeness)
from typing import Iterable, List

from clearml import Task

def schedule_task(parent_task: Task,
                  task_type: str = None,
                  entry_point: str = None,
                  force_requirements: List[str] = None,
                  queue_name="default",
                  working_dir: str = ".",
                  extra_params=None,
                  wait_for_status: bool = False,
                  raise_on_status: Iterable[Task.TaskStatusEnum] = (Task.TaskStatusEnum.failed, Task.Ta...
```
In the Execution tab I see the old commit; in the logs I see an empty branch and the old commit