Nvm. I forgot to start my agent with --docker. So here comes my follow-up question: it seems there is no way to specify that a Task requires docker support from an agent, right?
Thank you very much. I am going to try that.
Btw: I think Task.init is more confusing than Task.create and I would rather rename the former.
But you can manually add them with Task.add_requirements, no?
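(To illustrate, a minimal sketch of that approach, assuming the package-name/version form of Task.add_requirements; package names and versions are just examples:)

```python
from clearml import Task

# Pin packages the auto-detection might miss.
# add_requirements must be called before Task.init.
Task.add_requirements("torch", "1.13.0")  # illustrative package/version
Task.add_requirements("torchvision")      # no version: take whatever resolves

task = Task.init(project_name="examples", task_name="manual-requirements")
```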
In my opinion that is an ugly solution. I would have to keep track of which requirements are missing; then I would rather just add all requirements manually.
My code is in classes, indeed, but I have more than one model. Actually, all the things that people usually store in e.g. yaml or json configs I store in python files, and I do not want to statically import all the models/configs.
AgitatedDove14 Yes, you understood correctly. But Task.init uses Task.create under the hood, something like this, right?
```python
def init(project_name, task_name):
    if not Task.exists_already(project_name, task_name):
        task = Task.create(...)
    else:
        task = load_existing_task()
    return task
```
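(For reference, a rough sketch of how the two calls are typically used; project and task names are illustrative:)

```python
from clearml import Task

# Task.init: creates (or reuses) a task and attaches it to the current
# process, enabling automatic logging of frameworks, console output, etc.
task = Task.init(project_name="examples", task_name="my-experiment")

# Task.create: only registers a new task entry on the server; it does not
# hook into the running process.
new_task = Task.create(project_name="examples", task_name="created-only")
```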
Perfect! That sounds like a good solution for me.
What I am trying to do is install this: `torch==1.14.0.dev20221205+cu117` and `torchvision==0.15.0.dev20221205+cpu`. Is this what you mean by a specific build?
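(If it helps, a sketch of pinning those exact builds via Task.add_requirements, assuming it passes a full version specifier through to pip as-is:)

```python
from clearml import Task

# Pin the exact nightly builds mentioned above (assumes the version
# specifier is forwarded to pip unchanged).
Task.add_requirements("torch", "==1.14.0.dev20221205+cu117")
Task.add_requirements("torchvision", "==0.15.0.dev20221205+cpu")

task = Task.init(project_name="examples", task_name="nightly-builds")
```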
Can you actually reproduce my problem when also using conda_freeze: true?
Could you guide me to the documentation for using the docker file? I am not able to find it. I only found task.set_base_docker, but I am not sure what it does.
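(For what it's worth, a minimal sketch of how task.set_base_docker is commonly called, assuming the single image-string form; the image name is illustrative:)

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker-task")

# Sets the default docker image an agent should use when running this
# task in --docker mode (image name is illustrative).
task.set_base_docker("nvidia/cuda:11.7.1-runtime-ubuntu22.04")
```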
That seems to be the case. After parsing the args I run task = Task.init(...) and then task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True).
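(A minimal sketch of that pattern, with an argparse flag standing in for args.enqueue:)

```python
import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--enqueue", default=None, help="queue name to enqueue the task on")
args = parser.parse_args()

task = Task.init(project_name="examples", task_name="remote-run")
if args.enqueue:
    # Stops local execution, enqueues this task, and exits the process;
    # an agent then picks it up from the given queue.
    task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True)
```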
Okay, thanks for explaining!
@<1523701435869433856:profile|SmugDolphin23> Good catch. I have a good but unsatisfying message for you guys: I restarted the whole machine (server and agent) and now it works fine ...
```yaml
name: core
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.12.5
  - certifi=2020.12.5
  - cudatoolkit=11.1.1
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - jpeg=9b
  - lame=3.100
  - lcms2=2.11
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.3.0
  - libiconv=1.16
  - libpng=1.6.37
  - libstdcxx-ng=9.3.0
  - libtiff...
```
I think in the paid version there is this configuration vault, so that the user can pass their own credentials securely to the agent.
Can you maybe also tell me which docker image you used? Unfortunately, none of this is working for me.
I got the error again. Seems to happen only when I try to delete "large" experiments.
Is it possible to set extra-index-url on a per-task basis? Just asking because of the way you wrote it with the two dashes 🙂
I am currently on the move, but it was something like "upstream server not found" in /etc/nginx/nginx.conf, and if I remember correctly at line 88.
Sure, no problem!
I see. But I just realized: subsampling means you just show every n-th datapoint, right? I still do not get why this leads to some 0.5 values when my plot should only contain 0 and 1.
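(One guess, purely an assumption on my side: if the plot downsamples by averaging buckets instead of dropping points, a 0/1 series can produce 0.5 values:)

```python
import numpy as np

y = np.array([0, 1, 0, 1, 1, 0, 0, 1])  # binary series

# Decimation (every n-th point) keeps only original values:
print(y[::2])                          # [0 0 1 0] -> still just 0s and 1s

# Bucket averaging introduces intermediate values:
print(y.reshape(-1, 2).mean(axis=1))   # [0.5 0.5 0.5 0.5]
```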
btw: Could you check whether agent.package_manager.system_site_packages is true or false in your config and in the summary that the agent gives before execution?
I start my agent in --foreground mode for debugging and it clearly shows false, but in the summary that the agent gives before the task is executed, it shows true.
Thanks! I am fascinated by what you guys offer with clearml 🙂
Thanks for answering, but I still do not get it. file_history_size decides how many past files are shown? So if file_history_size=100 and I have 1 image/iteration and ran 1000 iterations, I will see images for iterations 900-1000?
Haha, fortunately I have a good job already. Just wanted to know how many people are actively working on clearml.
You suggested this fix earlier, but I am not sure why it didn't work then.