Seems to happen only while the cleanup_service is running!
Yeah, something like this seems to be the best solution.
Thanks for researching this issue. If you have time, you can create the issue since you are way more knowledgeable, but I can also open it if you do not have time 🙂
Hard to answer now. I just wiped everything and reinstalled. If I encounter this problem again, I will investigate further.
Let me try it another time. Maybe something else went wrong.
name: core
channels:
  - pytorch
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - blas=1.0
  - bzip2=1.0.8
  - ca-certificates=2020.10.14
  - certifi=2020.6.20
  - cloudpickle=1.6.0
  - cudatoolkit=11.1.1
  - cycler=0.10.0
  - cytoolz=0.11.0
  - dask-core=2021.2.0
  - decorator=4.4.2
  - ffmpeg=4.3
  - freetype=2.10.4
  - gmp=6.2.1
  - gnutls=3.6.13
  - imageio=2.9.0
  - jpeg=9b
  - kiwisolver=1.3.1
  - lame=3.100
  - lcms2=2.11
  -...
Nvm, I think it's my mistake. I will investigate.
args = parser.parse_args()
print(args)  # args PRINTED HERE ON LOCAL
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker
# Initialize ClearML Task
task = (
    Task.init(
        project_name="reinforcement-learning/" + type_name,
        task_name=args.name or preset_name,
        tags=...
Good, at least now I know it is not a user-error 😄
So args that are not specified are not None as intended, but simply do not exist in args. And command is a list instead of a single str.
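To make the two symptoms concrete, here is a minimal sketch of a defensive workaround, not a fix for the underlying behavior. It assumes the remote run hands back a namespace with some attributes missing and with command as a list. The helper name normalize_args is hypothetical, the attribute names are taken from the script in this thread, and joining the list with spaces is an assumption about what the original string looked like.

```python
import argparse

# Attribute names taken from the script in this thread.
OPTIONAL_NAMES = ("name", "preset", "type", "environment",
                  "enqueue", "track_remote", "nvidia_docker")


def normalize_args(args, optional_names=OPTIONAL_NAMES):
    """Return a new Namespace where absent optionals are None and command is a str.

    Hypothetical helper: it only papers over the difference between the
    local and remote namespaces, it does not explain why they differ.
    """
    values = dict(vars(args))

    # Attributes that are missing entirely are filled in with None,
    # which is what argparse would give locally for unspecified options.
    for attr in optional_names:
        values.setdefault(attr, None)

    # If command arrived as a list, collapse it back into one string.
    # Joining with spaces is an assumption about the original value.
    command = values.get("command")
    if isinstance(command, (list, tuple)):
        values["command"] = " ".join(str(part) for part in command)

    return argparse.Namespace(**values)


# Simulated "remote" namespace: preset is missing, command is a list.
remote_args = argparse.Namespace(command=["train"], enqueue=True)
fixed_args = normalize_args(remote_args)
print(fixed_args.command)
print(fixed_args.preset)
```

With this in place, code like args.preset no longer raises AttributeError on remote, and command can be used as a string in both environments.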
If you compare the two outputs I put at the top of this thread, one being the output when executed locally and the other the output when executed remotely, it seems like command is different and wrong on remote.
Args is similar to what is shown in print(args) when executed remotely.
No problem in my case at least.
And in the WebUI I can see arguments similar to the second print statement's.
Ok. I just wanted to make sure I have configured my agent properly, and to confirm that I have to set it on all agents.
So I just tried again, but this time deleting manually via the Web UI.
I have no idea whether it is a user error or because of the clearml-server update...
In the WebUI it just shows that an error happened after the loading bar has been running for a while.
I tried to delete the same tasks again and this time, it instantly confirmed deletion and the tasks are gone.
Seems possible because I didn't know I had to specify an entrypoint somewhere. I will do some additional tests.
Okay, thanks for the info! I am currently not using k8s, but may be good to know for the future.
I installed as instructed on pytorch.org: pip3 install --pre torch torchvision torchaudio --index-url
None
For me this does not work (at least with nested tqdm bars, did not try single ones yet).
Thanks a lot. I somehow missed this.