@<1523701205467926528:profile|AgitatedDove14>
Thanks!
Would you mind walking me through the process?
As I understand it, first I'll set up a self-hosted server with Docker on my Windows computer.
Second, I'll connect the other Windows computers to the server. To do that, I need a token from my server, so that I can copy and paste it when I run the command `clearml-agent init --token <my_token> --queue default` on those other Windows computers.
Lastly, I just execute `c...
@<1523701070390366208:profile|CostlyOstrich36>
I have a follow-up to my first question.
I initialized a task, called `get_local_copy` on a dataset, and then ran the task (training) to completion.
In the web UI, I don't see any information showing that the task and the dataset are related or linked.
What should I do to link the two, or where can I find that information?
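For context, here is a minimal sketch of what I mean, plus one possible workaround I could try: recording the dataset ID on the task with `task.connect`. It assumes the ClearML Python SDK; the project, task, and dataset names are hypothetical placeholders, and I'm not sure this is the intended way to link the two.

```python
def dataset_link_params(dataset_id, dataset_name):
    """Build the parameter dict that records which dataset a task used."""
    return {"dataset_id": dataset_id, "dataset_name": dataset_name}


def train_with_linked_dataset(project="examples", task_name="training",
                              dataset_name="my_dataset"):
    # Hypothetical names above; clearml is imported lazily so the helper
    # can be used without the SDK installed.
    from clearml import Dataset, Task

    task = Task.init(project_name=project, task_name=task_name)
    dataset = Dataset.get(dataset_project=project, dataset_name=dataset_name)
    local_path = dataset.get_local_copy()

    # Record the dataset ID on the task as a connected parameter dictionary.
    task.connect(dataset_link_params(dataset.id, dataset_name), name="Dataset")
    return local_path
```

Calling `train_with_linked_dataset()` would need a configured ClearML server; `dataset_link_params` on its own is just a plain dictionary builder.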
@<1523701070390366208:profile|CostlyOstrich36>
Actually, I've got another question about datasets!
I tried `add_external_files` with a file on AWS S3 as a simple test.
In the web UI, it says it has been uploading for 16 hours now.
The zip file I tried to upload is under 50MB.
Is something wrong here?
Also, I'm wondering if I could add files that are not "zipped" files, for example a directory containing various files.
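Roughly what I have in mind for the directory case is the sketch below. It assumes the ClearML Python SDK and assumes that `add_external_files` accepts an S3 folder prefix as `source_url`; the bucket, prefix, project, and dataset names are hypothetical, and I'm also assuming the usual `upload()` then `finalize()` flow applies to external files.

```python
def s3_folder_url(bucket, prefix):
    """Join a bucket and key prefix into an s3:// folder URL ending in '/'."""
    return "s3://{}/{}/".format(bucket.strip("/"), prefix.strip("/"))


def register_s3_folder(bucket, prefix, project="examples",
                       dataset_name="external_test"):
    # Hypothetical project/dataset names; clearml is imported lazily so the
    # URL helper works without the SDK installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=dataset_name, dataset_project=project)
    # Register links to the files under the prefix instead of a single zip.
    dataset.add_external_files(source_url=s3_folder_url(bucket, prefix))
    dataset.upload()    # push any pending dataset content
    dataset.finalize()  # close this dataset version
    return dataset.id
```

For example, `s3_folder_url("my-bucket", "data/images")` gives `s3://my-bucket/data/images/`, which is what I would pass as `source_url`.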
Sorry for the late reply.
I believe this is why it's not working (from the console log):
adfba156d16e: Pull complete
Digest: sha256:0ce15c07d55860dfd2eeae535c42d85383a664821da5ff18d10448b5a2993e5a
Status: Downloaded newer image for ultralytics/yolov5:latest
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0...
I've figured out what's wrong and fixed it! Thanks!
@<1523701070390366208:profile|CostlyOstrich36> I didn't specify a remote version. Where can I check the version and adjust it?
@<1523701070390366208:profile|CostlyOstrich36> I'm using python 3.9.11 and pytorch 1.11.0+cu113.
@<1523701070390366208:profile|CostlyOstrich36> Hi! Actually, I've changed it to run the training in a local environment now (not Docker)!
I ran a queued task from the web UI by clicking 'enqueue', and I got this error!
# Error logs
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manag...
I found this in the docs.
Am I not supposed to run the agent in Docker mode on a Windows computer?
But I would also like to know how to run this with Docker!
This is the log file!
Thanks a lot!!