FYI,
I set the options for HyperParameterOptimizer() as follows:
- compute_time_limit=None,
- total_max_jobs=100,
- min_iteration_per_job=None,
- max_iteration_per_job=None,
- max_number_of_concurrent_tasks=1
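For readers following along, here is a minimal sketch of how the options listed above might be passed to the optimizer. The template task ID, the parameter name `General/batch_size`, its range, and the metric title/series are assumptions made up for illustration; only the five limits come from the message. The `clearml` import is done lazily inside the function, so the limits helper can be inspected without the package installed.

```python
def optimizer_limits():
    """Limits quoted in the message above (None means no limit is set)."""
    return {
        "compute_time_limit": None,
        "total_max_jobs": 100,
        "min_iteration_per_job": None,
        "max_iteration_per_job": None,
        "max_number_of_concurrent_tasks": 1,
    }


def build_optimizer(template_task_id, queue="default"):
    """Create an optimizer that clones the template task for each experiment."""
    # Imported here so this module loads even where clearml is not installed.
    from clearml.automation import (
        HyperParameterOptimizer,
        RandomSearch,
        UniformIntegerParameterRange,
    )

    return HyperParameterOptimizer(
        base_task_id=template_task_id,
        hyper_parameters=[
            # Hypothetical parameter name and range, for illustration only.
            UniformIntegerParameterRange(
                "General/batch_size", min_value=16, max_value=64, step_size=16
            ),
        ],
        objective_metric_title="validation",  # hypothetical metric title
        objective_metric_series="loss",       # hypothetical metric series
        objective_metric_sign="min",
        optimizer_class=RandomSearch,
        execution_queue=queue,
        **optimizer_limits(),
    )
```

A typical run would then call `start()` on the returned object, followed by `wait()` and `stop()`. With `max_number_of_concurrent_tasks=1`, the experiments are launched one at a time.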
@<1523701070390366208:profile|CostlyOstrich36> here it is!
@<1722061354531033088:profile|TroubledCamel37> No, I didn't add "task.close()" in the code. This link is what I followed.
Even after completing one experiment, the console and UI don't seem to terminate the task.
@<1523701070390366208:profile|CostlyOstrich36> I keep failing to execute the task with clearml-agent because of the environment settings. How can I adjust the clearml.conf file so that the agent uses a specific local environment?
@<1523701070390366208:profile|CostlyOstrich36> I didn't specify a remote version. Where can I check the version and adjust it?
@<1523701070390366208:profile|CostlyOstrich36> I'm using python 3.9.11 and pytorch 1.11.0+cu113.
@<1722061354531033088:profile|TroubledCamel37> Thanks! I'll look into the connectivity issue you mentioned.
@<1722061354531033088:profile|TroubledCamel37> But I guess task.close() would terminate the optimization task, not the single experiment. Am I misunderstanding something?
@<1523701070390366208:profile|CostlyOstrich36> Would you mind looking over this issue?
Yeah, the problem was the fileserver connection, like you said!
I was running the experiment on a remote server, and solved the issue by opening the port for the fileserver! Thanks!
Plus, the first experiment terminated with early stopping.
I've figured out what's wrong and fixed it! Thanks!
I figured out the metrics should be provided in list format.
Sorry for the late reply.
I believe this is why it's not working (from the console log):
adfba156d16e: Pull complete
Digest: sha256:0ce15c07d55860dfd2eeae535c42d85383a664821da5ff18d10448b5a2993e5a
Status: Downloaded newer image for ultralytics/yolov5:latest
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0...
@<1523701205467926528:profile|AgitatedDove14>
Thanks!
Would you mind walking me through the process?
As I understand it, first I'm going to set up a self-hosted server with Docker on my Windows computer.
Secondly, I'm going to connect the other Windows computers to the server. To do that, I need a token from my server, so that I can copy and paste it when I execute the command clearml-agent init --token <my_token> --queue default from the other Windows computers.
Lastly, I just execute `c...
@<1523701070390366208:profile|CostlyOstrich36>
I have a follow-up question for the first question.
I initiated a task, called get_local_copy on a dataset, and then executed and finished the task (training).
In the web UI, I don't see any information saying that the task and the dataset are related or linked.
What should I do to connect or link those two or find the information about it?
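One approach that may help with the question above, sketched under assumptions: `Dataset.get` accepts an `alias` argument, and as far as I know ClearML then records the alias together with the dataset ID in the current task's configuration, under a "Datasets" section. The alias `training_data`, the project name, and the task name below are hypothetical; it is worth checking the behavior on your own clearml version.

```python
def dataset_link_kwargs(dataset_id, alias="training_data"):
    """Arguments for Dataset.get; the alias is what ties the dataset to the task."""
    return {"dataset_id": dataset_id, "alias": alias}


def fetch_dataset(dataset_id, project="examples", task_name="training"):
    """Start a task, then fetch the dataset so the alias is logged on that task."""
    # Imported here so this module loads even where clearml is not installed.
    from clearml import Dataset, Task

    # Initialize the task first, so there is a current task to record the alias on.
    task = Task.init(project_name=project, task_name=task_name)
    dataset = Dataset.get(**dataset_link_kwargs(dataset_id))
    local_path = dataset.get_local_copy()
    return task, local_path
```

After the run, the dataset ID should be visible in the task's configuration in the web UI, which gives a way to trace which dataset a training task used.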
@<1523701070390366208:profile|CostlyOstrich36> Hi! Actually, I've changed it to run the training in a local environment now (not Docker)!
I ran a queued task from the web UI by clicking 'enqueue', and I got this error!
# Error logs
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manag...
I found this in the docs.
Am I not supposed to run the agent in Docker mode on a Windows computer?
But I would also like to know how to run this with Docker!
This is the log file!
Thanks a lot!!
@<1523701070390366208:profile|CostlyOstrich36>
My code is supposed to automatically clone an optimization task with template_task_id and execute each experiment. I didn't remove any lines from the logs.
When I run the code locally, I run it with a virtual environment activated. However, when I use clearml-agent daemon to execute the task, it seems to use a default Docker image, and I don't know how to change the corresponding settings in the clearml.conf file!
@<1523701070390366208:profile|CostlyOstrich36>
Actually, I've got another question about datasets!
I tried add_external_files from AWS S3 as a simple test.
And in the web UI, it says it's been uploading for 16 hours now.
The zip file I tried to upload is under 50MB.
Is something wrong here?
Also, I'm wondering if I could add files that are not "zipped" files, for example a directory containing various files.
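On the last point, a rough sketch assuming the standard `clearml.Dataset` calls: `add_files` takes a local path that can be a single file or a whole directory, and `add_external_files` registers links (for example an S3 folder) without copying the files. The dataset name, project, and source paths are placeholders. One possible reason for a dataset that appears to upload forever, not confirmed in this thread, is that it was never finalized.

```python
def is_remote_link(source):
    """True for object-storage or web links, False for local paths."""
    return source.startswith(("s3://", "gs://", "azure://", "http://", "https://"))


def create_dataset(name, project, sources):
    """Create a dataset from a mix of local paths and remote links."""
    # Imported here so this module loads even where clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    for source in sources:
        if is_remote_link(source):
            # Registers the link only; the files stay where they are.
            dataset.add_external_files(source_url=source)
        else:
            # A local path may be a single file or a directory of files.
            dataset.add_files(path=source)
    dataset.upload()    # uploads the local files added with add_files
    dataset.finalize()  # closes this dataset version
    return dataset.id
```

For example, `create_dataset("demo", "examples", ["data/images", "s3://my-bucket/archive.zip"])` would add a local directory and one external link, where both paths are made up for illustration.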