These threads may also be interesting for you: https://clearml.slack.com/archives/CTK20V944/p1614867532303700 https://clearml.slack.com/archives/CTK20V944/p1617963053397600
Latest version for everything. I will message you again if I encounter this problem.
Okay. It works now. I don't know what went wrong before. Probably a user error 😅
Let me try it another time. Maybe something else went wrong.
Perfect! That sounds like a good solution for me.
I will create a minimal example.
Don't know whether I'm doing something wrong. Locally it works, but when executed via the queue I get:
```
  File "run_task.py", line 14, in <module>
    main()
  File "run_task.py", line 9, in main
    printme = importlib.import_module("some_package.file_to_import").printme
  File "/home/tim/.clearml/venvs-builds/3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd...
```
When experimenting, we use an entrypoint script to which we pass the specific experiment.
Nono, I got to thank you for this awesome tool!
@<1523701435869433856:profile|SmugDolphin23> Good catch. I have a good but unsatisfying message for you guys: I restarted the whole machine (server and agent) and now it works fine ...
With clearml==1.4.1 it works, but with the current version it aborts. Here is a log with the latest clearml.
Related to this: how does the local/agent cache work? Are the `sdk.storage.cache` parameters for the agent? When are datasets deleted from the cache? When are datasets deleted if I run local execution?
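(For context, the local cache location is set in `clearml.conf`; a minimal sketch, assuming the default key name from the config template. The exact size/limit options vary by version, so check the config reference for your installed release.)
```
# clearml.conf -- sketch of the SDK-side cache section
sdk {
    storage {
        cache {
            # where downloaded artifacts/datasets are cached locally
            default_base_dir: "~/.clearml/cache"
        }
    }
}
```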
Yea, is there a guarantee that the clearml-agent will not crash because it did not clean the cache in time?
But this seems like something that is not related to clearml 🙂 Anyways, thanks again for the explanations!
Thanks for the answer. So currently the cleanup is done based on the number of experiments that are cached? If I have a few big experiments, this could make my agent's cache overflow?
I mean, couldn't my hard drive become full at some point? Can clearml-agent currently detect this?
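(As a stopgap, one could monitor the cache volume manually; a minimal sketch using only the standard library. The `~/.clearml` path matches the agent directory seen in the traceback above, but the 10 GB threshold is an arbitrary assumption.)
```python
# Sketch: manual free-space check for the agent's cache volume.
# ~/.clearml is taken from the traceback above; the threshold is arbitrary.
import shutil
from pathlib import Path

cache_dir = Path("~/.clearml").expanduser()  # must already exist
free_gb = shutil.disk_usage(cache_dir).free / 1024**3
if free_gb < 10:
    print(f"Warning: only {free_gb:.1f} GB free on the cache volume")
```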
Makes sense, but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
To summarize: the scheduler should first assign tasks to the agent that gives their queue the highest priority.
No. Here is a better example. I have two types of workstations: type X can execute tasks of type A and B; type Y can only execute tasks of type B. This could be the case if, for example, type X workstations have more VRAM, newer drivers, etc.
I have two queues. Queue A and Queue B. I submit tasks of type A to queue A and tasks of type B to queue B.
Here is what can happen:
Enqueue the first task of type B. Workstations of type X will run this task. Enqueue the second task of type A. Workstation ...
I see. Thank you very much. For my current problem, giving priority according to queue priority would kinda solve it. For experimentation I will sometimes enqueue a task and then later enqueue another one of a different kind; what happens is that, even though this could be trivially solved, I have to wait for the first one to finish. I guess this is only a problem for people with small "clusters" where SLURM does not make sense, but no scheduling at all is also suboptimal.
However, I...
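(A sketch of the queue-priority setup discussed above: `clearml-agent daemon` can poll several queues, and the order on the command line sets the priority. The queue names here are assumptions.)
```
# Type X workstations can run both task types; they poll queue_a first:
clearml-agent daemon --queue queue_a queue_b

# Type Y workstations can only run type B tasks:
clearml-agent daemon --queue queue_b
```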
I will read up on the services documentation then. Thank you very much for the help 🙂
Ah, now I see. This sounds like a good solution.
Wouldn't it be enough to just require a call to `clearml-init` and throw an error when running without a `clearml.conf`, telling the user to run `clearml-init` first?
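(A minimal sketch of the suggested guard; this is a hypothetical check, not existing ClearML behavior. `~/clearml.conf` is the usual default location written by `clearml-init`.)
```python
# Hypothetical startup guard: refuse to run without a clearml.conf.
import sys
from pathlib import Path

if not (Path.home() / "clearml.conf").exists():
    sys.exit("No clearml.conf found. Please run `clearml-init` first.")
```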
Thank you very much!
Yes, that works fine. Just the http vs https was the problem. The UI in http://myclearmlserver.org/settings/webapp-configuration will automatically change `s3://<minio-address>:<port>` to `http://<minio-address>:<port>`. However, what I need is `https://<minio-address>:<port>`.
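(For anyone hitting the same issue: on the SDK side, https for a MinIO endpoint can be forced via the S3 credentials section of `clearml.conf`; a sketch assuming the documented key names, with placeholders for the actual values.)
```
# clearml.conf -- sketch of an S3/MinIO entry forcing https
# (<minio-address>, <port>, and the keys are placeholders)
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "<minio-address>:<port>"
                    key: "<access-key>"
                    secret: "<secret-key>"
                    secure: true   # use https instead of http
                    multipart: false
                }
            ]
        }
    }
}
```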