I mean manually you can get the results and rescale, but not through the UI
PompousBeetle71 a few questions:
Is this like using PyTorch distributed, only manually? Why don't you call trains.init in all the sub-processes? We had a few threads on that, it seems like a recurring question, I'll make sure we have an example on GitHub. Basically trains will take care of passing the arg-parser commands to the sub-processes, and also of the torch node settings. It will also make sure they all report to the same experiment. What do you think?
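A minimal sketch of what I mean (assumptions: torch.multiprocessing spawn-style workers, and the project/task names are placeholders):

import torch.multiprocessing as mp
from trains import Task

def worker(rank, world_size):
    # Calling Task.init in every subprocess; trains detects it is a child of the
    # main process and attaches its reports to the same experiment
    Task.init(project_name="examples", task_name="distributed run")
    # ... initialize torch.distributed for this rank and run the training loop ...

if __name__ == "__main__":
    Task.init(project_name="examples", task_name="distributed run")
    mp.spawn(worker, args=(2,), nprocs=2)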
PompousParrot44 Enterprise licensing pricing is usually custom-tailored to the size of the company and based on usage. If you are interested, feel free to leave your details in the "contact us" form on the website, and someone from sales will contact you shortly after.
I meant even just a link to a blank comparison and one can then add the experiments from that view
Just making sure you are aware, once you are in comparison you can always add Tasks (any Task):
Notice you can press the "Add experiments" button, then select any experiment (including from all projects! see the filters)
Notice you need to remove all filters (the red x on the right side of the filter icon)
and the clearml server version ?
JitteryCoyote63 This is odd, you have both python3.9 and python3.8 on the container, and since the task says (probably) that the agent should run python3.9, it is trying to use it for creating the environment (it does not matter that the agent itself is using python3.8).
1673431344706 agent-1 DEBUG /usr/bin/python3.9 /usr/bin/python3.9: No module named pip
This is the main issue: pip is missing for python3.9, and this is why it reverted to python3.8 when it was setting up the environment.
It should prob...
But adding a simple force_download flag to the get_local_copy
That sounds like a good idea
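A quick usage sketch of the proposed flag (assuming it lands on Artifact.get_local_copy; the task id and artifact name are placeholders):

from clearml import Task

task = Task.get_task(task_id="<task-id>")
artifact = task.artifacts["my_artifact"]
# force_download=True would skip the local cache and always re-download the file
local_path = artifact.get_local_copy(force_download=True)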
JitteryCoyote63 good news
not a trains-server error, but a trains validation error; this is easily fixed and deployed
Okay found it, ElegantCoyote26 the step name is changed but the Task name remains the same ... 🙂
I'll make sure we fix it on the next version
Hi LittleShrimp86
just to log in to your clearml app (demo or server) so I can run python files related to clearml.
I think this amounts to creating a Task and enqueueing it, am I understanding correctly ?
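If so, a minimal sketch (assuming the repo / script / queue names are placeholders):

from clearml import Task

# Create a Task from an existing script in a repository (without running it locally)
task = Task.create(
    project_name="examples",
    task_name="run my script",
    repo="https://github.com/me/my_repo.git",
    script="train.py",
)
# Push it into a queue so an agent will pick it up and execute it
Task.enqueue(task, queue_name="default")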
Hmm can you run the agent in debug mode, and check the specific console log?
clearml-agent --debug daemon --foreground ...
Hi @<1523701181375844352:profile|ExasperatedCrocodile76>
The docker containers should get the host IP, not the internal docker IP. What am I missing?
Is it not possible to say just look at my requirements.txt file and the imports in the script?
I think there is a GitHub Issue for this feature
(basically the issue is, requirements.txt are very often not updated, and have no real version lock, so replicating a working env is always safer)
LOL, could it be there is no remote repository?
Okay I have an idea, it could be a lock that another agent/user is holding on the cache folder or similar
Let me check something
The easiest way would be to rename a queue to "1xgpu 16gb", then make sure only machines with >16gb GPUs listen to it.
Note that an agent can listen to multiple queues
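For example (a sketch, the second queue name is a placeholder):

clearml-agent daemon --queue "1xgpu 16gb" dual_gpu --gpus 0

An agent launched like this will pull jobs from both queues, in the order they are listed.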
Do I set the CLEARML_FILES_HOST to the endpoint instead of an s3 bucket?
Yes, you are right, this is not straightforward:
CLEARML_FILES_HOST="s3://minio_ip:9001"
Notice you must specify the port, this is how it knows this is not AWS. I would avoid using an IP and register the minio as a host on your local DNS / firewall. This way if you change the IP the links will not get broken 🙂
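You will also need to point the SDK at the minio credentials, something like this in clearml.conf (a sketch, the host / key / secret values are placeholders):

sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "minio_ip:9001"
                    key: "minio_access_key"
                    secret: "minio_secret_key"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}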
What's the difference between the example pipeline and this code?
Could it be the "parents" argument? What is it?
I'm not familiar with this one, I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
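For example, setting it when launching the agent (a sketch, I'm not 100% sure of the exact effect):

CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR=5 clearml-agent daemon --queue default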
Hi SourSwallow36
What do you mean by "Log each experiment separately"? How would you differentiate between them?
TenseOstrich47
I noticed that with one agent, only one task gets executed at one time
Yes you can π
Also, you are correct, a single agent will run a single Task at a time. That said, you can have multiple agents running on the same machine, and when you launch them you specify which GPUs they use (in theory they can share the same GPU, but your code might not like it 🙂)
You can see a few examples here:
https://github.com/allegroai/clearml-agent#running-the-clearml-agent
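For example, two agents on the same machine, each with its own GPU (a sketch, the queue name is a placeholder):

clearml-agent daemon --detached --queue default --gpus 0
clearml-agent daemon --detached --queue default --gpus 1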
Hi @<1630377234361487360:profile|RoughSeaturtle43>
code from gitlab repo with ssl cert.
what do you mean by ssl secret? is it SSH or app-token ?
NastyFox63 ask SuccessfulKoala55 tomorrow, I think there is a way to change the default settings even with the current version.
(I.e. increase the default 100 entries limit)
is no agent listening to the "k8s_scheduler"
There should not be one, this is purely "virtual", so users understand the k8s cluster is spinning up their pod (sometimes it takes time, imagine EKS etc., it's just for visibility)
unfortunately I can't get info from the cluster
You should be able to see the pod in the cluster, no?!
What does the Task Info panel say, can you share a screenshot?