Basically, if I pass an arg with a default value of False (a bool), it originally runs fine, since it just accepts the default value.
I think this is the nargs="?", is that right?
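A minimal sketch of the situation being described (the argument name here is made up): with nargs="?" the value is optional, so omitting the flag gives the default, and passing the flag alone gives const. There is also a classic gotcha when a string value is passed through type=bool.

```python
import argparse

# hypothetical argument, just to illustrate the nargs="?" behavior
parser = argparse.ArgumentParser()
# nargs="?": no flag -> default, flag with no value -> const,
# flag with a value -> that value passed through type
parser.add_argument("--use-cache", nargs="?", default=False, const=True, type=bool)

print(parser.parse_args([]).use_cache)               # False (default accepted)
print(parser.parse_args(["--use-cache"]).use_cache)  # True (const)
# gotcha: bool("False") is True, so an explicit string value
# does not round-trip the way you might expect
print(parser.parse_args(["--use-cache", "False"]).use_cache)  # True
```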
Which means you currently save the arguments after resolving them, and I'm looking to save them explicitly so the user will not forget to change some dependencies.
That is correct
I'm looking to save them explicitly so the user will not forget to change some dependencies.
Hmm, interesting point. What's the use case for storing the values before resolving?
Do we want to store both ?
The main reason for storing the post-resolve values is that you have full visibility into the actual...
JuicyFox94 maybe you can help here?
Hmm I see, if this is the case, would it make sense to run the pipeline logic locally? (notice the pipeline compute, i.e. the components will be running on remote machines with the agents)
Check on which queue the HPO puts the Tasks, and if the agent is listening to these queues
Ohh then you do docker sibling:
Basically you map the docker socket into the agent's docker container, which lets the agent launch another docker container on the host machine.
You can see an example here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L144
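As a rough sketch of the pattern the linked compose file shows (the service name here is illustrative, not taken from the actual file):

```yaml
# "sibling" docker pattern: bind-mount the host's docker socket into the
# agent's container, so containers the agent launches run on the host itself
services:
  agent-services:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```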
WittyOwl57 I can verify the issue reproduces!
And I know what happens: TQDM is sending an "up arrow" key. If you are running inside bash, that looks like a CR (i.e. move the cursor to the beginning of the line), but when running inside other terminals (like PyCharm or the ClearML log) this "arrow key" is just a unicode character to print, it does nothing, and we end up with multiple lines.
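A minimal sketch of the behavior described above, using a plain "\r" as the control character: a real terminal interprets it and redraws one line, while a log viewer that just stores the raw stream ends up showing one line per update.

```python
import io

# capture what a progress bar actually writes to the stream
stream = io.StringIO()
for i in range(1, 4):
    stream.write(f"\rTraining: {i * 10}%")  # "\r" moves the cursor to line start

raw = stream.getvalue()
# a terminal overwrites in place; a viewer that ignores "\r"
# effectively renders each update as its own line:
lines = [chunk for chunk in raw.split("\r") if chunk]
print(lines)  # ['Training: 10%', 'Training: 20%', 'Training: 30%']
```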
Let me see if we can fix it
Hmm worked now...
When Task.init is called with output_uri='s3://my_bucket/sub_folder', the file ends up at s3://my_bucket/sub_folder/examples/upload issue.4c746400d4334ec7b389dd6232082313/artifacts/test/test.json
hmm, yes it should create the queue if it's missing (btw you could work around that and create it in the UI). Any chance you can open a github issue in the clearml helm chart repo so we do not forget ?
it's not implemented right,
I think we forgot to add it as an argument (the models query supports it, but it is not passed to the call)
ZanyPig66 this should have worked, any chance you can send the full execution log (in the UI "results -> console" download full log) and attach it here? (you can also DM it so it is not public)
Hi SteadyFox10, unfortunately trains-agent currently supports only docker as a container solution (I guess it became the de-facto standard)
That said, there is the option of a virtual environment, where the trains-agent installs everything inside a newly created virtual environment. That actually makes it quite easy to expand to other use cases. Essentially the docker option will spin up a docker container, install trains-agent inside it, and run the execute command.
Do you feel...
But it does make me think, if instead of changing the optimizer I launch a few workers that "pull" enqueued tasks, and then report values for them in such a way that the optimizer is triggered to collect the results? would it be possible?
But this is Exactly how the optimizer works.
Regardless of the optimizer (OptimizerOptuna or OptimizerBOHB) both set the next step based on the scalars reported by the tasks executed by agents (on remote machines), then decide on the next set of para...
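A toy sketch of that loop (this is not the actual clearml implementation; the function names and the "objective" are made up): the optimizer only ever sees the scalars reported back by executed tasks, and uses them to decide the next parameter set.

```python
import random

random.seed(0)

def run_task(lr):
    # stand-in for a task executed remotely by an agent,
    # which reports back a single scalar
    return (lr - 0.1) ** 2  # toy objective, best at lr == 0.1

def suggest_next(history):
    # stand-in for the optimizer step: pick the next candidate
    # based only on previously reported scalars
    if not history:
        return random.uniform(0.0, 1.0)
    best_lr, _ = min(history, key=lambda h: h[1])
    return best_lr + random.uniform(-0.05, 0.05)  # search near the best so far

history = []  # (params, reported scalar) pairs the optimizer collected
for _ in range(20):
    lr = suggest_next(history)
    history.append((lr, run_task(lr)))

best_lr, best_loss = min(history, key=lambda h: h[1])
print(round(best_lr, 3), round(best_loss, 4))
```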
MelancholyElk85 I'm assuming you have the agent setup and everything in the example code works, is that correct ?
Where is it failing on your pipelines ?
WittyOwl57 this is what I'm getting on my console (notice the new lines! not a single one is overwritten, as I would have expected)
` Training: 10%|█         | 1/10 [00:00<?, ?it/s]
Training: 20%|██        | 2/10 [00:00<00:00, 9.93it/s]
Training: 30%|███       | 3/10 [00:00<00:00, 9.89it/s]
Training: 40%|████      | 4/10 [00:00<00:00, 9.87it/s]
Training: 50%|█████     | 5/10 [00:00<00:00, 9.87it/s]
Training: 60%|██████    | 6/10 [00:00<00:00, 9.88it/s]
Training: 70%|███████   | 7/10 [00:00<00...
Hi @<1523722267119325184:profile|PunySquid88> I guess it's a good thing we talk, because I believe that what you are looking for is already available :)
Logger.current_logger().report_media('title', 'series', iteration=1337, local_path='/tmp/bunny.mp4')
This will actually work on any file, that said, the UI might display the wrong icon (which will be fixed in the next version).
We usually think of artifacts as data you want to reuse, so all the files uploaded there are accessibl...
Hi @<1715900788393381888:profile|BitingSpider17>
Notice that you need __ (double underscore) for converting "." in the clearml.conf file,
this means agent.docker_internal_mounts.sdk_cache will be CLEARML_AGENT__AGENT__DOCKER_INTERNAL_MOUNTS__SDK_CACHE
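A hypothetical helper just to illustrate the conversion rule described above: prefix with CLEARML_AGENT, replace every "." with "__", and uppercase.

```python
def conf_to_env(conf_path: str) -> str:
    # clearml.conf path -> agent environment-variable override name
    return "CLEARML_AGENT__" + conf_path.replace(".", "__").upper()

print(conf_to_env("agent.docker_internal_mounts.sdk_cache"))
# CLEARML_AGENT__AGENT__DOCKER_INTERNAL_MOUNTS__SDK_CACHE
```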
Then running by using the
, am I right?
yep
I have put the --save-period flag while running Yolov5, but ClearML does not save the weights per epoch that I trained. Why does this happen?
But do you still see it in the clearml UI? Do you see the models logged in the clearml UI?
You will be able to set it.
You will just not see the output in the console log, but everything is running and being executed
I guess I got confused since the color choices in
One of the most beloved features we added
According to you the VPN shouldn't be a problem right?
Correct, as long as all parties are on the same VPN it should work; all the connections are always HTTP, so it is basically trivial communication
why would root cause the user to become nobody with group nogroup?
It is exactly the case: they inherit the cron service user (uid/gid), which would look like nobody/nogroup
I think prefix would be great. It can also make it easier for reporting scalars in general
Actually those are "supposed" to be collected automatically by pytorch and reported by the master node.
Currently we need a barrier to sync all nodes before reporting a scalar, which makes it slower.
Also "should" be part of pytorch ddp
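A toy sketch of the barrier pattern being described, using threads as a stand-in for DDP ranks and threading.Barrier as a stand-in for torch.distributed.barrier(): every rank must reach the sync point before the master reports the scalar, which is exactly the slowdown mentioned above.

```python
import threading

NUM_RANKS = 4
barrier = threading.Barrier(NUM_RANKS)  # stand-in for torch.distributed.barrier()
reported = []

def train_step(rank, value):
    # ... each rank would compute its share of `value` here ...
    barrier.wait()   # sync point: every rank waits for the slowest one
    if rank == 0:    # only the master node reports the scalar
        reported.append(value)

threads = [threading.Thread(target=train_step, args=(r, 0.5)) for r in range(NUM_RANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(reported)  # [0.5]
```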
It's launched with torchrun
I know there is an integration with torchrun (the under the hood infrastructure) effort, I'm not sure where it stands....
btw: I'm assuming that args is not the ArgParser object, as the ArgParser is automatically "connected" ?
SmugLizard25 are you saying that with the latest version it does not work?
Thanks StrongHorse8
Where do you think would be a good place to put a more advanced setup? Maybe we should add an entry for DevOps? Wdyt?
UnevenDolphin73 something like this one?
https://github.com/allegroai/clearml/pull/225