
Right, if this is the case, then just use 'title/name 001'
it should be enough (I think this is how TB separates title/series or metric/variant )
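For example, a minimal sketch of how the title/series split looks when reporting scalars (the project/task names and the "loss 001" title are just placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="title-series demo")  # placeholder names
logger = task.get_logger()

# "loss 001" is the title (your 'title/name 001'); "train" / "val" are the series
# plotted together on that graph
for i in range(10):
    logger.report_scalar(title="loss 001", series="train", value=1.0 / (i + 1), iteration=i)
    logger.report_scalar(title="loss 001", series="val", value=1.2 / (i + 1), iteration=i)
```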
This is part of a more advanced set of features of the scheduler, but it is only available in the enterprise edition 🙂
OddAlligator72 let's separate the two issues:
1. Continue reporting from a previous iteration
2. Retrieving a previously stored checkpoint
Now for the details:
Are you referring to a scenario where you execute your code manually (i.e. without the trains-agent) ?
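In the meantime, a rough sketch of the two flows as I understand them (assuming you use Task.init; the project/task names and the task id are placeholders):

```python
from clearml import Task

# 1) continue reporting into the same task, picking up from the last reported iteration
task = Task.init(
    project_name="examples",           # placeholder
    task_name="resumable training",    # placeholder
    continue_last_task=True,
)

# 2) retrieve a checkpoint stored by a previous run, e.g. via its output models
previous = Task.get_task(task_id="<previous_task_id>")   # placeholder id
checkpoint_path = previous.models["output"][-1].get_local_copy()
```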
okay let's PR this fix ?
Hi ObnoxiousStork61
Is it possible to report e.g. validation scalars, but shifted by 1/2 iteration?
No 😞 these are integers
What's the reason for the shift?
I'm also curious 🙂
Yes 🙂
BTW: do you guys do remote machine development (i.e. Jupyter / vscode-server) ?
The bug was fixed 🙂
but I cannot compare between them
I think we noticed it, and this will be fixed in the next server update (again, some plotly.js issue there)
(I suspect you are correct, but I'm missing some information in order to understand where the problem is)
WackyRabbit7 can you send mock code that explains how you create the pipeline ?
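Even something as minimal as this would help (a sketch assuming the PipelineController flow; all project / task names are placeholders):

```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="0.1")
pipe.add_step(name="stage_data", base_task_project="examples", base_task_name="data prep")
pipe.add_step(
    name="stage_train",
    base_task_project="examples",
    base_task_name="train model",
    parents=["stage_data"],
)
pipe.start()
```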
Hi MortifiedDove27
Looks like there is a limit of 100 images per experiment,
The limit is 100 images per unique combination of title/series.
This means that changing the title or the series name will add 100 more images (notice the 100 limit is on previous iterations of the same title/series)
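A minimal sketch of how title/series affects the image history (project/task names and series names are placeholders):

```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="image history demo")  # placeholders
logger = task.get_logger()

img = (np.random.rand(64, 64, 3) * 255).astype("uint8")

# each unique (title, series) pair keeps its own history of up to 100 iterations
logger.report_image(title="debug samples", series="batch 001", iteration=0, image=img)
logger.report_image(title="debug samples", series="batch 002", iteration=0, image=img)
```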
Hi QuaintPelican38
Can you ssh to {instance_public_ip_address}:10022 (something like ssh -p 10022 user@IP_HERE)?
Basically just getting the password prompt means you are okay.
I suspect that you have some AWS security definition (firewall) that prevents a direct access to the instance, could that be?
I set up the alert rule on this metric by defining a threshold to trigger the alert. Did I understand correctly?
Yes exactly!
Or the new metric should...
basically combining the two, yes looks good.
If you spin two agents on the same GPU, they are not aware of one another... so this is expected behavior.
Make sense?
GrievingTurkey78 sure, aws autoscaler can do that:
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
Hi EagerOtter28
The agent knows how to do the http->ssh conversion on the fly; in your clearml.conf (on the agent's machine) set force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/docs/clearml.conf#L25
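i.e. something along these lines in the agent section of clearml.conf (only the relevant key shown):

```
agent {
    # convert http(s) git urls to ssh on the fly
    force_git_ssh_protocol: true
}
```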
Ohh sorry. task_log_buffer_capacity is actually an internal buffer for the console output, i.e. how many lines it will store before flushing them to the server.
To be honest, I can't think of a reason to expose / modify it...
you should have something like 192.168... or 10.0 ....
See here:
https://pip.pypa.io/en/stable/user_guide/#environment-variables
Pass these environment variables as part of the YAML template you are using with k8s.
Should work for both 🙂
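For example, something along these lines in the pod template (PIP_INDEX_URL is just one of the pip environment variables from the link above; the container name, image and index URL are placeholders):

```yaml
containers:
  - name: clearml-agent                        # placeholder
    image: allegroai/clearml-agent-k8s         # placeholder
    env:
      - name: PIP_INDEX_URL
        value: "https://my.private.pypi/simple"   # placeholder index
```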
But the git apply failed; the error message is "xxx already exists in working directory" (xxx is the name of the untracked file)
DefeatedOstrich93 what's the clearml-agent version?
GrievingTurkey78
Both are now supported, they basically act the same way 🙂
and it will log the overrides + the final omegaconf
I get gaps in the graphs.
For example, the first time I run, I create a task and run a loop:
Hi SourOx12
Is this related to this one?
https://github.com/allegroai/clearml/issues/496
Guys I think I lost context here 🙂 what are we talking about? Can I help in any way?
Hmmm could you attach the entire log?
Remove any info that you feel is too sensitive :)
FiercePenguin76
So running the Task.init from the jupyter-lab works, but running the Task.init from the VSCode notebook does not work?
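Just to make sure we are talking about the same snippet, something like this in the first cell (project/task names are placeholders):

```python
from clearml import Task

# the exact same cell, once in jupyter-lab and once in the VSCode notebook
task = Task.init(project_name="examples", task_name="vscode notebook test")  # placeholders
print(task.id)
```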
That's a very neat solution! maybe there's a way to inject "Task.init" into the code through a plugin, or worst case push it into some internal base package, and only call it when the code is orchestrated automatically (usually there is an environment variable that is set to signal that, like CI_something)
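i.e. something along these lines inside that internal base package (ORCHESTRATOR_RUN is a made-up variable name, standing in for the CI_something above; the project/task names are placeholders):

```python
import os
from clearml import Task

def maybe_init_task():
    # only attach ClearML when the orchestrator launched the code
    # (ORCHESTRATOR_RUN is hypothetical, akin to the CI_something mentioned above)
    if os.environ.get("ORCHESTRATOR_RUN"):
        return Task.init(project_name="examples", task_name="auto-run")  # placeholders
    return None
```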
Well done man!
Regulatory reasons and proprietary data is what I had in mind. We have some projects that may need to be fully self hosted in the end
If this is the case then yes, go self-hosted, or talk to clearml sales to get the VPC option; SaaS is just not the right option
I might take a look at it when I get a chance but I think I'd have to see if ClearML is a good fit for our use case before I can justify the commitment
I hope it is 🙂