Okay, that kind of makes sense. Now my followup question is: how are you using the ASG? I mean the clearml autoscaler does not use it, so I just wonder what the big picture is, before we solve this little annoyance 🙂
RobustRat47 I think you have to use the latest clearml package for that (1.6.0)
Are there any services OOB like this?
On the open-source side, I can't recall any, but it would probably be easy to write. The paid tier might have an offering though, not sure 🙂
TrickyRaccoon92 the title provided by write.scalars also serves as the identifying string for the specific metric. It is more than just a title on the plot itself.
It means that this will be the name of the scalar metric (the title/series combination).
Is that your intention, or is it for viewing purposes only?
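For reference, a minimal sketch of the equivalent explicit call in the ClearML SDK, where the title/series pair is the metric's identity (project/task/metric names here are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-demo")
logger = task.get_logger()

# The title+series pair identifies the metric, not just how it is displayed
for iteration in range(10):
    logger.report_scalar(title="loss", series="train",
                         value=1.0 / (iteration + 1), iteration=iteration)
```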
Go to the Workers & Queues page, right side panel, 3rd icon from the top
In the UI you can edit the base container image + add "SETUP SHELL SCRIPT", with any missing "apt update && apt-get install -y ..."
What should have happened is that the experiments would be pending (i.e. in a queue).
(Not sure why they are not.)
You can manually send them for execution: right click on an experiment in the table, select Enqueue, and select the default queue (this will be the one the trains-agent pulls from, by default)
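If you prefer doing it from code instead of the UI, a minimal sketch (the task ID is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")
Task.enqueue(task, queue_name="default")  # the queue the agent pulls from
```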
I am very confused now. I tried switching to my local machine and changed the clearml.conf.
It only partly worked:
Notice that Dataset.get(...) downloads an artifact that was uploaded before; basically it gets the full URL and downloads the data. It seems the original dataset was uploaded to "localhost:8081", could that be the case?
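For reference, a minimal sketch of the call in question (project/dataset names are placeholders); get_local_copy() downloads from whatever URL was recorded at upload time:
```python
from clearml import Dataset

dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = dataset.get_local_copy()  # resolves the stored URL and downloads the data
print(local_path)
```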
nfs version 3
That's the thing: NFS will automatically set file access and flags based on the mount options; you cannot change them post mount.
How about creating a new user just for the agent? It makes sense from a security / credentials perspective
as a backup plan: is there a way to have an API key set up prior to running docker compose up?
Not sure I follow, the clearml API pair is persistent across upgrades, and the storage access tokens are unrelated (i.e. also persistent), what am I missing?
SuccessfulKoala55 please post here once the code is available in your pytorch_ignite 🙂
What do you have under the "installed packages" ?
Hi UpsetTurkey67
The status that you see on the graph is fetched from the pipeline itself (for example, cached). I think what happened is that the pipeline logic has yet to update itself on the status of the running component. If the pipeline is indeed running, it should update the status shortly (actually, you can set the polling frequency for that). If for some reason the pipeline Task died, then indeed this is an odd state (that we should probably fix in the UI)
Basically, this would allow blocking the machine from being scaled-in while a Task is still running
Oh this is what I was missing 🙂 That makes sense to me!
So what you are saying is that the AWS autoscaler agent, when it is launching a Task, will set the "protection flag" inside the container, and when the Task ends, will unset the "protection flag".
Is that correct?
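If so, a minimal sketch of flipping that flag with boto3 (instance ID and ASG name are placeholders):
```python
import boto3

asg = boto3.client("autoscaling")

def set_scale_in_protection(instance_id: str, asg_name: str, protected: bool) -> None:
    # Protect the instance while a Task runs; unset when the Task ends
    asg.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=protected,
    )

# set_scale_in_protection("i-0123456789abcdef0", "clearml-asg", True)   # Task starts
# set_scale_in_protection("i-0123456789abcdef0", "clearml-asg", False)  # Task ends
```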
BeefyCow3 On the plot itself, click on the JSON download button
I think the reason is that the "original" task is already the right type. I'll make sure we fix it, and always set the system tag
Makes total sense!
Interesting, you are defining the sub-component inside the function. I like that, it makes the code closer to how it is executed!
…every user in the server has the same credentials, and they don’t need to know them… makes sense?
Makes sense: single credentials for everyone, without the need to distribute them.
Is that correct?
So this should be easier to implement, and would probably be safer.
You can basically query all the workers (i.e. agents) and check if they are running a Task; then, if they are not (for a while), remove the "protection flag"
wdyt?
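Something like this, as a rough sketch using the clearml APIClient (exact worker fields may vary between server versions):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    running_task = getattr(worker, "task", None)
    if running_task is None:
        # Worker is idle: candidate for removing the "protection flag"
        print(f"{worker.id} is idle")
```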
Could you try to clone the clearml git repo, create a new notebook in it, and test?
https://clear.ml/docs/latest/docs/references/sdk/task#mark_stopped
Maybe we should add an argument so you could do:
mark_stopped(force=False, message='it was me who stopped it')
And we will automatically add the user name as well?
Can I delete logs from existing experiments on the ClearML server?
Only by resetting the Task (which would delete everything), or deleting the Task itself.
You can also disable the auto console log and report manually:
Task.init(..., auto_connect_streams=False)
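For example, a minimal sketch assuming a clearml version that supports auto_connect_streams (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="manual-logging",
    auto_connect_streams=False,  # disable automatic console capture
)
# Only what you report explicitly ends up in the console log
task.get_logger().report_text("my manually reported log line")
```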
Is there a built-in programmatic way to adjust development.default_output_uri?
How about: In your Task.init(output_uri='...')
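A minimal sketch (the bucket URL is a placeholder):
```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="output-uri-demo",
    output_uri="s3://my-bucket/clearml",  # overrides development.default_output_uri for this task
)
```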
And maybe adding idle time spent without a job to the API is not that bad an idea 😉
yes, adding that to the feature list 🙂
What if I write the last active state in an instance tag? This could be a solution…
I love this hack, yes this should just work.
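A minimal sketch of the tag write with boto3 (instance ID and tag key are placeholders):
```python
import time
import boto3

ec2 = boto3.client("ec2")
# Record the last-active timestamp as an instance tag
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],
    Tags=[{"Key": "clearml-last-active", "Value": str(int(time.time()))}],
)
```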
BTW: if your lambda is a for loop that is constantly checking, there is no need to actually store the "last idle timestamp check" as a tag, no?
@<1523701079223570432:profile|ReassuredOwl55>
Hey, here’s a quickie – is it possible to specify different “types” of input parameters (“Args/…“) such that they are handled nicely on the front end?
You mean cast / checked in the UI?
LudicrousParrot69 this is an implementation issue: this entire page is based on "task comparison"; a single Task means a totally different interface for querying the data 🙂
OddAlligator72 quick question:
suggest that you implement a simple entry-point API
How would the system get the correct packages / git repo / arguments if you are only passing a single function entrypoint?
You might be able to write a script to override the links ... wdyt?