
Reputation
Badges 1
25 × Eureka!I'm hoping we are ready to release
seems like the server returned 400 error, verify that you are working with your trains-server and not the demoserver :)
@<1523701868901961728:profile|ReassuredTiger98> how did you install the nightly locally ?
Can you also provide the full log?
I'm not able to compare the tables of two experiments, is it a known issue ?
How so? they should appear one next to the other, the content of the two tables is not "really" compared, the standard is too complicated π (apparently this is far from trivial)
hey, that worked! what library is being used that reads that configuration?
It's passed to boto3, but the pyhon interface and aws cli use different configuration, I guess, because otherwise it should have worked...
Sorry that was a reply to:
Otherwise I could simply create these tasks with Task.init,
Okay there should not be any difference ... π
JitteryCoyote63 Hmmm in theory, yes.
In practice you need to change this line:
https://github.com/allegroai/clearml/blob/fbbae0b8bc933fbbb9811faeabb9b6d9a0ea8d97/clearml/automation/aws_auto_scaler.py#L78
` python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker} --gpus 0 --detached
python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker} --gpus 1 --detached
python -m clearml_agent --config-file '/root/clearml.conf' d...
Just to make sure I understand, running locally creates the Args/command correctly, then when actually executed on the remote machine (i.e. execute_remotely creates the correct Args/command But when the agent actually executes it) it updates back the Args/command as a list. Is that a correct description ?
in my repo I maintain a bash script to setup a separate python env.
Hmm interesting, now I have to wonder what is the difference ? meaning why doesn't the agent build a similar one based on the requirements ?
without the ClearML Server in-between.
You mean the upload/download is slow? What is the reasoning behind removing the ClearML server ?
ClearML Agent per step
You can use the ClearML agent to build a socker per Task, so all you need is just to run the docker. will that help ?
inΒ
Β issues a delete command to the ClearML API server,...
almost, it issues the boto S3 delete commands (directly to the S3 server, not through the cleaml-server)
And that I need to enter an AWS key/secret in the profile page of the web app here?Β (edited)
correct
I am trying to see if the user can submit a list of resource requirements (e.g 4GPUs, 12 cores, 100GB diskspace)
This will be quite easy to implement using the cleamrl k8s glue, just use user-properties and change the template based on it. I can point to where you need to modify the code
But I am starting to wonder whether It would be easier just changing sys,path on the scripts that use the sibling libs.
that depends, how would the sibling packages get to a remote machine ?
when I run it on my laptop...
Then yes, you need to set the default_output_uri
on Your laptop's clearml.conf (just like you set it on the k8s glue)
Make sense ?
. I'm trying to run to get a task to run using a specific docker image and to source a bash script before execution of the python script.
Are you running an agent in docker mode ? if so you should be able to see the Output of your bash script first thing in the log
(and it will appear in the docker CMD)
The idea of queues is not to let the users have too much freedom on the one hand and on the other allow for maximum flexibility & control.
The granularity offered by K8s (and as you specified) is sometimes way too detailed for a user, for example I know I want 4 GPUs but 100GB disk-space, no idea, just give me 3 levels to choose from (if any, actually I would prefer a default that is large enough, since this is by definition for temp cache only), and the same argument for number of CPUs..
Ch...
Hi EnviousStarfish54
Color coding on the entire UI is stored per user (I think that on your local cookies, but I might be wrong). Anyhow any title/series combination will have the select color regardless of the project.
This way you can configure once that loss is red and accuracy is green, etc.
it is just local copy so you can rerun and reconfigure
Nice, that seems to be the issue. Any chance you can open a GitHub issue, so we do not loose track of it ?
TenseOstrich47 as long as on the machine running the agent has credentials to your ECR, when the agent will run Any docker container, it will able to pull it. There is no need to manually change anything, notice the Task itself contains the name of the image it will use
Hover near the edge of the plot, the you should get a "bar" you can click on to resize
DeliciousBluewhale87 Yes I think so, do notice that you might end up with maximum of 12 pods.
You can also do the following with max 10 nodes: (notice --queue can always get a list of nodes it will pull based on the order of the queues)python k8s_glue_example.py --queue high_priority_q low_priority_q --ports-mode --num-of-services 10