 
FriendlySquid61 Your solution seems to have solved the problem, but only after I removed the export CLEARML_API_HOST={api_server}, export CLEARML_WEB_HOST={web_server}, and export CLEARML_FILES_HOST={files_server} commands from the bash script executed when the instance is launched
Yes, the workaround is working 🙂
Hi TimelyPenguin76, I used api_client.tasks.create and it works, thank you!
Hi AgitatedDove14, do you mean the k8s glue autoscaler here https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py ? If yes, I understood that this service deploys pods on the nodes in the cluster, but I'd prefer to have a new instance deployed for each new experiment, and to have it terminate when no new experiments are queued
Actually I had the same issue even with that value set to False
I've just seen it is a known issue https://clearml.slack.com/archives/CTK20V944/p1611763839133700 . Has a new version been released in the meantime?
Also, if I want to modify another parameter, e.g. ui.height I have this problem:
Hi AgitatedDove14, sorry for the late reply. Btw, I tried with the latest RC and the issue is still there. If I clone an experiment and modify an override parameter, e.g. ['training.max_epochs=10'], my experiment runs with the old configuration. So it seems that the override doesn't change the OmegaConf configuration.
Hi AgitatedDove14, you can try with this toy example. If I run the task with python example.py ui.width=2048 the task will run correctly and print Title=My app, size=2048x768 pixels. However, in the UI I'm not allowed to change ui.width in the Hydra parameters section: the 'Save' button is frozen
Because at the moment I'm having a problem with the s3fs package where I have it in my requirements.txt but the import manager at the entry point doesn't install it
My problem right now is that PyTorch Lightning needs the s3fs package to store model checkpoints in S3 buckets, but it is not listed in my "installed packages", so I get an import error
Ok, now I noticed that if I change the value of the port inside the Hydra parameters section (not the overrides), it does actually change in the experiment too. The overrides don't seem to be working
However, if I edit the OmegaConf directly in the UI, then the port changes correctly. I'd still prefer to override the Args so I can change an entire sub-configuration, e.g. ['dataset=cifar'] to ['dataset=imagenet'], instead of having to change all the parameters inside the OmegaConf
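To make the expected behaviour concrete, here is a minimal plain-Python sketch of what applying a list of overrides to a nested configuration should do. It is only an illustration: `apply_overrides` and `parse_value` are hypothetical helpers, not part of Hydra, OmegaConf, or ClearML, and it only handles simple `key=value` items, not config-group swaps like `dataset=imagenet`.

```python
from copy import deepcopy


def parse_value(raw):
    """Keep integers as integers; leave everything else as a string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_overrides(config, overrides):
    """Return a copy of `config` with 'dotted.key=value' overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        # Walk (or create) the nested dictionaries down to the leaf key.
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


base = {"db": {"host": "localhost", "port": 3306}, "ui": {"width": 1024, "height": 768}}
updated = apply_overrides(base, ["db.port=43", "ui.width=2048"])
print(updated["db"]["port"], updated["ui"]["width"])  # 43 2048
```

With this behaviour, a cloned task whose overrides were edited would see the new values, while the original configuration stays untouched.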
Nice, I didn't know that 🙂
Nice, I'll try also with the extra_bash_script, thank you!
Hi TimelyPenguin76 , I tried your approach and it works, thank you! However it's a bit different to what I was trying to do: instead of cloning an existing task I'd like to specify the repository and a specific commit tag to use as it is done in Task.create. If this is possible with the API client it would be perfect
Hi  AgitatedDove14 , thank you for your answer!
At the moment I can't configure both internal and external access with the same DNS. Before changing the server infrastructure, I'm trying a workaround: I upload the artifact with the internal file server path, and then I upload a second, string artifact containing the first artifact's URL with the internal DNS replaced by the external DNS, and I use that to download the artifact from the UI.
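The URL rewriting step of this workaround could look roughly like the sketch below. The host names and the `to_external_url` helper are made up for illustration, and the sketch assumes the URL carries no user:password part.

```python
from urllib.parse import urlsplit, urlunsplit


def to_external_url(internal_url, internal_host, external_host):
    """Swap the host of `internal_url` if it matches `internal_host`."""
    parts = urlsplit(internal_url)
    if parts.hostname != internal_host:
        # Not an internal file server URL: leave it unchanged.
        return internal_url
    # Preserve an explicit port, if the original URL had one.
    netloc = external_host if parts.port is None else f"{external_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical internal/external host names.
internal = "http://files.internal.example:8081/project/task.123/artifacts/data/data.csv"
external = to_external_url(internal, "files.internal.example", "files.example.com")
print(external)
```

The returned string is what would be stored as the second, string artifact.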
Yes it was set to nvidia/cuda:10.1-runtime-ubuntu18.04... ok I'll try again and see if that was the problem, thank you
Hi AgitatedDove14, FriendlySquid61! I managed to grant permission to the AWS autoscaler to spin up instances using the instance profile, as suggested by FriendlySquid61. The instances are created and terminated correctly, however the new instances don't execute the queued task and shut down immediately. I noticed that the clearml credential at
self.web_server = Session.get_app_server_host()
self.api_server = Session.get_api_server_host()
self.files_server = S...
I also removed 'sudo' from all the commands as is suggested in https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html but that wasn't the cause of the problem
Hi AgitatedDove14, what I meant is whether it is possible to associate the autoscaler's EC2 instances with an IAM role in order to grant permissions to applications running on those instances, for example access to S3 buckets that can be reached only with a certain IAM role's permissions. I'm not completely sure that what I'm saying makes sense, but I'm referring to something similar to what is specified here https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role...
Hi AgitatedDove14, I noticed that in the Hydra parameters section it is not possible to add parameter keys containing dots: ".(dot) $(dollar) and space are not allowed in parameter key." However, it's very useful to add parameters with a dot to change something in a sub-configuration, for example training.max_epochs=10. Do you think it's possible to allow this?
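For reference, the restriction in that message amounts to a check along these lines. This is only a guess at the rule as worded in the UI error, with hypothetical names, not ClearML's actual validation code.

```python
# Characters the UI message says are not allowed in a parameter key.
FORBIDDEN_CHARS = (".", "$", " ")


def invalid_chars(key):
    """Return the forbidden characters found in `key`, in a fixed order."""
    return [ch for ch in FORBIDDEN_CHARS if ch in key]


print(invalid_chars("training.max_epochs"))  # ['.']
print(invalid_chars("max_epochs"))           # []
```

Under such a rule, any dotted Hydra key is rejected, which is why sub-configuration parameters cannot be added from the UI.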
Please let me know if my explanation is not really clear
AgitatedDove14 that seems like the best option. Once the aws autoscaler is inside a docker container I can deploy it inside a kube pod or a job. This, however, requires that I slightly modify the clearml helm chart with the aws-autoscaler deployment, right?
```python
from clearml import Task
from dataclasses import dataclass
import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf


@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306


@dataclass
class UserInterface:
    title: str = "My app"
    width: int = 1024
    height: int = 768


@dataclass
class MyConfig:
    db: MySQLConfig = MySQLConfig()
    ui: UserInterface = UserInterface()


cs = ConfigStore.instance()
cs.store(name="config", n...
```
I created this toy example so you don't need any external conf files. Btw, if I first launch the task as python example.py port=80, then the task will print the message "Is this a webserver" correctly. If I then clone the same task in the UI, override the port with ['port=43'], for example, and run the experiment, I will still get the message "Is this a webserver", so the port didn't change
```python
# ClearML - Hydra Example
from clearml import Task
from dataclasses import dataclass
import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf


@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306


cs = ConfigStore.instance()
# Registering the Config class with the name 'config'.
cs.store(name="config", node=MySQLConfig)


@hydra.main(config_name="config")
def my_app(cfg: MySQLConfig) -> None:
    # type (DictConfig) -> None
    ...
```