AgitatedDove14 Yes exactly! it is shown in the recording above
But I would need to reindex everything, right? Is that an expensive operation?
This https://discuss.elastic.co/t/index-size-explodes-after-split/150692 seems to say that with the _split API this situation happens and resolves itself after a couple of days - maybe the same is true in my case?
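To make the reindex question concrete, this is roughly what I mean - just a sketch using the Python Elasticsearch client, where the host is the one from my curl check further down and the index names are placeholders:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

# Connect to the remote ES cluster (placeholder host).
es = Elasticsearch(["http://internal-aws-host-name:9200"])

# Copy every document from the old index into the new one.
# The cost scales with the number of documents, so it can take a long
# time on large event indices.
reindex(es, source_index="old-index", target_index="new-index")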
I checked the commit date and branch, went to all experiments, and scrolled until I found the experiment
SuccessfulKoala55
In the docker-compose file, the apiserver service has environment settings for the Elasticsearch host and port (CLEARML_ELASTIC_SERVICE_HOST and CLEARML_ELASTIC_SERVICE_PORT) - changing those will allow you to point the server to another ES service
The ES cluster is running on another machine - how can I set its IP in CLEARML_ELASTIC_SERVICE_HOST? Would I need to add the host to the networks of the apiserver service somehow? How can I do that?
SuccessfulKoala55 Am I doing/saying something wrong regarding the problem of flushing every 5 secs (see my previous message)?
I don't think it is, I was rather wondering how you handled it, to understand potential sources of slowdown in the training code
I can also access these files directly if I enter the url in the browser
there is no error on this side; I think the AWS autoscaler just waits for the agent to connect, which will never happen since the agent won't start because the userdata script fails
Ok, so what worked for me in the end was:
config = task.connect_configuration(read_yaml(conf_path))
cfg = OmegaConf.create(config._to_dict())
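Spelled out as a standalone sketch (read_yaml was just a local helper; here I load the file with OmegaConf directly, and the path/project names are placeholders):

from clearml import Task
from omegaconf import OmegaConf

task = Task.init(project_name="examples", task_name="omegaconf-config")

# Load the YAML config and register it with the task as a plain dict,
# so it shows up in the UI and can be overridden on remote runs.
raw_cfg = OmegaConf.to_container(OmegaConf.load("conf.yaml"), resolve=True)
config = task.connect_configuration(raw_cfg)

# Rebuild an OmegaConf object from whatever the server returned,
# so the rest of the code can keep using OmegaConf accessors.
cfg = OmegaConf.create(config)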
but most likely I need to update the perms of /data as well
Without the envs, I got this error:
ValueError: Could not get access credentials for 's3://my-bucket', check configuration file ~/trains.conf
After using the envs, I got this error:
ImportError: cannot import name 'IPV6_ADDRZ_RE' from 'urllib3.util.url'
They indeed do auto-rotate when you limit the size of the logs
Ha I see, it is not supported by the autoscaler > https://github.com/allegroai/clearml/blob/282513ac33096197f82e8f5ed654948d97584c35/trains/automation/aws_auto_scaler.py#L120-L125
The host is accessible, I can ping it and even run curl "http://internal-aws-host-name:9200/_cat/shards" and get results from the local machine
Ok, in that case it probably doesn't work, because if the default value is 10 secs, it doesn't match what I get in the logs of the experiment: every second the tqdm adds a new line
Thanks for sharing the issue UnevenDolphin73, I'll comment on it!
I am not sure I can do both operations at the same time (migration + splitting), do you think it's better to do splitting first or migration first?
in the controller, I want to upload an artifact and then start a task that will query that artifact; I want to make sure the artifact exists by the time the task tries to retrieve it
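Something like this is what I have in mind (a rough sketch, assuming a recent clearml where upload_artifact accepts wait_on_upload; all names are placeholders):

from clearml import Task

controller = Task.init(project_name="examples", task_name="controller")

# Upload the artifact and block until it is actually stored,
# so the spawned task cannot race ahead of the upload.
controller.upload_artifact(name="dataset", artifact_object="data.csv", wait_on_upload=True)

# ... enqueue / launch the worker task here ...

# Inside the worker task, fetch the artifact back from the controller task.
parent = Task.get_task(task_id=controller.id)
local_path = parent.artifacts["dataset"].get_local_copy()

If wait_on_upload is not available, calling controller.flush(wait_for_uploads=True) right after the upload should have the same effect.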
I execute the clearml-agent this way:
/home/machine/miniconda3/envs/py36/bin/python3 /home/machine/miniconda3/envs/py36/bin/clearml-agent daemon --services-mode --cpu-only --queue services --create-queue --log-level DEBUG --detached
Hi CostlyOstrich36, I am not using Hydra, only OmegaConf, so you mean just calling OmegaConf.load should be enough?
Yea, that's what I thought, I do have trains-server 0.15
I think clearml-agent tries to execute /usr/bin/python3.6 to start the task, instead of using the python that was used to start clearml-agent
My bad, alpine is so light it doesn't have bash
AgitatedDove14 So what you are saying is that since I have trains-server 0.16.1, I should use trains>=0.16.1? And what about trains-agent? Only version 0.16 is released atm, this is the one I use
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
AgitatedDove14 It was only on comparison as far as I remember