This https://discuss.elastic.co/t/index-size-explodes-after-split/150692 seems to say that this situation happens with the _split API and resolves itself after a couple of days, maybe it's the same case for me?
Thanks! I would like to use this opportunity to split the indices into multiple shards, as explained here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html#indices-split-index
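For reference, this is roughly what the split looks like end to end (host, index names and shard count below are placeholders, not my actual setup): the source index first has to be write-blocked, then the _split request creates the target with more primary shards.

```python
import requests

ES = "http://localhost:9200"  # placeholder: local cluster

# 1. The source index must be made read-only before it can be split
requests.put(f"{ES}/my-source-index/_settings",
             json={"settings": {"index.blocks.write": True}})

# 2. Split into a new index with more primary shards
#    (the target count must be a multiple of the source's shard count)
requests.post(f"{ES}/my-source-index/_split/my-target-index",
              json={"settings": {"index.number_of_shards": 4}})
```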
Ok, I got the following error when uploading the table as an artifact: ValueError('Task object can only be updated if created or in_progress')
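As far as I understand, that message comes from the server when the task is no longer in the created/in_progress state, e.g. if the table gets uploaded after the task was closed or completed. A minimal sketch of what I mean (project/task names and the table are made up):

```python
from trains import Task  # `clearml` in newer releases
import pandas as pd

task = Task.init(project_name="examples", task_name="artifact upload")
table = pd.DataFrame({"col_a": [1, 2], "col_b": [3, 4]})

# must run while the task is still created/in_progress,
# i.e. before task.close() or task completion
task.upload_artifact("my table", artifact_object=table)
```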
So the problem comes when I do my_task.output_uri = "s3://my-bucket": trains, in the background, checks whether it has access to this bucket and is not able to find/read the creds
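For context, this is roughly the pattern I'm using (the bucket name is a placeholder; credentials are expected either under sdk.aws.s3 in ~/trains.conf or via the standard AWS_* environment variables):

```python
from trains import Task  # `clearml` in newer releases

task = Task.init(project_name="examples", task_name="s3 output")

# placeholder bucket; access to it is validated as soon as output_uri is set,
# which is where it fails to find/read the credentials for me
task.output_uri = "s3://my-bucket"
```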
the reindexing operation showed no error and copied everything
Thanks! Unfortunately still not working, here is the log file:
What is this cleanup service? Where is it available?
Without the envs, I had the error: ValueError: Could not get access credentials for 's3://my-bucket', check configuration file ~/trains.conf
After using the envs, I got the error: ImportError: cannot import name 'IPV6_ADDRZ_RE' from 'urllib3.util.url'
PS: in the new env, I've set num_replicas: 0, so I'm only talking about primary shards…
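(For reference, one way to set that on an existing index is via the settings API; host and index name below are placeholders:)

```python
import requests

# placeholder host and index name
requests.put("http://localhost:9200/my-index/_settings",
             json={"index": {"number_of_replicas": 0}})
```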
So the controller task finished and now only the second trains-agent services-mode process is showing up as registered. So this is definitely something linked to switching back to the main process.
Maybe there is a setting in docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
Hi SoggyFrog26, https://github.com/allegroai/clearml/blob/master/docs/datasets.md
I followed https://github.com/NVIDIA/nvidia-docker/issues/1034#issuecomment-520282450 and now it seems to be setting up properly
I killed both trains-agents and restarted one to have a clean start. This way it correctly spins up docker containers for services tasks. So the problem probably comes when an error occurs while setting up a task: it cannot go back to the main task. I would need to do some tests to validate that hypothesis though
But I see in the agent logs: Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...
I got some progress TimelyPenguin76. Now the task runs and I get the error from docker: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
, causing it to unregister from the server (and thus not remain there).
Do you mean that the agent actively notifies the server that it is going down? Or does the server infer that the agent is down after a timeout?
Hi DeterminedCrab71, Version: 1.1.1-135 • 1.1.1 • 2.14
Alright, I have a follow-up question then: I used the param --user-folder “~/projects/my-project”, but any change I make is not reflected in this folder. I guess I am in the docker space, but this folder is not linked to the folder on the machine. Is it possible to do so?
(I am not part of the awesome ClearML team, just a happy user 🙂 )
I will let the team answer you on that one 🙂
Will it freeze/crash/break/stop the ongoing experiments?
AgitatedDove14 In theory yes, there is no downside; in practice, running an app inside docker inside a VM might introduce slowdowns. I guess it's on me to check whether this slowdown is negligible or not
Yes, it works now! Yay!