To be fully transparent, I did a manual reindexing of the whole ES DB one year ago after it ran out of space; at that point I might have changed the mapping to strict, but I am not sure. Could you please confirm that the mapping is correct?
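For reference, a quick way to confirm how the mapping is set (host, port and index name below are placeholders for my setup) would be to pull it straight from ES:

curl -s 'http://localhost:9200/<index-name>/_mapping?pretty'
# a "dynamic": "strict" entry in the output means the mapping was indeed changed to strict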
Now I am trying to restart the cluster with docker-compose while specifying the last volume. How can I do that?
SuccessfulKoala55 Thanks! If I understood correctly, setting index.number_of_shards = 2 (instead of 1) would create a second shard for the large index, splitting it into two shards? This https://stackoverflow.com/a/32256100 seems to say that it’s not possible to change this value after index creation. Is that true?
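As a side note, assuming the ES container is reachable on localhost:9200, the current shard layout and per-shard sizes can be checked with the cat shards API, which shows which index is hitting the size limit:

curl -s 'http://localhost:9200/_cat/shards?v&s=store:desc'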
I am still confused though - on the Get Started page of the PyTorch website, when choosing "conda", the generated installation command includes cudatoolkit, while when choosing "pip" it only uses a wheel file.
Does that mean the wheel file contains cudatoolkit (cuda runtime)?
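For example, this is roughly how I would check it (assuming a fresh env with no system-wide cudatoolkit installed):

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
# if this prints a CUDA version and True without cudatoolkit installed, the runtime is bundled in the wheel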
So it looks like it tries to register a batch of 500 documents
I fixed it, I will push a fix in pytorch-ignite 🙂
In the Execution tab, I see the old commit; in the logs, I see an empty branch and the old commit.
very cool, good to know, thanks SuccessfulKoala55 🙂
Thanks SuccessfulKoala55! So CLEARML_NO_DEFAULT_SERVER=1 is the default, right?
What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?
AppetizingMouse58 Yes and yes
Something was triggered; you can see the CPU usage spiking right when the instance became unresponsive - maybe a merge operation from ES?
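One way to check that, assuming ES is reachable on localhost:9200, would be the node stats merge metrics, which report current and total merge activity:

curl -s 'http://localhost:9200/_nodes/stats/indices/merge?pretty'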
Very cool! "Run two train-agent daemons, one per GPU on the same machine, with default Nvidia/CUDA Docker" is close to my use case, except that I would like to run these two daemons without docker. Would that be possible? I should just remove the --docker nvidia/cuda param, right?
Yes, the new project is the one where I changed the layout, and the layout gets reset when I move an experiment there.
AMI ami-08e9a0e4210f38cb6 , region: eu-west-1a
trains-agent daemon --gpus 0 --queue default & trains-agent daemon --gpus 1 --queue default &
Well, as long as you’re using a single node, it should indeed alleviate the shard disk size limit, but I’m not sure ES will handle that too well. In any case, you can’t change that for existing indices; you can modify the mapping template and reindex the existing index (you’ll need to reindex to another name, delete the original, and create an alias with the original name, since the new index can’t be renamed...).
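As a rough sketch of that procedure (index names here are placeholders, and the new index would be created with the updated settings/mapping first):

curl -X POST 'http://localhost:9200/_reindex' -H 'Content-Type: application/json' -d '{"source": {"index": "old_index"}, "dest": {"index": "old_index_v2"}}'
# once the copy is verified, drop the original and alias the old name to the new index
curl -X DELETE 'http://localhost:9200/old_index'
curl -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d '{"actions": [{"add": {"index": "old_index_v2", "alias": "old_index"}}]}'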
Ok thanks!
Well, as long as you use a single node, multiple shards offer no sca...
and with this setup I can use the GPU without any problem, meaning that the wheel does contain the CUDA runtime
RuntimeError: CUDA error: no kernel image is available for execution on the device
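That error usually means the wheel was built without kernels for the GPU's compute capability; a quick way to compare the two (just a sketch, and torch.cuda.get_arch_list is only available in recent PyTorch versions):

python -c "import torch; print(torch.cuda.get_arch_list(), torch.cuda.get_device_capability())"
# the device capability (e.g. (7, 5) -> sm_75) should appear in the wheel's arch list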
Yes, I am preparing them 🙂
There’s a reason for the ES index max size
Does ClearML enforce a max index size? What typically happens when that limit is reached?
So the controller task finished, and now only the second trains-agent services-mode process is showing up as registered. So this is definitely something linked to switching back to the main process.