I think waiting for the apt locks to be released with something like this would work:
startup_bash_script = [
    "#!/bin/bash",
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done",
    "sudo apt-get update",
    ...
Weirdly this throws an error in the autoscaler:
Spinning new instance type=v100_spot
Error: Failed to start new instance, unexpected '{' in field...
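One possible reading of the error above is that the autoscaler treats `{` in the script as a template field, so the bash brace expansion trips it. That is an assumption, not something confirmed in the thread. A minimal sketch that sidesteps it by listing the three lock files explicitly (`APT_LOCK_FILES` and `build_startup_script` are hypothetical names, not part of any library):

```python
# Hypothetical helper: builds the startup script lines without any bash
# brace expansion, in case '{' is what the autoscaler rejects (assumption).
APT_LOCK_FILES = [
    "/var/lib/dpkg/lock",
    "/var/lib/apt/lists/lock",
    "/var/cache/apt/archives/lock",
]


def build_startup_script(lock_files=APT_LOCK_FILES, poll_seconds=5):
    # Join the explicit paths; this is what the brace pattern expands to.
    locks = " ".join(lock_files)
    wait_loop = (
        f"while sudo fuser {locks} >/dev/null 2>&1; do "
        "echo 'Waiting for other instances of apt to complete...'; "
        f"sleep {poll_seconds}; done"
    )
    return ["#!/bin/bash", wait_loop, "sudo apt-get update"]


startup_bash_script = build_startup_script()
```

The resulting list contains no `{` or `}` characters, so it should behave the same in bash while avoiding whatever field parsing produced the error.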
Hi AgitatedDove14, initially I was doing this, but then I realised that with the approach you suggest, all the packages of the local environment also end up in the “installed packages”, while in reality I only need the dependencies of the local package. That’s why I use _update_requirements: with this approach only the required packages are installed in the agent
Hi SuccessfulKoala55 , Yes it’s for the same host/bucket - I’ll try with a different browser
What about the stacktrace of the error: Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]?
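For context on that error: `eu-west-2` is an AWS region name, while standard availability zone names add a letter suffix, such as `eu-west-2a`. A small sketch of a sanity check that could run before launching instances (`looks_like_availability_zone` is a hypothetical helper, and the pattern only covers standard zone names, not local or wavelength zones):

```python
import re

# Standard AWS availability zone names are a region name plus a letter
# suffix, e.g. "eu-west-2a"; "eu-west-2" on its own is a region.
_AZ_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+[a-z]$")


def looks_like_availability_zone(name: str) -> bool:
    """Return True if `name` has the shape of a standard AZ name."""
    return bool(_AZ_PATTERN.match(name))
```

With this check, `looks_like_availability_zone("eu-west-2")` is false and `looks_like_availability_zone("eu-west-2a")` is true, which matches the `InvalidParameterValue` reported by RunInstances.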
Very nice! Maybe we could have this option as a toggle setting in the user profile page, so that by default we keep the current behaviour, and users like me can change it 😄 wdyt?
because I cannot locate libcudart or because cudnn_version = 0?
UnevenDolphin73 , task = clearml.Task.get_task(clearml.config.get_remote_task_id()) worked, thanks
AgitatedDove14 In my case I'd rather have it under the "Artifacts" tab because it is a big json file
To be fully transparent, I did a manual reindexing of the whole ES DB one year ago after it ran out of space. At that point I might have changed the mapping to strict, but I am not sure. Could you please confirm that the mapping is correct?
Now I am trying to restart the cluster with docker-compose and specifying the last volume, how can I do that?
SuccessfulKoala55 Thanks! If I understood correctly, setting index.number_of_shards = 2 (instead of 1) would create a second shard for the large index, splitting it into two shards? This https://stackoverflow.com/a/32256100 seems to say that it’s not possible to change this value after the index creation, is it true?
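As far as I know, `index.number_of_shards` is a static setting fixed when the index is created, so one common route is to create a new index with the desired shard count and copy the documents over with the `_reindex` API. A minimal sketch of the two request bodies, assuming that route (the index names are hypothetical, and nothing here is confirmed by the thread):

```python
import json

# Hypothetical index names, for illustration only.
SOURCE_INDEX = "large-index"
TARGET_INDEX = "large-index-2shards"

# Body for: PUT /<TARGET_INDEX>
# Creates the new index with two primary shards.
create_target_body = {"settings": {"index": {"number_of_shards": 2}}}

# Body for: POST /_reindex
# Copies all documents from the old index into the new one.
reindex_body = {
    "source": {"index": SOURCE_INDEX},
    "dest": {"index": TARGET_INDEX},
}

print(json.dumps(create_target_body))
print(json.dumps(reindex_body))
```

The mapping of the target index would need to be set up to match the source before reindexing; that step is left out here.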
I am still confused though - from the get started page of pytorch website, when choosing "conda", the generated installation command includes cudatoolkit, while when choosing "pip" it only uses a wheel file.
Does that mean the wheel file contains cudatoolkit (cuda runtime)?
So it looks like it tries to register a batch of 500 documents
I fixed it, will push a fix in pytorch-ignite 🙂
In the execution tab I see the old commit; in the logs I see an empty branch and the old commit
very cool, good to know, thanks SuccessfulKoala55 🙂
Thanks SuccessfulKoala55 ! So CLEARML_NO_DEFAULT_SERVER=1 by default, right?
What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?
AppetizingMouse58 Yes and yes
I am running on bare metal, and cuda seems to be installed at /usr/lib/x86_64-linux-gnu/libcuda.so.460.39
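Worth noting that `libcuda.so` is the driver library that ships with the NVIDIA driver, while `libcudart` is the CUDA runtime from the toolkit, so finding the former does not mean the latter is present. A quick sketch for checking which of the two the dynamic loader can resolve, using only the standard library (`find_cuda_libraries` is a hypothetical helper name):

```python
import ctypes.util


def find_cuda_libraries():
    # "cuda" is the driver library (libcuda); "cudart" is the CUDA
    # runtime (libcudart). find_library returns None when not found.
    return {name: ctypes.util.find_library(name) for name in ("cuda", "cudart")}


found = find_cuda_libraries()
for name, resolved in found.items():
    print(f"lib{name}: {resolved or 'not found'}")
```

On Linux, `find_library` reports a library name such as the shared object's soname rather than a full path, and it only sees standard loader locations, so a runtime bundled inside a Python wheel would not show up here.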