It seems that around here, a Task that is created using init remotely in the main process gets its output_uri parameter ignored
But I see in the agent logs: Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...
As a quick fix, can you test with auto refresh (see top right button with the pause sign you have on the video)
That doesn't work unfortunately
CostlyOstrich36 good enough, I will fall back to sorting by updated, thanks!
DeterminedCrab71 This is the behaviour of holding shift while selecting in Gmail. If ClearML could reproduce this, that would be perfect!
If I manually call report_matplotlib_figure yes. If I don't (just create the figure), no mem leak
that would work for pytorch and clearml yes, but what about my local package?
Something like that?
` curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "variant": "loss_model"
          }
        },
        {
          "match": {
            "task": "8f88e4b8cff84f23bde74ed4b7213ec6"
          }
        }
      ]
    }
  },
  "aggs": {
    "series": {
      "terms": { "field": "iter" }
    }
  }
}...
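The part of the request body shown above can also be assembled in Python, which makes the nesting easier to check before sending it. This is only a sketch: `build_scalar_query` is a hypothetical helper name, the field names and values are copied from the curl command, and whatever followed the truncated `...` is not reproduced.

```python
import json


def build_scalar_query(task_id, variant):
    """Build the search body shown above: match on variant and task, bucket by iter."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"variant": variant}},
                    {"match": {"task": task_id}},
                ]
            }
        },
        # Terms aggregation over the "iter" field, as in the original request.
        "aggs": {"series": {"terms": {"field": "iter"}}},
    }


body = build_scalar_query("8f88e4b8cff84f23bde74ed4b7213ec6", "loss_model")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `-d` in the curl command, or passed to any HTTP client as the request body.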
trains-elastic container fails with the following error:
And so in the UI, in workers&queues tab, I see randomly one of the two experiments for the worker that is running both experiments
And now that I restarted the server and went back into the project where I initially deleted the archived experiments, some of them are still there - I will leave them alone, too scared to do anything now
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
For new projects it works
I think waiting for the apt locks to be released with something like this would work:
startup_bash_script = [ "#!/bin/bash", "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done", "sudo apt-get update", ...
Weirdly, this throws an error in the autoscaler:
` Spinning new instance type=v100_spot
Error: Failed to start new instance, unexpected '{' in field...
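One possible cause, offered as an assumption and not confirmed by the log above, is that the curly braces in the `fuser` path are being interpreted by whatever parses the startup script. A minimal sketch that sidesteps this by spelling out the three lock files the brace pattern expands to, so that no `{` or `}` appears in the script. `LOCK_FILES` and `build_startup_script` are hypothetical names, and the elided remainder of the original script is not reproduced.

```python
# The three paths that /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock expands to in bash.
LOCK_FILES = [
    "/var/lib/dpkg/lock",
    "/var/lib/apt/lists/lock",
    "/var/cache/apt/archives/lock",
]


def build_startup_script():
    """Return the startup script lines with the lock paths written out (no braces)."""
    wait_loop = (
        "while sudo fuser " + " ".join(LOCK_FILES) + " >/dev/null 2>&1; "
        "do echo 'Waiting for other instances of apt to complete...'; sleep 5; done"
    )
    return ["#!/bin/bash", wait_loop, "sudo apt-get update"]


startup_bash_script = build_startup_script()
print("\n".join(startup_bash_script))
```

If the error persists with a brace-free script, the braces were not the cause and the problem lies elsewhere in the configuration.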
what about the stacktrace of the error: Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]?
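The message suggests that a region name (`eu-west-2`) was passed where an availability zone was expected. Standard AWS zone names are the region code followed by a letter, such as `eu-west-2a`. A minimal sketch of a sanity check, assuming standard zone names only (Local Zones and Wavelength Zones use longer names and are not covered); `looks_like_availability_zone` is a hypothetical helper, not part of any library.

```python
import re

# Standard zones: region code plus one lowercase letter, e.g. "eu-west-2a".
_AZ_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d[a-z]$")


def looks_like_availability_zone(name):
    """Return True if name has the shape of a standard availability zone name."""
    return bool(_AZ_PATTERN.match(name))


for candidate in ["eu-west-2", "eu-west-2a"]:
    print(candidate, looks_like_availability_zone(candidate))
```

A check like this could run on the autoscaler configuration before any instance is requested, so that a bare region name is caught early.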
Very nice! Maybe we could have this option as a toggle setting in the user profile page, so that by default we keep the current behaviour, and users like me can change it. wdyt?
because I cannot locate libcudart or because cudnn_version = 0?
UnevenDolphin73, task = clearml.Task.get_task(clearml.config.get_remote_task_id()) worked, thanks
AgitatedDove14 In my case I'd rather have it under the "Artifacts" tab because it is a big json file
To be fully transparent, I did a manual reindexing of the whole ES DB one year ago after it ran out of space. At that point I might have changed the mapping to strict, but I am not sure. Could you please confirm that the mapping is correct?
Now I am trying to restart the cluster with docker-compose and specifying the last volume, how can I do that?
SuccessfulKoala55 Thanks! If I understood correctly, setting index.number_of_shards = 2 (instead of 1) would create a second shard for the large index, splitting it into two shards? This https://stackoverflow.com/a/32256100 seems to say that it's not possible to change this value after the index creation, is it true?
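For context, `number_of_shards` is a static index setting in Elasticsearch, so the usual route is to create a new index with the desired shard count and copy the documents into it with the `_reindex` API. The sketch below only builds the two request bodies. The index names are hypothetical placeholders, and mappings are left out, so treat it as an outline of the approach and not a tested migration.

```python
import json


def create_index_body(number_of_shards):
    """Body for PUT /<new_index>: sets the shard count at creation time."""
    return {"settings": {"index": {"number_of_shards": number_of_shards}}}


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex: copies documents from source_index to dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Hypothetical index names, for illustration only.
new_index_settings = create_index_body(2)
copy_request = reindex_body("old-large-index", "new-large-index")
print(json.dumps(new_index_settings))
print(json.dumps(copy_request))
```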
I am still confused though - from the get started page of pytorch website, when choosing "conda", the generated installation command includes cudatoolkit, while when choosing "pip" it only uses a wheel file.
Does that mean the wheel file contains cudatoolkit (cuda runtime)?