Try configuring the following and tell me if this helps 🙂
In your ~/clearml.conf
configure the following:
sdk.development.default_output_uri: "<S3_bucket>"
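For example, in the nested clearml.conf form (the bucket path is just a placeholder, adjust to your setup):
sdk {
    development {
        # every Task.init() will upload models/artifacts to this location by default
        default_output_uri: "s3://my-bucket/clearml"
    }
}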
I think I found what you need 🙂
get_parameters(cast=True) - https://clear.ml/docs/latest/docs/references/sdk/task#get_parameters
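A small sketch of how you'd use it (the task ID is just a placeholder):
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")
# cast=True returns the parameters cast back to their original python types
params = task.get_parameters(cast=True)
print(params)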
Hi @<1673501397007470592:profile|RelievedDuck3> , no 🙂
You can restore these tasks by copying or moving them from the task__trash collection back into the task collection, but the events for these tasks cannot be restored. As for which user deleted them, unfortunately ClearML does not record this info in Mongo, and without logging enabled on ES there is no place to retrieve it (I can suggest using Kibana to monitor ES going forward). You can also inspect the mongo collection url_to_delete. It contains all the links from the deleted tasks that should be removed from the fileserver. If you se...
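If it helps, a rough sketch of the restore using pymongo (the connection string, the database name and the use of the task ID as _id are assumptions on my side, adjust to your deployment):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # adjust to your mongo service
db = client["backend"]  # assumed ClearML database name

task_id = "<deleted_task_id>"  # placeholder
doc = db["task__trash"].find_one({"_id": task_id})
if doc:
    # copy the document back into the live task collection, then remove it from the trash
    db["task"].insert_one(doc)
    db["task__trash"].delete_one({"_id": task_id})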
Hi @<1523701083040387072:profile|UnevenDolphin73> , this is the K8s integration. You can find more here:
None
GrievingTurkey78, can you try disabling the CPU/GPU detection?
Hi @<1529271098653282304:profile|WorriedRabbit94> , I'll ask the guys to take a look at this and what is required for it.
Hi JitteryCoyote63, I think you can click one of the debug samples to enlarge it. Then you will have a scroll bar to get to the iteration you need. Does that help?
Hi @<1787291173992271872:profile|BlandCormorant75> , how are you logging those plots? Can you provide a stand-alone snippet that reproduces this behaviour?
CluelessElephant89, did you run the vm.max_map_count command for elastic? Also, how much RAM does the machine you're running on have?
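If not, it's usually something like:
sudo sysctl -w vm.max_map_count=262144
(262144 is the value Elasticsearch expects; add it to /etc/sysctl.conf to make it persistent)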
Can you try going into the docker container and verifying you have the same clearml.conf inside?
Hi @<1585441130525233152:profile|TrickyGoose45> , you mean that the mongodb you set up is not part of the docker compose?
Can you share the elastic section of your docker compose? Are you using any overrides?
There is an option for a configuration vault in the Scale/Enterprise licenses - basically applying global settings without having to edit clearml.conf
Hi @<1571308079511769088:profile|GentleParrot65> , ideally you shouldn't be terminating instances manually. However, do you mean that the autoscaler spins down a machine, still recognizes it as running, and refuses to spin up a new one?
@<1523701295830011904:profile|CluelessFlamingo93> , just so I understand - you want to upload a string as the artifact?
Hi @<1569133676640342016:profile|MammothPigeon75> , I believe the SLURM integration you described is supported in the ClearML Scale/Enterprise versions
Hi EcstaticBaldeagle77 ,
I'm not sure I follow. Are you using the self hosted server - and you'd like to move data from one self hosted server to another?
Hi HungryArcticwolf62 ,
from what I understand you simply want to access models afterwards - correct me if I'm wrong.
What I think would solve your problem is the following:
task = Task.init(...., output_uri=True)
This should upload the model to the server and thus make it accessible by other entities within the system.
Am I on track?
Hi @<1523701122311655424:profile|VexedElephant56> , I think this is achievable with Slurm + ClearML, however I don't think something like this exists out of the box
What is the best way to achieve that please?
I think you would need to edit the webserver code to change iterations to epochs in the naming of the x axis
CrookedWalrus33, Hi 🙂
Can you please tell me which packages are missing?
I think so, yes
SucculentBeetle7, please give an example of the path the web interface gives you :)
LethalCentipede31, it appears we had an internal issue with a load balancer; it was fixed a couple of minutes after your comment 🙂
Does it go back to working if you revert the changes?
I would guess so:
sudo docker logs --follow trains-webserver
Hi @<1539417873305309184:profile|DangerousMole43> , I think you can do it if you add some code to the pipeline controller to extract the console logs from a failed step
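Something along these lines, a rough sketch (how you get the failed step's task ID depends on your setup, here it's just a placeholder; get_reported_console_output pulls the latest console reports):
from clearml import Task

step_task_id = "<failed_step_task_id>"  # placeholder: the task ID of the failed pipeline step
step_task = Task.get_task(task_id=step_task_id)

if step_task.get_status() == "failed":
    # fetch the last few console log reports of the failed step
    for report in step_task.get_reported_console_output(number_of_reports=5):
        print(report)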