TimelyPenguin76 , no, I’ve only set the sdk.aws.s3.region = eu-central-1 parameter
Ok yes, I get it, this info is also available at the very beginning of the logs, where the agent logs the full docker run command. So this docker_cmd is a shorter version of it?
Hi DeterminedCrab71 Version: 1.1.1-135 • 1.1.1 • 2.14
If I don’t start clearml-session, I can easily connect to the agent, so clearml-session is doing something that messes up the ssh config and prevents me from ssh-ing into the agent afterwards
So in my minimal reproducible example, it does work 🤣 very frustrating, I will continue searching for that nasty bug
AgitatedDove14 This seems to be consistent even if I specify the absolute path to /home/user/trains.conf
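(For reference, this is roughly how I point the SDK at that file; just a sketch, assuming the TRAINS_CONFIG_FILE environment variable is read before the SDK initializes, and with placeholder project/task names:)

```python
import os

# Must be set before the trains SDK is imported/initialized
# (assumption: TRAINS_CONFIG_FILE overrides the default ~/trains.conf lookup)
os.environ["TRAINS_CONFIG_FILE"] = "/home/user/trains.conf"

from trains import Task

# placeholder names, only here to trigger SDK initialization
task = Task.init(project_name="debug", task_name="config-path-check")
```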
Also I can simply delete the /elastic_7 folder, I don’t use it anymore (I have a remote ES cluster). In that case, I guess I would have enough space?
AgitatedDove14 Yes exactly, I tried the fix suggested in the github issue (urllib3>=1.25.4) and the ImportError disappeared 🙂
AppetizingMouse58 Yes and yes
my docker-compose for the master node of the ES cluster is the following:
```yaml
version: "3.6"
services:
  elasticsearch:
    container_name: clearml-elastic
    environment:
      ES_JAVA_OPTS: -Xms2g -Xmx2g
      bootstrap.memory_lock: "true"
      cluster.name: clearml-es
      cluster.initial_master_nodes: clearml-es-n1, clearml-es-n2, clearml-es-n3
      cluster.routing.allocation.node_initial_primaries_recoveries: "500"
      cluster.routing.allocation.disk.watermark.low: 500mb
      clust...
```
AgitatedDove14 WOW, thanks a lot! I will dig into that 🚀
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
AgitatedDove14 yes, but I don't see in the docs how to attach it to the logger of the EarlyStopping handler
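(Something along these lines is what I have in mind; just a sketch, assuming the ignite EarlyStopping handler exposes a standard Python logger attribute and that its messages end up in the task's captured console output:)

```python
import logging
import sys

from ignite.engine import Engine, Events
from ignite.handlers import EarlyStopping

trainer = Engine(lambda engine, batch: None)    # placeholder training step
evaluator = Engine(lambda engine, batch: None)  # placeholder evaluation step

def score_function(engine):
    # higher is better, so negate the (hypothetical) validation loss metric
    return -engine.state.metrics["loss"]

handler = EarlyStopping(patience=5, score_function=score_function, trainer=trainer)

# route the handler's own messages (e.g. "EarlyStopping: Stop training") to stdout
# so they show up in the experiment's console log
handler.logger.addHandler(logging.StreamHandler(sys.stdout))
handler.logger.setLevel(logging.INFO)

evaluator.add_event_handler(Events.COMPLETED, handler)
```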
I'll try to pass these values using the env vars
AgitatedDove14 How can I filter out archived tasks? I don't see this option
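(For reference, this is the kind of filtering I'm after; just a sketch, assuming archived tasks are marked with the "archived" system tag and that task_filter accepts a system_tags exclusion, not sure it's the intended way:)

```python
from clearml import Task

# sketch: exclude tasks carrying the "archived" system tag
# ("my_project" is a placeholder project name)
tasks = Task.get_tasks(
    project_name="my_project",
    task_filter={"system_tags": ["-archived"]},
)
for t in tasks:
    print(t.id, t.name)
```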
No I agree, it’s probably not worth it
Hi SoggyFrog26 , https://github.com/allegroai/clearml/blob/master/docs/datasets.md
PS: in the new env, I’ve set num_replicas: 0, so I’m only talking about primary shards…
I was rather wondering why clearml was taking space while I configured it to use the /data volume. But as you described AgitatedDove14 it looks like an edge case, so I don’t mind 🙂
AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)
I will let the team answer you on that one 🙂