
yep, in most of them:
/opt/clearml/config
  apiserver.conf
  clearml.conf
/opt/clearml/data/elastic_7
  /nodes
/opt/clearml/data/fileserver
  <empty>
/opt/clearml/data/mongo/configdb
  <empty>
/opt/clearml/data/mongo/db
  collection/index files, /diagnostic.data, /journal etc
/opt/clearml/data/redis
  dump.rdb
/opt/clearml/logs
  apiserver.log.x, fileserver.log (0 bytes)
no, they are still rebooting. I've looked in /opt/clearml/logs/apiserver.log
no errors
or have I got this wrong, and it's the clearml-agent that needs to read/write to S3?
incidentally, we turn off the server every evening as it's not used overnight; we've not faced issues with it starting up in the morning or noticed any data loss
not yet, going to try and fix it today.
if I do a df
I see this, which is concerning:
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  928K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1   20G  7.9G   13G  40% /
tmpfs           790M     0  790M   0% /run/user/1000
so it looks like the mount points are not created. When do these g...
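For reference, those /opt/clearml directories are just bind mounts declared in the clearml-server docker-compose file, so they won't show up as separate filesystems in df on the host. A rough sketch of the relevant volumes entries (container-side paths may differ between versions):

    elasticsearch:
      volumes:
        - /opt/clearml/data/elastic_7:/usr/share/elasticsearch/data
    mongo:
      volumes:
        - /opt/clearml/data/mongo/db:/data/db
        - /opt/clearml/data/mongo/configdb:/data/configdb
    redis:
      volumes:
        - /opt/clearml/data/redis:/data
    fileserver:
      volumes:
        - /opt/clearml/logs:/var/log/clearml
        - /opt/clearml/data/fileserver:/mnt/fileserver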
thanks, I'll try that out
ok, I'll try that out, thanks
Hi @<1523701205467926528:profile|AgitatedDove14>
Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account
thanks @<1523715084633772032:profile|AlertBlackbird30> this is really informative. Nothing seems to be particularly out of the ordinary though
3.7G /var/lib/
3.7G /var/lib/docker
3.0G /var/lib/docker/overlay2
followed by a whole load of files that are a few hundred KBs in size, nothing huge though
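A quick way to get Docker's own accounting of that space (standard Docker CLI):

    # summary of space used by images, containers, local volumes and build cache
    docker system df
    # per-image / per-container breakdown
    docker system df -v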
it looks like clearml-apiserver
and clearml-fileserver
are continually restarting
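To see why they keep restarting, something like this helps (standard Docker commands; container names as referenced above):

    # container status / uptime - restart loops show up as very short "Up ..." times
    docker ps -a --format 'table {{.Names}}\t{{.Status}}'
    # last output from the two failing containers
    docker logs --tail 200 clearml-apiserver
    docker logs --tail 200 clearml-fileserver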
Is there a way you can allow our account to make a copy of the AMI and store it privately?
I added this to each of the containers
logging:
  options:
    max-file: "5"
    max-size: "10m"
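In context it sits under each service in docker-compose.yml, roughly like this (json-file is Docker's default logging driver; service name taken from the clearml docker-compose):

    apiserver:
      logging:
        driver: json-file
        options:
          max-size: "10m"
          max-file: "5"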
that should be the case, we have default_output_uri:
set to an s3 bucket
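For reference, that setting lives under sdk.development in clearml.conf, roughly like this (bucket and prefix are placeholders):

    sdk {
        development {
            # artifacts and models get uploaded here by default
            default_output_uri: "s3://my-clearml-bucket/artifacts"
        }
    }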
no, that's what i'm trying to do
Morning, we got to 100% used, which is what triggered this investigation. When we initially looked at overlay2 it was using 8GB, so the current usage seems acceptable.
Hi @<1523701205467926528:profile|AgitatedDove14> I tried this out, but I keep getting connection timeouts in the browser getting to the ELB. The instance is showing as inservice and passing the healthcheck. Is there any other configuration I need to do in the clearml.conf to make this work?
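For completeness, the client-side clearml.conf api section I'd expect to point at the ELB looks roughly like this (hostname is a placeholder; 8008/8080/8081 are the default ports, and all three need to be reachable through the load balancer):

    api {
        api_server: http://my-clearml-elb.example.com:8008
        web_server: http://my-clearml-elb.example.com:8080
        files_server: http://my-clearml-elb.example.com:8081
        credentials {
            "access_key": "..."
            "secret_key": "..."
        }
    }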
Is it possible to use an IAM role rather than user credentials in the clearml.conf file?
is there any documentation for connecting to an S3 bucket?
yep still referring to the S3 credentials, somewhat familiar with boto and IAM roles
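My understanding (worth verifying against the docs) is that leaving key/secret empty in the sdk.aws.s3 section makes ClearML fall back to boto3's default credential chain, which picks up the instance profile / IAM role; something like this (region is a placeholder):

    sdk {
        aws {
            s3 {
                # empty key/secret -> boto3 default credential chain (env vars, IAM role)
                key: ""
                secret: ""
                region: "eu-west-1"
            }
        }
    }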
After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
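That first line usually means the git host isn't in known_hosts for the user the agent runs as; one common way to pre-seed it (host name is just an example):

    # append the host's public key so SSH stops failing host key verification
    ssh-keyscan -H github.com >> ~/.ssh/known_hosts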
Just by chance I set the SSH deploy keys to write access and now we're able to clone the repo. Why would the SSH key need write access to the repo to be able to clone?
Yep, I've done all that; it didn't seem to work until I set the deploy key to write