
thanks Stef, with max-size
do you set it for every running service separately, or can you set it once?
After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen
yep, in most of them:
/opt/clearml/config: apiserver.conf, clearml.conf
/opt/clearml/data/elastic_7: /nodes
/opt/clearml/data/fileserver: <empty>
/opt/clearml/data/mongo/configdb: <empty>
/opt/clearml/data/mongo/db: collection/index files, /diagnostic.data, /journal etc.
/opt/clearml/data/redis: dump.rdb
/opt/clearml/logs: apiserver.log.x, fileserver.log (0 bytes)
yep still referring to the S3 credentials, somewhat familiar with boto and IAM roles
all sorted, I somehow missed the documentation about the mongodb migration
incidentally, we turn off the server every evening as it's not used overnight; we've not faced issues with it starting up in the morning or noticed any data loss
or have I got this wrong, and it's the clearml-agent that needs to read/write to S3?
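(for reference, S3 credentials for the SDK/agent normally sit in the sdk.aws.s3 section of clearml.conf; a rough sketch with placeholder values - leaving key/secret empty should let boto3 fall back to the instance's IAM role:)
sdk {
  aws {
    s3 {
      # placeholder region; empty key/secret means boto3 uses the EC2 instance's IAM role
      region: "eu-west-1"
      key: ""
      secret: ""
    }
  }
}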
it looks like clearml-apiserver and clearml-fileserver are continually restarting
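(to see why, something like this usually surfaces the restart reason; container names assumed from the default docker-compose file:)
docker ps --format '{{.Names}}\t{{.Status}}'   # look for "Restarting (...)"
docker logs --tail 100 clearml-apiserver
docker logs --tail 100 clearml-fileserver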
Thanks. Although it's AWS-related, the context was an error we see within ClearML: "ValueError: Insufficient permissions for None"
not yet, going to try and fix it today.
if I do a df
I see this, which is concerning:
Filesystem      Size  Used  Avail  Use%  Mounted on
devtmpfs        3.9G     0   3.9G    0%  /dev
tmpfs           3.9G     0   3.9G    0%  /dev/shm
tmpfs           3.9G  928K   3.9G    1%  /run
tmpfs           3.9G     0   3.9G    0%  /sys/fs/cgroup
/dev/nvme0n1p1   20G  7.9G    13G   40%  /
tmpfs           790M     0   790M    0%  /run/user/1000
so it looks like the mount points are not created. When do these g...
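(worth noting: the -v bind mounts only exist inside each container's mount namespace, so the host's df will never list them - everything under /opt/clearml just lives on / here. To check what is actually mounted into a container, something like this works; container name assumed from the default compose file:)
docker inspect clearml-apiserver --format '{{ json .Mounts }}'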
I added this to each of the containers
logging:
  options:
    max-file: "5"
    max-size: "10m"
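(to avoid repeating it per service, a YAML extension field plus anchor should also work - a sketch, needs compose file format 3.4+, and the service names here are just placeholders for the ones in the ClearML docker-compose file:)
x-logging: &default-logging
  options:
    max-file: "5"
    max-size: "10m"

services:
  apiserver:
    logging: *default-logging
  fileserver:
    logging: *default-logging
(alternatively, the same defaults can be set once for the whole daemon in /etc/docker/daemon.json via log-driver/log-opts, which then applies to any container created afterwards that doesn't override them)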
strange, I used one of the publicly available AMIs for ClearML (we did not upgrade from the Trains AMI as we started fresh)
ok, I'll try that out, thanks
Hi @<1523701205467926528:profile|AgitatedDove14>
Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account
thanks @<1523715084633772032:profile|AlertBlackbird30> this is really informative. Nothing seems to be particularly out of the ordinary though
3.7G /var/lib/
3.7G /var/lib/docker
3.0G /var/lib/docker/overlay2
followed by a whole load of files that are a few hundred KBs in size, nothing huge though
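(if it creeps up again, this gives a quick breakdown of what Docker itself is holding and reclaims the usual suspects - note prune removes stopped containers, dangling images and build cache, so worth eyeballing the first output before running it:)
docker system df -v
docker system prune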
that should be the case, we have default_output_uri set to an S3 bucket
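(i.e. something like this in clearml.conf - a sketch, the bucket name is a placeholder:)
sdk {
  development {
    default_output_uri: "s3://our-bucket/clearml"
  }
}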
no, that's what I'm trying to do
our setup currently consists of an EC2 instance for clearml-server and one for clearml-agent. We're not using a load balancer at the moment.
not entirely sure on this as we used the custom AMI solution, is there any documentation on it?
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Yep, I've done all that; it didn't seem to work until I set the deploy key to write
Just by chance I set the SSH deploy keys to write access and now we're able to clone the repo. Why would the SSH key need write access to the repo to be able to clone?
Hey @<1523701205467926528:profile|AgitatedDove14> I am helping Max to get this working. I ran the clearml-agent init
and now have the correct entries in the clearml.conf file.
Created an ssh key from the agent, uploaded it to the git repo, but still getting this error:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
If I manually ssh on to the agent, and run:
`git clon...
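(for anyone hitting the same thing: "Host key verification failed" usually just means the git host isn't in known_hosts for the user the agent runs as; something along these lines should clear that particular error - assuming GitHub as the remote, swap in the real git host:)
ssh-keyscan github.com >> ~/.ssh/known_hosts
ssh -T git@github.com   # should now authenticate rather than fail host key verification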
so am I right in thinking it's just the mount points that are missing, based on the output of df above?
thanks, I'll try that out
Morning, we got to 100% used, which is what triggered this investigation. When we initially looked at overlay2 it was using 8GB, so it now seems to be acceptable.