it looks like clearml-apiserver and clearml-fileserver are continually restarting
not entirely sure on this as we used the custom AMI solution, is there any documentation on it?
incidentally, we turn off the server every evening as it's not used overnight; we've not faced issues with it starting up in the morning or noticed any data loss
thanks @<1523715084633772032:profile|AlertBlackbird30> this is really informative. Nothing seems to be particularly out of the ordinary though
3.7G /var/lib/
3.7G /var/lib/docker
3.0G /var/lib/docker/overlay2
followed by a whole load of files that are a few hundred KBs in size, nothing huge though
so am I right in thinking it's just the mount points that are missing, based on the output of df above?
Is there a way you can allow our account to make a copy of the AMI and store it privately?
not yet, going to try and fix it today.
if I do a df I see this, which is concerning:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 928K 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/nvme0n1p1 20G 7.9G 13G 40% /
tmpfs 790M 0 790M 0% /run/user/1000
so it looks like the mount points are not created. When do these g...
Hi @<1523701205467926528:profile|AgitatedDove14> I tried this out, but I keep getting connection timeouts in the browser when connecting to the ELB. The instance is showing as InService and passing the health check. Is there any other configuration I need to do in the clearml.conf to make this work?
I added this to each of the containers
logging:
  options:
    max-file: "5"
    max-size: "10m"
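For context, this sits under each service in docker-compose.yml - roughly like this (service name and surrounding keys trimmed, just for illustration):

services:
  apiserver:
    # ...existing image/ports/volumes entries unchanged...
    logging:
      options:
        max-file: "5"
        max-size: "10m"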
Thanks. Although it's AWS related, the context was an error we see within ClearML: "ValueError: Insufficient permissions for None"
is there any documentation for connecting to an S3 bucket?
Just by chance I set the SSH deploy keys to write access and now we're able to clone the repo. Why would the SSH key need write access to the repo to be able to clone?
Hey @<1523701205467926528:profile|AgitatedDove14> I am helping Max to get this working. I ran the clearml-agent init and now have the correct entries in the clearml.conf file.
Created an ssh key from the agent, uploaded it to the git repo, but still getting this error:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
If I manually ssh on to the agent, and run:
`git clon...
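I suspect "Host key verification failed" just means the git host isn't in the agent user's known_hosts yet, rather than a problem with the deploy key itself - a rough sketch of what I mean, assuming the repo lives on github.com (swap in the actual git host):

# run as the same user the clearml-agent runs under
# pre-accept the git server's host key so non-interactive clones don't prompt or fail
ssh-keyscan -H github.com >> ~/.ssh/known_hosts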
After making the change yesterday to the docker-compose file, the server is completely unusable - this is all I see for the /dashboard screen
Thanks. Where is the configuration stored on the server? Currently we have deployed an EC2 instance using the marketplace AMI. If this demo is successful we would be looking at splitting the environment into the different AWS services - logging to S3, use of Secrets Manager, Elasticsearch, Redis, MongoDB, etc.
I've looked through the documentation, but didn't initially spot anything that would help with doing this (granted I may have overlooked something)
strange, I used one of the publicly available AMIs for ClearML (we did not upgrade from the Trains AMI as we started fresh)
that should be the case; we have default_output_uri: set to an S3 bucket
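It looks roughly like this in clearml.conf (bucket name, region and keys below are placeholders, not the real values):

sdk {
  development {
    # artifacts and models get uploaded here instead of the fileserver
    default_output_uri: "s3://example-bucket/clearml"
  }
  aws {
    s3 {
      # placeholder credentials - ideally we'd drop these and rely on an IAM role instead
      key: "AKIAXXXXXXXXXXXX"
      secret: "xxxxxxxxxxxxxxxx"
      region: "eu-west-2"
    }
  }
}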
thanks Stef, with max-size do you set it for every running service separately, or can you set it once?
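(Wondering whether a YAML anchor at the top of docker-compose.yml would let us define it once and reuse it - untested sketch, service names just for illustration:)

# requires compose file format 3.4+ for x- extension fields
x-logging: &default-logging
  options:
    max-file: "5"
    max-size: "10m"

services:
  apiserver:
    logging: *default-logging
  fileserver:
    logging: *default-logging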
Is it possible to use an IAM role rather than user credentials in the clearml.conf file?
Hi @<1523701205467926528:profile|AgitatedDove14>
Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account
have 2 listeners setup. LB 80 > instance 8080 and LB 443 > instance 8080
no, they are still rebooting. I've looked in /opt/clearml/logs/apiserver.log and there are no errors
yep, still referring to the S3 credentials; I'm somewhat familiar with boto and IAM roles
Some ideas, not all directly related to this:
- Passwords should be hashed rather than stored in plaintext
- A mechanism in the server application to add/remove users, avoiding having to SSH onto the server, would be nice
- Some level of permissions in the application would be nice - Admin/Owner/Viewer restrictions which would dictate what users can do and give finer control