Can you try to get the agent log?
Hi SmoothSheep78 ,
You might need to use volumes, as shown here:
https://stackoverflow.com/questions/54911021/unable-to-start-docker-mongo-image-on-windows
I cannot ssh into the machine
That's very strange - since the server runs in docker, I don't see how it can cause the EC2 instance to be unavailable - can you check the EC2 panel to see what might be the problem?
In any case, restarting the instance without shutting down the server in an orderly fashion always has the risk of damaging the database storage (mongo/elastic etc.)
That's the purpose of the Trains Agent Services mode - in a nutshell, you run an agent whose purpose is to run CPU-only maintenance tasks that take care of things like cleanup
In the new version, we made it so that the default agent credentials embedded in the ClearML Server are disabled if the server is not in the open mode (i.e. it requires user/password to log in). This is because having those default credentials available in this mode basically means anyone without a password can actually send commands to the server (since these credentials are hard-coded)
There's a reason for the ES index max size 😞
Cool! what permission?
Hi ShallowCoyote86 , can you try to shut down the agent, clean the ~/.clearml folder containing the vcs cache, and run it again?
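If it helps, the cleanup step could be scripted roughly like this. It's only a sketch: it assumes the agent keeps its VCS cache under ~/.clearml/vcs-cache (the usual default, but please check agent.vcs_cache.path in your clearml.conf), and clear_cache_dir is a made-up helper name, not part of ClearML. Stop the agent before running it.

```python
import shutil
from pathlib import Path

# Assumed default location; confirm against agent.vcs_cache.path in clearml.conf.
DEFAULT_VCS_CACHE = Path.home() / ".clearml" / "vcs-cache"


def clear_cache_dir(cache_dir: Path = DEFAULT_VCS_CACHE) -> bool:
    """Delete the cache directory if it exists; return True if something was removed."""
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```

The agent should recreate the cache the next time it clones a repository.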
Hi UnevenDolphin73 - this seems to indicate mongo is down? Can you run docker ps and check the containers' status?
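If you'd rather check this from a script, here is a minimal sketch that wraps docker ps. The --format flag with the .Names and .Status fields is standard Docker CLI; the function names are just illustrative, and -a is added so that stopped containers show up too.

```python
import subprocess


def parse_ps_output(output: str) -> dict:
    """Map container name -> status from tab-separated `docker ps` output."""
    statuses = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name.strip()] = status.strip()
    return statuses


def container_statuses() -> dict:
    """Run `docker ps -a` and return {container name: status}."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout)
```

Any container whose status does not start with "Up" (for example "Exited (1) ...") is worth a closer look with docker logs.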
ShallowKitten67 this could happen if you're changing your task's status somewhere in your code - are you?
Hi ItchyJellyfish73 ,
Do you mean upgrading the workers themselves or upgrading the server they're connected to?
JoyousElephant80 regarding #1, it seems the current scripts used to build this container require some work in order to be shared. We might add an option to configure the apiserver service name using an environment variable in the next release, would that be something of value?
Meanwhile, you can take a look at Docker Compose's extra_hosts feature ( https://docs.docker.com/compose/compose-file/compose-file-v3/#extra_hosts ) which basically allows you to edit the hosts file inside the co...
How exactly did you configure your clearml.conf file? Can you share the api section?
Hi SuperiorDucks36 ,
Priority within a queue is not currently supported in Trains Server. You could create different queues for different experiment types. You can also use a script to change the order of experiments within a queue.
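A rough sketch of such a reordering script is below. It assumes the server exposes a queues.move_task_to_front call through the Python APIClient and that it accepts queue and task IDs; please verify both against your server's API reference. The move_to_front helper is hypothetical and takes the client as an argument so it can be tried out without a server.

```python
def move_to_front(client, queue_id: str, task_ids: list) -> None:
    """Move the given tasks to the head of the queue, keeping their relative order."""
    # Each call puts one task at the very front, so iterate in reverse:
    # the first entry of task_ids then ends up first in the queue.
    for task_id in reversed(task_ids):
        # Assumption: queues.move_task_to_front(queue=..., task=...) exists on the client.
        client.queues.move_task_to_front(queue=queue_id, task=task_id)


# Usage against a real server (assumption: the trains package exposes APIClient here):
#   from trains.backend_api.session.client import APIClient
#   move_to_front(APIClient(), "<queue id>", ["<task id>"])
```

Separate queues per experiment type are still the simpler option if the priorities are static.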
Hi ReassuredTiger98 ,
I think the first thing to do is to disable the cleanup service, until we figure this out 🙂
Hi MammothShark25 , this index contains all the scalar reports from experiments. I would assume it is too large for the resources in your system and ES can't load it? Can you attach some ES logs?
If that's not the case, do you want me to create one on GitHub?
I would appreciate that 🙂
You need to make sure you install them using the same python interpreter as the one the agent is using. Also, please note that if you're not running the agent in the --docker mode, you will run into trouble if you run it from a virtual environment, as the agent creates its own virtual environment to run the task, and the venvs are not nested as one might expect
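A quick way to check both points from Python is sketched below, using only the standard library. The helper names are illustrative; run it with the same interpreter you use to start the agent.

```python
import sys


def in_virtualenv() -> bool:
    """True when the current interpreter is running inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # points at the interpreter the venv was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def pip_install_command(packages: list) -> list:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Running pip through sys.executable -m pip guarantees the packages land in the environment of that exact interpreter, rather than whichever pip happens to be first on the PATH.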
both the SDK and the Agent use clearml.conf
Hi GentlePelican46 , you're right, the documentation needs to be fixed there. I'll make sure we work on that
Well, installing it manually would require a lot of work, including building the angular app, and installing various requirements for the different servers (apiserver, fileserver etc.) - without docker, each should be run in its own virtual python environment
Also, you can simply install a VM and install the docker image inside...
Hi CheekyToad28 , nice job 🙂
In order to use a path other than /opt/clearml I made a little hack in the apiserver/config/basic.py and added another path for configs.
This can be achieved using the CLEARML_CONFIG_DIR env var
I guess I could use some environment variable on the uwsgi files.
Out of curiosity, what would you need that var for, exactly, and what would it control?
So for some reason you have an issue pulling from Docker Hub in your network?