
Can you attach the entire response (or preview)?
This basically means the WebApp is unable to reach the server, so you'll need to check that the server is indeed up and which address the WebApp is trying to connect to
That's because ClearML needs the hooks in that class to make sure any changes you make in that data structure are propagated back to the server when you're not in remote mode
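For illustration, a minimal sketch of connecting a configuration dict so those hooks are in place (the project/task names are just examples):
```python
from clearml import Task

# Project/task names are just examples
task = Task.init(project_name="examples", task_name="connect-example")

# connect() returns a wrapped object carrying the hooks ClearML needs
params = {"batch_size": 32, "learning_rate": 0.001}
params = task.connect(params)

# When running locally, this change is synced back to the server;
# in remote mode the values coming from the server take precedence
params["batch_size"] = 64
```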
Hi AverageRabbit65,
Any task (including pipeline steps) is always executed either on a machine with a clearml.conf file, or by an agent which already has the ClearML server address
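If neither is the case, the SDK can also be pointed at the server programmatically; a minimal sketch, where the addresses and keys are placeholders for your own deployment:
```python
from clearml import Task

# Placeholder addresses/credentials - replace with your own server's values
Task.set_credentials(
    api_host="http://localhost:8008",
    web_host="http://localhost:8080",
    files_host="http://localhost:8081",
    key="<access_key>",
    secret="<secret_key>",
)

task = Task.init(project_name="examples", task_name="no-local-clearml-conf")
```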
Well, can you share some of the code (or at least the imports it has)? The SDK created the installed packages list to make sure the task will not fail when being run by the agent
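If the detected list ever misses a package, you can also declare it explicitly; a minimal sketch, where the package name and version are only examples:
```python
from clearml import Task

# Must be called before Task.init(); the package/version here are only examples
Task.add_requirements("pandas", "1.5.3")

task = Task.init(project_name="examples", task_name="requirements-example")
```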
Hi @<1600299043865497600:profile|MagnificentSeaurchin90>, this is indeed on our roadmap
Hi
Regarding #2 - the SSH keys for the host machines can simply be updated in GitLab/GitHub etc. so that each host machine (and so each Docker container executed on that machine with the .ssh dir mounted inside it) can clone any of your repositories
Hi GiganticTurtle0,
This is related to another question: is an agent always executed in a Docker container, even if you do not specify any image in the arguments of Task.create or PipelineDecorator.component?
This depends on how you start the ClearML Agent - if you use the --docker command line option, it will use docker mode
@<1538330703932952576:profile|ThickSeaurchin47> sorry, I missed this somehow - do you mean in the SDK? The SDK can choose to store models and artifacts on any object storage (not in the clearml server). The server itself only has the fileserver storage, but it is not up to the server which storage will be used - it's an SDK choice.
@<1533619716533260288:profile|SmallPigeon24> you can use the task's last_worker property
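For example, a quick sketch of reading it (the task ID is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")  # placeholder task ID
# last_worker holds the name of the worker that last executed this task
print(task.last_worker)
```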
Can you perhaps share a screenshot?
Hi ReassuredTiger98 , this is not yet possible in clearml-agent, but I'm sure you can add some cron job to handle that using the docker command line
JitteryCoyote63 thanks for the kind words. If you can spare the time, I'd appreciate the ES logs - for the sake of other community members, I'd really like to make sure the migration process is as resilient and smooth as possible
You can simply comment out the fileserver service in the docker-compose file
And the task you are trying to run, does it have any specific docker image specified in the container section? (you can see in the UI under execution/container)
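For reference, that image can also be set from code; a minimal sketch, where the image name is just an example:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker-image-example")
# The image name is only an example - use whatever image the agent should run the task in
task.set_base_docker("nvidia/cuda:11.4.3-runtime-ubuntu20.04")
```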
Hi MistakenStarfish81,
In the current version the UI is dependent on the Trains API Server component being available on port 8008, which explains why you could access the UI itself but the UI wasn't able to access the underlying server API to create projects, receive credentials etc.
We plan to change that in future versions.
Meanwhile, you can use subdomains if you want more flexibility (see https://allegro.ai/docs/deploying_trains/trains_server_config/ )
Regarding the error itself, it seems to be some sort of permission issue - can you please show the Response details for the two auth.create_credentials calls shown in the list?
Also, are you using HTTP or HTTPS?
By "bad requests", do you mean unanswered requests or requests returning a 4xx code?
Hi ImmensePenguin78 - what status are you looking for?
It's still worth mentioning that HTTPS termination is up to the deployment, and is highly recommended if you deploy your server in an open network.
Hi @<1805048176315469824:profile|DecayedRaccoon75> , what ClearML queue are you using to enqueue the tasks? Did you specify that queue in the agent chart values?
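For reference, the queue you enqueue into has to be one the agent is listening on; a minimal sketch of enqueueing from code, where the task ID and queue name are placeholders:
```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")  # placeholder task ID
# The queue name must match a queue the agent (e.g. in the chart values) is serving
Task.enqueue(task, queue_name="default")
```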
If you're running the agent in docker mode, and assuming you're running in GCP or some other cloud solution, you can theoretically use the agent's custom bash script configuration option to pull a file from the cloud provider's secrets vault solution and place it on the machine for the duration of the task execution (or set the secret in an env var)
I would first try curl http://localhost:8008 from the server console (i.e. ssh)
but you can try reducing the ES memory allocation to ES_JAVA_OPTS: -Xms500m -Xmx500m
Hi AbruptElephant13, start_locally() should be available in ClearML 1.1.3rc0 (the latest pre-release) - can you try?
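A minimal usage sketch - the project/task names are illustrative and assume an existing template task:
```python
from clearml import PipelineController

pipe = PipelineController(name="example-pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="data preparation",  # assumes such a template task exists
)
# Run the pipeline logic (and optionally its steps) on this machine instead of via an agent
pipe.start_locally(run_pipeline_steps_locally=True)
```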
Thanks for the details, UnevenDolphin73 , and sorry for the inconvenience - we'll try to nail this down...
JitteryCoyote63 can you share the full log for both cases?
@<1533619716533260288:profile|SmallPigeon24> the intent behind queues supporting multiple workers is to let a single queue (the "producer") serve many consumers (workers) - it does not mean multiple workers can pull the same task from the queue