Hi JealousSealion33 . We are planning on releasing a clearml k8s package very soon. I hope this is something that you will be able to use. If not, could you please elaborate on the issue you are facing and share the error you are receiving?
AbruptWorm50 - just to make sure there is no misunderstanding - the last image you sent is on the "training" queue and not on the "services" queue. Are there free agents running on that queue?
AbruptWorm50 - does the issue still occur, or did you manage to resolve it?
Looking at the 2nd image you sent, I see that in addition to the "services" queue, you also have queues called "training" and "training*_2" - and the experiment you circled is in the "training" queue. In that image, there are no experiments in the services queue.
If you press on the "services" queue (like you did in the first image) you can view the experiments in the queue and the workers. Can you check if there is a situation where there are tasks pending in that queue while one of the work...
AbruptWorm50 - the agents poll the queue, so any free agent can pull tasks. From the graph on the right, it looks like experiments were not waiting in the queue (the max number of experiments is 1, and it was pulled immediately). Can you also check what happens if two experiments are enqueued at the same time?
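To illustrate the polling behaviour described here, below is a minimal toy simulation. It is only a sketch and does not use or reproduce the actual clearml-agent code; the `Agent` class and `poll_round` helper are hypothetical names made up for this example.

```python
from collections import deque


class Agent:
    """Toy stand-in for an agent: it is free when current_task is None."""

    def __init__(self, name):
        self.name = name
        self.current_task = None

    def poll(self, queue):
        """Pull the next task if this agent is free and the queue is not empty."""
        if self.current_task is None and queue:
            self.current_task = queue.popleft()
        return self.current_task


def poll_round(agents, queue):
    """Let every agent poll once; return {agent name: task} for tasks pulled this round."""
    pulled = {}
    for agent in agents:
        was_free = agent.current_task is None
        task = agent.poll(queue)
        if was_free and task is not None:
            pulled[agent.name] = task
    return pulled


# Two experiments enqueued at the same time, two free agents on the queue.
queue = deque(["experiment_a", "experiment_b"])
agents = [Agent("agent-1"), Agent("agent-2")]
first_round = poll_round(agents, queue)
print(first_round)
print(len(queue))
```

With two free agents, both experiments are pulled in the first round and nothing is left waiting. With a single free agent, the second experiment would stay in the queue until that agent is free again.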
SmallBluewhale13 - the plot comparison only shows the last iteration, whereas the individual info screen displays the last 5 iterations. Could this explain the issue?
@<1571308003204796416:profile|HollowPeacock58> - there is indeed an issue with connectivity to the payment service. I'll update when it is resolved.
Hi @<1526371986278715392:profile|VivaciousReindeer64> ,
I replicated your environment, and found the following solution:
- In the docker-compose, add the following to the webserver section:
volumes:
- /opt/clearml/config:/mnt/external_files/configs
- In the host machine, create a file
/opt/clearml/config/configuration.json
containing the following:
{
"displayedServerUrls": {
"filesServer": "
"
}
}
- restart the docker-compose: `sudo docker-...
sudo docker logs clearml-webserver
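If it helps, the configuration file from the steps above could also be generated with a short script. This is only a sketch: the URL below is a hypothetical placeholder (the real files server address is not shown here), and the demo writes to a temporary directory rather than to /opt/clearml/config.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical placeholder: replace with the address of your own files server.
FILES_SERVER_URL = "http://files.example.invalid:8081"


def build_configuration(files_server_url):
    """Return the configuration.json structure described in the steps above."""
    return {"displayedServerUrls": {"filesServer": files_server_url}}


def write_configuration(path, files_server_url):
    """Write configuration.json to `path`, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(build_configuration(files_server_url), indent=2) + "\n"
    path.write_text(text)
    return path


# Demo in a temporary directory; on the host the target would be
# /opt/clearml/config/configuration.json (which may need root permissions).
with tempfile.TemporaryDirectory() as tmp:
    written = write_configuration(Path(tmp) / "config" / "configuration.json", FILES_SERVER_URL)
    print(written.read_text())
```

Generating the file with `json.dumps` avoids hand-editing mistakes such as a missing quote or brace, which would make the file invalid JSON.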
@<1526371986278715392:profile|VivaciousReindeer64> - please check the following:
- What do you get if you go to http://192.168.1.145:8080/configuration.json?
- Can you check the log of the webserver docker (using `sudo docker logs clearml-webserver`), especially the beginning? Does it say anything about the fileBaseUrl?
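The first check can also be done from a script instead of the browser. The sketch below uses only the Python standard library; `fetch_configuration` and `displayed_files_server` are hypothetical helper names, and it assumes the served file has the `displayedServerUrls` / `filesServer` layout discussed earlier in this thread.

```python
import json
import urllib.request


def fetch_configuration(url, timeout=5):
    """Download the raw text served at `url` (for example .../configuration.json)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def displayed_files_server(config_text):
    """Return displayedServerUrls.filesServer from the JSON text, or None if absent."""
    config = json.loads(config_text)
    return config.get("displayedServerUrls", {}).get("filesServer")


# Example usage against the address mentioned above (needs network access):
# text = fetch_configuration("http://192.168.1.145:8080/configuration.json")
# print(displayed_files_server(text))
```

If the function returns `None`, the web server is serving a configuration without that key, which would suggest the mounted file is not being picked up.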
VivaciousReindeer64 - you can try adding the following to the webserver service environment section:
WEBSERVER__fileBaseUrl=https://192.168.1.145:8081
@<1526371986278715392:profile|VivaciousReindeer64> - I'll check it on my env
Hi @<1526371986278715392:profile|VivaciousReindeer64> - please try this file:
Make sure to restart the compose (`docker-compose up -d`).
@<1526371986278715392:profile|VivaciousReindeer64> - I added this and it worked:
- WEBSERVER__fileBaseUrl="
"
Maybe the quotes are required.
@<1526371986278715392:profile|VivaciousReindeer64> - Yes - please send me the docker-compose and the log of the webserver
You can redirect it to a file or just copy it via the clipboard.
Hi DepressedChimpanzee34 . Thanks for the info.
Plotly animation is currently not supported in the web app. We're looking into adding this into a future version. Can you describe the use case of animations in your system?
DrainedHippopotamus42 - once you send more metrics, the calculation will be correct. The problem only occurs when you delete all the metrics.
Deletion of the artifacts (debug samples, models, artifacts) is currently performed from the browser, after the task object is deleted from the backend. So it might take time for all the artifacts to be deleted - please make sure not to close your browser.
Note that only artifacts saved on the fileserver are deleted - external artifacts (in the cloud or on a local filesystem) are not deleted.
We are working on moving the deletion to the server side, to avoid this kind of issue. Should be in ...
Hi ThoughtfulGorilla90 - when did you perform the deletion? It might take up to 24 hours for the application to get the updated size.
Hi ThoughtfulGorilla90 , how did you delete the experiments/models - was it from the web app or using the SDK?
Hi @<1570583227918192640:profile|FloppySwallow46> . We've updated the rate limits. Can you please check if the issue is still occurring?
We limit the allowed calls per IP - to make sure the server is not blocked accidentally. We enabled over 1000 calls per minute.
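For readers unfamiliar with per-IP limits, here is a minimal sliding-window sketch of the general idea. It is not the server's actual implementation; the `PerIpRateLimiter` class and the small limit used in the demo are assumptions chosen only to keep the example short.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most `max_calls` per `window_seconds` for each IP (sliding window)."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # ip -> timestamps of accepted calls

    def allow(self, ip, now):
        """Return True and record the call if `ip` is under its limit at time `now`."""
        calls = self._calls[ip]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


# Demo with a tiny limit (3 calls per 60 seconds) so the effect is visible.
limiter = PerIpRateLimiter(max_calls=3, window_seconds=60)
results = [limiter.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)]
print(results)
```

The fourth call from the same IP inside the window is rejected, while calls from a different IP, or calls made after the window has passed, are accepted again.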
EnviousStarfish54 - Got it!
I'll look into it and update you. Thanks for reporting!
@<1564060248291938304:profile|AmusedParrot89> - I see the logic in displaying the last iteration per metric in the compare screen. We will need to think if this won't cause any other issues.
In the meantime - may I ask you to open a GitHub issue, so it will be easier to track?
@<1566596960691949568:profile|UpsetWalrus59> - if you could also paste the payload and response of the call to events.get_multi_task_plots (from the network tab of the browser's dev-tools, F12), this might also help us understand the issue.
@<1566596960691949568:profile|UpsetWalrus59> - please note that if you report the plots as two separate series of the same metric - it should work better
Hi DepressedChimpanzee34 .
Currently, supporting plotly animations is not planned for the upcoming versions. I would suggest opening a feature request in GitHub, or adding it to the UI change request thread: https://github.com/allegroai/clearml/issues/81 .
Thanks,
Oren.
@<1558986867771183104:profile|ShakyKangaroo32> - if you check the api section in your client-side clearml.conf, the value for files_server there should be the same one that you set in the .env file on the server. Can you check that they are indeed the same?
If they are the same - can you please send me the output of the following command on the server: `sudo docker logs -n30 async_delete`
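When comparing the two values by eye, small differences such as a trailing slash or surrounding quotes are easy to miss. The sketch below compares two URL strings after light normalization. `normalize_url` and `same_files_server` are hypothetical helpers, and the example addresses are placeholders; you would paste in the values copied from your own clearml.conf and .env file.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Reduce a URL to (scheme, host, port, path), ignoring quotes, case and trailing slashes."""
    parts = urlsplit(url.strip().strip('"').strip("'"))
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    path = parts.path.rstrip("/")
    return (scheme, host, port, path)


def same_files_server(client_value, server_value):
    """True if the two configured files server URLs point at the same endpoint."""
    return normalize_url(client_value) == normalize_url(server_value)


# Placeholder values; replace with the ones from clearml.conf and the server's .env file.
print(same_files_server("http://files.example.invalid:8081/", '"http://files.example.invalid:8081"'))
print(same_files_server("https://files.example.invalid:8081", "http://files.example.invalid:8081"))
```

A scheme mismatch (http vs. https) or a different port is reported as a difference, since those are the kinds of mismatch that would send requests to a different endpoint.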