ReassuredTiger98 why don't you take 5 minutes and check out the source code? https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/backend_interface/task/log.py
this is pretty obvious, it replaces last task with new task when the buffer is full
yes, all runs on same machine on different dockers
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
Very good question, I need to understand what happens when I press "Enqueue" in the web UI and set it to the default queue
Yes, I have latest 1.0.5 version now and it gives same result in UI as previous version that I used
Hi, I solved the cut-off labels by calling fig.tight_layout() before return fig
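For anyone hitting the same cropped labels, here is a minimal sketch of the fix described above. The function name make_figure, the bar chart, and the sample labels are hypothetical; the Agg backend is assumed only because the code runs on a headless server. The relevant part is calling fig.tight_layout() before returning the figure.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption for a server without a display
import matplotlib.pyplot as plt


def make_figure(labels, values):
    """Build a bar chart whose long, rotated tick labels stay inside the canvas."""
    fig, ax = plt.subplots(figsize=(6, 4))
    positions = list(range(len(labels)))
    ax.bar(positions, values)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=90)
    # Recompute the subplot margins so the labels are not cut at the image edge.
    fig.tight_layout()
    return fig


# Hypothetical sample data, only to exercise the function.
fig = make_figure(["a_rather_long_class_name", "another_long_class_name"], [3, 5])
```

The returned fig can then be passed to whatever reports the plot, so the layout is already fixed when the image is rendered.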
I can only assume that task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
is broken because it has to read config, and depending on where I run it it has no access to config. I will investigate this with my co-worker and let you know if we find solution.
One more important thing - I have nvidia based docker running on the ubuntu server (same one that hosts clearml server) and I am afraid that initiating task from command line and from ClearML web UI run in ...
here are the requirements from the repository where I was able to run hydra_example.py but get a crash with my custom train.py
You can see the white-gray mesh on background that shows the end of the image. It is cropped in the middle of labels
We have physical server in server farm that we configure with 4 GPUs, so we run all on this hardware without cloud rent
Couple of words about our hydra config
it is located in root with train.py file. But the default config points to experiment folder with other configs and this is what I need to specify on every run
Thank you. I've changed clearml.conf, but the URLs still show the old IP. Do I need to restart ClearML or run any command to apply the config changes?
Hi Shay, thanks for reply
I just went by old path remembered in browser. Last week we updated client and server, they are both running on our physical server
No, I was always shutting down server. But if you can give me step by step how to clean install I will be happy to do it
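from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()  # assumed: the original message did not show how writer was created
writer.add_figure('name',
figure=fig)
where fig is a matplotlib figure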
Shay, you are correct, one of the docker containers is down. But aren't they all supposed to start as part of running docker-compose -f docker-compose.yml up in /opt/clearml/ ?
AppetizingMouse58 Thanks for the answer, sending the logs
curl: (7) Failed to connect to localhost port 9200: Connection refused
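A quick way to repeat the check from the curl error above without curl is a plain TCP connect. This is only a sketch: port_is_open is a hypothetical helper, and the only port taken from this thread is 9200. "Connection refused" usually means nothing is listening on that port or something is blocking it.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9200 is the port from the curl error above.
    state = "open" if port_is_open("localhost", 9200) else "closed or blocked"
    print(f"localhost:9200 is {state}")
```

If this reports the port as closed on the server itself, the service behind it is most likely not running, which is a different problem from a firewall blocking access from outside.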
doesn't fit in 1 message in slack
do you think manually deleting the folder /opt/clearml/data/
would solve this problem the same way?
SweetBadger76 sorry to tag you but I don't know where to find the logs. Do I have elasticsearch logs on the server where I installed the ClearML server?
Hi David, where can I get these logs?
I have firewall installed on the server and not all ports are open
Thank you very much, it worked! I hope I will never see this kind of bug again, and I will be happy to give more feedback if you would like to find the root cause
AppetizingMouse58 all is Linux. Our idea was to run docker on the same server to initiate tasks from the UI, but it was taking too much time so we gave up and still do "python train.py experiment=myexpname"
AgitatedDove14 orchestration module - what is this and where can I read more about it?