I don't know for sure, but this is what I understand from the code. You would need 100 experiments running at the same time, so unless you have access to 100 GPUs you should be fine
curl: (7) Failed to connect to localhost port 9200: Connection refused
AppetizingMouse58 everything is Linux. Our idea was to run Docker on the same server to initiate tasks from the UI, but it was taking too much time, so we gave up and still do "python train.py experiment=myexpname"
It doesn't fit in one message in Slack.
I did my best to explain it.
You have a buffer of tasks, for example 100. When you add task #101, task #1 is replaced with the new one, and you now keep tasks #2 through #101.
Because I have > 100 saved experiments, I don't think anyone should bother to change it, unless you are running more than 100 experiments at the same time
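The buffer behavior described above can be sketched with a bounded deque from the standard library (the size 100 matches the figure mentioned; the rest is my own illustration, not ClearML's actual code):

```python
from collections import deque

# Bounded task buffer: when full, adding a new task evicts the oldest one.
tasks = deque(maxlen=100)

for i in range(1, 102):          # add tasks #1..#101
    tasks.append(f"task-{i}")

# task #1 was evicted; the buffer now holds #2..#101
print(tasks[0], tasks[-1])
```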
No, even a newly started experiment still creates images with 172.,
` cat ~/clearml.conf
# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
} `
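For reference, on a default self-hosted server those three entries point at ports 8008, 8080, and 8081; a hedged example with placeholder hosts (`localhost` here is an assumption, substitute your server's address):

```
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: http://localhost:8008
    web_server: http://localhost:8080
    files_server: http://localhost:8081
}
```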
and more logs 🙂 a nice warning about running a dev server in production
` clearml-apiserver | /usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Legacy index templates are deprecated in favor of composable templates.
clearml-apiserver | warnings.warn(message, category=ElasticsearchWarning)
clearml-apiserver | [2022-06-09 13:28:03,875] [9] [INFO] [clearml.initialize] [{'mapping': 'events_plot', 'result': {'acknowledged': True}}, {'mapping': 'events_tra...
` clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.app_sequence] ################ API Server initializing #####################
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Initializing database connections
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb host mongo
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb port 27017
clearml-apiserver | [2...
` Retrying (Retry(total=239, connect=240, read=239, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=238, connect=240, read=238, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=24...
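For context, the decreasing numbers in those messages are urllib3-style per-category retry counters: each failed attempt decrements `total` plus the counter for that failure kind (a read error here, since the connection was reset mid-request). A minimal stdlib sketch of that bookkeeping (the function name and structure are my own illustration, not urllib3's implementation):

```python
def retry_history(total=240, failures=3, kind="read"):
    """Simulate decrementing retry counters across failed attempts."""
    counters = {"total": total, "connect": total, "read": total, "status": total}
    history = []
    for _ in range(failures):
        counters["total"] -= 1   # every failure consumes one total retry
        counters[kind] -= 1      # ...and one retry of its own category
        history.append(dict(counters))
    return history

history = retry_history()
for h in history:
    print(f"Retry(total={h['total']}, connect={h['connect']}, "
          f"read={h['read']}, status={h['status']})")
```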
Now it is empty and I don't know where to find the credentials to connect one more Docker client
Here are the requirements from the repository; with them I was able to run hydra_example.py, but I get a crash with my custom train.py
Hi Martin, thank you for your reply.
Could you please show an example of image title/series?
Mine have names like Epoch_001_first_batch_train, Epoch_001_first_batch_val,
Epoch_001_first_batch_val_balanced,
Epoch_002_first_batch_train, and so on
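One way to map names like these onto a title/series scheme is to parse the epoch out as the iteration and split the remainder into title and series; a sketch (the `parse_name` helper and its regex are my own assumptions about the naming pattern, and the ClearML call shown in the comment is only a hypothetical usage):

```python
import re

def parse_name(name):
    """Split e.g. 'Epoch_001_first_batch_val_balanced' into
    (iteration, title, series)."""
    m = re.match(r"Epoch_(\d+)_(.+)_(train|val|val_balanced)$", name)
    if not m:
        raise ValueError(f"unrecognized name: {name}")
    return int(m.group(1)), m.group(2), m.group(3)

print(parse_name("Epoch_001_first_batch_val_balanced"))
# One image per (title, series) could then be reported, e.g. with
# ClearML's Logger:
#   Logger.current_logger().report_image(title=title, series=series,
#                                        iteration=iteration, image=img)
```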
Hi David, where can I get these logs?
ReassuredTiger98 why don't you take five minutes and check out the source code? https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/backend_interface/task/log.py
It's pretty obvious: it replaces the oldest task with the new task when the buffer is full
AppetizingMouse58 Thanks for the answer, sending the logs
We have a physical server in a server farm that we configured with 4 GPUs, so we run everything on this hardware without renting cloud compute
I have a firewall installed on the server, and not all ports are open
` from torch.utils.tensorboard import SummaryWriter

writer.add_figure('name', figure=fig) `
where fig is a matplotlib figure
So I ran docker-compose -f docker-compose.yml up in /opt/clearml/ just to make sure, and these are the errors that I am seeing
Jake, the only way that I know to run the agents is to run them in Docker
` docker ps | grep clearml
2635cec202d9 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up 3 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f8d307913fe0 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up About a minute 0.0....
Hi, I solved this cropping of the labels with `fig.tight_layout()` before `return fig`
Shay, you are correct, one of the Docker containers is down. But aren't they all supposed to run as part of docker-compose -f docker-compose.yml up in /opt/clearml/?
I saved it to my PC, so it is not only a UI issue. My guess is that the matplotlib figure is cropped either by SummaryWriter or by Trains. Help me debug where the problem is; meanwhile I will try to see what SummaryWriter does to plt plots
Thank you very much, it worked! I hope I will never see this kind of bug again; I'd be happy to give more feedback if you would like to find the root cause
Hi Shay, thanks for the reply
I just went to the old path my browser remembered. Last week we updated the client and server; they are both running on our physical server
` docker ps | grep clearml
bd8fb61e8684 allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f79c54472a6f allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp ...