AgitatedDove14 orchestration module - what is this and where can I read more about it?
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
Very good question, I need to understand what happens when I press "Enqueue" in the web UI and set it to the default queue
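For illustration, pressing "Enqueue" in the web UI is roughly equivalent to this SDK call: the task is pushed into the selected queue, and a clearml-agent listening on that queue pulls it and executes it. A minimal sketch (the project and task names here are placeholders):
` from clearml import Task

# Placeholder names: fetch the experiment that was cloned in the UI
template = Task.get_task(project_name="demo", task_name="train")
cloned = Task.clone(source_task=template, name="train (cloned)")

# Push the clone into the "default" queue; an agent listening on it runs it
Task.enqueue(cloned, queue_name="default") `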
Thank you. I've changed clearml.conf, but the URLs still keep the old IP. Do I need to restart ClearML or run some command to apply the config changes?
No, even a newly started experiment is still creating images with 172.…
` cat ~/clearml.conf
# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
} `
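For what it's worth, one way to check which addresses the SDK actually resolves (a sketch with placeholder IPs; the CLEARML_* environment variables override clearml.conf if set, and the file itself is read when the process starts, so already-logged experiments keep the URLs registered with the old IP):
` import os
from clearml import Task

# Environment variables override clearml.conf (placeholder addresses)
os.environ["CLEARML_API_HOST"] = "http://<new-ip>:8008"
os.environ["CLEARML_WEB_HOST"] = "http://<new-ip>:8080"
os.environ["CLEARML_FILES_HOST"] = "http://<new-ip>:8081"

task = Task.init(project_name="demo", task_name="config-check")
print(task.get_output_log_web_page())  # should point at the new web server `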
and more logs 🙂 a nice warning about running a dev server in production
` clearml-apiserver | /usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Legacy index templates are deprecated in favor of composable templates.
clearml-apiserver | warnings.warn(message, category=ElasticsearchWarning)
clearml-apiserver | [2022-06-09 13:28:03,875] [9] [INFO] [clearml.initialize] [{'mapping': 'events_plot', 'result': {'acknowledged': True}}, {'mapping': 'events_tra...
SweetBadger76 sorry to tag you, but I don't know where to find the logs. Are there Elasticsearch logs on the server where I installed the ClearML server?
ReassuredTiger98 why don't you take 5 minutes and check out the source code? https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/backend_interface/task/log.py
it's pretty obvious: it replaces the last task with the new task when the buffer is full
I don't know for sure, but this is what I understand from the code. You would need 100 experiments running at the same time though, so unless you have access to 100 GPUs you should be fine
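For illustration only, a toy sketch of that behavior (not the actual clearml code): a fixed-capacity buffer that starts overwriting its last slot once full.
` class LastSlotBuffer:
    # Toy model: keep at most 'capacity' entries, then replace the last one
    def __init__(self, capacity=100):
        self.capacity = capacity
        self._items = []

    def add(self, item):
        if len(self._items) < self.capacity:
            self._items.append(item)
        else:
            self._items[-1] = item  # buffer full: newest replaces the last slot `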
` from torch.utils.tensorboard import SummaryWriter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # fig is a matplotlib figure
ax.plot(range(10))

writer = SummaryWriter()
writer.add_figure('name', figure=fig) `
You mean I can use Epoch001/ and Epoch002/ to split them into groups, so the 100 limit applies per group?
Thank you, I will try
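Something like this, for example (a sketch assuming TensorBoard-style tags, where the slash in the tag creates the Epoch001/, Epoch002/ groups and any per-group image limit applies to each group separately):
` import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
for epoch in range(1, 3):
    fig, ax = plt.subplots()
    ax.plot(range(10))
    # "Epoch001/..." and "Epoch002/..." land in separate groups
    writer.add_figure(f"Epoch{epoch:03d}/example", figure=fig)
writer.close() `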
Shay, you are correct, one of the Docker containers is down. But aren't they all supposed to come up as part of docker-compose -f docker-compose.yml up in /opt/clearml/?
yes, I was only using the experiments tab to compare scalars and to see the validation and train images, and I can see that information
Hi Shay, thanks for the reply
I just followed the old path remembered by the browser. Last week we updated the client and the server; they are both running on our physical server
No, I always shut the server down. But if you can give me step-by-step instructions for a clean install, I'll be happy to do it
` clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.app_sequence] ################ API Server initializing #####################
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Initializing database connections
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb host mongo
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb port 27017
clearml-apiserver | [2...
so I run docker-compose -f docker-compose.yml up from /opt/clearml/ just to make sure, and these are the errors that I am seeing
Hi, I solved the cut-off labels with fig.tight_layout() before return fig
I saved it to my PC, so it is not only a UI issue. My guess is that the plt figure is cropped either by SummaryWriter or by trains. Help me debug where the problem is; meanwhile I'll try to see what SummaryWriter does to plt plots
You can see the white-gray mesh in the background that marks the edge of the image. It is cropped in the middle of the labels
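For reference, a minimal sketch of the fix (a plain matplotlib figure handed to SummaryWriter, with tight_layout() applied before returning it):
` import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter

def make_figure():
    fig, ax = plt.subplots()
    ax.plot(range(10))
    ax.set_xlabel("step")
    fig.tight_layout()  # keeps axis labels inside the image bounds
    return fig

writer = SummaryWriter()
writer.add_figure("debug/labels", figure=make_figure())
writer.close() `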
` docker ps | grep clearml
bd8fb61e8684 allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f79c54472a6f allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp ...
and experiments are now stuck in "Running" mode even when the train loop has finished
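If it helps, one possible workaround (a sketch assuming the clearml SDK; project/task names are placeholders) is to close the task explicitly at the end of the script:
` from clearml import Task

task = Task.init(project_name="demo", task_name="train")
# ... training loop ...
task.close()  # flush reporters and close the task so it does not stay "Running" `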
here are the requirements from the repository with which I was able to run hydra_example.py, and with which my custom train.py crashes
One more interesting bug: after I changed my train.py according to hydra_example.py, I started getting errors at the end of the experiment
` --- Logging error ---
2021-08-17 13:33:28
ValueError: I/O operation on closed file.
2021-08-17 13:33:28
File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 200, in write
self._terminal._original_write(message) # noqa
2021-08-17 13:33:28
File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger...
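For context, a minimal reproduction of this class of error (not the clearml code itself): writing to an already-closed stream raises exactly this ValueError.
` import io

stream = io.StringIO()
stream.close()
try:
    stream.write("late log line")  # the stream was already closed
except ValueError as exc:
    print(exc)  # -> I/O operation on closed file. `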