
I did my best to explain.
You have a buffer of tasks, for example 100. When you add task #101, task #1 is replaced by the new one, so you now keep tasks #2 to #101.
I have more than 100 saved experiments, so I don't think anyone needs to bother changing it, unless you are running more than 100 experiments at the same time.
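To illustrate the buffer behavior described above, here is a minimal sketch using a fixed-size `collections.deque`. It only models "add #101, drop #1"; it is not ClearML's actual implementation, and `make_task_buffer` is a hypothetical helper name.

```python
from collections import deque


def make_task_buffer(capacity=100):
    """Return a fixed-size buffer that drops its oldest entry when full."""
    # deque(maxlen=N) discards items from the opposite end on append
    # once N items are stored.
    return deque(maxlen=capacity)


# Fill the buffer with tasks #1..#101 (capacity 100, as in the example above).
task_buffer = make_task_buffer(100)
for task_number in range(1, 102):
    task_buffer.append(task_number)

# The oldest entry (#1) was dropped when #101 was appended.
oldest, newest = task_buffer[0], task_buffer[-1]
print(f"buffer holds tasks #{oldest} to #{newest}")
```

With a capacity of 100, the buffer ends up holding tasks #2 to #101, matching the description.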
So I ran /opt/clearml/ docker-compose -f docker-compose.yml up
just to make sure, and these are the errors that I am seeing:
Hi David, where can I get these logs?
SweetBadger76 sorry to tag you, but I don't know where to find the logs. Are the Elasticsearch logs on the server where I installed ClearML Server?
curl: (7) Failed to connect to localhost port 9200: Connection refused
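"Connection refused" means nothing accepted the TCP connection on that port. A small sketch for checking this from Python, using only the standard library; the port number 9200 comes from the curl error above, and `is_port_open` is a hypothetical helper name. It only tells you whether something is listening, not whether that service is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when nothing is reachable on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9200 is the port from the curl error above.
    print("localhost:9200 reachable:", is_port_open("localhost", 9200))
```

If this prints False on the server itself, the service is probably not running or not published on that interface; if it is True locally but False from another machine, a firewall is a likely cause.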
AppetizingMouse58 Thanks for the answer, sending the logs
Thank you very much, it worked! I hope I will never see this kind of bug again. I will be happy to give more feedback if you would like to find the root cause.
It doesn't fit in one Slack message.
AppetizingMouse58 everything is Linux. Our idea was to run docker on the same server to initiate tasks from the UI, but it was taking too much time, so we gave up and still run "python train.py experiment=myexpname"
Do you think manually deleting the folder /opt/clearml/data/
would solve this problem the same way?
You mean I can use Epoch001/ and Epoch002/ to split them into groups, so that the 100 limit applies per group?
Thank you, I will try
AgitatedDove14 I think Tim wanted to know what task_log_buffer_capacity is
and what functionality it provides.
If it is best practice to have one more docker container with the ClearML client, I will be happy to set it up, but I see no particular benefit in splitting it out from the nvidia docker container that runs the experiments.
Hi, I solved the cut-off labels by calling fig.tight_layout() before return fig
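A minimal sketch of that fix, assuming the figure is built in a helper and returned for logging. The function name, labels, and data are made up for illustration; only the `fig.tight_layout()` call before `return fig` reflects the message above.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt


def make_labeled_figure():
    """Build a bar chart whose rotated tick labels need extra margin."""
    fig, ax = plt.subplots(figsize=(4, 4))
    # Hypothetical data; long rotated labels are what tend to get cut off.
    ax.bar(["long_label_a", "long_label_b", "long_label_c"], [3, 5, 2])
    ax.tick_params(axis="x", labelrotation=90)
    ax.set_xlabel("predicted class")
    ax.set_ylabel("count")
    # Recompute the subplot margins so labels fit inside the figure canvas.
    fig.tight_layout()
    return fig
```

The returned figure can then be handed to whatever logs it, for example a SummaryWriter, as before.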
You can see the white-gray mesh on background that shows the end of the image. It is cropped in the middle of labels
task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
After discussing it, we suspect the problem is that we use the config before initializing the task. Can that cause any problems?
I saved it to my PC, and it is not only a UI issue. My guess is that the plt figure is cropped either by SummaryWriter or by trains. Please help me debug where the problem is. Meanwhile, I will try to see what SummaryWriter does to plt plots.
Hi Martin, thank you for your reply.
Could you please show an example of image title/series?
Mine have names like Epoch_001_first_batch_train, Epoch_001_first_batch_val,
Epoch_001_first_batch_val_balanced,
Epoch_002_first_batch_train, and so on
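One possible way to turn names like these into a title/series pair is to move the epoch number out of the name and use it as the iteration. This is only a sketch: `split_image_name` and the `group` prefix are hypothetical, and whether the history limit is applied per title/series pair is an assumption drawn from this thread, not something confirmed here.

```python
import re

# Matches names such as "Epoch_001_first_batch_val_balanced".
_NAME_PATTERN = re.compile(r"^Epoch_(\d+)_(.+)$")


def split_image_name(name, group="first_batch"):
    """Split an epoch-prefixed image name into (title, series, iteration)."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    epoch = int(match.group(1))
    rest = match.group(2)
    prefix = group + "_"
    if rest.startswith(prefix):
        # e.g. "first_batch_val_balanced" -> title "first_batch", series "val_balanced"
        return group, rest[len(prefix):], epoch
    # Fall back to using the remainder for both title and series.
    return rest, rest, epoch


for name in (
    "Epoch_001_first_batch_train",
    "Epoch_001_first_batch_val",
    "Epoch_001_first_batch_val_balanced",
    "Epoch_002_first_batch_train",
):
    title, series, iteration = split_image_name(name)
    # These values could then be passed as title=, series= and iteration=
    # when reporting the image, or joined as "title/series" for a tag.
    print(title, series, iteration)
```

With this split, the title and series stay the same across epochs and only the iteration changes.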
ReassuredTiger98 why don't you take 5 minutes and check out the source code? https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/backend_interface/task/log.py
This is pretty obvious: it replaces the last task with the new task when the buffer is full.
` clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.app_sequence] ################ API Server initializing #####################
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Initializing database connections
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb host mongo
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb port 27017
clearml-apiserver | [2...
and more logs 🙂 nice warning about dev server in production
` clearml-apiserver | /usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Legacy index templates are deprecated in favor of composable templates.
clearml-apiserver | warnings.warn(message, category=ElasticsearchWarning)
clearml-apiserver | [2022-06-09 13:28:03,875] [9] [INFO] [clearml.initialize] [{'mapping': 'events_plot', 'result': {'acknowledged': True}}, {'mapping': 'events_tra...
Jake, the only way that I know to run the agents is to run docker
` docker ps | grep clearml
2635cec202d9 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up 3 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f8d307913fe0 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up About a minute 0.0....
No, even a newly started experiment is still creating images with 172.,
` cat ~/clearml.conf
ClearML SDK configuration file
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server:
web_server:
files_server: `
Thank you. I've changed clearml.conf, but the URLs still use the old IP. Do I need to restart ClearML or run any command to apply the config changes?
TimelyPenguin76 Thank you for posting this. I just realized that I changed the wrong config. I changed the one on the server, but I needed to change the one inside the docker container. Now everything works. Thanks for the help!
I have a firewall installed on the server, and not all ports are open.
Nevertheless, when I try to run my training code, which differs very little from the example, I can't copy and run it from the UI, and I don't even see hyperparameters in the experiment results.
` import os
import hydra
from hydra import utils
from utils.class_utils import instantiate
from omegaconf import DictConfig, OmegaConf
from clearml import Task
@hydra.main(config_path="conf", config_name="default")
def app(cfg):
    run(cfg)
def run(cfg):
    task = Task.init(project_name=cfg.project.name, t...