curl: (7) Failed to connect to localhost port 9200: Connection refused
doesn't fit in 1 message in slack
You can see the white-gray mesh in the background that marks the end of the image. It is cropped in the middle of the labels
I saved it to my PC, so it is not only a UI issue. My guess is that the plt figure is cropped either by SummaryWriter or by trains. Help me debug where the problem is; meanwhile I will try to see what this SummaryWriter does to plt plots
Hi, I solved this cut-off of the labels with `fig.tight_layout()` before `return fig`
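In case anyone hits the same cropping, a minimal sketch of the fix; the data and label names are made up just to reproduce the long-label case:
```python
import matplotlib.pyplot as plt

def make_figure():
    # hypothetical data, just to reproduce long tick labels getting cropped
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(["very_long_label_one", "very_long_label_two"], [1, 2])
    ax.set_xlabel("class")
    ax.set_ylabel("count")
    fig.tight_layout()  # recompute margins so the labels fit inside the canvas
    return fig
```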
do you think that manually deleting the folder /opt/clearml/data/ would solve this problem the same way?
SweetBadger76 sorry to tag you, but I don't know where to find the logs. Do I have Elasticsearch logs on the server where I installed the ClearML server?
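If it helps, this is how I would try to pull them, assuming the default docker-compose setup where Elasticsearch runs in its own container (the container name `clearml-elastic` is an assumption, confirm it with `docker ps` first):
```
# list the ClearML server containers to find the Elasticsearch one
docker ps
# tail the Elasticsearch container logs (container name is an assumption)
docker logs --tail 200 clearml-elastic
```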
Hi David, where can I get these logs?
I have a firewall installed on the server and not all ports are open
AppetizingMouse58 everything is Linux. Our idea was to run docker on the same server to initiate tasks from the UI, but it was taking too much time, so we gave up and still do `python train.py experiment=myexpname`
AppetizingMouse58 Thanks for the answer, sending the logs
Ok, let me check it later today and come back with the results of the example app
So now I ran the example and I see the hydra tab. Is this the experiment arg that I used to run it? `python hydra_example.py experiment=gm_fl_dcl`
Nevertheless, when I try to run my training code, which differs very little from the example, I can't copy and run it from the UI, and I don't even see the hyperparameters in the experiment results
```python
import os

import hydra
from hydra import utils
from utils.class_utils import instantiate
from omegaconf import DictConfig, OmegaConf
from clearml import Task


@hydra.main(config_path="conf", config_name="default")
def app(cfg):
    run(cfg)


def run(cfg):
    # full Task.init call is quoted further down in this thread
    task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
    ...
```
One more interesting bug: after I changed my train.py according to hydra_example.py, I started getting errors at the end of the experiment
```
2021-08-17 13:33:28
--- Logging error ---
ValueError: I/O operation on closed file.
  File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 200, in write
    self._terminal._original_write(message)  # noqa
  File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger...
```
```
cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)
--- Logging error ---
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/logging/__init__.py", line 1084, in emit
    stream.write(msg + self.terminator)
  File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 141, in stdout__patched__write_
    return StdStreamPatch._stdout_proxy.write(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-p...
```
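Side note on the deprecation warning at the top of that log: the replacement is a one-liner. A sketch, assuming the config was being printed:
```python
from omegaconf import OmegaConf

# old, deprecated:
# print(cfg.pretty())
# new:
print(OmegaConf.to_yaml(cfg))
```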
here are the requirements from the repository where I was able to run hydra_example.py and where my custom train.py crashes
and experiments now get stuck in "Running" mode even when the train loop is finished
yes, everything runs on the same machine in different docker containers
A couple of words about our hydra config:
it is located in the root next to the train.py file, but the default config points to an experiment folder with other configs, and this is what I need to specify on every run (see the sketch below)
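Roughly like this; the layout is a sketch pieced together from the `config_path="conf"` / `config_name="default"` arguments in our code and the commands above, so the exact file names are assumptions:
```
.
├── train.py
└── conf/
    ├── default.yaml        # defaults list selects a config group entry
    └── experiment/
        ├── gm_fl_dcl.yaml  # selected via: python train.py experiment=gm_fl_dcl
        └── myexpname.yaml
```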
Yes, I have the latest version 1.0.5 now and it gives the same result in the UI as the previous version that I used
Martin, thank you very much for your time and dedication, I really appreciate it
AgitatedDove14 orchestration module - what is this and where can I read more about it?
We do have `sys.stdout.close()` 🙂, forgot to mention it
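For context, a minimal sketch of why I think that bites: ClearML patches sys.stdout to capture console output, so closing it means any later write, including whatever ClearML flushes at task shutdown, hits a closed file. This is my reading of the traceback above, not a confirmed description of ClearML internals; project and task names are made up:
```python
import sys

from clearml import Task

task = Task.init(project_name="demo", task_name="stdout-close-repro")  # hypothetical names

print("training...")
sys.stdout.close()  # anything writing to stdout after this point, e.g. ClearML
                    # flushing its console log at exit, raises
                    # ValueError: I/O operation on closed file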
If the best practice is to have one more docker container with the ClearML client, I will be happy to set it up, but I see no particular benefit in splitting it out from the nvidia docker that runs the experiments
`task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)` After discussion, we suspect using the config before initializing the task; can it cause any problems?
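To test that suspicion, we could move Task.init to the very first statement of the hydra entry point, before any other code touches cfg. A sketch of the reordering, with names taken from our code above; whether this actually matters is exactly the open question:
```python
import hydra
from clearml import Task


@hydra.main(config_path="conf", config_name="default")
def app(cfg):
    # init the task first; the Task.init arguments themselves still read cfg,
    # which is unavoidable here and part of what we want to test
    task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
    run(cfg, task)


def run(cfg, task):
    ...  # training code


if __name__ == "__main__":
    app()
```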
the docker container has access to all 4 GPUs via the --gpus all flag, and we specify in the config which cuda device(s) to run on; in pytorch we can run on more than 2 GPUs
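For reference, the device selection looks roughly like this on our side; a sketch, where the config key holding the device string is a stand-in for whatever our config actually calls it:
```python
import torch

def select_device(cfg_device: str) -> torch.device:
    # cfg_device comes from the hydra config, e.g. "cuda:0" or "cuda:2"
    if torch.cuda.is_available():
        return torch.device(cfg_device)
    return torch.device("cpu")

device = select_device("cuda:1")  # example value
model = torch.nn.Linear(8, 2).to(device)
```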
I can only assume that `task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)`
is broken because it has to read the config, and depending on where I run it, it has no access to the config. I will investigate this with my co-worker and let you know if we find a solution.
One more important thing: I have an nvidia-based docker container running on the ubuntu server (the same one that hosts the ClearML server), and I am afraid that initiating a task from the command line and from the ClearML web UI run in ...