
And more logs 🙂 Nice warning about the dev server in production:
```
clearml-apiserver | /usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Legacy index templates are deprecated in favor of composable templates.
clearml-apiserver |   warnings.warn(message, category=ElasticsearchWarning)
clearml-apiserver | [2022-06-09 13:28:03,875] [9] [INFO] [clearml.initialize] [{'mapping': 'events_plot', 'result': {'acknowledged': True}}, {'mapping': 'events_tra...
```

```
Retrying (Retry(total=239, connect=240, read=239, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=238, connect=240, read=238, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=24...
```
Shay, you are correct, one of the Docker containers is down. But aren't they all supposed to start when I run `docker-compose -f docker-compose.yml up` in /opt/clearml/?
So I ran `docker-compose -f docker-compose.yml up` in /opt/clearml/ just to make sure, and these are the errors I am seeing.
I have a firewall installed on the server, and not all ports are open.
```
curl: (7) Failed to connect to localhost port 9200: Connection refused
```
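Since a firewall is in play, it can help to check each port separately. Below is a minimal sketch using only the standard library; the port list (8080 web UI, 8008 API server, 8081, and 9200 for Elasticsearch) is taken from the logs and container listing in this thread and is an assumption about your setup. Run on the server itself, it only shows whether something is listening; run from another machine, it also exercises the firewall. Depending on the compose file, some ports (possibly 9200) may only be reachable inside the Docker network, so a refusal there is not necessarily a fault.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and socket.timeout
        return False


if __name__ == "__main__":
    # Assumed ports, based on the docker ps output and the curl error in this thread.
    for port in (8080, 8008, 8081, 9200):
        print(port, "open" if port_open("localhost", port) else "closed")
```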
Hi David, where can I get these logs?
AppetizingMouse58 everything is Linux. Our idea was to run Docker on the same server to launch tasks from the UI, but it was taking too much time, so we gave up and still run `python train.py experiment=myexpname`.
AppetizingMouse58 Thanks for the answer, sending the logs
```
docker ps | grep clearml
bd8fb61e8684 allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f79c54472a6f allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp ...
```
```
cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)
--- Logging error ---
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/logging/__init__.py", line 1084, in emit
    stream.write(msg + self.terminator)
  File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 141, in _stdout__patched__write__
    return StdStreamPatch._stdout_proxy.write(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-p...
```
Thank you. I've changed clearml.conf, but the URLs still point to the old IP. Do I need to restart ClearML or run some command to apply the config changes?
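One thing worth ruling out on the client side is that the edited file is not the one being read, or that an environment variable is overriding it. The sketch below assumes, based on our understanding of the ClearML client, that `CLEARML_CONFIG_FILE` selects an alternative config file (default `~/clearml.conf`) and that `CLEARML_API_HOST`, `CLEARML_WEB_HOST` and `CLEARML_FILES_HOST` take precedence over the URLs in the file. It does not touch the server, and URLs already recorded for existing tasks are a separate matter.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Tuple

# Assumption: these variables override the api.*_server URLs in clearml.conf,
# so a stale value here would hide an edit to the file.
OVERRIDE_VARS = ("CLEARML_API_HOST", "CLEARML_WEB_HOST", "CLEARML_FILES_HOST")


def clearml_config_sources(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return the config file the client would most likely read and any URL overrides in the environment."""
    env = os.environ if env is None else env
    # Assumption: CLEARML_CONFIG_FILE points to an alternative config file.
    conf_path = env.get("CLEARML_CONFIG_FILE") or str(Path.home() / "clearml.conf")
    overrides = {name: env[name] for name in OVERRIDE_VARS if name in env}
    return conf_path, overrides


if __name__ == "__main__":
    path, overrides = clearml_config_sources()
    print("config file:", path)
    print("env overrides:", overrides or "none")
```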
I can only assume that `task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)` is broken because it has to read the config, and depending on where I run it, it has no access to the config. I will investigate this with my co-worker and let you know if we find a solution.
One more important thing: I have an NVIDIA-based Docker container running on the Ubuntu server (the same one that hosts the ClearML server), and I am afraid that initiating a task from the command line and from the ClearML web UI run in ...
The container has access to all 4 GPUs via the `--gpus all` flag, and we specify in the config which CUDA device(s) to run on; in PyTorch we can run on more than 2 GPUs.
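When a container sees all GPUs and the config picks a subset, one common approach is to set `CUDA_VISIBLE_DEVICES` before PyTorch initializes CUDA. This is a minimal sketch of that approach, not necessarily how your training code does it; the config key mentioned in the comment (`cfg.trainer.gpus`) is hypothetical.

```python
import os
from typing import Sequence


def restrict_to_gpus(device_ids: Sequence[int]) -> str:
    """Expose only the listed GPUs to this process and return the value that was set.

    Must run before torch (or any CUDA library) initializes CUDA. Inside the process
    the selected GPUs are renumbered from 0, e.g. [1, 3] become cuda:0 and cuda:1.
    """
    if not device_ids:
        raise ValueError("at least one GPU id is required")
    value = ",".join(str(int(i)) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Hypothetical usage: the ids would come from the Hydra config, e.g. cfg.trainer.gpus.
restrict_to_gpus([0, 1])
```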
A couple of words about our Hydra config: it is located in the root, next to train.py. But the default config points to an experiment folder with other configs, and that is what I need to specify on every run.
Hi AgitatedDove14!
Thanks for your answers. Now I have a follow-up. I was able to successfully run the experiment, copy it in the UI, enqueue it to the default queue, and see it complete.
`task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)`: after some discussion, we suspect that using the config before initializing the task may be the issue. Can it cause any problems?
```
Python 3.8.8 (default, Feb 24 2021, 21:46:12)
[GCC 7.3.0] :: Anaconda, Inc. on linux
clearml.__version__
'1.0.5'
```
Ubuntu 20.04.1 LTS
Hi, I solved the cut-off labels by calling `fig.tight_layout()` before `return fig`.
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
Very good question. I need to understand what happens when I press "Enqueue" in the web UI and set it to the default queue.
If it is best practice to have one more Docker container with the ClearML client, I will be happy to set it up, but I see no particular benefit in splitting it out from the NVIDIA Docker container that runs the experiments.
We do have `sys.stdout.close()` 🙂 I forgot to mention it.
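The `--- Logging error ---` header in the traceback earlier in this thread is what Python's `logging` module prints to stderr when a handler's `write` fails. Given that the script calls `sys.stdout.close()`, one plausible cause is a handler that still writes to stdout after it has been closed. This standard-library sketch reproduces that pattern, with an `io.StringIO` standing in for `sys.stdout`; it is not ClearML's own code path.

```python
import contextlib
import io
import logging


def log_after_close() -> str:
    """Log through a handler whose stream is already closed and return what logging reports on stderr."""
    stream = io.StringIO()
    logger = logging.getLogger("closed-stream-demo")
    logger.propagate = False  # keep the demo isolated from the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    stream.close()  # stands in for sys.stdout.close()

    report = io.StringIO()
    try:
        # The write raises ValueError; StreamHandler.emit catches it and calls
        # handleError, which prints the "--- Logging error ---" block to stderr.
        with contextlib.redirect_stderr(report):
            logger.warning("this write hits a closed stream")
    finally:
        logger.removeHandler(handler)
        handler.close()
    return report.getvalue()


if __name__ == "__main__":
    print(log_after_close().splitlines()[0])
```

If this matches your case, moving `sys.stdout.close()` to after the last log call (or removing it) would be the first thing to try.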
Nevertheless, when I run my training code, which differs very little from the example, I can't copy and run it from the UI, and I don't even see the hyperparameters in the experiment results:
```
import os
import hydra
from hydra import utils
from utils.class_utils import instantiate
from omegaconf import DictConfig, OmegaConf
from clearml import Task

@hydra.main(config_path="conf", config_name="default")
def app(cfg):
    run(cfg)

def run(cfg):
    task = Task.init(project_name=cfg.project.name, t...
```
Previously I had a General tab under Hyper Parameters, but now, without this line, I don't have it.
Ok, let me check it later today and come back with the results of the example app
Yes, I have the latest version (1.0.5) now, and it gives the same result in the UI as the previous version I used.