
SweetBadger76 sorry to tag you, but I don't know where to find the logs. Are there Elasticsearch logs on the server where I installed ClearML Server?
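One possible way to look for those logs is to ask Docker for the container's output. This is a minimal sketch, not an official procedure: it assumes the server was started with docker-compose and that the Elasticsearch container is named `clearml-elastic` (an assumption; check `docker ps` for the real name). The helper names `docker_logs_cmd` and `fetch_logs` are made up for illustration.

```python
import subprocess
from typing import List


def docker_logs_cmd(container: str, tail: int = 200) -> List[str]:
    """Build the `docker logs` command line for one container."""
    return ["docker", "logs", "--tail", str(tail), container]


def fetch_logs(container: str, tail: int = 200) -> str:
    """Return the last `tail` log lines of a container (stdout and stderr merged)."""
    result = subprocess.run(
        docker_logs_cmd(container, tail),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # docker logs can write to either stream
        universal_newlines=True,
        check=False,  # a missing container should not raise, just return the error text
    )
    return result.stdout
```

For example, `print(fetch_logs("clearml-elastic"))` would print the most recent lines, provided a container with that name exists on the host.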
If it is best practice to have one more Docker container with the ClearML client, I will be happy to set it up, but I see no particular benefit in splitting it out from the NVIDIA Docker container that runs the experiments
I can only assume that task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
is broken because it has to read the config, and depending on where I run it, it has no access to the config. I will investigate this with my co-worker and let you know if we find a solution.
One more important thing - I have nvidia based docker running on the ubuntu server (same one that hosts clearml server) and I am afraid that initiating task from command line and from ClearML web UI run in ...
`
cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/logging/__init__.py", line 1084, in emit
stream.write(msg + self.terminator)
File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 141, in stdout__patched__write_
return StdStreamPatch._stdout_proxy.write(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-p...
We do have sys.stdout.close() 🙂 I forgot to mention it
Previously I had a General tab under Hyper Parameters, but now, without this line, I don't have it.
Shay, you are correct, one of the Docker containers is down. But aren't they all supposed to run as part of docker /opt/clearml/ docker-compose -f docker-compose.yml up
?
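A quick way to see which containers are down is to compare the running container names against the ones expected. This is only a sketch: the list of expected names below is an assumption based on a typical ClearML Server docker-compose.yml and may not match a given install, and `missing_containers` is a hypothetical helper name. Its input is the text printed by `docker ps --format '{{.Names}}'`.

```python
from typing import Iterable, List

# Assumed container names; adjust to whatever your docker-compose.yml defines.
EXPECTED_CONTAINERS = (
    "clearml-apiserver",
    "clearml-elastic",
    "clearml-fileserver",
    "clearml-mongo",
    "clearml-redis",
    "clearml-webserver",
)


def missing_containers(
    names_output: str, expected: Iterable[str] = EXPECTED_CONTAINERS
) -> List[str]:
    """Return the expected container names that are absent from `docker ps` output.

    `names_output` is one container name per line.
    """
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    return [name for name in expected if name not in running]
```

Any name it returns is a container that `docker ps` does not list as running, which is a reasonable place to start reading logs.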
Python 3.8.8 (default, Feb 24 2021, 21:46:12)
[GCC 7.3.0] :: Anaconda, Inc. on linux
clearml.__version__
'1.0.5'
Ubuntu 20.04.1 LTS
so I ran /opt/clearml/ docker-compose -f docker-compose.yml up
just to make sure, and these are the errors that I am seeing
` docker ps | grep clearml
bd8fb61e8684 allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f79c54472a6f allegroai/clearml:latest "/opt/clearml/wrappe…" 8 days ago Up 8 days 0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp ...
Nevertheless, when I try to run my training code, which differs very little from the example, I can't copy and run it from the UI, and I don't even see hyperparameters in the experiment results
` import os
import hydra
from hydra import utils
from utils.class_utils import instantiate
from omegaconf import DictConfig, OmegaConf
from clearml import Task
@hydra.main(config_path="conf", config_name="default")
def app(cfg):
    run(cfg)

def run(cfg):
    task = Task.init(project_name=cfg.project.name, t...
Hi AgitatedDove14 !
Thanks for your answers. Now I have a follow-up. I was able to successfully run the experiment, copy it in the UI, enqueue it to the default queue, and see it complete.
and more logs 🙂 nice warning about dev server in production
` clearml-apiserver | /usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Legacy index templates are deprecated in favor of composable templates.
clearml-apiserver | warnings.warn(message, category=ElasticsearchWarning)
clearml-apiserver | [2022-06-09 13:28:03,875] [9] [INFO] [clearml.initialize] [{'mapping': 'events_plot', 'result': {'acknowledged': True}}, {'mapping': 'events_tra...
yes, everything runs on the same machine, in different Docker containers
Martin, thank you very much for your time and dedication, I really appreciate it
yes, I was using only experiments tab to compare scalars and see validation and train images and I can see that information
Yes, I have the latest 1.0.5 version now, and it gives the same result in the UI as the previous version that I used
The Docker container has access to all 4 GPUs with the --gpus all flag, and we specify in the config which CUDA device(s) to run on; in PyTorch we can run on more than 2 GPUs
task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
After discussing it, we suspect the problem is that we use the config before initializing the task. Can that cause any problems?
` Retrying (Retry(total=239, connect=240, read=239, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=238, connect=240, read=238, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=24...
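Since the retries above are connection resets on /auth.login, one rough check is whether the API server answers HTTP at all. A plain TCP port check can be misleading here, because Docker's port forwarding may accept the connection even when the service inside the container is not responding. The sketch below uses only the standard library; `http_status` is a hypothetical helper, and the URL to probe is an assumption (port 8008 is the API server port shown in the `docker ps` output in this thread).

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, reset by peer, timed out, or a malformed response.
        return None
```

Calling `http_status("http://localhost:8008/")` and getting `None` would suggest the API server itself is not answering, while any integer means it at least speaks HTTP.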
AgitatedDove14 orchestration module - what is this and where can I read more about it?
No, I was always shutting down the server. But if you can give me step-by-step instructions for a clean install, I will be happy to do it
So now I ran the example and I see the Hydra tab. Is that the experiment arg that I used to run it? python hydra_example.py experiment=gm_fl_dcl
and experiments are now stuck in "Running" mode even after the training loop has finished
Jake, the only way that I know to run the agents is to run docker
` docker ps | grep clearml
2635cec202d9 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up 3 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f8d307913fe0 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up About a minute 0.0....
Ok, let me check it later today and come back with the results of the example app
now it is empty and I don't know where to find the credentials to connect one more Docker client