Do you think that if we manually delete the folder /opt/clearml/data/, that would solve the problem the same way?
You mean I can use Epoch001/ and Epoch002/ to split them into groups, so the 100-image limit applies per group?
Thank you, I will try
and more logs 🙂 nice warning about dev server in production
` clearml-apiserver | /usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Legacy index templates are deprecated in favor of composable templates.
clearml-apiserver | warnings.warn(message, category=ElasticsearchWarning)
clearml-apiserver | [2022-06-09 13:28:03,875] [9] [INFO] [clearml.initialize] [{'mapping': 'events_plot', 'result': {'acknowledged': True}}, {'mapping': 'events_tra...
AppetizingMouse58 all is Linux. Our idea was to run Docker on the same server to initiate tasks from the UI, but it was taking too much time, so we gave up and still do "python train.py experiment=myexpname"
and experiments now get stuck in "Running" mode even when the training loop is finished
Thank you. I've changed clearml.conf, but the URLs still have the old IP. Do I need to restart ClearML or run any command to apply the config changes?
Jake, the only way that I know to run the agents is via Docker
` docker ps | grep clearml
2635cec202d9 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up 3 days 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver
f8d307913fe0 allegroai/clearml:latest "/opt/clearml/wrappe…" 3 days ago Up About a minute 0.0....
` clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.app_sequence] ################ API Server initializing #####################
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Initializing database connections
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb host mongo
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb port 27017
clearml-apiserver | [2...
Nevertheless, when I try to run my training code, which differs very little from the example, I can't copy and run it from the UI, and I don't even see hyperparameters in the experiment results
` import os
import hydra
from hydra import utils
from utils.class_utils import instantiate
from omegaconf import DictConfig, OmegaConf
from clearml import Task

@hydra.main(config_path="conf", config_name="default")
def app(cfg):
    run(cfg)

def run(cfg):
    task = Task.init(project_name=cfg.project.name, t...
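The snippet above is cut off at Task.init(...). For completeness, a minimal runnable sketch of the same Hydra + ClearML pattern (cfg.project.task is a hypothetical field standing in for the truncated arguments):
`
import hydra
from omegaconf import DictConfig, OmegaConf
from clearml import Task

@hydra.main(config_path="conf", config_name="default")
def app(cfg: DictConfig):
    run(cfg)

def run(cfg: DictConfig):
    # Task.init() should run before the training code so ClearML can
    # hook stdout/argv and capture the Hydra config.
    task = Task.init(project_name=cfg.project.name, task_name=cfg.project.task)
    print(OmegaConf.to_yaml(cfg))  # replaces the deprecated cfg.pretty()
    # ... training loop ...

if __name__ == "__main__":
    app()
`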
Python 3.8.8 (default, Feb 24 2021, 21:46:12)
[GCC 7.3.0] :: Anaconda, Inc. on linux
clearml.__version__
'1.0.5'
Ubuntu 20.04.1 LTS
One more interesting bug. After I changed my "train.py" to match hydra_example.py, I started getting errors at the end of the experiment
` --- Logging error ---
ValueError: I/O operation on closed file.
File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 200, in write
    self._terminal._original_write(message)  # noqa
File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger...
Ok, let me check it later today and come back with the results of the example app
No, even a newly started experiment is still creating image URLs with the 172.* address
` cat ~/clearml.conf
# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
} `
it doesn't fit in one Slack message
SweetBadger76 sorry to tag you, but I don't know where to find the logs. Do I have Elasticsearch logs on the server where I installed the ClearML server?
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
Very good question. I need to understand what happens when I press "Enqueue" in the web UI and send it to the default queue
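For reference, pressing "Enqueue" marks the task as pending on a queue so that an agent listening on that queue can pull and execute it. A hedged sketch of the programmatic equivalent (project/task names here are made up for illustration):
`
from clearml import Task

# Programmatic equivalent of clone + "Enqueue" in the web UI:
template = Task.get_task(project_name="myproject", task_name="train")
cloned = Task.clone(source_task=template, name="train (clone)")
Task.enqueue(cloned, queue_name="default")  # an agent watching "default" picks it up
`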
sys.stdout.close(), we have it 🙂 forgot to mention it
`
cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)
--- Logging error ---
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/logging/init.py", line 1084, in emit
stream.write(msg + self.terminator)
File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 141, in stdout__patched__write_
return StdStreamPatch._stdout_proxy.write(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-p...
I don't know for sure, but this is what I understand from the code. You would need 100 experiments running at the same time, so unless you have access to 100 GPUs you should be fine
AgitatedDove14 orchestration module - what is this and where can I read more about it?
Hi, I solved the cut-off labels with fig.tight_layout() before return fig
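For anyone hitting the same clipped labels, a minimal matplotlib sketch of that fix (dummy data):
`
import matplotlib.pyplot as plt

def make_figure():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0.5, 0.8, 0.9])  # dummy data
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    fig.tight_layout()  # recompute margins so axis labels are not cut off
    return fig
`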
ReassuredTiger98 why don't you take 5 minutes and check out the source code? https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/backend_interface/task/log.py
this is pretty obvious: it replaces the oldest task with the new one when the buffer is full
So now I ran the example and I see the HYDRA tab. Is this the experiment arg that I used to run it? python hydra_example.py experiment=gm_fl_dcl
yes, everything runs on the same machine in different Docker containers
I did my best in explanation.
You have a buffer of tasks, for example 100. When you add task #101, task #1 is replaced with the new one, and you now keep tasks #2 through #101.
Because I have > 100 saved experiments, I don't think anyone should bother to change it, unless you are running more than 100 experiments at the same time
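That behavior is a ring buffer; a tiny illustration of the idea with collections.deque (just a sketch, not ClearML's actual implementation):
`
from collections import deque

# maxlen makes the deque evict the oldest item when a new one is appended
buffer = deque(maxlen=100)
for task_id in range(1, 102):  # add tasks #1..#101
    buffer.append(task_id)

print(buffer[0], buffer[-1])  # -> 2 101: task #1 was evicted
`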
I have a firewall installed on the server, and not all ports are open
Hi Martin, thank you for your reply.
Could you please show an example of image title/series?
Mine have names like Epoch_001_first_batch_train, Epoch_001_first_batch_val,
Epoch_001_first_batch_val_balanced,
Epoch_002_first_batch_train, and so on
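If it helps, a hedged sketch of how those names could map onto title/series with Logger.report_image, so related images group together and the image-history limit applies per title/series pair (the split of your names into title and series parts is my assumption):
`
import numpy as np
from clearml import Task

task = Task.init(project_name="demo", task_name="title-series example")
logger = task.get_logger()
img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)  # dummy image

# e.g. Epoch_001_first_batch_train -> title="first_batch", series="train", iteration=1
logger.report_image(title="first_batch", series="train", iteration=1, image=img)
logger.report_image(title="first_batch", series="val", iteration=1, image=img)
logger.report_image(title="first_batch", series="val_balanced", iteration=1, image=img)
`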