Answered
Hi, I Am Trying To Run Experiment From Clearml Web Ui. I Did Experiment Copy, Enqueue, But In The Execution Log I See That It Runs Command

Hi, I am trying to run an experiment from the ClearML web UI. I copied the experiment and enqueued it, but in the execution log I see that it runs the command
[.]$ /home/exx/.clearml/venvs-builds/3.8/bin/python -u train.py
but I need to add experiment=my_config after train.py. Is there any way to do this from the UI?
Thank you

  
  
Posted 3 years ago

Answers 31


I need to understand what happens when I press "Enqueue" in the web UI and set it to the default queue

The Task ID is pushed into the execution queue (that is all the UI / backend does). You then have clearml-agent running on your machine; the agent listens on one or more queues and pulls jobs from them.
It pulls the Task ID from the queue, sets up the environment according to the Task (i.e. either inside a docker container or in a new virtual-env), clones the code, applies uncommitted changes, installs the Python packages, etc. Then it spins up the code, which uses the configuration from the UI (instead of logging it to the UI, as happens when executed manually).
Make sense?

  
  
Posted 3 years ago
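
The flow described above can be pictured with a small toy model. This is only a conceptual sketch using the Python standard library, not ClearML code: `ToyTask`, `enqueue`, `agent_step` and the `server` dictionary are hypothetical names invented for illustration. It shows why the execution log contains only `python -u train.py`: the queue carries just the Task ID, and the settings travel with the stored Task rather than on the command line.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical stand-in for a Task record stored on the server."""
    task_id: str
    script: str
    configuration: dict = field(default_factory=dict)
    status: str = "created"


def enqueue(task: ToyTask, execution_queue: queue.Queue) -> None:
    # Only the Task ID is pushed; the Task definition stays on the "server".
    task.status = "queued"
    execution_queue.put(task.task_id)


def agent_step(execution_queue: queue.Queue, server: dict):
    """One iteration of a toy agent: pull an ID, look up the Task, launch it."""
    task_id = execution_queue.get_nowait()
    task = server[task_id]
    task.status = "in_progress"
    # A real agent would now prepare a container or virtual environment,
    # clone the repository, apply uncommitted changes and install packages.
    command = ["python", "-u", task.script]
    # The launched code then reads its configuration from the stored Task.
    return command, dict(task.configuration)


if __name__ == "__main__":
    server = {"abc123": ToyTask("abc123", "train.py", {"experiment": "my_config"})}
    jobs = queue.Queue()
    enqueue(server["abc123"], jobs)
    print(agent_step(jobs, server))
```

In this model, changing the run means editing the stored configuration of the cloned Task before enqueuing it, not editing the command.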

AgitatedDove14 orchestration module - what is this and where can I read more about it?

  
  
Posted 3 years ago

When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?

Very good question, I need to understand it what happens when I press "Enqueue" In web UI and set it to default queue

  
  
Posted 3 years ago

image

  
  
Posted 3 years ago

and experiments now stuck in "Running" mode even when the train loop is finished

  
  
Posted 3 years ago

Here are the requirements from the repository with which I was able to run hydra_example.py and with which my custom train.py crashes

  
  
Posted 3 years ago

One more interesting bug. After I changed my "train.py" to follow hydra_example.py, I started getting errors at the end of the experiment
--- Logging error ---
2021-08-17 13:33:28 ValueError: I/O operation on closed file.
2021-08-17 13:33:28 File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 200, in write self._terminal._original_write(message) # noqa
2021-08-17 13:33:28 File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 141, in _stdout__patched__write__ return StdStreamPatch._stdout_proxy.write(*args, **kwargs)
2021-08-17 13:33:28 File "/opt/conda/lib/python3.8/logging/__init__.py", line 1084, in emit stream.write(msg + self.terminator)
2021-08-17 13:33:28 Traceback (most recent call last):
2021-08-17 13:33:28 Message: 'Waiting to finish uploads' Arguments: ()
2021-08-17 13:33:28 File "/opt/conda/lib/python3.8/site-packages/clearml/task.py", line 3005, in __shutdown self.log.info('Waiting to finish uploads')
2021-08-17 13:33:28 File "/opt/conda/lib/python3.8/site-packages/clearml/task.py", line 2915, in _at_exit self.__shutdown()
2021-08-17 13:33:28 Call stack:

  
  
Posted 3 years ago

Thanks!

  
  
Posted 3 years ago

I can only assume that task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name) is broken because it has to read the config, and depending on where I run it, it has no access to the config. I will investigate this with my co-worker and let you know if we find a solution.

One more important thing: I have an NVIDIA-based docker container running on the Ubuntu server (the same one that hosts the ClearML server), and I am afraid that a task started from the command line and one started from the ClearML web UI run in different environments and that this causes issues, but I don't know how to check the differences

  
  
Posted 3 years ago

`
cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)

--- Logging error ---
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/logging/__init__.py", line 1084, in emit
    stream.write(msg + self.terminator)
  File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 141, in _stdout__patched__write__
    return StdStreamPatch._stdout_proxy.write(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/clearml/backend_interface/logger.py", line 200, in write
    self._terminal._original_write(message) # noqa
ValueError: I/O operation on closed file.
Call stack:
  File "/opt/conda/lib/python3.8/site-packages/clearml/task.py", line 2915, in _at_exit
    self.__shutdown()
  File "/opt/conda/lib/python3.8/site-packages/clearml/task.py", line 3005, in __shutdown
    self.log.info('Waiting to finish uploads')
Message: 'Waiting to finish uploads'
Arguments: () `

  
  
Posted 3 years ago

Martin, thank you very much for your time and dedication, I really appreciate it

My pleasure 🙂

Yes, I have latest 1.0.5 version now and it gives same result in UI as previous version that I used

Hmm, are you saying the automatic Hydra connection doesn't work? Is it the folder structure?
When is Task.init called?
See example here:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py

  
  
Posted 3 years ago

A couple of words about our Hydra config:
it is located in the root, next to the train.py file. But the default config points to an experiment folder with other configs, and this is what I need to specify on every run

  
  
Posted 3 years ago

Previously I had a General tab under Hyper Parameters, but now, without this line, I don't have it.

  
  
Posted 3 years ago

Martin, thank you very much for your time and dedication, I really appreciate it

  
  
Posted 3 years ago

MortifiedDove27 did you update to the latest clearml python package?

  
  
Posted 3 years ago

Nevertheless, when I try to run my training code, which differs very little from the example, I can't copy and run it from the UI, and I don't even see hyperparameters in the experiment results
` import os
import hydra
from hydra import utils
from utils.class_utils import instantiate
from omegaconf import DictConfig, OmegaConf
from clearml import Task


@hydra.main(config_path="conf", config_name="default")
def app(cfg):
    run(cfg)


def run(cfg):
    # Task.init is called inside a nested function, after Hydra has built cfg
    task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
    logger = task.get_logger()
    logger.report_text("You can view your full hydra configuration under Configuration tab in the UI")

    print(OmegaConf.to_yaml(cfg))
    print('+'*200)

    # some other hydra.utils.instantiate code

    trainer.train()


if __name__ == "__main__":
    app() `

  
  
Posted 3 years ago

Thanks MortifiedDove27 ! Let me see if I can reproduce it, if I understand the difference, it's the Task.init in a nested function, is that it?
BTW what's the hydra version? Python, and OS?

  
  
Posted 3 years ago

Hi AgitatedDove14 !
Thanks for your answers. Now I have a follow up. I was able to successfully run the experiment, copy it in UI and enqueue to default queue and see it complete.

  
  
Posted 3 years ago

task = Task.init(project_name=cfg.project.name, task_name=cfg.project.exp_name)
After some discussion, we suspect the problem is that we use the config before initializing the task. Can that cause any problems?

  
  
Posted 3 years ago

So now I ran the example and I see the Hydra tab. Is this the experiment arg that I used to run it?
python hydra_example.py experiment=gm_fl_dcl

  
  
Posted 3 years ago

Python 3.8.8 (default, Feb 24 2021, 21:46:12)
[GCC 7.3.0] :: Anaconda, Inc. on linux
clearml.__version__
'1.0.5'
Ubuntu 20.04.1 LTS

  
  
Posted 3 years ago

Ok, let me check it later today and come back with the results of the example app

  
  
Posted 3 years ago

As long as you import clearml in the main script, it should work. Regarding the Nvidia container, it should not interfere with any running processes; the only issue is the memory limit. BTW, any reason not to spin up an agent on a dedicated machine? What is the GPU used for on the ClearML server machine?

  
  
Posted 3 years ago

We have a physical server in a server farm that we configured with 4 GPUs, so we run everything on this hardware without renting cloud machines

  
  
Posted 3 years ago

orchestration module
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
regarding the exception stack
It's pointing to a stdout that was closed?! How could that be? Any chance you can provide a toy example for us to debug?

  
  
Posted 3 years ago

Docker has access to all 4 GPUs via the --gpus all flag, and we specify in the config which CUDA device(s) to run on; in PyTorch we can run on more than 2 GPUs

  
  
Posted 3 years ago

Hmm are you running the clearml-agent on this machine? (This is the orchestration module, it will spin the Tasks and the dockers on the gpus)

  
  
Posted 3 years ago

Yes, everything runs on the same machine, in different docker containers

  
  
Posted 3 years ago

Yes, I have latest 1.0.5 version now and it gives same result in UI as previous version that I used

  
  
Posted 3 years ago

sys.stdout.close() we have it 🙂 forgot to mention

  
  
Posted 3 years ago
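
For readers hitting the same "--- Logging error ---" output, here is a minimal sketch of the mechanism using only the standard library. It is an illustration, not the poster's code and not ClearML's: the helper `log_to_stream` and the logger name are made up, and an in-memory `io.StringIO` stands in for `sys.stdout`. Logging to a stream that has already been closed makes the handler report `ValueError: I/O operation on closed file`, while flushing the stream and leaving it open does not.

```python
import contextlib
import io
import logging


def log_to_stream(close_first: bool) -> str:
    """Log one message to a stream and return whatever logging wrote to stderr."""
    stream = io.StringIO()  # stand-in for sys.stdout
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger(f"closed_stream_demo.{close_first}")
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    if close_first:
        stream.close()  # analogous to calling sys.stdout.close() in train.py
    else:
        stream.flush()  # flushing keeps the stream usable for later writes

    captured = io.StringIO()
    with contextlib.redirect_stderr(captured):
        # Mirrors a message logged at interpreter exit, after the script body ran.
        logger.info("Waiting to finish uploads")
    logger.removeHandler(handler)
    return captured.getvalue()


if __name__ == "__main__":
    print(log_to_stream(close_first=True))   # shows the logging error report
    print(repr(log_to_stream(close_first=False)))  # empty: nothing went wrong
```

Under these assumptions, replacing the explicit close with a flush (or removing the call) is the likely fix, since code that runs at exit can still write to stdout.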