Hello, I'm trying to use ClearML on a local server, and for 2 days, when I try to close the ClearML task with task.close(), it hangs forever and never stops. Do you have any idea why?

Answered

Hello,
I'm trying to use ClearML on a local server, and for 2 days, when I try to close the ClearML task with task.close(), it hangs forever and never stops. Do you have any idea why?

  				
Posted 22 days ago by MistakenTurtle88


Answers 7

@MistakenTurtle88 I'm not sure I understand what gets stuck. Are you running Python code with the ClearML SDK and calling task.close()? Can you share the code you're running and how your clearml.conf file is configured?

  				
Posted 22 days ago by SuccessfulKoala55

@MistakenTurtle88 - Can you also share your docker-compose.yml file? Thanks!!

  				
Posted 22 days ago by LittleFox79

Hello,
This is my train.py
model = ModelParams(cfg.get("model", None))
opt = OptimizationParams(cfg.get("optimization", None))
cmlparams = ClearmlParams(cfg.get("clearml", None))
pipeline: "PipelineParams" = PipelineParams(cfg=cfg.get("pipeline", None))
test_iterations_default = (
    list(range(0, 100)) + list(range(100, 1000, 10)) + list(range(1000, 10000, 50))
)
GS_loger: "loggingGS" = cfg.get("gs_logger", None)
test_iterations_default = (
    list(range(0, 100, 10)) + list(range(0, 100000, 100)) + [opt.iterations - 1]
)

test_iterations_default = sorted(list(set(test_iterations_default)))

if CLEARML_FOUND and not pipeline.debug:
    from utils.clearml_utils import safe_init_clearml, connect_whole

    assert (
        cmlparams.task_name != ""
    ), "Please provide a task name for ClearML,got {}".format(cmlparams.task_name)

    task = Task.init(
        project_name=cmlparams.project_name,
        task_name=cmlparams.task_name,
        tags=cmlparams.tags,
    )
    connect_whole(
        cfg=cfg,
        task=task,
        name_hyperparams_summary="train config",
        name_connect_cfg="whole train cfg",
    )
    # task.connect(cfg,name="test_train")
else:
    print(
        " We didn't find clearml or you are in debug mode, we don't log to Clearml"
    )
print("Optimizing " + cfg.model.model_path)

# Initialize system state (RNG)
safe_state(cfg.quiet, seed=cfg.seed)

# Start GUI server, configure and run training
torch.autograd.set_detect_anomaly(cfg.detect_anomaly)

training(
    sceneparams=model,
    opt=opt,
    pipe=pipeline,
    GS_loger=GS_loger,
    testing_iterations=test_iterations_default,
    saving_iterations=cfg.save_iterations,
    checkpoint_iterations=cfg.checkpoint_iterations,
    start_checkpoint=cfg.start_checkpoint,
    debug_from=cfg.debug_from,
)
# All done
print("\nTraining complete.")
if CLEARML_FOUND and not pipeline.debug:
    print("Attempting to close clearml task")
    # print("task url",task.get_web_a)

    task.close()
    print("ClearML task closed")

The code stops at task.close().
My clearml.conf is:

  				
Posted 22 days ago by MistakenTurtle88
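One way to narrow down where a call like task.close() is blocked is to have Python dump every thread's stack if the call has not returned after a while. The sketch below uses only the standard-library faulthandler module and does not depend on ClearML. The helper name call_with_hang_dump, the default timeout, and the dump file name are illustrative assumptions, not part of the original train.py.

```python
import faulthandler


def call_with_hang_dump(fn, timeout_s=60.0, dump_path="hang_traceback.log"):
    """Run fn(); while it blocks, write all thread stacks to dump_path every timeout_s seconds.

    Hypothetical helper for diagnosing a hang; not part of ClearML.
    """
    with open(dump_path, "w") as dump_file:
        # Arm a watchdog: if fn() is still running after timeout_s seconds,
        # the traceback of every thread is written to dump_file, and this
        # repeats every timeout_s seconds for as long as fn() keeps blocking.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=dump_file)
        try:
            return fn()
        finally:
            # Disarm the watchdog once fn() returns or raises.
            faulthandler.cancel_dump_traceback_later()
```

In the script above, the bare task.close() call could be swapped for call_with_hang_dump(task.close, timeout_s=120). If the call hangs, the dump file should show which thread is waiting and on what (for example a network request or an upload), which is usually more useful to share than "it never stops".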

Sometimes I get "connection refused" when I log my task, but I've never been able to understand exactly why.
I followed the tutorial to set up my server, except that I didn't set any of the exported parameters ("clearml_agent key", "CLEARML_HOST_IP", "CLEARML_AGENT_GIT_PASS", ...).
In the end I just ran this command:
docker compose -f opt/clearml/docker-compose.yml

  				
Posted 22 days ago by MistakenTurtle88

Have you looked into why this comes up?

clearml-fileserver  |     raise ValueError('Connection Error: it seems *api_server* is misconfigured. '
clearml-fileserver  | ValueError: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server
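Since the error above and the occasional "connection refused" both point at connectivity, a quick check is whether the three server ports accept TCP connections from the machine that runs the training script. This is a minimal sketch using only the standard library. It assumes the stock ClearML Server defaults (web on 8080, API on 8008, file server on 8081) and a server on localhost; adjust both if your docker-compose.yml or clearml.conf says otherwise. The function name check_ports is illustrative.

```python
import socket

# Assumed defaults of a stock ClearML Server deployment; change if yours differ.
DEFAULT_PORTS = {"web": 8080, "api": 8008, "files": 8081}


def check_ports(host="localhost", ports=DEFAULT_PORTS, timeout_s=3.0):
    """Return {service_name: bool} telling whether a TCP connection to each port succeeds."""
    results = {}
    for name, port in ports.items():
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout_s):
                results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for service, reachable in check_ports().items():
        print(f"{service}: {'reachable' if reachable else 'NOT reachable'}")
```

A port that is not reachable only shows that nothing answered at that address. It does not prove the service is healthy when the port is open, so the server logs are still worth reading.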

  				
Posted 22 days ago by FoolishFlamingo73

@MistakenTurtle88 - Are you still having an issue here?

  				
Posted 11 days ago by LittleFox79

These are the logs of my ClearML server:

  				
Posted 22 days ago by MistakenTurtle88
