Answered
Hi Everybody, I Am Having An Issue With A Self-Hosted ClearML Server... I Am Having A Problem Enqueuing Experiments Whose Code Is In A Git Repository, They Are In A Pending State And Do Not Proceed... However If I Copy The Same Code Out In A Folder With No Rep

Hi everybody, I am having an issue with a self-hosted ClearML server...
I am having a problem enqueuing experiments whose code is in a Git repository: they stay in a pending state and do not proceed...

However, if I copy the same code into a folder with no repository, then the experiments are enqueued and executed correctly.

I suspect it might be related to access to our self-hosted GitHub Enterprise instance...
What I am not sure about is why:

I set up a PAT for GitHub (that works fine), but locally I use SSH, and I am not sure whether this might be a problem.

Also, in the logs I see some errors, but their meaning is not quite clear to me (nor whether they are related):

10.234.108.65 - - [11/Aug/2022:13:21:45 +0000] "GET /version.json HTTP/1.1" 404 1110 " " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" "10.10.211.99"
2022/08/11 13:22:21 [error] 49#49: *44 open() "/usr/share/nginx/html/version.json" failed (2: No such file or directory), client: 10.234.108.65, server: _, request: "GET /version.json HTTP/1.1", host: "clearml.host", referrer: " "
10.234.108.65 - - [11/Aug/2022:13:22:21 +0000] "GET /version.json HTTP/1.1" 404 1110 " " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" "10.10.211.99"
2022/08/11 13:22:22 [error] 49#49: *44 open() "/usr/share/nginx/html/version.json" failed (2: No such file or directory), client: 10.234.108.65, server: _, request: "GET /version.json HTTP/1.1", host: "clearml.host", referrer: " "
10.234.108.65 - - [11/Aug/2022:13:22:22 +0000] "GET /version.json HTTP/1.1" 404 1110 " " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" "10.10.211.99"
2022/08/11 13:22:22 [error] 49#49: *44 open() "/usr/share/nginx/html/version.json" failed (2: No such file or directory), client: 10.234.108.65, server: _, request: "GET /version.json HTTP/1.1", host: "clearml.host", referrer: " "
10.234.108.65 - - [11/Aug/2022:13:22:22 +0000] "GET /version.json HTTP/1.1" 404 1110 " " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" "10.10.211.99"

How would you debug the problem?

  
  
Posted 2 years ago

Answers 15


Hi SarcasticSquirrel56 , these look like the webserver logs - they will probably not be indicative.

However, if I copy the same code into a folder with no repository, then the experiments are enqueued and executed correctly.

What do you mean? Where is this folder located?
Are you running the ClearML Agents in k8s?

  
  
Posted 2 years ago

Hi Jake, thanks for your answer!

So I just have a very simple file "project.py" with this content:

` from clearml import Task

task = Task.init(project_name='project-no-git', task_name='experiment-1')

import pandas as pd

print("OK") `

If I run `python project.py` from a folder that is not in a Git repository, I can clone the task and enqueue it from the UI, and it runs in the agent with no problems.
If I copy the same file into a folder that is in a Git repository, the experiment stays in a pending state when I enqueue it.

So I suspect that the pod can't be created because it can't access GitHub (self-hosted Enterprise), but it's not clear to me why.

  
  
Posted 2 years ago

And yes, I am using the agents that come with the Helm chart from the ClearML repository.

  
  
Posted 2 years ago

Yeah, that sounds right. So in the first scenario (i.e. running outside of a Git repository), ClearML just takes the content of the code file and embeds it in the task - that's why the agent can take it and run it (even without access to the Git repository, because access is not required).
In the second scenario, ClearML stores the Git repository details on the task, and the agent needs to access the repository somehow (using some sort of authentication).
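To make the difference between the two scenarios concrete, here is a minimal sketch. It does not call ClearML at all: the dictionaries only imitate a task's script section, and the field names `repository` and `diff`, the helper name, and the `github.example.com` URL are all assumptions made for illustration.

```python
from typing import Mapping


def agent_needs_repo_access(script: Mapping[str, str]) -> bool:
    """Return True when the task points at a Git repository the agent must clone."""
    return bool(script.get("repository", "").strip())


# Scenario 1 (hypothetical data): the script content is embedded, no repository is recorded.
standalone = {"repository": "", "diff": 'print("OK")\n'}

# Scenario 2 (hypothetical data): only the repository details are recorded.
in_repo = {"repository": "https://github.example.com/team/project.git", "diff": ""}

print(agent_needs_repo_access(standalone))
print(agent_needs_repo_access(in_repo))
```

Only in the second case does the agent need working Git credentials, which would explain why the same file behaves differently depending on where it was launched from.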

  
  
Posted 2 years ago

So the question is how did you configure your PAT in the agent's configuration, and what is the repository URL format the ClearML SDK stored in the task's execution section - can you share the details?
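Since a PAT is used over HTTPS while the local clone uses SSH, the URL format stored on the task matters. The sketch below only illustrates the two common remote formats and how one maps onto the other; it is not a description of what the agent does internally, and `ssh_to_https` and the `github.example.com` host are hypothetical names.

```python
import re

# Matches scp-like SSH remotes such as git@host:org/repo.git
_SCP_LIKE = re.compile(r"^(?P<user>[^@/\s]+)@(?P<host>[^:/\s]+):(?P<path>.+)$")


def ssh_to_https(url: str) -> str:
    """Return an HTTPS form of a Git remote URL; HTTPS URLs pass through unchanged."""
    if url.startswith(("http://", "https://")):
        return url
    if url.startswith("ssh://"):
        # Form: ssh://git@host[:port]/org/repo.git
        rest = url[len("ssh://"):]
        rest = rest.split("@", 1)[-1]      # drop the user part, if any
        host, _, path = rest.partition("/")
        host = host.split(":", 1)[0]       # drop an explicit SSH port
        return f"https://{host}/{path}"
    match = _SCP_LIKE.match(url)
    if match:
        return f"https://{match['host']}/{match['path']}"
    raise ValueError(f"unrecognised Git remote URL: {url!r}")


print(ssh_to_https("git@github.example.com:team/project.git"))
```

Comparing the URL shown in the task's execution section against these formats is a quick way to tell whether the agent will need an SSH key or a user/token pair.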

  
  
Posted 2 years ago

I actually found out it was an indentation error 😅 and the credentials weren't picked up
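A quick check like the one below, run inside the agent pod, can catch this kind of silent misconfiguration. It is a sketch that assumes the Git credentials reach the agent through the `CLEARML_AGENT_GIT_USER` and `CLEARML_AGENT_GIT_PASS` environment variables; if your deployment passes them another way (for example through `clearml.conf`), the names to check will differ. The helper name is hypothetical.

```python
import os
from typing import List, Mapping

# Assumption: credentials are injected through these environment variables.
REQUIRED_VARS = ("CLEARML_AGENT_GIT_USER", "CLEARML_AGENT_GIT_PASS")


def missing_git_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_git_credentials()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("Git credentials are present in the environment.")
```

If a variable is reported as missing even though it appears in the Helm values file, the value was probably nested under the wrong key, which is what an indentation slip produces.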

  
  
Posted 2 years ago

but I was thrown a bit off track by seeing the errors in the logs

  
  
Posted 2 years ago

So everything is working now? 🙂

  
  
Posted 2 years ago

Yes, I still see those errors, but queues are working :)

  
  
Posted 2 years ago

I'll ask the UI people to take a look at these errors anyway 🙂

  
  
Posted 2 years ago

many thanks :)

  
  
Posted 2 years ago

SarcasticSquirrel56 quick question - is it possible you're using a self-built webserver image?

  
  
Posted 2 years ago

Hi Jake, yes, we had to customize the default one for some tools we use internally

  
  
Posted 2 years ago

Got it. So that's the reason: the GitHub codebase does not include the default version.json we use when building our official images - I'll make sure we update it for the next release

  
  
Posted 2 years ago

Thanks Jake!

  
  
Posted 2 years ago