Answered
In Task.init I got the following error message -- "Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2aaf57a2cc50>, 'Connection to XX

In Task.init I got the following error message -- "Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2aaf57a2cc50>, 'Connection to XXX.XXX.XXX.XXX timed out. (connect timeout=3.0)')': /auth.login"

I tried rebooting the server; it made no difference.

I tried to test curl
curl None . --- works OK

I also tried to set the following:

clearml.config.http_timeout = 3000
clearml.config.verify_ssl = False
from clearml.backend_api import Session
Session._session_initial_timeout = (15., 30.)

Nothing works... I still get the same error.

Any idea?

  
  
Posted one year ago

Answers 9


Copied and pasted from the app credentials.

  
  
Posted one year ago

As a side note, I attempted to debug the issue using strace by tracing the connect system call with the command:
$ strace -e connect python ./t1.py
connect(3, {sa_family=AF_INET, sin_port=htons(8008), sin_addr=inet_addr("XXX.XXX.XXX.XXX")}, 16) = -1 EINPROGRESS (Operation now in progress)
connect(3, {sa_family=AF_INET, sin_port=htons(8015), sin_addr=inet_addr("XXX.XXX.XXX.XXX")}, 16) = -1 EINPROGRESS (Operation now in progress)

The output showed that the program first tried to connect to port 8008, which is open on the server and also defined in the clearml.conf file. However, the connect call to port 8008 occurred only once, while all other connect calls were made to port 8015 for unknown reasons.
I am unsure why the program suddenly attempted to connect to port 8015. When I tested it on a different computer from a different network, this issue did not occur, and all connect calls were made to the same 8008 port, which is open and defined.
Do you have any idea why the program attempted to connect to the 8015 port?
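
To compare the two ports from the affected network without strace, a plain TCP connect test can help. This is only a minimal sketch using the Python standard library; the helper name `port_is_reachable` is made up for illustration, and the 3-second timeout simply mirrors the connect timeout in the error message.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder host: replace with the server address from clearml.conf.
    host = "127.0.0.1"
    for port in (8008, 8015):
        print(port, "reachable" if port_is_reachable(host, port) else "NOT reachable")
```

If 8008 is reachable and 8015 is not, the timeout most likely comes from the second connection attempt and not from the configured API port.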

  
  
Posted one year ago

Is there a firewall in between or something stopping the connection?

  
  
Posted one year ago

God, this is strange -- any idea?

  
  
Posted one year ago

@<1546303269423288320:profile|MinuteStork43> the ClearML server will not redirect any call. If this is happening, it's probably some proxy, firewall, or load balancer in between the client and the server (which makes sense, since calls from different networks work just fine).
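
One thing that can be checked on the client side is whether a proxy is configured in the environment, since HTTP clients commonly honor those settings. The following is a rough sketch using only the standard library; the helper name `proxy_for` and the example URL are hypothetical, and it will not detect a transparent proxy or a firewall that rewrites traffic on the network itself.

```python
import urllib.request
from urllib.parse import urlsplit
from typing import Optional


def proxy_for(url: str) -> Optional[str]:
    """Return the proxy URL the environment configures for `url`, or None."""
    parts = urlsplit(url)
    # Hosts listed in no_proxy / NO_PROXY bypass the proxy entirely.
    if parts.hostname and urllib.request.proxy_bypass(parts.hostname):
        return None
    # getproxies() maps a scheme ("http", "https") to a proxy URL.
    return urllib.request.getproxies().get(parts.scheme)


if __name__ == "__main__":
    # Placeholder URL: replace with the api_server value from clearml.conf.
    print(proxy_for("http://127.0.0.1:8008/auth.login"))
```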

  
  
Posted one year ago

Yes, there is a firewall, but it is open on this port (8008). The ClearML client suddenly attempted to connect to port 8015...

  
  
Posted one year ago

@<1523701070390366208:profile|CostlyOstrich36>

  
  
Posted one year ago

Hi @<1546303269423288320:profile|MinuteStork43>, how did you set the api_server in clearml.conf?

  
  
Posted one year ago

After digging deeper into the strace log, I found the following:
For some unknown reason, the ClearML server redirected me to port 8015. This happens on only one network: I tested two different computers on it and both behave the same. Outside that network, everything works correctly.

connect(3, {sa_family=AF_INET, sin_port=htons(8008), sin_addr=inet_addr(…..
poll([{fd=3, events=POLLOUT|POLLERR}], 1, 3000) = 1 ([{fd=3, revents=POLLOUT}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
poll([{fd=3, events=POLLOUT}], 1, 3000) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "GET /auth.login ……) = 402
ioctl(3, FIONBIO, [1]) = 0
poll([{fd=3, events=POLLIN}], 1, 10000) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "HTTP/1.1 302 Found\r\nLocation: None \r\nConnection: close\r\nX-Frame-Options: SAMEORIGIN\r\nX-XSS-Protection: 1; mode=block\r\nX-Content-Type-Options: nosniff\r\nContent-Security-Policy: frame-ancestors 'self'\r\n\r\n", 8192, 0, NULL, NULL) = 232
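
The same 302 response can be observed without strace by sending the request with a client that does not follow redirects. Below is a minimal sketch with `http.client` from the standard library; the function name `probe_redirect` is made up for illustration, and the `/auth.login` path and 3-second timeout are taken from the error message above.

```python
import http.client
from typing import Optional, Tuple


def probe_redirect(host: str, port: int, path: str = "/auth.login",
                   timeout: float = 3.0) -> Tuple[int, Optional[str]]:
    """Send one GET request and return (status code, Location header or None).

    http.client never follows redirects, so a 302 is reported as-is.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder host: replace with the server address from clearml.conf.
    status, location = probe_redirect("127.0.0.1", 8008)
    print(status, location)
```

Running this from inside and outside the affected network should show whether the redirect is only added on one path to the server.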

  
  
Posted one year ago