I've seen parameters connect and task create in seconds, and other times it takes 4 minutes.
This might be your backend (clearml-server) replying slowly because of load?
Is there a way (at the class level) to control the retry logic on connecting to the API server?
The difference in the two screenshots is literally only the URLs in clearml.conf, and it went from 30s down to 2-3s.
Yes, that could be network. Also notice that there are automatic retries that are quiet; basically, if a request is dropped due to network issues / timeouts etc., it will automatically retry (but of course it will look slow from the outside because of the retries).
yup! that's what I was wondering if you'd help me find a way to change the timings of. Is there an option I can override to make the retry more aggressive?
you mean wait for less?
add to your clearml.conf:
api.http.retries.backoff_factor = 0.1
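If your clearml.conf is written in the nested form, the same setting would look roughly like this (just the block-style spelling of the line above):
api {
    http {
        retries {
            backoff_factor: 0.1
        }
    }
}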
thanks so much!
I've been running a bunch of tests with timers and seeing an absurd amount of variance. I've seen parameters connect and task create in seconds, and other times it takes 4 minutes.
Since I see timeout connection errors somewhat regularly, I'm wondering if perhaps I'm having networking errors. Is there a way (at the class level) to control the retry logic on connecting to the API server?
My operating theory is that some sort of backoff / timeout (e.g. 10s) is causing the high variance.
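For reference, the timing tests are basically just wrapping the calls, roughly like this (project/task names here are placeholders):
import time
from clearml import Task

t0 = time.time()
task = Task.init(project_name="timing-test", task_name="overhead-check")
print(f"Task.init: {time.time() - t0:.1f}s")

t0 = time.time()
task.connect({"param_a": 1, "param_b": 2})
print(f"connect: {time.time() - t0:.1f}s")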
To test this, I spun up a local instance pointing to localhost as well as the normal reverse-proxy, and found that localhost had "overhead times" that were completely reasonable - practically none at all.
The difference in the two screenshots is literally only the URLs in clearml.conf, and it went from 30s down to 2-3s.
(server has been destroyed already, not worried about the keys showing)
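(for context, the only thing that changed between the two runs is the host entries in the api section of clearml.conf - placeholder values shown here:
api {
    api_server: http://localhost:8008
    web_server: http://localhost:8080
    files_server: http://localhost:8081
}
vs. the same three keys pointed at the reverse-proxy URLs)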
I'm not familiar with this one, but I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
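e.g. something along these lines before starting the agent (the value is just an example):
export CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR=0.1
clearml-agent daemon --queue default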
thank you very much.
for remote workers, would this env variable get parsed correctly? CLEARML_API_HTTP_RETRIES_BACKOFF_FACTOR=0.1
yup! that's what I was wondering if you'd help me find a way to change the timings of. Is there an option I can override to make the retry more aggressive?
I've definitely narrowed it down to the reverse proxy I'm behind. When I switch to a Cloudflare tunnel, the network overhead is <1s compared to localhost, and everything feels snappy!
But for security reasons, I need to keep using the reverse proxy, hence my question about configuring the silent clearml retries.
Hi @<1689446563463565312:profile|SmallTurkey79>
This call sets the requirements of an existing (already created) Task. Since the Task was just created, it waits for the automatic package detection before overriding them.
What you want is Task.force_requirements_env_freeze (notice this is class level, and it needs to be called before Task.init):
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
thanks for the clarification. Is there any bypass? (a git diff + git rev-parse should take mere milliseconds)
I'm working out of a mono repo and am beginning to suspect it's a cause of slowness. Next week I'll try moving a pipeline over to a new repo to test if this theory holds any water.
Hi @<1689446563463565312:profile|SmallTurkey79>, the trigger for the detection is the call to Task.init(), so while calling set_packages() overrides any packages found, it will not prevent the detection from running.
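In other words, roughly (project/task names are placeholders):
task = Task.init(project_name="examples", task_name="mono-repo-test")  # package auto-detection is triggered here
task.set_packages("requirements.txt")  # overrides whatever was detected, but does not skip the detection itself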