the pipeline is to orchestrate tasks to create more complex functionality, and take advantage of caching, yes.
here I run backtesting (how well did I predict the future?), and can control the frequency: "every week", "every month", etc.
so if I increase the frequency, I don't need to rerun certain branches of the pipeline; they're served from cache. another example: if I change something that impacts layer 3 but not layers 1-2, then about half my tasks are cached.
the pictured pipeline is: "create data...
fwiw - I'm starting to wonder if there's a difference between me "resetting the task" vs cloning it.
# imports
...
if __name__ == "__main__":
    pipe = PipelineController(...)
    # after instantiation, before "the code" that creates the pipeline.
    # normal tasks can handle task.execute_remotely() at this stage...

    pipe = add_steps_to_pipe(pipe)
    ...

    # after the pipeline is defined. best I can tell, this *has* to be the last thing in the code.
    pipe.start_locally()  # or just .start()
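for context, a fuller sketch of what I mean end-to-end; step/project names and step bodies are made up, and the caching I mentioned earlier is just the cache_executed_step flag on each step:

from clearml import PipelineController


def make_data():
    # placeholder step body; each step runs as its own task
    return [1, 2, 3]


def train(data):
    # placeholder step body
    return sum(data)


if __name__ == "__main__":
    pipe = PipelineController(
        name="demo-pipeline",  # placeholder names
        project="demo",
        version="0.0.1",
    )

    # steps are only declared here; nothing executes yet
    pipe.add_function_step(
        name="make_data",
        function=make_data,
        function_return=["data"],
        cache_executed_step=True,  # unchanged code/inputs -> served from cache
    )
    pipe.add_function_step(
        name="train",
        function=train,
        function_kwargs={"data": "${make_data.data}"},
        parents=["make_data"],
        cache_executed_step=True,
    )

    # the start call goes last: it builds the DAG and launches execution
    pipe.start_locally(run_pipeline_steps_locally=True)  # controller + steps on this machine
    # pipe.start(queue="services")                       # or enqueue the controller itself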
thank you!
by any chance do you have insights into github.com/allegroai/clearml-server/issues/248? I don't know if it's related to this at all, but it is an issue I experienced after upgrading.
perfect. thank you. I verified that this was indeed reproducible on 1.16.0 with a fresh deployment.
I understood that part, but noticed that when I put in the code to start remotely, the DAG computation seems to happen twice: once on my machine as it runs, and then again remotely (this is at least part of why it's slower). If I put pipe.start() earlier in the code, the pipeline fails to execute the actual steps.
this is unlike tasks, which somehow are smart enough to publish themselves in draft form when task.execute_remotely() is up top.
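for comparison, the plain-task pattern I mean is roughly this (project/queue names are placeholders):

from clearml import Task

task = Task.init(project_name="demo", task_name="plain task")  # placeholder names

# calling this near the top stops local execution right here: the task is
# registered on the server, enqueued, and the local process exits
task.execute_remotely(queue_name="default", exit_process=True)

# everything below only runs on the agent that picks the task up
print("running remotely")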
do I just leave off pipe.start()?
Nope, still dealing with it.
Oddly enough, when I spin up a new instance on the new version, it doesn't seem to happen.
is it? I can't tell if these delays (DAG computation) are pipeline-specific (I get that a pipeline is just a type of task), but it felt like a different question, as I'm asking "are pipelines like this appropriate?"
is there something fundamentally slower about using pipe.start() at the end of a pipeline vs pipe.start_locally()?
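to spell out the two options I'm comparing (assuming the same pipe as in the sketch above):

from clearml import PipelineController

pipe = PipelineController(name="demo-pipeline", project="demo", version="0.0.1")
# ... add_function_step calls as in the sketch above ...

# option A: the controller logic runs in this process, so the DAG is computed here;
# the steps themselves can still be dispatched to their queues
pipe.start_locally(run_pipeline_steps_locally=False)

# option B: the controller task itself is enqueued (e.g. to "services"); this process
# serializes the pipeline and exits, and the agent then builds the DAG again remotely,
# which seems to match the "computed twice" behaviour I described above
# pipe.start(queue="services")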
when I do a docker compose down; docker compose up -d
... the App Credentials disappear.
to be clear... this was not happening before I upgraded to the latest version. That is why I am asking about this.
I am definitely not seeing them persist after upgrading; previously this wasn't a problem on other upgrades.
yeah, that's how I've been generating credentials for agents as well as for my dev environment.
I did manage to figure this out with
docker compose stop agent-services
docker compose up --force-recreate --no-deps -d agent-services
and running an export for the newly generated key.
still though, I'm noticing that restarts cause App Credentials to be lost.
for now I'm just avoiding restarts of the service, but I do want to get to the bottom of it using a fresh instance.
as a backup plan: is there a way to have an API key set up prior to running docker compose up? Like, I need at least one set of credentials that I can reliably have remote agents use, one that I know persists across restarts and upgrades.
it's really frustrating, as I'm trying to debug server behavior (so I'm restarting often), and keep needing to re-create these.
thank you!
out of curiosity: how come the clearml-webserver upgrades weren't included in this release? was it just to patch the api part of the codebase?
hello @SuccessfulKoala55
I appreciate your help. Thank you. Do you happen to have any updates? We had another restart and lost the creds again. So our deployment is in a brittle state on this latest upgrade, and I'm going back to 1.15.1 until I hear back.
yup. once again, rebooted and lost my credentials.
so, I tried this on a fresh deployment, and for some reason that stack allows me to restart without losing App Credentials.
It's just the one that I performed an update on.
App Credentials now persist (I upgraded 1.15.1 -> 1.16.1 and the same keys exist!)
thanks!
mind-blowing... but somehow, later the same day, I got the same pipeline to create its DAG and start running in under a minute.
I don't know what exactly I changed. The pipeline task was run locally (which I've never done before), then cloned to run remotely in my services queue. And then it just flew through the experiment at the pace I expected.
so there's hope. I'll keep stress-testing it and see what causes the differences. I was right to suspect that such a simple DAG should not take...
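in case it helps anyone reproduce this, the clone-and-enqueue step can also be done from code; roughly (names are placeholders):

from clearml import Task

# grab the locally-run pipeline controller task, clone it, and enqueue the clone
template = Task.get_task(project_name="demo", task_name="demo-pipeline")  # placeholder lookup
cloned = Task.clone(source_task=template, name="demo-pipeline (remote)")
Task.enqueue(cloned, queue_name="services")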
everything I just said comes from the screenshotted webpage and is regarding the CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY env vars.
when I restart the ClearML server, the keys start disappearing. This was not the case before upgrading.
this is not about storage access tokens. It's about the App Credentials.
those things you set as CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY so that clients can talk to the API.
thanks so much!
I've been running a bunch of tests with timers and seeing an absurd amount of variance. I've seen parameter connect and task creation take seconds, and other times 4 minutes.
Since I see timeout connection errors somewhat regularly, I'm wondering if perhaps I'm having networking errors. Is there a way (at the class level) to control the retry logic on connecting to the API server?
my operating theory is that some sort of backoff / timeout (e.g. 10s) is causing the hig...
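for reference, the timing I'm doing is nothing fancy; roughly this (project/task names are placeholders):

import time
from clearml import Task

params = {"lr": 0.01, "epochs": 5}  # dummy parameters, just to exercise connect()

t0 = time.perf_counter()
task = Task.init(project_name="timing-tests", task_name="latency-probe")
t1 = time.perf_counter()
task.connect(params)
t2 = time.perf_counter()

print(f"Task.init:    {t1 - t0:.1f}s")
print(f"task.connect: {t2 - t1:.1f}s")
task.close()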
yeah... still seeing variance from 1m to 10m for the same task. I've been testing parallel execution for hours.
if there's a process I'm not understanding, please clarify... but:
(a) I start up the compose stack and log in via web browser as a user. This is on a remote server.
(b) I go to Settings and generate a credential.
(c) I use that credential to set up my local dev env, editing my clearml.conf.
(d) I repeat (b) and use that credential to start up remote workers to serve queues.
am I misunderstanding something? If there's another way to generate credentials, I'm not familiar with it.