SmallTurkey79

10 Questions, 118 Answers

Active since 12 April 2024

Last activity 3 months ago

Reputation

Badges 1

103 × Eureka!

Questions 10
Answers 118

0 Votes

9 Answers

511 Views

0 Votes 9 Answers 511 Views

Why Does Clearml Still Waste Time On Requirement Analysis When I Provide Them? Any Tips For How I Can Reduce Clearml Overhead ... (The Time Before Work Actually Starts)?

why does clearml still waste time on requirement analysis when I provide them? any tips for how I can reduce clearml overhead ... (the time before work actua...

clearml

5 months ago

0 Votes

1 Answers

467 Views

0 Votes 1 Answers 467 Views

Hi Everyone! I Just Wanted To Bring To Your Attention That Clearml 1.16.0 Introduced Authentication For The Self-Hosted Fileserver By Default.

Hi everyone! I just wanted to bring to your attention that ClearML 1.16.0 introduced authentication for the self-hosted fileserver by default. None If any of...

clearml

4 months ago

0 Votes

7 Answers

733 Views

0 Votes 7 Answers 733 Views

Thread Re: Pipelines And How They'Re Meant To Be Used / How Long They Take To Orchestrate.

Thread re: Pipelines and how they're meant to be used / how long they take to orchestrate. @<1523701205467926528:profile|AgitatedDove14> I appreciated your a...

clearml

6 months ago

0 Votes

1 Answers

404 Views

0 Votes 1 Answers 404 Views

In Case Anyone Else Ever Comes Across Mongo Issues Using The Docker Compose Clearml Stack (In Case Of A Messy Shutdown), I Have Found This Script To Be A Lifesaver On Numerous Occasions:

in case anyone else ever comes across mongo issues using the docker compose clearml stack (in case of a messy shutdown), I have found this script to be a lif...

clearml

5 months ago

0 Votes

2 Answers

584 Views

0 Votes 2 Answers 584 Views

Hello! Thank You For The Great Product. I Have A Bit Of A Request: This Hover Feature In Pipeline Overview Would Be Much More Useful If I Could Read Out The Whole Metric Name. (Not So Much An Issue With Things Like F1, "Acc", But Anything Longer Is Not

hello! thank you for the great product. I have a bit of a request: this hover feature in pipeline overview would be much more useful if I could read out the ...

clearml

7 months ago

0 Votes

54 Answers

17K Views

0 Votes 54 Answers 17K Views

I Have Set

I have set export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=true in my entrypoint.sh (which runs clearml-agent da...

clearml

6 months ago

0 Votes

43 Answers

11K Views

0 Votes 43 Answers 11K Views

I Dont Exactly Know How To Ask For Help On This... Nor Have A Reproducible Minimal Example... I Downgraded Back To 1.15.1 From 1.16.2 And Have The Same Issue There. I Have A Pipeline That'S Repeatedly Failing To Complete. It Correctly Marks Things As Cach

i dont exactly know how to ask for help on this... nor have a reproducible minimal example... I downgraded back to 1.15.1 from 1.16.2 and have the same issue...

clearml

3 months ago

0 Votes

31 Answers

12K Views

0 Votes 31 Answers 12K Views

I Noticed After Upgrading To The Latest Clearml That App Credentials Now Disappear On Restart. Is This An Intentional Design Choice? I'M In A Bit Of A Chicken-And-Egg Situation: Trying To Generate Valid Keys For

I noticed after upgrading to the latest clearml that App Credentials now disappear on restart. Is this an intentional design choice? I'm in a bit of a chicke...

clearml

4 months ago

0 Votes

13 Answers

333 Views

0 Votes 13 Answers 333 Views

Any Tips On Debugging Worker Graphs Not Showing Up? Seems To Be Some Js Errors In The Console That May Be Related. Running Localhost Against 1.16.1 Images

any tips on debugging worker graphs not showing up? seems to be some js errors in the console that may be related. running localhost against 1.16.1 images

clearml

4 months ago

0 Votes

14 Answers

572 Views

0 Votes 14 Answers 572 Views

I'M Having A Hard Time With Git Cloning + Cache For A Private Repo Accessed Via Personal Access Token. This Happens 100% Of The Time, Across Both Bitbucket + Github. I Have A Simple "Hello World" Task In A Private Repo. The Worker Is Running In A Docker

I'm having a hard time with git cloning + cache for a private repo accessed via personal access token. This happens 100% of the time, across both bitbucket +...

clearml

7 months ago

0 I Have Set

i just ran a pipeline that took about 2h (more than half this time was just the DAG), with about a hundred tasks. i'm taking a look at them now to see what the logs show for runtimes.

6 months ago

0 I Have Set

I'm just working on speeding up the time from "queue experiment" to "my code actually runs remotely" - as of yesterday things would sit for many minutes at a time. trying to see if venv is the culprit .

6 months ago

0 I Have Set

yeah, still noticing that it can be multiple minutes before something starts...
like... what is happening in this time (besides a git clone), now that I set both

export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=true
export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)

update: it's now been six mins and the task still isn't done. this should have run through in like a minute total end-to-end
![image](https://clearml-web-assets.s3.amazonaws.com/scoold/images/TT9ATQXJ5-F072N3UF22...

6 months ago

0 I Have Set

thank you!
i'll take that design into consideration.

re: CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL in "docker venv mode" im still not quite sure I understand correctly - since the agent is running in a container, as far as it is concerned it may as well be on bare-metal.

is it just that there's no way for that worker to avoid venv? (i.e. the only way to bypass venv is to use docker-mode?)

6 months ago

0 I Have Set

im not running in docker mode though - im running a clearml worker in a docker container (and then multiplying the container)

6 months ago

0 I Have Set

what if the preexisting venv is just the system python ? my base image is python:3.10.10 and i just pip install all requirements in that image . Does that not avoid venv still?

it's good to know that in theory there's a path forward with almost zero overhead . that's what I want .

is it reasonable to expect that with sufficient workers, I can get 50 tasks to run in the same time it takes to run a single one? i cant imagine the apiserver being a noticeable bottleneck .

6 months ago

0 I Have Set

of what task? i'm running lots of them and benchmarking execution times. would you like to see a best case or worst case scenario? (ive kept some experiments for each).

and yeah, in those docs you just linked, "boolean" vars like CLEARML_AGENT_GIT_CLONE_VERBOSE explicitly say true so I ended up trying that pattern. but originally i did try 1. let me go back to that now. thank you.

overall I've seen some improvements in execution time using the suggestions in this thread (tysm!) - th...

6 months ago

0 I Have Set

is there a way for me to toggle CLEARML's log level? I'm doing some manual task-debugging in ipython and think it would be helpful to see network requests and timeouts if they're occurring.

6 months ago

0 I Have Set

i was having a ton of git clone issues - disabled caching entirely... wonder if that may help too.

tysm for your help! will report back soon.

6 months ago

0 I Have Set

i really dont see how this provides any additional context that the timestamps + crops dont but okay.

6 months ago

0 I Have Set

the timestamps were all that mattered in those.

6 months ago

0 I Dont Exactly Know How To Ask For Help On This... Nor Have A Reproducible Minimal Example... I Downgraded Back To 1.15.1 From 1.16.2 And Have The Same Issue There. I Have A Pipeline That'S Repeatedly Failing To Complete. It Correctly Marks Things As Cach

but maybe here's a clue. after hanging like that for a while... it seems like the agent restarts (the container it runs in does not)

3 months ago

0 I Have Set

ah I see. thank you very much!

trying export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=$(which python)
but I still see Environment setup completed successfully
(it is printed after Running task id )

it still takes a full 3 minutes between task pulled by worker until Running task id
is this normal? What is happening in these few minutes (besides a git pull / switch)?

6 months ago

0 I Have Set

oh yes. Using env until the next message is 2 minutes.

6 months ago

default queue is served with (containerized + custom entrypoint) venv workers (agent services just wasn't working great for me, gave up)

3 months ago

clearml-server-1.15.1, clearml-1.16.2

3 months ago

hoping this really is a 1.16.2 issue. fingers crossed. at this point more pipes are failing than not.

3 months ago

None here's how I'm establishing worker-server (and client-server) comms fwiw

3 months ago

when i run the pipe locally, im using the same connect.sh script as the workers are in order to poll the apiserver via the ssh tunnel.

3 months ago

0 I'M Having A Hard Time With Git Cloning + Cache For A Private Repo Accessed Via Personal Access Token. This Happens 100% Of The Time, Across Both Bitbucket + Github. I Have A Simple "Hello World" Task In A Private Repo. The Worker Is Running In A Docker

so far it seems that turning off cache like this is my "best option"

7 months ago

0 Thread Re: Pipelines And How They'Re Meant To Be Used / How Long They Take To Orchestrate.

is it? I can't tell if these delays (DAG-computation) are pipeline-specific (i get that pipeline is just a type of task), but it felt like a different question as I'm asking "are pipelines like this appropriate?"

is there something fundamentally slower about using pipe.start() at the end of a pipeline vs pipe.run_locally() ?

6 months ago

0 I Noticed After Upgrading To The Latest Clearml That App Credentials Now Disappear On Restart. Is This An Intentional Design Choice? I'M In A Bit Of A Chicken-And-Egg Situation: Trying To Generate Valid Keys For

perfect. thank you. I verified that this was indeed reproducible on 1.16.0 with a fresh deployment.

4 months ago

0 I Have Set

minute of silence between first two msgs and then two more mins until a flood of logs. Basically 3 mins total before this task (which does almost nothing - just using it for testing) starts.

6 months ago

0 I Have Set

sometimes I get "lucky" and see something more like what I expect... total experiment time < 1 min (and I have evidence of this happening. logs start-to-finish in sub-minute). But then other times the same task will take 5-10 minutes.

same worker, same queue, just one worker serving it... I am so utterly perplexed by the variation in how long things take. my clearml API server is running on a beefy 32 core machine and not much else is happening right now...
![image](https://clearml-web-ass...

6 months ago

hello @<1523701087100473344:profile|SuccessfulKoala55>
I appreciate your help. Thank you. Do you happen to have any updates? We had another restart and lost the creds again. So our deployment is in a brittle state on this latest upgrade, and I'm going back to 1.15.1 until I hear back.

4 months ago

0 I Have Set

yeah... still seeing variances from 1m to 10m for the same task. been testing parallel execution for hours.

6 months ago

0 I Have Set

apologies - just trying to keep sensitive data out of screenshot

6 months ago

0 I Have Set

are you on clearml agent 1.8.0?

(im noticing sometimes im just missing logs such as "Running task id.." entirely)

6 months ago

worker thinks its in venv mode but is containerized .
apiserver is docker compose stack

ill check logs next time i see it .

currently rushing to ship a model out, so I've just been running smaller experiments slowly hoping to avoid the situation . fingers crossed .

3 months ago

0 Hi Everyone ! I'M Working With Pipelinecontroller. Is It Possible To Create A Pipeline With Optional Steps ? To Clarify I Add An Example Of A Pipeline In The Picture. For Example, The User Would Be Able To Modify The Value Of A Parameter To Execute The St

I do this a lot. pipeline params spawn K number of nodes, that collect just like you drew. No decorator being used here, just referencing tasks by id or name/project. I do not use continue on fail at all.

I do this with functions that have the contract ( f(pipe: PipelineController, **kwargs) -> PipelineController ) and a for-loop.

just be aware DAG creation slows down pretty quickly after a dozen or so such loops.

All the images below were made with the same pipeline (just evolved some n...

6 months ago

Show more results