Each of these steps,
[2], [3], [4], [5 & 6]
can be thought of as independent Kedro nodes that can be reused in the future. Now, how to integrate this with ClearML is unclear to us.
The same can be said for ClearML: each of these steps is a clearml Task (with its own repo/environment)
It sounds like (and I might be completely off here, so please feel free to correct me) the main use for Kedro is the nice web UI of the pipeline (which I agree looks very cool).
Th...
so moving b into a won't work if some subfolders are already there
I thought that if they are already there you would merge / overwrite, isn't that what you need?
a/b/c/2.txt seems like the result of moving b from Dataset B into folder b in Dataset A, what am I missing?
(My assumption is that you have both datasets locally on the same machine and that you can just copy the files from b of Dataset B into the b folder of Dataset A)
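For what it's worth, a rough sketch of that copy/merge using the Dataset API (projects/names are placeholders, and it assumes both datasets are reachable from this machine and that the merged result should become a new version of Dataset A):

from clearml import Dataset

dataset_b = Dataset.get(dataset_project="examples", dataset_name="Dataset B")
b_local = dataset_b.get_local_copy()  # read-only cached copy of Dataset B

# new version of Dataset A, inheriting all of A's existing files
dataset_a = Dataset.get(dataset_project="examples", dataset_name="Dataset A")
merged = Dataset.create(
    dataset_name="Dataset A",
    dataset_project="examples",
    parent_datasets=[dataset_a.id],
)

# register the content of B's b/ folder under the b/ folder of the new version
merged.add_files(path=f"{b_local}/b", dataset_path="b")

merged.upload()
merged.finalize()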
This is something that we do need if we are going to keep using ClearML Pipelines, and we need it to be reliable and maintainable, so I don’t know whether it would be wise to cobble together a lower-level solution that has to be updated each time ClearML changes its serialisation code
Sorry if I was not clear, I do not mean for you to do unstable low-level access; I meant that pipelines are designed to be editable externally, they always deserialize themselves.
The only part that is mi...
DefiantHippopotamus88
HTTPConnectionPool(host='localhost', port=8081)
This will not work, because inside the container of the second docker compose "fileserver" is not defined:
CLEARML_FILES_HOST=""
You have two options:
1. Configure the docker compose to use the network host on all containers (as opposed to the isolated mode they are running in now).
2. Configure all of the CLEARML_* to point to the Host IP address (e.g. 192.168.1.55), then rerun the entire thing.
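For option 2, a small sketch of what the override could look like from the Python side (the IP is just an example; normally you would set the same variables in the environment section of the second docker compose):

import os

host_ip = "192.168.1.55"
os.environ["CLEARML_API_HOST"] = f"http://{host_ip}:8008"
os.environ["CLEARML_WEB_HOST"] = f"http://{host_ip}:8080"
os.environ["CLEARML_FILES_HOST"] = f"http://{host_ip}:8081"

from clearml import Task  # import only after the environment is set

task = Task.init(project_name="examples", task_name="host-ip-override-demo")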
- At its simplest, this could just mean checking that all of the steps and the pipeline itself have completed successfully (by checking their "Task status").
If a pipeline step ends with "failed" status, an exception will be raised in the pipeline execution function; if the exception is not caught, the pipeline itself will also fail.
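For illustration, a minimal decorator-based sketch of that behavior (names and project are placeholders):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["result"])
def flaky_step():
    raise RuntimeError("simulated failure")  # this step ends with "failed" status

@PipelineDecorator.pipeline(name="failure-handling-demo", project="examples", version="0.0.1")
def pipeline_logic():
    try:
        result = flaky_step()
        print(result)  # using the value is typically where the exception surfaces
    except Exception as ex:
        # swallow the error so the pipeline itself can still complete successfully
        print(f"step failed, continuing: {ex}")

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # execute everything in the local process
    pipeline_logic()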
run pipeline_script.py which contains the pipeline code as decorators.
So in theory the following should actually work.
Let's assume you ...
Btw I sometimes get a gzip error when I am accessing artefacts via the '.get()' part.
Hmm this is odd, is this a download issue? if this is reproducible maybe we should investigate further...
FlutteringWorm14 any insight on the Task that it fails to delete? Or how to reproduce it?
JitteryCoyote63 I remember something with "!" in the name or maybe "/" in the name that might cause this behavior. May I suggest checking with clearml-server 1.3 ?
yea the api server configuration also went away
okay that proves it
At the top there should be the URL of the notebook (I think)
If you could provide the specific task ID then it could fetch the training data and study from the previous task and continue with the specified number of trainings.
Yes exactly, and also all the definitions for the HPO process (variables space, study etc.)
The reason that being able to continue from a past study would be useful is that the study provides a base for pruning and optimization of the task. The task would be stopped by aborting when the gpu-rig that it is using is neede...
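For illustration, a rough sketch of continuing from a previous task (task ID, artifact name and the toy objective are placeholders; it assumes the earlier run uploaded its Optuna study as an artifact named "study"):

import optuna  # needed so the pickled study object can be loaded
from clearml import Task

prev_task = Task.get_task(task_id="<previous_hpo_task_id>")

# pull the study object stored by the previous run
study = prev_task.artifacts["study"].get()

def objective(trial):
    x = trial.suggest_float("x", -10.0, 10.0)
    return (x - 2.0) ** 2

# the study keeps its trial history, so the sampler/pruner build on the past trials
study.optimize(objective, n_trials=20)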
This is the clearml python client, no need to change the server
same: Not Found (#404)
May I suggest DMing it to me (so it is not public)?
My question is what should be the path to the requirements.txt file?
Is it relative to the repo base?
This is actually resolved at runtime (i.e. when running the code), so it is relative to the working directory. Make sense? (You can specify an absolute path, though that is probably something I would avoid in the code base...)
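For example, assuming this is about Task.add_requirements (the exact call was not shown here), a small sketch:

from clearml import Task

# relative path, resolved against the working directory when the code runs;
# note it has to be called before Task.init()
Task.add_requirements("requirements.txt")

task = Task.init(project_name="examples", task_name="requirements-path-demo")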
Thank you! 😊
When we enqueue the task using the web-ui we get the above error
ShallowGoldfish8 I think I understand the issue,
basically I think the issue is:
task.connect(model_params, 'model_params')
Since this is a nested dict:
model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {0: 1, 1: 60},
    "learning_rate": 0.1
}
The class_weights keys are stored as String keys, but catboost expects "int" keys, hence it fails.
One op...
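As an illustration only (this may not be the option being described here), one possible workaround is to cast the keys back to int right after connect():

from clearml import Task

task = Task.init(project_name="examples", task_name="nested-params-demo")

model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {0: 1, 1: 60},
    "learning_rate": 0.1,
}
task.connect(model_params, "model_params")

# connect() round-trips the nested dict through the backend/UI, which turns the
# int keys into strings; convert them back so catboost gets {0: ..., 1: ...} again
model_params["class_weights"] = {
    int(k): v for k, v in model_params["class_weights"].items()
}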
K8s + clearml-agent integration.
Hmm is this an on-prem k8s cluster?
So the issue is that you have two reference branches on the local git, one to gitlab and one to gitea, and it fails to understand which one is the correct remote ...
I wonder if "git ls-remote --get-url" will always work ?!
My bad you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
Hi ScaryLeopard77
You can probably do:
Task.init(..., continue_last_task='task_id_here')
This will continue a previously executed Task and log both steps in the same place.
Does that help?
BTW: you can also of course manually report to any Task as it is still running with:
aux_task = Task.get_task(task_id_here)
aux_task.get_logger().report_scalar(...)
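Putting the two together, a slightly fuller sketch (IDs and names are placeholders):

from clearml import Task

# continue a previously executed task: new logs/scalars land in the same Task
task = Task.init(
    project_name="examples",
    task_name="continued-run",
    continue_last_task="<task_id_here>",
)

# manually report into some other (still running) task
aux_task = Task.get_task(task_id="<other_task_id>")
aux_task.get_logger().report_scalar(
    title="loss", series="val", value=0.123, iteration=10
)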
EnviousPanda91 'connect' will log the object properties; the automagic logging is controlled in the Task.init call. Specifically, which framework produces metrics that are not logged? Your sample code manually reports some scalars/values, do you see these as well?
I think the main issue is that for some reason the running container changed one of the files inside the temp folder. Then the host machine is "stuck" with a file that the root user owned/changed, and now it cannot reuse / delete the temp folder.
I think the fix is to make sure the container deletes the temp folder when it is done
Hi UnevenDolphin73
In theory it "might" work, I have to admit that personally I'm not a fan of what Amazon did to Mongo, i.e. forking their code base and selling it as a service, just bad open-source practice
(The main issue might be API calls that might not fully match)
wdyt?
Basic setup:
- one glue service per "job template" (e.g. k8s resources, for example CPU requirement, or GPU requirement)
- one queue per glue service, e.g. a cpu_machine queue and a 1xGPU queue
wdyt?
I see, so in theory you could call add_step with a pipeline parameter (i.e. pipe.add_parameter etc.)
But currently the implementation is such that if you are starting the pipeline from the UI
(i.e. rerunning it with a different argument), the pipeline DAG is deserialized from the Pipeline Task (the idea being that one could control the entire DAG externally without changing the code)
I think a good idea would be to actually allow the pipeline class to have an argument saying always create from cod...
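For reference, a minimal sketch of the add_parameter / parameter_override wiring (project, task names and the Args section are placeholders):

from clearml import PipelineController

pipe = PipelineController(name="param-demo", project="examples", version="0.0.1")

# pipeline-level parameter, editable from the UI when the pipeline is re-run
pipe.add_parameter(name="dataset_id", default="abc123")

pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train_task",
    # pass the pipeline parameter down into the step's hyperparameters
    parameter_override={"Args/dataset_id": "${pipeline.dataset_id}"},
)

pipe.start_locally(run_pipeline_steps_locally=True)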
One last thing: make sure you spin up the pod container in privileged mode, because the trains-agent docker will spin up a sibling docker for your actual experiment.
WickedGoat98 sorry, I missed the thread...
that the trains.conf has to be located on the node running the trains-agent.
Correct 🙂
The easiest way to check is to see if you can curl to the ip:port from the docker.
If that fails, it is probably the wrong IP.
The IP you need to use is the IP of the machine running the docker-compose (not the IP of the docker inside that machine).
Make sense ?
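If curl is not handy inside the container, a quick Python equivalent (the IP is a placeholder):

import urllib.error
import urllib.request

host_ip = "192.168.1.55"  # the machine running docker-compose, NOT the container IP
for port in (8008, 8080, 8081):  # api server, web server, fileserver
    url = f"http://{host_ip}:{port}"
    try:
        urllib.request.urlopen(url, timeout=5)
        print(f"{url} reachable")
    except urllib.error.HTTPError as ex:
        print(f"{url} reachable (HTTP {ex.code})")  # the server answered, even if with an error code
    except Exception as ex:
        print(f"{url} NOT reachable: {ex}")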