The reason is that it is logged as an image, not a plot 🙂
One last thing: make sure you spin the pod container in privileged mode, because the trains-agent docker will spin a sibling docker for your actual experiment.
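For reference, this is roughly what that would look like in the pod spec (a minimal sketch; the pod/container names and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: trains-agent
spec:
  containers:
    - name: trains-agent
      image: allegroai/trains-agent  # placeholder image name
      securityContext:
        privileged: true  # lets the agent spin sibling dockers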
SmarmySeaurchin8
When running in "dev" mode (i.e. writing the code), only packages imported directly are registered under "installed packages". Then, when the agent executes the experiment, it will update back the entire environment (including derivative packages etc.)
That said, you can set detect_with_pip_freeze to true (in trains.conf) and it will basically store the entire pip freeze.
https://github.com/allegroai/trains/blob/f8ba0495fb3af1f99732fdffbbccd2fa992934a4/docs/trains.c...
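For reference, a minimal sketch of the relevant trains.conf section (this assumes the key sits under sdk.development; check the linked file for the exact location):
sdk {
    development {
        # store the full `pip freeze` output instead of only directly imported packages
        detect_with_pip_freeze: true
    }
}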
I had again the same problem but within a remote pipeline setup.
Are you saying the issue is not fixed? Can you verify the pipeline & pipeline components are using at least the 1.104rc0 version?
(I suspect you are correct, but I'm missing some information in order to understand where the problem is)
WackyRabbit7 can you send mock code that explains how you create the pipeline ?
GrievingTurkey78 Actually it is in progress, see the GitHub issue for details:
https://github.com/allegroai/trains/issues/219
Okay, I'm pretty sure there is a hack, let me see if there is something "nicer"
However, that would mean passing back the hostname to the Autoscaler class.
Sorry my bad, the agent does that automatically in real-time when it starts, no need to pass the hostname it takes it from the VM (usually they have some random number/id)
Ok, so it doesn't follow the exact same rules as Task.init ?
Correct
I was afraid all the logs and outputs of a hyperparameter optimization task would be deleted just because no artifacts were created.
Should not happen 🙂
GiganticTurtle0
If there are several tasks running concurrently, which task should Task.current_task() return?
How could you have that ?
Per process, there is one Main current Task (until you close it).
Are you referring to a pipeline with multiple steps ?
If this is the case, Task.current_task() will return the Task of the component (if executed from the component) and the pipeline's Task (if called from the pipeline logic function).
Notice we added the ability to s...
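To make that concrete, a minimal sketch using the pipeline-from-decorators interface (the project/step names are placeholders):
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['result'])
def step_one():
    # called from inside a component: this is the component's own Task
    print(Task.current_task().name)
    return 1

@PipelineDecorator.pipeline(name='demo pipeline', project='examples', version='1.0')
def pipeline_logic():
    # called from the pipeline logic function: this is the pipeline controller's Task
    print(Task.current_task().name)
    step_one()

if __name__ == '__main__':
    pipeline_logic()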
I see, by default it will look for requirements.txt in the root of the repo (the actual repo).
That said, in code you can specify the requirements.txt:
Task.force_requirements_env_freeze(requirements_file='repo/project-a/requirements.txt')
task = Task.init(...)
Notice, you need to call it prior to the Task.init call
Hi ClumsyElephant70
So do you need both requirements.txt files combined ?
How will the agent be able to reproduce both repos on the remote machine ?
DeliciousBluewhale87 fyi, the new version of the pipeline (hopefully pushed towards the end of this week) will allow you to more easily write steps as functions (not only as Tasks, as in the current implementation)
Also check the new Trigger and Scheduler, both intended to trigger these pipelines:
https://github.com/allegroai/clearml/blob/fe3c481c37e70881c44d67c1cf9bbce00a84747e/clearml/automation/scheduler.py#L457
https://github.com/allegroai/clearml/blob/fe3c481c37e70881c44d67c1cf9bbce00a8...
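For example, a minimal sketch of the scheduler (argument names follow the linked scheduler.py; the task ID and queue are placeholders):
from clearml.automation.scheduler import TaskScheduler

scheduler = TaskScheduler()
# clone & enqueue an existing (template) Task every day at 07:30 on the 'services' queue
scheduler.add_task(
    schedule_task_id='aabbcc112233',  # placeholder: ID of the Task to re-launch
    queue='services',
    hour=7, minute=30,
)
scheduler.start()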
we can add non-clearml code as a step in the pipeline controller.
Yes 🙂 btw, you can kind of already do that with pre/post function callbacks (notice they run from the same scope as the actual pipeline controller); see the sketch below.
What exactly did you have in mind to put there ?
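Something like this (a sketch of the callback hooks on PipelineController.add_step; the callback signatures here are assumptions, double-check them against your clearml version):
from clearml.automation.controller import PipelineController

def pre_cb(pipeline, node, param_override):
    # runs in the controller's scope, just before the step Task is launched
    print('about to launch', node.name)

def post_cb(pipeline, node):
    # runs in the controller's scope, right after the step Task completes
    print('finished', node.name)

pipe = PipelineController(name='demo', project='examples', version='1.0')
pipe.add_step(
    name='stage_one',
    base_task_project='examples',          # placeholder template Task
    base_task_name='stage one template',
    pre_execute_callback=pre_cb,
    post_execute_callback=post_cb,
)
pipe.start()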
ResponsiveCamel97
could you attach the full log?
Error 101 : Inconsistent data encountered in document: document=Output, field=model
Okay, this points to a migration issue from 0.17 to 1.0
First try to upgrade to 1.0, then to 1.0.2
(I would also upgrade a single apiserver instance, once it is done, then you can spin the rest)
Make sense ?
Hi ResponsiveCamel97
What's the clearml-server version? How do you spin the server on your k8s cluster, helm ?
Hi SmallDeer34
ClearML automagical logging works on the current python process. But in your example, your bash script is running another python script (that has nothing to do with the original notebook), hence the clearml automagic is not aware of it (i.e. it cannot "patch" the tensorboard calls).
In order to make it work, you should do something like:
from joeynmt import train
train.main(...)
Or something similar 🙂
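i.e. a sketch of the full flow (joeynmt's actual entry-point arguments may differ, check its docs):
from clearml import Task
from joeynmt import train

# init clearml in the same process first, so the tensorboard calls get patched
task = Task.init(project_name='examples', task_name='joeynmt training')
train.main(...)  # placeholder: pass whatever arguments joeynmt's train entry point expects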
Make sense ?
Hi GiddyTurkey39
First, yes you can just edit the "installed packages" section and add any missing package (this is equal to requirements.txt)
I wonder why trains failed to detect the "bigquery" package in the first place... Any thoughts ?
pytorch DDP
with what backend ? gloo ? nccl ? openmpi ?
This is odd... can you post the entire trigger code ?
also what's the clearml version?
What do you mean? every Model has a unique ID, what do you consider a version?
MagnificentSeaurchin79 no need for the detection api (yes definitely a mess to setup), it will be more helpful to get a toy example.
The only important thing for me is to know if there is any way to get more information in the apiserver log
what do you mean by that ?
SmarmySeaurchin8 regarding the original question:
task.set_project(project_id)
Task.get_projects() to get all the project names/ids
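Putting the two together, something like (a sketch; the task ID and project name are placeholders, and this assumes the objects returned by get_projects() expose name / id):
from clearml import Task

task = Task.get_task(task_id='aabbcc112233')  # placeholder task ID
projects = Task.get_projects()
target = next(p for p in projects if p.name == 'My Project')
task.set_project(target.id)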
ClearML maintains a github action that sets up a dummy clearml-server,
You have one: http://app.clear.ml (not a dummy one, but for this purpose it will work)
thoughts ?
Hi ReassuredTiger98
An agent's queue priority translates to the order in which the agent pulls jobs from its queues.
Now let's assume we have two agents with priorities A,B for one and B,A for the other. If we only push a Task to queue A, and both agents are idle (implying queue B is empty), there is no guarantee which one will pull the job.
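Concretely, the queue order on the daemon command line is what sets the priority (queue names here are placeholders):
clearml-agent daemon --queue queue_a queue_b   # this agent tries queue_a first
clearml-agent daemon --queue queue_b queue_a   # this agent tries queue_b first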
Does that make sense ?
What is the use-case you are trying to solve/optimize for ?