Hmm that is odd, could it be you are changing the sys.path ?
(What I'm assuming is happening is that it detects the packages in the PYTHONPATH and for some reason the order is different so it finds the "system" package before the "venv" package, hence the incorrect version)
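If you want to check, a quick sketch (numpy is just an example, use the package with the wrong version):
```python
import sys
import numpy  # example package, replace with the one showing the wrong version

print(sys.path)           # import search order, the venv paths should come first
print(numpy.__file__)     # which copy actually got imported
print(numpy.__version__)  # and the version it resolved to
```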
Seems correct.
I'm assuming something is wrong with the key/secret quoting ?!
Could you generate another one and test it ?
(you can have multiple key/secret pairs on the same user)
What about output_uri?
If you are using StorageManager directly, output_uri is not relevant
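For example, a minimal sketch of uploading directly with StorageManager (the local path and bucket are placeholders):
```python
from clearml import StorageManager

# Upload a local file straight to the storage of your choice, no output_uri involved
remote_url = StorageManager.upload_file(
    local_file="model.pkl",                        # placeholder local path
    remote_url="s3://my-bucket/models/model.pkl",  # placeholder destination
)
print(remote_url)
```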
Done 🙂
Basically try with the latest RC 🙂
pip install trains==0.15.2rc0
Hi ConvolutedSealion94
Yes 🙂
Task.set_random_seed(123)  # disable setting random number generators by passing None
task = Task.init(...)
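In case it helps, a minimal runnable sketch (project/task names are placeholders; pass None instead of 123 to leave the RNGs untouched):
```python
from clearml import Task

# Must be called before Task.init(); use None to disable seeding altogether
Task.set_random_seed(123)
task = Task.init(project_name="examples", task_name="seeded-run")  # placeholder names
```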
In order to use multiple credentials, one must use the ClearML SDK, obviously.
Yes 🙂
The number of entries in the dataset cache can be controlled via clearml.conf : sdk.storage.cache.default_cache_manager_size
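In clearml.conf it would look something like this (100 is just an example value):
```
sdk {
    storage {
        cache {
            # maximum number of dataset copies kept in the local cache
            default_cache_manager_size: 100
        }
    }
}
```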
So on the EC2 instance (with the agent running), just install the following prior to running the agent:
apt-get install poppler-utils
So essentially, the server helm chart creates a randomly generated secret pair and deploys it as a shared k8s secret that pods can access.
This is the tricky part: for the helm chart to be able to create it, it would need to log in to the server, which means there is a secret embedded in the helm chart that lets you access the default server. You see my point?
Yes please, just to verify my hunch.
I think that somehow the docker mounts the agent is creating are (for some reason) messing it up.
Basically you can just run the following (it will do everything automatically) (replace the <TASK_ID_HERE> with the actual one)
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig ...
give me a minute to test
You might only see it when the upload is done
Yup, I just wanted to mark it completed, honestly. But then when I run it, Colab crashes.
task.close() will do that
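Something like this minimal sketch (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="colab-run")  # placeholder names
# ... notebook / training code ...
task.close()  # flushes everything and marks the task completed
```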
BTW what's the exception you are getting ?
SkinnyPanda43 could it be the clearml.conf is too large? how come it exceeds 16kb ?
Any hint on how you start the AWS autoscaler ?
How do I tell from the ClearML UI which dataset version I am using?
Hi SubstantialElk6 , what exactly do you mean by "ClearML UI which datasets am I using" ? Do you mean, is there auto magic adding the dataset ID when you call Dataset.get() in your code ? (because if so, I specifically remember discussing adding this feature a few days ago, so you just bumped its priority 😉 )
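For reference, a minimal sketch of pulling a dataset and checking which ID/version you got (project and dataset names are placeholders):
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")  # placeholder names
print(ds.id)                      # the dataset ID you can look up in the UI
local_path = ds.get_local_copy()  # cached local copy of the dataset files
```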
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have an ingestion service on the pods (i.e. as part of the yaml template used by the k8s glue), you can specify the exposed ports to the glue, so it knows (1) the maximum number of instances it can spin up, e.g. one per port, and (2) it will set the external port number on the Task, so that the running agent/code is aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s glue...
Hmm, I'm not sure, there is no reason why it would get stuck.
Removing all the auto loggers can be done with
Task.init(..., auto_connect_frameworks=False)
which would disconnect all the automatic loggers (Hydra etc.). Of course, this is for debugging purposes
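A minimal sketch (project/task names are placeholders; the dict form lets you switch off only specific frameworks):
```python
from clearml import Task

# Disable all automatic framework logging (handy while debugging):
task = Task.init(project_name="examples", task_name="debug-run",
                 auto_connect_frameworks=False)

# Or disable only selected frameworks, e.g. Hydra, and keep the rest:
# task = Task.init(..., auto_connect_frameworks={"hydra": False})
```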
JitteryCoyote63 you mean in runtime where the agent is installing? I'm not sure I fully understand the use case?!
Hi MortifiedCrow63 , thank you for pinging! (seriously greatly appreciated!)
See here:
https://github.com/googleapis/python-storage/releases/tag/v1.36.0
https://github.com/googleapis/python-storage/pull/374
Can you test with the latest release, see if the issue was fixed?
https://github.com/googleapis/python-storage/releases/tag/v1.41.0
PanickyMoth78 thank you for the mock code, I can verify it reproduces the issue. It seems that for some reason (a bug) when the same function is called multiple times it "collects" parents, hence the odd graph.
BTW: if you want to see exactly what is passed to the step you can press on the step's full_details, and see the hyperparameter section.
I'll make sure we fix this bug in the next RC.
Generally speaking I would say the Nvidia deep-learning AMI:
https://aws.amazon.com/marketplace/pp/prodview-7ikjtg3um26wq
Hi SuperficialGrasshopper36
/home/ubuntu/.clearml/venvs-builds.1/3.8/task_repository/repository_name/.venv
This is the problem, they should not be installed there, they should be in /home/ubuntu/.clearml/venvs-builds.1/3.8/
Could you post the poetry.lock file? Maybe it is something there?
What's the poetry version and clearml-agent version ?
FYI:
ssh -R 8080:localhost:8080 -R 8008:localhost:8008 -R 8081:localhost:8081 replace_with_username@ubuntu_ip_here
solved the issue 🙂
SubstantialElk6
The ~<package name with the first letter dropped> == a.b.c is a known conda/pip temporary install issue (some leftover from a previous package install).
The easiest way is to find the site-packages folder and delete the package, or create a new virtual environment
BTW:
pip freeze will also list these broken packages
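If it helps, a quick sketch for spotting those leftover folders (assumes the standard site-packages layout):
```python
import site
from pathlib import Path

# An interrupted install leaves the old package folder renamed with a leading "~";
# list them so you know what to delete (or just recreate the virtual environment)
for sp in site.getsitepackages():
    for leftover in Path(sp).glob("~*"):
        print(leftover)
```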