AgitatedDove14
Moderator
49 Questions, 8126 Answers
  Active since 10 January 2023
  Last activity one year ago

0 Hi, I'M Using Huggingface Trainer, Is There A Way To Capture Grad_Norm Per Layer? Thanks!

Hi @<1558624430622511104:profile|PanickyBee11>
You mean this is not automatically logged? Do you have a callback that logs it in HF?

9 months ago
0 Hi, Does Anyone Know Where Trains Stores Tensorboard Data? Because I Am Used To Using Tensorboard To Record Experimental Data And Store Data, I Hope I Can Access The Folder Where Tensorboard Stores Data When I Use Command Like

Hi FierceFly22

Hi, does anyone know where trains stores tensorboard data

TensorBoard data is stored wherever you point your file-writer to 🙂
While TensorBoard writes its own data to disk, trains takes that data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically you don't need to store your TB files after your experiment is done, you have all the data in the trains-s...

5 years ago
0 Web Server Ui Bug? When Trying To Extend The Width Of A Column In The Experiments Table, If You Try To Extend It More Then The Width Of The Column To The Right, It Doesn'T Do Anything..

Wait, how do I reproduce it on community server? Maybe it has something to do with number of columns ? Or whether it is already wider than the screen? What's your browser / OS ?

4 years ago
0 Hi, I'M On A Machine That Normally Connects To Storage Using

Hi WittyOwl57
That's actually how it works (the original idea/design was borrowed from libcloud): basically you need to create a Drive, then the storage manager will use it.
Abstract class here:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L51
Is this what you had in mind ?

4 years ago
0 With

I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything.

Sure man 🙂 no rush, I appreciate the gesture regardless of the outcome
Many thanks!

4 years ago
0 Hey, Just Trying Out Clearml-Serving And Getting The Following Error

I can raise this as an issue on the repo if that is useful?

I think this is a good idea, at least increased visibility 🙂
Please do 🙏

3 years ago
0 Hello. Recently Installed Packages Behavior Has Been Changed. Previously It Was Following: 1. If Installed Packages Is Empty, Packages Should Be Installed From Requirements.Txt. 2. If Installed Packages Is Not Empty, They Should Be Installed. Now It'S Fol

Hi ItchyJellyfish73
The behavior should not have changed.

"force_repo_requirements_txt" was always a "catch all option" to set a behavior for an agent, but should generally be avoided

That said, I think there was an issue with v1.0 (clearml-server) where clearing the "Installed Packages" did not actually clear it, but set it to empty.
It sounds like the issue you are describing.
Could you upgrade the clearml-server and test?
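
The behavior discussed in this thread can be summarized as a small decision function. This is only a sketch of the logic as described here, not the agent's actual code: `choose_requirements_source` and its return strings are hypothetical names, and "force_repo_requirements_txt" is modelled as a plain boolean that is assumed to override whatever is stored on the Task.

```python
def choose_requirements_source(installed_packages, force_repo_requirements_txt=False):
    """Return which package list would be installed, per the thread's description.

    Hypothetical helper for illustration only; it does not call clearml-agent.
    """
    # Assumption: the "catch all" option wins over whatever the Task stores.
    if force_repo_requirements_txt:
        return "requirements.txt"
    # An empty (or whitespace-only) "Installed Packages" field falls back
    # to the requirements.txt found in the git repo.
    if not installed_packages or not installed_packages.strip():
        return "requirements.txt"
    # Otherwise the packages listed on the Task are installed.
    return "installed packages"
```

Note that the bug mentioned above would correspond to the field never becoming truly empty, so the fallback branch would not be reached.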

4 years ago
0 When Launching A Task To Trains Agent, I'M Having Trouble Getting The Imports From Other Files Working Correctly. For Instance, If My Task Imports A Function From Another File Within The Same Git Repo [

Hi GiddyTurkey39
First, yes you can just edit the "installed packages" section and add any missing package (this is equal to requirements.txt)
I wonder why trains failed detecting the "bigquery" package in the first place... Any thoughts ?

5 years ago
0 Pytorch Lightning Question About Logging A Figure. I Have The Following Code:

With offline mode,
Later if you need you can actually import the execution (including artifacts etc.) you just need the zip file it creates when you are done.

5 years ago
0 Is There Any Way To, Like, Load-Balance Automatically? Like, On The User End Can I Just Specify An Amount Of Gb I Think I Will Need, And It Goes And Picks A Queue For Me Based On That? Like, Let'S Say I Want "A 15Gb Gpu Or Better" And There'S 4 Queues, Tw

OK, so if I've got, like, 2x16GB GPUs ...

You could do:
clearml-agent daemon --queue "2xGPU_32gb" --gpus 0,1
Which will always use the two GPUs for every Task it pulls

Or you could do:
clearml-agent daemon --queue "1xGPU_16gb" --gpus 0
clearml-agent daemon --queue "1xGPU_16gb" --gpus 1
Which will have two agents, one per GPU (with 16gb per Task it runs)

Or
clearml-agent daemon --queue "2xGPU_32gb" "1xGPU_16gb" --gpus 0,1
Which will first pull Tasks from the "2xGPU_32gb" qu...
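
For the "pick a queue for me based on GB" part of the question, one option is to do the selection on the user side before enqueuing. The sketch below is not a ClearML feature: `pick_queue` is a hypothetical helper, and the mapping from queue name to total GPU memory is something you would have to maintain yourself (the queue names are the ones used in the commands above).

```python
def pick_queue(required_gb, queue_memory_gb):
    """Return the smallest queue that offers at least `required_gb` of GPU memory.

    `queue_memory_gb` maps queue name -> total GPU memory in GB (user-maintained).
    Returns None when no queue is large enough.
    """
    candidates = [
        (memory, name)
        for name, memory in queue_memory_gb.items()
        if memory >= required_gb
    ]
    if not candidates:
        return None
    # Sorting by memory first picks the least over-provisioned queue.
    return min(candidates)[1]


# Assumed mapping for the two queues from the commands above.
QUEUES = {"1xGPU_16gb": 16, "2xGPU_32gb": 32}
```

With this mapping, asking for "15GB or better" resolves to the "1xGPU_16gb" queue, and the returned name can then be used as the target queue when enqueuing the Task.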

4 years ago
0 Hey There, I Would Like To Increase The

Set it on the PID of the agent process itself (i.e. the clearml-agent python process)

4 years ago
0 Hi, I Have One Doubt Related To Pipeline I Have One Pipeline With Eg 3 Tasks, Preprocess, Train And Test Now I Want To Clone The Pipeline And Change The Hyperparameters Of Train Task, Is It Possible? If So, How??

like this.. But when I am cloning the pipeline and changing the parameters, it is running on default parameters, given when pipeline was 1st run

Just making sure, you are running the cloned pipeline with an agent, correct?
What is the clearml version you are using?
Is this reproducible with the pipeline example ?

2 years ago
0 Hi, Are The Experiments Logs Stored In S3 Or In The Trains-Server? (When Using S3 As Artifact Storage)

Hi JitteryCoyote63

experiments logs ...

You mean the console outputs ?

4 years ago
0 I Am Working Up With The Autoscaler, After Setting Up The Autoscaler Instance I Am Getting The Following Error When I Launch The Autoscaler Googleapiclient.Errors.Httperror: <Httperror 404 When Requesting

Thanks!
Hmm from here : None
Could it be you do not have privileges to the resource, or that you did not provide credentials ?
Did that autoscaler work before ?

2 years ago
0 Hi, I Have One Doubt Related To Pipeline I Have One Pipeline With Eg 3 Tasks, Preprocess, Train And Test Now I Want To Clone The Pipeline And Change The Hyperparameters Of Train Task, Is It Possible? If So, How??

@<1585078763312386048:profile|ArrogantButterfly10> could it be that in the "base task" of the pipeline step, you do not have any hyper-parameter ? (I mean the Task that the pipeline clones and is supposed to set new hyperparameters for...)

2 years ago
0 Hello World,

Hi PerplexedGoat65

it appears, in a practical sense, this means to mount the second drive, and then bind them in ClearML's configuration

Yes, the entire data folder (the reason is, if you lose it, you lose all the server storage / artifacts)

Also, thinking about Docker and slower access speed for Docker mounts and such,

If the host OS is linux, you have nothing to worry about, speed will be the same.

3 years ago
0 Is There Any Simple Way To Orchestrate A Batch To Train A Model With Different Features (In Order To Do Feature Selection, For Example) Through A Single .Py File? I Saw The Following Example

Could I just build it and log these parameters using

task.set_parameters()

so that I call

task.get_parameters()

later?

Instead of manually calling set/get, you call task.connect(some_dict_or_object), which does both:
When running manually (i.e. without an agent) it logs the keys/values on the Task,
when running with an agent, it takes the values from the backend (Task) and sets them on the dict/object
Make sense ?
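
A minimal sketch of the two modes described above. To keep it runnable without a server, it uses two hypothetical stand-in classes (`ManualLikeTask`, `AgentLikeTask`) that only mimic the described behavior; `load_params` is also a name made up for this example. The only real SDK call it relies on is `task.connect(...)`.

```python
def load_params(task, defaults):
    """Connect a parameter dict to a Task-like object and return the effective values."""
    params = dict(defaults)  # copy, so the caller's defaults are not mutated
    connected = task.connect(params)
    # Fall back to `params` in case a stand-in's connect() returns None.
    return connected if connected is not None else params


class ManualLikeTask:
    """Hypothetical stand-in for a manual run: connect() only records the values."""

    def __init__(self):
        self.logged = {}

    def connect(self, params):
        self.logged.update(params)
        return params


class AgentLikeTask:
    """Hypothetical stand-in for an agent run: backend values override the dict."""

    def __init__(self, backend_values):
        self.backend_values = backend_values

    def connect(self, params):
        params.update(self.backend_values)
        return params
```

With the real SDK you would pass the object returned by `Task.init(...)` instead of a stand-in, so the same script works unchanged both when run manually and when cloned and executed by an agent with edited values (for example a different feature list per run).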

3 years ago
0 Hi Guys! Trains Monitor: Could Not Detect Iteration Reporting, Falling Back To Iterations As Seconds-From-Start. What Happened?

Hi ItchyHippopotamus18
The iteration reporting is automatically detected if you are using tensorboard, matplotlib, or explicitly with trains.Logger
I'm assuming there were no reports, so the monitoring falls back to reporting every 30 seconds, where the iterations are "seconds from start" (the thing is, this is a time series, so you have to have an X axis...)
Make sense ?
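
For the "explicitly with the Logger" option, a rough sketch of reporting with an explicit iteration index is shown below. `report_losses` and `RecordingLogger` are hypothetical names for this example; the stand-in only exists so the snippet runs without a server. It assumes the logger exposes `report_scalar(title, series, value, iteration)`.

```python
def report_losses(logger, losses, title="train", series="loss"):
    """Report one scalar per step, passing the iteration explicitly as the X axis."""
    for iteration, value in enumerate(losses):
        logger.report_scalar(title=title, series=series, value=value, iteration=iteration)
    return len(losses)


class RecordingLogger:
    """Hypothetical stand-in that just records what would have been reported."""

    def __init__(self):
        self.calls = []

    def report_scalar(self, title, series, value, iteration):
        self.calls.append((title, series, value, iteration))
```

In a real run you would pass the logger obtained from the SDK (e.g. `Logger.current_logger()`) instead of the stand-in; once such reports exist, the monitor has real iterations to use as the X axis.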

5 years ago
0 When Launching A Task To Trains Agent, I'M Having Trouble Getting The Imports From Other Files Working Correctly. For Instance, If My Task Imports A Function From Another File Within The Same Git Repo [

GiddyTurkey39

as others will also be running the same scripts from their own local development machine

Which would mean trains will update the installed packages, no?

This is why I was inquiring about the requirements.txt file,

My apologies, of course this is supported 🙂
If you have no "installed packages" (i.e. the field is empty in the UI) the trains-agent will revert to installing the requirements.txt from the git repo itself, then it...

5 years ago
0 Hi Guys, I Have Been Running The Clearml-Serving For A While Now And I Realize That From Time To Time After A Couple Of Hours The Serving Task (Control Plane) That Is Configured Through The Cli Goes Into Status Abort. This Happens Even Though All The Pods

Hi @<1569858449813016576:profile|JumpyRaven4>
What's the clearml-serving version you are running ?

This happens even though all the pods are healthy and the endpoints are processing correctly.

The serving pods are supposed to ping "I'm alive" and that should verify the serving control plane is alive.
Could it be no requests are being served ?

one year ago
0 Is It Possible To Avoid The Clearml-Agent For Local Installations, And Have The File Server Automatically Use An S3 Bucket? I'Ve Found

Maybe we should add some ENV setting for it? (I'm not sure we should disable SSL for all S3 connections... so somehow specify the minio it should use http with)

3 years ago
0 Hi, I'M Training On Multi-Node, Clearml Captures Only A Single Machine Utility (Memory/Cpu/Etc.). I Assume It Captures Node 0. Is There A Way To Make It Report All Nodes?

multiple machines and reporting to the same task.

Out of curiosity , how do you launch it on multiple machines?

reporting to the same task.

So the "funny" thing is, they all report one on top of (overwriting) the other...
In order for them to report individually, it might be that you need multiple Tasks (i.e. one per machine)
Maybe we could somehow have prefix with rank on the cpu/network etc?! or should it be a different "title", wdyt?

2 years ago
0 Greetings Everyone, In The Course Of My Work, I Utilize A Particular Library That Necessitates More Than Just A Simple Clone And Dependency Installation Procedure. It Also Requires The Cloning Of An Additional Repository, Along With Its Installation, And

Oh if this is the case, then by all means push it into your Task's docker_setup_bash_script
It does not seem like it has to be done after the git clone; the only part that I can see is setting the PYTHONPATH to the additional repo you are pulling, and that should work.
The main hurdle might be passing credentials to git, but if you are using SSH it should be transparent
wdyt?
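
A rough sketch of what such a setup script could contain, built as a list of bash lines. `build_setup_script`, the repository URL, and the destination path are all hypothetical. The assumption here is that the resulting lines are handed to the Task's docker_setup_bash_script setting (in recent SDK versions this is exposed as an argument of `Task.set_base_docker`, but please check the version you are running).

```python
import shlex


def build_setup_script(repo_url, dest):
    """Return bash lines that clone an extra repo and expose it on PYTHONPATH."""
    return [
        # Clone the additional repository (SSH URLs need keys available in the container).
        f"git clone {shlex.quote(repo_url)} {shlex.quote(dest)}",
        # Make the cloned code importable by the Task's own code.
        f"export PYTHONPATH={shlex.quote(dest)}:$PYTHONPATH",
    ]


# Hypothetical values, for illustration only.
SETUP_LINES = build_setup_script("git@example.com:org/extra-lib.git", "/opt/extra-lib")
```

Building the lines in Python keeps the repo URL and path in one place, and `shlex.quote` guards against paths with spaces.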

2 years ago
0 Hey There, I Would Like To Increase The

I think this should work 🤞

4 years ago