AgitatedDove14
Moderator
48 Questions, 8051 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation: 0
Badges: 25 × Eureka!
0 Hi, I'm Using The Autoscaler And Getting The Error

Hi CloudySwallow27

This error occurs randomly during training (in other words training does successfully start).

What's the clearml-agent version you are using, and the clearml version?

2 years ago
0 Hi Everyone! Quick Question: I Have A Script That Allows The Model To Be Saved Out In Case Of An Early Exit. At The Moment The Script Is Catching The Sigint And Sigterm Signals, Ending The Training And Writing Out The Model. I Understand I Could Use Check

SillyPuppy19 yes, you are correct. Actually, I can promise you the callback will be called from a different thread (basically the monitoring thread), so it's on the user to make sure the callback can handle it.
How about we move this discussion to GitHub?
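A minimal, generic sketch of a callback that is safe to invoke from another thread; the save_model helper and the lock are illustrative, not part of any ClearML API:

import threading

_save_lock = threading.Lock()
_already_saved = False

def save_model(path):
    # placeholder: persist the current model/optimizer state to `path`
    pass

def on_early_exit():
    # may be invoked from the monitoring thread, so guard against re-entry
    global _already_saved
    with _save_lock:
        if _already_saved:
            return
        _already_saved = True
        save_model("/tmp/checkpoint.pt")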

4 years ago
0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 MB Per Second As It's Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 KB Per Second. Does Anyone Have An Idea About This?

upload_artifact will actually do two things:
1. upload the file to the trains-server
2. register it as an artifact on the experiment
What did you mean by "register the artifact manually"? You still need to upload the file to the trains-server (so it is later accessible).
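A minimal usage sketch of the call being discussed (project, task and artifact names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload demo")

# uploads the file to the server (or the configured storage) and registers it
# as an artifact on this experiment
task.upload_artifact(name="test", artifact_object="/tmp/localfile")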

4 years ago
0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 MB Per Second As It's Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 KB Per Second. Does Anyone Have An Idea About This?

Anyhow, if StorageManager.upload was fast, upload_artifact calls that exact function, so I don't think we actually have an issue here. What do you think?
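For comparison, a rough sketch that times StorageManager directly; the upload_file arguments and the MinIO URL are assumptions to adapt to your setup:

from time import time
from clearml import StorageManager

tic = time()
StorageManager.upload_file(
    local_file="/tmp/localfile",
    remote_url="s3://127.0.0.1:9000/bucket/localfile",  # placeholder MinIO bucket
)
print("direct upload took", time() - tic, "seconds")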

4 years ago
0 I'm Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

maybe I should use explicit reporting instead of Tensorboard

It will do just the same 😞

there is no method for setting last iteration, which is used for reporting when continuing the same task. maybe I could somehow change this value for the task?

Let me double check that...

overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine ...

That is a very good point

but for the metrics, I explicitly pass th...
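For reference, one possible direction in later clearml versions is an explicit iteration offset when continuing a task; both continue_last_task and set_initial_iteration below are assumptions to verify against your clearml version:

from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="resumable training",
    continue_last_task=True,  # assumption: resume the previous task instead of creating a new one
)

last_completed_epoch = 42  # hypothetical: restored from your checkpoint
# offset subsequent reports so they do not overlap the pre-crash iterations
task.set_initial_iteration(last_completed_epoch)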

3 years ago
0 Hi, I Am Trying To Pull Api Data From /Tasks.Get_All Endpoint

DrabCockroach54 that is quite cool!
Basically here is what I would do:
1. Query Tasks that are both Running and do not have the system tag "development" (that means they are running on agents), and filter only tasks that started, say, 10 min ago.
2. Go over the list and check whether (1) they have a GPU scalar reported (meaning the GPU is accessible) and (2) the min/max/last value of GPU utilization is under 5%.
wdyt?
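A rough sketch of that flow; the filter keys, scalar titles and series names are assumptions to check against your clearml version:

from clearml import Task

# tasks currently running on agents (no "development" system tag)
running_on_agents = Task.query_tasks(
    task_filter={
        "status": ["in_progress"],
        "system_tags": ["-development"],
    }
)

for task_id in running_on_agents:
    task = Task.get_task(task_id=task_id)
    # (optionally also skip tasks that started less than ~10 minutes ago,
    #  e.g. by looking at task.data.started)
    scalars = task.get_last_scalar_metrics() or {}
    gpu = scalars.get(":monitor:gpu", {})
    if not gpu:
        print(task_id, "no GPU scalars reported (GPU probably not accessible)")
        continue
    utilization = {name: s for name, s in gpu.items() if "utilization" in name}
    if utilization and all(s.get("max", 0) < 5 for s in utilization.values()):
        print(task_id, "GPU utilization under 5%")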

2 years ago
0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 MB Per Second As It's Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 KB Per Second. Does Anyone Have An Idea About This?

None of them is problematic, this is what I'm trying to say πŸ™‚
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
from time import time

task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)

4 years ago
0 Hello! I'm Wondering If There Is An Option To Run A Termination Hook Script

Thanks SparklingHedgehong28
So I think I'm missing information on what you call "Instance protection" ?
You mean like respinning spot instances? Or is it a way to review the performance of the AWS ASG (i.e. like a watchdog of sorts)?

2 years ago
0 Hi, I'm Trying To Clone And Queue Experiments For Running Them On My Workers. I Am Able To Successfully Clone And Queue The Task, But Seems Like The Task Does Not Pass The Correct Parameters To My Python Script On The Worker. We Use Hydra For Configuring

Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages, remove the "Hydra" line, and replace it with just "hydra-core" (no need for a version).
The format is the same as "requirements.txt" and will affect the venv created by the agent.
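Concretely, in the cloned task's Installed Packages (requirements.txt format) you would replace the line

Hydra

with

hydra-core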

2 years ago
0 I'm Having Issues Running Trains-Agent On My Aws, It Seems To Not Be Able To Install Pytorch... I Have

With pleasure, I'll make sure we officially release RC1 soon :)

4 years ago
0 I'm Having Issues Running Trains-Agent On My Aws, It Seems To Not Be Able To Install Pytorch... I Have

I see the problem now: conda is failing to install the package from the git, then it reverts to pip install, and pip just fails... " //github.com/ajliu/pytorch_baselines "

4 years ago
0 I'm Having Issues Running Trains-Agent On My Aws, It Seems To Not Be Able To Install Pytorch... I Have

Please send the full log, I just tested it here, and it seems to be working

4 years ago
0 Hello! I Was Hoping I Could Get Some Debug Help. I've Set Up A Clearml Pipeline Using The Pipelinecontroller, And When Running Through

It just seems frozen at the place where it should be spinning up the tasks within the pipeline

And is there an agent for those? Usually there is one agent for running logic Tasks (like pipelines), started with --services-mode, which means multiple Tasks can be executed by the same agent, and other agents for compute Tasks that run a single Task per agent (but you can run multiple agents on the same machine).
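A rough sketch of that setup (queue names are placeholders; check clearml-agent daemon --help for the exact flags on your version):

# one agent dedicated to logic Tasks (pipelines), many Tasks at once
clearml-agent daemon --queue services --services-mode --detached

# one or more agents for compute Tasks, a single Task per agent
clearml-agent daemon --queue default --detached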

2 years ago
0 Is There A Way To Generate Usage Stats And Reports For Queues? For Example, How Often Is A Queue Used, How Much Cpu Does

Unfortunately not, the queues tab shows only the number of tasks in the queue, but not the resources used.

Oh, yes, that makes sense to add, I like that πŸ™‚
(the main question is what data is there in the backend DBs, let me know what I can get)

2 years ago
0 Hello Everyone! I'M Currently Trying To Set Up A Pipeline, And Am A Bit Confused At A Few Things. Some Questions I Have:

SteadySeagull18 btw: in the post-callback the node.job will be completed, because it is called after the Task is completed.
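A minimal sketch of wiring such a callback; the parameter and argument names follow current clearml docs but should be treated as assumptions:

from clearml import PipelineController

def post_step_callback(pipeline, node):
    # called after the step's Task has completed, so node.job refers to a finished job
    print("step", node.name, "finished, job:", node.job)

pipe = PipelineController(name="demo-pipeline", project="examples", version="0.1")
pipe.add_step(
    name="step_one",
    base_task_project="examples",      # placeholder
    base_task_name="some base task",   # placeholder
    post_execute_callback=post_step_callback,
)
pipe.start()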

2 years ago
0 Hello! Is There A Way To Avoid Or Accelerate

With default settings, to upload 2 datasets of 120 GB and 70 GB it took more than 6 hours!

SmugSnake6, in the end is this an outcome of limited bandwidth or limited CPU?

2 years ago
0 Hi People! I Think The Clearml

The image is allegroai/clearml:1.0.2-108

Yep, that makes sense, seems like a backwards compatibility issue

one year ago
0 Is There An Easy Way To Add A Link To One Of The Tasks Panels? (As An Artifact, Configuration, Info, Etc)? Edit: And Follow Up Regarding The Dataset. As Discussed Somewhere Previously, The Datasets Are Now Automatically Moved To A Hidden "Sub-Project" Pr

LOL love that approach.
Basically here is what I'm thinking,
from clearml import Task, InputModel, OutputModel

task = Task.init(...)

# run this part once
if task.running_locally():
    my_auxiliary_stuff = OutputModel()
    my_auxiliary_stuff.system_tags = ["DATA"]
    my_auxiliary_stuff.update_weights_package(weights_path="/path/to/additional/files")
    input_my_auxiliary = InputModel(model_id=my_auxiliary_stuff.id)
    task.connect(input_my_auxiliary, "my_auxiliary")

task.execute_remotely()
my_a...

2 years ago
0 Hi All! Is There Any Simple Way To Use
  • Yes, Task.init should be called on each subprocess (because torch forks them before they are patched; see the sketch after this list)
  • I think the main issue is that we patch argparse on the subprocess (this is assuming you did not manually parse non-argv arguments)
  • If you can create a mock test I think we can work around the issue, as long as the way you spin it is the standard pytorch distributed way
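A minimal sketch of the first point, assuming the standard torch.multiprocessing spawn flow (project and task names are placeholders):

import torch.multiprocessing as mp
from clearml import Task

def worker(rank, world_size):
    # per the advice above, call Task.init inside each spawned subprocess as well
    Task.init(project_name="examples", task_name="distributed run")
    # ... set up torch.distributed and run the training loop for this rank ...

if __name__ == "__main__":
    Task.init(project_name="examples", task_name="distributed run")
    mp.spawn(worker, args=(4,), nprocs=4)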
one year ago
0 Hi

πŸ‘

3 years ago