so the max values that I get can be reached at different epochs
yeah, that sounds right! thanks, will try
I don't connect anything explicitly, I'm using argparse, it used to work before the update
1 - yes, of course =) but it would be awesome if you could customize the content - to include key metrics and hyperparameters, for example
3 - hooooooraaaay
I'll get back to you with the logs when the problem occurs again
if you click on the experiment name here, you get a 404 because the link looks like this:
https://DOMAIN/projects/PROJECT_ID/EXPERIMENT_ID
when it should look like this:
https://DOMAIN/projects/PROJECT_ID/experiments/EXPERIMENT_ID
the code that is used for training the model is also inside the image
this would definitely be a nice addition. the number of hyperparameters in our models often goes up to 100
not necessarily, the command usually stays the same irrespective of the machine
yeah, I am aware of trains-agent, we are planning to start using it soon, but still, copying the original training command would be useful
I don't think so, because the max value of each metric is calculated independently of the other metrics
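roughly like this (toy example, the metric names and numbers are made up):
per_epoch = {
    "accuracy": [0.71, 0.80, 0.78],  # best at epoch 2
    "f1":       [0.65, 0.66, 0.70],  # best at epoch 3
}
# each metric's max is taken on its own, so the maxima can come from different epochs
best = {name: max(values) for name, values in per_epoch.items()}
print(best)  # {'accuracy': 0.8, 'f1': 0.7}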
that was tough, but I finally managed to make it work! thanks a lot for your help, I definitely wouldn't have been able to do it without you =)
the only problem that I still encounter is that sometimes there are random errors at the beginning of the runs, especially when I enqueue multiple experiments at the same time (I have 4 workers for 4 GPUs).
for example, this
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
sometimes randomly leads to FileNotFoundError: [Errno...
no, I even added the argument to specify tensorboard log_dir to make sure this is not happening
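just to show what I mean by specifying log_dir (a rough sketch, the tb_logs path is something I made up for the example):
import os, time
from torch.utils.tensorboard import SummaryWriter

# give every run its own directory instead of the default ./runs,
# so concurrent experiments don't collide while creating it
log_dir = os.path.join("tb_logs", time.strftime("%Y%m%d-%H%M%S"))
os.makedirs(log_dir, exist_ok=True)  # create the directory up front
writer = SummaryWriter(log_dir=log_dir)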
copy-pasting the entire training command into the command line
it might be that there is not enough space on our SSD, experiments cache a lot of preprocessed data during the first epoch...
nope, same problem even after creating a new experiment from scratch
python3 slack_alerts.py --channel trains-alerts --slack_api "OUR_KEY" --include_completed_experiments --include_manual_experiments
dunno if it's relevant, but I also added a new user to apiserver.conf today
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
we're using the latest versions of clearml, clearml-agent, and clearml-server, but we've been using trains/clearml for 2.5 years, so there are some old tasks left, I guess
I'm so happy to see that this problem has been finally solved!
perhaps I need to do task.set_initial_iteration(0)?
does this mean that setting initial iteration to 0 should help?
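something like this, I guess (just a sketch, the project and task names are placeholders):
from clearml import Task

task = Task.init(project_name="my_project", task_name="my_experiment")
# reset the reported iteration offset so scalars start from 0
# instead of continuing from the previous run's last iteration
task.set_initial_iteration(0)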