I mean, the python package, not the trains-server version
Logger.current_logger()
Will return the logger for the "main" Task.
The "Main" task is the task of this process, a singleton for the process.
All other instances create a Task object. You can have multiple Task objects and log different things to them, but you can only have a single "main" Task (the one created with Task.init).
All the auto-magic stuff is logged automatically to the "main" task.
Make sense ?
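For illustration, a minimal sketch of the "main" Task vs. additional Task objects (project/task names are placeholders, and Task.create is just one way to get a second Task):
```python
from clearml import Task, Logger

# The "main" Task: a per-process singleton created by Task.init()
task = Task.init(project_name='examples', task_name='main task')

# Logger.current_logger() always returns the main Task's logger
Logger.current_logger().report_text('logged to the main task')

# Any other Task object has its own logger you report to explicitly
other = Task.create(project_name='examples', task_name='side task')
other.mark_started()  # reporting requires the task to be started
other.get_logger().report_text('logged to the side task')
```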
Should work out of the box, as long as the task was started. You can forcefully start the task with: `task.mark_started()`
maybe you can also check `--version`, which returns the help menu
What do you mean? `--version` on clearml-task?
Ex:
```
K8S Glue pods monitor: Failed parsing kubectl output:
Expecting value: line 1 column 1 (char 0)
```
Run with --debug as the first parameter
Are you running the latest from the git repo ?
I have a client that runs clearml-session and I saw from the agent's logs that the installation of vscode fails.
That makes sense, it downloads vscode at runtime. Do you have an alternative location? Or maybe it is easier to build a container with vscode pre-installed?
Notice this is only when:
- using Conda as the package manager in the agent
- the requested python version is already installed (multiple python version installations on the same machine/container are supported)
Otherwise you can specify the python version to be used and conda will install it
clearml does that automatically (albeit it is not shown in the UI, which should be fixed)
We use NIfTI images; besides the 3D array, the image also contains voxel spacing, origin and direction in a world frame
Yep, makes sense ... you can just upload them as debug samples from local files.
I guess the main difference is the context, debug samples (used for debugging) vs artifacts (might be useful from other Tasks / context)
https://github.com/allegroai/clearml/blob/6b9297660e0ed83a77bce3da2fab384c552206fd/examples/reporting/image_reporting.py#L36
BTW: GreasyPenguin14 you can also upload them as debug samples (when setting the output_uri, the debug samples will be uploaded to the same destination)
https://github.com/allegroai/clearml/blob/6b9297660e0ed83a77bce3da2fab384c552206fd/examples/reporting/image_reporting.py#L21
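For example, a minimal sketch of reporting a locally saved image as a debug sample (the output_uri destination and file names are placeholders):
```python
from clearml import Task, Logger

# with output_uri set, debug samples are uploaded to the same destination
task = Task.init(project_name='examples', task_name='nifti debug samples',
                 output_uri='s3://my-bucket/clearml')  # placeholder destination

# report a local image file as a debug sample
Logger.current_logger().report_image(
    title='predictions',
    series='case_001',          # placeholder series name
    iteration=0,
    local_path='case_001.png')  # placeholder local file
```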
Is it possible to get the folder with the artifacts/models?
You can directly get the artifact/model URL and deduce the folder:
```python
task = Task.get_task('my_task_id')
print(task.artifacts['my artifact'].url)
```
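As a hedged sketch of deducing the folder from that URL (task id and artifact name are placeholders):
```python
import os
from clearml import Task

task = Task.get_task('my_task_id')           # placeholder task id
url = task.artifacts['my artifact'].url      # remote location of the artifact
print(os.path.dirname(url))                  # strip the file name -> folder

# or download a local copy and take its parent folder
local_path = task.artifacts['my artifact'].get_local_copy()
print(os.path.dirname(local_path))
```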
We are using k8s glue to spawn the job. ...
I think this is actual network latency, nothing to do with the jobs, could it be the server is very far away?
What happens when you manually start a Task from your machine ?
Is the latency fixed? Is it just when starting a new Task?
Hi SubstantialElk6
32 CPU cores, 64GB ram
Should be plenty, this sounds like a network bottleneck issue, I can't imagine the server is actually CPU bound
We have tried to manually restart tasks, reloading all the scalars from a dead task and loading the latest saved torch model.
Hi ThickKitten19
How did you try to restart them? How are you monitoring dying instances? Where / how are they running?
How does ClearML select the reference branch? Could it be that ClearML only checks the "origin" branch?
Yes 😞 I think we can quickly fix that, I'm just trying to figure out if there are downsides to running `git ls-remote --get-url` without "origin"
ElegantCoyote26 could be, if the Task run is under 30sec?!
Hi FancyWhale93
`pipe.start()` should actually stop the local pipeline logic execution and fire it on the "services" queue.
The idea is that you can launch the pipeline locally, but the actual execution of the entire logic is remote.
You can have the pipeline logic running locally if you call `pipe.start_locally()`,
or also run the steps locally (as sub-processes) with `pipe.start_locally(run_pipeline_steps_locally=True)`
BTW: based on your example, a more intuitive code might be the pi...
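For reference, a hedged sketch of the three launch modes (pipeline name/project are placeholders, steps omitted):
```python
from clearml import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
# ... add steps with pipe.add_step(...) or pipe.add_function_step(...)

# 1. fire the whole pipeline logic on the "services" queue:
# pipe.start()

# 2. run the pipeline logic locally, steps still executed by agents:
# pipe.start_locally()

# 3. run everything locally, steps as local sub-processes:
pipe.start_locally(run_pipeline_steps_locally=True)
```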
Hi GiddyTurkey39,
When you say trains agent, are you referring to the trains agent command ...
I mean running the `trains-agent daemon` on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in a virtual environment, or inside a docker)
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
EnviousStarfish54 you can use `Task.set_credentials`
Notice that OS environment or trains.conf will override the programmatic credentials
https://allegro.ai/docs/task.html#trains.task.Task.set_credentials
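A minimal sketch, with placeholder server addresses and keys:
```python
from trains import Task

# must be called before Task.init(); note that OS environment variables
# or trains.conf values will override these programmatic credentials
Task.set_credentials(
    api_host='https://api.trains.example.com',    # placeholder hosts
    web_host='https://app.trains.example.com',
    files_host='https://files.trains.example.com',
    key='YOUR_ACCESS_KEY',                        # placeholder credentials
    secret='YOUR_SECRET_KEY',
)
task = Task.init(project_name='examples', task_name='credentials demo')
```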
Your code should have worked, i.e. you should see the 'model.h5' in the artifacts tab. What do you have there?
It should look something like this one:
https://demoapp.trains.allegro.ai/projects/531785e122644ca5b85b2e19b0321def/experiments/e185cf31b2634e95abc7f9fbdef60e0f/artifacts/output-model
BTW:
To manually register any model:
```python
from trains import Task, OutputModel

task = Task.init('examples', 'my model')
OutputModel().update_weights('my_best_model.h5')
```
EnviousStarfish54 Sure, see scatter2d
https://allegro.ai/docs/examples/reporting/scatter_hist_confusion_mat_reporting/#2d-scatter-plots
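Along the lines of that example, a minimal sketch (random data as a stand-in):
```python
import numpy as np
from trains import Task

task = Task.init(project_name='examples', task_name='2d scatter')

# report 10 random (x, y) points as a 2D scatter plot
scatter = np.random.randint(10, size=(10, 2))
task.get_logger().report_scatter2d(
    title='example_scatter', series='series_xy', iteration=0,
    scatter=scatter, xaxis='x', yaxis='y', mode='markers')
```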
WackyRabbit7
we did execute locally
Sure, instead of `pipe.start()` use `pipe.start_locally(run_pipeline_steps_locally=False)`, this is it 🙂
It is for storing the predictions a trained model makes, so two different models do create slightly different images
That actually makes sense.
So how would you create exactly the same file (i.e. why do you need to manually control the upload folder, wouldn't creating a new unique folder suffice ?)
It is not possible to specify the full output destination, right?
Correct 😞
Hi ScatteredClams84
Is there any parameter that adjusts the "number of files that can be stored in the cache"? I am using clearml python version 1.0.3 to upload artifacts and get the artifacts back from a task.
Yes you are correct, the default value is 100 entries.
You can configure it in the clearml.conf, just add:
```
sdk.storage.cache.default_cache_manager_size = 1000
```
or from code:
```python
from clearml.storage.cache import CacheManager

CacheManager.get_cache_manager(cache_file_limit=1000)
```
Hi SmallDeer34
Generally, any torch.save(...) is logged/uploaded by clearml automatically. Specifically in your case I think the only missing one is the trainer_state.json, which I assume is a general json file, and I imagine is part of the huggingface framework. You can easily upload it as an additional artifact with `Task.upload_artifact`
wdyt?
Could I use "register artifact"
I think this is somewhat deprecated and we should probably replace it with something similar to what you mentioned (i.e. watch a file change).
Right now the easiest way would be to manually upload the trainer_state.json every checkpoint:
```python
Task.current_task().upload_artifact(name='state', artifact_object='trainer_state.json')
```
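As a hedged sketch, one way to do that on every checkpoint (the hook itself is hypothetical; huggingface's Trainer writes trainer_state.json into each checkpoint folder):
```python
import os
from clearml import Task

def upload_trainer_state(checkpoint_dir):
    # hypothetical helper: call whenever a checkpoint is written
    state_file = os.path.join(checkpoint_dir, 'trainer_state.json')
    if os.path.isfile(state_file):
        Task.current_task().upload_artifact(
            name='state', artifact_object=state_file)
```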