I see now.
Let's assume you know which snapshot that was:
```
from trains import Task

prev_task = Task.get_task(task_id='the_first_training_task_id')
# get the second-to-last checkpoint
prev_checkpoint_url = prev_task.models['output'][-2].url
prev_scalars = prev_task.get_reported_scalars()

new_task = Task.init('example', 'new task')
logger = new_task.get_logger()
# loop over the previous scalars and re-report them on the new task
for title, series_dict in prev_scalars.items():
    for series, data in series_dict.items():
        for x, y in zip(data['x'], data['y']):
            logger.report_scalar(title=title, series=series, value=y, iteration=int(x))
new_task.flush(wait_for_uploads=True)
new_task.set_initial_iteration(22000)
# start the training
```
Basically try with the latest RC 🙂
`pip install trains==0.15.2rc0`
SubstantialKoala71 not sure I follow, what's the goal here ?
is the model overridden or is its version automatically increased?
You will have another model, with the same name (assuming the second Task has the same name), but a new ID. So if I understand you correctly, we have auto-versioning :)
DilapidatedDucks58 trains-agent adds the artifactory URL as --extra-index-url. Are you sure you are getting the correct torch version in the container? Because the torch html is not an artifactory html, it is a list of links; I just want to make sure you are getting the correct version, because otherwise it can default to the CPU version, which we don't want 🙂 Anyhow, you can use the direct link in the "installed packages" and just put there https://download.pytorch.org/whl/nightly/cu101...
@<1571308003204796416:profile|HollowPeacock58> seems like an internal issue copying this object: config.model
This is a complex object, and it seems that for some reason it cannot be pickled / copied (see GH issue).
As a workaround, just do not connect this object.
Hmm, I think you should use --template-yaml
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi-core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?
JitteryCoyote63
I agree that its name is not search-engine friendly,
LOL 🙂
It was an internal joke; the guys decided to call it "trains" because, you know, it trains...
It was unstoppable, we should probably do a line of merch with AI 🙂 🙂
Anyhow, this one definitely backfired...
AntsySeagull45 kudos on sorting it out 🙂
Quick note: trains-agent will try to run the python version specified by the original Task, i.e. if you were running python3.7 it will first look for python3.7, and if it is not there it will run the default python3. This allows a system with multiple python versions to run exactly the python version you had on your original machine. The fact that it was trying to run python2 is quite odd; one explanation I can think of is if the original e...
Ohh now I get it...
Wait a couple of hours, 0.16 is out today with the trains-agent --stop flag 🙂
MassiveHippopotamus56
the "iteration" entry is actually the "max reported iteration over all graphs" per graph there is different max iteration. Make sense ?
EnviousStarfish54 generally speaking the hyper parameters are flat key/value pairs. You can have as many sections as you like, but inside each section, only key/value pairs. If you pass a nested dict, it will be stored as path/to/key:value (as you witnessed).
If you need to store a more complicated configuration dict (nesting, lists etc.), use connect_configuration; it will convert your dict to text (in HOCON format) and store that.
In both cases you can edit the configuration and then when ru...
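To make the distinction concrete, here is a minimal sketch (project/task names and values are placeholders, and exact signatures may vary slightly between versions):
```
from clearml import Task

task = Task.init(project_name='example', task_name='config demo')

# flat key/value hyper parameters: a nested dict gets flattened to path/to/key
params = {'lr': 0.01, 'schedule': {'step': 10, 'gamma': 0.5}}
task.connect(params)  # shows up as 'schedule/step', 'schedule/gamma'

# a richer nested configuration (lists, deep nesting) is kept as one
# editable text blob (HOCON) via connect_configuration
config = {'layers': [64, 64], 'optimizer': {'name': 'adam', 'betas': [0.9, 0.999]}}
config = task.connect_configuration(config)
```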
When you login with user/pass in the UI the same "process" happens and you get a Token to work with; this is the same as secret/key.
Since in both cases you provide credentials and get back an access token, it should work.
(This is of course only if you are setting user/pass manually and disabling pass_hashed, as you have.)
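In other words, something along these lines in the conf file should work (a sketch following the standard clearml.conf / trains.conf layout; the values are placeholders):
```
api {
    credentials {
        # user / pass used in place of access key / secret key
        # (only works when pass_hashed is disabled on the server)
        access_key: "myuser"
        secret_key: "mypassword"
    }
}
```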
It should work 🙂 as long as the versions match; if they don't, the venv will install the version you need (which is great, the only penalty is the install; download-wise it will be cached).
I think I found something, let me dig deeper 🙂
Any chance you can test with the latest RC ? 1.8.4rc2
Hi DisgustedDove53
Now for the clearml-session tasks, a port-forward should be done each time if I need to access the Jupyter notebook UI for example.
So basically this is why the k8s glue has --ports-mode.
Essentially you set up a k8s service (handling the ingress TCP ports), then the template.yaml that is used by the k8s glue should specify said service. Then the clearml-session knows how to access the actual pod via the parameters the k8s glue sets on the Task.
Make sense ?
GiganticTurtle0 is it just --stop that throws this error ?
btw: if you add --queue default to the command line I assume it will work. The thing is, without --queue it will look for any queue with the "default" tag on it, and since there are none, we get the error.
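i.e. something along the lines of `clearml-agent daemon --queue default --stop` (assuming you are running the daemon).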
Regardless, that should not happen with --stop.
I will make sure we fix it
Just so we do not forget, can you please open an issue on clearml-agent github ?
Neat! Please update on your progress; maybe we should add an upgrade section once you have the details worked out.
I think you are correct, it seems like it is missing the boto/azure/google requirements (I will make sure this is added). In the meantime, you can stop the "triton serving engine" Task, reset it, add boto3 to the installed packages, and relaunch.
That said, your main issue might be packaging the python model. Basically you need to create a model from the entire folder (with whatever is inside the folder); then Triton should be able to run it (if the config.pbtxt is correct).
` m = OutputMo...
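The snippet above is cut off, but a sketch of packaging a whole folder as a single model (assuming the OutputModel.update_weights_package API; paths and names are placeholders) might look like:
```
from clearml import Task, OutputModel

task = Task.init(project_name='serving', task_name='package triton model')

# package the entire folder (weights + config.pbtxt) as one registered model
m = OutputModel(task=task, name='my-model', framework='pytorch')
m.update_weights_package(weights_path='./triton_model_dir')
```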
Could it be that this is the callback that causes it?
To be honest, I'm not sure I have a good explanation of why... (unless in some scenarios an exception was thrown and silently caught, which caused it)
Hi AstonishingSwan80 , what do you mean by "ec2 API"?
I added the link just in case anyway
Smart move :)
DilapidatedDucks58, of course there is 🙂 Actually, with the latest pip 20.1 and the next RC it will be automatically detected and put into "installed packages"
You can treat the "installed packages" just like you would any other requirements.txt, just add: git+
https://github.com/ ...
and you are good to go
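For example, a line in the "installed packages" section could look like `git+https://github.com/<org>/<repo>.git@<branch>` (the repository path and branch are placeholders).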
Hi WorriedParrot51
Let me shed some light on this complicated mechanism, because this is not very straight forward.
Basically the agent signals the trains package that it should ignore the code calls and use a specific Task in the backend (i.e. in manual mode the trains package logs the data into the trains-server; in agent mode (remote mode) it does the opposite and takes the data from the trains-server "into" the code).
Specifically, just like in manual mode, calling argparse.parse is be...
or shall I call the Task.init even from the agent
WorriedParrot51 I think something is lost here.
Task.init() is always called, even when the agent is executing the code; the difference is in what happens inside the Task.init() call. When the codebase itself is executed by the trains-agent, it signals through OS environment variables to Task.init() that instead of creating a new task it should use the already created one. From this point all data flows from the trains-server back into the c...
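To illustrate (a minimal sketch; the script content is made up, only the Task.init() behavior is the point):
```
import argparse
from trains import Task

# The exact same call works in both modes. When trains-agent executes the
# code, OS environment variables signal Task.init() to attach to the already
# created Task instead of creating a new one, and data (e.g. parameters)
# flows from the trains-server into the code.
task = Task.init(project_name='example', task_name='remote-aware script')

parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.01)
# in agent (remote) mode the parsed values may be overridden by the server
args = parser.parse_args()
```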
I mean, the python package, not the trains-server version
CloudyHamster42 what's the trains-server version ?