Found it: definitely a bug in the callback. It has no effect on the HPO process itself
btw: you can also do cron for that:
@reboot sleep 60 && clearml-agent daemon ...
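If it helps, a complete crontab entry might look like the following (the queue name and flags here are assumptions for illustration; check `clearml-agent daemon --help` for your setup):

```
# edit with: crontab -e
# wait 60s after boot (for networking to come up), then start the agent on the "default" queue
@reboot sleep 60 && clearml-agent daemon --queue default --detached
```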
Sure thing, I'll fix the "create_draft" docstring to suggest it
It runs into the above error when I clone the task or reset it.
from here:
AssertionError: ERROR: --resume checkpoint does not exist
I assume the "internal" code state changed, and now it is looking for a file that does not exist. How would your code state change? In other words, why would it be looking for the file only when cloning? Could it be you put the state on the Task, then cloned it (i.e. cloned the exact same dict), and now the newly cloned Task "thinks" it is resuming?
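One way to make the script robust to this is to ignore a stale resume path instead of asserting. A minimal sketch, assuming the checkpoint path is a plain string parameter (the function name and behavior are my own, not a ClearML API):

```python
import os

def resolve_resume(resume_path: str) -> str:
    """Ignore a resume checkpoint path inherited from a cloned task's stored state.

    If the path was carried over by cloning but the file does not exist on this
    machine, start from scratch instead of raising an AssertionError.
    """
    if resume_path and not os.path.exists(resume_path):
        print(f"checkpoint {resume_path!r} not found, starting from scratch")
        return ""
    return resume_path
```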
I found "scheduler" on allegroai github, is it something related to the case I want to make?
MoodyCentipede68 it is exactly what you are looking for 🙂
Do notice that you need to make sure you have your services queue configured and running for that to work 🙂
Hi FierceHamster54
This is already supported. Unfortunately, the open-source version only supports static allocation (i.e. you can spin up multiple agents and connect each one to a specific number of GPUs); the dynamic option (where a single agent allocates jobs to multiple GPUs / slices) is only part of the enterprise edition
(there is the hidden assumption there that if you spent so much on a DGX you are probably not a small team 🙂)
Hi NastyOtter17
"Project" is so ambiguous
LOL yes, this is something GCP/GS is using:
https://googleapis.dev/python/storage/latest/client.html#module-google.cloud.storage.client
The agent is using Bash (but when you add a command line to the docker run, .bashrc is not executed, hence no conda in PATH)
Maybe add the full path to the conda executable:
docker_setup_bash_script = [
    "export PATH=/workspace/miniconda/bin:$PATH",
    "export LOCAL_PYTHON=/workspace/miniconda/bin/python3",
    "/workspace/miniconda/bin/conda activate /PATH_GOES_HERE"
]
Hi ThankfulOwl72 checkout TrainsJob object. It should essentially do what you need:
https://github.com/allegroai/trains/blob/master/trains/automation/job.py#L14
BoredHedgehog47 this is basically a wizard explaining the steps, see the 3 tabs 🙂
BTW, you can launch an experiment directly from CLI with clearml-task
https://clear.ml/docs/latest/docs/apps/clearml_task
🙂
I'm trying to create a task that is not in repository root folder.
JuicyFox94 if the Task is not in a repo folder, you mean it is in a remote repository, right?
This means the repo should be in the form of " https://github.com/ " or "ssh://"
It failed to deduce that this is a remote repository (maybe we can improve the auto-detection?!)
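For reference, the auto-detection essentially has to distinguish remote URL forms from local paths; a simplified illustration of the idea (not the actual clearml code):

```python
def looks_remote(repo: str) -> bool:
    """Crude check: treat these URL schemes/prefixes as a remote repository."""
    return repo.startswith(("https://", "http://", "ssh://", "git@"))
```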
Oh I see, basically a UI feature.
I'm assuming this is not just changing the x-axis in the UI, but somehow storing the x-axis as part of the reported scalars?
But do consider a sort of a designer's press kit on your page haha
That is a great idea!
Also you can use:
https://2928env351k1ylhds3wjks41-wpengine.netdna-ssl.com/wp-content/uploads/2019/11/Clear_ml_white_logo.svg
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
Seems like that's the problem. What do you have defined as files_server in trains.conf?
Things to check:
- Task.connect is called before the dictionary is actually used
- Just in case, do configs['training_configuration'] = Task.connect(configs['training_configuration'])
- Add print(configs['training_configuration']) after the Task.connect call, making sure the parameters were passed correctly
Hi @<1523701260895653888:profile|QuaintJellyfish58>
Is there a way or a trigger to detect when the number of workers in a queue reaches zero?
You mean to spin them down? What's the rationale?
Iād like to implement a notification system that alerts me when there are no workers left in the queue.
How are they "dropping" ?
Specifically to your question, let me check; I'm sure there is an API that gets that data, because you can see it in the UI 🙂
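In the meantime, the alert logic itself is simple: poll the worker count and notify on the transition to zero. A sketch under the assumption that you wire the count and the notification hook to the ClearML API and e.g. Slack yourself:

```python
def check_and_alert(worker_count: int, was_empty: bool, notify) -> bool:
    """Call notify() exactly once when the queue's worker count drops to zero.

    Returns the new "empty" state, to be passed back in on the next poll,
    so repeated polls while the queue stays empty do not re-alert.
    """
    is_empty = worker_count == 0
    if is_empty and not was_empty:
        notify()  # e.g. send a Slack/email message
    return is_empty
```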
Hi @<1523704198338711552:profile|RoughTiger69>
From this scenario can we assume the "selection" will be tagging the model manually?
Also, how would a human operator decide on the best model? That is, what is the input to base the decision on?
Hmm, this is a good question. I "think" the easiest is to mount the .ssh folder from the host to the container itself. Then also mount clearml.conf into the container with force_git_ssh_protocol: true, see here:
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
btw: ssh credentials, even though they sound more secure, are usually less so (since they easily contain too-broad credentials and other access rights), just my 2 cents 🙂 I ...
Maybe combine the two: with an unload gRPC API we could move that ability into the "preprocessing" logic, wdyt?
function and just seem to be getting an "isadirectory" error?
Can you post here what you are getting ? which clearml version are you using ?!
also tried manually adding
leap==0.4.1
in the task UI which didn't work.
That has to work, if it did not, can you send the log for the failed Task (or the Task that did not install it)?
The environment in the logs does show that leap is being installed potentially from a cache?
- leap @ file:///opt/keras-hannd...
I'm not sure this is configurable from the outside 🙂
BTW if the plots are too complicated to convert to interactive plotly graphs, they will be rendered to images and the server will show them. This is usually the case with seaborn plots
First let's try to test if everything works as expected. Since 405 really feels odd to me here. Can I suggest following one of the examples start to end to test the setup, before adding your model?
the only port configurations that will work are 8080 / 8008 / 8081
Hi @<1536518770577641472:profile|HighElk97>
Is there a way to change the smoothing algorithm?
Just like with TB, this is front-end, not really something you can control ...
That said you can report a smoothed value (i.e. via python) as additional series, wdyt ?
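For example, you could compute a TensorBoard-style exponential moving average yourself and report it as an extra series. A minimal sketch (the weight value is just an example; the commented reporting calls assume a ClearML task object):

```python
def ema_smooth(values, weight=0.6):
    """TensorBoard-style exponential moving average smoothing."""
    smoothed, last = [], values[0]
    for v in values:
        last = last * weight + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# then, assuming a ClearML task, report it as an additional series:
# logger = task.get_logger()
# for i, v in enumerate(ema_smooth(raw_losses)):
#     logger.report_scalar("loss", "loss_smoothed", value=v, iteration=i)
```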
Hi GracefulDog98
The agent will map the ~/.ssh folder automatically into the docker's /root/.ssh
It will also convert http links to ssh pull if you set force_git_ssh_protocol
in your clearml.conf :
https://github.com/allegroai/clearml-agent/blob/351f0657c3dcf707659875d7e0a52fa387709978/docs/clearml.conf#L25
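Putting it together, the relevant clearml.conf fragment might look like this (the explicit .ssh mount is an example for a non-default setup; the automatic mapping above usually makes it unnecessary):

```
agent {
    # convert http(s) git links to ssh before pulling
    force_git_ssh_protocol: true

    # example only: explicitly mount the host's .ssh folder read-only
    extra_docker_arguments: ["-v", "/home/user/.ssh:/root/.ssh:ro"]
}
```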
SubstantialElk6 I know they have full permission control in the enterprise edition; if this is something you need, I suggest you contact http://allegro.ai 🙂