AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 5 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 I’M Getting 404 Errors When Trying To Click Links For Notebook Artifacts And I’M Trying To Figure Out If It’S The File Or If It’S The File Server. Is There Some Sort Of Endpoint We Can Hit On The Fileserver To Verify It’S Available?

looks like at the end of the day we removed

proxy_set_header Host $host;

and use the fqdn for the proxy_pass line

And did that solve the issue?

3 years ago

0 Hi Everyone, Quick Question: Is There Any Easy Way To

ScantChimpanzee51 what's the use case for the full path without specific artifact?

one year ago

0 I Have Some Code That Launches Ml Tasks And It Accepts A Yaml File,

So the way it works: anything in the "extra_docker_shell_script" section is executed inside the container every time the container spins up. I'm thinking that the extra_docker_shell_script will pull the environment file from an S3 bucket and apply all "secrets" (or the secrets are embedded into the startup bash script, like "export AWS_SECRET=abcdef"). That said, this will not be on a per-user basis 😞
Does that help?

2 years ago

0 Hi, I Have One Doubt Related To Pipeline I Have One Pipeline With Eg 3 Tasks, Preprocess, Train And Test Now I Want To Clone The Pipeline And Change The Hyperparameters Of Train Task, Is It Possible? If So, How??

like this.. But when I am cloning the pipeline and changing the parameters, it is running on default parameters, given when pipeline was 1st run

Just making sure, you are running the cloned pipeline with an agent, correct?
What is the clearml version you are using?
Is this reproducible with the pipeline example?

one year ago

0 Hi, I'M Having Problems With The Installed Packages When Creating An Experiment. The Installed Packages Used To Be A List With The Versions Of All The Installed Packages In The Venv. However, Now I Get The Following:

Thank you, I would love to make sure we fix it

2 years ago

0 More Clarification On Documentation (Clearml Data):

Hi UnevenDolphin73

This differentiable storage - does it only work on file additions/removal, or also on intra-file changes?

This is on a file level, meaning that if you change a single byte in a file, the entire file will be packaged in the new version.
Make sense?
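
To illustrate what "file level" means, here is a minimal sketch (not the actual clearml-data implementation). It assumes each file is identified by a content hash; the use of SHA-256 and the helper names `file_hashes` and `changed_files` are assumptions made for this example only.

```python
import hashlib
from typing import Dict, List


def file_hashes(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file path to a hash of its full content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_files(old: Dict[str, bytes], new: Dict[str, bytes]) -> List[str]:
    """Return paths that are new or whose content hash differs from the old version."""
    old_h, new_h = file_hashes(old), file_hashes(new)
    return sorted(path for path, digest in new_h.items() if old_h.get(path) != digest)


# Hypothetical dataset versions: b.csv differs by a single byte, c.csv is new.
v1 = {"a.csv": b"1,2,3\n", "b.csv": b"4,5,6\n"}
v2 = {"a.csv": b"1,2,3\n", "b.csv": b"4,5,7\n", "c.csv": b"x\n"}

# Whole files are selected for the new version, never partial byte ranges.
print(changed_files(v1, v2))  # ['b.csv', 'c.csv']
```

Even though only one byte of b.csv changed, the whole file is picked up for the new version, which is the behavior described above.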

2 years ago

0 Is The App/Ui/Backend Customizable? Any Tutorials For That?

CleanWhale17 what is " Online-Training Support(for Dataset Shifts" ?

4 years ago

0 Hello! Does Anyone Know How To Do

Glad to hear!
(yeah ColossalReindeer77, I'm with you, the override is not intuitive. I'll pass the info to the technical writers; hopefully they can find a way to make it easier to understand)

one year ago

0 Hello, We Are Currently Working On A Hyperparameter Tuning Job For Object Detection Following This Tutorial

DeterminedToad86
So based on the log it seems the agent is installing:
torch from https://download.pytorch.org/whl/cu102/torch-1.6.0-cp36-cp36m-linux_x86_64.whl
and torchvision from https://torchvision-build.s3-us-west-2.amazonaws.com/1.6.0/gpu/cuda-11-0/torchvision-0.7.0a0%2B78ed10c-cp36-cp36m-manylinux1_x86_64.whl

See in the log:
Warning, could not locate PyTorch torch==1.6.0 matching CUDA version 110, best candidate 1.7.0
But torchvision is downloaded from the cuda 11 folder...
I...

3 years ago

0 Hii Guys, So I'Ve Got A Question About About Agents Using Ssh Connection. In The Docs (Here

Actually it is better to leave it as is; it will just automatically mount the .ssh folder into the container. I will make sure the docs point to this option first.

one year ago

0 Hi. I'M Encountering A Problem With

BTW:

If I try to find the right model in the task.models["output"] (this time there is just one but in my code there may be several) it appears with the (see other attached screenshot).

What would make sense here? (I have to be honest, I'm not sure.)
To be specific, there is the "model name", which is not unique, and there is the model-key, which is unique to the Task (i.e. task.models["output"]["model-key"])

one year ago

0 Hi. When Using Sklearn'S

DistressedGoat23

We are running a hyperparameter tuning (using some cv) which might take a long time and might be even aborted unexpectedly due to machine resources.
We therefore want to see the progress

On the HPO Task itself (not the individual experiments, but the one controlling it all) there is the global progress of the optimization metric. Is this what you are looking for? Am I missing something?

one year ago

0 Hello Everyone, I Deployed Clearml (

That seems like the k8s routing, can you try the web server curl?

3 years ago

0 Hi, I Would Like To Follow-Up In This

JitteryCoyote63 oh dear, let me see if we can reproduce (version 1.4 is already in internal testing, I want to verify this was fixed)

2 years ago

0 Hello Folks! I Don'T Know If This Issue Has Already Been Addressed. I Have A Basic Pipelinecontroller Script With Two Steps: One Of Task Is For Preprocessing Purposes And The Other For Training A Model. Currently I Am Placing The Code Related To The Pack

I see, that means xarray is not an actual package but a folder added to the python path.
This explains why Task.add_requirements fails, as it is supposed to add python packages to the equivalent of "requirements.txt" ...
Is the folder part of the git repository? How would you pass it to the remote machine the clearml-agent is running on?
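
A quick way to tell the two cases apart locally is sketched below, using only the standard library. The helper names are made up for this example, and note the assumption that the import name and the installed distribution name match, which is not always true (e.g. a module imported under one name may be published under another).

```python
import importlib.util
from importlib import metadata


def is_importable(module_name: str) -> bool:
    """True if Python can find the top-level module on sys.path (package OR plain folder)."""
    return importlib.util.find_spec(module_name) is not None


def is_installed_distribution(dist_name: str) -> bool:
    """True only if pip-style metadata exists, i.e. it could be listed in requirements.txt."""
    try:
        metadata.distribution(dist_name)
        return True
    except metadata.PackageNotFoundError:
        return False


# "json" is importable (it ships with Python) but is not an installed
# distribution, the same situation as a folder that was added to the python path.
for name in ("json", "definitely_not_a_real_package_xyz"):
    print(name, is_importable(name), is_installed_distribution(name))
```

If a name is importable but has no distribution metadata, it cannot be resolved from a requirements list, so the code itself has to reach the remote machine some other way (for example as part of the git repository).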

3 years ago

0 {"Detail":"Error Processing Request: Error: Failed Loading Preprocess Code For 'Py_Code_Best_Model': [Errno 2] No Such File Or Directory: '/Root/.Clearml/Cache/Storage_Manager/Global/Cd46Dd0091D71B5294Dc6870Ac6D17Dc..._Artifacts_Archive_Py_Code_Best_Model

Nice!!!

one year ago

0 Pytorch Lightning Question About Logging A Figure. I Have The Following Code:

DefeatedCrab47 yes that is correct. I actually meant if you see it on the tensorboard's UI 🙂
Anyhow, if it is there, you should find it in the Task's Results, under Debug Samples

3 years ago

0 Is The App/Ui/Backend Customizable? Any Tutorials For That?

CleanWhale17 nice ... 🙂
So the answer is Trains supports the Pipeline / Automation of it, but lacks that dataset integration (that is basically up to you to manage, with either artifacts or any other method)
The Allegro Enterprise allows you to rerun the code, on a new version of the dataset from the UI (or automation) without changing a single line of code 🙂

4 years ago

0 Reducing Docker Container Spin-Up Time With Clearml Agent

Woot woot!
Awesome, this RC is stable, so feel free to use it. The official release is probably due out next week :)

2 years ago

0 Hi There, I'Ve Been Trying To Work With Trains And I Wanted To Save A Folder As The Model Like When Using The "Transformers" Library. They Have This "Save_Pretrained" Method To Their Models. It Saves The Pytorch Model And You Detect It Well, But Only That

Hi PompousBeetle71, Trains will log all the torch.save calls; I'm assuming they do not actually use it for the rest of the files in that folder.
If you would like to share a code snippet, we could see if we could auto-magically log it. You could use artifacts and store the entire folder. It will zip it and upload it. Then you can reuse it from other experiments. https://allegro.ai/docs/task.html?highlight=artifact#trains.task.Task.upload_artifact
Example:
` task.upload_artifact('transformer', './my_...

4 years ago

0 Hi, I Am Currently Experimenting With Clearml Pipelines In Offline Environments And I Am Trying To Establish A Best Practice For Changing And Maintaining Dependencies In The Pipeline Tasks. I Noticed That Clearml Automatically Pulls The Requirements.Txt

ShakyOstrich31

I am reusing an old task ...

Which means that the old Task stores the requirements on the Task itself (see "Installed Packages" section). Notice it also stores the exact git commit to use.
When you are cloning the Task (i.e. in the pipeline), you should probably:
(1) set the commit / branch to the latest in the branch
(2) clear the "installed packages" section, which would cause the agent to use the "requirements.txt" stored in the git repo itself.
As far as I understand this s...

2 years ago

0 Hello, I'M A Bit Lost In The Docs For The Mlops, I Have Script Which Already Integrate Clearml Logging, Should I Use Clearml-Task To Launch It On An Agent ? (I Already Have A Clearml-Server And A Clearml-Agent Running).

Hi VirtuousFish83
Apologies for the documentation in the docs 🙂 It sounds complicated but actually should be relatively simple. Based on what I understand, you already have the server set up and your code integrated. The question is "can you see an experiment in the UI"? If you do, then you can right click it, clone the experiment, edit parameters and send for execution (enqueue). If the experiment is not in the UI you can either (1) run the code with the Task.init call, it will automatica...
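
The clone / edit parameters / enqueue flow from the UI can also be scripted. The sketch below is only an illustration: it assumes the clearml `Task` class is passed in by the caller, and the task id, the parameter name "General/learning_rate" and the queue name "default" are placeholders, not values from this thread.

```python
from typing import Any, Dict, Optional


def clone_and_enqueue(
    task_cls: Any,
    source_task_id: str,
    overrides: Dict[str, Any],
    queue_name: str,
    new_name: Optional[str] = None,
) -> Any:
    """Clone an existing task, override some parameters, and queue it for an agent."""
    # Same as "clone" in the UI: creates a new draft copy of the experiment.
    cloned = task_cls.clone(source_task=source_task_id, name=new_name)
    # Same as editing the parameters of the draft before it runs.
    cloned.update_parameters(overrides)
    # Same as "enqueue" in the UI: an agent listening on the queue picks it up.
    task_cls.enqueue(task=cloned, queue_name=queue_name)
    return cloned


# Intended usage (assumes clearml is installed and configured against your server):
#   from clearml import Task
#   clone_and_enqueue(Task, "<source-task-id>", {"General/learning_rate": 0.01}, "default")
```

Passing the class in keeps the helper importable without a server connection; in a real script you would likely just call `Task.clone`, `update_parameters` and `Task.enqueue` directly.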

3 years ago

0 Hey Guys, Sorry For The Rapid Fire Questions In The Past Few Days. I Have Another Issue Though. I Initially Ran A Task, Directly From A Repo. It Succesfully Installed The Requirements From The Requirements File In The Repo And Ran The Task Without Any Iss

however when I clone or reset said task after completion and then enqueue it again, I get the above error.

This part is somewhat confusing... There is no magic happening behind the scenes, cloning a Task and creating it, is basically the same ... Do you have a reference to the YOLOv5 code base itself, maybe I can figure out what's the issue?

2 years ago

0 Hi All, I Have A Question Regarding Multi-Node Training Using The Clearml-Agent. What Is The Recommended Setup In This Case? Say I Have 3 Nodes With 3 Agents Running On Them. How Do I Make Sure They All Run The Same Job?

Hi ExcitedFish86
Good question, how do you "connect" the 3 nodes? (i.e. what is the framework you are using?)

3 years ago

0 Hey

When you login with user/pass in the UI the same "process" happens and you get a Token to work with, this is the same as secret/key
Since in both cases you provide credentials and get back access token, it should work
(This is of course only if you are setting user/pass manually and disabling pass_hashed as you have)

10 months ago

0 Is There A Way To Interface With Clearml Agent (Cli?) To Handle Model Repositories And Data Versioning (But So, Not Experimentation, Tight Integration, Pipelining, Etc)?

UnevenDolphin73 FYI: clearml-data is documented, unfortunately only on GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md

3 years ago

0 In Pipelinev2, Is It Possible To Register Artifacts To The Pipeline Task? I See There Is A Private Variable

Yep 🙂 but only in RC (or github)

2 years ago

0 I’M Getting These Errors When Using Agent In Docker Mode

btw, I launch the agent daemon outside docker (with --docker), that’s the way it is supposed to work right?

Yep, that should work.
Is it?

3 years ago

0 I Have A Set Up An Agent, On A Gpu Machine, And Spun Up The Daemon In Docker Moder, And Specifically Specified A Gpu That It Will Work With. The Image Is Okay And I Verified That By Running

LOL

4 years ago

0 Hello, I Have Been Using Clearml Interactive Session For More Than 3 Months And I Am Facing With Random Ssh Disconnection Errors In Vscode Once In A While After Creating The Session. Sometimes Reconnecting Works, If It Does Not Work I Reconnect The Clear

Hi UpsetSeaturtle37
What's your clearml-session version? Where is the remote machine?
And yes, we have seen this behavior when the network connection is bad; you can try with --keepalive=true
Notice that these are SSH networking issues, not something to do with the clearml-session layer. The --keepalive option tries to automatically detect these disconnects and make sure it reconnects for you.

4 months ago
