Nice debugging experience
Kudos on the work!
BTW, I feel weird adding an issue on their GitHub, but someone should; this generic setup will break all sorts of things ...
For setting up trains-server I would recommend the docker-compose route; it is very easy to set up, and you just need a single fixed compute instance. Details: https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
With regards to the "low prio clusters": are you asking how they could be connected with the trains-agent, or whether running code that uses trains will work on them?
Hmm, you either need to run with sudo or make sure the running user has docker run permissions
I'll try to go with this option, I think it's actually perfect for my needs
Great!
I'm just curious how the trains server on different nodes communicates about the task queue
We start manually: we tell the agent to just execute the task (notice we never enqueued it). If all goes well, we will get to the multi-node part 🙂
Interesting, do you think you could PR a "fixed" version?
https://github.com/allegroai/clearml-web/blob/2b6aa6043c3f36e3349c6fe7235b77a3fddd[…]app/webapp-common/shared/single-graph/single-graph.component.ts
- This then looks for a module called `foo`, even though it's just a namespace
I think this is the issue, are you using Python package namespaces?
(this is a PEP feature that is really rarely used, and I have seen it break too many times)
Assuming you have `from foo.mod import ...`, what are you seeing in pip freeze? I'd like to see if we can fix this and better support namespaces
The problem is that the configuration is loaded at import time, so there is no "time" to pass anything other than an environment variable.
That said, if the only difference is the server config, you can use `Task.set_credentials`
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
Hi RattySeagull0
I'm trying to execute trains-agent in docker mode with conda as package manager, is it supported?
It should. That said, we really do not recommend using conda as a package manager (it is a lot slower than pip, and can create an environment that will be very hard to reproduce due to conda's internal "compatibility matrix", which might change from one conda version to another)
"trains_agent: ERROR: ERROR: package manager "conda" selected, but 'conda' executable...
Merged, is it working for you now?
Yep, everything (both conda and pip)
The only downside is that you cannot see it in the UI (or edit it).
You can now do:
data = {'datatask': 'idhere'}
task.connect(data, 'DataSection')
This will create another section named "DataSection" on the configuration tab; then you will be able to see/edit the input Task.id
JitteryCoyote63 what do you think?
I see TrickyFox41, try the following:
--args overrides="param=value"
Notice this will change the Args/overrides argument that will be parsed by hydra to override its params
Like this. But when I am cloning the pipeline and changing the parameters, it runs with the default parameters given when the pipeline was first run
Just making sure: you are running the cloned pipeline with an agent, correct?
What is the clearml version you are using?
Is this reproducible with the pipeline example ?
RobustRat47 I think you have to use the latest clearml package for that (1.6.0)
SmarmySeaurchin8 it could be a switch; the problem is that when you have automatic stopping flows, they will abort a task, which is legitimate (i.e. it should not be considered failed)
How come you have aborted tasks in the pipeline? If you want to abort the pipeline, you need to first abort the pipeline Task, then the tasks themselves.
is there a way that i can pull all scalars at once?
I guess you mean from multiple Tasks? (If so, then the answer is no; this is on a per-Task basis)
Or, can i get experiments list and pull the data?
Yes, you can use Task.get_tasks to get a list of task objects, then iterate over them. Would that work for you?
https://clear.ml/docs/latest/docs/references/sdk/task/#taskget_tasks
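A minimal sketch of iterating over tasks and collecting their scalars. The nested dict shape used here (title → series → x/y lists) mirrors what `task.get_reported_scalars()` returns, as far as I recall; the `sample` payload below is hand-made for illustration, not real data:

```python
def collect_scalars(scalars):
    """Flatten {title: {series: {'x': [...], 'y': [...]}}} into
    a list of (title, series, x, y) rows."""
    rows = []
    for title, series_map in scalars.items():
        for series, data in series_map.items():
            for x, y in zip(data["x"], data["y"]):
                rows.append((title, series, x, y))
    return rows

# Real usage would look something like:
# from clearml import Task
# for t in Task.get_tasks(project_name="my-project"):  # project name is a placeholder
#     rows = collect_scalars(t.get_reported_scalars())

# Hand-made payload mimicking the get_reported_scalars() shape:
sample = {"loss": {"train": {"x": [0, 1], "y": [0.9, 0.5]}}}
rows = collect_scalars(sample)
```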
Also SoreDragonfly16, could you test whether the issue exists with trains==0.16.2rc0?
Hi @<1801424298548662272:profile|ConvolutedOctopus27>
I am getting errors related to invalid git credentials. How do I make sure that it's using credentials from local machine?
configure the git_user/git_pass (app key) inside your clearml.conf on the machine with the agent:
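For reference, a minimal sketch of the relevant `clearml.conf` section (the values shown are placeholders; use your own git username and an app key / personal access token as the password):

```
agent {
    # Credentials the agent uses when cloning repositories
    git_user: "my-git-user"   # placeholder
    git_pass: "my-app-token"  # placeholder: app key / personal access token
}
```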
Any chance you can test with the latest RC? 1.8.4rc2
@<1523707653782507520:profile|MelancholyElk85> what are you trying to change? Maybe there is a better way?
BTW: if you do step_base_task.export_task() you can take the parts you need from the dict and pass them to the task_overrides argument in add_step (you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
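The flattening mentioned above can be sketched in plain Python (independent of the ClearML API; the `exported` dict here is a made-up stand-in for what `export_task()` would return):

```python
def flatten(d, prefix=""):
    """Flatten a nested dict into dot-separated keys,
    e.g. {'script': {'branch': 'main'}} -> {'script.branch': 'main'}."""
    flat = {}
    for key, value in d.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat

# exported = step_base_task.export_task()  # real call from the message above
exported = {"script": {"branch": "main", "repository": "https://example.com/repo.git"}}
overrides = flatten(exported)
# overrides now holds dot-separated keys suitable for task_overrides
```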
I think for it to work you have to have ssh running on the host machine (the socket client itself), no?
Of what task? I'm running lots of them and benchmarking
If you are skipping every installation it should be the same
because if you set CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 it will not install anything at all
This is why it's odd to me...
wdyt?
DilapidatedDucks58 use a full link, without the package name: git+
With pleasure 🙂
However, there is still a delay of approximately 2 minutes between the completion of setup,
Where is that delay in the log?
(btw: it seems your container is missing clearml-agent & git, installing those might add some time)
Hi SmarmyDolphin68
See some details here:
https://allegro.ai/docs/deploying_trains/trains_server_config/#network-and-security
Basically get an Azure load-balancer; it can also add https on top of the http connection.
Check the details on load-balancers here
https://allegro.ai/docs/deploying_trains/trains_server_config/#sub-domains-and-load-balancers
I think this is the one:
https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview
@<1523704157695905792:profile|VivaciousBadger56>
Is the idea here the following? You want to use inversion-of-control such that I provide a function `f` to a component that takes the above dict as an input. Then I can do whatever I like inside the function `f` and return a different dict as output. If the output dict of `f` changes, the component is rerun; otherwise, the old output of the component is used?
Yes exactly! This way you...
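A toy sketch of that rerun-vs-reuse behavior (every name here is hypothetical, purely to illustrate the control flow: the component is rerun only when its input dict, i.e. the upstream output, changes):

```python
def run_component(component, inputs, cache):
    """Re-run `component` only when its input dict changed;
    otherwise reuse the cached output."""
    key = tuple(sorted(inputs.items()))  # hashable fingerprint of the inputs
    if key not in cache:
        cache[key] = component(inputs)   # inputs changed -> rerun
    return cache[key]

calls = []
def f(d):
    """User-provided function: record the call, return a derived dict."""
    calls.append(d)
    return {"out": d["x"] * 2}

cache = {}
a = run_component(f, {"x": 1}, cache)  # runs f
b = run_component(f, {"x": 1}, cache)  # same input -> cached, f not called again
c = run_component(f, {"x": 2}, cache)  # input changed -> reruns f
```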