AgitatedDove14

48 Questions, 8051 Answers

Active since 10 January 2023

Last activity 7 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8051

0 I Cloned It And Scheduled It To The Default Queue, But It Is Not Being Processed. Is The Default Queue By Default Not Usable?

WickedGoat98 did you set up a machine with trains-agent pulling from the "default" queue?

3 years ago

0 When I Do

is this a config file on your side or something I can change, if we had enterprise version?

Yes, this is one of the things you can configure

2 years ago

0 Hi, Guys, I Have Some Questions: 1. Can I Backup All My Experiments? 2. Can I Add My Old Experiments To A New Server? 3. Can I Add Some Information To One Experiment Which Was Finished (Maybe I Want To Reevaluate Some Model)?

Hi SubstantialBaldeagle49
2. Sure, follow the backup procedure and restore on the new server
3. Yes
task=Task.get_task(task_id='aa')
task.get_logger().report_scalar()
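
A slightly fuller sketch of the two lines above, assuming the `clearml` package is installed and configured. The task ID `'aa'`, the title/series names and the value are placeholders, and `report_reevaluation` is a hypothetical helper name, not part of ClearML:

```python
def report_reevaluation(task, value, iteration=0,
                        title="re-evaluation", series="accuracy"):
    """Attach one extra scalar point to an existing (e.g. finished) task.

    `task` is expected to be a clearml Task; only Task.get_logger() and
    Logger.report_scalar() are used here.
    """
    logger = task.get_logger()
    logger.report_scalar(title=title, series=series,
                         value=value, iteration=iteration)
    return logger


def main(task_id="aa"):  # 'aa' is the placeholder ID from the answer above
    # Imported lazily so the helper can be read and tested without a server.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    report_reevaluation(task, value=0.93, iteration=0)  # 0.93 is made up
    task.get_logger().flush()  # push pending reports before exiting
```

Calling `main()` with a real task ID should add the new point under the chosen title/series in that task's scalars.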

4 years ago

0 Hi, From Time To Time Due To Connectivity Issues My Tasks Can't Report To The Server For 5-20 Mins And Fail Because Of That. Is There Any Way To Adjust Something In The Configuration File To Deal With That?

ContemplativeGoat37 I think there was an issue just like the one you described, and it was solved in later versions. Upgrade to the latest clearml package version and you should be fine 🙂

2 years ago

0 Hi Everyone, How Do I Integrate Sagemaker With Clearml, Currently I Only See Wandb Integrated With The Hugging Face And Don't See Any Tutorials On Clearml, I Am Fine Tuning A Llama Model And Following This

Hi @<1549202366266347520:profile|GorgeousMonkey78>

how do I integrate sagemaker with clearml ,

you mean to launch an experiment, or just to log it?

one year ago

0 Hello! I Faced The Issue With Hyper Parameters Optimization. When I Try To Run Optimization I Receive An Error:

Hi VastShells9
2022-12-20 12:48:02,560 - clearml.automation.optimization - WARNING - Could not find requested hyper-parameters ['duration'] on base task a6262a151f3b454cba9e22a77f4861e3
Basically, it is telling you that it is setting a parameter it never found on the original Task you want to run the HPO on.
The parameter name should be (based on the screenshot) "Args/duration" (you have to add the section name to the HPO params). Make sense?
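
A minimal sketch of the naming fix, assuming the parameter lives in the "Args" section as in the screenshot. `with_section` and `build_ranges` are hypothetical helper names, and the numeric bounds in the usage note are invented for illustration:

```python
def with_section(name, section="Args"):
    """Return the name in 'Section/name' form; leave it alone if already prefixed."""
    return name if "/" in name else f"{section}/{name}"


def build_ranges(bounds, section="Args"):
    """Turn {'duration': (min, max, step)} into HPO integer ranges.

    Requires clearml; imported lazily so with_section() works on its own.
    """
    from clearml.automation import UniformIntegerParameterRange

    return [
        UniformIntegerParameterRange(
            with_section(name, section),
            min_value=low, max_value=high, step_size=step,
        )
        for name, (low, high, step) in bounds.items()
    ]


print(with_section("duration"))  # prints: Args/duration
```

For example, `build_ranges({"duration": (10, 100, 10)})` would produce a range named "Args/duration" that can be passed in the optimizer's hyper-parameter list.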

one year ago

0 Hello, I Have A Problem With Task.Set_Initial_Iteration(0) In Google Colab. After Continuing The Experiment, Gaps Appear On My Graph, But If You Use Colab. I Tried It On My Computer And Everything Is Normal There.

I can't think of any actual difference in flow ...
Can you try the following?
task._setup_reporter()
task.set_initial_iteration(0)

3 years ago

0 Hi, I Want To Pass Environment Variables From The Host To The Docker Containers Running My Task. I Managed To Use

but is there any other way to get env vars / any value or secret from the host to the docker of a task?

If this is docker, passing -e/--env as an argument would do the same:
-e VAR=somevalue
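
A small sketch of building those `-e` arguments from the host environment. `docker_env_args` is a hypothetical helper, and how the resulting arguments reach the container (agent configuration, task docker arguments, etc.) depends on your setup and is not shown here:

```python
import os
import shlex


def docker_env_args(names, environ=None):
    """Build ['-e', 'NAME=value', ...] for the variables that are actually set."""
    environ = os.environ if environ is None else environ
    args = []
    for name in names:
        if name in environ:  # silently skip variables missing on the host
            args += ["-e", f"{name}={environ[name]}"]
    return args


# Example with an explicit mapping instead of the real host environment
example = docker_env_args(["VAR", "NOT_SET"], {"VAR": "somevalue"})
print(shlex.join(example))  # prints: -e VAR=somevalue
```

`shlex.join` quotes values that contain spaces or shell metacharacters, so the string stays safe to paste into a command line.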

3 years ago

0 Hi

I'll check what we can do on running in a daemon subprocess

3 years ago

0 Quick Question On

Hi SarcasticSparrow10
You will need to have multiple trains-agents, but they will be sharing the same queue (i.e. pulling jobs from the same queue the HPO process is pushing to)
Make sense?

4 years ago

0 Hi All, I'm Using Clearml 1.0.3 With Clearml-Server <1 (How Do I Get The Current Running Version?) In Pytorch-Lightning I Use Ddp And I See Multiple Tasks (As The Number Of Gpus) Being Created And Remaining In Draft Mode. Is It A Problem Running Clearml

Task.init should be called before pytorch distribution is called, then on each instance you need to call Task.current_task() to get the instance (and make sure the logs are tracked).

3 years ago

Is this your case?

3 years ago

0 Hi, A Question About Dataset Storage Suppose I Create A Dataset Like This

Hi MelancholyElk85
So the way datasets now work is that they are actually an entity (folder) inside a project, all under the hidden .datasets sub project
This is so all data and tasks are on the same project, but at the same time will not intersect with subprojects by the same name. Does that make sense?

one year ago

0 Hello People

Hmm not sure, try the latest anyhow 🙂

2 years ago

0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'd Say 1 Out Of 5 Experiments). The Symptoms

Hmm, #790 should be solved in 1.7.2
Yes, I always see the "model uploaded completed" for such stuck tasks
Any chance this is reproducible?
How many processes do you see running (i.e. ps -Af | grep python)?
What is the training framework? Is it multiprocess? How are you launching the process itself? Is it Linux OS? Is it running inside a specific container?

2 years ago

0 Hello! I Get The Following Error In Results->Console After A Task Is Sent For Remote Execution (Using Sdk):

AttractiveCockroach17
Can you print the configuration to console when you start the run (you will get a local print and then later the remote print)? Are they the same? Are the 3 runs the same (local / remote print)?

2 years ago

0 Hi All, I'm Updating My Code To Use Hydra, And Facing An Issue: When I Try To Init A Task In Offline Mode I'm Getting The Following:

Hi RipeGoose2
I just tested the hydra example, it seems to work when you set offline mode right after the import:
` from clearml import Task

Task.set_offline(True) `

3 years ago

0 Hello All

That makes total sense.
So right now you can probably use clearml-session to spin a session in any container, add the jupyterhub to the requirements like so:
clearml-session --packages jupyterhub
Then ssh + run jupyterhub + tunnel port?
ssh root@IP -p 10022 -L 6666:localhost:6666
$ jupyterhub
Would that work?
Maybe it is better to add an option to use jupyterhub instead of jupyterlab?
wdyt?

2 years ago

0 Hi. I Am Experimenting With

PanickyMoth78

LockException: [Errno 11] Resource temporarily unavailable

I'm not sure I understand how you got to this error (obviously creating datasets and getting them back works), what is unique in the setup/flow itself ?

2 years ago

0 Any Idea Why I Get This Error In All My Agents

Seems like settings on the clearml-server disappeared (specifically default queue tag?!)

3 years ago

0 Hi, I’m Getting This Error When I Try To Run Task On A Remote Agent With Docker Mode Web Ui:

Sorry just saw it,
https://github.com/allegroai/clearml-agent/commit/918dd39b87501dc873354b7cc5c9efa933650897

2 years ago

0 Is There A Link Which Describes The Differences In Community And Enterprise Versions

PompousParrot44 Enterprise licensing pricing is usually custom tailored to the size of the company and based on usage. If you are interested, feel free to leave details in the "contact us" form on the website, and someone from sales will contact you shortly after.

4 years ago

0 How Do I Think About Tasks/Task_Name-S? Do I See Right If I Run The Same Task With The Same Name, It Overwrites The Previous Run? Is It Possible To Fail If The Task Already Exists And Need

ahh, because task_id is the "real" id of a task

Yes the ID is a global system wide unique ID (regardless of the project etc.)

Maybe we will call tasks as

slug_yyyymmdd

Notice that you can just copy-paste the link in the address bar, it will bring you to the exact same view, meaning it is easily shared among users 🙂 You can, but I would actually use the Task ID. This also means that programmatically you can do task=Task.get_task(task_id_here) and interact and query a...

2 years ago

0 Hi Everyone! Is There A Way To Specify The Working Directory In A Pipeline Component? I’m Using Pipelines From Decorators, I Can Set The Repo Url Just Fine, But I’m Running Everything From A Subfolder, And The Working Dir Is Set To

This would work to load the local modules, but I’m also using poetry and the

pyproject.toml

is in the subdirectory, so the agent won’t install any dependency if I don’t set the

work_dir

hmmm true, in terms of requirements, you can list them in the decorator (see packages argument)

10 months ago

0 Dear All, Great To Join Your Community. We Are Working On Plant Growth Stage Models At Basf For Farmers And I Was Wondering If Clearml Can Be Used Also For Data Versioning Of Tabular Data, Structured Data. I Would Like To Track If This And That Row Is Par

How can I track in clearML that this and that row was part of experiment x because it belonged to test/training data set y?

Hi @<1543766544847212544:profile|SorePelican79>
The experiments themselves will have a link to the Dataset they were using. From a dataset perspective, the idea is not to limit you, so essentially it will package all your files and retrieve them when you fetch the dataset. In terms of specifying a row / sample, my suggestion is to mark those rows when training a...

one year ago

0 How To Use

👍

3 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

😄

4 years ago

0 Is The App/Ui/Backend Customizable? Any Tutorials For That?

I would recommend reading this blog post, it should give you a glimpse of what can be built 🙂
https://medium.com/pytorch/how-trigo-built-a-scalable-ai-development-deployment-pipeline-for-frictionless-retail-b583d25d0dd

4 years ago

0 I Am Getting This Specific Message When Trying To Run Hyper Parameters Optimization (Running Remotely My Task). Does It Affect My Flow? Do I Have Something To Worry About?

Hi EmbarrassedSpider34
Long story (see below) short, yes you can ignore this warning :)

Specifically, torch is spinning up processes and killing them, and every process has a reference to the parent semaphore (for internal clearml bookkeeping). Python is not very good with this kind of thing (it is getting better in newer python versions). Bottom line: python "thinks" someone lost a semaphore, but the reality is that the subprocess never created it in the first place. Does that make sen...

2 years ago

0 Is There Any Specific Version Of Numpy You Recommend To Use With Clearml Python Library? I Am Building An Python Alpine Docker Image With Clearml==1.7.2 But It Breaks When Building Image From Dockerfile.

Oh found it:
temp.linux-aarch64-cpython-39
this is Arm?!

2 years ago
