AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 Hey

tried it and restarted the agent, but not working properly

What do you mean by not working? Can you provide logs?

one year ago

0 Hi, I Run The Trains Server In A Docker Container And Started Making Use Of Tasks ... My Tests Are Shown On The Projects Dashboard Which Is Really Cool. What I Haven't Found So Far Is A Way To Clean Up The System From The Tests I Did. I'm Able To Archive

Another point I see is that in the workers & queues view the GPU usage is not being reported

It should be reported. If it is not, maybe you are running the trains-agent in CPU mode? (Try adding --gpus.)

3 years ago

0 ..

@<1539780284646428672:profile|PoisedElephant79> you could turn off certificate verification.

one year ago

0 Hello People

Hmm not sure, try the latest anyhow 🙂

2 years ago

0 Hi, I Encountered A Few Problems:

FierceFly22 wow, that is a cool hack! Trains will capture any torch.save, so I think the actual driver here is the 'model.summary'. You can also upload any artifact with task.upload_artifact('name', 'modelsummary.txt')
Touching a file will not trigger Trains, as it does not monitor the files themselves. Make sense?
BTW, how will you get the file when running with the agent? If you are using the connect_configuration it will be downloaded from the trains-server for you. Otherwise you can alw...

4 years ago

0 Hi, I'M Trying To Clone And Queue Experiments For Running Them On My Workers. I Am Able To Successfully Clone And Queue The Task, But Seems Like The Task Does Not Pass The Correct Parameters To My Python Script On The Worker. We Use Hydra For Configuring

Wait, it shows "hydra==2.5" not "hydra-core==x.y" ?

2 years ago

0 So I Bumped Onto This Comparison Shared By Dagshub. It Kinda Placed Clearml Is A Rather Bad Position Compared To Everything Else In The Industry.

TrickySheep9
you are absolutely correct 🙂

3 years ago

0 Hi, I’m Currently Running Clearml With Pytorch And Everytime I Run Into

PompousHawk82 unfortunately this is kind of binary, either you have full tracking of load/save operations or you do not.
This warning message will disappear in the next version as we will be able to log multiple models under the same Task :)

3 years ago

0 Hi All, I've Successfully Run A Task Locally, And Now I'm Trying To Clone It And Send It To A Queue. It Looks Like The Environment Is Built Successfully, But It Hangs Here:

This is so odd.
Could you add prints right after the Task.init?
Also, could you verify it still gets stuck with the latest RC?

clearml==1.16.3rc2

2 months ago

0 I Know I Can Run This Manually In Step By Step But Wondering If This Can Be Automated As Scheduled Tasks

DAG which get scheduled at given interval and

Yes, that is exactly what will be part of the next iteration of the controller/service

an example achieving what i propose would be greatly helpful

Would this help?
from trains.automation import TrainsJob

job = TrainsJob(base_task_id='step1_task_id_here')
job.launch(queue_name='default')
job.wait()

job2 = TrainsJob(base_task_id='step2_task_id_here')
job2.launch(queue_name='default')
job2.wait()

4 years ago

0 Hi Guys, I Have Been Running The Clearml-Serving For A While Now And I Realize That From Time To Time After A Couple Of Hours The Serving Task (Control Plane) That Is Configured Through The Cli Goes Into Status Abort. This Happens Even Though All The Pods

Woot woot, great to hear 🎊

7 months ago

0 Hello, We Are Currently Working On A Hyperparameter Tuning Job For Object Detection Following This Tutorial

give me a minute to test

3 years ago

0 Hey There, Happy New Year To All Of You

Did you experience any drop in performance using forkserver?

No, seems to be working properly for me.

If yes, did you test the variant suggested in the pytorch issue? If yes, did it solve the speed issue?

I haven't tested it. That said, it seems like a generic optimization of the DataLoader.

3 years ago

0 Hi, I Am New Here, Can I Ask Question On Trains-Server Also?

The cool thing about using the trains-agent is that you can change any experiment parameters and automate the process, so you get hyper-parameter optimization out of the box, and you can build complicated pipelines:
https://github.com/allegroai/trains/tree/master/examples/optimization/hyper-parameter-optimization
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py

4 years ago

0 I Wonder If There Is A Way To Setup

I see TightElk12

You can always set the OS environment variables CLEARML_API_HOST, CLEARML_WEB_HOST and CLEARML_FILES_HOST with the correct configuration. Or you can simply set CLEARML_NO_DEFAULT_SERVER=1, which will prevent any usage of the default demo server. wdyt?
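
A minimal sketch of the two options, assuming the variables are set from Python before clearml is imported. The helper names and the host URLs are hypothetical placeholders; only the variable names come from the answer above.

```python
import os


def set_clearml_hosts(api_host, web_host, files_host, env=os.environ):
    """Option 1: point the SDK at your own server (helper name is hypothetical)."""
    env["CLEARML_API_HOST"] = api_host
    env["CLEARML_WEB_HOST"] = web_host
    env["CLEARML_FILES_HOST"] = files_host


def forbid_default_server(env=os.environ):
    """Option 2: prevent any usage of the default demo server."""
    env["CLEARML_NO_DEFAULT_SERVER"] = "1"


# Placeholder addresses (assumptions); replace them with your own server.
set_clearml_hosts(
    api_host="http://clearml.example.internal:8008",
    web_host="http://clearml.example.internal:8080",
    files_host="http://clearml.example.internal:8081",
)
forbid_default_server()
```

Exporting the same variables in the shell that launches the script should have the same effect, since both end up in the process environment.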

3 years ago

0 Hi, I Would Like To Check What Would Be The Recommended Hardware Specs For The Server Host Clearml Server. I Had One Configured With 32 Cpu Cores, 64Gb Ram And I Noticed That If We Have A Surge In Remote Task Creation, The Following Delays Occurs.

Hi SubstantialElk6

32 CPU cores, 64GB ram

Should be plenty. This sounds like a network bottleneck issue; I can't imagine the server is actually CPU bound

3 years ago

0 If I Have A Dataset And I Process It And I Want The Processed Data As Another Dataset, Is Parent The Right Approach?

Parent makes sense if you are changing the data of the parent version but some data is preserved. In that case the delta-based storage only stores the diff.
If everything is different, and you call sync for example, then it will not reference any previous "snapshot", so there will be no redundancy in storage, but you still get a pointer to the "parent" version.
Make sense?
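
To make the "only store the diff" idea concrete, here is a toy illustration. It is not ClearML's storage implementation; the `delta` function and the file hashes are made up for the example.

```python
def delta(parent_files, child_files):
    """Return what a child version must store, given its parent.

    Both arguments map a relative file path to a content hash.
    """
    added = {p for p in child_files if p not in parent_files}
    modified = {
        p for p in child_files
        if p in parent_files and child_files[p] != parent_files[p]
    }
    removed = {p for p in parent_files if p not in child_files}
    return {"added": added, "modified": modified, "removed": removed}


# Hypothetical versions: a.csv is preserved, b.csv changed, c.csv dropped, d.csv new.
raw = {"a.csv": "h1", "b.csv": "h2", "c.csv": "h3"}
processed = {"a.csv": "h1", "b.csv": "h2-clean", "d.csv": "h4"}

diff = delta(raw, processed)
to_store = sorted(diff["added"] | diff["modified"])
print(to_store)  # ['b.csv', 'd.csv']
```

When nothing is shared with the parent, every file lands in "added", which matches the point above: no redundancy, just a pointer back to the parent.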

3 years ago

0 Hi

(Also, could you make sure all posts regarding the same question are put in the thread of the first post to the channel?)

one year ago

0 I’m Getting These Errors When Using Agent In Docker Mode

clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0

If the user running this command can run "docker run", then you should be fine
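
One way to check this ahead of time is sketched below. It assumes that a successful `docker info` is a reasonable proxy for being able to use `docker run`; the helper name `can_run_docker` is hypothetical.

```python
import shutil
import subprocess


def can_run_docker(timeout=10):
    """Return True if the current user can reach the Docker daemon."""
    # No docker binary on PATH means the agent cannot start containers at all.
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero exit code usually means the daemon is down or permission was denied.
    return result.returncode == 0


print("docker usable:", can_run_docker())
```

Run it as the same user that will start the clearml-agent daemon, since Docker access is decided per user.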

3 years ago

0 Hi, I Have One Doubt Related To Pipeline I Have One Pipeline With Eg 3 Tasks, Preprocess, Train And Test Now I Want To Clone The Pipeline And Change The Hyperparameters Of Train Task, Is It Possible? If So, How??

How are you building your pipeline?

one year ago

0 Hi! I Deployed Clearml Server Along With Jupyterhub On Azure K8S (Aks). The Way It Works Is That Every User Is Assigned A New Pod That Is Spawned With A Docker Image Of A Choice (One Of Them With Clearml Sdk Installed). I Managed To Configure Most Of The

GreasyPenguin66 Nice !!!
Very cool setup, and kudos on making it work with multiple users!
Quick question, shouldn't the JUPYTERHUB_API_TOKEN env variable be enough to gain access to the server? Why did you need to add it to the 'nbserver-x.json' as well?

3 years ago

0 Does The New 2.0 Helm Charts (App Ver 1.1.0) Not Support Nfs?

neat! please update on your progress, maybe we should add an upgrade section once you have the details worked out

3 years ago

0 Hello There! I Was Trying To Update The Url For Debug Samples After Migration Of The Server To A New Domain And Was Following The Steps From Here:

Hi @<1684010629741940736:profile|NonsensicalSparrow35>

But the provided command is missing the url target for the curl so it is not complete.

Not sure I followed. Did you specify "NEW_ADDRESS"?
Or is it that in both cases the URL is localhost?

6 months ago

0 Hi Again! I Am Doing Batch Inference From A Parent Task (That Defines A Base Docker Image). However, I've Encountered An Issue Where The Task Takes Several Minutes (Approximately 3-5 Minutes) Specifically When It Reaches The Stage Of "Environment Setup Co

Here this new entry in the log is 2 min after env completed =>

1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415

This seems to be something in your code. Just add print("starting") in your entry python file, before any imports (because they might actually do something).
From the agent's perspective, after printing "Starting Task Execution:" it literally calls the python script, nothing else...

7 months ago

Hi @<1529633468214939648:profile|CostlyElephant1>
what seems to be the issue? I could not locate anything in the log

"Environment setup completed successfully
Starting Task Execution:"

Do you mean it takes a long time to set up the environment inside the container?

CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL,

It seems to be working: as you can see, no virtual environment is created, and the only thing that is installed is the clearml-agent that i...

7 months ago

0 Hi Guys, How Does Allegro Keep Track Of The Requirements (I'm Running The Scripts On A Remote Train-Agent With

SmugOx94 could you please open a GitHub issue with this request, otherwise we might forget 🙂
We might also get some feedback from other users

3 years ago

0 Hi. I Have A

Are you saying this component should pull a specific git repo?
PipelineDecorator.component(..., ) seems to have no reference to a specific repo (the arguments repo, repo_branch, etc. are missing). Is that correct?

2 years ago

0 Hi, I Have This Python Package That's Located On My Base Image..(E.G. /Code/App/Flair). Within The Folder There's A Package Called Flair And A Data.Py File. I Appended Python Path With /Code/App/Flair In My Base Image And Execute It Using K8S Glue. In T

I appended python path with /code/app/flair in my base image and execute

The python path is changing since it installs a new venv into the system.
Let me check what's going on with the PYTHONPATH, because it is definitely changed when running the code (the code base root folder is added to it). Maybe we need to make sure that if you had PYTHONPATH pre-defined we restore it.
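
Until that is sorted out, a possible workaround (an assumption, not an official fix) is to re-add the folder at the top of the entry script. The helper name `ensure_on_path` is hypothetical; the directory is the one from the question.

```python
import os
import sys


def ensure_on_path(directory, path=sys.path):
    """Append `directory` to the import path if it is not already there."""
    directory = os.path.normpath(directory)
    if directory not in (os.path.normpath(p) for p in path):
        path.append(directory)
        return True
    return False


# Run this before importing the package that lives in the base image.
ensure_on_path("/code/app/flair")
```

The call is idempotent, so it is harmless if the agent already preserved the original PYTHONPATH.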

3 years ago

0 Hi Again, I Was Wondering What Would Be A Good Practice With Respect To Saving Different Datasets (While Preprocessing It In Several Steps/Stages). Mainly With The Use Of Remove_Files(). Is It Ok To Delete Raw Data After Preprocessing For Example? In That

Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cache folders, and clearml takes care of cache cleanup.
That said, notice that a mutable copy is written to a target you specify; in this case you should definitely delete it after usage. Wdyt?

one year ago

0 Hello All, I'm Trying To Adapt Clearml With My Workflow. I Installed A Server At My Server, With Workers Attached To It. I'm Trying To Execute A Task From My Local Within One Of My Workers. Trying To Use Docker Mode And A Custom Image. I Also Have A Local

Ohh yes, if the execution script is not on git and git exists, it will not add it (it will add it if it is in a tracked file, via the uncommitted changes section).
ZanyPig66, in order to expand the support to your case, can you explain exactly which files are on git and which are not?

2 years ago
