Hmm, let me check, there is a chance the level is dropped when manually reporting (it might be reserved for internal critical reports). Regardless, I can't see any reason we couldn't allow controlling it.
Let me check if we can reproduce it
WackyRabbit7 this is funny, it is not ClearML providing this offering
some generic company grabbed the open-source and put it there, which they should not have 🙂
Hmm can you run the agent in debug mode, and check the specific console log?
```
clearml-agent --debug daemon --foreground ...
```
Did you set `force_git_ssh_protocol: true`?
https://github.com/allegroai/clearml-agent/blob/249b51a31bee97d63f41c6d5542e657962008b68/docs/clearml.conf#L39
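i.e. in the agent's clearml.conf, roughly:
```
agent {
    # convert git http(s) links to ssh so the agent can use your ssh keys
    force_git_ssh_protocol: true
}
```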
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call: `task.set_initial_iteration(0)`
I wonder if using our own containers, which should have most of the deps, will work better than a simpler container.
Why not, it's transparent, just run in --docker mode and provide a default docker image if the Task doesn't specify one.
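e.g. something like (the image name here is just an example):
```
clearml-agent daemon --queue default --docker nvidia/cuda:11.8.0-runtime-ubuntu22.04 --foreground
```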
Hi @<1541954607595393024:profile|BattyCrocodile47>
Can you help me make the case for ClearML pipelines/tasks vs Metaflow?
Based on my understanding (rough ClearML-side sketch after the list):
- Metaflow cannot have custom containers per step (at least I could not find where to push them)
- DAG-only execution, i.e. you cannot have logic-driven flows
- cannot connect git repositories to different components in the pipeline
- Visualization of results / artifacts is rather limited
- Only Kubernetes is supported as underlying prov...
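To illustrate the first three points on the ClearML side, a minimal sketch using the pipeline decorators (the docker images, repo URL and branching logic below are just placeholders):
```
from clearml import PipelineDecorator

# each component can run in its own container and even come from its own git repo
@PipelineDecorator.component(docker="python:3.10", repo="https://github.com/org/preprocess.git")
def preprocess(n: int):
    return n * 2

@PipelineDecorator.component(docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04")
def train(n: int):
    return n + 1

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="1.0")
def run(n: int = 1):
    a = preprocess(n)
    # logic-driven flow, not a fixed DAG
    if a > 1:
        a = train(a)
    return a

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    run(3)
```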
TroubledHedgehog16 if you have a preinstalled conda env then why would you need to reinstall it from a yml file? Also if this is the default python env, clearml-agent will inherit from it and use it (no real overhead there)
Notice the reason for "inheriting system" python environments is so that the agent could cache the individual Task requirements, meaning next time it will not need to reinstall anything
wdyt?
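For reference, the relevant knobs in the agent's clearml.conf would be roughly (values here are just examples):
```
agent {
    package_manager {
        # let the Task venv inherit the preinstalled system / conda packages
        system_site_packages: true
    }
    venvs_cache {
        # cache resolved Task environments so nothing is reinstalled next time
        path: ~/.clearml/venvs-cache
        max_entries: 10
    }
}
```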
I'm trying to queue a task in python but I'd like to reuse the prior task ID.
is it your own Task? i.e. you enqueue yourself; if this is the case use task.execute_remotely
it will do just that.
If this is another Task, then if it is aborted you can just enqueue it; by definition it will continue with the same Task ID.
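i.e. something like (the queue name is a placeholder):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="my task")
# stops the local execution and enqueues this exact Task (same Task ID) for an agent
task.execute_remotely(queue_name="default", exit_process=True)
```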
If I point directly to the data.yaml the training starts without any problem
what do you mean? how do you know where the extracted file is?
basically:
data_path = Dataset.get(...).get_local_copy()
then you should be able to open your file with open(data_path + "/data.yaml", "rt")
does that work?
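i.e. roughly (dataset name / project are placeholders):
```
from clearml import Dataset

# fetch a local cached copy of the dataset, then read the yaml from inside it
data_path = Dataset.get(dataset_name="my_dataset", dataset_project="my_project").get_local_copy()
with open(data_path + "/data.yaml", "rt") as f:
    print(f.read())
```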
Thanks for the ping ConvolutedChicken69, I missed it 🙂
From what I see in the docs it's only for Jupyter / VS Code, I didn't see anything about PyCharm
PyCharm is basically SSH, which is supported 🙂
(Maybe we should mention it in the docs?)
Worker just installs by name from pip, and it installs a package that is not mine!
Oh dear ...
Did you configure additional pip repositories in the Agent's clearml.conf? https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L77 It might be that (1) is not enough, as pip will first try to search for the package in the pip repository, and only then in the private one. To avoid that, in your code you can point directly to an https link of your package Ta...
Does it work if I launch the clearml-agent in a docker and pip doesn't know the packages to install?
Not sure I follow... the "detect_with_pip_freeze" flag (when set) will tell clearml (at runtime) to create the "installed packages" directly from pip freeze (instead of analyzing the code)
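i.e. in the client's clearml.conf, something like:
```
sdk {
    development {
        # store the full `pip freeze` output as the Task "installed packages"
        detect_with_pip_freeze: true
    }
}
```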
MoodyCentipede68 seems you did not pass any configuration (os env or conf file), so it does not know how to find the server and authenticate. Make sense?
I want to call that dataset on my local PC without downloading
when you say "call" what do you mean? the dataset itself is a set of files compressed and stored in the clearml file server (or on your S3 bucket etc.)
Hi SmoothSheep78
Do you need to import the previous state of the trains-server, or are you starting from scratch?
Hi @<1555362936292118528:profile|AdventurousElephant3>
I think your issue is that Task supports two types of code,
- single script/jupyter notebook
- git repo + git diff
In your example (if I understand correctly) you have a notebook calling another notebook, which means the first notebook will be stored on the Task, but the second notebook (not being part of a repository) will not be stored on the task, and this is why when the agent is running the code it fails to find the second notebook....
Hmm reading this: None
How are you checking the health of the serving pod ?
Hi SmoggyGoat53
There is a storage limit on the file server (basically a 2GB per-file limit), this is the cause of the error.
You can upload the 10GB to any S3-like solution (or a shared folder). Just set the "output_uri" on the Task (either at Task.init or with Task.output_uri = "s3://bucket")
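For example (bucket name is a placeholder):
```
from clearml import Task

# send models / artifacts to S3 instead of the clearml file server
task = Task.init(project_name="examples", task_name="large artifacts",
                 output_uri="s3://my-bucket/clearml")
# or, after init:
task.output_uri = "s3://my-bucket/clearml"
```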
Is ClearML combined with DataParallel or DistributedDataParallel officially supported / should that work without many adjustments?
Yes, it is supported and should work.
If so, would it be started via python ... or via torchrun ... ?
Yes it should, hence the request for a code snippet to reproduce the issue you are experiencing.
What about remote runs, how will they support the parallel execution?
Supported. You should see in the "script entry" something like "-m torch.di...
CheerfulGorilla72
yes, IP-based access,
hmm so this is the main downside of using an IP-based server: the links (debug images, models, artifacts) store the full URL (e.g. http://IP:8081/... ). This means if you switch the IP they will no longer work. Any chance you can set the new server to use the old IP?
(the other option is somehow edit the DB with the links, I guess doable but quite risky)
Hi CharmingBeetle38
On the base task, do you see those arguments under the Configuration tab?
Also, if they are under the Args section, you should add the "Args/" prefix to the HP optimization (this is how you differentiate between the sections)
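i.e. roughly (the base task id and the parameter range are placeholders):
```
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",
    hyper_parameters=[
        # note the "Args/" section prefix
        UniformIntegerParameterRange("Args/batch_size", min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    execution_queue="default",
)
optimizer.start()
```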
Hi ScantChimpanzee51
How are you launching the code ?
Basically the easiest way is to do so with the example you just mentioned,
Can this issue be reproduced ?
CurvedHedgehog15 there is no need for: `task.connect_configuration(configuration=normalize_and_flat_config(hparams), name="Hyperparameters")`
Hydra is automatically logged for you, no?!
CharmingBeetle38 try adding "General/" before the arguments. This means batch_size becomes General/batch_size. This is only because we are accessing the parameters externally; when the task is executed it is resolved automatically
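e.g. when changing the value from the outside (the task id is a placeholder):
```
from clearml import Task

cloned_task = Task.get_task(task_id="<cloned_task_id>")
# section prefix + parameter name when setting values externally
cloned_task.set_parameters({"General/batch_size": 64})
```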
BTW: you can quite easily add an option to set the offline folder, check here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/config/init.py#L31
PRs are always appreciated :)