NFS version 3
That's the thing: NFS automatically sets file access and flags based on the mount options; you cannot change them post-mount.
How about creating a new user just for the agent? It makes sense from a security / credentials perspective.
SubstantialElk6
Notice that if you are using a manual setup, the default is "secure: false"; you have to change it to "secure: true":
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L251
Hi CheekyFox58
If you are running the HPO+training on your own machine, it should work just fine in the Free tier
The HPO with the UI and everything is designed to run the actual training on remote machines, and I think that makes it a Pro feature.
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything.
Sure man 🙂 no rush, I appreciate the gesture regardless of the outcome
Many thanks!
Looking at the supervisor method of the base AutoScaler class, where are the worker IDs kept? Is it in the class attribute queues?
Actually the supervisor passes a fixed prefix, then asks the clearml-server for workers whose names start with it.
This way we can have a fixed init script for all agents, while still differentiating them from the other agent instances in the system. Make sense?
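Something along these lines (a minimal sketch, not the actual AutoScaler code; the prefix value is a hypothetical example):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
prefix = "aws_asg_"  # hypothetical fixed prefix set in the agents' init script

# list all registered workers and keep only the ones this supervisor spawned
workers = [w for w in client.workers.get_all() if w.id.startswith(prefix)]
print([w.id for w in workers])
```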
Thanks SubstantialElk6 !
Happy new year 🎉 🍺 🍾 🎇
What we would like ideally, is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models.
This is very aligned with the goals of ClearML 🙂
I would like to understand more about what is currently missing in ClearML so we can better support this approach
My inexperience; I hadn't used them much until recently. I can see how that is a better solution
I think I failed in explaining myself, I me...
What about output_uri?
If you are using StorageManager directly, output_uri is not relevant
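For example (a minimal sketch; the file and bucket paths are hypothetical), StorageManager uploads straight to whatever destination URL you give it, so output_uri never comes into play:
```python
from clearml import StorageManager

# the upload goes directly to the URL below; the task-level output_uri is ignored
remote_url = StorageManager.upload_file(
    local_file="model_weights.bin",  # hypothetical local file
    remote_url="s3://my-bucket/models/model_weights.bin",  # hypothetical bucket
)
print(remote_url)
```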
Hi UnsightlySeagull42
Could you test with the latest RC?
pip install clearml==1.0.4rc0
Also could you provide some logs?
Hi SubstantialElk6
ClearML-Data doesn't actually "load" the data; it fetches it locally and returns a folder with all your data files. From that point onward, it's up to your code to load them into the framework. Make sense?
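Something like this (a minimal sketch; the dataset name/project are hypothetical):
```python
from clearml import Dataset

# fetch the dataset files into a local cached folder
dataset = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
local_folder = dataset.get_local_copy()

# from here on, loading the files into your framework is your code's job
print("data files are under:", local_folder)
```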
hmmm I see...
It seems to miss the fact that your process does use the GPU.
Maybe the GPU is only used later in the run?
Does that make sense?
available agent, i.e. not running anything else.
I mean how long would instance 1 wait until instance 2 of the experiment is up and running?
In other words, what happens if all the nodes/agents are busy and we still "need" an additional instance?
This is basically like "pre-allocating" the nodes, only they wait in real-time until the additional node joins them.
Agent A pulls the 3-node Task, the Task clones itself (Task B) and enqueues it on the "very high priority queue". Task A waits until Task B is ru...
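Roughly, the pattern could look like this sketch (not built-in ClearML multi-node logic; the queue name is a hypothetical example):
```python
from clearml import Task

task_a = Task.current_task()

# the running task clones itself and enqueues the clone on a priority queue
task_b = Task.clone(source_task=task_a, name=task_a.name + " (node 2)")
Task.enqueue(task_b, queue_name="very_high_priority")  # hypothetical queue

# Task A blocks until Task B is actually picked up and running
task_b.wait_for_status(status=[Task.TaskStatusEnum.in_progress])
```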
Ohh okay, something seems to half-work in terms of configuration: the agent has enough configuration to register itself, but fails to pass it to the task.
Can you test with the latest agent RC: 0.17.2rc4
Hi DeliciousBluewhale87
clearml-agent 0.17.2 was just released with the fix, let me know if it works
CooperativeFox72 yes, 20 experiments in parallel means you always have at least 20 connections coming from different machines, and then you have the UI adding on top of that. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more process workers, you just need to make sure the machine has enough memory to support it.
GreasyPenguin14 thank you! that will make our life a lot easier 🙂
What do you mean by cache files? The cache is machine-specific and is set in the clearml.conf file.
Artifacts / models are uploaded to the files server (or any other object storage solution)
I was hoping that there's a universal flag somewhere. Asking this because I want all the Models and Artifacts to be stored in one place and the users shouldn't have to edit their configuration files.
You mean like make sure all models/artifacts are always uploaded?
Hmm, so the way the configuration works is: it loads the default configuration (equivalent to the example in the docs), then adds ~/clearml.conf on top. That means you can tell your users to just copy-paste the credentials from the UI into a template you make. How is that?
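And if you'd rather enforce it in code than in the template, output_uri can also be set at Task.init time (a minimal sketch; project/bucket names are hypothetical):
```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="training",
    # all models/artifacts from this task go to one place
    output_uri="s3://my-bucket/clearml-artifacts",  # hypothetical bucket
)
```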
Thanks MagnificentSeaurchin79 ! This code snippet is exactly what I needed, let me check if I can reproduce it.
Thanks!
I think this one will cover both cases (the issue is with files on the root of the dataset):
if not (fnmatch(k, path) and fnmatch(k if '/' in k else '/{}'.format(k), '*/' + wildcard))}
You are doing great 🙂 don't worry about it
BattyLion34 Okay, I'll try to see if we can solve the multi-instance issue on Windows (because obviously it should be automatic)
Hi @<1716987933514272768:profile|SuccessfulPuppy43>
How to make the remote ClearML agent do pip install -e .
In theory there is no need to do that; clearml-agent adds the repo root folder to the Python path.
If you insist on actually installing it, try adding a "requirements.txt"-compatible line to your "installed packages" section:
-e .
self.task.upload_artifact('trend_step', self.trend_step + 1)
Out of curiosity, why would every request generate an artifact? Wouldn't it be better to report it as part of the statistics?
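For example, reporting it as a scalar instead (a minimal sketch; title/series/values are hypothetical):
```python
from clearml import Task

task = Task.current_task()
# report the trend step as a scalar time-series rather than a per-request artifact
task.get_logger().report_scalar(
    title="trend", series="step", value=42, iteration=1
)
```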
What would be the size / type of the matrix X (i.e. np.size / np.dtype)?
Hi HealthyStarfish45
- is there an advantage in using tensorboard over your reporting?
Not unless your code already uses TB or has some built-in TB loggers.
HTML reporting looks powerful, can one inject some JavaScript inside?
As long as the JS is self-contained in the HTML script, anything goes :)
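For example (a minimal sketch; file name and titles are hypothetical), a self-contained HTML file can be reported as a debug sample:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="html report")
task.get_logger().report_media(
    title="report", series="interactive", iteration=0,
    local_path="report.html",  # self-contained HTML with inline <script> tags
)
```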
There was a problem with index order when converting from a PyTorch tensor to a NumPy array.
HealthyStarfish45 I'm assuming you are sending numpy to report_image (which makes sense). If you want to debug it, you can also test TensorBoard's add_image or matplotlib's imshow; both will send debug images.
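If it is the index-order issue, a minimal sketch of the usual fix (PyTorch tensors are typically (C, H, W) while report_image expects (H, W, C); sizes here are hypothetical):
```python
import torch
from clearml import Task

task = Task.init(project_name="examples", task_name="debug image")

tensor = torch.rand(3, 224, 224)  # PyTorch (C, H, W) layout
# reorder to (H, W, C) and convert to uint8 RGB before reporting
image = (tensor.permute(1, 2, 0).cpu().numpy() * 255).astype("uint8")

task.get_logger().report_image(
    title="debug", series="sample", iteration=0, image=image
)
```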