AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago


Answers 8049

0 Hi! I Need Help Debugging The Following Issue Please. I'M Training A Cnn And Plotting The Confusion Matrices For Train And Val In Each Epoch. When I Get To Epoch 101, The Ui Kind Of Breaks..It Starts Showing Me The Images For Epoch 1. When I Right Click O

So the TB issue that was reported was that images were not logged.
We are now talking about the caching, which is actually a UI thing. Which clearml-server version are you using?
And where are the images stored (the default files server, or S3/GS etc.)?

3 years ago

Okay, I found it. This is because the newer versions send the events/images in a subprocess (it used to be a thread).
The object is created in the main process, which updates the file index (in a round-robin manner), but the check itself happens in the subprocess, which is not "aware" of the used indexes (i.e. it is always 0, hence when exceeding the history size, it skips it).
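To make the mechanism above concrete, here is a minimal sketch. It is not the ClearML code: `ImageIndexTracker` and its fields are hypothetical names, and the process boundary is simulated with a deep copy, standing in for the snapshot of state a subprocess gets when it starts.

```python
import copy


class ImageIndexTracker:
    """Hypothetical stand-in for the object that hands out round-robin file indexes."""

    def __init__(self, history_size):
        self.history_size = history_size
        self.used = 0  # number of images reported so far

    def next_index(self):
        # Round-robin: wrap around once the history size is reached.
        index = self.used % self.history_size
        self.used += 1
        return index


# The object is created in the "main process".
main_tracker = ImageIndexTracker(history_size=100)

# Simulated subprocess: it only holds a snapshot taken at start-up.
subprocess_view = copy.deepcopy(main_tracker)

# The main process keeps reporting images and updating its own counter.
for _ in range(101):
    main_tracker.next_index()

print(main_tracker.used)     # 101
print(subprocess_view.used)  # 0, the snapshot never sees the updates
```

Any check that runs against the snapshot sees a counter that never moves, which matches the "it is always 0" symptom described above.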

3 years ago

why doesn't this happen on my other experiments?

same 100+ reports ?
(My new theory is that calling Task.reload() will fix it, and it might be called internally for the other experiments, like when reporting models/artifacts)
Could that be the case ?

3 years ago

MuddySquid7 the fix was pushed to GitHub, you can now install directly from the repo:
pip install git+

3 years ago

0 Hi, I Would Like To Bring Awareness

@<1523701066867150848:profile|JitteryCoyote63>
I just created a new venv and ran

pip install "torch==1.11.0.*" --extra-index-url

Then started python:

import torch
torch.cuda.is_available()

And I get True

what are you getting?
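For comparing results between machines, a slightly more verbose version of the check above may help. This is only a sketch; `describe_torch` is a hypothetical helper name, and it falls back gracefully when torch is not installed.

```python
def describe_torch():
    """Collect the torch build details that matter for CUDA debugging."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,                # may carry a +cuXXX suffix
        "cuda_build": torch.version.cuda,            # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```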

one year ago

0 Hi, I Would Like To Bring Awareness

Hi @<1523701066867150848:profile|JitteryCoyote63>
RC is out,

pip3 install clearml-agent==1.5.3rc3

Then set pytorch_resolve: "direct" in clearml.conf

Let me know if it worked

one year ago

0 Hi, I Would Like To Bring Awareness

Hi @<1523701066867150848:profile|JitteryCoyote63>
Thank you for bringing it up! Can you verify with the latest clearml-agent 1.5.3rc2?

one year ago

0 Hi, I Would Like To Bring Awareness

Hi @<1523701066867150848:profile|JitteryCoyote63>

Could you please push the code for that version on github?

Oh, it seems it is not synced, thank you for noticing (it will be taken care of immediately).
Regarding the issue:
Look at the attached images.
None does not contain a specific wheel for cuda117 to x86, they use the pip default one
![image](https://clearml-web-assets.s3.amazonaws.com/scoold/images/TT9ATQXJ5-F05744CK09L/screenshot...

one year ago

0 Hi, I Would Like To Bring Awareness

No, I think the default version already supports cuda 117

one year ago

0 Hi, I Would Like To Bring Awareness

So I suppose clearml-agent is not responsible, because it finds a wheel for torch 1.11.0 with cu117.

The thing is, the agent used to do all the heavy parsing because pytorch never actually had a pip compatible artifactory
But now they do, so the agent basically passed the parsing to pip and just added the correct additional pytorch pip repo.
It seems we need to switch back... wdyt?

one year ago

0 Hi, I Would Like To Bring Awareness

The wheel you download from pip, for example this one torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl
is actually both CPU and cuda 117
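One reason this is easy to miss: the wheel filename itself carries no CUDA tag. A minimal sketch that splits a wheel filename into its PEP 427 fields shows this. `parse_wheel_filename` is a hypothetical helper, and the second filename is only an illustration of a build that uses a `+cuXXX` local version label.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    # A CUDA flavor, when present, lives in the local version label (+cuXXX).
    public_version, _, local_version = version.partition("+")
    return {
        "name": name,
        "version": public_version,
        "local_version": local_version or None,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


pypi_wheel = parse_wheel_filename("torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl")
cuda_wheel = parse_wheel_filename("torch-1.11.0+cu115-cp38-cp38-linux_x86_64.whl")

print(pypi_wheel["local_version"])  # None
print(cuda_wheel["local_version"])  # cu115
```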

one year ago

0 Hi, I Would Like To Bring Awareness

if this is the case pytorch really messed things up, this means they removed packages
Let me check something

one year ago

0 Hi, I Would Like To Bring Awareness

I am not sure what switching back will solve, here the wheel should have been correct, it's just the architecture of the card that is incompatible

So I tested the "old" code that did the parsing and matching, and it did resolve to the correct wheel (i.e. it found that there is no 117, only 115, and installed that one).
I think we should switch back, and have a configuration option to control which mechanism the agent uses, wdyt?
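The matching idea can be sketched roughly as follows. This is not the agent's actual implementation: `pick_cuda_build` is a hypothetical function, and the list of available builds is an assumption for illustration. It picks the newest published CUDA build that does not exceed the CUDA version found on the machine.

```python
def pick_cuda_build(installed, available):
    """Return the newest available CUDA build <= the installed CUDA version.

    Versions are compact strings such as "117" for CUDA 11.7.
    Returns None when no available build is old enough.
    """
    candidates = [v for v in available if int(v) <= int(installed)]
    if not candidates:
        return None
    return max(candidates, key=int)


# Assumed set of builds published for a given torch version (illustrative only).
available_builds = ["102", "113", "115"]

print(pick_cuda_build("117", available_builds))  # 115
```

With a machine reporting 117 and no 117 build published, the sketch falls back to 115, which is the behavior described for the "old" code.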

one year ago

0 Hi Guys, I Am Having Some Trouble Running Some Training Scripts With The Agent Functionality:

When we enqueue the task using the web-ui we have the above error

ShallowGoldfish8 I think I understand the issue.
Basically, I think the issue is:
task.connect(model_params, 'model_params')
Since this is a nested dict:
model_params = {"loss_function": "Logloss", "eval_metric": "AUC", "class_weights": {0: 1, 1: 60}, "learning_rate": 0.1}
The class_weights keys are stored as Strings, but catboost expects "int" keys, hence it fails.
One op...
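As an illustration of the key-type problem (not necessarily the option the truncated answer goes on to suggest), the string keys can be cast back to int before the parameters reach catboost. `restore_int_keys` is a hypothetical helper, and the dict below imitates what the parameters look like after the keys were stored as strings.

```python
def restore_int_keys(mapping):
    """Cast string keys such as "0" and "1" back to int keys."""
    return {int(key): value for key, value in mapping.items()}


# Parameters as they might come back, with class_weights keys turned into strings.
model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {"0": 1, "1": 60},
    "learning_rate": 0.1,
}

# Restore the int keys that catboost expects.
model_params["class_weights"] = restore_int_keys(model_params["class_weights"])

print(model_params["class_weights"])  # {0: 1, 1: 60}
```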

2 years ago

0 Any Pointers On Running Gpu Tasks With K8S Glue?

For future readers, see discussion here:
https://clearml.slack.com/archives/CTK20V944/p1629840257158900?thread_ts=1629091260.446400&cid=CTK20V944

3 years ago

0 Any Pointers On Running Gpu Tasks With K8S Glue?

--docker or in clearml.conf https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L153

3 years ago

0 Hi All, I'M New With Clearml And I Have A Question. I Have A Modular Code, And When I'M Trying To Run It In A Remote Machine With The Agent, I Get An Error On The Line 'From X Import Y', Which Says That There Isn'T Such Module X. Any Help? Thanks.

Hi BroadMole64

'from X import Y', which says that there isn't such module X. any help? thanks.

can you see package X under the "Execution" tab "Installed Packages" section ?
(think of this section as requirements.txt section, in order for the agent to install the package on the remote machine it should have it listed there)

3 years ago

0 Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built

The problem comes from ClearML, which thinks it starts from iteration 420 and then adds the iteration number again (421), so it starts logging from 420+421=841

JitteryCoyote63 Is this the issue ?

3 years ago

After you call task.set_initial_iteration(0), what do you get with task.get_initial_iteration()? Is it 0?
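The arithmetic behind the double counting can be written out in a few lines. This is a plain-Python sketch of the offset logic as described in the thread, not ClearML code, and `logged_iteration` is a hypothetical name.

```python
def logged_iteration(initial_iteration, reported_iteration):
    """Iteration that ends up in the logs once the initial offset is added."""
    return initial_iteration + reported_iteration


# Resumed run: the offset is 420 and the training loop itself reports 421.
print(logged_iteration(420, 421))  # 841

# With the offset reset to 0, the reported iteration is logged as-is.
print(logged_iteration(0, 421))    # 421
```

This is why checking that the offset really is 0 after the reset is the first thing to verify.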

3 years ago

Let me see if I can reproduce something

3 years ago

0 Hi, Currently We Can Add "Tags" On Experiments. When Filtering The Tags In The Dashboard, It Seems To Default To Filter As A "Or" Condition, Is It Possible To Search With "And" Condition, Such As Search With "Dataset_Version1 + Nn_Model"

Hi EnviousStarfish54
I verified with the frontend / backend guys.
The backend allows searching for "all" tags, and the frontend will add a toggle button in the UI to select or/all for the selected Tags.
It should be part of the next release.

3 years ago

Hi EnviousStarfish54
I remember this feature request, let me check where it stands..

3 years ago

0 Hi, I Try To Write An Article On Medium About Clearml And Face Some A Problem With Plotly Figures. When Displaying The Figure Locally In A Browser Works Fine, But On The Cleaml Server (I Use The Free Tier Service) The Plot Is Empty And Has The Title 'Unkn

WickedGoat98 Nice!!!
BTW: The fix should solve both (i.e. no need to manually cast), I'll make sure the fix is on GitHub so you'll be able to verify 🙂

3 years ago

0 Is The App/Ui/Backend Customizable? Any Tutorials For That?

CleanWhale17 per your request :)

An automated ML Pipeline 👍
Automated Data Source Integration 👍
Data Pooling and Web Interface for Manual Annotation of Images (Seg. / Classif) [Allegro Enterprise] or users integrate with open-source
Storage of Annotation output files (versioned JSON) 👍
Online-Training Support (for Dataset Shifts) [Not sure what you mean]
Data Pre-processing (filter/augment) [Allegro Enterprise] or users integrate with open-source
Data-set visualization(stats...

4 years ago

0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

So can you verify it can download the model ?

3 years ago

0 Is The App/Ui/Backend Customizable? Any Tutorials For That?

CleanWhale17 what is "Online-Training Support (for Dataset Shifts)"?

4 years ago

0 Is The App/Ui/Backend Customizable? Any Tutorials For That?

As I'm a Full-stack developer at Core. I'd be looking to extend the TRAINS Frontend and Backend APIs to suit my need of On-Prem data storage integration and lots of other customization for Job Scheduler(CRON)/Dataset Augmentation/Custom Annot. tool etc.

That is awesome! Feel free to post a specific question here, and I'll try to direct to the right place 🙂

Can you guide me to one such tutorial that's teaching how to customize the backend/front end with an example?

You mean l...

4 years ago

0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Is it vanilla pytorch ?

3 years ago

0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Yes this is Triton failing to load the actual model file

3 years ago

0 Hi All, I'M Using Clearml 1.0.3 With Clearml-Server <1 (How Do I Get The Current Running Version?) In Pytorch-Lightning I Use Ddp And I See Multiple Tasks (As The Number Of Gpus) Being Created And Remaining In Draft Mode. Is It A Problem Running Clearml

Task.init should be called before the PyTorch distributed processes are launched; then, on each instance, you need to call Task.current_task() to get the task instance (and make sure the logs are tracked).
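A rough sketch of that ordering, assuming plain `torch.multiprocessing.spawn` rather than pytorch-lightning's own DDP launcher. The project and task names are placeholders, `worker` and `main` are hypothetical names, and the imports sit inside the functions so the definitions load even where clearml or torch is missing.

```python
def worker(rank):
    """Runs in each spawned process: attach to the task created by the parent."""
    from clearml import Task

    task = Task.current_task()  # reuse the parent's task, do not call Task.init here
    task.get_logger().report_text(f"worker {rank} attached to task {task.id}")


def main(num_workers=2):
    """Create the task once, then launch the distributed workers."""
    from clearml import Task
    import torch.multiprocessing as mp

    # Called before any worker process exists (placeholder names).
    Task.init(project_name="examples", task_name="ddp sketch")
    mp.spawn(worker, nprocs=num_workers)
```

Calling `main()` from the training script's entry point should then leave a single task instead of one draft task per GPU, if the behavior matches the description above.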

3 years ago
