Came to ClearML since it had a slick dashboard and showed me the info that mattered. Loved that I could share the results of each epoch so we could make sure things were headed in the right direction.
Each epoch runs about 55 minutes, and that screenshot I posted earlier kind of shows the logs for the rest of the info being output, if you want to check that out.
I thought you disabled the stdout log. no?
Maybe ClearML is using tensorboard in ways that I can fine tune?
You can open your TB and see, every report there is logged into clearml
But I will try reducing the number of log reports first.
Under your profile you should be able to see it
So, I might be in the minority here, but it seems like capturing stdout and sending it over to ClearML via API should be disabled by default. Like, I get maybe capturing stderr, but stdout? In a training scenario, that's MILLIONS of API calls just in progress bar indicators, right? It might actually be better for the ClearML servers in general to make the user turn that on if they want it, otherwise we're just blasting your servers. In my case, I did not even know it was sending that over until I started digging into where these API calls were coming from and saw the CONSOLE tab in ClearML that had every single line of stdout captured.
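For anyone else hitting this, something like the following should turn it off. This is just a sketch assuming the auto_connect_streams argument to Task.init accepts per-stream flags; the project/task names are placeholders and the exact keys are worth double-checking against your SDK version.
```python
from clearml import Task

# Sketch: stop streaming stdout (progress bars, etc.) to the ClearML server while
# keeping stderr and python logging capture. The per-stream keys reflect my
# understanding of auto_connect_streams; verify against your clearml SDK version.
task = Task.init(
    project_name="yolo-segmentation",        # placeholder project name
    task_name="open-images-50-classes",      # placeholder task name
    auto_connect_streams={"stdout": False, "stderr": True, "logging": True},
)
```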
is number of calls performed, not what those calls were.
oh, yes this is just a measure of how many API calls are sent.
It does not really matter which ones
My training is on roughly 50 classes as a subset of the Open Images Dataset for Segmentation
This one, right? report_period_sec in ~/clearml.conf, correct?
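i.e. something like this excerpt, assuming the setting lives under sdk.development.worker (section path is my guess from the default conf file, so correct me if I'm off):
```
# ~/clearml.conf (excerpt) - assumed section path, check against the default conf
sdk {
    development {
        worker {
            # how often buffered reports (scalars, console lines) are flushed to the server
            report_period_sec: 30
        }
    }
}
```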
Maybe ClearML is using tensorboard in ways that I can fine tune? I saw there was a manual way to send over data if you were not using tensorboard, but the videos I saw from your team used this solution when demoing YOLOv8 on YouTube (there were a few collab videos your team did with theirs, so I just followed their instructions). But my gut is telling me that might be the issue for the remaining data being sent over that I have no insight into.
Welp, it's been a day with the new settings, and stats went up 140K for API calls 😢 ... going to check again tomorrow to see if any of that was spillover from yesterday.
In the case of scalars it is easy to see (the maximum number of iterations is a good starting point).
I would love to be able to fine tune this as needed, but in my profile I only see a Billing & Usage page, and it states at the top that "Usage data is updated once every day" ... and even then, all that shows under "Platform Usage" is the number of calls performed, not what those calls were.
I guess one last follow-up question: is there a way to cap costs?
Scale tier? (I know it is not per usage, but it is probably more than $15 per user 🙂)
I appreciate your help @<1523701205467926528:profile|AgitatedDove14> 🙂
Well, from 2 to 30 sec is a factor of 15, I think this is a good start 🙂
I am running this on a 3090 GPU locally, just been letting it run for about two weeks now, I think. Just have the one GPU, ha ha. It's at epoch 368 out of the 1,000 I have it set to cap out at (if it does not hit the default YOLO "patience" limit of 50 before then and self-terminate).
@<1572395184505753600:profile|GleamingSeagull15> see "Can I control what ClearML automatically logs?" in None (specifically the auto_connect_frameworks argument to Task.init())
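As a rough sketch of what that could look like if you only want the TensorBoard reports and not the rest of the framework hooks (the keys below are illustrative, check the docs for the full list your SDK version supports):
```python
from clearml import Task

# Sketch: limit what ClearML auto-logs. Keys shown are illustrative; see the
# auto_connect_frameworks documentation for the complete list.
task = Task.init(
    project_name="yolo-segmentation",       # placeholder names
    task_name="open-images-50-classes",
    auto_connect_frameworks={
        "tensorboard": True,   # keep the per-epoch scalar reports
        "matplotlib": False,   # skip auto-uploading generated plots
        "pytorch": False,      # skip auto-logging every model checkpoint
    },
)
```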
Thanks, will do. Heck, for my use case, I only need like once every 10 minutes.
If you do not have a lot of workers, then I would guess it is the console outputs.
Might be a feature request then, as, yeah, having transparency into something we are charged for would be nice. At this point, I have zero idea what is driving this usage and just want to make sure the costs for training do not bloat too much. I personally am just using ClearML as a central dashboard for a few people. I don't need it to be live data, I just need a rough overview of progress. Even if it only posted updates to ClearML once an hour, that is honestly fine.
Just wish I could actually see somewhere what is being sent over the API so I could know where to focus my efforts to refine this kind of stuff 😉
(Not sure it actually has that information)
Hmmm, this is just a personal project; I was honestly just hoping this would let me take the results of each epoch and put them in a central dashboard. Having this generate 1M+ API calls while only being like 1/4 of the way through training is a bit much. Current pricing is $1/100K API calls at the Pro tier, which I am on ... so it would be like another $50 just in API calls at this pace 😞 Would love to just cap API calls at a fixed amount for the month.
Since it's literally something we have to pay for (which I signed up to do), I would love to know what drives this cost.
I did notice that in the last 24 hours it dropped quite a bit, so my theory that the 140K might have had some spillover from the previous day might have been correct. The last 24 hours went from 1.24M to 1.32M, so about half as much growth as the day before, with the same training running.
Scary to think how common that might be. Could be an interesting way to optimize your platform: detect excessive console logging and prompt the user to confirm continued usage (or link to docs on how to disable it if they want to stop it).
One single experiment using the code above. I have no idea how many scalars I am sending since, as far as I can tell, I am not setting anything specific to define what I am sending over to ClearML; this is literally my first time using YOLOv8 or ClearML. Just using the super basic Python to run it, roughly like the sketch below.
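For context, this is my guess at the pattern from those videos; the dataset and weights names here are placeholders rather than my exact script:
```python
from clearml import Task
from ultralytics import YOLO

# Rough sketch of the kind of script being run; paths and names are placeholders.
# With the clearml package installed, the ultralytics integration picks up the task
# and reports scalars and console output for every epoch automatically.
task = Task.init(project_name="yolo-segmentation", task_name="open-images-50-classes")

model = YOLO("yolov8x-seg.pt")        # pretrained segmentation weights
model.train(
    data="open-images-50.yaml",       # placeholder dataset config (~50 classes)
    epochs=1000,
    patience=50,                      # default YOLO early-stopping patience
)
```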
Hi @<1572395184505753600:profile|GleamingSeagull15>
Try adjusting: None to 30 sec. It will reduce the number of log reports (i.e. API calls).