
Welp, it's been a day with the new settings, and stats went up 140K for API calls 😢 ... going to check again tomorrow to see if any of that was spillover from yesterday.
Came to ClearML since it had a slick dashboard and showed me the info that mattered. Loved that I could share the results of each epoch so we could make sure things were headed in the correct direction.
It was at 1.1M when I shut it down yesterday, and today it's at 1.24M
Math checks out: if I was generating around 140K a day and this had been running for 9 days, it had about 1.2M when I caught it. So I think the day after I shut it down, I was seeing the previous day's numbers from before the shutdown being added. After another 24 hours it barely changed, so ya, it was 100% the stdout logging.
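For anyone wanting to sanity-check the same numbers, here is a minimal sketch of that arithmetic. It only uses the rough figures quoted in this thread (about 140K calls a day, 9 days of running, 1.1M at shutdown and 1.24M the day after); the variable names are made up for illustration.

```python
# Rough figures quoted in the thread (approximate, not exact billing data).
calls_per_day = 140_000
days_running = 9

# Expected total if the rate was steady for the whole run.
estimated_total = calls_per_day * days_running

# Counter readings: at shutdown, and one day later (usage is updated daily).
count_at_shutdown = 1_100_000
count_next_day = 1_240_000
next_day_jump = count_next_day - count_at_shutdown

print(f"estimated total: {estimated_total:,}")
print(f"jump after shutdown: {next_day_jump:,}")
```

The jump after shutdown matches one day's worth of calls, which is consistent with the counter lagging by a day rather than with calls still being generated.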
This one, right? report_period_sec in ~/clearml.conf, correct?
Is there a place in ClearML that shows Platform Usage? Like, what's actually taking up the API calls?
Glad I got that sorted. I was OK being a paying customer, but getting overage charges for that console stuff would have been a bummer if we had not figured it out. Next month things should be back to normal 😉
Just wish I could actually see somewhere what is being sent over API so I could know where to focus my efforts to refine this kind of stuff 😉
So, might be in the minority here, but seems like capturing stdout and sending that over to clearml via API should be disabled by default. Like I get maybe capturing stderr, but stdout? In a training scenario, that's MILLIONS of API calls just in progress bar indicators, right? Like it might actually be better for the ClearML servers just in general to make the user turn that on if they want it, otherwise we're just blasting your servers. In my case, I did not even know it was sending that...
My training is on roughly 50 classes as a subset of the Open Images Dataset for Segmentation
Since it's literally something we have to pay for ( which I signed up to do ) I would love to know what drives this cost
I am running this on a 3090 GPU locally, just been letting it run for about two weeks now I think. Just have the one GPU, ha ha. It's at epoch 368 out of the 1,000 I have it set to cap out on ( if it does not hit the default YOLO "patience" limit of 50 before then and self terminate ).
I guess last followup question, is there a way to cap costs? Like if this is running at this scale, I am not sure I can use ClearML for my purpose if I am just going to get overage charged repeatedly ( which I am already looking like I will be doing ).
well, in my case, if I am trying to make sure I do not go over the allotted usage, it matters, as I am already hitting the ceiling and I have no idea what is pushing this volume of data
In future collab community videos and sample source for YOLOv8, it might be worthwhile to call that out as something folks might want to turn off unless they need it :) Like I mentioned, I had no idea it was going to do that, and I sent your servers over 1.4M API hits unintentionally :(
But I will try to reduce the number of log reports first.
Ya, sorry, I meant that if you needed more info on what was being run, it was in that screenshot (showed instances/epochs/batch size, etc). But yes, it's since been disabled.
Ya. I don't see any links on the FAQ site pointing back to your main site. SEO-wise, that'd help with relevancy.
@SuccessfulKoala55 You are my hero!!! This is EXACTLY what I needed!!!
I appreciate your help @AgitatedDove14 🙂
Assuming GitHub, but just making sure you don't have another PM tool you'd rather use.
I think we're good now :) Appreciate the help!!!
FYI, found log_stdout in that same setting, and the default for that was true, so I set that to false so it would not log all stdout & stderr.
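For scripts that call Task.init themselves, the same effect can probably be had in code instead of in clearml.conf. The sketch below assumes the auto_connect_streams argument of Task.init, which accepts either a bool or a dict with the keys stdout, stderr and logging; the project and task names are hypothetical, and this is not necessarily how the YOLOv8 integration sets things up.

```python
def stream_capture_settings(capture_stdout: bool = False) -> dict:
    """Build the mapping to pass as auto_connect_streams.

    Keeps stderr and Python logging captured, and drops stdout
    (progress bars and similar) unless explicitly requested.
    """
    return {"stdout": capture_stdout, "stderr": True, "logging": True}


def start_task():
    # Imported here so the helper above works even without clearml installed.
    from clearml import Task

    return Task.init(
        project_name="yolo-segmentation",  # hypothetical name
        task_name="train-50-classes",      # hypothetical name
        auto_connect_streams=stream_capture_settings(),
    )
```

Calling start_task() before training starts would then create the task with stdout capture turned off while still keeping errors visible in the console tab.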
Thanks, will do. Heck, for my use case, I only need like once every 10 minutes.
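To get a feel for how much the reporting interval matters, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each reporting interval results in at most one batched report; the real number of API calls per interval depends on what ClearML sends and is not something this thread confirms. The 2-second value is assumed to be the default report_period_sec, and 600 seconds is the "once every 10 minutes" mentioned above.

```python
import math

SECONDS_PER_DAY = 86_400


def reports_per_day(report_period_sec: float) -> int:
    """Upper bound on reports in 24 hours, assuming one report per period."""
    if report_period_sec <= 0:
        raise ValueError("report_period_sec must be positive")
    return math.ceil(SECONDS_PER_DAY / report_period_sec)


# 2 s (assumed default), 30 s, and 600 s (once every 10 minutes).
for period in (2, 30, 600):
    print(f"{period:>4} s -> up to {reports_per_day(period):,} reports/day")
```

Under that assumption, going from a 2-second period to a 10-minute one cuts the ceiling from tens of thousands of reports a day to a few hundred.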
Actually, looking at the counts today, they've barely changed. So I think this actually fixed it. The counts are only updated daily, so I needed to get 48 hours out from when I made the change to see clean results and be sure there were no spillover counts from previous days.
might be a feature request then, as ya, having transparency into something we are charged for would be nice. At this point, I have zero idea what is driving this usage and just want to make sure the costs for training do not bloat too much. I personally am just using ClearML as a central dashboard for a few people. I don't need it to be live data, I just need a rough overview of progress. Even if it only posted updates to ClearML once an hour, that is honestly fine.
Scary to think how common that might be. Could be an interesting way to optimize your platform: detect excessive console logging and prompt the user to confirm continued usage (or link to docs on how to disable it if they want to stop it).
I would love to be able to fine tune this as needed, but in my profile I only see Billings & Usage, and it states at the top that "Usage data is updated once every day" ... and even then, all it shows under "Platform Usage" is the number of calls performed, not what those calls were.
Maybe ClearML is using tensorboard in ways that I can fine tune? I saw there was a manual way to send over data if you were not using tensorboard, but the videos I saw from your team used this solution when demoing YOLOv8 on YouTube (there were a few collab videos your team did with theirs, so I just followed their instructions). But my gut is telling me that might be the issue for the remaining data being sent over that I have no insight into.