My training run covers roughly 50 classes, a subset of the Open Images Dataset for segmentation
FYI, I did not even know to look into this until I logged in and saw that I was being throttled because I had hit my monthly limit on API calls ( on my very first use of your platform ), and my last dozen or so epochs were not even logged ( also a bummer ). I only had that one model in training and figured there was no way I had sent over a million API requests, so I had to track down where those were coming from. I traced it to the STDOUT capture and was like ... wait, what?! Found that Console tab, which I had not even used before, saw that screenshot I posted, and was like ... well, there's your problem, ha ha
Came to ClearML since it had a slick dashboard and showed me the info that mattered. Loved that I could share the results of each epoch so we could make sure things were headed in the right direction.
@<1572395184505753600:profile|GleamingSeagull15> see "Can I control what ClearML automatically logs?" in the FAQ ( specifically the auto_connect_frameworks argument to Task.init() )
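Something along these lines should do it ( a minimal sketch; the project/task names are placeholders and the dict keys shown are just examples, see the docs for the full list ):
```python
from clearml import Task

# Limit what ClearML hooks into automatically. The keys below are
# examples -- check the auto_connect_frameworks docs for the full list.
task = Task.init(
    project_name="yolov8-segmentation",  # hypothetical project name
    task_name="train",                   # hypothetical task name
    auto_connect_frameworks={
        "tensorboard": True,   # keep tensorboard scalar auto-logging
        "matplotlib": False,   # skip auto-logging matplotlib figures
        "pytorch": False,      # skip auto-logging pytorch checkpoints
    },
)
```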
I'm not sure how frequently it updates, though
Since it's literally something we have to pay for ( which I signed up to do ), I would love to know what drives this cost
It was at 1.1M when I shut it down yesterday, and today it's at 1.24M
Maybe ClearML is using tensorboard in ways that I can fine-tune? I saw there was a manual way to send data over if you were not using tensorboard, but the videos I saw from your team used this solution when demoing YOLOv8 on YouTube ( there were a few collab videos your team did with theirs, so I just followed their instructions ). But my gut is telling me that might be the cause of the remaining data being sent over that I have no insight into.
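For reference, the manual way I mentioned looked roughly like this in the docs ( my sketch, names and values are placeholders ):
```python
from clearml import Task

task = Task.init(project_name="yolov8-segmentation", task_name="train")  # placeholder names
logger = task.get_logger()

# report one scalar per epoch yourself instead of relying on tensorboard auto-logging
for epoch in range(100):
    loss = 1.0 / (epoch + 1)  # stand-in for your real training loss
    logger.report_scalar(title="loss", series="train", value=loss, iteration=epoch)
```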
Welp, it's been a day with the new settings, and stats went up 140K for API calls ... going to check again tomorrow to see if any of that was spillover from yesterday
140K calls a day, how often are you sending scalars ? how long is it running? how many experiments are running ?
Glad I got that sorted. I was OK being a paying customer, but getting overage charges for that console stuff would have been a bummer if we had not figured it out. Next month things should be back to normal 😉
I guess one last follow-up question: is there a way to cap costs? Like, if this is running at this scale, I am not sure I can use ClearML for my purposes if I am just going to get overage charges repeatedly ( which it already looks like I will ).
I guess one last follow-up question: is there a way to cap costs?
Scale tier ? (I know it is not per usage, but it is probably more than $15 per user 🙂 )
I would love to be able to fine-tune this as needed, but in my profile I only see Billing & Usage, and it states at the top that "Usage data is updated once every day" ... and even then, all that shows under "Platform Usage" is the number of calls performed, not what those calls were.
I appreciate your help @<1523701205467926528:profile|AgitatedDove14> 🙂
I think we're good now :) Appreciate the help !!!
is the number of calls performed, not what those calls were.
oh, yes this is just a measure of how many API calls are sent.
It does not really matter which ones
If you do not have a lot of workers, then I would guess it's the console outputs
But I will try reducing the number of log reports first
This one, right? report_period_sec in ~/clearml.conf, correct?
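i.e. something like this ( my sketch, assuming the default file layout ):
```
sdk {
  development {
    worker {
      # raise this to flush metric/console reports to the server less often
      # (the default is on the order of a few seconds)
      report_period_sec: 30
    }
  }
}
```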
In the case of scalars it is easy to estimate (the maximum number of iterations is a good starting point)
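( for example, 5 scalars per iteration over 100K iterations is on the order of 500K scalar reports; reports are typically batched into fewer API calls, but the total scales the same way )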
In future collab community videos and sample source for YOLOv8, it might be worthwhile to call that out as something folks might want to turn off unless they need it :) Like I mentioned, I had no idea it was going to do that, and I sent your servers over 1.4M API hits unintentionally :(
Scary to think how common that might be. Could be an interesting way to optimize your platform: detect excessive console logging and prompt the user to confirm continued usage ( or link to the docs on how to disable it if they want to stop it )
I did notice that over the last 24 hours usage dropped quite a bit, so my theory that the 140K included some spillover from the previous day might have been correct. The last 24 hours went from 1.24M to 1.32M, so about half as much as the day before, with the same training running.
FYI, found log_stdout in that same section, and the default for that was true, so I set it to false so it would not log all stdout & stderr
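For anyone else hunting for it, the change looks like this ( sketch of the same worker section ):
```
sdk {
  development {
    worker {
      # stop shipping every stdout/stderr line to the server as console events
      log_stdout: false
    }
  }
}
```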
@<1523701087100473344:profile|SuccessfulKoala55> You are my hero !!! This is EXACTLY what I needed !!!
So, I might be in the minority here, but it seems like capturing stdout and sending it over to ClearML via API should be disabled by default. Like, I get maybe capturing stderr, but stdout? In a training scenario, that's MILLIONS of API calls just in progress bar indicators, right? It might actually be better for the ClearML servers in general to make the user turn that on if they want it; otherwise we're just blasting your servers. In my case, I did not even know it was sending that over until I started digging into where these API calls were coming from and saw the CONSOLE tab in ClearML that had every single line of stdout captured.
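Footnote for anyone finding this later: if you'd rather not change ~/clearml.conf globally, I believe newer SDK versions also expose a per-task switch ( hedged sketch; check that auto_connect_streams exists in your clearml version ):
```python
from clearml import Task

# disable console (stdout/stderr/logging) capture for this task only;
# framework auto-logging such as tensorboard is unaffected
task = Task.init(
    project_name="yolov8-segmentation",  # placeholder names
    task_name="train",
    auto_connect_streams=False,
)
```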