Came to ClearML since it had a slick dashboard and showed me the info that mattered. Loved that I could share the results of each epoch so we could make sure things were headed in the right direction.
Each epoch runs about 55 minutes, and that screenshot I posted earlier kind of shows the logs for the rest of the info being output, if you want to check that out.
I thought you disabled the stdout log. no?
Maybe ClearML is using tensorboard in ways that I can fine tune?
You can open your TB and see, every report there is logged into clearml
But I will try reducing the number of log reports first.
Under your profile you should be able to see it
So, I might be in the minority here, but it seems like capturing stdout and sending it over to ClearML via API should be disabled by default. Like, I get maybe capturing stderr, but stdout? In a training scenario, that's MILLIONS of API calls just in progress bar indicators, right? It might actually be better for the ClearML servers in general to make the user turn that on if they want it, otherwise we're just blasting your servers. In my case, I did not even know it was sending that over until I started digging into where these API calls were coming from and saw the CONSOLE tab in ClearML that had every single line of stdout captured.
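For anyone else hitting this, something like the following should turn it off. This is just a sketch assuming the auto_connect_streams argument to Task.init accepts per-stream flags; the project/task names are placeholders and the exact keys are worth double-checking against your SDK version.
```python
from clearml import Task

# Sketch: stop streaming stdout (progress bars, etc.) to the ClearML server while
# keeping stderr and python logging capture. The per-stream keys reflect my
# understanding of auto_connect_streams; verify against your clearml SDK version.
task = Task.init(
    project_name="yolo-segmentation",        # placeholder project name
    task_name="open-images-50-classes",      # placeholder task name
    auto_connect_streams={"stdout": False, "stderr": True, "logging": True},
)
```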
is number of calls performed, not what those calls were.
oh, yes this is just a measure of how many API calls are sent.
It does not really matter which ones
My training is on roughly 50 classes as a subset of the Open Images Dataset for Segmentation
This one, right? report_period_sec in ~/clearml.conf, correct?
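i.e. something like this excerpt, assuming the setting lives under sdk.development.worker (section path is my guess from the default conf file, so correct me if I'm off):
```
# ~/clearml.conf (excerpt) - assumed section path, check against the default conf
sdk {
    development {
        worker {
            # how often buffered reports (scalars, console lines) are flushed to the server
            report_period_sec: 30
        }
    }
}
```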
Maybe ClearML is using tensorboard in ways that I can fine tune? I saw there was a manual way to send over data if you were not using tensorboard, but the videos I saw from your team used this solution when demoing YOLOv8 on YouTube (there were a few collab videos your team did with theirs, so I just followed their instructions). But my gut is telling me that might be the issue for the remaining data being sent over that I have no insight into.
Welp, it's been a day with the new settings, and stats went up 140K for API calls 😢 ... going to check again tomorrow to see if any of that was spillover from yesterday.
In the case of scalars it is easy to see (the maximum number of iterations is a good starting point).
I would love to be able to fine tune this as needed, but in my profile I only see a Billing & Usage page, and it states at the top that "Usage data is updated once every day" ... and even then, all that shows under "Platform Usage" is the number of calls performed, not what those calls were.
I guess one last follow-up question: is there a way to cap costs?
Scale tier? (I know it is not per usage, but it is probably more than $15 per user 🙂)
I appreciate your help @<1523701205467926528:profile|AgitatedDove14> 🙂
Well, from 2 to 30 sec is a factor of 15, I think this is a good start 🙂
I am running this on a 3090 GPU locally, just been letting it run for about two weeks now, I think. Just have the one GPU, ha ha. It's at epoch 368 out of the 1,000 I have it set to cap out at (if it does not hit the default YOLO "patience" limit of 50 before then and self-terminate).
@<1572395184505753600:profile|GleamingSeagull15> see "Can I control what ClearML automatically logs?" in None (specifically the auto_connect_frameworks argument to Task.init())
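As a rough sketch of what that could look like if you only want the TensorBoard reports and not the rest of the framework hooks (the keys below are illustrative, check the docs for the full list your SDK version supports):
```python
from clearml import Task

# Sketch: limit what ClearML auto-logs. Keys shown are illustrative; see the
# auto_connect_frameworks documentation for the complete list.
task = Task.init(
    project_name="yolo-segmentation",       # placeholder names
    task_name="open-images-50-classes",
    auto_connect_frameworks={
        "tensorboard": True,   # keep the per-epoch scalar reports
        "matplotlib": False,   # skip auto-uploading generated plots
        "pytorch": False,      # skip auto-logging every model checkpoint
    },
)
```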
Thanks, will do. Heck, for my use case, I only need like once every 10 minutes.
If you do not have a lot of workers, then I would guess it is the console outputs.
Might be a feature request then, as, yeah, having transparency into something we are charged for would be nice. At this point, I have zero idea what is driving this usage and just want to make sure the costs for training do not bloat too much. I personally am just using ClearML as a central dashboard for a few people. I don't need it to be live data, I just need a rough overview of progress. Even if it only posted updates to ClearML once an hour, that is honestly fine.
Just wish I could actually see somewhere what is being sent over the API so I could know where to focus my efforts to refine this kind of stuff 😉
(Not sure it actually has that information)
Hmmm, this is just a personal project; I was honestly just hoping this would let me take the results of each epoch and put them in a central dashboard. Having this generate 1M+ API calls while only being like 1/4 of the way through training is a bit much. Current pricing is $1/100K API calls at the Pro tier, which I am on ... so it would be like another $50 just in API calls at this pace 😞 Would love to just cap API calls at a fixed amount for the month.
Since it's literally something we have to pay for (which I signed up to do), I would love to know what drives this cost.
I did notice that in the last 24 hours it dropped quite a bit, so my theory that the 140K might have had some spillover from the previous day might have been correct. The last 24 hours went from 1.24M to 1.32M, so about half as much growth as the day before, with the same training running.
Scary to think how common that might be. Could be an interesting way to optimize your platform: detect excessive console logging and prompt the user to confirm continued usage (or link to docs on how to disable it if they want to stop it).
One single experiment using the code above. I have no idea how many scalars I am sending since, as far as I can tell, I am not setting anything specific to define what I am sending over to ClearML; this is literally my first time using YOLOv8 or ClearML. Just using the super basic Python to run it, roughly like the sketch below.
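For context, this is my guess at the pattern from those videos; the dataset and weights names here are placeholders rather than my exact script:
```python
from clearml import Task
from ultralytics import YOLO

# Rough sketch of the kind of script being run; paths and names are placeholders.
# With the clearml package installed, the ultralytics integration picks up the task
# and reports scalars and console output for every epoch automatically.
task = Task.init(project_name="yolo-segmentation", task_name="open-images-50-classes")

model = YOLO("yolov8x-seg.pt")        # pretrained segmentation weights
model.train(
    data="open-images-50.yaml",       # placeholder dataset config (~50 classes)
    epochs=1000,
    patience=50,                      # default YOLO early-stopping patience
)
```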
Hi @<1572395184505753600:profile|GleamingSeagull15>
Try adjusting: None to 30 sec. It will reduce the number of log reports (i.e. API calls).