I'm aware of that but it doesn't help this situation.
@<1523701435869433856:profile|SmugDolphin23> Yes. I'll try it in about 14 hours when I'm back at work and let you know how it goes. 😂
After some digging we found it was actually caused by the router's IPS protection. I thought it would be strange for GitHub to be throttling things at this scale.
It sounds like you didn't set up your config. Did you ever initialize clearml?
They will be related through the task. Get the task information from the dataset, then get the model information from the task.
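Roughly, something like this (a minimal sketch; it assumes the dataset id doubles as its backing task id, which is worth verifying on your SDK version):

from clearml import Dataset, Task

ds = Dataset.get(dataset_id="<dataset_id>")  # placeholder id
# the dataset is backed by a task, so pull that task directly
dataset_task = Task.get_task(task_id=ds.id)
# models registered on the task, split into input and output
models = dataset_task.get_models()
print(models["input"], models["output"])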
I might have found the answer. I'll reply if it works as expected.
That makes sense. I was confused about what the source was.
I'm using Pro. Sorry for the delay, I didn't notice I never sent the response.
Thanks for the reply @<1523701070390366208:profile|CostlyOstrich36> !
It says in the documentation that:
Add a folder into the current dataset. calculate file hash, and compare against parent, mark files to be uploaded
It seems to recognize the dataset as another version of the data but doesn't seem to be validating the hashes on a per file basis. Also, if you look at the photo, it seems like some of the data does get recognized as the same as the prior data. It seems like it's the correct...
I have manually verified that the line-by-line content of the CSV files is identical using hashlib.sha256(). Why is it that the file content is the same (they are generated by the same process, literally just rerunning the same code twice) but ClearML treats them differently?
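For reference, the check was roughly along these lines (just a sketch; paths are placeholders and it hashes whole-file bytes rather than individual lines):

import hashlib
from pathlib import Path

def sha256_of(path):
    # hash of the raw file contents
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

print(sha256_of("run1/data.csv") == sha256_of("run2/data.csv"))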
The original file sizes are the same but the compressed sizes seem to be different.
Alright, I tried testing it out by commenting out the code for generating new CSVs, so for successive runs the CSVs are identical. However, when I use dataset.add_files() it still generates a new version of the dataset.
# log the data to ClearML if a task is passed
if self.task:
    self.clearml_dataset = Dataset.create(dataset_name="[LTV] Dataset")
    self.clearml_dataset.add_files(path=save_path, verbose=True)
    if self.tags is not None:
        ...
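One thing I may try next is making each run a child of the previous version, so only changed files should get uploaded. A rough sketch (project/dataset names are placeholders, and it assumes Dataset.get raises when nothing matches):

from clearml import Dataset

try:
    parent = Dataset.get(dataset_project="LTV", dataset_name="[LTV] Dataset")
except ValueError:
    parent = None  # no previous version yet

new_ds = Dataset.create(
    dataset_name="[LTV] Dataset",
    dataset_project="LTV",
    parent_datasets=[parent.id] if parent else None,
)
new_ds.add_files(path=save_path, verbose=True)
new_ds.upload()
new_ds.finalize()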
Thanks again for the info. I might experiment with it to see first hand what the advantages are.
It's a corporate one. We are also looking into options on GitHub's end.
This does appear to resolve the issue. I'll keep you updated if I find any other issues. Thanks @<1523701435869433856:profile|SmugDolphin23>
Ah, that makes sense. What is supposed to be hidden changes depending on the section you're in, which makes sense. Now there needs to be a Pac-Man sprite easter egg hidden somewhere else.
✨ It works ✨
Thanks @<1523701205467926528:profile|AgitatedDove14> 😁
Well, if I stop the cron service and start it back up I don't have to re-register each schedule. If, for instance, I start the TaskScheduler, register a task, and stop the scheduler, how do I restart the TaskScheduler in a way that re-registers the tasks? Because, in theory, they could be registered by several users and I might be unaware of tasks that were previously scheduled. What is the best practice for preserving state?
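For context, the flow I mean is roughly this (a sketch; the task id and queue names are placeholders):

from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# register a task to run every day at 03:00
scheduler.add_task(
    schedule_task_id="<task_id>",
    queue="default",
    hour=3,
    minute=0,
)
# push the scheduler itself onto an agent queue
scheduler.start_remotely(queue="services")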
Hi Jake 👍 ,
Maybe the content is cached? The repo isn't big. I didn't realize the log was missing content. I believe I copied everything but I'll double check in a moment.
This doesn't really make a lot of sense. ClearML would be better served for tracking which version of the code you used for a corresponding task, and you'd use something like GitHub or GitLab to track and host your code. You could use ClearML to help you reconstruct the environment and code from a task, given that it's tracked by git and hosted somewhere you can access.
Hi @<1523701087100473344:profile|SuccessfulKoala55> - We tried to delete some additional hyperparameter tunings but it doesn't seem to have impacted metrics stored. It's not clear to me what is occupying all the metric storage space.
Yes, I'm experimenting with this. I actually wrote my own process to do this, so I just had to adapt it as a callable to pass to the scheduler. However, I'm running into an issue, and I don't think this is a user error this time. When I start the scheduler, it starts running and shows up in the web-app, but then an error message pops up in the web-app, Fetch parents failed, and the Scheduler task disappears from the web-app. I can't even see an error log because the task is gone.
I'm running th...
Yes, it indeed appears to be a regex issue. If I run:
Dataset.list_datasets(
    dataset_project=self.task.get_project_name(),
    partial_name=re.escape('[LTV] Dataset Test'),
    only_completed=True,
)
It works as expected. I'm not sure how raw you want to leave the partial_name feature. I could create a PR to fix this, but would you want me to re.escape at the list_datasets() level? Or go deeper and do it at `Task._query_task...
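For clarity, the brackets are what break the raw pattern, assuming partial_name is treated as a regex server-side (which is what the behaviour suggests):

import re

name = "[LTV] Dataset Test"
print(re.match(name, name))             # None: "[LTV]" is read as a character class
print(re.match(re.escape(name), name))  # matches: brackets are escaped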
Is there currently a way to bind the same GPU to multiple queues? I believe the agent complained last time I tried (which was a while ago).
That's great! I look forward to trying this out.
Hi again Eugen,
If I use the hyperparameter tool in ClearML, won't that create a different experiment for every step of the hyperparameter-optimizer? So this will be run across experiments. I could do something with pipelines but since the metrics are already available in the ClearML hyperparameter/metric tables I thought it would make sense to be able to plot against those values.
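What I'd like to do is roughly this (a sketch; the project, parameter, and metric names are placeholders):

from clearml import Task

points = []
for t in Task.get_tasks(project_name="<hpo_project>"):
    metrics = t.get_last_scalar_metrics()  # {title: {series: {"last": ...}}}
    params = t.get_parameters()
    points.append((
        params.get("General/learning_rate"),
        metrics.get("validation", {}).get("loss", {}).get("last"),
    ))
# points could then be plotted, e.g. metric vs. hyperparameter across experiments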
Depending on the framework you're using, it'll just hook into the save-model operation every time you save a model, which will probably happen every epoch for some subset of the training. If you want to do it with the existing framework, you could change the checkpointing so that it only clones the best model in memory and saves the write operation for last. The risk with this is that if the training crashes, you'll lose your best model.
Optionally, you could also disable the ClearML integration with...
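Something along these lines is what I mean by keeping the best model in memory and writing it once at the end (a sketch in PyTorch terms; train_one_epoch, evaluate, model, and num_epochs are placeholders):

import copy
import torch

best_state, best_metric = None, float("inf")
for epoch in range(num_epochs):
    train_one_epoch(model)      # placeholder training step
    val_loss = evaluate(model)  # placeholder validation step
    if val_loss < best_metric:
        best_metric = val_loss
        # clone the weights in memory instead of writing a checkpoint each epoch
        best_state = copy.deepcopy(model.state_dict())

# single write at the end; if training crashes before this, the best weights are lost
torch.save(best_state, "best_model.pt")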