Hi Jake 👍 ,
Maybe the content is cached? The repo isn't big. I didn't realize the log was missing content. I believe I copied everything but I'll double check in a moment.
✨ It works ✨
Thanks @<1523701205467926528:profile|AgitatedDove14> 😁
Is there currently a way to bind the same GPU to multiple queues? I believe the agent complained the last time I tried (which was a while ago).
This is odd: the ordering of the files is different and some appear to be missing from the preview. But as far as I can tell the files aren't different. What am I missing here?
As far as I can tell there's nothing else running that isn't running on our hardware. Is there some way to see what application instances are active?
From the logs it looks like the HPO application finds a worker from the queue, attempts to serialize the config sent to the worker, and crashes because of the version conflict with Pyro4. But I don't think we control any of that. I might be misunderstanding something. 🙃
Thanks Martin. I read this method as "getting the data associated with the model training" not "getting metadata for the model". This is what I'm looking for.
Alright, I deleted everything in the ClearML web app, waited a day, and tried again. It seems to be showing a configuration object in the configuration section of the scheduler task again. I honestly don't know what changed. Maybe some strange caching on the server side that got cleaned up.
@<1523701205467926528:profile|AgitatedDove14> Question: Does the schedule_function option in the TaskScheduler.add_task() method run at the time the task is scheduled to execute? So if I pass a functi...
I found I was having this issue as well. I don't have an alias defined in the pipeline but in a task and I get the same error. I'm not hosting my own server but using the free web service at the moment.
Hi again @<1523701435869433856:profile|SmugDolphin23> ,
The approach you suggested seems to be working albeit with one issue. It does correctly identify the different versions of the dataset when new data is added, but I get an error when I try and finalize the dataset:
Code:
if self.task:
    # get the parent dataset from the project
    parent = self.clearml_dataset = Dataset.get(
        dataset_name="[LTV] Dataset",
        dataset_project=...
I will open a GitHub issue. Is this part open source? Could I make a PR?
In the meantime I still need to implement this with the current version of ClearML. So the only way would be to have one variable per parent? Is there any smarter way to work around it?
If I wanted to do this with the ID, how would I approach it?
Hi @<1523701087100473344:profile|SuccessfulKoala55> - We tried deleting some additional hyperparameter tuning runs, but it doesn't seem to have affected the stored metrics. It's not clear to me what is occupying all the metric storage space.
We have a server with many agents running on it, because a single agent doesn't use all the resources available on the server and training can often be run across several agents.
@<1539780284646428672:profile|PoisedElephant79> Are you sure you're not simply referring to the get operation? That seems to exclude archived datasets. But I don't see anything like that for the list_datasets operation.
I had 2 datasets on archive and 0 unarchived. When I ran the following command:
Dataset.list_datasets(dataset_project=self.task.get_project_name(), only_completed=True)
It returned two entries for the two datasets I had on archive.
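A minimal sketch of how the archived entries could be dropped on the client side afterwards. It assumes each returned entry is a dict with an "id" key and that the IDs of the archived datasets are already known from somewhere else; both are assumptions, and `drop_archived` is a hypothetical helper, not part of ClearML:

```python
from typing import Dict, Iterable, List, Set


def drop_archived(entries: Iterable[Dict], archived_ids: Set[str]) -> List[Dict]:
    """Return only the entries whose "id" is not in archived_ids."""
    return [entry for entry in entries if entry.get("id") not in archived_ids]


# Made-up entries standing in for what Dataset.list_datasets() returned.
entries = [
    {"id": "aaa111", "name": "dataset-a"},
    {"id": "bbb222", "name": "dataset-b"},
]

# Hypothetical: both IDs are known to belong to archived datasets.
remaining = drop_archived(entries, archived_ids={"aaa111", "bbb222"})
print(remaining)  # prints []
```

This only post-filters the result; it doesn't change what the list operation itself returns.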
Do you start the clearml agents on the server with the same user that has the credentials saved?
I've recently run into this error myself. Did you find any resolution?
@<1523701435869433856:profile|SmugDolphin23> Yes. I'll try it in about 14 hours when I'm back at work and let you know how it goes. 😂
The git credentials are stored in the agent config, and they worked when I tested them on another project (not for setting up the environment, but for downloading the repo of the task itself).
@<1523701070390366208:profile|CostlyOstrich36> ClearML: 1.10.1, I'm not self-hosting the server so whatever the current version is. Unless you mean the operating system?
@<1523701435869433856:profile|SmugDolphin23> Good to know.
Will this return a list of datasets?
You might want to start with the first steps guide then.
Oh, duh. I'll test that out. But I did have agent.force_git_ssh_protocol: true set.