
No error. Just a new task each time.
Awesome! Did you manage to solve the tailscale issue with ClearML sessions? Sorry I wasn't active with that. I don't use sessions often and I found a suitable alternative in the short term. Any hopes of the changes making their way to a PR for the official release?
I've recently run into this error myself. Did you find any resolution?
Well, if I stop the cron service and start it back up I don't have to re-register each schedule. If, for instance, I start the TaskScheduler, register a task, and stop the scheduler, how do I restart the TaskScheduler in a way that re-registers the tasks? In theory, they could have been registered by several users, and I might be unaware of tasks that were previously scheduled. What is the best practice for preserving state?
@<1523701070390366208:profile|CostlyOstrich36> Just pinging you 😄
I see. Thanks for the insight. That seems to be the case. I'm struggling a bit with datasets, for example when I want to trace the genealogy of a dataset that's used by traditional tasks and pipelines. I'll try and write something up about the challenges around that when I get the chance. But your comment revealed another issue:
It appears that the partial name matching isn't going well. I'm unclear why this wouldn't be matching. In the attached photo you can see the input for `partial_nam...
Alright, I'll try and put that together for Monday.
The original file sizes are the same but the compressed sizes seem to be different.
@<1539780284646428672:profile|PoisedElephant79> Are you sure you're not simply referring to the get operation? That seems to exclude archived datasets. But I don't see anything like that for the list_datasets operation.
Thanks again for the info. I might experiment with it to see first hand what the advantages are.
I'm not self-hosting the server.
So far, when I use the web interface to delete a task or dataset that has artifacts on S3, it doesn't prompt me for credentials.
Hi again Eugen,
If I use the hyperparameter tool in ClearML, won't that create a different experiment for every step of the hyperparameter-optimizer? So this will be run across experiments. I could do something with pipelines but since the metrics are already available in the ClearML hyperparameter/metric tables I thought it would make sense to be able to plot against those values.
Sorry I disappeared (went on a well-deserved vacation). The problem is caused by the install order. If I install using `pip install -r ./requirements.txt`, then pip installs the packages in the order of the requirements file. However, when ClearML runs the installation, it installs the packages in order UNLESS a custom path is provided, in which case that package is saved for last. This breaks my code because I have later packages that depend on the custom packages, as ...
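To make the ordering problem concrete, here is a minimal sketch of the behavior I'm describing. It is not ClearML's actual code: `is_custom_path`, `deferred_order`, and the package names are all hypothetical, and the rule for what counts as a "custom path" is my assumption.

```python
from typing import List


def is_custom_path(requirement: str) -> bool:
    """Heuristic (an assumption): local paths and direct URLs count as custom-path entries."""
    req = requirement.strip()
    return req.startswith((".", "/", "git+", "http://", "https://", "file:")) or " @ " in req


def deferred_order(requirements: List[str]) -> List[str]:
    """Keep the file order, but move custom-path entries to the end."""
    regular = [r for r in requirements if not is_custom_path(r)]
    custom = [r for r in requirements if is_custom_path(r)]
    return regular + custom


requirements = [
    "numpy",
    "./libs/my_custom_pkg",   # hypothetical local package
    "depends-on-custom-pkg",  # hypothetical package that needs the one above
]

# The dependent package now comes before the custom package it needs.
print(deferred_order(requirements))
```

With plain `pip install -r`, the list would be processed top to bottom, so the custom package would be installed before the package that depends on it.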
Provide a bit more detail. What framework are you using?
This does appear to resolve the issue. I'll keep you updated if I find any other issues. Thanks @<1523701435869433856:profile|SmugDolphin23>
Depending on the framework you're using, it'll just hook into the save model operation. That means it fires every time you save a model, which will probably happen every epoch for some subset of the training. If you want to do it with the existing framework, you could change the checkpoint so that it only clones the best model in memory and defers the write operation until the end. The risk with this is that if the training crashes, you'll lose your best model.
Optionally, you could also disable the ClearML integration with...
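A rough sketch of the "keep the best model in memory, write once at the end" idea, framework-agnostic and using only the standard library. The `train` function, the fake losses, and the dict standing in for model weights are all placeholders; with a real framework you'd clone its state object instead and save with its own save call.

```python
import copy
import pickle
import tempfile
from pathlib import Path


def train(epoch_losses, checkpoint_path):
    """Track the best state in memory and write it to disk once, after the loop."""
    best_loss = float("inf")
    best_state = None
    for epoch, loss in enumerate(epoch_losses):
        # Stand-in for the model state produced by this epoch.
        state = {"epoch": epoch, "weights": [loss * 0.5]}
        if loss < best_loss:
            best_loss = loss
            best_state = copy.deepcopy(state)  # clone in memory, no disk write
    # Single save at the very end. If training crashes earlier, nothing is saved.
    if best_state is not None:
        with open(checkpoint_path, "wb") as f:
            pickle.dump(best_state, f)
    return best_state


path = Path(tempfile.mkdtemp()) / "best_model.pkl"
best = train([0.9, 0.4, 0.6], path)
print(best["epoch"])
```

Since there is only one save call, anything hooking into the save operation would see a single model instead of one per epoch.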
@<1539780284646428672:profile|PoisedElephant79> Sorry for not getting back with this sooner. Dataset.get() doesn't work like you suggested. In the documentation it's clear:
Get a specific Dataset. If multiple datasets are found, the dataset with the highest semantic version is returned. If no semantic version is found, the most recently updated dataset is returned. This functions raises an Exception in case no dataset can be found and the `auto_create=True` ...
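As I read that docstring, the selection rule amounts to something like the sketch below. This is my interpretation, not the library's implementation: `parse_semver`, `pick_dataset`, and the record layout are hypothetical, and how versioned and unversioned datasets are mixed is an assumption.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def parse_semver(version: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch), or None if the string is not MAJOR.MINOR.PATCH."""
    if not version:
        return None
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def pick_dataset(candidates: List[dict]) -> dict:
    """Highest semantic version wins; otherwise fall back to the most recent update."""
    if not candidates:
        raise ValueError("no dataset found")
    versioned = [c for c in candidates if parse_semver(c.get("version")) is not None]
    if versioned:
        return max(versioned, key=lambda c: parse_semver(c["version"]))
    return max(candidates, key=lambda c: c["last_update"])


candidates = [
    {"id": "a", "version": "1.2.0", "last_update": datetime(2023, 5, 1)},
    {"id": "b", "version": "1.10.0", "last_update": datetime(2023, 4, 1)},
    {"id": "c", "version": None, "last_update": datetime(2023, 6, 1)},
]

# Versions compare numerically, so 1.10.0 ranks above 1.2.0.
print(pick_dataset(candidates)["id"])
```

Either way, a single dataset comes back, which is why it doesn't behave like a listing call.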
From the logs it looks like the HPO application finds a worker from the queue, attempts to serialize the config sent to the worker, and crashes because of the version conflict with Pyro4. But I don't think we control any of that. I might be misunderstanding something. 🙃
That makes sense. I was confused about what the source was.
I'd like to provide the credentials to any EC2 instances that are spun up.
That's what I was getting at. It wasn't clear to me from the documentation that it saves the state.
They will be related through the task. Get the task information from the dataset, then get the model information from the task.
Yeah, it's because it's just hooking into the save operation and capturing the output, regardless of the parent call.
Oh, duh. I'll test that out. But I did have `agent.force_git_ssh_protocol: true` set.
The git credentials are stored in the agent config, and they worked when I tested them on another project (not for setting up the environment, but for downloading the repo of the task itself).