Eureka! That's what I was getting at. It wasn't clear to me from the documentation that it saves the state.
This doesn't really make a lot of sense. ClearML is better suited to tracking which version of the code you used for a corresponding task; you'd use something like GitHub or GitLab to track and host the code itself. You could then use ClearML to help you reconstruct the environment and code for a task, given that the code is tracked by git and hosted somewhere you can access.
Actually, this is not how it works: pip will install in any way it sees fit, and the order is not consistent between pip versions (it has to do with dependency resolution).
Oh I see. What a pain. 🤣
You can configure the agent to first install specific packages, and only then the others; just add the package names here:
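For reference, a minimal sketch of the relevant clearml.conf section, assuming the priority_packages key from the agent config reference (my-custom-package is a placeholder; verify the key against your agent version):

agent {
    package_manager {
        # packages listed here are installed before the rest of the requirements
        priority_packages: ["cython", "numpy", "setuptools", "my-custom-package"]
    }
}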
That's an interesting solution. I'll keep that in mind as I work more with ClearML.
Thanks for your help Martin!
This turns out to be a layer-8 error. task.execute_remotely does work, but there was a bug in my code: I wasn't correctly setting the reuse_task flag when run. Sorry to bother you both with my mistake.
This is odd: the ordering of the files is different, and some files appear to be missing from the preview. But as far as I can tell the files aren't different. What am I missing here?
The original file sizes are the same but the compressed sizes seem to be different.
Alright, I tried testing it out by commenting out the code that generates new CSVs, so for successive runs the CSVs are identical. However, when I use dataset.add_files() it still generates a new version of the dataset.
from clearml import Dataset  # import shown for context

# log the data to ClearML if a task is passed
if self.task:
    self.clearml_dataset = Dataset.create(dataset_name="[LTV] Dataset")
    self.clearml_dataset.add_files(path=save_path, verbose=True)
    if self.tags is not None:
        ...
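For comparison, a minimal sketch of creating the new version against an explicit parent, so the file-hash comparison has something to diff against (dataset names are illustrative; parent_datasets and only_completed are documented arguments of Dataset.create and Dataset.get):

from clearml import Dataset

# fetch the latest completed version to use as the parent
parent = Dataset.get(dataset_name="[LTV] Dataset", only_completed=True)

# create the child version; file hashes are compared against the parent
child = Dataset.create(
    dataset_name="[LTV] Dataset",
    parent_datasets=[parent.id],
)
child.add_files(path=save_path, verbose=True)  # unchanged files should be skipped
child.upload()
child.finalize()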
Thanks for the reply @<1523701070390366208:profile|CostlyOstrich36> !
It says in the documentation that:
Add a folder into the current dataset. calculate file hash, and compare against parent, mark files to be uploaded
It seems to recognize the dataset as another version of the data, but doesn't seem to be validating the hashes on a per-file basis. Also, if you look at the photo, it seems like some of the data does get recognized as the same as the prior data. It seems like it's the correct...
@<1539780284646428672:profile|PoisedElephant79> Are you sure you're not simply referring to the get operation? That seems to exclude archived datasets. But I don't see anything like that for the list_datasets operation.
@<1523701435869433856:profile|SmugDolphin23> Yes. I'll try it in about 14 hours when I'm back at work and let you know how it goes. 😂
Thanks Martin. I read this method as "getting the data associated with the model training" not "getting metadata for the model". This is what I'm looking for.
I will open a GitHub issue. Is this part open source? Could I make a PR?
In the meantime I still need to implement this with the current version of ClearML. So the only way would be to have one variable per parent? Is there a smarter way to work around it?
✨ It works ✨
Thanks @<1523701205467926528:profile|AgitatedDove14> 😁
@<1523701070390366208:profile|CostlyOstrich36> ClearML: 1.10.1, I'm not self-hosting the server so whatever the current version is. Unless you mean the operating system?
@<1523701435869433856:profile|SmugDolphin23> Good to know.
Oh, I get what's happening. That segment of the code is rerun when the task is enqueued remotely, so it's deleting itself. This also explains why it works fine locally. It's an ouroboros: the task is deleting itself.
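A minimal sketch of the guard that would avoid this, assuming the cleanup call lives in your own code (delete_old_tasks is a hypothetical stand-in):

from clearml import Task

# only run the cleanup on the local pass; the remote clone skips it
if Task.running_locally():
    delete_old_tasks()  # hypothetical cleanup helper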
I figured as much. This is basically what I was planning to do otherwise. I have questions around that.
- It appears that the 'extra' config is displayed in plain text on the web app and is downloadable as JSON. I was just curious whether this is best practice.
- I noticed in the AWS instance that's spun up when starting the autoscaler there's 3 settings in the config:
use_credentials_chain: false, use_iam_instance_profile: false, use_owner_token: false
are these strictly for the credentials t...
There are no issues when I run the "raw" script. Also, since it's based on tasks, the code must have run without fault for it to be pulled as a task in the pipeline.
As for when it fails: looking at the log here, it looks like it's on the first task, or maybe as the first task is launching. But I'd have to go back to be sure. I rolled back to 1.13.1 and that's working fine. But if you want, I can help explore this bug in detail, because it would be nice to find the root of the issue. LmK what y...
Interesting approach. I'll give that a try. Thanks for the reply!
I made a video of the Scheduler config error. You can see that the same code works when run locally and fails on remote. (I just uploaded the video, so the quality might suffer until YT finishes processing the higher-resolution versions.)
Sorry I disappeared (went on a well-deserved vacation). The problem is happening because of the ordering of the install. If I install using pip install -r ./requirements.txt
then pip installs the packages in the order of the requirements file. However, during the installation process from ClearML, it installs the packages in order UNLESS there's a custom path provided, in which case it's saved for last. The reason this breaks my code is that I have later packages that depend on the custom packages, as ...
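To illustrate the ordering issue (package names are made up):

# requirements.txt, in the order pip would normally install it
numpy==1.24.0
git+https://github.com/example/custom-pkg.git    # custom path: the agent defers this to the end
downstream-pkg==2.0.0    # needs custom-pkg at install time, so the deferred install breaks it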
Well, if I stop the cron service and start it back up, I don't have to re-register each schedule. If, for instance, I start the TaskScheduler, register a task, and stop the scheduler, how do I restart the TaskScheduler in a way that re-registers the tasks? Because, in theory, they could be registered from several users, and I might be unaware of tasks that were previously scheduled. What is the best practice for preserving state?
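For reference, the registration pattern in question, as a minimal sketch (the task ID and queue names are illustrative); running the scheduler remotely is one way to keep its state with the scheduler task rather than in a local process:

from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="abc123",  # hypothetical ID of the task to clone and run
    queue="default",
    hour=2, minute=30,          # run daily at 02:30
)
# launching as a remote (services) task keeps the schedule with the scheduler task
scheduler.start_remotely(queue="services")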
Hi again Eugen,
If I use the hyperparameter tool in ClearML, won't that create a different experiment for every step of the hyperparameter optimizer? So this will run across experiments. I could do something with pipelines, but since the metrics are already available in the ClearML hyperparameter/metric tables, I thought it would make sense to be able to plot against those values.
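For context, that per-step behavior in a minimal sketch: each sampled point is cloned into its own task, i.e., a separate experiment (the base task ID, parameter name, and ranges are illustrative; the Optuna backend assumes optuna is installed):

from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

optimizer = HyperParameterOptimizer(
    base_task_id="abc123",  # hypothetical template experiment
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,
)
optimizer.start()  # every trial becomes a separate cloned experiment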
Hyperdatasets are the only ones that require a premium. If you're using normal datasets it should be fine.
It sounds like you didn't set up your config. Did you ever initialize clearml?
You might want to start with the first steps guide then:
None
I actually ran into the exact same problem. The agents aren't hosted on AWS though, just an in-house server.
Awesome! Did you manage to solve the Tailscale issue with ClearML sessions? Sorry I wasn't active with that; I don't use sessions often, and I found a suitable alternative in the short term. Any hope of the changes making their way into a PR for the official release?
In this case it's the ID of the "output" model from the first task.
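A minimal sketch of pulling that ID, assuming the first task has finished and registered its model (the task ID is illustrative):

from clearml import Task

first_task = Task.get_task(task_id="abc123")   # hypothetical first task
model_id = first_task.models["output"][-1].id  # ID of the last registered output model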