That's what I was getting at. It wasn't clear to me from the documentation that it saves the state.
No error. Just a new task each time.
@<1539780284646428672:profile|PoisedElephant79> Sorry for not getting back to you on this sooner. Dataset.get() doesn't work the way you suggested. The documentation is clear on this:
Get a specific Dataset. If multiple datasets are found, the dataset with the highest semantic version is returned. If no semantic version is found, the most recently updated dataset is returned. This function raises an Exception in case no dataset can be found and the `auto_create=True` ...
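For context, this is roughly the call I'm making (the project and dataset names here are placeholders):

```python
from clearml import Dataset

# Roughly the call I'm making (project/name are placeholders).
# If nothing matches and auto_create is not set, this raises an exception
# rather than returning None.
ds = Dataset.get(
    dataset_project="my_project",
    dataset_name="my_dataset",
)
print(ds.id, ds.name)
```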
Thanks for the reply @<1523701070390366208:profile|CostlyOstrich36> !
It says in the documentation that:
Add a folder into the current dataset. Calculate file hash, and compare against parent, mark files to be uploaded
It seems to recognize the dataset as another version of the data, but it doesn't seem to be validating the hashes on a per-file basis. Also, if you look at the photo, some of the data does get recognized as the same as the prior data. It seems like it's the correct...
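To show what I mean, this is a sketch of the flow I'm using (paths and names are placeholders):

```python
from clearml import Dataset

# Create a child version from the existing dataset and add the same folder
# again. Per the docstring quoted above, file hashes should be compared
# against the parent so unchanged files are not re-uploaded.
parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
child = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)
child.add_files(path="/data/my_dataset")  # hash comparison should happen here
child.upload()
child.finalize()
```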
This is odd: the ordering of the files is different, and some appear to be missing from the preview. But as far as I can tell the files aren't different. What am I missing here?
Actually, clearing the cache on the other project might have fixed it. I just tested it out and it seems to be working.
@<1523701205467926528:profile|AgitatedDove14>
And the Task is still running? What's the clearml python version and webui version?
No, the task stops (it's running remotely; I haven't tested it running locally).
They will be related through the task. Get the task information from the dataset, then get the model information from the task.
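Something along these lines (a rough sketch; the ids are placeholders, and it assumes a recent clearml version where the dataset id is also the id of the task backing it):

```python
from clearml import Dataset, Task

# Rough sketch, assuming the dataset id maps to its backing task.
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
dataset_task = Task.get_task(task_id=ds.id)

# For the task that trained on the dataset, the models hang off the task object.
training_task = Task.get_task(task_id="<training_task_id>")
for model in training_task.models["output"]:
    print(model.name, model.id)
```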
I will add a gh issue. Is this part open source? Could I make a PR?
In the meantime I still need to implement this with the current version of ClearML. So the only way would be to have one variable per parent? Is there any smarter way to work around it?
Since this could happen with a lot of services, maybe it would be worth a retry option? Especially if it's part of a pipeline.
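As a stopgap, this is roughly what I end up wrapping around the flaky call (just a sketch; the names are illustrative):

```python
import time
from functools import wraps


def with_retries(attempts=3, delay=5.0):
    """Retry a flaky call a few times with a growing pause between attempts."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:  # ideally narrowed to the relevant errors
                    last_exc = exc
                    time.sleep(delay * (attempt + 1))
            raise last_exc
        return wrapper
    return decorator
```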
After some digging we found it was actually caused by the router's IPS protection. I thought it would be strange for GitHub to be throttling things at this scale.
Hi @<1523701435869433856:profile|SmugDolphin23>
I'm a bit confused by your suggestion. To be clear, these are the logs from the HPO application instance that's spun up when you start the HPO process. I don't think we have any control over what python version or Pyro version is started in the application instance, and I think this error occurs before any code on our end is run.
I just checked the clearml.conf and I'm not specifying any version of python for the agents.
Provide a bit more detail. What framework are you using?
This does appear to resolve the issue. I'll keep you updated if I find any other issues. Thanks @<1523701435869433856:profile|SmugDolphin23>
That makes sense. I was confused about what the source was.
I'm not self-hosting the server.
Thanks again for the info. I might experiment with it to see first hand what the advantages are.
I actually ran into the exact same problem. The agents aren't hosted on AWS though, just an in-house server.
I made a video of the Scheduler config error. You can see that the same code run locally works and doesn't on remote. (I just uploaded the video so the quality might suffer until YT finishes processing the higher resolution versions).
@<1523701205467926528:profile|AgitatedDove14> Then it isn't working as intended. To test it I started the scheduler and set a simple dead man snitch process to run once a day. In the web app (on your site, app.clear.ml), when looking at the scheduler process in the DevOps section, I was able to see a configuration file under artifacts, but it was not at all obvious how you'd change it, because it wasn't part of the configuration section, it was just an artifact. So I thought maybe it was b...
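For reference, the test setup was roughly this (task id and queues are placeholders; exact schedule arguments as per the TaskScheduler docs):

```python
from clearml.automation import TaskScheduler

# Roughly my test setup: clone and enqueue a simple "ping the dead man snitch"
# task once a day.
scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<snitch_task_id>",
    queue="default",
    hour=6,
    minute=0,
)
scheduler.start_remotely(queue="services")
```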
Project 2:
2024-01-22 17:21:56
task 6518c3cd13394aa4abbc8f0dc34eb763 pulled from 8a69a982f5824762aeac7b000fbf2161 by worker bigbrother:10
2024-01-22 17:22:03
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.bojpliyx.cfg):
----------------------
agent.worker_id = bigbrother:10
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_v...
I see. Thanks for the insight. That seems to be the case. I'm struggling a bit with datasets, for example when trying to trace the genealogy of a dataset that's used by traditional tasks and pipelines. I'll try and write something up about the challenges around that when I get the chance. But your comment revealed another issue:
It appears that the partial name matching isn't working as expected. I'm unclear on why this isn't matching. In the attached photo you can see the input for `partial_nam...
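For reference, this is roughly the call from the screenshot (the project and name fragment are placeholders):

```python
from clearml import Dataset

# I expected partial_name to match any dataset whose name contains this string.
matches = Dataset.list_datasets(
    dataset_project="my_project",
    partial_name="my_dat",
)
for entry in matches:
    print(entry["name"], entry["id"])
```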
Interesting approach. I'll give that a try. Thanks for the reply!
The original file sizes are the same but the compressed sizes seem to be different.