✨ It works ✨
Thanks @<1523701205467926528:profile|AgitatedDove14> 😁
Interesting approach. I'll give that a try. Thanks for the reply!
I made a video of the Scheduler config error. You can see that the same code works when run locally but not when run remotely. (I just uploaded the video, so the quality might suffer until YT finishes processing the higher-resolution versions.)
Ah, I think I see the issue. In my head I was crossing ID with URL.
This is odd, the ordering of the files is different and there appears to be some missing from the preview. But as far as I can tell the files aren't different. What am I missing here?
I figured you'd say that so I went ahead with that PR. I got it working but I'm going to test it a bit further.
Thanks for always checking in @<1523701087100473344:profile|SuccessfulKoala55> 😛
Thanks Eugen for the quick reply. If I can add a suggestion/comment from my perspective: Why is `schedule_function` included in the `.add_task()` method? As far as I can tell, if you use `schedule_function` it changes the very nature of the method: it's no longer adding a task but adding a function. It seems like it would make more sense if this was broken into something like an `.add_function()` method. Also, if you call `schedule_function` many of the other parameters in `.add...
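To make the suggestion concrete, here is a toy sketch of what separate entry points could look like. `ToyScheduler`, `ScheduledJob`, and their parameters are hypothetical names made up for illustration; this is not ClearML's scheduler API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ScheduledJob:
    # Hypothetical record: exactly one of task_id / func is set.
    name: str
    minute: int
    task_id: Optional[str] = None
    func: Optional[Callable[[], None]] = None


@dataclass
class ToyScheduler:
    # Hypothetical scheduler used only to illustrate the API split.
    jobs: List[ScheduledJob] = field(default_factory=list)

    def add_task(self, name: str, task_id: str, minute: int) -> None:
        # Only accepts parameters that make sense for an existing task.
        self.jobs.append(ScheduledJob(name=name, minute=minute, task_id=task_id))

    def add_function(self, name: str, func: Callable[[], None], minute: int) -> None:
        # Only accepts parameters that make sense for a callable.
        if not callable(func):
            raise TypeError("func must be callable")
        self.jobs.append(ScheduledJob(name=name, minute=minute, func=func))
```

With this split, each method's signature only lists parameters that apply to it, so there is no combination of arguments that silently gets ignored.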
The original file sizes are the same but the compressed sizes seem to be different.
Strange, the code seems to work perfectly when I run it locally. To make it more confusing, the queue that I enqueue it to when I run it remotely is using agents from the same server that I'm running it locally from.
Yes, I'm experimenting with this. I actually wrote my own process to do this so I just had to adapt it as a callable to pass to the scheduler. However, I'm running into an issue and I don't think this is a user error this time. When I start the scheduler, it starts running and shows up in the web-app, but then an error message pops up in the web-app, `Fetch parents failed`, and the Scheduler task disappears from the web-app. I can't even see an error log because the task is gone.
I'm running th...
I might have found the answer. I'll reply if it works as expected.
Sure. I'm in Europe but we can also test things async.
No error. Just a new task each time.
Hi @<1523701087100473344:profile|SuccessfulKoala55> - We tried to delete some additional hyperparameter tunings, but it doesn't seem to have affected the stored metrics. It's not clear to me what is occupying all the metric storage space.
I have manually verified that the line-by-line content of the CSV files is identical using `hashlib.sha256()`. Why would it be that the file content is the same and the files are generated by the same process (literally just rerunning the same code twice), but ClearML treats them differently?
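For reference, a minimal sketch of this kind of check, assuming two local files. The helper names are illustrative and not part of ClearML, and this only compares raw file bytes, not how either file is packaged or compressed afterwards.

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of the whole file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def line_hashes(path):
    """Return one SHA-256 hex digest per line (line endings included)."""
    with open(path, "rb") as f:
        return [hashlib.sha256(line).hexdigest() for line in f]


def first_differing_line(path_a, path_b):
    """Return the 1-based number of the first differing line, or None if identical."""
    a, b = line_hashes(path_a), line_hashes(path_b)
    for number, (hash_a, hash_b) in enumerate(zip(a, b), start=1):
        if hash_a != hash_b:
            return number
    if len(a) != len(b):
        # One file is a strict prefix of the other.
        return min(len(a), len(b)) + 1
    return None
```

If `file_sha256` matches for both files, their bytes are identical, so any remaining difference would have to come from something outside the file content itself.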
@<1523701435869433856:profile|SmugDolphin23> I spoke too soon. It does resolve the error I posted but it introduces a new error. While this error does seem to be related to VS Code, the strange thing is it doesn't occur if I run it with earlier versions of `clearml`.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/natephysics/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendo...
That's what I was getting at. It wasn't clear to me from the documentation that it saves the state.
Sounds good. Lmk if there's some changes that are required.
Hyperdatasets are the only ones that require a premium. If you're using normal datasets it should be fine.
They will be related through the task. Get the task information from the dataset, then get the model information from the task.
It's even attempting to install omegaconf but not from the repo, likely because it's a dependency of hydra-colorlog.
Collecting omegaconf<2.4,>=2.2
Using cached omegaconf-2.2.3-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.2-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.1-py3-none-any.whl (78 kB)
Is it possible the cached repository was cloned before you changed your agent settings?
Which settings are you referring to? I can't remember if I was using https auth when the project would have been first cached. Would that make a difference?
Also, did you set `agent.enable_git_ask_pass: true`?
The only instance of it in the config is commented out.
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves pas...
Actually, this is not how it works: pip will install in any way it sees fit, and its behavior is not consistent between versions (it has to do with dependency resolution).
Oh I see. What a pain. 🤣
You can configure the agent to first install specific packages, and only then others, just add the package names here:
That's an interesting solution. I'll keep that in mind as I work more with ClearML.
Thanks for your help Martin!
It seems that the error is related to this part of the code block. However, when I comment this out I get the error I had 2 days ago with the missing configuration object.
This does appear to resolve the issue. I'll keep you updated if I find any other issues. Thanks @<1523701435869433856:profile|SmugDolphin23>
Maybe the sleep between `scheduler.mark_completed()` and `scheduler.delete()` is too short? But I don't get why deleting the old scheduler task would break the new scheduler. I'm going to try testing by running the scheduler locally.
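One way to rule out the fixed sleep as the culprit is to poll for a condition with a timeout instead. A minimal sketch, assuming some callable that reports whether the old task has reached the expected state; `wait_until` and the predicate are hypothetical helpers, not ClearML functions.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Call predicate() until it returns a truthy value or the timeout expires.

    Returns True if the predicate became truthy, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage: `old_task_is_completed` is a placeholder for whatever
# check reports that the previous scheduler task has finished.
#
# if wait_until(old_task_is_completed, timeout=120):
#     ...  # safe to delete the old task here
```

This way the delete only runs once the completed state is actually observed, rather than after a guessed delay.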
Alright, I fixed the issue with the scheduler eating itself. But now I'm still getting the same bug as two days ago. So the Scheduler process starts fine and doesn't "crash." But I don't get the config object in the web-app again. It seems to work if I run it locally.
To answer your earlier question, I'm using the app.clear.ml portal, so:
- WebApp: 3.20.1-1525
- Server: 3.20.1-1299
- API: 2.28
- And my Python ClearML version: 1.14
@<1523701087100473344:profile|SuccessfulKoala55> You wouldn't happen to know what's going on here. :D