Reputation
Badges 1
98 × Eureka!That behavior seems strange. In the pipeline in the clearML pagem if you click on one of the steps and select full details (see attached) you can see the commit ID and the branch. Can you validate that the branch is correct but the commit ID is incorrect?
That's great! I look forward to trying this out.
Is there currently a way to bind the same GPU to multiple queues? I believe the agent complains last time I tried (which was a bit ago).
No error. Just a new task each time.
This doesn't really make a lot of sense. ClearML would be better served for tracking which version of the code you used for a corresponding task and you'd use something like github or gitlab to track code and host your code. You could use ClearML to help you reconstruct the environment and code from a task given it's being tracked by git and hosted somewhere you can access.
Sorry I disappeared (went on a well deserved vacation). The problem is happening because of the ordering of the install. If I install using pip install -r ./requirements.txt
then pip installs the packages in the order of the requirements file. However, during the installation process from ClearML, it installs the packages in order UNLESS there's a custom path provided, then it's saved for last. The reason this breaks my code is I have later packages that depend on the custom packages, as ...
@<1523701205467926528:profile|AgitatedDove14>
And the Task is still running? What's he clearml python version and webui version ?
No, the task stops (it's running remote, I haven't tested it running local).
There is no issues when I run the "raw" script. Also, since it's based on tasks, the code must have run without fault for it to be pulled as a task in the pipeline.
As for when it fails, looking at the log here it looks like it's on the first task or maybe as the first task is launching. But I'd have to go back to be sure. I rolled back to 1.13.1 and that's working fine. But, if you want I can help explore this bug in detail because it would be nice to find the root of the issue. LmK what y...
They will be related through the task. Get the task information from the dataset, then get the model information from the task.
Alright, I deleted everything in the ClearML web-app waited a day tried again, it seems to be showing a configuration object in the configuration section of the scheduler task again. I honestly don't know what changed. Maybe some strange caching on the server side that got cleaned up.
@<1523701205467926528:profile|AgitatedDove14> Question: Does the schedule_function
option in the TaskScheduler.add_task()
method run at the time the task is scheduled to execute? So if I pass a functi...
Alright, I fixed the issue with the scheduler eating itself. But now I'm still getting the same bug as two days ago. So the Scheduler process starts fine and doesn't "crash." But I don't get the config object in the web-app again. It seems to work if I run it locally.
To answer your earlier question, I'm using the app.clear.ml
portal so
- WebApp: 3.20.1-1525
- Server: 3.20.1-1299
- API: 2.28
- And my Python ClearML version: 1.14
The answer is simple but also not completely obvious to someone new to the platform. So you can inject new command line args that hydra will recognize. This is what the Hydra section of args is for. However, if you enable _allow_omegaconf_edit_: True
, I think ClearML will βinjectβ the OmegaConf saved under the configuration object of the prior run, overwriting the overrides. Iβll experiment with this behavior a bit more to be sure.
I might have found the answer. I'll reply if it works as expected.
It seems that the error is related to this part of the code block. However, when I comment this out I get the error I had 2 days ago with the missing configuration object.
β¨ It works β¨
Thanks @<1523701205467926528:profile|AgitatedDove14> π
Yes, I'm experimenting with this. I actually wrote my own process to do this so I just had to adapt it as a callable to pass to the scheduler. However, I'm running into an issue and I don't think this is a user error this time. When I start the scheduler, it starts running, shows up in the web-app, but then an error message in the web-app pops up Fetch parents failed
and the Scheduler task disappears from the web-app. I can't even see an error log because the task is gone.
I'm running th...
Strange, the code seems to work perfectly when I run it locally. To make it more confusing, the queue that I enqueue it to when I run it remotely is using agents from the same server that I'm running it locally from.
Thanks again for the info. I might experiment with it to see first hand what the advantages are.
Thanks Martin. I read this method as "getting the data associated with the model training" not "getting metadata for the model". This is what I'm looking for.
Thanks for your reply @<1523701070390366208:profile|CostlyOstrich36> Is there an example where a pipeline is built from existing tasks? I'd like to experiment with it and I don' t see any examples of what you describe with my (clearly lacking) google-fu. What happens if you wrap a function with a task.init() with a pipeline decorator or is that the process you're speaking of?
Hi Jake π ,
Maybe the content is cached? The repo isn't big. I didn't realize the log was missing content. I believe I copied everything but I'll double check in a moment.
Thanks, that's exactly what I was looking for.
Sure. I'm in Europe but we can also test things async.
Oh, I get what's happening. That segment of the code is rerun when the task is enqueued remotely. So it's deleting itself. This also explains why it works fine locally. It's an ouroboros, the task is deleting itself.
I found I was having this issue as well. I don't have an alias defined in the pipeline but in a task and I get the same error. I'm not hosting my own server but using the free web service at the moment.
Is it possible the cached repository was cloned before you changed your agent settings?
Which settings are you referring to? I can't remember if I was using https auth when the project would have been first cached. Would that make a difference?
Also, did you set
agent.enable_git_ask_pass: true
?
The only instance of it in the config is commented out.
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves pas...
Results:
I first tried uncommenting enable_git_ask_pass: false
but it didn't resolve the issue.
I then cleared the cache in the vcs-cache
folder, and that did fix the issue. This is the second time the cache seemed to have been the root cause of the problem. At some point I did move from token-based auth to ssh keys. Would this require clearing the cache for any project that was cached prior to the auth change?