
Reputation
Badges 1
108 × Eureka!Strange, the code seems to work perfectly when I run it locally. To make it more confusing, the queue that I enqueue it to when I run it remotely is using agents from the same server that I'm running it locally from.
I figured you'd say that so I went ahead with that PR. I got it working but I'm going to test it a bit further.
Alright, I fixed the issue with the scheduler eating itself. But now I'm still getting the same bug as two days ago. So the Scheduler process starts fine and doesn't "crash." But I don't get the config object in the web-app again. It seems to work if I run it locally.
To answer your earlier question, I'm using the app.clear.ml
portal so
- WebApp: 3.20.1-1525
- Server: 3.20.1-1299
- API: 2.28
- And my Python ClearML version: 1.14
Sure. I'm in Europe but we can also test things async.
If I wanted to do this with the ID, how would I approach it?
Hi @<1523701205467926528:profile|AgitatedDove14> . I think I'm misunderstanding something here. I have the scheduler service running. Now that it's running how does one add a new task or remove an existing task from the scheduler? I get that I can add them before starting the scheduler service but once the service is running is there any way to connect to it and change the schedule?
I thought the advantage of this service would be we could schedule tasks just by connecting to the existing t...
Interesting approach. I'll give that a try. Thanks for the reply!
Let me give that a try. Thanks for all the help.
I figured as much. This is basically what I was planning to do otherwise. I have questions around that.
- It appears that the 'extra' config is displayed in plain text on the web app and downloadable in json. I was just curious if this is best practices.
- I noticed in the AWS instance that's spun up when starting the autoscaler there's 3 settings in the config:
use_credentials_chain: false, use_iam_instance_profile: false, use_owner_token: False
are these strictly for the credentials t...
Thanks for your reply @<1523701070390366208:profile|CostlyOstrich36> Is there an example where a pipeline is built from existing tasks? I'd like to experiment with it and I don' t see any examples of what you describe with my (clearly lacking) google-fu. What happens if you wrap a function with a task.init() with a pipeline decorator or is that the process you're speaking of?
The answer is simple but also not completely obvious to someone new to the platform. So you can inject new command line args that hydra will recognize. This is what the Hydra section of args is for. However, if you enable _allow_omegaconf_edit_: True
, I think ClearML will “inject” the OmegaConf saved under the configuration object of the prior run, overwriting the overrides. I’ll experiment with this behavior a bit more to be sure.
I'm using pro. Sorry, for the delay, I didn't notice I never sent the response.
✨ It works ✨
Thanks @<1523701205467926528:profile|AgitatedDove14> 😁
In this case it's the ID of the "output" model from the first task.
This does appear to resolve the issue. I'll keep you updated if I find any other issues. Thanks @<1523701435869433856:profile|SmugDolphin23>
I might have found the answer. I'll reply if it works as expected.
I'm not sure why the logs were incomplete. I think part of the reason it wasn't pulling from the repo was that it was pulling from cache. I cleared the clearml cache for that project and reran it. This should be the full log.
I'm not self-hosting the server.
I actually ran into the exact same problem. The agents aren't hosted on AWS though, just a in-house server.
Hi again @<1523701435869433856:profile|SmugDolphin23> ,
The approach you suggested seems to be working albeit with one issue. It does correctly identify the different versions of the dataset when new data is added, but I get an error when I try and finalize the dataset:
Code:
if self.task:
# get the parent dataset from the project
parent = self.clearml_dataset = Dataset.get(
dataset_name="[LTV] Dataset",
dataset_project=...
You might want to start with the first steps guide then:
None
@<1523701070390366208:profile|CostlyOstrich36> ClearML: 1.10.1, I'm not self-hosting the server so whatever the current version is. Unless you mean the operating system?
@<1523701435869433856:profile|SmugDolphin23> Good to know.
This turns out to be a layer-8 error . task.execute_remotely
does work but there was a bug in my code and I wasn't correctly setting the reuse_task
flag when run. Sorry to bother the both of you with my mistake.
@<1523701205467926528:profile|AgitatedDove14> Then it isn't working at intended. To test it I started the scheduler and set a simple dead man snitch process to run once a day. In the web-app (on your site app.cleearml.ml), when looking at the scheduler process in the DevOps section, I was able to see a configuration file under artifacts but it was not as all obvious how you'd change that because it wasn't part of the configuration section, it was just an artifact. So I thought maybe it was b...
That's what I was getting at. It wasn't clear to me from the documentation that it saves the state.
Results:
I first tried uncommenting enable_git_ask_pass: false
but it didn't resolve the issue.
I then cleared the cache in the vcs-cache
folder, and that did fix the issue. This is the second time the cache seemed to have been the root cause of the problem. At some point I did move from token-based auth to ssh keys. Would this require clearing the cache for any project that was cached prior to the auth change?
There is no issues when I run the "raw" script. Also, since it's based on tasks, the code must have run without fault for it to be pulled as a task in the pipeline.
As for when it fails, looking at the log here it looks like it's on the first task or maybe as the first task is launching. But I'd have to go back to be sure. I rolled back to 1.13.1 and that's working fine. But, if you want I can help explore this bug in detail because it would be nice to find the root of the issue. LmK what y...