Will this return a list of datasets?
I found I was having this issue as well. I don't have an alias defined in the pipeline but in a task and I get the same error. I'm not hosting my own server but using the free web service at the moment.
It's even attempting to install omegaconf but not from the repo, likely because it's a dependency of hydra-colorlog.
Collecting omegaconf<2.4,>=2.2
Using cached omegaconf-2.2.3-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.2-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.1-py3-none-any.whl (78 kB)
I actually ran into the exact same problem. The agents aren't hosted on AWS though, just an in-house server.
I figured as much. This is basically what I was planning to do otherwise. I have questions around that.
- It appears that the 'extra' config is displayed in plain text on the web app and is downloadable as JSON. I was just curious whether this is best practice.
- I noticed that in the AWS instance that's spun up when starting the autoscaler there are three settings in the config:
use_credentials_chain: false, use_iam_instance_profile: false, use_owner_token: False
are these strictly for the credentials t...
@<1523701087100473344:profile|SuccessfulKoala55> You wouldn't happen to know what's going on here? :D
1707128614082 bigbrother:gpu0 INFO task 59d23c5919b04fd6947c1e463fa8c78c pulled from 9890a035b8f84872ab18d7ff207c26c6 by worker bigbrother:gpu0
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.vo_oc47r.cfg):
----------------------
agent.worker_id = bigbrother:gpu0
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = ...
It's verbatim from requirements as I pass that into ClearML.
The answer is simple but also not completely obvious to someone new to the platform. So you can inject new command line args that hydra will recognize. This is what the Hydra section of args is for. However, if you enable _allow_omegaconf_edit_: True, I think ClearML will “inject” the OmegaConf saved under the configuration object of the prior run, overwriting the overrides. I’ll experiment with this behavior a bit more to be sure.
Thanks for always checking in @<1523701087100473344:profile|SuccessfulKoala55> 😛
Let me give that a try. Thanks for all the help.
Is it possible the cached repository was cloned before you changed your agent settings?
Which settings are you referring to? I can't remember if I was using https auth when the project would have been first cached. Would that make a difference?
Also, did you set agent.enable_git_ask_pass: true?
The only instance of it in the config is commented out.
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves pas...
@<1523701205467926528:profile|AgitatedDove14>
And the Task is still running? What's the clearml Python version and web UI version?
No, the task stops (it's running remotely, I haven't tested it running locally).
@<1539780284646428672:profile|PoisedElephant79> Sorry for not getting back to you on this sooner. Dataset.get() doesn't work like you suggested. In the documentation it's clear:
Get a specific Dataset. If multiple datasets are found, the dataset with the highest semantic version is returned. If no semantic version is found, the most recently updated dataset is returned. This functions raises an Exception in case no dataset can be found and the auto_create=True ...
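To make the quoted rule concrete, here is a minimal pure-Python sketch of how a single dataset gets picked out of several matches. This is not ClearML's implementation: `DatasetRecord`, `parse_semver`, and `pick_dataset` are hypothetical names, and I'm assuming "no semantic version is found" means none of the candidates has a `major.minor.patch` version.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, Tuple


@dataclass
class DatasetRecord:
    # Hypothetical stand-in for a dataset entry; not a ClearML class.
    name: str
    version: Optional[str]
    last_update: datetime


def parse_semver(version: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch), or None if the string is not of that form."""
    if not version:
        return None
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def pick_dataset(candidates: Sequence[DatasetRecord]) -> DatasetRecord:
    """Pick exactly one dataset, following the rule quoted from the docs."""
    if not candidates:
        # Mirrors the documented behaviour of raising when nothing is found.
        raise ValueError("no dataset found")
    versioned = [(parse_semver(c.version), c) for c in candidates]
    versioned = [(v, c) for v, c in versioned if v is not None]
    if versioned:
        # Highest semantic version wins (compared numerically, so 1.10.0 > 1.2.0).
        return max(versioned, key=lambda vc: vc[0])[1]
    # Assumed fallback: no candidate has a semantic version, use the newest update.
    return max(candidates, key=lambda c: c.last_update)
```

Either way the result is one dataset, never a list. As far as I can tell, listing is what `Dataset.list_datasets()` in the SDK is for.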
Unfortunately, that doesn't seem to have solved the problem. I tried the same thing with https and it seems to skip the lines with the @ symbol like it did before. Honestly, it seems more like it just isn't parsing those lines during the install.
Collecting darts==0.25.0
Using cached darts-0.25.0-py3-none-any.whl (760 kB)
Collecting lightgbm
Using cached lightgbm-4.1.0-py3-none-manylinux_2_28_x86_64.whl (3.1 MB)
Collecting prophet
Using cached prophet-1.1.4-py3-none-manylinux_2_1...
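One way to check whether the `@` lines are really being skipped is to list them and compare against the `Collecting ...` lines in the agent log. The sketch below is only a debugging aid under my own assumptions: `direct_references` is a hypothetical helper, and it only recognizes the `name @ url` form of a requirement line.

```python
import re
from typing import List, Tuple

# Matches "name @ url" and "name[extras] @ url" requirement lines.
_DIRECT_REF = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*@\s*(\S+)"
)


def direct_references(requirements_text: str) -> List[Tuple[str, str]]:
    """Return (package name, url) for every 'name @ url' line in the text."""
    found = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        match = _DIRECT_REF.match(line)
        if match:
            found.append((match.group(1), match.group(3)))
    return found
```

Any package name this returns that never shows up after `Collecting` in the install log is a line that was not picked up.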
The git credentials are stored in the agent config and they work when I tested them on another project (not for setting up the environment but for downloading the repo of the task itself.)
In the debugger I can see that before starting the scheduler the test task is added:
ScheduleJob(name='Snitch-TaskScheduler', base_task_id='', base_function=<function main.<locals>.scheduler_function.<locals>.<lambda> at 0x7f05e1ab3600>, queue='services', target_project='DevOps', single_instance=False, task_parameters=None, task_overrides=None, clone_task=True, _executed_instances=None, execution_limit_hours=None, recurring=True, starting_time=datetime.datetime(2024, 1, 17, 10, 50, 28,...
Are you self hosting a ClearML server?
Sure. I'm in Europe but we can also test things async.
I'm not self-hosting the server.
Results:
I first tried uncommenting enable_git_ask_pass: false but it didn't resolve the issue.
I then cleared the cache in the vcs-cache folder, and that did fix the issue. This is the second time the cache seemed to have been the root cause of the problem. At some point I did move from token-based auth to ssh keys. Would this require clearing the cache for any project that was cached prior to the auth change?
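For anyone else hitting this, clearing the cache can be scripted. This is a rough sketch, not an official tool: `clear_vcs_cache` is a hypothetical helper, and the `~/.clearml/vcs-cache` path in the comment is what I believe the default is, so check `agent.vcs_cache.path` in your own clearml.conf first.

```python
import shutil
from pathlib import Path


def clear_vcs_cache(cache_dir: Path) -> int:
    """Delete every entry inside cache_dir and return how many were removed."""
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        # Nothing to clear if the folder does not exist.
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Cached repository clones are directories.
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


# Example call (assumed default location, adjust to your agent config):
# clear_vcs_cache(Path("~/.clearml/vcs-cache"))
```

Run it while the agent is idle so a clone in progress isn't removed from under a task.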
Hi @<1523701205467926528:profile|AgitatedDove14> . I think I'm misunderstanding something here. I have the scheduler service running. Now that it's running how does one add a new task or remove an existing task from the scheduler? I get that I can add them before starting the scheduler service but once the service is running is there any way to connect to it and change the schedule?
I thought the advantage of this service would be we could schedule tasks just by connecting to the existing t...
This is odd, the ordering of the files is different and there appears to be some missing from the preview. But as far as I can tell the files aren't different. What am I missing here?
I'd like to provide the credentials to any ec2 instances that are spun up.
That behavior seems strange. In the pipeline on the ClearML page, if you click on one of the steps and select full details (see attached), you can see the commit ID and the branch. Can you validate that the branch is correct but the commit ID is incorrect?
Alright, I fixed the issue with the scheduler eating itself. But now I'm still getting the same bug as two days ago. So the Scheduler process starts fine and doesn't "crash." But I don't get the config object in the web-app again. It seems to work if I run it locally.
To answer your earlier question, I'm using the app.clear.ml portal so
- WebApp: 3.20.1-1525
- Server: 3.20.1-1299
- API: 2.28
- And my Python ClearML version: 1.14
Hyperdatasets are the only ones that require a premium. If you're using normal datasets it should be fine.
Oh, I get what's happening. That segment of the code is rerun when the task is enqueued remotely. So it's deleting itself. This also explains why it works fine locally. It's an ouroboros, the task is deleting itself.
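A guard along these lines avoids the self-deletion. It's a sketch under assumptions: I'm assuming the cleanup step deletes earlier scheduler tasks by ID, `stale_scheduler_ids` is a hypothetical helper, and the `running_locally` flag would come from something like `Task.running_locally()` in the ClearML SDK.

```python
from typing import Iterable, List


def stale_scheduler_ids(
    existing_ids: Iterable[str],
    current_task_id: str,
    running_locally: bool,
) -> List[str]:
    """Return the task IDs that are safe to delete before starting the scheduler."""
    if not running_locally:
        # When an agent re-runs the script, this process IS the enqueued task,
        # so any cleanup here would risk deleting the task that is executing.
        return []
    # Even locally, never include the current task in the deletion list.
    return [tid for tid in existing_ids if tid != current_task_id]
```

With this, the cleanup only happens on the local launch, and the remote re-run goes straight to starting the scheduler.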