when adding a custom column to the table view from a param value. Maybe it happens because that param is not relevant for all the tasks in the table? It shouldn't throw an error though, just show an empty value for the runs where it is not relevant.
If you are doing logs, I imagine these are done using Logger.report_scalar.
If so, iteration is an argument of that method.
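A minimal sketch of what that call looks like, assuming a plain Task.init workflow (the project/task names and values here are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar_logging")  # placeholder names
logger = task.get_logger()

for i, loss in enumerate([0.9, 0.7, 0.5]):  # dummy values
    # `iteration` sets the x-axis position of the reported point
    logger.report_scalar(title="loss", series="train", value=loss, iteration=i)
```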
basically running_locally()
ok, I think I have everything I need. Will give it a try.
Found the env freeze. For the second workflow, all I would need then, I guess, would be an env variable that would tell me whether this is currently being run by an agent or not
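For the "is this being run by an agent" check, a small sketch using the Task.running_locally() call mentioned above, rather than a raw env variable (project/task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="where_am_i")  # placeholder names

if Task.running_locally():
    # executed by hand on a user's machine
    print("running locally")
else:
    # executed by a clearml-agent (e.g. one spun up by the autoscaler)
    print("running inside an agent")
```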
I'm trying to use https://clear.ml/docs/latest/docs/webapp/applications/apps_aws_autoscaler .
In the setup, I have to provide a personal access token (PAT) from git.
The agents, when setting up the env to run the tasks from the queue, cannot clone the repo using the PAT:
cloning: git@gitlab.com:<redacted>.git Using user/pass credentials - replacing ssh url 'git@gitlab.com:<redacted>.git' with https url '
` <redacted>.git'
Host key verification failed.
fatal: Could not read from remote repos...
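For reference, a hedged sketch of what the agent-side git credentials look like in clearml.conf when cloning over HTTPS with a PAT; with the autoscaler these values come from the app's git user / git password fields, and everything below is a placeholder:

```
agent {
    # prefer HTTPS + token over SSH, so no host key verification is involved
    force_git_ssh_protocol: false
    git_host: "gitlab.com"
    git_user: "my-gitlab-username"      # placeholder
    git_pass: "glpat-xxxxxxxxxxxxxxxx"  # the personal access token (placeholder)
}
```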
I don't think it will be reproducible with the hydra example. It was just that I launched about 50 jobs and some of them, maybe because of the parameters, failed (strangely with no error).
But it's ok for now I guess, I will debug whether those experiments that failed would also fail if run independently
I'm using the latest version of clearml and clearml-agent and I'm seeing the same error
yes, the remote task is working 🙂
I'm running them with python my_script.py -m my_parameter=value_1,value_2,value_3
(using hydra multirun)
` ╰─ python run.py -m env=gpu clearml.task_name=connect_test "model=glob(*)" trainer_params.max_epochs=5
2022/09/14 01:10:07 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of pytorch. If you encounter errors during autologging, try upgrading / downgrading pytorch to a supported version, or try upgrading MLflow.
/Users/juan/mindfoundry/git_projects/cvae/run.py:38: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level...
Actually I really need help with this, I've been struggling for 2 days to make the AWS autoscaler work.
what I want:
do a multirun with hydra where each of the runs gets executed remotely
my implementation (I iterated over several, using create_function_task, etc.):
```python
from pathlib import Path

import hydra
from omegaconf import DictConfig


@hydra.main(config_path="configs", config_name="ou_cvae")
def main(config: DictConfig):
    curr_dir = Path(__file__).parent
    if config.clearml.enabled:
        # Task.force_requirements_env_freeze(requirements_file=str(cur...
```
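For the "each multirun job gets executed remotely" goal, a rough sketch of one possible shape (not the actual implementation): each hydra job creates its own task and enqueues a clone of itself, so the local sweep only submits work. The queue name, project name, and train() call are assumptions:

```python
import hydra
from omegaconf import DictConfig
from clearml import Task


@hydra.main(config_path="configs", config_name="ou_cvae")
def main(config: DictConfig) -> None:
    task = Task.init(project_name="cvae", task_name=config.clearml.task_name)  # placeholder project name
    if Task.running_locally():
        # enqueue a clone of this job on the autoscaler's queue and return,
        # so hydra can move on to the next multirun job
        task.execute_remotely(queue_name="aws_autoscaler", clone=True, exit_process=False)
        task.close()  # close the local task so the next job can create its own
        return
    # from here on we are running on the remote machine
    # train(config)


if __name__ == "__main__":
    main()
```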
ok, yes I mean the branch I'm working on. I can assume I've pushed it. So I'll be using something like
```python
from pathlib import Path

from git import Repo


def get_package_url() -> str:
    repo = Repo(Path(__file__).parent)
    branch_name = repo.active_branch.name
    remote_url = repo.remote().url
    return f"git+ssh://{remote_url.replace(':', '/')}@{branch_name}"
```
and Task.add_requirements("my_package", f"@ {get_package_url()}")
or if you could point me to the part of the package that sets up the environment, so I can figure out what's wrong, please
that did it! 🙌 thank you!
ok, yes, but this will install the package of the branch specified there.
So if I'm working on my own branch and want to run an experiment, I would have to manually put my current branch name in the git path. I guess I can add some logic to get the current branch from the env. Thank you
it doesn't happen with all the tasks of the multirun, as you can see in the photo
1.- The script I'm running uses qiskit.providers, but as installed when you install qiskit; if you try to install the submodules independently, it doesn't work. How do I use the full environment instead (see the sketch after this list)? I cannot find this in the documentation. Also, it seems I cannot configure the agents, because I'm using the AWS autoscaler service and so I don't spin them up explicitly.
2.- My workflow is that, locally, I usually run multiple sequential experiments using hydra multirun. What I want is th...
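On the "full environment" question in point 1, a hedged sketch of the env-freeze approach mentioned earlier, which records the complete local pip freeze for the agent to reproduce instead of only the packages detected from imports (names are placeholders, and the call has to happen before Task.init):

```python
from clearml import Task

# record the full `pip freeze` of the local environment in the task,
# so the agent installs exactly these packages (including qiskit's submodules)
Task.force_requirements_env_freeze()

task = Task.init(project_name="quantum", task_name="qiskit_run")  # placeholder names
```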
still the same result. What's strange is that if I compare the remote jobs' configs as soon as they are launched, while still in the pending state, they all have the right, different configs; but later, while running, they all revert to the same config by the end
each of those runs finished, producing 10 plots each, but in ClearML only one, a few, or none got uploaded
Yes, so here you have the three tasks (this is a slight refactor using task_func instead of task, but the result is the same)
1- all different (status pending)
2- two equal (those which started)
3- all equal (all running or completed)
multirun is not working as expected
when I run python run.py -m env=gpu clearml.task_name=demo_all_models "model=glob(*)"
it should run one run per model remotely
this is the output I see locally
` ╰─ python run.py -m env=gpu clearml.task_name=demo_all_models "model=glob(*)"
2022/09/13 20:49:31 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of pytorch. If you encounter errors during autologging, try upgrading / downgrading pytorch to a supported version, or...
it also happens with other configuration values, like this one, which is a boolean. I think it happens in general with configuration values that are passed in your run command as flags (using the override syntax of hydra)
waiting now for the run...
but I still have the problem if I try to run clearml-agent execute --id ... locally for debugging purposes
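If clearml-agent execute keeps misbehaving locally, another way to debug from plain Python, assuming Task.debug_simulate_remote_task is available in the installed clearml version (the task id and names below are placeholders):

```python
from clearml import Task

# make this local process behave as if an agent were executing the given task,
# so the stored (remote) configuration is used instead of the local command line
Task.debug_simulate_remote_task(task_id="<id-of-one-enqueued-job>")  # placeholder id

task = Task.init(project_name="cvae", task_name="debug_run")  # placeholder names
```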