
Should I open an issue in github clearml-agent repo?
(I didn't have this problem so far because I was using ssh keys globally, but now I want to switch to git auth using a Personal Access Token for security reasons)
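If it helps, the token-based auth I want to switch to would typically go into clearml.conf on the agent side; a minimal sketch, assuming the standard agent.git_user / agent.git_pass keys (all values are placeholders):
`
# clearml.conf (agent section) - sketch for PAT-based HTTPS auth
agent {
    git_user: "my-git-username"           # placeholder username
    git_pass: "ghp_xxxxxxxxxxxxxxxxxxxx"  # placeholder Personal Access Token
    force_git_ssh_protocol: false         # keep HTTPS so the token is actually used
}
`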
AgitatedDove14 I was able to redirect the logger by doing so:
`
clearml_logger = Task.current_task().get_logger().report_text
early_stopping = EarlyStopping(...)
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
`
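A slightly more complete, runnable sketch of that redirect, assuming pytorch-ignite's EarlyStopping handler and an already-initialized ClearML task (the project/task names and the metric are placeholders):
`
import logging
from clearml import Task
from ignite.engine import Engine, Events
from ignite.handlers import EarlyStopping

task = Task.init(project_name="examples", task_name="early-stopping-logs")  # hypothetical names
clearml_logger = task.get_logger().report_text  # report_text() accepts a plain string

def score_function(engine):
    # higher is better for EarlyStopping; "loss" is a placeholder metric
    return -engine.state.metrics.get("loss", 0.0)

trainer = Engine(lambda engine, batch: None)    # dummy training step
evaluator = Engine(lambda engine, batch: None)  # dummy evaluation step

early_stopping = EarlyStopping(patience=5, score_function=score_function, trainer=trainer)
# Route the handler's debug/info messages into the ClearML console log
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)

evaluator.add_event_handler(Events.COMPLETED, early_stopping)
`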
When an experiment on trains-agent-1 is finished, I randomly see no experiment / a long experiment, and when two experiments are running, I randomly see only one of the two experiments
Could be, but not sure -> from 0.16.2 to 0.16.3
ok, but will it install the engine and its dependencies as expected?
Sure, it’s because of a very annoying bug that I shared in this https://clearml.slack.com/archives/CTK20V944/p1648647503942759 , and that I haven’t been able to solve so far.
I’m not sure you can downgrade that easily ...
Yea, that’s what I thought. That’s a bit of a pain for me now; I hope I can find a way to fix the bug somehow
I am not sure I can do both operations at the same time (migration + splitting), do you think it’s better to do splitting first or migration first?
The file /tmp/.clearml_agent_out.j7wo7ltp.txt does not exist
Could be also related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300
for some reason when cloning task A, trains sets an old commit in task B. I tried to recreate task A to enforce a new task id and new commit id, but still the same issue
It seems that around here, a Task that is created using init remotely in the main process gets its output_uri parameter ignored
here is the function used to create the task:
` def schedule_task(parent_task: Task,
                    task_type: str = None,
                    entry_point: str = None,
                    force_requirements: List[str] = None,
                    queue_name="default",
                    working_dir: str = ".",
                    extra_params=None,
                    wait_for_status: bool = False,
                    raise_on_status: Iterable[Task.TaskStatusEnum] = (Task.TaskStatusEnum.failed, Task.Ta...
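For context on the output_uri issue above, a minimal sketch of what I would expect the parameter to do (project/task names and the bucket are placeholders; this only assumes the public Task.init signature):
`
from clearml import Task

# When executed locally, output_uri tells ClearML where to upload models/artifacts.
# The report above is that this value appears to be ignored when the Task is
# created via Task.init inside a remotely executed main process.
task = Task.init(
    project_name="examples",           # hypothetical project
    task_name="scheduled-child-task",  # hypothetical task name
    output_uri="s3://my_bucket",       # expected artifact destination
)

# The effective value can be checked at runtime:
print(task.output_uri)
`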
Hi SoggyFrog26 , https://github.com/allegroai/clearml/blob/master/docs/datasets.md
extra_configurations = {'SubnetId': "<subnet-id>"}
with brackets right?
amazon linux
and in the logs:
`
agent.worker_name = worker1
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /...
Is it safe to turn off replication while a reindex operation is happening? The reindexing is rather slow and I am wondering if turning off replication will speed up the process
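For reference, replication can be toggled per index through the Elasticsearch settings API; a minimal sketch, assuming a local cluster on port 9200 and a placeholder index name:
`
import requests

ES = "http://localhost:9200"         # placeholder cluster address
INDEX = "events-training_stats-d1"   # placeholder index name

# Drop replicas to 0 before the reindex...
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index": {"number_of_replicas": 0}})

# ...and restore them once the reindex has finished.
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index": {"number_of_replicas": 1}})
`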
Yes, that's what it looks like. Somehow when you clone the experiment repo, you correctly set the git creds in the url, but when the dependencies are installed, the git creds are not taken into account
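For illustration, this is the kind of dependency line where the credentials are needed; a sketch of a requirements entry pulled from a private repo over HTTPS (the org/repo, user and token are placeholders):
`
# requirements.txt - pip clones this repo itself, so it needs credentials in the URL
# (or a git credential helper); per the observation above, the creds set for the
# experiment repo clone are not reused here.
git+https://<git-user>:<personal-access-token>@github.com/my-org/my-private-lib.git@main#egg=my_private_lib
`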
There is a pinned github thread at https://github.com/allegroai/clearml/issues/81 , seems to be the right place?
AgitatedDove14 Same problem with clearml==1.1.5rc2 😞, I also tried with backend==gloo, still same problem
line 13 is empty 🤔
See my answer in the issue - I am not using docker
Sure, just sent you a screenshot in PM
I get the following error:
But I see in the agent logs:
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...
Yes, thanks! In my case, I was actually using TrainsSaver from pytorch-ignite with a local path, then I understood looking at the code that under the hood it actually changed the output_uri of the current task, that's why my previous_task.output_uri = "s3://my_bucket" had no effect (it was placed BEFORE the training)