Where should I look to see this metric? In the Scalars tab?
CostlyOstrich36 yes - sorry for the wrong terminology
CumbersomeCormorant74 As you can see in the attachment, there were two experiments running at the same time, but only one agent pulled a task, even though the second agent was free and listening to the queue.
What will happen if I disable the cache? Is there a way to find out which experiment is hung and why, so I can avoid this?
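In case it helps, this is roughly what I would try from the Python SDK to see which tasks are still marked as running (a sketch; the project name and the status filter format are my assumptions, not from this thread):
```python
from clearml import Task

# List tasks currently marked as running in a given project
# (project name here is a placeholder).
running = Task.get_tasks(
    project_name="my_project",
    task_filter={"status": ["in_progress"]},
)
for t in running:
    print(t.id, t.name, t.get_status())
```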
Because of a server error I can't download the log, so I attached a screenshot. In the log I see only the following reports (without a summary table/plot).
I also tried to connect to the dataset from the CLI and received a connection error:
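For reference, this is roughly the SDK equivalent of what I'm trying from the CLI (a sketch; the dataset project/name are placeholders, not the actual dataset):
```python
from clearml import Dataset

# Fetch the dataset and download a local copy
# (project/name are placeholders).
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = ds.get_local_copy()
print(local_path)
```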
Thanks! The second link is exactly what I was looking for 🙂
Hi, the URL contains some details which I wouldn't like to share on this thread. Can I send it to one of you in private message?
CostlyOstrich36 Another clarification:
The master branch cache is stored at ".clearml/vcs-cache" - the code file doesn't exist there, and the problem described above is occurring in this folder (multiple cache files of the same repo).
My branch, on the other hand, is stored at ".clearml/venvs-builds/3.7/task_repository/".
Sending it to you in private, CostlyOstrich36
Does this relate to the error below? From reading the issue I didn't see anyone mentioning this error:
clearml_agent: ERROR: Failed cloning repository.
1) Make sure you pushed the requested commit:
(repository='https://gitlab.com/data_science_team/____', branch='ilanit', commit_id='b5c___', tag='', docker_cmd='ubuntu:18.04', entry_point='training/____.py', working_dir='src')
2) Check if remote-worker has valid credentials [see worker configuration file]
Interesting, I am only now seeing **optimizer_kwargs; it seems that it will fix my problem. Would it be too much to ask for an example of how to initialize the Optuna object with the kwargs (mainly how to initialize the 'trial', 'study', and 'objective' arguments)? 🙂
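For context, this is how I understand the plain Optuna pieces fit together (a toy sketch; how exactly these map onto ClearML's **optimizer_kwargs is what I'm asking about):
```python
import optuna

# Each trial suggests hyperparameter values and returns the score to minimize.
def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    return (lr - 0.01) ** 2  # placeholder for a real training/eval loop

# The study owns the sampler/pruner configuration and drives the trials.
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```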
I had a task which I had cloned and reset a bunch of times; when I created it as a new task, the error didn't appear again.
Just to clarify again - when I start the agents I run: clearml-agent daemon --detached --queue training
and then: clearml-agent daemon --detached --services-mode --queue services --docker ubuntu:18.04 --cpu-only
This is why there are 'training' and 'training_2' queues.
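For completeness, this is roughly how I push a task into the 'training' queue from code (a sketch; the project/task names are placeholders, not from this thread):
```python
from clearml import Task

# Clone an existing task and enqueue the clone on the 'training' queue
# (project/task names are placeholders).
base = Task.get_task(project_name="my_project", task_name="my_experiment")
cloned = Task.clone(source_task=base, name="my_experiment clone")
Task.enqueue(cloned, queue_name="training")
```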
SuccessfulKoala55 I can't share the logs.
But I can add screenshots of the log file if necessary
EDIT CostlyOstrich36
Third image - the cache after running another task, with a new cache file created even though the cache is disabled.
We have been trying to resolve the issue. I will comment here again if any more problems arise. Thanks!
Yeah, this is a lock file which is always in our cache; I can't figure out why it's there, but when I delete the lock and the other files, they always reappear when I run a new ClearML task.
Another thing I should note: I recently had an error whose fix was to run git config --global --add safe.directory /root/.clearml/vcs-cache/r__ (git repo name).d7f
Ever since, whenever I run a new task, a new file appears in the cache with the format <git repo name.lock file name_a bunch of numbers>.
Clearing my cookies solved the issue, Thanks 🙂
CostlyOstrich36 The application problem was indeed solved 🙂 but the plots one wasn't.