PompousParrot44 I assume the folder structure is something like:
repo_root:
--> test
-----> scripts
If this is the case, make sure the "working directory" is "." which means the repository root
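To make the idea concrete, here is a minimal sketch (illustrative only; the repo path and script name are hypothetical) of how a script under test/scripts resolves once the working directory is the repository root:

```python
from pathlib import Path

# Illustrative only: with the working directory set to "." (the repository
# root), a script under test/scripts is addressed by its root-relative path.
def script_relative_to_root(repo_root, script_path):
    """Return script_path expressed relative to repo_root."""
    return Path(script_path).resolve().relative_to(Path(repo_root).resolve()).as_posix()

print(script_relative_to_root("/repo", "/repo/test/scripts/train.py"))  # test/scripts/train.py
```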
You can switch to docker mode for better control over the CUDA drivers, or use conda and specify cudatoolkit (this feature will be part of the next RC; meanwhile it will install the cudatoolkit based on the global cuda_version).
Hi FancyBaldeagle86
You mean in the UI? i.e. clone an experiment, hover over the Configuration / Hyperparameters section, and click edit?
Hi CluelessElephant89
hey guys, I believe clearml-agent-services isn't necessary, right?
Generally speaking, yes, you are correct 🙂
Specifically, this is the "services" queue agent, running your pipeline logic, services etc.
But it is not a must to get the server to work, and you can also spin it on a different host
It takes 20mins to build the venv environment needed by the clearml-agent
You are joking?! 🙂
it does apt-get install python3-pip and pip install clearml-agent, how is that 20 min?
So the issue is that you have two remote references in the local git, one to GitLab and one to Gitea, and it fails to understand which one is the correct remote ...
I wonder if "git ls-remote --get-url" will always work ?!
if project_name is None and Task.current_task() is not None:
    project_name = Task.current_task().get_project_name()
This should have fixed it, no?
5 seconds is the sleep between two consecutive pulls when there are no jobs to process; why would you want to increase it?
I saw the documentation, but I can't build the proper dict object for the hyperparams
I see, this is what you are after (I think)
https://github.com/allegroai/clearml/blob/fb644fe9ec6be36b8f2f70a34256fbdc593d663a/clearml/backend_api/services/v2_20/tasks.py#L3138
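As a sketch of the dict shape, the nested section/parameter layout and the "Section/name" flattening can look like this (my own illustrative helper, not the library's API; the section names are assumptions):

```python
# Hypothetical helper illustrating the nested hyperparameter dict shape:
# top-level keys are section names, nested keys are the parameter names.
def flatten_params(params):
    """Flatten {"Section": {"name": value}} into {"Section/name": value}."""
    flat = {}
    for section, values in params.items():
        if isinstance(values, dict):
            for name, value in values.items():
                flat["{}/{}".format(section, name)] = value
        else:
            flat[section] = values  # already a flat entry, keep as-is
    return flat

hyperparams = {"General": {"epochs": 10, "lr": 0.001}, "Args": {"batch_size": 32}}
print(flatten_params(hyperparams))
```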
FYI: these days TB has become the standard even for PyTorch (as a stand-alone package); you can actually import it from torch.
There is an example here:
https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
HealthyStarfish45 did you manage to solve the report_image issue ?
BTW: you also have
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
https://github.com/allegroai/trains/blob/master/examples/reporting/...
looks like a great idea, I'll make sure to pass it along and that someone replies 🙂
Hi AverageBee39
Did you set up an agent to execute the actual Tasks?
we can add non-clearml code as a step in the pipeline controller.
Yes 🙂 btw, you can kind of already do that with pre/post function callbacks (notice they run from the same scope as the actual pipeline controller).
What exactly did you have in mind to put there ?
HealthyStarfish45 my apologies, they do have it (this ability needs support for both trains-agent and server) but not in the open-source ...
Hi ItchyHippopotamus18
The iteration reporting is automatically detected if you are using TensorBoard, matplotlib, or explicitly with trains.Logger
I'm assuming there were no reports, so the monitoring falls back to reporting every 30 seconds, where the iterations are "seconds from start" (the thing is, this is a time series, so you have to have an X axis...)
Make sense ?
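The fallback described above can be sketched like this (a guess at the logic for illustration, not the actual monitor code):

```python
import time

# Sketch (assumption): if nothing reports an iteration, fall back to
# seconds-since-start so the time series still has an X axis to plot against.
def fallback_iteration(start_time, reported_iteration=None):
    """Prefer an explicitly reported iteration; otherwise use elapsed seconds."""
    if reported_iteration is not None:
        return reported_iteration
    return int(time.time() - start_time)
```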
Hi SkinnyPanda43
Yes, I think you are right, the documentation might be missing it. I'll make sure they know 🙂
In the meantime: task.update_output_model
https://github.com/allegroai/clearml/blob/d3929033c016476c580557639ff44f900e65904a/clearml/backend_interface/task/task.py#L734
it is just a local copy so you can rerun and reconfigure
I cannot modify an autoscaler currently running
Yes this is a known limitation, and I know they are working on fixing it for the next version
We basically have flask commands allowing to trigger specific behaviors. ...
Oh I see now, I suspect the issue is that the flask command is not executed from within the git project?!
Could you download and send the entire log ?
Basically try with the latest RC 🙂
pip install trains==0.15.2rc0
No worries, and I will make sure we output a warning if section names are not used 🙂
tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)>
Hi UpsetCrow72
How come you are trying to sync a "completed" (finalized) dataset?
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_38e9acc8d56441999e806815abddee82.clearml'
Let me check this issue, it seems like the locking mechanism should have figured that there is no lock...
This is odd because the screen grab points to CUDA 10.2 ...
GiganticTurtle0 I think I located the issue:
it seems the change is in "config" (and for some reason it stores the entire dict), but the split values are not changed.
Is this it?
check if the fileserver docker container is running with docker ps
Setting the credentials on the agent machine means users cannot use their own credentials, since a k8s glue agent serves multiple users.
Correct, I think the "vault" option is only available on the paid tier 🙂
but how should we do this for the credentials?
I'm not sure how to pass them; wouldn't it make sense to give the agent all-access credentials?
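One common pattern, sketched below as an assumption (not the vault mechanism, and the helper is hypothetical): inject per-user credentials into the pod environment instead of baking one set into the agent machine, using the usual CLEARML_API_* variable names.

```python
import os

# Sketch (assumption): read per-user credentials from the pod environment
# rather than from a single set configured on the agent machine.
def get_user_credentials(environ=None):
    """Pick up API credentials from environment variables."""
    env = os.environ if environ is None else environ
    return {
        "access_key": env.get("CLEARML_API_ACCESS_KEY", ""),
        "secret_key": env.get("CLEARML_API_SECRET_KEY", ""),
    }

print(get_user_credentials({"CLEARML_API_ACCESS_KEY": "abc",
                            "CLEARML_API_SECRET_KEY": "xyz"}))
```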
Ohh I see, so basically the ASG should check whether the agent is idle, rather than whether the Task is running?
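That check could be sketched like this (a hypothetical helper; the idle threshold is an assumption, not a documented default):

```python
# Sketch (assumption): base the scale-in decision on how long the agent has
# been idle, rather than on whether a Task happens to be running right now.
def should_scale_in(agent_idle_seconds, idle_threshold_seconds=300):
    """Return True when the agent has been idle past the threshold."""
    return agent_idle_seconds >= idle_threshold_seconds

print(should_scale_in(600))  # True: idle long enough, safe to remove
print(should_scale_in(30))   # False: still warm, keep the instance
```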