JitteryCoyote63 see here https://stackoverflow.com/questions/55385900/pip3-setup-py-install-requires-pep-508-git-url-for-private-repo bottom line: you have to add the package name and "@" before the git link. However, if you do that and the package is already installed, pip will not reinstall it from the git repo (this is a pip limitation). Since the agent installs everything from scratch, it should work for you. Wdyt?
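For reference, a minimal sketch of what the PEP 508 entry looks like in setup.py (the package and repo names here are just placeholders):
```python
from setuptools import setup

setup(
    name="my_project",
    install_requires=[
        # "<package_name> @ <git url>" -- pip needs the "package_name @" prefix
        "my_private_package @ git+https://github.com/my-org/my-private-repo.git@main",
    ],
)
```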
GrittyHawk31 by default any user can log in (i.e. no password needed). If you want user/password access:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config/#web-login-authentication
Notice there is no need for anything else in apiserver.conf, just the user/pass section; everything else will keep the default values.
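Roughly along these lines (check the linked docs page for the exact schema; the users below are placeholders):
```
auth {
    fixed_users {
        enabled: true
        users: [
            { username: "jane", password: "12345678", name: "Jane Doe" },
            { username: "john", password: "12345678", name: "John Doe" }
        ]
    }
}
```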
available agent, i.e. not running anything else.
I mean, how long would instance 1 wait until instance 2 of the experiment is up and running?
In other words, what happens if all the nodes/agents are busy and we still "need" an additional instance?
This is basically like "pre-allocating" the nodes, only they wait in real-time until the additional node joins them.
Agent A pulls the 3-node Task, the Task clones itself (Task B) and enqueues it on the "very high priority queue". Task A waits until Task B is ru...
GiganticTurtle0 this one worked for me 🙂
```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue1")
def step_1(msg: str):
    msg += "\nI've survived step 1!"
    return msg


@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue2")
def step_2(msg: str):
    msg += "\nI've also survived step 2!"
    return msg


@PipelineDecorator.component(return_values=["m...
```
Hi ReassuredOwl55
How would I find Tasks that have the same code with different inputs/parameters?
Assuming you have the git repo, you can do:
Task.query_tasks(..., task_filter={'_all_': dict(fields=['script.repository'], pattern='github.com/user/repo')})
wdyt?
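A rough sketch of the full call (the repo pattern is a placeholder for your repository):
```python
from clearml import Task

# find all Tasks whose script repository matches the given pattern
task_ids = Task.query_tasks(
    project_name=None,  # optionally narrow down to a specific project
    task_filter={"_all_": dict(fields=["script.repository"], pattern="github.com/user/repo")},
)
print(task_ids)  # list of matching task IDs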
BoredGoat1 where exactly do you think that happens ?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L316
?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L202
It is recommended to create a git TOKEN with read only permissions and use it (more secure) 🙂
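For reference, the agent picks these credentials up from clearml.conf (values below are placeholders):
```
agent {
    # credentials the agent uses to clone git repositories
    # (use a read-only access token instead of a personal password)
    git_user: "my_git_username"
    git_pass: "my_read_only_token"
}
```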
BTW: if you need, you can do the following:
```python
from clearml import Task
from clearml.automation import PipelineController

task = Task.init(project_name='pipelines', task_name='pipeline test')
task.set_base_docker(...)
# the pipeline object is using the current Task, hence the docker image is set
pipe = PipelineController(...)
pipe.start()
```
Hi @<1663354518726774784:profile|CrookedSeal85>
However, I systematically notice a jump of some number of "ghost iterations" when resuming my trainings...
Try the following:
task = Task.init(..., continue_last_task=0)
from the Task.init docstring (Notice this value can be both boolean and integer)
:param bool continue_last_task: Continue the execution of a
...
- An integer - Specify initial iteration offset (override the auto automatic last_iteratio...
I think your use case is the original idea behind "use_current_task" option, it was basically designed to connect code that creates the Dataset together with the dataset itself.
I think the only caveat in the current implementation is that it should "move" the current Task into the dataset project / set the name. wdyt?
@<1687643893996195840:profile|RoundCat60> I'm assuming we are still talking about the S3 credentials, sadly no 😞
Are you familiar with boto and IAM roles ?
PompousBeetle71 BTW: if you remove type=str from the argparse argument, it will do what you want: None will stay None (instead of ''), and all other values will still be of type str since that is the default anyway 🙂
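A minimal sketch of the argparse side of it (the argument name is just an example):
```python
from argparse import ArgumentParser

parser = ArgumentParser()
# no explicit type=... : the None default stays None,
# while anything passed on the command line arrives as str anyway
parser.add_argument("--checkpoint", default=None)

args = parser.parse_args([])
print(args.checkpoint)  # None, not ''
```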
Ok.. so I should generally avoid connecting complex objects? I guess I would create a 'mini dictionary' with a subset of params, and connect this instead.
In theory it should always work, but this specific one fails on a very pythonic paradigm (see below)
from copy import copy
an_object = copy(object)
A good rule of thumb is to connect any object/dict that you want to track or change later
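For example, a minimal sketch of the "mini dictionary" approach (project/parameter names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect subset")
# connect only the plain, serializable subset you actually want to track / override
params = {"lr": 0.001, "batch_size": 32, "optimizer": "adam"}
params = task.connect(params)  # returned dict reflects any UI / remote overrides
```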
Just verifying the Pod does get allocated 2 gpus, correct ?
What do you have under the "script path" in the Task?
Hi MelancholyChicken65
I'm not sure you can control it; the UI deduces the URL based on the address you are browsing to, so if you go to http://app.clearml.example.com you will get the correct ones, but you have to put them on the right subdomains:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#subdomain-configuration
SuperficialGrasshopper36 regarding the codeartifact:
I think the easiest will be to have a bash script that authenticates to codeartifact with the aws command at the beginning of each docker spin-up. This can be done by adding it to:
https://github.com/allegroai/clearml-agent/blob/81edd2860fbc09e2a179985d8315ffaba851dcd7/docs/clearml.conf#L136
For example:
extra_docker_shell_script: ["apt-get install -y aws_cli_or_something", "aws cli authenticate me command"]
wdyt?
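A rough sketch of what that could look like in clearml.conf (the repository/domain values and the exact login command are placeholders you would adapt):
```
agent {
    extra_docker_shell_script: [
        "pip install awscli",
        "aws codeartifact login --tool pip --repository my-repo --domain my-domain"
    ]
}
```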
Sure, run:
clearml-agent init
It is a CLI wizard to configure the initial configuration file.
Does Task.connect send each element of the dictionary as a separate api request? Has anyone else encountered this issue?
Hi SuperiorPanda77
the task.connect ends up as a single call with all the data being sent on a single request.
That said, maybe the connect dict is not the best solution for a thousand-key dictionary ...
Maybe an artifact or connect_configuration is better suited?
wdyt?
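For example, a minimal sketch of both options (project/artifact names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="large dict")
big_dict = {f"param_{i}": i for i in range(10_000)}

# option 1: a single configuration object instead of thousands of hyperparameter entries
task.connect_configuration(big_dict, name="hyperparams")

# option 2: store it as an artifact
task.upload_artifact("hyperparams", artifact_object=big_dict)
```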
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call: task.set_initial_iteration(0)
AstonishingWorm64
You can turn on the venv cache, it will just handle its own full env caching 🙂
See here:
https://github.com/allegroai/clearml-agent/blob/4f7407084d1900a79d455570c573e60f40208742/docs/clearml.conf#L100
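The relevant clearml.conf section looks roughly like this (check the linked default config for the exact fields in your agent version):
```
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum free space (GB) required to keep caching
        free_space_threshold_gb: 2.0
        # setting the path enables the cache
        path: ~/.clearml/venvs-cache
    }
}
```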
clearml - WARNING - Could not retrieve remote configuration named 'hyperparams'
What's the clearml-server version you are working with ?
In both logs I see (even in the single GPU log) it seems you "see" two GPUs, is that correct?
GPU 0,1 Tesla V100-SXM2-32GB (arch=7.0)
Last question, this is using relatively old clearml version (0.17.5), can you test with the latest version (1.1.1)?
UnevenDolphin73 since at the end plotly is doing the presentation, I think you can provide the extra layout here:
https://github.com/allegroai/clearml/blob/226a6826216a9cabaf9c7877dcfe645c6ae801d1/clearml/logger.py#L293
NonchalantDeer14
I think the issue is that it spins the subprocess with Popen rather than fork, so clearml is not "loaded" into the subprocess, hence no logging.
The easiest fix is to call Task.current_task() inside the actual code (somewhere when it starts), it should trigger clearml.
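A minimal sketch (the function name and the report call are hypothetical, just to show where the call goes):
```python
from clearml import Task

def subprocess_entry_point():
    # re-attach to the already-initialized Task so reporting/logging
    # works inside a Popen-spawned subprocess as well
    task = Task.current_task()
    if task:
        task.get_logger().report_text("subprocess attached to clearml Task")
    # ... actual work ...
```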
Hi SmugLizard25 I was able to test and it seems that style is being ignored by the FE 😞
I passed it to the FE guys to make sure it is fixed in the next version.
Notice this is just for tables, anything else works as expected (i.e. styling any other type of plot)
It should have worked....
Can you run the examples from the repo and see if they work?
and the agent default runtime mode is docker correct?
Actually the default is venv mode, to run in docker mode add --docker to the command line
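For example (queue name is a placeholder):
```
clearml-agent daemon --queue default --docker
```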
So I could install all my system dependencies in my own docker image?
Correct, inside the docker it will inherit all the preinstalled packages, but it will also install any missing ones (based on the Task requirements, i.e. the "installed packages" section).
Also what is the purpose of the aws block in the clearml.c...