Hi SmugSnake6
I think it was just fixed, let me check if the latest RC includes the fix
Specifically for model files, if you set Task.init(..., output_uri=True) it will automatically upload any saved model to the files server (you can also point to any object storage / shared folder)
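For reference, a minimal sketch of what that could look like (the project/task names here are placeholders):
` from clearml import Task

# output_uri=True uploads any model saved during the run to the files server;
# you can also point it at object storage, e.g. output_uri="s3://my-bucket/models"
task = Task.init(
    project_name="examples",
    task_name="train with model upload",
    output_uri=True,
) `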
What's the framework you are using ?
Hi LazyFish41
Could it be some permission issue on /home/quetalasj/.clearml/cache/ ?
Ohh, ClearML is designed so that you should not worry about that: download_dataset = StorageManager.get_local_copy(...) is cached, meaning a machine that runs that line a second time will not re-download the data.
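Something along these lines (the remote URL is just a placeholder):
` from clearml import StorageManager

# downloads the object once and caches it locally; a second call on the same
# machine (e.g. a later run) hits the cache instead of re-downloading
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/datasets/data.zip")
print(local_path) `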
This means step 1 is redundant, no?
Usually when data is passed between components it is automatically uploaded as an artifact to the Task (stored on the files server or object storage, etc.), then downloaded and passed to the next steps.
How large is the data that you are wo...
You're suggesting that False is considered a string and not a bool?
The clearml-server always stores the values as strings (serializing them), the casting is done when the values are passed back to the code at runtime. The issue here is there is actually no "way" to tell the argparser this is a boolean (basically any value that will be passed is treated as string). What I think we should do is fix the casting function so that if this is exactly the same value we use the default value (i.e. boole...
GiddyTurkey39 do you have an experiment with the jupyter notebook ?
Hmm can you run:
docker run -it allegroai/clearml-agent-services:latest
JitteryCoyote63 see here https://stackoverflow.com/questions/55385900/pip3-setup-py-install-requires-pep-508-git-url-for-private-repo bottom line, you have to add package@ before the link, but if you do that and the package is already installed it will not install using the git repo, this is an issue with pip. I think that since the agent installs everything from scratch it should work for you. Wdyt?
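For example, the requirements.txt line would look roughly like this (the package and repo names are placeholders):
` # PEP 508 "name @ url" syntax, placeholder package / repo
my_package @ git+https://github.com/user/private_repo.git@main `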
GrittyHawk31 by default any user can login (i.e. no need for password), if you want user/password access:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config/#web-login-authentication
Notice no need to have anything else in the apiserver.conf , just the user/pass section, everything else will just be the default values.
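Roughly along these lines (check the linked docs for the exact schema; the username/password here are placeholders):
` auth {
    fixed_users {
        enabled: true
        users: [
            {
                username: "jane"
                password: "12345678"
                name: "Jane Doe"
            }
        ]
    }
} `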
available agent, i.e. not running anything else.
I mean how long would instance 1 wait until instance 2 of the experiment is up and running?
In other words, what happens if all the nodes/agents are working and we still "need" an additional instance.
This is basically like "pre-allocating" the nodes, only they wait in real-time until the additional node joins them.
Agent A pulls the 3 node Task, the Task clones itself (Task B) and enqueues on the "very high priority queue", Task A waits until Task B is ru...
GiganticTurtle0 this one worked for me
` from clearml import Task
from clearml.automation.controller import PipelineDecorator
@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue1")
def step_1(msg: str):
msg += "\nI've survived step 1!"
return msg
@PipelineDecorator.component(return_values=["msg"], execution_queue="myqueue2")
def step_2(msg: str):
msg += "\nI've also survived step 2!"
return msg
@PipelineDecorator.component(return_values=["m...
Hi ReassuredOwl55
How would I find Tasks that have the same code with different inputs/parameters?
Assuming you have the git repo
you can do:
Task.query_tasks(..., task_filter={'_all_': dict(fields=['script.repository'], pattern='github.com/user/repo')})
wdyt?
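A slightly fuller sketch of the same idea (the project name and repo pattern are placeholders):
` from clearml import Task

# returns a list of matching Task IDs
task_ids = Task.query_tasks(
    project_name="examples",
    task_filter={"_all_": dict(fields=["script.repository"], pattern="github.com/user/repo")},
)
for task_id in task_ids:
    t = Task.get_task(task_id=task_id)
    print(t.name, t.get_parameters()) `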
BoredGoat1 where exactly do you think that happens ?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L316
?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L202
It is recommended to create a git TOKEN with read-only permissions and use it (more secure)
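For reference, in the agent's clearml.conf it would look something like this (the username/token values are placeholders):
` agent {
    # personal access token with read-only scope used as the password
    git_user: "my-git-username"
    git_pass: "my_read_only_token"
} `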
BTW: if you need you can do the following:
` from clearml import Task
from clearml.automation import PipelineController
task = Task.init(project_name='pipelines', task_name='pipeline test')
task.set_base_docker(...)
# the pipeline object is using the Current Task, hence the docker image is set
pipe = PipelineController(...)
pipe.start() `
Hi CrookedSeal85
However, I systematically notice a jump of some number of "ghost iterations" when resuming my trainings...
Try the following:
task = Task.init(..., continue_last_task=0)
from the Task.init docstring (Notice this value can be both boolean and integer)
:param bool continue_last_task: Continue the execution of a
...
- An integer - Specify initial iteration offset (override the automatic last_iteratio...
I think your use case is the original idea behind "use_current_task" option, it was basically designed to connect code that creates the Dataset together with the dataset itself.
I think the only caveat in the current implementation is that it should "move" the current Task into the dataset project / set the name. wdyt?
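Something like this sketch of the use_current_task flow (the project/dataset names and the path are placeholders):
` from clearml import Task, Dataset

task = Task.init(project_name="data", task_name="build dataset")

# ... code that generates / prepares the data ...

# use_current_task=True backs the Dataset with the current Task, so the
# data-creation code and the dataset itself live in the same Task
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="data",
    use_current_task=True,
)
dataset.add_files("/path/to/generated/files")
dataset.upload()
dataset.finalize() `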
RoundCat60 I'm assuming we are still talking about the S3 credentials, sadly no
Are you familiar with boto and IAM roles ?
PompousBeetle71 BTW: if you remove the type=str from the argparse, it will do what you want, None will stay None (instead of ''), all other values will be of type str as this is always the default
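For example, a minimal sketch (the project/task names and argument names are placeholders):
` from argparse import ArgumentParser
from clearml import Task

parser = ArgumentParser()
# without an explicit type=str, a None default stays None when the stored
# value is passed back; other values arrive as str (argparse's default type)
parser.add_argument("--checkpoint", default=None)
parser.add_argument("--comment", default="baseline run")

task = Task.init(project_name="examples", task_name="argparse defaults")
args = parser.parse_args() `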
Ok.. so I should generally avoid connecting complex objects? I guess I would create a 'mini dictionary' with a subset of params, and connect this instead.
In theory it should always work, but this specific one fails on a very pythonic paradigm (see below)
from copy import copy
an_object = copy(object)
A good rule of thumb is to connect any object/dict that you want to track or change later
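For example, a minimal sketch (the names and values are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="connect subset")

# connect only the parameters you actually want to track / override later
params = {"lr": 0.001, "batch_size": 32, "backbone": "resnet50"}
params = task.connect(params)   # returned dict reflects any UI / agent overrides `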
Just verifying the Pod does get allocated 2 gpus, correct ?
What do you have under the "script path" in the Task?
Hi MelancholyChicken65
I'm not sure you can control it; the UI deduces the URL based on the address you are browsing to: so if you go to http://app.clearml.example.com you will get the correct ones, but you have to put them on the right subdomains:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#subdomain-configuration
I guess I need to do something like the following after the task was created:
...
Yes!
Why use the "post" callback and not the "pre" callback?
The post gets back the Model object. The pre allows you to decide if you actually want to log it in the first place (come to think of it, maybe you want that as well)
SuperficialGrasshopper36 regarding the codeartifact
I think the easiest would be to have a bash script authenticating the codeartifact with the aws command at the beginning of each docker spin-up. This can be done by adding it to:
https://github.com/allegroai/clearml-agent/blob/81edd2860fbc09e2a179985d8315ffaba851dcd7/docs/clearml.conf#L136
For example:
extra_docker_shell_script: ["apt-get install -y aws_cli_or_something", "aws cli authenticate me command"]
wdyt?
Sure, run:
clearml-agent init
It is a CLI wizard to configure the initial configuration file.
Does Task.connect send each element of the dictionary as a separate API request? Has anyone else encountered this issue?
Hi SuperiorPanda77
the task.connect ends up as a single call with all the data being sent on a single request.
That said, maybe the connect dict is not the best solution for a thousand-key dictionary ...
Maybe an artifact, or connect_configuration, would be better suited?
wdyt?
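For example, something along these lines (the names are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="large config")

huge_dict = {f"key_{i}": i for i in range(10000)}

# stored as a single configuration object, still visible in the UI
task.connect_configuration(huge_dict, name="my_large_config")

# or store it as an artifact if you only need to retrieve it later
task.upload_artifact(name="my_large_dict", artifact_object=huge_dict) `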