Ok..so I should generally avoid connecting complex objects? I guess I would create a 'mini dictionary' with a subset of params, and connectvthis instead.
In theory it should always work, but this specific one fails on a very pythonic paradigm (see below)
from copy import copy
an_object = copy(object)
A good rule of thumb is to connect any object/dict that you want to track or change later
store_code_diff_from_remote
don't seem to change anything in regards of this issue
Correct, it is always from remote
i'll be using the update_task, that worked just fine, thanks
(edite
Sure thing.
ShakyJellyfish91 , I took a quick look at the diff between the versions can you hack a non working version (preferably the latest) and verify the issue for me?
ShakyJellyfish91 what exactly are you passing to Task.create?
Could it be you are only passing script= and leaving repo= None ?
Thanks StrongHorse8
Where do you think would be a good place to put a more advanced setup? Maybe we should add an entry for DevOps? Wdyt?
You need trains-server support, so if trains v0.15 is working with older backend it will revert to "training" type
Hi @<1773158043551272960:profile|PungentRobin32>
1732496915556 lab03:gpuall DEBUG docker: invalid reference format.
So seems like the docker command is incorrect?! the error you are seeing is the agent failing to spin the docker, what's the OS of the host machine ?
pipeline, can I control the tags that the tasks a pipeline creates?
add_pipeline_tags
adds tags from pipeline to the tasks I suppose? But I also need to clear existing tags in those created tasks
add_pipeline_tags will add the unique ID of the pipeline execution, if you want to add specific tags you can use the task_overrides and provide:pipe.add_step(..., task_overrides={'tags': ['my', 'tags']})
additionally, I found is that clearml==1.0.5 package is able to find these partial changes, newer versions find nothing at all, maybe it's because it's always comparing against remote
Hmm it was always from remote...
it is actually doing the following:git rev-parse --abbrev-ref --symbolic-full-name @{u}Then with the branch name output,git diff --submodule=diff <add_branch_name_here>
If that's the case check the free space in the monitoring of the experiment, you will find the free space in GB logged
@<1535793988726951936:profile|YummyElephant76>
Whenever I create any task the "uncommitted changes" are the contents of
ipykernel_launcher.py
, is there a way to make ClearML recognize that I'm running inside a venv?
This sounds like a bug, it should have the entire notebook there, no?
Change to add_missing_installed_packages=False, here, and see if you end up with git diff
https://github.com/allegroai/clearml/blob/1f82b0c4010799be6157f5c845c7f6ac48e71c0c/clearml/backend_interface/task/populate.py#L158
*Actually looking at the code, when you call Task.create(...) it will always store the diff from the remote server.
Could that be the issue?
To edit the Task's diff:task.update_task(dict(script=dict(diff='DIFF TEXT HERE')))
Thank you!
one thing i noticed is that it's not able to find the branch name on >=1.0.6x , while on 1.0.5 it can
That might be it! let me check the code again...
Thanks ShakyJellyfish91 ! please let me know what you come up with, I would love for us to fix this issue.
CooperativeSealion8
when it first asks me to enter my full name
Where? in the Web?
Hi @<1564785037834981376:profile|FrustratingBee69>
It's the previous container I've used for the task.
Notice that what you are configuring is the Default container, i.e. if the Task does not "request" a specific container, then this is what the agent will use.
On the Task itself (see Execution Tab, down below Container Image) you set the specific container for the Task. After you execute the Task on an Agent, the agent will put there the container it ended up using. This means that ...
Hi @<1729309120315527168:profile|ShallowLion60>
How did you create those credentials ?
Hi RipeGoose2
Just to clarify, the issue with the html stuck in cache is a UI, thing, basically the webapp needs to tell the browser not to cache the artifacts, it has nothing to do with how the artifacts are created.
Regardless we love improvements so feel free to mass around with the code and PR once you get something useful 😉
Specifically this is where the html conversion happens
https://github.com/allegroai/clearml/blob/9d108d855f784e1fe7f5691d3b7bf3be64576218/clearml/backend_in...
I want to schedule bulk tasks to run via agents, so I'm running
create
I see, that makes sense.
specially when dealing with submodules,
BTW: submodule diff should always get stored, can you provide some error logs on fail cases?
Before manually modifying the diff:
If you have local commits (i.e. un-pushed) this might fail the diff apply, in that case you can set the following in your clearml.confstore_code_diff_from_remote: truehttps://github.com/allegroai/clear...
Good question 🙂
https://clear.ml/docs/latest/docs/clearml_agent#dynamic-gpu-allocation
The latest updated help will always be here as well 🙂clearml-agent daemon --help
(Just a thought, maybe we just need to combine Kedro-Viz ?)
there is probably some way to make an S3 path open up in the browser by default
You should have a pop-up asking for credentials ...
Could you check that if you add the credentials in the profile page it works ?
that is because my own machine has 10.2 (not the docker, the machine the agent is on)
No that has nothing to do with it, the CUDA is inside the container. I'm referring to this image https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the docker , it points to cuda version 10.2
Am I missing something ?
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
I'm assuming you need multiple "file-server" instances running on the "clearml-server" with a load-balancer of a sort...
Does StorageManager.upload and upload_artifact use the same methods?
Yes they both use StorageManager.upload
Is the only difference is task being async?
Two differences:
Upload being async Registering the artifact on the experiment. StorageManager will only upload, where as upload_artifact will make sure the file is registered as an artifact on the experiment, together with all of the artifacts properties.
Could you give an example of such configurations ?
(e.g. what would be diff from one to another)
Glue machine or K8S Worker machine?
The K8s worker machine.
You could also configure an ingest service as part of the template, so they always have an external port mapped into the port.