Basically the idea is that you create the pipeline once (say, for debugging), then once you see it is running, you have a Task of your pipeline in the system (with any custom logic you added). With a Task in the system you can always clone/modify it and launch it externally (i.e. from code/UI). Make sense?
Hi ConvincingSwan15
A few background questions:
Where is the code that we want to optimize? Do you already have a Task of that code executed?
"find my learning script"
Could you elaborate? Is this connected to the first question?
Hi ReassuredOwl55
How would I find Tasks that have the same code with different inputs/parameters?
Assuming you have the git repo
you can do: Task.query_tasks(..., task_filter={'_all_': dict(fields=['script.repository'], pattern='github.com/user/repo')})
wdyt?
Hi GiganticTurtle0
you should actually get " file://home/user/local_storage_path "
With "file://" prefix.
We always store the file:// prefix to note that this is a local path
So this is very odd, it looks like a pip bug:
The agent is trying to install torch==2.1.0.*
because by default it ignores the 4th+ parts (they are unstable and torch has a tendency to remove them), and for some reason pip will not match 2.1.0.*
with, for example, "2.1.0.dev20230306+cu118"
but based on the docs it should work:
see here: None
As a workaround you can always edit and change to the final url for example: so ...
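To make the truncation described above concrete, here is a minimal sketch of cutting a full version down to its first three parts plus a wildcard. `to_wildcard_spec` and its `keep_parts` parameter are hypothetical names made up for this illustration; this only mimics the described behavior and is not the agent's actual code.

```python
def to_wildcard_spec(version: str, keep_parts: int = 3) -> str:
    """Keep the first `keep_parts` dot-separated parts and wildcard the rest.

    Hypothetical helper: it only mimics the behavior described above,
    it is not taken from clearml-agent.
    """
    parts = version.split(".")
    if len(parts) <= keep_parts:
        # Nothing to drop, so pin the version as-is
        return "==" + version
    # Drop the 4th+ parts and replace them with a wildcard
    return "==" + ".".join(parts[:keep_parts]) + ".*"


spec = to_wildcard_spec("2.1.0.dev20230306+cu118")
print(spec)
```

The resulting specifier is what then gets handed to pip, which is where the matching problem shows up.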
Hi WonderfulJellyfish65
BTW, the training script connects to apiserver via the internal IP address
That is a big issue, because as you noticed the links to data generated by the code will have the internal IP ...
You basically need every component to use the same address (url)
Thanks GiganticMole91!
(As usual MS decided to invent a new "standard")
I'll make sure the guys look at it and get an RC with a fix
PipelineController creates another Task in the system, that you can later clone and enqueue to start a process (usually queuing it on the "services" queue)
I see.
You can get the offline folder programmatically, then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function): task.get_offline_mode_folder()
You can also have a soft link of the offline folder (if you are working on a Linux machine): ln -s myoffline_folder ~/.trains/cache/offline
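A minimal sketch of both options in Python, assuming you already have the offline folder path (for example from task.get_offline_mode_folder()). The helper names `copy_offline_folder` and `link_offline_folder` are hypothetical, and the demo runs on throwaway temp folders with a placeholder file rather than a real offline session.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_offline_folder(offline_folder: Path, destination: Path) -> Path:
    """Copy the whole offline session folder to `destination`."""
    return Path(shutil.copytree(offline_folder, destination))


def link_offline_folder(offline_folder: Path, link_path: Path) -> Path:
    """Create a soft link, the same as `ln -s offline_folder link_path`."""
    link_path.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(offline_folder, link_path, target_is_directory=True)
    return link_path


# Demo on throwaway folders (placeholders standing in for the real paths)
root = Path(tempfile.mkdtemp())
offline = root / "myoffline_folder"
offline.mkdir()
(offline / "example.txt").write_text("{}")  # placeholder content

copied = copy_offline_folder(offline, root / "backup")
linked = link_offline_folder(offline, root / "cache" / "offline")
print(sorted(p.name for p in copied.iterdir()))
print(linked.resolve() == offline.resolve())
```

The copy gives you an independent snapshot, while the soft link keeps pointing at the live folder.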
... Would not work for huge llm style models.
yes I agree... but then if the model is small enough then you can just keep it in memory ...
Yes clearml is much better :)
(joking aside, mlops & orchestration in clearml is miles better)
CheerfulGorilla72 What are you looking for?
With offline mode,
Later, if you need, you can actually import the execution (including artifacts etc.); you just need the zip file it creates when you are done.
MelancholyChicken65 which clearml-serving version are you using? (I believe this issue was fixed in 1.2)
DefeatedCrab47 if TB has it as image, you should find it under "debug_samples" as image.
Can you locate it there ?
My bad you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
I think it's inside the container since it's after the worker pulls the image
Oh that makes more sense. I mean it should not build from source, but it makes sense
To solve the build from source:
Add to the "Additional ClearML Configuration" section the following line: agent.package_manager.pip_version: "<21"
You can also turn on venv caching
Add to the "Additional ClearML Configuration" section the following line: agent.venvs_cache.path: ~/.clearml/venvs-cache
I will make sure w...
Hi EnviousStarfish54
The Enterprise edition extends Trains functionality.
It adds security, scale and full data management (data management and versioning being the key difference)
You can get it as a SaaS solution or on-prem.
If you need more information, you can leave contact details on the website, I'm sure sales will be happy to help :)
You are correct, it is currently not supported in venv mode. We could not find a good use case for it. What is yours?
I've seen that the file location of a task is saved
What do you mean by that? Is it the execution section "entry point"?
I think I found something, let me see if I can reproduce it
CooperativeFox72 I would think the easiest would be to configure it globally in the clearml.conf (rather than add more arguments to the already packed Task.init) :)
I'm with you on 60 messages being way too much...
Could you open a GitHub issue on it, so we do not forget?
Hi EagerOtter28
Let's say we query another time and get 60k images. Now it is not trivial to create a new dataset B but only upload the diff: ...
Use Dataset.sync (or clearml-data sync) to check which files were changed/added.
All files are already hashed, right? I wonder why clearml-data does not keep files in a semi-flat hierarchy and groups them together to datasets?
It kind of does, it has a full listing of all the files with their hash (SHA2) values, ...
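To illustrate the idea of finding changed/added files from a listing of hashes, here is a minimal standalone sketch. It is not how clearml-data is implemented; `hash_listing` and `diff_listings` are hypothetical helpers, and SHA-256 is assumed as the SHA2 variant.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_listing(folder: Path) -> dict:
    """Map each file's relative path to the SHA-256 of its content."""
    listing = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            key = path.relative_to(folder).as_posix()
            listing[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return listing


def diff_listings(old: dict, new: dict):
    """Return (added, modified, removed) relative paths between two listings."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, modified, removed


# Demo: snapshot a folder, change it, and compare the two listings
folder = Path(tempfile.mkdtemp())
(folder / "a.jpg").write_bytes(b"image-a")
(folder / "b.jpg").write_bytes(b"image-b")
before = hash_listing(folder)

(folder / "b.jpg").write_bytes(b"image-b-v2")  # changed file
(folder / "c.jpg").write_bytes(b"image-c")     # new file
after = hash_listing(folder)

added, modified, removed = diff_listings(before, after)
print(added, modified, removed)
```

Only the entries in `added` and `modified` would need to be uploaded for the new version; unchanged files keep the same hash and can be skipped.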
Hi ReassuredTiger98
It's clearml that needs to support subparser, and it does support it.
What are you seeing in the Args section ?
(Notice that in the end all the parsed args are stored on the global "args" variable after you call parse_args(); clearml will basically take those variables and put them into the Args section)
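For reference, a minimal plain argparse sketch of what ends up on the "args" variable when a subparser is used. The sub-command and argument names (`train`, `eval`, `--lr`, `--checkpoint`) are made up for illustration, and nothing here is clearml-specific.

```python
import argparse

parser = argparse.ArgumentParser()
# dest="command" stores the chosen sub-command name on the namespace
subparsers = parser.add_subparsers(dest="command")

train = subparsers.add_parser("train")
train.add_argument("--lr", type=float, default=0.01)

evaluate = subparsers.add_parser("eval")
evaluate.add_argument("--checkpoint", default="last")

# Explicit argv so the example runs anywhere; normally you call parse_args()
args = parser.parse_args(["train", "--lr", "0.1"])

# Everything the parser produced now lives on `args`
print(vars(args))
```

Only the arguments of the selected sub-command appear on the namespace, together with the sub-command name itself as a single string.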
And command is a list instead of a single str
"command list", you mean the command argument?
if executed remotely...
You mean cloning the local execution, sending to the agent, then when running on the agent the Args/command is updated to a list ?
With remote_execution it is command="[...]", but on local it is command='train' like it is supposed to be.
I'm not sure I follow, could you expand ?