my experiment logic
you mean the actual code doing the training?
so that it gets lazily executed and not at task definition time
Task definition time -> when creating the Pipeline Task? Remember, the base_task_factory at the end creates a Task object (it does not run the code itself).
BTW: if you have simple training logic you can use pipeline decorators, it might be a better fit:
https://clear.ml/docs/latest/docs/fundamentals/pipelines#pipeline-from-function-decorator
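For reference, a minimal sketch of what a decorator-based pipeline could look like (project/step names here are placeholders, adapt to your own logic):
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["prepared"], cache=True)
def prepare_data(source: str):
    # runs as its own Task when executed remotely
    prepared = source.upper()
    return prepared

@PipelineDecorator.component(return_values=["result"])
def train(prepared: str):
    result = len(prepared)
    return result

@PipelineDecorator.pipeline(name="toy pipeline", project="examples", version="0.0.1")
def run_pipeline(source: str = "hello"):
    data = prepare_data(source)
    # train() consumes prepare_data()'s return value, so the pipeline logic
    # knows the two steps cannot run in parallel
    print(train(data))

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # drop this line to enqueue steps on agents
    run_pipeline()
```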
FiercePenguin76 the git repo should detect only clearml
as required python package
Basically the steps are (a rough sketch of this logic follows the list):
1. Decide if the initial python entry script is a standalone script (i.e. no local imports) in the git repo (in your example "task_with_deps.py").
2. If this is a "standalone script", only look for imports inside the calling python script, and list those packages under "installed packages".
3. If this is not a standalone script, go over all the python files inside the repository, look for "i...
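As a very rough illustration of that decision logic (just a sketch of the idea, not the actual clearml-agent code):
```python
import ast
from pathlib import Path

def imported_modules(py_file: Path) -> set:
    # collect top-level module names imported by a single python file
    tree = ast.parse(py_file.read_text())
    mods = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

def detect_requirements(entry_script: Path, repo_root: Path) -> set:
    # names of python modules that live inside the repo itself
    local_modules = {p.stem for p in repo_root.rglob("*.py")}
    entry_imports = imported_modules(entry_script)
    if not (entry_imports & local_modules):
        # "standalone script": only the entry script's imports matter
        return entry_imports
    # not standalone: scan every python file in the repository
    all_imports = set()
    for py_file in repo_root.rglob("*.py"):
        all_imports |= imported_modules(py_file)
    return all_imports - local_modules
```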
TrickyRaccoon92 I didn't know that 🙂
where did you try to add it? did you report a plotly figure or is it with report_???
Hi RipeGoose2
Just to clarify, the issue with the html stuck in cache is a UI thing: basically the webapp needs to tell the browser not to cache the artifacts, it has nothing to do with how the artifacts are created.
Regardless, we love improvements, so feel free to mess around with the code and PR once you get something useful 😉
Specifically this is where the html conversion happens
https://github.com/allegroai/clearml/blob/9d108d855f784e1fe7f5691d3b7bf3be64576218/clearml/backend_in...
ContemplativePuppy11
yes, nice move. my question was to make sure that the steps are not run in parallel because each one builds upon the previous one
if they are "calling" one another (or passing data) then the pipeline logic will deduce they cannot run in parallel 🙂 basically it is automatic
so my takeaway is that if the funcs are class methods the decorators won't break, right?
In theory, but the idea of the decorator is that it tracks the return value so it "knows" how t...
Hi @<1529633468214939648:profile|CostlyElephant1>
Is it possible to get user ID of the current user
On the Task.data object itself there should be a field named "user", that's the user ID of the owner (creator) of the Task.
You can filter based on this id with:
Task.get_tasks(..., task_filter={'user': ["user-id-here"]})
wdyt?
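Something along these lines (a sketch; the project name is a placeholder):
```python
from clearml import Task

task = Task.current_task()
user_id = task.data.user  # ID of the Task's owner (creator)

# list all tasks in a project created by that user
tasks = Task.get_tasks(
    project_name="my project",
    task_filter={"user": [user_id]},
)
print([t.id for t in tasks])
```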
Hi @<1547028074090991616:profile|ShaggySwan64>
I have to admit that personally I do not know pdm, could you share links and help us understand what the value is over pip/poetry/conda?
Is there a way to filter experiments in a hyperparameter sweep based on a given range of a parameter/metric in the UI
Are you referring to the HPO example, or the Task comparison?
When I have:
```python
import time
import datetime as dt

import numpy as np
import plotly.graph_objects as go

# "task" is an existing clearml Task object (e.g. returned by Task.init)
n = 20
duration = 1000
now = time.mktime(time.localtime())
timestamps = np.linspace(now, now + duration, n)
dates = [dt.datetime.fromtimestamp(ts) for ts in timestamps]
values = np.sin((timestamps - now) / duration * 2 * np.pi)
fig = go.Figure(data=go.Scatter(x=dates, y=values, mode='markers'))
task.get_logger().report_plotly(
    title="plotly", series="b", iteration=0, figure=fig)
```
Everything looks okay
Hi GiganticTurtle0
Sure, OutputModel can be manually connected:
```python
from clearml import Task, OutputModel

model = OutputModel(task=Task.current_task())
model.update_weights(weights_filename='localfile.pkl')
```
Verified, you are correct: "." in label enumeration will break the clone.
I'll make sure this bug is passed to the backend guys to fix. Thanks TenseOstrich47!
Meanwhile, maybe "_" instead? 😁
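For instance, something like this should be safe until the fix lands (a sketch; I'm assuming the enumeration is set via connect_label_enumeration, and the names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="label enum workaround")

# use "_" instead of "." inside label names for now
task.connect_label_enumeration({
    "cat_siamese": 0,   # rather than "cat.siamese"
    "cat_persian": 1,   # rather than "cat.persian"
})
```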
Hi ReassuredTiger98
Basically assuming Linux, init.d will do the trick
https://unix.stackexchange.com/questions/20357/how-can-i-make-a-script-in-etc-init-d-start-at-boot
Hi SmallDeer34
On the SaaS you can right click on an experiment and publish it 🙂
This will make the link available for everyone, would that help?
Hmm let me check, I think we changed the offline mode to use the latest API version (because by definition it cannot know the server's version).
Let me check if you can override it
GrievingTurkey78 I'm not sure I follow, are you asking how to add additional scalars ?
@<1671689437261598720:profile|FranticWhale40> this one: None
It appears that "they sell that" as Triton Management Service, part of . It is possible to do through their API, but it would need to be explicit.
We support that, but this is not dynamically loaded, this is just removing and adding models, it does not unload them from the GRAM.
That's the main issue: when we unload the model, it is unloaded. To do dynamic loading, they need to be able to save it in RAM and unload it from GRAM, and that's the feature that is missing on all Triton deployme...
So agents on different nodes will probably require different cuda-version images.
That makes sense SarcasticSquirrel56
I would edit the helm chart (or deploy manually) based on a selector that will select the different nodes/GPUs and assign the correct containers (i.e. matching CUDA versions to the different GPUs / drivers).
BTW: you can also play around with the k8s glue, which would dynamically spin pods based on clearml Tasks.
wdyt?
Hmm, not a bad idea 🙂
Could you please open a Git Issue, so it will not get forgotten?
(BTW: I'm not sure how trivial it is to implement, nonetheless it is obviously possible 😉)
AttractiveCockroach17 I verified this is an issue with hyperparameters with "." or section names with ".", thank you for noticing!
I will make sure I pass it along, should be part of the next version (ETA a week) 🙂
suppose I have an S3 bucket where my data is stored and I wish to transfer it to ClearML file server.
Then you first have to download the entire bucket locally, then register the local copy.
Basically:
```python
from clearml import StorageManager

# first argument: the source bucket URL
StorageManager.download_folder("", "/target/folder")
# now register the local "/target/folder" with Dataset.add_files
```
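And the registration step could look roughly like this (a sketch; dataset/project names are placeholders):
```python
from clearml import Dataset

# register the local copy as a ClearML dataset
ds = Dataset.create(dataset_name="my-dataset", dataset_project="datasets")
ds.add_files(path="/target/folder")
ds.upload()    # uploads the files to the ClearML file server (or configured output_uri)
ds.finalize()  # close this dataset version
```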
and I run the agent from the local user and I would expect those settings to have effect: -v /home/localuser/.ssh:/home/testuser/.ssh
It does not map it directly, it creates a temp copy of the entire ".ssh" folder in the host /tmp folder, then maps that folder inside the container:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/clearml_agent/commands/worker.py#L3422
Notice that the "docker_internal_mounts" section is nested inside the "agent" section ...
yeah. I am getting logs, but they are extremely puzzling to me. I would appreciate to actually have access to the whole package structure...
Actual packages are updated back to "Installed Packages" section (under the execution tab).
indeed. can you maybe point to where the docker command is composed?
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/clearml_agent/commands/worker.py#L3694
🙂
BTW: you can run/build the entire thing on your machin...
because a pipeline is composed of multiple tasks, different tasks in the pipeline could run on different machines.
Yes!
Or more specifically, they could run on different queues, and as you said in your other response, we could have a queue for smaller CPU-based instances, and another queue for larger GPU-based instances.
Exactly!
I like the idea of having a queue dedicated to CPU-based instances that has multiple agents running on it simultaneously. Like maybe four agents.
Th...
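For example, with the decorator syntax each step can name its own queue (a sketch; I'm assuming the execution_queue argument, and the queue names are placeholders):
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(execution_queue="cpu_queue", return_values=["dataset"])
def preprocess():
    dataset = list(range(10))
    return dataset

@PipelineDecorator.component(execution_queue="gpu_queue", return_values=["score"])
def train(dataset):
    score = sum(dataset)
    return score

@PipelineDecorator.pipeline(name="mixed queues", project="examples", version="0.0.1")
def pipe():
    print(train(preprocess()))
```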
So there is no copying of the data to the pod, it is simply referenced via the EFS
Correct
basically
would allow blocking the machine from being scaled-in when
Oh this is what I was missing 🙂 That makes sense to me!
So what you are saying is that the AWS autoscaler agent, when it is launching a Task, will set a "protection flag" inside the container, and when the Task ends it will unset the "protection flag".
Is that correct?