
Why would that require refactoring? The Dataset class should take care of it internally, no?
The reason my_name is a subproject is so that every version can be a "Task" inside that project, which is just easier to manage (or at least that was the idea).
I still have name my_name, but the project name my_project/.datasets/my_name rather than my_project/.datasets
Yes, this is the expected behavior
And I don't see any new projects / subprojects where that dataset creation Task is stored
They are marked "hidden", so by default you cannot see them in the UI (they will only appear in the Dataset page).
You can toggle the UI hidden flag by going to your settings page and selecting "Con...
Yes, that sounds like a good start. DilapidatedDucks58, can you open a GitHub issue with the feature request?
I want to make sure we do not forget.
Hi GreasyPenguin14
Could you tell me what the differences are and why we should use ClearML data?
The first difference is in the approach itself: DVC ties the data to the code (i.e. the git repo), whereas we (ClearML, but not just us) actually think data should be abstracted from the code base and become a standalone argument, allowing users to build/execute against different datasets/versions. ClearML Data becomes part of the workflow as it is visible from the UI including the abili...
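To make the "data as a standalone argument" idea concrete, here is a minimal sketch. The script layout, the --dataset-id flag and the function names are assumptions for illustration only; Dataset.get and get_local_copy are the ClearML calls it relies on, imported lazily so the argument parsing works even without clearml installed.

```python
import argparse


def parse_args(argv=None):
    """Parse the dataset version to train against (hypothetical CLI layout)."""
    parser = argparse.ArgumentParser(description="Train against a chosen dataset version")
    parser.add_argument("--dataset-id", required=True, help="ID of the dataset version to use")
    return parser.parse_args(argv)


def fetch_dataset(dataset_id):
    """Return a local, read-only copy of the requested dataset version."""
    # Lazy import: needs the clearml package and a configured server.
    from clearml import Dataset

    return Dataset.get(dataset_id=dataset_id).get_local_copy()


# Hypothetical usage inside a training script:
#   args = parse_args()
#   data_dir = fetch_dataset(args.dataset_id)
```

The same code can then run against a different dataset version just by changing the argument, without touching the git repo.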
Hi MelancholyElk85
I have strong deja vu feeling. Credentials are OK. How to solve this? If you need the full log, how to share the full log without sharing private information? I'm fed up with this shit
Is this coming from the agent ?
We have tried to manually restart tasks reloading all the scalars from a dead task and loading latest saved torch model.
Hi ThickKitten19
How did you try to restart them? How are you monitoring dying instances? Where and how are they running?
Hi @<1571308003204796416:profile|HollowPeacock58>
parameters = task.connect(config, name='config_params')
It seems that your DotDict does not support the Python copy operator?
i.e.
from copy import copy
copy(DotDict())
fails?
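In case it helps, here is a minimal sketch of a dot-access dict that survives the copy check above. This DotDict is a hypothetical stand-in, not the class from the thread; the two points it illustrates are raising AttributeError (not KeyError) for missing attributes and defining __copy__ / __deepcopy__ explicitly.

```python
from copy import copy, deepcopy


class DotDict(dict):
    """Hypothetical dict subclass with attribute-style access."""

    def __getattr__(self, name):
        # Raise AttributeError for unknown names so that protocol lookups
        # made by copy/pickle see the exception type they expect.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __copy__(self):
        # Shallow copy: new container, same values.
        return type(self)(self)

    def __deepcopy__(self, memo):
        # Deep copy: values are copied recursively.
        return type(self)({key: deepcopy(value, memo) for key, value in self.items()})


config = DotDict(lr=0.1, nested={"layers": 2})
shallow = copy(config)
deep = deepcopy(config)
```

If copy(DotDict()) works on your side, passing the object to task.connect is the next thing to retry.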
Each of these steps, [2], [3], [4], [5 & 6], can be thought of as independent Kedro nodes that can be reused in the future. Now, how to integrate this with ClearML is unclear to us.
The same can be said for ClearML, each of these steps is a ClearML Task (with its own repo/environment)
It sounds (and I might be completely off here, so please feel free to correct me) that the main use for Kedro is the nice web UI of the pipeline (which I agree looks very cool).
Th...
And maybe adding idle time spent without a job to API is not that a bad idea :)
yes, adding that to the feature list :)
What if I write the last active state in an instance tag? This could be a solution...
I love this hack, yes this should just work.
BTW: if your lambda is a for loop that is constantly checking, there is no need to actually store the "last idle timestamp check" as a tag, no?
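A rough sketch of that in-memory variant, assuming the checker is one long-running loop (a function that is re-invoked from scratch each time would lose this state and would need the tag after all). IdleTracker and its arguments are hypothetical names; how you find out whether an instance is busy is left to your own code.

```python
import time


class IdleTracker:
    """Remembers, in memory only, when each instance was last seen busy."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self.last_busy = {}

    def update(self, instance_id, is_busy):
        """Record one poll; return True once the idle budget is used up."""
        now = self.clock()
        if is_busy or instance_id not in self.last_busy:
            # Busy, or first time we see this instance: restart its idle timer.
            self.last_busy[instance_id] = now
            return False
        return now - self.last_busy[instance_id] >= self.max_idle_seconds
```

Inside the loop you would call update() once per instance per poll and spin down whatever comes back True.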
one can containerise the whole pipeline and run it pretty much anywhere.
Does that mean the entire pipeline will be running on the instance spinning the container ?
From here: this is what I understand:
https://kedro.readthedocs.io/en/stable/10_deployment/06_kubeflow.html
My thinking was I can use one command and run all steps locally while still registering all "nodes/functions/inputs/outputs etc" with clearml such that I could also then later go into the interface and clone an...
@<1523710674990010368:profile|GreasyPenguin14> what do you mean "but I do I get the... " ?
Configuring git user/pass will allow you to launch Tasks from private repositories on the services queue (the agent is part of the docker-compose).
That said, this is not a must, worst case you'll get an error when git fails to clone your repo :)
SpotlessFish46 unless all the code is under "uncommitted changes" section, what you have is a link to the git repo + commit id
Something like the TYPE_STRING that Triton accepts.
I saw the github issue, this is so odd , look at the triton python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/init.py
Hi LudicrousDeer3
It should not be a problem, see the iteration argument in Logger.report_scalar
https://github.com/allegroai/clearml/blob/22d795f68f0175ba9511cabd444ea4dba464f3cd/examples/reporting/scalar_reporting.py#L19
https://allegro.ai/clearml/docs/rst/references/clearml_python_ref/logger_module/logger_logger.html?highlight=report_scalar#clearml.logger.Logger.report_scalar
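For reference, a minimal sketch of passing the iteration explicitly. The helper name and its defaults are made up for illustration; the only ClearML call used is Logger.report_scalar(title, series, value, iteration), and the logger is passed in so the helper itself has no hard dependency on clearml.

```python
def report_series(logger, values, title="loss", series="train", first_iteration=0):
    """Report each value at an explicit iteration (its x-axis position)."""
    for offset, value in enumerate(values):
        logger.report_scalar(
            title=title,
            series=series,
            value=value,
            iteration=first_iteration + offset,
        )


# Hypothetical usage (needs clearml installed and a configured server):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="scalar reporting")
#   report_series(task.get_logger(), [0.9, 0.7, 0.5], first_iteration=100)
```

Because the iteration is whatever you pass, you are free to start from any offset or skip values.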
Yeah we should definitely have get_requirements :)
Thanks @<1523704157695905792:profile|VivaciousBadger56>! Great work on the docstring, I also really like the extended example. Let me make sure someone merges it.
Well, it should work. Make sure the Task "holds" all the information needed (under the execution tab): repo, uncommitted changes, Python packages, etc.
Then configure your agent (choose pip/conda/poetry as the package manager) and spin it up (by default in venv/conda mode, or in docker mode).
Should work :)
No worries :) glad it worked
Hi @<1547390438648844288:profile|ScaryJellyfish75>
These hyperparameters are now in the "Args" section of my ClearML task
Sure that would probably mean
UniformParameterRange(
    "Args/training/optimizer/lr",
    min_value=0.00025,
    max_value=0.01,
    step_size=0.00025,
),
assuming your Task has training/optimizer/lr in its Args section (under the configuration tab), make sense?
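Spelled out as a small sketch, in case the naming is the confusing part: the optimizer addresses a parameter as section/name. The two helper functions are hypothetical; UniformParameterRange comes from clearml.automation and is imported lazily so the naming helper works without clearml installed.

```python
def args_parameter_name(name, section="Args"):
    """Return the '<section>/<name>' path used to address a Task parameter."""
    return f"{section}/{name.strip('/')}"


def build_lr_range():
    """Build the learning-rate range from the thread (values copied as-is)."""
    # Lazy import: requires the clearml package.
    from clearml.automation import UniformParameterRange

    return UniformParameterRange(
        args_parameter_name("training/optimizer/lr"),
        min_value=0.00025,
        max_value=0.01,
        step_size=0.00025,
    )
```

If the name in the range does not match what you see under the configuration tab, the optimizer has nothing to override, so it is worth checking the exact spelling there first.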
Maybe we should add it to Storage Manager? What do you think?
I hope you can do this without containers.
I think you should be fine, the only caveat is CUDA drivers, nothing we can do about that ...
In short, I was not able to do it with Task.clone and Task.create, the behavior differs from what is described in docs and docstrings (this is another story - I can submit an issue on github later)
The easiest is to use task_overrides
Then pass: task_overrides = dict(script=dict(diff='', branch='main'))
100% of things with task_overrides would be the most convenient way
I think the issue is that you have to pass the project ID, not the project name (the project's unique ID is the property that is actually stored on the Task)
@<1523707653782507520:profile|MelancholyElk85> can you check the following works:
pipe.add_step(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
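Putting the two overrides from this thread together, a minimal sketch of assembling the dict before handing it to the pipeline. build_task_overrides is a hypothetical helper, and the nested 'script' layout simply mirrors the snippet above; Task.get_project_id is the ClearML call that turns a project name into its ID.

```python
def build_task_overrides(project_id=None, branch=None, clear_diff=False):
    """Assemble a nested task_overrides dict from the optional pieces."""
    overrides = {}
    script = {}
    if branch is not None:
        script["branch"] = branch
    if clear_diff:
        # An empty diff drops the stored uncommitted changes.
        script["diff"] = ""
    if script:
        overrides["script"] = script
    if project_id is not None:
        # The Task stores the project ID, not the project name.
        overrides["project"] = project_id
    return overrides


# Hypothetical usage (pipe is assumed to be a PipelineController):
#   from clearml import Task
#   project_id = Task.get_project_id(project_name="examples")
#   pipe.add_step(..., task_overrides=build_task_overrides(project_id, branch="main", clear_diff=True))
```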
The main reason we need the above mentioned functionality is because there are some experiments that need to run for a long time. Let's say weeks.
Good point!
We need to temporarily pause (kill or something else) the running HPO task and reassign the resources for other needs.
Oh I see now....
Later, when more important experiments have been completed, we can continue the HPO task from the same state.
Quick question when you say the HPO Task, you mean the HPO controller logic Task...
Hi AverageBee39
Did you set up an agent to execute the actual Tasks?
that is because my own machine has 10.2 (not the docker, the machine the agent is on)
No, that has nothing to do with it, CUDA is inside the container. I'm referring to this image https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the docker, it points to CUDA version 10.2
Am I missing something ?
The upload itself is in the background.
It should not take long to prepare the plot for sending. Are you experiencing a major delay ?