Okay, now I'm lost. Is this reproducible? Are you saying a Dataset with remote links to S3 does not work?
Did you provide credentials for your S3 (in your clearml.conf)?
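For reference, a minimal sketch of the relevant clearml.conf section (the sdk.aws.s3 block; key/secret/region values are placeholders):

sdk {
    aws {
        s3 {
            # placeholder credentials - replace with your own
            key: "YOUR_ACCESS_KEY"
            secret: "YOUR_SECRET_KEY"
            region: "us-east-1"
        }
    }
}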
@<1523707653782507520:profile|MelancholyElk85>
What's the clearml version you are using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
Weird issue, I'll make sure we fix compatibility with python 3.9
Then it initiates a run on AWS, which I want to use the same task ID.
BoredPigeon26 Clone the Task; it basically creates a new copy (of the setup/configuration etc.).
Then you can launch it on an AWS instance (I'm assuming with clearml-agent)
wdyt?
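For illustration, a minimal sketch of that flow (the queue name "aws_queue" and the task id are placeholders):

from clearml import Task

# clone the template task - this creates a new draft copy of the setup/configuration
template = Task.get_task(task_id="<source_task_id>")
cloned = Task.clone(source_task=template, name="clone for aws run")

# enqueue the clone; a clearml-agent listening on the queue (e.g. on an aws instance) pulls and runs it
Task.enqueue(cloned, queue_name="aws_queue")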
But it writes over the execution tab in the GUI
It does, you are correct; it will, however, not overwrite the reports (logged scalars etc.)
Ohh, the copy-paste thing when you generate credentials?
IrateBee40 I think I have an idea what's wrong, https
Could it be there is some firewall in the middle intercepting the network, and without installing the SSL certificate the SSL connection is failing?
Are you saying you had that odd script entry-point created by calling Task.init? (To clarify this is the problem)
Btw after you clone the experiment you can always manually edit both entry point and working dir, which based on what you said should be "script.py" and "folder"
Okay let me check if we can reproduce, definitely not the way it is supposed to work 🙂
Nothing, except that "Draft" makes sense: it feels like the task is being prepped, whereas "Aborted" feels like something went wrong
Yes, I guess that if we call execute_remotely without a queue, it makes sense for you to edit it...
Is that the case TrickySheep9 ?
If it is, I think we should change it to Draft when it is not queued. Sounds good to you guys?
Hi @<1523704757024198656:profile|MysteriousWalrus11>
in the pipeline quickly between pipeline.add_step() functions?
You mean you want to get access to the parent Task ids and query them directly ?
I think the easiest way is to pass it as one of the parameters
(you can get to the pipeline Task itself from the running component, then get the DAG, but these are internal functions; maybe we should make them external for easier querying?)
pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    # illustrative parameter name; "${stage_data.id}" resolves to the parent step's task id
    parameter_override={"General/parent_task_id": "${stage_data.id}"},
    ...
Still, this issue inside a child thread was not detected as a failure and the training task resulted in "completed". This error happens now with the Task.init inside the if __name__ == "__main__": as seen above in the code snippet.
I'm not sure I follow; the error seems like an issue in your internal code. Does that mean clearml works as expected?
Thanks GrievingTurkey78, this is exactly what I was looking for!
Any chance you can open a GitHub issue (jsonargparse + Lightning support)?
I really want to make sure this issue is addressed 🙂
BTW: this is only if jsonargparse is installed:
https://github.com/PyTorchLightning/pytorch-lightning/blob/368ac1c62276dbeb9d8ec0458f98309bdf47ef41/pytorch_lightning/utilities/cli.py#L33
None
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password.
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
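For example, a minimal docker-compose sketch of the pattern (the service name is illustrative):

services:
  agent-services:
    environment:
      # fall back to the hard-coded defaults when the env vars are not set
      CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
      CLEARML_AGENT_GIT_PASS: ${CLEARML_AGENT_GIT_PASS:-my_git_pass_here}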
ReassuredTiger98 yes this is odd:
also: "Warning, could not locate PyTorch torch==1.12 matching CUDA version 115, best candidate 1.12.0.dev20220407"
Seems like it found a matching version and did not use it...
Let me check that
If this is a simple two-level nesting:
You can use the section name:
task.connect(param['data'], name='data')
task.connect(param['model'], name='model')
Would that help?
The comparison reflects the way the data is stored in the configuration context. That means section name & key value (which is what the code above does)
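For example, a minimal sketch (project/task names and parameter values are illustrative):

from clearml import Task

task = Task.init(project_name="examples", task_name="nested params")

param = {
    "data": {"batch_size": 32, "shuffle": True},
    "model": {"layers": 4, "dropout": 0.1},
}

# each nested dict becomes its own configuration section (section name & key value)
task.connect(param["data"], name="data")
task.connect(param["model"], name="model")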
"warm" as you do not need to sync it with the dataset, every time you access the dataset, clearml
will make sure it is there in the cache, when you switch to a new dataset the new dataset will be cached. make sense?
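For illustration, a minimal sketch of the cached access pattern (project/dataset names are placeholders):

from clearml import Dataset

# returns a local copy, downloading only what is not already in the cache
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = ds.get_local_copy()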
I would suggest deleting them immediately when they're no longer needed,
This is the idea for the next RC; it will delete them after it is done using them 🙂
Is this a bug, or an issue with clearml not working correctly with hydra?
It might be a bug?! Hydra is fully supported, i.e. logging the state and allowing you to change the Arguments from the UI.
Is this example working as expected ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
If you're referring to the run executed by the agent, it ends after this message because my script does not get the right args and so does not know what to...
So what you're saying is to first kick off a new run and then rename the underlying Pipeline Task, which will cause that particular run to become a new pipeline name?
Correct, basically you are not changing the "pipeline" per se but the execution name of the pipeline, if that makes sense
What would be most ideal would be to be able to right-click on a pipeline run and have a "clone" option, like you can with a task, where you can start a new run with a new name in a single step.
...
Hi @<1572395184505753600:profile|GleamingSeagull15>
Is there an official place to report bugs and add feature requests for the app.clear.ml website?
GitHub issues is usually the place, or the
Assuming GitHub, but just making sure you don't have another PM tool you'd rather use.
Really appreciate you asking! It is always hard to keep track 🙂
might be my folder permissions hmm
That actually makes sense, also notice that if you are running under a diff user, the ~ (home folder) is different
P.S. Any chance you can get me the NVIDIA driver version? I can't seem to find the one for v22 on Amazon
I suppose one way to perform this is with a
that kicks
Yes, that was my thinking.
It seems more efficient to support a triggered response to task fail.
Not sure I follow this one; I mean, the pipeline logic itself monitors the execution. If I'm not mistaken, a try/except will catch a step that fails, and a global one will catch the entire pipeline. Am I missing something?
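For illustration, a minimal sketch using the decorator-based pipeline API (the step logic is made up):

from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["result"])
def risky_step(x):
    # a step that may raise
    return x * 2

@PipelineDecorator.pipeline(name="monitored pipeline", project="examples", version="0.1")
def pipeline_logic():
    try:
        # try/except around a single step catches that step failing
        result = risky_step(21)
    except Exception:
        result = None
    return result

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline_logic()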
It will always set its own environment, either with static analysis or with "pip freeze" / "conda freeze"
It needs to log the exact setup that was actually installed.
When you later launch it on a remote machine, it can either use this to recreate the environment (using pip or conda), or you can clear the entire section, in which case it will fall back to "requirements.txt"
Any reason for specifically using the "environment.yaml" ?
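For example, a minimal sketch of forcing the exact environment log (assuming force_requirements_env_freeze is called before Task.init):

from clearml import Task

# log the full "pip freeze" of the current environment instead of static analysis
Task.force_requirements_env_freeze(force=True)

task = Task.init(project_name="examples", task_name="env freeze demo")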
BattyLion34
if I simply clone nntraining stage and run it in default queue - everything goes fine.
When you compare the Task you cloned manually and the Task created by the pipeline, what's the difference?
Thanks @<1657918706052763648:profile|SillyRobin38> this is still in the internal git repo (we usually do not develop directly on github)
I want to get familiar with it and, if possible, contribute to the project.
This is a good place to start: None
we are still debating whether to use it directly or as part of Triton ( None ), would love to get your feedback