PlainSquid19 I will look into it as well.
Maybe for some reason model.keras_model.save_weights is not being caught ...
FierceHamster54 Are you saying that inside the container it took 20 min to run, or that spinning up the GCP instance until it registered as an agent took 20 min?
Most of the time is taken by building wheels for numpy and pandas ...
BTW: this happens if there is a version mismatch and pip decides it needs to build numpy from source. Can you send the full logs of that? Maybe we can somehow avoid it.
Hi @<1523701323046850560:profile|OutrageousSheep60>
What do you mean by "in clearml server"? I do not see any reason a subprocess call from a Task would be an issue. What am I missing?
Hi UptightMouse31
First, thank you 🙂
And to your question:
variable in the project is the kpi,
You mean like adding it to the experiment table and getting a kind of leaderboard?
UnevenOstrich23
but interesting that auto-reload config is not working as I expected.
Unfortunately the trains-agent does not support auto-reloading the config file yet. If you think this would be a great feature, please feel free to open a GitHub feature request issue 🙂
MelancholyElk85 if you are manually adding models with OutputModel, then when you call update_weights(...) the upload will start in the background (if the process ends, it will wait until the upload is completed). You can also specify auto_delete_file, which will delete the local copy once the upload completes.
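A minimal sketch of that flow (the model file name and framework here are placeholders):

from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model upload")
output_model = OutputModel(task=task, framework="PyTorch")

# update_weights() starts the upload in the background;
# auto_delete_file=True removes the local copy once the upload completes
output_model.update_weights(weights_filename="model.pt", auto_delete_file=True)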
currently I'm doing it by fetching the latest dataset, incrementing the version and creating a new dataset version
This seems like a very good approach; how would you improve it?
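For reference, a minimal sketch of that approach with the Dataset API (the project/name values are placeholders):

from clearml import Dataset

# fetch the latest version of the dataset
latest = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# create the next version as a child of the latest one
new_version = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    parent_datasets=[latest.id],
)
new_version.add_files("./new_data")  # add the new/changed files
new_version.upload()
new_version.finalize()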
Okay, now I'm lost, is this reproducible? Are you saying a Dataset with remote links to S3 does not work?
Did you provide credentials for your S3 bucket (in your clearml.conf)?
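For reference, the S3 credentials usually live under the sdk.aws.s3 section of clearml.conf, roughly like this (the values are placeholders):

sdk {
    aws {
        s3 {
            key: "<access_key>"
            secret: "<secret_key>"
            region: "<region>"
        }
    }
}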
@<1523707653782507520:profile|MelancholyElk85>
What's the clearml version you are using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
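For example, a minimal sketch (the draft task id here is a placeholder):

from clearml.automation import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
# base_task_id must reference a Task in "draft" mode;
# the pipeline clones it for every run
pipe.add_step(name="stage_train", base_task_id="<draft_task_id>")
pipe.start()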
Weird issue, I'll make sure we fix compatibility with python 3.9
Then it initiates a run on AWS, which I want to use the same task ID.
BoredPigeon26 Clone the Task; it basically creates a new copy (of the setup/configuration, etc.).
Then you can launch it on an aws instance (I'm assuming with clearml-agent)
wdyt?
But it writes over the execution tab in the GUI
It does, you are correct; it will however not overwrite the reports (logged scalars etc.)
Ohh, the copy-paste thing when you generate credentials?
IrateBee40 I think I have an idea what's wrong, https
Could it be there is some firewall in the middle intercepting the network, and without installing an SSL certificate the SSL connection is failing?
Are you saying you had that odd script entry-point created by calling Task.init? (To clarify this is the problem)
BTW, after you clone the experiment you can always manually edit both the entry point and the working dir, which based on what you said should be "script.py" and "folder"
Okay, let me check if we can reproduce; definitely not the way it is supposed to work 🙂
Nothing, except that Draft makes sense: it feels like the task is being prepped, while Aborted feels like something went wrong
Yes, I guess that if we call execute_remotely without a queue, it makes sense for you to edit it...
Is that the case TrickySheep9 ?
If it is, I think we should change it to Draft when it is not queued. Sounds good to you guys?
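For context, a minimal sketch of that flow (the project/task names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="edit me remotely")
# With no queue specified, local execution stops and the task stays
# on the server, where it can be edited and enqueued manually
task.execute_remotely(queue_name=None, exit_process=True)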
Hi @<1523704757024198656:profile|MysteriousWalrus11>
in the pipeline quickly between pipeline.add_step() functions?
You mean you want to get access to the parent Task ids and query them directly ?
I think the easiest way is to pass it as one of the parameters
(you can get to the pipeline Task itself from the running component, then get the DAG, but these are internal functions; maybe we should make them external for easier querying?)
pipe.add_step(
name="stage_process",
...
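Building on that snippet, passing the parent's Task id into a following step could look roughly like this (General/parent_task_id is just a hypothetical parameter name, and the base task id is a placeholder):

pipe.add_step(
    name="stage_evaluate",
    parents=["stage_process"],
    base_task_id="<draft_task_id>",
    # "${stage_process.id}" resolves at runtime to that step's Task id
    parameter_override={"General/parent_task_id": "${stage_process.id}"},
)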
Still, this issue inside a child thread was not detected as a failure and the training task resulted in "completed". This error happens now with the Task.init inside the if __name__ == "__main__": as seen above in the code snippet.
I'm not sure I follow; the error seems like an issue in your internal code. Does that mean clearml works as expected?
Thanks GrievingTurkey78 , this is exactly what I was looking for!
Any chance you can open a GitHub issue (jsonargparse + lightning support)?
I really want to make sure this issue is addressed 🙂
BTW: this is only if jsonargparse is installed:
https://github.com/PyTorchLightning/pytorch-lightning/blob/368ac1c62276dbeb9d8ec0458f98309bdf47ef41/pytorch_lightning/utilities/cli.py#L33
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password.
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
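For example (the user/pass values are placeholders):

export CLEARML_AGENT_GIT_USER=my_git_user_here
export CLEARML_AGENT_GIT_PASS=my_git_pass_here
docker-compose -f docker-compose.yml up -d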
ReassuredTiger98 yes this is odd:
also: Warning, could not locate PyTorch torch==1.12 matching CUDA version 115, best candidate 1.12.0.dev20220407
Seems like it found a matching version and did not use it...
Let me check that
If this is a simple two level nesting:
You can use the section name:
task.connect(param['data'], name='data')
task.connect(param['model'], name='model')
Would that help?
The comparison reflects the way the data is stored in the configuration context, that is, section name & key value (which is what the code above does).
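A fuller sketch of that two-level connect (the param dict here is hypothetical):

from clearml import Task

task = Task.init(project_name="examples", task_name="nested config")

# hypothetical two-level configuration
param = {
    "data": {"batch_size": 32, "shuffle": True},
    "model": {"layers": 4, "dropout": 0.1},
}

# connect each nested dict under its own section name,
# so the UI shows data/batch_size, model/layers, etc.
task.connect(param["data"], name="data")
task.connect(param["model"], name="model")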
"warm" as you do not need to sync it with the dataset, every time you access the dataset, clearml
will make sure it is there in the cache, when you switch to a new dataset the new dataset will be cached. make sense?
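For instance, a minimal sketch (the dataset project/name are placeholders):

from clearml import Dataset

# get_local_copy() returns the cached folder; repeated calls reuse
# the cache instead of downloading the data again
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = ds.get_local_copy()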
I would suggest deleting them immediately when they're no longer needed,
This is the idea for the next RC; it will delete them once it is done using them 🙂
Is this a bug, or an issue with clearml not working correctly with hydra?
It might be a bug?! Hydra is fully supported, i.e. logging the state and allowing you to change the arguments from the UI.
Is this example working as expected ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
If you're referring to the run executed by the agent, it ends after this message because my script does not get the right args and so does not know what to...
So what youβre saying is to first kick off a new run and then rename the underlying Pipeline Task, which will cause that particular run to become a new pipeline name?
Correct, basically you are not changing the "pipeline" per se, but the execution name of the pipeline, if that makes sense.
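In code, that rename could look roughly like this (the project and task names are placeholders):

from clearml import Task

# fetch the underlying pipeline Task and rename it; the next run
# launched from it will appear under the new name
pipeline_task = Task.get_task(project_name="pipelines", task_name="old pipeline name")
pipeline_task.set_name("new pipeline name")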
What would be most ideal would be to be able to right-click on a pipeline run and have a βcloneβ option, like you can with a task, where you can start a new run with a new name in a single step.
...
Hi @<1572395184505753600:profile|GleamingSeagull15>
Is there an official place to report bugs and add feature requests for the app.clear.ml website?
GitHub issues is usually the place, or the
Assuming GitHub, but just making sure you don't have another PM tool you'd rather use.
Really appreciate you asking! It is always hard to keep track 🙂