Creating a dataset sounds like a good idea, but that does not seem to be the issue.
Can you verify that you can manually clone using the same link? (Notice the log should specify the exact clone command it is using, with the password replaced with *)
Nice guys! Notice that clearml-task can auto-add the Task.init call on the fly, so you can connect any arbitrary Task and control the argparser arguments (again, as parameters to the clearml-task)
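For reference, a minimal invocation would look something like this (the repo/script/queue names here are hypothetical, flag names as I remember them from clearml-task --help):
```bash
clearml-task --project examples --name remote-train \
    --repo https://github.com/user/repo.git --script train.py \
    --args batch_size=64 epochs=10 \
    --queue default
```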
BTW: A fix for the --task-type Issue will be pushed later today 😉
This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)
Because we are working with very big files, having them stored at multiple locations is something we try to avoid
Just so I better understand, is this for storing files as part of a dataset, or as debug samples ?
In other words, can two different processes create the exact same file (image)?
Hi SubstantialElk6
where exactly in the log do you see the credentials ?
/tmp/.clearml_agent.234234e24s.cfg
What's the exact setup? (I mean, are you using the glue? If that's the case, I think the temp config file is only created inside the pod/docker, so upon completion it will be deleted alongside the pod.)
Okay there is some odd stuff going on in the backend, I'll check with backend guys tomorrow and update 🙂
You can always log it manually:
```python
from clearml import InputModel

input_model = InputModel.import_model(weights_url='/tmp/keras_example/weight.6.hdf5')
```
ohh sorry, weights_url=path
Basically url can be the local path to the weights file 🙂
Hi HollowDolphin18
Sure, just use:
```python
Task.set_credentials(
    api_host=None,
    web_host=None,
    files_host=None,
    key=None,
    secret=None,
    store_conf_file=False
)
```
https://github.com/allegroai/clearml/blob/912f6f5ba2328b26de042de03f02de5802df360f/clearml/task.py#L2153
ReassuredTiger98 no, but I might be missing something.
How do you mean project-specific?
So you are saying 156 chunks, each chunk with roughly 6500 files?
The imports are inside the functions because the function itself becomes a stand-alone job running on a remote machine, not the entire pipeline code. This is also what lets clearml automatically pick the packages to be installed on the remote machine. Make sense?
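Something like this (a minimal sketch, the function and package here are just placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['processed'], cache=True)
def preprocess(data):
    # imports live inside the function: only this function is shipped to the
    # remote machine, and these imports are how the required packages are
    # detected for installation there
    import numpy as np
    return (np.array(data) * 2).tolist()
```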
Follow-up: any ideas how to avoid PEP 517 with the autoscaler?
Takes a long time to build the wheels
enable venv caching ?
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L116
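i.e. in the agent's clearml.conf, something along these lines (from memory of the default conf, double check against the link above):
```
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # uncomment the path to enable venv caching
        path: ~/.clearml/venvs-cache
    }
}
```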
git config --system credential.helper 'store --file /root/.git-credentials'
Maybe we should use this hack for cloning with user/token in general ...
I think you can watch it after GTC on the nvidia website, and a week after that we will be able to upload it to the youtube channel 🙂
Hi JealousParrot68
I'll try to shed some light on these modules and use cases.
StorageManager is, generally speaking, low-level access to http/object-storage/files utilities. In most cases there is no need to use it directly if the objects are already stored/managed by clearml (for example artifacts/models/datasets). But it is quite handy to use with your own S3 buckets etc.
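For example, something like (bucket/paths here are hypothetical):
```python
from clearml import StorageManager

# download (and locally cache) an object from your own bucket
local_path = StorageManager.get_local_copy(remote_url='s3://my-bucket/data/file.zip')

# upload a local file back to object storage
StorageManager.upload_file(local_file='/tmp/report.csv', remote_url='s3://my-bucket/reports/report.csv')
```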
Artifacts: Passing an artifact between Tasks will usually be something like:
```python
# 'artifact_name' is whatever name the artifact was uploaded under
artifact_object = Task.get_task('task_id').artifacts['artifact_name'].get()
```
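And the producing side is usually just (hypothetical names):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='producer')
task.upload_artifact('stats', artifact_object={'mean': 0.5})
```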
"Updates a few seconds ago"
That just means that the process is not dead.
Yes that seemed to be stuck 😞
Any chance you can verify with the RC version?
I'll try to dig into the commits, maybe I can come up with an explanation ...
BTW:
Just making sure, 74 was not supposed to be the last checkpoint (in other words it is not stuck on leaving the training process, but actually in the middle)
JitteryCoyote63 while it's running, could you give me a few details on the setup, maybe I can reproduce it.
Is it using pytorch distributed ?
Are all models uploaded to S3 ?
etc.
If you cannot change the "TrainerState" (i.e. inherit and pass it into the code)
you could also monkey-patch it, something like:
```python
from dataclasses import asdict

class OurTrainerState(TrainerState):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    @classmethod
    def load_from_json(cls, json_path: str):
        state = super().load_from_json(json_path)
        Task.current_task().upload_artifact(...)
        return state

# swap the trainer's state with the patched version
trainer.state = OurTrainerState(**asdict(trainer.state))
```
- Correct. Basically the order is: REST API body dictionary -> preprocess -> process -> postprocess -> REST API dictionary return
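If it helps, the preprocess/postprocess ends of that chain in a custom Preprocess class look roughly like this (a sketch from memory of the clearml-serving examples; exact method signatures vary between versions):
```python
from typing import Any

class Preprocess(object):
    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # REST API body dictionary -> model input
        return body.get('x', [])

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # model output -> REST API dictionary returned to the caller
        return {'y': data.tolist() if hasattr(data, 'tolist') else data}
```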
How can I reproduce it?
Could you try to clone the clearml git repo, create a new notebook in it and test ?
ThickDove42 Windows also works 😞
Any specifics on the setup?
Thanks StaleKangaroo85, bug is verified. Let me check where exactly the bug is.
Two points:
1. Notice that x_labels should be the same size as the histogram.
2. It seems you also have to pass the labels (otherwise you get "trace 0"), so if you add labels=['random histogram'] and labels=['random histogram2'], you'll get the correct legend.
Anyhow, I'll make sure we also fix it in the code so that labels default to [series] if not specified, thanks!
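i.e. roughly like this (a sketch using the parameters mentioned above; double check report_histogram's signature for your clearml version):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='histogram legend')
logger = task.get_logger()

logger.report_histogram(
    title='some histogram',
    series='random histogram',
    iteration=0,
    values=[1, 3, 2, 4],
    xlabels=['a', 'b', 'c', 'd'],    # one label per histogram entry
    labels=['random histogram'],     # legend entry, avoids the default "trace 0"
)
```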
using caching where specified but the pipeline page doesn't show anything at all.
What do you mean by "the pipeline page doesn't show anything at all"? Are you running the pipeline? How?
Notice PipelineDecorator.component needs to be top-level, not nested inside the pipeline logic, as in the original example:
```python
@PipelineDecorator.component(
    cache=True,
    name=f'append_string_{x}',
)
```
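i.e. something like this layout (a minimal sketch, names are hypothetical):
```python
from clearml.automation.controller import PipelineDecorator

# component defined at module (top) level, not inside the pipeline function
@PipelineDecorator.component(cache=True, name='append_string')
def append_string(x: str) -> str:
    return x + '_suffix'

@PipelineDecorator.pipeline(name='example pipeline', project='examples', version='1.0')
def pipeline_logic():
    # the pipeline logic only calls the component, it does not define it
    return append_string('hello')

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    pipeline_logic()
```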