to get all the image metrics:client.events.get_task_metrics(tasks=['6adb929f66d14731bc76e3493ab89d80'], event_type='training_debug_image')
Hi HandsomeCrow5 .
Remember the debug images are events with links to the actual images, so you first have to get the events and then you can download the images with https://allegro.ai/docs/examples/examples_storagehelper/#storagemanager (which by definition has the credentials, because it was able to upload them 🙂
To get the events:from trains.backend_api.session.client import APIClient client = APIClient() client.events.debug_images(task='aabbcc')
That said, it might be different backend, I'll test with the demoserver
UnevenDolphin73 something like this one?
https://github.com/allegroai/clearml/pull/225
Hi PompousParrot44
What do you have in the Execution/"script path" ?
Ok, but it must be somewhere in the bst class
It is the XGboost callback feature, basically just reporting everything xgbosst reports:
None
How do I best utilize clearml in this scenario such that any coworker of mine is able to reproduce my work with the same pipeline?
Basically this sounds to me like proper software developemnt design (i.e. the class vs stages).
In order to make sure Anyone can reproduce it, you mean anyone can rerun the "pipeline" ? If this is the case just add Task.init (maybe use a specific Task type) and the agents will make sure this is Fully reproducible.
If you mean the data itself is stored, the...
Clearml automatically gets these reported metrics from TB, since you mentioned see the scalars , I assume huggingface reports to TB. Could you verify? Is there a quick code sample to reproduce?
- Could you explain how I can reproduce the missing jupyter notebook (i.e. the ipykernel_launcher.py)
I've seen that the file location of a task is saved
What do you mean by that? is it the execution section "entry point" ?
But my previous ques and other query are still not figured out.
What do you mean by "previous ques and other query" ?
owning the agent helps, but still it's much better if the credentials don't show up in logs,
They are not, they are always filtered out,
- how does
force_git_ssh_protocol
help please? it doesn't solve the issue of the agent simply not having accessIt automatically maps the host .ssh into the container, so that git can use SSH to clone.
What exactly is not working?
and how are you configuring it?
Hey SarcasticSparrow10 see here 🙂
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#upgrading
MoodyCentipede68 could it be that the model is on one account (workspace) and your credentials (the ones provided to the docker compose) are from another workspace?
The error itself point to the triton helper failing to get the model ID from the backend. The models are uploaded to a a specific workspace, and it looks like a mismatch (I.e. the model Id is nowhere to be found) wdyt?
Let me ping you back when the GitHub repo is synced, so you can test the latest and greatest :)
Glad to hear that! 🙂
Awesome ! thank you so much!
1.0.2 will be out in an hour
GreasyLeopard35
I can update that the fix to UniformIntegerParameterRange should be pushed with tomorrows release 🙂
(which would fix in turn LogUniformParameterRange)
Is there still an issue? Could it be the browser cannot access the file server directly?
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when it comes to multiple model serving. As well as the ability to quickly update models from the clearml model repository (just tag + publish and the end...
Ohh if this is the case, you might also consider using offline mode, so there is no need for backend
https://clear.ml/docs/latest/docs/guides/set_offline#setting-task-to-offline-mode
Sure :task = Task.init(..., auto_connect_arg_parser={'arg_not_to_log': False})
This will cause all argparse to automatically be logged (and later editable) with the exception of the argument arg_not_to_log
Notice that if you have --arg-something, to exclude it add to the dict arg_something': False
Hi UnsightlyShark53 I think you are absolutely right, there is no reason for the trains.errors.UsageError: ArgumentParser.parse_args() ...
Error.
As you mentioned, if auto_connect_arg_parser=False
is False, it should just ignore what it picked automatically.
I will make sure the error is resolved I will also make sure, you will still be able to connect the argparse manually with task.connect(parser)
after the Task has been created. Thanks for the reference! I took a look o...
seems like the server returned 400 error, verify that you are working with your trains-server and not the demoserver :)
So this is verry odd, it looks like a pip bug:
The agent is trying to install torch==2.1.0.*
because by default it ignores the 4th+ parts (they are unstable and torch have tendency to remove them) . and for some reason pip will not match 2.1.0.*
with for example "2.1.0.dev20230306+cu118"
but based on the docs it should work:
see here: None
As a workaround you can always edit and change to the final url for example: so ...
Do you want to open an issue in pip?
Funny enough this works in:
pip3 install "torch >=2.1.0.*, <2.1.1.*" --extra-index-url
ColossalAnt7 I would do the following:
Configure trains-server user/pass, mounting the API server configuration file as pointed in the trains-server documentation (intermediate temporary step) Start by providing the ML guys with a VPN access that allows them to access directly the trains-server api/web/file pos (caveat is the IP/sub-domain needs to be solved) Configure a ConfigMap to do the routing/ingest (this solves the IP/Sub-Domain issue) and allow the VPN to access the single entrypoint...
is there a way to increase the size of the text input for fields or a better way to handle lists?
No 😞
Maybe an easier way to use connect_configuration instead ? it will take an entire dict and store it as text (format is hocon, which is YAML/Json compatible, which means it is hard to break when editing)