CostlyOstrich36 This didn't work; the value is -1, however the pipeline didn't stop.
since I've added either add_function_step or add_step
I want to serve using Nvidia Triton for now.
After creating it, I tried adding batching to it and got this error.
Tagging AgitatedDove14 SuccessfulKoala55, for anyone available right now to help out.
Thanks for the help.
Also, do I have to manually keep track of dataset versions in a separate database, or does ClearML provide that as well?
Basically, as soon as I get the trigger that a new dataset has been published, I want to pass the dataset id to the script as a CLI argument and pass the code to the agent.
I'll try to see how to use the SDK method you just shared.
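Something like this is what I have in mind (just a sketch; project, script, and queue names are placeholders, and I'm assuming TriggerScheduler's add_dataset_trigger calls the function with the new dataset's id):
```python
from clearml import Task
from clearml.automation import TriggerScheduler

def launch_on_new_dataset(dataset_id):
    # Create a draft task from the local script and pass the dataset id as a CLI argument
    task = Task.create(
        project_name="Pipelines",
        task_name="preprocess-new-data",
        script="preprocess.py",
        argparse_args=[("dataset_id", dataset_id)],
    )
    # Hand the task to whichever agent is listening on this queue
    Task.enqueue(task, queue_name="default")

trigger = TriggerScheduler(pooling_frequency_minutes=3.0)
trigger.add_dataset_trigger(
    name="on-new-dataset",
    schedule_function=launch_on_new_dataset,  # called with the triggering dataset's id
    trigger_project="Raw Datasets",
    trigger_on_publish=True,
)
trigger.start()  # blocks and keeps polling for newly published datasets
```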
I was looking to see if I can just get away with using get_local_copy instead of the mutable one (get_mutable_local_copy), but I guess that is unavoidable.
For now, installing venv fixes the problem.
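For reference, a minimal sketch of the two calls I mean (the dataset id and target folder are placeholders):
```python
from clearml import Dataset

ds = Dataset.get(dataset_id="<dataset-id>")

# Read-only copy in the shared local cache; files must not be modified in place
read_only_path = ds.get_local_copy()

# Writable copy in a folder I control, for when files do need to change
writable_path = ds.get_mutable_local_copy(target_folder="./data", overwrite=True)
```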
I already have the dataset id as a hyperparameter. I get said dataset. I'm only handling one dataset right now but merging multiple ones is a simple task as well.
Also, I'm not very experienced and am unsure what the proposed querying is, and how (and whether) it works in ClearML here.
Is there any way to make it automatically install the packages it finds that it requires? Or do I have to explicitly pass them in packages?
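From what I've read so far, Task.init analyzes the script's imports and records them, so the agent installs them automatically; anything the analysis misses can be force-added before Task.init. A sketch (package names and versions are just examples):
```python
from clearml import Task

# Force-add packages the automatic import analysis might miss;
# must be called before Task.init()
Task.add_requirements("pandas")
Task.add_requirements("torch", "2.1.0")  # optionally pin a version

task = Task.init(project_name="examples", task_name="auto-packages")
```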
Just to be absolutely clear.
An agent with a GPU on Machine A is listening to Queue X.
A task is enqueued onto Queue X from Machine B, which has no GPU.
The task runs on Machine A and the experiment gets published to the server?
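If it helps, this is the sort of thing I mean on Machine B (the queue and names are placeholders):
```python
from clearml import Task

# Runs on Machine B (no GPU): registers the experiment with the server
task = Task.init(project_name="examples", task_name="train-on-gpu")

# Stop executing locally and enqueue this task onto Queue X;
# the agent on Machine A (with the GPU) picks it up and runs it there
task.execute_remotely(queue_name="X", exit_process=True)

# ... everything below this line only runs on Machine A ...
```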
My current approach is: watch a folder; when there are sufficient data points, move N of them into another folder, create a raw dataset from them, and call the pipeline with this dataset.
It gets downloaded, preprocessed, and then uploaded again.
In the final step, the preprocessed dataset is downloaded and is used to train the model.
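Roughly, the first two steps look like this (folder paths and names are placeholders):
```python
from clearml import Dataset

# Step 1: register the N raw files that were moved into the staging folder
raw = Dataset.create(dataset_project="datasets", dataset_name="raw-batch")
raw.add_files(path="/data/staging")
raw.upload()
raw.finalize()

# Step 2 (preprocessing step of the pipeline): download, transform, re-upload
source = Dataset.get(dataset_id=raw.id)
local_path = source.get_local_copy()
# ... preprocess the files under local_path into /data/preprocessed ...
processed = Dataset.create(dataset_project="datasets", dataset_name="preprocessed-batch")
processed.add_files(path="/data/preprocessed")
processed.upload()
processed.finalize()
```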
Also, since I plan to train on only a subset of the data rather than the whole dataset, I was thinking of making each batch of data a new dataset and then just merging the subset of data I want to train on.
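My thinking there, as a sketch, is that merging would just mean creating a child dataset from the selected batch datasets (the ids here are placeholders):
```python
from clearml import Dataset

# Ids of the per-batch datasets selected for this training run
batch_ids = ["<batch-3-id>", "<batch-7-id>", "<batch-9-id>"]

# Merge the selected batches into a single child dataset
merged = Dataset.create(
    dataset_project="datasets",
    dataset_name="training-subset",
    parent_datasets=batch_ids,
)
merged.upload()    # harmless here (no new files), keeps the create -> upload -> finalize flow
merged.finalize()

# Materialize all files from all parents locally for training
local_path = merged.get_local_copy()
```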
Found it.
https://clear.ml/docs/latest/docs/guides/clearml-task/clearml_task_tutorial/
The second example here, executing a local script. I think that was it. Thank you for the help.
Wait, is it possible to do what I'm doing but with just one big Dataset object or something?
Ok, since it's my first time working with pipelines, I wanted to ask: does the pipeline controller run endlessly, or does it run from start to end, with me telling it when to start based on a trigger?
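For context, this is roughly how I'm building it; my understanding is that each start() runs the DAG once from start to finish, so the trigger would re-launch it per dataset (step bodies and names here are placeholders):
```python
from clearml import PipelineController

def preprocess(dataset_id):
    # placeholder step body; each step runs as its own task on an agent
    return dataset_id

def train(dataset_id):
    return "model"

pipe = PipelineController(name="data-pipeline", project="examples", version="1.0.0")
pipe.add_parameter(name="dataset_id", default="")
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    function_kwargs={"dataset_id": "${pipeline.dataset_id}"},
    function_return=["dataset_id"],
)
pipe.add_function_step(
    name="train",
    function=train,
    function_kwargs={"dataset_id": "${preprocess.dataset_id}"},
    parents=["preprocess"],
)

# A single start() runs the DAG once, start to finish, then the controller task
# completes; it gets re-launched (e.g. from a trigger) for every new dataset.
pipe.start(queue="services")
```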
I understand your problem. I think you can normally specify where you want the data to be stored in a conf file somewhere; people here can better guide you. However, in my experience, it kind of uploads the data and stores it in its own format.
Like, there are files in a specific folder on Machine A. A script on Machine A creates a Dataset, adds the files located in that folder, and publishes it. Now, can you look at that dataset on the server machine? Not from the ClearML interface, but inside normal directories, like /opt/clearml etc.; the directory mentioned is just an example.
I understand that storing data outside ClearML won't ensure its immutability. I guess this could be built into ClearML as a feature at some future point.
How about, instead of uploading the entire dataset to the ClearML server, uploading a text file with the location of the dataset on the machine? I would think that should do the trick.
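A sketch of that workaround (paths and names are hypothetical); I've also seen that Dataset.add_external_files can register links without uploading the content, which might be the cleaner route:
```python
from clearml import Dataset

# Write a small pointer file with the on-disk location of the real data
pointer_file = "/tmp/dataset_location.txt"
with open(pointer_file, "w") as f:
    f.write("/mnt/shared/datasets/batch_42\n")

# Version only the pointer, not the data itself
ds = Dataset.create(dataset_project="datasets", dataset_name="batch-42-pointer")
ds.add_files(pointer_file)
ds.upload()
ds.finalize()
```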
And multiple agents can listen to the same queue, right?
I recall being able to pass a script to the agent using the command line along with a requirements file.
I know how to enqueue using the UI; I'm trying to do it programmatically.
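For the programmatic route I mean something like cloning a template experiment and enqueuing the clone (names and queue are placeholders); the clearml-task CLI from that tutorial covers the script + requirements file route.
```python
from clearml import Task

# Fetch an existing (template) experiment, clone it, and push the clone to a queue
template = Task.get_task(project_name="examples", task_name="train-on-gpu")
cloned = Task.clone(source_task=template, name="train-on-gpu (auto)")
Task.enqueue(task=cloned, queue_name="X")
```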