So in my head, every time I publish a dataset, the trigger should fire and run that task.
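Something like this is what I have in mind (just a sketch; the task ID, queue, and project names are placeholders):

```
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_dataset_trigger(
    name='run-on-publish',
    schedule_task_id='<task-id-to-run>',  # placeholder: the task I want executed
    schedule_queue='default',
    trigger_project='datasets',
    trigger_on_publish=True,
)
trigger.start()
```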
I think it downloads from the curl command.
I'll look into those three. Do those files use the step 1, step 2, and step 3 files, though?
Because those spawned processes come from a file called register_dataset.py. I'm not personally using any file like that, so I think it comes from the library.
Are there any packages besides venv required on the agent? I'm not sure exactly which packages the agent needs, since the function itself wouldn't normally need venv; it just increments a number by 1.
Basically, since I want to train AI models, I'm trying to set up an architecture where I can automate the process from data fetching to model training, and I need a GPU for the training.
It works; however, it shows the task as enqueued and pending. Note that I am using .start() and not .start_remotely() for now.
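Concretely (sketch; the queue name is just an example):

```
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
# ... add_dataset_trigger(...) as above ...

# what I'm doing for now: the scheduler loop runs inside this local process
trigger.start()

# what I'm not using yet: hand the scheduler itself to an agent on the services queue
# trigger.start_remotely(queue='services')
```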
Set the host variable to the IP assigned to my laptop by the network.
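In case it matters, I'm assuming the relevant bit is the api section of clearml.conf; the IP below is just an example from my LAN:

```
api {
    web_server: http://192.168.1.42:8080
    api_server: http://192.168.1.42:8008
    files_server: http://192.168.1.42:8081
}
```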
How would the two be different, other than that I can pass a target directory for the mutable local copy?
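Just so we're comparing the same two calls (sketch; the ID and folder are placeholders):

```
from clearml import Dataset

ds = Dataset.get(dataset_id='<dataset-id>')

# read-only copy in the ClearML cache, path chosen by ClearML
cached_path = ds.get_local_copy()

# writable copy in a directory I choose
work_path = ds.get_mutable_local_copy(target_folder='/data/work_copy')
```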
Thank you, I'll take a look
Lastly, I have asked this question multiple times, but since the MLOps process is so new, I want to learn from others' experience regarding evaluation strategies. What would be a good evaluation strategy? Splitting each batch into train/test? That would mean less data for training, but we could test right away. Another idea I had was training on the current batch and then evaluating it on incoming batches. Any other ideas?
AgitatedDove14 I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?
Let me tell you what I think is happening and you can correct me where I'm going wrong.
Under certain conditions at certain times, a Dataset is published, and that activates a Dataset trigger. So if I publish one dataset every day, I activate a Dataset trigger once that day's dataset is published.
N publishes = N Triggers = N Anonymous Tasks, right?
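And by one "publish" I mean one of these (sketch; the names and path are placeholders):

```
from clearml import Dataset

ds = Dataset.create(dataset_name='daily-batch', dataset_project='datasets')
ds.add_files(path='/data/today')
ds.upload()
ds.finalize()
ds.publish()  # this publish is the event I expect to fire the trigger exactly once
```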
I'll read the three examples now. Am I right to assume that I should drop Pipeline_Controller.py?
Also, the task just prints a small string on the console.
Basically, I don't want the storage on the ClearML Server machine to fill up.
Thanks, I went through it and this seems easy
Let me try to be a bit more clear.
Say I have a training task in which I'm fetching multiple ClearML Datasets by their ClearML IDs. In that script I get local copies, train the model, save the model, and then delete the local copies.
Does ClearML keep track of which data versions were fetched and used from ClearML Data?
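The relevant part of the training script looks roughly like this (sketch; the IDs and names are placeholders):

```
import shutil
from clearml import Task, Dataset

task = Task.init(project_name='training', task_name='train-on-batches')

dataset_ids = ['<id-1>', '<id-2>']
local_paths = [Dataset.get(dataset_id=i).get_local_copy() for i in dataset_ids]

# ... load data from local_paths, train the model, save/upload it ...

# delete the local copies when done
for p in local_paths:
    shutil.rmtree(p, ignore_errors=True)
```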
I don't think I changed anything.
To be clearer, an example use case for me would be a pipeline which, every time a new dataset/batch is published using clearml-data, will:
Get the data
Train on it
Save the model and publish it
I want to start this process with a trigger when a dataset is published to the server. Is there an example I can look at for accomplishing something like this?
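To make it concrete, this is the kind of thing I'm picturing (a rough sketch I haven't verified end to end; the template task, project names, and queue are placeholders):

```
from clearml import Task
from clearml.automation import TriggerScheduler

def train_on_new_dataset(dataset_id):
    # clone a template training task, point it at the freshly published dataset, enqueue it
    template = Task.get_task(project_name='training', task_name='train-template')
    cloned = Task.clone(source_task=template, name='train on {}'.format(dataset_id))
    cloned.set_parameter('General/dataset_id', dataset_id)
    Task.enqueue(cloned, queue_name='default')

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_dataset_trigger(
    name='train-on-publish',
    schedule_function=train_on_new_dataset,  # receives the triggering dataset id
    trigger_project='datasets',
    trigger_on_publish=True,
)
trigger.start()
```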
I get the following error.
I'm not in the best position to answer these questions right now.
Anyway, in the docs, there is a function called task.register_artifact()
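i.e. something like this (sketch; the project, name, and DataFrame are made up):

```
import pandas as pd
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact-test')

df = pd.DataFrame({'epoch': [1, 2], 'loss': [0.9, 0.7]})
# register_artifact keeps the object synced to the server as it changes,
# rather than uploading a one-off snapshot
task.register_artifact(name='train_metrics', artifact=df)
```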
This is the original repo which I've slightly modified.
I recall being able to pass a script to the agent using the command line along with a requirements file.
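If I remember right it was something along these lines (the project, task name, and file names are placeholders):

```
clearml-task \
  --project my_project \
  --name my_experiment \
  --script train.py \
  --requirements requirements.txt \
  --queue default
```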