Also, the steps say I should run the serving process on the default queue, but I've run it on a serving queue I created, with an agent listening to it.
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Is there a difference? My use case is pretty simple: I have a training job that creates a lot of checkpoints. I just want to keep the N best checkpoints, and whenever there are more than N, delete the worst-performing one, both locally and from the task artifacts.
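The pruning logic described above is simple to express outside ClearML. Here's a minimal sketch: it assumes you track each checkpoint path against a validation metric where higher is better (the file names and scores are made up for illustration); deleting the returned paths locally and from the task artifacts would then be separate calls.

```python
import heapq

def prune_checkpoints(checkpoints, n_best):
    """Given a dict mapping checkpoint path -> metric (higher is better),
    return the paths to delete so only the n_best remain."""
    if len(checkpoints) <= n_best:
        return []
    # nlargest keeps the n_best highest-scoring checkpoints
    keep = set(heapq.nlargest(n_best, checkpoints, key=checkpoints.get))
    return [path for path in checkpoints if path not in keep]

scores = {"ckpt_1.pt": 0.71, "ckpt_2.pt": 0.84, "ckpt_3.pt": 0.79, "ckpt_4.pt": 0.88}
to_delete = prune_checkpoints(scores, n_best=2)
# ckpt_2.pt (0.84) and ckpt_4.pt (0.88) are kept; the other two are pruned
```

If the metric is a loss (lower is better), swap `nlargest` for `nsmallest`.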
Since that is an immediate concern for me as well.
Also, my run just completed, and so far I can only see the hyperparameters as a report, not in an editable form. I've just started with ClearML and am running into these issues.
My main question is: do I wait until there's a sufficient batch size, or do I send each image for training as soon as it arrives?
That, but also in the proper directory on the file system.
So the minimum would be 2 cores with 8 GB of RAM. I'm going to assume 4 cores and 16 GB would be recommended.
Basically, if I pass an arg with a default value of False (a bool), it runs fine originally, since it just accepts the default value.
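For context on why a bool arg can behave differently once a value is actually passed back in (for example, when hyperparameters are re-injected as strings on a remote run), here's a small argparse sketch; the `--use-gpu` name is just an illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# A bool-typed argument with a False default: fine when the default is used...
parser.add_argument("--use-gpu", type=bool, default=False)

# No value passed: the default False is taken as-is.
print(parser.parse_args([]).use_gpu)
# ...but any non-empty string from the command line is truthy,
# so even the string "False" flips the flag to True:
print(parser.parse_args(["--use-gpu", "False"]).use_gpu)
```

A common workaround is `action="store_true"` (or a custom string-to-bool converter) instead of `type=bool`.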
That's weird: it doesn't work on my main Ubuntu installation but does work in an Ubuntu VM I created on Windows.
It's a simple DAG pipeline.
I have a step at which I want to run a task that finds the model I need.
That makes sense. But doesn't that also hold true for dataset.get_local_mutable_copy()?
So it won't work without clearml-agent? Sorry for the barrage of questions. I'm just very confused right now.
It's basically data for binary image classification, simple.
I recall being able to pass a script to the agent using the command line along with a requirements file.
I get the following error:
CostlyOstrich36, this didn't work: the value is -1, but the pipe didn't stop.
Sorry for the late response. Agreed, that can work, although I would prefer a way to access the data by the M most recently added batches rather than a fixed range, since those cases aren't interchangeable. Also, a simple approach would be to create an empty Dataset at the start and then make it the parent of every dataset you add.
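To make the "empty root dataset as the parent of everything" idea concrete, here's a toy model of the pattern outside ClearML (the class and names are purely illustrative, not the ClearML API): each new batch version points at the previous one, so walking the parent chain from the newest version yields the batches in reverse order of addition, and the M most recent batches are just the first M links.

```python
class FakeDataset:
    """Minimal stand-in for a versioned dataset: each version records its parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

def ancestry(ds):
    """Walk the parent chain from newest to oldest."""
    chain = []
    while ds is not None:
        chain.append(ds.name)
        ds = ds.parent
    return chain

def last_m_batches(ds, m):
    # The M most recently added batches are the first M links in the chain.
    return ancestry(ds)[:m]

root = FakeDataset("empty-root")      # the empty dataset created up front
head = root
for i in range(1, 5):                 # four incremental batches
    head = FakeDataset(f"batch-{i}", parent=head)

print(last_m_batches(head, 2))
```

The empty root guarantees every chain terminates at a known anchor, so "all data so far" is always one ancestry walk away.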
So the API is new to me; I've already seen the SDK. Am I misremembering being able to send a Python script and a requirements file to run on an agent directly from the CLI? Was there no such way?
After the step that gets the merged dataset, should I call pipe.stop() if it returned -1?
While I wrap my head around that: in terms of the explanation above, can you tell me what the serving example in the repo corresponds to, and where the Triton serving engine fits in that same picture?
When I try to get a local copy, I get this error:
File "load_model.py", line 8, in <module>
location = input_model.get_local_copy()
File "/home/fawad-nizamani/anaconda3/envs/ocr-sip-clearml/lib/python3.8/site-packages/clearml/model.py", line 424, in get_local_copy
return self.get_weights_package(return_path=True, raise_on_error=raise_on_error)
File "/home/fawad-nizamani/anaconda3/envs/ocr-sip-clearml/lib/python3.8/site-packages/clearml/model.py", line 318, in get_weights_package
...
Also, could you explain the difference between trigger.start() and trigger.start_remotely()?
Basically, I'm saving a model on the client machine and publishing it, then trying to download it from the server.
dataset = Dataset.create(data_name, project_name)
print('Dataset Created, Adding Files...')
dataset.add_files(data_dir)
print('Files added successfully, Uploading Files...')
dataset.upload(output_url=upload_dir, show_progress=True)
Should I not run the scheduler remotely if I'm monitoring a local folder?
On both the main Ubuntu install and the VM, I simply installed it in a conda environment using pip.
Were you able to reproduce it, CostlyOstrich36?