Create a new version of the dataset by choosing what increment in the SemVer standard I would like to add for this version number (major/minor/patch) and upload
Oh, this is already there:
```python
cur_ds = Dataset.get(dataset_project="project", dataset_name="name")
# if a version is not given it will auto-increase based on semantic versioning,
# incrementing the last number: 1.2.3 -> 1.2.4
new_ds = Dataset.create(dataset_project="project", dataset_name="name", parents=[cur_ds.id])
```
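If you want to control which part gets bumped yourself, the increment logic is trivial to compute before creating the new version. A minimal sketch (`next_version` is a hypothetical helper, not part of the ClearML API):

```python
def next_version(version: str, bump: str = "patch") -> str:
    """Return the next semantic version string for a given bump type."""
    major, minor, patch = (int(p) for p in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # default: patch bump

print(next_version("1.2.3"))           # -> 1.2.4
print(next_version("1.2.3", "minor"))  # -> 1.3.0
print(next_version("1.2.3", "major"))  # -> 2.0.0
```

Depending on your clearml version, you may then be able to pass the computed string explicitly via the `dataset_version` argument of `Dataset.create` (check your SDK version's signature).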
but out of curiosity, what's the point of doing a hyperparameter search on the value of the loss at the last epoch of the experiment?
The problem is that you might end up with a global min that is really nice, but it was 3 epochs ago, and you only have the last checkpoint ...
BTW, the global min and the last min should not be very different if the model converges, wdyt?
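To make the point concrete, here is a tiny illustrative sketch (names are mine, not from any library) comparing the global minimum of a per-epoch loss curve with the loss at the last epoch, i.e. the only value the checkpoint actually reflects:

```python
def best_vs_last(losses):
    """Return (best_epoch, best_loss, last_loss) for a per-epoch loss list."""
    best_epoch = min(range(len(losses)), key=losses.__getitem__)
    return best_epoch, losses[best_epoch], losses[-1]

# the global min happened a few epochs ago, but only the last checkpoint exists
losses = [0.9, 0.5, 0.3, 0.28, 0.31, 0.33, 0.35]
best_epoch, best_loss, last_loss = best_vs_last(losses)
print(best_epoch, best_loss, last_loss)  # 3 0.28 0.35
```

If the model has truly converged, `best_loss` and `last_loss` end up close, which is the point made above.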
Hi RoughTiger69
Interesting question, maybe something like:
```python
@PipelineDecorator.component(...)
def process_sub_list(things_to_do=[0, 1, 2]):
    r = []
    for i in things_to_do:
        print("doing", i)
        r.append("done{}".format(i))
    return r

@PipelineDecorator.pipeline(...)
def pipeline():
    # create some stuff to do:
    results = []
    for step in range(10):
        r = process_sub_list(list(range(step * 10, (step + 1) * 10)))
        results.append(r)
    # push into one list with all results, this will ac...
```
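Stripped of the ClearML decorators, the fan-out/fan-in pattern sketched above is just plain Python (shown here purely for illustration; in the real pipeline each `process_sub_list` call would run as an independent step):

```python
def process_sub_list(things_to_do):
    # stand-in for the decorated component above
    return ["done{}".format(i) for i in things_to_do]

def pipeline():
    results = []
    for step in range(10):
        # each sub-list would become its own pipeline step
        results.append(process_sub_list(list(range(step * 10, (step + 1) * 10))))
    # flatten into one list with all results
    return [item for sub in results for item in sub]

all_results = pipeline()
print(len(all_results))  # 100
```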
Oh, I think I understand your point now.
Basically you can create the initial Task, and once it is in the system, clone it and adjust the parameters externally. A simple example here:
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/automation/manual_random_param_search_example.py#L41
wdyt?
hmm that is odd, it should have detected it. Can you verify the issue still exists with the latest RC?
`pip3 install clearml-agent==1.2.4rc3`
Hi UnevenDolphin73
This differentiable storage - does it only work on file additions/removal, or also on intra-file changes?
This is on a file level, meaning that if you change a single byte in a file, the entire file will be packaged in the new version.
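The file-level behavior can be illustrated with a content hash: flip a single bit and the file's content hash changes entirely, so the whole file has to be stored again in the new version. This is an `hashlib`-based sketch of the idea, not the actual ClearML implementation:

```python
import hashlib

def file_key(data: bytes) -> str:
    # file-level dedup: the unit of storage is the whole file's content hash
    return hashlib.sha256(data).hexdigest()

original = b"x" * 1024
modified = bytearray(original)
modified[0] ^= 1  # flip a single bit in one byte

# a one-byte change produces a completely different key,
# so the new version stores the entire file again
print(file_key(original) == file_key(bytes(modified)))  # False
```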
Make sense ?
CurvedHedgehog15 the agent has two modes of operation:
1. A single script file (or Jupyter notebook), where the Task stores the entire file on the Task itself.
2. Multiple files, which is only supported if you are working inside a git repository (basically the Task stores a reference to the git repository and the agent pulls it from the git repo).
Seems you are missing the git repo, could that be?
Hmm, I suspect `set_initial_iteration` does not change/store the state on the Task, so when it is launched, the value is not overwritten. Could you maybe open a GitHub issue on it?
Is Task.current_task() creating a task?
Hmm it should not, it should return a Task instance if one was already created.
That said, I remember there was a bug (not sure if it was in a released version or an RC) that caused it to create a new Task if there isn't an existing one. Could that be the case ?
GiganticTurtle0, in the `PipelineDecorator.component`, did you pass `helper_functions=[]` with references to all the sub-components?
You mean the job with the exact same arguments ?
do you have other arguments you are passing ?
Are you using Optuna / BOHB?
can i run it on an agent that doesn't have gpu?
Sure this is fully supported
when I run clearml-serving it throws me an error "please provide specific config.pbtxt definition"
Yes, this is a small file that tells the Triton server how to load the model:
Here is an example:
https://github.com/triton-inference-server/server/blob/main/docs/examples/model_repository/inception_graphdef/config.pbtxt
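For reference, the linked example is roughly of this shape (the model name, platform, tensor names, and dims below are from the Triton inception example and are model-specific; verify them against your own model):

```
platform: "tensorflow_graphdef"
max_batch_size: 128
input [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 299, 299, 3 ]
  }
]
output [
  {
    name: "InceptionV3/Predictions/Softmax"
    data_type: TYPE_FP32
    dims: [ 1001 ]
  }
]
```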
What do you already have working from the above steps ? and which parts are missing or we can think of automating ?
Yes this is Triton failing to load the actual model file
And command is a list instead of a single str
"command list", you mean the `command` argument?
But functionality is working
Awesome , I will wait with the merge until tested internally .
There is a release coming out after the weekend; once it is out I expect we will merge it.
WittyOwl57 I can verify the issue reproduces! 🎉 !
And I know what happens: TQDM is sending an "up arrow" control sequence. If you are running inside bash, that behaves like a CR (i.e. moves the cursor to the beginning of the line), but when running inside other terminals (like PyCharm or the ClearML log) this "arrow-key" is just a unicode character to print, it does nothing, and we end up with multiple lines.
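A toy "terminal" makes the difference visible: a viewer that honors `\r` overwrites the progress line in place, while a viewer that prints control characters verbatim accumulates one line per update. This is an illustrative sketch (both renderers are made up for the demo, not real terminal code):

```python
def render_with_cr(chunks):
    """Simulate a terminal that honors carriage returns: '\r' overwrites the line."""
    line = ""
    for chunk in chunks:
        if chunk.startswith("\r"):
            line = chunk[1:]  # rewind to start of line and overwrite
        else:
            line += chunk
    return [line]

def render_verbatim(chunks):
    """Simulate a log viewer that does not interpret control characters:
    every chunk ends up as its own line."""
    return [chunk.lstrip("\r") for chunk in chunks]

progress = ["10%", "\r20%", "\r30%"]
print(render_with_cr(progress))   # ['30%']                 - one updated line
print(render_verbatim(progress))  # ['10%', '20%', '30%']   - multiple lines
```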
Let me see if we can fix it 🙂
Hi TightElk12
it would raise an error if the env where execution happens is not configured to track things on our custom server to prevent logging to the public demo server ?
What do you mean by that? Reaching the default server instead of the configured one?
ohh AbruptHedgehog21, if this is the case, why don't you store the model with `torch.jit.save` and use Triton to run the model?
See example:
https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch
(BTW: if you want a full custom model serve, in this case you would need to add torch to the list of python packages)
Feel free to open an issue on GitHub making sure this is not forgotten
So this is why 🙂
an agent can only run one Task at a time.
The HPO (being a Task on its own) should run on the "services" queue, where the agent can run multiple "cpu controller" Tasks like the HPO.
Make sense ?
Hi PanickyMoth78
`PipelineDecorator.set_default_execution_queue('default')`
would close the current process and launch the pipeline logic on the "services" queue, which means the local process is terminated (specifically, in your case, the notebook kernel). Does that make sense?
If you want the pipeline logic to stay on the local machine you can say:
`@PipelineDecorator.pipeline(..., pipeline_execution_queue=None)`
In your code, can you print the following:
```python
import os
print(os.environ.keys())
```
There should be a few keys the PyCharm plugin is sending from the local machine, pointing to the git repo.
Yes, because when a container is executed, the agent creates a new venv and inherits from the system wide installed packages, but it cannot inherit or "understand" there is an existing venv, and where it is.
Is this consistent on the same file? can you provide a code snippet to reproduce (or understand the flow) ?
Could it be two machines are accessing the same cache folder ?
Thanks for the logs @<1627478122452488192:profile|AdorableDeer85>
Notice that the log you attached means the preprocessing is executed and the GPU backend is returning an error.
Could you provide the log of the docker compose? Specifically, the interesting part is the Triton container; I want to verify it loads the model properly.