I have a lot of parameters, about 40. It is inconvenient to overwrite them all from the window that is on the screen.
Not sure I follow, so what are you suggesting?
- I'm happy to hear you found a workaround
- Seems like there is something wrong with the way the pbtxt is being merged, but I need some more information
{'detail': "Error processing request: object of type 'NoneType' has no len()"}
Where are you seeing this error?
What are you seeing in the docker-compose log?
data["encoded_lengths"]
This makes no sense to me, data is a numpy array, not a pandas DataFrame...
Hi @<1597399925723762688:profile|IrritableStork32>
I think that if you have clearml installed and configured on your machine it should just work:
None
@<1569496075083976704:profile|SweetShells3> remove these from your pbtxt:
name: "conformer_encoder"
platform: "onnxruntime_onnx"
default_model_filename: "model.bin"
Second, what do you have in your preprocess_encoder.py ?
And where are you getting the error? (Is it from the Triton container, or from the REST request?)
Yeah I think that for some reason the merge of the pbtxt raw file is not working.
Any chance you have an end-to-end example we could debug? (maybe just add a pbtxt for one of the examples?)
Thanks @<1569496075083976704:profile|SweetShells3> for bumping it!
Let me check where it stands, I think I remember a fix...
Hi SubstantialElk6
Could you test with the latest RC6 ?
pip install clearml==0.17.5rc6
Can you run the entire thing on your own machine (just making sure it doesn't give this odd error) ?
SubstantialElk6 "Execution Tab" scroll down you should have "Installed Packages" section, what do you have there?
Nice SubstantialElk6 !
BTW: you can configure your clearml client to store the diff from the latest pushed commit (and not the default, which is the latest local commit)
see store_code_diff_from_remote in clearml.conf:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/docs/clearml.conf#L150
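For reference, this is roughly what that section of clearml.conf looks like (a sketch; I'm assuming the setting still lives under sdk.development):
# clearml.conf (sketch)
sdk {
    development {
        # when true, the stored uncommitted diff is computed against the latest
        # pushed remote commit instead of the latest local commit
        store_code_diff_from_remote: true
    }
}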
Hi @<1654294828365647872:profile|GorgeousShrimp11>
can you run a pipeline on a schedule, or are schedules only for Tasks?
I think one tiny detail got lost here: Pipelines (the logic driving them) are a type of Task, which means you can clone and enqueue them like any other Task
(Task.enqueue / Task.clone)
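Something along these lines (a sketch; the Task ID and queue name are placeholders):
from clearml import Task

# the pipeline controller is itself a Task, so it can be cloned and enqueued
pipeline_task = Task.get_task(task_id="<pipeline-controller-task-id>")  # placeholder ID
cloned = Task.clone(source_task=pipeline_task, name="scheduled pipeline run")
Task.enqueue(cloned, queue_name="services")  # placeholder queue name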
Other than that looks good to me, did I miss anything ?
@<1523707653782507520:profile|MelancholyElk85>
What's the clearml version you are using ?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
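If the base Task already ran, one way to get it back to "draft" (a sketch, assuming you don't mind wiping its previous run) is to reset it:
from clearml import Task

base_task = Task.get_task(task_id="<base-task-id>")  # placeholder ID
base_task.reset()  # clears the previous execution and returns the Task to "draft" mode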
do you know how I can save all the logs and all the metric images?
These are stored on the clearml-server, no? What am I missing?
Sounds good.
BTW, when the clearml-agent is set to use "conda" as package manager it will automatically install the correct cudatoolkit in any new venv it creates. The cudatoolkit version is picked directly when "developing" the code, assuming you have conda installed as the development environment (basically you can transparently do end-to-end conda, and not worry about CUDA at all)
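For completeness, the switch on the agent side looks roughly like this (a sketch of the agent section in clearml.conf):
# clearml.conf on the agent machine (sketch)
agent {
    package_manager {
        # "pip" is the default; with "conda" the agent also resolves the
        # matching cudatoolkit for every new environment it creates
        type: conda
    }
}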
Whoa, are you saying there's an autoscaler that doesn't use EC2 instances?...
Just to be clear the ClearML Autoscaler (aws) will spin instances up/down based on jobs in the queue it is listening to (the type of EC2 instances and configuration is fully configurable)
WittyOwl57 that is odd, there is a specific catch for SystemExit:
https://github.com/allegroai/clearml/blob/51d70efbffa87aa41b46c2024918bf4c584f29cf/clearml/backend_interface/task/repo/scriptinfo.py#L773
How do I reproduce this issue/warning ?
Also: "Repository and package analysis timed out (300.0 sec), giving up" seriously ove 5 minutes ?! how large is the git repo?
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)
Very cool!
BTW: you can also provide a function to create the entire Task, see base_task_factory argument in add_step
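A rough sketch of both options (the IDs, repo and script below are just illustrative):
from clearml import Task
from clearml.automation import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="1.0")

# option 1: clone an existing base Task and override specific fields per step
pipe.add_step(
    name="step_a",
    base_task_id="<base-task-id>",             # placeholder
    task_overrides={"script.branch": "main"},  # illustrative override
)

# option 2: build the step Task programmatically with a factory
def make_step_task(node):
    # the factory receives the pipeline node and returns a new Task object
    return Task.create(
        project_name="examples",
        task_name="step_b_{}".format(node.name),
        repo="https://github.com/allegroai/clearml.git",        # illustrative
        script="examples/frameworks/pytorch/pytorch_mnist.py",  # illustrative
    )

pipe.add_step(name="step_b", parents=["step_a"], base_task_factory=make_step_task)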
I think it's still an issue, not critical though, because we have another way to do it and it works
I could not reproduce it, I think the issue w...
CheerfulGorilla72 could it be the server address has changed when migrating ?
Hi StickyMonkey98
I'm (again) having trouble with the lack of documentation regarding Task.get_tasks(task_filter={STUFF}).
Yes we really have to add documentation there... Let me add that to the todo list
How do I filter tasks by time started? It seems there's a "started" property, and the web ui uses "started" as a key-word in the url query, but task_filter results in an error when I try that... Is there some other filter keyword for filtering by start-time??
last 10 started ...
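Not a documented answer, just the kind of filter I would try first (a sketch; I'm assuming the backend accepts "-started" in order_by, as the web UI query hints):
from clearml import Task

task_filter = {
    "order_by": ["-started"],  # newest started first (assumption)
    "page_size": 10,
    "page": 0,
}
tasks = Task.get_tasks(project_name="examples", task_filter=task_filter)  # placeholder project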
Hi MelancholyBeetle72 , that's a very interesting case. I can totally understand how storing a model and then immediately renaming it breaks the upload. A few questions: is there a way for pytorch lightning not to rename the model? Also I wonder if this scenario happens a lot (storing a model and then changing it). I think the best solution is for Trains to create a copy of the file and upload it in the background. That said, the name will still end with .part What do you think?
MelancholyBeetle72 thanks! I'll see if we could release an RC with a fix soon, for you to test :)
True, this is exactly the reason. That said, you can always manually add it. You can see the default values: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf
Or you can do:
param={'key': 123}
task.connect(param)
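For context, a minimal runnable sketch of that pattern (project and task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="connect demo")  # placeholder names
param = {"key": 123, "batch_size": 32}  # any dict of defaults works
task.connect(param)   # values show up as editable hyperparameters in the UI
print(param)          # when run by an agent, UI overrides are injected back into the dict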
Hi GreasyPenguin14
Sure you can, although a bit convoluted (I'll make sure we have a nice interface :) )
import hashlib
title = hashlib.md5('epoch_accuracy_title'.encode('utf-8')).hexdigest()
series = hashlib.md5('epoch_accuracy_series'.encode('utf-8')).hexdigest()
task_filter = {
    'page_size': 2,
    'page': 0,
    'order_by': ['last_metrics.{}.{}'.format(title, series)]
}
queried_tasks = Task.get_tasks(project_name='examples', task_filter=task_filter)
Hi VexedKangaroo32 , funny enough this is one of the fixes we will be releasing soon. There is a release scheduled for later this week, right after that I'll put here a link to an RC containing a fix to this exact issue.
Hi VexedKangaroo32 , there is now an RC with a fix:
pip install trains==0.13.4rc0
Let me know if it solved the problem
BTW, VexedKangaroo32 are you using torch launch ?
Could you expand on the use case of #18 ? How would you use it? What problem would it be solving?