Hi IntriguedRat44
You can make it log offline (i.e. into a local folder/zip) by calling:
Task.set_offline(True)
You can also set the environment variable:
TRAINS_OFFLINE_MODE=1
You could also just skip the Task.init call 🙂
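For reference, a minimal sketch of the offline flow using the current clearml package (the old trains API was analogous; project/task names are placeholders):
```
from clearml import Task

# must be called before Task.init; everything is logged to a local zip
Task.set_offline(offline_mode=True)

task = Task.init(project_name="examples", task_name="offline run")
# ... your training code ...
task.close()

# later, from a machine that can reach the server, import the session zip
# (the zip path is printed when the offline task closes):
# Task.import_offline_session("/path/to/offline_session.zip")
```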
Does that help?
Hi @<1590514584836378624:profile|AmiableSeaturtle81>
I think you should use add_external_files , instead of add_files (which is for local files)
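For reference, a minimal sketch with the clearml Dataset API (project, name, and the s3:// URL are placeholders):
```
from clearml import Dataset

dataset = Dataset.create(dataset_project="examples", dataset_name="my-dataset")
# add_external_files registers remote links (e.g. s3://) instead of uploading local files
dataset.add_external_files(source_url="s3://my-bucket/data/")
dataset.upload()
dataset.finalize()
```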
Hi @<1523702786867335168:profile|AdventurousButterfly15>
I am running cross_validation, training a bunch of models in a loop like this:
Use the wildcard or disable altogether:
task = Task.init(..., auto_connect_frameworks={"joblib": False})
You can also do
task = Task.init(..., auto_connect_frameworks={"joblib": ["realmodelonly.pkl", ]})
Can you share the modified helm/yaml ?
Did you run any specific migration script after the upgrade ?
How many apiserver instances do you have ?
How did you configure the elastic container? is it booting?
ResponsiveCamel97
could you attach the full log?
Hi FancyWhale93 you can disable the auto model uploading with:
```
@PipelineDecorator.component(..., auto_connect_frameworks={'pytorch': False})
def step():
    pass
```
The latest image seems to require NVIDIA driver 460+ on the host.
try this one:
https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel_20-12.html#rel_20-12
I'm assuming you cannot directly access port 10022 (default ssh port on the remote machine) from your local machine, hence the connection issue. Could that be?
Is that normal or a possible bug?
This sounds like the xgboost internal format; it makes sense to me for it to be joblib (which is like pickle, only faster and safer)
Let me see if we can also add the model object to the callback...
the easiest way possible would be if I could just somehow run the task and let the LSF manage the environment
You mean let the LSF set the conda/venv ? or do you also mean to get the code-base, changes etc ?
Not yet 🙂
It should not be complex to implement.
The actual AWS auto-scaler class implements just two functions (a minimal sketch follows the links below):
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L104
def spin_down_worker(self, instance_id):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L...
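So a minimal subclass sketch for LSF could look like this (the class path matches the links above; the LSF bits are hypothetical placeholders, not a real API):
```
from clearml.automation.auto_scaler import AutoScaler

class LsfAutoScaler(AutoScaler):
    # hypothetical sketch: back ClearML queue workers with LSF jobs

    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        # here you would submit an LSF job (e.g. via `bsub`) that runs
        # `clearml-agent daemon --queue <queue_name>` on the allocated node
        print(f"spinning up {worker_id_prefix} for queue {queue_name} on {resource}")

    def spin_down_worker(self, instance_id):
        # here you would cancel the LSF job backing this worker (e.g. via `bkill`)
        print(f"spinning down worker {instance_id}")
```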
Hi ReassuredTiger98
I think it used to be the default and then it was removed. It has no real effect on performance but it removes all asserts ... what is your use case ? do you see any performance gains ?
LOL love that approach.
Basically, here is what I'm thinking:
```
from clearml import Task, InputModel, OutputModel

task = Task.init(...)

# run this part once
if task.running_locally():
    my_auxiliary_stuff = OutputModel()
    my_auxiliary_stuff.system_tags = ["DATA"]
    my_auxiliary_stuff.update_weights_package(weights_path="/path/to/additional/files")
    input_my_auxiliary = InputModel(model_id=my_auxiliary_stuff.id)
    task.connect(input_my_auxiliary, "my_auxiliary")

task.execute_remotely()
my_a...
```
Fully automatic: just have them defined, and Task.init and everything else will work out of the box.
Notice the Env will override clearml.conf, so you can have clearml.conf with other default values inside the container, and have the Env override the definition
(not to worry, it is Not a must to have clearml.conf , it's just a nice way to add default values)
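For example, a minimal sketch with the standard server/credential variables (values are placeholders):
```
import os

# these override any matching values in clearml.conf
os.environ["CLEARML_API_HOST"] = "https://api.clearml.example.com"
os.environ["CLEARML_WEB_HOST"] = "https://app.clearml.example.com"
os.environ["CLEARML_FILES_HOST"] = "https://files.clearml.example.com"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access-key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret-key>"

from clearml import Task

task = Task.init(project_name="examples", task_name="env configured run")
```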
(I suspect you are correct, but I'm missing some information in order to understand where the problem is)
WackyRabbit7 can you send mock code that explains how you create the pipeline ?
the storage configuration appears to have changed quite a bit.
Yes, I think this is part of the cloud-ready effort.
I think you can find the definitions here:
https://artifacthub.io/packages/helm/allegroai/clearml
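For reference, pulling and installing the chart usually looks like this (a sketch with standard helm commands; check the chart page above for the current repo URL and values):
```
helm repo add allegroai https://allegroai.github.io/clearml-helm-charts
helm repo update
helm install clearml allegroai/clearml
```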
JitteryCoyote63 oh dear, let me see if we can reproduce (version 1.4 is already in internal testing, I want to verify this was fixed)
I guess I just have to make sure that the total memory usage of all parallel processes is not higher than my GPU's memory.
Yep, unfortunately I'm not aware of any way to do that automatically 🙂
Hi @<1578555761724755968:profile|GrievingKoala83>
mount s3 as a cache folder
I'm not sure that would be fast enough for cache ...
How to override the /root/.cache/pip path?
in your clearml.conf file you can override the pip cache folder, then set it to your PV
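A sketch of the relevant clearml.conf section, assuming the agent.docker_pip_cache key (the host folder clearml-agent mounts into the container as /root/.cache/pip in docker mode; verify the key name against your agent version's reference config):
```
agent {
    # mounted into containers as /root/.cache/pip (docker mode)
    docker_pip_cache: /mnt/my-pv/pip-cache
}
```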
BTW: there is a full Pipeline class that does everything for you, example here:
https://github.com/allegroai/clearml/tree/master/examples/pipeline
btw: what's the OS and python version?
Oh I see, the pipeline controller itself (not the components) is the one with the repo.
To fix that, add the following at the top of the script:
```
from clearml import Task

Task.force_store_standalone_script()

@PipelineDecorator.pipeline(...)
```
That should do the trick.
FYI: these days TB has become the standard even for pytorch (as a standalone package); you can actually import it from torch.
There is an example here:
https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
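e.g. the torch-bundled writer (standard PyTorch API; scalars reported this way are picked up by the Task automatically):
```
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")
writer.add_scalar("loss", 0.5, global_step=1)
writer.close()
```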
HealthyStarfish45 did you manage to solve the report_image issue ?
BTW: you also have
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
https://github.com/allegroai/trains/blob/master/examples/reporting/...
Hi SubstantialElk6 I'll start at the end: you can run your code directly on the remote GPU machine 🙂
See the clearml-task documentation on how to create a task from existing code and launch it:
https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md
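A typical invocation looks something like this (project/queue names and repo URL are placeholders):
```
clearml-task --project examples --name remote-run \
    --repo https://github.com/user/repo.git --script train.py \
    --queue default
```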
That said, the idea is that you add the Task.init call when you are writing/coding the code itself, then later when you want to run it remotely you already have everything defined in the UI.
Make sense ?
named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
This is deprecated... it was a test to use a package that can update pip venvs, but it was never stable; we will remove it in the next version.
Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.componen...
okay let's PR this fix ?