That being said, it returns None for me when I reload a task, but it's probably something on my side.
MistakenDragonfly51 just making sure, you did call Task.init, correct ?
What does `from clearml import Task; task = Task.current_task()` return?
Notice that you need to create the Task before actually calling Logger.current_logger() or Task.current_task()
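A minimal sketch of that ordering (project/task names here are placeholders):

```python
from clearml import Task

print(Task.current_task())  # -> None, no Task exists in this process yet

task = Task.init(project_name="examples", task_name="demo")
print(Task.current_task() is task)  # -> True, the Task now exists
```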
I ended up using `task = Task.init(continue_last_task=task_id)` to reload a specific task and it seems to work well so far.
Exactly, this will initialize and auto-log the current process into the existing task (task_id). Without the `continue_last_task` argument it will just create a new Task and auto-log everything to it 🙂
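For reference, a hedged sketch of continuing a previous run (the project/task names and the task ID string are placeholders):

```python
from clearml import Task

# Reuse an existing task instead of creating a new one;
# the hex string stands in for the ID of the task to continue.
task = Task.init(
    project_name="examples",
    task_name="my_experiment",
    continue_last_task="d1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
)
```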
SubstantialElk6 feel free to tweet at them about their very inaccurate comparison table 🙂
PlainSquid19 Trains will analyze the entire repository if the code is inside a git repo, and a single script file if no repository is found.
It will not analyze an entire folder that is not a git repository, because it would not be able to recreate that folder anyhow. Make sense?
Hi SmallDeer34
Can you try with the latest RC? I think we fixed something with the jupyter/colab/vscode support:
pip install clearml==1.0.3rc1
instead of the one that I want or the one of the env which it is started from.
The default is the python that is used to run the agent. To override it:
agent.ignore_requested_python_version = true
agent.python_binary = /my/selected/python3.8
LovelyHamster1 Now I see... Interesting credentials ability. Specifically, all the S3 access on trains is derived from the ~/clearml.conf credentials section:
https://github.com/allegroai/clearml/blob/ebc0733357ac9ead044d0ed32d41447763f5797e/docs/clearml.conf#L73
( or the AWS S3 environment variables )
I'm not sure how this AWS feature works, I suspect it is changing the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables on the ec2 instance. If this is the case, it should work out of...
Hi MortifiedCrow63
saw , ...
By default ClearML will only log the exact local place where you stored the file, I assume this is it.
If you pass output_uri=True to Task.init, it will automatically upload the model to the files_server, and the model repository will then point to the files_server (you can also use any object storage as model storage, e.g. output_uri="s3://bucket").
Notice yo...
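A short sketch of that, assuming the default files_server as the upload target (project/task names are placeholders):

```python
from clearml import Task

# Upload model snapshots instead of only recording their local path.
# output_uri=True targets the files_server; an object-storage URI
# such as "s3://my-bucket/models" (placeholder) also works.
task = Task.init(
    project_name="examples",
    task_name="train_model",
    output_uri=True,
)
```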
So “wait” is a better metaphor for me
So I would do something like (I might have a few typos but that's the gist):
from clearml import Model

def post_execute_callback_example(a_pipeline, a_node):
    # type: (PipelineController, PipelineController.Node) -> None
    print('Completed Task id={}'.format(a_node.executed))
    # wait until the model is tagged, then pass it as an argument
    while True:
        found = Model.query_models(...)  # model filter here, including tag and project
        if found:
            ...
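For context, a hedged sketch of how such a callback could be wired into a controller (the step, project, and task names here are made up):

```python
from clearml.automation import PipelineController

pipe = PipelineController(name="example-pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train_task",
    post_execute_callback=post_execute_callback_example,  # runs after the step completes
)
pipe.start()
```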
FrothyShark37 any chance you can share a snippet to reproduce?
JuicyFox94
NICE!!! this is exactly what I had in mind.
BTW: you do not need to put the default values there; basically it reads the defaults from the package itself (trains-agent/trains) and uses the conf file as overrides, so this section can contain only the parts that are important (like cache location, credentials, etc.)
Hi PungentLouse55,
Yes, we have had Hydra integration on the todo list since it was first released; we actually know the guy behind Hydra, he is awesome!
What are your thoughts on the integration? We would love to get feedback and pointers (Hydra itself is quite capable, and we were waiting until we had multiple-configuration support; with v0.16 it was added, so now it is actually possible).
is there a way to visualize the pipeline such that this step is “stuck” in executing?
Yes there is: the pipeline plot (see the Plots section on the Pipeline Task) will show the current state of the pipeline.
But I have a feeling you have something else in mind?
Maybe add a Tag on the pipeline Task itself (then remove it when it continues)?
I'm assuming you need something that is quite prominent in the UI, so someone knows?
(BTW I would think of integrating it with the slack monitor, to p...
DeterminedToad86 I suspect that since it was executed on sagemaker it registered a specific package that is unique for Sagemaker (not to worry, installed packages can be edited after you clone/reset the Task)
LovelyHamster1
Also you can use pip freeze instead of the static code analysis; on your development machines set:
detect_with_pip_freeze: true
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/docs/clearml.conf#L169
Hi BoredHedgehog47
Just make sure it is installed as part of the "installed packages" 🙂
You should end up with something like git+
You can actually add it from your code:
Task.add_requirements("git+ ")
task = Task.init(...)
Notice you can also add a specific commit or branch: git+https://github.com/user/repo.git@<commit_id_here_if_needed>
Is this what you are looking for?
EDIT:
you can also do "-e ." , that should also work:
Task.add_requirements("-e .")
task = Ta...
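A minimal sketch of the full pattern, assuming a placeholder repo URL and branch:

```python
from clearml import Task

# Register a git-hosted dependency before Task.init so the agent
# installs it when the task runs remotely; URL/branch are placeholders.
Task.add_requirements("git+https://github.com/user/repo.git@main")
task = Task.init(project_name="examples", task_name="with_git_requirement")
```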
EnviousStarfish54 generally speaking the hyperparameters are flat key/value pairs. You can have as many sections as you like, but inside each section, key/value pairs. If you pass a nested dict, it will be stored as path/to/key:value (as you witnessed).
If you need to store a more complicated configuration dict (nesting, lists, etc.), use connect_configuration; it will convert your dict to text (in HOCON format) and store that.
In both cases you can edit the configuration and then when ru...
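A short sketch of the second option, with a made-up nested dict:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="nested_config")

# Flat key/value pairs go to hyperparameters; nested structures are
# better stored as a configuration object (serialized to HOCON text).
config = {"model": {"hidden_layers": [64, 32], "activation": "relu"}}
config = task.connect_configuration(config)  # returns the (possibly edited) dict
```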
. That speed depends on model sizes, right?
in general yes
Hope that makes sense. This would not work under heavy loads, but e.g. we have models used only once a week. They would just stay unloaded until use, and could be offloaded afterwards.
but then you still might encounter timeout the first time you access them, no?
It’s only on this specific local machine that we’re facing this truncated download.
Yes, that's what the log says; makes sense
Seems like this still doesn’t solve the problem, how can we verify this setting has been applied correctly?
hmm exec into the container? what did you put in clearml.conf?
Hi @<1671689437261598720:profile|FranticWhale40>
You mean the download just fails on the remote serving node because it takes too long to download the model?
(basically not a serving issue per-se but a download issue)
I came across it before but thought it's only relevant for credentials
We are working on improving the docs, hopefully it will get clearer 😉
the services queue (where the scaler runs) will be automatically exposed to new EC2 instance?
Yes, using this extra_clearml_conf parameter you can add configuration that will be passed to the clearml.conf of the instances it will spin up.
Now an example of the values you want to add:
agent.extra_docker_arguments: ["-e", "ENV=value"]
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
wdyt?
I lost you SmallBluewhale13, is this the Task.init call you used?
task = Task.init(
    project_name="examples",
    task_name="load_artifacts",
    output_uri="s3://company-clearml/artifacts/bethan/sales_journeys/",
)
Hi @<1547028074090991616:profile|ShaggySwan64>
I'm guessing just copying the data folder with rsync is not the most robust way to do that since there can be writes into mongodb etc.
Yep
Does anyone have experience with something like that?
basically you should just back up the 3 DBs (mongo, redis, elastic), each one based on its own backup workflow. Then just rsync the files server & configuration.
I think I was not able to fully express my point. Let me try again.
When you are running the pipeline Fully locally (both logic and components) the assumption is this is for debugging purposes.
This means that the code of each component is locally available, could that be a reason?
in this week I have met at least two people combining ClearML with other tools (one with Kedro and the other with luigi)
I would love to hear how/what is the use case 🙂
If I run the pipeline twice, changing only parameters or code of taskB, ...
I'll start at the end, yes you can clone a pipeline in the UI (or from code) and instruct it to reuse previous runs.
Let's take our A+B example, Let's say I have a pipeline P, and it executed A and then B (which relies on A's output...
Sure thing :)
BTW could you maybe PR this argument (marked out) so that we know for next time?
Is there a way to connect to the task without initiating a new one without overriding the execution?
You can, but not with the automagic; you can manually send metrics/logs...
Does that help? Or do we need the automagic?
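A sketch of the manual route, assuming an existing task and a placeholder task ID:

```python
from clearml import Task

# Attach to an existing task by ID and report metrics manually,
# without the automagic instrumentation; the ID is a placeholder.
task = Task.get_task(task_id="d1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6")
logger = task.get_logger()
logger.report_scalar(title="loss", series="train", value=0.05, iteration=10)
```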
PompousParrot44 , so you mean like a base conda env?
Configuring trains-agent to use conda is done here:
https://github.com/allegroai/trains-agent/blob/699d13bbb34649c7e5337b4187cda59b7fa6fd33/docs/trains.conf#L44
Then for every experiment trains-agent will create a new conda environment based on the requirements of that experiment.
You can tell it to inherit the base conda env (or the one it is running from, I think) by setting:
system_site_packages: true
https://github.com/allegroai/tr...