I have installed a Python environment with the virtualenv tool, let's say
/home/frank/env
and the python binary is
/home/frank/env/bin/python3.
How can I reuse this virtualenv by configuring the clearml agent?
So the agent is already caching the entire venv for you, nothing to worry about. Just make sure you have this line enabled in your clearml.conf:
https://github.com/allegroai/clearml-agent/blob/249b51a31bee97d63f41c6d5542e657962008b68/docs/clearml.conf#L131
No need to provide it an existing...
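For reference, this is roughly what the venv-cache section of clearml.conf looks like (a sketch based on the default template; exact defaults may differ between versions):
```
agent {
    # cache virtual environments so they can be reused across tasks
    venvs_cache: {
        max_entries: 10
        # minimum free space (GB) before old cache entries are evicted
        free_space_threshold_gb: 2.0
        # uncommenting the path is what enables the venv cache
        path: ~/.clearml/venvs-cache
    }
}
```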
MoodyCentipede68 from your log
clearml-serving-triton | E0620 03:08:27.822945 41 model_repository_manager.cc:1234] failed to load 'test_model_lstm2' version 1: Invalid argument: unexpected inference output 'dense', allowed outputs are: time_distributed
This seems to be the main issue behind Triton failing to load the model.
Does that make sense to you? How did you configure the endpoint model?
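To illustrate, the output name registered on the endpoint has to match the model's actual output layer (here "time_distributed", not "dense"). A rough sketch using the clearml-serving CLI; the input/output names, sizes, and types below are placeholders, not taken from your model:
```
clearml-serving --id <service_id> model add \
    --engine triton \
    --endpoint "test_model_lstm2" \
    --input-name "input" --input-type float32 --input-size 1 128 \
    --output-name "time_distributed" --output-type float32 --output-size -1 10
```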
The new version will contain much more advanced search (including all the task fields)
Are there any more fields in this function with partial matching? For example, project? tags?
Yes they can all be filtered (basically everything you see in the UI)
Notice: tags are strings (you can provide a list of tags), and project is the ID of the project
(Use Task.get_project_id, I think)
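As a rough sketch (the project name and tags are illustrative; check the exact signatures in your clearml version):
```
from clearml import Task

# resolve the project name to its ID (per the note above)
project_id = Task.get_project_id(project_name="my_project")

# fetch tasks filtered by tags (tags are plain strings);
# task_filter is passed to the backend query, here filtering on the project ID
tasks = Task.get_tasks(
    tags=["trained", "production"],
    task_filter={"project": [project_id]},
)
for t in tasks:
    print(t.id, t.name)
```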
Hi SmarmyDolphin68
I see this in between my training epochs, what could be causing this?
This is basically saying we are saving a second model on the same Task, and even though both are logged, only the last one is stored on the Task itself.
This will change as in the next version a Task will be able to hold reference to multiple models in the artifactory 🙂
Thanks OutrageousGrasshopper93,
I will test it with "!".
By the way, is the "!" in the project name or the Task name?
IntriguedRat44, how do I reproduce it?
Can you confirm that commenting out the Task.init(...) call fixes it?
- Be able to trigger the “pure” function (e.g. train()) locally, without any clearml code running, while driving it from a configuration, e.g. a path to the data.

When you say "without any clearml code", do you mean without the agent, or without using clearml.Dataset?
- Be able to trigger the “decorator” (e.g. train_clearml()) while driving it from configuration, e.g. dataset_id
Hmm I can think of:
```
def train_clearml(local_folder=None, dataset_id=None):
    ...
```
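A possible body, assuming the clearml Dataset API (Dataset.get and get_local_copy are real calls; the overall wiring is just a sketch):
```
from clearml import Dataset

def train_clearml(local_folder=None, dataset_id=None):
    if dataset_id:
        # fetch (and cache) a local copy of the dataset
        local_folder = Dataset.get(dataset_id=dataset_id).get_local_copy()
    # hand off to the "pure" train() function from the discussion above
    return train(local_folder)
```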
Hi SarcasticSparrow10
The plots in the UI allow you to control the colors of the graphs interactively (click on the color in the legend), and they also allow you to toggle the legend on/off. This is on purpose, so you can later adjust them according to your taste 🙂
Is the layout okay (it was hard for me to understand from the screen-grab)?
I'll make sure to reply to the GitHub issue as well.
ZanyPig66 this should have worked, any chance you can send the full execution log (in the UI "results -> console" download full log) and attach it here? (you can also DM it so it is not public)
BTW, VexedKangaroo32, are you using torch launch?
SmilingFrog76 this is not a weird mechanism at all, this is a proper HPC scheduler 🙂 trains-agent is not actually aware of other nodes; it is responsible for launching a Task on its own hardware (with whatever configuration it was set up with). What can be done is to use trains-agent inside a 3rd-party scheduler and have the scheduler allocate the node and trains-agent spin the experiment. There is a k8s example here: basically pulling jobs from the trains-server queue and pushing ...
Yes! I checked, and it should work (it checks whether you have a load(...) function on the preprocess class, and if you do, it will use it):
```
def load(self, local_file_name):
    # local_file_name is the endpoint's model file, downloaded by the serving engine
    self._model = joblib.load(local_file_name)
    # Model(...).get_weights() downloads the second model and returns its local path
    self._preprocess_model = joblib.load(Model(hard_coded_model_id).get_weights())
```
Yes, I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but nowhere do we save anything to GCS. The only usage is in the code, which reads from BigQuery.
Are you certain you have no artifacts on GS?
Are you saying that if GOOGLE_APPLICATION_CREDENTIALS is set and clearml.conf contains no "project" section, it crashes when starting?
Hi @<1523704757024198656:profile|MysteriousWalrus11>
"parents": [
"step_two",
"step_four"
],
Seems like step 5 depends on steps 2 and 4. How did you create it? What did the console say?
Could it be you're not actually passing any output from step 3? How is it dependent on it?
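For example, with a PipelineController the data dependency is created by referencing one step's return value in the next step's arguments; a hedged sketch (the step names, functions, and return names here are made up):
```
from clearml.automation import PipelineController

pipe = PipelineController(name="pipeline", project="examples", version="1.0")
pipe.add_function_step(
    name="step_three",
    function=process_data,      # hypothetical function
    function_return=["data"],   # named return value other steps can reference
)
pipe.add_function_step(
    name="step_five",
    function=train_model,       # hypothetical function
    # referencing step_three's output is what makes step_five depend on it
    function_kwargs=dict(data="${step_three.data}"),
)
```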
Hmm I see, add this for example:
```
extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
```
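For context, this setting lives in the agent section of clearml.conf (a sketch; check your conf template for the exact spot):
```
agent {
    # shell commands executed inside the docker before the experiment starts
    extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
}
```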
Try to add '--network host' to the docker args on the task you are launching
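If you prefer setting it from code rather than the UI, something along these lines should work (a sketch; set_base_docker with the docker_arguments keyword exists in recent clearml versions):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="my task")
# ask the agent to add '--network host' when spinning the container
task.set_base_docker(docker_arguments="--network host")
```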
On the host machine, or inside the containers that are spinning on the host machine?
LazyTurkey38 notice the assumption is that the docker entry-point ends with bash, and only then does the agent take charge. I'm assuming this is not the case, hence the agent spins the docker, then the docker just ends. Could that be?
@<1610083503607648256:profile|DiminutiveToad80> try to turn on:
```
enable_git_ask_pass: true
```
@<1558624430622511104:profile|PanickyBee11> how are you launching the code on multiple machines?
are they all reporting to the same Task?
try:
```
docker_install_opencv_libs: true
```
I would recommend reading this blog post, it should give you a glimpse of what can be built 🙂
https://medium.com/pytorch/how-trigo-built-a-scalable-ai-development-deployment-pipeline-for-frictionless-retail-b583d25d0dd
if I encounter the need for that, I will adapt and open a PR
Great!
The easiest is to pass an entire trains.conf file.
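For example, you can point the agent at a specific configuration file via the TRAINS_CONFIG_FILE environment variable (the path and queue name are just examples):
```
# point the agent at a custom configuration file
export TRAINS_CONFIG_FILE=/path/to/trains.conf
trains-agent daemon --queue default
```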
If the load balancer or Gateway can do the computation and leverage caching,
Oh, that's true. But unfortunately it's out of scope for the open-source version (well, at the end of the day someone needs to pay our salaries 🙂 )
I’d prefer not to have our EC2 instance directly exposed to the public Internet.
Yep, I tend to agree 🙂
Sorry @<1524922424720625664:profile|TartLeopard58> 😞 we probably missed it
clearml-session is still being developed 🙂
Which issue are you referring to?
In Azure VMSS, there is a feature called "Custom Data", which is basically a way of passing things to be executed on startup.
I know that adding an "azure_autoscaler" is on the to-do list; it would basically be a sibling of the aws_autoscaler,
with the same idea of using the "custom data" as the initial bash script:
You can check here:
https://github.com/allegroai/clearml/blob/4a2099b53c09d1feaf0e079092c9e075b43df7d2/clearml/automation/aws_auto_scaler.py#L54