I thought the agent created a new conda env and installed all packages
It does, but I was asking what is written on the Original Task (the one created when you executed the code on your laptop, not when the agent was executing it). When the agent executes the Task, it writes back all the packages of the entire venv it created; when the Task is run manually, it lists only the packages you import directly (i.e. from package import ... or import package; it actually analyses the code).
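If you need to nudge that manual analysis, here is a minimal sketch (the package name/version and project/task names are made up) using Task.add_requirements, which must be called before Task.init:
` from clearml import Task

# must be called *before* Task.init() - adds a package the import
# analysis might otherwise miss (name/version are just examples)
Task.add_requirements("pandas", "1.5.3")

task = Task.init(project_name="examples", task_name="manual run")
# run manually: only directly-imported packages (plus the one added above)
# are listed; run by the agent: the full venv is written back `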
My point...
Hi @<1742355077231808512:profile|DisturbedLizard6>
the problem may be in returning None in get_local_model_file()
This tracks, it means that the model file cannot be downloaded for some reason.
when you click on the model here: None
what does it say under "MODEL URL:"?
.replace('file://', '', 1)
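(for context, that call just strips the file:// scheme once from the left, turning the stored URI into a plain local path; the path below is made up)
` uri = "file:///opt/clearml/models/model.pt"  # example URI, made up
local_path = uri.replace('file://', '', 1)  # strip the scheme once, from the left
print(local_path)  # /opt/clearml/models/model.pt `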
I have mounted my s3 bucket at the location /opt/clearml/data/fileserver/ but I can see my data is not being stored in S3, it's being stored in EBS instead. How so?
I'm assuming the mount was not successful
What you should see is a link to the files server inside clearml, and actual files in your S3 bucket
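If the mount keeps failing, a simpler alternative is to skip the fileserver mount and upload straight to S3 via output_uri. A sketch (bucket/prefix and names are placeholders; requires boto3 and valid AWS credentials):
` from clearml import Task

# upload artifacts/models directly to S3 instead of the files server
task = Task.init(
    project_name="examples",
    task_name="s3 output",
    output_uri="s3://my-bucket/clearml",
) `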
Hi @<1668427971179843584:profile|GrumpySeahorse51>
Could you provide the full stack log?
this error seems to originate from psutil (which is used), but it lacks the clearml-session context
PompousBeetle71, the reason I'm asking is that the warning you see is due to the fact it cannot detect the filename you are saving your model to... I'm trying to figure out how that actually happened.
BTW: in the next version we will probably remove this warning altogether, but I'm still curious on how to reproduce 🙂
JitteryCoyote63 correct, you could also use Task.create that creates a Task but does not do any automagic.
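For example (a sketch; project/task/repo values are placeholders):
` from clearml import Task

# Task.create only registers the Task, no automagic logging is attached
task = Task.create(
    project_name="examples",
    task_name="created, not executed",
    repo="https://github.com/me/my-repo.git",  # placeholder
    script="train.py",
) `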
I also saw the PR for set_parent, will be merged shortly 🙂 thanks!
Now I see, the scenario is similar to the HyperParameter scenario, see the TrainsJob https://github.com/allegroai/trains/blob/master/trains/automation/job.py
I still don't see why you would change the type of the cloned Task, I'm assuming the original Task had the correct type, no?
BTW: what's the use case? Why do you need to open two Tasks in the same code/script ?
GiganticTurtle0 BTW, this mock example worked out of the box (python 3.6 on Ubuntu):
` from typing import Any, Dict, List, Tuple, Union

from clearml import Task
from dask.distributed import Client, LocalCluster


def start_dask_client(
    n_workers: int = None, threads_per_worker: int = None, memory_limit: str = "2Gb"
) -> Client:
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=threads_per_worker,
        memory_limit=memory_limit,
    )
    client = Client(cluster)  # reconstructed: the original snippet was truncated here
    return client `
That said, the arguments are passed inside the executed code (i.e. monkey-patched into the frameworks). This allows it to log and change all the arguments, including the default ones, and allows you to edit them.
Does that make sense?
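To illustrate the monkey-patching, a minimal sketch (the argument itself is made up):
` import argparse
from clearml import Task

task = Task.init(project_name="examples", task_name="argparse demo")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)  # placeholder argument
args = parser.parse_args()

# parse_args() is patched, so passed *and* default values are logged,
# and values edited in the UI are injected back here when an agent runs it
print(args.lr) `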
Hi @<1610083503607648256:profile|DiminutiveToad80>
You mean the pipeline logic? It should autodetect the imports of the logic function (like any Task.init call)
You can however call Task.force_requirements_env_freeze and pass a local requirements.txt
Make sure to call it before creating the Pipeline object
None
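Something like this (a sketch; the pipeline name/project are placeholders):
` from clearml import Task
from clearml import PipelineController

# must run before the PipelineController is constructed
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

pipe = PipelineController(name="my pipeline", project="examples", version="1.0") `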
What are you seeing in the Task that was cloned (i.e. the one the HPO created not the original training task)?
by that I mean, in the configuration section, do you have the Args there? (seems like the pic you attached, but I just want to make sure)
Also in the train.py file, do you also have Task.init ?
Hi JitteryCoyote63 you can, but obviously you should be careful: they might both try to allocate more GPU memory than the HW actually has.
` TRAINS_WORKER_NAME=machine_gpu0A trains-agent daemon --gpus 0 --queue default --detached
TRAINS_WORKER_NAME=machine_gpu0B trains-agent daemon --gpus 0 --queue default --detached `
it certainly does not use tensorboard python lib
Hmm, yes I assume this is why the automagic is not working 😞
Does it have a pythonic interface for the metrics?
(with matplotlib 3.2+ I get no warning, let me check with 3.1)
ElegantCoyote26 I don't think Keras logs it anywhere unless you have TB, so nowhere to take the data from...
In short, yes you have to have TB :)
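i.e. something along these lines (a sketch; the model and data are throwaway placeholders, just to show the callback wiring clearml hooks into):
` import numpy as np
from tensorflow import keras

# minimal model purely to demonstrate attaching the TensorBoard callback
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

tb = keras.callbacks.TensorBoard(log_dir="./tb_logs")
model.fit(np.random.rand(16, 4), np.random.rand(16, 1), epochs=2, callbacks=[tb]) `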
Sorry ScaryLeopard77 I missed the reply,
the tutorial in the readme of clearml-serving repo doesn't mention it though. Where should I set it?
oh dear ... you are right (I think it was there in previous versions)
clearml-serving --help
https://github.com/allegroai/clearml-serving/blob/ce6ec847b1e01c6f5bf35d638e6ceb8148db8a7a/clearml_serving/main.py#L142
This is the equivalent of what is created here in the example:
https://github.com/allegroai/clearml-serving/blob/ce6ec847b...
Hmm MiniatureHawk42 how many files are in the zip?
Hi PanickyMoth78
` torch.save(net.state_dict(), PATH)  # auto-uploads to GCS

# get all the models from the Task
output_models = Task.current_task().models["output"]

# get the last one
last_model = output_models[-1]

# set meta-data
last_model.set_metadata(key="my key", value="my value", type="str") `
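and reading it back should be the mirror call (assuming your SDK version exposes get_metadata):
` value = last_model.get_metadata("my key")  # returns the stored value `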
IrritableOwl63 in the profile page, look at the bottom right corner
Hmm, let me see if you can somehow "signal" to the subprocess that it should not use the main process Task. (btw: are you forking or spawning a subprocess?)
Basically create a token and use it as user/password
EDIT:
With read-only permissions 🙂
Yes, it will always create a new Task.
- Suppose that the serving project A is serving some model version 1, and a new model is trained and it starts serving model version 2, but at runtime, due to some reason, we need to revert to model version 1. What would be the best way to achieve the above?
If you archive the model, then clearml-serving will pick the "latest" non-archived model, essentially reverting to the previous version. Also notice that it supports multiple versions on a single endpoint (again also a feat...