It completed after the max_job limit (10)
Yep, this is Optuna "testing the waters"
Hi EnviousStarfish54
Artifacts are stored per experiment, which means that, storage-wise, every experiment uploading an artifact (even one with the same file content as a previous execution) will create a new file on the central storage (by default, the trains-server)
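For illustration, a minimal sketch of the upload side (project / file names here are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact-demo")
# Each execution that uploads this artifact creates a new copy on the
# central storage, even if "data.csv" is identical to a previous run's file
task.upload_artifact(name="dataset", artifact_object="data.csv")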
As for the preferred way to share data / artifacts: where is your trains server? Local? Cloud? How do you access it from home? VPN?
but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
How so?
Thanks SmallDeer34 !
Would you like us to? How about a footnote/acknowledgement?
How about a reference / footnote?
@misc{clearml,
  title  = {ClearML - Your entire MLOps stack in one open-source tool},
  year   = {2019},
  note   = {Software available from },
  url    = { },
  author = {allegro.ai},
}
WackyRabbit7
I do 'pkill -f trains' but it's the same...
If you need to debug and test, run with --foreground and just hit Ctrl-C to end the process (it will never switch to background...). Helps?
ShaggyHare67 notice that the services queue is designed to run CPU-based tasks like monitoring etc.
For the actual training you need to run your trains-agent on a GPU machine.
Did you run trains-agent init? It will walk you through the configuration (git user/pass included).
If you want to manually add them, you can see an example of the configuration file in the link below.
You can find it at ~/trains.conf
https://github.com/allegroai/trains-agent/blob/master/docs/tr...
BTW:
Task.add_requirements('tensorflow', '2.2') will make sure you get the specified version 🙂
My bad, you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
os.environ['CLEARML_PROC_MASTER_ID'] = ''
Nice catch! (I'm assuming you also called Task.init somewhere before, otherwise I do not think this was necessary)
I think I solved it by deleting the project and running the base_task once before the hyperparameter optimization
So is it working now? Everything is there?
This line 🙂
None
Notice that Triton (and therefore clearml-serving) needs the PyTorch model to be converted into TorchScript, so that the Triton backend can load it
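For example, a minimal sketch of the conversion (the model here is a trivial stand-in for your own):
import torch
import torch.nn as nn

class MyModel(nn.Module):  # stand-in for your actual model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = MyModel().eval()
example_input = torch.randn(1, 4)
# trace (or torch.jit.script) the model into TorchScript
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")  # this TorchScript file is what gets served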
pipe.start_locally() will run the DAG compute part on the same machine, whereas pipe.start() will start it on a remote worker (if it is not already running on a remote worker)
Basically, "pipe.start()" executed via an agent will start the compute (no overhead)
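Roughly like this (a sketch; project / queue names are assumptions):
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
pipe.add_step(name="step_a", base_task_project="examples", base_task_name="task A")

pipe.start_locally()            # run the DAG compute part on this machine
# pipe.start(queue="services")  # or enqueue the controller to run on an agent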
does that help?
🙂
Okay, but we should definitely output an error on that
I'm hoping I can find an end-to-end solution that also includes experiment management
Well of course biased here, but ClearML with the hyperdatasets is probably the most complete one.
Specifically for model performance analysis I would add the voxel open-source tool to dissect specific results, but the combination of the abstraction and query capabilities of hyperdatasets, orchestration, and experiment management is really unmatched.
(and again of course I'm biased, but really there is n...
Hi @<1532532498972545024:profile|LittleReindeer37>
Does Hydra support notebooks? If it does, can you point to an example?
Hi WickedGoat98
Regardless of the ingress configuration (which it seems you have the hang of), the API instance itself needs to be configured with a persistent volume (the web / file servers do not need direct access to the API server).
Can you get the API to run properly ?
Regarding the trains-agent:
once you have the API/Web/File server configured, you can configure it the way the trains-agent-services is configured inside the docker-compose (e.g. set the environment variable with the c...
https://github.com/allegroai/clearml/blob/master/clearml/automation/trigger.py
Example coming soon, with docs :)
ImmensePenguin78 it might be... Let me check, worst case sync after the weekend 🙂
(pypi does contain 1.2.0rc4 and we are finalizing tests so that we can release a stable 1.2.0)
Hi @<1715900788393381888:profile|BitingSpider17>
Notice that you need __ (double underscore) to represent "." when converting clearml.conf entries into environment variables;
this means agent.docker_internal_mounts.sdk_cache
becomes CLEARML_AGENT__AGENT__DOCKER_INTERNAL_MOUNTS__SDK_CACHE
None
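For example (a sketch; the mount path is a placeholder), this is the mapping in code form; normally you would export the variable in the shell before launching the agent:
import os

# agent.docker_internal_mounts.sdk_cache  ->  "." replaced with "__"
os.environ["CLEARML_AGENT__AGENT__DOCKER_INTERNAL_MOUNTS__SDK_CACHE"] = "/custom/cache"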
Hi SpicyOtter88
plt.plot([0, 1], [0, 1], 'r--', label='')
It cannot have a legend without a label, so it gives it an "anonymous" label; I think it should just get "unlabeled 0", wdyt?
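i.e. passing a non-empty label avoids the anonymous entry (a minimal sketch):
import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1], "r--", label="baseline")  # explicit, non-empty label
plt.legend()
plt.show()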
so all models are part of the same experiment and have the experiment name in their name.
Oh, that explains it: (1) you can use the model filename to control the model name in ClearML, or (2) you can disable the autologging and manually upload the model, then you control the model name
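A minimal sketch of option (2), assuming a pytorch model (all names are placeholders):
from clearml import Task, OutputModel

task = Task.init(
    project_name="examples",
    task_name="manual-model-upload",
    auto_connect_frameworks={"pytorch": False},  # disable the pytorch auto-logging
)

output_model = OutputModel(task=task, name="my-model-name")  # you control the name
output_model.update_weights(weights_filename="model.pt")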
wdyt?
Yes, could you send the full log? Screen grab?
Hi ConvolutedSealion94
Yes 🙂
my_seed = 123
Task.set_random_seed(my_seed)  # disable setting the random number generators by passing None
task = Task.init(...)
Hi DepressedChimpanzee34
I think the main issue here is slow response time from the API server. I "think" you can increase the number of API server processes, but considering the 16GB, I'm not sure you have the headroom.
At peak usage, how much free RAM do you have on the machine?
Hi IrritableJellyfish76
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
task_name (str) – The full name or partial name of the Tasks to match within the specified project_name (or all projects if project_name is None). This method supports regular expressions for name matching. (Optional)
You are right, this is a bit confusing, I will make sure that we add in the docstring an examp...
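In the meantime, something like this (a sketch; project / pattern are placeholders):
from clearml import Task

# task_name is matched as a regular expression against task names
tasks = Task.get_tasks(project_name="examples", task_name=r"^train.*")
for t in tasks:
    print(t.id, t.name)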
AttributeError: 'PosixPath' object has no attribute 'loc'
SarcasticSquirrel56 I'm assuming the artifact is pandas and you forgot to either import it before or add it as a requirement for the Task 🙂
This is causing the artifact .get() method to revert to returning the local path to the artifact, instead of actually de-serializing it.
(We should print a warning though, I'll make sure we do 🙂)
EDIT: basically clearml failed to realize you also need pandas because it was never imported ...
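A minimal sketch of the fix (task id / artifact name are placeholders):
import pandas as pd  # importing pandas first lets clearml de-serialize the artifact
from clearml import Task

task = Task.get_task(task_id="<source-task-id>")
df = task.artifacts["data frame"].get()  # now a DataFrame, not a local file path
print(df.head())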
My question was about the automatically uploaded models. Those that were uploaded by clearml client.
So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/__init__.py#L137
def callback(_, model_info):
    model_info.name = "my new name"
    return model_info
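And registering it would look roughly like this (a sketch, assuming WeightsFileHandler.add_pre_callback at the linked line is the registration hook):
from clearml import Task
from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.name = "my new name"  # rename the auto-logged model
    return model_info

task = Task.init(project_name="examples", task_name="rename-models")
WeightsFileHandler.add_pre_callback(callback)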
Hi @<1649221394904387584:profile|RattySparrow90>
Are the models I defined to be served e.g. via the CLI downloaded to the serving pod?
Yes, this is done automatically and online (i.e. when you update using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
This workflow however is the only way I have found to easily fix my previous "Module not found" errors
Hmm okay, makes sense.
Did you try to set these?
or even hack the sys.path with something like
import sys, os
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__) + "/../"))