But I'm sure there is a cleaner way to proceed.
Maybe: `path = task.get_output_destination().replace('file://', '', 1)` ?
@<1523701523954012160:profile|ShallowCormorant89> can you verify it is reproducible in 1.9.3 ? because if it is I'd like to fix that 🙂
will it be possible for us to configure the "new run" button so that it always clones from a particular pipeline ?
What do you mean by "particular pipeline"? By default it will clone the last successful one, and by right-clicking a specific one you can run a copy of that one. What am I missing?
Hi GracefulDog98
Are argument parameters to the script not passed on to the workers, or am I missing something?
The arguments are passed directly when the code is executed (i.e. when argparse's parse_args is called).
If the code fails, I'm assuming the argparse is called before clearml is imported, could that be the case ?
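For reference, a minimal sketch of the ordering that makes this work (project/task names are placeholders):
```python
# Minimal sketch: import clearml and call Task.init *before* parse_args,
# so the agent can patch argparse and inject the parameter overrides.
import argparse
from clearml import Task

task = Task.init(project_name="examples", task_name="argparse demo")  # placeholder names

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
args = parser.parse_args()  # clearml logs these args, and the agent can override them
```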
HandsomeCrow5
So using the `_edit` method you have the ability to add/edit the `execution.script` field, without worrying about the API version (I guess the name `edit` is misleading, it also does add :)
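For example, a rough sketch (note `_edit` is an internal helper, so take the exact field names as assumptions based on the execution.script section shown in the UI):
```python
# Hedged sketch: updating the execution.script section of an existing Task.
# _edit is a private API; the field names below are assumptions.
from clearml import Task

task = Task.get_task(task_id="aaa")  # placeholder task id
task._edit(script=dict(
    repository="https://github.com/me/my-repo.git",  # placeholder repo
    branch="main",
    entry_point="train.py",
    working_dir=".",
))
```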
Thank you AttractiveWoodpecker16 !
Removing the uncommitted changes so that you can launch it from an agent? Or is it visual only?
Is there any progress made on the clearml-serving repo?
Hi JitteryCoyote63
yes, things are progressing slower than expected, I'm expecting actual work will be pushed in early Jan. On the bright side we are trying to work closely with TorchServing team and Nvidia Triton to expand capabilities.
Currently it seems the setup will be a "proxy server container" for pre/post-processing, then a serving engine container (Triton/Torch), with a monitoring container as control plane (i.e. collecting s...
Hi SmoothSheep78
Do you need to import the previous state of the trains-server, or are you starting from scratch ?
I wonder if I just need to join 2 docker-compose files to run everything in one session
Actually that could also work
But for reference, when I said IP I meant the actual host network IP, not 127.0.0.1 (which is the same as localhost)
Hi JitteryCoyote63 ,
upload_artifact was designed to upload pre-made artifacts, which actually covers everything.
With register_artifact we tried to have something that will constantly log a Pandas DataFrame artifact; the use case was examples used for training and their order, so we could compare the execution of two different experiments and detect dataset contamination etc.
Not sure it is actually useful though ...
Retrieving an artifact from a Task is done by:
`Task.get_task(task_id='aaa').artifact...`
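Something along these lines for the full call (a minimal sketch; the task id and artifact name are placeholders):
```python
# Hedged sketch: fetch a registered artifact from a Task.
from clearml import Task

task = Task.get_task(task_id="aaa")  # placeholder id
local_copy = task.artifacts["my_artifact"].get_local_copy()  # download to local cache
obj = task.artifacts["my_artifact"].get()  # or deserialize directly (e.g. a DataFrame)
```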
CheekyFox58 what do you have in the plots Tab?
what are user properties?
Think of them as parameters you can add post execution, that you can also add to the Task table (i.e. customize columns)
how can I add parameters
task.set_user_properties([{"name": "backbone", "description": "network type", "value": "great"}])
UptightMouse31 You can add any metric (KPI) with "manual" logging:
`Logger.current_logger().report_scalar("KPI", "metric", iteration=0, value=1.1)`
This means you can later add a column KPI/metric to your experiment table.
Will this do the trick ?
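Putting the two together, a minimal sketch (names and values are placeholders):
```python
# Hedged sketch: a user property plus a manually reported scalar,
# both of which can then be added as columns in the experiment table.
from clearml import Task

task = Task.init(project_name="examples", task_name="kpi demo")  # placeholder names
task.set_user_properties([{"name": "backbone", "description": "network type", "value": "resnet50"}])
task.get_logger().report_scalar(title="KPI", series="metric", iteration=0, value=1.1)
```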
see here the `docker_setup_bash_script` argument. It will be executed (no need for the `#!/bin/bash` btw) before starting to set up the env inside the container, so apt-get and the like can be executed if needed. Notice that if this is something that always needs to be executed, you can put the same list of commands here: [None](https://github.com/allegroai/clearml-agen...
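And for the per-task route, something like this sketch via the SDK (image and packages are placeholders):
```python
# Hedged sketch: attach a docker setup bash script to a specific Task.
# The listed commands run inside the container before the environment setup.
from clearml import Task

task = Task.init(project_name="examples", task_name="docker setup demo")  # placeholder names
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",  # placeholder image
    docker_setup_bash_script=[
        "apt-get update",
        "apt-get install -y libsm6 libxext6",
    ],
)
```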
SubstantialElk6 on the client side?
like, what are the important metric monitoring queries w.r.t. the serving tasks that can be visualized and shown in Grafana?
Basically latency and requests per minute are automatically reported. Additional reports are based on your RestAPI in/out.
Imagine the following RestAPI request json payload:
`{"x": 123, "y": 456}`
and a return json of:
`{"z": 789}`
The metrics you can add to the monitoring are the keys in both these jsons, i.e. "x", "y", "z"
These metrics can be both log...
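For example, with the clearml-serving CLI it would look something like this sketch (treat the exact flags and bucket values as assumptions; check `clearml-serving metrics add --help`):
```
# Hedged sketch: register "x", "y" (inputs) and "z" (output) for monitoring.
# Service id, endpoint name and bucket ranges are placeholders.
clearml-serving --id <service_id> metrics add --endpoint "my_endpoint" \
    --variable-scalar x=0,50,100 y=0,250,500 z=0,500,1000
```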
The latest TAO doesn't use Python for fine-tuning; rather it uses the CLI entirely
It's a good question, but I think the CLI actually just runs python code (the CLI is their interface). Generally speaking I'm pretty sure it will not be complicated to convert the TLT integration to support TAO (Nvidia helps with that, and I think we had a similar process with Nvidia Clara/MONAI)
BTW: how are you using Nvidia TAO ?
Hi WackyRabbit7 ,
Regarding git credentials, see here in the trains.conf https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L18
Trains assumes one of two (almost three) possible setups
Your code/script is in a git repository. Then when executing manually, all the git references incl. uncommitted changes are stored. Then when executing with the trains-agent, it will clone the code based on these references, apply the uncommitted changes, and run your code. To do that the ...
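BTW, the credentials section in the linked trains.conf looks roughly like this (a sketch; values are placeholders):
```
# Hedged sketch of the git credentials section in trains.conf (placeholder values)
agent {
    # credentials the trains-agent uses when cloning your repository
    git_user: "my-git-user"
    git_pass: "my-git-token-or-password"
}
```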
Okay
Try to reset the experiment and resend it for execution, and let me know if you still get the error; if you do, could you send a screen grab of the Execution tab? Trains supports either a git repo or standalone code (jupyter), but not a mixture of the two. This means that if you want to run the jupyter/colab, the cloning will have to be part of the notebook itself (as you already have it). That said, due to the way CoLab works, Trains will log your execution history (as opposed to the entire jupy...
no requests are being served as in there is no traffic indeed
It might be that it only pings when requests are served
what is actually setting the task status to `Aborted`?
the server watchdog, basically saying: no one is pinging "I'm alive" on this Task, so I should abort it
my understanding was that the daemon thread was deserializing the task of the control plane every 300 seconds by default
Yeah.. let me check that
Basically this sounds like a sort of a bug,...
Do StorageManager.upload and upload_artifact use the same methods?
Yes they both use StorageManager.upload
Is the only difference the task being async?
Two differences:
1. Upload being async.
2. Registering the artifact on the experiment. StorageManager will only upload, whereas upload_artifact will make sure the file is registered as an artifact on the experiment, together with all of the artifact's properties.
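To make it concrete, a minimal sketch of the two calls (paths and names are placeholders):
```python
# Hedged sketch contrasting the two calls.
from clearml import Task, StorageManager

# 1) StorageManager only uploads the file to the destination URI.
StorageManager.upload_file(local_file="model.pkl", remote_url="s3://my-bucket/models/model.pkl")

# 2) upload_artifact uploads AND registers the file on the experiment (async by default).
task = Task.init(project_name="examples", task_name="artifact demo")  # placeholder names
task.upload_artifact(name="model", artifact_object="model.pkl")
```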
Sure thing 🙂
BTW: ReassuredTiger98 this is definitely an interesting use case, and I think you can actually write some code to solve it if you like.
Basically let's follow up on your setup:
Machine X: agents listening to queue A and queue B_machine_a (notice we have two agents here)
Machine Y: agent listening to queue B_machine_b
Now we (the users) will push our jobs into queues A and B
Now we have a service that does the following:
- see if we have a job in queue B
- check if machine Y is working...
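The logic got cut off above, but a very rough sketch of the building blocks such a service could use (queue names from the setup above; everything else is an assumption):
```python
# Very rough sketch, assuming the queue layout above; the original logic was
# truncated, so this only illustrates polling a queue with the APIClient.
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_b = client.queues.get_all(name="B")[0]                  # look up queue B
entries = client.queues.get_by_id(queue=queue_b.id).entries   # pending Tasks in it
if entries:
    # ... check whether machine Y is busy, and if not, move the job
    # from queue B to B_machine_b so its agent picks it up
    pass
```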
However I'm quite confident that plots and scalars are not visible online only when 'git diff too large to store' appears.
These should be unrelated, are you seeing console outputs ?
why is pushing into the services queue required ...
The services queue is usually connected with an agent running in "services mode", which means this agent is executing multiple tasks in parallel (as opposed to a regular agent that only launches one Task at a time; the assumption is that "service" Tasks are usually not heavy on CPU/RAM, so multiple instances make sense)
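For reference, launching such an agent looks something like this (a sketch; the docker image is a placeholder):
```
# Hedged sketch: run a clearml-agent in services mode on the "services" queue
clearml-agent daemon --services-mode --queue services --create-queue --docker ubuntu:22.04 --cpu-only
```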
Oh sorry: `pip install clearml-agent==1.2.0rc4`
Also automatically detects if you have an active venv inside the container and uses it instead of the system wide python
TypeError: `__init__()` got an unexpected keyword argument 'base_pod_num'
Could you post the entire log?
Hi @<1694157594333024256:profile|DisturbedParrot38>
You mean how to tell the agent to pull only some submodules of your git?
If this is the case you can actually remove them on your git branch; a submodule is basically a file with a soft link. Wdyt?
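For example, something along these lines on that branch (paths are placeholders):
```
# Hedged sketch: drop a submodule the agent doesn't need (placeholder path)
git submodule deinit -f path/to/submodule
git rm -f path/to/submodule
git commit -m "Remove submodule not needed by the agent"
```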
Hmm yes, @<1570220858075516928:profile|SlipperySheep79> I think you are right, in your case it makes sense to add this option.
Could you add a GH issue with the feature request? It should be fairly easy to add, and we use GH to make sure we track those requests
wdyt?