Sorry, I mean a vault on the clearml-server holding the credentials per user; the agent then pulls them based on the user, and it is transparent from the user's perspective.
I still can't get it to work... I couldn't figure out how I can change the clearml version in the runtime of the Cleanup Service, as I'm not in control of the agent that executes it.
Let's take a step back: remove the clearml-services container from the docker compose for a second and run it manually (then you can control everything). Once you have it running manually, let's try to replicate the setup back into the docker compose. Make sense?
, but it seems like I can only trigger a task using a Task scheduler but not a pipeline.
@<1523701132025663488:profile|SlimyElephant79> Maybe we should better state it, but Pipeline is "just" another type of Task. so triggering a Task with the Pipeline ID is essentially triggering the pipeline (do notice you need to select the "services" queue to be used so that the pipeline runs on the correct resource). Make sense ?
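For reference, a minimal sketch of scheduling a pipeline by its Task ID with TaskScheduler (the task ID and the schedule arguments here are placeholders):
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# the pipeline is "just" another Task, so we schedule it by its Task ID
# and make sure it runs on the "services" queue
scheduler.add_task(
    schedule_task_id="<pipeline_task_id>",  # placeholder: the pipeline controller's Task ID
    queue="services",
    day=1,  # placeholder schedule
)
scheduler.start_remotely(queue="services")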
Yes it should
Here is the fastai example, just in case 🙂
https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard_example.py
RipeGoose2 you are not limited to the automagic
From anywhere in your code you can always do:
from trains import Logger
Logger.current_logger().report_plotly(...)
So you can add any manual reporting on top of the one generated by lightning .
Sounds good?
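For example, a quick sketch of adding a manual plotly report on top of the automatic logging (the figure itself is just an illustration):
import plotly.graph_objects as go
from trains import Logger

# build any plotly figure you like
fig = go.Figure(data=go.Scatter(y=[1, 3, 2, 4]))

# report it manually, alongside whatever lightning logs automatically
Logger.current_logger().report_plotly(
    title="manual plot", series="example", iteration=0, figure=fig
)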
Looking at the supervisor method of the base AutoScaler class, where are the worker IDs kept? Is it in the class attribute queues?
Actually the supervisor is passing a fixed prefix, and then queries the clearml-server for workers whose names start with that prefix.
This way we can have a fixed init script for all agents, while still being able to differentiate them from the other agent instances in the system. Make sense?
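Roughly, the lookup amounts to something like this (the prefix value is a placeholder, and the APIClient usage is my sketch, not the actual autoscaler code):
from clearml.backend_api.session.client import APIClient

client = APIClient()
prefix = "aws_autoscaler"  # placeholder: the fixed worker-name prefix
# keep only the workers whose IDs start with the autoscaler's prefix
workers = [w for w in client.workers.get_all() if w.id.startswith(prefix)]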
ScantMoth28 where are you seeing this warning ?
Hi VexedKangaroo32, funny enough this is one of the fixes we will be releasing soon. There is a release scheduled for later this week; right after that I'll post here a link to an RC containing a fix for this exact issue.
Since this fix is all about synchronizing different processes, we wanted to be extra careful with the release. That said, I think what we have now should be quite stable. The plan is to have the RC available right after the weekend.
SmallBluewhale13 in your code, what do you get when you print the version?
from clearml import __version__
print(__version__)
Hi SteadyFox10
Short answer: no 🙂
Long answer: full permissions are available in the paid tier, alongside a few more advanced features.
Fortunately, in this specific use case the community service allows you to share a single experiment (or several) with a read-only link. Would that work?
Probably less secure though :)
Yeah, the ultimate goal I'm trying to achieve is to run tasks flexibly. For example, before running, a task could declare how many resources it needs, and the agent would run it as soon as it finds enough resources are available.
Check out Task.execute_remotely()
You can put it anywhere in your code; when execution reaches it and you are running without an agent, it stops the process and re-enqueues the Task to be executed remotely. On the remote machine the call itself becomes a no-op.
I...
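Something along these lines (queue name is a placeholder):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote me")

# Everything above runs locally. When execution reaches this call without an
# agent, the process stops and the Task is enqueued for remote execution.
# On the remote machine the call is a no-op and execution just continues.
task.execute_remotely(queue_name="default")

# this part only actually runs on the remote machine
print("running on the agent")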
SlipperyDove40
FYI:
args = task.connect(args, name="Args")
"Args" is a "kind of" reserved section for argparse. Meaning you can always use it, but argparse will also push/pull things from there. Is there any specific reason for not using a different section name?
Hmm CourageousLizard33, it seems you stumbled on a weird bug.
This piece of code only tries to get the username for the current UID, but since you are running inside a docker container the UID is probably only set in the environment, with no actual entry for that UID in /etc/passwd, so it cannot be resolved.
I'm attaching a quick fix, please let me know if it solved the problem.
I'd like to make sure we have it in the next RC as soon as possible.
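For context, a minimal sketch of the kind of UID-to-username lookup that breaks here (an illustration only, not the actual fix I'm attaching):
import os
import pwd

def resolve_username():
    # Inside the container the current UID may have no /etc/passwd entry,
    # so pwd.getpwuid() raises KeyError and the name cannot be resolved.
    try:
        return pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        # fall back to the environment, or just use the raw UID
        return os.environ.get("USER") or str(os.getuid())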
VexedCat68 both are valid. In case the step was cached (i.e. already executed) the node.job will be None, so it is probably safer to get the Task based on the "executed" field which stores the Task ID used.
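e.g. (assuming node is the pipeline step's Node object):
from clearml import Task

# node.executed holds the ID of the Task that was actually used,
# even when the step was cached and node.job is None
step_task = Task.get_task(task_id=node.executed)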
Does adding external files not upload them to the dataset output_uri?
@<1523704667563888640:profile|CooperativeOtter46> If you are adding the links with add_external_files, these files are not re-uploaded.
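i.e. something along these lines (the bucket path and names are placeholders):
from clearml import Dataset

dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
# register the links only - the files stay where they are and are not
# re-uploaded to the dataset's output_uri
dataset.add_external_files(source_url="s3://my-bucket/data/")
dataset.upload()
dataset.finalize()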
GrievingTurkey78 I have to admit I can't see the difference, can you help me out? 🙂
PompousBeetle71 cool, next RC will have the argparse exclusion feature :)
AbruptWorm50 can you send the full image? (the X axis is missing from the graph)
So I might be a bit out of sync, but I think there should be Triton serving and OpenVino serving built into it (or at least in progress).
Hi CleanPigeon16
can I make the steps in the pipeline use the latest commit in the branch?
Yes:
Manually clone the step's Task (in the UI), edit the Execution section, change it to "last commit on branch", and specify the branch name. You can also do the same programmatically (as above: clone + edit).
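Programmatically it would be roughly this (assuming Task.set_script is available in your clearml version; the task ID and branch name are placeholders):
from clearml import Task

# clone the step's Task and point it at the latest commit on the branch
cloned = Task.clone(source_task="<step_task_id>", name="step - latest commit")
cloned.set_script(branch="my-branch", commit="")  # empty commit = last commit on the branch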
ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found
Seems like the "run_experiment" step is not defined. Could that be ...
Hi SuperiorDucks36
you have such a great and clear GUI
🙂
I personally would love to do it with a CLI
Actually a lot of stuff is harder to get from the UI (like the current state of your local repository, etc.), but I think your point stands 🙂 We will start with a CLI because it is faster to deploy and iterate on; then, when you guys say it's a winner, we will add a wizard in the UI.
What do you think?
In fact, as I understand it, we need to write our own custom HyperParameterOptimizer, am I right?
Yes exactly! It should be very easy.
Just inherit from RandomSearch and override create_job:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
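i.e. a minimal sketch (the extra logic inside is just a placeholder), which you would then pass as the optimizer_class of HyperParameterOptimizer:
from clearml.automation import RandomSearch

class MyRandomSearch(RandomSearch):
    def create_job(self):
        # let RandomSearch sample the hyper-parameters and build the job
        new_job = super().create_job()
        if new_job is not None:
            # placeholder: add any custom logic before the job is launched
            pass
        return new_job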
ElegantKangaroo44 I tried to reproduce the "services mode" issue with no success. If it happens again, let me know; maybe then we'll better understand how it happened (i.e. why the "master" trains-agent got stuck).
What are the Python, torch, and clearml versions?
Any chance this is reproducible?
What's the full error trace/stack you are getting?
Can you try to debug it to where exactly it fails here?
https://github.com/allegroai/clearml/blob/86586fbf35d6bdfbf96b6ee3e0068eac3e6c0979/clearml/binding/import_bind.py#L48
RoughTiger69 wdyt?