My data is already in a directory on the clearml-server machine and I do not want to copy it, just add it to clearml as a dataset.
So the short answer is: no, it needs to package it (read: "zip it").
The reason is that clearml-data creates an immutable copy, and just "pointing" to files located somewhere will usually break very easily.
That said, it would actually be relatively easy to add, since the dataset itself stores links to the files, and these links could point to an S3 bucket (for exa...
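For reference, the usual flow that does this packaging looks roughly like the sketch below (the project/dataset names and the directory path are placeholders, adjust them to your setup):
`
from clearml import Dataset

# creates an immutable, versioned copy of the directory content
ds = Dataset.create(dataset_project="data", dataset_name="my_dataset")
ds.add_files(path="/data/already/on/server")
ds.upload()    # packages ("zips") the files and uploads them
ds.finalize()
`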
Yes, though the main caveat is that the data is not really immutable
VexedCat68 makes sense, we could also (if implementing this feature) add a special tag to the dataset, so you know it contains "external" links, wdyt?
Anyone want to open a GitHub issue, so we actually end up implementing it?
Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
Actually I am as well; this is Kubernetes doing the resource scheduling, and Kubernetes decided it is okay to run two pods on the same GPU, which is cool, but I was not aware Nvidia had already added this feature (I know it was in beta for a long time)
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see they added dynamic slicing and Memory Protection:
Notice you can control ...
I guess this is doable:
You can get the entire set of scalars as a pandas DF: https://www.tensorflow.org/tensorboard/dataframe_api
(another example: https://stackoverflow.com/a/45899735 )
Then iterate over the different runs and create + report scalars:
` from clearml import Task

for run in runs:
    task = Task.create(...)
    logger = task.get_logger()
    # not real code, just an example:
    w_times, step_nums, vals = zip(*event_acc.Scalars('Accuracy'))
    for step, val in zip(step_nums...
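If it helps, here is a more complete sketch of the same idea, assuming the scalars live in TensorBoard event files and using TensorBoard's EventAccumulator; the project name and the runs mapping are placeholders:
`
from clearml import Task
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# hypothetical mapping of run name -> TensorBoard log directory
runs = {"run_a": "/path/to/tb/run_a", "run_b": "/path/to/tb/run_b"}

for run_name, logdir in runs.items():
    event_acc = EventAccumulator(logdir)
    event_acc.Reload()

    task = Task.init(project_name="tb_import", task_name=run_name,
                     reuse_last_task_id=False)
    logger = task.get_logger()

    # re-report every scalar tag found in the event file
    for tag in event_acc.Tags().get("scalars", []):
        for event in event_acc.Scalars(tag):
            logger.report_scalar(title=tag, series=tag,
                                 value=event.value, iteration=event.step)
    task.close()
`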
The agent is using bash (but when you add a command line to the docker run, .bashrc is not executed, hence no conda in PATH)
Maybe add the full path to the conda executable:
docker_setup_bash_script = [
    "export PATH=/workspace/miniconda/bin:$PATH",
    "export LOCAL_PYTHON=/workspace/miniconda/bin/python3",
    "/workspace/miniconda/bin/conda activate /PATH_GOES_HERE"
]
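If this is set from code, it would presumably be passed through Task.set_base_docker before the task is enqueued; a sketch, reusing the same placeholder paths (the docker image name here is also just an assumption):
`
from clearml import Task

task = Task.init(project_name="examples", task_name="conda in docker")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    docker_setup_bash_script=[
        "export PATH=/workspace/miniconda/bin:$PATH",
        "export LOCAL_PYTHON=/workspace/miniconda/bin/python3",
        "/workspace/miniconda/bin/conda activate /PATH_GOES_HERE",
    ],
)
`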
Oh, I see, the pipeline controller itself (not the components) is the one with the repo.
To fix that, add the following at the top of the script:
` from clearml import Task
from clearml.automation.controller import PipelineDecorator

Task.force_store_standalone_script()

@PipelineDecorator.pipeline(...) `
That should do the trick.
Hi SarcasticSquirrel56
But if I then clone the task, and execute it by sending it to a queue, the experiment succeeds,
I'm assuming that on the remote machine the "files_server" is not configured the same way as in the local execution; for example, it points to an S3 bucket but the credentials for the bucket are missing.
(in your specific example I'm assuming the plot is non-interactive, which means it is actually a PNG stored somewhere, usually per the file-server configuration). Does tha...
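For reference, these settings live in the clearml.conf on the agent machine; a rough sketch assuming an S3 files server (bucket name and credentials are placeholders):
`
api {
    # should match the files_server used by the local execution
    files_server: "s3://my-bucket/clearml"
}
sdk {
    aws {
        s3 {
            key: "ACCESS_KEY"
            secret: "SECRET_KEY"
        }
    }
}
`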
Hi @<1539055479878062080:profile|FranticLobster21>
hey, how do I use local files as dependencies?
You mean like a repository?
Can I specify in task what local files do I use that should be packaged?
In a git repo?
Basically the agent can do one of two things: either replicate a single script, or clone a git repo + uncommitted changes
Hi UpsetTurkey67
The status that you see on the graph is fetched from the pipeline itself (for example, cached). I think that what happened is that the pipeline logic has yet to update itself on the status of the running component. If the pipeline is indeed running, it should update the status shortly (actually, you can set the polling frequency for that). If for some reason the pipeline Task died, then indeed this is an odd state (that we should probably fix in the UI)
ok, I will do a simple workaround for this (use an additional parameter that I can update using parameter_override and then check if it exists and update the configuration in python myself)
Yep sounds good, something like this?
` from clearml.utilities.dicts import merge_dicts

overrides = {}
task.connect(overrides)

configuration = {}  # stuff here
task.connect_configuration(configuration)

# apply the overrides on top of the configuration
configuration.update(overrides)  # or: merge_dicts(configuration, overrides) `
BTW: this will allow you to override any s...
It only happens in the clearml environment, works fine local.
Hi BoredHedgehog47
what do you mean by "in the clearml environment" ?
Hi UnevenDolphin73
I cannot initialize a task before loading the file, but the docs for
connect_configuration
Yes, that's basically the problem: you have to decide where the main driver is.
If you are executing the code "manually" (i.e. not via the agent) then there is no problem: obviously you have the local file and you can use it to load the "project name" etc., then you just call Task.connect_configuration to log the content.
If you are running the same code via the agent...
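To make the "manual run" side concrete, a minimal sketch (the project name and file name are placeholders, and it assumes the config file exists locally when the code is executed manually):
`
from clearml import Task

task = Task.init(project_name="examples", task_name="config logging")
# logs the file content on the task; when the same code later runs via the agent,
# the returned path points to a local copy of the stored configuration
config_path = task.connect_configuration("config.yaml", name="config")
`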
how to put or handle this configuration and where?
In your clearml.conf on the machine with the agent, just add at the bottom of the file: agent.venvs_cache.path=~/.clearml/venvs-cache
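If you prefer the section (block) form of the conf file, this should be the equivalent sketch (matching, as far as I recall, the venvs_cache section that ships commented out in the default clearml.conf):
`
agent {
    venvs_cache: {
        path: ~/.clearml/venvs-cache
    }
}
`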
is it normal that it's slower than my device even though the agent is much more powerful than my device? or because it is just a simple code
Could be the agent is not using the GPU for some reason?
Hi GrittyCormorant73
In the end, everything goes through session.send, so you could add a print there.
btw: why would you print all the requests? what are we debugging here?
Hi CrookedWalrus33
I think this happens if you are already logged in and you pressed the "signup" tab instead of the "login" tab (the frontend team is working on a solution)
In the meantime just make sure you are clicking on the "login" tab
Hi @<1547028031053238272:profile|MassiveGoldfish6>
What is the use case? The gist is that you want each component to run on a different machine, and you want ClearML to do the routing of data and logic between them.
How would that work in your use case?
last iteration is not reset and I still have a gap in my scalars
Hmm is this reproducible ? can you check with the latest clearml version (1.10.3) ?
btw: I'm assuming continue_last_task=0
I think I found the issue: the fact that the agent is launching it causes it to ignore the "overridden" set_initial_iteration
Hi @<1558986821491232768:profile|FunnyAlligator17>
What do you mean by:
We are able to set_initial_iteration to 0 but not get_last_iteration.
Are you saying that if your code looks like:
Task.set_initial_iteration(0)
task = Task.init(...)
and you abort and re-enqueue, you still have a gap in the scalars?
Hi @<1567321739677929472:profile|StoutGorilla30>
Is it necessary to serve keras model using triton engine?
It is not, but it is the most efficient way to serve keras models, and this is why clearml-serving uses Nvidia Triton by default (we are talking 10x factors)
I would start with the keras example, see that it works, and then work your way into your example (notice you always need to provide the layers for the in/out of the model)
[None](https://github.com/allegroai/clearml-s...
Hi @<1523701504827985920:profile|SubstantialElk6>
I would split the first stage into two. The first one passing data to the others, the second as "monitoring". Wdyt?
The downstream stages are rankN scripts, they are waiting for the IP address of the first stage.
Is this like a multi-node training, rather than a pipeline ?
Hi @<1523702868694011904:profile|AbruptCow41>
Check what you are getting when running git status inside the working directory; this is essentially how it works. Are you expecting to later run it with an agent?
Should be fairly easy to add no?
FYI: a hotfix for 1.3.0 (smoothing graphs) was just released, see v1.3.1
I am actually considering rolling back to 1.1.0,
Can you share why?
JitteryCoyote63 notice from the release notes of 1.2:
Important Note!
This release requires a MongoDB migration from previous versions. Please see
for more information.
I'm not sure you can downgrade that easily ...
I see, if this is the case try to set:
output_uri="file:///full/path/to/dir"
Notice it has to be the full path there, with the file:// prefix
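For example, a minimal sketch of passing it to Task.init (the directory path is a placeholder):
`
from clearml import Task

# all output models/artifacts will be copied to this local directory
task = Task.init(project_name="examples", task_name="local output",
                 output_uri="file:///full/path/to/dir")
`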
I have a process that cleans the /tmp each day,
WackyRabbit7 the files (configuration etc.) that are mapped into the containers are stored there.
They should clean themselves; that said, we have noticed that services-mode skips this cleanup, and it will be solved in the next RC of clearml-agent.
Make sense ?