Reputation
Badges 1
25 × Eureka!Hi @<1547028052028952576:profile|ExuberantBat52>
task = Task.get_task(...)
print(task.data)
wdyt?
ElegantCoyote26 It means we need to have a keras logger that logs everything to trains, then we need to hook it automatically.
Do you feel like PR-ing the logger (the hooking I can take care of 🙂 )?
Hi ReassuredTiger98
When clearml is running inside the docker the installed packages of the WebUI get updated.
Yes, this is by design, so the agent can always reproduce the exact python environment.
(internal the original requirements is also stored, but not available in the UI).
What exactly is the use case here ? wouldn't make sense to reproduce the entire working environment when you clone the executed Task ?
HappyDove3
see here https://github.com/allegroai/clearml-pycharm-plugin 🙂
when I duplicate the experiment and clone it remote, the call is ignored and the recorded values are used?
Yes ScantChimpanzee51 exactly.
Think of it as the inital value you want to put on the Task when you are running the code on your machine, later when you clone the Task, you can edit the base docker image in the UI (or with the API), of course the new value is used when the agent spins this Task, and to avoid the actual docker (the one you changed in the UI) to be overwritten by ...
HelplessCrocodile8
Basically the file URI might be different on a different machine (out of my control) but they point to the same artifact storage location
We might have thought of that...
in your clearml.conf file:
` sdk{
storage {
path_substitution = [
# Replace registered links with local prefixes,
# Solve mapping issues, and allow for external resource caching.
{
registered_prefix = file:///mnt/data/...
OutrageousGrasshopper93 could you send an example of the two links from the artifacts (one local one remote) ?
FierceRabbit20 it seems the Pipeline Task that was created is missing the "installed requirements" section. How are you creating the actual pipeline Task? is this from code?
Like, let's say I want "a 15GB GPU or better" and there's 4 queues, two of which fit the description... is there any way to set it so that ClearML will just queue it up on whichever one's available?
How do you know that? Also if you know that, what do you know about the queues ?
Generally speaking this type of granularity is k8s, but it has lots of caveats, specifically that you need to Know what you need in term of resources, that you can specify resources that do not exist, and that...
Are you seeing the entire jupyter notebook in the "uncommitted changes" section
LethalDolphin75 Yes you are correct, we should add here:
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/optuna/optuna.py#L210elif isinstance(p, UniformLogarithmicParameterRange): hp_type = 'suggest_float' hp_params = dict(low=p.min_value, high=p.max_value if p.include_max else p.max_value - p.step_size, log=True, step=p.step_size)
btw: I'm not sure if the ...
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see, may I ask why not just build it once, push it into artifactory and then have trains-agent
use it? (it will be much faster)
any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?
I think there is force
argument, to force download.
I think the main issue is getting the size from different backends (i.e. s3 /https / etc.)
Maybe we should add it as a GitHub feature request issue?
The main limitation is that the driver "list()" does not return file size.
For example it might be an issue with the default http files-server.
wdyt?
Hmm, this is a good question, I "think" the easiest is to mount the .ssh folder form the host to the container itself. Then also mount clearml.conf into the container with force_git_ssh_protocol: true
see here
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
btw: ssh credentials even though sound more secure are usually less (since they easily contain too broad credentials and other access rights), just my 2 cents 🙂 I ...
Hi AbruptWorm50
I was wondering if it possible to specify 'patience' of pruning algorithm?
Any of the kwargs passed to **optimizer_kwargs
will be directly passed to the optuna obejct
https://github.com/allegroai/clearml/blob/2e050cf913e10d4281d0d2e270eea1c7717a19c3/clearml/automation/optimization.py#L1096
It should allow you to control the parameters, no?
Regrading the callback, what exactly do you think to put there?
Is the callback this enough?
https://github.com/allegro...
I think, this all ties into the none-standard git repo definition. I cannot find any other reason for it. Is it actually stuck for 5 min at the end of the process, waiting for the repo detection ?
TightElk12 are you still looking for a way to create a new "sub-task" ?
Actually it would be interesting to combine the two, feast is fully open-source and supported by the linux foundation, so I cannot see the harm in that.
wdyt?
. I am not sure this is related to the fact the model is not correctly converted to TorchScript
Because Triton Only supports TorchScript (Not torch models) 🙂
Notice that the new pip syntax:packagename @ <some_link_here>
Is actually interpreted by pip as :
Install "packagename" if it is not installed use the @ "<some_link_here>" to install it.
Oh that's definitely off 🙂
Can you send a quick toy snippet to reproduce it ?
OutrageousGrasshopper93tensorflow-gpu
is not needed, it will convert tensorflow to tensorflow-gpu based on the detected cuda version (you can see it in the summary configuration when the experiment sins inside the docker)
How can i set the base python version for the newly created conda env?
You mean inside the docker ?
Ohh sorry I missed that and answered on the original message, nvm 🙂 all is well now
You can also see the code there, could you run a quick test against the demo-server, this might be a UI issue that was fixed in the last version.
Hi GreasyPenguin14
This is what I did, but I could not reproduce the hang, how is this different from your code?
` from multiprocessing import Process
import numpy as np
from matplotlib import pyplot as plt
from clearml import Task, StorageManager
class MyProcess(Process):
def run(self):
# in another process
global logger
# Create a plot
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = ...
NICE! CurvedHedgehog15 cool stuff! and my pleasure 🙂
I might have found it, tqdm is sending{ 1b 5b 41 } unicode arrow up?
https://github.com/horovod/horovod/issues/2367
SarcasticSparrow10 how do I reproduce it?
I tried launching from a sub process that is a daemon and it worked. Are you using ProcessPool ?