
client has the following attributes: ['auth', 'events', 'models', 'projects', 'queues', 'session', 'tasks', 'workers']
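For reference, a minimal sketch of how those service attributes can be inspected, assuming the APIClient import path used in the clearml examples:

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# list the service endpoints exposed as attributes on the client
print([name for name in dir(client) if not name.startswith("_")])
```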
FrothyDog40, ClearML tries to conserve storage by limiting the history length for debug images (see the sdk.metrics.file_history_size configuration entry), though the history can indeed grow large by setting a large value or by using a metric/variant naming scheme that circumvents this limit.
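For reference, a sketch of where that entry lives in clearml.conf (the value 100 here is just an illustrative placeholder):

```
sdk {
    metrics {
        # maximum number of historical debug images kept per metric/variant pair
        file_history_size: 100
    }
}
```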
I am aware, but I am not sure what you were trying to say by this
Does your use case call for accessing a specific iteration for all images, or only when looking at a specific image?
Yes, for some cases I would like...
FrothyDog40 , done 🙂
https://github.com/allegroai/clearml/issues/474
The scalars page provides a metric hide/show control - Is this the one you mean? The debug images page also provides a filter by metric - Depending on your naming policy this can easily be used to focus on more sparsely appearing images.
well, it does in a sense, but it's the poorer version. While in the scalars you can browse over full names or headers (full_name = "header / description" format), both from a list and by regexp, in the debug images there is only header filtering, and only from a...
FrothyDog40 , is submitting an issue still relevant?
It would be very, very useful for my use case, and I believe it is a relatively popular use case in general, for example when using regular-expression configurations
I understand that to be reported, any value should be presented as a string; how does the "inverse casting" work when I pull some value back from the config?
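For context, a minimal sketch of the round trip in question, using Task.connect() on a plain dict; the project/task names and values are placeholders:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config casting sketch")

# original Python types: float, int, and a string containing a backslash
config = {"lr": 0.001, "n_layers": 3, "pattern": r"\d+"}
config = task.connect(config)

# the UI stores the values as strings; as far as I understand, on the way back
# they are cast according to the original value's type
print(type(config["lr"]), type(config["n_layers"]), config["pattern"])
```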
*running_locally()
AgitatedDove14, well.. having the demo server by default lowers the effort threshold for trying ClearML and getting convinced it can deliver what it promises, and maybe testing some simple custom use cases. I don't know what the behind-the-scenes considerations are in terms of the cost of keeping the demo server running, but even a leaner version where experiment records are deleted after a week or a few days sounds useful to me
It seems to me that the source of the mismatch is the str(tuple())
thanks AgitatedDove14 ! this is what I was looking for
CumbersomeCormorant74 displaying a 3D Field varying in time, or anything varying in time really
Thanks, it could be super cool, I hope it will happen
AgitatedDove14, in terms of explicit reporting I'm using current_epoch, which is correct when I check it in debug mode
AgitatedDove14 should be, I'll try to create a small example later today or tomorrow
AgitatedDove14, mostly out of curiosity, what is the motivation behind introducing this as an environment variable knob rather than a flag with some default in Task.init?
thanks AgitatedDove14, I will be happy to test it; however, I didn't understand it fully.
I can see how it works in the single-machine case; however, if I want multiple machines syncing with the optimizer, pulling the sampled hyperparameters and reporting results, I can't see how it would work
sounds like an ok option, any recommended reference for how to start?
well, kind of, I linked the other topic, but it was completely unrelated
this topic is about the issue with reporting a configuration containing a string inside a tuple that has a backslash
and also in terms of outcome: the scalars follow the correct epoch count, but the debug samples and the monitored performance metrics show a different count
AgitatedDove14, no, it has an offset equal to the value it started with; so for example, if you stopped at epoch n, then when you are running epoch n+1 you get 2*n+1 reported
I am actually not sure specifically about \b myself, but even when replacing it with \. I am getting a double backslash (\\.) instead of the single backslash (for the tuple case), which in the case of a regexp expression changes the meaning of the expression. The expected behavior would be registering it as a single backslash
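For reference, the doubling can be reproduced in plain Python, independent of ClearML; str() of a bare string keeps the single backslashes, while str() of a tuple goes through the elements' repr():

```python
pattern = "\\d+\\."        # a string containing single backslashes

print(str(pattern))        # \d+\.        -> str() of a string keeps single backslashes
print(str((pattern,)))     # ('\\d+\\.',) -> str() of a tuple uses repr(), doubling them
```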
Hi AgitatedDove14 , so it looks something like this:
```
Task.init()
trainer.fit(model)  # clearml logging starts from 0 and logs all summaries correctly according to the real count
# fit stopping triggered at epoch=n
# something
trainer.fit(model)  # clearml logging starts from n+n (that's how it seems) for non-explicit scalar summaries (debug samples, scalar resource monitoring, and also the global iteration count)
# fit stopping triggered
...
```
I am at the moment diverging from this implementation to s...
we see this:
```
$ ps ax | grep python
10589 ? S 0:05 python3 fileserver.py
10808 ? Sl 18:07 python3 -m apiserver.server
30047 pts/0 S+ 0:00 grep --color=auto python
```
the solution you suggested works for the single-machine case. The missing part is being able to access and "claim" spawned trials (samples in the HP plane) from multiple machines
so the different behavior between a string and a string inside a tuple is by design? I find it confusing; I guess this is the YAML convention?
https://colab.research.google.com/drive/1w5lQGxsblnLGlhJEDH_b0aIiUvjGeLjy?usp=sharing
imagine the beautiful PR showing such a feature 👀
well when returning None it works as expected, no model uploads
thanks, I'll try this. Is there an efficient way to get the IDs first?
AgitatedDove14, I am referring to a generic HPO scenario where you define some HP space, let's say:
```
param1 = np.linspace(lower_bound, upper_bound, n)
param2 = np.linspace(lower_bound, upper_bound, n)
```
then you run an optimization that samples this HP space,
For each trial, a sample is pulled from the space, some experiment is performed, and you get a score. Then, to analyze the behavior of your objective, you want to understand the relation between the params and the objective score.
Then if you ...
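To make the scenario above concrete, here is a minimal self-contained sketch of such a grid-sampling loop; the bounds, the objective() function, and the score handling are hypothetical placeholders, not anything from the thread:

```python
import numpy as np

# hypothetical bounds and grid resolution
lower_bound, upper_bound, n = 0.0, 1.0, 5
param1 = np.linspace(lower_bound, upper_bound, n)
param2 = np.linspace(lower_bound, upper_bound, n)

def objective(p1, p2):
    # placeholder for "some experiment is performed and you get a score"
    return -(p1 - 0.3) ** 2 - (p2 - 0.7) ** 2

# pull each sample from the HP plane and record (params, score) pairs
results = [((p1, p2), objective(p1, p2)) for p1 in param1 for p2 in param2]

# relate params to the objective score, e.g. pick the best trial
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)
```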