Okay that looks good, now in the UI start here and then get to the artifacts Tab,
Is it there ?
Hmm okay let me check that, I think I understand the issue
Nesting in the UI is not possible, I think?
Yes, but the next version will have nested projects, that's something
I mean that it is possible to start the subtask while the main task is still active.
You cannot call another Task.init while a main one is running.
But you can call Task.create and log into it, that said the autologging is not supported on the newly created Task.
Maybe the easiest solution is just to do the "sub-tasks" and close them. That means the main Task i...
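A minimal sketch of that pattern, assuming hypothetical project/task names: the main Task comes from Task.init (with autologging), sub-tasks come from Task.create, are logged into manually, and are closed while the main Task stays active.

```python
from clearml import Task

main_task = Task.init(project_name="example", task_name="main task")

# sub-task created explicitly; note autologging does not apply to it
sub_task = Task.create(project_name="example", task_name="sub task 1")
sub_task.get_logger().report_scalar(title="loss", series="train", value=0.1, iteration=0)
sub_task.close()  # close the sub-task; the main Task keeps running
```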
Sounds good.
BTW, when the clearml-agent is set to use "conda" as package manager it will automatically install the correct cudatoolkit on any new venv it creates. The cudatoolkit version is picked directly when "developing" the code, assuming you have conda installed as the development environment (basically you can transparently do end-to-end conda, and not worry about CUDA at all)
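For reference, a minimal clearml.conf sketch of that setup (assuming the standard agent.package_manager keys; the channel list is purely illustrative):

```
agent {
    package_manager {
        # use conda instead of pip; the agent will also install a matching cudatoolkit
        type: conda,
        # illustrative channel list
        conda_channels: ["defaults", "conda-forge", "pytorch"]
    }
}
```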
Ok, I think I figured it out.
Nice!
ClearML doesn't add all the imported packages needed to run the task to the Installed Packages
It does (but not derivative packages that are only used by the required packages; those will be added when the agent runs it, because it creates a new clean venv, adds the required packages, then updates back with everything in pip freeze, since that now represents all the packages the Task needs)
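If a directly imported package still ends up missing, a hedged sketch of adding it explicitly (Task.add_requirements must be called before Task.init; the package and project names here are illustrative):

```python
from clearml import Task

# register an extra requirement before the Task is initialized
Task.add_requirements("pandas")
task = Task.init(project_name="example", task_name="requirements demo")
```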
Two questions:
Is t...
For now we've monkey-patched it to our usecase:
LOL, that's a cool hack
That gives us the benefit of creating "local datasets" (confined to the scope of the project; they do not appear in the Datasets tab, but appear as normal tasks within the project)
So what would be a "perfect" solution here?
I think I'm missing the point on why it became an issue in the first place.
Notice that in new versions Dataset will be registered on the Tasks that use them (they are already...
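For comparison, a minimal sketch of the un-patched Dataset API (project and dataset names are hypothetical):

```python
from clearml import Dataset

ds = Dataset.create(dataset_project="my_project", dataset_name="local_dataset")
ds.add_files("./data")   # register local files
ds.upload()              # upload to the configured storage
ds.finalize()            # close this dataset version
```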
So the TB issue was that reported images were not logged.
We are now talking about the caching, which is actually a UI thing. Which clearml-server version are you using?
And where are the images stored (the default files server or is it S3/GS etc.) ?
Hmm can you try:
--args overrides="['log.clearml=True','train.epochs=200','clearml.save=True']"
Oh no, I just saw the message @<1541954607595393024:profile|BattyCrocodile47> is this still relevant?
Hi JitteryCoyote63
report_frequency_sec=30. controls how frequently monitoring events are sent to the server; the default is every 30 seconds (you can change the UI display to wall-time to review). You can change it to 180 so it will only send an event every 3 minutes (for example).
sample_frequency_per_sec is the sampling frequency it uses internally; it will then average the results over the course of the report_frequency_sec time window, and send the averaged result on the repo...
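A conceptual sketch of that averaging behaviour (this is not the ClearML internals, just an illustration of samples taken at sample_frequency_per_sec being averaged over one report_frequency_sec window; requires psutil):

```python
import time
import psutil

sample_frequency_per_sec = 2   # internal sampling rate
report_frequency_sec = 6       # shortened reporting window for the example

samples = []
for _ in range(int(sample_frequency_per_sec * report_frequency_sec)):
    samples.append(psutil.cpu_percent())
    time.sleep(1.0 / sample_frequency_per_sec)

# a single averaged event would be sent for this whole window
print("reported CPU utilisation:", sum(samples) / len(samples))
```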
I imagine that these phantom dependencies will prevent parallelization. Is there a workaround?
yes, they might... the workaround might be a bit ugly, but: copy-pasting the functions and changing the names
BTW: I'll check when the next RC is scheduled for, maybe it will already contain a fix
For example, the Task object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
This is a very good point (the current documentation is basically docstring, but we should create a structured one)
... but some visualization/inline code with explanation is also very much welcome.
I'm assuming this connected with the previous po...
But from your other answer, I think I'm understanding that you can have multiple agents on a single instance listening to the same queue.
Correct
So we could maybe initialize 4 instances of the agent on a single EC2 instance which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
Correct (that said I do not understand how come a single Task does not utilize the CPU, I was under the impression it is run...
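A sketch of how that could look (queue name and worker ids are illustrative): each daemon is launched with its own worker id so the four instances can coexist on one machine while pulling from the same queue.

```
CLEARML_WORKER_ID="myhost:agent0" clearml-agent daemon --queue small_jobs --detached
CLEARML_WORKER_ID="myhost:agent1" clearml-agent daemon --queue small_jobs --detached
CLEARML_WORKER_ID="myhost:agent2" clearml-agent daemon --queue small_jobs --detached
CLEARML_WORKER_ID="myhost:agent3" clearml-agent daemon --queue small_jobs --detached
```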
Looking at the supervisor method of the base AutoScaler class, where are the worker IDs kept? Is it in the class attribute queues?
Actually the supervisor is passing a fixed prefix, then it asks the clearml-server for workers starting with this name.
This way we can have a fixed init script for all agents, while we can still differentiate them from the other agent instances in the system. Makes sense?
So if you set it, then all nodes will be provisioned with the same execution script.
This is okay in a way, since the actual "agent ID" is by default set based on the machine hostname, which I assume is unique ?
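A hedged sketch of the idea behind that prefix lookup (the prefix string is illustrative): the supervisor can list registered workers through the server API and keep only those whose id starts with the fixed prefix.

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
prefix = "aws_autoscaler:"  # illustrative prefix passed to the init script

# keep only the workers this autoscaler spun up
workers = [w for w in client.workers.get_all() if w.id.startswith(prefix)]
print([w.id for w in workers])
```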
I made a custom image for the VMSS nodes, which is based on Ubuntu and has multiple CUDA versions installed, as well as conda and docker pre-installed.
This is very cool, any reason for not using dockers for the multiple CUDA versions?
Hi @<1523701304709353472:profile|OddShrimp85>
the venv setup is totally based on my requirements.txt instead of adding on to what the image has before. Why?
Are you using the agent in docker mode? If this is the case, it creates a venv inside the docker, inheriting from the preinstalled docker system packages.
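If the goal is to keep the image's preinstalled packages visible to that venv, a hedged clearml.conf sketch (assuming the agent's standard package_manager section):

```
agent {
    package_manager {
        # let the venv created inside the docker inherit the packages
        # preinstalled in the image
        system_site_packages: true,
    }
}
```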
When using the UI with regex to search for experiments, due to the greedy nature of the search, it consistently pops up the "ERROR Fetch Experiments failed" window when starting to use groups in regex (that is, parentheses of any kind).
hmm that is a good point (i.e. it should only actually search on enter)
Could it be updated so that if an invalid regex pattern is given, it simply highlights the search bar in red (or similar) rather than stop us while writing the search pattern?
...
I was using clearml == 0.17.5 and I also had this issue
I think it was introduced when we moved to subprocess reporting, with 0.17.5
You can disable it with the following in clearml.conf:
sdk.development.report_use_subprocess = false
VivaciousPenguin66 I have the feeling it is the first space in the URI that breaks the credentials lookup.
Let's test it:
```python
from clearml import StorageManager

uri = 'Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt'

# original
StorageManager.get_local_copy(uri)
# quoted
StorageManager.get_local_copy(uri.replace(' ', '%20'))
```
Hi TenseOstrich47
Thanks for following up!
Should be solved in the upcoming release (I think ETA is next week)
Basically the links to the file server are saved in both mongo and elastic, so as long as these are host:ip based, at least in theory it should work
Verified, you are correct: "." in label enumeration will break the clone.
I'll make sure this bug is passed to the backend guys to fix. Thanks TenseOstrich47!
meanwhile maybe "_" instead?
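A minimal sketch of that workaround (project and label names are illustrative): use "_" instead of "." in the label names until the backend fix lands.

```python
from clearml import Task

task = Task.init(project_name="example", task_name="label enumeration demo")
# underscores instead of dots in the class names
task.connect_label_enumeration({"cat_siamese": 0, "dog_husky": 1})
```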
Hi @<1600661423610925056:profile|StrongMouse81>
using the serving base URL and also another model endpoint we added using:
clearml-serving model add
we get the attached response:
And other model endpoints are working for you?
RoughTiger69 I think you need the latest version (1.3.0 or above, with UI support)
If you are using an older version, you need to specify that you are continuing an execution (Change the "Configuration/Args/continue_pipeline" to True)
EDIT: clearml 1.3.x will work with clearml-server 1.2
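For the older-version path, a sketch of flipping that argument programmatically instead of through the UI (project, task, and queue names are hypothetical):

```python
from clearml import Task

# clone the existing pipeline controller task and mark it as a continuation
pipeline_task = Task.get_task(project_name="pipelines", task_name="my pipeline")
cloned = Task.clone(source_task=pipeline_task)
cloned.set_parameter("Args/continue_pipeline", True)
Task.enqueue(cloned, queue_name="services")
```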