ERROR: Error checking for conflicts. ... AttributeError: _DistInfoDistribution__dep_map
Maybe I can plot it using other lib.
I remember a while back there was an integration with network visualization, but it was hard to support and failed too many times...
If you have a library that converts the network into HTML or an image, you can report it as a debug sample.
I see, let me check something.
Hi @<1691620877822595072:profile|FlutteringMouse14>
Do I have to use Hydra?
You can, and then the entire configuration is fully captured by ClearML (automatically) while you can still override values with the manual "key.sub=value" both in the UI and in the CLI
Otherwise, you can connect a nested dict with task.connect (it will be flattened, with / separating sub-keys).
Or you can connect configuration files (task.connect_configuration) and edit them as-is in the UI (with override of...
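To make the two behaviours above concrete, here is a minimal stdlib-only sketch that mimics how a nested dict ends up with "/"-joined keys and how a "key.sub=value" override maps onto the nested structure. It is an illustration under that assumption, not ClearML's implementation; `flatten` and `apply_override` are hypothetical helpers, and in practice you would simply pass the dict to task.connect.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], sep: str = "/", prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict, joining sub-keys with `sep` (illustrative only)."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the joined key as the prefix
            flat.update(flatten(value, sep, full_key))
        else:
            flat[full_key] = value
    return flat


def apply_override(config: Dict[str, Any], override: str) -> None:
    """Apply a "key.sub=value" override in place (the value is kept as a string)."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create missing intermediate sections on the way down
        node = node.setdefault(part, {})
    node[leaf] = value


if __name__ == "__main__":
    cfg = {"model": {"lr": 0.001, "layers": {"hidden": 128}}, "epochs": 10}
    apply_override(cfg, "model.lr=0.01")
    print(flatten(cfg))
    # {'model/lr': '0.01', 'model/layers/hidden': 128, 'epochs': 10}
```

Values overridden from a CLI-style string stay strings here; a real configuration system would also cast them back to the original type.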
Hi @<1645597514990096384:profile|GrievingFish90>
You mean the agent itself runs inside a Docker container, and then the agent spins up sibling containers for the Tasks?
clearml.conf is the file that clearml-init is supposed to create, right?
Correct, specifically ~/clearml.conf
I mean, you can manually get the results and rescale them, but not through the UI.
PompousBeetle71 a few questions:
Is this like using PyTorch distributed, only manually? Why don't you call Task.init in all the subprocesses? We had a few threads on that, and it seems like a recurring question, so I'll make sure we have an example on GitHub. Basically, trains will take care of passing the argparse arguments to the subprocesses, as well as the torch node settings. It will also make sure they all report to the same experiment. What do you think?
PompousParrot44 Enterprise license pricing is usually custom-tailored to the size of the company and based on usage. If you are interested, feel free to leave your details in the "Contact us" form on the website, and someone from sales will contact you shortly after.
I meant even just a link to a blank comparison and one can then add the experiments from that view
Just making sure you are aware, once you are in comparison you can always add Tasks (any Task):
Notice you can click "Add experiments", then select any experiment (including from all projects, using the filters).
Notice you need to remove all filters first (the red x on the right side of the filter icon).
And the ClearML server version?
JitteryCoyote63 This is odd: you have both python3.9 and python3.8 in the container, and since the Task (probably) says the agent should run python3.9, it's trying to use it to create the environment (it does not matter that the agent itself is using python3.8).
1673431344706 agent-1 DEBUG /usr/bin/python3.9
/usr/bin/python3.9: No module named pip
This is the main issue: pip is missing for python3.9, and this is why it reverted to python3.8 when it was setting up the environment.
It should prob...
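A quick way to confirm this diagnosis inside the container is to ask each interpreter for `-m pip --version`. The sketch below does that with the standard library only; `has_pip` is a hypothetical helper, and the interpreter names are assumptions taken from the log above, so adjust them to what your image actually contains.

```python
import shutil
import subprocess
import sys


def has_pip(python: str) -> bool:
    """Return True if `<python> -m pip --version` succeeds for the given interpreter."""
    try:
        result = subprocess.run(
            [python, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Interpreter missing, not executable, or hanging
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Candidate interpreters (assumed names); sys.executable is the current one
    for name in ("python3.9", "python3.8", sys.executable):
        path = shutil.which(name) or name
        print(f"{name}: pip available = {has_pip(path)}")
```

If python3.9 reports False here, the environment-creation step will hit the same "No module named pip" error seen in the agent log.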
But adding a simple force_download flag to the get_local_copy
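To illustrate the semantics such a flag would have, here is a minimal cache sketch: on a cache hit the local file is returned as-is, unless `force_download=True`, in which case it is fetched again and swapped in atomically. The function name mirrors the discussion, but the signature, the `fetch` callback, and the cache layout are all hypothetical and not ClearML's actual StorageManager API.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def get_local_copy(
    remote_url: str,
    fetch: Callable[[str, Path], None],
    cache_dir: Path,
    force_download: bool = False,
) -> Path:
    """Return a cached local copy of remote_url, re-fetching if force_download is set.

    `fetch` is a hypothetical downloader that writes remote_url to the given path.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Derive a stable cache file name from the URL
    local_path = cache_dir / hashlib.sha256(remote_url.encode()).hexdigest()
    if local_path.exists() and not force_download:
        return local_path  # cache hit: skip the download
    tmp_path = local_path.with_suffix(".part")
    fetch(remote_url, tmp_path)
    tmp_path.replace(local_path)  # atomic swap so readers never see a partial file
    return local_path


if __name__ == "__main__":
    calls = []

    def fake_fetch(url: str, dest: Path) -> None:
        calls.append(url)
        dest.write_text(f"contents of {url}")

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        get_local_copy("s3://bucket/model.pkl", fake_fetch, cache)
        get_local_copy("s3://bucket/model.pkl", fake_fetch, cache)  # served from cache
        get_local_copy("s3://bucket/model.pkl", fake_fetch, cache, force_download=True)
        print(len(calls))  # 2
```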
That sounds like a good idea.
JitteryCoyote63 good news
It's not a trains-server error but a trains validation error; this is easily fixed and deployed.
Okay, found it. ElegantCoyote26, the step name is changed but the Task name remains the same...
I'll make sure we fix it on the next version
Hi LittleShrimp86
just to log in to your ClearML app (demo or server) so I can run Python files related to ClearML.
I think this amounts to creating a Task and enqueueing it, am I understanding correctly ?
Hmm can you run the agent in debug mode, and check the specific console log?
```
clearml-agent --debug daemon --foreground ...
```
Hi @<1523701181375844352:profile|ExasperatedCrocodile76>
The Docker containers should get the host IP, not the internal Docker IP. What am I missing?
Is it not possible to say just look at my requirements.txt file and the imports in the script?
I think there is a GitHub Issue for this feature
(Basically, the issue is that requirements.txt files are very often not updated and have no real version lock, so replicating a working environment is always safer.)
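As an illustration of the "no real version lock" problem, the sketch below lists the requirements that are not pinned to an exact version with `==`. It uses a deliberately simplified parser (comments, blank lines, pip options such as `-r`, and environment markers are skipped), so it is not a full implementation of pip's requirement grammar; `unpinned_requirements` is a hypothetical helper.

```python
import re
from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement names that are not locked to an exact version with '=='.

    Simplified parsing: comments, blank lines and pip options (lines starting
    with '-') are skipped; environment markers after ';' are ignored.
    """
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            # Name is everything before the first version operator, extras bracket or space
            name = re.split(r"[<>=!~\[ ]", line, maxsplit=1)[0]
            unpinned.append(name)
    return unpinned


if __name__ == "__main__":
    reqs = "numpy==1.24.2\ntorch>=1.13\npandas\n# comment\n-r other.txt\nrequests[socks]~=2.28\n"
    print(unpinned_requirements(reqs))  # ['torch', 'pandas', 'requests']
```

Every name in that list can resolve to a different version on the next install, which is why capturing the exact environment that actually ran is the safer default.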
LOL, could it be that there is no remote repository?
Okay, I have an idea: it could be a lock that another agent/user is holding on the cache folder, or something similar.
Let me check something
The easiest way would be to rename a queue to "1xgpu 16gb", then make sure only machines with >16gb GPUs listen to it.
Note that an agent can listen to multiple queues.
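A small sketch of the queue-per-GPU-size idea: given a naming convention like "1xgpu 16gb" (an assumption, taken from the example name above), pick the queues a machine is big enough for and build a `clearml-agent daemon --queue ...` command that listens to all of them. The helpers and the naming pattern are hypothetical; queue names containing spaces must be quoted in the shell, which `shlex.join` handles.

```python
import re
import shlex
from typing import List

# Assumed naming convention: "<N>xgpu <M>gb", e.g. "1xgpu 16gb"
QUEUE_PATTERN = re.compile(r"(\d+)xgpu (\d+)gb")


def queues_for_machine(queues: List[str], gpu_memory_gb: int) -> List[str]:
    """Keep queues whose required GPU memory (parsed from the name) fits this machine."""
    selected = []
    for name in queues:
        match = QUEUE_PATTERN.fullmatch(name)
        if match and int(match.group(2)) <= gpu_memory_gb:
            selected.append(name)
    return selected


def agent_command(queues: List[str]) -> str:
    """Build a shell-quoted clearml-agent daemon command listening to several queues."""
    return shlex.join(["clearml-agent", "daemon", "--queue", *queues])


if __name__ == "__main__":
    all_queues = ["1xgpu 8gb", "1xgpu 16gb", "1xgpu 24gb", "cpu"]
    print(agent_command(queues_for_machine(all_queues, 16)))
    # clearml-agent daemon --queue '1xgpu 8gb' '1xgpu 16gb'
```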
Do I set CLEARML_FILES_HOST to the endpoint instead of an S3 bucket?
Yes, you are right, this is not straightforward:
CLEARML_FILES_HOST="s3://minio_ip:9001"
Notice you must specify the port; this is how it knows this is not AWS. I would avoid using an IP and instead register the MinIO server as a host on your local DNS / firewall. This way, if you change the IP, the links will not get broken.
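The "explicit port means not AWS" rule can be illustrated with the standard URL parser. This is a sketch of the heuristic as described above, not ClearML's own parsing code; `describe_files_host` is a hypothetical helper, and "minio_ip" is the placeholder host from the example.

```python
from urllib.parse import urlparse


def describe_files_host(url: str) -> str:
    """Classify an s3:// files-host URL by whether it carries an explicit port (illustrative)."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    if parsed.port is not None:
        # host:port present, so treat it as a self-hosted S3-compatible endpoint
        return f"custom S3 endpoint (e.g. MinIO) at {parsed.hostname}:{parsed.port}"
    # No port: the netloc is read as an AWS bucket name
    return f"AWS S3 bucket {parsed.hostname!r}"


if __name__ == "__main__":
    print(describe_files_host("s3://minio_ip:9001"))
    print(describe_files_host("s3://my-bucket/models"))
```

Using a DNS name instead of an IP only changes `hostname`; the port is still what marks the URL as a non-AWS endpoint.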
What's the difference between the example pipeline and this code?
Could it be the "parents" argument? What is it set to?