
hmm interesting use case, why do you need to add the "--no-binary"?
Could you send the "installed packages" section of the Task that was created in the notebook?
but then the error occurs, after the training and the validation were successfully completed
It seems it is failing on the last eval? Could it be that the test set is missing? Is it the same dataset? Can you verify the file is there? (Notice I see a mix of / and \ in the file name, which is odd: Windows uses \ and Linux/macOS use /, you should never have a mix.)
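As an aside, building paths with pathlib avoids mixing separators across OSes (a tiny sketch, file names made up):
```python
from pathlib import Path

# Path renders with the correct separator for the current OS
test_file = Path("datasets") / "my_dataset" / "test.csv"
print(test_file)  # datasets/my_dataset/test.csv on linux, datasets\my_dataset\test.csv on windows
```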
It seems you are getting a 401 Unauthorized. Is this the same domain? I'm assuming the issue is that the logged-in cookie is not being sent?
Yes, that sounds like a good start. DilapidatedDucks58, can you open a GitHub issue with the feature request?
I want to make sure we do not forget
Nice!
script, and kwcoco is not imported directly (but from within another package).
FYI: usually the assumption is that ClearML will only list the directly imported packages, as these will pull in their respective required packages when the agent installs them. Meaning that if the repository never actually imports kwcoco directly, it will not be listed (the package that you do import directly, the one you mentioned that imports kwcoco, will be listed). I hope this makes sense.
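If you do need kwcoco listed explicitly, one option (my suggestion, not a requirement) is to force it into the requirements before Task.init:
```python
from clearml import Task

# force a package that is only imported indirectly into the
# "installed packages" list (must be called before Task.init)
Task.add_requirements("kwcoco")

task = Task.init(project_name="examples", task_name="kwcoco run")
```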
I think that by default the zipped package files are 0.5 GB each
(you can control it, look for --chunk-size)
I think the missing part of the API is understanding which chunk your specific file is stored in.
You can do something like:
```python
from clearml import Dataset

ds = Dataset.get(dataset_id="<dataset-id>")  # placeholder id
# the entry's artifact_name tells you which chunk holds the file
the_artifact_chunk_I_need = ds.file_entries_dict["my/file/here"].artifact_name
```
wdyt?
maybe worth adding an interface?
Hmm, I think the issue is here (the docker command mount): `-v /tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf`
SmugSnake6 I think the latest version (1.8.0) tries to parallelize it
You can also control max_workers
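For example (a sketch assuming the 1.8.x SDK; paths and values are made up):
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my dataset", dataset_project="examples")
ds.add_files("/path/to/data")
# chunk_size is in MB; max_workers sets the number of parallel upload threads
ds.upload(chunk_size=512, max_workers=8)
ds.finalize()
```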
we can add non-clearml code as a step in the pipeline controller.
Yes 🙂, btw you can kind of already do that with the pre/post function callbacks (notice they run from the same scope as the actual pipeline controller).
What exactly did you have in mind to put there?
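A minimal sketch of what that looks like (assuming the PipelineController callback signatures; project/task names are made up):
```python
from clearml import PipelineController

def pre_cb(pipeline, node, param_override):
    # runs in the controller's scope just before the step is launched
    print(f"about to launch {node.name}")
    return True  # returning False skips the step

def post_cb(pipeline, node):
    # runs in the controller's scope right after the step completes
    print(f"{node.name} finished")

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    pre_execute_callback=pre_cb,
    post_execute_callback=post_cb,
)
pipe.start()
```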
(BTW: you can disable the auto-logging feature of joblib with `Task.init(..., auto_connect_frameworks={'scikit': False})`)
DilapidatedDucks58 long story short:
if you do:
```python
from clearml import StorageManager
from clearml.storage.helper import StorageHelper

# pass any link in the relevant bucket (placeholder below)
StorageHelper.get("s3://<bucket>", retries=5)
```
It should make sure that all the other s3:// links of this bucket will use the same original configuration (i.e. retries).
If this workaround works, let's make sure we add it into the conf file, wdyt?
If you want to change the Args, go to the Args section in the Configuration tab; when the Task is in draft mode you can edit them there.
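The programmatic equivalent would look something like this (a sketch; "Args/batch_size" and the queue name are just examples):
```python
from clearml import Task

# clone the original task; the clone starts in draft mode
cloned = Task.clone(source_task="<task-id>", name="edited copy")
# while it is a draft, arguments can be overridden
cloned.set_parameter("Args/batch_size", 64)
Task.enqueue(cloned, queue_name="default")
```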
Hi EmbarrassedSpider34
Long story (see below) short, yes you can ignore this warning :)
Specifically, torch is spinning up processes and killing them; every process holds a reference to the parent semaphore (for internal clearml bookkeeping). Now, python is not very good with this kind of thing (it is getting better in newer python versions): bottom line, python "thinks" someone lost a semaphore, but in reality the subprocess never created it in the first place. Does that make sense?
Hmm CourageousLizard33, seems you stumbled on a weird bug.
This piece of code only tries to get the username of the current UID, but since you are running inside a docker container and probably set the UID via an environment variable, there is no "actual" entry for that UID in /etc/passwd, so the username cannot be resolved.
I'm attaching a quick fix, please let me know if it solved the problem.
I'd like to make sure we have it in the next RC as soon as possible.
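To illustrate what is going on (a hypothetical sketch of the failing lookup, not the actual fix):
```python
import os
import pwd

def safe_username():
    try:
        # this lookup raises KeyError when the current UID
        # has no entry in /etc/passwd (common inside docker)
        return pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        # fall back to the environment, then to the raw UID
        return os.environ.get("USER") or str(os.getuid())
```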
Hi ShallowCormorant89
Can you verify the http link is valid? Can you download it from code on your machine (i.e. not via an agent)? Maybe port 8081 is blocked from the agent machine to the server?
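A quick way to test from the agent machine (the URL is a placeholder for the artifact link):
```python
import requests

# hypothetical artifact link served by the clearml files server (port 8081)
url = "http://<clearml-server>:8081/some/project/artifact.zip"
response = requests.get(url, timeout=10)
print(response.status_code)  # anything other than 200 points at a network/config issue
```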
SteadyFox10 I suspect you are correct 🙂
CourageousLizard33 see also section (4) here:
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md#launching-the-trains-server-docker-in-linux-or-macos
Where are you seeing this message?
Probably less secure though :)
ReassuredTiger98 the environment is currently only set at runtime of the process (not before); this will change in the next RC of trains-agent (due in a few days)
that embed seems to be slightly off with regards to where the link is actually pointing to
I think this is the Slack preview... 😞
but maybe hyperparam aborts in those cases?
From the hyperparam perspective it will be trying to optimize for the global minimum, basically "ignoring" the last value reported. Does that make sense?
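For reference, a minimal sketch of where that objective is configured (assuming clearml.automation; the metric and parameter names are made up):
```python
from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<task-id>",  # placeholder
    hyper_parameters=[UniformParameterRange("Args/lr", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="validation",
    objective_metric_series="loss",
    # "min_global" tracks the global minimum of the metric,
    # not the last value reported before the task ended
    objective_metric_sign="min_global",
    optimizer_class=RandomSearch,
)
```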
Hi @<1729309131241689088:profile|MistyFly99>
notice that the files server needs to have an "address" that can be accessed from the browser; data is stored in a federated manner. This means your browser accesses the files server directly, not through the API server. I'm assuming the address is not valid?
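For reference, this is the relevant section of clearml.conf; the files_server address must be reachable from the user's browser (hostnames are placeholders):
```
api {
    web_server: http://<your-host>:8080
    api_server: http://<your-host>:8008
    files_server: http://<your-host>:8081
}
```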
I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry) horrible ...
I have to agree, we are changing this interface, I do not think it is good 😞
help_models is a dir in the git
And the git repo is registered on the experiment correctly?
Yes, that means the nvidia drivers are present (as you mentioned the GPU seems to be detected).
Could you check you have libnvidia-ml.so.1 inside the container?
For example in /usr/lib/nvidia-XYZ/
Okay, could you test with `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/.singularity.d/libs/`?
`ERROR: Error checking for conflicts. ... AttributeError: _DistInfoDistribution__dep_map`
"Xeon E3-1240: 4 - 5 hours!"
wow... yes definitely worth upgrading 🙂
I was thinking mainly about AWS.
Meaning S3?